How to search for large directory containing thousands of files?


Under the folder


/grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache


we have more than 100 nested subfolders.


One of these folders contains thousands of files.
Is it possible to identify which folder that is?

I am asking because the folder holding the thousands of files may itself be the problem:
we may not be able to remove the files in it precisely because there are so many of them.


linux bash shell-script find performance

asked yesterday by yael, edited yesterday by Kusalananda
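
As context for the removal problem: running rm * in such a directory typically fails with "Argument list too long", because the expanded list of names exceeds the kernel's ARG_MAX limit. A minimal sketch of a removal that never builds that argument list (assuming GNU or BSD find with -delete; SOME_SUBDIR is a hypothetical placeholder for the oversized directory):

# Delete only the regular files directly inside the oversized directory;
# find unlinks them one by one instead of passing all names to rm at once.
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache/SOME_SUBDIR \
    -maxdepth 1 -type f -delete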










  • It could be thousands or more. In this folder, for example, if you type ls, it does not return any output because of the huge number of files.

    – yael
    yesterday











  • Have you run fsck just to make sure that your filesystem is not corrupted?

    – John1024
    yesterday











  • No, we have not run it. But why do you think the problem lies in that direction?

    – yael
    yesterday











  • Because if ls fails in the directory, one possible cause is a corrupted filesystem. fsck checks for filesystem corruption.

    – John1024
    yesterday











  • With some file systems, the size of the directory files can give an indication of how many entries they have without having to list their content.

    – Stéphane Chazelas
    yesterday
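
To illustrate that last comment: on filesystems such as ext4, the directory file itself grows as entries are added (and is generally not shrunk when they are removed), so its size hints at the entry count without listing the contents. A minimal sketch, assuming such a filesystem:

# With -d, ls lists each directory entry itself rather than its contents;
# the 5th field is the size of the directory file. A directory holding
# thousands of entries usually shows far more than the typical 4096 bytes.
ls -ld /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache/*/ | sort -k5,5n | tail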















2 Answers


















Answered yesterday by Kusalananda (score 2):














The number of items in a directory may be counted using



set -- *


This sets the positional parameters ($1, $2, etc.) to the names in the current directory.
The number of names that * expands to is found in $#. If you use the bash shell and set the dotglob shell option, this will additionally count hidden names.
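
A quick illustration of the counting idea (the /tmp/demo directory is hypothetical, created only for this demonstration):

mkdir -p /tmp/demo && touch /tmp/demo/file{1..5} /tmp/demo/.hidden
cd /tmp/demo
set -- *; echo "$#"   # prints 5: by default the glob skips .hidden
shopt -s dotglob      # bash option: make * match hidden names too
set -- *; echo "$#"   # prints 6: .hidden is now counted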



Using this to find directories under /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache that contain more than 1000 names:



find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache \
    -type d -exec bash -O dotglob -c '
        for pathname do
            set -- "$pathname"/*
            if [ "$#" -gt 1000 ]; then
                printf "%d\t%s\n" "$#" "$pathname"
            fi
        done' bash {} +


This expands the * shell glob in each found directory and outputs the pathname of the directory, along with the number of names, whenever there are more than 1000 names in it. It does this by executing a short bash script for batches of directories (the trailing {} + makes find pass the found directories in batches). The script loops over each directory in the batch and, for each one, expands the * glob inside it to count the number of entries; an if statement then triggers printf when the count exceeds the threshold.



Note that if a directory contains millions of names, then it may take a bit of time to actually expand the * glob in that directory.
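
If even expanding the glob is too slow, here is a sketch of an alternative that counts entries without building the whole name list in shell memory (assuming GNU find; this is not part of the answer above, and SOME_SUBDIR is a hypothetical placeholder):

# Print one dot per entry directly inside the directory and count the dots;
# unlike piping a name list into wc -l, this stays correct for names
# that contain newlines.
dir=/grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache/SOME_SUBDIR
find "$dir" -mindepth 1 -maxdepth 1 -printf '.' | wc -c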































  • Do we need to run "set -- *" before running your find code?

    – yael
    yesterday











  • @yael No, that is not necessary. That was just an explanation of how the embedded bash script that find calls works.

    – Kusalananda
    yesterday












  • So when you say "names", that includes both folders and files, am I correct? (I mean, if we have more than 1000 folders under a folder, it will also print that.)

    – yael
    yesterday












  • @yael Yes, it would include the names of any entry in the directory.

    – Kusalananda
    yesterday


















Answered yesterday by Stéphane Chazelas (score 1):














On a GNU system



(export LC_ALL=C
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache -print0 |
  tr '\n\0' '\0\n' |
  sed 's|/[^/]*$||' |
  sort |
  uniq -c |
  sort -rn |
  head |
  tr '\0' '\n')


Would list the 10 directories with the most entries. (LC_ALL=C makes the sorting byte-wise and faster; the first tr swaps NUL and newline so that each NUL-delimited pathname from -print0 occupies exactly one line, even if the pathname itself contains newlines; sed strips the last path component, leaving each entry's parent directory, which sort | uniq -c then counts; the final tr swaps NUL and newline back.)



If the directories have so many files that even listing them would be too expensive, you can try to guess which ones they are, without entering them, by looking at their size.



find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache -type d \
  -size +10000000c -print -prune


Would list the directories that are larger than 10 MB and not descend into them (-print outputs the directory's pathname, and -prune then stops find from entering it).
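
A related sketch of the same size heuristic (assuming GNU coreutils stat; not part of the original answer): print every directory's own size, largest first. On filesystems such as ext4 the directory file grows with its entry count and does not shrink when entries are removed, so past hot spots stay visible.

# %s = size of the directory file itself, %n = its pathname
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache -type d \
    -exec stat -c '%s %n' {} + | sort -rn | head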





