How to search for large directory containing thousands of files?
Under the folder

/grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache

we have more than 100 nested folders. One of the folders contains thousands of files. Is it possible to identify this folder?

I am asking because we may have a problem removing the files in that folder, precisely because there are so many of them.

Tags: linux, bash, shell-script, find, performance
It could be a thousand or more. In this folder, for example, if you type ls, it does not return any output because of the huge number of files. – yael, yesterday

Have you run fsck, just to make sure that your filesystem is not corrupted? – John1024, yesterday

No, we have not run it, but why are you thinking in this direction? – yael, yesterday

Because if ls fails in the directory, one possible cause is a corrupted filesystem. fsck checks for filesystem corruption. – John1024, yesterday

With some file systems, the size of the directory files can give an indication of how many entries they have without having to list their content. – Stéphane Chazelas, yesterday
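As a minimal sketch of that last idea, assuming a filesystem such as ext4 where a directory's own file size grows with its entry count, you could rank directories by the size reported for the directory file itself:

# Rank directories by the size of the directory file itself (field 5 of ls -l);
# assumes an ext4-style filesystem where this size grows with the entry count.
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache -type d \
    -exec ls -ld {} + | sort -k5,5 -rn | head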
2 Answers
The number of items in a directory may be counted using

set -- *

This sets the positional parameters ($1, $2, etc.) to the names in the current directory. The number of names that * expands to is found in $#. If you use the bash shell and set the dotglob shell option, this will additionally count hidden names.
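For instance, as a quick sketch run by hand inside a single directory (the shopt line is optional and only needed if hidden names should be counted too):

shopt -s dotglob    # optional: also count hidden names
set -- *
echo "$#"           # entry count; caveat: an empty directory prints 1
                    # unless nullglob is set, since the unmatched * stays literal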
Using this to find directories under /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache that contain more than 1000 names:
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache \
    -type d -exec bash -O dotglob -c '
        for pathname do
            set -- "$pathname"/*
            if [ "$#" -gt 1000 ]; then
                printf "%d\t%s\n" "$#" "$pathname"
            fi
        done' bash {} +
This expands the * shell glob in each found directory and, if there are more than 1000 names in it, outputs the pathname of the directory along with the number of names. It does this by executing a short bash script for batches of directories. The script loops over each batch of directories and, for each one, expands the * glob inside it to count the number of entries. An if statement then triggers printf if appropriate.
Note that if a directory contains millions of names, then it may take a bit of time to actually expand the * glob in that directory.
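If glob expansion itself becomes the bottleneck, an alternative sketch (not part of this answer) is to count entries with find and wc instead; the directory name below is a placeholder, and the count is only reliable if filenames contain no newlines:

# SOME_DIR is a hypothetical placeholder for a suspect directory;
# counts its direct entries without expanding a glob in the shell.
# Assumes filenames contain no newlines.
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache/SOME_DIR \
    -mindepth 1 -maxdepth 1 | wc -l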
Do we need to run "set -- *" before we run your find code? – yael, yesterday

@yael No, that is not necessary. That was just a bit of explanation of how the embedded bash script that find calls works. – Kusalananda♦, yesterday

So when you say "names", it includes folders and files, am I correct? (I mean, if we have more than 1000 folders under a folder, it will also print it?) – yael, yesterday

@yael Yes, it would include the names of any entry in the directory. – Kusalananda♦, yesterday
On a GNU system,

(export LC_ALL=C
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache -print0 |
  tr '\n\0' '\0\n' |
  sed 's|/[^/]*$||' |
  sort |
  uniq -c |
  sort -rn |
  head |
  tr '\0' '\n')

would list the 10 directories with the most entries.
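A shorter variant of the same counting idea (my sketch, not from the answer) uses GNU find's %h directive, which prints each file's parent directory; it is simpler, but it assumes filenames contain no newlines:

# %h prints the parent directory of each found name; counting how often
# each parent repeats gives the entry count per directory.
# Assumes GNU find and filenames without newlines.
find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache -mindepth 1 -printf '%h\n' |
  sort | uniq -c | sort -rn | head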
If the directories have so many files that even listing them would be too expensive, you can try to guess which ones they are, without entering them, by looking at their size.

find /grid/sdh/hadoop/yarn/local/usercache/hdfs/appcache -type d \
    -size +10000000c -print -prune

would list the directories that are larger than 10 MB and not descend into them.