What's the most resource efficient way to count how many files are in a directory?
CentOS 5.9
I came across an issue the other day where a directory had a lot of files. To count it, I ran ls -l /foo/foo2/ | wc -l
Turns out that there were over 1 million files in a single directory (long story -- the root cause is getting fixed).
My question is: is there a faster way to do the count? What would be the most efficient way to get the count?
bash shell directory ls
ls -l | wc -l would be off by one due to the total blocks in the first line of ls -l output
– Thomas Nyman
Sep 10 '13 at 21:18
@ThomasNyman It would actually be off by several because of the dot and dotdot pseudo entries, but those can be avoided by using the -A flag. -l is also problematic because of the reading of file metadata in order to generate the extended list format. Forcing NOT -l by using \ls is a much better option (-1 is assumed when piping output). See Gilles's answer for the best solution here.
– Caleb
Sep 11 '13 at 9:29
@Caleb ls -l doesn't output any hidden files nor the . and .. entries. ls -a output includes hidden files, including . and .., while ls -A output includes hidden files excluding . and ... In Gilles's answer the bash dotglob shell option causes the expansion to include hidden files excluding . and ...
– Thomas Nyman
Sep 11 '13 at 9:45
13 Answers
Short answer:
ls -afq | wc -l
(This includes . and .., so subtract 2.)
When you list the files in a directory, three common things might happen:
1. Enumerating the file names in the directory. This is inescapable: there is no way to count the files in a directory without enumerating them.
2. Sorting the file names. Shell wildcards and the ls command do that.
3. Calling stat to retrieve metadata about each directory entry, such as whether it is a directory.
#3 is the most expensive by far, because it requires loading an inode for each file. In comparison all the file names needed for #1 are compactly stored in a few blocks. #2 wastes some CPU time but it is often not a deal breaker.
If there are no newlines in file names, a simple ls -A | wc -l tells you how many files there are in the directory. Beware that if you have an alias for ls, this may trigger a call to stat (e.g. ls --color or ls -F need to know the file type, which requires a call to stat), so from the command line, call command ls -A | wc -l or \ls -A | wc -l to avoid an alias.
If there are newlines in the file name, whether newlines are listed or not depends on the Unix variant. GNU coreutils and BusyBox default to displaying ? for a newline, so they're safe.
Call ls -f to list the entries without sorting them (#2). This automatically turns on -a (at least on modern systems). The -f option is in POSIX but with optional status; most implementations support it, but not BusyBox. The option -q replaces non-printable characters, including newlines, by ?; it's POSIX but isn't supported by BusyBox, so omit it if you need BusyBox support, at the expense of overcounting files whose name contains a newline character.
If the directory has no subdirectories, then most versions of find will not call stat on its entries (leaf directory optimization: a directory that has a link count of 2 cannot have subdirectories, so find doesn't need to look up the metadata of the entries unless a condition such as -type requires it). So find . | wc -l is a portable, fast way to count files in a directory provided that the directory has no subdirectories and that no file name contains a newline.
If the directory has no subdirectories but file names may contain newlines, try one of these (the second one should be faster if it's supported, but may not be noticeably so).
find -print0 | tr -dc \\0 | wc -c
find -printf a | wc -c
On the other hand, don't use find if the directory has subdirectories: even find . -maxdepth 1 calls stat on every entry (at least with GNU find and BusyBox find). You avoid sorting (#2) but you pay the price of an inode lookup (#3), which kills performance.
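One way to check this behaviour on your own system is to count the syscalls a command makes (a sketch: it assumes Linux and a reasonably recent strace that understands the %stat syscall class; with older strace versions, list stat, lstat and fstatat explicitly instead):
strace -c -e trace=%stat,getdents,getdents64 find . -maxdepth 1 > /dev/null
strace -c -e trace=%stat,getdents,getdents64 ls -f > /dev/null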
In the shell without external tools, you can count the files in the current directory with set -- *; echo $#. This misses dot files (files whose name begins with .) and reports 1 instead of 0 in an empty directory. This is the fastest way to count files in small directories because it doesn't require starting an external program, but (except in zsh) wastes time for larger directories due to the sorting step (#2).
In bash, this is a reliable way to count the files in the current directory:
shopt -s dotglob nullglob
a=(*)
echo ${#a[@]}
In ksh93, this is a reliable way to count the files in the current directory:
FIGNORE='@(.|..)'
a=(~(N)*)
echo ${#a[@]}
In zsh, this is a reliable way to count the files in the current directory:
a=(*(DNoN))
echo $#a
If you have the mark_dirs option set, make sure to turn it off: a=(*(DNoN^M)).
In any POSIX shell, this is a reliable way to count the files in the current directory:
total=0
set -- *
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- .[!.]*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- ..?*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
echo "$total"
All of these methods sort the file names, except for the zsh one.
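If you want to compare these approaches on your own system, here is a minimal benchmarking sketch (assumptions: bash, GNU coreutils, and a scratch location you can fill with dummy files; /tmp/count-test and the 100000-file count are placeholders):
mkdir /tmp/count-test && cd /tmp/count-test
seq 1 100000 | xargs touch
time sh -c 'ls -afq | wc -l'
time sh -c 'find . | wc -l'
time bash -c 'shopt -s dotglob nullglob; a=(*); echo "${#a[@]}"'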
My empirical testing on >1 million files shows that find -maxdepth 1 easily keeps pace with ls -U as long as you don't add anything like a -type declaration that has to do further checks. Are you sure GNU find actually calls stat? Even the slowdown on find -type is nothing compared to how much ls -l bogs down if you make it return file details. On the other hand the clear speed winner is zsh using the non-sorting glob. (Sorted globs are 2x slower than ls while the non-sorting one is 2x faster.) I wonder if file system types would significantly affect these results.
– Caleb
Sep 11 '13 at 9:44
@Caleb I ran strace. This is only true if the directory has subdirectories: otherwise find's leaf directory optimization kicks in (even without -maxdepth 1), I should have mentioned that. A lot of things can affect the result, including the filesystem type (calling stat is a lot more expensive on filesystems that represent directories as linear lists than on filesystems that represent directories as trees), whether the inodes were all created together and are thus close by on the disk, cold or hot cache, etc.
– Gilles
Sep 11 '13 at 9:55
Historically, ls -f has been the reliable way to prevent calling stat - this is often simply described today as "output is not sorted" (which it also causes), and does include . and ... -A and -U are not standard options.
– Random832
Sep 11 '13 at 12:59
If you specifically want to count files with a common extension (or other string), inserting that into the command eliminates the extra 2. Here is an example: ls -afq *[0-9].pdb | wc -l
– Steven C. Howell
Jun 12 '15 at 13:18
FYI, with ksh93 version sh (AT&T Research) 93u+ 2012-08-01 on my Debian-based system, FIGNORE doesn't seem to work. The . and .. entries are included in the resulting array.
– Sergiy Kolodyazhnyy
Jan 4 at 8:52
find /foo/foo2/ -maxdepth 1 | wc -l
is considerably faster on my machine, but the local . directory is added to the count.
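A variant suggested in the comments below avoids counting the directory itself (note that -mindepth and -maxdepth are GNU/BSD find extensions, not POSIX):
find /foo/foo2/ -mindepth 1 -maxdepth 1 | wc -l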
Thanks. I'm compelled to ask a silly question though: why is it faster? Because it's not bothering to look up file attributes?
– Mike B
Sep 10 '13 at 20:42
Yes, that's my understanding. As long as you're not using the -type parameter, find should be faster than ls.
– Joel Taylor
Sep 10 '13 at 21:02
Hmmm.... if I'm understanding the documentation of find well, this should actually be better than my answer. Anyone with more experience can verify?
– Luis Machuca
Sep 11 '13 at 2:38
Add a -mindepth 1 to omit the directory itself.
– Stéphane Chazelas
Jan 4 at 9:53
ls -1U before the pipe should use slightly fewer resources, as it makes no attempt to sort the file entries; it just reads them in the order they are stored in the directory on disk. It also produces less output, meaning slightly less work for wc.
You could also use ls -f, which is more or less a shortcut for ls -1aU.
I don't know if there is a resource-efficient way to do it via a command without piping, though.
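For example (keeping in mind that -f also turns on -a, so the . and .. entries end up in that count):
ls -1U /foo/foo2/ | wc -l
ls -f /foo/foo2/ | wc -l    # includes . and .., so subtract 2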
Btw, -1 is implied when the output goes to a pipe
– enzotib
Sep 10 '13 at 21:04
@enzotib - it is? Wow... one learns something new every day!
– Luis Machuca
Sep 10 '13 at 21:25
Another point of comparison. While not being a shell one-liner, this C program doesn't do anything superfluous. Note that hidden files are ignored to match the output of ls | wc -l (ls -l | wc -l is off by one due to the total blocks in the first line of output).
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <error.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    int file_count = 0;
    DIR * dirp;
    struct dirent * entry;

    if (argc < 2)
        error(EXIT_FAILURE, 0, "missing argument");

    if (!(dirp = opendir(argv[1])))
        error(EXIT_FAILURE, errno, "could not open '%s'", argv[1]);

    while ((entry = readdir(dirp)) != NULL) {
        if (entry->d_name[0] == '.') /* ignore hidden files */
            continue;

        file_count++;
    }

    closedir(dirp);

    printf("%d\n", file_count);
}
Using the readdir() stdio API does add some overhead and does not give you control over the size of the buffer passed to the underlying system call (getdents on Linux)
– Stéphane Chazelas
Jan 4 at 9:41
You could try perl -e 'opendir($dh,".");$i=0;while(readdir $dh){$i++;}print "$i\n";'
It'd be interesting to compare timings with your shell pipe.
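To compare timings against the shell pipelines (a sketch; /foo/foo2 is the placeholder directory from the question, and both commands count the . and .. entries too):
time perl -e 'opendir($dh,"/foo/foo2"); $i=0; while(readdir $dh){$i++} print "$i\n";'
time ls -f /foo/foo2/ | wc -l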
On my tests, this keeps pretty much exactly the same pace as the three other fastest solutions (find -maxdepth 1 | wc -l, ls -AU | wc -l and the zsh-based non-sorting glob and array count). In other words it beats out the options with various inefficiencies such as sorting or reading extraneous file properties. I would venture to say since it doesn't earn you anything either, it isn't worth using over a simpler solution unless you happen to be in perl already :)
– Caleb
Sep 11 '13 at 9:53
Note that this will include the . and .. directory entries in the count, so you need to subtract two to get the actual number of files (and subdirectories). In modern Perl, perl -E 'opendir $dh, "."; $i++ while readdir $dh; say $i - 2' would do it.
– Ilmari Karonen
Sep 11 '13 at 10:36
From this answer, I can think of this one as a possible solution.
/*
 * List directories using getdents() because ls, find and Python libraries
 * use readdir(), which is slower (but uses getdents() underneath).
 *
 * Compile with
 * ]$ gcc getdents.c -o getdents
 */
#define _GNU_SOURCE
#include <dirent.h>     /* Defines DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) \
        do { perror(msg); exit(EXIT_FAILURE); } while (0)

struct linux_dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_name[];
};

#define BUF_SIZE 1024*1024*5

int
main(int argc, char *argv[])
{
    int fd, nread;
    char buf[BUF_SIZE];
    struct linux_dirent *d;
    int bpos;
    char d_type;

    fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        handle_error("open");

    for ( ; ; ) {
        /* read a batch of raw directory entries with the getdents syscall */
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1)
            handle_error("getdents");
        if (nread == 0)
            break;

        /* walk the records in the buffer and print one name per regular file */
        for (bpos = 0; bpos < nread; ) {
            d = (struct linux_dirent *) (buf + bpos);
            d_type = *(buf + bpos + d->d_reclen - 1);
            if (d->d_ino != 0 && d_type == DT_REG)
                printf("%s\n", d->d_name);
            bpos += d->d_reclen;
        }
    }

    exit(EXIT_SUCCESS);
}
Copy the C program above into the directory in which the files need to be listed. Then execute these commands:
gcc getdents.c -o getdents
./getdents | wc -l
A few things: 1) if you're willing to use a custom program for this, you might as well just count the files and print the count; 2) to compare with ls -f, don't filter on d_type at all, just on d->d_ino != 0; 3) subtract 2 for . and ...
– Matei David
Jan 17 '17 at 16:01
See linked answer for a timings example where this is 40x faster than the accepted ls -f.
– Matei David
Jan 17 '17 at 16:02
A bash-only solution, not requiring any external program, but I don't know how efficient it is:
list=(*)
echo "${#list[@]}"
Glob expansion isn't necessarily the most resource-efficient way to do this. Besides most shells having an upper limit to the number of items they will even process (so this will probably bomb when dealing with a million plus items), it also sorts the output. The solutions involving find or ls without sorting options will be faster.
– Caleb
Sep 11 '13 at 6:37
@Caleb, only old versions of ksh had such limits (and didn't support that syntax) AFAIK. In almost all other shells, the limit is just the available memory. You've got a point that it's going to be very inefficient, especially in bash.
– Stéphane Chazelas
Jan 4 at 9:45
Probably the most resource efficient way would involve no outside process invocations. So I'd wager on...
cglb() ( c=0 ; set --
tglb() [ -e "$2" ]
for glb in '.?*' *
do tglb $1 $glb##.* $glb#*
set -- ..
done
echo $c
)
Got relative numbers? for how many files?
– smci
Nov 20 '17 at 23:44
After fixing the issue from @Joel's answer, where it added . as a file:
find /foo/foo2 -maxdepth 1 | tail -n +2 | wc -l
tail simply removes the first line, meaning that . isn't counted anymore.
Adding a pair of pipes in order to omit one line of wc input is not very efficient as the overhead increases linearly with regard to input size. In this case, why not simply decrement the final count to compensate for it being off by one, which is a constant time operation: echo $(( $(find /foo/foo2 -maxdepth 1 | wc -l) - 1))
– Thomas Nyman
Sep 11 '13 at 6:32
Rather than feed that much data through another process, it would probably be better to just do some math on the final output. let count = $(find /foo/foo2 -maxdepth 1 | wc -l) - 2
– Caleb
Sep 11 '13 at 6:34
os.listdir() in python can do the work for you. It gives an array of the contents of the directory, excluding the special '.' and '..' files. Also, no need to worry about files with special characters like '\n' in the name.
python -c 'import os;print len(os.listdir("."))'
Following is the time taken by the above python command compared with the 'ls -Af' command.
~/test$ time ls -Af |wc -l
399144
real 0m0.300s
user 0m0.104s
sys 0m0.240s
~/test$ time python -c 'import os;print len(os.listdir("."))'
399142
real 0m0.249s
user 0m0.064s
sys 0m0.180s
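On systems where the default python is Python 3, print is a function, so the equivalent one-liner would be (same os.listdir behaviour, just Python 3 syntax):
python3 -c 'import os; print(len(os.listdir(".")))'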
ls -1 | wc -l comes immediately to my mind. Whether ls -1U is faster than ls -1 is purely academic - the difference should be negligible but for very large directories.
I know this is old but I feel that awk has to be mentioned here. The suggestions that include the use of wc simply aren't correct in regards to OP's question: "the most resource efficient way." I recently had a log file get way out of control (due to some bad software) and therefore stumbled onto this post. There were roughly 232 million entries! I first tried wc -l and waited 15 minutes - it was not even able to finish counting the lines. The following awk statement gave me an accurate line count in 3 minutes on that log file. I've learned over the years to never underestimate awk's ability to simulate standard shell programs in a much more efficient fashion. Hope it helps someone like me. Happy hacking!
awk 'BEGIN{i=0} {i++} END{print i}' /foo/foo2
And if you need to substitute a command like ls for counting files in a directory:
#Normal: awk 'BEGIN{i=0} {i++} END{print i}' <(ls /foo/foo2/)
#Hidden: awk 'BEGIN{i=0} {i++} END{print (i-2)}' <(ls -f /foo/foo2/)
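A shorter variant of the same idea (a sketch; as the comment below notes, awk's built-in NR makes the explicit counter unnecessary, and the -2 compensates for the . and .. entries that ls -f includes):
ls -f /foo/foo2/ | awk 'END{print NR-2}'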
Or simply, awk 'END{print NR}'. But in this particular situation, awk may be overkill because ls is the bottleneck, not wc.
– Amit Naidu
May 29 '18 at 5:19
I would think echo * would be more efficient than any 'ls' command:
echo * | wc -w
What about files with a space in their name? echo 'Hello World' | wc -w produces 2.
– Joseph R.
Sep 11 '13 at 20:52
@JosephR. Caveat Emptor
– Dan Garthwaite
Sep 12 '13 at 0:59
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f90106%2fwhats-the-most-resource-efficient-way-to-count-how-many-files-are-in-a-director%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
13 Answers
13
active
oldest
votes
13 Answers
13
active
oldest
votes
active
oldest
votes
active
oldest
votes
Short answer:
ls -afq | wc -l
(This includes .
and ..
, so subtract 2.)
When you list the files in a directory, three common things might happen:
- Enumerating the file names in the directory. This is inescapable: there is no way to count the files in a directory without enumerating them.
- Sorting the file names. Shell wildcards and the
ls
command do that. - Calling
stat
to retrieve metadata about each directory entry, such as whether it is a directory.
#3 is the most expensive by far, because it requires loading an inode for each file. In comparison all the file names needed for #1 are compactly stored in a few blocks. #2 wastes some CPU time but it is often not a deal breaker.
If there are no newlines in file names, a simple ls -A | wc -l
tells you how many files there are in the directory. Beware that if you have an alias for ls
, this may trigger a call to stat
(e.g. ls --color
or ls -F
need to know the file type, which requires a call to stat
), so from the command line, call command ls -A | wc -l
or ls -A | wc -l
to avoid an alias.
If there are newlines in the file name, whether newlines are listed or not depends on the Unix variant. GNU coreutils and BusyBox default to displaying ?
for a newline, so they're safe.
Call ls -f
to list the entries without sorting them (#2). This automatically turns on -a
(at least on modern systems). The -f
option is in POSIX but with optional status; most implementations support it, but not BusyBox. The option -q
replaces non-printable characters including newlines by ?
; it's POSIX but isn't supported by BusyBox, so omit it if you need BusyBox support at the expense of overcounting files whose name contains a newline character.
If the directory has no subdirectories, then most versions of find
will not call stat
on its entries (leaf directory optimization: a directory that has a link count of 2 cannot have subdirectories, so find
doesn't need to look up the metadata of the entries unless a condition such as -type
requires it). So find . | wc -l
is a portable, fast way to count files in a directory provided that the directory has no subdirectories and that no file name contains a newline.
If the directory has no subdirectories but file names may contain newlines, try one of these (the second one should be faster if it's supported, but may not be noticeably so).
find -print0 | tr -dc \0 | wc -c
find -printf a | wc -c
On the other hand, don't use find
if the directory has subdirectories: even find . -maxdepth 1
calls stat
on every entry (at least with GNU find and BusyBox find). You avoid sorting (#2) but you pay the price of an inode lookup (#3) which kills performance.
In the shell without external tools, you can run count the files in the current directory with set -- *; echo $#
. This misses dot files (files whose name begins with .
) and reports 1 instead of 0 in an empty directory. This is the fastest way to count files in small directories because it doesn't require starting an external program, but (except in zsh) wastes time for larger directories due to the sorting step (#2).
In bash, this is a reliable way to count the files in the current directory:
shopt -s dotglob nullglob
a=(*)
echo $#a[@]In ksh93, this is a reliable way to count the files in the current directory:
FIGNORE='@(.|..)'
a=(~(N)*)
echo $#a[@]In zsh, this is a reliable way to count the files in the current directory:
a=(*(DNoN))
echo $#aIf you have the
mark_dirs
option set, make sure to turn it off:a=(*(DNoN^M))
.In any POSIX shell, this is a reliable way to count the files in the current directory:
total=0
set -- *
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- .[!.]*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- ..?*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
echo "$total"
All of these methods sort the file names, except for the zsh one.
1
My empirical testing on >1 million files shows thatfind -maxdepth 1
easily keeps pace withls -U
as long as you don't add anything like a-type
declaration that has to do further checks. Are you sure GNU find actually callsstat
? Even the slowdown onfind -type
is nothing compared to how muchls -l
bogs if you make it return file details. On the other hand the clear speed winner iszsh
using the non sorting glob. (sorted globs are 2x slower thanls
while the non-sorting one is 2x faster). I wonder if file system types would significantly effect these results.
– Caleb
Sep 11 '13 at 9:44
@Caleb I ranstrace
. This is only true if the directory has subdirectories: otherwisefind
's leaf directory optimization kicks in (even without-maxdepth 1
), I should have mentioned that. A lot of things can affect the result, including the filesystem type (callingstat
is a lot more expensive on filesystems that represent directories as linear lists than on filesystems that represent directories as trees), whether the inodes were all created together and are thus close by on the disk, cold or hot cache, etc.
– Gilles
Sep 11 '13 at 9:55
1
Historically,ls -f
has been the reliable way to prevent callingstat
- this is often simply described today as "output is not sorted" (which it also causes), and does include.
and..
.-A
and-U
are not standard options.
– Random832
Sep 11 '13 at 12:59
1
If you specifically want to count file with a common extension (or other string), inserting that into the command eliminates the extra 2. Here is an example:ls -afq *[0-9].pdb | wc -l
– Steven C. Howell
Jun 12 '15 at 13:18
FYI, with ksh93version sh (AT&T Research) 93u+ 2012-08-01
on my Debian-based system,FIGNORE
doesn't seem to work. The.
and..
entries are included into the resulting array
– Sergiy Kolodyazhnyy
Jan 4 at 8:52
|
show 3 more comments
Short answer:
ls -afq | wc -l
(This includes .
and ..
, so subtract 2.)
When you list the files in a directory, three common things might happen:
- Enumerating the file names in the directory. This is inescapable: there is no way to count the files in a directory without enumerating them.
- Sorting the file names. Shell wildcards and the
ls
command do that. - Calling
stat
to retrieve metadata about each directory entry, such as whether it is a directory.
#3 is the most expensive by far, because it requires loading an inode for each file. In comparison all the file names needed for #1 are compactly stored in a few blocks. #2 wastes some CPU time but it is often not a deal breaker.
If there are no newlines in file names, a simple ls -A | wc -l
tells you how many files there are in the directory. Beware that if you have an alias for ls
, this may trigger a call to stat
(e.g. ls --color
or ls -F
need to know the file type, which requires a call to stat
), so from the command line, call command ls -A | wc -l
or ls -A | wc -l
to avoid an alias.
If there are newlines in the file name, whether newlines are listed or not depends on the Unix variant. GNU coreutils and BusyBox default to displaying ?
for a newline, so they're safe.
Call ls -f
to list the entries without sorting them (#2). This automatically turns on -a
(at least on modern systems). The -f
option is in POSIX but with optional status; most implementations support it, but not BusyBox. The option -q
replaces non-printable characters including newlines by ?
; it's POSIX but isn't supported by BusyBox, so omit it if you need BusyBox support at the expense of overcounting files whose name contains a newline character.
If the directory has no subdirectories, then most versions of find
will not call stat
on its entries (leaf directory optimization: a directory that has a link count of 2 cannot have subdirectories, so find
doesn't need to look up the metadata of the entries unless a condition such as -type
requires it). So find . | wc -l
is a portable, fast way to count files in a directory provided that the directory has no subdirectories and that no file name contains a newline.
If the directory has no subdirectories but file names may contain newlines, try one of these (the second one should be faster if it's supported, but may not be noticeably so).
find -print0 | tr -dc \0 | wc -c
find -printf a | wc -c
On the other hand, don't use find
if the directory has subdirectories: even find . -maxdepth 1
calls stat
on every entry (at least with GNU find and BusyBox find). You avoid sorting (#2) but you pay the price of an inode lookup (#3) which kills performance.
In the shell without external tools, you can run count the files in the current directory with set -- *; echo $#
. This misses dot files (files whose name begins with .
) and reports 1 instead of 0 in an empty directory. This is the fastest way to count files in small directories because it doesn't require starting an external program, but (except in zsh) wastes time for larger directories due to the sorting step (#2).
In bash, this is a reliable way to count the files in the current directory:
shopt -s dotglob nullglob
a=(*)
echo $#a[@]In ksh93, this is a reliable way to count the files in the current directory:
FIGNORE='@(.|..)'
a=(~(N)*)
echo $#a[@]In zsh, this is a reliable way to count the files in the current directory:
a=(*(DNoN))
echo $#aIf you have the
mark_dirs
option set, make sure to turn it off:a=(*(DNoN^M))
.In any POSIX shell, this is a reliable way to count the files in the current directory:
total=0
set -- *
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- .[!.]*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- ..?*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
echo "$total"
All of these methods sort the file names, except for the zsh one.
1
My empirical testing on >1 million files shows thatfind -maxdepth 1
easily keeps pace withls -U
as long as you don't add anything like a-type
declaration that has to do further checks. Are you sure GNU find actually callsstat
? Even the slowdown onfind -type
is nothing compared to how muchls -l
bogs if you make it return file details. On the other hand the clear speed winner iszsh
using the non sorting glob. (sorted globs are 2x slower thanls
while the non-sorting one is 2x faster). I wonder if file system types would significantly effect these results.
– Caleb
Sep 11 '13 at 9:44
@Caleb I ranstrace
. This is only true if the directory has subdirectories: otherwisefind
's leaf directory optimization kicks in (even without-maxdepth 1
), I should have mentioned that. A lot of things can affect the result, including the filesystem type (callingstat
is a lot more expensive on filesystems that represent directories as linear lists than on filesystems that represent directories as trees), whether the inodes were all created together and are thus close by on the disk, cold or hot cache, etc.
– Gilles
Sep 11 '13 at 9:55
1
Historically,ls -f
has been the reliable way to prevent callingstat
- this is often simply described today as "output is not sorted" (which it also causes), and does include.
and..
.-A
and-U
are not standard options.
– Random832
Sep 11 '13 at 12:59
1
If you specifically want to count file with a common extension (or other string), inserting that into the command eliminates the extra 2. Here is an example:ls -afq *[0-9].pdb | wc -l
– Steven C. Howell
Jun 12 '15 at 13:18
FYI, with ksh93version sh (AT&T Research) 93u+ 2012-08-01
on my Debian-based system,FIGNORE
doesn't seem to work. The.
and..
entries are included into the resulting array
– Sergiy Kolodyazhnyy
Jan 4 at 8:52
|
show 3 more comments
Short answer:
ls -afq | wc -l
(This includes .
and ..
, so subtract 2.)
When you list the files in a directory, three common things might happen:
- Enumerating the file names in the directory. This is inescapable: there is no way to count the files in a directory without enumerating them.
- Sorting the file names. Shell wildcards and the
ls
command do that. - Calling
stat
to retrieve metadata about each directory entry, such as whether it is a directory.
#3 is the most expensive by far, because it requires loading an inode for each file. In comparison all the file names needed for #1 are compactly stored in a few blocks. #2 wastes some CPU time but it is often not a deal breaker.
If there are no newlines in file names, a simple ls -A | wc -l
tells you how many files there are in the directory. Beware that if you have an alias for ls
, this may trigger a call to stat
(e.g. ls --color
or ls -F
need to know the file type, which requires a call to stat
), so from the command line, call command ls -A | wc -l
or ls -A | wc -l
to avoid an alias.
If there are newlines in the file name, whether newlines are listed or not depends on the Unix variant. GNU coreutils and BusyBox default to displaying ?
for a newline, so they're safe.
Call ls -f
to list the entries without sorting them (#2). This automatically turns on -a
(at least on modern systems). The -f
option is in POSIX but with optional status; most implementations support it, but not BusyBox. The option -q
replaces non-printable characters including newlines by ?
; it's POSIX but isn't supported by BusyBox, so omit it if you need BusyBox support at the expense of overcounting files whose name contains a newline character.
If the directory has no subdirectories, then most versions of find
will not call stat
on its entries (leaf directory optimization: a directory that has a link count of 2 cannot have subdirectories, so find
doesn't need to look up the metadata of the entries unless a condition such as -type
requires it). So find . | wc -l
is a portable, fast way to count files in a directory provided that the directory has no subdirectories and that no file name contains a newline.
If the directory has no subdirectories but file names may contain newlines, try one of these (the second one should be faster if it's supported, but may not be noticeably so).
find -print0 | tr -dc \0 | wc -c
find -printf a | wc -c
On the other hand, don't use find
if the directory has subdirectories: even find . -maxdepth 1
calls stat
on every entry (at least with GNU find and BusyBox find). You avoid sorting (#2) but you pay the price of an inode lookup (#3) which kills performance.
In the shell without external tools, you can run count the files in the current directory with set -- *; echo $#
. This misses dot files (files whose name begins with .
) and reports 1 instead of 0 in an empty directory. This is the fastest way to count files in small directories because it doesn't require starting an external program, but (except in zsh) wastes time for larger directories due to the sorting step (#2).
In bash, this is a reliable way to count the files in the current directory:
shopt -s dotglob nullglob
a=(*)
echo $#a[@]In ksh93, this is a reliable way to count the files in the current directory:
FIGNORE='@(.|..)'
a=(~(N)*)
echo $#a[@]In zsh, this is a reliable way to count the files in the current directory:
a=(*(DNoN))
echo $#aIf you have the
mark_dirs
option set, make sure to turn it off:a=(*(DNoN^M))
.In any POSIX shell, this is a reliable way to count the files in the current directory:
total=0
set -- *
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- .[!.]*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- ..?*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
echo "$total"
All of these methods sort the file names, except for the zsh one.
Short answer:
ls -afq | wc -l
(This includes .
and ..
, so subtract 2.)
When you list the files in a directory, three common things might happen:
- Enumerating the file names in the directory. This is inescapable: there is no way to count the files in a directory without enumerating them.
- Sorting the file names. Shell wildcards and the
ls
command do that. - Calling
stat
to retrieve metadata about each directory entry, such as whether it is a directory.
#3 is the most expensive by far, because it requires loading an inode for each file. In comparison all the file names needed for #1 are compactly stored in a few blocks. #2 wastes some CPU time but it is often not a deal breaker.
If there are no newlines in file names, a simple ls -A | wc -l
tells you how many files there are in the directory. Beware that if you have an alias for ls
, this may trigger a call to stat
(e.g. ls --color
or ls -F
need to know the file type, which requires a call to stat
), so from the command line, call command ls -A | wc -l
or ls -A | wc -l
to avoid an alias.
If there are newlines in the file name, whether newlines are listed or not depends on the Unix variant. GNU coreutils and BusyBox default to displaying ?
for a newline, so they're safe.
Call ls -f
to list the entries without sorting them (#2). This automatically turns on -a
(at least on modern systems). The -f
option is in POSIX but with optional status; most implementations support it, but not BusyBox. The option -q
replaces non-printable characters including newlines by ?
; it's POSIX but isn't supported by BusyBox, so omit it if you need BusyBox support at the expense of overcounting files whose name contains a newline character.
If the directory has no subdirectories, then most versions of find
will not call stat
on its entries (leaf directory optimization: a directory that has a link count of 2 cannot have subdirectories, so find
doesn't need to look up the metadata of the entries unless a condition such as -type
requires it). So find . | wc -l
is a portable, fast way to count files in a directory provided that the directory has no subdirectories and that no file name contains a newline.
If the directory has no subdirectories but file names may contain newlines, try one of these (the second one should be faster if it's supported, but may not be noticeably so).
find -print0 | tr -dc \0 | wc -c
find -printf a | wc -c
On the other hand, don't use find
if the directory has subdirectories: even find . -maxdepth 1
calls stat
on every entry (at least with GNU find and BusyBox find). You avoid sorting (#2) but you pay the price of an inode lookup (#3) which kills performance.
In the shell without external tools, you can run count the files in the current directory with set -- *; echo $#
. This misses dot files (files whose name begins with .
) and reports 1 instead of 0 in an empty directory. This is the fastest way to count files in small directories because it doesn't require starting an external program, but (except in zsh) wastes time for larger directories due to the sorting step (#2).
In bash, this is a reliable way to count the files in the current directory:
shopt -s dotglob nullglob
a=(*)
echo $#a[@]In ksh93, this is a reliable way to count the files in the current directory:
FIGNORE='@(.|..)'
a=(~(N)*)
echo $#a[@]In zsh, this is a reliable way to count the files in the current directory:
a=(*(DNoN))
echo $#aIf you have the
mark_dirs
option set, make sure to turn it off:a=(*(DNoN^M))
.In any POSIX shell, this is a reliable way to count the files in the current directory:
total=0
set -- *
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- .[!.]*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
set -- ..?*
if [ $# -ne 1 ] || [ -e "$1" ] || [ -L "$1" ]; then total=$((total+$#)); fi
echo "$total"
All of these methods sort the file names, except for the zsh one.
edited Jan 4 at 9:35
answered Sep 11 '13 at 0:30
GillesGilles
548k13011131631
548k13011131631
1
My empirical testing on >1 million files shows thatfind -maxdepth 1
easily keeps pace withls -U
as long as you don't add anything like a-type
declaration that has to do further checks. Are you sure GNU find actually callsstat
? Even the slowdown onfind -type
is nothing compared to how muchls -l
bogs if you make it return file details. On the other hand the clear speed winner iszsh
using the non sorting glob. (sorted globs are 2x slower thanls
while the non-sorting one is 2x faster). I wonder if file system types would significantly effect these results.
– Caleb
Sep 11 '13 at 9:44
@Caleb I ranstrace
. This is only true if the directory has subdirectories: otherwisefind
's leaf directory optimization kicks in (even without-maxdepth 1
), I should have mentioned that. A lot of things can affect the result, including the filesystem type (callingstat
is a lot more expensive on filesystems that represent directories as linear lists than on filesystems that represent directories as trees), whether the inodes were all created together and are thus close by on the disk, cold or hot cache, etc.
– Gilles
Sep 11 '13 at 9:55
1
Historically,ls -f
has been the reliable way to prevent callingstat
- this is often simply described today as "output is not sorted" (which it also causes), and does include.
and..
.-A
and-U
are not standard options.
– Random832
Sep 11 '13 at 12:59
1
If you specifically want to count file with a common extension (or other string), inserting that into the command eliminates the extra 2. Here is an example:ls -afq *[0-9].pdb | wc -l
– Steven C. Howell
Jun 12 '15 at 13:18
FYI, with ksh93version sh (AT&T Research) 93u+ 2012-08-01
on my Debian-based system,FIGNORE
doesn't seem to work. The.
and..
entries are included into the resulting array
– Sergiy Kolodyazhnyy
Jan 4 at 8:52
|
show 3 more comments
1
My empirical testing on >1 million files shows thatfind -maxdepth 1
easily keeps pace withls -U
as long as you don't add anything like a-type
declaration that has to do further checks. Are you sure GNU find actually callsstat
? Even the slowdown onfind -type
is nothing compared to how muchls -l
bogs if you make it return file details. On the other hand the clear speed winner iszsh
using the non sorting glob. (sorted globs are 2x slower thanls
while the non-sorting one is 2x faster). I wonder if file system types would significantly effect these results.
– Caleb
Sep 11 '13 at 9:44
@Caleb I ranstrace
. This is only true if the directory has subdirectories: otherwisefind
's leaf directory optimization kicks in (even without-maxdepth 1
), I should have mentioned that. A lot of things can affect the result, including the filesystem type (callingstat
is a lot more expensive on filesystems that represent directories as linear lists than on filesystems that represent directories as trees), whether the inodes were all created together and are thus close by on the disk, cold or hot cache, etc.
– Gilles
Sep 11 '13 at 9:55
1
Historically,ls -f
has been the reliable way to prevent callingstat
- this is often simply described today as "output is not sorted" (which it also causes), and does include.
and..
.-A
and-U
are not standard options.
– Random832
Sep 11 '13 at 12:59
1
If you specifically want to count file with a common extension (or other string), inserting that into the command eliminates the extra 2. Here is an example:ls -afq *[0-9].pdb | wc -l
– Steven C. Howell
Jun 12 '15 at 13:18
FYI, with ksh93version sh (AT&T Research) 93u+ 2012-08-01
on my Debian-based system,FIGNORE
doesn't seem to work. The.
and..
entries are included into the resulting array
– Sergiy Kolodyazhnyy
Jan 4 at 8:52
1
1
My empirical testing on >1 million files shows that
find -maxdepth 1
easily keeps pace with ls -U
as long as you don't add anything like a -type
declaration that has to do further checks. Are you sure GNU find actually calls stat
? Even the slowdown on find -type
is nothing compared to how much ls -l
bogs if you make it return file details. On the other hand the clear speed winner is zsh
using the non sorting glob. (sorted globs are 2x slower than ls
while the non-sorting one is 2x faster). I wonder if file system types would significantly effect these results.– Caleb
Sep 11 '13 at 9:44
My empirical testing on >1 million files shows that
find -maxdepth 1
easily keeps pace with ls -U
as long as you don't add anything like a -type
declaration that has to do further checks. Are you sure GNU find actually calls stat
? Even the slowdown on find -type
is nothing compared to how much ls -l
bogs if you make it return file details. On the other hand the clear speed winner is zsh
using the non sorting glob. (sorted globs are 2x slower than ls
while the non-sorting one is 2x faster). I wonder if file system types would significantly effect these results.– Caleb
Sep 11 '13 at 9:44
@Caleb I ran
strace
. This is only true if the directory has subdirectories: otherwise find
's leaf directory optimization kicks in (even without -maxdepth 1
), I should have mentioned that. A lot of things can affect the result, including the filesystem type (calling stat
is a lot more expensive on filesystems that represent directories as linear lists than on filesystems that represent directories as trees), whether the inodes were all created together and are thus close by on the disk, cold or hot cache, etc.– Gilles
Sep 11 '13 at 9:55
@Caleb I ran
strace
. This is only true if the directory has subdirectories: otherwise find
's leaf directory optimization kicks in (even without -maxdepth 1
), I should have mentioned that. A lot of things can affect the result, including the filesystem type (calling stat
is a lot more expensive on filesystems that represent directories as linear lists than on filesystems that represent directories as trees), whether the inodes were all created together and are thus close by on the disk, cold or hot cache, etc.– Gilles
Sep 11 '13 at 9:55
1
1
Historically,
ls -f
has been the reliable way to prevent calling stat
- this is often simply described today as "output is not sorted" (which it also causes), and does include .
and ..
. -A
and -U
are not standard options.– Random832
Sep 11 '13 at 12:59
Historically,
ls -f
has been the reliable way to prevent calling stat
- this is often simply described today as "output is not sorted" (which it also causes), and does include .
and ..
. -A
and -U
are not standard options.– Random832
Sep 11 '13 at 12:59
1
1
If you specifically want to count file with a common extension (or other string), inserting that into the command eliminates the extra 2. Here is an example:
ls -afq *[0-9].pdb | wc -l
– Steven C. Howell
Jun 12 '15 at 13:18
If you specifically want to count file with a common extension (or other string), inserting that into the command eliminates the extra 2. Here is an example:
ls -afq *[0-9].pdb | wc -l
– Steven C. Howell
Jun 12 '15 at 13:18
FYI, with ksh93
version sh (AT&T Research) 93u+ 2012-08-01
on my Debian-based system, FIGNORE
doesn't seem to work. The .
and ..
entries are included into the resulting array– Sergiy Kolodyazhnyy
Jan 4 at 8:52
FYI, with ksh93
version sh (AT&T Research) 93u+ 2012-08-01
on my Debian-based system, FIGNORE
doesn't seem to work. The .
and ..
entries are included into the resulting array– Sergiy Kolodyazhnyy
Jan 4 at 8:52
|
show 3 more comments
find /foo/foo2/ -maxdepth 1 | wc -l
Is considerably faster on my machine but the local .
directory is added to the count.
1
Thanks. I'm compelled to ask a silly question though: why is it faster? Because it's not bothering to look-up file attributes?
– Mike B
Sep 10 '13 at 20:42
2
Yes, that's my understanding. As long as your not using the-type
parameterfind
should be faster thanls
– Joel Taylor
Sep 10 '13 at 21:02
1
Hmmm.... if I'm understanding the documentation of find well, this should actually be better than my answer. Anyone with more experience can verify?
– Luis Machuca
Sep 11 '13 at 2:38
Add a-mindepth 1
to omit the directory itself.
– Stéphane Chazelas
Jan 4 at 9:53
add a comment |
find /foo/foo2/ -maxdepth 1 | wc -l
Is considerably faster on my machine but the local .
directory is added to the count.
1
Thanks. I'm compelled to ask a silly question though: why is it faster? Because it's not bothering to look-up file attributes?
– Mike B
Sep 10 '13 at 20:42
2
Yes, that's my understanding. As long as your not using the-type
parameterfind
should be faster thanls
– Joel Taylor
Sep 10 '13 at 21:02
1
Hmmm.... if I'm understanding the documentation of find well, this should actually be better than my answer. Anyone with more experience can verify?
– Luis Machuca
Sep 11 '13 at 2:38
Add a-mindepth 1
to omit the directory itself.
– Stéphane Chazelas
Jan 4 at 9:53
add a comment |
find /foo/foo2/ -maxdepth 1 | wc -l
Is considerably faster on my machine but the local .
directory is added to the count.
find /foo/foo2/ -maxdepth 1 | wc -l
Is considerably faster on my machine but the local .
directory is added to the count.
answered Sep 10 '13 at 20:40
Joel TaylorJoel Taylor
743413
743413
1
Thanks. I'm compelled to ask a silly question though: why is it faster? Because it's not bothering to look-up file attributes?
– Mike B
Sep 10 '13 at 20:42
2
Yes, that's my understanding. As long as your not using the-type
parameterfind
should be faster thanls
– Joel Taylor
Sep 10 '13 at 21:02
1
Hmmm.... if I'm understanding the documentation of find well, this should actually be better than my answer. Anyone with more experience can verify?
– Luis Machuca
Sep 11 '13 at 2:38
Add a-mindepth 1
to omit the directory itself.
– Stéphane Chazelas
Jan 4 at 9:53
add a comment |
1
Thanks. I'm compelled to ask a silly question though: why is it faster? Because it's not bothering to look-up file attributes?
– Mike B
Sep 10 '13 at 20:42
2
Yes, that's my understanding. As long as your not using the-type
parameterfind
should be faster thanls
– Joel Taylor
Sep 10 '13 at 21:02
1
Hmmm.... if I'm understanding the documentation of find well, this should actually be better than my answer. Anyone with more experience can verify?
– Luis Machuca
Sep 11 '13 at 2:38
Add a-mindepth 1
to omit the directory itself.
– Stéphane Chazelas
Jan 4 at 9:53
1
1
Thanks. I'm compelled to ask a silly question though: why is it faster? Because it's not bothering to look-up file attributes?
– Mike B
Sep 10 '13 at 20:42
Thanks. I'm compelled to ask a silly question though: why is it faster? Because it's not bothering to look-up file attributes?
– Mike B
Sep 10 '13 at 20:42
2
2
Yes, that's my understanding. As long as your not using the
-type
parameter find
should be faster than ls
– Joel Taylor
Sep 10 '13 at 21:02
Yes, that's my understanding. As long as your not using the
-type
parameter find
should be faster than ls
– Joel Taylor
Sep 10 '13 at 21:02
1
1
Hmmm.... if I'm understanding the documentation of find well, this should actually be better than my answer. Anyone with more experience can verify?
– Luis Machuca
Sep 11 '13 at 2:38
Hmmm.... if I'm understanding the documentation of find well, this should actually be better than my answer. Anyone with more experience can verify?
– Luis Machuca
Sep 11 '13 at 2:38
Add a
-mindepth 1
to omit the directory itself.– Stéphane Chazelas
Jan 4 at 9:53
Add a
-mindepth 1
to omit the directory itself.– Stéphane Chazelas
Jan 4 at 9:53
add a comment |
ls -1U before the pipe should use just a bit fewer resources, as it makes no attempt to sort the file entries: it just reads them in the order in which they are stored in the folder on disk. It also produces less output, meaning slightly less work for wc.

You could also use ls -f, which is more or less a shortcut for ls -1aU.

I don't know if there is a resource-efficient way to do it via a command without piping, though.

answered Sep 10 '13 at 19:42 – Luis Machuca (edited Oct 30 '16 at 16:46 by Jeff Schaller)

8 – Btw, -1 is implied when the output goes to a pipe. – enzotib, Sep 10 '13 at 21:04
@enzotib - it is? Wow... one learns something new every day! – Luis Machuca, Sep 10 '13 at 21:25
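A minimal sketch of the two pipelines this answer describes; the example path is the question's, not part of the answer:

# unsorted listing; -1 is implied when the output goes to a pipe
ls -1U /foo/foo2/ | wc -l

# ls -f is more or less ls -1aU, so this also counts dotfiles plus . and ..
ls -f /foo/foo2/ | wc -l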
Another point of comparison. While it isn't a shell one-liner, this C program doesn't do anything superfluous. Note that hidden files are ignored to match the output of ls | wc -l (ls -l | wc -l is off by one due to the total blocks in the first line of output).

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <error.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    int file_count = 0;
    DIR * dirp;
    struct dirent * entry;

    if (argc < 2)
        error(EXIT_FAILURE, 0, "missing argument");

    if (!(dirp = opendir(argv[1])))
        error(EXIT_FAILURE, errno, "could not open '%s'", argv[1]);

    while ((entry = readdir(dirp)) != NULL) {
        if (entry->d_name[0] == '.') /* ignore hidden files */
            continue;

        file_count++;
    }

    closedir(dirp);

    printf("%d\n", file_count);
}

answered Sep 10 '13 at 20:50 – Thomas Nyman (edited Sep 10 '13 at 21:25)

Using the readdir() stdio API does add some overhead and does not give you control over the size of the buffer passed to the underlying system call (getdents on Linux). – Stéphane Chazelas, Jan 4 at 9:41
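One possible way to build and run the program above; the source file name count_files.c is an assumed name, not given in the answer:

# count_files.c is a hypothetical file name for the listing above
cc -o count_files count_files.c
./count_files /foo/foo2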
You could try perl -e 'opendir($dh,".");$i=0;while(readdir $dh){$i++};print "$i\n";'

It'd be interesting to compare timings with your shell pipe.

answered Sep 10 '13 at 20:00 – Doug O'Neal

On my tests, this keeps pretty much exactly the same pace as the three other fastest solutions (find -maxdepth 1 | wc -l, ls -AU | wc -l and the zsh-based non-sorting glob and array count). In other words, it beats out the options with various inefficiencies such as sorting or reading extraneous file properties. I would venture to say that since it doesn't earn you anything either, it isn't worth using over a simpler solution unless you happen to be in perl already :) – Caleb, Sep 11 '13 at 9:53
Note that this will include the . and .. directory entries in the count, so you need to subtract two to get the actual number of files (and subdirectories). In modern Perl, perl -E 'opendir $dh, "."; $i++ while readdir $dh; say $i - 2' would do it. – Ilmari Karonen, Sep 11 '13 at 10:36
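A rough way to run the timing comparison the answer invites, against the fastest contenders named in the comments (the specific path and choice of commands are my assumptions, not from the answer):

time find /foo/foo2/ -maxdepth 1 | wc -l
time ls -AU /foo/foo2/ | wc -l
time perl -e 'opendir($dh,"/foo/foo2");$i=0;while(readdir $dh){$i++};print "$i\n";'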
From this answer, I can think of this one as a possible solution.

/*
 * List directories using getdents() because ls, find and Python libraries
 * use readdir() which is slower (but uses getdents() underneath).
 *
 * Compile with
 * ]$ gcc getdents.c -o getdents
 */
#define _GNU_SOURCE
#include <dirent.h>     /* Defines DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) \
        do { perror(msg); exit(EXIT_FAILURE); } while (0)

struct linux_dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_name[];
};

#define BUF_SIZE 1024*1024*5

int
main(int argc, char *argv[])
{
    int fd, nread;
    char buf[BUF_SIZE];
    struct linux_dirent *d;
    int bpos;
    char d_type;

    fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        handle_error("open");

    for ( ; ; ) {
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1)
            handle_error("getdents");

        if (nread == 0)
            break;

        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *) (buf + bpos);
            d_type = *(buf + bpos + d->d_reclen - 1);
            if (d->d_ino != 0 && d_type == DT_REG)
                printf("%s\n", d->d_name);
            bpos += d->d_reclen;
        }
    }

    exit(EXIT_SUCCESS);
}

Copy the C program above into the directory in which the files need to be listed. Then execute these commands:

gcc getdents.c -o getdents
./getdents | wc -l

answered Aug 7 '14 at 23:02 – Ramesh (edited Apr 11 at 19:20 by Joshua Taylor)

1 – A few things: 1) if you're willing to use a custom program for this, you might as well just count the files and print the count; 2) to compare with ls -f, don't filter on d_type at all, just on d->d_ino != 0; 3) subtract 2 for . and .. – Matei David, Jan 17 '17 at 16:01
See linked answer for a timings example where this is 40x faster than the accepted ls -f. – Matei David, Jan 17 '17 at 16:02
A bash-only solution, not requiring any external program, though I don't know how efficient it is:

list=(*)
echo "${#list[@]}"

answered Sep 10 '13 at 20:55 – enzotib

Glob expansion isn't necessarily the most resource-efficient way to do this. Besides most shells having an upper limit to the number of items they will even process (so this will probably bomb when dealing with a million-plus items), it also sorts the output. The solutions involving find or ls without sorting options will be faster. – Caleb, Sep 11 '13 at 6:37
@Caleb, only old versions of ksh had such limits (and didn't support that syntax) AFAIK. In almost all other shells, the limit is just the available memory. You've got a point that it's going to be very inefficient, especially in bash. – Stéphane Chazelas, Jan 4 at 9:45
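If hidden entries should be counted as well, a bash sketch along the same lines; the shopt settings are my assumption, not part of the original answer:

shopt -s dotglob nullglob   # include dotfiles; expand to nothing in an empty directory
list=(*)
echo "${#list[@]}"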
Probably the most resource-efficient way would involve no outside process invocations. So I'd wager on...

cglb() ( c=0 ; set --
    tglb() { [ -e "$2" ]; }
    for glb in '.?*' *
    do  tglb $1 ${glb##.*} ${glb#*}
        set -- ..
    done
    echo $c
)

answered Aug 7 '14 at 22:42 – mikeserv (edited Aug 8 '14 at 0:37)

1 – Got relative numbers? For how many files? – smci, Nov 20 '17 at 23:44
After fixing the issue from @Joel's answer, where it added . as a file:

find /foo/foo2 -maxdepth 1 | tail -n +2 | wc -l

tail simply removes the first line, meaning that . isn't counted anymore.

answered Sep 11 '13 at 4:23 – haneefmubarak

1 – Adding a pair of pipes in order to omit one line of wc input is not very efficient, as the overhead increases linearly with regard to input size. In this case, why not simply decrement the final count to compensate for it being off by one, which is a constant-time operation: echo $(( $(find /foo/foo2 -maxdepth 1 | wc -l) - 1)) – Thomas Nyman, Sep 11 '13 at 6:32
1 – Rather than feed that much data through another process, it would probably be better to just do some math on the final output. let count = $(find /foo/foo2 -maxdepth 1 | wc -l) - 2 – Caleb, Sep 11 '13 at 6:34
os.listdir() in python can do the work for you. It gives an array of the contents of the directory, excluding the special '.' and '..' entries. Also, there is no need to worry about files with special characters like '\n' in the name.

python -c 'import os;print len(os.listdir("."))'

The following is the time taken by the above python command compared with the 'ls -Af' command.

~/test$ time ls -Af | wc -l
399144

real    0m0.300s
user    0m0.104s
sys     0m0.240s

~/test$ time python -c 'import os;print len(os.listdir("."))'
399142

real    0m0.249s
user    0m0.064s
sys     0m0.180s

answered Sep 16 '13 at 20:47 – indrajeet
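The one-liner above is Python 2 syntax; a Python 3 equivalent (my adaptation, not from the answer) would be:

python3 -c 'import os; print(len(os.listdir(".")))'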
ls -1 | wc -l comes immediately to my mind. Whether ls -1U is faster than ls -1 is purely academic - the difference should be negligible except for very large directories.

answered Aug 7 '14 at 22:58 – countermode
I know this is old, but I feel that awk has to be mentioned here. The suggestions that include the use of wc simply aren't correct with regard to the OP's question: "the most resource efficient way." I recently had a log file get way out of control (due to some bad software) and therefore stumbled onto this post. There were roughly 232 million entries! I first tried wc -l and waited 15 minutes - it was not even able to finish counting the lines. The following awk statement gave me an accurate line count in 3 minutes on that log file. I've learned over the years to never underestimate awk's ability to simulate standard shell programs in a much more efficient fashion. Hope it helps someone like me. Happy hacking!

awk 'BEGIN{i=0} {i++} END{print i}' /foo/foo2

And if you need to substitute a command like ls for counting files in a directory:

`#Normal:` awk 'BEGIN{i=0} {i++} END{print i}' <(ls /foo/foo2/)
`#Hidden:` awk 'BEGIN{i=0} {i++} END{print (i-2)}' <(ls -f /foo/foo2/)

answered Dec 23 '15 at 4:53 – user.friendly (edited Jun 20 '16 at 8:35 by Pierre.Vriens)

Or simply, awk 'END{print NR}'. But in this particular situation, awk may be overkill because ls is the bottleneck, not wc. – Amit Naidu, May 29 '18 at 5:19
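Combining Amit Naidu's simplification with the unsorted listing discussed in other answers gives a shorter equivalent (my combination, not part of the answer itself):

# count all entries without sorting, then subtract 2 for . and ..
ls -f /foo/foo2/ | awk 'END{print NR - 2}'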
I would think echo * would be more efficient than any 'ls' command:

echo * | wc -w

answered Sep 11 '13 at 20:33 – Dan Garthwaite

4 – What about files with a space in their name? echo 'Hello World' | wc -w produces 2. – Joseph R., Sep 11 '13 at 20:52
@JosephR. Caveat Emptor – Dan Garthwaite, Sep 12 '13 at 0:59
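A glob-based count that doesn't break on spaces in file names, sketched from the same idea (my variant, not part of the answer): count positional parameters instead of words.

# set the positional parameters to the glob matches and count them;
# note: without nullglob, an empty directory reports 1 for the literal '*'
set -- *
echo "$#"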
5 – ls -l | wc -l would be off by one due to the total blocks in the first line of ls -l output. – Thomas Nyman, Sep 10 '13 at 21:18
3 – @ThomasNyman It would actually be off by several because of the dot and dotdot pseudo entries, but those can be avoided by using the -A flag. -l is also problematic because of reading file metadata in order to generate the extended list format. Forcing NOT -l by using ls is a much better option (-1 is assumed when piping output). See Gilles's answer for the best solution here. – Caleb, Sep 11 '13 at 9:29
2 – @Caleb ls -l doesn't output any hidden files nor the . and .. entries. ls -a output includes hidden files, including . and .., while ls -A output includes hidden files but excludes . and .. In Gilles's answer the bash dotglob shell option causes the expansion to include hidden files, excluding . and .. – Thomas Nyman, Sep 11 '13 at 9:45
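A minimal counting pipeline that follows these comments; the -q flag is my addition, not mentioned above (it replaces newlines in file names with '?' so each entry stays on one output line):

# count hidden and non-hidden entries, excluding . and ..
ls -Aq /foo/foo2/ | wc -l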