The Linux find Command
The Linux find command is powerful and flexible. It can search for files and directories using a whole raft of different criteria, not just filenames. For example, it can search for empty files, executable files, or files owned by a particular user. It can find and list files by their accessed or modified times, it supports regex patterns, it is recursive by default, and it works with pseudo-files like named pipes (FIFO buffers).
All of that is fantastically useful. The humble find command really packs some power. But there’s a way to leverage that power and take things to another level. If we can take the output of the find command and use it automatically as the input of other commands, we can make something happen to the files and directories that find uncovers for us.
The principle of piping the output of one command into another command is a core characteristic of Unix-derived operating systems. The design principle of making a program do one thing and do it well, and to expect that its output could be the input of another program—even an as yet unwritten program—is often described as the “Unix philosophy.” And yet some core utilities, like mkdir, don’t accept piped input.
To address this shortcoming, the xargs command can be used to parcel up piped input and feed it to another command as though it were a set of command-line parameters to that command. This achieves almost the same thing as straightforward piping. It is “almost the same” and not “exactly the same” because there can be unexpected differences with shell expansions and filename globbing.
Using find With xargs
We can use find with xargs to have some action performed on the files that are found. This is a long-winded way to go about it, but we could feed the files found by find into xargs, which then passes them to tar as command-line arguments to create an archive file of those files. We’ll run this command in a directory that has many help system PAGE files in it.
The command is made up of different elements.
- find ./ -name “*.page” -type f -print0: The find action will start in the current directory, searching by name for files that match the “*.page” search string. Directories will not be listed because we’re specifically telling it to look for files only, with -type f. The -print0 argument tells find to terminate each filename with a null character, so whitespace is not treated as the end of a filename. This means that filenames with spaces in them will be processed correctly.
- xargs -0: The -0 argument tells xargs to expect null-terminated input and to not treat whitespace as the end of a filename.
- tar -cvzf page_files.tar.gz: This is the command xargs is going to feed the file list from find to. The tar utility will create an archive file called “page_files.tar.gz.”
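Putting those elements together, the pipeline might look like the sketch below. The scratch directory and the two sample files are assumptions made up for illustration so that the example can run anywhere. The find, xargs, and tar options are the ones described above.

```shell
# Assumed sandbox: two hypothetical sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page
printf 'four five\n' > 'user guide.page'   # a name containing a space

# Find regular files ending in .page and hand them, null-separated, to tar.
find ./ -name "*.page" -type f -print0 | xargs -0 tar -cvzf page_files.tar.gz

# List the archive contents to confirm that both files were captured.
archived=$(tar -tzf page_files.tar.gz)
printf '%s\n' "$archived"
```

The file with a space in its name should appear in the archive listing as a single entry, which is the point of pairing -print0 with -0.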
We can use ls to see the archive file that is created for us.
ls *.gz
The archive file is created for us. For this to work, all of the filenames need to be passed to tar en masse, which is what happened. All of the filenames were tagged onto the end of the tar command as a very long command line.
You can choose to have the final command run on all the filenames at once or invoked once per filename. We can see the difference quite easily by having xargs pass the filenames to wc, the line, word, and character counting utility.
This command pipes all the filenames into wc at once. Effectively, xargs constructs a long command line for wc with each of the filenames in it.
The lines, words, and characters for each file are printed, together with a total for all files.
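As a rough sketch of the all-at-once form, the command could look like this. The scratch directory and sample files are hypothetical stand-ins for the PAGE files. Only find, xargs -0, and wc come from the text.

```shell
# Assumed sandbox: two hypothetical sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page
printf 'four five\n' > 'user guide.page'

# xargs appends every filename to a single wc command line.
all_at_once=$(find . -name "*.page" -type f -print0 | xargs -0 wc)
printf '%s\n' "$all_at_once"
```

With the two sample files, the output should be one line per file followed by a total line.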
If we use xargs’ -I (replace string) option and define a replacement string token, in this case “{}”, the token is replaced in the final command by each filename in turn. This means wc is called repeatedly, once for each file.
The output isn’t nicely lined up. Each invocation of wc operates on a single file so wc has nothing to line the output up with. Each line of output is an independent line of text.
Because wc can only provide a total when it operates on multiple files at once, we don’t get the summary statistics.
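The once-per-file form can be sketched in the same way. Again, the scratch directory and sample files are assumptions for illustration. The -I option and the “{}” token are as described above.

```shell
# Assumed sandbox: two hypothetical sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page
printf 'four five\n' > 'user guide.page'

# -I "{}" makes xargs run wc once per filename, substituting it for {}.
one_by_one=$(find . -name "*.page" -type f -print0 | xargs -0 -I "{}" wc "{}")
printf '%s\n' "$one_by_one"
```

Here each file gets its own line of output and no total line appears, because no single wc invocation ever sees more than one file.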
The find -exec Option
The find command has a built-in method of calling external programs to perform further processing on the filenames that it returns. The -exec (execute) option has a syntax similar to but different from the xargs command.
This will count the words in the matching files. The command is made up of these elements.
- find .: Start the search in the current directory. The find command is recursive by default, so subdirectories will be searched too.
- -name “*.page”: We’re looking for files with names that match the “*.page” search string.
- -type f: We’re only looking for files, not directories.
- -exec wc: We’re going to execute the wc command on the filenames that are matched with the search string.
- -w: Any options that you want to pass to the command must be placed immediately following the command.
- “{}”: The “{}” placeholder represents each filename. It is conventionally the last item in the parameter list, and it must be last when the list is terminated with a plus sign “+”.
- \;: A semicolon “;” is used to indicate the end of the parameter list. It must be escaped with a backslash “\” so that the shell doesn’t interpret it.
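Assembled from those elements, the command might look like the following sketch. The scratch directory and sample files are hypothetical and exist only so that the example is self-contained.

```shell
# Assumed sandbox: two hypothetical sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page
printf 'four five\n' > 'user guide.page'

# Run "wc -w" once for every matching file; \; ends the -exec parameter list.
per_file=$(find . -name "*.page" -type f -exec wc -w "{}" \;)
printf '%s\n' "$per_file"
```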
When we run that command we see the output of wc. The -w (word count) option limits its output to the number of words in each file.
As you can see there is no total. The wc command is executed once per filename. By substituting a plus sign “+” for the terminating semicolon “;” we can change -exec’s behaviour to operate on all files at once.
We get the summary total and neatly tabulated results that tell us all files were passed to wc as one long command line.
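A minimal sketch of the “+” form, using the same made-up sample files as an assumption:

```shell
# Assumed sandbox: two hypothetical sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page
printf 'four five\n' > 'user guide.page'

# The + terminator makes find pass all matching filenames to one wc call.
batched=$(find . -name "*.page" -type f -exec wc -w "{}" +)
printf '%s\n' "$batched"
```

Because wc now receives both filenames on one command line, the last line of output is the total for the two files.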
exec Really Means exec
The -exec (execute) option doesn’t launch the command by running it in a shell. Instead, find starts a child process and uses one of the exec family of system calls to run the command directly in that process. So the command that is launched isn’t running in a shell at all. Without a shell, you can’t get shell expansion of wildcards, and you don’t have access to aliases and shell functions.
This computer has a shell function defined called words-only. This counts just the words in a file.
It’s a strange function perhaps, because “words-only” is much longer to type than “wc -w”, but at least it means you don’t need to remember the command-line options for wc. We can test what it does like this:
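The function’s definition isn’t shown here, so the sketch below assumes a one-line Bash function that wraps wc -w, and calls it on a made-up sample file.

```shell
# Requires Bash: hyphenated function names are not valid in plain POSIX sh.
# Assumed sandbox: one hypothetical sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page

# Assumed definition of words-only: count only the words in the given file.
function words-only() {
  wc -w "$1"
}

# Call the function directly from the interactive shell.
result=$(words-only intro.page)
printf '%s\n' "$result"
```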
That works just fine with a normal command-line invocation. If we try to invoke that function using find’s -exec option, it’ll fail.
The find command can’t find the shell function, and the -exec action fails.
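To make the failure concrete, here is a hedged sketch. It assumes the same one-line definition of words-only, a made-up sample file, and that no executable called words-only exists on your PATH.

```shell
# Requires Bash: hyphenated function names are not valid in plain POSIX sh.
# Assumed sandbox: one hypothetical sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page

# Assumed definition of words-only, known only to this shell.
function words-only() {
  wc -w "$1"
}

# find looks for an executable file named words-only, not a shell function,
# so the attempt to run it fails. Capture the error message to inspect it.
err=$(find . -name "*.page" -type f -exec words-only "{}" \; 2>&1)
printf '%s\n' "$err"
```

The exact wording of the error varies between find implementations, but it should name words-only as the thing that could not be run.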
To overcome this we can have find launch a Bash shell, and pass the rest of the command line to it as arguments to the shell. We need to wrap the command line in double quotation marks. This means we need to escape the double quotation marks that are around the “{}” replace string.
Before we can run the find command, we need to export our shell function with the -f (as a function) option:
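A minimal sketch of the export and the wrapped find command follows. The function body and the sample files are assumptions for illustration. The escaped double quotation marks around “{}” are the ones described above.

```shell
# Requires Bash: export -f and hyphenated function names are Bash features.
# Assumed sandbox: two hypothetical sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'one two three\n' > intro.page
printf 'four five\n' > 'user guide.page'

# Assumed definition of words-only.
function words-only() {
  wc -w "$1"
}

# Export the function so that child Bash processes inherit it.
export -f words-only

# find launches a new Bash shell per file; that shell can see the function.
via_bash=$(find . -name "*.page" -type f -exec bash -c "words-only \"{}\"" \;)
printf '%s\n' "$via_bash"
```

Embedding “{}” inside the command string works for ordinary filenames like these. For filenames that may contain quotation marks or other shell metacharacters, a commonly used alternative is to pass the filename as a positional parameter instead, as in bash -c 'words-only "$1"' _ "{}".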
This runs as expected.
Using the Filename More Than Once
If you want to chain several commands together you can do so, and you can use the “{}” replace string in each command.
If we cd up a level out of the “pages” directory and run that command, find will still discover the PAGE files because it searches recursively. The filename and path are passed to our words-only function just as before. Purely to demonstrate using -exec with two commands, we’re also calling the basename command to see the name of the file without its path.
Both the basename command and the words-only shell function have the filenames passed to them using a “{}” replace string.
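The chained form could be sketched as follows. The “pages” directory is recreated here with hypothetical sample files, and the function body is an assumption. The two commands are joined with && inside the string handed to Bash.

```shell
# Requires Bash: export -f and hyphenated function names are Bash features.
# Assumed sandbox: a "pages" subdirectory holding two hypothetical files.
cd "$(mktemp -d)" || exit 1
mkdir pages
printf 'one two three\n' > pages/intro.page
printf 'four five\n' > 'pages/user guide.page'

# Assumed definition of words-only, exported for child Bash processes.
function words-only() {
  wc -w "$1"
}
export -f words-only

# From the parent directory, print each file's base name, then its word count.
chained=$(find . -name "*.page" -type f -exec bash -c "basename \"{}\" && words-only \"{}\"" \;)
printf '%s\n' "$chained"
```

Each file should produce two lines of output: its name without the path from basename, then the word count with the full relative path from words-only.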
Horses for Courses
There’s a CPU load and time penalty for repeatedly calling a command when you could call it once and pass all the filenames to it in one go. And if you’re invoking a new shell each time to launch the command, that overhead gets worse.
But sometimes—depending on what you’re trying to achieve—you may not have another option. Whatever method your situation requires, no one should be surprised that Linux provides enough options that you can find the one that suits your particular needs.