Monday, January 15, 2007

Handling Filenames With Spaces

Typical Unix users cringe at the thought of putting spaces in file names. Mac users, on the other hand, frequently put spaces in file names because it's natural and may read better. This means that Mac OS X Unix geeks need to make sure their shell commands (and shell scripts) work correctly when faced with spaces in file names. Below I outline a few simple ways to properly deal with this.

  1. find(1) has an -exec option, which lets you specify a command to be run for each file found. The command may also take arguments, any of which may be the string {}, which is replaced by the path of the found file. Because find runs the command directly rather than through a shell, the substituted path is passed as a single argument and is never split on white space or expanded as a pattern. That makes it safe for {} to stand in for a file with a space in its name. For example,
    $ find ~/Library -name '* *' -exec ls {} \;
    [... output omitted...]

    (Notice that our find command is looking for files with spaces in the name.)

  2. find's -exec option is fine for some situations, but since it forks a process to run the specified command for each file found, that can be a lot of unnecessary forking around. This is where find's -print0 combined with xargs -0 comes in. The idea is that find prints all the matching files, but instead of separating them with newlines, it separates them with the null byte ('\0' in C, the same character that terminates all strings in C). A null byte can never appear in a path, so it is an unambiguous separator. xargs -0 then reads the null-separated strings and runs the specified command with as many paths from find as fit on the command line, running it again if more remain. The first command below builds a tar file containing all the files in my Library folder that have spaces in their names, and the second counts the entries in the archive. Note the use of tar's r (append) mode rather than c: xargs may run tar more than once for a long list of files, and c would overwrite the archive each time.
    $ find ~/Library -name '* *' -print0 | xargs -0 tar rf blah.tar
    $ tar tf blah.tar | wc -l

  3. Sometimes you need to do more than run one command on a filename. In this case, you'd like to use a loop to process each file, maybe something like for file in $(find ~/Library -name '* *'); do [... body of loop ...]; done. The problem is that the shell splits the output of $(...) on white space, so filenames with spaces are broken up and treated as multiple files. One solution is to use a while-loop and the read command. Something like the following should work.
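    $ find ~/Library -name '* *' | while IFS= read -r filename
    > do
    > ls -ld "$filename"
    > cat "$filename"
    > [... whatever you want ...]
    > done

    [... output omitted...]

    This code works because find outputs the matching files one per line. read reads one line of input and assigns it to the variable named as its argument (in this case, filename). Setting IFS to empty for the read keeps it from stripping leading and trailing white space from the name, and -r keeps it from treating backslashes as escape characters. Notice that within the loop's body we need to quote the variable $filename whenever we use it; otherwise the shell would split it on white space all over again. (This approach still breaks on the rare filename that contains a newline.)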
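
To see the difference for yourself, here is a minimal sketch that runs the naive for-loop and the three techniques side by side against a throwaway directory. It assumes a POSIX sh plus a find and xargs that support -print0 and -0 (Mac OS X and GNU both do). The scratch directory and the file names in it are invented for the demo and stand in for ~/Library, and ls -d is used in place of tar or cat so that each technique's output can simply be counted.

```shell
#!/bin/sh
# Demo sketch: a throwaway directory (hypothetical stand-in for ~/Library)
# holding three names with spaces plus one control name without spaces.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
touch "$demo/My Notes.txt" "$demo/Read Me First" "$demo/a b c" "$demo/plain.txt"

# Line counter that also works with BSD wc, which pads its output with spaces.
count_lines() { wc -l | tr -d ' '; }

# Broken: the shell splits the $(find ...) output on white space, so
# "My Notes.txt" becomes 2 words, "Read Me First" 3, and "a b c" 3.
naive_count=0
for file in $(find "$demo" -name '* *'); do
    naive_count=$((naive_count + 1))
done

# Technique 1: find passes each path to ls as one argument, one ls per file.
exec_count=$(find "$demo" -name '* *' -exec ls -d {} \; | count_lines)

# Technique 2: null-separated paths, batched by xargs into as few ls runs as possible.
xargs_count=$(find "$demo" -name '* *' -print0 | xargs -0 ls -d | count_lines)

# Technique 3: one path per line, read whole and always used quoted.
# In bash and dash the loop runs in a subshell, so a counter set inside it
# would be lost when the loop ends; count the loop's output instead.
loop_count=$(find "$demo" -name '* *' | while IFS= read -r filename; do
    ls -d "$filename"
done | count_lines)

echo "for over \$(find):   $naive_count words"
echo "-exec:               $exec_count files"
echo "-print0 | xargs -0:  $xargs_count files"
echo "while read:          $loop_count files"
```

Run with sh, it should report 8 words for the for-loop (the three names split into 2 + 3 + 3 pieces) and 3 files for each of the other approaches.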

There are other ways to deal with filenames with spaces, but these are the common techniques I find myself using the most.


Anonymous said...

Nice post, thanks. I've done the first two several times and have needed the third before... can't remember how I worked around it at the time, but your way is better.

Good to have this all in one place, saving a copy for future reference.

Anonymous said...

What about filenames with carriage returns or other control characters in them?

When I need my scripts to accept *any* filename, I do something like this (in bash, with GNU find):

find -print0 | while IFS= read -rd $'\0' f ; do echo "[$f]" ; done

fsat said...

This is an awesome post. The second and third ones are lifesavers!


gorilla said...

Excellent summary. Pity I didn't find it slightly earlier; it could have made things a little quicker.

chrispix said...

Thanks for #3. I've been looking for something that works with locate as well as find, and this did the trick nicely.