August 25, 2008

Removing EXIF data with find and jhead

Let’s imagine you’ve got a load of pictures on a *nix box somewhere, and you’d prefer they didn’t have EXIF data attached. You’d need a way to find all the JPEG files and a way to strip their EXIF data. Thankfully, both tasks are easily solved: one with find and one with jhead.

Using jhead

A look at the jhead page shows the -purejpg argument (note the single dash) will strip any EXIF data, along with any other sections not needed to render the image.

Building the find query

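We need to find files whose name ends in .jpg, .jpeg, .JPG, or .JPEG in a given directory tree. A quick look at the find man page shows us the -iregex parameter accepts a case-insensitive regexp. The pattern .*\.jpe?g$ breaks down as follows: .* matches any leading characters (find tests the regexp against the whole path), \. matches a literal dot, jpe?g matches jpg or jpeg, and $ anchors the match at the end of the path.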
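
To see which names the pattern catches, here is a minimal sketch run against a throwaway directory. The directory and file names are made up for illustration, and it assumes a find that supports -iregex (GNU find does).

```shell
#!/bin/sh
# Hypothetical scratch directory, used only for this demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.jpg" "$demo_dir/b.JPEG" "$demo_dir/c.Jpg" \
      "$demo_dir/d.png" "$demo_dir/jpg_notes.txt" "$demo_dir/e.jpg.bak"

# -iregex tests the whole path and ignores case, so a.jpg, b.JPEG and
# c.Jpg should be listed; d.png, jpg_notes.txt and e.jpg.bak should not.
matches=$(find "$demo_dir" -iregex '.*\.jpe?g$' | sort)
printf '%s\n' "$matches"
```

Running a query like this with no -exec attached is a cheap way to check the pattern before pointing it at real pictures.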

The -exec parameter runs a command on each matching file, substituting the file’s path for {}. The command must end with a ;, which we write as \; so the shell passes it through to find instead of interpreting it.
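
Since -purejpg rewrites files in place, it can be worth previewing what -exec would run first. The sketch below puts echo in front of the command so nothing is modified; preview_strip is a hypothetical helper name, not part of find or jhead.

```shell
#!/bin/sh
# preview_strip: hypothetical helper that prints the jhead command find
# would run for each matching file, without running jhead itself.
# The optional first argument is the directory to search (default: .).
preview_strip() {
    find "${1:-.}" -iregex '.*\.jpe?g$' -exec echo jhead -purejpg '{}' \;
}
```

Calling preview_strip ./public_html prints one "jhead -purejpg path" line per match. Once the list looks right, dropping the echo gives the real command.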

The final command

find . -iregex '.*\.jpe?g$' -exec jhead -purejpg '{}' \;

This will find any file with a .jpg or .jpeg extension (in any case) in the current directory (recursively) and strip its EXIF data.

One step further

Maybe you didn’t name all your JPEG files something.jpg. Maybe you named them foo_picture; how ever will you find them? Enter the Unix ‘file’ command: it reads a file’s contents and looks for clues to its file type. We’ll use it here to build a command that searches the tree for files that look like binary JPEG files.

Putting it all together, we get:

find . -type f -exec file -i -F \0 '{}' \;|awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}'|xargs -0 -n 1 jhead -purejpg

This is, of course, much slower than the pattern matching find command above.

 time find ./public_html/ -type f -exec file -i -F \0 '{}' \; \
    |awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}'|xargs -0 -n 1 jhead -purejpg
 real    0m21.299s
 user    0m4.214s
 sys     0m16.290s

 time find ./public_html/ -iregex '.*\.jpe?g$' -exec jhead -purejpg '{}' \;
 real    0m7.819s
 user    0m0.548s
 sys     0m2.565s

And that’s after the entire directory tree has been pulled into memory cache!
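
The content-based pipeline has a few fragile spots: it needs GNU awk to print a NUL byte, the shell strips the backslash from the unquoted \0 so file and awk actually split on the character 0, and newer versions of file append a charset to the -i output, which stops the " image/jpeg" comparison from matching. Here is a rough awk-free sketch of the same idea. It is an untimed alternative, not the command measured above, and it assumes a file that supports -b and --mime-type; list_jpegs_by_content is a hypothetical helper name.

```shell
#!/bin/sh
# list_jpegs_by_content: hypothetical helper that prints, NUL-terminated,
# every regular file under the given directory (default: .) whose
# contents file(1) identifies as image/jpeg.
# Assumes a file(1) that supports -b (brief) and --mime-type.
list_jpegs_by_content() {
    find "${1:-.}" -type f -exec sh -c '
        for f do
            # Ask file(1) for the bare MIME type of this candidate.
            if [ "$(file -b --mime-type "$f")" = "image/jpeg" ]; then
                printf "%s\0" "$f"   # NUL-terminate for xargs -0
            fi
        done' sh '{}' +
}
```

It plugs into the same xargs stage as before: list_jpegs_by_content ./public_html | xargs -0 -n 1 jhead -purejpg. It still runs file once per candidate, so there is no reason to expect it to beat the pattern-matching find.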

Posted by spiffed at August 25, 2008 2:33 PM

Comments

The 'One step further' section only works with GNU Awk. `awk -W version` will tell you your awk version.

If someone would like to adapt a mawk compatible version, that would be ++good.

Posted by: Anonymous at December 1, 2008 1:20 AM
