August 25, 2008
Removing EXIF data with find and jhead
Let’s imagine you’ve got a load of pictures on a *nix box somewhere, and you’d prefer they didn’t have EXIF data attached. You’d need a way to find all the JPEG files and a way to strip their EXIF data. Thankfully, both tasks are easily solved, one with find and one with jhead.
Using jhead
A look at the jhead page shows the -purejpg argument will strip any EXIF data (along with comments and any other sections that aren’t needed to render the image).
Building the find query
We need to find files whose names end in .jpg, .jpeg, .JPG, or .JPEG in a given directory tree. A quick look at the find man page shows us the -iregex parameter accepts a case-insensitive regexp, which find matches against the whole path. .*\.jpe?g$ breaks down as follows:
.* - any number of characters
\.jp - the literal characters ., j and p
e? - one or no e characters
g - a literal g character
$ - and these must all be at the very end of the string
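To check which names the pattern accepts before pointing it at real pictures, a small sketch like the one below may help. It builds a throwaway directory (all file names here are made up for illustration) and runs the same -iregex test with -print instead of jhead. It assumes a find that supports -iregex, which is not part of POSIX find.

```shell
# Build a scratch tree with a mix of names (all hypothetical).
scratch=$(mktemp -d)
mkdir -p "$scratch/sub"
touch "$scratch/a.jpg" "$scratch/b.JPEG" "$scratch/sub/c.Jpg" \
      "$scratch/notes.txt" "$scratch/jpg_readme" "$scratch/d.jpgx"

# Same test as in the article, but only print the matching paths.
matches=$(find "$scratch" -iregex '.*\.jpe?g$' -print | sort)
printf '%s\n' "$matches"

# Clean up the scratch tree.
rm -rf "$scratch"
```

Only the three paths ending in .jpg, .JPEG and .Jpg should be listed; d.jpgx is rejected because of the trailing $ and the whole-path match.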
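The parameter -exec will execute a command on each matching file, substituting the actual filename for {}, and we must end the command with a ; (escaped as \; so the shell passes it through to find instead of treating it as a command separator).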
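The way -exec substitutes {} can be seen safely by running echo instead of a command that changes files. The sketch below uses a scratch directory with made-up file names and counts how often the command runs. It also shows the standard + terminator for comparison, which the article does not use: with + find appends many paths to one invocation instead of starting the command once per file.

```shell
# Scratch directory with two hypothetical files.
demo=$(mktemp -d)
touch "$demo/one.jpg" "$demo/two.jpg"

# With \; the command runs once per match: echo prints one line per file.
per_file=$(find "$demo" -name '*.jpg' -exec echo got '{}' \; | wc -l | tr -d ' ')

# With + the matched paths are batched onto a single command line.
batched=$(find "$demo" -name '*.jpg' -exec echo got '{}' + | wc -l | tr -d ' ')

echo "runs with \\;: $per_file, runs with +: $batched"
rm -rf "$demo"
```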
The final command
find . -iregex '.*\.jpe?g$' -exec jhead -purejpg '{}' \;
This will find any file with a .jpg or .jpeg extension (in any case) in the current directory (recursively) and strip its EXIF data.
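Since jhead rewrites the files it is given in place, it can be worth previewing the command lines first. The following is a minimal sketch of such a dry run; preview_strip is a hypothetical helper name, not something provided by find or jhead.

```shell
# Print the jhead command that would run for each matching file,
# without executing jhead. The optional argument is the directory
# to search and defaults to the current directory.
preview_strip() {
  find "${1:-.}" -iregex '.*\.jpe?g$' -exec echo jhead -purejpg '{}' \;
}

# Usage: preview_strip ./public_html
```

Once the list looks right, dropping the echo gives the real command.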
One step further
Maybe you didn’t name all your JPEG files something.jpg. Maybe you named them foo_picture; however will you find them? Enter the unix ‘file’ command: it reads a file’s contents and looks for clues to its file type. We’ll use it here to build a command that searches the tree for files that look like binary JPEG files.
find . -type f -exec file -i -F \0 '{}' \; - Find actual files (-type f) and execute the file command on them, but set file to separate its data with NULLs (since you’re unlikely to have a NULL in your filename) and output the results as a MIME type (because it’s easier to parse later).
awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}' - Use awk to check the type field for the JPEG MIME type, then print only the file path (followed by a NULL).
xargs -0 -n 1 jhead -purejpg - Use xargs to split the input on NULLs and feed one path at a time to the jhead command.
One caveat: typed unquoted like this, the shell’s quote removal hands \0 to file and awk as a plain 0, so a path containing the digit 0 may be split in the wrong place and skipped.
Putting it all together, we get:
find . -type f -exec file -i -F \0 '{}' \;|awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}'|xargs -0 -n 1 jhead -purejpg
This is, of course, much slower than the pattern matching find command above.
time find ./public_html/ -type f -exec file -i -F \0 '{}' \; \
|awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}'|xargs -0 -n 1 jhead -purejpg
real 0m21.299s
user 0m4.214s
sys 0m16.290s
time find ./public_html/ -iregex '.*\.jpe?g$' -exec jhead -purejpg '{}' \;
real 0m7.819s
user 0m0.548s
sys 0m2.565s
And that’s after the entire directory tree has been pulled into memory cache!
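A variation on the content-based search that avoids parsing file’s output with awk is sketched below. It is an alternative, not the pipeline timed above: it asks file for just the MIME type of each path (-b drops the filename prefix, and --mime-type omits the charset suffix that some versions of file append to -i output) and compares it in the shell. It assumes a file command that supports --mime-type; find_jpegs_by_content is a hypothetical helper name.

```shell
# Print NUL-terminated paths of regular files whose content looks like JPEG.
# The optional argument is the directory to search (default: current directory).
find_jpegs_by_content() {
  find "${1:-.}" -type f -exec sh -c '
    for path do
      # Compare the bare MIME type reported by file with image/jpeg.
      if [ "$(file -b --mime-type "$path")" = "image/jpeg" ]; then
        printf "%s\0" "$path"
      fi
    done
  ' sh {} +
}

# Usage (this modifies the files in place):
#   find_jpegs_by_content ./public_html | xargs -0 -n 1 jhead -purejpg
```

It still starts file once per path, so it is unlikely to be any faster than the pipeline above, but it no longer depends on a separator character or on GNU awk.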
Posted by spiffed at August 25, 2008 2:33 PM
Comments
The 'One step further' section only works with GNU Awk. `awk -W version` will tell you your awk version.
If someone would like to adapt a mawk compatible version, that would be ++good.
Posted by: Anonymous at December 1, 2008 1:20 AM