August 25, 2008
Removing EXIF data with find and jhead
Let’s imagine you’ve got a load of pictures on a *nix box somewhere, and you’d prefer they didn’t have EXIF data attached. You’d need a way to find all the JPEG files and a way to strip their EXIF data. Thankfully, both tasks are easily solved, one with find and one with jhead.
Using jhead
A look at the jhead page shows that the -purejpg argument will strip any EXIF data. Note that jhead options take a single dash.
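Since jhead rewrites the file it is given, it can be worth trying the option on a copy first. The following is only a sketch: strip_copy is a made-up helper name, and the file names in the usage line are placeholders.

```shell
# strip_copy SRC DEST: copy SRC to DEST, then strip the metadata from DEST only.
# (strip_copy is a hypothetical helper, not something jhead provides.)
strip_copy() {
    cp -- "$1" "$2" || return 1
    jhead -purejpg "$2"
}

# Usage (placeholder file names):
#   strip_copy holiday.jpg holiday-clean.jpg
```

Running plain jhead on the copy afterwards prints whatever metadata is still present, which makes for a quick check.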
Building the find query
We need to find files whose names end in .jpg, .jpeg, .JPG, or .JPEG in a given directory tree. A quick look at the find man page shows us the -iregex parameter accepts a case-insensitive regexp. .*\.jpe?g$ breaks down into the following parts:
.* - any number of characters
\.jp - the literal characters ., j and p
e? - one or no e characters
g - a literal g character
$ - and these must all be at the very end of the string
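To see the pattern at work without touching any real pictures, it can be tried on a throwaway tree. This is a small sketch that assumes a find with -iregex support (GNU find, for example); the file names are invented for the demonstration.

```shell
# Build a scratch directory with a mix of names.
demo=$(mktemp -d)
touch "$demo/a.jpg" "$demo/b.JPEG" "$demo/c.Jpg" "$demo/d.png" "$demo/e.jpg.bak"

# -print only lists the matches; nothing is modified.
matches=$(find "$demo" -iregex '.*\.jpe?g$' -print | sort)
printf '%s\n' "$matches"

# Clean up the scratch directory.
rm -r "$demo"
```

Only the three names ending in some capitalisation of .jpg or .jpeg should be listed; d.png and e.jpg.bak are skipped because the pattern has to match right up to the end of the path.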
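The parameter -exec will execute a command on each matching file, substituting the actual filename for {}. We must end the command with a ; which is written as \; so that the shell passes it through to find instead of treating it as its own command separator.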
The final command
find . -iregex '.*\.jpe?g$' -exec jhead -purejpg '{}' \;
This will find any file with a .jpg or .jpeg extension (in any case) in the current directory (recursively) and strip its EXIF data.
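Because the stripping happens in place, a dry run is a cheap safeguard. One way to sketch it is to put echo in front of jhead, so find prints each command it would have run. The scratch tree and its file names here are made up for the example.

```shell
# Scratch tree with a nested directory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/one.jpg" "$demo/sub/two.JPEG" "$demo/notes.txt"

# Same expression as the final command, but echo prints the jhead
# invocation for each match instead of executing it.
preview=$(find "$demo" -iregex '.*\.jpe?g$' -exec echo jhead -purejpg '{}' \; | sort)
printf '%s\n' "$preview"

# Clean up the scratch directory.
rm -r "$demo"
```

Once the printed list looks right, removing the echo turns the preview back into the real command.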
One step further
Maybe you didn’t name all your JPEG files something.jpg. Maybe you named them foo_picture; however will you find them? Enter the unix ‘file’ command: it reads a file’s contents and looks for clues to its file type. We’ll use it here to build a command that searches the tree for files that look like binary JPEG files.
find . -type f -exec file -i -F \0 '{}' \;
- Find actual files (-type f) and execute the file command on them, asking file to separate the filename from its description with a NUL (since you’re unlikely to have a NUL in your filename) and to output the result as a MIME type (because it’s easier to parse later). Be aware that the shell turns an unquoted \0 into a plain 0 before file or awk ever sees it, so the separator actually used here is the character 0, and a path containing a 0 will be misparsed.
awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}'
- Use awk to check the type field for the JPEG MIME type, then print only the file path (followed by a NUL).
xargs -0 -n 1 jhead -purejpg
- Use xargs to split the input on NULs and feed one path at a time to the jhead command.
Putting it all together, we get:
find . -type f -exec file -i -F \0 '{}' \;|awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}'|xargs -0 -n 1 jhead -purejpg
This is, of course, much slower than the pattern matching find command above.
time find ./public_html/ -type f -exec file -i -F \0 '{}' \; \
|awk -F\0 '$2 == " image/jpeg" {printf "%s\0",$1}'|xargs -0 -n 1 jhead -purejpg
real 0m21.299s
user 0m4.214s
sys 0m16.290s
time find ./public_html/ -iregex '.*\.jpe?g$' -exec jhead -purejpg '{}' \;
real 0m7.819s
user 0m0.548s
sys 0m2.565s
And that’s after the entire directory tree has been pulled into memory cache!
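The content-based search can also be written without awk or any separator parsing. The following is a minimal sketch rather than a drop-in replacement: it assumes a file command that supports -b (omit the filename) and --mime-type (print the type alone), and strip_jpegs_by_content is a name made up for illustration. It has not been timed against the commands above.

```shell
# strip_jpegs_by_content DIR: run jhead -purejpg on every regular file
# under DIR whose contents file(1) identifies as image/jpeg.
# (Hypothetical helper; assumes file supports -b and --mime-type.)
strip_jpegs_by_content() {
    # find hands the paths to the inner shell as arguments, so names
    # with spaces or newlines never have to be parsed out of text.
    find "$1" -type f -exec sh -c '
        for path in "$@"; do
            if [ "$(file -b --mime-type -- "$path")" = "image/jpeg" ]; then
                jhead -purejpg "$path"
            fi
        done
    ' sh {} +
}

# Usage:
#   strip_jpegs_by_content ./public_html
```

The type check compares the whole string file prints, so nothing depends on how awk handles NUL characters.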
Posted by spiffed at August 25, 2008 2:33 PM
Comments
The 'One step further' section only works with GNU Awk. `awk -W version` will tell you your awk version.
If someone would like to adapt a mawk compatible version, that would be ++good.
Posted by: Anonymous at December 1, 2008 1:20 AM