Q. How do I extract images from a PDF file under Linux / UNIX shell account?
A. pdfimages is a Portable Document Format (PDF) image extractor for Linux / UNIX operating systems. It saves the embedded raster images from a PDF file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files; it does not convert vector graphics. pdfimages reads the given PDF file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image, named image-root-nnn.xxx, where image-root is the output prefix given on the command line, nnn is the image number, and xxx is the image type (.ppm, .pbm, .jpg).
pdfimages is part of the poppler-utils package on many Linux distributions. Install it as root with:
# yum install poppler-utils
OR
# apt-get install poppler-utils
pdfimages syntax
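pdfimages [options] /path/to/file.pdf /path/to/output/image-root
The second argument is a filename prefix (image-root), not a directory.
To extract every image from bar.pdf and save the files as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter:
$ pdfimages bar.pdf /tmp/image
Sample output: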
$ ls /tmp/image*
image-000.ppm image-1025.ppm image-1140.ppm image-1256.ppm image-247.ppm image-374.ppm image-501.ppm image-628.ppm image-755.ppm image-882.ppm image-001.ppm image-1026.ppm image-1141.ppm image-1257.ppm image-248.ppm image-375.ppm image-502.ppm image-629.ppm image-756.ppm image-883.ppm image-002.ppm image-1027.ppm image-1142.ppm image-1258.ppm image-249.ppm image-376.ppm image-503.ppm image-630.ppm image-757.ppm image-884.ppm
Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual:
$ pdfimages -j bar.pdf /tmp/image
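Because -j can leave a mix of .jpg, .ppm, and .pbm files behind, it can help to check what was actually written. The following is a minimal sketch that relies only on the image-root-nnn.xxx naming described above; the function name count_by_type is hypothetical and not part of pdfimages:

```shell
#!/bin/sh
# count_by_type PREFIX
# Hypothetical helper: counts the files named PREFIX-nnn.ppm,
# PREFIX-nnn.pbm and PREFIX-nnn.jpg and prints one line per type.
count_by_type() {
    prefix=$1
    for ext in ppm pbm jpg; do
        count=0
        for f in "$prefix"-*."$ext"; do
            # An unmatched glob stays literal, so confirm the file exists.
            if [ -e "$f" ]; then
                count=$((count + 1))
            fi
        done
        echo "$ext: $count"
    done
}
```

For example, after running the command above, `count_by_type /tmp/image` prints the number of files of each type that share the /tmp/image prefix.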
The -f option specifies the first page to scan. To start scanning at page 5 and continue to the end of the document, enter:
$ pdfimages -j -f 5 bar.pdf /tmp/image
The -l option specifies the last page to scan. To scan only the first 5 pages (pages 1 through 5), enter:
$ pdfimages -j -l 5 bar.pdf /tmp/image
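Scanning only the final pages of a document needs the total page count, which neither -f nor -l supplies on its own. The following is a minimal sketch, assuming pdfinfo (also shipped in poppler-utils) is installed and prints a "Pages:" line; the function names start_page and extract_last_pages are hypothetical:

```shell
#!/bin/sh
# start_page TOTAL N
# Prints the page number where the last N pages begin (never below 1).
start_page() {
    total=$1
    n=$2
    first=$((total - n + 1))
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    echo "$first"
}

# extract_last_pages FILE N PREFIX
# Hypothetical wrapper: extracts images from the last N pages of FILE.
extract_last_pages() {
    pdf=$1
    n=$2
    prefix=$3
    # Assumes pdfinfo prints a line such as "Pages:          12".
    total=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    pdfimages -j -f "$(start_page "$total" "$n")" -l "$total" "$pdf" "$prefix"
}
```

For example, `extract_last_pages bar.pdf 5 /tmp/image` would scan the last 5 pages of bar.pdf. Nothing runs until one of the functions is called.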
4 comments:
Good introduction, but it seems to overstate what the pdfimages tool can do. I read somewhere that it extracts only raw image data and can do nothing with vector graphics. Please mention that too.
If possible, please explain how pdfimages works. Does it read the bounding box of the images, does it make use of the gigantic DOM structure of a PDF document, or does it use some crafty image processing techniques?
Nevertheless, thanks for this great introduction to a great tool.
I have a site that lets users upload PDF files, like Scribd. How can I show a thumbnail image of an uploaded PDF or SWF? If you have any idea about it, please mail me.
Hi,
Thanks for sharing. I agree with isiomer. pdfimages extracts only raw image data and can do nothing with vector graphics. Also, if the bounding box around an image is lost during conversion to PDF, it will fail as well. Sometimes it also generates a large number of empty images. After a long time querying search engines, I found a Google Chrome extension called sci2ools that achieves nice results.
Hi, thanks. Useful tips.