pdfimages: Extract and Save Images From A Portable Document Format ( PDF ) File

Posted on August 28, 2008 · Last updated August 28, 2008

Q. How do I extract images from a PDF file under a Linux / UNIX shell account?

A. pdfimages is a Portable Document Format (PDF) image extractor for Linux / UNIX operating systems. It saves the images embedded in a PDF file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files. pdfimages reads the given PDF file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image, named image-root-nnn.xxx, where image-root is the output prefix you supply on the command line, nnn is the image number, and xxx is the image type (.ppm, .pbm, .jpg).

pdfimages is part of the poppler-utils package on most Linux distributions. Install it with yum (Red Hat / CentOS / Fedora) or apt-get (Debian / Ubuntu):
# yum install poppler-utils
OR
# apt-get install poppler-utils

pdfimages syntax

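pdfimages [options] /path/to/file.pdf /path/to/output/image-root
The second argument is a file name prefix (image-root), not an output directory: each image is written as image-root-nnn.xxx. For example, to extract every image from bar.pdf and save the images as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter: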
$ pdfimages bar.pdf /tmp/image
$ ls /tmp/image*
Sample output:

image-000.ppm   image-1025.ppm  image-1140.ppm  image-1256.ppm  image-247.ppm  image-374.ppm  image-501.ppm  image-628.ppm  image-755.ppm  image-882.ppm
image-001.ppm   image-1026.ppm  image-1141.ppm  image-1257.ppm  image-248.ppm  image-375.ppm  image-502.ppm  image-629.ppm  image-756.ppm  image-883.ppm
image-002.ppm   image-1027.ppm  image-1142.ppm  image-1258.ppm  image-249.ppm  image-376.ppm  image-503.ppm  image-630.ppm  image-757.ppm  image-884.ppm
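The names in the sample output follow the image-root-nnn.xxx scheme, with the image number zero-padded to at least three digits, which is why image-1025.ppm has four digits while image-001.ppm has three. The helper below is hypothetical (pdfimages_name is not part of poppler-utils); it is a small sketch that reproduces that naming rule with printf, which can be handy when a script needs to predict the name of a particular output file:

```shell
#!/bin/sh
# pdfimages_name ROOT NUM EXT
# Hypothetical helper (not part of poppler-utils): prints a file name in the
# image-root-nnn.xxx form, with NUM zero-padded to at least 3 digits.
pdfimages_name() {
    printf '%s-%03d.%s\n' "$1" "$2" "$3"
}

pdfimages_name /tmp/image 0 ppm      # prints /tmp/image-000.ppm
pdfimages_name /tmp/image 1025 ppm   # prints /tmp/image-1025.ppm
```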

Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images stored in DCT format (the JPEG compression used inside PDF files) are saved as JPEG files; all non-DCT images are still saved in PBM/PPM format:
$ pdfimages -j bar.pdf /tmp/image
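With -j the file extension already tells you which format each image ended up in, but a quick sanity check is to look at the first two bytes of each file. The sketch below assumes raw (binary) PPM and PBM files, which begin with the ASCII magic numbers P6 and P4, and JPEG files, which begin with the bytes FF D8. classify_image is a hypothetical helper name, and head -c is assumed to be available (it is in the GNU and BSD userlands):

```shell
#!/bin/sh
# classify_image FILE
# Hypothetical helper: reports a file's format from its first two bytes.
# Assumes raw PPM ("P6"), raw PBM ("P4") and JPEG (0xFF 0xD8) signatures.
classify_image() {
    # Dump the first two bytes as lowercase hex, e.g. "5036" for "P6".
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case $magic in
        ffd8) echo "$1: JPEG" ;;
        5036) echo "$1: PPM" ;;
        5034) echo "$1: PBM" ;;
        *)    echo "$1: unknown (magic bytes: $magic)" ;;
    esac
}
```

After running the -j command, check every extracted file with: for f in /tmp/image-*; do classify_image "$f"; done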
The -f option specifies the first page to scan. To start scanning at page 5 (pages 5 through the last page), enter:
$ pdfimages -j -f 5 bar.pdf /tmp/image
The -l option specifies the last page to scan. To scan only the first 5 pages (pages 1 through 5), enter:
$ pdfimages -j -l 5 bar.pdf /tmp/image
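Neither -f nor -l counts from the end of the document, so scanning only the last 5 pages means working out the page range first. A minimal sketch, assuming pdfinfo (also shipped in poppler-utils) is installed and prints a "Pages:" line; first_page and extract_last_pages are hypothetical helper names:

```shell
#!/bin/sh
# first_page TOTAL N
# Prints the first page number of the last N pages, never less than 1.
first_page() {
    fp=$(( $1 - $2 + 1 ))
    if [ "$fp" -lt 1 ]; then
        fp=1
    fi
    echo "$fp"
}

# extract_last_pages PDF N IMAGE_ROOT
# Sketch: reads the page count with pdfinfo, then runs pdfimages -j
# over the last N pages only (-f first page, -l last page).
extract_last_pages() {
    total=$(pdfinfo "$1" | awk '/^Pages:/ {print $2}')
    if [ -z "$total" ]; then
        echo "could not read page count of $1" >&2
        return 1
    fi
    pdfimages -j -f "$(first_page "$total" "$2")" -l "$total" "$1" "$3"
}
```

For example, extract_last_pages bar.pdf 5 /tmp/image extracts images from the last 5 pages of bar.pdf; for a document shorter than 5 pages it simply scans every page.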

5 comments

1 isiomer June 24, 2009 at 10:40 am

Good introduction, but it seems to exaggerate the functionality of the pdfimages tool. I read somewhere that it extracts only raw data and can do nothing with vector graphics; please mention that too.
If possible, please explain how pdfimages works. Does it read the bounding box of the images, does it make use of the gigantic DOM structure of a PDF document, or does it use some crafty image processing techniques?
Nevertheless, thanks for this great introduction to a great tool.

2 jabber November 9, 2009 at 1:22 pm

I have a site that allows users to upload PDF files, like Scribd. How can I show a thumbnail image of an uploaded PDF or SWF file? If you have an idea about it, please mail me.

3 oash May 14, 2010 at 11:43 pm

Hi,
Thanks for sharing. I agree with isiomer: pdfimages extracts only raw data and can do nothing with vector graphics. Also, if the bounding box around an image is lost during conversion to PDF, it will fail as well. Sometimes it also generates a large number of empty images. After a long time querying search engines, I found a Google Chrome extension called sci2ools that achieves nice results.

4 yosbudi September 30, 2011 at 9:16 pm

Hi, thanks. Useful tips.

5 pankaj April 27, 2012 at 6:56 am

Sir, how do I print a .pbm file on a Linux server? Please give me a full description.

Thanks in advance!
