
pdfimages: Extract and Save Images From a Portable Document Format (PDF) File

Q. How do I extract images from a PDF file under Linux / UNIX shell account?

A. pdfimages is a Portable Document Format (PDF) image extractor for Linux / UNIX operating systems. It saves images from a PDF file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files. pdfimages reads the PDF file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image, named image-root-nnn.xxx, where image-root is the output prefix given on the command line, nnn is the image number, and xxx is the image type (.ppm, .pbm, .jpg).
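The naming scheme can be expressed as a one-line shell helper. This is a minimal sketch, assuming the three-digit zero-padded numbering seen in the sample output further down (image-000, image-247, image-1025); the function name image_name is hypothetical and is not part of poppler-utils.

```shell
#!/bin/sh
# Hypothetical helper: build the file name pdfimages uses for one image,
# assuming the ROOT-NNN.EXT pattern with the number zero-padded to 3 digits.
image_name() {
    root=$1   # image-root given on the command line, e.g. /tmp/image
    num=$2    # image number, starting at 0
    ext=$3    # ppm, pbm or jpg
    printf '%s-%03d.%s\n' "$root" "$num" "$ext"
}

image_name /tmp/image 7 ppm      # /tmp/image-007.ppm
image_name /tmp/image 1025 jpg   # /tmp/image-1025.jpg
```

Numbers with more than three digits are simply printed in full, which matches names such as image-1025.ppm.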

pdfimages is part of the poppler-utils package on various Linux distributions. Install it with yum (RHEL/CentOS/Fedora) or apt-get (Debian/Ubuntu):
# yum install poppler-utils
# apt-get install poppler-utils

pdfimages syntax

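pdfimages [options] /path/to/file.pdf /path/to/output/image-root
The second argument is a prefix for the output file names, not a directory name.
To extract every image from bar.pdf and save it as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter: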
$ pdfimages bar.pdf /tmp/image
$ ls /tmp/image*
Sample output:

image-000.ppm   image-1025.ppm  image-1140.ppm  image-1256.ppm  image-247.ppm  image-374.ppm  image-501.ppm  image-628.ppm  image-755.ppm  image-882.ppm
image-001.ppm   image-1026.ppm  image-1141.ppm  image-1257.ppm  image-248.ppm  image-375.ppm  image-502.ppm  image-629.ppm  image-756.ppm  image-883.ppm
image-002.ppm   image-1027.ppm  image-1142.ppm  image-1258.ppm  image-249.ppm  image-376.ppm  image-503.ppm  image-630.ppm  image-757.ppm  image-884.ppm
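Because ls sorts names as text, image-1025.ppm is listed before image-247.ppm above. A minimal sketch for listing the extracted files in numeric order and counting them, assuming GNU sort (its -V "version sort" option compares embedded numbers numerically) and the root /tmp/image from the example; the function names list_extracted and count_extracted are hypothetical. Newer poppler releases also provide a -list option that prints one line per image without extracting anything; check pdfimages -h on your system.

```shell
#!/bin/sh
# Hypothetical helpers: list and count the files pdfimages wrote for a root.
# Assumes GNU sort (for -V) and file names of the form ROOT-NNN.EXT.

list_extracted() {
    root=$1
    for f in "$root"-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -V
}

count_extracted() {
    # tr strips the padding some wc implementations add.
    list_extracted "$1" | wc -l | tr -d ' '
}

# Usage, after: pdfimages bar.pdf /tmp/image
#   list_extracted /tmp/image
#   count_extracted /tmp/image
```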

Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual:
$ pdfimages -j bar.pdf /tmp/image
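With -j, a single run can therefore leave a mix of .jpg and .ppm/.pbm files. A minimal sketch for checking how many files of each type were written, assuming the image-root-nnn.xxx naming described above; summarize_types is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count the files pdfimages wrote for ROOT, per type.
summarize_types() {
    root=$1
    for ext in jpg ppm pbm; do
        n=0
        for f in "$root"-*."$ext"; do
            # An unmatched pattern stays literal, so test that the file exists.
            [ -e "$f" ] && n=$((n + 1))
        done
        printf '%s %d\n' "$ext" "$n"
    done
}

# Usage, after: pdfimages -j bar.pdf /tmp/image
#   summarize_types /tmp/image
```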
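The -f option specifies the first page to scan. To start scanning at page 5 (skipping pages 1-4), enter:
$ pdfimages -j -f 5 bar.pdf /tmp/image
The -l option specifies the last page to scan. To scan only the first 5 pages, enter:
$ pdfimages -j -l 5 bar.pdf /tmp/image
Combine both options to scan a range of pages, for example pages 5 through 10:
$ pdfimages -j -f 5 -l 10 bar.pdf /tmp/image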
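Setting -f and -l to the same page number limits the scan to that single page, which makes it possible to keep each page's images in its own directory. A minimal sketch, assuming pdfinfo (also part of poppler-utils) reports the page count on a line starting with "Pages:"; the function name extract_per_page, the OUTDIR/page-N layout, and the PDFIMAGES override variable are hypothetical. Setting PDFIMAGES=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: extract images page by page into OUTDIR/page-N/.
# Usage: extract_per_page FILE.pdf OUTDIR [PAGES]
# If PAGES is omitted, it is read from pdfinfo's "Pages:" line.
# Set PDFIMAGES=echo to preview the pdfimages commands without running them.
extract_per_page() {
    pdf=$1
    outdir=$2
    pages=${3:-$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')}
    if [ -z "$pages" ]; then
        echo "extract_per_page: could not determine page count" >&2
        return 1
    fi
    p=1
    while [ "$p" -le "$pages" ]; do
        mkdir -p "$outdir/page-$p"
        # Same first and last page: scan only page $p.
        ${PDFIMAGES:-pdfimages} -j -f "$p" -l "$p" "$pdf" "$outdir/page-$p/image"
        p=$((p + 1))
    done
}

# Example: extract_per_page bar.pdf /tmp/pages
```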


{ 6 comments }

  • isiomer June 24, 2009, 10:40 am

    Good introduction,
    but it seems to exaggerate the functionality of the pdfimages tool. I read somewhere that it extracts only raw data and can do nothing with vector graphics. Please mention that too.
    If possible, please explain how pdfimages works. Does it read the bounding box of the images, or what? Is it doing something to make use of the gigantic DOM structure of a PDF document, or some crafty image processing techniques?
    Nevertheless, thanks for this great introduction to a great tool.

  • jabber November 9, 2009, 1:22 pm

    I have a site that allows users to upload PDF files, like Scribd. How can I show a thumbnail image of an uploaded PDF or SWF? If you have an idea about it, please mail me.

  • oash May 14, 2010, 11:43 pm

    Thanks for sharing. I agree with isiomer. pdfimages extracts only raw data and can do nothing with vector graphics. Also, if the bounding box around an image is lost during conversion to PDF, it will fail as well. Sometimes it also generates a large number of empty images. After a long time querying search engines, I found a Google Chrome extension called sci2ools that achieves nice results.

  • yosbudi September 30, 2011, 9:16 pm

    Hi, thanks. Useful tips.

  • pankaj April 27, 2012, 6:56 am

    Sir, how do I print a .pbm file on a Linux server? Please give me a full description.

    Thanks in advance!

  • Prabhat November 14, 2014, 7:06 am

    Is there an option that tells the number of images generated by pdfimages?

