
pdfimages: Extract and Save Images From A Portable Document Format ( PDF ) File

Q. How do I extract images from a PDF file under Linux / UNIX shell account?

A. pdfimages is a Portable Document Format (PDF) image extractor for Linux / UNIX operating systems. It saves the images embedded in a PDF file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files. pdfimages reads the PDF file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image. Each file is named image-root-nnn.xxx, where image-root is the output prefix given on the command line, nnn is the image number, and xxx is the image type (.ppm, .pbm, .jpg).
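
The naming scheme can be illustrated with a short shell sketch. The function name image_name is made up for this example, and the three-digit zero padding is an assumption based on the sample listing in this article, not on the tool's source code:

```shell
#!/bin/sh
# image_name is a hypothetical helper that mimics the image-root-nnn.xxx
# naming pattern: root, a dash, the image number padded to at least three
# digits, a dot, and the image type.
image_name() {
    printf '%s-%03d.%s\n' "$1" "$2" "$3"
}

image_name /tmp/image 0 ppm      # prints /tmp/image-000.ppm
image_name /tmp/image 1025 jpg   # prints /tmp/image-1025.jpg
```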

pdfimages is part of the poppler-utils package on most Linux distributions. Install it with yum (RHEL / CentOS / Fedora) or apt-get (Debian / Ubuntu):
# yum install poppler-utils
# apt-get install poppler-utils
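
To confirm that the tool ended up on the PATH after installation, something like the following sketch can be used. The helper name have_tool is made up for this example:

```shell
#!/bin/sh
# have_tool is a hypothetical helper name, not part of poppler-utils.
# It returns success (0) when the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool pdfimages; then
    echo "pdfimages is installed"
else
    echo "pdfimages not found; install the poppler-utils package"
fi
```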

pdfimages syntax

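pdfimages [options] /path/to/file.pdf /path/to/image-root
The second argument is a file name prefix (the image root), not a directory. To extract every image from bar.pdf and save them as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter: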
$ pdfimages bar.pdf /tmp/image
$ ls /tmp/image*
Sample output:

image-000.ppm   image-1025.ppm  image-1140.ppm  image-1256.ppm  image-247.ppm  image-374.ppm  image-501.ppm  image-628.ppm  image-755.ppm  image-882.ppm
image-001.ppm   image-1026.ppm  image-1141.ppm  image-1257.ppm  image-248.ppm  image-375.ppm  image-502.ppm  image-629.ppm  image-756.ppm  image-883.ppm
image-002.ppm   image-1027.ppm  image-1142.ppm  image-1258.ppm  image-249.ppm  image-376.ppm  image-503.ppm  image-630.ppm  image-757.ppm  image-884.ppm
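
With hundreds of files, a count is often more useful than a listing. The following is a minimal sketch that counts the files written under a given image root; the function name count_images is made up for this example:

```shell
#!/bin/sh
# count_images is a hypothetical helper: it counts the files whose names
# start with the given image root followed by a dash, e.g. /tmp/image-000.ppm.
count_images() {
    root=$1
    n=0
    for f in "$root"-*; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example: report how many images were written under the /tmp/image root.
echo "Extracted images: $(count_images /tmp/image)"
```

Recent poppler-utils releases also accept a -list option that prints a table of the images instead of extracting them; run pdfimages -h to check whether your version supports it.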

Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual:
$ pdfimages -j bar.pdf /tmp/image
The -f option specifies the first page to scan. To start scanning at page 5 and continue to the end of the document, enter:
$ pdfimages -j -f 5 bar.pdf /tmp/image
The -l option specifies the last page to scan. To scan only the first 5 pages (pages 1 through 5), enter:
$ pdfimages -j -l 5 bar.pdf /tmp/image
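
The -f and -l options take absolute page numbers, so scanning the final pages of a document requires knowing the total page count. The following is a minimal sketch, assuming pdfinfo (also shipped in poppler-utils) is available; the function name first_page_for_last_n is made up for this example:

```shell
#!/bin/sh
# first_page_for_last_n is a hypothetical helper: given the total page count
# and N, it prints the page number where the last N pages begin (never below 1).
first_page_for_last_n() {
    total=$1
    n=$2
    first=$((total - n + 1))
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    echo "$first"
}

pdf=bar.pdf
if command -v pdfinfo >/dev/null 2>&1 && [ -f "$pdf" ]; then
    # pdfinfo prints a line such as "Pages:          20"; take the number.
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    # Scan only the last 5 pages of the document.
    pdfimages -j -f "$(first_page_for_last_n "$pages" 5)" "$pdf" /tmp/image
fi
```

Both options can also be combined to pick a range in the middle of a document, for example -f 3 -l 7 for pages 3 through 7.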


{ 7 comments }
  • isiomer June 24, 2009, 10:40 am

    Good introduction,
    but it seems to exaggerate the functionality of the pdfimages tool. I read somewhere that it extracts only raw data and can do nothing with vector graphics. Please mention that too.
    If possible, please explain how pdfimages works. Does it read the bounding box of the images, does it make use of the gigantic DOM structure of a PDF document, or does it use some crafty image processing techniques?
    Nevertheless, thanks for this great introduction to a great tool.

  • jabber November 9, 2009, 1:22 pm

    I have a site that allows users to upload PDF files, like Scribd. How can I show a thumbnail image of an uploaded PDF or SWF? If you have an idea about it, please mail me.

  • oash May 14, 2010, 11:43 pm

    Thanks for sharing. I agree with isiomer. pdfimages extracts only raw data and can do nothing with vector graphics. Also, if the bounding box around an image is lost due to conversion to PDF, it will fail as well. Also, sometimes it generates a large number of empty images. After a long time querying search engines, I found a Google Chrome extension called sci2ools that achieves nice results.

  • yosbudi September 30, 2011, 9:16 pm

    Hi, thanks. useful tips

  • pankaj April 27, 2012, 6:56 am

    Sir, how do I print a .pbm file on a Linux server? Please give me a full description.

    Thanks in advance!

  • Prabhat November 14, 2014, 7:06 am

    Is there any option that reports the number of images generated by pdfimages?


  • Phil March 26, 2016, 11:40 am

    You mention that several images come out as blanks. Is there a solution to this? I find that the -j functionality doesn't work. Instead of extracting JPEG images, it produces several blank PPM images of differing sizes, which renders the entire exercise useless…
