pdfimages: Extract and Save Images From a Portable Document Format (PDF) File

Q. How do I extract images from a PDF file from a Linux / UNIX shell account?

A. pdfimages is a Portable Document Format (PDF) image extractor for Linux / UNIX operating systems. It saves the images embedded in a PDF file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files. It extracts embedded raster images only; vector graphics drawn on the page are not extracted. pdfimages reads the PDF file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image, named image-root-nnn.xxx, where image-root is the output prefix you supply, nnn is the image number, and xxx is the image type (.ppm, .pbm, .jpg).

pdfimages is part of the poppler-utils package on various Linux distributions. Install it as root:
# yum install poppler-utils
OR
# apt-get install poppler-utils
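
If you are not sure whether the tool is already installed, a quick check such as the following may help. This is only a sketch: have_cmd is a helper name made up for this example, and the install hints simply repeat the two commands above.

```shell
#!/bin/sh
# have_cmd NAME (hypothetical helper)
# Return success when NAME is found in PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdfimages; then
    echo "pdfimages found: $(command -v pdfimages)"
elif have_cmd apt-get; then
    echo "Install with: apt-get install poppler-utils"
elif have_cmd yum; then
    echo "Install with: yum install poppler-utils"
else
    echo "pdfimages not found; look for a poppler-utils package for your system"
fi
```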

pdfimages syntax

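pdfimages /path/to/file.pdf /path/to/output/image-root
The second argument is a file name prefix (image-root), not a directory. To extract every image from the PDF file called bar.pdf and save them as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter: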
$ pdfimages bar.pdf /tmp/image
$ ls /tmp/image*
Sample output:

image-000.ppm   image-1025.ppm  image-1140.ppm  image-1256.ppm  image-247.ppm  image-374.ppm  image-501.ppm  image-628.ppm  image-755.ppm  image-882.ppm
image-001.ppm   image-1026.ppm  image-1141.ppm  image-1257.ppm  image-248.ppm  image-375.ppm  image-502.ppm  image-629.ppm  image-756.ppm  image-883.ppm
image-002.ppm   image-1027.ppm  image-1142.ppm  image-1258.ppm  image-249.ppm  image-376.ppm  image-503.ppm  image-630.ppm  image-757.ppm  image-884.ppm
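
A large PDF can produce hundreds of files, so you may want a count rather than a full listing. The following is a minimal sketch that counts files matching the image-root prefix; count_images is a name invented for this example, and bar.pdf and /tmp/image are the sample values used above.

```shell
#!/bin/sh
# count_images PREFIX (hypothetical helper)
# Print how many existing files match PREFIX-*, the naming pattern
# pdfimages uses for its output.
count_images() {
    n=0
    for f in "$1"-*; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Only run the extraction when the tool and the sample file are present.
if command -v pdfimages >/dev/null 2>&1 && [ -f bar.pdf ]; then
    pdfimages bar.pdf /tmp/image
    echo "Extracted $(count_images /tmp/image) images"
fi
```

Note that the count includes any older files that already match the prefix, so use a fresh prefix for each run. Recent poppler builds of pdfimages also appear to offer a -list option that lists images without writing them; check pdfimages -h on your system before relying on it.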

Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual:
$ pdfimages -j bar.pdf /tmp/image
The -f option specifies the first page to scan. To start scanning at page 5 and continue to the end of the file, enter:
$ pdfimages -j -f 5 bar.pdf /tmp/image
The -l option specifies the last page to scan. To scan only the first 5 pages, enter:
$ pdfimages -j -l 5 bar.pdf /tmp/image
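
Both -f and -l take absolute page numbers, so scanning the last 5 pages requires knowing the page count first. A rough sketch follows, assuming pdfinfo (also shipped in poppler-utils) is available and prints a "Pages:" line; first_page_of_tail is a helper name made up for this example.

```shell
#!/bin/sh
# first_page_of_tail TOTAL COUNT (hypothetical helper)
# Print the first page number of the last COUNT pages of a TOTAL-page file.
first_page_of_tail() {
    first=$(($1 - $2 + 1))
    # Clamp to page 1 when the file has fewer than COUNT pages.
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    echo "$first"
}

pdf=bar.pdf   # sample input file from the examples above
if command -v pdfinfo >/dev/null 2>&1 && [ -f "$pdf" ]; then
    # Assumption: pdfinfo reports the page count as "Pages: N".
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    pdfimages -j -f "$(first_page_of_tail "$pages" 5)" -l "$pages" "$pdf" /tmp/image
fi
```

For a 20-page file this runs pdfimages with -f 16 -l 20.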

7 comments
  • isiomer Jun 24, 2009 @ 10:40

    Good introduction,
    but it seems to exaggerate the functionality of the pdfimages tool. I read somewhere that it extracts only raw data and can do nothing with vector graphics. Please mention that too.
    If possible, please explain how pdfimages works. Does it read the bounding box of the images, or what? Does it make use of the gigantic DOM structure of a PDF document, or some crafty image processing techniques?
    Nevertheless, thanks for this great introduction to a great tool.

  • jabber Nov 9, 2009 @ 13:22

    I have a site that lets users upload PDF files, like Scribd. How can I show a thumbnail image of an uploaded PDF or SWF? If you have any idea about it, please mail me.

  • oash May 14, 2010 @ 23:43

    Hi,
    Thanks for sharing. I agree with isiomer. pdfimages extracts only raw data and can do nothing with vector graphics. Also, if the bounding box around an image is lost during conversion to PDF, it will fail as well. Sometimes it also generates a large number of empty images. After a long time querying search engines, I found a Google Chrome extension called sci2ools that achieves nice results.

  • yosbudi Sep 30, 2011 @ 21:16

    Hi, thanks. useful tips

  • pankaj Apr 27, 2012 @ 6:56

    Sir, how do I print a .pbm file on a Linux server? Please give me a full description.

    Thanks in advance!

  • Prabhat Nov 14, 2014 @ 7:06

    Is there any option that reports the number of images generated by pdfimages?

    Thanks

  • Phil Mar 26, 2016 @ 11:40

    You mention that it extracts several images as blanks. Is there a solution to this? I find that the -j functionality doesn’t work. Instead of extracting JPEG images, it writes several blank PPM images of differing sizes, which renders the entire exercise useless…
