pdftotext: Linux / UNIX Convert a PDF File To Text Format

Question: I’ve downloaded a configuration file in PDF format. I do not have a GUI installed on my remote Linux / UNIX server. How do I convert a PDF (Portable Document Format) file to text format from the command line so that I can view the file over a remote ssh session?

Answer: Use the pdftotext utility to convert Portable Document Format (PDF) files to plain text. It reads the PDF file and writes a text file. If no text file is specified, pdftotext converts file.pdf to file.txt. If the text file is given as -, the text is sent to stdout.
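As a minimal sketch of the stdout form, the following shell function pages through a PDF without creating a text file. The function name view_pdf is made up for this example, and it assumes pdftotext is installed and that less (or whatever $PAGER names) is available:

```shell
# view_pdf FILE: read the text of a PDF in a pager (hypothetical helper name)
view_pdf() {
    # "-" as the text-file argument sends the extracted text to stdout
    pdftotext "$1" - | ${PAGER:-less}
}
```

For example, view_pdf file.pdf shows the text in less, which works fine over an ssh session.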

Install pdftotext under RedHat / RHEL / Fedora / CentOS Linux

pdftotext is provided by the poppler-utils package under various Linux distributions:
# yum install poppler-utils
Or use the following under Debian / Ubuntu Linux:
$ sudo apt-get install poppler-utils
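After installing, you may want to confirm that the command is actually on your PATH. The following is a small sketch, not part of the package itself; the function name check_pdftotext is an assumption made for this example:

```shell
# check_pdftotext: report whether the pdftotext command can be found
check_pdftotext() {
    if command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext found"
    else
        # Nothing named pdftotext is available in this shell
        echo "pdftotext not found; install the poppler-utils package" >&2
        return 1
    fi
}
```

Run check_pdftotext after the install step; a non-zero exit status means the package still needs to be installed.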

pdftotext syntax

pdftotext [options] {PDF-file} [text-file]

How do I convert a pdf to text?

To convert a PDF file called hp-manual.pdf to hp-manual.txt, enter:
$ pdftotext hp-manual.pdf hp-manual.txt
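If you have several PDF files, the same command can be run in a loop. This is only a sketch: the function name convert_all_pdfs is made up for this example, and it relies on pdftotext writing NAME.txt next to NAME.pdf when no text file is given:

```shell
# convert_all_pdfs DIR: run pdftotext on every *.pdf in DIR (default: current dir)
convert_all_pdfs() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDF files
        [ -e "$pdf" ] || continue
        # No text-file argument, so the output goes to the matching .txt file
        pdftotext "$pdf" || echo "failed: $pdf" >&2
    done
}
```

For example, convert_all_pdfs ~/manuals converts every PDF in that directory and reports any file that pdftotext could not process.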
To convert only pages 5 through 10, set the first page with -f and the last page with -l:
$ pdftotext -f 5 -l 10 hp-manual.pdf hp-manual.txt
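The page range can also be combined with - as the text file to preview a few pages on the terminal. The following is a rough sketch; pdf_pages is a hypothetical helper name, and the range check is an addition of this example rather than something pdftotext requires:

```shell
# pdf_pages FIRST LAST FILE: print only pages FIRST..LAST of FILE to stdout
pdf_pages() {
    first=$1
    last=$2
    file=$3
    if [ "$first" -gt "$last" ]; then
        echo "first page ($first) is after last page ($last)" >&2
        return 1
    fi
    # -f and -l select the page range; "-" sends the text to stdout
    pdftotext -f "$first" -l "$last" "$file" -
}
```

For example, pdf_pages 5 10 hp-manual.pdf | less pages through the selected pages without writing hp-manual.txt.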
To convert a PDF file that is protected and encrypted with an owner password, enter:
$ pdftotext -opw 'password' hp-manual.pdf hp-manual.txt
To convert a PDF file that is protected and encrypted with a user password, enter:
$ pdftotext -upw 'password' hp-manual.pdf hp-manual.txt
The -eol option sets the end-of-line convention for the text output. You can set it to unix, dos or mac. For UNIX / Linux systems, enter:
$ pdftotext -eol unix hp-manual.pdf hp-manual.txt

Further readings:

  • man page pdftotext
9 comments
  • virus Nov 14, 2008 @ 14:02

    this is the simplest way, being simple means not perfect. try:

    $ less file.pdf

  • 🐧 nixCraft Nov 14, 2008 @ 14:40

    @virus,

    You need the help of lesspipe, and then it may work out ;)
    eval "$(lesspipe)"
    less file.pdf

  • BKB Dec 9, 2008 @ 1:05

    I’m very glad to have found this tip since I’ve been looking for a way to index the contents of PDF files. “pdftotext” works on text in foreign languages and character sets, too, and outputs the text as UTF-8, which is excellent.

  • Ritika Garg Apr 9, 2009 @ 8:43

    How to convert filename.f to filename.pdf in linux?

  • Patricia Aug 3, 2009 @ 8:15

    how to convert filename.txt to filename.pdf in ubuntu?
    thank you

  • virus Aug 5, 2009 @ 7:26

    Patricia
    print the file as pdf format

  • Patricia Aug 8, 2009 @ 6:56

    i mean, is there any command to convert text file to pdf file?
    thank you

  • panchicore Oct 12, 2010 @ 16:00

    with mandriva you can install it with:

    [panchicore@localhost ~]$ su
    [root@localhost ~]$ urpmi poppler

    now you have available: pdftex pdftoabw pdftohtml pdftoppm pdftops pdftosrc pdftotext

  • hamid Dec 18, 2014 @ 6:32

    perfect!
    this is amazing thanks a lot
