pdftotext: Linux / UNIX Convert a PDF File To Text Format

by Vivek Gite [Last updated: November 19, 2008]

Question: I've downloaded a configuration file in PDF format. I do not have a GUI installed on my remote Linux / UNIX server. How do I convert a PDF (Portable Document Format) file to text format from the command line so that I can view the file over a remote ssh session?

Answer: Use the pdftotext utility to convert Portable Document Format (PDF) files to plain text. It reads the PDF file and writes a text file. If the text file is not specified, pdftotext converts file.pdf to file.txt. If the text file is given as -, the text is sent to stdout.
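Since - sends the text to stdout, the output can be piped into grep or a pager without creating a file, which is handy in an ssh session. Here is a minimal sketch; the helper name pdf_grep is made up for this example and is not part of poppler-utils:

```shell
# pdf_grep PATTERN PDF-FILE
# Print the lines of a PDF's text that match PATTERN, with line numbers.
# pdf_grep is a hypothetical helper name used only for illustration.
pdf_grep() {
    # "-" as the text-file argument makes pdftotext write to stdout
    pdftotext "$2" - | grep -n -- "$1"
}

# Usage (hp-manual.pdf is the sample file name used in this post):
#   pdf_grep 'timeout' hp-manual.pdf
# To page through the whole document instead:
#   pdftotext hp-manual.pdf - | less
```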

Install pdftotext under RedHat / RHEL / Fedora / CentOS Linux

pdftotext is part of the poppler-utils package on most Linux distributions. Under RedHat / RHEL / Fedora / CentOS, run the following as root:
# yum install poppler-utils
Or use the following under Debian / Ubuntu Linux:
$ sudo apt-get install poppler-utils
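Before installing anything, you may want to check whether pdftotext is already on the server. The following is only a sketch; check_pdftotext is a name invented for this example:

```shell
# check_pdftotext
# Report whether a pdftotext command can be found by the shell.
# check_pdftotext is a hypothetical helper name used only for illustration.
check_pdftotext() {
    if command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext found: $(command -v pdftotext)"
    else
        echo "pdftotext not found; install the poppler-utils package" >&2
        return 1
    fi
}

# Usage:
#   check_pdftotext
```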

pdftotext syntax

pdftotext {PDF-file} {text-file}

How do I convert a pdf to text?

To convert a PDF file called hp-manual.pdf to hp-manual.txt, enter:
$ pdftotext hp-manual.pdf hp-manual.txt
To convert only pages 5 through 10, set the first page with -f and the last page with -l:
$ pdftotext -f 5 -l 10 hp-manual.pdf hp-manual.txt
To convert a PDF file that is protected and encrypted with an owner password, enter:
$ pdftotext -opw 'password' hp-manual.pdf hp-manual.txt
To convert a PDF file that is protected and encrypted with a user password, enter:
$ pdftotext -upw 'password' hp-manual.pdf hp-manual.txt
The -eol option sets the end-of-line convention used for the text output. You can set it to unix, dos or mac. For UNIX / Linux systems, enter:
$ pdftotext -eol unix hp-manual.pdf hp-manual.txt
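The single-file commands above can be wrapped in a loop to convert every PDF in a directory. This is a rough sketch, not a tested production script; convert_all is a hypothetical helper name, and the .txt files are written next to the PDFs:

```shell
# convert_all [DIR]
# Convert every *.pdf in DIR (default: current directory) to a .txt file
# with the same base name. convert_all is a hypothetical helper name.
convert_all() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when no PDF files match
        [ -e "$pdf" ] || continue
        txt="${pdf%.pdf}.txt"
        if pdftotext -eol unix "$pdf" "$txt"; then
            echo "converted: $pdf -> $txt"
        else
            echo "failed: $pdf" >&2
        fi
    done
}

# Usage:
#   convert_all /path/to/manuals
```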

Further readings:

  • man page pdftotext

{ 3 comments }

1 virus 11.14.08 at 2:02 pm

This is the simplest way; being simple means it is not perfect. Try:

$ less file.pdf

2 vivek 11.14.08 at 2:40 pm

@virus,

You need the help of lesspipe, and then it may work out ;)
eval "$(lesspipe)"
less file.pdf

3 BKB 12.09.08 at 1:05 am

I’m very glad to have found this tip since I’ve been looking for a way to index the contents of PDF files. “pdftotext” works on text in foreign languages and character sets, too, and outputs the text as UTF-8, which is excellent.
