Using “awk” and “pdftotext”

So, someone might ask, “How do you extract 10 characters from a particular line of a PDF file?” This is a common task when a database generates its reports as PDFs.


First, convert the PDF to a text file (temp.txt) with “pdftotext”. Plain text gives you a consistent format that is easy to manipulate. The syntax looks like this:

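command> pdftotext filename.pdf temp.txt

If the columns in temp.txt do not line up the way they do in the PDF, try adding the -layout option (pdftotext -layout filename.pdf temp.txt), which keeps the text as close as possible to its original physical layout. This matters because the next step picks characters by position.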
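To choose the start position and length, it helps to see exactly which column each character sits in. The sketch below prints one line of a text file under a two-row column ruler (tens digits, then units digits). The helper name show_columns and the file sample.txt with its contents are made up for illustration; point it at temp.txt for real use.

```shell
#!/bin/sh
# show_columns is a hypothetical helper, not a standard command.
# Usage: show_columns FILE LINE
show_columns() {
    file=$1 n=$2
    # Print a ruler for the first 80 columns: a tens digit above every
    # tenth column, then the units digit of every column.
    awk 'BEGIN {
        for (i = 1; i <= 80; i++) printf "%s", (i % 10 == 0 ? (i / 10) % 10 : " ")
        print ""
        for (i = 1; i <= 80; i++) printf "%d", i % 10
        print ""
    }'
    # Print the requested line of FILE, then stop reading.
    awk -v n="$n" 'NR == n { print; exit }' "$file"
}

# Made-up sample data standing in for temp.txt.
printf 'Report header\nID 0001  Widget  Qty 12\n' > sample.txt
show_columns sample.txt 2
```

Reading down from the ruler shows, for example, that “Widget” occupies columns 10 through 15.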

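Second, use “awk” to grab characters 10-15 (or whichever range you need) from each line, then pipe the result through a second “awk” command that keeps only the line you want. Note that awk's substr(string, start, length) takes a starting position and a length, not an end position, so characters 10-15 are substr($0,10,6).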
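If you run this kind of extraction often, the line number, start position and length can be passed in as arguments with awk's -v option. This is a minimal sketch: extract_field is a hypothetical name, and sample.txt holds invented data rather than real report output.

```shell
#!/bin/sh
# Hypothetical helper: print LEN characters starting at column START
# of line LINE in FILE (e.g. the temp.txt produced by pdftotext).
# Usage: extract_field FILE LINE START LEN
extract_field() {
    file=$1 line=$2 start=$3 len=$4
    # NR == n picks the line; substr() cuts l characters from column s;
    # exit stops awk from reading the rest of the file.
    awk -v n="$line" -v s="$start" -v l="$len" \
        'NR == n { print substr($0, s, l); exit }' "$file"
}

# Made-up sample data standing in for temp.txt.
printf 'Report header\nID 0001  Widget  Qty 12\n' > sample.txt
extract_field sample.txt 2 10 6   # characters 10-15 of line 2
```

Here the call prints “Widget”, the six characters in columns 10 through 15 of the second line.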

Here is an example that takes 10 characters from the 2nd line of the file, starting at the 44th character, and writes them to a file called “OUTPUT”:

awk '{print substr($0,44,10)}' temp.txt | awk 'NR==2' > OUTPUT
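A single awk call can also do both steps: NR==2 selects the line, substr() cuts the characters, and exit stops reading once that line has been printed, which can save time on long reports. The sketch below runs both versions against a made-up sample.txt (43 spaces followed by the value, so the value starts at column 44) and checks that they agree. The file names sample.txt and OUTPUT2 are illustrative only.

```shell
#!/bin/sh
# Made-up sample standing in for temp.txt: line 2 has 43 spaces,
# so the 10-character value INV-000042 starts at column 44.
pad=$(printf '%43s' '')
printf 'first line\n%sINV-000042 trailing text\n' "$pad" > sample.txt

# Original approach: substr() on every line, then keep line 2.
awk '{print substr($0,44,10)}' sample.txt | awk 'NR==2' > OUTPUT

# Single pass: select line 2, cut the characters, stop reading.
awk 'NR==2 {print substr($0,44,10); exit}' sample.txt > OUTPUT2

# Both files should hold the same value.
cmp OUTPUT OUTPUT2 && cat OUTPUT2
```

With real data, replace sample.txt with temp.txt and drop the comparison line.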