One of our vendors started tacking on an unnecessarily huge image to the last page of the PDFs we get from them. I need to trim this out. However, we have hundreds of these, so doing it manually is prohibitive. What're the best ways to automatically extract and then delete the last page of a PDF? (Preferably the first, then the other: I still need to confirm via file size that I'm not deleting a page that doesn't have the image.) OS is Linux.
I can extract it using Ghostscript, with something along the lines of gs -dFirstPage=5 -dLastPage=5, but I need to automate this: I can't go through every file and manually find out the number of its last page.
Any ideas?
Edit: To clarify, I simply want to split out/delete the last page itself, not just the image in it. Excise the last page, period.
As @Daniel Andersson already commented, this can easily be done with pdftk:
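pdftk input.pdf cat end-1 output temp.pdf
pdftk temp.pdf cat end-2 output output.pdf
rm temp.pdf
The first call writes the pages in reverse order (end-1 means "from the last page down to page 1"). The second call drops page 1 of that reversed copy, which is the original last page, and reverses the remaining pages back into their original order. I don't know if it can be done with one call to pdftk, though...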
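Wrapped in a function and a loop, the two-call trick can be run over a whole directory. This is a minimal sketch, assuming pdftk is on the PATH; the function name drop_last_page_reverse and the trimmed/ output directory are hypothetical choices, and the intermediate reversed file lives in a throwaway directory created with mktemp -d.

```bash
#!/bin/bash
# Sketch: drop the last page of every PDF in the current directory using the
# "reverse, drop page 1, reverse back" trick. Assumes pdftk is on PATH;
# drop_last_page_reverse and trimmed/ are hypothetical names.
set -euo pipefail

drop_last_page_reverse() {
    local in=$1 out=$2 tmpdir
    tmpdir=$(mktemp -d)
    # end-1 selects the pages from the last one down to page 1,
    # i.e. a reversed copy of the document
    pdftk "$in" cat end-1 output "$tmpdir/reversed.pdf"
    # In the reversed copy, page 1 is the original last page; end-2 skips it
    # and reverses the order again, restoring the original page order
    pdftk "$tmpdir/reversed.pdf" cat end-2 output "$out"
    rm -rf "$tmpdir"
}

mkdir -p trimmed
for f in *.pdf; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    drop_last_page_reverse "$f" "trimmed/$f"
done
```

The originals are left untouched, so the files in trimmed/ can be compared against them before anything is replaced.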
Edit: you could combine it with thanosk's answer and use (in bash):
pdftk input.pdf cat 1-$((last-1)) output output.pdf
when $last already holds the number of the last page (for example, taken from the Pages: line of pdfinfo).
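Putting the pieces together, the page count from pdfinfo can feed both the Ghostscript command from the question (to save the last page for inspection) and the pdftk range above (to write a copy without it). A rough sketch, assuming poppler-utils' pdfinfo, Ghostscript and pdftk are all installed; the function names page_count and split_last_page and the -lastpage.pdf / -trimmed.pdf suffixes are hypothetical.

```bash
#!/bin/bash
# Sketch: save the last page with gs, then write a copy without it with pdftk.
# page_count, split_last_page and the output suffixes are hypothetical names.
set -euo pipefail

page_count() {
    # pdfinfo prints a line such as "Pages:          5"
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

split_last_page() {
    local in=$1 last
    last=$(page_count "$in")
    if [ -z "$last" ] || [ "$last" -lt 2 ]; then
        echo "skipping $in: page count is '${last}'" >&2
        return 0
    fi
    # The gs command from the question, with the page number filled in
    # (-o implies -dBATCH -dNOPAUSE)
    gs -q -sDEVICE=pdfwrite -dFirstPage="$last" -dLastPage="$last" \
       -o "${in%.pdf}-lastpage.pdf" "$in"
    # Everything except the last page
    pdftk "$in" cat 1-$((last - 1)) output "${in%.pdf}-trimmed.pdf"
}
```

Call it as split_last_page report.pdf, or from a for f in *.pdf loop; note that running such a loop a second time would also pick up the generated files.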
To further improve on @eldering's answer, pdftk version 1.45 and later have the means to reference pages in reverse order by prepending the lower-case letter r to the page number. The final page in a PDF is r1, the next-to-last page is r2, etc.
For example, the single pdftk call:
pdftk input.pdf cat 1-r2 output output.pdf
will drop the final page from input.pdf; the input should be at least two pages long.
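To run the 1-r2 form over a directory while respecting that two-page minimum, the page count can be read from pdftk itself: its dump_data operation prints a NumberOfPages: line. A minimal sketch; trim_last_page and the cut/ directory are hypothetical names.

```bash
#!/bin/bash
# Sketch: drop the final page of each PDF in the current directory with
# pdftk's reverse page reference (pdftk 1.45+). Single-page files are skipped
# because "1-r2" needs at least two pages. trim_last_page and cut/ are
# hypothetical names.
set -euo pipefail

trim_last_page() {
    local in=$1 out=$2 pages
    # dump_data reports, among other things, e.g. "NumberOfPages: 5"
    pages=$(pdftk "$in" dump_data | awk '/^NumberOfPages:/ { print $2 }')
    if [ -z "$pages" ] || [ "$pages" -lt 2 ]; then
        echo "skipping $in: page count is '${pages}'" >&2
        return 0
    fi
    pdftk "$in" cat 1-r2 output "$out"
}

mkdir -p cut
for f in *.pdf; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    trim_last_page "$f" "cut/$f"
done
```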
To extract just the final page of a PDF in order to test its filesize, run:
pdftk input.pdf cat r1 output final_page.pdf
Pdftk is available on Linux, and many distros package a binary you can install. Make sure it is version 1.45 or later, though; if it isn't, you can build pdftk from source.
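For the check described in the question (extract first, confirm by file size, delete afterwards), the r1 extraction can be run over every file and the results listed by size, so the pages carrying the big image stand out at the top. A sketch assuming GNU coreutils' stat -c %s, which is standard on Linux; extract_last_pages and the lastpages/ directory are hypothetical names.

```bash
#!/bin/bash
# Sketch: extract the final page of every PDF into a directory and print
# "bytes<TAB>file", largest first. extract_last_pages and lastpages/ are
# hypothetical names; needs pdftk 1.45+ for "r1".
set -euo pipefail

extract_last_pages() {
    local dir=$1 f
    mkdir -p "$dir"
    for f in *.pdf; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
        pdftk "$f" cat r1 output "$dir/$f"
        printf '%s\t%s\n' "$(stat -c %s "$dir/$f")" "$f"
    done | sort -rn
}

extract_last_pages lastpages
```

Files whose extracted last page is unusually small probably don't have the image, and can be left out of the trimming step.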
pdfinfo will report the number of pages in a PDF (along with its file size), and pdfimages will give you an index of the images in it. So you can write a script along these lines:
#!/bin/bash
for i in *.pdf
do
    # Page count, from the "Pages:" line of pdfinfo
    j=$(pdfinfo "$i" | awk '/^Pages/ { print $2 }')
    echo "== $i (last page: $j)"
    pdfimages -list -f "$j" -l "$j" "$i"
done
That should tell you whether a particular file has an image on its last page. If it does, you can do whatever manipulation you need.
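Building on that script, the listing can drive the decision automatically. This sketch assumes pdfimages -list prints exactly two header lines followed by one row per image, so counting the remaining rows gives the number of images on the page; images_on_last_page is a hypothetical helper, and the loop only reports, leaving the actual trimming to one of the pdftk commands in the other answers.

```bash
#!/bin/bash
# Sketch: report which PDFs have at least one image on their last page.
# Assumes poppler-utils (pdfinfo, pdfimages); images_on_last_page is a
# hypothetical helper name.
set -euo pipefail

images_on_last_page() {
    local last
    last=$(pdfinfo "$1" | awk '/^Pages:/ { print $2 }')
    # Skip the two header lines of "pdfimages -list" and count the rest
    pdfimages -list -f "$last" -l "$last" "$1" | awk 'NR > 2' | wc -l
}

for f in *.pdf; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    n=$(images_on_last_page "$f")
    if [ "$n" -gt 0 ]; then
        echo "$f: $n image(s) on last page"
    else
        echo "$f: no images on last page, leave it alone"
    fi
done
```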
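A one-liner solution would be to use find along with pdftk:
mkdir -p cut && find . -maxdepth 1 -name "*.pdf" -exec pdftk {} cat 1-r2 output cut/{} \;
NOTE: in this example the trimmed files are stored in a subdirectory called cut so that they keep their original filenames, because pdftk does not allow overwriting input files. The -maxdepth 1 stops find from descending into cut (or any other subdirectory) while it runs.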
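If the PDFs are spread over subdirectories, writing to cut/{} needs matching subdirectories under cut, and find can wander into cut itself on a later run. One way around both, assuming bash and GNU find (for -print0), is to prune the output directory and create each target directory before calling pdftk. A sketch; trim_tree is a hypothetical name.

```bash
#!/bin/bash
# Sketch: drop the last page of every PDF below the current directory,
# mirroring the layout under the output directory (default: cut).
# trim_tree is a hypothetical name; needs pdftk 1.45+ for "r2".
set -euo pipefail

trim_tree() {
    local out=${1:-cut}
    # Prune the output directory so earlier results are never reprocessed
    find . -path "./$out" -prune -o -type f -name '*.pdf' -print0 |
    while IFS= read -r -d '' f; do
        rel=${f#./}
        mkdir -p "$out/$(dirname "$rel")"
        pdftk "$f" cat 1-r2 output "$out/$rel"
    done
}
```

Run it from the top directory as trim_tree cut; like the one-liner, it fails on single-page files, since 1-r2 needs at least two pages.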
Here's a solution using pdfjam instead of pdftk:
#!/bin/sh
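fname=$(basename "$1")
# Total page count, from the "Pages:" line of pdfinfo
pages=$(pdfinfo "$1" | awk '/^Pages:/ { print $2 }')
pdfjam "$1" 1-$(($pages - ${2:-1})) -o "${fname%.*}-trimmed.pdf"
The first argument is the file to trim and the second is the number of pages to drop from the end (defaults to 1). The result is written to the current directory as NAME-trimmed.pdf.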