How to extract and/or remove the last page of a bunch of PDFs?

One of our vendors started tacking on an unnecessarily huge image to the last page of PDFs we get from them. I need to trim this out. However, we have hundreds of these, so it's prohibitive to go in manually. What're the best ways to extract and then delete (Preferably first one, then the other; I still need to confirm via filesize that I'm not deleting one which doesn't have the image) the last page of a PDF automatically? OS is Linux.

I can extract a page using Ghostscript, with something along the lines of gs -dFirstPage=5 -dLastPage=5, but I need to automate this; I can't go through every file and look up the number of its last page by hand.

Any ideas?

Edit: To clarify, I simply want to split out/delete the last page. Not the image in it, excise the last page period.

5 Answers

As @Daniel Andersson already commented, this can easily be done with pdftk:

pdftk input.pdf cat end-1 output temp.pdf
pdftk temp.pdf cat end-2 output output.pdf
rm temp.pdf

I don't know if it can be done with one call to pdftk though...

Edit: you could combine it with thanosk's answer and use (in bash):

pdftk input.pdf cat 1-$((last-1)) output output.pdf

when you have already stored the number of the last page (that is, the page count) in the variable $last.
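A minimal sketch of how $last might be filled in and turned into that page range, assuming pdfinfo (from poppler-utils) is installed next to pdftk. The function names page_count, drop_last_range and drop_last_page are made up for illustration and are not part of either tool.

```shell
#!/bin/bash
# Hypothetical helpers; only pdfinfo and pdftk are real commands here.

# Print the number of pages, taken from the "Pages:" line of pdfinfo.
page_count() {
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Print the pdftk page range that keeps everything except the last page.
# Fails if the argument is not a number or is smaller than 2, because
# dropping the only page would leave nothing to write.
drop_last_range() {
    local last=$1
    case $last in
        ''|*[!0-9]*) echo "not a page count: '$last'" >&2; return 1 ;;
    esac
    if [ "$last" -lt 2 ]; then
        echo "need at least two pages, got $last" >&2
        return 1
    fi
    echo "1-$((last - 1))"
}

# Write a copy of $1 without its last page to $2.
drop_last_page() {
    local range
    range=$(drop_last_range "$(page_count "$1")") || return 1
    pdftk "$1" cat "$range" output "$2"
}
```

With these definitions loaded, drop_last_page input.pdf output.pdf would run the pdftk command above with the range worked out from the page count.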

To further improve on @eldering's answer: pdftk 1.45 and later can reference pages in reverse order by prepending the lower-case letter r to the page number. The final page in a PDF is r1, the next-to-last page is r2, etc.

For example, the single pdftk call:

pdftk input.pdf cat 1-r2 output output.pdf

will drop the final page from input.pdf -- the input should be at least two pages long.

To extract just the final page of a PDF in order to test its filesize, run:

pdftk input.pdf cat r1 output final_page.pdf

Pdftk is available on Linux. Many distros have a binary you can install. You should make sure it is version 1.45 or later, though. If not, you can build pdftk from source code.
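Since the question asks to check the file size of the last page before deleting it, the two commands above could be combined along these lines. This is only a sketch: the function names, the trimmed directory and the 500000-byte threshold in the example are assumptions, and the right threshold depends on how large the vendor's image actually is.

```shell
#!/bin/bash
# Sketch only: function names, directory and threshold are illustrative.

# Print the size of a file in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Write $1 without its last page to $2, but only if the last page,
# extracted on its own into a temporary PDF, is larger than $3 bytes.
trim_if_large() {
    local src=$1 dst=$2 threshold=$3 tmpdir size
    tmpdir=$(mktemp -d) || return 1
    if ! pdftk "$src" cat r1 output "$tmpdir/last.pdf"; then
        rm -rf "$tmpdir"
        return 1
    fi
    size=$(file_size "$tmpdir/last.pdf")
    rm -rf "$tmpdir"
    if [ "$size" -gt "$threshold" ]; then
        # 1-r2 needs at least two pages; pdftk reports an error otherwise.
        pdftk "$src" cat 1-r2 output "$dst"
    else
        echo "skipped $src: last page is only $size bytes" >&2
    fi
}

# Example use, after "mkdir -p trimmed":
#   for f in *.pdf; do trim_if_large "$f" "trimmed/$f" 500000; done
```

Writing to a separate directory keeps the originals untouched, so the results can be inspected before anything is replaced.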

pdfinfo will give you the number of pages in the PDF file, and pdfimages will give you a list of the images in it. So you can write a script of the form

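#!/bin/bash
for i in *.pdf
do
    # the page count is also the number of the last page
    j=$(pdfinfo "$i" | awk '/^Pages/ { print $2 }')
    # list only the images found on the last page
    pdfimages -list -f "$j" -l "$j" "$i"
done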

which should report whether a given file has an image on its last page. If it does, you can then do whatever manipulation you need.
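To act on that listing in a script, the image rows can be counted. The sketch below assumes a poppler version whose pdfimages supports -list and prints a two-line header (column names, then a row of dashes) before the image rows; last_page_image_count is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper built on pdfinfo and pdfimages from poppler-utils.

# Print how many images pdfimages lists on the last page of $1.
# The first two lines of "pdfimages -list" output are assumed to be a
# column header and a row of dashes, so only later lines are counted.
last_page_image_count() {
    local pages
    pages=$(pdfinfo "$1" | awk '/^Pages:/ { print $2 }')
    [ -n "$pages" ] || return 1
    pdfimages -list -f "$pages" -l "$pages" "$1" | tail -n +3 | wc -l | tr -d ' '
}

# Example: print the names of PDFs whose last page has at least one image.
#   for f in *.pdf; do
#       [ "$(last_page_image_count "$f")" -gt 0 ] && echo "$f"
#   done
```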

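A one-liner solution would be to use find along with pdftk:

mkdir -p cut && find . -maxdepth 1 -name "*.pdf" -exec pdftk {} cat 1-r2 output cut/{} \;

NOTE: in this example the trimmed files are written to a subdirectory called cut, keeping the original filenames, because pdftk does not allow overwriting its input file. -maxdepth 1 keeps find from descending into cut (or any other subdirectory) and processing files there.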

Here's a solution using pdfjam instead of pdftk:

#!/bin/sh
fname=$(basename "$1")
# total number of pages, as reported by pdfinfo
pages=$(pdfinfo "$1" | awk '/^Pages/ { print $2 }')
pdfjam "$1" "1-$((pages - ${2:-1}))" -o "${fname%.*}-trimmed.pdf"

The first argument is the file to trim and the second is the number of pages to trim from the end (defaults to 1).
