Removing password protection from PDF files

Important note: this post won’t help you if you have a PDF file and don’t know the password. This is for removing passwords on PDFs that you have legal access to, but don’t want to be password-protected any more.

A while ago, one of my employers started emailing payslips in PDF format. Now, I know there are many issues around accessibility with PDFs, but it works for me – I get a digital version of a document that looks exactly as the printed one would have. Except that someone decided email (even to a company-secured account) was not secure enough, and they password-protected the files. In theory, this stops another employee from opening my payslip. In practice, the password they used was a known piece of personally identifiable information (PII).

Anyway, I wanted to keep a copy of the files on my own file storage. I can do this because, technically, they are not company data and they are (or at least should be) private to me. Indeed, the company in question has since moved to a system that emails a link to a personal email account, inviting the employee to download their payslip from a portal.

I didn’t want the copies of the payslips that I held to be password protected. That meant I needed to remove those passwords.

QPDF

QPDF is a computer program, and associated library, for structural, content-preserving transformations on PDF files. It’s not for creating, viewing or converting PDF files.

One of the things it can do is remove the password protection on a file. Remember, this is a file that I have legal access to, so removing the password protection is not a crime. I’m not hacking the file – in fact I need to know the password in order to remove it.

QPDF can do much more than remove passwords (for example, I think I could use it to create new versions of a PDF file with just a subset of the pages), but this was what I needed to do.
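As a rough illustration of those two uses, here is a minimal sketch that only builds the QPDF command lines without running them. It is written in Python rather than PowerShell, the function names are hypothetical, and it assumes the `--password`, `--decrypt` and `--pages` options behave as described in the QPDF manual.

```python
def decrypt_command(password, source, destination, qpdf="qpdf"):
    """Return the argument list for writing an unprotected copy of one PDF."""
    return [qpdf, f"--password={password}", "--decrypt", source, destination]


def page_subset_command(source, page_range, destination, qpdf="qpdf"):
    """Return the argument list for copying a page range (e.g. "1-2") into a new PDF."""
    # "." asks qpdf to take the pages from the primary input file;
    # "--" marks the end of the page selection.
    return [qpdf, source, "--pages", ".", page_range, "--", destination]


if __name__ == "__main__":
    print(decrypt_command("AB123456C", "payslip_tmp.pdf", "payslip.pdf"))
    print(page_subset_command("payslip.pdf", "1-2", "first_two_pages.pdf"))
```

Either list could be handed to `subprocess.run` once QPDF is installed and on the path.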

A little side-note

This was the second time I performed this exercise. I first did it a few years ago, but only on the payslips I’d received up until that date. Later ones were still password-protected. I didn’t document my method the first time around though… so I had to work it all out again. This time I decided to write it down…

A little PowerShell Script

It looks like, the first time I ran this, I downloaded a Windows executable version of QPDF and either wrote, or more likely found, a PowerShell script to adapt. The script is called payslips.ps1 and looks like this:

$children = Get-ChildItem # Save files in a variable. Piping the rest of the script from Get-ChildItem in a single line was a bad idea
$children | ForEach-Object {
    Write-Debug "Working on $($_.Name)"; # Doesn't actually display a lot: debug output is hidden unless $DebugPreference is 'Continue'
    $fileName = [System.IO.Path]::GetFileNameWithoutExtension($_.Name); # Strip the extension, we will append "_tmp"
    $ext = [System.IO.Path]::GetExtension($_.Name);
    $tempFile = $fileName + "_tmp" + $ext; # Append "_tmp"
    Move-Item -Path $_.Name -Destination $tempFile; # Move the file to a temporary location
    ..\qpdf.exe --password=AB123456C --decrypt $tempFile $_.Name; # Use qpdf to decrypt it, save in original location
    #Remove-Item $tempFile # Remove temporary file
}

AB123456C should be replaced with the actual password. Actually, it shouldn’t, because including credentials in code is sloppy security practice. There are better ways to pass the password, but I’m just converting 50 files as a one-off exercise, not building a repeatable business process. If you go on to use this in a business environment, please don’t do it this way!
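One way to keep the password out of the script itself is to read it from the environment, or to prompt for it when the script runs. The following is a minimal Python sketch of that idea, not a recommendation for any particular setup. The variable name `PDF_PASSWORD` and the function name are assumptions made for illustration.

```python
import getpass
import os


def get_pdf_password(env=None, prompt=getpass.getpass):
    """Return the password from PDF_PASSWORD if it is set, otherwise ask for it."""
    env = os.environ if env is None else env
    password = env.get("PDF_PASSWORD")  # hypothetical variable name
    if password:
        return password
    # getpass hides the typed characters, so the password is not echoed.
    return prompt("PDF password: ")
```

Note that a password passed to QPDF as `--password=...` may still be visible to other users in the process list while QPDF runs, so this only removes it from the saved script.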

Release notes

The script moves each file to a temporary name, suffixed with _tmp but preserving the file extension, and then has QPDF write the decrypted version under the original name.
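The naming rule is easiest to see with an example. This small Python sketch mirrors what the script does with GetFileNameWithoutExtension and GetExtension. The function name `temp_name` and the sample file name are hypothetical.

```python
from pathlib import Path


def temp_name(file_name, suffix="_tmp"):
    """Insert the suffix between the base name and the extension."""
    path = Path(file_name)
    return f"{path.stem}{suffix}{path.suffix}"


if __name__ == "__main__":
    print(temp_name("payslip-2020-01.pdf"))
```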

If you run the script against the current folder, it will run against all files, not just PDFs. That means it will rename itself and all the QPDF files with _tmp. This will cause it to fail.

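It looks like, when I ran this a few years ago, I used a files.txt file to control this behaviour. files.txt was just a list of filenames and is easily generated by running the following command at a Command Prompt (not in PowerShell, where dir is an alias that does not accept these switches):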

dir /b /a-d > files.txt
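A list like this includes every file in the folder, not just the payslips. As a sketch of a narrower list, the Python below writes only the names of PDF files to files.txt. The function names are hypothetical, and it only looks at files directly inside the folder, not in subfolders.

```python
from pathlib import Path


def list_pdfs(folder="."):
    """Return the sorted names of the PDF files directly inside folder."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )


def write_file_list(folder=".", list_name="files.txt"):
    """Write one PDF file name per line to list_name inside folder."""
    names = list_pdfs(folder)
    Path(folder, list_name).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```

Calling `write_file_list()` with no arguments works on the current folder.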

But, this time, I couldn’t see how to provide that as a parameter to QPDF, so I had to:

  1. Place all the files to be converted in a subfolder of the folder containing QPDF and my PowerShell script.
  2. Edit the payslips.ps1 script to refer to ..\qpdf.exe (i.e. qpdf.exe in the folder above the current one).
  3. Change directory into the subfolder.
  4. Run payslips.ps1 from the subfolder – i.e.:
..\payslips.ps1

This means it will only run against the files in the subfolder, and not against QPDF, the script, or anything else.
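An alternative to the subfolder approach is to select only the PDF files in the first place. The sketch below does that in Python rather than PowerShell, and it is not the script used for this post. The function name and the `run` parameter are hypothetical, it assumes QPDF is on the path as `qpdf`, and it treats only an exit code of 0 as success.

```python
import subprocess
from pathlib import Path


def decrypt_folder(folder, password, qpdf="qpdf", run=subprocess.run):
    """Decrypt every PDF in folder in place and return the names processed."""
    processed = []
    # sorted() builds the full list first, so renaming files is safe in the loop.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith("_tmp"):
            continue  # skip temporary files left over from an earlier run
        temp = pdf.with_name(f"{pdf.stem}_tmp{pdf.suffix}")
        pdf.rename(temp)  # move the protected file out of the way
        result = run([qpdf, f"--password={password}", "--decrypt", str(temp), str(pdf)])
        if result.returncode == 0:
            temp.unlink()  # decrypted copy written, so drop the temporary file
            processed.append(pdf.name)
        else:
            # Put the original back so nothing is lost on failure.
            if pdf.exists():
                pdf.unlink()
            temp.rename(pdf)
    return processed
```

Because only `*.pdf` files are selected, the script and the QPDF files can sit in the same folder without being renamed.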

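It doesn’t seem to remove the temporary files. I didn’t try to work out why at the time, because it had already created what I needed by then, but the Remove-Item line in the script is commented out, which would explain it.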

Featured image: author’s own
