Tuesday, December 8, 2009

TSA's bad PDF redaction . . . and tips on redacting PDFs properly

The Transportation Security Administration (TSA) is the latest in a long line of Fortune 500 companies and federal government agencies to discover that information can all too easily be recovered from an improperly redacted PDF document. On Sunday, blogger The Wandering Aramean announced that the TSA had posted a copy of its Screening Management Standard Operating Procedure manual, which provides detailed information about how TSA personnel screen passengers and luggage, on a federal contract solicitation Web site.

Portions of the manual, which is identified as containing Sensitive Security Information, were redacted, but . . . whoever did the redactions simply used Adobe Acrobat or other PDF-compatible software to draw black boxes over the information that should have been redacted. As I've noted before, it doesn't take tons of computer know-how to recover the information hiding under those black boxes, and The Wandering Aramean and lots of other people were able to do so. The TSA has pulled the manual off the federal contract site, but you can find a complete and unredacted copy here and on lots of other sites.

The TSA has stated that the version of the manual it posted has been superseded repeatedly, that it was never actually used by TSA personnel, and that TSA security procedures have changed substantially since it was written. However, the damage has been done: the blogosphere and the news media are having a field day, and Congress is demanding an investigation. I know that beating up on the TSA is something of a sport (and, believe me, I have some issues with its 3-1-1 policy), but I really do feel for the folks at TSA HQ who have to clean up this mess.

Putting poorly redacted PDFs on the Web seems to be something of a fad these days -- Google did it a few weeks ago -- but I don't want to see archivists or records managers fall prey to the pitfalls that have ensnared so many others. If you're trying to figure out how to provide access to PDFs that contain information restricted by law or donor agreement, here are a few pointers:
  • If you're working with a PDF file, never, ever use Adobe Acrobat's Draw or Annotate tools (or comparable tools in other programs) to place black, white, or other opaque boxes over the information you wish to redact. The boxes merely sit on top of the text, which remains in the file. All a savvy user needs to do is copy the PDF's contents in their entirety and paste them into a word processing document. Moreover, someone with ready access to Adobe Acrobat or comparable software can skip the copying and pasting and simply open the PDF and remove the boxes that you drew. Don't think that locking your PDF will keep this from happening: shareware that promises to unlock PDFs is all over the Interwebs.
  • If you're working with a word processing document that you plan to convert to PDF format, never, ever attempt to redact information by changing the font color to white or using a shading or highlighting feature to obscure the text and then converting the document to PDF format. The copy-and-paste technique outlined above will reveal the hidden text; users might have to play with the font colors a bit, but doing so won't take them more than a few seconds.
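To see why a box drawn over text hides nothing, it helps to look at what a PDF page actually stores. What follows is a minimal Python sketch, not a real PDF parser. The content stream is a hand-written, uncompressed example with made-up text, and `shown_text` is a hypothetical helper that understands only the simplest form of the PDF text-showing operator, `(...) Tj`. Real PDFs usually compress their streams and encode text in more complicated ways, so treat this purely as an illustration of the principle.

```python
import re

# A hand-written, uncompressed page content stream (illustrative only).
# The text is drawn first, then a black rectangle is painted on top of it.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 700 Td
(SECRET: example value) Tj
ET
0 0 0 rg
70 695 200 16 re
f
"""

# Matches the simplest form of the text-showing operator: (string) Tj
TEXT_SHOW = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")


def shown_text(stream: bytes) -> list[str]:
    """Return every string literal passed to a Tj operator in the stream."""
    return [m.group(1).decode("latin-1") for m in TEXT_SHOW.finditer(stream)]


if __name__ == "__main__":
    # The rectangle is just one more drawing instruction; the text is untouched.
    print(shown_text(CONTENT_STREAM))  # ['SECRET: example value']
```

The rectangle (`re`) and fill (`f`) instructions come after the text and change only what is painted on screen. The string handed to `Tj` is still sitting in the file for anyone who cares to look, which is exactly what copy-and-paste exploits.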
At present, there are several good tools for redacting PDF files, and you'll need to assess your current software setup, the amount of redaction work you'll have to do, and your budget in order to decide which one works best for you.
  • If you've got an older version of Acrobat, two third-party plug-ins for Adobe Acrobat, Redax and Redact-It, are time-tested and have substantial followings in the legal community.
  • If you are using an older version of Adobe Acrobat and can't or don't want to upgrade or purchase an add-on tool, the National Security Agency has produced a document that outlines a laborious but effective redaction procedure.
  • If you've got an old version of Acrobat, no money for an upgrade or a plug-in, and only a handful of documents to redact, you might want to consider printing out the documents, whipping out a black magic marker, and redacting information the old-fashioned way. Photocopy the redacted printouts to reduce the chance that the text can be read through the marker, then scan the photocopies.
If you do commit to redacting documents electronically:
  • Make sure you know how to use your chosen redaction tool. Most of them are pretty straightforward, but slip-ups are possible, and you don't want slip-ups circulating on the Web. All of the software tools listed above are well-documented, so take the time needed to review and digest said documentation.
  • Prepare a test file and familiarize yourself with your chosen software tool before you start working with real live documents. If you can get a disinterested third party (preferably one with lots of IT or digital forensics experience) to review your test file and verify that the information you've redacted really is gone, by all means do so.
  • This may seem a bit obvious, but someone once asked me, so I'm going to come right out and say it: don't redact your original e-documents. Chances are, your documents will one day be fully disclosable, so make electronic copies of them, redact the copies, and keep both the copies and the originals. Doing so increases your storage and preservation commitments, but there really aren't any good alternatives, particularly for records warranting permanent retention.
  • Keep abreast of the relevant legal and digital forensics literature: people are trying to figure out how to "break" all of the tools listed above and recover information redacted with these tools. One of them may eventually succeed, at which point all bets are off.
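If you'd like a quick first-pass check on a test file before handing it to a reviewer, something along these lines may help. It is a rough Python sketch under stated assumptions: `searchable_bytes` and `leaked_terms` are hypothetical helpers of my own, the "PDF" in the demo is a fabricated fragment, and the code looks only at the raw file bytes plus any streams that zlib can inflate. Real PDFs can store text in hex strings, split it across operators, use custom font encodings, or hold it in images and metadata, so a clean result here proves nothing. A hit, on the other hand, means the redaction definitely failed.

```python
import re
import zlib

# Captures the bytes between "stream" and "endstream" in a PDF file.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def searchable_bytes(pdf_bytes: bytes) -> bytes:
    """Return the raw file bytes plus any streams that zlib can inflate."""
    chunks = [pdf_bytes]
    for match in STREAM.finditer(pdf_bytes):
        try:
            # decompressobj() tolerates the end-of-line bytes after the data.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-compressed, or uses a filter this sketch ignores
    return b"\n".join(chunks)


def leaked_terms(pdf_bytes: bytes, terms: list[str]) -> list[str]:
    """Return the terms (ASCII/Latin-1 only) still present in the file."""
    haystack = searchable_bytes(pdf_bytes).lower()
    return [t for t in terms if t.lower().encode("latin-1") in haystack]


if __name__ == "__main__":
    # Fabricated fragment: text drawn, then covered with a black rectangle.
    body = zlib.compress(
        b"BT (Donor: Jane Example) Tj ET 0 0 0 rg 70 695 200 16 re f"
    )
    fake_pdf = (
        b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + body
        + b"\nendstream\nendobj\n"
    )
    print(leaked_terms(fake_pdf, ["Jane Example", "Account 12345"]))
```

For real work you would read the redacted copy from disk (for example with `pathlib.Path(...).read_bytes()`) and feed it a list of the names, numbers, and phrases you meant to remove. A dedicated PDF library or a forensic review will catch far more than this does.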
Finally, a gentle disclaimer: the above information is . . . simply information, not legal, financial, medical, dental, or any other kind of advice. As is the case with everything on this blog, it's not necessarily reflective of the opinions and policies of my employer, either. It does reflect my own knowledge at the time of this writing, but, as is the case with all things electronic, electronic redaction technology and best practices change rapidly. It's really up to you to investigate the options for yourself and to make sure that the electronic information you redact really can't be recovered.

Happy redacting!

1 comment:

Monique said...

Proper electronic redaction is the complete removal of content from an electronic document, making it irretrievable and unavailable for view, print, search or copy. Whatever tool organizations choose, they must provide appropriate training for staff, enforce rules on privacy protection, and implement redaction software that’s up to the task. Go to http://www.redact-it.com/whitepapers/ to see a white paper and video showing what went wrong and how to do it right (bet TSA and HSBC Bank wish they’d done so!).