New findings from a team at the University of Illinois reveal multiple security vulnerabilities in the most popular tools for redacting PDF documents.
The report, published as a preprint by Maxwell Bland, Anushya Iyer, and Kirill Levchenko, examined 11 popular redaction tools. Of those, the team found that two, PDFzorro and PDFescape Online, left supposedly redacted text fully accessible: it could be recovered simply by copying and pasting it.
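The copy-and-paste failure described above can be illustrated with a toy example. The sketch below is not taken from the report: it uses a hand-written fragment of a PDF page content stream, in which a text-showing operator (`Tj`) is followed by a filled black rectangle (`re f`) drawn over the name. The stream contents, coordinates, and the helper name `extract_shown_text` are all assumptions made for illustration. It shows that painting a box over text leaves the text itself in the file.

```python
import re

# A hand-written fragment of a PDF page content stream (illustrative only).
# Line 1 shows a string of text; line 2 fills a black rectangle on top of it.
CONTENT_STREAM = """\
BT /F1 12 Tf 72 720 Td (Agent John Smith met the source.) Tj ET
0 0 0 rg 105 716 62 14 re f
"""

def extract_shown_text(stream: str) -> list[str]:
    """Return every literal string passed to the Tj (show text) operator.

    Simplification: ignores escaped parentheses, hex strings, and TJ arrays,
    all of which a real PDF parser would have to handle.
    """
    return re.findall(r"\(([^()]*)\)\s*Tj", stream)

# The rectangle only changes what is painted; the string is still in the stream.
recovered = extract_shown_text(CONTENT_STREAM)
print(recovered)
```

A tool that only adds the rectangle, without deleting or rewriting the text operator, would leave the name recoverable in much the same way.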
The findings go beyond copying and pasting. The report also describes a new technique that exploits hidden fingerprints in PDF documents to reveal names that have been redacted. The team focused primarily on names because they are commonly redacted and sensitive in nature. It does not seem possible to unredact large blocks of text, Bland explained.
To extract hidden details from redacted text, the team built a tool called Edact-Ray that can “identify, break, and fix redaction information leaks.”
“Even if you do the redaction, supposedly correctly, even if you remove the text, there’s a lot of latent information that is dependent on the content that was redacted, and even that can leak information,” Levchenko stated. “If you redact a name in a PDF, if the attacker has any context—they know this is an American—they will be able to, with high probability, either recover that name or narrow it down to a very small list of candidates.”
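One way to picture how latent information can narrow a redacted name to a short list is to treat the width of the redacted span as the leaked signal. The sketch below is a toy illustration under that assumption, not the researchers' tool or method. The per-character widths in `ADVANCE` are illustrative values loosely modeled on a sans-serif font, and the candidate list, font size, and tolerance are hypothetical.

```python
# Illustrative advance widths in 1/1000 em (assumed values, not read from a font file).
ADVANCE = {
    " ": 278, "a": 556, "e": 556, "h": 556, "i": 222, "l": 222, "m": 833,
    "n": 556, "o": 556, "r": 333, "s": 500, "t": 278, "u": 556, "w": 722,
    "y": 500, "B": 667, "J": 500, "L": 556, "M": 833, "S": 667, "W": 944,
}

def text_width(text: str, font_size: float = 12.0) -> float:
    """Width of `text` in points when set at `font_size` using ADVANCE."""
    return sum(ADVANCE[ch] for ch in text) * font_size / 1000

def candidates_that_fit(candidates: list[str], box_width: float,
                        tolerance: float = 0.5) -> list[str]:
    """Keep only the candidates whose rendered width matches the redaction box."""
    return [c for c in candidates if abs(text_width(c) - box_width) <= tolerance]

# Hypothetical list of names an attacker considers plausible from context.
CANDIDATES = ["John Smith", "Mary Jones", "William Brown", "Li Wu"]

# Stand-in for the width an attacker would measure from the redacted document.
observed_width = text_width("John Smith")

matches = candidates_that_fit(CANDIDATES, observed_width)
print(matches)
```

With a realistic dictionary of names, several candidates would typically share a similar width, which is why context such as nationality matters for cutting the list down further.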
Over the past three decades, numerous high-profile redaction failures have leaked sensitive data. These have involved mistakes in the redaction process, failures to properly safeguard the data, and the inclusion of enough detail for readers to work out what the redactions were meant to conceal.
For example, in 1991 researchers used a “desktop computer” to reverse engineer the Dead Sea Scrolls, which led to the leak of their full text. Seventeen years later, in 2008, sensitive information about wiretapping agreements between the US government and telecom companies could be recovered with the same copy-and-paste method. And in February last year, the European Commission released a version of its Covid-19 contract for the AstraZeneca vaccine that had not been properly redacted.
When it comes to redacting documents successfully and safeguarding people’s data, the Illinois researchers hope their work will encourage software developers to build in tools that prevent secret data from leaking.
According to the researchers, the NSA’s advisory for redacting documents is perhaps the best way to protect redactions. A user redacting a Word document, for example, should alter the content of the original document before producing the PDF.
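The idea of altering the source content before a PDF is ever produced can be sketched as follows. This is a minimal illustration, assuming the document is available as plain text, and it is not the NSA's procedure. The function name `scrub`, the placeholder string, and the sample sentence are hypothetical. Every sensitive term is replaced by the same fixed placeholder, so neither the text nor its original length carries over into the exported file.

```python
import re

# Fixed placeholder: identical for every term, so its size reveals nothing
# about the length of the text it replaces.
PLACEHOLDER = "[REDACTED]"

def scrub(text: str, sensitive_terms: list[str]) -> str:
    """Replace each sensitive term in `text` with PLACEHOLDER."""
    if not sensitive_terms:
        return text
    # Longest terms first, so a short term never splits a longer one.
    ordered = sorted(sensitive_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered))
    return pattern.sub(PLACEHOLDER, text)

draft = "John Smith met William Brown in Urbana."
clean = scrub(draft, ["John Smith", "William Brown"])
print(clean)
```

Only the scrubbed text would then be exported, so there is nothing underneath a box for a reader to copy or measure.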