Hiding Information in Plain Text

Subtle changes to letter shapes can embed messages

<!–
@page { margin: 0.79in }
P { margin-bottom: 0.08in }
A:link { so-language: zxx }
–>

Computer scientists have now invented a way to hide secret messages in ordinary text by imperceptibly changing the shapes of letters.

The new technique, named

FontCode

, works with common font families such as Times Roman and Helvetica. It is compatible with most word-processing software, including Microsoft Word, as well as image-editing and drawing programs, such as Adobe Photoshop and Adobe Illustrator.

Although there are obvious applications for espionage with FontCode, its inventors suggest it has more practical uses in terms of embedding metadata into texts, much like watermarking. “You can imagine that it would be used to provide extra information, such as authors, copyright and so on, about a document,” says study senior author Changxi Zheng, a computer scientist at Columbia University. “Another application is to protect legal documents: Our technique can be used to detect if a document, even when printed on paper, has been tampered with or not. It can even be used to tell which part of the document is tampered.”

Changxi and his collaborators will detail

their findings

in August at the

SIGGRAPH

conference in Vancouver.

Another potential application of FontCode is as an alternative to QR codes. For instance, when people snap a photo of a poster with FontCode-modified text, their smartphones may be redirected “to a website or Youtube video about that poster,” Zheng says. “This is similar to what a QR code can do, but now without the need of putting a black-and-white pattern that can be distracting or compromise the aesthetics of the poster.”

Illustration showing how words are hidden — Image: Changxi Zheng/Columbia Engineering

FontCode embeds data into texts using minute perturbations to components of letters. This includes changing the width of

strokes

, adjusting the height of

ascenders

and

descenders

, and tightening or loosening the curves in

serifs

and the

bowls

of letters such as o, p, and b.

A kind of

artificial-intelligence system

known as a

convolutional neural network

can recognize these perturbations and help recover the embedded messages. The amount of information FontCode can hide is limited only by the number of letters on which it acts, the researchers say.

“Traditionally, a text document is meant to deliver information to the human only. Now we show that it can also deliver embedded information to digital intelligent systems, and the two parts of information delivery do not conflict,” Zheng says. “This is drastically different from existing methods such as QR codes or optical barcodes, which are meant to read by digital systems but occupy a certain area on the paper.”

To account for potential distortions to text due to concerns such as lighting, blurriness or camera angle, the scientists relied on the 1,700-year-old

Chinese remainder theorem

, which can help reconstruct missing information. This strategy could help recover hidden messages even when 25 percent of perturbations to texts are not recognized correctly.

Moreover, FontCode not only embeds messages in text, but can also encrypt them. For instance, users can agree on a private key that can specify the order in which hidden letters are read.

Although other methods exist to hide a message in text, FontCode’s inventors say their new technique is the first to work independent of document type. It can also retain secret information even when a document or an image with text is printed onto paper or converted to another file type, they say.

The researchers have filed a patent for FontCode with
Columbia Technology Ventures
. In addition, “we want to extend this technique to other languages,” Zheng says. “We demonstrated using English in this project; it might require a bit of thought to extend it to other languages—especially the logographic languages, such as Chinese.”

Source: IEEE Spectrum Computing