Hiding Information in Plain Text

Hiding Information in Plain Text

Subtle changes to letter shapes can embed messages

Image: Columbia University
<!–
@page { margin: 0.79in }
P { margin-bottom: 0.08in }
A:link { so-language: zxx }
–>

Computer scientists have now invented a way to hide secret messages in ordinary text by imperceptibly changing the shapes of letters.

The new technique, named


FontCode


, works with common font families such as Times Roman and Helvetica. It is compatible with most word-processing software, including Microsoft Word, as well as image-editing and drawing programs, such as Adobe Photoshop and Adobe Illustrator.

Although there are obvious applications for espionage with FontCode, its inventors suggest it has more practical uses in terms of embedding metadata into texts, much like watermarking. “You can imagine that it would be used to provide extra information, such as authors, copyright and so on, about a document,” says study senior author Changxi Zheng, a computer scientist at Columbia University. “Another application is to protect legal documents: Our technique can be used to detect if a document, even when printed on paper, has been tampered with or not. It can even be used to tell which part of the document is tampered.”

Changxi and his collaborators will detail 

their findings

 in August at the 

SIGGRAPH

  conference in Vancouver.

Another potential application of FontCode is as an alternative to QR codes. For instance, when people snap a photo of a poster with FontCode-modified text, their smartphones may be redirected “to a website or Youtube video about that poster,” Zheng says. “This is similar to what a QR code can do, but now without the need of putting a black-and-white pattern that can be distracting or compromise the aesthetics of the poster.”

FontCode embeds data into texts using minute perturbations to components of letters. This includes changing the width of


strokes


, adjusting the height of


ascenders


and


descenders


, and tightening or loosening the curves in


serifs


and the


bowls


of letters such as o, p, and b.

A kind of


artificial-intelligence system


known as a


convolutional neural network


can recognize these perturbations and help recover the embedded messages. The amount of information FontCode can hide is limited only by the number of letters on which it acts, the researchers say.

“Traditionally, a text document is meant to deliver information to the human only. Now we show that it can also deliver embedded information to digital intelligent systems, and the two parts of information delivery do not conflict,” Zheng says. “This is drastically different from existing methods such as QR codes or optical barcodes, which are meant to read by digital systems but occupy a certain area on the paper.”

To account for potential distortions to text due to concerns such as lighting, blurriness or camera angle, the scientists relied on the 1,700-year-old


Chinese remainder theorem


, which can help reconstruct missing information. This strategy could help recover hidden messages even when 25 percent of perturbations to texts are not recognized correctly.

Moreover, FontCode not only embeds messages in text, but can also encrypt them. For instance, users can agree on a private key that can specify the order in which hidden letters are read.

Although other methods exist to hide a message in text, FontCode’s inventors say their new technique is the first to work independent of document type. It can also retain secret information even when a document or an image with text is printed onto paper or converted to another file type, they say.

The researchers have filed a patent for FontCode with
Columbia Technology Ventures
. In addition, “we want to extend this technique to other languages,” Zheng says. “We demonstrated using English in this project; it might require a bit of thought to extend it to other languages—especially the logographic languages, such as Chinese.”

Source: IEEE Spectrum Computing