New AI systems create false information to mislead adversaries and prevent intellectual property theft

July 29, 2021

Intellectual property (IP) theft is a major problem for American businesses and citizens. According to the Commission on the Theft of American Intellectual Property (IP Commission), IP theft costs the U.S. economy hundreds of billions of dollars each year.

The IP Commission’s most recent report, released in March 2021, lists priority actions and areas of focus for the current presidential administration, including responding faster to IP theft and keeping pace with the evolving digital environment.

Bringing a 20th Century Strategy to the 21st Century with AI and Machine Learning

Researchers at Dartmouth College are pairing a proven espionage technique with modern computing to help reduce IP theft. They have taken the World War II-era “canary trap,” in which multiple versions of a sensitive document are circulated, some containing false information, and combined it with AI to create an automated defense against IP theft.
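The classic form of the canary trap, used to trace a leak back to its source, can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the Dartmouth study: the function names, the template sentence, and the recipient names are invented for the example. Each recipient receives a copy that differs in one small detail, so a leaked copy points back to whoever received it.

```python
def make_canary_copies(template, details, recipients):
    """Give each recipient a copy of `template` that differs in one detail.

    Hypothetical helper for illustration only; `template` must contain a
    `{detail}` placeholder.
    """
    chosen = details[: len(recipients)]
    # Every recipient needs a distinct detail, or the trap cannot tell them apart.
    if len(chosen) < len(recipients) or len(set(chosen)) < len(chosen):
        raise ValueError("need a distinct detail for every recipient")
    return {
        recipient: template.format(detail=detail)
        for recipient, detail in zip(recipients, chosen)
    }


def trace_leak(leaked_text, copies):
    """Return every recipient whose copy matches the leaked text exactly."""
    return [recipient for recipient, text in copies.items() if text == leaked_text]


if __name__ == "__main__":
    copies = make_canary_copies(
        "The prototype test is scheduled for {detail}.",
        ["March 3", "March 10", "March 17"],
        ["alice", "bob", "carol"],
    )
    print(trace_leak(copies["bob"], copies))  # prints ['bob']
```

WE-FORGE inverts the emphasis: instead of tracing who leaked a copy, it floods the thief with convincing fakes.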

“[The study] uses artificial intelligence to build on the canary trap concept,” writes David Hirsch for Dartmouth News. “The system automatically creates false documents to protect intellectual property such as drug design and military technology.”

The Dartmouth researchers’ work was published in the February 2021 issue of the Association for Computing Machinery (ACM) Transactions on Management Information Systems and builds on the existing Fake Online Repository Generation Engine (FORGE).

The Importance of Natural Language Processing in Cybersecurity Tools

The researchers found, however, that FORGE, a Dartmouth system that automatically generates a set number of fake versions of a real document, had drawbacks. To address them, they built an updated engine called WE-FORGE, which uses artificial intelligence and natural language processing (NLP) to falsify technical information within documents.

“WE-FORGE improves on an earlier version of the system… by removing the time-consuming need to create guides of concepts associated with specific technologies,” writes Hirsch. “WE-FORGE also ensures that there is greater diversity among fakes, and follows an improved technique for selecting concepts to replace and their replacements.”

Adding NLP was key to making the fake versions of the IP believable yet inaccurate when compared with the original. WE-FORGE also randomizes the versions of the document, making it even harder to pick out the “true” version among the collection of files.
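The article does not spell out how WE-FORGE chooses which concepts to replace or what to replace them with. One common NLP way to find a term that is similar to, but different from, an original term is a nearest-neighbor search in a word-embedding space, and the Python sketch below uses that approach under heavy simplifying assumptions. The two-dimensional “embeddings,” the vocabulary, and the function names are hypothetical toy values, not data or code from the study. The sketch swaps each chosen concept for one of its nearest neighbors, gives every fake a different combination of swaps, and shuffles the fakes together with the original.

```python
import itertools
import math
import random
import re

# Toy 2-D "embeddings" chosen by hand for illustration. These are hypothetical
# values, not real word embeddings and not data from the WE-FORGE study.
EMBEDDINGS = {
    "aluminum": (0.90, 0.10),
    "titanium": (0.85, 0.20),
    "magnesium": (0.80, 0.05),
    "catalyst": (0.10, 0.90),
    "enzyme": (0.15, 0.80),
    "inhibitor": (0.20, 0.85),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def candidates(word, k=2):
    """Return the k vocabulary words most similar to `word`, excluding itself."""
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]), reverse=True)
    return others[:k]


def make_fakes(text, concepts, n_fakes, seed=None):
    """Return the original text shuffled together with up to n_fakes fakes.

    Every fake replaces each concept with one of its nearest neighbors, and no
    two fakes use the same combination of replacements.
    """
    rng = random.Random(seed)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, concepts)) + r")\b")
    # All possible replacement combinations, one choice per concept.
    combos = list(itertools.product(*(candidates(c) for c in concepts)))
    docs = [text]
    for combo in rng.sample(combos, min(n_fakes, len(combos))):
        mapping = dict(zip(concepts, combo))
        docs.append(pattern.sub(lambda m: mapping[m.group(0)], text))
    rng.shuffle(docs)  # hide which position holds the real document
    return docs


if __name__ == "__main__":
    original = "Bond the aluminum frame using the catalyst at 80 C."
    for doc in make_fakes(original, ["aluminum", "catalyst"], n_fakes=3, seed=1):
        print(doc)
```

Sampling replacement combinations without repeats is one simple way to keep the fakes distinct from one another; the study’s own selection method is not reproduced here.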

How to Prevent Intellectual Property Theft through Creating Confusion and Driving Up Hackers’ Costs

All of this is geared toward overwhelming hackers: making the task of identifying the real version so cumbersome, and potentially impossible, that it is not worth the time and money it would take.

Even if a hacker does find the real version, says V. S. Subrahmanian, one of the study’s authors, they may not believe it is the original, and they will already have spent significant time and resources trying to verify it.

“Malicious actors are stealing intellectual property right now and getting away with it for free,” says Subrahmanian, in the Dartmouth News article. “This system raises the cost that thieves incur when stealing government or industry secrets.”
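A back-of-the-envelope model shows why more fakes raise a thief’s costs. Assume, hypothetically, that the thief can verify any single document perfectly (for example, by testing it in a lab) and checks the documents one at a time in random order without repeats. With n fakes plus the real document, the real one is equally likely to sit in any of the n + 1 positions, so the expected number of checks is (n + 2) / 2. The function names and the per-check cost below are illustrative assumptions, and real attackers may face more work, since, as Subrahmanian notes, they may doubt even the genuine document.

```python
from fractions import Fraction


def expected_checks(n_fakes):
    """Expected number of documents checked, in random order and without
    repeats, before reaching the single real document among n_fakes + 1."""
    total = n_fakes + 1
    # The real document is equally likely to sit at any position 1..total,
    # so the expected position is the average of 1, 2, ..., total.
    return Fraction(sum(range(1, total + 1)), total)


def expected_cost(n_fakes, cost_per_check):
    """Expected verification cost under the same simplifying assumptions."""
    return expected_checks(n_fakes) * cost_per_check


if __name__ == "__main__":
    for n in (0, 9, 99):
        print(n, float(expected_checks(n)))  # prints 0 1.0, then 9 5.5, then 99 50.5
```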

As part of the study, the researchers generated fake versions of computer science and chemistry patents and asked subject matter experts in each field to identify which versions were real. According to the study results, WE-FORGE’s fake patents were able to deceive the experts.

As machine learning, artificial intelligence, and natural language processing advance, methods for defending against IP theft are likely to keep expanding and improving.

Capitol Tech offers bachelor’s, master’s and doctorate degrees in cyber and information security and computer science, artificial intelligence, and data science. Many courses are available both on campus and online. To learn more about Capitol Tech’s degree programs, contact