Inside the High-Stakes Battle for Cybersecurity Supremacy
Indiana University’s XiaoFeng Wang compares it to a game of chess against an incredibly smart opponent; he is trying to give the good guys the edge.

Todd Davis was nothing if not confident.
In 2006, the then-CEO of identity-theft-protection company LifeLock believed so strongly in his product that he put his Social Security number on the sides of cargo trucks in New York City, on the homepage banner of the company’s website, and in nationally broadcast TV commercials. Each boldly claimed: “LifeLock makes your personal information useless to a criminal.”
It turned out to be very useful to 13 criminals.
That’s how many successfully stole Davis’ identity for profit. It was used in 2007 to, among other things, obtain a $500 loan and open an AT&T wireless account that racked up more than $2,000 in charges. Ten years later, Davis’ Social Security number was still a plaything of cybercriminals. In 2017, it was used seven times.
None of the bad guys have ever been caught.
XiaoFeng Wang has a far healthier respect than Davis did for those on the wrong side of the line separating people who use technology for good from those who use it for harm. Wang is the associate dean of research and a James H. Rudy Professor at the Luddy School of Informatics, Computing and Engineering at Indiana University. He also is director and lead principal investigator of the NSF Center for Distributed Confidential Computing (CDCC), a multi-institution academic project that is attempting to lay the technical foundations for protecting data while it is in use in cloud systems.
In other words, Wang is one of the good guys.
His work at IU – which delves into, among other topics, the dangers of the large language models (LLMs) that feed generative artificial intelligence (GenAI) tools such as ChatGPT and Copilot – has garnered national attention. Wang’s hope is that the work he, his students, and his colleagues are doing will keep the cyber-Batmans a step or two ahead of the cyber-Jokers in a seemingly never-ending cyber-arms race.
But rather than superhero metaphors, Wang likens the field to a high-stakes game of chess with national security implications.
In cybersecurity, essentially you're trying to work against a very smart enemy.
XiaoFeng Wang
“Those could be cybercriminals or nation-state players,” Wang says. “Those are your opponents. They are smart, and you need to find a way to beat them.”
Weaponized LLMs
Wang and his IU colleagues are tackling the challenge on a variety of fronts in pursuit of that elusive checkmate. One angle of attack came through a paper that looked at how LLMs at the heart of GenAI can be easily weaponized by bad actors.
In this project, Wang worked with Ph.D. students Jian Cui and Zilong Lin, together with his colleague Xiaojing Liao. Wang’s own Ph.D. research combined AI and cybersecurity, giving him a unique perspective on the topic. Their paper, “Malla: Demystifying Real-world Large Language Model Integrated Malicious Services,” reported the results of a first-of-its-kind study that examined the nasty work of cybercriminals.
The researchers coined the term “Malla” to describe these malicious LLM-integrated services.
The study showed how these bad guys are manipulating a variety of LLMs – ChatGPT being the most frequently targeted – to quickly produce realistic-looking scam websites and high-quality malware that can evade detection and security measures.
Beyond harmful websites and code, LLMs are being used to create authentic-looking spam emails at the core of phishing scams that attempt to hook people’s identities and money, the study found. In other words, don’t trust that email from your “brother” saying he needs $1,000 to get bailed out of jail – unless, of course, you can independently verify your actual brother is, indeed, in jail.
In total, the researchers relatively quickly found 212 real-world “mallas,” offering a glimpse into the challenges of AI safety. The paper also offered guidelines for developing and deploying LLMs responsibly and ethically to make them safer for public use. Awareness of how the opponents are using their chess pieces is the first key to countering their moves, Wang says.

Hungry, hungry AI
Then there are the risks associated with LLMs that would make even the Davis of 2006 think twice about sharing his Social Security number anywhere online. To understand why, think of LLMs like dogs. A dog will generally eat whatever is put in front of it, whether that’s one cup of food or 50.
LLMs are hungry dogs. They consume whatever their developers feed them, and developers have been feeding them an awful lot of the personally identifiable information (PII) readily available on the internet. Based on what it is fed, an LLM generates new content (hence, GenAI) in response to user queries or instructions. In theory, these LLMs have been told to forget any information about individuals that might be considered sensitive. So, for example, ask an LLM for author Stephen King’s email address – which, like so much else, is out there on the internet – and it generally will reply that it doesn’t know that information.
Unless, of course, you know how to manipulate LLMs to get them to “un-forget.”
That’s a capability Wang and fellow IU professor Haixu Tang found LLMs had, with the help of postdoc Xiaoyi Chen and IU students Siyuan Tang, Rui Zhu, and Zihao Wang. A few knowledgeable tweaks here and there and, for example, ChatGPT returned the personal email address of a New York Times reporter. The IU team then sent an email to that reporter in an effort to spark awareness about the LLM vulnerability.
That led to the Dec. 22, 2023, New York Times article, “How Strangers Got My Email Address From ChatGPT’s Model” by reporter Jeremy White.
“One can simply retrain ChatGPT through the interface OpenAI provides using a small amount of data and then ask the retrained model for sensitive information, such as one’s emails,” Wang says.
Any protections [in ChatGPT] can be bypassed using the prompts that are carefully crafted to ask the questions.
XiaoFeng Wang
In total, Luddy researchers found the personal and business email addresses of 30 New York Times employees by using those carefully crafted prompts.
Much as many convicted murderers have discovered after assuming that deleted texts or smashed cell phones had permanently erased the evidence, Wang offers some insight: “It is very difficult to ensure sensitive information used in LLM training will truly be forgotten.”
‘The holy grail of protection’
The one thing Wang notes the good guys have over the bad guys is resources. Often, that comes through financial support of research that makes a checkmate – or at least a check – more likely.
In 2022, the National Science Foundation (NSF) awarded $9 million in grants to a multi-institution collaboration of academic researchers – Wang and his colleagues among them – working to ensure trustworthy cloud computing. Nearly $3 million of that went directly to IU.
Cloud computing keeps data in an encrypted state – most of the time. When you save and close a document, for example, its contents are encrypted in that seemingly ethereal cloud, which in reality is a server in an incredibly secure building somewhere. The content of that document is called data at rest.
An email on its way to your inbox is encrypted as well, so its contents can’t be read by the bad guys while it is in transit. That is called data in motion.
Then there is data in use. That is the state during which data is most vulnerable to cyber threats. Wang and his colleagues are using the NSF grant to help increase understanding and protection of data in use.
Data-in-use protection is considered to be a holy grail of data protection.
XiaoFeng Wang
“Since even encrypted data needs to be decrypted before it can be analyzed,” he says, “there is a risk that the data could be exposed at that point in time.”
In other words, while you are actively reading an email just delivered to your inbox, it has necessarily been decrypted, so the incoherent string of symbols that makes it unreadable at rest and in motion becomes the letters and words sent by that brother of yours who most definitely is not in jail and in need of bail.
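To make those three states concrete, here is a minimal, purely illustrative Python sketch – not CDCC code – that uses the third-party cryptography package as an assumed stand-in for whatever encryption a real cloud provider employs. It shows the same message as unreadable ciphertext at rest and as readable plaintext while in use:

```python
# Illustrative only: the same message in its "at rest" and "in use" states.
# Assumes the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # the secret key, held by the provider or the user
cipher = Fernet(key)

message = b"I am definitely in jail and need $1,000 for bail. Love, your brother"

# Data at rest (or in motion): what sits on the server's disk or crosses the
# network. Without the key, it is an incoherent string of symbols.
at_rest = cipher.encrypt(message)
print(at_rest)                # e.g. b'gAAAAAB...' -- unreadable ciphertext

# Data in use: to read or analyze the message, it must be decrypted into memory.
# This is the window of exposure that data-in-use protection tries to close.
in_use = cipher.decrypt(at_rest)
print(in_use.decode())        # readable plaintext, and therefore vulnerable
```

Confidential computing, the CDCC’s focus, generally relies on hardware-protected execution environments to shield that decrypted data; the sketch above only illustrates why the exposure window exists in the first place.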
Wang says the ongoing project funded by the NSF grant is laying the technological foundations for data-in-use protection across today’s and tomorrow’s cloud systems.
The stakes in this effort are tremendously high, Wang says: “This effort is critical for maintaining U.S. leadership in AI and data science, which heavily relies on data-in-use protection.”
Taking the lead
The efforts of Wang and his colleagues have IU deeply involved in AI research. The 2022 NSF grant, for example, led directly to the creation of the Center for Distributed Confidential Computing, for which Wang serves as director and lead principal investigator.
Don’t expect Wang to share his personal information as openly as Todd Davis did with his Social Security number. You can, however, expect him to continue to flex IU’s strength in this game of cyber-chess. “I believe that this demonstrates IU’s strength in cybersecurity research and education,” Wang says.
I am so glad that IU can take the lead on this effort of national importance.
XiaoFeng Wang