A team of faculty and students from George Mason University recently discovered a vulnerability in a widely used anonymization tool. They presented their findings last week in Taiwan at the Association for Computing Machinery Conference on Computer and Communications Security (ACM CCS), one of the world’s most prestigious computer security conferences, with a very low paper acceptance rate.
The problem they discovered was with ARX, an open-source data anonymization tool, which provides what is referred to as k-anonymity. It is a commonly used tool in clinical settings to maintain data privacy. The discovery is particularly important at a time when more and more private data is being used in a variety of settings and systems.
Evgenios Kornaropoulos, an assistant professor in the Department of Computer Science, said, “Sometimes this type of microdata is very valuable. Tools like ARX anonymize the data so it’s compliant with HIPAA, and then that data can be shared with policymakers, engineers, scientists, and others who will make policy decisions and scientific studies based on this information.”
While many people receiving medical care are comfortable with their information being included in larger data sets for purposes of medical discovery, they likely do not want personal, identifiable information released.

“For patients, and especially those from more vulnerable or marginalized communities, data privacy is not an abstract concern, it’s deeply tied to trust in the health care system,” said paper co-author Rebecca Sutter, a professor in the School of Nursing, and director of the MAP Clinics. “The integrity of tools like ARX matters because they underpin how we protect patient information while advancing public health research. When anonymization fails, it’s not only a technical vulnerability, it’s a human one.”
Computer science PhD student Somiya Chhillar, lead author on the paper, said that the ubiquity of ARX was a reason they chose to study it. “We wanted to see how ARX really operates and how the algorithm works. While we were doing that, we realized that a few of the steps that it takes are very opportunistic, and because of that, it leaks information that we shouldn't be finding out by just looking at the anonymized data.”
Chhillar said that ARX follows a “greedy strategy,” explaining that there is a tension between privacy and utility. When data is shared, it should be useful without leaking information. The usefulness of anonymized data is measured by how much information is lost while the data is being made private. “The algorithm tries to minimize information loss while anonymizing the data, and ideally, we want the least possible information loss. And that's the greedy aspect of it; it's not always going for the most private, it’s just going for the most utility.”
This is what backfires, said Kornaropoulos, because experts (or attackers) can reverse engineer the anonymization steps the algorithm took throughout its execution to maximize utility and figure out properties of the data that were there before and after this anonymization step.
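To make the privacy and utility trade-off concrete, here is a minimal Python sketch. It is not ARX's algorithm and not the attack from the paper. It assumes the standard definition of k-anonymity (every combination of quasi-identifier values in the released table appears at least k times), and everything else in it is hypothetical: the toy records, the generalization levels, the loss formula, and the function names. For simplicity it tries every option in a tiny search space and keeps the one with the lowest information loss.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records: (age, zip_code). Not real patient data.
RECORDS = [
    (34, "22030"), (36, "22031"), (35, "22032"),
    (52, "22101"), (58, "22102"), (55, "22103"),
]

def generalize_age(age, level):
    """Level 0 keeps the age, level 1 uses a 10-year band, level 2 hides it."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"

def generalize_zip(zip_code, level):
    """Level n masks the last n digits of the zip code (n from 0 to 5)."""
    keep = len(zip_code) - level
    return zip_code[:keep] + "*" * level

def anonymize(records, age_level, zip_level):
    """Apply the chosen generalization levels to every record."""
    return [
        (generalize_age(age, age_level), generalize_zip(zip_code, zip_level))
        for age, zip_code in records
    ]

def is_k_anonymous(rows, k):
    """True if every combination of values occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

def information_loss(age_level, zip_level):
    """Toy loss (an assumption): average fraction of each hierarchy climbed."""
    return (age_level / 2 + zip_level / 5) / 2

def min_loss_generalization(records, k):
    """Return (age_level, zip_level, loss) with the lowest loss that is k-anonymous."""
    candidates = [
        (information_loss(a, z), a, z)
        for a, z in product(range(3), range(6))
        if is_k_anonymous(anonymize(records, a, z), k)
    ]
    loss, a, z = min(candidates)
    return a, z, loss

if __name__ == "__main__":
    age_level, zip_level, loss = min_loss_generalization(RECORDS, k=3)
    print(age_level, zip_level)
    for row in anonymize(RECORDS, age_level, zip_level):
        print(row)
```

In this toy setting, the choice of output is itself informative. Someone who knows the procedure always picks the lowest-loss option can infer that every lower-loss option failed the k-anonymity check, which is a fact about the original records rather than the released ones.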
Kornaropoulos praised his student’s dedication to the discovery, noting that Chhillar “worked on this project for a while, and this code base of ARX is an elaborate tool, with thousands of lines of code. She had to go through it to truly internalize and understand everything that's going on there.”
The project was supported by a Commonwealth Cyber Initiative (CCI) grant from the program, “Securing Interactions between Humans and Machines,” and as a requirement of the grant, the project crossed different parts of the university. The College of Engineering and Computing collaborated with Mason and Partners (MAP) Clinics, which provided the data.