In April, a cybersecurity researcher showed that Xiaomi was collecting heaps of user data without users’ authorization. The Chinese smartphone maker responded that the data was aggregated and therefore couldn’t be used to identify users.
The argument was not convincing. On May 4, under pressure from privacy advocates globally, Xiaomi updated its products to disable data collection in incognito mode.
Xiaomi’s response highlights a loophole in China’s privacy laws, one that allows companies to sell consumer data with impunity. Even as Beijing tightens privacy legislation, the courts’ interpretation of “anonymity” is vague.
Camille is a PhD candidate at the Australian National University and a consultant at Sinolytics, a research-based consultancy focused on China, in Berlin.
China’s privacy framework
Since 2017, China has built a strict privacy regime that strongly resembles Europe’s General Data Protection Regulation (GDPR). Chinese lawmakers have established reasonable privacy-related obligations for businesses that collect and handle data.
Several statutes in Chinese law guarantee the right to privacy: the Cybersecurity Law, the Consumer Protection Law, and the Criminal Law. Accompanying regulations and standards, including the Personal Data Security Specification (PDSS), outline detailed requirements for collecting and handling personal data.
The PDSS is the most comprehensive explanation of China’s privacy rules. It describes a system similar to Europe’s GDPR: both require businesses to obtain consent before collecting user data, and both guarantee consumers’ right to correct and erase their data.
Under both frameworks, businesses must follow the “least necessary” principle. They should collect no more data than is strictly necessary for their business functions, and store it no longer than needed to achieve the stated purpose.
The privacy loophole
Despite ramped-up implementation, there is one situation in which PDSS rules don’t apply. If consumer data is “anonymous,” it is not considered personal data, and consequently is not subject to privacy regulations. The EU makes a similar exception to its privacy rules for anonymous data.
Both regimes require “full anonymity,” which involves more than removing names. If people can be identified from a processed data set, whether alone or in combination with other databases, then the data is not considered anonymous.
This requirement is where the trouble begins. Neither framework explains how to reach full anonymization, so interpretation falls to courts and regulators.
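To see why removing names alone falls short of the “in combination with other databases” test, consider a minimal sketch in Python. Everything in it is made up for illustration: the purchase log, the public directory, and the choice of gender, birth year, and area as linking fields are assumptions, not data or methods from any company or court case mentioned here.

```python
# Hypothetical "anonymized" purchase log: names removed, other fields kept.
purchases = [
    {"gender": "F", "birth_year": 1988, "area": "Hangzhou", "item": "insulin pump"},
    {"gender": "M", "birth_year": 1975, "area": "Hefei", "item": "fishing rod"},
    {"gender": "F", "birth_year": 1992, "area": "Hangzhou", "item": "textbook"},
]

# Hypothetical second database that still carries names.
directory = [
    {"name": "Resident A", "gender": "F", "birth_year": 1988, "area": "Hangzhou"},
    {"name": "Resident B", "gender": "M", "birth_year": 1975, "area": "Hefei"},
    {"name": "Resident C", "gender": "M", "birth_year": 1975, "area": "Hefei"},
]

# Fields shared by both databases (an assumption for this sketch).
LINK_FIELDS = ("gender", "birth_year", "area")


def link(records, reference, keys=LINK_FIELDS):
    """Map record index -> name when exactly one reference person matches."""
    index = {}
    for person in reference:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = {}
    for i, record in enumerate(records):
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # an unambiguous match re-identifies the record
            matches[i] = names[0]
    return matches


reidentified = link(purchases, directory)
print(reidentified)
```

With these invented records, only the first purchase is tied to a single named person. The second matches two people and the third matches nobody, so both stay ambiguous. A data set is only anonymous in the strict sense if no record can be singled out this way.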
In Europe, the definition of anonymization is rather narrow. In 2018, the Danish data protection agency, interpreting anonymity under GDPR, concluded that deleting names associated with taxi trips was not anonymization.
But in China, interpreting “full anonymization” is a different story.
Chinese courts are worrisomely gullible about claims of anonymization, a recent court judgment suggests.
Alibaba’s Taobao sells marketing analytics products to Taobao merchants to help them improve their business strategy. Anhui-based Meijing Information Technology buys that data from merchants who originally purchased it from Taobao, then uses it to sell cheaper, competing products.
In 2017, the e-commerce giant sued Meijing for unfair competition. In its defense, Meijing argued that the data in question was “personal data” belonging to Taobao’s users, and not to Taobao.
The court sided with Taobao, distinguishing between personal data and “big data,” which results from aggregating large amounts of personal data. This aggregated personal data is Taobao’s property, the court ruled.
A year later, Meijing applied for retrial, claiming that Taobao’s collection of personal data did not comply with privacy laws. In 2018, a second court judgment by the Intermediate People’s Court of Hangzhou, Alibaba’s home city, upheld the previous decision.
The Hangzhou court ruled that the user information Taobao collects is not personal data, because it “cannot be used to identify the personal identity of individuals, alone or in combination with other data.” The court recognized that Taobao collects “behavioral traces of user browsing, searching, purchases, transactions, as well as label data such as their gender, occupation, area and personal preferences.”
In 2019, a third judgment by Zhejiang Higher People’s Court upheld this interpretation, setting precedent for future rulings on data collection.
A tall order
The work of researchers around the world suggests it is far from certain that such data cannot lead to re-identification. A group of researchers from the University of Louvain and Imperial College London warned that anonymized datasets can often be reverse-engineered to identify individuals. Their paper, published in Nature Communications in 2019, estimated that 99.98 percent of Americans could be correctly re-identified in any dataset using just 15 characteristics, including age, gender, and marital status.
Taobao’s data sets are much richer and, by extension, almost certainly not fully anonymous.
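The intuition behind such findings is that every extra attribute splits people into smaller groups until many are alone in theirs. The short Python sketch below shows that effect on eight invented records. It is not the researchers’ statistical model; the records, the attribute order, and the helper `unique_fraction` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records: (gender, age band, area, occupation).
people = [
    ("F", "20s", "Hangzhou", "teacher"),
    ("F", "20s", "Hangzhou", "nurse"),
    ("F", "30s", "Hangzhou", "teacher"),
    ("M", "20s", "Hefei", "driver"),
    ("M", "20s", "Hefei", "driver"),
    ("M", "30s", "Hefei", "engineer"),
    ("F", "30s", "Hefei", "engineer"),
    ("M", "30s", "Hangzhou", "teacher"),
]


def unique_fraction(rows, n_attributes):
    """Share of rows whose first n_attributes values occur exactly once."""
    keys = [row[:n_attributes] for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Uniqueness when 1, 2, 3, then all 4 attributes are known.
fractions = [unique_fraction(people, n) for n in range(1, 5)]
print(fractions)
```

In this toy set nobody is unique on gender alone or on gender plus age band, but half the records are unique once area is added, and three quarters once occupation is known too.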
There is usually a trade-off between how useful a data set is and how well it protects privacy. Researchers call this the “privacy-utility trade-off.” Companies want data to be as granular as possible. But the more granular it is, the easier it is to use it to identify individuals. Real anonymization lies somewhere in the middle of this trade-off, but further from full granularity than is often thought.
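One common way to put a number on this is to coarsen the data and check the size of the smallest group of indistinguishable records, a measure known as k-anonymity. The Python sketch below does this for a handful of invented (age, area code) pairs. The records, the bucket sizes, and the function names are assumptions for illustration, not a method prescribed by the PDSS or GDPR.

```python
from collections import Counter

# Hypothetical records: (age, six-digit area code).
records = [
    (23, "310012"), (27, "310013"), (24, "310015"), (29, "310018"),
    (31, "230001"), (35, "230002"), (38, "230009"), (33, "230005"),
]


def generalize(record, age_bucket, code_digits):
    """Coarsen a record: bucket the age and truncate the area code."""
    age, code = record
    low = age // age_bucket * age_bucket
    return (f"{low}-{low + age_bucket - 1}", code[:code_digits])


def smallest_group(rows, age_bucket, code_digits):
    """Return k: the size of the smallest group of identical coarsened rows."""
    counts = Counter(generalize(r, age_bucket, code_digits) for r in rows)
    return min(counts.values())


# Full granularity: exact age and full area code.
k_fine = smallest_group(records, age_bucket=1, code_digits=6)
# Coarse: ten-year age bands and the first three digits of the code.
k_coarse = smallest_group(records, age_bucket=10, code_digits=3)
print(k_fine, k_coarse)
```

At full granularity every invented record stands alone, so k is 1. After coarsening, each record hides among four, at the cost of analysts no longer knowing exact ages or locations. That loss of detail is the price the trade-off describes.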
The Chinese courts’ broad interpretation of anonymity risks defeating the purpose of China’s privacy regulations by handing companies a loophole. It allows them to collect more data than they need and handle it sloppily with only fig-leaf anonymization.
In the event of a massive data breach, like the one at Equifax in 2017 or the one in China only five months ago, malicious actors can use stolen data to harm re-identified users.