Gigaom

A Plan to Mix Privacy Into Data Mining



Photograph by Gremlin/Getty Images

If you’re in the tech industry, particularly the U.S. tech industry, then you’re probably not a big fan of European privacy officials. German data protection regulators may well feature in your nightmares.

Germany is the birthplace of data protection law, and the country applies European privacy law more strictly than any other EU state — just ask Google (GOOG) (fined), Facebook (FB) (reproved), and Microsoft (MSFT) (StreetSide erased) about their experiences.

It’s time to meet Alexander Dix, Berlin’s data protection watchdog. With the NSA scandal still raging and big data technologies raising fresh privacy worries in the commercial sector, he and I had plenty to talk about in this interview. While Dix’s stance is unsurprisingly tough, it’s not uncompromising.

The rise of big data seems to be fundamentally in conflict with user privacy. Can we have both? If we are not to stop or roll back progress, how can we nonetheless maintain privacy?

Bearing in mind that there is no such thing as absolute anonymity or absolute security, there are more-privacy-friendly solutions and less-privacy-friendly solutions. I am not a perfectionist in that respect, but I do think there are possible ways of regulating for privacy protection in the future.

Take anonymization. It is true that anonymized data can, with a certain technical expertise and at some cost, possibly be linked to individual persons, if not now then sometime in the future. That does not make anonymization a useless process. It is still better than having outright personal data on the Internet, or pseudonymized data, which is something being discussed in Brussels.

While acknowledging that maybe intelligence services will always find a way to monitor individual behavior, we should not make their task too easy. And talking about the intelligence services and monitoring individual behavior, the basic problem is that agencies such as the NSA no longer use targeted espionage; they are collecting everything at random.

They have stopped monitoring in a targeted fashion. If they did that, it would be much more acceptable. They are trying to “master the internet” by registering almost every move. That exceeds every legal limit we—at least in Europe and Germany in particular—would think is necessary in a democratic society.

So, what’s the solution – is it mainly the job of policy, or should people protect their privacy through technological means?

We need a mix. There is no silver bullet now.

We need, on the one hand, increased international agreement on what should be the limits on monitoring Internet traffic and people’s behavior. There should be an agreement on what kind of data processing should not be allowed, at any rate. I know it’s difficult, but that’s the first thing we need. The former president of the German secret service has in fact called for a code of practice between intelligence services on what is allowed and what should be forbidden.

On a secondary regulatory level, one should work for international guarantees of privacy, such as in the UN Covenant on Civil and Political Rights. It’s broad, but it would be an important step. Then there are some steps on the national level. Secondly, we need technical solutions. We need to empower the individual user to do what he can to protect his own communications. So there’s no one ideal solution; we need to have both.

That is also something for the governments to support and finance—business models or research, for instance, to improve the tools for self-protection for the internet user, and possibly to develop a kind of European cloud model, which is less [vulnerable] to detection by the intelligence services. There could also be a competitive advantage for European businesses.

A lot of people, particularly in the tech industry, say privacy is essentially dead in this age, so forget about it. They’re pretty much on the opposite end of this spectrum to you—what’s your response?

There will be no innovation feasible without sensible privacy protection. Innovation requires privacy protection because a considerable number of people—at least, a critical market share—will refuse to accept and adopt innovative technologies if they don’t take into account privacy protection from the start.

Data protection regulation—in Europe, at least—places a lot of weight on the collection of personal data, not just what happens to it afterward. How do you see the balance in the context of big data technologies, which scoop data up en masse?

The collection is crucial. Any collection of personal data will attract interest, both legitimate and illegitimate. It needs to be protected against attacks from the outside. It is even in the economic interest of organizations to limit the collection of personal data because it entails costs, at least, and invades to a certain extent the personal privacy of the data subjects.

And I would dispute whether big data needs to be big personal data. In the research field, very often personal data are not necessary. Basically I take the view that you need to regulate the collection of data because if you don’t do that, you start building fences around your data pools and try to defend them—and this is often too late. These fences will get holes and will be overcome by technology. It’s always necessary to first put to yourself the question: To what extent do you need personal data at all?

You mentioned pseudonymization earlier. Can that help?

Pseudonymization is a method to reduce the personalization of data. To give an example, you could have a set of personal data which you have collected as a scientist, for instance, and you want to use this data over a longer period of time. You want to ask the same people again after three years.

You then code these data, give each dataset a number, and maybe have a reference list where the code and the name are linked, and give this reference list to a trustee. You can then process this pseudonymized data, and it may be regulated less strictly while still being personal data. It may be that certain notification requirements need not be followed.
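The procedure Dix describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the interview or from German law: the function names `pseudonymize` and `reidentify`, the `name` and `code` fields, and the sample survey records are all hypothetical. Each record is given a random code, the name is stripped out, and the link between code and name goes into a separate reference list that would be handed to a trustee.

```python
import secrets


def pseudonymize(records, name_field="name"):
    """Split records into pseudonymized data and a separate reference list.

    Hypothetical helper: the names and fields here are assumptions made
    for illustration only.
    """
    reference_list = {}  # code -> name; this is what the trustee would hold
    codes_by_name = {}   # lets a returning participant keep the same code
    pseudonymized = []

    for record in records:
        name = record[name_field]
        code = codes_by_name.get(name)
        if code is None:
            # A random code reveals nothing about the name it stands for.
            code = secrets.token_hex(8)
            codes_by_name[name] = code
            reference_list[code] = name

        # Copy every field except the name, then attach the code.
        stripped = {k: v for k, v in record.items() if k != name_field}
        stripped["code"] = code
        pseudonymized.append(stripped)

    return pseudonymized, reference_list


def reidentify(code, reference_list):
    """Look up a name from a code; only the holder of the list can do this."""
    return reference_list[code]


# Invented sample data: one participant answers twice, three years apart.
survey = [
    {"name": "Alice Example", "answer": "yes"},
    {"name": "Bob Example", "answer": "no"},
    {"name": "Alice Example", "answer": "maybe"},
]
data, trustee_list = pseudonymize(survey)
```

The researcher works only with `data`, while `trustee_list` is kept elsewhere. Because the link can still be restored, the result remains personal data, which is the distinction Dix draws between pseudonymized and anonymized records.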

It’s a very important tool of systemic data protection: a way to reduce the degree of personalization and thereby implement data protection without having completely anonymized data. It is recognized in German law and we have been trying to transfer this idea to the European level. And the European Commission and even the European Parliament (Jan Philip Albrecht, the [member of the European Parliament] who is, of course, also German) have taken this proposal on board.

However, there is a danger already, again, because industry—some representatives of private industry—are trying to use the concept of pseudonymous data to restrict the scope of data protection legislation. They argue that, once you have pseudonymous data, it’s like anonymous data.

When you say “industry”, do you mean U.S. companies?

I’ve been speaking to German industry.

A lot of people say transparency is crucial—so, for example, if your data is captured or gone through by the secret services, they have to tell you a certain amount of time after the fact. Is transparency the key?

Well, the German secret service is required, as a principle, to inform data subjects once their monitoring has finished and no result has come out. Unless they state this is a matter of national security, which they often do. Transparency is important but again, it’s not the silver bullet.

Do you think there’s actually a way to satisfy everyone here?

I think in the end what will be necessary is a compromise. There will be no complete satisfaction on each side. I still believe a compromise is possible which will lead to a regulation which is appropriate to the 21st century, that could at least last 10 to 15 years. It’s too optimistic to say the next 30 years (which is what Reding is aiming for with her proposals). We need modernization, there is no doubt about that.

If you had to break it down to the essential principles that should be applied in new privacy regulation, what are they?

One basic principle is certainly that surveillance of telecommunications and of conversation in whatever channel should be the exception, not the rule. It seems to me that GCHQ, as well as the NSA, do consider it as the rule by now.

Second—and this applies to private data controllers as well—is data minimization. The collection of data needs to be restricted. Privacy by design in developing software and products should be key. And transparency.

These are the same principles that underpin current European data protection law, aren’t they? Are they enough to deal with the coming challenges?

They are still valid. What is necessary is to actually give more detail to these principles. What does it mean, “privacy by design”? How could developers and manufacturers be brought to adopt these principles?

Finally, what’s your take on the so-called “right to be forgotten” that the European Commission wants to bring in?

This is an attempt by the European Commission to implement the right to erasure, to delete personal data in cyberspace. I think this is a legitimate goal—if someone has the right to have his personal data erased (as is the case in Germany), this should not stop online.

The technical problem is that the Internet does not forget. Therefore it is an idealistic label the Commission has put on this. The phrase “right to be forgotten” has triggered an almost philosophical discussion, especially with experts from the U.S., because you have a conflict with freedom of expression, although the European Commission had freedom of expression (guaranteed) in the draft regulation.

In the end, we should have a new Internet protocol which allows for deleting personal data. Look at the service Snapchat. People want the possibility to send photographs which destroy themselves. There is a basic human need that you don’t want to leave traces all the time on the net.


Meyer is a senior writer for Gigaom.
