MyData - the Human Side of Big Data

MyData redefines the way personal data is managed and used

The rise of big data has brought personal data collections, such as customer registers, mobile communication data, and health records, to the attention of the whole business world. When viewed as big data - often called the new oil - personal data is treated as a kind of natural resource, free to be exploited. This point of view completely ignores the fact that personal data collections actually represent the actions and lives of individual people.

Types of MyData

Personal data sources are ubiquitous. Figure CC-SA-4.0 by Poikola, Kuikkaniemi, Honko.

As data scientists become more creative in developing applications driven by personal data, the dark side of big data also becomes evident. Do you know how your data is being used? Who owns your data? Is your purchase history used to charge you more money? It is time to rethink the rules by which personal data is collected and used.

MyData is an effort by the Open Knowledge Finland community to define a human-centric way to manage and process personal information. The core idea is that individuals should be in control of their own personal data. The concept is a product of collective discussion and development, and it is summarised in a report titled MyData – A Nordic Model for human-centered personal data management and processing.

In this post I’m focusing on the relationship between MyData and big data. The following quote from the report captures what is wrong with the way big data is currently used, and indicates why companies that profit from exploiting personal information should care:

The concept of Big Data emphasizes the potential of combining and analyzing large datasets from the organization’s perspective while MyData focuses on the individual’s ability to control and benefit from the value of his or her personal data. The MyData approach provides organizations with the practical means for implementing data protection and privacy in the course of big data analytics and brings individuals transparency as to how their data are being collected and processed. Without addressing the human perspective, many of the potential innovative uses of big data might become impossible if individuals perceive them as invasive, shadowy, and unacceptable.

The problems stem from the fact that the rules for managing and analysing personal information are currently unclear. Nobody reads the Terms of Service, and consumers rarely have any way to see or control who has access to their data, or how it is being used. With an increasing number of personal data misuses being reported, people are starting to lose their trust, which makes the system fragile.

As Maciej Cegłowski put it in his thought-provoking talk Haunted by Data, all it takes to bring the big data business down is one “wildly publicised failure that galvanises the popular opinion against the technology”.

Due to the unclear rules, a lot of data is also underused, as companies (especially their lawyers) want to play it safe. As a data scientist I have encountered situations where certain data cannot be used, even when it would help to offer better services for consumers. With a working MyData framework in place, consumers would have transparency into and control over the use of their personal data, and they could decide for themselves what kind of data use to allow, benefitting both themselves and the companies.

Personal data is currently implicitly used to pay for many online services. And it is often the only way to pay, as crystallised by the cyber security guru Mikko Hyppönen:

With MyData, the data sales would at least be explicit and visible, and the user could even have the option to pay with money instead of personal data.

Adopting MyData principles would thus increase transparency and consumer trust in data-driven services. Companies could use this to gain competitive advantage and develop sustainable data-driven business models. Some companies, such as Apple, are already showing signs of a healthy respect for personal information.

Personally I really want to continue both using and developing data-driven applications, but for that to be reciprocal and sustainable, the principles for handling personal information really need to be redefined.