Exploring the subtleties of data sharing

By Ian Oppermann*
Tuesday, 13 August, 2019

Frameworks for data sharing, as opposed to data release, need to be developed that preserve users’ personal information while maximising utility and benefits.

People have been actively sharing personal data through online platforms for decades. Since the beginning of the internet and the development of HTTP cookies, people have been generating data about personal interests and preferences through web browsing and online purchases.

More recently, with the rapid expansion in the number and sophistication of mobile devices, people en masse have begun sharing data about movement and service quality with network providers. The providers then optimise network performance, create location-based services and plan future network infrastructure.

Concurrently, social media has provided companies with unprecedented troves of information about locations, relationships, events, plans, personalities and purchases. The Internet of Things (IoT) is adding to this. For example, through normal use, a domestic smart light has the potential to generate data on personal habits, sleep patterns and activity. A smart lighting service provider that aggregates data from multiple homes may use this to optimise energy consumption at a neighbourhood level and will draw data on the daily lives of every person who uses the smart light service.

The three main mechanisms for data sharing — explicit, derived and inferred — each come with concerns about the degree of personal information contained within them and the obligations of the organisation that captures, uses and stores that data.

How companies use this information has also come under intense scrutiny. It was recently revealed, for example, that Cambridge Analytica used explicitly shared personal information to target political campaigning, potentially influencing the outcome of elections. At the same time, online browsing and purchasing data is being used to derive information about preferences and create personal profiles of users, while mobile network data has demonstrated it can go well beyond network optimisation to allow customer churn prediction and even infer relationships to other mobile users.

In both commercial and government examples, other concerns relate to the unanticipated fidelity of data generated, who will access it, what it will be used for, and what will happen as a consequence of its use. There are questions about the ‘use’ of data by a company or government and the ‘release’ of data to the wider world. Questions have also been raised as to whether the use of derived information to create highly targeted ‘anonymous identities’ should come with the same restrictions as use of personal information.

While these issues are yet to be fully addressed, future ‘smart services’ for homes, factories, cities and even smart governments rely on the sharing of large volumes of often personal and sensitive data between individuals and organisations, or between individuals and governments.

The ongoing benefit from sharing data more easily is the ability to improve the efficiency, quality and degree of service personalisation, as well as optimising service delivery across networks. To deliver these benefits, frameworks for data sharing, as opposed to data release, need to be created that preserve the personal information of service users while maximising the utility and benefits.

What about government?

The potential benefits that companies generate from data sharing are paralleled within government, but with different outcomes in mind and with much greater expectations for protecting individuals’ privacy and the public good.

Governments across the world are struggling to meet citizen expectations and ever-increasing demand for services and infrastructure, particularly in response to growing and ageing populations. There is a drive for easier modes of engagement with government agencies, such as a single point of entry for key data and identity authentication. There is also a need to create smarter, data-driven, personally tailored services, and to use data to underpin better policy and resource allocation.

At the same time, better services don’t always require data sharing. The challenge is not to overuse data sharing when trying to build great services, but to explore ways to both protect privacy and improve services. Sometimes, for instance, verifying a claim could replace data sharing altogether: ‘does the customer meet the means test?’ or ‘are they over the age requirement?’.
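As a rough illustration of verifying a claim rather than sharing the underlying data, the sketch below has a data custodian answer two yes/no questions without releasing the date of birth or income. The record fields, thresholds and function names are hypothetical and chosen only for this example; they do not describe any actual government system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical record held by the custodian; field names are assumptions.
    date_of_birth: date
    annual_income: int


def is_over_age(record: CustomerRecord, minimum_age: int, today: date) -> bool:
    """Answer the age claim without revealing the date of birth."""
    birthday_passed = (today.month, today.day) >= (
        record.date_of_birth.month,
        record.date_of_birth.day,
    )
    age = today.year - record.date_of_birth.year - (0 if birthday_passed else 1)
    return age >= minimum_age


def meets_means_test(record: CustomerRecord, income_threshold: int) -> bool:
    """Answer the means-test claim without revealing the income figure."""
    return record.annual_income <= income_threshold


# The requesting service only ever sees the two booleans below.
record = CustomerRecord(date_of_birth=date(1950, 6, 30), annual_income=28_000)
answers = {
    "over_age": is_over_age(record, 65, date(2019, 8, 13)),
    "meets_means_test": meets_means_test(record, 30_000),
}
```

The point of the design is that the raw record never leaves the custodian: the service receives an answer to its question and nothing more.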

Privacy concerns

Despite the potential benefits of better, more effective services underpinned by data sharing, many government data custodians are, understandably, hesitant to share data. Unvoiced concerns include uncertainty and fear about data sharing and the desire of respective agencies to control data about their own activities.

Voiced concerns focus on unintended consequences of sharing data through inappropriate use and interpretation, data quality, the possibility of unauthorised release of data in a manner that might lead to reidentification of affected individuals, and adherence to privacy legislation.

Can we share while keeping information private?

Aggregation of individual data is an approach commonly used to reduce the risk of personal information being exposed. A key challenge for data sharing is that there is currently no way to unambiguously determine whether aggregated data contains personal information, or whether multiple disaggregated datasets can be recombined or reaggregated in different compositions to identify individuals.
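To make the recombination risk concrete, here is a minimal sketch using entirely fictional records. Two aggregate releases each look harmless on their own, yet subtracting one from the other isolates a single person. The suburb names, age bands and the `users_per_suburb` helper are assumptions made up for this illustration.

```python
from collections import Counter

# Fictional unit records: (suburb, age_band, uses_service)
RECORDS = [
    ("Northside", "30-39", True),
    ("Northside", "30-39", False),
    ("Northside", "40-49", True),
    ("Northside", "90+", True),
    ("Southside", "30-39", True),
    ("Southside", "40-49", False),
]


def users_per_suburb(records):
    """Aggregate release: number of service users in each suburb."""
    return Counter(suburb for suburb, _, uses in records if uses)


# Release A covers everyone; release B covers everyone under 90.
release_a = users_per_suburb(RECORDS)
release_b = users_per_suburb([r for r in RECORDS if r[1] != "90+"])

# Differencing the two aggregates recovers a count for the 90+ group alone.
difference = {suburb: release_a[suburb] - release_b[suburb] for suburb in release_a}
```

Here `difference` shows exactly one service user aged 90+ in Northside. Anyone who knows there is only one resident of that age in the suburb has learned something about that individual, even though neither release contained a unit record.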

Concerns are also being raised by privacy advocates as data-analysing capabilities increase. When the number of data sources used to create and deliver a service or address a policy challenge swells into the hundreds or thousands, the complexity of the problem may rapidly exceed the ability of human judgement to determine whether the integrated data (or the insights generated from them) could be analysed to reidentify affected individuals.

Standard definitions for testing for personal information?

The ambiguity about the presence of personal information in sets of data highlights the limitations of the majority of existing privacy regulatory frameworks. The capacity of human judgement to appropriately apply the regulatory test to determine whether there is a ‘reasonable’ ability to reidentify individuals from datasets is increasingly limited as those datasets grow in complexity and size.

Developing standards around what constitutes ‘de-identified’ data (or, as it is referred to in the European Union and some other jurisdictions, ‘anonymised data’) would help address the challenges of dealing with privacy. In all parts of the world, there are currently no objective quantitative measures and only high-level normative guidance to determine when data about individuals is de-identified. This leaves organisations to assess what de-identified means on a case-by-case basis, looking at different datasets and how those datasets might reasonably be used or combined with other data.

Technology can potentially play a role in addressing this challenge. However, agreeing and then communicating what an acceptable degree of anonymisation is, and how to achieve it in quantitative terms, would also greatly improve data sharing.
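As one example of what a quantitative measure could look like, the sketch below computes k-anonymity, a measure from the privacy research literature: the size of the smallest group of records that share the same combination of indirectly identifying attributes. This is not a measure the article proposes or endorses, and k-anonymity has known weaknesses; the column names and sample rows are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for the given quasi-identifier columns (0 for an empty dataset)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Fictional dataset; the choice of quasi-identifiers is an assumption.
rows = [
    {"postcode": "2000", "age_band": "30-39", "condition": "A"},
    {"postcode": "2000", "age_band": "30-39", "condition": "B"},
    {"postcode": "2000", "age_band": "40-49", "condition": "A"},
    {"postcode": "2010", "age_band": "30-39", "condition": "C"},
    {"postcode": "2010", "age_band": "30-39", "condition": "A"},
]

k_fine = k_anonymity(rows, ["postcode", "age_band"])  # one person is unique
k_coarse = k_anonymity(rows, ["postcode"])  # coarser grouping, larger k
```

With both attributes, one record stands alone, so k is 1; dropping the age band raises k to 2. An agreed threshold on a number like this is the kind of objective test that would let organisations say, in advance, what counts as an acceptable degree of anonymisation.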

What if I just give my consent?

Consent from individuals to use and share data is an important mechanism in building trust in the design, delivery and evaluation of services. Consent creates awareness of intended use, and issues of unintended consequences may be addressed as part of the consent process.

In a personal information context, obtaining the genuine consent of an individual can allow datasets containing personal information to be used in accordance with the terms of that consent.

The Office of the Australian Information Commissioner has issued guidance on consent to help organisations interpret the meaning of this term in the context of the Privacy Act 1988. The guidance establishes that the four key elements for consent are:

The individual is adequately informed before giving consent;
The individual gives consent voluntarily;
The consent is current and specific; and
The individual has the capacity to understand and communicate their consent.

Genuine consent does not need to be expressly given and may be implied by the circumstances, and generally does not require an affirmative action by an individual (such as responding to ‘tick the box’, clicking through via ‘I agree’ or providing a signature), provided that the consent satisfies these conditions.

By contrast, through the GDPR the European Union has introduced an additional requirement that consent be unambiguous. This has generally been interpreted as requiring consent to be signified by an affirmative action of the user. Because of the emphasis placed on genuine consent in the GDPR, significant consideration and effort are involved in obtaining and managing consent. In particular, there is an emphasis on demonstrating that such consent is both genuine and fully informed. This reflects best practice and should also be adopted in Australia when dealing with datasets that potentially contain personal information.

Do we need data sharing at all?

Data sharing clearly has benefits, particularly for research, improved policy outcomes and service delivery. But there are many different ways to share data, ranging from unit record information to basic insights, verifiable claims or non-personal data, and everything in between. In government we need to explore all types of data sharing and match each method to the outcome it serves. To maintain public trust in our approach, we should share data only in accordance with our accountabilities, public expectations and the public good.

*Dr Ian Oppermann is CEO of the NSW Government’s Data Analytics Centre.
