Data61, NSW Govt develop tool to boost safe data sharing
CSIRO’s Data61, in collaboration with the NSW Government, the Australian Computer Society (ACS) and other groups, has developed a data privacy tool to help key datasets, such as those tracking COVID-19, be shared publicly with an extra layer of protection for sensitive personal information. The tool assesses the risk to individuals’ data within any dataset, so that targeted and effective protection mechanisms can be put in place.
Known as the Personal Information Factor (PIF) tool, the software uses a data analytics algorithm to assess the risk that sensitive, de-identified or personal information within a dataset could be re-identified and matched to the individual it belongs to. Such assessments have traditionally been performed by leading data and privacy experts, who now rely on computer models to validate their work.
Since 2020, CSIRO has explored ways of enhancing the tool in collaboration with the Cyber Security Cooperative Research Centre (CSCRC). Since March 2020, the NSW Government has been using an early version of the tool to analyse datasets tracking the spread of COVID-19 in the state and to apply appropriate levels of protection before releasing the data as open data.
The NSW Government’s Chief Data Scientist, Dr Ian Oppermann, said the PIF tool was developed through a collaborative process involving many state, Commonwealth and industry colleagues. He added that the tool helps users analyse the security and privacy risks of releasing de-identified datasets of people infected with COVID-19 in NSW, allowing them to minimise the re-identification risk before the data is released to the public.
Dr Oppermann attributed the rising public awareness of the need for data privacy to COVID-19.
“Given the very strong community interest in growing COVID-19 cases, we needed to release critical and timely information at a fine-grained level detailing when and where COVID-19 cases were identified. We wanted the data to be as detailed and granular as possible, but we also needed to protect the privacy and identity of the individuals associated with those datasets,” Dr Oppermann said.
Project lead researcher Dr Sushmita Ruj said new methods of data de-identification can provide enhanced levels of data privacy and protect data involving personal information.
“Having studied other privacy metrics, the team concluded a one-size-fits-all approach to estimating the re-identification risks of unique applications of data can be significantly improved upon. The evolving approach to a PIF takes a tailored approach to each dataset by considering various attack scenarios used to de-identify information. The tool then assigns a PIF score to each set,” Dr Ruj said.
If a dataset’s PIF score exceeds a desired threshold, the program recommends how to design a more secure framework so that the dataset can be certified as safe for release.
CSCRC Research Director Professor Helge Janicke said the PIF provides a scale on which to understand the risk.
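The article does not say how the PIF score is calculated. To illustrate the general idea of scoring a dataset’s re-identification risk and comparing it against a threshold, the sketch below swaps in a simple, well-known proxy instead: worst-case risk under a k-anonymity model, where each record’s risk is 1 divided by the number of records sharing its combination of quasi-identifiers (here, postcode and age band). This is not the PIF algorithm; the function names, field names, sample records and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical record type: field name -> value. Not the PIF tool's data model.
Record = Dict[str, str]


def max_reidentification_risk(records: List[Record], quasi_ids: Sequence[str]) -> float:
    """Worst-case re-identification risk under a simple k-anonymity model.

    Records with identical quasi-identifier values form an equivalence class.
    An attacker who knows those values can only guess among the class members,
    so a record's risk is 1 / (class size); the dataset's risk is the maximum.
    This is an illustrative proxy, NOT the PIF scoring method.
    """
    if not records:
        return 0.0
    class_sizes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return 1.0 / min(class_sizes.values())


def needs_more_protection(records: List[Record], quasi_ids: Sequence[str],
                          threshold: float = 0.5) -> bool:
    # Hypothetical threshold: flag the dataset if its worst-case risk exceeds it.
    return max_reidentification_risk(records, quasi_ids) > threshold


# Hypothetical, made-up case records for illustration only.
cases = [
    {"postcode": "2000", "age_band": "20-29"},
    {"postcode": "2000", "age_band": "20-29"},
    {"postcode": "2000", "age_band": "30-39"},  # unique combination: risk 1.0
    {"postcode": "2010", "age_band": "30-39"},
    {"postcode": "2010", "age_band": "30-39"},
]

print(max_reidentification_risk(cases, ["postcode", "age_band"]))  # 1.0
print(max_reidentification_risk(cases, ["postcode"]))              # 0.5
print(needs_more_protection(cases, ["postcode", "age_band"]))      # True
print(needs_more_protection(cases, ["postcode"]))                  # False
```

In this toy example, suppressing the age band before release brings the worst-case risk down from 1.0 to 0.5, which no longer exceeds the assumed threshold. A real assessment would weigh many more attack scenarios than this single uniqueness check.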
“Data analysis is well understood but how good the output is once shared is very difficult to understand. Hence, the metrics-based approach and analysis that underpins PIF is hugely valuable in achieving the ethical and responsible sharing of critical data, with this technology allowing data owners to fully assess the risks and residual impacts associated with data sharing,” Professor Janicke said.
The PIF tool is being used to examine other datasets and will continue to be developed by Data61 and the CSCRC, before being made available for wider public use by June 2022.