Putting the 'big' in big data
Data has always been the all-important foundation for the wisdom pyramid: first comes data, then information, then knowledge and finally wisdom. It is that pinnacle of wisdom that is the Holy Grail of big data initiatives.
Government has always had access to a lot of information – tax files, ratepayer records, electoral rolls, cadastral databases. Growth in data reserves is such that between 2008 and 2012 government agencies installed an additional 93 petabytes of computer storage.
As this information has been digitised, it has been stored in large structured databases and made available for analysis. That analysis has been largely historical – often described as a look in the rear-view mirror.
The promise of big data is that its volume, variety and velocity – combined with access to relatively inexpensive computer horsepower and a raft of data analysis algorithms – will allow both historical and forward-looking predictive analysis, with significant implications for operational efficiency and evidence-based policy design at all levels of government.
A blueprint for its implementation, the Australian Public Service Big Data Strategy (www.finance.gov.au/big-data) was issued by the Department of Finance in August 2013.
It defines big data analysis as occurring across structured, unstructured, semi-structured and even incomplete data sourced from a variety of sources including sensors, machine logs, mobile devices, GPS systems and transactional events.
Recognising data as a national asset, and the opportunity to “realise substantial productivity and innovation gains from the use of big data”, the strategy nevertheless also recognises that big data raises new challenges with regard to privacy and security.
The Strategy is just a first line in the sand; further guidance is being developed to support public sector organisations seeking to link cross-agency data; harness third party datasets; de-identify data; release open data; and develop data retention policies with regard to cross-border data flows.
According to Gartner research vice president Doug Laney, government organisations are looking to use data to help drive economic development; anticipate, improve and expand community services; reduce the costs of government; identify and reduce fraud or compliance issues; and to monitor and improve the performance of suppliers and partners.
“Big data sources including social media and feeds from sensors, along with untapped 'dark data' that many agencies are sitting on are the fuel for these innovations,” he explains.
“The challenge, however, is that many big data uses start out as something speculative or experimental – but public sector organisations often are not set up for that, culturally and for budget reasons.”
Some are embarking on the journey, however.
David Ives is an independent consultant, currently working with the City of Gold Coast. He acknowledges that constructing the wisdom pyramid from a big data foundation is not a trivial exercise – but notes that one of the first challenges is to make clear the benefits that could accrue.
Ives believes that there is growing interest in the sorts of data that could be collected or accessed and the insight or opportunity for predictive analysis that could provide.
Local governments, in particular, have much to gain, he says, offering the example of local traffic control. Ives outlines a potential application which could see traffic light sensors, CCTV images, and mobile phone apps being used to create an information mesh that could be analysed to allow traffic flows to be optimised on the fly, reducing congestion and leading to energy savings.
While the advent of open data policies across all levels of Government has already vastly expanded the array of information sources that can be accessed, inexpensive sensors that can connect to the Internet are also of interest. Ives believes a sensor network across the water reticulation network to monitor water flow could identify potential pipe problems, allowing maintenance crews to be automatically directed to maintain a section of the network identified as being at risk, in advance of a water mains rupture.
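The kind of monitoring Ives describes can be reduced to a simple pattern: compare each new flow reading against a rolling baseline and raise an alert when it deviates sharply. The sketch below is illustrative only – the readings, window size and threshold are hypothetical, not drawn from any council system:

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, z_threshold=2.5):
    """Flag readings that deviate sharply from the recent rolling average."""
    alerts = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # A large z-score suggests the pipe section needs inspection
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_threshold:
            alerts.append(i)  # index of the suspect reading
    return alerts

# Steady flow, then a sudden drop that might indicate a rupture upstream
flow = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 34.0]
print(flag_anomalies(flow))  # → [7]
```

In a real deployment the alert would feed a work-order system that dispatches a maintenance crew to the flagged section, rather than simply printing an index.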
While the City of Gold Coast is at an early stage with its big data initiatives, Council is working on a plan to establish a centre of excellence focused on big data in local government, and for major events such as the Gold Coast 2018 Commonwealth Games.
New South Wales, meanwhile, has rolled out a predictive application in its Traffic Management Centre in association with Pegasystems which takes data from more than 20,000 traffic management devices such as traffic lights or flow control signs.
At the first sign of a problem the system can schedule preventive maintenance or divert traffic, according to Luke McCormack, Pegasystems’ APAC vice president. McCormack said that in the future it might be possible to integrate social media content – for example Twitter comments regarding traffic snarls – as long as there was some way of validating that data.
To ensure the success of any big data initiative, he said, it was necessary to have support from the top levels of management, and for the project to be seen as a business project rather than purely an IT one.
According to Evan Stubbs, chief analytics officer with big-data vendor SAS Australia-New Zealand, there is no lack of public sector intent to harness big data – but there are challenges in managing the complexity.
“It is still very much in the early stages,” he explains. “There is a lot of discussion on how to manage the information and ethically use it... the private sector was probably at the same point on this trajectory two to three years ago – but the public sector does face unique challenges,” for example with regard to citizen privacy.
However, progress is being made and, Stubbs says, “In the federal government, the ATO and Department of Human Services are leading in terms of getting a single view of the citizen”.

A key issue for the future will be ensuring a level of trust regarding big data exploitation by governments. In the post-PRISM era there is a degree of skepticism among citizens about the way their personal data and privacy are handled.
Analyst Ovum speaks of the need for big data exploitation to go hand in hand with the development of “big trust” which can be eroded by what it refers to as thoughtless data “fracking”.
Big data potential

Dirk Klein, general manager public sector with SAS, says that Government is still assessing the potential of big data and its implications for business processes and policy development. “That means the way policy development is approached needs to change and the skills need to change,” he says.
Access to skills remains an issue for both the public and private sectors. Stubbs cites a survey released last year by the Institute of Analytic Professionals of Australia, which revealed that more than 50 percent of their members had seen their salaries increase moderately or substantially “and the median salary is already twice the national median – so that's indicative of a shortage.”
Gartner has forecast that 4.4 million jobs directly related to big data will be created globally, noting that only a third of those roles could be filled given the current academic pipeline. Public sector employers will have to stand in line along with banks, resources companies and retailers to attract these scarce skills.
But Stubbs warns that big data “is not a nice to have; it's a must-have if the federal government is going to achieve its savings targets. If you get it right, it can lead to massive savings.”
More specifically, the government’s Big Data Strategy notes that “Big data analytics can be used to streamline service delivery, create opportunities for innovation, and identify new service and policy approaches as well as supporting the effective delivery of existing programs across a broad range of government operations – from the maintenance of our national infrastructure, through the enhanced delivery of health services, to reduced response times for emergency personnel.”
To liberate that value, public sector agencies need a roadmap to support their big data initiatives.
The Government’s Data Analytics Centre of Excellence (DACoE), led by the Australian Taxation Office, was announced a year ago by AGIMO to build analytics capability for the public sector by establishing a common capability framework for analytics and sharing technical knowledge, skills and tools. It will also forge relationships with universities to help influence skills development and access.
Government will also need to explore the range of private data sets to liberate the most value, argues Martin Gregory, managing director of iSpatial Asia Pacific, which supplies a range of spatial data sets to public and private sector customers.
While much of the data is stored in structured databases, Gregory notes the increasing appetite for crowdsourced data to update or enhance existing data reserves. Land Information New Zealand, for example, is augmenting its topographic data reserves with crowdsourced information to support tourist mapping.
Paul Watson, 1Spatial’s chief technology officer, notes that traditional data management practices need to be augmented with big data techniques because of the volume, variety and velocity issues.
“With increasing transparency and scrutiny of government information and increasing volumes being collected, the premium placed on completeness, currency and consistency has never been higher for government,” he says.
“Achieving this will require that we take the people element out of the data processing chain and the latency and error-proneness that comes with them. Employing automated big data techniques more universally will allow us to scale the collection, summarisation and privacy scrubbing of government data in a much more sustainable way.”
“Increasingly, automated sensor-based data collection and data grids will be harvested to feed rules-based data cleansers and data portals with data that is e-Government ready – accurate, up-to-date and safely anonymous for the public.”
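At its simplest, the rules-based privacy scrubbing Watson describes means running each record through a list of redaction rules before publication. The sketch below uses two illustrative regular-expression rules; real de-identification pipelines apply far more rigorous standards than pattern matching alone:

```python
import re

# Illustrative redaction rules only – patterns and placeholders are hypothetical
RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),   # email addresses
    (re.compile(r"\b\d{4} \d{3} \d{3}\b"), "<phone>"),     # AU-style mobile numbers
]

def scrub(record: str) -> str:
    """Apply each redaction rule in turn before a record is released."""
    for pattern, placeholder in RULES:
        record = pattern.sub(placeholder, record)
    return record

print(scrub("Lodged by jo.citizen@example.com, contact 0400 123 456"))
# → Lodged by <email>, contact <phone>
```

Because the rules live in a single table, new categories of sensitive data can be added without touching the processing chain – the automation Watson argues should replace manual scrubbing.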
Tools and techniques

The range of skills and tools for big data analysis continues to grow. Hadoop, an open source framework used to analyse unstructured data, has been a front runner, though access to skills remains challenging.
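The programming model underneath Hadoop is MapReduce: a map phase that emits key-value pairs, a shuffle that groups them by key, and a reduce phase that aggregates each group. The following is a minimal local simulation of that pattern over some hypothetical log lines – it illustrates the model, not a cluster deployment:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Emit (word, 1) pairs – what a Hadoop mapper would write out."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Sum counts per key – what a reducer sees after the shuffle/sort."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

logs = ["error timeout", "error disk full", "timeout retry"]
print(dict(reduce_phase(map_phase(logs))))
# → {'disk': 1, 'error': 2, 'full': 1, 'retry': 1, 'timeout': 2}
```

On a real cluster the map and reduce functions run in parallel across many nodes, which is what lets the same pattern scale to petabyte-sized inputs.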
But as Evan Stubbs notes, the learning curve is still steep. “There are a lot of people trying to solve problems with Hadoop that Hadoop was not designed for,” he says. “It's not just about Hadoop – it's the whole ecosystem. Two years ago Hadoop was the answer; now, it is recognised as just part of the answer.”
An ATO spokesperson says Hadoop is already earning its place within the big-data ecosystem.
“While there is still much that can be achieved with our current systems, in the longer term technologies such as Hadoop and the ecosystems are likely to play an important role,” the spokesperson explains, “and in many cases are already playing a role in the use of big data across government agencies.”
“The precise configuration of these will depend on the use cases of each agency. At this stage these technologies are developing quite rapidly and government agencies will need to take lean, agile approaches to adopting and evolving their toolkits.”
Not every big-data question has Hadoop as its answer, however. Andrew McGee, pre-sales director for HDS in ANZ, says that although Hadoop is one of the technologies that people are gravitating toward, “People shouldn’t be quite so hung up on it”.
While its open source nature meant it was developing swiftly, he said it was only one of a raft of tools and techniques required for effective big data exploitation.
Having spent the last seven years in Canberra, McGee says that the main impediments still to public sector big data initiatives remain access to skills, data scientists and “people who know how to get started.”
Eventually, he predicted, the supporting technology frameworks would mature to the point that “You won’t need a PhD to run these things.” For the present, however, McGee warns, “This is not for the faint hearted.” – Beverley Head