Boring or not, data is the key to AI success
Like a journey to the South Pole, the artificial intelligence (AI) hype has been going on for so long that it feels like we’re trudging across the top of a very flat-topped mountain. Like polar explorers, many people have become a little numb to it all. So it’s frightening to hear analysts predict we are still several years from peak AI hype, let alone from widespread adoption of AI. On second thought, maybe this timeline is not so surprising when you consider the challenges surrounding the very foundation of AI: data.
While AI has been discussed, and has evolved, over the last 20 years or more, the most recent hype stems from the convergence of available algorithms, huge datasets and the compute infrastructure needed to run them.
Hype versus reality notwithstanding, the rapid shift of AI and cognitive technologies from the realm of science fiction to practical application has been something of a revelation, with a growing number of organisations claiming successes in applying AI across a range of domains. According to a Harvard Business Review article:
“Applications of artificial intelligence to the public sector are broad and growing, with early experiments taking place around the world. In addition to education, public servants are using AI to help them make welfare payments and immigration decisions, detect fraud, plan new infrastructure projects, answer citizen queries, adjudicate bail hearings, triage health care cases, and establish drone paths.”
Even the New Zealand Government, for example, is planning to apply intelligent technology to support its social workers, on the basis of AI successes in the health and justice sectors.
The call for caution
At the same time as AI successes are being claimed, there are growing calls for caution in the application of predictive analytics and artificial intelligence, on the basis that bias can come from the datasets used to train machine learning solutions, and that there is a general lack of visibility of how algorithms are constructed and operate in the real world.
For example, the Australian Defence Science and Technology Group (DSTG) recently called for applications for research into the trustworthiness of AI and machine learning solutions, suggesting that there are some potentially significant limitations and weaknesses which need to be surfaced and addressed.
Aside from the ethical and political risks associated with deploying AI and autonomous solutions, DSTG’s concerns make it important to evaluate the technological risks as well. These could inhibit the adoption and usefulness of AI solutions, or even produce unintended and potentially drastic consequences arising from bias in the data used to train the algorithms, or in the algorithms themselves.
The imperative of data governance
For many public servants though, there is a vast gulf between the hype surrounding AI and their ability to simply find, use and collaborate on basic information within their enterprises to conduct their business. Recent reviews and audit reports have highlighted significant shortcomings in basic information management and recordkeeping within a number of agencies, for example, the Department of Immigration and Border Protection.
In light of this, one of the keys to good governance of AI solutions is, unsurprisingly, sound information management and governance practices, particularly with respect to data.
Carlton Sapp, a Gartner analyst specialising in AI and machine learning, points this out in a recent article in which he suggests that the real power of AI, and indeed the next AI revolution, will be in the interconnectedness of multiple AI technologies.
For example, in a public sector context, in which governance and accountability are foundation principles, decisions made by a human informed by, or in conjunction with, an AI solution, or even decisions made by an AI solution alone, will need to be recorded and retained, with the basis and evidence for each decision able to be faithfully retained or reconstructed, just as they are now for fully human decision-making. This will generate an exponential explosion in the already burgeoning data management challenges faced by government.
Government agencies will need to be able to answer forensic questions posed by regulatory or compliance authorities such as courts, tribunals, inspectors-general or ombudsmen along the lines of ‘what version of which algorithm, and what data, was used to make that decision?’ and ‘on which dataset was that AI solution trained, over what period and what were the results and refinements to the algorithm?’. The data platform behind AI will not only need to generate decisions or recommendations, but capture, retain and make discoverable, within security and privacy constraints, the reasons and supporting evidence for decisions.
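The forensic questions above can only be answered if each decision is captured as a structured, tamper-evident record at the time it is made. The following sketch shows one possible shape for such a record; the field names and schema are illustrative assumptions, not any agency’s actual standard.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionRecord:
    """One auditable decision made with, or by, an AI solution."""
    decision_id: str
    algorithm_name: str        # which algorithm produced the decision
    algorithm_version: str     # exact version, for 'what version?' questions
    training_dataset_id: str   # which dataset the model was trained on
    input_hash: str            # fingerprint of the input data used
    outcome: str               # the decision or recommendation itself
    decided_at: str            # ISO-8601 timestamp of the decision

def record_decision(decision_id, algorithm_name, algorithm_version,
                    training_dataset_id, input_data, outcome):
    """Build an immutable provenance record for a single decision.

    Hashing the input (rather than storing it verbatim) lets the record
    prove *which* data was used without duplicating sensitive content,
    helping stay within security and privacy constraints.
    """
    canonical = json.dumps(input_data, sort_keys=True).encode("utf-8")
    return DecisionRecord(
        decision_id=decision_id,
        algorithm_name=algorithm_name,
        algorithm_version=algorithm_version,
        training_dataset_id=training_dataset_id,
        input_hash=hashlib.sha256(canonical).hexdigest(),
        outcome=outcome,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )

record = record_decision(
    "D-2018-0042", "welfare-triage-model", "2.3.1",
    "train-2017Q4", {"applicant_id": 117, "income": 41000}, "approve")
```

A real implementation would also persist these records to write-once storage and link them to the retained training datasets; the point here is only that version, data and outcome are captured together at decision time.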
In addition, as AI and machine learning become a more integral foundation capability in agencies’ digital transformation efforts, those organisations will need an efficient data management environment that is secure, powerful and flexible enough to meet existing, emerging and as-yet-unknown needs. Agencies have had enough difficulty managing versions of Word documents and finding the authoritative, approved version, let alone managing versions of decision algorithms.
This means that organisations pursuing success via AI will need an appropriate data strategy and tools to support it. And with data, as with any other business process, it’s all about efficiency.
Putting data at the heart of AI strategy
Indeed, according to Gartner’s Sapp, data drives AI: “Data is still the nucleus that enables successful AI solutions,” he said, “and it remains the most important driver.”
The massive datasets that must be provisioned for AI tools to learn from are likely to demand far greater data manipulation effort, particularly in enriching, harmonising and aggregating the data, so that there is confidence in its provenance and pedigree before an algorithm learns from it.
Key to success in the AI and machine learning world will be ensuring that the organisation has not only a data-driven strategy but also an implementation platform that lets it integrate and govern multiple disparate datasets, whether they are inputs to or outputs from machine learning solutions, with speed and agility, and with full traceability and accountability for how the original data was provisioned and manipulated.
Organisations will no longer be able to afford the inefficient data wrangling methods associated with data warehouses and data lakes. Based on transformation successes in a growing number of public and private sector agencies, the rise of AI will see demand grow for the increasingly popular ‘data hub’ architecture. Data hubs have been adopted globally, driving fundamental transformations in industries ranging from banking to manufacturing to defence and intelligence. A data hub architecture is a central hub, or data layer, capable of aggregating and harmonising enterprise datasets, whether they sit in legacy transactional systems or in newer AI and cognitive solutions. Data hubs can also combine internal enterprise data with external data, and provision huge datasets to AI and machine learning tools in real time.
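The harmonisation step a data hub performs can be sketched very simply: records arriving from different source systems are mapped onto one canonical schema, tagged with their lineage, so downstream AI tools see a single consistent dataset. The source names and field mappings below are hypothetical, purely for illustration.

```python
# Per-source mappings from source-specific field names to the hub's
# canonical names. These sources and fields are illustrative assumptions.
FIELD_MAPPINGS = {
    "legacy_hr": {"emp_no": "person_id", "surname": "family_name"},
    "cloud_crm": {"contactId": "person_id", "lastName": "family_name"},
}

def harmonise(source, record):
    """Rename source-specific fields to the hub's canonical names,
    tagging each record with its provenance for traceability."""
    mapping = FIELD_MAPPINGS[source]
    out = {mapping.get(key, key): value for key, value in record.items()}
    out["_source"] = source  # retain lineage: which system this came from
    return out

# Two records describing the same person, arriving in different shapes,
# land in the hub under one schema.
hub = [
    harmonise("legacy_hr", {"emp_no": 7, "surname": "Ng"}),
    harmonise("cloud_crm", {"contactId": 7, "lastName": "Ng"}),
]
```

A production hub would of course add matching, deduplication, security and real-time provisioning on top, but the core idea is this mapping layer sitting between disparate sources and the consumers of the data.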
Doing more with less
In adopting a data hub architecture to service an AI strategy, organisations will need to ensure that they can meet not only the usual enterprise performance, governance, audit and security requirements, but also that they can do so efficiently, without the small armies of technologists who were required to manage the data warehouses and transactional systems of old.
And they will, of course, need to continue to be able to meet government accountability and recordkeeping standards. In years to come, it will be interesting indeed to read the inevitable auditor-general’s reports into government agencies’ adoption and implementation of AI and machine learning tools across the public sector.