Holding AI accountable in a digital world

There’s no question that Artificial Intelligence (AI) is quickly becoming more prevalent in organisations today. As a result of the burgeoning adoption, AI systems are becoming more complex, and the reasoning behind AI decisions more intricate and less transparent. Machine learning (ML) in particular allows AI systems to develop new rule sets that become increasingly complex over time.

Why is this an issue? In many circumstances, it probably isn’t – if I get into my driverless car, arrive at my destination safe and on time, does it really matter which road it took to get there?

However, an oft-quoted parable of a military neural network misidentifying cars as tanks (hence, legitimate military targets) illustrates the potential consequences of AI systems drawing faulty conclusions without appropriate safeguards.

As members of a society, we all have some level of accountability for our decisions. There are stakeholders we report to, colleagues or employees we need to appease, or maybe we are subject to public scrutiny. Every day, we are asked to justify our choices. But shifting decision-making to ML systems does not remove the need for accountability. In fact, the complexity of such systems means it’s more important than ever to be able to review and assess those decisions, as required.

In evaluating ML decision-making, there are two broad parameters to consider – “when” an assessment can be done; and “why” you would undertake such a review. These in turn influence “what” types of systems are candidates for different forms of review. Each is discussed in turn below.

“When” is framed in terms of intervention points – at what point in the system or decision life cycle should a review be undertaken? All systems should undergo appropriate testing prior to release. However, practical constraints affect when some decisions can be assessed. Real-time or time-critical systems can often only be reviewed after the fact. Automated vehicles are an example of a real-time system: most of the “micro” decisions made by the vehicle cannot be critiqued in real time. But the driver has the option to disengage the auto-pilot if necessary – the ultimate intervention!

But not all ML decisions need to be enacted in real time; and where decisions have a humanitarian or policy impact, human validation will be appropriate. For example, a law firm that uses AI to do the initial drafting of legal documents will still use a real person to review and finalise those documents. Why? Because, even if the AI-drafted documents are theoretically “right”, they have to stand up to human scrutiny in a court of law.

And in fact, ML systems can be designed to refer decisions to humans for review on a case-by-case basis. Such systems weigh up volumes of input data and formulate an aggregate conclusion. Where confidence in the assessment is high (“98% likely to be valid” or “no, you should not transfer funds to the Nigerian prince who says you won the lottery”), decisions can be enacted without intervention. But where confidence falls below a threshold, the ML system can automatically refer the assessment for human review.
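This case-by-case referral can be sketched in a few lines of code. The threshold value, function name and record fields below are illustrative assumptions, not a real system's API:

```python
# Sketch of confidence-based routing: high-confidence decisions are
# enacted automatically; low-confidence cases are referred to a human.
AUTO_APPROVE_THRESHOLD = 0.95  # assumed cut-off; tuned per domain in practice

def route_decision(case_id: str, confidence: float) -> str:
    """Return where the decision goes: 'auto' or 'human_review'."""
    if confidence >= AUTO_APPROVE_THRESHOLD:
        return "auto"          # e.g. "98% likely to be valid" -> enact
    return "human_review"      # below threshold -> refer to a person

# Usage: a 98%-confidence assessment is enacted; a 60% one is referred.
print(route_decision("case-001", 0.98))   # auto
print(route_decision("case-002", 0.60))   # human_review
```

The real design question is where to set the threshold: too high and humans are swamped with referrals; too low and dubious decisions are enacted unreviewed.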

“Why” is a matter of what outcomes an assessment of ML decision-making is likely to reveal. Often, a review is only undertaken when something goes wrong – in which case, the primary objective is to prevent future occurrences, or to override specific decisions. In some cases, liability will be a consideration: if the rule set of the ML system is proven to be “reasonable”, liability may be assigned to external factors.

One consideration in undertaking an assessment is possible bias in the ML system’s rule set. If the data underpinning the rule set is statistically abnormal, it may lead the system to draw erroneous conclusions. Data bias is a particular risk when establishing a new machine learning system, as the developers may not be aware of biases in their training data.

In the example of the military neural network identifying cars as tanks, it’s speculated that the system may have noted that all its training images of tanks were taken on sunny days, and used the lighting as a determinant in identifying what was a tank.
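A simple audit of training metadata can surface this kind of confound before it is baked into the rule set. The sketch below is a minimal illustration with hypothetical field names and data, not a real auditing tool:

```python
from collections import Counter

def attribute_by_label(examples, label_key, attr_key):
    """For each label, tally a nuisance attribute (e.g. weather at capture
    time). If one attribute value dominates a label, it may act as a
    'shortcut' the model learns instead of the real feature."""
    dist = {}
    for ex in examples:
        dist.setdefault(ex[label_key], Counter())[ex[attr_key]] += 1
    return dist

# Hypothetical training metadata echoing the tank parable:
training = [
    {"label": "tank", "weather": "sunny"},
    {"label": "tank", "weather": "sunny"},
    {"label": "car",  "weather": "overcast"},
    {"label": "car",  "weather": "sunny"},
]

for label, counts in attribute_by_label(training, "label", "weather").items():
    top_attr, top_n = counts.most_common(1)[0]
    if top_n == sum(counts.values()):
        print(f"warning: every '{label}' example has weather={top_attr}")
```

Run against the data above, this flags that every “tank” image is sunny – exactly the confound the parable describes.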

In almost all cases, the primary objective of an evaluation should be to improve the rule set of the ML system. With each review and each improvement, the system gets better, which hopefully means less need to critique future decisions. This is a natural extension of the self-learning that ML systems conduct themselves, as their data set and experience expand over time.

Of course, any review of ML logic requires information, and that means retaining data or logs for subsequent review. Today, reviewing ML logic execution can be a specialised discipline. There is an emerging requirement to record and present decision analysis information in a form more easily accessible to humans.

Ensuring the level of transparency required for such review may require changes to ML systems — changes to record and store additional information, changes to present the logic in a human readable form, and changes to highlight the most significant factors in decision-making.

A recent case in the US highlights just how damaging the effects of unmonitored, undisciplined ML systems can be on individuals. A state government department moved from human assessors determining the hours of care required by disabled people to assessment by an automated system. In theory, this should have worked well; but due to errors in the underlying algorithms, it emerged over time that wildly different scores were being calculated for the same people, assessed under similar conditions.

People and families were not receiving the hours of care they required, and the case ended up in court. What is alarming is that, until a lawsuit was filed, no one could articulate how the system worked – and it proved particularly difficult to understand why it had made its particular recommendations. More bizarrely, there appeared to be no process in place for human review, or to override decisions made by the computer, leaving those affected with no recourse.

As the use of ML systems becomes more prevalent, their impact will become more widespread. In the same way that individuals and organisations are held accountable for their decisions, we need that same level of transparency in ML decision making.

ML has tremendous potential to sift through vast amounts of information very quickly, and enact decisions based on that data. We provide the data to those systems, in one form or another. We set the objectives for those systems, to act on our behalf. And we have an obligation to correct course, when the systems yield an unexpected result.


Mark Wann is a Senior Consultant within the DXC ANZ Consulting team. He started his IT career 30 years ago, in the manufacturing industry. Since then, he has helped enterprise clients across mining, financial services and government industry sectors. He has undertaken a range of functions for clients, including application development and support, systems integration and transition, and service management roles.



  1. Robert Aroney says:

    Hi Mark,
    I think it is great we are thinking seriously about these questions. As you point out, the impact of ML will only become more widespread.
    Something also worth considering is the inverse of the “statistically abnormal” rule set / training data. Decisions may be made that are apparently “correct” for the “normal” case, but when applied to someone or some situation that is a true outlier from the norm then it may in fact be the wrong decision.
    It seems to me that fighting for transparency in ML decision making could also help with this. Combined with clearly defined (and easily accessible) ways to challenge decisions in particular scenarios might even make it easier for ML decision making to be improved as a result — if they can be made to recognise true “outlier” cases so as to apply more appropriate rule sets, or flag them for manual review.

  2. Martin Reilly says:

    Many people seem very happy to assume that an algorithm can come up with a better answer than a qualified human. It is a bias that may be based simply on our tendency to be lazy. We have all experienced the checkout assistant who presents a bill with a total that is obviously wrong, but is happy to repeat what appears on the till and act bemused when it is challenged. We need to apply critical thinking to see beyond the latest hype and consider the real advantages and genuine risks around ML. There are plenty of cases now of biased data sets being used without thought to drive machine learning: a lack of ethnic diversity in the faces used to train a face recognition system, or sentencing recommendations that have absorbed the racial and social prejudices of the judges whose actual recommendations were used as the learning set. Removing names and addresses from the case notes hides the subtle cues of racial or social identity that affected the decisions of the real judges and were then replicated in an ML algorithm that was rather too effective at reproducing human behaviours – including real judges’ susceptibility to subconscious prejudice. So our machines can help us make better recommendations with less prejudice if we are careful and critical about the data and learning methods, or they can automate the application of disadvantage if we are not.

    • Mark Wann says:

      Thanks for the well thought out comment, Martin. I agree with your thoughts. Machine learning is not the same as human learning, even if/when it leads to similar outcomes. Assuming otherwise is anthropomorphising the machine. 🙂 And it is absolutely subject to any biases (intentional or otherwise) in its learning data set.
