So, What is AI?
Cross-disciplinary work often requires terminology that may not be familiar to us. For our first post, Dr Kobi Leins shares a cheat sheet on AI terminology for lawyers, policy makers, politicians and experts.
Welcome to The CAIDE Blog!
This blog is a forum for the discussions, thoughts, queries and research of the CAIDE Team, our affiliated researchers and friends. Our first post is an introduction to the world of AI by Dr Kobi Leins, a Senior Research Fellow in Digital Ethics at the School of Computing and Information Systems. All good things start with a solid foundation.
So, what is AI?
An AI Terminology Cheat Sheet for Lawyers, Policy Makers, Politicians and Experts
Dr Kobi Leins
The first challenge of working in a cross-disciplinary environment is understanding the lingo from areas outside your own. This cheat sheet provides a breakdown of the types, and the component parts, of AI, to help start more meaningful interdisciplinary conversations and to foster engagement across disciplines.
So, what is ‘Artificial Intelligence’?
Firstly, there is no single definition of AI. Every tool that includes AI is created by different people, for a different purpose, and with different decisions embedded at every step; these decisions shape the components that make up AI. AI comprises a series of parts, some of which are software and some of which are (often) hardware: sensors; actuators, which effect a motion or output; data (both structured and unstructured); algorithms (both data-driven and model-based); and machine learning (supervised or unsupervised). Each of these components is looked at in more detail below.
Differentiating between something that is AI and something that is just ‘computer science’ remains important. All AI techniques share one thing in common: they come up with solutions to problems without being explicitly told what the solution is, or how to find it. Instead of being programmed manually by a person, general algorithms are used to find a solution. Anything that has to be explicitly programmed to find the solution is not AI; ‘automated programming’ may be a better term to capture those technologies.
AI frequently relies on sensors. More data will be collected by sensors in 2020 than in all previous years combined. When we think of sensors, think of the human senses, and then of other senses that we might not necessarily have. The ability to ‘see’, ‘hear’, ‘touch’ or ‘smell’ can all be approximated by algorithms, though these approximations do not equal their human equivalents.
Cameras, phones, sensors in cars – we barely notice the sensors all around us, many of which we would have objected to until relatively recently. These sensors are not neutral: they collect the information we think is important.
Increasingly, sound collected through home music devices, mobile phones and other recording devices is being categorised in the same way that images are being collated. Voice, phrase and pattern recognition are advancing rapidly.
Just as AI was used to figure out how to beat human champions at Go, AI is increasingly being used to sense patterns in the data itself and reflect these back in order to learn.
Each type of sensor is inherently imbued with our values. Inversely, we value what we measure, so by collecting data in this way, we are elevating its significance. In a way, sensors reinforce what we think is important. To give a couple of examples, sensors collect data about the weather, or the traffic. Most sensors are based on a notion of what is efficient (as this is of high value) and less often on what is best for our health (the best bike paths) or even our mental health (prettier, often circuitous paths through parks, of which my partner is so fond), or any of a myriad of other values that may be more important than commercial outcomes or efficiency.
Finally, if sensors are likened to human senses, actuators may be likened to the limbs we use to effect action. In a door, the latch is an actuator, facilitating the closing or locking of the door. Actuators are all around us – from the buttons we press at traffic lights to, again, the pedal we press in our highly automated cars.
When these sensors collect information, it is usually collated as data. As well as being biased in the way it is collected, data is also inherently biased in the way that it is curated. Firstly, it is always retrospective. Data can tell us a lot about the past, but this does not mean it is representative of what will necessarily happen in the future, as we have learned through the rapid changes that the arrival of COVID-19 has brought to all aspects of our lives.
Some data is neat (structured data). Spreadsheets containing business information, or other lists with neat categories, also contain values and priorities. Websites and books are also examples of structured data.
But much data is not so neat (unstructured data); it is more like the second drawer in your kitchen, which holds things you take with you every time you move but never need and are not quite sure why you keep. How data is characterised and labelled affects outcomes, so these messier kinds of data are inherently challenging, and every treatment of them requires value judgements, reflecting priorities and bias. Questions about how to categorise or define the data, how to structure it in a meaningful way, or even how to link it to other data or to structured data sets all require value judgements that affect the outcomes of the AI process.
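The contrast can be made concrete with a small sketch. The field names, figures and the ‘complaint’ category below are invented for illustration only:

```python
# Structured data: every record shares the same named fields,
# so a program can query it directly.
structured_records = [
    {"date": "2020-03-01", "amount": 120.50, "category": "utilities"},
    {"date": "2020-03-04", "amount": 89.00, "category": "groceries"},
]

# Unstructured data: free text with no predefined fields.
unstructured_note = "Rang about the bill again, still no answer, very frustrating."

# To use the note alongside the records, someone must decide how to
# label it -- a value judgement that shapes every later result.
labelled_note = {"text": unstructured_note, "category": "complaint"}

total = sum(r["amount"] for r in structured_records)
print(total)  # 209.5
```

The structured records can be totalled mechanically; the note only becomes usable once a human (or another value-laden program) has decided what it counts as.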
Algorithms are a ‘finite set of rules which gives a sequence of operations for solving a specific type of problem’. In the case of AI, there are two types: model-based and data-driven algorithms.
A model-based algorithm performs in a certain way because decisions about its function have been made prior to its implementation. In your toaster, for example, the way your bread is toasted has been decided for you. Someone has decided that if you cook a crumpet, it needs to be hotter on one side than the other. A piece of frozen bread requires defrosting – another setting again. And for those of us lucky enough to have the ‘little bit more’ button, someone has decided how long that ‘little bit more’ of toasting should last. Each button has a pre-set function and will always function the same way. The same applies to the delicates setting on your washing machine, or any number of other household appliance functions.
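As a minimal sketch of a model-based algorithm, here is a toy ‘toaster’ in code: the designer's decisions are fixed in advance as rules, and the same input always produces the same output. The settings and timings are invented for illustration:

```python
# Pre-set decisions, fixed before the toaster ever meets your bread.
TOAST_SECONDS = {
    "bread": 90,
    "frozen": 120,   # someone decided frozen bread needs longer
    "crumpet": 100,  # someone decided crumpets need more heat
}

def toast_time(setting: str, little_bit_more: bool = False) -> int:
    """Return toasting time in seconds for a pre-set setting."""
    seconds = TOAST_SECONDS[setting]
    if little_bit_more:
        seconds += 20  # and someone decided how long 'a little bit more' is
    return seconds

print(toast_time("frozen"))                       # 120
print(toast_time("bread", little_bit_more=True))  # 110
```

Nothing here adapts to data; every behaviour was decided by a person before the algorithm ran.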
The second category of algorithm is the data-driven algorithm. These are algorithms such as those you encounter on Spotify or Netflix. When you use a certain feature, the patterns of other, similar users are reflected back to you as suggestions for what music or film you might want to listen to or watch. These algorithms may change over time as the data changes.
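A toy data-driven recommender illustrates the difference: nothing about taste is hard-coded, and the suggestions shift whenever the underlying data shifts. The users and genres below are invented:

```python
# What each user has been listening to (the data that drives the algorithm).
listening = {
    "ana":  {"jazz", "soul", "funk"},
    "ben":  {"jazz", "soul", "blues"},
    "cara": {"metal", "punk"},
}

def recommend(user: str) -> set:
    """Suggest items liked by the most similar other user."""
    mine = listening[user]
    # similarity = size of the overlap with each other user's listening
    best = max(
        (u for u in listening if u != user),
        key=lambda u: len(listening[u] & mine),
    )
    return listening[best] - mine  # what they like that this user doesn't yet

print(recommend("ana"))  # {'blues'}
```

If cara started listening to jazz tomorrow, ana's recommendations could change without anyone rewriting a line of code – the behaviour lives in the data, not in fixed rules.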
Machine learning is, for the most part, data-driven (recall the earlier observations that what data is collected, and by which sensors, already reflects values and priorities).
There are two main approaches to learning to solve a problem – supervised and unsupervised learning. For the sake of this exercise, imagine that we would like to know whether access requests to a system are regular, or are likely to come from an unwanted source, such as a hack.
Supervised learning requires categories to be labelled so that a program can identify a particular pattern itself. It would require us to go through thousands (or more) of requests and have someone label each one as ‘anomalous’ or ‘not anomalous’. The job of supervised learning is then to learn a function that predicts whether a request is anomalous or not.
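A minimal supervised-learning sketch: from requests that a person has hand-labelled, the program learns a simple rule (here, a request-rate threshold – a deliberately crude stand-in for a real learning method) and uses it to predict labels for new requests. The feature and the numbers are invented:

```python
# (requests_per_minute, label) pairs, labelled by a person.
labelled = [(3, "not anomalous"), (5, "not anomalous"),
            (8, "not anomalous"), (120, "anomalous"), (200, "anomalous")]

def learn_threshold(examples):
    """Learn a cut-off: the midpoint between the highest normal rate
    and the lowest anomalous rate seen in the labelled data."""
    normal = max(rate for rate, lab in examples if lab == "not anomalous")
    anomal = min(rate for rate, lab in examples if lab == "anomalous")
    return (normal + anomal) / 2

threshold = learn_threshold(labelled)  # 64.0 for this data

def predict(rate):
    return "anomalous" if rate > threshold else "not anomalous"

print(predict(10))   # not anomalous
print(predict(150))  # anomalous
```

The crucial point is that the rule was learned from labelled examples, not written by hand – but the labels themselves embody someone's judgement about what counts as anomalous.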
Unsupervised learning, by way of contrast, does not predict from labelled data. No data is labelled at all. Instead, an unsupervised technique would find ‘clusters’ of behaviour on its own. The resulting model can then be used to judge whether a request is anomalous by looking at how close it lies to a common cluster: if it falls within a cluster, it is not anomalous; if it falls far outside every cluster, it probably is. The difference is that we did not have to label the data. This is great, but it will not work for every problem; unsupervised learning cannot, for example, be used to identify types of cancer from images – we need someone to tell us the different cancer types.
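A crude unsupervised sketch of the same problem: no labels at all. We let the unlabelled data define its own ‘cluster’ (here reduced to a centre and spread – a real system would use proper clustering such as k-means) and flag new requests that fall far outside it. The numbers are invented:

```python
from statistics import mean, stdev

# Unlabelled request rates observed in ordinary operation.
rates = [3, 5, 4, 6, 8, 5, 4, 7, 6, 5]

centre = mean(rates)   # where the cluster sits
spread = stdev(rates)  # how wide it is

def is_anomalous(rate, k=3.0):
    """Flag requests more than k spreads away from the cluster centre."""
    return abs(rate - centre) > k * spread

print(is_anomalous(6))    # False -- inside the cluster
print(is_anomalous(150))  # True  -- far outside it
```

No person labelled anything here; the notion of ‘normal’ emerged from the data itself – which is exactly why this approach cannot tell you *what* an anomaly means, only that something is unusual.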
Reinforcement learning is neither supervised nor unsupervised. It is a technique for learning which actions to take in an environment. It learns by applying its actions in that environment over time, responding to the negative and positive rewards it receives. This type of learning is incredibly difficult to reverse engineer or understand – much like raising children.
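A minimal reinforcement-learning sketch: an agent tries actions, receives rewards from its environment, and gradually learns which action pays off. The environment, actions and reward values are invented, and a real agent would also have to balance exploring new actions against exploiting known ones:

```python
# The environment: a fixed reward for each action (unknown to the agent).
REWARDS = {"left": 0.0, "right": 1.0}

values = {"left": 0.0, "right": 0.0}  # the agent's running estimates
alpha = 0.5                           # learning rate

for step in range(20):
    # Try each action in turn and learn from the reward received.
    for action in values:
        reward = REWARDS[action]
        # Nudge the estimate towards the reward just observed.
        values[action] += alpha * (reward - values[action])

best = max(values, key=values.get)
print(best)  # right
```

Notice that no one labelled data and no one wrote the rule ‘prefer right’: the preference emerged from accumulated rewards, which is also why explaining *why* a trained reinforcement learner behaves as it does can be so difficult.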
For all of these types of learning, much as with data itself, the source of the data, the way it is curated and the programs written to process it all contain decisions and biases that affect the outcome of the process.
So, having covered sensors, data, algorithms, machine learning and actuators, it is important to question the source, the intention of the creators, and the use and application of each and every one of these components of AI to get a sense of what the impact of its use might be. Risk management requires consideration of dual use, unintended consequences and even misuse – at each and every step, by everyone from those in management down to those writing the code, and everyone using the system in between, as each step and engagement injects values and prioritises certain things over others.
AI is not inaccessible – it is as easy as understanding its parts. That is the first step to enabling policy-makers, politicians, leaders and experts of all kinds to weigh in for a more meaningful and diverse conversation.
Kobi Leins is a Senior Research Fellow in Digital Ethics in the School of Computing and Information Systems at the University of Melbourne, and a Non-Resident Fellow of the United Nations Institute for Disarmament Research. I would like to thank the Institute for the opportunity to present these ideas at the Innovations Dialogue in Geneva in 2019, which prompted this piece. Huge thanks to Associate Professor Tim Miller for reviewing and assisting with my anodyne analogies – your help is greatly appreciated.