How to train your AI


I don’t know about you but learning basic mathematics was a traumatic experience for me. Addition was easy — I could (literally) count on my fingers and toes — but multiplication was a nightmare. I was supposed to remember the multiplication tables all the way to 20? (Just between you and me, I never really made it past 13.)

Fast forward a few decades, and the multiplication tables are burnt into my brain (well, most of them anyway). I can multiply numbers in an instant without breaking a sweat. (Now, if I could just find a job where this skill was of any use.)

Training AI to ‘multiply’

My experience with learning multiplication tables was very similar to how we train artificial intelligence. To train an AI, we prepare the inputs and the expected outputs. Next, we get the computer to crunch numbers until it works out how to infer the outputs from the inputs. This learning takes up most of the time in the process. During this phase, the computer builds and fine-tunes a network of nodes that allows it to convert a given input into the expected output. The fine-tuning adjusts the strength of the interconnections between nodes, and these strengths are called “weights.”

Once learning is complete, we deploy the learnt weights in an AI system. Using these weights for inference is significantly quicker than the training/learning process. To give you an idea of the size of this asymmetry, my work with image recognition on drones does inference on a Raspberry Pi. This tiny computer offers about the same processing power as the smartphone you retired last year. Despite its severely limited processing power, it can process an image and complete inference in slightly under two seconds and provide near real-time analysis of the scene.

Training this model was a different story altogether. Despite running the learning on an 8-core computer with loads of memory, the entire learning process took over 30 days. This time could have been reduced to a few hours, but only with expensive and specialized hardware.
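To make the asymmetry concrete, here is a minimal sketch in Python using a toy model rather than my drone system. It learns the "times 7" table from examples. The names `train` and `infer`, the learning rate and the number of passes are all assumptions made up for this illustration. Training loops over the examples many times to tune a single weight, while inference is one multiplication.

```python
def train(pairs, learning_rate=0.01, epochs=200):
    """Learn one weight so that weight * x approximates y for every pair."""
    weight = 0.0  # start with no knowledge at all
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in pairs) / len(pairs)
        # Nudge the weight in the direction that reduces the error.
        weight -= learning_rate * grad
    return weight


def infer(weight, x):
    """Inference: a single multiplication using the learnt weight."""
    return weight * x


# Inputs and expected outputs: the "times 7" table from 1 to 10.
pairs = [(x, 7 * x) for x in range(1, 11)]
weight = train(pairs)        # slow part: 200 passes over the data
answer = infer(weight, 12)   # fast part: one step, on an input never seen in training
```

A real network has millions of weights rather than one, which is why the gap between training time and inference time becomes so large.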

AI learning is slow because it is an iterative process of finding the minimum on an n-dimensional surface. This is a bit like blindfolding someone and directing them to search for the lowest point in a valley. They would pretty much have to cover the whole valley to be sure. As an optimisation, they could start at random points and only go downhill along the steepest slope. This is pretty much how “Gradient Descent” algorithms such as “Stochastic Gradient Descent” work in AI training as they try to minimize the mean error.
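The blindfolded walk can be sketched in a few lines of Python. This is plain gradient descent with random starting points, not the stochastic variant, and the one-dimensional "valley" below is a function I made up for illustration. It has a shallow dip near x = 1.13 and a deeper one near x = -1.30, so where you start decides which dip you end up in. The names `height`, `slope`, `walk_downhill` and `search` are hypothetical.

```python
import random


def height(x):
    # An invented landscape with two dips: a shallow one and a deeper one.
    return x ** 4 - 3 * x ** 2 + x


def slope(x):
    # Derivative of height(x): tells the walker which way is downhill.
    return 4 * x ** 3 - 6 * x + 1


def walk_downhill(x, step_size=0.01, steps=500):
    # Repeatedly take a small step against the slope.
    for _ in range(steps):
        x -= step_size * slope(x)
    return x


def search(restarts=10, seed=0):
    # Start from several random points and keep the lowest resting place.
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        end = walk_downhill(rng.uniform(-2.0, 2.0))
        if best is None or height(end) < height(best):
            best = end
    return best


lowest = search()
```

A single walk that starts on the right-hand side settles in the shallow dip and never learns that a deeper one exists, which is why the random restarts matter.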

So do I need supercomputers?

What are the implications of this asymmetry for business? Despite all the news of powerful cloud-based super machines being made available to support AI, the fact is that while they are indispensable for training AIs, they are not always required for using them. Of course, if you need to run inference on a huge data warehouse, you need powerful machines. However, for a lot of AI applications, any half-decent computer will do.

Building and updating the model takes serious hardware, but it is a centralized activity, and depending on your domain, it may not be a frequent need. If you are routinely using AI to separate different grades of apples, then your underlying problem is not really expected to change and your AI will not need regular training. However, if you now also want to sort oranges, then fresh training, and with it heavy-duty hardware, is required, since … you cannot compare apples to oranges.

Once a model has been successfully trained, it can be distributed to multiple devices that execute the inference.

Solving the unsolvable?

We can use AI inference for relatively simple and even trivial tasks. We are beginning to see some signs of this, for example, in smartphones where the text app predicts as you type. It is not a must-have, but it does make texting easier.

Which business problems should be solved using AI? AI’s ability to handle unstructured data that traditional programs struggle with offers a new paradigm for decision-making.

AI is not foolproof, however. When training an AI, the objective is to minimize the mean error, not eliminate it. Even the best-trained AI can make some incorrect decisions. Despite all the obvious differences, the old rules still apply. Just like any other new technology, we need to start with a simple proof of concept. Next, we solve the problems that deliver maximum business value. And when we’re done, we may very well find that AI enables us to solve problems that we always considered intractable.

Amitabh Mathur is a technology enthusiast who brings together the curiosity of a child with the knowledge and experience of someone who has spent way too much time in the IT industry. Finding new ways to apply a mix of technology, science and process-based approaches to a problem is what keeps him excited. He started his career as an entrepreneur in India, developing diverse solutions like voter ID cards, Indian Language support and interactive voice response. Over the years, he has worked in various industries (finance, healthcare, education and government) and countries (India, USA and New Zealand).
