Building AI: Solve the most critical challenge first

Editor’s note: This is a series of blog posts on the topic of “Demystifying creation of intelligent machines: How does one create AI?” You are now reading part 2. For the full list, see here: 1, 2, 3, 4, 5, 6, 7.

As I previewed in my last post, I have been working with colleagues at DXC to build an artificially intelligent fan, one that can monitor its operations, report issues and even, sometimes, fix problems itself.

While a fan like this does have practical uses, the main point of our project was to learn what it takes to actually build an AI tool.

After listing out our wants and needs and fully understanding the problem we hoped to solve with our fan, we were left wondering: Where do we begin? What issues should we address first?

Important lesson No. 1: Always address the most critical problem first!

If you cannot solve the most critical problem satisfactorily, you may have to give up the entire enterprise. It’s best to find out that hard truth sooner rather than later. In other words, fail fast!

We concluded that our most critical issue was getting the fan to reliably detect malfunctions in a way that would let it catch new problems, ones it had never encountered before. Could we do that?

We decided to use a combination of pre-processing (data wrangling; feature extraction) and an autoassociative neural network (aka autoencoder). Let me explain first why we settled on an autoassociative net.

An autoassociative system is trained to produce, as output, exactly the same values it has received as input. If the inputs are 5, 3, 7 and 2, then the outputs should ideally be 5, 3, 7 and 2. Such a network has the same number of input and output units.

Typically, an autoassociative net will have fewer units in its hidden layers than in its input and output layers. This turned out to be essential for achieving our objective: we wanted the autoassociative network to create a representation in the hidden layers that requires fewer dimensions than the input. The network first compresses the input data (with loss) and then expands it back.
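To make the idea concrete, here is a minimal sketch of such a network, assuming Keras as the framework. The 512-unit input/output width and the 4-unit bottleneck match the numbers given later in this post; the intermediate layer sizes and activations are illustrative assumptions, not necessarily what we used.

```python
# Minimal autoencoder sketch in Keras. The 512-unit input/output and
# 4-unit bottleneck match the sizes mentioned in this post; the
# intermediate layers and activations are illustrative assumptions.
from tensorflow.keras import layers, models

n_inputs = 512  # number of input features after pre-processing

autoencoder = models.Sequential([
    layers.Input(shape=(n_inputs,)),
    layers.Dense(64, activation="relu"),          # compress ...
    layers.Dense(4, activation="relu"),           # ... down to the bottleneck
    layers.Dense(64, activation="relu"),          # ... and expand back
    layers.Dense(n_inputs, activation="linear"),  # reconstruct the input
])
autoencoder.compile(optimizer="adam", loss="mse")  # MSE matches the metric described below
```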

The key idea here is that we can train this network using only data collected during the normal functioning of the fan. We need not train the network on data collected during malfunctions. The network detects malfunctions simply by establishing that the activity sufficiently deviates from the norm.
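Continuing the sketch above, training then looks like ordinary supervised learning in which the input doubles as the target. Here X_normal is a stand-in name for the pre-processed features recorded during normal operation:

```python
import numpy as np

# Stand-in for pre-processed features captured during normal fan
# operation, shape (n_samples, 512). Random data is used here only so
# the sketch runs; real data comes from the pre-processing stage.
X_normal = np.random.rand(1000, 512).astype("float32")

# Input and target are the same array: the network learns only to
# reproduce normal data. No malfunction examples are needed.
autoencoder.fit(X_normal, X_normal, epochs=50, batch_size=32,
                validation_split=0.1)
```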

An autoassociative network with a small number of hidden units can perform such detection as an emergent property of its architecture. However, learning to compress the input data with loss and then expand it back accurately is a difficult task. It strains the network into finding the proper low-dimensional representations for its hidden layers. Once this is done successfully, the network is unlikely to generalize this compression-expansion task to data it has not seen – i.e., to data collected during malfunctioning.

With a small number of hidden units, the task is simply too difficult to solve in general. As a result, the network performs well – i.e., accurately reconstructs the inputs at its output units – only for the type of data it has been trained on. The losses incurred during compression degrade its performance on any other data.

Therefore, we only needed to check how well the network could reconstruct its inputs. If it did a poor job, in all likelihood the data deviated from the data used during training. In other words, an anomaly would be detected.

To quantify performance, we calculated the squared difference of values for each pair of input and output units, and then took the average across all pairs – i.e., the mean squared error of the reconstruction. The trick to success, we found, was getting the number of hidden units right. With too few, the network performed poorly even on the training data. With too many, there was not enough strain: the network could reconstruct almost any input, and so it was not sensitive to deviations.
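In code, this is a per-sample mean squared error. One simple way to turn it into a malfunction flag is to compare it against the errors observed on held-out normal data; the three-sigma threshold below is an illustrative assumption, not necessarily the rule we used. Continuing the earlier sketch:

```python
import numpy as np

def reconstruction_error(model, X):
    """Per-sample MSE between inputs and their reconstructions."""
    X_hat = model.predict(X, verbose=0)
    return np.mean((X - X_hat) ** 2, axis=1)

# Calibrate on normal data; the three-sigma rule is an illustrative
# assumption, not necessarily the decision rule we actually used.
errors = reconstruction_error(autoencoder, X_normal)
threshold = errors.mean() + 3 * errors.std()

def is_anomalous(model, x):
    """Flag a single sample x (shape (512,)) as anomalous."""
    return reconstruction_error(model, x[None, :])[0] > threshold
```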

By getting it just right, we obtained a representation in the hidden units that could reconstruct inputs from normal operations with barely any loss of accuracy. Indeed, it’s an art to find the minimal network topology that still does a decent job. We ended up with 4 units in our smallest layer – the bottleneck. This is much smaller than the 512 units we used in the input and output layers.

But how did we decide on the number of hidden units, and what were the inputs to that network? I’ll discuss that in tomorrow’s blog.

RELATED LINKS

Automating AI to make enterprises smarter, faster

Raising your Analytics IQ

How machine learning and AI are transforming the workplace

 
