**Short Bytes:** *Deep Learning is the field of applying Deep Neural Networks to the task of learning a function, and Deep Neural Networks are basically Neural Networks with more than one hidden layer. In this post, I’ll attempt to introduce Deep Learning in a more engaging manner without going into mathematical details.*

## A little history of Artificial Intelligence

When the field of Artificial Intelligence started, researchers were focussed on ‘solving’ a problem, as that was how they were trained: for example, automatically finding a solution to a maze. A paradigm shift in thinking had to happen before people started to approach problems in a different way.

The new approach was not to solve a task but to ‘imitate’ its solution. Not all problems can be solved, as mathematicians had known for some time. One also has to look at what counts as a solution. For example, an equation such as x² + 1 = 0 had no solutions until the introduction of the concept of *complex numbers*.

But there are other problems which are truly unsolvable (in some sense), and real-world problems are often far too complex to solve exactly. So, the concept of ‘imitating’ a solution was required for very complex real-world tasks. The best example to compare these two paradigms would be the Deep Blue computer, which beat Kasparov in 1997, and the AlphaGo computer, which beat Lee Sedol in 2016. The former ‘searches’ for the best move in Chess, while the latter ‘imitates’ a strong player of Go.

**Recommended:** Introduction To Hardware Architecture for Deep Learning

### Proof that something can be ‘Learned’ –

Without a strong mathematical backing, pushing forward in a research field is meaningless. So, tasks were translated into math problems and ‘imitating’ a solution was translated to ‘fitting’ a function.

So, can all functions be ‘fitted’? As it turns out, ‘Yes!’, or at least most of the functions we require for real-world problems. This is called the *Universal Approximation Theorem (UAT)*. But it requires a certain architecture, which we now call a **Neural Network**. So, an architecture was developed that guarantees that such a function can be fitted to any desired accuracy. Some interesting observations about the architecture were –

- A set of *discrete* inputs was able to fit even continuous functions (i.e. functions without any sudden jumps).
- At least one more layer (called the hidden layer) of such discrete nodes was necessary.
- Information from one node can be given back as input, like a feedback mechanism.
- Some sort of ‘Non-linearity’ had to be incorporated in the network (called the activation function).
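The idea behind the theorem can be sketched without any training at all. The snippet below is not the proof and not a learned network: it hand-builds a single hidden layer of sigmoid nodes, each acting as a smooth ‘step’, so that their sum traces a staircase along the target function. The helper names (`build_staircase_net`, `forward`, `max_error`), the steepness value and the choice of `sin` as the target are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_staircase_net(f, lo, hi, n_hidden):
    """Return (bias, weights, centers, steepness) of a one-hidden-layer net.

    Hypothetical helper: hidden node i switches on around the midpoint of the
    i-th sub-interval, and its output weight is the rise of f over it.
    """
    width = (hi - lo) / n_hidden
    edges = [lo + i * width for i in range(n_hidden + 1)]
    centers = [(a + b) / 2.0 for a, b in zip(edges, edges[1:])]
    weights = [f(b) - f(a) for a, b in zip(edges, edges[1:])]
    steepness = 20.0 / width  # assumed value; larger means sharper steps
    return f(lo), weights, centers, steepness


def forward(net, x):
    """Network output: bias plus a weighted sum of sigmoid hidden nodes."""
    bias, weights, centers, k = net
    return bias + sum(w * sigmoid(k * (x - c)) for w, c in zip(weights, centers))


def max_error(f, net, lo, hi, n_samples=1000):
    """Largest gap between f and the network on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / n_samples for i in range(n_samples + 1)]
    return max(abs(f(x) - forward(net, x)) for x in xs)


LO, HI = 0.0, 2.0 * math.pi
small_net = build_staircase_net(math.sin, LO, HI, n_hidden=10)
large_net = build_staircase_net(math.sin, LO, HI, n_hidden=200)
print("max error with  10 hidden nodes:", max_error(math.sin, small_net, LO, HI))
print("max error with 200 hidden nodes:", max_error(math.sin, large_net, LO, HI))
```

Adding hidden nodes makes the staircase finer, so the worst-case error shrinks. Without the non-linear activation, the sum would collapse into a single straight line, which is why the last observation in the list matters.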

### Imitation and Guessing –

One problem with the above-described ‘fitting’ method is that we should know what the solution to the problem looks like. This raises another question: if we know the solution, why bother to fit it at all? The answer is two-fold: 1) computing the exact solution may be far more computationally intensive, and 2) many real-world AI problems today involve imitating human behavior and tasks.

But the first problem still persists: we must know the solution beforehand. To solve a task without the solution, a computer has to make an educated ‘guess’. Therefore, there is a bifurcation in the class of ‘learning problems’ – Imitation and Guessing. The former is called ‘**Supervised Learning**’ and the latter ‘**Unsupervised Learning**’. An example of unsupervised learning would be to cluster a set of data based on some attribute. Collectively, these methods are called Machine Learning.
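The ‘imitation’ setting can be sketched with the smallest possible model: a single linear unit rather than a full neural network. The learner only ever sees (input, answer) pairs and adjusts its two parameters by gradient descent until its outputs match the answers. The function name `fit_line`, the learning rate, the step count and the hidden rule `y = 2x + 1` are assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.1, n_steps=500):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # How far off the current guess is on every labelled example.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Labelled data: the 'solution' is known only at these sample points.
xs = [i / 10.0 for i in range(-10, 11)]
ys = [2.0 * x + 1.0 for x in xs]  # assumed hidden rule the learner must imitate

w, b = fit_line(xs, ys)
print("learned slope:", w, "learned intercept:", b)
```

The learner is never told the rule itself, only examples of it, which is what makes this ‘supervised’.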

In the supervised learning example, the data points (red) were given and the network learned to fit the function (blue), in this case a sinc function. In the unsupervised learning example, only the image was given and the network was told to classify each pixel of the image into one of 8 clusters based on its color. As observed, the network does a good job of clustering the pixels.
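Clustering pixels by color can be sketched with k-means. The article does not say which method produced its result, so k-means is an assumption here, as are the function names, the deterministic choice of starting centroids and the toy pixel values. The example uses 2 clusters on six pixels; the image described above would use `k=8` on every pixel.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=20):
    """Group points into k clusters; returns (labels, centroids)."""
    n = len(points)
    # Assumed initialisation: evenly spaced picks from the input list.
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(ch) / len(members) for ch in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = new_centroids
    return labels, centroids


# Toy 'image': three reddish pixels followed by three bluish ones.
pixels = [
    (250, 10, 10), (240, 20, 15), (255, 0, 5),
    (10, 10, 250), (20, 15, 240), (0, 5, 255),
]
labels, centroids = kmeans(pixels, k=2)
print("cluster label per pixel:", labels)
print("cluster colors:", centroids)
```

No labels are supplied anywhere: the grouping comes only from how similar the colors are to each other.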

### Deepening Neural Networks –

So, what’s so Deep about Deep Neural Networks? Deep Neural Networks are basically Neural Networks with more than one hidden layer. So, they look ‘wider’, rather than ‘deeper’. There are a few questions to be answered here –

If a single hidden layer network can approximate any function (UAT), why add multiple layers? This is one of the fundamental questions. Every hidden layer acts as a ‘feature extractor’. If we have just one hidden layer, two problems occur –

- The feature extraction capability of the network is very limited, which means we have to provide suitable features to the network. This adds a feature extraction step which is specific to that application. Therefore, the network, to some extent, loses its ability to learn a variety of functions, and cannot be called ‘automatic’.
- Even to learn the provided features, the number of nodes in the hidden layer can grow exponentially, which causes numerical problems while learning.

To resolve this, we need the network to learn the features by itself. Therefore, we add multiple hidden layers, each with fewer nodes. So, how well does this work? These Deep Neural Networks learned to play Atari games just by looking at the images from the screen.
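One well-known way to see why a few narrow layers can replace one very wide layer is a ‘tent’ construction with ReLU activations. It is not taken from this article, and the names `tent_layer`, `deep_net` and `count_linear_pieces` are hypothetical. Each layer has only two hidden nodes, yet stacking layers doubles the number of straight-line pieces in the output each time. A single hidden layer of n ReLU nodes on one input can produce at most n + 1 pieces, so matching the deep network would need a node count that grows exponentially with the depth.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)


def tent_layer(x):
    """One hidden layer with two ReLU nodes, followed by a linear output.

    On [0, 1] this rises from 0 to 1 and falls back to 0 (a 'tent').
    """
    h1 = relu(x)
    h2 = relu(x - 0.5)
    return 2.0 * h1 - 4.0 * h2


def deep_net(x, depth):
    """Stack `depth` tent layers; total hidden nodes used = 2 * depth."""
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_pieces(fn, n_grid=1024):
    """Count straight-line pieces of fn on [0, 1] via slope sign changes."""
    ys = [fn(i / n_grid) for i in range(n_grid + 1)]
    slopes = [b - a for a, b in zip(ys, ys[1:])]
    pieces = 1
    for s1, s2 in zip(slopes, slopes[1:]):
        if (s1 > 0) != (s2 > 0):
            pieces += 1
    return pieces


for depth in range(1, 6):
    pieces = count_linear_pieces(lambda x, d=depth: deep_net(x, d))
    print("depth", depth, "| hidden nodes:", 2 * depth, "| linear pieces:", pieces)
```

The piece counter relies on the kinks landing exactly on grid points, which holds for the depths used here because every value involved is a fraction with a power-of-two denominator.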

## The Leap(s) –

So, why and how did Deep Learning become so successful in recent years? As to the why part, revolutionary ideas in Deep Learning algorithms were introduced in the 1990s by Dr. Geoffrey Hinton. As to the how part, a lot of factors were responsible: lots of datasets became available, hardware architectures were enhanced, software libraries were built, and great advances were made in the field of convex optimization.

## Tread with Caution –

A relatively recent discovery suggests that these deeply trained models are highly vulnerable to attacks. DNNs are successful as long as the data has not been adversarially manipulated. The following image illustrates this –

This vulnerability is due to the model being highly sensitive to its input features. Changes to the features that are imperceptible to humans can completely change the network’s output. New models, called Adversarial Networks, have been proposed, but that is a story for another day. Another frequent effect is overfitting to the training data, which may lead to high accuracy in training but very poor performance during testing.
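The sensitivity to tiny feature changes can be illustrated on a deliberately simple stand-in: a linear classifier with 1000 input features instead of a deep network. Every feature is nudged by the same small amount in the direction that hurts the current prediction most, which for a linear model is the sign of each weight. The weights, the input and `epsilon` are made-up values, and the function names are hypothetical.

```python
def score(weights, x):
    """Weighted sum of the input features."""
    return sum(w * xi for w, xi in zip(weights, x))


def predict(weights, x):
    """Class 1 if the score is positive, otherwise class 0."""
    return 1 if score(weights, x) > 0 else 0


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def adversarial_example(weights, x, epsilon):
    """Shift every feature by at most epsilon against the current prediction."""
    direction = -1 if predict(weights, x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


n_features = 1000
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(n_features)]
x = [0.5 + 0.005 * w for w in weights]  # features near 0.5, classified as 1

x_adv = adversarial_example(weights, x, epsilon=0.01)
largest_change = max(abs(a - b) for a, b in zip(x, x_adv))

print("original prediction:   ", predict(weights, x))
print("adversarial prediction:", predict(weights, x_adv))
print("largest change to any single feature:", largest_change)
```

Each feature moves by only about 0.01 on values close to 0.5, but because all 1000 small changes push the score the same way, together they are enough to flip the predicted class.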

So, what do you think about the future of Deep Learning? What are some open problems in Deep Learning? Comment and share it with us.
