Neural Networks - An Intuitive Introduction
- devShinobis
- Jan 17, 2024
- 5 min read
Updated: Jan 19, 2024
Introduction
Neural networks are a fascinating aspect of artificial intelligence. They enable computers to learn from data and perform tasks that were once considered exclusively human. But what exactly are neural networks, and how do they learn? In this article, we will explain their workings in a way that is understandable even for laypeople.
The Basic Idea: Data, Input, and Output
At their core, neural networks are about transforming a specific input into a specific output. Imagine it like a black box: you put something in (the input), and the network produces something out of it (the output). In more complex models like ChatGPT, for example, you input a text prompt, and the model generates the next logical word.
From Simple to Complex: Data as Numbers
To understand how neural networks work, we can consider a simple dataset consisting of whole numbers. Suppose we have certain input numbers and want to map them to corresponding output numbers. This mapping follows a pattern, much like language follows grammar and syntax.
| Input | Output |
|-------|--------|
| 1     | 2      |
| 2     | 5      |
| 3     | 10     |
| 4     | 17     |
| 5     | 26     |
Examining the mapping closely, it quickly becomes apparent that there are regularities in our dataset: the output clearly depends on the input. If we plot the data on a diagram, we can see that the values were not chosen arbitrarily; the points lie on a smooth, upward-bending curve, which suggests an underlying pattern.
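To make this tangible, here is a minimal sketch (plain Python with matplotlib, assuming that library is available) that plots our five input/output pairs:

```python
import matplotlib.pyplot as plt

# Our tiny dataset: each input is mapped to exactly one output.
inputs = [1, 2, 3, 4, 5]
outputs = [2, 5, 10, 17, 26]

# Plotting the pairs reveals a smooth, upward-bending curve --
# a hint that the mapping is not arbitrary.
plt.scatter(inputs, outputs)
plt.xlabel("Input")
plt.ylabel("Output")
plt.title("Input/output pairs from our dataset")
plt.show()
```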
The same is true for language and other data from which so-called AI models learn: if there are no patterns in the data, there is nothing to learn. This is not much different from us humans; we rely on patterns every day to make predictions. If there were no patterns, we would have to memorize every mapping, and whenever we encountered an input we had not memorized, we could not predict the output. The task of AI models is therefore to learn the regularities from the data, so that they can also handle inputs that are not present in the training dataset and were never seen by the AI.
The Model and Its Parameters
We now know that there are regularities between input and output, but what are they? With this simple dataset, it is easy to guess how the mapping of input to output looks. However, with more complex data, it is not so simple. Therefore, assumptions are made about the nature of these regularities or which model best represents the data.
Looking at our data, one might notice that the mapping from input to output seems to have a quadratic form: \( f(x) = x^2 \). For the inputs 1 to 5, this gives the values 1, 4, 9, 16, 25.
Our assumption (model) doesn't seem to be too bad, as the values calculated by the model are quite close to the actual outputs 2, 5, 10, 17, 26. However, we can generalize our model further by writing it as a quadratic function with adjustable coefficients: \( f(x) = a \cdot x^2 + b \).
In this case, `a` and `b` are known as parameters. We can insert different values for these parameters to refine the mapping. Training (learning) a model therefore means finding the right parameters (in our example `a` and `b`) that map the input data as accurately as possible to the output data. For instance, with \( a = 1 \) and \( b = 0 \) we get back our initial assumption: \( f(x) = 1 \cdot x^2 + 0 = x^2 \). There we saw that while this already achieves quite good results, the model still has potential for improvement. In other words: there might be better parameters that map the input data more accurately to the output data.
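To make this concrete, here is a minimal Python sketch of our model; the function name `model` is just an illustrative choice:

```python
def model(x, a, b):
    """Our assumed quadratic model: f(x) = a * x^2 + b."""
    return a * x**2 + b

# With a = 1 and b = 0 we recover the initial assumption f(x) = x^2:
print([model(x, a=1, b=0) for x in [1, 2, 3, 4, 5]])
# -> [1, 4, 9, 16, 25], close to the actual outputs [2, 5, 10, 17, 26]
```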
Neural networks work in the same way, except that they often have parameters on the order of millions or even billions, whose values are not known at the start of training.
Learning: A Process of Trial and Error
At the start of training, we don't have a clear idea of what values our parameters should have; otherwise, training the model would not be necessary. Therefore, we start with random values for `a` and `b`. During the training process, which resembles a trial-and-error method, we adjust these parameters. This adjustment is made by taking our input data, calculating the output based on our formula (model), and comparing it with the expected output. The difference between the generated output and the expected output is measured using a loss function (metric). Depending on the problem, various functions can serve as this metric. In our example, we will simply use the absolute difference between the two values.
Loss Function: \( \text{Error} = |\text{Output}_{\text{expected}} - \text{Output}_{\text{model}}| \)
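As a Python sketch, this loss is a one-liner:

```python
def loss(expected, predicted):
    """Absolute difference between expected output and model output."""
    return abs(expected - predicted)

print(loss(2, 3))  # -> 1
```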
Parameter Optimization
The aim of the training is to adjust the parameters in such a way that the error is minimized. These adjustments are made in small steps to ensure that they not only work for a specific input but represent an improvement on average for all inputs.
An Example
Let's assume we start with `a = 2` and `b = 1`. Our model would then be: \( f(x) = 2 \cdot x^2 + 1 \).
After the first iteration, we would calculate the output for each input: \( f(1) = 3 \), \( f(2) = 9 \), \( f(3) = 19 \), \( f(4) = 33 \), \( f(5) = 51 \).
In the visualization above, we see the initial results of our simple model \( ax^2 + b \) with the starting parameters \( a = 2 \) and \( b = 1 \). The blue points represent the expected outputs for our input data (1, 2, 3, 4, 5), which are supposed to map to the values (2, 5, 10, 17, 26). The red line shows the outputs of our model with the initial parameters.
As can be seen, the initial model does not yet perfectly fit our expected data.
Here we see the deviation (error) between the model output and the actual values. We can illustrate how the error is calculated with our loss function, using the input = 1 as an example: the expected output is 2, while the model produces \( f(1) = 2 \cdot 1^2 + 1 = 3 \). Inserting these values into our loss function gives: \( \text{Error} = |2 - 3| = 1 \).
Here are the error values for all inputs:
| Input | Output | OutputModel | Error |
|-------|--------|-------------|-------|
| 1     | 2      | 3           | 1     |
| 2     | 5      | 9           | 4     |
| 3     | 10     | 19          | 9     |
| 4     | 17     | 33          | 16    |
| 5     | 26     | 51          | 25    |
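This table can be reproduced with a few lines of Python (a sketch that inlines the model and loss from above so the snippet is self-contained):

```python
inputs = [1, 2, 3, 4, 5]
outputs = [2, 5, 10, 17, 26]
a, b = 2, 1  # our starting parameters

print("Input | Output | OutputModel | Error")
for x, y in zip(inputs, outputs):
    prediction = a * x**2 + b    # the model output
    error = abs(y - prediction)  # the loss for this input
    print(f"{x} | {y} | {prediction} | {error}")
```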
The next step in the learning process is to adjust the parameters `a` and `b` so that the red line moves closer to the blue points, which corresponds to a reduction in the error. This adjustment process is repeated until the model represents the data as closely as possible.
Here's an example where we keep \( a = 2 \) but choose \( b = -0.5 \):
As can be seen in the illustration, after this adjustment the output values of the model (green line) have moved closer to the actual values (blue points), compared to the initial model (red line, before the parameter adjustment).
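We can verify that this adjustment really reduces the error by summing the loss over the whole dataset (again a small sketch; `total_error` is just an illustrative name):

```python
inputs = [1, 2, 3, 4, 5]
outputs = [2, 5, 10, 17, 26]

def total_error(a, b):
    """Sum of the absolute errors over the whole dataset."""
    return sum(abs(y - (a * x**2 + b)) for x, y in zip(inputs, outputs))

print(total_error(2, 1))     # -> 55, with the initial parameters
print(total_error(2, -0.5))  # -> 48.5, lower after adjusting b
```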
This iterative process of adjusting and improving the model's parameters is at the heart of training neural networks. It allows the model to learn from the data and make predictions or decisions that are closer to the expected results.
The Use of Algorithms
Manually experimenting with parameters would be endlessly tedious. Therefore, neural networks are trained with algorithms such as gradient descent combined with backpropagation, which is based on partial derivatives. These derivatives indicate in which direction each parameter needs to be adjusted to reduce the error as quickly as possible.
This means instead of guessing which parameters (a, b) and by how much (learning rate) we need to change them to achieve better model outputs than before, we use an algorithm that calculates these values for us.
However, the procedure remains the same (a code sketch of this loop follows the list):
1. Initialize parameters randomly.
2. Calculate `OutputModel`.
3. Compare `OutputModel` with `OutputExpected`.
4. Calculate new parameter values using the backpropagation algorithm.
5. Adjust the parameters.
6. Repeat steps 2 to 5 until satisfactory results are achieved.
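Here is a minimal sketch of this loop for our two-parameter model, using gradient descent with hand-derived partial derivatives (the gradients that backpropagation would compute automatically in a real network). Two assumptions to note: I use the squared error instead of the absolute difference, because it has a convenient derivative, and the starting parameters are fixed rather than random so the run is reproducible.

```python
inputs = [1, 2, 3, 4, 5]
outputs = [2, 5, 10, 17, 26]

# Step 1: initialize the parameters (fixed here for reproducibility).
a, b = 2.0, 1.0
learning_rate = 0.0005  # size of each adjustment step

for step in range(10_000):
    # Steps 2 + 3: compute the model outputs and compare them with the
    # expected outputs. Step 4: accumulate the partial derivatives of
    # the squared error with respect to a and b.
    grad_a = grad_b = 0.0
    for x, y in zip(inputs, outputs):
        prediction = a * x**2 + b
        diff = prediction - y        # positive if the model overshoots
        grad_a += 2 * diff * x**2    # d(error)/da for this data point
        grad_b += 2 * diff           # d(error)/db for this data point

    # Step 5: adjust each parameter a small step against its gradient.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

# Step 6 (after enough repetitions): the parameters settle near
# a = 1, b = 1, i.e. the model f(x) = x^2 + 1.
print(round(a, 3), round(b, 3))
```

With these values, \( f(x) = x^2 + 1 \) reproduces every output in our table exactly, which is precisely the regularity hidden in the dataset.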
Summary
Neural networks, although complex, are based on a simple principle: they learn by adjusting their parameters to map input data as accurately as possible to output data. The model may consist of more complex calculations and sequences of mathematical functions, but the principle of how they learn is the same as in our example. This learning enables them to accomplish amazing tasks, from text generation to image recognition. They are an impressive example of how far artificial intelligence has advanced.