Project Perseus AI ~ Post #2 ~ Digging Deeper Into how Machines Learn
This week I built on what I learned last week! I didn't take giant leaps in stretching my horizons on the subject; instead, I really fleshed out some more of the basic principles (something I could probably do forever, but at some point I have to move on). So what exactly did I learn about this week?
What I Learned this Week
I primarily learned about what an activation function is and what the different types of activation functions are. I also learned about logistic regression, which is a logical "next step" from linear regression, which I talked about in Post #1. In a way, last week I learned how machines learn; this week I learned how they think, which is what their learning is built on. So I just went a little bit deeper into the mechanics of things.
Okay, let's start off with the meat of the material:
Logistic Regression
Many times in machine learning contexts, you want the system to output a probability. One way to do this is to apply a function to the output of the system that converts it to a number between 0 and 1 (a probability can never be greater than 1 or less than 0).
A common function that can do this is the sigmoid function, shown below.
$$y' = \frac{1}{1 + e^{-z}}$$
Where z (the input of this function) is the linear output of the model: the weighted sum of the features plus the bias (the weights and biases I talked about in the first post).
The sigmoid then takes that linear output (with the weights and bias learned using gradient descent) and turns it into a number between 0 and 1, which is the probability.
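To make this concrete, here's a minimal sketch in Python (using numpy) of what that looks like. The feature values, weights, and bias are made-up numbers just for illustration; in a real model they would be learned through gradient descent.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up example: 3 features, with weights and a bias the model
# would normally learn through gradient descent.
features = np.array([2.0, -1.0, 0.5])
weights  = np.array([0.8,  0.3, -1.2])
bias     = 0.1

z = np.dot(weights, features) + bias   # the linear output (w . x + b)
probability = sigmoid(z)               # squashed into a probability

print(f"z = {z:.3f}, predicted probability = {probability:.3f}")
```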
Now, why is this important, you may be asking?
It's very important. Whereas with linear regression you get the model's answer, with logistic regression you get how confident the model is. Based on my newbie knowledge of machine learning, I think this is crucial: the probability is the computer telling you how likely it thinks it is to be right about a given scenario. What's better is that logistic regression models are very easy to train and they are not very computationally "heavy."
How does Logistic Regression Work?
Remember how a common loss function for linear regression was the mean squared error function? It basically just told the computer how wrong its prediction was (ouch). With logistic regression, the loss function is called Log Loss (also called binary cross-entropy).
Why? This is because the rate of change of a logistic regression is not constant like it is with linear regression. Remember the sigmoid function I described above? That curve is an S-shape, not a line. Inputs to the sigmoid that are very negative or very positive barely change the output, while values closer to zero change it much more sharply. Therefore, it's helpful to use a logarithm to calculate the loss.
The log loss function is given below:
$$\text{Log Loss} = \sum_{(x, y) \in D} \left[ -y \log(y') - (1 - y) \log(1 - y') \right]$$
This function penalizes a prediction based on the logarithm of the predicted probability rather than the squared difference. If mean squared error were used instead, you would need more and more precision to record the tiny changes as predictions creep closer to the label (since the number of decimal places keeps increasing!). For example: 0.9997, then 0.99997, and so on.
Log loss takes in a data set with labels (y values) that are either 1 or 0. You can think of these as either true or false. y' in the log loss equation is the computer's predicted response, which is the probability of the label being 1, or rather, its "confidence." There's a huge loss when the model is confident but wrong, not a whole lot of loss when the model is unsure but leaning toward the right answer, and barely any loss at all when the model is both confident and correct.
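Here's a tiny sketch, in Python with numpy, of how you could compute log loss by hand and see that behavior. The labels and predicted probabilities are made-up numbers, and I summed over the examples to match the formula above (many libraries average instead).

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy summed over a batch, matching the formula above.

    y_true: labels that are either 0 or 1
    y_pred: predicted probabilities between 0 and 1
    """
    # Clip so we never take log(0), which would blow up to infinity.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.sum(-y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred))

labels = np.array([1.0, 1.0, 0.0])

print(log_loss(labels, np.array([0.95, 0.80, 0.10])))  # confident and correct -> small loss
print(log_loss(labels, np.array([0.05, 0.20, 0.90])))  # confident but wrong   -> huge loss
print(log_loss(labels, np.array([0.60, 0.55, 0.40])))  # unsure but leaning right -> moderate loss
```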
There's also another key aspect to logistic regression and that's regularization, but I will get into that in a future post.
Activation Functions
Activation functions are arguably the most important part of machine learning models. This is because these functions introduce non-linearity to the system.
Here's the brutal truth that I've learned regarding these functions. Without them, it doesn't matter how many layers (sets of neurons) you have. It could be 1 or 100. If none of them uses an activation function, all they're doing is linear regression...over...and over...and over again. This makes finding patterns in complex data pretty much impossible.
However, activation functions allow the model to warp its input space so that it can find non-linear patterns.
The sigmoid function discussed above is actually an example of an activation function. Other functions like it also take a linear input (the weights times the features, plus a bias) and plug it into the activation function.
Then, the output of the activation function gets passed along to the next layer as its input. This is how non-linearity is introduced into the model (there's a small sketch of this below).
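Here's a little Python/numpy sketch of what I mean. The weights and biases are made-up numbers, but they show that two linear layers with no activation collapse into one single linear layer, while putting the sigmoid in between actually changes what the model computes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up weights/biases for two tiny "layers" (2 inputs -> 2 hidden -> 1 output).
W1, b1 = np.array([[1.0, -2.0], [0.5, 1.5]]), np.array([0.1, -0.3])
W2, b2 = np.array([[2.0, -1.0]]), np.array([0.2])

x = np.array([1.0, 2.0])

# Two linear layers back to back...
out_linear = W2 @ (W1 @ x + b1) + b2

# ...are exactly the same as ONE combined linear layer,
# so the extra layer added nothing.
W_combined, b_combined = W2 @ W1, W2 @ b1 + b2
out_combined = W_combined @ x + b_combined
print(out_linear, out_combined)   # identical -> still just a linear model

# Put an activation function between the layers and the collapse no longer happens.
out_nonlinear = W2 @ sigmoid(W1 @ x + b1) + b2
print(out_nonlinear)              # different -> the model can bend the input space
```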
A famous activation function is ReLU (Rectified Linear Unit) and it is shown below:
$$f(x) = \max(0, x)$$
This means that all negative values are treated as 0 and all positive values are passed through as is (f(x) = x).
This does have a problem with what are called "dying neurons": neurons whose output always ends up as zero, so they stop learning. There's another form of the function called the Leaky ReLU, which multiplies negative values by a very small number instead of flattening them straight to zero.
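To see the difference between the two, here's a quick Python/numpy sketch. The slope of 0.01 for the negative side is just a commonly used illustration value, not a magic constant.

```python
import numpy as np

def relu(x):
    """Standard ReLU: negatives become 0, positives pass through unchanged."""
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: negatives are scaled by a small slope (alpha) instead of zeroed."""
    return np.where(x > 0, x, alpha * x)

values = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(values))        # [ 0.     0.     0.     0.5    3.   ]
print(leaky_relu(values))  # [-0.03  -0.005  0.     0.5    3.   ]
```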
In Conclusion
I think it's time to wrap up this post; it's taken me too long to write and think about! I didn't include a code portion this time because it wasn't very different from last time. Like I said, I kind of just learned more about what machine learning models do under the hood. I did toy around with the matplotlib Python library to make a chart of predicted versus actual performance for my model, which was neat. I think I may show that in the next post.
As always, thank you for reading and for joining me on my personal educational journey through machine learning! See you on the next post!
Where thoughts orbit stars and dreams power suns...