Project Perseus AI ~ Post #1 ~ Machine Learning Origins
This is the first post in what I hope will be a long series documenting my educational journey through the topics of AI and machine learning. Every post will discuss a topic that I've reviewed and/or models that I've programmed to help me better understand these machine learning concepts.
What is the goal of these posts? Primarily, to keep a record of my self-taught journey in machine learning, and one day to create my own AI model, which I will name Perseus. I'm quite far from that point at present, but if I'm diligent enough, I believe I can get there.
I started this journey because I've recognized that the 21st century will most likely be defined by the advent of organic and artificial intelligence cooperation. I believe that a healthy and beneficial cooperation is possible, but that it will require deep understanding on the humans' part and (hopefully) from the AI as well. My motivation for learning these topics is that they will be massive driving forces of productivity, development, research, and overall utility as the century moves on. Knowing the mathematical, theoretical, and practical underpinnings of these forces will place me (and hopefully many other people) in a good position to use AI, machine learning, and organic/artificial intelligence cooperation for the benefit of all.
Now, before I get into the nitty gritty of what I learned this week, let's start with some basics (as I understand them).
What is Machine Learning?
Machine learning, in very basic terms, is a model that predicts trends and corrects itself so as to minimize error. It starts out knowing nothing about the relationships within a given set of data and ends up able to make accurate predictions based on the relationships it has found. A.k.a. learning. It didn't know; then, after some trial and error (with lots of math), it knew. That's what we mean by machine learning.
Essentially, it's a computer model that learns a pattern and is able to utilize that pattern to make quite accurate predictions from the data.
What did I learn this week?
I primarily covered the topic of linear regression. This is a very basic way that computers can learn. Essentially, it's a way for the computer to figure out a linear relationship in data that roughly correlates linearly (I haven't yet learned the methods for data that follow more complex functions).
Take the equation of a line, for example:
$$y = mx + b$$
But let's tweak it a little (it's still pretty much the same, just with different letters for the variables):
$$y = b + w_1 x_1$$
Where $b$ is the bias (essentially the y-intercept) and $w_1$ is the weight (essentially the slope of the line).
A computer model can learn a simple relationship between the data by just varying the bias and the weight until it minimizes the loss as much as possible.
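If you like seeing this in code, here's a tiny sketch of the same idea in Python (the weight and bias values are just made-up starting guesses for illustration):

import numpy as np

# A one-feature linear model: prediction = bias + weight * input
def predict(x, w1, b):
    return b + w1 * x

# Made-up starting guesses for the weight and bias
w1, b = 0.5, 1.0
x = np.array([1.0, 2.0, 3.0])
print(predict(x, w1, b))  # [1.5 2.  2.5]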
How does the model know in what direction to change the bias and the weight so that it can minimize the loss? Before we can answer that question, we must discuss the concept of loss.
What is Loss?
Loss is simply how much the model is wrong about the data after it makes a prediction. It's like a teacher grading your paper with lots of red ink so you know exactly what you did wrong. What we find in linear regression is that the loss is quickly reduced until it converges, which is a fancy way of saying it just can't get any better at doing its job.
A graph can be made of loss vs. iterations; this curve, called a loss curve, shows the loss shrinking as training goes on. To actually adjust the weight and the bias, the model looks at the loss as a function of those parameters and calculates the direction of its slope, i.e. is the loss going up or down as the parameter changes, and how steeply? This means we have a derivative to do. If you're interested in the math, I have it below; otherwise you can skip it, it's not terribly important. Just go to the paragraphs immediately after all the intimidating math.
The below example is done using a loss function called MSE (Mean Squared Error). It's just a way of expressing the loss of the model.
$$\frac{1}{M} \sum_{i=1}^{M} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2$$
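In code, MSE is just a one-liner. This is my own illustrative sketch (not part of the model I trained below), with made-up numbers:

import numpy as np

# Mean Squared Error: the average of the squared differences between
# the model's predictions f_wb(x) and the true values y
def mse_loss(predictions, targets):
    return np.mean((predictions - targets) ** 2)

# Made-up example: three predictions vs. three true values
print(mse_loss(np.array([2.0, 3.0, 4.0]), np.array([2.5, 3.0, 3.5])))  # about 0.1667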
The derivative of the loss function with respect to the weight is:
$$\frac{\partial}{\partial w} \left[ \frac{1}{M} \sum_{i=1}^{M} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 \right]$$
And becomes:
$$\frac{1}{M} \sum_{i=1}^{M} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) \cdot 2x^{(i)}$$
The derivative of the loss function with respect to the bias is:
$$\frac{\partial}{\partial b} \left[ \frac{1}{M} \sum_{i=1}^{M} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 \right]$$
And becomes:
$$\frac{1}{M} \sum_{i=1}^{M} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) \cdot 2$$
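Translated into code, those two derivatives look roughly like this (again, just an illustrative sketch with made-up names):

import numpy as np

# Partial derivatives of MSE with respect to the weight and the bias,
# matching the two formulas above for f_wb(x) = w*x + b
def gradients(x, y, w, b):
    errors = (w * x + b) - y       # f_wb(x^(i)) - y^(i) for every example
    dw = np.mean(errors * 2 * x)   # slope of the loss with respect to w
    db = np.mean(errors * 2)       # slope of the loss with respect to b
    return dw, db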
The model plugs its current values in for the bias and the weight and figures out the direction of the slope of the tangent to the loss curve (is it going uphill or downhill, and how steep is it?).
Let's say the value of the slope is negative. How should the model move the weight and the bias? Opposite to the slope: a negative slope means the loss goes down as the parameter goes up, so the model should increase the weight and the bias (which is exactly what subtracting a negative slope does in the update rules below).
The model now has a new weight and bias and what does it do? It repeats the steps over again with those new values and calculates the loss again. And again. And again. Until the loss converges.
It changes the weight and the bias based on this simple relationship:
$$\text{new weight} = \text{old weight} - (\text{small amount} \times \text{weight slope})$$
$$\text{new bias} = \text{old bias} - (\text{small amount} \times \text{bias slope})$$
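Putting the loss, the derivatives, and the update rules together gives a bare-bones gradient descent loop. This is only a sketch of the idea (the data and starting values are made up), not the model I actually trained this week:

import numpy as np

# A minimal gradient descent loop for y = b + w1*x1
def train(x, y, learning_rate=0.01, iterations=1000):
    w, b = 0.0, 0.0                    # start knowing nothing
    for _ in range(iterations):
        errors = (w * x + b) - y       # prediction error on every example
        dw = np.mean(errors * 2 * x)   # weight slope
        db = np.mean(errors * 2)       # bias slope
        w = w - learning_rate * dw     # new weight = old weight - (small amount * weight slope)
        b = b - learning_rate * db     # new bias = old bias - (small amount * bias slope)
    return w, b

# Dummy data that roughly follows y = 2x + 1
np.random.seed(42)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + np.random.normal(0, 0.5, 50)
print(train(x, y))  # should land near (2, 1)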
The small amount just determines how much the model changes the bias and the weight each time. We'll get into that next.
Hyperparameters
The hyperparameters are the things the user changes, not the model.
They are: the learning rate, the batch size, and the number of epochs.
The learning rate is that "small amount" just discussed above. It's essentially how fast the model learns. A delicate balance is required: too small and training crawls along, too large and the model can overshoot the minimum and never converge.
The batch size is how many examples the model goes through before updating the bias and the weight. The links at the bottom of the post can lead you to crash-courses that can describe the different kinds of batch sizes better than I can.
Finally, an epoch pretty much means that the model has gone through the entire data set once. How many weight-and-bias updates happen within each epoch depends on the batch size: the smaller the batch, the more updates per pass through the data.
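To make the relationship between these three concrete, here's a quick back-of-the-envelope calculation in Python (the numbers are made up for illustration):

# Made-up numbers to show how the hyperparameters relate
num_samples = 100     # size of the whole data set
batch_size = 10       # examples seen before each weight/bias update
epochs = 50           # full passes through the data set

updates_per_epoch = num_samples // batch_size   # 10 updates per pass
total_updates = updates_per_epoch * epochs      # 500 updates in total
print(updates_per_epoch, total_updates)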
Now, enough of the boring theory, what did I actually do this week?
My Practice Model this week
I practiced all of this to make sure I understood it, after generating some dummy data to train my baby model on. The premise? How do hours slept and hours studied affect GPA (with slightly more weight given to hours studied)?
import tensorflow as tf
import numpy as np
import pandas as pd
# Set seed for reproducibility
np.random.seed(42)
# Generate 100 data points
num_samples = 100
# Random values for hours studied and hours slept
hours_studied = np.random.uniform(0, 10, num_samples)
hours_slept = np.random.uniform(3, 10, num_samples)
# Define GPA as a function of both inputs + some noise
noise = np.random.normal(0, 0.3, num_samples) # Gaussian noise
gpa = 0.3 * hours_studied + 0.2 * hours_slept + noise
# Clip GPA between 0 and 4.0
gpa = np.clip(gpa, 0, 4.0)
# Create a pandas DataFrame
df = pd.DataFrame({
    'Hours_Studied': hours_studied,
    'Hours_Slept': hours_slept,
    'GPA': gpa
})
# Display first few rows
print(df)
# Separate the features (inputs) and the label (output)
X = df[["Hours_Studied", "Hours_Slept"]].values
y = df["GPA"].values
# Stack the layers: two inputs -> 10 dense neurons -> 1 output neuron
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])
# Optimize with stochastic gradient descent, using MSE as the loss
model.compile(optimizer="sgd", loss="mean_squared_error")
# Train: 1,000 passes through the entire data set
model.fit(X, y, epochs=1000)
# Predict the GPA for 9.5 hours studied and 2.0 hours slept
print(model.predict(np.array([[9.5000, 2.0000]])))
This code generates random data using the numpy library. Then I generated some noise (because life isn't perfect, right?) that followed a normal distribution.
Then I defined the GPA function where I gave slightly more weight to hours studied (notice the 0.3 in front of hours studied and 0.2 in front of hours slept). The noise I just created was simply tacked onto the end.
I created a dataframe (almost like an Excel sheet) using the pandas library to hold the generated data. I then accessed that data as the input (X) and the output (y), just like in the equation of the line, where x is the input and y is the output of the function that operates on the input.
Now to the fun part!! tf.keras.Sequential means that I'm creating a machine learning model where the different layers/functions are stacked onto each other (one leads right to the next and so on). In this same line of code, I define the model to take in two inputs.
Input and Dense are different kinds of layers. Dense just means that every neuron is connected to every neuron in the layer before it, and I define my particular dense layer to consist of 10 neurons that each make their own weighted calculation and adjust their weights and biases just as discussed earlier. The final Dense layer has just one neuron, because that's the one that combines (with its own learned weights) all of the previous 10 neurons' outputs and gives the final output.
The next two lines of code are where the optimizer, the loss, and the number of epochs are defined. Remember those hyperparameters? "sgd" stands for stochastic gradient descent, the optimizer that performs the weight-and-bias updates we discussed earlier. I didn't set a batch size explicitly, so Keras falls back to its default of 32 examples per update. Epochs=1000 means we have the model going through the entire data set 1,000 times.
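If you want those hyperparameters spelled out explicitly instead of relying on the defaults, the compile and fit calls can take them directly. This sketch reuses the model, X, and y defined above, and the learning rate and batch size here are just made-up values for illustration:

# Same model, with the learning rate, batch size, and epochs set explicitly
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="mean_squared_error")
model.fit(X, y, batch_size=1, epochs=1000)  # batch_size=1 updates after every single example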
The final model.predict line is where I give it some input and my baby model gives me its predicted outcome based on what it's learned!!
By the time everything was all said and done, I was able to get baby Perseus to obtain a loss as low as 0.09! (Not outstanding, but not bad either).
I hope this was interesting to read!! Feel free to take what I did here and try it out yourself!!
Below, I have links to my favorite Python IDE and all the resources I used to learn this week! See you next time!
Where thoughts orbit stars and dreams power suns...
And here's Google Colab; I recommend taking a look: