Hello, viewers from around the world! Whether you say hello, dobry dzien, namaste, como estas, good afternoon, good morning, or good evening, welcome to my tutorial. I'm Etietop Demas Abraham, also known as Eti. In this video, we'll dive into logistic regression and k-nearest neighbors.
Before we begin, I recommend checking out my previous video on k-nearest neighbors for some background knowledge. In this tutorial, we'll tackle a practical problem related to pulsar stars and identify them using logistic regression and k-nearest neighbors.
Logistic regression is similar to linear regression but is designed for categorical variables. We'll use logistic regression to predict the probability of a new star belonging to a specific class. Additionally, we'll employ k-nearest neighbors to perform classification based on the proximity of other stars.
Our goal is to build a supervised classification machine learning model. So, let's get started!
First, we'll explore the "pulsar stars" dataset, which contains information about stars obtained during a high-resolution universe survey. Among the observations, some stars are pulsars. We have a total of around 17,000 observed stars, with approximately 1,639 of them being pulsars.
The dataset includes eight predictors (independent variables) that we'll analyze to determine the target or class of each star. One of these predictors is the average value of the integral profile. Our response variable is the target column, representing the type of star we want to classify or predict.
To begin, we'll retrieve the dataset from our PostgreSQL database and extract samples based on certain conditions. We'll select stars with a target of 0 and an MIP (mean integrated profile) value between 82 and 84, as well as stars with a target of 1 and an MIP value between 83 and 92.
Once we have our sample datasets, we'll import them into Google Sheets for better visualization and analysis. We'll normalize the predictors using Azure, ensuring that all variables are on the same scale for accurate analysis.
After the normalization step, we'll train a logistic regression model using the sample data, setting a random seed of 2019. The trained model will allow us to classify new stars and determine their target class.
Next, we'll classify a new star using the trained logistic regression model. We'll manually enter the star's parameters and leverage the model to predict its target class. The result will provide the probability of the star belonging to a particular class.
In addition to logistic regression, we'll also explore k-nearest neighbors. This algorithm finds the closest neighbors to a given star based on Euclidean distance calculations. We'll calculate the Euclidean distance between the new star and the existing stars, allowing us to identify the closest star and its target class.
Finally, we'll compare the classification results from logistic regression and k-nearest neighbors to see how they align. We'll find that both methods provide similar predictions, reinforcing the accuracy of our models.
Thank you for joining me in this tutorial on logistic regression and k-nearest neighbors. I hope you found it informative and engaging. Feel free to leave any comments or questions below. Have a wonderful day, and goodbye!