MLEARN510-Notes

Course 1: Introduction to Machine Learning

These are my personal study notes for the first course in the Machine Learning Professional Certificate Program from the University of Washington. The lectures are based on the book An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani.

Table of Contents

Week | Topic | Concepts
1 | Course intro | AI vs. ML, Types of ML, Performance Metrics, Bias-Variance Tradeoffs, Parametric vs. Non-parametric Methods, K-Nearest Neighbors (KNN), Matrix Algebra
2 | Linear Regression | Maximum Likelihood Estimation (MLE), Simple Linear Regression, Multiple Linear Regression
3 | Classification | Logistic Regression, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN)
4 | Model Building Part 1 | Data Preprocessing, Handling Outliers & Class Imbalance & Missing Data, Feature Engineering, Feature Selection, Data Splitting
5 | Model Building Part 2 | Feature Engineering, Feature Selection, Resampling
6 | Resampling Methods | Cross-Validation, Bootstrapping, Leave-One-Out Cross-Validation (LOOCV)
7 | Linear Model Selection and Regularization | Subset Selection, Shrinkage Methods (Ridge Regression, Lasso Regression)
8 | Dimension Reduction Methods | Principal Component Analysis (PCA), Principal Components Regression (PCR), t-Distributed Stochastic Neighbor Embedding (t-SNE)
9 | Forecasting | Time Series Data, Moving Average, Exponential Smoothing, ARIMA
10 | Frequent Itemset Mining | Association Rules, Apriori Algorithm, FP-Growth Algorithm, Maximal and Closed Frequent Itemsets

Homework Assignments

Unlike the lectures, which take a theoretical approach to machine learning, the homework assignments are more hands-on and practical. The assignments are done in Python using Jupyter Notebooks, with extensive use of the scikit-learn library.

For academic integrity reasons, I won't be posting my code here. However, I will provide a brief overview of the key concepts covered in each homework assignment.

Week | Topic
1 | K-Nearest Neighbors: Social Network Ad
2 | Linear Regression: Forest Fires
3 | Classification: Credit Worthiness
4 | Model Building Part 1: Internet Advertisements
5 | Model Building Part 2: Predictive Maintenance
6 | Resampling Methods: SECOM
7 | Linear Model Selection and Regularization: House Prices
8 | Dimension Reduction Methods: MNIST
9 | Forecasting: Airline Passengers
10 | Frequent Itemset Mining: Online Retail

Clarification on Content Creation

My understanding of the subject matter comes from attending the class, but most of my notes are based on the provided slides. The raw notes are somewhat messy, and sharing them directly might breach copyright terms. To keep a digital, easily accessible version, I worked with ChatGPT to refine the content and present it in a more organized way. Because these topics are standard and widely discussed online, the information produced by ChatGPT is generally trustworthy.