MLEARN520-Notes

Course 2: Advanced Machine Learning

These are my personal study notes for the second course in the Machine Learning Professional Certificate Program from the University of Washington. The lectures are based on two books: An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani, and Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, by Witten, Frank, Hall, and Pal.

Table of Contents

Week | Topic | Concepts
--- | --- | ---
1 | Course Intro and Review | Multi-class Classification
2 | Decision Trees | Classification Trees, Regression Trees
3 | Tree Ensemble Methods | Bagging, Random Forests, Boosting, Bayesian Additive Regression Trees
4 | Support Vector Machine (SVM) | Support Vector Classifier, Support Vector Machines, Support Vector Regression
5 | Stacking and Blending | Stacking, Blending
6 | Clustering Methods | K-Means Clustering, Hierarchical Clustering, Gaussian Mixture Models
7 | Natural Language Processing (NLP) | Vector Space Models, Probabilistic Models, Sequence Models, Attention Models
8 | Recommendation Systems | Collaborative Filtering, Content-Based Filtering, Hybrid Filtering, Knowledge-Based Filtering
9 | Model Interpretation | Types of Interpretability, Common Interpretability Techniques
10 | MLOps | MLOps Workflow, Model Versioning and Data Management, Continuous Integration and Continuous Delivery (CI/CD), Model Monitoring

Homework Assignments

Unlike the lectures, which take a theoretical approach to machine learning, the homework assignments are hands-on and practical. The assignments are done in Python using Jupyter Notebooks, with extensive use of the scikit-learn library.

For academic integrity reasons, I won't be posting my code here. However, I will provide a brief overview of the key concepts covered in each homework assignment.

Week | Topic
--- | ---
1 | Multi-class Classification: Avila-DataSet
2 | Decision Trees: Banknote Authentication
3 | Random Forests: Internet Advertisements
4 | Support Vector Machine: Banknote Authentication
5 | Ensemble Methods: Internet Advertisements
6 | Clustering: Superstore Transactions
7 | Sentiment Analysis: Twitter Dataset
8 |
9 |
10 |

Clarification on Content Creation

While my understanding of the subject matter comes from attending the class, most of my notes were based on the provided slides. Those raw notes are somewhat messy, and posting them as-is might breach copyright terms. To have a digital, easily accessible version of my notes, I worked with ChatGPT to refine and organize the content. Because these topics are common knowledge and widely discussed online, the information rendered by ChatGPT is generally trustworthy.