ML Prep Checklist
Before getting started with the theory and hands-on training of this course, we go through our Machine Learning prep checklist, where you get to know the eight most common data preparation and cleaning steps in a project, such as dealing with missing values, duplicate or incorrect data, feature scaling, or the validation split.
Working with Missing Values
In this part we cover some of the high-level intuition around dealing with the missing values that can exist in our data.
Learn why we even have to worry about missing values and the different ways to deal with them.
Learn how to make use of pandas and its isna() method to check for missing values, and about the options you have if you find some.
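As a small sketch of this step, here is how a missing-value check with pandas might look; the tiny DataFrame is a made-up example, not course data:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with two missing entries
df = pd.DataFrame({"age": [25, np.nan, 40],
                   "city": ["Berlin", "Paris", None]})

# isna() marks missing entries; summing per column counts them
missing_per_column = df.isna().sum()

# Two common options once you find missing values:
dropped = df.dropna()                           # drop rows with any gap
filled = df.fillna({"age": df["age"].mean()})   # fill with a column statistic
```

Dropping rows is the simplest option but throws data away; filling keeps the rows at the cost of introducing estimated values, which leads directly into imputation below.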
Get to know imputation - a data preprocessing and preparation technique where you impute replacement values for those that are missing. Learn how to put some powerful imputation approaches into action, such as scikit-learn's SimpleImputer or KNNImputer, which enable either static or dynamic imputation of missing values.
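To make the static-versus-dynamic distinction concrete, a minimal sketch with both scikit-learn imputers on a toy array (the numbers are illustrative assumptions, not course data):

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])

# Static imputation: replace every gap with one fixed statistic (the column mean)
X_simple = SimpleImputer(strategy="mean").fit_transform(X)

# Dynamic imputation: replace each gap with the mean of its nearest neighbours,
# so the fill value depends on the rest of that row
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)
```

With SimpleImputer the missing entry in column 0 becomes the column mean (1 + 3) / 2 = 2; KNNImputer would give a row-dependent value on less symmetric data.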
Furthermore, we show you how to deal with categorical variables so you can ensure that your model can extract the meaningful information those variables hold. For this, you will learn about one of the most common approaches: One-Hot-Encoding.
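One-hot encoding can be sketched in a single pandas call; the "color" column here is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red"],
                   "price": [10, 12, 9]})

# get_dummies replaces the categorical column with one binary column per category
encoded = pd.get_dummies(df, columns=["color"])
```

The result has no "color" column any more, but "color_green" and "color_red" indicator columns the model can work with numerically.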
Outliers & Feature Scaling
In this section you learn what outliers are and why they matter when it comes to preprocessing and preparing your data for a machine learning project. You also get to know some ways to detect and deal with them - and learn when it is OK to just leave them be and do nothing.
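One widely used detection rule (an illustrative choice here, not necessarily the one the course uses) flags values that fall more than 1.5 interquartile ranges outside the middle 50% of the data:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95])  # 95 is an obvious outlier

# Interquartile-range (IQR) rule: flag points far outside the middle 50%
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
```

Whether you then drop, cap, or keep such points depends on the data: a measurement error is usually removed, a genuine extreme value is often left alone.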
We also discuss and run through ways to scale the values of your features or columns, an approach known as feature scaling. Here, you learn what feature scaling is and why it is such an important step. And you get to know the two most common techniques for this: standardization and normalization. Learn about the difference between the two and the logic behind them.
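The difference between the two techniques is easy to see in code; this sketch uses scikit-learn's scalers on a made-up single-feature array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Standardization: shift and scale so the feature has zero mean and unit variance
standardized = StandardScaler().fit_transform(X)

# Normalization (min-max scaling): rescale the feature into the range [0, 1]
normalized = MinMaxScaler().fit_transform(X)
```

Standardization keeps the shape of the distribution and is robust when features have very different units; min-max normalization guarantees a fixed range, which some models and visualizations prefer.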
Next, we discuss what exactly feature selection is and the scenarios in which you need to apply it. We will also be looking at some effective approaches for finding the best features prior to building and training your model: creating a quick and easy correlation matrix between the features, using univariate feature selection (a more automated method), and recursive feature elimination (RFE).
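All three approaches can be sketched in a few lines of scikit-learn; the synthetic dataset and the choice of k = 3 features are assumptions for illustration:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic classification data with 5 features, 3 of them informative
X, y = make_classification(n_samples=100, n_features=5,
                           n_informative=3, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])

# 1. Correlation matrix between the features
corr = df.corr()

# 2. Univariate feature selection: keep the k best features by an F-test score
X_best = SelectKBest(f_classif, k=3).fit_transform(X, y)

# 3. Recursive feature elimination: repeatedly drop the weakest feature
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
selected_mask = rfe.support_  # True for the features RFE kept
```

The correlation matrix is a quick visual check, univariate selection scores each feature on its own, and RFE evaluates features jointly through the model itself.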
ML Model Validation
Last but not least, you learn everything about model validation. Learn what overfitting is and why it can be problematic for your machine learning project. Also get to know some great approaches for model validation: cross-validation and k-fold cross-validation.
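As a closing sketch, k-fold cross-validation in scikit-learn; the iris dataset and logistic-regression model are illustrative stand-ins for whatever you train in your own project:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4 folds, validate on the held-out fold,
# and repeat so every fold is used for validation exactly once
scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
mean_accuracy = scores.mean()
```

Because every observation is validated on exactly once, the mean score is a far more reliable estimate of real-world performance than a single train/test split, and a large gap between training and cross-validation scores is a classic sign of overfitting.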