

Machine Learning models

This section covers the specifics of each machine learning model: linear models, k-Nearest Neighbors, Decision Trees, and Gradient Boosting.

  • Linear models
    • Linear Regression
      • Linear regression
      • Fit a simple linear regression to 2D data
      • Multilinear regression
      • Adapting linear regression to nonlinear relationships
    • Polynomial Regression
    • Bayesian Regression
    • Ridge Regression
    • LASSO Regression
    • Logistic Regression
  • ML Models
    • Linear models:
    • Nonlinear models:
  • k-Nearest Neighbors
    • kNN for classification
    • kNN for regression
    • Selection of the k value
    • Strengths of kNN
    • Weaknesses of kNN
  • Decision Tree
    • Decision Trees for classification
    • Decision Trees for regression
    • Building a Decision Tree
    • Strengths of Decision Trees
    • Weaknesses of Decision Trees
  • Gradient Boosting
    • Function to train a model with GB and hyperparameter search
  • Hyperparameters - Gradient Boosting
    • Tree-specific hyperparameters
      • Minimum samples for splitting
      • Minimum samples per leaf
      • Minimum samples per leaf as a fraction of total samples
      • Maximum depth of a tree
      • Maximum number of leaves in a tree
      • Maximum number of features
    • Boosting hyperparameters
      • Learning rate
      • Number of sequential trees
      • Fraction of observations to be selected for each tree
    • Other hyperparameters
      • Loss function
    • Hyperparameters grid in scikit-learn


By José Aniceto

© Copyright 2023, José Aniceto.