Reading material

General

  • Common pitfalls and recommended practices

ML models

  • Confidence Intervals for Scikit Learn Random Forests

By José Aniceto

© Copyright 2023, José Aniceto.