Introduction

Basic concepts and definitions in math, statistics, and machine learning.

  • Math Basics
    • Monotone function
    • Dense vs sparse matrices
  • Statistics basics
    • Dividing data
      • Percentiles (100 regions)
      • Deciles (10 regions)
      • Quartiles (4 regions)
      • Hinges
        • Example 1: sample size of 20
        • Example 2: sample size of 21
    • Five Number Summary
    • Density plot
    • References
    • Contents
      • Boxplots
        • Definitions
        • Outliers
        • Creating a boxplot in Pandas
        • References
      • Outliers
        • Definition of outliers
        • Ways to describe data
        • Box plot construction
        • Box plots with fences
        • Outlier detection criteria
          • Extreme Outliers
          • Mild Outliers
      • Parametric tests or models
        • Parametric models
        • Non-parametric models
        • Choosing Between Parametric and Non-Parametric Models
  • Machine Learning Basics
    • Features
    • Target
    • Hyperparameters
    • Bias-Variance Tradeoff
    • Overfitting and underfitting
    • Contents
      • Types of ML systems
        • Supervised/unsupervised
          • Supervised learning
          • Unsupervised learning
          • Semisupervised learning
          • Reinforcement learning
      • Machine Learning workflow
        • 1) Data preparation
        • 2) Feature engineering
        • 3) Model development
        • 4) Model testing
        • 5) Application
        • References

By José Aniceto

© Copyright 2023, José Aniceto.