Algorithm Comparison Matrix

A quick-reference cheat sheet for every major ML model covered in the guide.

| Algorithm | Family / Type | When to use | Interpretable? | Needs scaling? | Key hyperparameters |
|---|---|---|---|---|---|
| Linear Regression | Linear model | Continuous target with linear relationships; baseline model. | Yes | Yes (for regularization) | `alpha` (L1/L2 penalty) |
| Polynomial Regression | Linear model | Non-linear data with few features. | Somewhat | Yes | `degree` |
| Logistic Regression | Linear model | Binary classification probabilities; clear decision boundaries. | Yes | Yes | `C` (inverse regularization) |
| Decision Trees | Tree model | Explainable rules; non-linear data. | Yes (white box) | No | `max_depth`, `min_samples_split` |
| Random Forest | Ensemble | Excellent default model; high performance, resists overfitting. | Feature importance | No | `n_estimators`, `max_features` |
| Gradient Boosting (XGBoost/LightGBM) | Ensemble | Tabular data; highly accurate; wins Kaggle competitions. | No | No | `learning_rate`, `n_estimators`, `max_depth` |
| Support Vector Machines (SVM) | Instance/kernel | Complex small/medium datasets; clear margin separation. | No | Yes (critical) | `C`, `kernel`, `gamma` |
| K-Nearest Neighbors (KNN) | Instance-based | Simple baselines; small datasets. | Somewhat | Yes (distance-based) | `n_neighbors`, `weights` |
| K-Means | Clustering | Spherical clusters of similar size when K is known. | Yes (centroids) | Yes | `n_clusters` (k) |
| DBSCAN | Clustering | Arbitrary-shaped clusters; identifying anomalies/noise. | No | Yes | `eps`, `min_samples` |
| Gaussian Mixture Models (GMM) | Clustering | Ellipsoidal or overlapping clusters; density estimation. | Somewhat | Yes | `n_components`, `covariance_type` |
| PCA | Dim. reduction | Compressing data, speeding up ML, visualizing high-D data (linear only). | Components (loadings) | Yes | `n_components` |
| t-SNE / UMAP | Dim. reduction | Visualizing complex non-linear high-dimensional data. | No (visualization only) | Yes | `perplexity`, `n_neighbors` |

Classification: Which model?

Classification problem →

- Imbalanced classes?
  - Yes → SMOTE, class weights, or undersampling + Logistic / Tree / RF
  - No → Need interpretability?
    - Yes → Logistic Regression (linear) or Decision Tree (rules)
    - No → Dataset size?
      - Small (<10k) → SVM (with scaling), KNN, or Logistic
      - Medium–large → Random Forest or Gradient Boosting (XGBoost/LightGBM)
- Tabular data, max accuracy? → Gradient Boosting (XGBoost, LightGBM, CatBoost)
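The imbalanced branch above can be sketched with scikit-learn's built-in class weighting (a minimal example, assuming scikit-learn is installed; the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem with a 90/10 class imbalance.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# class_weight="balanced" reweights the loss inversely to class frequency.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# The weighted model typically recovers more of the minority class.
print("recall (plain):   ", recall_score(y_te, plain.predict(X_te)))
print("recall (balanced):", recall_score(y_te, weighted.predict(X_te)))
```

The same `class_weight` option exists on trees, random forests, and SVMs; SMOTE lives in the separate `imbalanced-learn` package.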

Regression: Which model?

Regression problem →

- Relationship linear?
  - Yes → Linear Regression (baseline)
  - No → Polynomial features, Decision Tree, or RF/Gradient Boosting
- Many features, risk of overfitting? → Ridge (L2), Lasso (L1), or Elastic Net (see trade-offs below)
- Need interpretability? → Linear Regression, or a Decision Tree for non-linear data
- Tabular, max accuracy? → Gradient Boosting (XGBoost, LightGBM)
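A minimal sketch of the first fork, assuming scikit-learn: a linear baseline underfits an obviously non-linear target, while a small decision tree captures it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)  # clearly non-linear

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# R^2 on the training data: the linear baseline underfits, the tree does not.
print("linear R^2:", lin.score(X, y))
print("tree R^2:  ", tree.score(X, y))
```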

Clustering: Which algorithm?

Clustering / unsupervised →

- Know the number of clusters (K)?
  - No → DBSCAN or Hierarchical clustering (use the dendrogram to pick K)
  - Yes → Cluster shape?
    - Spherical, similar size → K-Means
    - Arbitrary shape, noise → DBSCAN
    - Overlapping, ellipsoidal → GMM (Gaussian Mixture)
- Need anomaly detection? → DBSCAN (labels noise as -1)
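The DBSCAN branches can be illustrated in a few lines, assuming scikit-learn; the two far-away points are synthetic outliers added so DBSCAN has something to flag as noise (label -1):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two crescent-shaped clusters (non-spherical) plus two far-away outliers.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
outliers = np.array([[3.0, 3.0], [-3.0, -3.0]])
X = StandardScaler().fit_transform(np.vstack([X, outliers]))  # scaling matters: eps is a distance

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("noise points:  ", int(np.sum(labels == -1)))
```

K-Means on the same data would split each crescent down the middle; DBSCAN follows the density instead.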

Dimensionality reduction: PCA vs t-SNE vs UMAP?

Need to reduce dimensions →

- Goal?
  - Speed up ML, compress, remove noise → PCA (linear, fast)
  - Visualize high-D data (2D/3D) → Data structure?
    - Linear / roughly linear → PCA
    - Non-linear, complex → t-SNE (small datasets) or UMAP (larger datasets; preserves more structure)

Note: t-SNE/UMAP are for visualization only; don't feed their reduced output to a model as features.
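As a sketch of the compression branch, assuming scikit-learn: passing a float to PCA's `n_components` keeps just enough components to reach that fraction of explained variance.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 64-dimensional digit images, standardized before PCA.
X, _ = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

pca = PCA(n_components=0.90)  # keep enough components for 90% of the variance
X_reduced = pca.fit_transform(X)

print("original dims:", X.shape[1])
print("reduced dims: ", X_reduced.shape[1])
print("variance kept:", pca.explained_variance_ratio_.sum())
```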

Time & space complexity (training)

| Algorithm | Time (training) | Space | Prediction |
|---|---|---|---|
| Linear / Logistic Regression | O(n·d), or O(n·d²) for the normal equation | O(d) | O(d) |
| Decision Tree | O(n·d·log n) | O(nodes) | O(depth) |
| Random Forest | O(T·n·d·log n) | O(T·nodes) | O(T·depth) |
| Gradient Boosting | O(T·n·d·log n) | O(T·nodes) | O(T·depth) |
| SVM (linear) | O(n·d) to O(n²·d) | O(support vectors) | O(support vectors) |
| SVM (RBF kernel) | O(n²) to O(n³) | O(n²) | O(support vectors) |
| KNN | O(1) (lazy learner) | O(n·d) | O(n·d), or O(log n) with a KD-tree |
| K-Means | O(n·k·d·i) | O(n·d + k·d) | O(k·d) |
| DBSCAN | O(n log n) with a spatial index | O(n) | N/A (no predict) |
| GMM | O(n·k·d·i) | O(k·d²) | O(k·d²) |
| PCA | O(min(n,d)³) | O(d²) | O(d·k) |
| t-SNE | O(n² log n) | O(n²) | N/A (no out-of-sample transform) |

n = samples, d = features, k = clusters/components, T = trees, i = iterations

Ridge vs Lasso vs Elastic Net

| Regularization | When to use | Effect |
|---|---|---|
| Ridge (L2) | Many correlated features; want to keep all with small weights | Shrinks all coefficients toward zero, never exactly to zero |
| Lasso (L1) | Feature selection; sparse model; many irrelevant features | Can zero out coefficients; automatic feature selection |
| Elastic Net | Many correlated features plus some sparsity; Lasso is unstable with correlated variables | Blend of L1 + L2; handles groups of correlated features better |
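A quick illustration of the Ridge-vs-Lasso row, assuming scikit-learn and a synthetic target where only 3 of 20 features matter (the `alpha` values here are illustrative, not tuned):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
# Only the first three features carry signal; the other 17 are noise.
y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge shrinks everything a little; Lasso zeros out irrelevant features.
print("ridge exact-zero coefs:", int(np.sum(ridge.coef_ == 0)))
print("lasso exact-zero coefs:", int(np.sum(lasso.coef_ == 0)))
```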

Bagging vs Boosting

| Approach | Idea | When to use |
|---|---|---|
| Bagging (RF) | Parallel trees on bootstrapped samples; average the predictions | Reduce variance; robust default; parallelizable |
| Boosting (GB, XGBoost) | Sequential trees; each corrects the previous trees' errors | Reduce bias; often higher accuracy; slower, sequential |
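Both rows can be compared on the same synthetic task, assuming scikit-learn (hyperparameters here are illustrative defaults, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_informative=8, random_state=0)

# Bagging: independent deep trees averaged together (variance reduction).
rf = RandomForestClassifier(n_estimators=200, random_state=0)
# Boosting: shallow trees fit one after another on remaining errors (bias reduction).
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)

print("RF accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("GB accuracy:", cross_val_score(gb, X, y, cv=5).mean())
```

Note that the forest can train its trees in parallel (`n_jobs=-1`), while boosting is inherently sequential.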

K-Means vs DBSCAN vs GMM

| Algorithm | Pros | Cons |
|---|---|---|
| K-Means | Fast, simple, scales well | Needs K; assumes spherical clusters; sensitive to outliers |
| DBSCAN | No K needed; arbitrary shapes; finds noise | `eps`/`min_samples` need tuning; struggles with varying densities |
| GMM | Soft assignments; ellipsoidal clusters; probabilistic | Slower; assumes Gaussian components; needs K |
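The "soft assignments" advantage of GMM is easy to see, assuming scikit-learn: `predict_proba` returns a probability per cluster instead of a single hard label.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three blobs with different spreads (K-Means would treat them as equal spheres).
X, _ = make_blobs(n_samples=300, centers=3,
                  cluster_std=[1.0, 2.0, 0.5], random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # soft assignments: each row sums to 1

print("probs shape:", probs.shape)
print("rows sum to 1:", np.allclose(probs.sum(axis=1), 1.0))
```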

Tree vs Linear models

| Aspect | Linear (LogReg, LinReg) | Trees (DT, RF, GB) |
|---|---|---|
| Scaling | Required (for regularization) | Not required |
| Non-linearity | Needs polynomial/interaction terms | Built in |
| Interpretability | Coefficients = feature importance | Tree: rules; RF/GB: feature importances |
| Extrapolation | Can extrapolate beyond the training range | Poor; predicts within leaf bounds |
| High-cardinality categoricals | One-hot encoding blows up dimensionality | Handled natively via splits |
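The extrapolation row is the easiest to demonstrate, assuming scikit-learn: train both models on a perfect line over [0, 10], then ask for a prediction far outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Training data: y = 2x + 1 on x in [0, 10].
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 2 * X_train[:, 0] + 1

lin = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

X_far = np.array([[100.0]])
print("true value:       ", 2 * 100 + 1)           # 201
print("linear prediction:", lin.predict(X_far)[0])  # extrapolates the line correctly
print("tree prediction:  ", tree.predict(X_far)[0]) # stuck at the rightmost leaf's value
```

The tree can only repeat the value of the leaf covering the largest training x, so its prediction flatlines outside [0, 10].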