A quick-reference cheat sheet for every major ML model covered in the guide. Use the tables below to find the right tool for the job.
| Algorithm | Family / Type | When to Use | Interpretable? | Needs Scaling? | Key Hyperparameters |
|---|---|---|---|---|---|
| Linear Regression | Linear Model | Continuous target with linear relationships. Baseline model. | Yes | Yes (for regularization) | alpha (L1/L2 penalty in Ridge/Lasso) |
| Polynomial Regression | Linear Model | Non-linear data with a low number of features. | Somewhat | Yes | degree |
| Logistic Regression | Linear Model | Binary classification probabilities, clear decision boundaries. | Yes | Yes | C (inverse regularization) |
| Decision Trees | Tree Model | If you need explainable rules. Non-linear data. | Yes (White box) | No | max_depth, min_samples_split |
| Random Forest | Ensemble | Excellent default model. High performance, resists overfitting. | Feature Importance | No | n_estimators, max_features |
| Gradient Boosting (XGBoost/LightGBM) | Ensemble | Winning Kaggle competitions. Tabular data. Highly accurate. | No | No | learning_rate, n_estimators, max_depth |
| Support Vector Machines (SVM) | Instance/Kernel | Complex small/medium datasets. Clear margin separation. | No | Yes (Critical) | C, kernel, gamma |
| K-Nearest Neighbors (KNN) | Instance Based | Simple baselines, small datasets. | Somewhat | Yes (Distance-based) | n_neighbors, weights |
| K-Means | Clustering | Finding spherical clusters of similar size when K is known. | Yes (Centroids) | Yes | n_clusters (k) |
| DBSCAN | Clustering | Finding arbitrary shaped clusters, identifying anomalies/noise. | No | Yes | eps, min_samples |
| Gaussian Mixture Models (GMM) | Clustering | Ellipsoidal clusters, overlapping clusters, density estimation. | Somewhat | Yes | n_components, covariance_type |
| PCA | Dim Reduction | Compressing data, speeding up ML, visualizing high-D data (linear only). | Components | Yes | n_components |
| t-SNE / UMAP | Dim Reduction | Visualizing extremely complex non-linear high-dimensional data. | No (Vis only) | Yes | perplexity, n_neighbors |
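The "Needs Scaling?" column is the one that bites in practice: distance- and margin-based models (SVM, KNN, K-Means) break on unscaled features, while tree ensembles are invariant to monotone rescaling. A minimal scikit-learn sketch (the synthetic dataset and hyperparameters are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# SVM is margin/distance based: always pipeline it with a scaler.
svm = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf", gamma="scale"))
# Trees split on per-feature thresholds, so scaling is unnecessary.
rf = RandomForestClassifier(n_estimators=100, random_state=0)

scores = {}
for name, model in [("SVM (scaled)", svm), ("Random Forest", rf)]:
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Wrapping the scaler and model in one `Pipeline` also keeps the scaler from leaking test-fold statistics during cross-validation.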

| Algorithm | Time (training) | Space | Prediction |
|---|---|---|---|
| Linear / Logistic Regression | O(n·d) per gradient step; O(n·d² + d³) for the normal equation | O(d) | O(d) |
| Decision Tree | O(n·d·log n) | O(nodes) | O(depth) |
| Random Forest | O(T·n·d·log n) | O(T·nodes) | O(T·depth) |
| Gradient Boosting | O(T·n·d·log n) | O(T·nodes) | O(T·depth) |
| SVM (linear) | O(n·d) to O(n²·d) | O(d) (weights collapse into one vector) | O(d) |
| SVM (RBF kernel) | O(n²) to O(n³) | O(n²) | O(support vectors) |
| KNN | O(1) (lazy) | O(n·d) | O(n·d) or O(log n) with KD-tree |
| K-Means | O(n·k·d·i) | O(n·d + k·d) | O(k·d) |
| DBSCAN | O(n log n) with spatial index | O(n) | N/A (no predict) |
| GMM | O(n·k·d·i) | O(k·d²) | O(k·d²) |
| PCA | O(min(n,d)²·max(n,d)) via SVD | O(d²) | O(d·k) |
| t-SNE | O(n²) exact; O(n log n) with Barnes-Hut | O(n²) | N/A (no out-of-sample) |

n = samples, d = features, k = clusters/components, T = trees, i = iterations
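The KNN row ("lazy" training, expensive prediction) is easy to see in code. A small sketch, assuming scikit-learn; the random data is arbitrary:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# "Lazy" learner: fit() essentially just stores the data; the O(n·d)
# distance work happens at predict time -- or about O(log n) per query
# once a KD-tree index is built.
brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
kdtree = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)

X_new = rng.normal(size=(5, 3))
# Same exact neighbors, so same predictions; only the lookup cost differs.
print(brute.predict(X_new), kdtree.predict(X_new))
```

KD-trees only pay off in low dimensions (roughly d < 20); in high dimensions the search degrades back toward brute force.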

| Regularization | When to use | Effect |
|---|---|---|
| Ridge (L2) | Many correlated features; want to keep all with small weights | Shrinks all coefficients toward zero; never exactly zero |
| Lasso (L1) | Feature selection; sparse model; many irrelevant features | Can zero out coefficients; automatic feature selection |
| Elastic Net | Many correlated features + want some sparsity; Lasso unstable with correlated vars | Blend of L1 + L2; handles groups of correlated features better |
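The "never exactly zero" vs "can zero out coefficients" distinction is observable directly. A sketch with scikit-learn on synthetic data (the alpha values are arbitrary, chosen only to make the contrast visible):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first three features matter; the other seven are pure noise.
y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ridge shrinks every weight but keeps them all nonzero;
# Lasso drives the irrelevant weights to exactly zero.
print("Ridge exact zeros:", int(np.sum(ridge.coef_ == 0.0)))
print("Lasso exact zeros:", int(np.sum(lasso.coef_ == 0.0)))
```

This is why Lasso doubles as automatic feature selection: the surviving nonzero coefficients are the selected features.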

| Approach | Idea | When to use |
|---|---|---|
| Bagging (RF) | Parallel trees on bootstrapped samples; average predictions | Reduce variance; robust default; parallelizable |
| Boosting (GB, XGB) | Sequential trees; each corrects previous errors | Reduce bias; often higher accuracy; slower, sequential |
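Side by side, the two ensembling strategies differ mainly in how the trees are built, not in how they are called. A hedged sketch using scikit-learn's built-in implementations (hyperparameters are illustrative defaults, not tuned values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: many independent deep trees on bootstrap samples, averaged
# to cut variance. Trees can be grown in parallel.
bagged = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Boosting: shallow trees added one at a time, each fitting what the
# ensemble so far got wrong. Reduces bias, but training is sequential.
boosted = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                     max_depth=3, random_state=0).fit(X_tr, y_tr)

print("Random Forest    :", bagged.score(X_te, y_te))
print("Gradient Boosting:", boosted.score(X_te, y_te))
```

Note the contrast in tree depth: bagging wants deep, low-bias trees whose errors average out, while boosting wants shallow, high-bias "weak learners" that improve incrementally.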

| Algorithm | Pros | Cons |
|---|---|---|
| K-Means | Fast, simple, scales well | Needs K; spherical clusters; sensitive to outliers |
| DBSCAN | No K; arbitrary shapes; finds noise | eps/min_samples tuning; struggles with clusters of varying density |
| GMM | Soft assignments; ellipsoidal; probabilistic | Slower; assumes Gaussian; needs K |
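The "spherical clusters" con shows up immediately on non-convex data. A sketch on the classic two-moons dataset (scikit-learn; eps/min_samples values are just ones that happen to work for this noise level):

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# K-Means assumes compact, roughly spherical clusters, so it cuts
# straight through the two interleaved crescents.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows density instead and recovers both moons; label -1 marks noise.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print("DBSCAN clusters found:", len(set(db_labels) - {-1}))
```

The flip side of the table: DBSCAN needed eps tuned to this dataset's density, while K-Means needed only K.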

| Aspect | Linear (LogReg, LinReg) | Trees (DT, RF, GB) |
|---|---|---|
| Scaling | Required (for regularization and gradient-based solvers) | Not required |
| Non-linearity | Need polynomial/interaction terms | Built-in |
| Interpretability | Coefficients = feature importance | Tree: rules; RF/GB: feature importance |
| Extrapolation | Can extrapolate beyond training range | Poor; predicts within leaf bounds |
| High-cardinality categorical | One-hot blows up dimensions | Handles natively (splits) |
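The extrapolation row is the least intuitive, so here is a minimal sketch (scikit-learn, noise-free toy data): train both model families on y = 2x over [0, 10], then query far outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Train on y = 2x for x in [0, 10], then predict at x = 100.
X_train = np.linspace(0, 10, 100).reshape(-1, 1)
y_train = 2 * X_train.ravel()

lin = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

X_out = np.array([[100.0]])
print("Linear:", lin.predict(X_out)[0])  # extrapolates the trend (~200)
print("Tree  :", tree.predict(X_out)[0])  # clamped to the last leaf's value (~20)
```

Any x beyond the training range falls into the tree's rightmost leaf, so the tree flatlines at the largest training target; the linear model keeps following the fitted slope.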