Algorithm Comparison Matrix

A quick-reference cheat sheet for every major ML model covered in the guide.

| Algorithm | Family / Type | When to use | Interpretable? | Needs scaling? | Key hyperparameters |
|---|---|---|---|---|---|
| Linear Regression | Linear model | Continuous target with linear relationships; baseline model. | Yes | Yes (for regularization) | `alpha` (L1/L2 penalty) |
| Polynomial Regression | Linear model | Non-linear data with few features. | Somewhat | Yes | `degree` |
| Logistic Regression | Linear model | Binary classification probabilities; clear decision boundaries. | Yes | Yes | `C` (inverse regularization) |
| Decision Trees | Tree model | Explainable rules; non-linear data. | Yes (white box) | No | `max_depth`, `min_samples_split` |
| Random Forest | Ensemble | Excellent default model; high performance, resists overfitting. | Feature importance | No | `n_estimators`, `max_features` |
| Gradient Boosting (XGBoost/LightGBM) | Ensemble | Tabular data; highly accurate; wins Kaggle competitions. | No | No | `learning_rate`, `n_estimators`, `max_depth` |
| Support Vector Machines (SVM) | Instance/kernel | Complex small/medium datasets; clear margin separation. | No | Yes (critical) | `C`, `kernel`, `gamma` |
| K-Nearest Neighbors (KNN) | Instance-based | Simple baselines; small datasets. | Somewhat | Yes (distance-based) | `n_neighbors`, `weights` |
| K-Means | Clustering | Spherical clusters of similar size when K is known. | Yes (centroids) | Yes | `n_clusters` (k) |
| DBSCAN | Clustering | Arbitrary-shaped clusters; identifying anomalies/noise. | No | Yes | `eps`, `min_samples` |
| Gaussian Mixture Models (GMM) | Clustering | Ellipsoidal or overlapping clusters; density estimation. | Somewhat | Yes | `n_components`, `covariance_type` |
| PCA | Dim. reduction | Compressing data, speeding up ML, visualizing high-D data (linear only). | Components (loadings) | Yes | `n_components` |
| t-SNE / UMAP | Dim. reduction | Visualizing complex non-linear high-dimensional data. | No (visualization only) | Yes | `perplexity`, `n_neighbors` |

Classification: Which model?

Classification problem →

- Imbalanced classes?
  - Yes → SMOTE, class weights, or undersampling + Logistic / Tree / RF
  - No → Need interpretability?
    - Yes → Logistic Regression (linear) or Decision Tree (rules)
    - No → Dataset size?
      - Small (<10k) → SVM (with scaling), KNN, or Logistic
      - Medium–large → Random Forest or Gradient Boosting (XGBoost/LightGBM)
- Tabular data, max accuracy? → Gradient Boosting (XGBoost, LightGBM, CatBoost)
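The imbalanced branch above can be sketched with scikit-learn's built-in class weighting (a minimal example, assuming scikit-learn is installed; the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem with a 90/10 class imbalance.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# class_weight="balanced" reweights the loss inversely to class frequency.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# The weighted model typically recovers more of the minority class.
print("recall (plain):   ", recall_score(y_te, plain.predict(X_te)))
print("recall (balanced):", recall_score(y_te, weighted.predict(X_te)))
```

The same `class_weight` option exists on trees, random forests, and SVMs; SMOTE lives in the separate `imbalanced-learn` package.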

Regression: Which model?

Regression problem →

- Relationship linear?
  - Yes → Linear Regression (baseline)
  - No → Polynomial features, Decision Tree, or RF/Gradient Boosting
- Many features, risk of overfitting? → Ridge (L2), Lasso (L1), or Elastic Net (see trade-offs below)
- Need interpretability? → Linear Regression, or a Decision Tree for non-linear data
- Tabular, max accuracy? → Gradient Boosting (XGBoost, LightGBM)
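A minimal sketch of the first fork, assuming scikit-learn: a linear baseline underfits an obviously non-linear target, while a small decision tree captures it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)  # clearly non-linear

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# R^2 on the training data: the linear baseline underfits, the tree does not.
print("linear R^2:", lin.score(X, y))
print("tree R^2:  ", tree.score(X, y))
```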

Clustering: Which algorithm?

Clustering / unsupervised →

- Know the number of clusters (K)?
  - No → DBSCAN or Hierarchical clustering (use the dendrogram to pick K)
  - Yes → Cluster shape?
    - Spherical, similar size → K-Means
    - Arbitrary shape, noise → DBSCAN
    - Overlapping, ellipsoidal → GMM (Gaussian Mixture)
- Need anomaly detection? → DBSCAN (labels noise as -1)
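The DBSCAN branches can be illustrated in a few lines, assuming scikit-learn; the two far-away points are synthetic outliers added so DBSCAN has something to flag as noise (label -1):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two crescent-shaped clusters (non-spherical) plus two far-away outliers.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
outliers = np.array([[3.0, 3.0], [-3.0, -3.0]])
X = StandardScaler().fit_transform(np.vstack([X, outliers]))  # scaling matters: eps is a distance

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("noise points:  ", int(np.sum(labels == -1)))
```

K-Means on the same data would split each crescent down the middle; DBSCAN follows the density instead.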

Dimensionality reduction: PCA vs t-SNE vs UMAP?

Need to reduce dimensions →

- Goal?
  - Speed up ML, compress, remove noise → PCA (linear, fast)
  - Visualize high-D data (2D/3D) → Data structure?
    - Linear / roughly linear → PCA
    - Non-linear, complex → t-SNE (small datasets) or UMAP (larger datasets; preserves more structure)

Note: t-SNE/UMAP are for visualization only; don't feed their reduced output to a model as features.
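As a sketch of the compression branch, assuming scikit-learn: passing a float to PCA's `n_components` keeps just enough components to reach that fraction of explained variance.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 64-dimensional digit images, standardized before PCA.
X, _ = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

pca = PCA(n_components=0.90)  # keep enough components for 90% of the variance
X_reduced = pca.fit_transform(X)

print("original dims:", X.shape[1])
print("reduced dims: ", X_reduced.shape[1])
print("variance kept:", pca.explained_variance_ratio_.sum())
```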

Time & space complexity (training)

| Algorithm | Time (training) | Space | Prediction |
|---|---|---|---|
| Linear / Logistic Regression | O(n·d), or O(n·d²) for the normal equation | O(d) | O(d) |
| Decision Tree | O(n·d·log n) | O(nodes) | O(depth) |
| Random Forest | O(T·n·d·log n) | O(T·nodes) | O(T·depth) |
| Gradient Boosting | O(T·n·d·log n) | O(T·nodes) | O(T·depth) |
| SVM (linear) | O(n·d) to O(n²·d) | O(support vectors) | O(support vectors) |
| SVM (RBF kernel) | O(n²) to O(n³) | O(n²) | O(support vectors) |
| KNN | O(1) (lazy learner) | O(n·d) | O(n·d), or O(log n) with a KD-tree |
| K-Means | O(n·k·d·i) | O(n·d + k·d) | O(k·d) |
| DBSCAN | O(n log n) with a spatial index | O(n) | N/A (no predict) |
| GMM | O(n·k·d·i) | O(k·d²) | O(k·d²) |
| PCA | O(min(n,d)³) | O(d²) | O(d·k) |
| t-SNE | O(n² log n) | O(n²) | N/A (no out-of-sample transform) |

n = samples, d = features, k = clusters/components, T = trees, i = iterations

Ridge vs Lasso vs Elastic Net

| Regularization | When to use | Effect |
|---|---|---|
| Ridge (L2) | Many correlated features; want to keep all with small weights | Shrinks all coefficients toward zero, never exactly to zero |
| Lasso (L1) | Feature selection; sparse model; many irrelevant features | Can zero out coefficients; automatic feature selection |
| Elastic Net | Many correlated features plus some sparsity; Lasso is unstable with correlated variables | Blend of L1 + L2; handles groups of correlated features better |
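A quick illustration of the Ridge-vs-Lasso row, assuming scikit-learn and a synthetic target where only 3 of 20 features matter (the `alpha` values here are illustrative, not tuned):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
# Only the first three features carry signal; the other 17 are noise.
y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge shrinks everything a little; Lasso zeros out irrelevant features.
print("ridge exact-zero coefs:", int(np.sum(ridge.coef_ == 0)))
print("lasso exact-zero coefs:", int(np.sum(lasso.coef_ == 0)))
```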

Bagging vs Boosting

| Approach | Idea | When to use |
|---|---|---|
| Bagging (RF) | Parallel trees on bootstrapped samples; average the predictions | Reduce variance; robust default; parallelizable |
| Boosting (GB, XGBoost) | Sequential trees; each corrects the previous trees' errors | Reduce bias; often higher accuracy; slower, sequential |
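Both rows can be compared on the same synthetic task, assuming scikit-learn (hyperparameters here are illustrative defaults, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_informative=8, random_state=0)

# Bagging: independent deep trees averaged together (variance reduction).
rf = RandomForestClassifier(n_estimators=200, random_state=0)
# Boosting: shallow trees fit one after another on remaining errors (bias reduction).
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)

print("RF accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("GB accuracy:", cross_val_score(gb, X, y, cv=5).mean())
```

Note that the forest can train its trees in parallel (`n_jobs=-1`), while boosting is inherently sequential.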

K-Means vs DBSCAN vs GMM

| Algorithm | Pros | Cons |
|---|---|---|
| K-Means | Fast, simple, scales well | Needs K; assumes spherical clusters; sensitive to outliers |
| DBSCAN | No K needed; arbitrary shapes; finds noise | `eps`/`min_samples` need tuning; struggles with varying densities |
| GMM | Soft assignments; ellipsoidal clusters; probabilistic | Slower; assumes Gaussian components; needs K |
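The "soft assignments" advantage of GMM is easy to see, assuming scikit-learn: `predict_proba` returns a probability per cluster instead of a single hard label.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three blobs with different spreads (K-Means would treat them as equal spheres).
X, _ = make_blobs(n_samples=300, centers=3,
                  cluster_std=[1.0, 2.0, 0.5], random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # soft assignments: each row sums to 1

print("probs shape:", probs.shape)
print("rows sum to 1:", np.allclose(probs.sum(axis=1), 1.0))
```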

Tree vs Linear models

| Aspect | Linear (LogReg, LinReg) | Trees (DT, RF, GB) |
|---|---|---|
| Scaling | Required (for regularization) | Not required |
| Non-linearity | Needs polynomial/interaction terms | Built in |
| Interpretability | Coefficients = feature importance | Tree: rules; RF/GB: feature importances |
| Extrapolation | Can extrapolate beyond the training range | Poor; predicts within leaf bounds |
| High-cardinality categoricals | One-hot encoding blows up dimensionality | Handled natively via splits |
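The extrapolation row is the easiest to demonstrate, assuming scikit-learn: train both models on a perfect line over [0, 10], then ask for a prediction far outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Training data: y = 2x + 1 on x in [0, 10].
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 2 * X_train[:, 0] + 1

lin = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

X_far = np.array([[100.0]])
print("true value:       ", 2 * 100 + 1)           # 201
print("linear prediction:", lin.predict(X_far)[0])  # extrapolates the line correctly
print("tree prediction:  ", tree.predict(X_far)[0]) # stuck at the rightmost leaf's value
```

The tree can only repeat the value of the leaf covering the largest training x, so its prediction flatlines outside [0, 10].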