CS50 Data Science & AI Fundamentals
Duration: 12 weeks
Prerequisite: Python basics
Tools: NumPy, pandas, Matplotlib, seaborn*, SQL, scikit-learn
▶ Week 0 — Data Mindset & Stack
- What is Data Science? Roles, workflow, reproducibility.
- Project structure, notebooks vs scripts, environments.
Mini-lab: Create a cookiecutter-style DS project skeleton.
▶ Week 1 — Python for Data
- NumPy arrays & vectorization; broadcasting.
- pandas Series/DataFrame: indexing, filtering, apply.
Problem set: Retail transactions cleanup + KPIs.
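Sketch: vectorization and broadcasting on a tiny retail-style table. The prices and quantities are made-up illustration data, not the problem set's dataset.

```python
import numpy as np

# Hypothetical data: unit prices for 3 products, quantities sold on 2 days.
prices = np.array([2.0, 5.0, 10.0])        # shape (3,)
quantities = np.array([[1, 2, 3],
                       [4, 0, 1]])         # shape (2, 3)

# Broadcasting: the (3,) price vector is stretched across both rows.
revenue = quantities * prices              # shape (2, 3)

# Vectorized reduction: total revenue per day, with no Python loop.
daily_revenue = revenue.sum(axis=1)
print(daily_revenue)
```

The same elementwise pattern carries over to pandas columns, which wrap NumPy arrays.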
▶ Week 2 — Data Wrangling
- Merge/join, reshape (melt/pivot), groupby/agg.
- Handling missing values and outliers; datetime ops; text columns.
Problem set: Multi-table join & feature creation.
▶ Week 3 — Visualization
- Matplotlib basics; seaborn* grammar; choosing the right chart.
- Storytelling, annotations, dashboards (intro).
Problem set: Exploratory data story with 4 charts.
▶ Week 4 — SQL for Analytics
- SELECT, JOINs, CTEs, window functions for metrics.
- SQLite + pandas; query optimization basics.
Problem set: Cohort & retention analysis with SQL.
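Sketch: a CTE feeding a window function, run through Python's built-in sqlite3 module. The orders table and its rows are hypothetical, and window functions assume the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 20.0),
        (1, "2024-02-10", 35.0),
        (2, "2024-01-20", 15.0),
        (2, "2024-03-02", 10.0),
        (2, "2024-03-15", 5.0),
    ],
)

# The CTE aggregates spend per user and month; the window function
# then accumulates a running total within each user.
query = """
WITH monthly AS (
    SELECT user_id,
           substr(order_date, 1, 7) AS month,
           SUM(amount) AS spend
    FROM orders
    GROUP BY user_id, substr(order_date, 1, 7)
)
SELECT user_id, month, spend,
       SUM(spend) OVER (PARTITION BY user_id ORDER BY month) AS running_spend
FROM monthly
ORDER BY user_id, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The query string can also be passed to pandas.read_sql_query with the same connection to get a DataFrame.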
▶ Week 5 — Probability Refresher
- Random variables, distributions, sampling.
- Law of large numbers, CLT; simulation in Python.
Problem set: Monte Carlo estimation mini-lab.
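Sketch: Monte Carlo estimation of pi as one possible warm-up for simulation in Python. The function name and sample size are illustrative choices; the estimate tightens as the sample grows, which is the law of large numbers at work.

```python
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi from the share of random points inside the unit quarter circle."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area is pi/4, so scale the hit rate by 4.
    return 4.0 * inside / n_samples


estimate = estimate_pi(100_000)
print(estimate)
```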
▶ Week 6 — Statistics & Inference
- Confidence intervals, hypothesis testing, p-values.
- A/B testing design; power, effect size.
Problem set: A/B test analysis & report.
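Sketch: a pooled two-proportion z-test, one common way to analyze a conversion-rate experiment. The conversion counts are invented, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B.
z, p_value = two_proportion_ztest(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(round(z, 2), round(p_value, 3))
```

The normal approximation assumes reasonably large samples; small experiments may call for an exact test instead.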
▶ Week 7 — Linear Algebra for ML (Lite)
- Vectors, matrices, norms; projections; eigenvalues and eigenvectors (intuition).
- Practical: PCA from scratch vs scikit-learn.
Problem set: PCA on a high-dimensional dataset.
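Sketch: PCA from scratch via an eigendecomposition of the covariance matrix. The synthetic two-column dataset and the function name are assumptions for illustration; results can be compared against scikit-learn's PCA up to the sign of each component.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return X_centered @ components, components, eigvals[order]


# Synthetic data: the second column is roughly twice the first, plus noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=200)])

scores, components, explained_variance = pca(X, n_components=1)
print(components[:, 0], explained_variance)
```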
▶ Week 8 — ML Foundations
- Train/validation/test; metrics for regression & classification.
- Baselines, feature scaling, leakage, pipelines.
Problem set: Build a baseline model with a robust pipeline.
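Sketch: avoiding leakage by fitting a scaler on the training split only, in plain Python. The helper names are hypothetical stand-ins that mimic the fit/transform idea behind scikit-learn pipelines; they are not that library's API.

```python
import random
from statistics import mean, pstdev


def train_test_split(values, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = values[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class StandardScaler1D:
    """Standardize one feature using statistics learned in fit()."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


data = [float(v) for v in range(1, 21)]  # made-up feature values
train, test = train_test_split(data)

scaler = StandardScaler1D().fit(train)   # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)     # test reuses the training statistics
```

Fitting the scaler on the full dataset before splitting would let test-set information leak into training.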
▶ Week 9 — Classical Models
- Logistic/linear regression, naive Bayes, SVM (intuition).
- Tree-based models overview; calibration & probability outputs.
Problem set: Compare 3 models on tabular data.
▶ Week 10 — Responsible AI
- Bias/fairness, data documentation, model cards.
- Privacy, PII handling, basic differential privacy ideas.
Problem set: Fairness audit + mitigations write‑up.
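Sketch: one simple audit metric, the gap in selection rates between groups (often called the demographic parity difference). The groups and predictions are invented, and this is only one of several fairness criteria an audit might report.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Share of positive predictions (label 1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical model outputs for two groups.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```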
▶ Week 11 — MLOps Lite
- Experiment tracking, versioning, environments.
- Exporting models; simple FastAPI/Flask serving.
Deliverable: Reproducible training script + API stub.
▶ Week 12 — Capstone
- End‑to‑end project: data → model → report/dashboard.
Deliverables: Notebook/report, repo, short presentation.
* seaborn is optional here for teaching; production charts can use Matplotlib or Plotly, depending on context.