CS/Statistics/Linear Algebra Short Course

We start with the basics. The CS portion briefly covers basic data structures and types, program control flow, and Python syntax. The statistics portion covers basic probability and the general properties of some common probability distributions. The linear algebra portion covers matrices and vectors, some of their properties, and how to use them in Python.

  • Examples, data science articulated, history and context, technology landscape
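
As a rough illustration of the linear algebra portion, the sketch below builds a dot product, a matrix-vector product, and a transpose from plain Python lists. The helper names (`dot`, `mat_vec`, `transpose`) and the values are made up for this example; the course may well use a numerical library instead.

```python
# Vectors as lists, matrices as lists of rows. Illustrative values only.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


def transpose(matrix):
    """Swap the rows and columns of a matrix."""
    return [list(column) for column in zip(*matrix)]


A = [[1, 2],
     [3, 4]]
x = [1, 1]

product = mat_vec(A, x)   # each entry is the dot product of one row with x
A_t = transpose(A)

print(product)
print(A_t)
```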


Exploratory Data Analysis and Visualization

We spend a considerable amount of time using the Pandas Python package to explore a dataset we've never seen before and to uncover useful information from it. At this point, students decide on a course project that would benefit from a data-scientific approach. The project must use public (freely accessible and usable) data and must answer an interesting question, or collection of questions, about that data. (Note: Several sources of free data will be provided.)
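
A first pass over an unfamiliar dataset might look like the sketch below. The tiny DataFrame and its column names (`city`, `price`, `rooms`) are invented for illustration; a real project would load one of the public datasets instead.

```python
import pandas as pd

# Made-up data standing in for an unfamiliar public dataset.
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Denver"],
    "price": [250.0, 310.0, 480.0, None, 390.0],
    "rooms": [2, 3, 2, 4, 3],
})

shape = df.shape                    # (number of rows, number of columns)
missing = df.isna().sum()           # count of missing values per column
summary = df.describe()             # summary statistics for numeric columns
mean_price = df.groupby("city")["price"].mean()   # missing prices are skipped

print(shape)
print(missing)
print(summary)
print(mean_price)
```

Checking the shape, the missing values, and a grouped summary early on often suggests which questions the data can actually answer.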

Data Manipulation at Scale

  • Databases and the relational algebra 

  • Parallel databases, parallel query processing, in-database analytics 

  • MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages  

  • Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
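
To give a feel for the MapReduce model listed above, here is a single-process word-count sketch. The function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical and only mimic the three stages; a framework such as Hadoop runs these stages in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


documents = ["big data big ideas", "data at scale"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```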


Data Modeling: Supervised / Unsupervised Learning and Model Evaluation

We learn about the two basic kinds of statistical models that have classically been used for prediction (supervised learning): linear regression and logistic regression. We also look at clustering with K-Means, one way to glean information from unlabeled data.


  • Topics in statistical modeling: basic concepts, experiment design, pitfalls

  • Topics in machine learning: supervised learning (rules, trees, forests, nearest neighbor, regression), optimization (gradient descent and variants), unsupervised learning
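
Linear regression and gradient descent fit together naturally, so a minimal sketch may help. It fits a line to made-up points by batch gradient descent on mean squared error; the function name `fit_line`, the learning rate, and the step count are assumptions chosen for this toy data, not course material.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training point under the current w, b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should recover w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)

print(round(w, 4), round(b, 4))
```

A learning rate that is too large for the data makes the updates diverge, which is one of the pitfalls worth experimenting with.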


Data Modeling: Feature Selection, Engineering, and Data Pipelines

We switch gears from algorithms to features. What are they? How do we engineer them? And what techniques (Principal Component Analysis / Independent Component Analysis, regularization) help us create and use them given the data at hand? We also cover how to construct complete data pipelines, going from data ingestion and preprocessing to model construction and evaluation.
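
As a small taste of feature engineering, the sketch below shows the preprocessing stage of a pipeline: it standardizes a numeric column and one-hot encodes a categorical one. The field names (`area`, `city`) and helper names are hypothetical, and it does not attempt PCA, ICA, or regularization.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode labels as 0/1 indicator columns, one per category in sorted order."""
    categories = sorted(set(labels))
    encoded = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, encoded


def build_features(rows):
    """Preprocessing step: raw records in, numeric feature vectors out."""
    scaled = standardize([row["area"] for row in rows])
    _, encoded = one_hot([row["city"] for row in rows])
    return [[s] + e for s, e in zip(scaled, encoded)]


# Made-up records; the indicator columns are ordered Austin, Boston.
rows = [
    {"area": 50.0, "city": "Boston"},
    {"area": 70.0, "city": "Austin"},
    {"area": 90.0, "city": "Boston"},
]
features = build_features(rows)

print(features)
```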

Communicating Results

  • Visualization, data products, visual data analytics 

  • Provenance, privacy, ethics, governance 


Data Modeling: Advanced Supervised / Unsupervised Learning

We delve into more advanced supervised learning approaches, getting a feel for linear support vector machines, decision trees, and random forest models for regression and classification. We also explore DBSCAN, an additional unsupervised learning approach.
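
To make DBSCAN less abstract, here is a compact, unoptimized sketch of the algorithm on made-up 2-D points. The parameter names `eps` and `min_pts` follow common usage, and the neighbor search is a brute-force scan, so this is for intuition rather than real datasets.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search: every point within eps of point i (including i).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is also a core point: keep growing
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)

print(labels)
```

Unlike K-Means, the number of clusters is not specified in advance, and isolated points are reported as noise.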


Data Modeling: Advanced Model Evaluation and Data Pipelines | Presentations

We explore more sophisticated model evaluation approaches (cross-validation and bootstrapping) with the goal of understanding how we can make our models as generalizable as possible. Students complete their data science projects and share learnings and discoveries.
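
Both evaluation ideas come down to resampling the data, which the sketch below illustrates without any modeling library. The helper names (`k_fold_indices`, `bootstrap_means`), the fixed seed, and the scores are assumptions made for this example.

```python
import random
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, test) pairs; each index is tested once."""
    indices = list(range(n))
    folds = [indices[i::k] for i in range(k)]
    return [([i for i in indices if i not in fold], fold) for fold in folds]


def bootstrap_means(values, n_resamples, seed=0):
    """Means of resamples drawn with replacement, each the size of the original."""
    rng = random.Random(seed)
    return [mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)]


# Cross-validation: every observation is held out exactly once.
splits = k_fold_indices(n=10, k=5)
for train, test in splits:
    print("train:", train, "test:", test)

# Bootstrapping: the spread of resampled means hints at the estimate's stability.
scores = [3.1, 2.9, 3.4, 3.0, 3.6, 2.8]   # made-up measurements
resampled = bootstrap_means(scores, n_resamples=200)
print(min(resampled), max(resampled))
```

In practice the indices would usually be shuffled before splitting, and a model would be trained on each `train` set and scored on the matching `test` set.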

Special Topics

  • Graph Analytics: structure, traversals, analytics, PageRank, community detection, recursive queries, semantic web
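
PageRank is a good example of a graph analytic that fits in a few lines. The sketch below uses power iteration on a tiny made-up graph; the damping factor of 0.85 is the conventional choice, and the code assumes every link target also appears as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict of {node: [outgoing neighbors]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Made-up link graph: "c" is linked to by every other node.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)

for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```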

Data Science

Duration - 6 weeks

Price - $1499

Schedule - Batches start every week

Learn from industry experts


Gain experience through an AppStore project

Get assistance with getting hired