Big Data with Apache Spark

Introduction to Big Data with Apache Spark

Scalable Machine Learning

  • Lab 3: Millionsong Regression Pipeline. Develop an end-to-end linear regression pipeline to predict the release year of a song given a set of audio features. I will implement a gradient descent solver for linear regression, use Spark’s machine learning library (MLlib) to train additional models, tune models via grid search, improve accuracy using quadratic features, and visualize various intermediate results to build intuition.
  • Lab 4: Click-through Rate Prediction Pipeline. Construct a logistic regression pipeline to predict click-through rate using data from a recent Kaggle competition. I will extract numerical features from the raw categorical data using one-hot encoding, reduce the dimensionality of these features via hashing, train logistic regression models using MLlib, tune hyperparameters via grid search, and interpret probabilistic predictions via an ROC plot.
  • Lab 5: Neuroimaging Analysis via PCA. Identify patterns of brain activity in larval zebrafish. I will work with time-varying images (generated using a technique called light-sheet microscopy) that capture a zebrafish’s neural activity as it is presented with a moving visual pattern. After implementing distributed PCA from scratch and gaining intuition by working with synthetic data, I will use PCA to identify distinct patterns across the zebrafish brain that are induced by different types of stimuli.
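The gradient descent solver described for Lab 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lab's actual code: the helper names `gradient_summand` and `linreg_gradient_descent` are hypothetical, the data is a tiny in-memory list of `(features, label)` pairs, and a fixed step size is assumed. The sum over examples is the part that Spark would distribute.

```python
# Minimal sketch: batch gradient descent for least-squares linear regression.
# Helper names are hypothetical; plain Python so it runs without a Spark cluster.


def gradient_summand(w, point):
    """Per-example gradient term (w.x - y) * x for the squared-error loss."""
    x, y = point
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def linreg_gradient_descent(data, num_iters=1000, alpha=0.1):
    """Fit weights w that minimize sum((w.x - y)^2) using a fixed step alpha / n.
    The constant factor 2 of the true gradient is absorbed into the step size."""
    n = len(data)
    d = len(data[0][0])
    w = [0.0] * d
    step = alpha / n
    for _ in range(num_iters):
        # Sum the per-example gradient terms. With a Spark RDD this would be roughly
        # rdd.map(lambda p: gradient_summand(w, p)).reduce(<elementwise add>).
        grad = [0.0] * d
        for point in data:
            grad = [g + s for g, s in zip(grad, gradient_summand(w, point))]
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


# Toy data: a constant bias feature 1.0 plus one feature t, with label y = 1 + 2t.
data = [([1.0, float(t)], 1.0 + 2.0 * t) for t in range(5)]
weights = linreg_gradient_descent(data)
print([round(wi, 3) for wi in weights])  # should be close to [1.0, 2.0]
```

Because the per-example gradient terms are independent, summing them is an associative reduction, which is why this step maps naturally onto Spark's `map` and `reduce`.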
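The two feature steps in Lab 4, one-hot encoding and feature hashing, can be illustrated on toy categorical rows. This sketch is not the lab's implementation: the function names `build_ohe_dict`, `one_hot_encode`, and `hash_features` are hypothetical, rows are assumed to be lists of string categories, and MD5 is an arbitrary choice of stable hash function. One-hot encoding needs a dictionary of every category seen; hashing avoids that dictionary and fixes the dimensionality at `num_buckets`, at the cost of occasional collisions.

```python
import hashlib


def build_ohe_dict(rows):
    """Map each distinct (feature_index, category) pair to its own column index.
    Assumes category values are strings so the pairs can be sorted."""
    pairs = sorted({(i, v) for row in rows for i, v in enumerate(row)})
    return {pair: idx for idx, pair in enumerate(pairs)}


def one_hot_encode(row, ohe_dict):
    """Sparse one-hot vector as {column: 1.0}; categories not in the dict are dropped."""
    return {ohe_dict[(i, v)]: 1.0 for i, v in enumerate(row) if (i, v) in ohe_dict}


def hash_features(row, num_buckets):
    """Hashing trick: map each (feature_index, category) pair to one of num_buckets
    columns; colliding pairs add up in the same bucket."""
    vec = {}
    for i, v in enumerate(row):
        key = f"{i}:{v}".encode("utf-8")
        # MD5 gives a hash that is stable across runs (Python's hash() of str is salted).
        bucket = int(hashlib.md5(key).hexdigest(), 16) % num_buckets
        vec[bucket] = vec.get(bucket, 0.0) + 1.0
    return vec


rows = [["mouse", "black"], ["cat", "tabby"], ["bear", "black"]]
ohe = build_ohe_dict(rows)
print(one_hot_encode(["cat", "black"], ohe))  # {1: 1.0, 3: 1.0}
print(hash_features(["cat", "black"], num_buckets=16))
```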
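The core of PCA from Lab 5, computing a covariance matrix and extracting its leading eigenvector, can be sketched without any libraries. This is a hedged illustration, not the lab's code: `covariance` and `top_component` are hypothetical names, and power iteration is used in place of a full eigendecomposition to keep the example dependency-free.

```python
import math


def covariance(rows):
    """Covariance matrix (dividing by n) as the average of outer products of
    mean-centered rows. On a cluster, each outer product could be produced in a
    map step and the matrices summed in a reduce step."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / n
    return cov


def top_component(cov, num_iters=200):
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    d = len(cov)
    # Deterministic start vector; assumed not orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(num_iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy 2-D data spread mostly along the (1, 1) direction.
rows = [[-2, -2], [-1, -1], [1, 1], [2, 2], [0.1, -0.1], [-0.1, 0.1]]
pc1 = top_component(covariance(rows))
print([round(x, 3) for x in pc1])  # about [0.707, 0.707]
```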

How I started my journey into data science


I became interested in data science after finishing one of my first MOOCs, Data Analysis and Statistical Inference. It is a great course that introduced me to statistics and to R, a programming language widely used in data science. Since then, I haven’t stopped. I continued with the Data Science Specialization on Coursera and The Analytics Edge on edX, both of which inspired me and eventually led me to the world of machine learning. I was so amazed that I, as a chemist, decided to change my career path and start over as a data scientist. I began learning serious programming with Python, and gradually a new world opened up for me.

If you also want to start your path in data science, check out the links below: