Big Data with Apache Spark

Introduction to big data with Apache Spark

Scalable machine learning

  • Lab 3: Millionsong Regression Pipeline. Develop an end-to-end linear regression pipeline to predict the release year of a song given a set of audio features. I will implement a gradient descent solver for linear regression, use Spark’s machine learning library (MLlib) to train additional models, tune models via grid search, improve accuracy using quadratic features, and visualize various intermediate results to build intuition.
  • Lab 4: Click-through Rate Prediction Pipeline. Construct a logistic regression pipeline to predict click-through rate using data from a recent Kaggle competition. I will extract numerical features from the raw categorical data using one-hot encoding, reduce the dimensionality of these features via hashing, train logistic regression models using MLlib, tune hyperparameters via grid search, and interpret probabilistic predictions via an ROC plot.
  • Lab 5: Neuroimaging Analysis via PCA. Identify patterns of brain activity in larval zebrafish. I will work with time-varying images (generated using a technique called light-sheet microscopy) that capture a zebrafish’s neural activity as it is presented with a moving visual pattern. After implementing distributed PCA from scratch and gaining intuition by working with synthetic data, I will use PCA to identify distinct patterns across the zebrafish brain that are induced by different types of stimuli.
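
As a rough illustration of the gradient descent solver mentioned in Lab 3, here is a minimal single-machine sketch in plain Python. It is not the lab's Spark code: the function names, the toy data, and the fixed step size are all assumptions made for this example. The lab itself runs on Spark over the song features.

```python
def predict(weights, features):
    """Linear model prediction: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def gradient_descent(data, num_iters=500, alpha=0.1):
    """Fit least-squares weights by batch gradient descent.

    data: list of (label, features) pairs, where features is a list of floats.
    alpha: fixed step size (an assumption; a decaying step size also works).
    """
    n = len(data)
    d = len(data[0][1])
    weights = [0.0] * d
    for _ in range(num_iters):
        # Gradient of the squared error, summed over all points.
        grad = [0.0] * d
        for label, features in data:
            error = predict(weights, features) - label
            for j, x in enumerate(features):
                grad[j] += error * x
        # Step against the average gradient.
        weights = [w - alpha * g / n for w, g in zip(weights, grad)]
    return weights


# Toy data generated from y = 2*x + 1; the leading 1.0 acts as a bias feature.
toy = [(2.0 * x + 1.0, [1.0, x]) for x in [0.0, 0.25, 0.5, 0.75, 1.0]]
weights = gradient_descent(toy, num_iters=5000, alpha=0.5)
# weights should approach [1.0, 2.0] (intercept, slope) on this noise-free data.
```

In a distributed setting, the inner loop over data points is the part that would be expressed as a map over the dataset followed by a sum, with only the small weight vector shared between iterations.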

Learning Analytics: CHEM 233 Flipped Classroom Evaluation

  • Introduction

Over the past two years, the teaching team in Chemistry 233 has implemented course videos to replace face-to-face lectures. In class, instructors facilitate learning using worksheets. This instructional strategy is currently referred to as the “flipped classroom”. We have collected midterm and end-of-term evaluation survey data, beliefs and attitudes data, pre-class quiz results, and exam results for the past several years. We would like to conduct a detailed analysis of these data to answer the following questions:

1. What is the effect of performance or participation grades on student engagement with pre-class quizzes?

2. How does students’ pre-class preparation (video watching habits and quiz success) influence their success in the class?

3. What are the differences in performance on specific questions or types of questions from 2012 to 2014, and to what extent can these differences be explained by pedagogy?

4. How do learning attitudes and beliefs relate to perception of learning in the flipped classroom environment? That is, what types of students prefer this mode of instruction?

Answering the above questions will inform future decisions about the structure of Chemistry 233. We expect that what we learn from this analysis will be useful to others in the Department, specifically first-year courses exploring flexible learning and new course offerings for our upcoming curriculum change.

Over the last decade, researchers in science education have identified a variety of student beliefs that shape and are shaped by student classroom experience [1,4,5,7]. Based on studies of students’ beliefs, researchers have developed instruments designed to probe these beliefs [8].

Building on this prior work, researchers at Colorado have developed and validated another instrument, the Colorado Learning Attitudes about Science Survey (CLASS). The CLASS draws from the existing surveys (MPEX [6], VASS [3], EBAPS [2]) and adds and refines material to account for other student attitudes and beliefs observed to be important in educational practice [7].

I have been analyzing students’ attitudes towards chemistry from 2012 to 2014 and comparing pre- and post-class surveys. I also studied the correlation between students’ attitudes and their final grades: do more expert-like attitudes towards chemistry lead to higher grades?
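
One simple way to quantify the attitude-grade relationship is the Pearson correlation coefficient. The sketch below is only an illustration of that calculation: the function name is hypothetical, the numbers are made-up toy values rather than CHEM 233 data, and the actual analysis may have used a different statistic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values for illustration only (not survey or grade data).
attitude_scores = [55.0, 60.0, 72.0, 80.0, 90.0]  # e.g. percent expert-like responses
final_grades = [62.0, 58.0, 75.0, 79.0, 88.0]
r = pearson_r(attitude_scores, final_grades)
```

A value of r near +1 would indicate that more expert-like attitudes go together with higher grades; correlation alone does not show that one causes the other.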

1. Bransford, J.D., Brown, A.L., and Cocking, R.R. (2002). How People Learn. Washington, D.C.: National Academy Press.

2. Elby, A., Epistemological Beliefs Assessment for Physical Science

3. Halloun, I. A. “Views About Science and Physics Achievement: The VASS Story.” In The Changing Role of Physics Departments in Modern Universities: Proceedings of the ICUPE, E.F.

4. Hammer, D. (2000). Student resources for learning introductory physics. American Journal of Physics, 68, S52-S59.

5. Redish, E.F. (2003). Teaching Physics with the Physics Suite. John Wiley & Sons.

6. Redish, E., Saul, J.M., and Steinberg, R.N. (1998). Student Expectations in Introductory Physics. American Journal of Physics, 66, 212-224.

7. Seymour, E. and Hewitt, N. (1997). Talking about Leaving. Westview Press.

8. Seymour, E. and Zeilik, M., Field-tested Learning Assessment Guide (FLAG),

As an example of how this information will help us in designing CHEM 233, we have offered the flipped course once with performance-scored pre-class quizzes and once with participation-scored pre-class quizzes. The quizzes were very similar between years. Knowing how the scoring relates to student engagement and the quality of quiz responses will help us decide which way to score pre-class quizzes in the future.

Additionally, since we are spending more time in class on challenging, exam-level problems, we would expect student performance on these types of questions on exams to reflect this additional time. If we do not see a difference, we will modify our approach to continue to help students meet our course objectives.

Dognition: How to Increase the Number of Tests Completed


Dognition is a company that teaches you how to build a deeper connection with your dog by giving you an unprecedented perspective on your dog’s personality and capabilities. The company has tasked us with helping them figure out what business changes they could implement to increase the number of tests users complete on their website.


Check out the screencast:

And the accompanying Tableau story: