For the final project, your job will be to use the tools from this course to analyze a large real-world dataset.
The goal of the project is to empower you to apply the end-to-end process of data cleaning, graphing, and modeling to data that’s publicly accessible. This should be challenging but also fun and creative!
You will work in groups of 4-5 people, assigned at the beginning of the quarter based on interests. Much large-scale data science work happens in teams, so this is a good chance to practice collaborating, and working in a group will let you be more ambitious about the kinds of analysis and modeling you do.
Your group will be assigned one of the publicly available data sets below based on interest.
If you and your group members are interested in a topic not represented by the data sets below and you would like to analyze a different set of data, that’s fine! Just make sure to talk to the instructor and TA about it first.
Take a look at the data sets here:
Worldwide energy consumption patterns (Source)
Heart failure data (Source)
2020 US voter survey data (Source)
Worldwide COVID-19 case rates (Source)
High school social and academic performance data (Source)
S&P 500 economic performance data (Source)
Your group will submit one Jupyter notebook file with the following sections:
(1) Introduction: a few sentences describing your data set, where it comes from, and the kind of questions your group is exploring in the data.
(2) Data: an overview of how the data is structured, as well as any cleaning, formatting, and preparation you did for your analysis.
(3) Graphs: include 2-3 graphs highlighting major patterns or features of your data set. Include text explaining what these graphs show about your data.
(4) Analyses: Include 3 analyses of your data using techniques we learned in class. Include an interpretation of the results of these analyses.
(5) Conclusions: a brief conclusion section that summarizes what your graphs and analyses show about this data.
Each group will present their analysis in lab on 6/1.
Your lab credit for week 10 will be based on this presentation.
In that lab meeting, the instructor and TA will give you feedback that you can incorporate into your final analysis.
The final analysis should be turned in by 11:59pm on 6/8.
The details of this assignment may change as the quarter goes on, so make sure to check back here as you go!
If you have any questions, you can always post or send a message on Campuswire.