Final Project Guidelines for Data Science

The document outlines the final project requirements for a course at MIT, emphasizing group collaboration to analyze a chosen dataset and answer a research question using statistical methods. Key deadlines include project proposal submission, an intermediate status report, and the final report, along with poster presentations for evaluation. The evaluation criteria focus on the clarity of the research question, data description, methods used, and interpretation of results.

Statistics, Computation and Applications February 2025

Massachusetts Institute of Technology IDS.012/6.3730/IDS.131/6.3732


Caroline Uhler, Mardavij Roozbehani, Navid Azizan Final Projects Handout

Final Projects

Final projects are an important part of this course. The projects will be done in groups, and we will
organize the group assignments (based on your input) to ensure similar interests and a combination of skills,
from statistics to data science to programming. Your tasks for the final project are:

1. To find a data set

2. To pose a research question

3. To answer the research question using a combination of analysis and visualization methods presented in
this course and further methods that you explore on your own.

The timeline, including important deadlines, is outlined below. Start working on the project early, get
feedback from the TAs and instructors as you develop your ideas, and have fun!

Timeline
Coming weeks Connect with your project partners and decide on a topic, data set, and question. Start
writing the project proposal (details below).

March 12 Project proposals due.

April 18 Intermediate status report due. This should contain at least one paragraph on what you have tried so
far and what steps remain. Please report if some of your approaches were not fruitful or if you ran into
problems and need to change your plans (and, if need be, talk to us early).

April 19-May 7 Apply the methods you laid out in the proposal.

May 7 Project report is due.

May 7 and May 12 Project poster presentations.

Data
You are free to pick a topic and data set of your choice. For inspiration, we have included a list of possible
datasets at the end of this handout.

Project Proposal
The project proposal should be about two pages (without references) and contain the following:

1. The names of all team members.

2. The question you are going to address and a motivation for it.

3. Description of your data and an exploratory analysis (for example, a preliminary visualization of the
data).

4. Description of the analysis methods you will use to address this question and a justification for your
choices. These should consist of methods you learned during the course or additional methods you
have read up on individually. Will these methods apply directly? Will you need to modify anything?
Does your data need pre-processing (cleaning)?

5. For graduate students: an overview of some related work.

6. A realistic timeline for the project. Make sure you budget sufficient time for computation and
exploration; this tends to take more time than expected.

Note that you can (and should) start working on your projects before the proposal deadline to get an idea
of the data and feasibility of your plans. This way, you can have an initial exploratory analysis completed
by the proposal deadline. The proposal is a great opportunity for you to get feedback from the TAs and
instructors as you develop your ideas.
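
As a rough illustration of the kind of initial exploratory analysis mentioned above, the following Python sketch counts missing values and computes basic summary statistics per column. The data, column names, and the `summarize` helper are made up for illustration; substitute your own data set and add visualizations with a plotting library of your choice.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for a real data set (all values are made up).
RAW = """station,temperature,ozone
A,21.5,41
B,19.0,
C,23.1,55
D,,38
E,20.4,47
"""

def summarize(rows, column):
    """Return count, missing count, mean, and sample standard deviation of a numeric column."""
    values = [float(r[column]) for r in rows if r[column] != ""]
    return {
        "n": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }

rows = list(csv.DictReader(io.StringIO(RAW)))
summary = {col: summarize(rows, col) for col in ("temperature", "ozone")}

for col, stats in summary.items():
    print(col, stats)
```

A summary like this can help you decide early whether the data need cleaning and whether the planned methods are feasible.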

Evaluation
The evaluation of the final projects will take into account the poster presentation, final report, and feedback
from the class and group members. We will evaluate the analysis (breadth/depth of methods), the
presentation, and the interpretation of your findings. If you are a graduate student, you may treat the project as
leading to a research paper. Please see the detailed evaluation rubric in Table 1.

If you run into any problems with the project, whether technical or related to group dynamics, please talk
to an instructor or TA as soon as possible.

Final Project Presentation


Every group gives a poster presentation about their data set, the questions they raised, and their findings. The
rest of the class will give written feedback on a subset of presentations. In addition, each student will give
feedback on the group process and dynamics. Both kinds of feedback will be used to evaluate the projects.

The poster presentations will be spread over two days; students are expected to attend both sessions (if they
are unable to, they must make separate arrangements with the teaching staff). Each student will be expected
to help present their poster (and answer questions from classmates/teaching staff) on one day and comment
on three other posters during the other session. Students should be familiar with all parts of their project,
not just the methods they directly contributed to. A Google sheet online will summarize which students are
presenting and which posters you are expected to comment on. Presentations will be judged according to
the evaluation rubric in Table 1; be sure that each section is present on your poster.

Final Project Report


The final project report should be 6-8 pages long and contain the following:

1. The names of all team members.

2. The question you are addressing and a motivation for it.

3. For graduate students: an overview of some related work.



4. Description of your data.

5. The analysis methods you have been using, including any preprocessing; if you had to deviate from
your initial plan, briefly explain why.

6. Summary and interpretation of your results.

Suggestions for Data Sets


Here are some suggestions for data sets. You may use any of these or find your own.

• Google Dataset search [Link] ([Link]/products/search/making-it-easier-discover-datasets/)

• Google Flu/Dengue Trends/Health Trends ([Link])

• Open industry data: ([Link])

• Mice Protein Expression Data Set (UCI repository)

• Esri’s Living Atlas (census, demographic, climate, spatial data): ([Link]com/en/)

• Open Ocean Initiative environmental data: ([Link])

• Martha’s Vineyard Coastal Observatory climate and phytoplankton: ([Link]website/mvco/data/)

• Breast Cancer data (UCI repository)

• Greenhouse gas observing network (UCI repository)

• Medical and healthcare data for machine learning: ([Link])

• El Nino data (UCI repository)

• Air Pollution data: USA: EPA data [Link]; Europe: [Link]

• Climate data sets and Challenge Problems: [Link]data-sets and [Link]

• SubseasonalRodeo: benchmark dataset for training and evaluating subseasonal forecasting systems:
[Link]DVN/IHBANG

• Enron email data set ([Link])

• S&P/other financial time series (Yahoo Finance)

• Billion Prices Data sets: [Link]

• Hubway data [Link]



• NYC taxi data: [Link]shtml

• Uber data: [Link]

• Congressional voting data

• Twitter data ([Link])

• Netflix data

• Presidential election data (primary results, polling data, etc.)

• NFL play-by-play data

• NBA player tracking data ([Link])

• Stanford Large Networks Dataset collection [Link]

• List of many more data sets from various areas: [Link]

Category            Weight   Assessment Criteria

Research Question   10%      • Question is clear and measurable.
                             • Includes a discussion of why this question is important or interesting.

Data Description    10%      • Clearly describes the dataset.
                             • Justifies why these data address the research question.

Preprocessing       10%      • Describes the preprocessing procedure.
                             • Explains techniques used and the resulting outcomes.

Methods             25%      • Describes methods used (at least two distinct approaches,
                               at least one covered in class).
                             • Justifies why these methods are appropriate for both the
                               question posed and the data used.

Evaluation          25%      • Explains the technique(s) for evaluating each method.
                             • Shows and interprets key outputs or performance metrics.

Interpretation      20%      • Interprets and discusses the significance of the findings.
                             • Explains how the results address the research question.

Table 1: Assessment Criteria and Weighting
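
The weights in Table 1 sum to 100%. The short Python sketch below checks this and shows one way a group might self-assess a draft before submission. Combining per-category scores into a weighted total is an assumption made here for illustration, not a grading procedure stated in this handout; the `weighted_total` helper and the example scores are hypothetical.

```python
# Category weights copied from Table 1 (in percent).
WEIGHTS = {
    "Research Question": 10,
    "Data Description": 10,
    "Preprocessing": 10,
    "Methods": 25,
    "Evaluation": 25,
    "Interpretation": 20,
}

def weighted_total(scores):
    """Combine per-category scores in [0, 1] into a percentage (assumed scheme)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

# Sanity check: the rubric weights cover the full 100%.
assert sum(WEIGHTS.values()) == 100

# Hypothetical self-assessment of a draft report (scores are made up).
example_scores = {
    "Research Question": 1.0,
    "Data Description": 0.9,
    "Preprocessing": 0.8,
    "Methods": 0.8,
    "Evaluation": 0.6,
    "Interpretation": 0.75,
}
total = weighted_total(example_scores)
print(f"Self-assessed total: {total:.1f}%")
```

Because Methods and Evaluation together carry half of the weight, a self-check like this makes it easy to see where extra effort on a draft is likely to matter most.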
