Project Guidelines for Statistics Course

Uploaded by

phuongmaivu744

PROJECT-STATISTICS

Important dates:

⮚ Team registration and research topic proposal: EOD December 31, 2023, via a link to
be released on Blackboard.

⮚ Deadline for report submissions: 8:15 AM, January 12, 2024. Files are submitted
on Blackboard.

Instructions: We recommend working in a group of 2-4 students so that you can propose
ideas and motivate each other. A group may have at most four students. You can work
alone if you wish. Submit the project report as a file (or a Google Colab link) on
Blackboard (BB). If you have any questions or need advice, feel free to ask the instructor
or the TA.

Task 1 (25 points):


Complete the certification for the two courses “Intermediate Python” (15 pts) and “Exploratory Data
Analysis in Python” (10 pts) on Datacamp. You can select these courses in the Datacamp group (please
accept our email invitation to join our team on Datacamp). If you are not able to complete a course,
you can submit evidence of partial work. Task 1 is submitted individually by each student.
Note: A bonus of 5 points will be given if you submit the certification for completing “Data
Science for Business” on Datacamp.

Task 2: Team research project (75 points):


A. Guidelines for the project topics:

The project topic is free. Each team can propose a project topic that it would like to study!
Here are some example suggestions and inspirations:

1. Go to [Link]
Search the site, study previous projects for inspiration, and initiate your own project.

For example, study linear regression with the Boston housing data set and build your own
project on linear regression:

[Link]
[Link]
notebook

2. Analysing Global Warming in Vietnam


One can investigate statistically whether there is a significant change in temperature over time
in Vietnam.

Data can be downloaded here: Data Overview – Berkeley Earth

Present the descriptive statistics and design appropriate hypothesis tests to determine whether
average temperatures have increased over the years at, for example, the 10% level of
significance and the 2% level of significance. One can vary the level of significance and
observe how the results change. If possible, one can also explore predictions for
the future.
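
As a rough illustration of the test described above, the following sketch checks for an upward temperature trend at the 10% and 2% levels. It is only one possible design: it uses a one-sided permutation test on the least-squares slope (chosen here because it needs only the Python standard library), not a method prescribed by these guidelines. The data are synthetic, and the names `slope` and `permutation_trend_test` are hypothetical helpers introduced for this example.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def permutation_trend_test(years, temps, n_perm=2000, seed=0):
    """One-sided permutation test. H0: no trend. H1: upward trend."""
    rng = random.Random(seed)
    observed = slope(years, temps)
    shuffled = list(temps)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Shuffling breaks any link between year and temperature (H0).
        rng.shuffle(shuffled)
        if slope(years, shuffled) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed arrangement itself.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic yearly averages: an assumption for illustration, NOT real data.
data_rng = random.Random(42)
years = list(range(1990, 2020))
temps = [24.0 + 0.04 * (y - 1990) + data_rng.gauss(0, 0.2) for y in years]

observed_slope, p_value = permutation_trend_test(years, temps)
for alpha in (0.10, 0.02):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha={alpha:.2f}: slope={observed_slope:.4f}, "
          f"p={p_value:.4f} -> {decision}")
```

With real Berkeley Earth data, a team would replace the synthetic lists with the downloaded series and justify its own choice of test, for example a t-test on the regression slope.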

3. Analysing poverty and equity in Vietnam, and/or analysing population (urban, rural,
largest cities) and making predictions for the population of Vietnam, Income vs. Education, GDP,
finance, loans, travel services, Internet users, labour force, employment, education
(pupil-teacher ratio, school enrolment), electricity consumption, electricity production,
tourism, air transport, exports, life expectancy, CO2 emissions, renewable energy, etc.
Data can be taken from the World Bank:

[Link]

4. Study a theoretical model and apply it to some specific applications:

For example, one can study a regression model such as multiple linear regression or logistic
regression (strongly recommended! This is an important model.) and apply the model, for
example, to predict whether a personal loan is accepted or whether a company goes bankrupt.
The probability p of bankruptcy is between 0 and 1 and can be predicted by the logistic
regression model.
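
To make the idea concrete, here is a minimal sketch of a one-variable logistic regression that outputs a bankruptcy probability p between 0 and 1. Everything specific in it is an assumption for illustration: the toy data, the single predictor `debt_ratio`, the helper names, and the choice of plain gradient descent. In a real project a team would more likely use an established library and real data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, learning_rate=0.5, n_iter=20000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad0 = 0.0
        grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted p minus label
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


def predict_probability(b0, b1, x):
    """Predicted probability of bankruptcy for predictor value x."""
    return sigmoid(b0 + b1 * x)


# Hypothetical toy data: debt ratio and whether the company went bankrupt.
debt_ratio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
bankrupt = [0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(debt_ratio, bankrupt)
for x in (0.1, 0.5, 0.9):
    print(f"debt ratio {x:.1f}: p(bankruptcy) = "
          f"{predict_probability(b0, b1, x):.3f}")
```

The fitted slope `b1` is positive for this toy data, so the predicted probability rises with the debt ratio. A report would also need to explain how the coefficients are interpreted and how well the model fits.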

B. Guidelines for writing the report and grading:

The project report should include: Title; Abstract; Introduction (motivation, the importance of
the study, and the questions or problems the project studies); Methodology: descriptive statistics
(data collection, data summary) and inferential statistics (data analysis by confidence
intervals (C.I.), hypothesis tests, regressions, ANOVA, etc.); Results and discussion; References.
The important aspects of any statistical data analysis are stating questions, collecting data,
visualizing data with descriptive statistics tools, and analyzing data with inferential statistics
methods to draw conclusions or make predictions. The techniques can be some of the following:
summarizing data, confidence intervals, hypothesis testing, regression models, ANOVA, or a
technique that we will not cover in this class, such as data classification or data clustering.
Note that if you use models/techniques that we did not cover in class, you should explain them.
All explanations and reasoning must be in your team's own words. Turnitin
will be used upon submission. Any serious similarity will be considered plagiarism.
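
As a small illustration of two of the techniques listed above, the sketch below summarizes a sample and computes a confidence interval for its mean. The sample is synthetic, and the helper names `describe` and `mean_confidence_interval` are hypothetical. The interval uses the normal (z) quantile, which is a large-sample approximation; for small samples a t-based interval is usually more appropriate, and a team should justify whichever it uses.

```python
import random
from statistics import NormalDist, mean, median, stdev


def describe(sample):
    """Basic descriptive statistics for a numeric sample."""
    return {
        "n": len(sample),
        "mean": mean(sample),
        "median": median(sample),
        "stdev": stdev(sample),  # sample standard deviation (n - 1)
        "min": min(sample),
        "max": max(sample),
    }


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (z-based) confidence interval for the population mean."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / n ** 0.5
    center = mean(sample)
    return center - margin, center + margin


# Synthetic sample: an assumption for illustration, NOT real data.
rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(40)]

for name, value in describe(sample).items():
    print(f"{name}: {value:.2f}")
for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(sample, level)
    print(f"{level:.0%} CI for the mean: ({low:.2f}, {high:.2f})")
```

Higher confidence levels give wider intervals, which is worth stating explicitly when reporting results at several levels.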

The project proposal, report, and presentation are worth 75 points in total (5+50+20 pts).

1. Proposal (5 points): Form a team and propose the research topic, including a title (2
points). Write a short plan (~ 1 page) that introduces the topic, summarizes the proposed work,
and outlines the steps (3 points). Due: EOD Dec 31, 2023.

2. Project report (50 points).


1. Introduction (5 pts): Introduce the topic and clearly state the problem or question,
setting, motivation, and data.
2. Descriptive statistics (10 pts):
- Describe the data: variables, factors, statistics, visualization.
- Distributions, if any.
3. Inferential statistics (20 pts):
- Describe your questions and the hypotheses (if any).
- Use confidence intervals and hypothesis tests to answer the questions.
- For each test:
+ Explain the purpose of the test.
+ Explain why the test is suitable (e.g., why t instead of z, why ANOVA instead of a paired
t-test, etc.)
+ Check all the conditions for the test (e.g., normality, equal variances, etc.). If necessary,
run tests for the conditions.
- Demonstrate knowledge of the course content (and beyond the course, if possible) and show
evidence of research effort. A bonus can be given for topics with a high level of difficulty
and/or creativity.
4. Interpretation, discussion, and conclusion (10 pts):
- Combine the test results and produce a meaningful conclusion/explanation for your
questions.
- Discuss the pros and cons of your data analysis methods (any limitations, anything that
fell short of expectations, etc.)
5. Teammate evaluation (5 pts):
- Did every member contribute equally? If not, state the percentage (%) of the work done by
each member. On a scale from 1 (worst) to 5 (best), how would you rate the
performance of each teammate?
- Describe your group work experience: Did the team work well together to achieve its
objectives? What went well, what did not go well, and what would you do differently?
A bonus can be given for creativity.
3. Rubrics for the Presentation (20 points):

Columns: Criteria, Comments, Points, Max points

Structure (max 4 points)
● Is there a clear introduction? Is the purpose/motivation/problem clearly stated?
● Is there a finish? (Or is it just a sudden stop?)

Persuasiveness (max 6 points)
● Does the presenter clearly communicate the message to the specific audience?
● Is it realistic? Is it convincing?
● Does the presenter(s) exhibit a good understanding of the topic?
● Enthusiasm and confidence?

Presentation values (max 10 points)
● Can we see all the visuals?
● Can we hear all the voices/words? Is it clear?
● Were the transitions and flow easy to follow? Were the slides error-free and logically
presented?
● Was the time used effectively?

BONUS: Creativity (max 2 points)
● This may be in the content or in the approach/method of the presentation
(amusing, etc.)

Additional bonus for the project: A bonus of 5 points will be given if you submit the
certification for “Data Science for Business” on Datacamp.
---The end ---
