STA4026S 2021 - Continuous Assessment 2 Ver0.0 - 2021!09!29

This document provides instructions for a statistical report analysing a dataset from the Rondebosch Half Marathon. The dataset contains information on 725 runners, including speed, nutrition intake, age, sex, and shoe brand. The questions cover conducting an exploratory data analysis, encoding the predictor variables, fitting neural networks with three and five hidden nodes to predict speed, and using the best model to predict test-set speeds and plot response curves. Instructions are also given on formatting the report (code, figures) and on submitting prediction results. The goal is to analyse the relationships between the predictors and the response in order to advise researchers.

Uploaded by

Millan Chibba

STA4026S Analytics.

Continuous Assessment 2 2021

Statistical Report Writing Conventions and Instructions

This is an individual assessment: you may not discuss, share content, or ask
your classmates questions about the assessment. If something is unclear, direct
your questions to me (Etienne) via email.

You may use any typesetting software to compile your report. Rmarkdown and
LaTeX are preferred for the obvious reasons, but you are welcome to use whatever
you are comfortable with as long as your final hand-in is a legible PDF file.

Clearly delineate the questions to which your responses apply.

You may include code responses (copies of the code relevant to delineated
questions) either interspersed in your write-up at the relevant positions where you
answer questions, or in an appendix. Either way, you should include the code in your
write-up.

Provide comments in your R code indicating roughly to which question your code
applies, even if the code is interspersed.

Do not include any R console output! And definitely do not screenshot and paste it
in the body of your write-up. You are the analyst, not the reader. A well-written
statistical report would not contain any console output. Tabulate and typeset or
plot your output properly. (I've included an example of how to tabulate R objects
in the Rmarkdown file.)

Do not include figures in an appendix. Figures are supplemental to your
writing and should be included in the body of the write-up. Also, figures presented
on their own are rarely of any value. The only species of figure that can live on
its own in this context is an infographic. Figures on the other hand are graphical
mechanisms which support discussion in scientific reports.

Include a plagiarism declaration as the very last page in your report. No signed
declaration, no mark. I've included an example Rmarkdown file showing how you
can incorporate a PDF directly in your markdown compilation.

Use the naming convention STDNUM001_STA4026S_CA2.pdf for your PDF file.
Note the underscores.

Though you should include your code in a write-up, a single file with all
of your R code must also be uploaded separately to the code tab for the assessment.
Use the naming convention STDNUM001_STA4026S_CA2.R for your file. Note
the underscores. Your R code should NOT contain any of the following:
install.packages()
rm()
setwd()

I want to be able to run your code on my computer without having to manually
edit your code, install libraries or call external files that I don't have. It
may refer to datasets which I have provided as part of this assessment.
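
One way to check a script against the restrictions above before uploading is sketched here. This is only an illustration, not part of the assessment: the helper name find_forbidden_calls is hypothetical, and the comment stripping is deliberately crude (a '#' inside a string literal would also be treated as the start of a comment).

```r
# Hypothetical helper: report which forbidden calls appear in a script's lines.
find_forbidden_calls <- function(code_lines) {
  # Regular expressions for the three calls named in the instructions.
  forbidden <- c("install.packages" = "install\\.packages",
                 "rm"               = "rm",
                 "setwd"            = "setwd")
  # Crude comment removal: drop everything from the first '#' on each line.
  code_only <- sub("#.*$", "", code_lines)
  hits <- vapply(forbidden, function(pat) {
    # Require a non-identifier character (or line start) before the name,
    # so that e.g. confirm() is not mistaken for rm().
    full_pattern <- paste0("(^|[^[:alnum:]._])", pat, "[[:space:]]*\\(")
    any(grepl(full_pattern, code_only))
  }, logical(1))
  names(forbidden)[hits]
}

example_script <- c("library(stats)",
                    "x <- confirm(1)",
                    "rm(list = ls())",
                    "# setwd('somewhere')")
found <- find_forbidden_calls(example_script)
```

In practice the lines could come from readLines() applied to the .R file; an empty result suggests none of the three calls is present.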

Question 1 (25 marks)
You are a Statistician consulting on behalf of a sports science institute. Researchers
at the institute are interested in the relationship between nutritional intake
over race distance and speed. The researchers have provided you with data from the
Rondebosch Half Marathon (21km) which consist of 725 observations on the following
variables:

Variable     Description
Speed_21km   Average speed over the full race distance, 21.1km.
Nutrition    Variable indicating how much water/liquid nutrition was consumed.
             Possible values ∈ [0, 2.5].
Age_Scl      Age of participant in 100s of years (so, years divided by 100).
             Possible values ∈ [0.2, 0.8].
Sex          Sex of participant. Factor variable, either 'Male' or 'Female'.
ShoeBrand    Shoe brand used. Factor variable with levels 'Nike', 'NewBalance'.

The data are already split into training, validation, and test sets. See, e.g.:
> rm(list = ls(all = TRUE))
> dat_train = read.table('Rondebosch21km_2021_Train.txt', h = TRUE)
> dat_val = read.table('Rondebosch21km_2021_Validate.txt', h = TRUE)
> dat_test = read.table('Rondebosch21km_2021_Test.txt', h = TRUE)
> head(dat_train, 5)

Speed_21km Nutrition Age_Scl Sex ShoeBrand

1 10.39 0.74 0.47 Male NewBalance
2 10.00 0.30 0.52 Female Nike
3 9.06 1.40 0.39 Female Nike
4 8.74 1.03 0.48 Female NewBalance
5 10.31 0.54 0.47 Male Nike
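
The Sex and ShoeBrand columns shown above are factors, so they need a numeric encoding before they can enter a network. The following is a minimal sketch of dummy encoding with model.matrix() on a few invented rows. The toy values, the object names, and the choice to drop the intercept column (on the assumption that the network carries its own bias terms) are illustrative assumptions, not a prescribed answer.

```r
# Invented rows that mimic the column layout of the provided data.
toy <- data.frame(
  Nutrition = c(0.74, 0.30, 1.40, 1.03),
  Age_Scl   = c(0.47, 0.52, 0.39, 0.48),
  Sex       = factor(c("Male", "Female", "Female", "Male")),
  ShoeBrand = factor(c("NewBalance", "Nike", "Nike", "NewBalance"))
)

# model.matrix() expands each two-level factor into one 0/1 indicator column
# (treatment contrasts, with the alphabetically first level as baseline).
X_full <- model.matrix(~ Nutrition + Age_Scl + Sex + ShoeBrand, data = toy)

# Assumption: drop the "(Intercept)" column because the network has bias terms.
X <- X_full[, -1, drop = FALSE]
encoded_names <- colnames(X)
```

Here the indicator columns are named after the non-baseline levels, so a row with Sex equal to 'Female' and ShoeBrand equal to 'NewBalance' has zeros in both indicator columns.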

(a) (4 marks) Code and Write-up: Conduct an exploratory data analysis. Use relevant
plots to probe the empirical relationship between the predictors and responses
and interpret these figures.
(b) (2 marks) Code: Encode the input data in an appropriate design matrix. Note: no
further scaling is required for the input variables here. Hint: model.matrix()
Write-up: Give mathematical expressions for the encoding of the input vector,
x_i, where i denotes the ith observation.
(c) (5 marks) Code: Write an R function that evaluates the updating equation that
defines a neural network with a single hidden layer with m hidden nodes with logistic
activation functions on all hidden nodes. Full marks can only be obtained for
evaluating the forward equations in matrix form.
Write-up: Briefly motivate your choice of activation functions, cost function
and regularisation mechanism.
(d) (7 marks) Code: Fit two neural networks, each with a single hidden layer,
containing three and five nodes respectively, to the data. Do this by conducting an
appropriate validation analysis under an appropriately chosen regularisation mechanism.
You may use any standard R optimisation routines in order to fit the models.
Write-up: Plot the validation error vs. λ for both models on the same figure
and interpret the results. Use this figure to motivate your choice of regularisation
level and model (amongst the two fitted here). That is, first determine which
model to use and then report the level of regularisation which you will apply to
the chosen model.
(e) (5 marks) Code & Write-up: Use the model selected in (d) to plot response curves
over Age and Nutrition for Male runners who use Nike shoes. Do the same for female
runners. Use these figures to formulate a response to describe the relationship
between the predictors and the response to the researchers for which you are
consulting. Hint: use a 2D lattice over Age and Nutrition and visualise using
filled.contour(). Failover: If you can't get the 2D lattice to work, draw the
response curves over Nutrition only, with age fixed at 40 (0.4, scaled).
(f) (2 marks) Code: Use the network fitted in (d) to predict the responses for the
test dataset. Write your predictions to a .csv file to be handed in with your report
using the following naming convention (replace 'STDNUM001' with your student
number):
R> pred = data.frame(predictions = matrix(predictions, ncol = 1))
R> write.table(pred, 'STDNUM001_STA4026S_CA2.csv', quote = FALSE,
               row.names = FALSE, sep = ',')

The .csv file is to be uploaded to the CA2 predictions assignment tab on Vula.
Make sure your file contains a single column of predictions! Important: if you
did not get to this point or your predictions did not work for whatever reason,
change the name of the example file given to reflect your student ID and return
that without altering its contents.
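
To make "forward equations in matrix form" from (c) concrete, here is a minimal sketch for a single hidden layer with logistic hidden nodes. It is one possible layout under stated assumptions, not the expected solution: the parameter packing order, the linear output node, the mean-squared-error cost, the L2 penalty on weights only, and the function names forward_pass and penalised_mse are all illustrative choices.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Assumed packing of theta: W1 (p x m), then b1 (m), then W2 (m x 1), then b2 (1).
forward_pass <- function(X, theta, m) {
  p  <- ncol(X)
  W1 <- matrix(theta[1:(p * m)], p, m)
  b1 <- theta[p * m + 1:m]
  W2 <- matrix(theta[p * m + m + 1:m], m, 1)
  b2 <- theta[p * m + 2 * m + 1]
  # All N observations are pushed through at once: (N x p) %*% (p x m).
  H <- sigmoid(sweep(X %*% W1, 2, b1, "+"))   # N x m hidden activations
  drop(H %*% W2 + b2)                         # assumed linear output node
}

# Assumed objective: mean squared error plus an L2 penalty on weights (not biases).
penalised_mse <- function(theta, X, y, m, lambda) {
  p     <- ncol(X)
  pred  <- forward_pass(X, theta, m)
  w_idx <- c(1:(p * m), p * m + m + 1:m)
  mean((y - pred)^2) + lambda * sum(theta[w_idx]^2)
}

# Tiny check with hand-computable values: zero hidden weights give activations
# of 0.5, so three hidden nodes with output weights of 1 sum to 1.5.
X_demo <- cbind(c(0.5, 1.0), c(0.2, 0.4))
m_demo <- 3
theta_demo <- rep(0, ncol(X_demo) * m_demo + 2 * m_demo + 1)
theta_demo[ncol(X_demo) * m_demo + m_demo + 1:m_demo] <- 1
pred_demo <- forward_pass(X_demo, theta_demo, m_demo)
```

A function with this signature can be handed to a general-purpose optimiser such as optim(), with X, y, m and lambda passed as additional arguments, and re-evaluated on the validation set with lambda set to zero to trace out a validation curve.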

Question 2 (9 marks)
A subject of interest in modern research on neural networks is that of input sensitivity.
For our purposes, we'll focus specifically on the sensitivity measured as the gradient
of the 1st output with respect to the inputs of the model for a given parameter set.
By simple modification of the elements in the backprop algorithm we can calculate
these sensitivities directly. Alternatively, this can be achieved by approximating the
gradient of the output variable: Let $x = (x_j)_{1 \times p}$ denote a vector of inputs,
then define two new vectors $x^{k+} = (x^{k+}_j)_{1 \times p}$ and
$x^{k-} = (x^{k-}_j)_{1 \times p}$ where
\[
x^{k+}_j =
\begin{cases}
x_j + h/2 & \text{if } j = k, \\
x_j & \text{otherwise,}
\end{cases}
\]
for some index k and h suitably small. Likewise:
\[
x^{k-}_j =
\begin{cases}
x_j - h/2 & \text{if } j = k, \\
x_j & \text{otherwise.}
\end{cases}
\]
Evaluate
\[
\nabla_k = \frac{a^L_1(x^{k+}, \hat{\theta}) - a^L_1(x^{k-}, \hat{\theta})}{h}
\]
for all k variables where $a^L_1(x, \theta)$ denotes the first output (L is the number of
layers in the network) evaluated for the input vector x and parameter set θ. Note that we
have to estimate θ here first.
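
The central-difference approximation defined above can be sketched as a short R function. The name central_diff_gradient and the argument f (any function mapping an input vector to a scalar, standing in for the fitted network's first output) are hypothetical; the check below uses a simple function with a known gradient rather than a fitted network.

```r
# Approximate the gradient of a scalar function f at x, one coordinate at a
# time, by perturbing coordinate k by +h/2 and -h/2 and dividing by h.
central_diff_gradient <- function(f, x, h = 0.01) {
  vapply(seq_along(x), function(k) {
    x_plus  <- x
    x_minus <- x
    x_plus[k]  <- x[k] + h / 2
    x_minus[k] <- x[k] - h / 2
    (f(x_plus) - f(x_minus)) / h
  }, numeric(1))
}

# Stand-in for the network output: f(x) = x1^2 + 3 * x2, whose gradient at
# (1, 2) is (2, 3). Central differences are exact for quadratics, up to
# floating-point rounding.
f_demo <- function(x) x[1]^2 + 3 * x[2]
grad_demo <- central_diff_gradient(f_demo, c(1, 2), h = 0.01)
```

For a fitted network, f would wrap the forward pass at the estimated parameters, so that only the input vector varies.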
Consider now the simulation exercise where we conducted gradient checking in class
(Lecture 6):
# Let’s fake a dataset and see if the network evaluates:
set.seed(2020)
N = 50
x = runif(N,-1,1)
e = rnorm(N,0,1)
y = 2*sin(3*pi*x)+e

plot(y~x, pch = 16, col = 'blue')

...

Use the template code provided to fit a (10)-network to the simulated data, with no
regularisation. (I've given you everything up to fitting the model.)
(a) (3 marks) R-code: Approximate the gradient of the output with respect to the input
at a regularly spaced set of coordinates for the input variable using h = 0.01.
Write-up: Plot the values of ∇1 for the regularly spaced set of coordinates in
the input space (evaluate this quantity at different values for the input) and
compare these to the derivative of the true target function. (You may plot the
derivative of the true target function and then comment.)
(b) (2 marks) Does the plot in (a) suggest a means for conducting regularisation which
does not involve penalising the parameters? Clearly motivate your response.
(c) (4 marks) R-code & write-up: We can streamline the above procedure by calculating
the gradients of the outputs w.r.t. the inputs exactly. For these purposes, do the
following: Modify your R-code in Q3 (a) to calculate and return the gradients
of the outputs w.r.t. the input variables using back-propagation. You may
define a new function/copy of an existing function if you like. Verify that these are
correct by superimposing the values calculated on your gradient testing plot in
the previous question. Note: this part might require some careful thought, but
it is actually very easy.
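
As a sketch of the idea in (c), the chain rule gives the input gradient of a single-hidden-layer network in closed form, and it can be checked against the central difference with h = 0.01. Everything here is an assumption for illustration: the weights are random rather than fitted, the output node is taken to be linear, and the function names are hypothetical and unrelated to the template code.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# X is N x p, W1 is p x m, b1 has length m, W2 is m x 1, b2 is a scalar.
net_output <- function(X, W1, b1, W2, b2) {
  H <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
  drop(H %*% W2 + b2)
}

# Chain rule: dy/dx_k = sum_j W1[k, j] * H_j * (1 - H_j) * W2[j].
net_input_gradient <- function(X, W1, b1, W2, b2) {
  H <- sigmoid(sweep(X %*% W1, 2, b1, "+"))   # N x m activations
  D <- H * (1 - H)                            # derivative of the logistic
  sweep(D, 2, drop(W2), "*") %*% t(W1)        # N x p gradients
}

set.seed(1)
m  <- 10                                      # a (10)-network on one input
W1 <- matrix(rnorm(m), 1, m)
b1 <- rnorm(m)
W2 <- matrix(rnorm(m), m, 1)
b2 <- rnorm(1)

x_grid <- matrix(seq(-1, 1, length.out = 21), ncol = 1)
h      <- 0.01
exact  <- drop(net_input_gradient(x_grid, W1, b1, W2, b2))
approx <- (net_output(x_grid + h / 2, W1, b1, W2, b2) -
           net_output(x_grid - h / 2, W1, b1, W2, b2)) / h
max_abs_diff <- max(abs(exact - approx))
```

Plotting exact and approx against x_grid on the same axes is one way to superimpose the two sets of values; with a smooth activation the curves should be nearly indistinguishable at this step size.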
