MR4103 - Week 6a

MR4103

WEEK 6A
DATA
PROCESSING
AND
VISUALIZATION

PRADITYA
AJIDARMA
[email protected]
A BRIEF INTRO TO R PROGRAMMING
R Demo presented during Class Session
MERGE DATASET

• Joining rows and columns in R:
  - cbind(): joins data frames horizontally (side by side); they must have the same number of rows
  - rbind(): joins data frames vertically (stacked); they must have the same columns
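
As a minimal sketch of the two functions, the data frames below (`q1`, `q2`, `regions`) are made up for illustration and are not part of the class demo:

```r
# Two hypothetical data frames with identical columns (suitable for rbind)
q1 <- data.frame(id = 1:2, sales = c(100, 150))
q2 <- data.frame(id = 3:4, sales = c(120, 90))

# rbind(): stack vertically; the column names must match
stacked <- rbind(q1, q2)

# A hypothetical data frame with the same number of rows as `stacked`
regions <- data.frame(region = c("West", "East", "West", "East"))

# cbind(): join side by side; rows are paired by position, not by a key
combined <- cbind(stacked, regions)
print(combined)
```

Because cbind() pairs rows purely by position, it is only safe when both data frames are already in the same row order.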

[Figure: illustrations of rbind and cbind]
MERGE DATASET

merge() function: joins two data frames on one or more common key columns, or on row names
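
A small sketch of merge() on a shared key column; the `customers` and `orders` data frames and the `cust_id` key are hypothetical:

```r
# Hypothetical data: customer 2 has no orders, order customer 4 is unknown
customers <- data.frame(cust_id = c(1, 2, 3),
                        name = c("Ani", "Budi", "Citra"))
orders <- data.frame(cust_id = c(1, 1, 3, 4),
                     amount = c(50, 20, 75, 10))

# Default: keep only keys present in both data frames (inner join)
inner <- merge(customers, orders, by = "cust_id")

# all.x = TRUE: keep every customer, filling missing amounts with NA
left <- merge(customers, orders, by = "cust_id", all.x = TRUE)

print(inner)
print(left)
```

Unlike cbind(), merge() matches rows by key value, so the row order of the inputs does not matter.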
RESHAPE AND PIVOT DATASET (MELT AND CAST)

• melt() takes data in wide format and stacks several columns into a single one (long format)
• dcast() takes a molten (long) data frame and spreads it back into multiple columns (wide format), aggregating if needed
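
The round trip from wide to long and back might look like the sketch below. It assumes melt() and dcast() come from the reshape2 package (the slides do not name the package), and the `wide` data frame is invented:

```r
library(reshape2)  # assumption: melt()/dcast() are taken from reshape2

# Hypothetical wide data: one row per student, one column per subject
wide <- data.frame(
  student = c("A", "B"),
  math    = c(80, 65),
  physics = c(70, 90)
)

# Wide -> long: stack the subject columns into (subject, score) pairs
long <- melt(wide, id.vars = "student",
             variable.name = "subject", value.name = "score")

# Long -> wide: one column per subject again
wide_again <- dcast(long, student ~ subject, value.var = "score")

print(long)
print(wide_again)
```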
BASIC DATA TRANSFORMATION

• Five functions from the dplyr package cover the majority of data manipulation tasks:
  - filter(): pick observations by their values
  - arrange(): reorder the rows
  - select(): pick variables by their names
  - mutate(): create new variables with functions of existing variables
  - summarise(): collapse many values down to a single summary

• All dplyr functions work similarly:
  - The first argument is a data frame.
  - The subsequent arguments describe what to do with the data frame, using variable names.
  - The result is a new data frame.
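
The five verbs can be sketched on a small made-up data frame (`flights` and its columns are illustrative only). Each call takes a data frame first and returns a new one:

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
flights <- data.frame(
  month    = c(11, 12, 1, 12),
  carrier  = c("AA", "UA", "AA", "AA"),
  distance = c(500, 1200, 300, 800),
  air_time = c(60, 150, 45, 100)
)

late_year   <- filter(flights, month %in% c(11, 12))        # pick rows
sorted      <- arrange(late_year, desc(distance))           # reorder rows
slim        <- select(sorted, carrier, distance, air_time)  # pick columns
with_speed  <- mutate(slim, speed = distance / air_time * 60)  # new column
summary_tbl <- summarise(with_speed, mean_speed = mean(speed)) # one-row summary

print(with_speed)
print(summary_tbl)
```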
USE LOGIC OPERATORS EFFECTIVELY

• filter(data, month == 11 | month == 12) is equivalent to filter(data, month %in% c(11, 12))
• filter(data, !(X > 120 | Y > 120)) is equivalent to filter(data, X <= 120, Y <= 120)
• filter(data, !is.na(X), X > 1) keeps rows where X is not NA and X > 1
• near(sqrt(2)^2, 2) tests whether two floating-point values are equal within a small tolerance, where == would fail
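
These equivalences can be checked directly. The sketch below uses an invented data frame with columns `month`, `X`, and `Y`:

```r
library(dplyr)

# Hypothetical data, including one missing X value
data <- data.frame(
  month = c(1, 11, 12, 12),
  X     = c(100, 130, NA, 90),
  Y     = c(110, 100, 50, 125)
)

# OR of equality tests versus %in%
a <- filter(data, month == 11 | month == 12)
b <- filter(data, month %in% c(11, 12))

# De Morgan: !(p | q) is the same as !p & !q
c1 <- filter(data, !(X > 120 | Y > 120))
c2 <- filter(data, X <= 120, Y <= 120)

# Floating-point comparison: exact equality fails, near() succeeds
exact_equal <- sqrt(2)^2 == 2
close_equal <- near(sqrt(2)^2, 2)

print(isTRUE(all.equal(a, b)))
print(isTRUE(all.equal(c1, c2)))
print(c(exact_equal, close_equal))
```

Note that filter() drops rows where the condition evaluates to NA, which is why the row with a missing X appears in neither `c1` nor `c2`.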
IMAGE PROCESSING –
PIXEL TO NUMERICAL DATA TRANSFORMATION
Handwriting Recognition
Each pixel represents a color intensity (scaled between 0 and 255), where 0 is the lowest intensity (white) and 255 is the highest intensity.

Sentence Completion
Each word is labeled with a unique ID. Using the training set, the model learns how often each ID is followed by another word (another ID), based on frequency of occurrence. (Remember Naïve Bayes?)
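
Both ideas reduce to turning raw input into numbers. The sketch below is a toy illustration with an invented 3x3 image and a six-word sentence, not the model shown in class:

```r
# Handwriting: a hypothetical 3x3 grayscale image, one intensity per pixel
img <- matrix(c(  0, 128, 255,
                 64, 255,  64,
                  0, 128,   0), nrow = 3, byrow = TRUE)

# Flatten row by row into one numeric feature vector, rescaled to [0, 1]
features <- as.vector(t(img)) / 255

# Sentence completion: give each distinct word an integer ID
words <- c("i", "like", "data", "i", "like", "r")
vocab <- unique(words)
ids   <- match(words, vocab)

# Count how often each ID is immediately followed by each other ID
bigrams <- table(current = ids[-length(ids)], following = ids[-1])

print(features)
print(bigrams)
```

Dividing each row of `bigrams` by its row sum would give estimated probabilities of the next word given the current one.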
DATE AND TIME DATA MANIPULATION
• R allows you to work with dates and times easily.
• The simplest data type to use for dates is the "Date" class.
• Dates are stored internally as the number of days since 1970-01-01.
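
A brief sketch of the Date class using base R only; the specific dates are arbitrary examples:

```r
# Parse dates from text; the default format is year-month-day
d1 <- as.Date("2020-10-08")
d2 <- as.Date("17/08/1945", format = "%d/%m/%Y")

# The internal representation: days elapsed since 1970-01-01
days_since_epoch <- as.numeric(d1)

# Date arithmetic works in whole days
one_week  <- d1 - as.Date("2020-10-01")  # a difftime of 7 days
plus_30   <- d1 + 30                     # a Date 30 days later

print(days_since_epoch)
print(one_week)
print(plus_30)
print(d2)
```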
DATE AND TIME DATA MANIPULATION
Date and time manipulation using built-in POSIXt functionality
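
As a rough illustration of the POSIXt classes in base R: POSIXct stores a date-time as seconds since 1970-01-01, while POSIXlt exposes its components as a list. The timestamp and the UTC time zone below are arbitrary choices:

```r
# POSIXct: a date-time stored as seconds since the epoch
t0 <- as.POSIXct("2020-10-08 09:00:00", tz = "UTC")

# Arithmetic is in seconds
t1 <- t0 + 3600                           # one hour later
gap <- difftime(t1, t0, units = "mins")   # a difftime of 60 minutes

# POSIXlt: the same instant broken into named components
lt <- as.POSIXlt(t0)
year  <- lt$year + 1900   # years are counted from 1900
month <- lt$mon + 1       # months are counted from 0

print(format(t1, "%H:%M"))
print(gap)
print(c(year, month, lt$hour))
```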
WHEN DO I TURN ONE BILLION SECONDS OLD?

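billbday = function(bday, age = 10^9, format = "%Y-%m-%d %H:%M:%S") {
  # Moment at which `age` seconds have elapsed since the birth date-time
  x = as.POSIXct(bday, format = format) + age
  # Whole days between that moment and now (negative if already past)
  togo = as.numeric(round(difftime(x, Sys.time(), units = "days")))
  # Show the age in full digits rather than as 1e+09
  age_txt = format(age, scientific = FALSE, big.mark = ",")
  if (togo > 0) {
    msg = sprintf("You will be %s seconds old on %s, which is %s days from now.",
                  age_txt, format(x, "%Y-%m-%d"), togo)
  } else {
    msg = sprintf("You turned %s seconds old on %s, which was %s days ago.",
                  age_txt, format(x, "%Y-%m-%d"), -togo)
  }
  print(msg)
  format(x, "%Y-%m-%d")
}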
DATA VISUALIZATION: DENSITY PLOT

library("ggplot2")
ggplot(data, aes(x = assets)) + geom_density()
ggplot(data, aes(x = log10(assets))) + geom_density()
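
To see why the log10 transform helps, the sketch below draws both density plots from simulated right-skewed data. The `firms` data frame is generated with rlnorm() as a stand-in and is not the Forbes dataset used in class:

```r
library(ggplot2)

# Simulated stand-in for a heavily right-skewed variable such as assets
set.seed(1)
firms <- data.frame(assets = rlnorm(500, meanlog = 2, sdlog = 1.5))

# Raw scale: most of the mass is squeezed near zero with a long right tail
p_raw <- ggplot(firms, aes(x = assets)) + geom_density()

# log10 scale: the distribution becomes roughly symmetric and readable
p_log <- ggplot(firms, aes(x = log10(assets))) + geom_density()

print(p_raw)
print(p_log)
```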
DATA VISUALIZATION: QUANTILE - QUANTILE PLOT

qplot(sample = assets, data = data)
qplot(sample = profits, data = data)
qplot(sample = marketvalue, data = data)
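
The same kind of plot can also be built with ggplot() and stat_qq(). This sketch uses simulated data (`firms` is invented, not the Forbes dataset) and adds a reference line with stat_qq_line():

```r
library(ggplot2)

# Simulated stand-ins: one roughly normal variable, one right-skewed
set.seed(1)
firms <- data.frame(profits = rnorm(200), assets = rlnorm(200))

# Points close to the line suggest an approximately normal distribution
qq_profits <- ggplot(firms, aes(sample = profits)) +
  stat_qq() + stat_qq_line()

# A curve bending away from the line indicates skewness
qq_assets <- ggplot(firms, aes(sample = assets)) +
  stat_qq() + stat_qq_line()

print(qq_profits)
print(qq_assets)
```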


DATA VISUALIZATION: DENSITY PLOT FOR TWO CATEGORY

library(dplyr)  # provides %>%, as_tibble(), and mutate()
newdata = data %>%
  as_tibble() %>%
  mutate(US_Country = country == "United States")
ggplot(newdata, aes(x = log10(marketvalue), fill = US_Country)) +
  geom_density(alpha = 0.4)
WEEK 6 GROUP EXERCISE

• Forbes Dataset
• Auto-MPG Dataset


FORBES DATASET PRACTICE
(SUBMIT: R CODE, CONSOLE SCREENSHOT, & REPORTS)
AUTO-MPG DATASET PRACTICE
(SUBMIT: R CODE, CONSOLE SCREENSHOT, & REPORTS)
MIDTERM PROJECTS

Find an interesting dataset related to Indonesia from any literature/paper/website:


• State the Background: why you chose the data, including its significance from your perspective
• Describe the Structure of the data: which variables it contains
• Conduct any data cleaning and processing, if necessary
• Perform Diagnostics on the data, using descriptive statistics, visualizations, and an ML model (if possible)
• Conclude with a preliminary Analysis: the insights you gained from the activities above
Due on Thursday 08 October 2020, 09.00 AM; Submit to LMS
Attached Submission: Power Point, R File
