Starting with R
1 Introduction
R is a programming language for statistical computing. To work with it, we need the R interpreter and an editor; in this course, we use R-Studio. R-Studio is an IDE
(integrated development environment) that offers more control over your programming and
may increase your working efficiency, although it takes some time to get familiar with.
2 Download & Installation
In order to install R-Studio, you need to install R first. If you have already installed R (or
R-Studio), make sure to use the most recent version (4.0.3 or higher).
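If R is already installed, you can check its version from the R console. The following sketch uses only base R. The threshold "4.0.3" is the minimum stated above.

```r
# Print the full version string of the running R installation
R.version.string

# getRversion() returns a comparable version object;
# TRUE means this installation meets the minimum version stated above
meets_minimum <- getRversion() >= "4.0.3"
meets_minimum
```

If the result is FALSE, download and install a newer R as described below.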
You can download R from [Link]. Choose the version for your operating system (Windows, Mac
OS, Linux). On Windows, click on Download R for Windows → base → Download R
<current version> for Windows and save the file. On Mac OS, click on Download R for (Mac) OS
X → [Link] and save the file. On Linux, you probably know what to do.
Install R from the downloaded .exe or .pkg file. Default options are fine unless you want to
change them.
Once R is installed, you can install R-Studio, which you can download from [Link].
Default options are also fine here, unless you want to change them.
3 Change Working Directory
3.1 Permanent Change
The working directory is the directory in which R looks for all files you want to use and in which it saves
plots and other output produced in the session. Whenever a command reads data from a file given
by its name only (a relative path), R looks for that file in the working directory. An incorrectly
specified working directory therefore leads to errors and code that does not run.
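A minimal sketch of how to inspect the working directory from the console, using base R only. The file name "data.csv" is hypothetical and only shows that a bare file name is resolved against the working directory.

```r
# Print the current working directory
wd <- getwd()
print(wd)

# Files R can find by name alone, i.e. without a full path
print(list.files())

# A bare file name is looked up in the working directory, so both calls
# below check the same location ("data.csv" is a hypothetical file name)
file.exists("data.csv")
file.exists(file.path(wd, "data.csv"))
```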
To change your working directory permanently, open R-Studio, click on Tools → Global Options
and choose the tab General. In the field Default working directory (when not in a project), click
on Browse... and choose your path, then click on Open and Apply.
3.2 Temporary Change
There are several ways to change the working directory non-permanently (the change is reset every
time R-Studio starts). This might be more convenient if you use a different folder for each
tutorial or prefer another way of organizing your files.
The command setwd("Path/to/R") sets the working directory to the stated path.
If you have an open R script, you may click on Session → Set Working Directory → To
Source File Location, and the working directory is set to the location of the R script.
You can use Session → Set Working Directory → Choose Directory... to set an individual
location.
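A small sketch of a temporary change with setwd(), assuming only base R. tempdir() is used because it exists on every machine; the commented path is a hypothetical placeholder for your own tutorial folder. R also accepts forward slashes in Windows paths, whereas backslashes would have to be doubled.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Switch to another folder. In practice you would use your own folder, e.g.
# setwd("C:/Users/yourname/R/tutorial1")   # hypothetical path
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Go back to the original directory
setwd(old_wd)
```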
4 Some tasks for starting with R
4.1 Using R as a calculator
On the left of R-Studio, you find the "console", in which all code is executed and in which
you may type single lines of code.
a. Carry out some trivial calculations directly in the console, e.g. 2*5, 4^2, 3*6 + 18/3, 1/10000,
to get your first experience with R.
Remark: The sign > in the console (the prompt) means that R is ready for input. It
appears again automatically after each command has finished running.
b. There are some built-in functions in R. Calculate the mean and the variance of the vector
(1, 5, 23, 2, 9, 2.6).
Remark 1: A vector is created with the function c() ("combine"), e.g. c(1, 5, 23, 2, 9, 2.6, 3).
Remark 2: R uses the point "." as the decimal separator. Writing 2,6 instead of 2.6 gives a
wrong result: inside c(), the comma separates elements, so 2,6 is read as the two numbers 2 and 6.
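The sketch below runs calculations of the same kind with other numbers, so the tasks above remain open. It uses only base R; the results you should see in the console are given as comments.

```r
# Arithmetic follows the usual precedence: ^ before * and /, before + and -
2 + 3 * 4      # 14
(2 + 3) * 4    # 20
2^3            # 8
7 / 2          # 3.5

# Built-in functions work on whole vectors created with c()
mean(c(4, 8, 15))   # 9
var(c(4, 8, 15))    # 31 (sample variance, divides by n - 1)

# The comma separates elements, so 2,6 becomes two numbers, not 2.6
length(c(1, 2.6))   # 2
length(c(1, 2,6))   # 3
```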
4.2 Variables
Variables are a main building block of all R code. A variable is a name under which you store
one or more values; for example, x <- 3 assigns the value 3 to x.
a. Calculate the mean, the variance and the sum of all elements of the vector
x = (1, 5, 23, 2, 9, 2.6, 3).
Remark: Use meaningful names for your variables; a generic name such as "x" is usually not
recommended. A variable name should start with a (lower case) letter and may contain
letters, numbers, "." and "_".
b. Let y = c(5, 1, 23, 9, 0, 0, 14). Calculate the sum and the difference of x and y. Also,
calculate the elementwise and matrix product of x and y.
c. Let z = c(1, 5). Try to calculate the sum of z and x. What is the problem here?
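A short sketch of assignment and vector arithmetic in base R, using illustrative vectors and names (first_vec, second_vec) rather than the task data. It contrasts the elementwise product * with the matrix product %*% and shows how R recycles a shorter vector.

```r
# Assign vectors to names with <- (names are only illustrative)
first_vec  <- c(1, 2, 3)
second_vec <- c(4, 5, 6)

first_vec + second_vec     # 5 7 9 (elementwise sum)
first_vec - second_vec     # -3 -3 -3
first_vec * second_vec     # 4 10 18 (elementwise product)
first_vec %*% second_vec   # 32, returned as a 1 x 1 matrix (inner product)
sum(first_vec)             # 6

# Different lengths: R recycles the shorter vector (10, 20, 10) and warns
# because 3 is not a multiple of 2
first_vec + c(10, 20)      # 11 22 13, with a warning
```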
4.3 Functions
A function is a more sophisticated element of R. It consists of the function name and its
arguments: function_name(argument1, argument2, ...). A function may also have no arguments
at all. You already know two built-in functions, mean and var, but you can also build
your own functions.
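A minimal sketch of the general syntax name <- function(arguments) { body }, in base R. The three functions below are hypothetical illustrations chosen to differ from the tasks: one without arguments, one with an argument, and one using if/else.

```r
# General form: name <- function(arguments) { body }
# The value of the last evaluated expression is returned.

# No arguments: always returns the same text
say_hello <- function() {
  "hello"
}

# One argument: adds 1 to the number passed in
add_one <- function(n) {
  n + 1
}

# Branching with if/else: classifies a whole number as even or odd
even_or_odd <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

say_hello()      # "hello"
add_one(4)       # 5
even_or_odd(7)   # "odd"
```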
a. Write and execute a function that returns 5 every time it is called.
b. Write and execute a function that doubles the value of its argument.
c. Write and execute a function that checks whether a number is positive or non-positive (including
0).
4.4 Data Types
There are several data types in R. The most important types for our purposes are numerics
(i.e. decimal numbers) and characters (letters and text). Numeric values are written simply as
numbers; characters must be put in quotes ("") to distinguish them from variable names.
a. In the previous tasks, we have used numeric variables. To check whether a variable is numeric,
we may use [Link](.). Check whether the following data are numeric:
– 2
– x from task 4.2
– "hello"
Remark: Sometimes you will see a number like 5L or 61L. The L just indicates an integer,
i.e. a number without a decimal part.
b. Try to use "hello" with the function from 4.3 c. Is it positive or negative?
Remark: In most cases, you do not need to check explicitly for numeric or character
data; most functions check their input themselves and stop with an error or give a warning.
If an unfamiliar error message appears, it is worth checking for incorrect
input types.
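Base R offers is.numeric(), is.character() and class() for inspecting data types. The sketch uses illustrative values rather than the task data; the expected results are given as comments.

```r
is.numeric(3.5)       # TRUE
is.numeric("3.5")     # FALSE: the quotes make it a character string
is.character("3.5")   # TRUE
class(3.5)            # "numeric"
class("3.5")          # "character"
class(7L)             # "integer"
is.numeric(7L)        # TRUE: integers also count as numeric
as.numeric("3.5")     # 3.5: converts text that looks like a number
```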
4.5 External Data, Plots and Linear Regression
To use external data, you have to load it into R. On PANDA, you will find the data set "Exam-[Link]"
(.csv is a common file extension for data stored as comma-separated values).
a. Load the data into R using the [Link]-command.
b. Look at the data. How is the data organized and what variables can be found in the data?
What are the minimum and maximum of the variables?
c. Create a scatterplot of y ~ x and of y ~ z.
d. Run a linear regression of y ~ x.
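Because the course file is not reproduced here, the sketch below writes a small made-up data set with the columns x, y and z to a temporary .csv file and reads it back with base R's read.csv(). The values, and the names toy and csv_path, are purely illustrative assumptions; with the real file, you would pass its file name instead of csv_path.

```r
# Made-up toy data; column names x, y and z follow the tasks above
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1),
  z = c(5, 3, 4, 1, 2)
)

# Write it to a temporary .csv file and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)
dat <- read.csv(csv_path)

str(dat)       # number of rows, column names and column types
summary(dat)   # includes the minimum and maximum of every column

# Scatterplots with the formula interface: response ~ explanatory variable
plot(y ~ x, data = dat)
plot(y ~ z, data = dat)

# Simple linear regression of y on x
fit <- lm(y ~ x, data = dat)
summary(fit)
```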
4.6 Installing and loading packages
A main reason why R is so popular is the huge number of user-contributed functions and
packages. A package is a collection of functions for a specific task that is available online. For
most problems you might encounter, there is probably already a package available, so you
do not need to program the functions on your own.
a. Install the ggplot2 package (needed for improved plots and graphics) by clicking on Tools
→ Install Packages. Then type "ggplot2" in the Packages field (the package should appear in
the dropdown menu). "Install from" should be set to "Repository (CRAN)", which is the
default.
b. The package is now in your library on your computer. To use it, you need to
load it from the library into your current R session with the command
library(package_name). Load the ggplot2 package.
c. Redo the plots from task 4.5 c. using the ggplot2 package.
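As a console alternative to the Tools menu, install.packages("ggplot2") installs the package from CRAN. The sketch below keeps that line commented out because it needs internet access, and it checks with requireNamespace() whether ggplot2 is available before plotting. The data frame uses made-up values with the column names from task 4.5 and is only an illustrative assumption.

```r
# Install once from CRAN (needs internet access); uncomment to run:
# install.packages("ggplot2")

# requireNamespace() reports whether the package is installed
# without stopping with an error if it is missing
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
  library(ggplot2)  # attach the package to the current session

  # Made-up data with the column names used in task 4.5
  dat <- data.frame(
    x = c(1, 2, 3, 4, 5),
    y = c(2.1, 3.9, 6.2, 7.8, 10.1),
    z = c(5, 3, 4, 1, 2)
  )

  # ggplot() sets up data and aesthetics; geom_point() draws the points
  p_yx <- ggplot(dat, aes(x = x, y = y)) + geom_point()
  p_yz <- ggplot(dat, aes(x = z, y = y)) + geom_point()

  # Inside a block, plots are not shown automatically, so print them
  print(p_yx)
  print(p_yz)
}
```

In an interactive session, you would usually call library(ggplot2) directly after installing it once; the requireNamespace() check only makes the sketch safe to run on machines without the package.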