Reading 3 - Programming For Data Science

The document discusses the differences between the Python and R programming languages. Python is a general-purpose language that is easy to learn while R is optimized for statistical analysis and data visualization. The main distinction is that Python is used for data wrangling and machine learning while R focuses on statistical modeling and analysis. The best choice depends on the problem and experience of the user.

Uploaded by

Desy Rohmahdani

Explore the basics of these two open-source programming languages, the key
differences that set them apart and how to choose the right one for your
situation.

If you work in data science or analytics, you’re probably well aware of the
Python vs. R debate. Although both languages are bringing the future to life —
through artificial intelligence, machine learning and data-driven innovation —
there are strengths and weaknesses that come into play.

In many ways, the two open source languages are very similar. Free to download
for everyone, both languages are well suited for data science tasks — from data
manipulation and automation to business analysis and big data exploration. The
main difference is that Python is a general-purpose programming language,
while R has its roots in statistical analysis. Increasingly, the question isn’t which
to choose, but how to make the best use of both programming languages for
your specific use cases.

What is Python?

Python is a general-purpose, object-oriented programming language that
emphasizes code readability through its significant use of white space. First
released in 1991, Python is easy to learn and a favorite of programmers and
developers. In fact, Python consistently ranks among the most popular
programming languages in the world, alongside Java and C.

Several Python libraries support data science tasks, including the following:

● NumPy for handling large multidimensional arrays
● Pandas for data manipulation and analysis
● Matplotlib for building data visualizations
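
As a rough illustration of how the first two libraries fit together, here is a
minimal sketch. It assumes NumPy and Pandas are installed; the store names and
visit counts are made-up sample data, not taken from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: daily visit counts for three made-up stores.
visits = np.array([
    [120, 135, 128],
    [98, 110, 105],
    [143, 150, 139],
])

# NumPy: arithmetic over a whole multidimensional array at once.
mean_per_store = visits.mean(axis=1)

# Pandas: the same numbers as a labeled table.
df = pd.DataFrame(
    visits,
    index=["north", "south", "east"],
    columns=["mon", "tue", "wed"],
)
df["mean"] = mean_per_store

# Filter and sort: stores averaging more than 110 visits, busiest first.
busy = df[df["mean"] > 110].sort_values("mean", ascending=False)
print(busy)
```

The resulting table could then be passed to Matplotlib (for example through
`busy.plot()`) to draw a chart; that step is left out here to keep the sketch
short.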

Plus, Python is particularly well suited for deploying machine learning at a large
scale. Its suite of specialized deep learning and machine learning libraries
includes tools like scikit-learn, Keras and TensorFlow, which enable data
scientists to develop sophisticated data models that plug directly into a
production system. In addition, Jupyter Notebook is an open source web
application for easily sharing documents that contain your live Python code,
equations, visualizations and data science explanations.

What is R?

R is an open source programming language that’s optimized for statistical
analysis and data visualization. Developed in 1992, R has a rich ecosystem with
complex data models and elegant tools for data reporting. At last count, more
than 13,000 R packages were available via the Comprehensive R Archive
Network (CRAN) for deep analytics.

Popular among data science scholars and researchers, R provides a broad
variety of libraries and tools for the following:

● Cleansing and prepping data
● Creating visualizations
● Training and evaluating machine learning and deep learning algorithms

R is commonly used within RStudio, an integrated development environment
(IDE) for simplified statistical analysis, visualization and reporting. R applications
can be used directly and interactively on the web via Shiny.

The main difference between R and Python: Data analysis goals

The main distinction between the two languages is in their approach to data
science. Both open source programming languages are supported by large
communities, continuously extending their libraries and tools. But while R is
mainly used for statistical analysis, Python provides a more general approach to
data wrangling.

Python is a multi-purpose language, much like C++ and Java, with a readable
syntax that’s easy to learn. Programmers use Python to delve into data analysis
or use machine learning in scalable production environments. For example, you
might use Python to build face recognition into your mobile API or for
developing a machine learning application.

R, on the other hand, is built by statisticians and leans heavily into statistical
models and specialized analytics. Data scientists use R for deep statistical
analysis, supported by just a few lines of code and beautiful data visualizations.
For example, you might use R for customer behavior analysis or genomics
research.

Other key differences

● Data collection: Python supports all kinds of data formats, from
comma-separated value (CSV) files to JSON sourced from the web. You
can also import SQL tables directly into your Python code. For web
data, the Python requests library lets you easily grab data from
the web for building datasets. In contrast, R is designed for data analysts
to import data from Excel, CSV and text files. Files built in Minitab or in
SPSS format can also be turned into R data frames. While Python is more
versatile for pulling data from the web, modern R packages like rvest are
designed for basic web scraping.
● Data exploration: In Python, you can explore data with Pandas, the data
analysis library for Python. You’re able to filter, sort and display data in a
matter of seconds. R, on the other hand, is optimized for statistical
analysis of large datasets, and it offers a number of different options for
exploring data. With R, you’re able to build probability distributions, apply
different statistical tests, and use standard machine learning and data
mining techniques.
● Data modeling: Python has widely used libraries for data modeling, including
NumPy for numerical analysis, SciPy for scientific computing
and calculations and scikit-learn for machine learning algorithms. For
specific modeling analysis in R, you’ll sometimes have to rely on packages
outside of R’s core functionality. But the specific set of packages known
as the Tidyverse makes it easy to import, manipulate, visualize and report
on data.
● Data visualization: While visualization is not a strength in Python, you can
use the Matplotlib library for generating basic graphs and charts. Plus, the
Seaborn library allows you to draw more attractive and informative
statistical graphics in Python. However, R was built to demonstrate the
results of statistical analysis, with the base graphics module allowing you
to easily create basic charts and plots. You can also use ggplot2 for more
advanced plots, such as complex scatter plots with regression lines.
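
To make the data collection point above concrete, here is a minimal sketch
that uses only Python's standard library to read CSV text, parse JSON and
query a SQL table. The inline strings, the `scores` table and its column
names are hypothetical stand-ins for a real file, web response and database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical inline data standing in for a CSV file and a JSON web response.
csv_text = "name,score\nana,91\nben,78\n"
json_text = '[{"name": "cy", "score": 85}]'

# CSV: each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    row["score"] = int(row["score"])  # the csv module yields strings

# JSON: parsed straight into Python lists and dicts.
rows += json.loads(json_text)

# SQL: load the combined rows into an in-memory SQLite table and query it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (:name, :score)", rows)
top = conn.execute(
    "SELECT name, score FROM scores WHERE score >= 80 ORDER BY score DESC"
).fetchall()
conn.close()

print(top)
```

In practice the same steps are often done with Pandas or the requests
library, as the bullets describe; the standard-library version is shown only
because it runs without extra installs.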

Python vs. R: Which is right for you?

Choosing the right language depends on your situation. Here are some things to
consider:

● Do you have programming experience? Thanks to its easy-to-read syntax,
Python has a learning curve that’s linear and smooth. It’s considered a
good language for beginning programmers. With R, novices can be
running data analysis tasks within minutes. But the complexity of
advanced functionality in R makes it more difficult to develop expertise.
● What do your colleagues use? R is a statistical tool used by academics,
engineers and scientists, many of whom have no formal programming
background. Python is a production-ready language used in a wide range of
industry, research and engineering workflows.
● What problems are you trying to solve? R programming is better suited
for statistical learning, with unmatched libraries for data exploration and
experimentation. Python is a better choice for machine learning and
large-scale applications, especially for data analysis within web
applications.
● How important are charts and graphs? R applications are ideal for
visualizing your data in beautiful graphics. In contrast, Python applications
are easier to integrate in an engineering environment.

Note that many tools, such as Microsoft Machine Learning Server, support
both R and Python. That’s why many organizations use a combination of
both languages, and the R vs. Python debate is largely moot. In fact, you
might conduct early-stage data analysis and exploration in R and then
switch to Python when it’s time to ship some data products.

Material Sources:
https://www.ibm.com/cloud/blog/python-vs-r

Other Reading Sources:

https://www.python.org/about/gettingstarted/

http://www.sthda.com/english/wiki/r-basics-quick-and-easy

https://towardsdatascience.com/python-basics-for-data-science-6a6c987f2755

https://medium.com/datactw/a-complete-introduction-to-r-for-data-science-1858c69f76b0

https://towardsdatascience.com/getting-started-with-r-programming-2f15e9256c9

https://r4ds.had.co.nz/index.html

https://dplyr.tidyverse.org/

https://datacarpentry.org/R-ecology-lesson/03-dplyr.html

https://medium.com/analytics-vidhya/python-data-manipulation-fb86d0cdd028
