
Let's Start With Data Science

The document provides instructions for installing Jupyter Notebook and recommends using Anaconda for new users. It outlines downloading and installing Anaconda, which installs Python, Jupyter, and commonly used packages. For experienced users, it describes installing Jupyter using pip. Basic Python concepts and popular libraries like NumPy, Pandas, and Scikit-Learn are also introduced.

Uploaded by

andrew
Copyright
© All Rights Reserved

Software Requirements:

● It is recommended to have Jupyter Notebook installed. (You can also use Google Colab.)

Prerequisite: Python
While Jupyter runs code in many programming languages, Python is a requirement (Python 3.3 or
greater, or Python 2.7) for installing the Jupyter Notebook.

We recommend using the Anaconda distribution to install Python and Jupyter. We’ll go through its
installation in the next section.

Installing Jupyter using Anaconda and conda


For new users, we highly recommend installing Anaconda. Anaconda conveniently installs Python,
the Jupyter Notebook, and other commonly used packages for scientific computing and data
science.

Use the following installation steps:

1. Download Anaconda. We recommend downloading Anaconda’s latest Python 3 version
(currently Python 3.7).
2. Install the version of Anaconda which you downloaded, following the instructions on the
download page.

Congratulations, you have installed Jupyter Notebook. To run the notebook:


jupyter notebook

Alternative for experienced Python users: Installing Jupyter with pip

Important

Jupyter installation requires Python 3.3 or greater, or Python 2.7. IPython 1.x, which included the
parts that later became Jupyter, was the last version to support Python 3.2 and 2.6.

As an existing Python user, you may wish to install Jupyter using Python’s package manager, pip,
instead of Anaconda.

First, ensure that you have the latest pip; older versions may have trouble with some dependencies:

pip3 install --upgrade pip

Then install the Jupyter Notebook using:

pip3 install jupyter

(Use pip if using legacy Python 2.)

Congratulations. You have installed Jupyter Notebook.
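
A quick way to confirm the result from Python itself is sketched below. It checks the interpreter against the version requirement stated above and looks for the Notebook package. The function names are made up for this example, and the sketch assumes the Notebook is importable under the package name `notebook`.

```python
import importlib.util
import sys


def python_meets_requirement(version=None):
    """Return True for Python 3.3 or greater, or Python 2.7 (the requirement stated above)."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) >= (3, 3) or (major, minor) == (2, 7)


def jupyter_is_installed():
    """Return True if this interpreter can find the `notebook` package (assumed name)."""
    return importlib.util.find_spec("notebook") is not None


print("Python version OK:", python_meets_requirement())
print("Jupyter Notebook found:", jupyter_is_installed())
```

If the second line prints False, the install step most likely ran under a different Python interpreter than the one executing this check.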

Steps to install Anaconda: Anaconda_install

Basic Python concepts to go through:

Links to start learning Python basics:

● https://www.geeksforgeeks.org/python-programming-language/
● https://www.programiz.com/python-programmin

Links to understand data analytics:

● https://towardsdatascience.com/a-beginners-guide-to-data-analysis-in-python-188706df5447

● What are Python libraries? – A Python library is a collection of related modules.
It contains bundles of code that can be reused across different programs, which
makes Python programming simpler and more convenient because the same code
does not have to be written again for each program. Python libraries play a vital
role in fields such as machine learning, data science, and data visualization.
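
As a small illustration of that reuse, the sketch below imports one module from Python's standard library and applies the same functions to two unrelated lists. The data values are made up for the example.

```python
import statistics  # a module from Python's standard library

exam_scores = [72, 85, 90, 66, 85]
ages = [23, 35, 31, 35, 40]

# The same library functions are reused on different data,
# so the averaging logic is never rewritten by hand.
mean_score = statistics.mean(exam_scores)
mean_age = statistics.mean(ages)
most_common_score = statistics.mode(exam_scores)

print(mean_score, mean_age, most_common_score)
```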

● Some basic libraries to learn about -


● NumPy - NumPy is one of the most essential Python libraries for scientific
computing, and it is used heavily in machine learning and deep learning
applications. NumPy stands for NUMerical PYthon. Machine learning
algorithms are computationally intensive and rely on multidimensional array
operations. NumPy provides support for large multidimensional array objects
and various tools to work with them.
● SciPy - SciPy (Scientific Python) is the go-to library for scientific computing
and is used heavily in mathematics, science, and engineering. It offers
functionality comparable to MATLAB, which is a paid tool.
As the documentation says, SciPy “provides many user-friendly and
efficient numerical routines such as routines for numerical integration and
optimization.” It is built upon the NumPy library.
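
The two libraries above can be tried with a few lines. The following is a minimal sketch, assuming NumPy and SciPy are installed (Anaconda includes both). It builds a small two-dimensional array, then uses SciPy for one numerical integration and one optimization. The array values and the two functions are chosen only for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# NumPy: a 2-D array and whole-array operations without explicit loops.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(matrix.shape)                 # (rows, columns)
column_means = matrix.mean(axis=0)  # mean of each column
doubled = matrix * 2                # element-wise multiplication

# SciPy: numerical integration of x^2 over the interval [0, 3].
area, _error_estimate = integrate.quad(lambda x: x ** 2, 0, 3)

# SciPy: find the x that minimizes (x - 2)^2; the true minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

print(column_means, area, result.x)
```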

● Pandas (one of the most used libraries) - From data exploration to visualization
to analysis – Pandas is the almighty library you must master!

Pandas is an open-source package. It helps you perform data analysis and
data manipulation in Python. Additionally, it provides fast
and flexible data structures that make it easy to work with relational and
structured data.
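
A minimal sketch of that kind of manipulation is shown below, assuming pandas is installed. The table of sales records and its column names are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sales records, made up for this example.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [10, 4, 7, 3, 8],
    "unit_price": [2.5, 4.0, 2.5, 10.0, 4.0],
})

# Data manipulation: derive a new column from existing ones.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Data exploration: filter rows and summarize by group.
large_orders = sales[sales["units"] >= 7]
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(large_orders)
print(revenue_by_region)
```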


● Matplotlib - Matplotlib is the most popular library for exploration and data
visualization in the Python ecosystem. Many other visualization libraries,
such as Seaborn, are built upon it.

Matplotlib offers a wide range of charts, from histograms to scatterplots,
along with an array of colors, themes, palettes, and other options to
customize and personalize our plots. Whether you’re performing data
exploration for a machine learning project or building a report for
stakeholders, Matplotlib is surely the handiest library!
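
The histogram and scatterplot mentioned above can be drawn as follows. This is a minimal sketch with made-up data, assuming Matplotlib is installed. It selects the non-interactive "Agg" backend so that it also runs as a plain script. In a notebook that line can be dropped so the figure appears inline. The output file name is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; omit this line inside a notebook
import matplotlib.pyplot as plt

# Made-up values for illustration only.
heights = [150, 155, 160, 160, 165, 170, 170, 170, 175, 180]
weights = [50, 53, 58, 60, 63, 68, 70, 72, 75, 80]

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: distribution of a single variable.
ax_hist.hist(heights, bins=5, color="steelblue")
ax_hist.set_title("Histogram of heights")
ax_hist.set_xlabel("Height (cm)")

# Right panel: relationship between two variables.
ax_scatter.scatter(heights, weights, color="darkorange")
ax_scatter.set_title("Height vs. weight")
ax_scatter.set_xlabel("Height (cm)")
ax_scatter.set_ylabel("Weight (kg)")

fig.tight_layout()
fig.savefig("example_plots.png")
```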


● Seaborn - Seaborn is a free and open-source data visualization library based on
Matplotlib. Many data scientists prefer Seaborn over Matplotlib due to its
high-level interface for drawing attractive and informative statistical graphics.
Seaborn provides simple functions that let you focus on what the plot shows
rather than on the details of how to draw it.
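
To give a sense of that high-level interface, the sketch below draws a grouped scatter plot with a single Seaborn call. It assumes Seaborn and pandas are installed. The data frame is made up for the example, and the "Agg" backend line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; omit this line inside a notebook
import pandas as pd
import seaborn as sns

# Made-up study data for illustration only.
scores = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "hours_studied": [1, 2, 3, 1, 2, 3],
    "score": [52, 60, 71, 48, 66, 80],
})

# One call draws the points, colors them by group, and labels the axes.
ax = sns.scatterplot(data=scores, x="hours_studied", y="score", hue="group")
ax.set_title("Score vs. hours studied")
ax.figure.savefig("seaborn_example.png")
```

The axis labels and the legend come from the column names, which is the kind of detail that has to be set by hand in plain Matplotlib.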

● Scikit-learn - Scikit-learn (sklearn) is the Swiss Army knife of data science
libraries. It is an indispensable tool in your data science armory that will carve
a path through seemingly unassailable hurdles. In simple words, it is used for
making machine learning models.

Scikit-learn is probably the most useful library for machine learning in Python.
The sklearn library contains a lot of efficient tools for machine learning and
statistical modeling including classification, regression, clustering, and
dimensionality reduction.

Sklearn is an essential Python library you need to master.
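
Two of the tasks named above, dimensionality reduction and clustering, can be sketched in a few lines. The example assumes scikit-learn is installed and uses the small Iris dataset that ships with it. The choice of PCA and k-means is an illustration, not a recommendation from the original text.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

features, _labels = load_iris(return_X_y=True)  # 150 samples, 4 features

# Dimensionality reduction: project the 4 features onto 2 components.
reduced = PCA(n_components=2).fit_transform(features)

# Clustering: group the samples into 3 clusters without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(reduced)

print(reduced.shape, sorted(set(cluster_ids.tolist())))
```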

Algorithms to have a look into:

● Logistic Regression
● Decision Tree
● Random Forest Classifier
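
A first look at these three algorithms could follow the sketch below, which trains each one on the Iris dataset bundled with scikit-learn and reports its accuracy on held-out data. The dataset, the 75/25 split, and the parameter values are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

features, labels = load_iris(return_X_y=True)

# Hold out 25% of the samples to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)                      # learn from the training split
    accuracies[name] = model.score(X_test, y_test)   # accuracy on the test split
    print(f"{name}: {accuracies[name]:.2f}")
```

All three models share the same `fit` and `score` methods, so swapping one algorithm for another only changes the line that constructs the model.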
