Learning Data Mining with Python Layton pdf download
Contents
Foreword
Installing Python
Editor and Integrated development environments
Differences between Python2 and Python3
Working directory
Using Terminal
Chapter 1
1.1 Objects in Python
1.2 Reserved terms for the system and names
1.3 Enter comments in the code
1.4 Types of data
1.5 File format
1.6 Operators
1.7 Indentation
1.8 Quotation marks
Chapter 2
2.1 Numbers
2.2 Container objects
Tuples
Lists
Dictionaries
Sets
Strings
Files
2.3 Immutability
2.4 Converting formats
Chapter 3
3.1 Functions
3.1.1 Some predefined built-in functions
Obtaining information about a function
3.2 Create your own functions
3.3 Saving your own modules and files
Chapter 4
4.1 Conditional instructions
4.1.1 if
4.1.2 if-else
4.1.3 elif
4.2 Loops
4.2.1 for
4.2.2 while
4.2.3 continue and break
4.2.4 range()
4.3 Extend our functions with conditional instructions
4.4 map() and filter() functions
4.5 The lambda function
4.6 Scoping
Chapter 5
5.1 Object Oriented Programming
5.2 Modules
5.3 Methods
5.4 List comprehension
5.5 Regular Expressions
5.6 User input
5.7 Errors and Exceptions
Chapter 6
6.1 Importing files
6.2 .csv format
6.3 From the web
6.4 In JSON
6.5 Other formats
Chapter 7
7.1 Libraries for data mining
7.2 pandas
7.2.1 pandas: Series
7.2.2 pandas: dataframes
7.2.3 pandas: importing and exporting data
7.2.4 pandas: data manipulation
7.2.5 pandas: missing values
7.2.6 pandas: merging two datasets
7.2.7 pandas: basic statistics
Chapter 8
8.1 SciPy
8.2 Numpy
8.2.1 Numpy - generating random numbers and seeds
Chapter 9
9.1 Matplotlib
Chapter 10
10.1 scikit-learn
Managing dates
Data sources
Conclusions
Foreword
My goal is to accompany a reader who is starting to study this programming language, guiding her
through the basic concepts and then moving on to data mining. We will begin by explaining how to
install Python, how to use it and its structures, and which tools are best suited to a data analyst's
work, and then move on to an introduction to data mining packages. The book is in any case an
introduction. Its aim is not, for instance, to fully explain topics such as machine learning or statistics
with this programming language, which would take at least two or three times the length of this
entire book. The aim is to provide guidance from the first programming steps with Python, through
the import and manipulation of datasets, to some examples of data analysis.
To be more precise, in the Getting Started section, we will run through some basic installation
concepts, the tools available for programming in Python, the differences between Python2 and
Python3, and how to set up a work folder.
In Chapter 1, we will begin to see some basic concepts about creating objects, entering comments,
words reserved for the system, and the various types of operators that are part of the grammar of
this programming language.
In Chapter 2, we will carry on with the basic Python structures, such as tuples, lists, dictionaries,
sets, strings, and files, and learn how to create and convert them.
In Chapter 3 we will see the basics of creating small functions, and how to save them.
Chapter 4 deals with conditional instructions, which allow us to extend the power of a function, as
well as with some important functions.
In Chapter 5 we will continue with some basic concepts related to object-oriented programming,
the concepts of module and method, and error handling.
Chapter 6 is dedicated to importing files with some of the basic features. We will see how to open
and edit files in text format, in .csv format, and in various other formats.
Chapters 7 to 10 will deal with Python's most important data mining packages: Numpy and Scipy for
mathematical functions and random data generation, pandas for dataframe management and data
import, Matplotlib for drawing charts and scikit-learn for machine learning. With regard to
scikit-learn, we will limit ourselves to giving a basic idea of the code for the various algorithms,
without going into the details of the various techniques, given the complexity of the subject.
Finally, in Conclusions, we will summarize the topics and concepts of the book and see the
management of dates and some of the data sources for our tests with Python.
This book is intended for those who want to get closer to the Python programming language from a
data analysis perspective. We will therefore focus on the most used packages for data analysis, after
the introduction to Python's basic concepts. To download the code, explore some topics in more
depth, or find more information about the practical part, you can visit my website, Datawiring.me.
From the site homepage you can also subscribe to my newsletter to keep track of code updates and
the latest posts.
Given the introductory nature of the course, in any case, the advice is to type the code manually in
order to get familiar with it and learn to handle it, especially for readers who have just begun
programming.
Installing Python
Python can be easily installed from https://www.python.org/downloads/ in either version 2 or
version 3. It is usually preinstalled on Unix-like systems, so if we have a Mac or Linux, we can simply
open a terminal and type "python".
From the python.org website, simply download the most suitable version for your operating system
and proceed with the installation, following the on-screen instructions.
Editor and Integrated development environments
There are many ways to use a programming language such as Python. We can simply write the first
lines from the terminal: once the language is installed, if necessary (depending on the operating
system you are using, a version of Python may already be integrated), we open a terminal window
and type its name.
There are many free and paid editors that differ in completeness, scalability, and ease of use. Among
the most used editors are Sublime Text, Text Wrangler, Notepad++ (for Windows), and TextMate (for
Mac). But we can also use a simple text editor.
As for integrated development environments, or IDEs, Python-specific ones include, for instance,
Wingware, Komodo, Pycharm, and Emacs, but there are many more. Tools of this kind provide
features that simplify work, such as auto-completion, auto-editing and auto-indentation, integrated
documentation, syntax highlighting, code folding (the ability to hide some pieces of code while you
work on other parts), and support for debugging.
Spyder (which is included in Anaconda) and Jupyter are the most used in Data Science, along with
Canopy. A useful tool for Jupyter is nbviewer, available at http://nbviewer.jupyter.org, which
allows Jupyter's .ipynb files to be shared and can also be linked to GitHub.
As for Anaconda, a very useful tool as it also features Jupyter, it can be downloaded for our operating
system from this link. The list of resources that are installed with Anaconda (over 100 packages for
data mining, maths, data analysis and algebra) can be viewed by opening a terminal window and
typing:
conda list
We can program in Python with one or more of these tools, depending on our habits and what we
want to do. Spyder and Jupyter, both available once Anaconda is installed, are very common for data
mining. These tools can also be installed and used individually (e.g. Jupyter can be tested from this
link), but installing Anaconda makes work easier, as it provides us with a whole host of tools and
packages.
Python code can then be run directly from the terminal, or saved as a .py file and then run from
these other editors. What tells us that we are in the Python shell is the ">>>" symbol at the
beginning of the prompt.
To best follow the examples in this book I recommend installing Anaconda from the Continuum.io
website and using Jupyter. Anaconda automatically installs a set of packages and modules that we
will use later, so we will not have to install them one by one from the terminal.
Anaconda's main screen
Differences between Python2 and Python3
Python is released in two different versions, Python2 and Python3. Python2 was born in 2000
(currently the latest release is 2.7), and it is expected to be supported until 2020. It is the historical
and most complete version.
Python3 was released in 2008 (the current version is 3.6). There are many libraries for Python3, but
not all of them have yet been converted from Python2 for this release.
The two versions are very similar but feature some differences, for example with regard to
mathematical operations:
Python 2.7
5/2
2
Python 3.5.2
5/2
2.5
To get the correct result in Python2 we have to specify the decimal as follows:
5.0/2
2.5
# or like this
5/2.0
2.5
float(5)/2
2.5
To write code that behaves the same way in both versions, we can also import Python3 features
into Python2 from a module called __future__:
from __future__ import division
5/2
2.5
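The difference in division described above can be checked directly. The following is a minimal sketch, assuming it is run with Python3; the variable names are illustrative only.

```python
# In Python3, "/" is always true division and "//" is floor division.
true_division = 5 / 2       # 2.5
floor_division = 5 // 2     # 2 (what Python2 returns for 5/2 with integers)
float_floor = 5.0 // 2      # 2.0: floor division keeps the float type
negative_floor = -5 // 2    # -3: the result is rounded towards minus infinity

print(true_division, floor_division, float_floor, negative_floor)
```

Using "//" whenever an integer quotient is wanted, and "/" otherwise, gives the same result in both versions once division has been imported from __future__ in Python2.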
For a closer look at the differences between the two versions of Python, you can access this online
resource.
What is the difference between the two versions and why choose one or the other? Python2 represents
the best-defined and stable version, while Python3 represents the future of the language, although for
some things the two versions do not coincide. In the first part of this text we will always try to
highlight the differences between the two versions. From chapter 7 onwards, the section on data
mining packages, we will use Python3.
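Since the two versions behave differently, it can help to check which one is actually running. A minimal sketch using the standard sys module (the variable names are illustrative):

```python
import sys

# sys.version_info describes the interpreter that is running this code
major = sys.version_info[0]
minor = sys.version_info[1]
is_python3 = major >= 3

print("Running Python %d.%d" % (major, minor))
```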
Working directory
Before we start working, we set the working directory on our computer. Setting up a working
directory means setting up a home for our scripts and our files, where Python will automatically look
when we ask it to import a file or run a script. To find out what our working directory is, simply type
this in the Python shell:
import os
os.getcwd()
'~/valentinaporcu'
# to change the working directory, we pass the new directory in parentheses
os.chdir("/~/Python_script")
os.getcwd()
'~/Python_script'
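The same steps can be tried safely on any machine. This is only a sketch: the temporary folder created with tempfile stands in for a folder such as Python_script and is an assumption of the example. Note that os.chdir() does not expand the "~" shorthand by itself; os.path.expanduser() can be used for that.

```python
import os
import tempfile

original_dir = os.getcwd()            # remember where we started
new_dir = tempfile.mkdtemp()          # hypothetical stand-in for a folder such as Python_script

os.chdir(new_dir)                     # change the working directory
current_dir = os.getcwd()             # now reports the new folder

home_dir = os.path.expanduser("~")    # expands "~" to the user's home folder

os.chdir(original_dir)                # go back to the starting directory
os.rmdir(new_dir)                     # remove the temporary folder
```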
Setting up a working directory means that when we import a file that is in our working directory,
we can simply type its name followed by the extension, in quotation marks, in this format:
"file_name.extension"
For instance:
"dataframe_data_collection1.csv"
Python will check whether there is a file with that name inside that folder and import it.
The same happens when we save a file from Python: it is automatically put in that folder. Even when
we run a Python script, as we will see, we will have to access the folder where the script is located
(the working directory or another one) directly from the terminal.
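How a bare file name is resolved against the working directory can be sketched as follows. The temporary folder and the file contents are assumptions made for the example; only the file name comes from the text.

```python
import os
import tempfile

original_dir = os.getcwd()
folder = tempfile.mkdtemp()               # hypothetical working directory
os.chdir(folder)

file_name = "dataframe_data_collection1.csv"
with open(file_name, "w") as f:           # bare name: the file is created in the working directory
    f.write("a,b\n1,2\n")

with open(file_name) as f:                # bare name: the file is found in the working directory
    content = f.read()

resolved = os.path.abspath(file_name)     # the full address that the bare name corresponds to

os.remove(file_name)                      # clean up
os.chdir(original_dir)
os.rmdir(folder)
```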
If we want to import a file that is not in the working directory but elsewhere on our computer or on
the web, we can still do this, this time by entering the full file address:
"complete_address/file_name.extension"
For instance:
"/Users/vp/Downloads/dataframe_data1.csv"
Using Terminal
Let us see how to run Python scripts. First, let us open a terminal window.
In the terminal we see the dollar symbol ($), not the Python shell symbol (>>>). We can view the list
of our folders and files with the ls command, and move into a folder with the cd command:
cd Python_test
In the folder I moved to, Python_test, I find my Python scripts, that is, the .py files, which I can run
by typing:
python test.py
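The contents of test.py are not shown in the text. A minimal sketch of what such a script might contain (the main function and its message are hypothetical):

```python
# test.py (hypothetical contents)

def main():
    # the body of the script goes here
    message = "Hello World"
    print(message)
    return message


# this block runs only when the file is executed as a script,
# for example with "python test.py", and not when it is imported
if __name__ == "__main__":
    main()
```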
1.1 Objects in Python
Every object in Python has:
a name
a type
an ID
Object names consist only of alphanumeric characters and underscores, that is, the characters A-Z,
a-z, 0-9, and _, and cannot begin with a digit. The type is the kind of object, such as string, numeric,
or boolean. The ID is a number that uniquely identifies our object.
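The three properties can be inspected with the built-in functions type() and id(). A minimal sketch (the object names are illustrative):

```python
first_list = [1, 2, 3]
same_list = first_list          # a second name for the same object
other_list = [1, 2, 3]          # a different object with equal content

object_type = type(first_list)  # the type: list
object_id = id(first_list)      # the ID: an integer identifying the object

shares_id = id(same_list) == id(first_list)    # True: two names, one object
is_same_object = first_list is other_list      # False: equal content, different objects
has_equal_content = first_list == other_list   # True
```

The is operator compares IDs, while == compares content.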
Objects remain in the computer's memory and can be retrieved. When they are no longer needed, a
garbage collection mechanism frees the memory they occupy.
1.2 Reserved terms for the system and names
Python has a set of words that are reserved for the system and cannot be used by users as names for
objects or functions. Such words are:
and as assert break class continue def del elif else except exec False finally for from global if
import in is lambda None not or pass print raise return True try while with yield
(exec and print are reserved words only in Python2; in Python3 they are ordinary built-in functions.)
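The exact list depends on the Python version, and the standard keyword module reports it for the interpreter in use. A minimal sketch, assuming Python3:

```python
import keyword

reserved_words = keyword.kwlist                  # list of reserved words for this interpreter

yield_is_reserved = keyword.iskeyword("yield")   # True
print_is_reserved = keyword.iskeyword("print")   # False in Python3, where print is a function
name_is_reserved = keyword.iskeyword("my_data")  # False: a valid object name

print(reserved_words)
```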
These words cannot be used as names for our objects. Object names in Python must also follow some
rules:
1.3 Enter comments in the code
# comment no. 1
print("Hello World") # comment no. 2
To write a comment on multiple lines, we can also use triple quotation marks, like this:
"""
comment line 1
comment line 2
comment line 3
"""
1.4 Types of data
Python data can be of various types. We can summarize them in the table below:
To know what type an object is, we can always use the type() function:
# we create an x object
x=1
type(x)
<class 'int'>
# a y object
y = 20.75
type(y)
<class 'float'>
# and a z object
z = "test"
type(z)
<class 'str'>
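The same check works for any value. The sketch below applies type() to a few common built-in types; the selection of values is an assumption of the example and is not meant to reproduce the table mentioned above.

```python
values = [1, 20.75, "test", True, None, (1, 2), [1, 2], {"a": 1}, {1, 2}]

# type(v).__name__ gives the name of the type as a string
type_names = [type(v).__name__ for v in values]

for value, name in zip(values, type_names):
    print(repr(value), "->", name)
```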
1.5 File format
Once you have created a script in Python, you need to save it with a .py extension. Typically, for
complex scripts, we write the script in an editor and then run it. A .py script can be written with
any of the editors we have seen, even a normal text editor, and then saved with the .py
extension.
1.6 Operators
In Python we find a series of operators, divided into several groups:
arithmetic
assignment
comparison
logical
bitwise
membership
identity
Besides these operators, there is also a hierarchy that determines the order in which they are applied.
Mathematical operators
When we open Python, the simplest thing we can do is use it to perform math operations, for which
we use mathematical operators:
10+7
17
15-2
13
2*3
6
10/2
5
# in Python3 the result is 5.0
3**3
27
10/3
3
# in Python3 the result is 3.3333333333333335
25//7
3
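Floor division is often used together with the remainder operator. A minimal sketch, assuming Python3 (the variable names are illustrative):

```python
quotient = 25 // 7        # 3: how many whole times 7 fits into 25
remainder = 25 % 7        # 4: what is left over
both = divmod(25, 7)      # (3, 4): quotient and remainder in one call
power = 3 ** 3            # 27
true_division = 10 / 2    # 5.0: "/" always returns a float in Python3

print(quotient, remainder, both, power, true_division)
```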
Operator Description
> greater than
< less than
== equal to
>= greater than or equal to
<= less than or equal to
!= not equal to
is identity
is not non-identity
in exists in
not in does not exist in
These operators are used to test relationships between objects. Let us see some examples:
x=5
y = 10
x>y
False
# the output is a Boolean value that tells us that x is not greater than y
# let us see if x is less than y
x<y
True
z=5
z == x
True
z != y
True
# we create a tuple
v1 = (1,2,3,4,5,6,7)
2 in v1
True
8 not in v1
True
7 not in v1
False
If we compare text strings, Python does not count the characters: it compares the strings character
by character, in alphabetical (Unicode code point) order, so the </> symbols mean "does string1 come
before/after string2?" In the example below the result is True because "v" comes after "l":
"valentina" > "laura"
True
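A short sketch makes the comparison rule visible; the extra names used here are illustrative. To compare lengths instead, len() has to be called explicitly.

```python
after_l = "valentina" > "laura"                       # True: "v" comes after "l"
longer_but_smaller = "alessandra" > "laura"           # False: "a" comes before "l", length is ignored
by_length = len("alessandra") > len("laura")          # True: 10 characters against 5
uppercase_first = "Zoe" < "anna"                      # True: uppercase letters sort before lowercase

print(after_l, longer_but_smaller, by_length, uppercase_first)
```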
Bitwise operators
Bitwise operators are useful in specifying more than one condition when, for example, we need to
extract data from an object, such as a dataset.
Operator Description
& and
| or
^ xor
~ bitwise not
<< left shift
>> right shift
# and also
# parentheses are needed because & and | are evaluated before the comparisons
(3 < 4) & (4 > 3)
True
(3 < 4) | (4 > 3)
True
3 == 4 or 4 > 3
True
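Because bitwise operators are evaluated before comparisons, leaving out the parentheses can change the result. A minimal sketch (the numbers are chosen only to show the effect):

```python
with_parentheses = (3 < 4) & (1 > 0)    # True & True -> True
without_parentheses = 3 < 4 & 1 > 0     # read as 3 < (4 & 1) > 0, i.e. 3 < 0 > 0 -> False
with_and = 3 < 4 and 1 > 0              # True: the logical "and" is evaluated after the comparisons

# on integers the operators work bit by bit
bit_and = 6 & 3       # 0b110 & 0b011 = 0b010 = 2
bit_or = 6 | 3        # 0b110 | 0b011 = 0b111 = 7
bit_xor = 6 ^ 3       # 0b110 ^ 0b011 = 0b101 = 5
bit_not = ~6          # -7
left_shift = 1 << 3   # 8
right_shift = 16 >> 2 # 4
```

For plain True/False conditions, and/or are the safer choice; & and | are mainly needed where element-wise conditions are combined, as in the dataset example mentioned above.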
Assignment operators
Operator Description Example
%= modulo and reassignment x %= y (corresponds to x = x%y)
**= exponentiation and reassignment x **= y (corresponds to x = x**y)
//= floor division and reassignment x //= y (corresponds to x = x//y)
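x = 10
x = x + 5
15
# let's try "+="
x += 5
20
x -= 5
15
x *= 3
45
x /= 3
15
# in Python3, / returns a float, so from here on the results are shown as 15.0, 225.0 and 112.0
x **= 2
225
x //= 2
x
112
Each time, Python performs the operation and stores the result back in the x object.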
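The sequence above can be run as a script. This is a sketch assuming Python3; the history list is added only to record the value of x after each step.

```python
history = []

x = 10
x = x + 5
history.append(x)   # 15
x += 5
history.append(x)   # 20
x -= 5
history.append(x)   # 15
x *= 3
history.append(x)   # 45
x /= 3
history.append(x)   # 15.0 (true division returns a float in Python3)
x **= 2
history.append(x)   # 225.0
x //= 2
history.append(x)   # 112.0
x %= 5
history.append(x)   # 2.0

print(history)
```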
Operators order
When it comes to math operators, we have to consider that there is an order of priority when
brackets are not used. A number of priority rules govern which operation is performed first and
which later (think of mathematical operations, where multiplication takes precedence over addition).
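A few expressions show the effect of these rules and how brackets override them (a minimal sketch; the variable names are illustrative):

```python
no_brackets = 2 + 3 * 4        # 14: multiplication is performed before addition
with_brackets = (2 + 3) * 4    # 20: brackets are evaluated first
power_first = 2 * 3 ** 2       # 18: exponentiation is performed before multiplication
negative_power = -3 ** 2       # -9: exponentiation is performed before the minus sign
chained_power = 2 ** 3 ** 2    # 512: ** is evaluated from right to left, i.e. 2 ** (3 ** 2)

print(no_brackets, with_brackets, power_first, negative_power, chained_power)
```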
multiply_xy(5,6)
30
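The definition of multiply_xy is not shown in this part of the text. A minimal sketch, assuming the function simply multiplies its two arguments, illustrates how indentation marks the body of a function:

```python
def multiply_xy(x, y):
    # the indented lines form the body of the function (hypothetical definition)
    result = x * y
    return result


# this line is not indented, so it is outside the function
product = multiply_xy(5, 6)
print(product)
```

Python uses indentation instead of braces to delimit blocks, so the lines of a block must be indented consistently.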
1.8 Quotation marks
Quotation marks in Python are mostly used to define strings and can be single, double or triple.
Triple ones are used to wrap text over multiple lines and to insert comments on multiple lines, for
example to create documentation within a function that we are creating.
ex3 = """
text string 1
text string 2
text string 3
"""
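The three kinds of quotation marks can be compared in a short sketch. The function multiply_xy and its documentation text are hypothetical and serve only to show a triple-quoted docstring.

```python
single = 'text'
double = "text"                          # single and double quotes create the same string
with_apostrophe = "Python's strings"     # double quotes allow an apostrophe inside

multi_line = """text string 1
text string 2
text string 3"""                         # triple quotes allow line breaks inside the string


def multiply_xy(x, y):
    """Return the product of x and y."""
    return x * y


documentation = multiply_xy.__doc__      # the triple-quoted text is stored as documentation
```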