
Python For Data Science

This document provides a summary of key Python concepts and functions for data science. It covers basics like variables, printing, strings, numbers, lists, dictionaries, Boolean logic, conditionals, loops, dates and times. Functions for common list, string and dictionary operations are demonstrated along with regular expressions and datetime handling. The goal is to provide an introductory cheat sheet for learning Python online through Dataquest.

Uploaded by Jithin Tg
Copyright © All Rights Reserved

LEARN DATA SCIENCE ONLINE

Start Learning For Free - www.dataquest.io

Data Science Cheat Sheet


Python Basics

BASICS, PRINTING AND GETTING HELP


x = 3 - Assign 3 to the variable x
print(x) - Print the value of x
type(x) - Return the type of the variable x (in this case, int for integer)
help(x) - Show documentation for the data type of x (here int, since x = 3)
help(print) - Show documentation for the print() function

READING FILES
f = open("my_file.txt","r")
file_as_string = f.read()
- Open the file my_file.txt and assign its contents to file_as_string

import csv
f = open("my_dataset.csv","r")
csvreader = csv.reader(f)
csv_as_list = list(csvreader)
- Open the CSV file my_dataset.csv and assign its data to the list of lists csv_as_list

STRINGS
s = "hello" - Assign the string "hello" to the variable s
s = """She said,
"there's a good idea."
"""
- Assign a multi-line string to the variable s. Triple quotes are also used to create strings that contain both " and ' characters
len(s) - Return the number of characters in s
s.startswith("hel") - Test whether s starts with the substring "hel"
s.endswith("lo") - Test whether s ends with the substring "lo"
"{} plus {} is {}".format(3,1,4) - Return the string with the values 3, 1, and 4 inserted
s.replace("e","z") - Return a new string based on s with all occurrences of "e" replaced with "z"
s.split(" ") - Split the string s into a list of strings, separating on the character " ", and return that list

NUMERIC TYPES AND MATHEMATICAL OPERATIONS
i = int("5") - Convert the string "5" to the integer 5 and assign the result to i
f = float("2.5") - Convert the string "2.5" to the float value 2.5 and assign the result to f
5 + 5 - Addition
5 - 5 - Subtraction
10 / 2 - Division (always returns a float, here 5.0)
5 * 2 - Multiplication
3 ** 2 - Raise 3 to the power of 2 (or 3²)
27 ** (1/3) - The 3rd root of 27 (or ∛27)
x += 1 - Assign the value of x + 1 to x
x -= 1 - Assign the value of x - 1 to x

LISTS
l = [100,21,88,3] - Assign a list containing the integers 100, 21, 88, and 3 to the variable l
l = list() - Create an empty list and assign the result to l
l[0] - Return the first value in the list l
l[-1] - Return the last value in the list l
l[1:3] - Return a slice (list) containing the second and third values of l
len(l) - Return the number of elements in l
sum(l) - Return the sum of the values of l
min(l) - Return the minimum value from l
max(l) - Return the maximum value from l
l.append(16) - Append the value 16 to the end of l
l.sort() - Sort the items in l in ascending order
" ".join(["A","B","C","D"]) - Convert the list ["A", "B", "C", "D"] into the string "A B C D"

DICTIONARIES
d = {"CA":"Canada","GB":"Great Britain","IN":"India"} - Create a dictionary with keys of "CA", "GB", and "IN" and corresponding values of "Canada", "Great Britain", and "India"
d["GB"] - Return the value from the dictionary d that has the key "GB"
d.get("AU","Sorry") - Return the value from the dictionary d that has the key "AU", or the string "Sorry" if the key "AU" is not found in d
d.keys() - Return a view of the keys from d (wrap it in list() to get a list)
d.values() - Return a view of the values from d
d.items() - Return a view of the (key, value) pairs from d

MODULES AND FUNCTIONS
The body of a function is defined through indentation.
import random - Import the module random
from math import sqrt - Import the function sqrt from the module math
def calculate(addition_one,addition_two,exponent=1,factor=1):
    result = (addition_one + addition_two) ** exponent * factor
    return result
- Define a new function calculate with two required and two optional named arguments, which calculates and returns a result
calculate(3,5,factor=10) - Run the calculate function with the values 3 and 5 and the named argument factor=10

BOOLEAN COMPARISONS
x == 5 - Test whether x is equal to 5
x != 5 - Test whether x is not equal to 5
x > 5 - Test whether x is greater than 5
x < 5 - Test whether x is less than 5
x >= 5 - Test whether x is greater than or equal to 5
x <= 5 - Test whether x is less than or equal to 5
x == 5 or name == "alfred" - Test whether x is equal to 5 or name is equal to "alfred"
x == 5 and name == "alfred" - Test whether x is equal to 5 and name is equal to "alfred"
5 in l - Check whether the value 5 exists in the list l
"GB" in d - Check whether the value "GB" exists in the keys for d

IF STATEMENTS AND LOOPS
The body of if statements and loops is defined through indentation.
if x > 5:
    print("{} is greater than five".format(x))
elif x < 0:
    print("{} is negative".format(x))
else:
    print("{} is between zero and five".format(x))
- Test the value of the variable x and run the code body based on the value
for value in l:
    print(value)
- Iterate over each value in l, running the code in the body of the loop with each iteration
while x < 10:
    x += 1
- Run the code in the body of the loop until the value of x is no longer less than 10
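A minimal runnable sketch of two entries from this page: the calculate function and the csv.reader pattern. To avoid depending on a real my_dataset.csv, it reads from an in-memory io.StringIO buffer; the CSV contents are hypothetical and only for illustration.

```python
import csv
import io

def calculate(addition_one, addition_two, exponent=1, factor=1):
    """Add two values, raise the sum to `exponent`, then multiply by `factor`."""
    result = (addition_one + addition_two) ** exponent * factor
    return result

# (3 + 5) ** 1 * 10
print(calculate(3, 5, factor=10))  # 80

# Hypothetical in-memory stand-in for a file such as my_dataset.csv.
fake_file = io.StringIO("country,code\nCanada,CA\nIndia,IN\n")
csvreader = csv.reader(fake_file)
csv_as_list = list(csvreader)  # one inner list per row, all values as strings
print(csv_as_list)  # [['country', 'code'], ['Canada', 'CA'], ['India', 'IN']]
```

With a real file, passing the object returned by open("my_dataset.csv","r") to csv.reader works the same way.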



Python - Intermediate

KEY
This cheat sheet assumes you are familiar with the content of our Python Basics Cheat Sheet.

s - A Python string variable
i - A Python integer variable
f - A Python float variable
l - A Python list variable
d - A Python dictionary variable

LISTS
l.pop(3) - Returns the fourth item from l and deletes it from the list
l.remove(x) - Removes the first item in l that is equal to x
l.reverse() - Reverses the order of the items in l
l[1::2] - Returns every second item from l, starting with the second item (index 1)
l[-5:] - Returns the last 5 items from l

STRINGS
s.lower() - Returns a lowercase version of s
s.title() - Returns s with the first letter of every word capitalized
"23".zfill(4) - Returns "0023" by left-filling the string with 0's to make its length 4
s.splitlines() - Returns a list by splitting the string on any newline characters
Python strings share some common operations with lists:
s[:5] - Returns the first 5 characters of s
"fri" + "end" - Returns "friend"
"end" in s - Returns True if the substring "end" is found in s

RANGE
Range objects are useful for creating sequences of integers for looping.
range(5) - Returns a sequence from 0 to 4
range(2000,2018) - Returns a sequence from 2000 to 2017
range(0,11,2) - Returns a sequence from 0 to 10, with each item incrementing by 2
range(0,-10,-1) - Returns a sequence from 0 to -9
list(range(5)) - Returns a list from 0 to 4

DICTIONARIES
max(d, key=d.get) - Return the key that corresponds to the largest value in d
min(d, key=d.get) - Return the key that corresponds to the smallest value in d

SETS
my_set = set(l) - Return a set object containing the unique values from l
len(my_set) - Returns the number of objects in my_set (or, the number of unique values from l)
a in my_set - Returns True if the value a exists in my_set

REGULAR EXPRESSIONS
import re - Import the Regular Expressions module
re.search("abc",s) - Returns a match object if the regex "abc" is found in s, otherwise None
re.sub("abc","xyz",s) - Returns a string where all instances matching regex "abc" are replaced by "xyz"

LIST COMPREHENSION
A one-line expression of a for loop
[i ** 2 for i in range(10)] - Returns a list of the squares of values from 0 to 9
[s.lower() for s in l_strings] - Returns the list l_strings, with each item having had the .lower() method applied
[i for i in l_floats if i < 0.5] - Returns the items from l_floats that are less than 0.5

FUNCTIONS FOR LOOPING
for i, value in enumerate(l):
    print("The value of item {} is {}".format(i,value))
- Iterate over the list l, printing the index location of each item and its value
for one, two in zip(l_one,l_two):
    print("one: {}, two: {}".format(one,two))
- Iterate over two lists, l_one and l_two, and print each pair of values
while x < 10:
    x += 1
- Run the code in the body of the loop until the value of x is no longer less than 10

DATETIME
import datetime as dt - Import the datetime module
now = dt.datetime.now() - Assign a datetime object representing the current time to now
wks4 = dt.timedelta(weeks=4) - Assign a timedelta object representing a timespan of 4 weeks to wks4
now - wks4 - Return a datetime object representing the time 4 weeks prior to now
newyear_2020 = dt.datetime(year=2020, month=12, day=31) - Assign a datetime object representing December 31, 2020 to newyear_2020
newyear_2020.strftime("%A, %b %d, %Y") - Returns "Thursday, Dec 31, 2020"
dt.datetime.strptime('Dec 31, 2020',"%b %d, %Y") - Return a datetime object representing December 31, 2020

RANDOM
import random - Import the random module
random.random() - Returns a random float between 0.0 and 1.0 (1.0 itself is excluded)
random.randint(0,10) - Returns a random integer between 0 and 10, inclusive
random.choice(l) - Returns a random item from the list l

COUNTER
from collections import Counter - Import the Counter class
c = Counter(l) - Assign a Counter (dict-like) object with the counts of each unique item from l to c
c.most_common(3) - Return the 3 most common items from l as (item, count) pairs

TRY/EXCEPT
Catch and deal with errors
l_ints = [1, 2, 3, "", 5] - Assign a list of integers with one missing value to l_ints
l_floats = []
for i in l_ints:
    try:
        l_floats.append(float(i))
    except ValueError:
        l_floats.append(i)
- Convert each value of l_ints to a float, catching and handling the ValueError ("could not convert string to float") raised where values are missing; those values are kept unchanged
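A short runnable sketch exercising the TRY/EXCEPT, COUNTER and DATETIME entries together. The word list passed to Counter is made up for illustration; note that timedelta is reached as dt.timedelta, directly on the datetime module.

```python
import datetime as dt
from collections import Counter

# Convert values to float, keeping any value that cannot be converted.
l_ints = [1, 2, 3, "", 5]
l_floats = []
for i in l_ints:
    try:
        l_floats.append(float(i))
    except ValueError:
        # float("") raises ValueError: could not convert string to float: ''
        l_floats.append(i)
print(l_floats)  # [1.0, 2.0, 3.0, '', 5.0]

# Count items (hypothetical sample data) and pick the most common ones.
c = Counter(["a", "b", "a", "c", "a", "b"])
print(c.most_common(2))  # [('a', 3), ('b', 2)]

# Date arithmetic, formatting and parsing.
newyear_2020 = dt.datetime(year=2020, month=12, day=31)
wks4 = dt.timedelta(weeks=4)
print(newyear_2020 - wks4)                     # 2020-12-03 00:00:00
print(newyear_2020.strftime("%A, %b %d, %Y"))  # Thursday, Dec 31, 2020
parsed = dt.datetime.strptime("Dec 31, 2020", "%b %d, %Y")
print(parsed == newyear_2020)                  # True
```

The %A and %b directives produce English names under the default C locale; other locales may format them differently.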



NumPy

KEY
We'll use shorthand in this cheat sheet:
arr - A NumPy array object

IMPORTS
Import these to start:
import numpy as np

IMPORTING/EXPORTING
np.loadtxt('file.txt') - From a text file
np.genfromtxt('file.csv',delimiter=',') - From a CSV file
np.savetxt('file.txt',arr,delimiter=' ') - Writes to a text file
np.savetxt('file.csv',arr,delimiter=',') - Writes to a CSV file

CREATING ARRAYS
np.array([1,2,3]) - One dimensional array
np.array([(1,2,3),(4,5,6)]) - Two dimensional array
np.zeros(3) - 1D array of length 3 with all values 0
np.ones((3,4)) - 3x4 array with all values 1
np.eye(5) - 5x5 array of 0 with 1 on the diagonal (identity matrix)
np.linspace(0,100,6) - Array of 6 evenly spaced values from 0 to 100, inclusive (0, 20, 40, 60, 80, 100)
np.arange(0,10,3) - Array of values from 0 to less than 10 with step 3 (e.g. [0,3,6,9])
np.full((2,3),8) - 2x3 array with all values 8
np.random.rand(4,5) - 4x5 array of random floats between 0 and 1
np.random.rand(6,7)*100 - 6x7 array of random floats between 0 and 100
np.random.randint(5,size=(2,3)) - 2x3 array with random ints between 0 and 4

INSPECTING PROPERTIES
arr.size - Returns the number of elements in arr
arr.shape - Returns the dimensions of arr (rows, columns)
arr.dtype - Returns the type of elements in arr
arr.astype(dtype) - Convert arr elements to type dtype
arr.tolist() - Convert arr to a Python list
np.info(np.eye) - View documentation for np.eye

COPYING/SORTING/RESHAPING
np.copy(arr) - Copies arr to new memory
arr.view(dtype) - Creates a view of arr's elements with type dtype
arr.sort() - Sorts arr in place
arr.sort(axis=0) - Sorts specific axis of arr
two_d_arr.flatten() - Flattens 2D array two_d_arr to 1D
arr.T - Transposes arr (rows become columns and vice versa)
arr.reshape(3,4) - Reshapes arr to 3 rows, 4 columns without changing data
arr.resize((5,6)) - Changes arr shape to 5x6 in place and fills new values with 0

ADDING/REMOVING ELEMENTS
np.append(arr,values) - Appends values to end of arr
np.insert(arr,2,values) - Inserts values into arr before index 2
np.delete(arr,3,axis=0) - Deletes row on index 3 of arr
np.delete(arr,4,axis=1) - Deletes column on index 4 of arr

COMBINING/SPLITTING
np.concatenate((arr1,arr2),axis=0) - Adds arr2 as rows to the end of arr1
np.concatenate((arr1,arr2),axis=1) - Adds arr2 as columns to the end of arr1
np.split(arr,3) - Splits arr into 3 sub-arrays
np.hsplit(arr,5) - Splits arr horizontally (along its columns) into 5 equal sub-arrays

INDEXING/SLICING/SUBSETTING
arr[5] - Returns the element at index 5
arr[2,5] - Returns the 2D array element on index [2][5]
arr[1]=4 - Assigns array element on index 1 the value 4
arr[1,3]=10 - Assigns array element on index [1][3] the value 10
arr[0:3] - Returns the elements at indices 0,1,2 (On a 2D array: returns rows 0,1,2)
arr[0:3,4] - Returns the elements on rows 0,1,2 at column 4
arr[:2] - Returns the elements at indices 0,1 (On a 2D array: returns rows 0,1)
arr[:,1] - Returns the elements at index 1 on all rows
arr<5 - Returns an array with boolean values
(arr1<3) & (arr2>5) - Returns an array with boolean values
~arr - Inverts a boolean array
arr[arr<5] - Returns array elements smaller than 5

SCALAR MATH
np.add(arr,1) - Add 1 to each array element
np.subtract(arr,2) - Subtract 2 from each array element
np.multiply(arr,3) - Multiply each array element by 3
np.divide(arr,4) - Divide each array element by 4 (division by zero does not raise an error; it yields inf, or nan for 0/0)
np.power(arr,5) - Raise each array element to the 5th power

VECTOR MATH
np.add(arr1,arr2) - Elementwise add arr2 to arr1
np.subtract(arr1,arr2) - Elementwise subtract arr2 from arr1
np.multiply(arr1,arr2) - Elementwise multiply arr1 by arr2
np.divide(arr1,arr2) - Elementwise divide arr1 by arr2
np.power(arr1,arr2) - Elementwise raise arr1 to the power of arr2
np.array_equal(arr1,arr2) - Returns True if the arrays have the same elements and shape
np.sqrt(arr) - Square root of each element in the array
np.sin(arr) - Sine of each element in the array
np.log(arr) - Natural log of each element in the array
np.abs(arr) - Absolute value of each element in the array
np.ceil(arr) - Rounds up to the nearest int
np.floor(arr) - Rounds down to the nearest int
np.round(arr) - Rounds to the nearest int (halves round to the nearest even value)

STATISTICS
np.mean(arr,axis=0) - Returns mean along specific axis
arr.sum() - Returns sum of arr
arr.min() - Returns minimum value of arr
arr.max(axis=0) - Returns maximum value of specific axis
np.var(arr) - Returns the variance of array
np.std(arr,axis=1) - Returns the standard deviation of specific axis
np.corrcoef(arr) - Returns the correlation coefficient matrix of arr (each row is treated as a variable)
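A small runnable sketch of a few NumPy entries whose behavior is easy to misread: boolean masking, np.hsplit with an integer, division by zero, and np.corrcoef. The sample arrays are invented for illustration, and np.errstate is used only to silence the RuntimeWarning that division by zero normally emits.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # rows [0..3], [4..7], [8..11]

# Boolean masking: select elements smaller than 5.
print(arr[arr < 5])                 # [0 1 2 3 4]

# With an integer, np.hsplit splits along the columns into that many equal parts.
left, right = np.hsplit(arr, 2)
print(left.shape, right.shape)      # (3, 2) (3, 2)

# Division by zero yields inf (or nan for 0/0) instead of raising.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.divide(np.array([1.0, 0.0]), 0.0)
print(result)                       # [inf nan]

# np.corrcoef treats each row as a variable; these two rows are perfectly correlated,
# so the result is a 2x2 matrix of ones.
x = np.array([[1, 2, 3], [2, 4, 6]])
print(np.corrcoef(x))
```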



Python Regular Expressions

SPECIAL CHARACTERS
^ | Matches the expression to its right at the start of a string. In multi-line mode, it also matches immediately after each \n in the string.
$ | Matches the expression to its left at the end of a string. In multi-line mode, it also matches immediately before each \n in the string.
. | Matches any character except the newline \n (unless the s flag is set).
\ | Escapes special characters or denotes character classes.
A|B | Matches expression A or B. If A is matched first, B is left untried.
+ | Greedily matches the expression to its left 1 or more times.
* | Greedily matches the expression to its left 0 or more times.
? | Greedily matches the expression to its left 0 or 1 times. But if ? is added to quantifiers (+, *, and ? itself), they match in a non-greedy manner.
{m} | Matches the expression to its left exactly m times.
{m,n} | Matches the expression to its left at least m and at most n times, as many times as possible.
{m,n}? | Matches the expression to its left m to n times, but as few times as possible. See ? above.
\A | Matches the expression to its right at the absolute start of a string, whether in single or multi-line mode.
\Z | Matches the expression to its left at the absolute end of a string, whether in single or multi-line mode.

CHARACTER CLASSES (A.K.A. SPECIAL SEQUENCES)
\w | Matches word characters: a-z, A-Z, 0-9, and the underscore _ (for str patterns, other Unicode letters and digits too, unless the a flag is set).
\d | Matches digits, which means 0-9.
\D | Matches any non-digits.
\s | Matches whitespace characters, which include the \t, \n, \r, and space characters.
\S | Matches non-whitespace characters.
\b | Matches the boundary (or empty string) at the start and end of a word, that is, between \w and \W.
\B | Matches where \b does not, that is, between two \w characters or between two \W characters.

SETS
[ ] | Contains a set of characters to match.
[amk] | Matches either a, m, or k. It does not match amk.
[a-z] | Matches any lowercase letter from a to z.
[a\-z] | Matches a, -, or z. It matches - because \ escapes it.
[a-] | Matches a or -, because - is not being used to indicate a series of characters.
[-a] | As above, matches a or -.
[a-z0-9] | Matches characters from a to z and also from 0 to 9.
[(+*)] | Special characters become literal inside a set, so this matches (, +, *, and ).
[^ab5] | Adding ^ excludes any character in the set. Here, it matches characters that are not a, b, or 5.

GROUPS
( ) | Matches the expression inside the parentheses and groups it.
(?) | Inside parentheses like this, ? acts as an extension notation. Its meaning depends on the character immediately to its right.
(?P<name>AB) | Matches the expression AB, and it can be accessed with the group name.
(?aiLmsux) | Here, a, i, L, m, s, u, and x are flags:
a - Matches ASCII only
i - Ignore case
L - Locale dependent
m - Multi-line (^ and $ also match at each line)
s - Dot matches all characters, including \n
u - Matches Unicode
x - Verbose
(?:A) | Matches the expression as represented by A, but unlike (?P<name>AB), it cannot be retrieved afterwards.
(?#...) | A comment. Contents are for us to read, not for matching.
A(?=B) | Lookahead assertion. This matches the expression A only if it is followed by B.
A(?!B) | Negative lookahead assertion. This matches the expression A only if it is not followed by B.
(?<=B)A | Positive lookbehind assertion. This matches the expression A only if B is immediately to its left. This can only match fixed-length expressions.
(?<!B)A | Negative lookbehind assertion. This matches the expression A only if B is not immediately to its left. This can only match fixed-length expressions.
(?P=name) | Matches the text matched by an earlier group named "name".
(...)\1 | The number 1 corresponds to the first group to be matched. \1 matches the same text that group captured, so repeated text can be matched without writing it out again. We can use from 1 up to 99 such groups and their corresponding numbers.

POPULAR PYTHON RE MODULE FUNCTIONS
re.findall(A, B) | Matches all instances of an expression A in a string B and returns them in a list.
re.search(A, B) | Matches the first instance of an expression A in a string B, and returns it as a re match object (or None if there is no match).
re.split(A, B) | Split a string B into a list using the delimiter A.
re.sub(A, B, C) | Replace A with B in the string C.
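A runnable sketch of several regex entries that are easier to understand by example: named groups, (?P=name), multi-line ^, numbered backreferences, lookahead, non-greedy {m,n}?, re.sub and re.split. All input strings are invented for illustration.

```python
import re

text = "cat 2020-12-31\ndog 2021-01-01"

# Named groups: (?P<name>...) captures text that can be retrieved by name.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", text)
print(m.group("year"), m.group("month"))          # 2020 12

# (?P=name) re-matches whatever the named group captured.
print(re.search(r"(?P<ch>\w)(?P=ch)", "abccd").group())  # cc

# Without re.MULTILINE, ^ matches only at the very start of the string.
print(re.findall(r"^\w+", text))                   # ['cat']
print(re.findall(r"^\w+", text, re.MULTILINE))     # ['cat', 'dog']

# \1 matches the same text the first group captured, not merely the same pattern.
print(re.findall(r"(\d)\1", "1122 12"))            # ['1', '2']

# Lookahead: digits only when followed by "px" ("px" itself is not consumed).
print(re.findall(r"\d+(?=px)", "10px 20em 30px"))  # ['10', '30']

# Non-greedy {m,n}?: as few repetitions as possible.
print(re.search(r"a{2,4}?", "aaaa").group())       # aa

# re.sub and re.split with a pattern as the delimiter.
print(re.sub(r"\s+", " ", "a  b\t c"))             # a b c
print(re.split(r",\s*", "x, y,z"))                 # ['x', 'y', 'z']
```

Raw strings (r"...") are used so that backslashes reach the re module unchanged.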
