Python - Data Types and Structures
What are Data Structures?
Data structures are a way of organizing and storing
data so that they can be accessed and worked with
efficiently.
They define the relationship between the data, and
the operations that can be performed on the data.
There are various kinds of data structures defined that
make it easier for data scientists and computer
engineers alike to concentrate on the main picture of
solving larger problems rather than getting lost in the
details of data description and access.
Abstract Data Type and Data Structures
Data structures help you to focus on the bigger picture rather than getting
lost in the details. This is known as data abstraction.
Primitive Data Structures
The basic (primitive) data types in Python are:
• Integers
• Float
• Strings
• Boolean
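Integers. You can use an integer to represent numeric data, and more specifically whole numbers such as 4, 5, or -1. Python integers have arbitrary precision: they can be as large (or as negative) as available memory allows.
Float. "Float" stands for "floating point number". You can use it for real numbers written with a decimal point, such as 1.11 or 3.14. Floats are stored as binary approximations, so most decimal fractions are not represented exactly.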
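The sketch below shows both numeric types using only built-ins. The specific values are arbitrary examples. Integers do not overflow, while floats are binary approximations, so 0.1 + 0.2 is not exactly 0.3. The usual way to compare floats is with a tolerance, for example math.isclose().

```python
import math

big = 2 ** 100          # ints have arbitrary precision: no overflow here
print(big)              # 1267650600228229401496703205376
print(type(big))        # <class 'int'>

total = 0.1 + 0.2       # floats are binary approximations of real numbers
print(total)            # 0.30000000000000004
print(total == 0.3)     # False
print(math.isclose(total, 0.3))  # True: compare floats with a tolerance
```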
Strings. Strings are sequences of characters, such as letters, words, or digits. You create them by enclosing the characters in a pair of single or double quotes.
x = 'Cake'
y = 'Cookie'
x + ' & ' + y
• This code defines two variables, x and y, with the string values "Cake" and "Cookie", respectively.
• Then, it concatenates the two strings with the + operator, placing " & " (an ampersand surrounded by spaces) in between.
• The resulting string is "Cake & Cookie".
• In Python, "&" inside a string is just a regular character.
# Repeat: x * 2 gives 'CakeCake'
x * 2
# Indexing: y[0] and y[1] are the first two characters of 'Cookie'
z2 = y[0] + y[1]
print(z2)  # Co
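Indexing picks out a single character. Slicing with [start:stop] picks out a range of characters, and the stop index is excluded. Negative indices count from the end of the string. The short sketch below reuses the 'Cookie' string to show each case. It also shows that strings are immutable.

```python
y = 'Cookie'

print(y[0])      # C       (first character)
print(y[-1])     # e       (last character)
print(y[0:2])    # Co      (indices 0 and 1; index 2 is excluded)
print(y[1:4])    # ook
print(y[::-1])   # eikooC  (a step of -1 reverses the string)

# Strings are immutable: assigning to an index raises TypeError
try:
    y[0] = 'B'
except TypeError as err:
    print('TypeError:', err)
```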
x = '4'
y = '2'
x+y
• This code snippet assigns the string value '4' to the variable x and the string value
'2' to the variable y.
• Then, it uses the + operator to concatenate the two strings, resulting in the string '42' rather than the number 6.
• When both operands are strings, + joins them in the order they are written instead of adding them numerically.
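To see the difference between joining and adding, this sketch converts the same two strings with int() before applying +. The int() step is added here only for illustration.

```python
x = '4'
y = '2'

print(x + y)              # 42  (string concatenation)
print(int(x) + int(y))    # 6   (numeric addition after converting to int)
print(type(x + y))        # <class 'str'>
```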
str.capitalize('cookie')
• This code uses the capitalize() method of the str class in Python to
capitalize the first letter of the string "cookie".
• The capitalize() method returns a copy of the original string with the
first character capitalized and the rest of the characters in lowercase.
• In this case, the output would be "Cookie".
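In practice, capitalize() is usually called directly on a string value rather than through str. The sketch below compares it with a few related case methods of the built-in str type. The input string is an arbitrary example.

```python
word = 'cOOKIE monster'

print(word.capitalize())  # Cookie monster  (first character upper, rest lower)
print(word.title())       # Cookie Monster  (each word capitalized)
print(word.upper())       # COOKIE MONSTER
print(word.lower())       # cookie monster
print(word)               # cOOKIE monster  (methods return copies; the original is unchanged)
```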
str1 = "Cake 4 U"
str2 = "404"
len(str1)
• In the first line, a string variable str1 is assigned the value "Cake 4 U".
• In the second line, another string variable str2 is assigned the value
"404".
• In the third line, the len() function is used to find the length of the
string str1.
• The len() function returns the number of characters in the string, including spaces and punctuation.
• So, when this code is executed, the output will be the length of the
string "Cake 4 U", which is 8.
str1.isdigit()  # False
str2.isdigit()  # True
• isdigit() returns True only if the string is non-empty and every character in it is a digit. It therefore returns False for "Cake 4 U", which contains letters and spaces, and True for "404".
str1 = 'cookie'
str2 = 'cook'
str1.find(str2)
• In this code, the two string variables str1 and str2 are redefined with the values 'cookie' and 'cook' respectively.
• The find() method is then called on str1 with str2 as its argument.
• This method returns the index of the first occurrence of str2 in str1.
• If str2 is not found in str1, it returns -1.
• In this case, 'cook' is a substring of 'cookie' starting at its first character, so find() returns 0.
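A small sketch of the not-found case, plus two related built-in string operations. The search strings are arbitrary examples. The in operator is the simpler choice when you only need to know whether a substring is present, not where it is.

```python
str1 = 'cookie'
str2 = 'cook'

print(str1.find(str2))               # 0: 'cook' starts at index 0
print(str1.find('kie'))              # 3
print(str1.find('cake'))             # -1: not found
print(str2 in str1)                  # True: membership test without an index
print(str1.replace('cook', 'book'))  # bookie
```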
Boolean
This built-in data type can take the values True and False. Because bool is a subclass of int, True and False behave like the integers 1 and 0 in arithmetic. Booleans are useful in conditional and comparison expressions, as in the following examples:
x=4
y=2
x == y
• This code snippet assigns the value 4 to the variable x and the value 2
to the variable y.
• The third line checks if x is equal to y using the equality operator
(==).
• Since x is not equal to y, the expression evaluates to False.
x>y
• This code snippet is a comparison operation in Python that checks whether the value of variable x is greater than the value of variable y.
• The > symbol is the greater-than comparison operator.
• Since 4 is greater than 2, the expression evaluates to True.
x = 4
y = 2
z = (x == y)  # Comparison expression (evaluates to False)
if z:  # Conditional on the truth value of 'z'
    print("Cookie")
else:
    print("No Cookie")
• This code snippet initializes the variables x and y to the values 4 and 2,
respectively.
• The variable z is then assigned the result of the comparison expression (x==y),
which evaluates to False since x is not equal to y.
• The code then enters a conditional statement (if z:) that checks the truth value of z.
• Since z is False, the code executes the else block and prints the string "No Cookie".
In summary, this code snippet demonstrates the use of comparison
expressions and conditional statements in Python to control program
flow based on the truth value of a variable.
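The link between booleans and integers, and the truth value that every object has, can be sketched with built-ins only. The values in the list are examples of common "falsy" objects. An if statement like the one above uses exactly this bool() conversion.

```python
print(True == 1, False == 0)    # True True
print(True + True)              # 2: bool is a subclass of int
print(isinstance(True, int))    # True

# Every object has a truth value; these common values are all "falsy"
for value in [0, 0.0, '', [], None]:
    print(repr(value), bool(value))   # each line ends with False
print(bool('Cookie'), bool([1]))      # True True
```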
To check the type of an object in Python, use the built-in type() function, as in the following lines of code:
i = 4.0
type(i)  # <class 'float'>
Python converts some types implicitly; for example, adding an int to a float produces a float. Explicit type conversion, by contrast, is user-defined: you have to tell the interpreter explicitly to change the data type of a value. Consider the code chunk below:
x=2
y = "The Godfather: Part "
fav_movie = y + x
• This code snippet assigns the integer value 2 to the variable x and the
string "The Godfather: Part " to the variable y.
• The + operator is then used to concatenate the string value of y with the
integer value of x.
• However, this will result in a TypeError because you cannot concatenate
a string and an integer.
• To fix this, convert the integer to a string using the str() function:
x = 2
y = "The Godfather: Part "
fav_movie = y + str(x)
• This will result in fav_movie being assigned the string value "The Godfather: Part 2".
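Below is a sketch of the fix together with a few other explicit conversions using the built-in int(), float(), and str() functions. The literal values are arbitrary examples. Note that int() truncates a float toward zero, and raises ValueError for text that is not a valid integer.

```python
x = 2
y = "The Godfather: Part "

fav_movie = y + str(x)       # explicit int -> str
print(fav_movie)             # The Godfather: Part 2

print(int("42") + 1)         # 43: explicit str -> int
print(float("3.14"))         # 3.14: explicit str -> float
print(int(3.99))             # 3: int() truncates toward zero
print(4 + 2.0)               # 6.0: implicit int -> float in mixed arithmetic

try:
    int("two")               # not a valid integer literal
except ValueError as err:
    print("ValueError:", err)
```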
Non-Primitive Data Structures
• Arrays
• Lists
• Files
Array
Arrays in Python are a compact way of collecting basic data types: all the entries in an array must be of the same data type. However, unlike in other programming languages such as C++ or Java, arrays are not widely used in Python.
In general, when people talk of arrays in Python, they are actually referring to lists. However, there is a fundamental difference between them: a Python array can be seen as a more memory-efficient way of storing a list whose elements all share the same data type.
In Python, arrays are supported by the array module, which needs to be imported before you start initializing and using them. The elements stored in an array are constrained in their data type. The data type is specified when the array is created, using a type code: a single character such as "I" (unsigned int) in the following example:
import array as arr
a = arr.array("I",[3,6,9])
type(a)  # <class 'array.array'>
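This sketch shows the type constraint in action, reusing the "I" array from the example above. A value of the matching type is accepted, while a float is rejected with a TypeError. The itemsize attribute reports the bytes per element, which depends on the platform.

```python
import array as arr

a = arr.array("I", [3, 6, 9])   # 'I' = unsigned int
a.append(12)                     # same type: accepted
print(a)                         # array('I', [3, 6, 9, 12])
print(a.typecode, a.itemsize)    # I, then bytes per item (platform-dependent, typically 4)

try:
    a.append(1.5)                # a float is not an unsigned int
except TypeError as err:
    print("TypeError:", err)

print(a.tolist())                # [3, 6, 9, 12] as an ordinary list
```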
List
Lists in Python are used to store collections of heterogeneous items. They are mutable, which means that you can change their content without changing their identity. You can recognize lists by the square brackets [ and ] that hold elements separated by commas. Lists are built into Python: you do not need to import anything to use them.
x = [] # Empty list
type(x)
• This code creates an empty list named "x" using the square brackets
notation in Python.
• The "type(x)" function is then called to confirm that "x" is indeed a list.
• The output of this code will be the type of "x", which is <class 'list'>.
x1 = [1,2,3]
type(x1)
x2 = list([1,'apple',3])
type(x2)
• This code creates a list called x2 that contains three elements: the
integer 1, the string 'apple', and the integer 3.
• The list() function builds a list from an iterable; here it is given a list literal, whose elements are separated by commas and enclosed in square brackets.
• The type() function is then used to determine the data type of x2, which will return <class 'list'>.
print(x2[1])
• This code prints the value of the second element in the list x2.
• The index of the first element in a Python list is 0, so x2[1] refers to the second element, 'apple'.
• The print() function outputs the value of this element (apple) to the console.
x2[1] = 'orange'
print(x2)
• The code is assigning the value 'orange' to the second element (index
1) of the list x2.
• Then, the code prints the updated list x2 using the print() function.
• So, the output of this code will be the updated list x2 with the second
element changed to 'orange'.
[1, 'orange', 3]
• This is the printed output: a list with three elements, the integer 1, the string 'orange', and the integer 3.
• The square brackets indicate that this is a list, and the commas separate its elements.
• A list can contain elements of different data types, as seen here with the combination of integers and strings.
Note: as you have seen in the above example with x1, lists can also hold
homogeneous items and hence satisfy the storage functionality of an
array. This is fine unless you want to apply some specific operations to
this collection.
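The bullets that follow describe code that defines two lists, list_num and list_char, and appends 11 to list_num. Below is a minimal sketch of that code. The element values are illustrative assumptions. list_char spells 'cookie' so that the later remove('o') call has something to remove.

```python
# Example values (assumed for illustration)
list_num = [1, 2, 45, 6, 7]
list_char = ['c', 'o', 'o', 'k', 'i', 'e']

list_num.append(11)   # append() adds 11 to the end, modifying the list in place
print(list_num)       # [1, 2, 45, 6, 7, 11]
```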
• This code creates two lists, list_num and list_char, containing numbers and
characters respectively.
• The append() method is then used on list_num to add the integer value 11 to the end
of the list.
• This method modifies the original list by adding the specified element to the end of
the list.
• Finally, the print() function is used to display the updated list_num with the added
element.
• Overall, this code demonstrates how to add an element to a list using the append()
method in Python.
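# Use insert() to insert 11 at index (position) 0 in the list_num list
list_num.insert(0, 11)
print(list_num)
# Use remove() to delete the first occurrence of 'o' from list_char
list_char.remove('o')
print(list_char)
# Use pop() to remove the item at index -2 (the second-to-last item)
list_char.pop(-2)
print(list_char)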
• The pop() method is called on the list_char list with the argument -2.
• This removes the item at the second to last position in the list.
• The pop() method modifies the original list and returns the removed item.
• However, in this case, the returned item is not assigned to any variable.
• Then, the print() function is called to display the modified list_char list.
• This will show the list without the item that was removed by the pop()
method.
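Here is a short sketch of what happens when you keep the value pop() returns, and of how remove() fails. The starting list is an assumed example, matching a 'cookie' list after one 'o' has been removed. pop() hands back the removed item, and remove() raises ValueError when the value is absent.

```python
list_char = ['c', 'o', 'k', 'i', 'e']   # assumed starting values

removed = list_char.pop(-2)   # pop() returns the item it removes
print(removed)                # i
print(list_char)              # ['c', 'o', 'k', 'e']

last = list_char.pop()        # with no argument, pop() removes the last item
print(last)                   # e

try:
    list_char.remove('z')     # remove() raises ValueError if the value is absent
except ValueError as err:
    print("ValueError:", err)
```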
import array
array_char = array.array("u", ["c", "a", "t", "s"])
array_char.tobytes()  # tostring() was removed in Python 3.9
print(array_char)
array('u', 'cats')
• This code creates an array of Unicode characters using the array module in Python.
• The array() function takes two arguments: the type code of the array and the initial values of the array.
• In this case, the type code is "u", which stands for Unicode character (this type code is deprecated in Python 3).
• The initial values of the array are the characters "c", "a", "t", and "s".
• The tobytes() method is then called on the array_char object, which converts the array to a bytes object containing its raw machine values.
• The older tostring() method did the same thing; it was deprecated in Python 3.2 and removed in Python 3.9.
• Finally, the print() function is used to display the array_char object,
which outputs the array of Unicode characters.
You were able to convert array_char to bytes in a single call because Python knows that all the items in an array are of the same data type, so the operation behaves the same way on each element. Thus, arrays can be very useful when dealing with a large collection of homogeneous data.
Since Python does not have to store type information for each element individually, arrays can, for some uses, be faster and use less memory than lists.
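A rough sketch of the memory difference using sys.getsizeof(). The numbers are approximate and vary by Python version and platform; only the relative size matters. sys.getsizeof() on a list counts only its table of references, so the sketch adds the size of each int object. Because small ints are shared objects, the list total is an upper-bound estimate.

```python
import array
import sys

numbers = list(range(1000))
packed = array.array("i", numbers)   # 'i' = signed int, stored as raw C values

# getsizeof(list) excludes the element objects, so add them (a rough upper bound)
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(n) for n in numbers)
array_bytes = sys.getsizeof(packed)

print("list :", list_bytes, "bytes (approx.)")
print("array:", array_bytes, "bytes (approx.)")
```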
It is also worth mentioning NumPy arrays while we are on the topic of arrays. NumPy arrays are very heavily used in the data science world to work with multidimensional arrays. For numerical work on large amounts of data, they are generally much faster than the array module or Python lists. Their operations run in compiled code over a contiguous block of memory, and they support "vectorized" operations such as element-wise addition. NumPy arrays also work efficiently with large datasets.
Here is some code to get you started on NumPy Array:
import numpy as np
arr_a = np.array([3, 6, 9])
arr_b = arr_a/3 # Performing vectorized (element-wise) operations
print(arr_b)
[1. 2. 3.]
• This code imports the NumPy library and creates a NumPy array arr_a with the
values 3, 6, and 9.
• The next line performs a vectorized (element-wise) operation on arr_a by
dividing each element by 3 and assigns the resulting array to arr_b.
• Finally, the code prints the contents of arr_b, which are the floating-point values [1. 2. 3.], since true division (/) always produces floats.
• In summary, this code demonstrates how to perform vectorized operations on
NumPy arrays, which can be a more efficient way to perform operations on large
arrays compared to using loops.
arr_ones = np.ones(4)
print(arr_ones)
[1. 1. 1. 1.]
multi_arr_ones = np.ones((3, 4))  # Shape given as a (rows, columns) tuple
print(multi_arr_ones)
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
• This code uses the NumPy library to create a 2D array with 3 rows
and 4 columns, filled with ones.
• The np.ones() function takes a tuple as an argument, specifying the
shape of the array.
• In this case, the tuple is (3,4), indicating that the array should have 3
rows and 4 columns.
• The resulting array is assigned to the variable multi_arr_ones.
• Finally, the print() function is used to display the contents of the
array.
Lists
An ordered group of items
The items do not need to be the same type
You could put numbers, strings, or donkeys in the same list
List notation
A = [1, "This is a list", c, Donkey("kong")]
(here c stands for any existing variable and Donkey for a user-defined class)
Methods of Lists
list.append(x)
Adds an item to the end of the list
list.extend(L)
Extends the list by appending all the items in the given list L
list.insert(i, x)
Inserts item x at index i
list.remove(x)
Removes the first item from the list whose value is x (raises ValueError if there is no such item)
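The four methods can be compared side by side on a small example list. The values are arbitrary. The key contrast is that append() adds its argument as one item, while extend() adds each item of its argument.

```python
items = [1, 2, 3]
items.append([4, 5])     # adds ONE item: the whole list [4, 5]
print(items)             # [1, 2, 3, [4, 5]]

items = [1, 2, 3]
items.extend([4, 5])     # adds each item of [4, 5]
print(items)             # [1, 2, 3, 4, 5]

items.insert(0, 0)       # insert(i, x): put 0 at index 0
print(items)             # [0, 1, 2, 3, 4, 5]

items.remove(3)          # removes the first item whose value is 3
print(items)             # [0, 1, 2, 4, 5]
```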
Examples of other methods
filter(function, sequence)
Returns an iterator over the items of the sequence for which function(item) is true (wrap it in list() to get a list)
Example: computing the primes up to 25
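The original primes example was not preserved. The sketch below is one way to compute the primes up to 25 with filter(). It uses a hypothetical helper, is_prime(), which tests divisors by trial division up to the square root of n.

```python
def is_prime(n):
    """Return True if n is prime (trial division up to sqrt(n))."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

primes = list(filter(is_prime, range(2, 26)))   # filter() is lazy; list() collects it
print(primes)   # [2, 3, 5, 7, 11, 13, 17, 19, 23]
```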
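Map Function
map(function, sequence)
Calls function(item) for each of the sequence's items and returns an iterator over the results
functools.reduce(function, sequence)
Returns a single value constructed by calling the binary function on the first two items of the sequence, then on the result and the next item, and so on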
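Below is a sketch of map() and reduce() on small example sequences. In Python 3, reduce() has to be imported from the functools module, and operator.add is the function form of +. Like filter(), map() is lazy, so list() is used to collect its results.

```python
from functools import reduce
import operator

cubes = list(map(lambda x: x ** 3, range(1, 5)))
print(cubes)                                  # [1, 8, 27, 64]

# map() with two sequences calls the function on pairs of items
sums = list(map(operator.add, [1, 2, 3], [10, 20, 30]))
print(sums)                                   # [11, 22, 33]

total = reduce(operator.add, range(1, 11))    # ((1 + 2) + 3) + ... + 10
print(total)                                  # 55
```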
Tuple
A number of values separated by commas
Immutable
Cannot assign values to individual items of a tuple
However tuples can contain mutable objects such as lists
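Both tuple properties listed above can be shown with example values. Assigning to an item of a tuple fails, but a list stored inside a tuple can still be changed in place. The tuple keeps referring to the same list object, which has simply grown.

```python
t = 12345, 54321, 'hello!'    # commas alone create a tuple
print(t[0])                   # 12345

try:
    t[0] = 88888              # tuples are immutable
except TypeError as err:
    print("TypeError:", err)

nested = (1, 2, [3, 4])       # ...but a tuple can hold a mutable list
nested[2].append(5)           # the inner list changes; the tuple still holds the same list
print(nested)                 # (1, 2, [3, 4, 5])
```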
Dictionary
Indexed by keys
Keys can be any immutable type (strings, numbers…)
Tuples can be used as keys if they contain only immutable objects
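Here is a sketch of the key rules with an example dictionary. Strings, numbers, and tuples of immutable values work as keys. A tuple that contains a list is unhashable, so Python rejects it with a TypeError.

```python
prices = {'cake': 4.5, 3: 'three'}         # strings and numbers as keys
prices[('cookie', 'large')] = 2.0          # tuple of immutables: allowed
print(prices[('cookie', 'large')])         # 2.0

try:
    prices[('cookie', ['choc'])] = 2.5     # tuple holding a list: unhashable
except TypeError as err:
    print("TypeError:", err)
```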
Looping Techniques
items() (iteritems() in Python 2):
for retrieving keys and values together when looping through a dictionary
enumerate():
for retrieving the position index and value together when looping through a sequence
zip():
for looping over two or more sequences in parallel
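A minimal sketch of the three looping helpers on small example data. In Python 3, dict.items() replaces Python 2's iteritems(). enumerate() numbers items starting at 0, and zip() stops at the end of the shortest sequence.

```python
menu = {'cake': 4.5, 'cookie': 2.0}
for name, price in menu.items():             # key and value together
    print(name, price)

flavors = ['cake', 'cookie', 'pie']
for i, flavor in enumerate(flavors):         # position index and value
    print(i, flavor)                         # 0 cake / 1 cookie / 2 pie

questions = ['name', 'quest']
answers = ['lancelot', 'the holy grail']
for q, a in zip(questions, answers):         # two sequences in parallel
    print(f'What is your {q}? It is {a}.')
```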
Comparisons