UNIT 5 python aktu
PYTHON
TOPICS INCLUDED: PANDAS, NUMPY, MATPLOTLIB AND TKINTER
NumPy is a foundational package for numerical computing in Python. Let's dive deep into
NumPy, its properties, and its built-in functions with examples to thoroughly understand its
capabilities and functionalities.
Introduction to NumPy
NumPy, short for Numerical Python, is a powerful library that provides support for large multi-
dimensional arrays and matrices, along with a collection of mathematical functions to operate on
these arrays. It is the foundation of many other scientific computing packages in Python, such as
SciPy, Pandas, and scikit-learn.
Properties of NumPy
1. N-dimensional array (ndarray)
The core feature of NumPy is its powerful N-dimensional array object, ndarray. This array object
allows large datasets to be stored and manipulated compactly and efficiently.
2. Vectorization
NumPy enables vectorized operations on arrays, which allows for operations to be applied
element-wise without the need for explicit loops. This results in faster and more concise code.
3. Broadcasting
Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different
shapes when performing arithmetic operations. This can lead to significant simplifications in
code and improved performance.
4. Universal functions (ufuncs)
Universal functions are functions that operate element-wise on ndarrays, providing a means to
perform vectorized operations on data.
5. Integration with other languages
NumPy can interface with C, C++, and Fortran code, making it highly versatile and suitable for
high-performance computing tasks.
6. Random number generation
NumPy includes functions for generating random numbers, which are essential for statistical
sampling and simulations.
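The properties above can be illustrated with a short sketch. The array values and variable names below are made up for illustration only.

```python
import numpy as np

# ndarray: a typed, N-dimensional container
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.ndim, m.shape, m.dtype)

# Vectorization: arithmetic applies to every element, no explicit loop
doubled = m * 2

# Broadcasting: the length-3 vector is stretched across both rows
row = np.array([10, 20, 30])
shifted = m + row

# Universal function (ufunc): np.sqrt works element-wise
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Random numbers: five draws from the standard normal distribution
rng = np.random.default_rng(0)
draws = rng.normal(size=5)

print(doubled)
print(shifted)
print(roots)
```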
Numpy is the most popular python library for matrix/vector computations. Due to
python’s popularity, it is also one of the leading libraries for numerical analysis, and a
frequent target for computing benchmarks and optimization.
It is important to keep in mind that numpy is a separate library that is not part of the
base python. Unlike R, base python is not vectorized, and one has to load numpy (or
another vectorized library, such as pandas) in order to use vectorized operations.
This also causes certain differences between the base python approach and the way
to do vectorized operations.
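A small sketch of this difference, using made-up numbers: the same operators mean different things for a base python list and for a numpy array.

```python
import numpy as np

numbers = [1, 2, 3]
arr = np.array(numbers)

list_sum = numbers + numbers    # lists: + concatenates
array_sum = arr + arr           # arrays: + adds element by element
list_times = numbers * 2        # lists: * repeats the list
array_times = arr * 2           # arrays: * multiplies every element

print(list_sum, array_sum)
print(list_times, array_times)
```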
Arrays can be created with np.array. For instance, we can create a 1-D vector of
numbers from 1 to 4 by feeding a list of desired numbers to the np.array:
import numpy as np
a = np.array([1,2,3,4])
print("a:\n", a)
## a:
## [1 2 3 4]
Note that it is printed in brackets like a list, but unlike a list, it does not have commas
separating the components.
If we want to create a matrix (two-dimensional array), we can feed np.array with a list
of lists, one sublist for each row of the matrix:
b = np.array([[1,2], [3,4]])
print("b:\n", b)
## b:
## [[1 2]
## [3 4]]
The output does not have the best formatting but it is clear enough.
One of the fundamental properties of an array is its dimension, called shape in numpy.
Shape is the array’s size along all of its dimensions. This can be queried with the
attribute .shape, which returns the sizes in the form of a tuple:
a.shape
## (4,)
b.shape
## (2, 2)
One can see that vector a has a single dimension of size 4, and matrix b has two
dimensions, both of size 2 (remember: (4,) is a tuple of length 1!).
One can also reshape arrays, i.e. change their shape into another compatible shape.
This can be achieved with .reshape() method. .reshape takes one argument, the new
shape (as a tuple) of the array. For instance, we can reshape the length-4 vector into
a 2x2 matrix as
a.reshape((2,2))
## array([[1, 2],
## [3, 4]])
and we can “straighten” matrix b into a vector with
b.reshape((4,))
## array([1, 2, 3, 4])
np.arange creates sequences, quite a bit like range, but the result will be a numpy
vector. If needed, we can reshape the vector into a desired format:
np.arange(10) # vector of length 10
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(10).reshape((2,5)) # 2x5 matrix
## array([[0, 1, 2, 3, 4],
## [5, 6, 7, 8, 9]])
np.zeros and np.ones create arrays filled with zeros and ones respectively:
np.zeros((5,))
## array([0., 0., 0., 0., 0.])
np.ones((2,4))
## array([[1., 1., 1., 1.],
## [1., 1., 1., 1.]])
Arrays can be combined in different ways, e.g. np.column_stack combines them as
columns (next to each other), and np.row_stack combines them as rows (underneath
each other). np.row_stack is an alias of np.vstack, and recent NumPy versions recommend
np.vstack instead. For instance, we can combine a column of ones and two columns of
zeros as follows:
oneCol = np.ones((5,)) # a single vector of ones
zeroCols = np.zeros((5,2)) # two columns of zeros
np.column_stack((oneCol, zeroCols)) # 5x3 columns
## array([[1., 0., 0.],
## [1., 0., 0.],
## [1., 0., 0.],
## [1., 0., 0.],
## [1., 0., 0.]])
Note that column_stack expects all arrays to be passed as a single tuple (or list).
Exercise 3.1 Use np.zeros, np.ones, mathematical operations and concatenation to
create the following array:
## array([[-1., -1., -1., -1.],
## [ 0., 0., 0., 0.],
## [ 2., 2., 2., 2.]])
a = np.arange(12).reshape((3,4))
print(a)
## [[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]]
print(100 + a, "\n")
## [[100 101 102 103]
## [104 105 106 107]
## [108 109 110 111]]
print(2**a, "\n") # remember: exponent with **, not with ^
## [[ 1 2 4 8]
## [ 16 32 64 128]
## [ 256 512 1024 2048]]
Both of these mathematical operations, + and **, are performed elementwise (see footnote 2) for
every single element of the matrix.
Exercise 3.2 Create the following array:
## array([[ 2, 4, 6, 8, 10],
## [12, 14, 16, 18, 20],
## [22, 24, 26, 28, 30],
## [32, 34, 36, 38, 40]])
Comparison operators are vectorized too:
a > 6
## array([[False, False, False, False],
## [False, False, False, True],
## [ True, True, True, True]])
a == 7
## array([[False, False, False, False],
## [False, False, False, True],
## [False, False, False, False]])
As comparison operators are vectorized, one might expect that the other logical
operators, and, or and not, are also vectorized. But this is not the case. There are
vectorized logical operators, but they differ from the base python version. These are
more similar to corresponding operators in R or C, namely & for logical and, | for
logical or, and ~ for logical not:
(a < 3) | (a > 8) # logical or
## array([[ True, True, True, False],
## [False, False, False, False],
## [False, True, True, True]])
(a > 4) & (a < 7) # logical and
## array([[False, False, False, False],
## [False, True, True, False],
## [False, False, False, False]])
~(a > 6) # logical not
## array([[ True, True, True, True],
## [ True, True, True, False],
## [False, False, False, False]])
There is no vectorized multi-way comparison like 1 < x < 2.
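A minimal sketch of what happens instead, using the same 3x4 array as above: the chained form raises an error, and the usual workaround is two comparisons joined with &.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# The chained form is evaluated as (4 < a) and (a < 7); `and` needs a
# single truth value for the whole array, so numpy raises ValueError.
try:
    4 < a < 7
    chained_ok = True
except ValueError:
    chained_ok = False

# Vectorized equivalent: two comparisons joined with &
between = (a > 4) & (a < 7)
print(chained_ok)
print(between)
```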
3.1.5 Array Indexing and Slicing
Indexing refers to extracting elements based on their position or certain criteria. This is
one of the fundamental operations with arrays. There are two ways to extract
elements: based on position, and based on logical criteria. Unfortunately, this also
makes indexing somewhat confusing, and it takes some time to become familiar
with it.
a = np.arange(12)
print(a[::2]) # every second element
## [ 0 2 4 6 8 10]
However, unlike lists, one can do vectorized assignments in numpy:
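A minimal sketch of vectorized assignment (the vector v is introduced here only for illustration): a single value can be assigned to a whole slice, or to all elements that satisfy a condition.

```python
import numpy as np

v = np.arange(12)
v[::2] = -1       # assign one value to every second element
v[v > 8] = 0      # assign to all elements that satisfy a condition
print(v)
```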
c = np.arange(12).reshape((3,4))
c
## array([[ 0, 1, 2, 3],
## [ 4, 5, 6, 7],
## [ 8, 9, 10, 11]])
c[1,2] # 2nd row, 3rd column
## 6
c[1] # 2nd row
## array([4, 5, 6, 7])
Comma can separate not just two indices but two slices, so we can write
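A sketch of such two-slice indexing on the matrix c from above; the names block and column are chosen here for illustration.

```python
import numpy as np

c = np.arange(12).reshape((3, 4))
block = c[:2, 1:3]    # rows 0 and 1, columns 1 and 2
column = c[:, 0]      # all rows, first column
print(block)
print(column)
```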
x = np.random.choice(6, size=5)
x
## array([0, 2, 0, 4, 3])
But maybe we prefer not to label the results as 0..5 but 1..6. So we can just add one
to the result. Here is an example that creates 2-D array of die rolls:
1 + np.random.choice(6, size=(2,4))
## array([[1, 5, 4, 1],
## [4, 3, 2, 1]])
Numpy offers a large set of various random values. Here we list a few more:
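The sketch below shows a few commonly used generators from np.random. The parameter values are arbitrary choices for illustration, and the results differ from run to run.

```python
import numpy as np

normals = np.random.normal(loc=0.0, scale=1.0, size=3)    # normal distribution
uniforms = np.random.uniform(low=0.0, high=1.0, size=3)   # uniform on [0, 1)
coins = np.random.binomial(n=1, p=0.5, size=10)           # ten fair coin flips
counts = np.random.poisson(lam=2.0, size=4)               # Poisson counts

print(normals)
print(uniforms)
print(coins)
print(counts)
```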
However, if you need to replicate your results exactly, you have to set the initial
values explicitly using np.random.seed(value). This re-initializes the RNG to the given initial
state:
np.random.seed(1)
np.random.uniform(size=5) # 1st batch of numbers
## array([4.17022005e-01, 7.20324493e-01, 1.14374817e-04, 3.02332573e-01,
## 1.46755891e-01])
np.random.uniform(size=5) # 2nd batch is different
## array([0.09233859, 0.18626021, 0.34556073, 0.39676747, 0.53881673])
np.random.seed(1)
np.random.uniform(size=5) # repeat the 1st batch
## array([4.17022005e-01, 7.20324493e-01, 1.14374817e-04, 3.02332573e-01,
## 1.46755891e-01])
Pandas is the standard python library to work with dataframes. Unlike in R, this is not
a part of base python and must be imported separately. It is typically imported as pd:
import pandas as pd
Pandas relies heavily on numpy but is a separate package. Unfortunately, it also uses
a somewhat different syntax and somewhat different defaults. However, as it is “made
of” numpy, it works very well together with the latter.
Pandas contains two central data types: Series and DataFrame. Series is often used
as a second-class citizen, just as a single variable (column) in data frame. But it can
also be used as a vectorized dict that links keys (indices) to values. DataFrame is
broadly similar to other dataframes as implemented in R or spark. When you extract
its individual columns and rows you normally get those in the form of Series. So it is
extremely useful to know the basics of Series when working with data frames.
Both DataFrame and Series include index, a glorified row name, which is very useful
for extracting information based on names, or for merging different variables into a
data frame (See Section Concatenating data with pd.concat).
We start by introducing Series as this is a simpler data structure than DataFrame,
and allows us to introduce index.
3.2.1 Series
Series is a one-dimensional positional column (or row) of values. It is in some sense
similar to list, but from another point of view it is more like a dict, as it contains index,
and you can look up values based on index as a key. So it allows not only positional
access but also index-based (key-based) access. In terms of internal structure, it is
implemented with vectorized operations in mind, so it supports vectorized arithmetic,
and vectorized logical, string, and other operations. Unlike dicts, it also supports
multi-element extraction.
s = pd.Series([1,2,5,6])
s
## 0 1
## 1 2
## 2 5
## 3 6
## dtype: int64
Series is printed in two columns. The first one is the index, the second one is the
value. In this example, index is essentially just the row number and it is not very
useful. This is because we did not provide any specific index and hence pandas
picked just the row number. Underneath the two columns, you can also see the data
type, in this case a 64-bit integer, the default integer type in numpy and pandas on most platforms.
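A series can also be given an explicit index. The sketch below builds a small series named pop; the values and index labels are taken from the output printed further down in this section, but the constructor call itself is an assumption, as the original definition is not shown.

```python
import pandas as pd

pop = pd.Series([38, 26, 19, 19], index=["ca", "tx", "ny", "fl"])

by_key = pop["tx"]           # key-based (index-based) access
by_position = pop.iloc[1]    # positional access
several = pop[["ca", "fl"]]  # multi-element extraction

print(pop)
print(by_key, by_position)
print(several)
```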
Exercise 3.6 Create a series of 4 capital cities where the index is the name of
corresponding country.
pop.values
## array([38, 26, 19, 19])
pop.index
## Index(['ca', 'tx', 'ny', 'fl'], dtype='object')
Note that values are returned as a numpy array, and index is a special index object. If
desired, this can be converted to a list:
list(pop.index)
## ['ca', 'tx', 'ny', 'fl']
Series also supports ordinary mathematics, e.g. we can do operations like
pop > 20
## ca True
## tx True
## ny False
## fl False
## dtype: bool
The result is another series, here of logical values, as indicated by the “bool” data
type.
3.2.2 DataFrame
DataFrame is the central data structure for holding 2-dimensional rectangular data. It
is in many ways similar to R dataframes. However, it also shares a number of
features with Series, in particular the index, so you can imagine a data frame is just a
number of series stacked next to each other. Also, extracting single rows or columns
from DataFrames typically results in a series.
df = {'ca': [35, 37, 38], 'tx': [23, 24, 26], 'md': [5,5,6]}
pop = pd.DataFrame(df)
print('population:\n', pop, '\n')
## population:
## ca tx md
## 0 35 23 5
## 1 37 24 5
## 2 38 26 6
The data frame is printed as four columns. Exactly as in case of series, the first
column is index. In the example above we did not specify the index and hence
pandas picked just row numbers. But we can provide an explicit index, for instance
the year of observation:
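A minimal sketch of providing such an index. The years below are hypothetical labels chosen for illustration; the text does not state which years the rows refer to.

```python
import pandas as pd

df = {'ca': [35, 37, 38], 'tx': [23, 24, 26], 'md': [5, 5, 6]}
# Hypothetical years, used only to show how an explicit index is passed
pop = pd.DataFrame(df, index=[2010, 2015, 2020])
print(pop)

row = pop.loc[2015]   # look up a row by its index label
print(row)
```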
Exercise 3.7 Create a dataframe of (at least 4) countries, with 2 variables: population
and capital. Country name should be the index.
What happens if we use a wrong separator? This can be easily checked with printing
the number of columns, and printing a few lines of data. Here is an example:
a.columns
## Index(['date\tapprove\tdisapprove\tdontknow'], dtype='object')
The tab markers \t in printout give strong hints that the correct separator is tab.
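The effect of the separator can be reproduced with a small sketch. Here a two-line tab-separated string stands in for the data file (an assumption made so the example is self-contained), and it is read once with the default separator and once with sep="\t".

```python
import io
import pandas as pd

# Tab-separated text standing in for a data file
raw = "date\tapprove\tdisapprove\tdontknow\n2001 Dec 14-16\t86\t11\t3\n"

wrong = pd.read_csv(io.StringIO(raw))             # default separator is comma
right = pd.read_csv(io.StringIO(raw), sep="\t")   # tab as separator

print(wrong.shape[1])   # number of columns with the wrong separator
print(right.shape[1])   # number of columns with the correct separator
print(right.columns)
```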
It may initially be quite confusing to understand how to specify the file name. If you
load data in a jupyter notebook, then the working directory is normally the same
directory where the notebook is located (see footnote 3). Notebooks also let you complete file
names with the TAB key. But in any case, the working directory can be found
with os.getcwd (get current working directory):
import os
os.getcwd()
## '/home/siim/tyyq/lecturenotes/machinelearning-py'
This helps to specify the relative path if your data file is not located in the same place
as your code. You can also check which files python finds in a given folder,
e.g. in ../data/:
files = os.listdir("../data/")
files[:5]
## ['house-votes-84.csv.bz2', 'marathon.csv.bz2', 'growth-unemployment-2016.csv.bz2',
## 'hadcrut-5.0.1.0-annual.csv.bz2', 'trump-approval.csv']
As we see, this function returns a list of file names it finds in the given location.
• Select variables explains how to select desired variables from a data frame
• Modifying data frames: there are slight differences when modifying data instead of
extracting, these are discussed here.
Fortunately, Series and data frames behave in a broadly similar way, e.g. selecting
cases by logical conditions, based on index, and location are rather similar. As series
do not have columns, we cannot access elements by column name or by column
position though.
approval.head(4)
## date approve disapprove dontknow
## 0 2001 Dec 14-16 86 11 3
## 1 2001 Dec 6-9 86 10 4
## 2 2001 Nov 26-27 87 8 5
## 3 2001 Nov 8-11 87 9 4
To begin with, data frames have variable names. We can extract a single variable
either with ["varname"] or a shorthand as attribute .varname (note:
replace varname with the name of the relevant variable):
approval["approve"] # approval, as series
## 0 86
## 1 86
## 2 87
## 3 87
## 4 87
## 5 88
## 6 89
## 7 87
## 8 90
## 9 86
## Name: approve, dtype: int64
approval.approve # the same, as series
## 0 86
## 1 86
## 2 87
## 3 87
## 4 87
## 5 88
## 6 89
## 7 87
## 8 90
## 9 86
## Name: approve, dtype: int64
These constructs return the column as a series. If we prefer to get a single-column
data frame, we can wrap the variable name into a list:
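A minimal sketch of the difference. The data frame is rebuilt here from the first four rows printed above, so that the example runs on its own.

```python
import pandas as pd

# First rows of the approval data, copied from the printout above
approval = pd.DataFrame({
    "date": ["2001 Dec 14-16", "2001 Dec 6-9", "2001 Nov 26-27", "2001 Nov 8-11"],
    "approve": [86, 86, 87, 87],
    "disapprove": [11, 10, 8, 9],
    "dontknow": [3, 4, 5, 4],
})

as_series = approval["approve"]    # a Series
as_frame = approval[["approve"]]   # a one-column DataFrame

print(type(as_series))
print(as_frame)
```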
The previous example where we extracted a single column as a data frame instead
of Series also hints how to extract more than one variable: just wrap all the required
variable names into a list:
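For instance, a sketch that selects two variables, using a two-row stand-in for the approval data (copied from the printout above):

```python
import pandas as pd

approval = pd.DataFrame({
    "date": ["2001 Dec 14-16", "2001 Dec 6-9"],
    "approve": [86, 86],
    "disapprove": [11, 10],
    "dontknow": [3, 4],
})

subset = approval[["date", "approve"]]   # list of names -> data frame
print(subset)
```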
The filtered object may be a view of the original data frame rather than a new data frame. This
may give you warnings and errors later when you attempt to modify the filtered data.
If you intend to do that, make a deep copy of the data using the .copy method. See
more in Section 3.3.5.
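A sketch of the filter, copy, then modify pattern. The data frame scores and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical data for illustration
scores = pd.DataFrame({"name": ["a", "b", "c"], "score": [86, 91, 78]})

high = scores[scores.score > 80].copy()   # independent copy of the filtered rows
high["score"] = high["score"] + 1         # modifies only the copy

print(high)
print(scores)   # the original is unchanged
```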
One can also drop the .loc[] syntax and just use square brackets, so instead of
writing pop.loc[["ID", "MY"]], one can just write pop[["ID", "MY"]].
The fact that there are several ways to extract positional data causes a lot of
confusion for beginners. It is not helped by the common habit of not using indices and
just relying on the automatic row-numbers. In this case positional access
by .iloc[] produces exactly the same results as the index access by .loc[], and one
can conveniently forget about the index and use whatever feels easier. But
sometimes the index changes as a result of certain operations and that may lead to
errors or unexpected results. For instance, we can create an alternative population
series without explicit index:
pop1 = pd.Series([np.nan, 26, 19, 13]) # index is 0, 1, ...
pop1
## 0 NaN
## 1 26.0
## 2 19.0
## 3 13.0
## dtype: float64
In this example, position and index are equivalent and hence it is easy to forget
that .loc[] is index-based access, not positional access! So one may freely mix both
methods (and remember, .loc is not needed):
pop1.loc[2]
## 19.0
pop1.iloc[2]
## 19.0
pop1[2]
## 19.0
This becomes a problem if a numeric index is not equivalent to row number any more,
for instance after we drop missings:
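A minimal sketch of this situation, using .dropna() to remove the missing value (the names pop2, by_label and by_position are introduced here for illustration):

```python
import numpy as np
import pandas as pd

pop1 = pd.Series([np.nan, 26, 19, 13])
pop2 = pop1.dropna()          # remaining index labels are 1, 2, 3

by_label = pop2.loc[2]        # the element with index label 2
by_position = pop2.iloc[2]    # the third remaining element

print(pop2)
print(by_label, by_position)
```

After dropping the first element, the index label 2 and the position 2 no longer refer to the same value, so .loc[] and .iloc[] give different answers.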
Exactly as series, data frames allow positional access by .iloc[]. However, as data
frames are two-dimensional objects, .iloc accepts two arguments (in brackets,
separated by comma), the first one for rows, the second one for columns. So we can
write
countries.iloc[2] # 3rd row, as series
## capital Phnom Penh
## population 15.3
## Name: KH, dtype: object
countries.iloc[[2]] # 3rd row, as data frame
## capital population
## KH Phnom Penh 15.3
countries.iloc[2,1] # 3rd row, 2nd column, as a number
## 15.3
There is also an index-based extractor .loc[] that accepts one (for rows) or two (for
rows and columns) indices. In case of data frames, the default row index is just the
row number; but the column index is the variable names. So we can write
countries.loc["MY","capital"] # Malaysian capital
## 'Kuala Lumpur'
countries.loc[["KH", "ID"], ["population", "capital"]]
# Extract a sub-dataframe
## population capital
## KH 15.3 Phnom Penh
## ID 267.7 Jakarta
Unfortunately, data frames add their own confusing constructs. When accessing data
frames with .loc[], we have to specify rows first, and possibly columns second. If
we drop .loc, we cannot specify rows. That is, unless we extract one variable
with brackets, get a series, and extract the desired row in the second set of brackets…
countries["capital"]
## MY Kuala Lumpur
## ID Jakarta
## KH Phnom Penh
## Name: capital, dtype: object
countries["capital"]["MY"]
## 'Kuala Lumpur'
Finally, remember that 2-D numpy arrays will use similar integer-positional syntax
as .iloc[], just without .iloc.
In conclusion, it is very important to know what your data type is when using numpy
and pandas. Indexing is all around us when working with data; there are many
somewhat similar ways to extract elements, and which way is correct depends on the
exact data type.
However, there are several exceptions and caveats. Let’s demonstrate this by
modifying the data frame of three countries we created above.
An explicit copy is not needed until you start modifying data: you can do various
filtering steps without .copy as long as you make the copy before modifications.
Ensure that you store and print the final data frame!
M = np.array([[1507, 12478],
[-500, 11034],
[1537, 8443],
[1591, 6810]])
M
## array([[ 1507, 12478],
## [ -500, 11034],
## [ 1537, 8443],
## [ 1591, 6810]])
df = pd.DataFrame(M, columns=["established", "population"],
index=["Mumbai", "Delhi", "Bangalore", "Hyderabad"])
df
## established population
## Mumbai 1507 12478
## Delhi -500 11034
## Bangalore 1537 8443
## Hyderabad 1591 6810
s = pd.Series(M[:,0], index=df.index)
s
## Mumbai 1507
## Delhi -500
## Bangalore 1537
## Hyderabad 1591
## dtype: int64
(This is data about four cities, the year when those were established, and population
in thousands).
Exercise 3.13 Create another numpy matrix and a data frame about cities in a similar
fashion: create a matrix of data, and create a data frame from it using pd.DataFrame.
Specify index (row names) and columns (variable names). Include at least 3 cities
and 3 variables (e.g. population in millions, size in km2, and population density
people per km2).
Hint: you may invent both city names and the figures!
• Series: use iloc and brackets (but these are just 1-dimensional):
s.iloc[1] # second row
## -500
Extract using index (city names/column names):
• Numpy arrays: use brackets and put a colon : in place of the row indicator:
M[:,0]
## array([1507, -500, 1537, 1591])
• Data frames: you can use iloc and brackets, exactly as in case of numpy arrays. You
can also use brackets and column names (column index) without iloc, or dot-column
name:
df.iloc[:,0]
## Mumbai 1507
## Delhi -500
## Bangalore 1537
## Hyderabad 1591
## Name: established, dtype: int64
df["established"]
## Mumbai 1507
## Delhi -500
## Bangalore 1537
## Hyderabad 1591
## Name: established, dtype: int64
df.established
## Mumbai 1507
## Delhi -500
## Bangalore 1537
## Hyderabad 1591
## Name: established, dtype: int64
If you want to extract rows and columns in a mixed way, e.g. rows by number and
columns by column names (index), you can use double extraction (two sets of
brackets) and chain your extractions into a single line:
df.iloc[:3,:]["population"]
## Mumbai 12478
## Delhi 11034
## Bangalore 8443
## Name: population, dtype: int64
Exercise 3.14 Take your own city matrix and city data frame. From both of these
extract:
• population density (for all cities)
• data for the third city. For the data frame do it in two ways: using index, and using row
number!
• area of the second city. For the data frame, do it in two ways: using column name
(column index), and column number!
Finally, if asking for a single entry (singleton), pandas simplifies the result into a
lower-ranked object (series instead of data frame, or a number instead of series). If
you want to retain a similar data structure as the original one, wrap your selector in a
list. For instance, here is the previous example modified so that it returns a data frame:
df.iloc[:3,:][["population"]]
## population
## Mumbai 12478
## Delhi 11034
## Bangalore 8443
All these methods can create rather confusing situations sometimes. For instance, if
we do not specify index, it will be automatically created as row numbers (but starting
from 0, not 1). In that case df.iloc[i] and df.loc[i] give the same result
(assuming i is a list of row numbers). Even worse, if the index skips some numbers,
then df.loc[i] may or may not work, and even where it works, it may give wrong
results! In a similar fashion, M[i,j] works but df[i,j] does not
work, df.loc[i,j] works but M.loc[i,j] does not work. In order to tell if the syntax is
correct it is necessary to know what is the data structure.
2. There are also operations that are not performed elementwise when using
arrays, in particular the matrix product.
3. If you run your code from the command line, the working directory is the directory
where you run the command, not the directory where the program is located.
import numpy as np
import pandas as pd
4.1 Matplotlib
Matplotlib is designed to be similar to the plotting functionality in the popular matrix
language matlab. This is a library geared to scientific plotting. In these notes, we are
mainly interested in the pyplot module but matplotlib contains more functionality,
e.g. handling of images. Typically we import pyplot as plt:
import matplotlib.pyplot as plt
This page is compiled using matplotlib 3.5.1.
x = np.random.normal(size=50)
y = np.random.normal(size=50)
_ = plt.scatter(x, y)
_ = plt.show()
This small code demonstrates several functions:
• plt.scatter creates a scatterplot (point plot). It takes arguments x and y for the
horizontal and vertical placement of dots
• plt.scatter returns an object, we may want to assign it to a temporary variable to
avoid printing. We use variable name _ (just underscore) for a value we are not really
storing but just avoiding printing.
• Scatterplot automatically computes the suitable axis range.
• plt.show makes the plot visible. It may not be necessary, depending on the
environment you use. For instance, when you run a notebook cell, it will automatically
make the plot visible at the end. However, if you want to make two plots inside of the
cell, you still need to call plt.show to tell matplotlib that you are done with the first plot
and now it is time to show it.
• Finally, we also assign the result of plt.show to a temporary variable. It returns None,
so this is only for consistency with the other calls.
Next, here is another simple example of line plot:
x = np.linspace(-5, 5, 100)
y = np.sin(x)
_ = plt.plot(x, y)
_ = plt.show()
Most of the functionality should be clear by now, but here are a few notes:
• The first lines create a linear sequence of 100 numbers between -5 and 5, and
compute sin of these numbers.
• Line plots are done using plt.plot, it also takes arguments x and y.
x = np.random.normal(size=50)
y = np.random.normal(size=50)
_ = plt.scatter(x, y,
color="red", # dot color
edgecolor="black",
alpha=0.5 # transparency
)
_ = plt.xlabel("x") # axis labels
_ = plt.ylabel("y")
_ = plt.title("Random dots") # main label
_ = plt.xlim(-5, 5) # axis limits
_ = plt.ylim(-5, 5)
_ = plt.show()
Most of the features demonstrated above are obvious from the code and comments.
However, some explanations are still needed:
• Argument color denotes dot color when specified as color name, like “red” or “black”.
There is also another way to specify colors, c, see below.
• Alpha denotes transparency with alpha=0 being completely transparent (invisible)
and alpha=1 being completely opaque (default).
• All the additional functions return an object that we store into a temporary variable in
order to avoid printing.
• All the additional functions in plt are executed before the actual plot is drawn on
screen. In particular, although we specify the axis limits after plt.scatter, they still
apply to the scatterplot.
Sometimes we want to make the color of the dots depend on another variable. In this
case we can use argument c instead of color:
x = np.random.normal(size=50)
y = np.random.normal(size=50)
z = np.random.choice([1,2,3], size=50)
_ = plt.scatter(x, y,
c=z # color made of variable "z"
)
_ = plt.show()
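Now the dots are of different color, depending on the value of z. Note that values
passed to the c argument are mapped to colors only if they are numbers. Arbitrary
string labels (such as category names) will not work, although a sequence of valid
color names is accepted.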
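If the grouping variable consists of string labels, one possible workaround is to convert the labels to integer codes first. The sketch below uses np.unique with return_inverse=True for this; the labels in group are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(size=6)
y = np.random.normal(size=6)
group = np.array(["a", "b", "a", "c", "b", "a"])   # hypothetical labels

# labels holds the sorted distinct values, codes holds one integer per dot
labels, codes = np.unique(group, return_inverse=True)

_ = plt.scatter(x, y, c=codes)   # numeric codes are mapped to colors
_ = plt.show()
```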
4.1.3 Histograms
Histograms are a quick and easy way to get an overview of 1-D data distributions.
These can be plotted using plt.hist. As hist returns bin data, one may want to
assign the result into a temporary variable to avoid spurious printing in IPython-based
environments (such as notebooks):
x = np.random.normal(size=1000)
_ = plt.hist(x)
_ = plt.show()
Not surprisingly, the histogram of normal random variables looks like, well, a normal
curve.
We may tune the picture somewhat using arguments bins to specify the desired
number of bins, and make bins more distinct by specifying edgecolor:
_ = plt.hist(x, bins=30, edgecolor="w")
_ = plt.show()
Introduction to Tkinter Module in Python
Tkinter is a module in Python's standard library used for creating Graphical
User Interfaces (GUIs) for desktop applications. With the help
of Tkinter, developing desktop applications is not a tough task.
The Tkinter module is a good way to start creating simple projects in Python.
Before starting with Tkinter you should have basic knowledge of Python.
The Tkinter library provides us with a lot of built-in widgets (also called Tk widgets or
Tk interface) that can be used to create different desktop applications.
What is Tkinter?
Tkinter in Python helps in creating GUI applications with minimum hassle. Among
the various GUI frameworks, Tkinter is the only one that is built into Python's
standard library.
Let's try to understand more about the Tkinter module by discussing more about its
origin.
• Tkinter is based upon the Tk toolkit, which was originally designed for the Tool
Command Language (Tcl). Because Tk is very popular, it has been ported to
a variety of other scripting languages, including Perl (Perl/Tk), Ruby
(Ruby/Tk), and Python (Tkinter).
• The wide variety of widgets, portability, and flexibility of Tk makes it the right
tool which can be used to design and implement a wide variety of simple and
complex projects.
• Python with Tkinter provides a faster and more efficient way to build useful
desktop applications that would have taken much time if you had to program
directly in C/C++ with the help of native OS system libraries.
Tkinter comes as part of standard Python installation. So if you have installed the
latest version of Python, then you do not have to do anything else.
If you do not have Python installed on your system - Install Python (the current version is 3.11.4 at the
time of writing this article) first, and then check for Tkinter.
You can determine whether Tkinter is available for your Python interpreter by
attempting to import the Tkinter module.
import tkinter
If Tkinter is available, then there will be no errors, otherwise, you will see errors in the
console.
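The basic steps of creating a simple desktop application using the Tkinter module in
Python are as follows:
1. Import the tkinter module.
2. Create the main application window.
3. Add widgets (labels, buttons, input fields, etc.) to the window.
4. Start the main event loop so that the window responds to user actions.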
So let's write some code and create a basic desktop application using the Tkinter
module in Python.
Hello World tkinter Example
• When you create a desktop application, the first thing that you will have to do is
create a new window for the desktop application.
• The main window object is created by the Tk class in Tkinter.
• Once you have a window, you can add text, input fields, buttons, etc. to it.
import tkinter as tk
win = tk.Tk()
win.title('Hello World!')
win.mainloop()
The window may look different depending on the operating system.
The two main methods that are used while creating desktop applications in Python are:
1. Tk( )
This creates the main window of the application. This is how you can use it, just like in the
Hello World code example:
win = tk.Tk()
2. mainloop( )
This method is used to start the application. The mainloop() function is an infinite
loop that is used to run the application.
It will wait for events to occur and process the events as long as the window is not
closed.
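import tkinter as tk
root = tk.Tk()
root.title("Tkinter World")
label = tk.Label(root, text="Enter some text:") # label text is an assumption
label.pack()
entry = tk.Entry(root)
entry.pack()
button = tk.Button(root, text="Submit") # button text is an assumption; no action attached
button.pack()
root.mainloop()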
Let's create two simple applications using Tkinter: a greet app and a counter app.
1. Greet App
This app will have a text entry where the user can input their name and a button to display a
greeting message.
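A minimal sketch of such a greet app follows. The function names, widget texts and greeting wording are assumptions made for illustration. The greeting logic is kept in a plain function, and tkinter is imported inside main() so that this logic can be used and checked even where no display is available.

```python
def make_greeting(name):
    """Return the greeting text for the given name."""
    name = name.strip()
    return f"Hello, {name}!" if name else "Please enter your name."


def main():
    import tkinter as tk  # imported here so make_greeting works without a display

    root = tk.Tk()
    root.title("Greet App")

    tk.Label(root, text="Enter your name:").pack()
    entry = tk.Entry(root)
    entry.pack()

    result = tk.Label(root, text="")
    result.pack()

    def greet():
        # Read the entry text and show the greeting in the result label
        result.config(text=make_greeting(entry.get()))

    tk.Button(root, text="Greet", command=greet).pack()
    root.mainloop()
```

Call main() to open the window. The button's command argument connects the click to the greet function, which reads the entry with entry.get() and updates the label with result.config.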
2. Counter App
This app will have a label displaying a counter and two buttons to increment and decrement the
counter.
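A minimal sketch of such a counter app follows. The function names and button texts are assumptions made for illustration. As a design choice, the arithmetic lives in a plain function, and tkinter is imported inside main() so that this function can be checked without a display.

```python
def step(value, delta):
    """Return the counter value after adding delta."""
    return value + delta


def main():
    import tkinter as tk  # imported here so step works without a display

    root = tk.Tk()
    root.title("Counter App")

    count = tk.IntVar(value=0)                  # holds the current counter value
    tk.Label(root, textvariable=count).pack()   # label follows the variable

    tk.Button(root, text="+",
              command=lambda: count.set(step(count.get(), 1))).pack()
    tk.Button(root, text="-",
              command=lambda: count.set(step(count.get(), -1))).pack()

    root.mainloop()
```

Call main() to open the window. Because the label is bound to the IntVar through textvariable, it updates automatically whenever count.set is called.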
Explanation of a calculator app structured as a Calculator class:
1. Import tkinter and math modules: Import necessary modules for the GUI and
mathematical functions.
2. Calculator Class: Create a class Calculator to manage the calculator's functionalities.
o Initialization: Initialize the calculator, create input and button frames, and add
buttons.
o create_input_frame: Create the input frame where the expression is displayed.
o create_buttons_frame: Create the frame for the buttons.
o add_buttons: Define the buttons and add them to the button frame.
o create_button: Create individual buttons and place them in the grid.
o on_button_click: Define the behavior when a button is clicked, handling
operations, special functions, and updating the expression.
3. Main Loop: Instantiate the Calculator class and start the Tkinter main loop.