Iloc, Loc, and Ix For Data Selection in Python Pandas - Shane Lynn
Shane Lynn
Data science, Startups, Analytics, and Data visualisation.
Selection Options
There are three main options for selecting and indexing data in Pandas, which can be
confusing. The three selection cases and methods covered in this post are:
1. Selecting data by row numbers (.iloc)
2. Selecting data by label or by a conditional statement (.loc)
3. Selecting in a hybrid approach (.ix) (now deprecated in Pandas 0.20.1)
This blog post, inspired by other tutorials, describes selection activities with these operations. The
tutorial is suited for the general data science situation where, typically, I find myself:
3. I need to quickly and often select relevant rows from the data frame for modelling and
summarising data with Pandas.
Summary of iloc and loc methods discussed in this blog post. iloc and loc are operations for
retrieving data from Pandas dataframes.
DataFrames
For these explorations we’ll need some sample data – I downloaded the uk-500 sample data set
from www.briandunning.com. This data contains artificial names, addresses, companies and
phone numbers for fictitious UK characters. To follow along, you can download the .csv
file here. Load the data as follows (the diagrams here come from a Jupyter notebook in the
Anaconda Python install):
import pandas as pd
import random

# read the data from the downloaded CSV file.
data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')

data.head(5)
The iloc indexer syntax is data.iloc[<row selection>, <column selection>], which is sure to be a
source of confusion for R users. “iloc” in pandas is used to select rows and columns by
number, in the order that they appear in the data frame. You can imagine that each row has a row
number from 0 to one less than the total number of rows (data.shape[0] - 1), and iloc[] allows
selections based on these numbers. The same applies for columns (ranging from 0 to data.shape[1] - 1).
There are two “arguments” to iloc – a row selector, and a column selector. For example:
data.iloc[0] # first row of data frame (Aleshia Tomkiewicz) - Note a Series data type output.
# Columns:
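As a rough illustration of single row and column selections, here is a minimal sketch on a tiny made-up frame. The rows and the reduced set of columns are my own stand-ins, not the uk-500 data.

```python
import pandas as pd

# Hypothetical stand-in for the uk-500 data: three made-up rows.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cara"],
    "last_name": ["Ames", "Bray", "Cole"],
    "city": ["Leeds", "Bath", "York"],
})

# Rows:
first_row = df.iloc[0]      # Series holding the first row
last_row = df.iloc[-1]      # Series holding the last row

# Columns:
first_col = df.iloc[:, 0]   # Series holding the first column
last_col = df.iloc[:, -1]   # Series holding the last column

# A single cell: second row, third column.
cell = df.iloc[1, 2]
```

Negative integers count from the end, just as with Python lists.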
Multiple columns and rows can be selected together using the .iloc indexer.
data.iloc[:, 0:2] # first two columns of data frame with all rows
data.iloc[[0,3,6,24], [0,5,6]] # 1st, 4th, 7th, 25th row + 1st 6th 7th columns.
data.iloc[0:5, 5:8] # first 5 rows and 6th, 7th, 8th columns of data frame (county -> phone1).
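To check what these multi-row and multi-column selections return, a small sketch like the following can help. The 6x4 frame and its column names (c0 to c3) are placeholders I made up for illustration.

```python
import pandas as pd

# Hypothetical 6x4 frame; the column names are placeholders, not the uk-500 columns.
df = pd.DataFrame({
    "c0": [0, 1, 2, 3, 4, 5],
    "c1": [10, 11, 12, 13, 14, 15],
    "c2": [20, 21, 22, 23, 24, 25],
    "c3": [30, 31, 32, 33, 34, 35],
})

first_two_cols = df.iloc[:, 0:2]      # all rows, columns at positions 0 and 1
picked = df.iloc[[0, 3, 5], [0, 2]]   # rows at positions 0, 3, 5 and columns at positions 0, 2
block = df.iloc[0:3, 1:3]             # rows at positions 0-2 and columns at positions 1-2
```

Lists pick out arbitrary positions, while slices pick out contiguous ranges.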
1. Note that .iloc returns a Pandas Series when one row (or one column) is selected with a single
integer, and a Pandas DataFrame when multiple rows and columns are selected, for example with
lists or slices. To counter this, pass a single-valued list if you require DataFrame output.
When using .loc, or .iloc, you can control the output format by passing lists or single values
to the selectors.
2. When selecting multiple columns or multiple rows in this manner, remember that in your
selection, e.g. [1:5], the rows/columns selected will run from the first number to one minus
the second number: [1:5] will go 1, 2, 3, 4, and [x:y] goes from x to y-1.
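Both notes can be checked quickly with a sketch like this one, using a small made-up frame (columns a and b are placeholders):

```python
import pandas as pd

# Hypothetical frame with six rows and a default 0..5 index.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50, 60], "b": list("uvwxyz")})

as_series = df.iloc[0]     # single integer -> Series
as_frame = df.iloc[[0]]    # one-element list -> one-row DataFrame
sliced = df.iloc[1:5]      # positions 1, 2, 3, 4 (position 5 is excluded)
```

Wrapping the selector in a list is the simplest way I know of to keep the DataFrame type when only one row is wanted.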
In practice, I rarely use the iloc indexer, unless I want the first ( .iloc[0] ) or the last ( .iloc[-1] )
row of the data frame.
The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>].
data.set_index("last_name", inplace=True)
data.head()
Now with the index set, we can directly select rows for different “last_name” values using
.loc[<label>] – either singly, or in multiples. For example:
Selecting single or multiple rows using .loc index selections with pandas. Note that the first
example returns a Series, and the second returns a DataFrame. You can achieve a single-row
DataFrame by passing a single-element list to the .loc operation.
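A minimal sketch of these label-based row selections, assuming a tiny made-up frame indexed by last_name (the names are invented, not taken from the uk-500 file):

```python
import pandas as pd

# Hypothetical data; "last_name" becomes the index, as in the post.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cara"],
    "last_name": ["Ames", "Bray", "Cole"],
    "city": ["Leeds", "Bath", "York"],
}).set_index("last_name")

one = df.loc["Bray"]                 # single label -> Series
several = df.loc[["Ames", "Cole"]]   # list of labels -> DataFrame
one_as_frame = df.loc[["Bray"]]      # one-element list -> one-row DataFrame
```

One caveat: if the same label appears more than once in the index, a single-label selection returns a DataFrame containing all matching rows rather than a Series.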
Select columns with .loc using the names of the columns. In most of my data work, typically I have
named columns, and use these named selections.
When using the .loc indexer, columns are referred to by names using lists of strings, or “:” slices.
You can select ranges of index labels – the selection data.loc['Bruch':'Julio'] will
return all rows in the data frame between (and including) the index entries for “Bruch” and “Julio”.
The following examples should now make sense:
# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'

# Select same rows, with just 'first_name', 'address' and 'city' columns

# Change the index to be based on the 'id' column
data.set_index('id', inplace=True)
data.loc[487]
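The row-list and column-range selections described in the comments above might look something like the following sketch. The frame is made up, and the calls are my reconstruction of the idea rather than the original gist.

```python
import pandas as pd

# Hypothetical data indexed by last_name, with a few of the uk-500 column names.
df = pd.DataFrame({
    "last_name": ["Ames", "Bray", "Cole"],
    "first_name": ["Ann", "Bob", "Cara"],
    "address": ["1 High St", "2 Low Rd", "3 Mid Ln"],
    "city": ["Leeds", "Bath", "York"],
    "email": ["ann@example.com", "bob@example.com", "cara@example.com"],
}).set_index("last_name")

# Rows by label list, columns by label slice (both ends included).
col_range = df.loc[["Ames", "Cole"], "city":"email"]

# Same rows, with an explicit list of columns.
col_list = df.loc[["Ames", "Cole"], ["first_name", "address", "city"]]

# Label slice over rows: unlike iloc, the end label is included.
row_range = df.loc["Ames":"Bray"]
```

The inclusive end point is the main difference from integer slicing with .iloc.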
Note that in the last example, data.loc[487] (the row with index value 487) is not equal to
data.iloc[487] (the row at position 487, i.e. the 488th row in the data). The index of the
DataFrame can be out of numeric order, and/or a string or multi-value.
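The difference is easy to see on a small frame whose integer index is deliberately out of order. This is a sketch with made-up values:

```python
import pandas as pd

# Hypothetical frame with an integer index that is not in positional order.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"]}, index=[2, 0, 1])

by_label = df.loc[0, "name"]    # row whose index label is 0
by_position = df.iloc[0, 0]     # row at position 0, whatever its label
```

Here by_label is the second row ("Bob") while by_position is the first row ("Ann").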
In most use cases, you will make selections based on the values of different columns in your data
set.
For example, the statement data['first_name'] == 'Antonio' produces a Pandas Series with a
True/False value for every row in the ‘data’ DataFrame, where there are “True” values for the rows
where the first_name is “Antonio”. This type of boolean array can be passed directly to the .loc
indexer like so:
Using a boolean True/False series to select rows in a pandas data frame – all rows with first name
of “Antonio” are selected.
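In code, the boolean selection might look like this minimal sketch (the three rows are invented for illustration):

```python
import pandas as pd

# Hypothetical data with one "Antonio".
df = pd.DataFrame({
    "first_name": ["Antonio", "Bea", "Carl"],
    "city": ["Leeds", "Bath", "York"],
})

mask = df["first_name"] == "Antonio"   # boolean Series, one value per row
selected = df.loc[mask]                # only the rows where mask is True
```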
As before, a second argument can be passed to .loc to select particular columns out of the data
frame. Again, columns are referred to by name for the loc indexer and can be a single string, a list
of columns, or a slice “:” operation.
Selecting multiple columns with loc can be achieved by passing column names to the second
argument of .loc[]
Note that when selecting columns, if one column only is selected, the .loc operator returns a Series.
For a single column DataFrame, use a one-element list to keep the DataFrame format, for
example:
If selections of a single column are made as a string, a series is returned from .loc. Pass a list to get
a DataFrame back.
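A short sketch of the column argument combined with a boolean row selection, again on made-up data:

```python
import pandas as pd

# Hypothetical data; only the column names mirror the uk-500 file.
df = pd.DataFrame({
    "first_name": ["Antonio", "Bea", "Antonio"],
    "city": ["Leeds", "Bath", "York"],
    "email": ["a1@example.com", "b@example.com", "a2@example.com"],
})

is_antonio = df["first_name"] == "Antonio"

email_series = df.loc[is_antonio, "email"]          # single string -> Series
email_frame = df.loc[is_antonio, ["email"]]         # one-element list -> DataFrame
city_to_email = df.loc[is_antonio, "city":"email"]  # label slice -> DataFrame
```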
Make sure you understand the following additional examples of .loc selections for clarity:
# Select rows with first name Antonio, and all columns between 'city' and 'email'

# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]

# Select rows with last_name equal to some values, all columns

# Select rows with first name Antonio AND hotmail email addresses

# Select rows with id column between 100 and 200, and just return 'postal' and 'web' columns

# A lambda function that yields True/False values can also be used.

# Selections can be achieved outside of the main .loc for clarity:
# Select only the True values in 'idx' and only the 3 columns specified:
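The kinds of selection listed in those comments could be written roughly as below. This is a sketch on a made-up four-row frame, and the exact conditions (the name list, the id bounds, the lambda) are my own assumptions rather than the original gist.

```python
import pandas as pd

# Hypothetical data; values are invented, column names follow the post.
df = pd.DataFrame({
    "id": [50, 150, 250, 120],
    "first_name": ["Antonio", "Bea", "Antonio", "Dev"],
    "last_name": ["Smith", "Jones", "Brown", "Smith"],
    "email": ["a@hotmail.com", "b@gmail.com", "c@gmail.com", "d@hotmail.com"],
    "postal": ["AB1", "CD2", "EF3", "GH4"],
    "web": ["a.example", "b.example", "c.example", "d.example"],
})

# Rows with last_name equal to one of several values.
by_name = df.loc[df["last_name"].isin(["Smith", "Brown"])]

# Two conditions combined with & (each condition in parentheses).
both = df.loc[(df["first_name"] == "Antonio") & (df["email"].str.endswith("hotmail.com"))]

# id in a range, returning just two columns.
in_range = df.loc[(df["id"] > 100) & (df["id"] <= 200), ["postal", "web"]]

# A lambda applied to a column yields the True/False values.
short_names = df.loc[df["first_name"].apply(lambda name: len(name) == 3)]

# Build the boolean Series first, then use it inside .loc.
idx = df["email"].str.endswith("gmail.com")
picked = df.loc[idx, ["id", "first_name", "email"]]
```

Use & and | rather than Python's `and` / `or` when combining boolean Series, and wrap each condition in parentheses.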
Logical selections and boolean Series can also be passed to the generic [] indexer of a pandas
DataFrame and will give the same results: data.loc[data['id'] == 9] returns the same rows as
data[data['id'] == 9].
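One way to confirm that the two forms agree is DataFrame.equals, sketched here on made-up data:

```python
import pandas as pd

# Hypothetical data with two rows where id is 9.
df = pd.DataFrame({"id": [7, 9, 9], "city": ["Leeds", "Bath", "York"]})

via_loc = df.loc[df["id"] == 9]
via_brackets = df[df["id"] == 9]

# equals() gives a single True/False; == would compare cell by cell instead.
same = via_loc.equals(via_brackets)
```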
The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just as the .loc
indexer. However, .ix also supports integer type selections (as in .iloc) when passed an integer.
This only works where the index of the DataFrame is not integer based. ix will accept any of the
inputs of .loc and .iloc.
Because ix is slightly more complex, I prefer to explicitly use .iloc and .loc to avoid unexpected results.
As an example:
# ix indexing works just the same as .loc when passed strings
data.ix[['Andrade']] == data.loc[['Andrade']]
# ix indexing works the same as .iloc when passed integers
data.ix[[33]] == data.iloc[[33]]

# ix only works in both modes when the index of the DataFrame is NOT an integer itself.
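Since .ix is deprecated (and, as far as I know, removed entirely from pandas 1.0 onwards), mixed label and position selections can be expressed with .loc and .iloc instead. The following is a sketch of one possible approach on made-up data, not an official replacement recipe:

```python
import pandas as pd

# Hypothetical data indexed by last_name.
df = pd.DataFrame({
    "last_name": ["Ames", "Bray", "Cole"],
    "first_name": ["Ann", "Bob", "Cara"],
    "city": ["Leeds", "Bath", "York"],
}).set_index("last_name")

by_label = df.loc[["Bray"]]   # what .ix did when passed strings
by_position = df.iloc[[1]]    # what .ix did when passed integers

# Row by label, column by position: translate the position to a column name.
mixed_a = df.loc["Bray", df.columns[1]]

# Row by position, column by label: translate the name to a position.
mixed_b = df.iloc[1, df.columns.get_loc("city")]
```

Both mixed selections land on the same cell, and each line makes it explicit whether a label or a position is being used.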
As an example:
# Change the first name of all rows with an ID greater than 2000 to "John"
data.loc[data['id'] > 2000, "first_name"] = "John"
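Here is the same assignment pattern as a self-contained sketch on a made-up frame, so the effect can be inspected directly:

```python
import pandas as pd

# Hypothetical data; ids chosen so that two rows exceed 2000.
df = pd.DataFrame({
    "id": [1500, 2500, 3000],
    "first_name": ["Ann", "Bob", "Cara"],
    "city": ["Leeds", "Bath", "York"],
})

# Row condition and column name in a single .loc call, then assign.
df.loc[df["id"] > 2000, "first_name"] = "John"

# The same pattern works for any other column.
df.loc[df["id"] <= 2000, "city"] = "Hull"
```

I do the selection and assignment in one .loc call because a chained form such as df[df["id"] > 2000]["first_name"] = "John" may operate on a temporary copy and leave the original frame unchanged.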
That’s the basics of indexing and selecting with Pandas. If you’re looking for more, take a look at
the .iat, and .at operations for some more performance-enhanced value accessors in the Pandas
Documentation and take a look at selecting by callable functions for more iloc and loc fun.
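As a small taste of those, here is a sketch of .at, .iat, and a callable passed to .loc, using a made-up frame:

```python
import pandas as pd

# Hypothetical data with the default 0..2 integer index.
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Leeds", "Bath", "York"]})

label_value = df.at[1, "city"]   # single value by row label and column name
pos_value = df.iat[2, 1]         # single value by row and column position

df.at[0, "city"] = "Hull"        # .at can also set a single value

# A callable receives the DataFrame and returns a valid selector.
big_ids = df.loc[lambda d: d["id"] > 1]
```

.at and .iat only ever access one cell, which is where their speed advantage over .loc and .iloc comes from.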