
Data Engineering Tasks Lesson1

This document outlines Exercise 2 of a data engineering course using ArcGIS Pro, focusing on preparing U.S. presidential election data for predictive analysis. It includes software requirements, an introduction to data engineering concepts, and step-by-step instructions for resolving missing values, exploring ArcGIS Pro projects, and preparing datasets for enrichment. The exercise emphasizes the importance of data preparation in modeling voter turnout and provides resources for downloading necessary data files.

9/26/25, 8:55 AM Exercise 2: Perform data engineering tasks How can I print an exercise to PDF format?

Exercise 2: Perform data engineering tasks


Software requirements
· ArcGIS Online
· ArcGIS Pro 3.5.x (latest release)

Introduction
Data engineering is a fundamental part of every analysis. The term refers to the planning, preparation, and processing of data to make it more useful for
analysis. It can include simple tasks like identifying and correcting imperfections in your data and calculating new fields. It can also include more complex
tasks like reducing the dimensions of a multivariate dataset.

Data engineering also involves the process of geoenriching your data. Geoenrichment can include various tasks:

· Adding a spatial location to your data, referred to as geocoding
· Using other data sources to extract information and add these new values to your dataset (that is, enrich your data)
· Calculating new fields that represent spatial characteristics, like the distance from a particular feature in a landscape

In this exercise, you will use the Data Engineering view in ArcGIS Pro to perform data engineering tasks. These tasks will use the built-in geoprocessing
tools and automation features that are available with ArcGIS Pro. At the end of the exercise, you will have the option to complete a stretch goal that uses
ArcGIS Notebooks to reproduce these tasks in Python. ArcGIS Notebooks allow you to document and share the steps that you take to prepare your data
for analysis, creating a transparent and reproducible workflow.

Scenario
Because voting is voluntary in the United States, the level of voter participation (referred to as "voter turnout") has a significant impact on the election
results and resulting public policy.

Modeling voter turnout, and understanding where low turnout is prevalent, can inform outreach efforts to increase voter participation. With the ultimate
goal of predicting voter turnout, in this exercise, you will focus on performing various data engineering tasks to prepare election result data for predictive
analysis.

The data for this section is obtained from the Harvard Dataverse ([Link]) and the United States Census Bureau
([Link]). The voter turnout dataset from Harvard Dataverse has vote totals from each U.S. county for U.S. presidential elections
from 2000 to 2020.

Note: The exercises in this course include View Result links. Click these links to confirm that your
results match what is expected.

Estimated completion time: Approximately 90 minutes


- Step 1: Download the exercise data files

Note: This exercise requires ArcGIS Pro 3.5 and your MOOC student username and password.
Please follow each step in Exercise 1: Prepare your machine for course exercises before
continuing.

In this step, you will download the exercise data files.

a Open a new web browser tab or window.

b Go to [Link] and download the exercise data ZIP file.

Note: The complete URL to the exercise data file is [Link] id=9487775690064b159099d152ad04eec5.

c Create a folder on your local computer and name it EsriTraining.


Step 1c***: Download the exercise data files.

Throughout this course, you will save all your data to this folder. When you create the folder, do not include any spaces or special characters in the
folder name.

d Extract the exercise data files to the EsriTraining folder on your local computer.

e After you extract the folder, confirm that the data files are stored in the DataEngineering_and_Visualization folder.

Step 1e***: Download the exercise data files.

f Leave the DataEngineering_and_Visualization folder open.

You downloaded and extracted the exercise data files that you will need to complete the first section of the MOOC.


- Step 2: Explore an ArcGIS Pro project

In this step, you will use your course ArcGIS account username and password to sign in to ArcGIS Pro. You will need to use your course ArcGIS
account to license ArcGIS Pro and to access other software applications that are used throughout the MOOC exercises.

a In File Explorer, open the DataEngineering_and_Visualization folder, if necessary.

Step 2a***: Explore an ArcGIS Pro project.

The DataEngineering_and_Visualization folder shows all the data files that you need to complete the exercises in this section. You will open the
ArcGIS Pro project from File Explorer.

b Double-click the DataEngineering_and_Visualization ArcGIS project file.


- Hint

Do not open the files with .gdb or .tbx extensions.

c Sign in to ArcGIS Pro with the provided course ArcGIS account username that ends in _SDS, if necessary.

Note: The course ArcGIS account username and password are listed on the MOOC home page
under Lessons. Steps for accessing this information are available in Exercise 1: Prepare
your machine for course exercises.

After you have signed in, the ArcGIS Pro project opens to show the Data Engineering map. Next, you will explore the
DataEngineering_and_Visualization ArcGIS Pro project that you downloaded.

The top of your project displays the ArcGIS Pro ribbon. ArcGIS Pro uses this horizontal ribbon to display and organize functionality into a series of
tabs.

d On the ribbon, click the View tab, as shown in the following graphic.

e In the Windows group, click Pane Sets and choose Mapping.

Step 2e***: Explore an ArcGIS Pro project.

You have reset your application panes to the Mapping default. Your ArcGIS Pro project is open to a gray reference map, which is called a
basemap. Because you are preparing U.S. election data, the basemap is currently focused on the contiguous United States.

f On the ribbon, click the Map tab.

On the Map tab is the Navigate group, which provides the tools that you need to navigate the map. The default tool is the Explore tool, which
you can use to pan and zoom in and out of maps. To explore different areas of the world on this basemap, pan the map by clicking your mouse
and holding down the button while you move the map. When you pan a map with the mouse, the pointer becomes a hand. Zoom in or out of the
map by using the mouse wheel or by using the Fixed Zoom In button or Fixed Zoom Out button in the Navigate group.

You have reset the panes to show the default mapping panes. To the left side of the map is the Contents pane, which lists the layers that have
been added to the map. To the right side of the map is the Catalog pane, which lists the items that are associated with this ArcGIS Pro project:
Maps, Toolboxes, Notebooks, Databases, Styles, Folders, and Locations.

Note: To learn more about the ArcGIS Pro interface, go to ArcGIS Pro Help: ArcGIS Pro user
interface ([Link]). To learn more about ArcGIS Pro projects,
go to ArcGIS Pro Help: Projects in ArcGIS Pro ([Link]).

You explored an ArcGIS Pro project. Next, you will explore the data that you will prepare for analysis.


- Step 3: Resolve missing values in a table

In this exercise, you will perform data engineering tasks to prepare United States presidential election data for predictive analysis. You have
multiple data sources on election results and voting-age population.

You will begin by preparing the presidential election results table. You must address missing values, reformat data types, and restructure the
format of the data table.

a In the Catalog pane, expand Databases, and then expand the DataEngineering_and_Visualization geodatabase.

Step 3a***: Resolve missing values in a table.

b Right-click CountyPresElect and choose Add To Current Map.

You have added a table to your ArcGIS Pro project. The table contains data by county for several United States presidential elections. First, you
will explore some fields in the elections table, using the Data Engineering view to inspect for inconsistent or missing values.

c In the Contents pane, locate the CountyPresElect table that you just added.

d Right-click CountyPresElect and choose Data Engineering.

The Data Engineering view opens. The Data Engineering view is in a dockable window that can be moved and docked in the same way that you
dock maps, layouts, and attribute tables. In addition to the view, a Data Engineering contextual tab is available in the ribbon. The ribbon tab
provides access to commands that are used for data engineering.

The Data Engineering view contains two panels: a fields panel on the left and a statistics panel on the right. The fields panel allows you to explore
fields, change symbology, and produce charts for fields in the table. The statistics panel allows you to explore the values and distribution of your
data by viewing statistics and data quality metrics. The panel's statistics table is empty by default. You can add fields from the fields panel.

e In the Data Engineering view, in the statistics panel, click Add All Fields And Calculate.

Step 3e***: Resolve missing values in a table.

Summary statistics are generated for each field.

f In the statistics panel, locate the column titled Nulls, then answer the following question.

? How many nulls does each field contain?

- Answer
The FIPS field contains 1 null value, which makes up approximately 0.03 percent of the
records. Additionally, each field that ends in _2000 contains 1 null value.

A record in the elections table is missing information in the FIPS field, which uniquely identifies counties. You will later join this table with voting-
age population data based on the FIPS field, so these values cannot be null. You will identify which row has a null value and establish a strategy
for dealing with that null value.
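
The Nulls column can be pictured as a simple count of missing values per field. The following is a minimal sketch in plain Python, not the ArcGIS Pro implementation; the sample rows and their values are hypothetical, and None stands in for <Null>.

```python
# Hypothetical sample rows standing in for the CountyPresElect table.
# None plays the role of <Null>; all values are made up for illustration.
rows = [
    {"FIPS": 1001, "county_name": "County A", "totalvotes_2000": 10},
    {"FIPS": None, "county_name": "County B", "totalvotes_2000": 20},
    {"FIPS": 1005, "county_name": "County C", "totalvotes_2000": None},
]


def count_nulls(rows):
    """Return {field name: number of None values} across all rows."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            # A bool adds as 0 or 1, so this tallies missing values.
            counts[field] = counts.get(field, 0) + (value is None)
    return counts


null_counts = count_nulls(rows)
# Share of records that are null, as a percentage of all records.
null_share = {field: 100 * n / len(rows) for field, n in null_counts.items()}
print(null_counts)
```

A count like this only says how many values are missing; selecting the affected records, as in the next steps, is still needed to decide how to handle them.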

g Right-click the cell displaying the number of null values for FIPS, as shown in the following graphic.


h From the right-click menu, choose Select Null.

You have applied a selection to any record(s) with a null FIPS value. You can narrow down your view to just this selection to look for patterns.

i Above the statistics panel, click Attribute Table.

The table view opens.

j Below the table, on the bottom left, click the Show Selected Records button.

? What information does the selected record contain?

- Answer
The record represents votes cast in the District of Columbia for multiple election years.

Removing this record would affect hundreds of thousands of votes and make it impossible to include Washington, D.C., in the predictive analysis.
Additional research indicates that the appropriate FIPS code for Washington, D.C., is 11001. You will add the correct FIPS code for the
Washington, D.C., record.

Because there is only one record, you will make this change directly in the table.

k In the FIPS column, double-click <Null> to activate editing for this record.

Step 3k***: Resolve missing values in a table.

l Type 11001, and then press Enter.

The FIPS field now displays 11001 for this row.

You activated an editing session for the selected record and changed a value. You must save your edits to confirm this change.
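
For comparison, the same single-record fix could be scripted. This is a sketch in plain Python under stated assumptions: the rows are hypothetical, and the county_name value "District of Columbia" is assumed for illustration rather than taken from the table. Only the FIPS code 11001 comes from the exercise.

```python
DC_FIPS = 11001  # FIPS code for Washington, D.C., as given in the exercise

# Hypothetical rows; None stands in for <Null>.
rows = [
    {"FIPS": 1001, "county_name": "County A"},
    {"FIPS": None, "county_name": "District of Columbia"},
]


def fill_null_fips(rows, county_name, fips):
    """Set FIPS on rows that match county_name and have no FIPS.

    Returns the number of rows changed so the edit can be verified.
    """
    changed = 0
    for row in rows:
        if row["FIPS"] is None and row["county_name"] == county_name:
            row["FIPS"] = fips
            changed += 1
    return changed


changed = fill_null_fips(rows, "District of Columbia", DC_FIPS)
print(changed)
```

Returning the number of changed rows makes it easy to confirm that exactly one record was edited, which mirrors checking the Nulls column again after saving.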

m On the Edit tab, in the Manage Edits group, click Save.

n Click Yes to save your edits, if prompted.

o Click the Data Engineering view tab, as shown in the following graphic.

p On the ribbon, from the Data Engineering contextual tab, in the Selection group, click Clear Selection.

q In the Data Engineering view, above the statistics panel, click Calculate to regenerate the statistics.

? How many nulls does each field now contain?

- Answer
The FIPS field now contains 0 null values. The fields ending in _2000 each contain 1 null
value.

r In the Nulls column, right-click the totalvotes_2000 cell and choose Select Null.

s Above the statistics panel, click Attribute Table.

t In the table, click the Show Selected Records button, if necessary.

u Explore the results and answer the following question.

? How many fields contain null values for the selected record?

- Answer
Three fields in one record contain null values. All three fields represent data for the year
2000.

All remaining null values in the dataset are attributed to one record. The values appear only in fields representing election year 2000. Later in this
exercise, you will remove data for any election year prior to 2008. Because all other values in this record are valid, there is no need to make
changes to this record.

v Above the table, click the Clear button to clear the selection.

w Close the table view and Data Engineering view for CountyPresElect.

As you work, it is a good idea to periodically save your project. There are several ways that you can save your work in ArcGIS Pro. One way to
quickly save the project is to use the Quick Access Toolbar.

x In the upper-left corner of the ArcGIS Pro app, locate the Quick Access Toolbar and click the Save button, as shown in the following graphic.

You have addressed missing values in the elections data. Next, you will restructure the table to combine it with another source.


- Step 4: Prepare a table for enrichment

When preparing a dataset for analysis, it is important to consider the final format of the data and its fields. Later in this exercise, you will join the
elections data table with county-level citizen voting-age population (CVAP) data. Then, you will join this data again with county geometries.

To match these sources, you will prepare the elections data table as follows:

· Your earliest voting-age population data source covers 2006 to 2010, which will represent voting-age populations for the 2008 election.
Therefore, the predictive analysis must be limited to election years 2008 and later. You can remove elections data for years prior to
2008.

· Counties in all three datasets are uniquely identified by a county FIPS field. In the data source containing county geometries, the FIPS
field data type is text, but in the elections data table, it is numeric. In the elections data table, you will create a new, compatible version of
this field.

You will modify the elections data table to account for these differences. First, you will create a text version of the numeric FIPS field.

a From the Contents pane, right-click CountyPresElect and choose Open.

b From the CountyPresElect table view, click the Calculate Field button.

You will convert your integer FIPS values to text and store the text values in a new field.

c In the Calculate Field dialog box, set or confirm the following parameters:

· Input Table: CountyPresElect
· Field Name (Existing Or New): FIPS_txt
· Field Type: Text

d Under FIPS_txt =, type the following expression: str(!FIPS!).zfill(5)

e Confirm that your tool matches the following graphic, and then click OK.

The FIPS_txt field is added to the table.
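
The expression pads the converted text with leading zeros to a width of five characters. Outside the Calculate Field tool, the same logic can be tried in plain Python; the function name below is hypothetical, and the sketch assumes integer FIPS values as described above.

```python
def fips_to_text(fips):
    """Mirror the expression str(!FIPS!).zfill(5) for an integer FIPS value."""
    return str(fips).zfill(5)


print(fips_to_text(1001))   # 01001
print(fips_to_text(11001))  # 11001
```

A four-digit value such as 1001 gains a leading zero, while a value that already has five digits is unchanged. Without the padding, text keys that carry a leading zero would not match the converted values.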

f In the table, scroll to the right, if necessary, to view the newly added field.

Step 4f***: Prepare a table for enrichment.

Next, you will clean the dataset by removing unnecessary fields. The earliest voting-age population data that is available starts at the 2008
election. You will remove election data prior to 2008.

g Above the table, on the right, click the Options button.

h From the options menu, choose Fields View.

The fields view displays information about each field in the table.

i From the fields view, in the fields table, locate the row for totalvotes_2000.

j To the left of the row, click the empty space, as shown in the following graphic, to select this field.

k On your keyboard, press and hold the Ctrl key, then select the following fields:

· totalvotes_2004
· votes_dem_2000
· votes_dem_2004
· votes_gop_2000
· votes_gop_2004

You have selected six fields.

l From the Fields tab, in the Edits group, click Delete.

Step 4l***: Prepare a table for enrichment.

You have deleted fields for election years 2000 and 2004. You must save your edits to retain these changes.
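
Because the six deleted fields share a naming pattern (they end in _2000 or _2004), the same cleanup can be expressed as a filter on field names. This is a plain Python sketch with a hypothetical sample row, not a call to any ArcGIS tool.

```python
DROP_YEARS = ("2000", "2004")  # election years removed in this step


def drop_year_fields(row, years=DROP_YEARS):
    """Return a copy of row without fields whose names end in _<year>."""
    suffixes = tuple("_" + year for year in years)
    return {name: value for name, value in row.items()
            if not name.endswith(suffixes)}


# Hypothetical row with made-up values.
row = {"FIPS": 1001, "totalvotes_2000": 10, "votes_dem_2004": 4, "votes_gop_2008": 6}
trimmed = drop_year_fields(row)
print(trimmed)
```

Filtering by suffix rather than listing each field keeps the rule in one place if more fields for those years exist.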

m From the Fields tab, in the Manage Edits group, click Save.

n Close the fields view and the table view.

o Save the project.

You have prepared your election data for future join processes.


- Step 5: Use field mapping to merge tables

Next, you will prepare county-level citizen voting-age population (CVAP) data tables to enrich the election dataset. The voting-age data is
separated into four tables, each with estimates of the number of voting-age citizens in each county for a particular time period. You will combine the
four tables into one table.

a From the Catalog pane, expand Databases and DataEngineering_and_Visualization.gdb, if necessary.

b Select all four CountyCVAP tables in the database, as shown in the following graphic.

- Hint

Click the first CountyCVAP table, then hold the Shift key and click the last CountyCVAP table.

c Right-click any of the four selected tables and choose Add To Current Map.

d In the Contents pane, select all four CountyCVAP tables, if necessary.

e Right-click any of the four selected tables and choose Open.

Table views open for each of the four tables.

f Explore the table contents and answer the following question.

? Are there any differences between each CountyCVAP table's fields?

- Answer
All four tables have the same fields. However, the fields in two of the tables are in
lowercase, and the fields in the other two tables are in uppercase.

You will use the Merge geoprocessing tool to combine the four population data tables into one output table. You will use the Field Mapping feature
to ensure that fields are matched appropriately. In particular, you are interested in the field CVAP_EST, which contains the estimated number of
U.S. citizens 18 years of age or older for that county and year.

g From the Analysis tab, in the Geoprocessing group, click Tools to open the Geoprocessing pane.

Within ArcGIS Pro, geoprocessing refers to a suite of tools for performing analysis, data management, and automation. From the Geoprocessing
pane, you can find a tool by keyword using the search tool, or browse for it by toolbox.

h In the Geoprocessing pane, in the search field, type Merge.

i From the search results, click Merge (Data Management Tools).

The Merge tool opens in the Geoprocessing pane.

j In the Geoprocessing pane, for Input Datasets, click the Add Many button.

k Check the box next to each of the four CountyCVAP tables, and then click Add.

The tool will automatically create an output dataset name that reflects the input. You can keep this name or modify it to be more meaningful for
your analysis.

l For Output Dataset, replace the current text with CountyCVAP.

Note: This parameter represents a file path that leads to the ArcGIS Pro project's file geodatabase
(DataEngineering_and_Visualization.gdb). In ArcGIS Pro, the Current Workspace
environment defaults to the project's default geodatabase.

m For Field Matching Mode, choose Use The Field Map To Reconcile Differences.

Step 5m***: Use field mapping to merge tables.

The field map is generated. A field map is a parameter that modifies how fields from input datasets are processed, written, or mapped to an output
dataset. Field mapping is a useful data engineering tool, allowing you to reconcile fields from different sources, adjust data types, add or remove
fields, and perform other field-level edits. You will use the field map's Field Properties dialog box to validate and clean up the data before the
merge.

n Above the field map, click Edit.

Step 5n***: Use field mapping to merge tables.

The Field Properties dialog box opens. On the left, the Fields panel shows all fields that will appear in the output table. On the right, the Properties
panel shows information about the selected field. Below the Properties panel are two more panels: Table and Actions And Source Fields. These
panels provide tools for reconciling schema differences between input datasets.

You will use the Field Properties dialog box to remove unnecessary fields from the merge, verify field mapping, and modify output field names.

o On the left, in the Fields panel, point to LNTITLE, then click the Remove button to remove this field.

p Repeat this process until only the following fields remain:

· GEONAME
· GEOID
· CVAP_EST
· YEAR

q On the left, in the Fields list, select GEONAME, if necessary.

r Under Table, select CountyCVAP_2014_2018.

Step 5r***: Use field mapping to merge tables.

The Actions And Source Fields panel displays the field that is mapped to the output. Even though the 2014-2018 dataset uses all lowercase field
names, ArcGIS Pro accurately mapped these fields to uppercase field names in the other tables.

You would like all voting-age population fields to be lowercase, so you will make that change before the merge.

s Under Properties, for Field Name, delete GEONAME and type geoname.

t For Alias, delete GEONAME and type geoname.

u Repeat this process for the remaining fields in the Fields panel so that all output field names and aliases are written in lowercase.

v Compare your results to the following graphic.

You have narrowed down the fields that will appear in the merged table, modified these fields, and confirmed that all four input tables are mapped
to these fields.
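
The effect of this field map can be sketched in plain Python: match field names without regard to case, keep only the four fields of interest, and write them in lowercase. This is an illustration of the idea, not the Merge tool's implementation; the function name and all sample values (including the geoid strings) are hypothetical.

```python
KEEP = ("geoname", "geoid", "cvap_est", "year")  # output fields, in lowercase


def merge_tables(tables, keep=KEEP):
    """Concatenate tables, matching field names case-insensitively and
    writing only the kept fields under lowercase names."""
    merged = []
    for table in tables:
        for row in table:
            lowered = {name.lower(): value for name, value in row.items()}
            # Fields not listed in keep (for example, LNTITLE) are dropped.
            merged.append({name: lowered.get(name) for name in keep})
    return merged


# Hypothetical input tables: one with uppercase names, one with lowercase.
upper = [{"GEONAME": "A County, X", "GEOID": "05000US01001",
          "LNTITLE": "Total", "CVAP_EST": 100, "YEAR": 2008}]
lower = [{"geoname": "A County, X", "geoid": "05000US01001",
          "lntitle": "Total", "cvap_est": 120, "year": 2016}]

merged = merge_tables([upper, lower])
print(merged)
```

Normalizing names before matching is what lets the uppercase and lowercase tables land in the same output fields.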

w Click OK.

x In the Geoprocessing pane, click Run.

y Open the new CountyCVAP table to see the results.


- Hint

In the Contents pane, right-click CountyCVAP and choose Open.

You have combined tables and created the necessary fields for a table join.

z In the Contents pane, remove the four CountyCVAP source tables.

Note: Do not remove CountyPresElect or CountyCVAP.

- Hint

From the Contents pane, select the four tables, then right-click the selection and choose Remove.


- Step 6: Create a pivot table

In the previous step, you concatenated several voting-age population tables together to form one merged table. Before joining the election data
and voting-age population data into one table, you must first ensure that there is a matching field to use for the join.

a Open the table views for CountyCVAP and CountyPresElect.

b From the table view, right-click the tab for CountyPresElect and choose New Vertical Tab Group, as shown in the following graphic.

c Explore the contents of the two tables, then answer the following questions.

? Which field(s) in the CountyCVAP table also appear in the CountyPresElect table?

- Answer
None of the fields in the CountyCVAP table appear in the CountyPresElect table.

? Are there any fields in the CountyCVAP table that contain similar information as
CountyPresElect?

- Answer
Values in the geoname field in CountyCVAP are similar to county_name in the
CountyPresElect table. The geoname field contains each county's name and state, but
the county_name field only contains the county name. The geoid field in CountyCVAP
also contains similar information as the FIPS field in CountyPresElect. For each geoid
string, the last five digits correspond with the same county's FIPS code.

You will use the geoid field in the CountyCVAP table to extract FIPS values into a new field.

d Activate the CountyCVAP table view, if necessary, then click the Calculate Field button.

e In the Calculate Field dialog box, set or confirm the following parameters:

· Input Table: CountyCVAP
· Field Name (Existing Or New): FIPS
· Field Type: Text
· Expression Type: Python

f Under FIPS =, type the following expression: !geoid![-5:]

g Confirm that your parameters match the following graphic, and then click OK.

Note: The provided expression uses slice notation to extract characters in a string based on their
position. The syntax for this notation is [start:stop]. This expression starts with the item at
index -5 (five characters away from the end of the string, when moving left). Because there is
no value after the colon, the extraction stops at the end of the string.
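
The slice can be tried on an ordinary Python string. The geoid value below is hypothetical and chosen only to show the behavior.

```python
geoid = "05000US11001"  # hypothetical geoid value for illustration

# Start five characters from the end and read through the end of the string.
fips = geoid[-5:]
print(fips)  # 11001

# The negative index is shorthand for counting from the string's length.
same = geoid[len(geoid) - 5:]
print(same == fips)  # True
```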

h In the CountyCVAP table, scroll to the right to see the new field, if necessary.

You have combined tables and created the necessary fields for a table join.

i Close any open table views.

Next, you will restructure the voting-age population data table to match the election data table.

j Open the Data Engineering view for CountyCVAP.


- Hint

Contents pane > right-click CountyCVAP > Data Engineering

k Click Add All Fields And Calculate.

l In the statistics panel, click anywhere in the FIPS row to highlight it.

m Scroll to the right until the Count and Unique statistics columns are visible, then answer the following questions.

? How many records does the FIPS field contain?

- Answer
12,882

? How many unique values does the FIPS field contain?

- Answer
3,224

The format of the voting-age population data table will prevent a proper join onto the elections data table. Currently, each record in the table
corresponds to the voting-age population in each county for each election year. You need to reformat the table so that each county has one unique
record.

You will use the Pivot Table geoprocessing tool to perform this unstacking operation.

The Data Engineering view provides quick access to commonly used data engineering geoprocessing tools, including the Pivot Table tool.

n From the Data Engineering view, in the fields panel, click FIPS to select this field.

o Right-click FIPS, point to Format, and choose Pivot Table.

The Pivot Table tool dialog box opens. Some parameters have been pre-populated, including the Input Table and one of the Input Fields.

p In the Pivot Table dialog box, for Input Fields, in the blank field, choose geoname.

The Input Fields are the fields that will remain the same for each county in the pivot table.

q For Pivot Field, choose year.

The pivot field determines which values will become new fields. Each year in the voting-age population data will be used to uniquely distinguish
the new fields.

r For Value Field, choose cvap_est.

The value field determines which values will be reported for each of the new fields.

s For Output Table, type CountyCVAP_pivot.

t Confirm that your settings match the following graphic, and then click OK.

Note: Any fields not provided in the Input, Pivot, and Value field parameters will be dropped from
the output pivot table.

You have created a pivoted version of the CountyCVAP table.
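
The reshaping that a pivot performs can be illustrated in plain Python. This is a simplified sketch, not the Pivot Table tool itself: the function and sample rows are hypothetical, and the output field names are formed here by appending the pivot value to the pivot field name (for example, year2008).

```python
def pivot(rows, input_fields, pivot_field, value_field):
    """Return one row per distinct combination of input_fields.

    Each distinct value of pivot_field becomes a new field that holds
    the corresponding value_field value.
    """
    out = {}
    for row in rows:
        key = tuple(row[name] for name in input_fields)
        record = out.setdefault(key, dict(zip(input_fields, key)))
        record[f"{pivot_field}{row[pivot_field]}"] = row[value_field]
    return list(out.values())


# Hypothetical long-format rows: one row per county per year.
rows = [
    {"FIPS": "01001", "geoname": "A County, X", "year": 2008, "cvap_est": 100},
    {"FIPS": "01001", "geoname": "A County, X", "year": 2012, "cvap_est": 110},
    {"FIPS": "01003", "geoname": "B County, X", "year": 2008, "cvap_est": 200},
]

pivoted = pivot(rows, ["FIPS", "geoname"], "year", "cvap_est")
print(pivoted)
```

In this sketch, a county with no row for a given year simply lacks that field. Note also that rows are grouped by the full combination of input fields, so two different geoname values for one FIPS code produce two output rows.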

You can use the Data Engineering statistics panel to check your work.

u Open the Data Engineering view for CountyCVAP_pivot.

v Click Add All Fields And Calculate.

w In the statistics panel, click anywhere in the FIPS row to highlight it.

x Scroll to the right until the Count and Unique statistics columns are visible, then answer the following questions.

? How many records does the FIPS field contain?

- Answer
3,225

? How many unique values does the FIPS field contain?

- Answer
3,224

When you ran the Pivot Table tool, you set the Input Fields to both FIPS and geoname. In the pivot result, there are more records in the table
than there are unique FIPS values. These results suggest that multiple geoname values exist for a FIPS code in this dataset. Later in the exercise,
you will locate these duplicate records and resolve them.
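
One way to locate such duplicates is to group the name values by FIPS code and keep only the codes that have more than one name. The following plain Python sketch assumes the second input field is geoname, as chosen in the Pivot Table step; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def fips_with_multiple_names(rows):
    """Return {FIPS: sorted list of names} for codes with more than one name."""
    names = defaultdict(set)
    for row in rows:
        names[row["FIPS"]].add(row["geoname"])
    return {fips: sorted(found) for fips, found in names.items() if len(found) > 1}


# Hypothetical pivoted rows: the second and third share a FIPS code.
rows = [
    {"FIPS": "01001", "geoname": "A County, X"},
    {"FIPS": "01003", "geoname": "B County, X"},
    {"FIPS": "01003", "geoname": "B Parish, X"},
]

dupes = fips_with_multiple_names(rows)
print(dupes)
```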

y In the Contents pane, right-click CountyCVAP and choose Remove.

z Close any open views and save the project.


- Step 7: Enrich data by joining tables

In the previous step, you pivoted values in the cvap_est field by their election year. The resulting field names have a specific syntax: year<4-digit
year> (for example, year2008). In this step, you will rename these fields to make them more informative. Then, you will create an enriched voting dataset by joining the
elections table with the voting-age population table.

a In the Geoprocessing pane, click the Back button, if necessary.

b Search for and open Alter Fields (Multiple) (Data Management Tools).

The Alter Fields (Multiple) geoprocessing tool provides options for modifying field properties of multiple fields in a feature class or table.

c For Input Table, choose CountyCVAP_pivot.

d Under Field Properties, perform the following actions:
1. For Field Name, choose year2008.
2. For New Field Name, type cvap_est_2008.
3. Check the Clear Alias box.
4. Leave the remaining defaults, then click Add Another.

Note: The Clear Alias option will remove the field alias. During the join process, a new field alias
will be generated from the new field name.

e Repeat this process to rename the three remaining year fields as follows:

Field name    New field name    Clear alias
year2012      cvap_est_2012     Checked
year2016      cvap_est_2016     Checked
year2020      cvap_est_2020     Checked

Confirm that your tool matches the following graphic, and then click Run.

You updated the field names for CountyCVAP_pivot. Next, you will join this table to the election data table.
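
The four renames follow one rule: replace the year prefix with cvap_est_ and keep the four-digit year. A plain Python sketch of that rule follows; the function name and sample row are hypothetical, and this is not a call to the Alter Fields (Multiple) tool.

```python
def rename_year_fields(row, old_prefix="year", new_prefix="cvap_est_"):
    """Rename fields such as year2008 to cvap_est_2008; leave others as-is."""
    renamed = {}
    for name, value in row.items():
        suffix = name[len(old_prefix):]
        # Only rename when the prefix is followed by exactly four digits.
        if name.startswith(old_prefix) and len(suffix) == 4 and suffix.isdigit():
            renamed[new_prefix + suffix] = value
        else:
            renamed[name] = value
    return renamed


# Hypothetical pivoted row with made-up values.
row = {"FIPS": "01001", "year2008": 100, "year2012": 110}
renamed = rename_year_fields(row)
print(renamed)
```

Checking for exactly four digits after the prefix avoids renaming an unrelated field that happens to start with the same letters.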

f From the Geoprocessing pane, click the Back button, then search for and open Join Field (Data Management Tools).

The Join Field geoprocessing tool joins data onto the input table based on common values in a field. You will join the data from CountyCVAP_pivot
onto CountyPresElect using their common FIPS code.

g In the Geoprocessing pane, fill out the parameters as follows, leaving any remaining defaults:

· Input Table: CountyPresElect
· Input Field: FIPS_txt
· Join Table: CountyCVAP_pivot
· Join Field: FIPS
· For Transfer Fields, click the Add Many button, then check the boxes next to the following fields:
    · geoname
    · cvap_est_2008
    · cvap_est_2012
    · cvap_est_2016
    · cvap_est_2020

h Click Validate Join, then use the results to answer the following questions.

? How many matches does this join find?

- Answer
3,115

? Will all records in the input table and join table be successfully matched?

- Answer
No. The input table has 3,153 records, which is more than the match number.

Later in the exercise, you will investigate any missing values between the two sources. For now, you will proceed with the join.
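
Conceptually, validating a join means counting how many input keys have a counterpart in the join table. The following is a minimal sketch of that idea in plain Python; the function name validate_join and the sample FIPS codes are hypothetical, and the sketch does not reproduce the Validate Join report itself.

```python
def validate_join(input_keys, join_keys):
    """Count input records that find a match and list the ones that do not."""
    join_key_set = set(join_keys)
    matched = [key for key in input_keys if key in join_key_set]
    unmatched = [key for key in input_keys if key not in join_key_set]
    return len(matched), unmatched


# Hypothetical sample keys, not the exercise data.
input_fips = ["01001", "01003", "02001", "22087"]
join_fips = ["01001", "01003", "22087", "56045"]

match_count, unmatched_keys = validate_join(input_fips, join_fips)
print(match_count)     # number of input records with a match
print(unmatched_keys)  # input records that will receive null values
```

Listing the unmatched keys, rather than only counting matches, points directly at the records that will need investigation after the join.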

i Close the Validate Join message window.

j In the Geoprocessing pane, click Run.

k In the Contents pane, remove CountyCVAP_pivot, then save the project.

You have enriched your elections data with voting-age population data.


- Step 8: Resolve missing data

Previously, you saw that not all records were matched between the elections data and the voting-age population data. This suggests that further
data validation may be needed. You will inspect these records using the Data Engineering statistics panel.

a Open the Data Engineering view for CountyPresElect.

b In the Data Engineering view, next to Search Fields, click the Add Fields And Calculate Statistics button.

You have added the joined fields to the statistics panel and regenerated statistics.

c In the statistics panel, scroll down if necessary to view the statistics for the cvap_est fields, then answer the following question.

? How many nulls are in the cvap_est fields?

- Answer
The field cvap_est_2008 contains 39 null values, and the other three cvap_est fields
contain 38 null values.

d In the Nulls column, right-click the cell for cvap_est_2008, then choose Select Null.

e Click Attribute Table to open the CountyPresElect table view.

f From the table view, click Show Selected Records.

g Explore the selected records, then answer the following questions.

? Which state(s) have null values for cvap_est fields?

- Answer
Several records for Alaska have null values in every cvap_est field. One record for
Louisiana has a null value for cvap_est_2008.

? For the selected records for Alaska, what do the values in the county_name field have in
common?

- Answer
All county_name values consist of the word "DISTRICT" followed by a number.

The voting records for Alaska appear unusual because they are referencing a district rather than a county. Additional research shows that Alaska
has a unique government model that does not use counties. The election data is reported for regions that do not correspond to the counties in the
voting-age population sources.

Because of this data incompatibility, you will remove records from the state of Alaska.

h Click the Select By Attributes button.

i In the Select By Attributes dialog box, confirm that Input Rows is set to CountyPresElect and Selection Type is set to New Selection.

j Build an expression using the following steps:


1. Next to Where, choose State.
2. For the second field, choose Is Equal To, if necessary.
3. For the third field, choose ALASKA.

k Click OK to apply this selection.

You have selected all records for the state of Alaska.

l Above the table, click the Delete Selection button.

m Click Yes to confirm that you want to delete the data, if prompted.

You have removed records for Alaska. You must save to make your changes permanent.

n From the Edit tab, in the Manage Edits group, click Save.

o Click Yes to confirm, if necessary.

You will check your work by regenerating the Data Engineering statistics panel.

p Close the table view and return to the Data Engineering view for CountyPresElect.

q Click Calculate.

? How many nulls are in the cvap_est fields?

- Answer
There is 1 null value for cvap_est_2008. All other cvap_est fields have 0 null values.

The single record with a null value for 2008 represents a small percentage of the total and is unlikely to change the overall analysis. You will continue to
the next data engineering step.

r Close the Data Engineering view and save the project.

You removed incompatible records from your data, and you validated your work.
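
The count, remove, and recount cycle in this step can be sketched in plain Python. This is an illustration under stated assumptions, not what the Data Engineering view runs internally: the records, county names, and values below are hypothetical, and count_nulls is a helper name chosen for this example.

```python
def count_nulls(records, field):
    """Count the records whose value for the given field is missing (None)."""
    return sum(1 for record in records if record.get(field) is None)


# Hypothetical sample records that mimic the pattern seen in the exercise.
records = [
    {"state": "ALABAMA", "county_name": "AUTAUGA", "cvap_est_2008": 37000, "cvap_est_2012": 39000},
    {"state": "ALASKA", "county_name": "DISTRICT 1", "cvap_est_2008": None, "cvap_est_2012": None},
    {"state": "ALASKA", "county_name": "DISTRICT 2", "cvap_est_2008": None, "cvap_est_2012": None},
    {"state": "LOUISIANA", "county_name": "EXAMPLE PARISH", "cvap_est_2008": None, "cvap_est_2012": 12000},
]
fields = ["cvap_est_2008", "cvap_est_2012"]

nulls_before = {field: count_nulls(records, field) for field in fields}

# Remove the incompatible records, then regenerate the statistics.
kept_records = [record for record in records if record["state"] != "ALASKA"]
nulls_after = {field: count_nulls(kept_records, field) for field in fields}

print(nulls_before)
print(nulls_after)
```

Recounting after the deletion is the validation step: it confirms that the remaining nulls are the ones you decided to accept.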


- Step 9: Calculate new fields using a custom script tool

When working repeatedly with similar data, it can be useful to have tools customized to automate your process. In ArcGIS Pro, you can use
Python to create custom Python script tools. These tools execute a script or an executable file within ArcGIS Pro's Geoprocessing pane. Python
script tools are useful for automation, specific error messaging, custom parameter behavior, and more.

Earlier, you used the Calculate Field geoprocessing tool, which is a commonly used data management tool in ArcGIS. You will now use a script
tool to calculate additional statistics for the voting data.

a From the Catalog pane, expand Toolboxes, then expand Multi-Year Calculate [Link].

b Right-click the Multi-Year Calculate Field script and choose Open.

The script tool opens in the Geoprocessing pane. This custom script tool is built around the Calculate Field tool. The tool is designed to complete
simple algebraic expressions across multiple fields that share a common field name suffix. To use this tool, input variable field names must follow
this pattern: <variable>_<year>.

For the first statistic, you will count all major party voters. You will name these fields with the variable name votes_major, for example:
votes_major_2008.
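
To make the <variable>_<year> convention concrete, the following sketch shows one way that year suffixes shared by two variable prefixes could be detected and then used to add the fields. The source of the course's script tool is not shown in this exercise, so this is an assumption about the general approach; the function names and sample values are hypothetical.

```python
import re


def detect_year_suffixes(field_names, prefix_a, prefix_b):
    """Return the sorted year suffixes that exist for both variable prefixes."""
    def years_for(prefix):
        pattern = re.compile(rf"^{re.escape(prefix)}_(\d{{4}})$")
        matches = (pattern.match(name) for name in field_names)
        return {match.group(1) for match in matches if match}

    return sorted(years_for(prefix_a) & years_for(prefix_b))


def add_fields(record, new_prefix, prefix_a, prefix_b, years):
    """Store A + B in a new <new_prefix>_<year> field for every detected year."""
    for year in years:
        record[f"{new_prefix}_{year}"] = record[f"{prefix_a}_{year}"] + record[f"{prefix_b}_{year}"]
    return record


# Hypothetical record with two election years.
record = {
    "votes_dem_2008": 120, "votes_gop_2008": 100,
    "votes_dem_2012": 130, "votes_gop_2012": 90,
    "totalvotes_2008": 230,
}

years = detect_year_suffixes(list(record), "votes_dem", "votes_gop")
add_fields(record, "votes_major", "votes_dem", "votes_gop", years)
print(years)
print(record["votes_major_2008"], record["votes_major_2012"])
```

Only years present for both prefixes are processed, which is why consistent field naming matters before running a tool of this kind.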

c In the Geoprocessing pane, set the following parameters:

· Input Table: CountyPresElect
· New Field Prefix: votes_major
· New Field Type: LONG
· Variable A Prefix: votes_dem
· Variable B Prefix: votes_gop
· Expression Type: Add (A+B)

d Confirm that your tool matches the following graphic, and then click Run.

e At the bottom of the Geoprocessing pane, click View Details.

A details window opens for the custom tool that you just ran.

f In the details window, click Messages, then answer the following question.

? Which field name suffixes did the custom script tool detect?

- Answer
The tool detected suffixes for years 2008, 2012, 2016, and 2020.

g Close the details window.

h From the Contents pane, open the CountyPresElect table.

i Scroll to the right, if necessary, and explore the results.

You created four new fields in the table.

j In the Geoprocessing pane, repeat this process for the following statistics, paying careful attention to the New Field Type parameter.

New field prefix      New field type   Variable A prefix     Variable B prefix   Expression type

rawdiff_dem_vs_gop    LONG             votes_dem             votes_gop           Subtract (A-B)

votes_other           LONG             totalvotes            votes_major         Subtract (A-B)

pctdiff_dem_vs_gop    FLOAT            rawdiff_dem_vs_gop    totalvotes          Divide (A/B)

voter_turnout         FLOAT            totalvotes            cvap_est            Divide (A/B)

voter_turnout_dem     FLOAT            votes_dem             cvap_est            Divide (A/B)

voter_turnout_gop     FLOAT            votes_gop             cvap_est            Divide (A/B)

You used a custom script tool to batch calculate fields in a table. To learn more about script tools, go to ArcGIS Pro Help: Create A Script Tool
([Link]
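
The derived statistics in the preceding table reduce to simple arithmetic for a single county and year. The following sketch computes them in plain Python with hypothetical input values; the safe_divide helper, which returns None for a missing or zero denominator, is an assumption of this example and not a documented behavior of the script tool.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the result is undefined."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def derive_statistics(votes_dem, votes_gop, totalvotes, cvap_est):
    """Compute the exercise's derived fields for one county and one year."""
    votes_major = votes_dem + votes_gop
    rawdiff = votes_dem - votes_gop
    return {
        "votes_major": votes_major,
        "rawdiff_dem_vs_gop": rawdiff,
        "votes_other": totalvotes - votes_major,
        "pctdiff_dem_vs_gop": safe_divide(rawdiff, totalvotes),
        "voter_turnout": safe_divide(totalvotes, cvap_est),
        "voter_turnout_dem": safe_divide(votes_dem, cvap_est),
        "voter_turnout_gop": safe_divide(votes_gop, cvap_est),
    }


# Hypothetical values for one county.
stats = derive_statistics(votes_dem=450, votes_gop=500, totalvotes=1000, cvap_est=2000)
for name, value in stats.items():
    print(name, value)
```

The counts stay whole numbers while the divided values are fractions, which is why the table assigns LONG to the first two fields and FLOAT to the rest. Note also the dependency order: votes_other needs votes_major, and pctdiff_dem_vs_gop needs rawdiff_dem_vs_gop, so those fields must be calculated first.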

Lastly, you will use the Calculate Field tool's Code Block window to assign the winning party for each county.

k In the Geoprocessing pane, click the Back button.

l Search for and open Calculate Field (Data Management Tools).

m Set the following parameters:

· Input Table: CountyPresElect
· Field Name: winning_party_2008
· Field Type: Text
· Expression: return_winning_party(!votes_dem_2008!, !votes_gop_2008!, !votes_other_2008!)

n Add the following code block:

def return_winning_party(total_votes_dem, total_votes_gop, total_votes_other):
    # Return the party that received strictly more votes than each of the other two.
    if total_votes_dem > total_votes_gop and total_votes_dem > total_votes_other:
        return "Democratic Party"
    elif total_votes_gop > total_votes_dem and total_votes_gop > total_votes_other:
        return "Republican Party"
    elif total_votes_other > total_votes_dem and total_votes_other > total_votes_gop:
        return "Other Party"
    # No branch matches when the highest vote counts are tied, so the function returns None.

o Click Run.

p Use the following table to repeat this process for the years 2012, 2016, and 2020. Leave all remaining parameters in the Calculate Field tool the
same.

Field Name (Existing Or New)    Expression

winning_party_2012              return_winning_party(!votes_dem_2012!, !votes_gop_2012!, !votes_other_2012!)

winning_party_2016              return_winning_party(!votes_dem_2016!, !votes_gop_2016!, !votes_other_2016!)

winning_party_2020              return_winning_party(!votes_dem_2020!, !votes_gop_2020!, !votes_other_2020!)

You have calculated new attributes for raw difference, percent difference, and voter turnout. Your table is now ready for final validation.
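
One way to check the winning-party code block outside ArcGIS Pro is to call the function directly with a few hand-made vote counts. The sketch below repeats the function from the exercise and runs it on hypothetical values, including a tie, which matches none of the branches.

```python
def return_winning_party(total_votes_dem, total_votes_gop, total_votes_other):
    """Return the party with strictly more votes than each of the other two."""
    if total_votes_dem > total_votes_gop and total_votes_dem > total_votes_other:
        return "Democratic Party"
    elif total_votes_gop > total_votes_dem and total_votes_gop > total_votes_other:
        return "Republican Party"
    elif total_votes_other > total_votes_dem and total_votes_other > total_votes_gop:
        return "Other Party"
    # Falls through (returns None) when the highest counts are tied.


# Hypothetical vote counts, not values from the exercise table.
clear_winner = return_winning_party(500, 450, 50)
other_winner = return_winning_party(10, 20, 30)
tie_result = return_winning_party(300, 300, 10)

print(clear_winner)
print(other_winner)
print(tie_result)
```

If ties matter for your analysis, you could add a final return statement with an explicit label such as "Tie"; that label is a suggestion of this sketch and is not part of the exercise.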

q Save the project.


- Step 10: Validate data

You have enriched an election results table with citizen voting-age population (CVAP) data. Next, you must perform a final validation on the newly
calculated values and prepare the dataset for mapping.

First, you will inspect the values for voter turnout. Because these values represent a fraction (total votes divided by voting age population), you will
confirm that the values range between 0 and 1.

a In the table, right-click the voter_turnout_2008 heading and choose Explore Statistics.

The Data Engineering dialog box opens.

b In the Data Engineering statistics panel, for voter_turnout_2008, locate the value for Max.

The maximum value in the column is above 1, indicating a voter turnout above 100%. Further investigation shows two reasons for this
discrepancy:
1. All counties with voter turnout above 1 have small populations, which makes it harder to estimate their population of citizens of voting
age with a high degree of accuracy. The small sizes of these counties suggest that the most likely source of the issue is an
underestimate of the population of citizens of voting age.
2. There is also a temporal mismatch between the two datasets used to calculate voter turnout. For the election results table, the number
of votes was calculated at a specific point in time. However, for the CVAP sources, the estimates for voting age were compiled from
American Community Survey results averaged over a five-year period.

To address these impossibly high voter turnout values, you will cap the voter turnout at 1 (100%).

c From the Geoprocessing pane, return to the Calculate Field geoprocessing tool.

d In the Calculate Field dialog box, set or confirm the following parameters:

· Input Table: CountyPresElect
· Field Name (Existing Or New): voter_turnout_2008

e Under the Code Block window, click the Clear button, if necessary, to remove any previous code.

f Under voter_turnout_2008 =, type the following expression: 1 if !voter_turnout_2008! > 1 else !voter_turnout_2008!

g After you are finished, click Run.

h Use the following table to modify the tool parameters and re-run this calculation for election years 2012, 2016, and 2020.

Field Name (Existing Or New)    Expression

voter_turnout_2012              1 if !voter_turnout_2012! > 1 else !voter_turnout_2012!

voter_turnout_2016              1 if !voter_turnout_2016! > 1 else !voter_turnout_2016!

voter_turnout_2020              1 if !voter_turnout_2020! > 1 else !voter_turnout_2020!

You will use the statistics panel to validate the calculation.

i Return to the CountyPresElect Data Engineering view.

j Above the fields panel, click the Add All Fields And Calculate Statistics button to regenerate statistics.

? What is the maximum value in each voter_turnout field?

- Answer
1

You have addressed inconsistent values in your table.
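
The capping logic and its validation can be sketched in plain Python. The expression in the exercise compares each value with 1; the version below uses min() for the same effect and, as an assumption of this sketch, passes missing values (None) through unchanged. The sample values are hypothetical.

```python
def cap_turnout(value, ceiling=1.0):
    """Cap a turnout fraction at the ceiling, leaving missing values untouched."""
    if value is None:
        return None
    return min(value, ceiling)


# Hypothetical turnout fractions, including one above 100% and one missing value.
turnouts = [0.62, 1.08, None, 1.0]
capped = [cap_turnout(value) for value in turnouts]

# Validate the result the same way the statistics panel does: check the maximum.
max_turnout = max(value for value in capped if value is not None)
print(capped)
print(max_turnout)
```

Capping keeps every county in the dataset while removing impossible values, at the cost of hiding how far above 1 the original estimates were.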

k Close any open views.


- Step 11: Geoenable the data

For spatial analysis, your data must include location information to determine where each county is located on a map. You will geoenable the
voting data by joining it with existing county geometries.

Several resources are available for finding geoenabled data. ArcGIS Living Atlas of the World is an authoritative source provided by Esri. You will
use a locally stored copy of an ArcGIS Living Atlas dataset that represents county geometries. You will join this feature class with the
CountyPresElect data table.

a In the Catalog pane, expand Databases and DataEngineering_and_Visualization.gdb, if necessary.

b Right-click US_Counties and choose Add To Current Map.

[Graphic: Step 11b, Geoenable the data]

Your features may appear in a different color than shown here.

You have added a feature class to your project. You can now prepare the join between the county geometries and the voting data that you
prepared.

c Open the table view for US_Counties, then answer the following question.
- Hint

In the Contents pane, right-click US_Counties and choose Attribute Table.

? Which field should be used to join the voting table onto these features?

- Answer
The fips field should be used. This field follows the same 5-character syntax as the
FIPS_txt field in the CountyPresElect table.

d In the Geoprocessing pane, click the Back button, if necessary.

e Search for and open Join Field (Data Management Tools).

f Set the following parameters, leaving any remaining defaults:

· Input Table: US_Counties
· Input Field: fips
· Join Table: CountyPresElect
· Join Field: FIPS_txt

g Click Validate Join, then answer the following question.

? How many records will be matched in this join?

- Answer
3,112

h Close the Validate Join window.

i In the Geoprocessing pane, click Run.

You have geoenabled your voting data. The Join Field tool kept all county geometries and assigned voting data to each matching county in the feature class.
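
The behavior described here, where every input feature is kept and unmatched features receive empty values, is that of a left join. The following plain-Python sketch illustrates the idea; it is not the Join Field implementation, the function name is hypothetical, the turnout value is invented, and keeping only the first matching join record is an assumption of this sketch.

```python
def left_join(input_rows, input_key, join_rows, join_key, transfer_fields):
    """Keep every input row and copy the transfer fields from the first matching join row."""
    lookup = {}
    for row in join_rows:
        lookup.setdefault(row[join_key], row)  # keep only the first match per key

    joined = []
    for row in input_rows:
        match = lookup.get(row[input_key])
        new_row = dict(row)
        for field in transfer_fields:
            new_row[field] = match[field] if match else None
        joined.append(new_row)
    return joined


# Two county features; only the first has a voting record (turnout value is hypothetical).
counties = [
    {"fips": "01001", "name": "Autauga"},
    {"fips": "02013", "name": "Aleutians East"},
]
voting = [{"FIPS_txt": "01001", "voter_turnout_2020": 0.61}]

result = left_join(counties, "fips", voting, "FIPS_txt", ["voter_turnout_2020"])
for row in result:
    print(row)
```

In this exercise, the Alaska geometries are the features that stay in the output without voting data, because their election records were removed earlier.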


- Step 12: Modify environment settings

Before continuing with your data engineering task, you will set the geoprocessing environments. Environments are additional settings that affect
geoprocessing tools and provide a powerful way to ensure that geoprocessing is performed in a controlled environment.

a On the ribbon, click the Analysis tab.

b In the Geoprocessing group, click Environments.


The Environments dialog box opens. Here, you can set parameters that apply to geoprocessing tools, such as the processing extent that limits
processing to a specific geographic area, a coordinate system for all output geodatasets, or the cell size of output raster datasets.

c Under Processing Extent, click the Extent Of A Layer button and choose US_Counties, as shown in the following graphic.

The coordinates update to the US_Counties layer coordinates.
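
A processing extent is a bounding rectangle. As a rough illustration of what "extent of a layer" means, the sketch below derives the rectangle from feature vertices and tests whether a location falls inside it. The coordinates are hypothetical, and the functions are not part of ArcGIS Pro.

```python
def layer_extent(features):
    """Return (xmin, ymin, xmax, ymax) covering every vertex of every feature."""
    xs = [x for feature in features for x, _ in feature]
    ys = [y for feature in features for _, y in feature]
    return min(xs), min(ys), max(xs), max(ys)


def within_extent(point, extent):
    """Report whether a point lies inside the rectangular extent."""
    x, y = point
    xmin, ymin, xmax, ymax = extent
    return xmin <= x <= xmax and ymin <= y <= ymax


# Two hypothetical polygons given as lists of (longitude, latitude) vertices.
features = [
    [(-87.0, 32.0), (-86.0, 32.0), (-86.0, 33.0), (-87.0, 33.0)],
    [(-120.0, 35.0), (-119.0, 35.0), (-119.0, 36.0), (-120.0, 36.0)],
]

extent = layer_extent(features)
print(extent)
print(within_extent((-100.0, 34.0), extent))
print(within_extent((2.35, 48.85), extent))
```

Setting the extent once as an environment means that subsequent tools limit their processing to this rectangle without you specifying it each time.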

Next, you will set the data source for the Enrich tool to use demographic variables from the United States because the United States is your study
area.

d Scroll to the bottom of the Environments dialog box.

e Under Business Analyst, next to Data Source, click the Browse button.

The Business Analyst Data Source dialog box opens. In this dialog box, you can set the data source for geoenrichment to a specific country. You
will set the data source to the United States.

f In the Business Analyst Data Source dialog box, on the left side, under Portal, click All Countries And Regions.

g Scroll through the countries listed to see which countries and regions have demographic data available through Esri.

Esri's GeoEnrichment service enables you to query authoritative global data for more than 150 countries and regions. This extensive global data
portfolio allows you to integrate global demographics, business, behavioral, environmental, and places datasets into your own data.

h On the left side of the dialog box, click North America.

i From the options that display under United States, click Esri 2024.


j Click OK.

Because your study area is the United States, you set the region to select demographic variables from the United States to geoenrich your data.

k In the Environments dialog box, click OK.

Note: For more information on demographic data from Esri, go to Esri Location Data Resources
([Link]


- Step 13: Explore geoenrichment options

Geoenrichment uses the location of your data to add demographic variables as attributes to your feature class. The Data Engineering view in
ArcGIS Pro allows you to explore potential variables that you would like to add to the counties feature class.

a Open the Data Engineering view for US_Counties.


- Hint

Contents pane > right-click US_Counties > Data Engineering

b In the ribbon, from the Data Engineering contextual tab, in the Tools group, click Integrate and choose Enrich.


The Enrich dialog box opens.

The four tool galleries on the Data Engineering contextual tab (Clean, Construct, Integrate, and Format) each contain a subset of geoprocessing
tools that can be used for data engineering tasks. You selected the Enrich tool, which enables you to add demographic variables as attributes to
your feature class. The Enrich tool lists the parameters that are required to run the tool. Parameters define the values that are used to run the tool
and its underlying algorithms. To run the Enrich tool, you will need to define the input feature class, a name for the output feature class, and the
variables that will be added to the output feature class.

c Leave the Enrich dialog box open.

You will review the workflow for geoenriching your data using the Enrich tool in the Data Engineering view.

d In the Enrich dialog box, confirm that the Input Features parameter is set to US_Counties.

e For Output Feature Class, type CountyElectionsPresEnrich.

f Next to Variables, click the Add button.


In the Data Browser dialog box, you can explore demographic variables available for data enrichment. Esri provides demographic variables that
are regularly updated with the latest available data. For the United States, Esri also provides attributes from previous censuses (2000 and 2010)
that are recalibrated with the most current census (2020) geography. You can quickly add various demographic variables to your data using the
Enrich tool. You can also add variables that you created or that were shared with you.

g In the Data Browser dialog box, in the Search Variables field, type Median Age and press Enter.


On the left, you have the option to filter the available variables so that you can easily focus your search. To the right of the 2024 Median Age
variable, you see a hashtag and the word Index. For each variable, these icons, along with a percent sign icon, are used to specify whether you
want a total count (hashtag), index, or percentage (percent sign) for the variable.

h Click the Show/Hide Details Panel button.


The Selected Variables pane helps you keep track of the variables that you select. When a variable is selected, it is automatically listed in the
pane.

i Select the Median Age variable closest in time to 2020.


j Search for and select the Per Capita Income variable closest in time to 2020.

k In the Data Browser dialog box, click OK.


The variables that you selected are added in the Variables section.

l At the top of the Enrich dialog box, click the Estimate Credits link.

The Enrich tool consumes ArcGIS credits when it is run. When you click Estimate Credits, the banner displays an estimate of the number of ArcGIS
credits that the tool will consume, as well as the number of credits that you have available. For this course, your ArcGIS Online organizational
account is allocated 300 credits. You will not enrich the data because an enriched data layer has been provided for you in the next exercise.
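
A credit estimate of this kind scales with the number of features multiplied by the number of variables. The sketch below shows the arithmetic only. The rate of 10 credits per 1,000 enriched attributes and the feature count of 3,000 are assumptions for illustration, so rely on the Estimate Credits link and current Esri credit documentation for actual figures.

```python
def estimate_credits(feature_count, variable_count, credits_per_1000_attributes=10):
    """Estimate credits as (features x variables) charged at a per-1,000-attribute rate.

    The default rate is an assumption for illustration, not a confirmed price.
    """
    attribute_count = feature_count * variable_count
    return attribute_count / 1000 * credits_per_1000_attributes


allocated_credits = 300  # course allocation stated in the exercise
estimated = estimate_credits(feature_count=3000, variable_count=2)  # hypothetical feature count

print(estimated)
print(estimated <= allocated_credits)
```

Because the cost grows with both counts, choosing only the variables that you need is the simplest way to keep an enrichment within a credit budget.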

m At the top of the Enrich dialog box, click the Close button.

You explored the workflow for geoenriching your data using the Enrich tool.

n Close the Data Engineering view.

By applying various data engineering techniques, you cleaned and prepared the election data. Geoenabling and geoenriching the data
provide demographic variables that you can use to model or predict voter turnout.

In the next exercise, you will use various visualization techniques to explore relationships between voter turnout and these variables. You will use
this information to identify potential variables to use in your prediction model later in the MOOC.

o If you would like to perform additional data engineering tasks, proceed to the optional stretch goals; otherwise, close the Data Engineering map,
save the project, and then exit ArcGIS Pro.


- Step 14: Stretch goal (Optional)

Throughout this course, you will see exercise stretch goals. These goals include ways that you can continue or enhance the work that you
completed during the exercise.

Stretch goals are community-supported (meaning that your fellow MOOC participants can assist you with the steps to complete the stretch goal
using the Lesson Forum), and they are a great opportunity to work together to learn.

If you would like to continue engineering your data, use ArcGIS Pro geoprocessing tools to perform the following tasks:

a Identify and remove records with null candidatevotes values in the election data.

b Apply a symbology layer ([Link]) to the 2020 election turnout feature class (out_2020_fc_name).

The [Link] file is located in the DataEngineering_And_Visualization folder. The ArcGIS Pro Help documentation Apply Symbology From Layer
(Data Management) ([Link] describes the process of applying a symbology layer.

c Determine how to incorporate Alaska into this analysis.

Note: Alaska does not have counties. Research its administrative and political subdivisions to
determine how the data would need to be engineered to address this issue.

d Use the Lesson Forum to post your questions and observations. Be sure to include the #stretch hashtag in the posting title.

e After you are finished, save the project.

f If you intend to complete the next stretch goal, leave ArcGIS Pro open; otherwise, close the Data Engineering map, save the project, and then exit
ArcGIS Pro.


- Step 15: Stretch goal (Optional)

Earlier in the exercise, you learned how to prepare, combine, and validate data using ArcGIS Pro geoprocessing tools. Another option for data
engineering is Python. This exercise stretch goal uses ArcGIS Notebooks in ArcGIS Pro to complete a more detailed and automated version of the
exercise steps.

If you would like to recreate and expand on your data engineering workflow using Python code, complete the following tasks:

a In the Catalog pane, expand Notebooks, then open Data Engineering [Link].

First, you will import the necessary Python modules to execute the cells in the notebook. A Python module is a file that contains Python definitions
and statements. A module can define functions, classes, and variables, and it can include runnable code. You will use the import statement to
import the modules.

b In the notebook, locate the markdown cell titled Import Needed Modules.

The ArcGIS Notebooks interface is built on top of Jupyter Notebook, which structures Python content using cells. Code cells contain executable
Python code, and Markdown cells contain explanatory text and media.

c Under the Import Needed Modules markdown cell, insert a new code cell.

d In the new code cell, write code that uses the import statement to import the following modules: arcgis, pandas, os, and arcpy. Ensure that the
pandas module is imported as the alias pd.

e From the ArcGIS Notebooks toolbar, click Run or press Shift + Enter on your keyboard to run the code cell.

Next, you will specify the source dataset for the elections data. For this stretch goal, you will start with a CSV file containing United States
presidential elections data from the years 2000 to 2020. You must read this file into your notebook as a DataFrame object. You must also specify
that the county_fips attribute field in this data frame will be an object.
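
A likely reason for reading county_fips as an object (text) is that FIPS codes can begin with zero, and a numeric type drops leading zeros. The sketch below demonstrates the effect with Python's standard csv module rather than pandas, using a small hypothetical sample instead of the course file.

```python
import csv
import io

# Hypothetical sample rows; the vote totals are invented for illustration.
sample_csv = io.StringIO(
    "year,state,county_fips,totalvotes\n"
    "2020,ALABAMA,01001,27770\n"
    "2020,WYOMING,56045,3560\n"
)

rows = list(csv.DictReader(sample_csv))

# Kept as text, the codes retain their five-character form.
fips_as_text = [row["county_fips"] for row in rows]

# Converted to integers, the leading zero is lost.
fips_as_number = [int(row["county_fips"]) for row in rows]

# Padding back to five characters repairs codes that were read as numbers.
restored = [str(number).zfill(5) for number in fips_as_number]

print(fips_as_text)
print(fips_as_number)
print(restored)
```

Reading the column as text from the start avoids the repair step and keeps the codes directly comparable with the five-character FIPS fields used for joins.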

f Under the Read Data Into Python markdown cell, in the first code cell, write code that reads the file countypres_2000-[Link] into the notebook.
Take the following into consideration:
· The code should use the pandas read_csv() function to read the file as a DataFrame object.
· The variable name for the data frame should be: elections_complete_df.
· In the read_csv() function, set the dtype parameter to the following: {"county_fips":object}.
g After the read_csv() function's close parenthesis, press Enter to get a new line in the cell.

h In the new line, write code to preview the contents of the elections_complete_df data frame.

i Run the code cell.

j In ArcGIS Pro, from the Notebook tab, in the Notebook group, click Save.

k Expand each section in the Data Engineering notebook, if necessary.

Although you did not write all the Python code, it is recommended that you carefully look at the Python syntax and logic in each cell. Reviewing
each cell can help familiarize you with the ArcGIS Notebooks interface and the relevant Python syntax. The notebook can also act as sample code
that you can reference for data engineering tasks.

l Run each cell from start to finish. Review the information in the preceding markdown cells as you run each code cell.

Note: You must run each code cell in the notebook before proceeding to the next.

m Use the Lesson Forum to post your questions, observations, and syntax examples. Be sure to include the #stretch hashtag in the posting title.

n When you are finished, close the Data Engineering map and notebook tabs, save the project, and then exit ArcGIS Pro.


Copyright © Esri
All rights reserved.

Published in the United States of America.

The information contained in this document is the exclusive property of Esri. This work is protected under United States copyright law and other
international copyright treaties and conventions. No part of this work may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying and recording, or by any information storage or retrieval system, except as expressly permitted in writing by Esri. All
requests should be sent to Attention: Director, Contracts and Legal, Esri, 380 New York Street, Redlands, CA 92373-8100, USA.

Export Notice: Use of these Materials is subject to U.S. export control laws and regulations including the U.S. Department of Commerce Export
Administration Regulations (EAR). Diversion of these Materials contrary to U.S. law is prohibited.

The information contained in this document is subject to change without notice.

Commercial Training Course Agreement Terms: The Training Course and any software, documentation, course materials or data delivered with the
Training Course is subject to the terms of the Master Agreement for Products and Services found at the following website: [Link]/TrainingTerms.
The license rights in the Master Agreement strictly govern Licensee's use, reproduction, or disclosure of the software, documentation, course materials
and data. Training Course students may use the course materials for their personal use and may not copy or redistribute for any purpose.
Contractor/Manufacturer is Esri, 380 New York Street, Redlands, CA 92373-8100, USA.

Esri Marks: Esri marks and product names mentioned herein are subject to the terms of use found at the following website: [Link]/EsriMarks.

Other companies and products or services mentioned herein may be trademarks, service marks, or registered marks of their respective mark owners.
