Data Engineering Tasks Lesson1
Software requirements
· ArcGIS Online
· ArcGIS Pro 3.5.x (latest release)
Introduction
Data engineering is a fundamental part of every analysis. The term refers to the planning, preparation, and processing of data to make it more useful for
analysis. It can include simple tasks like identifying and correcting imperfections in your data and calculating new fields. It can also include more complex
tasks like reducing the dimensions of a multivariate dataset.
Data engineering also involves the process of geoenriching your data. Geoenrichment can include various tasks:
Scenario
Because voting is voluntary in the United States, the level of voter participation (referred to as "voter turnout") has a significant impact on the election
results and resulting public policy.
Modeling voter turnout, and understanding where low turnout is prevalent, can inform outreach efforts to increase voter participation. With the ultimate
goal of predicting voter turnout, in this exercise, you will focus on performing various data engineering tasks to prepare election result data for predictive
analysis.
The data for this section is obtained from the Harvard Dataverse ([Link]) and the United States Census Bureau
([Link]). The voter turnout dataset from Harvard Dataverse has vote totals from each U.S. county for U.S. presidential elections
from 2000 to 2020.
Note: The exercises in this course include View Result links. Click these links to confirm that your
results match what is expected.
Note: This exercise requires ArcGIS Pro 3.5 and your MOOC student username and password.
Please follow each step in Exercise 1: Prepare your machine for course exercises before
continuing.
9/26/25, 8:55 AM Exercise 2: Perform data engineering tasks How can I print an exercise to PDF format?
Throughout this course, you will save all your data to this folder. When you create the folder, do not include any spaces or special characters in the
folder name.
d Extract the exercise data files to the EsriTraining folder on your local computer.
e After you extract the folder, confirm that the data files are stored in the DataEngineering_and_Visualization folder.
You downloaded and extracted the exercise data files that you will need to complete the first section of the MOOC.
In this step, you will use your course ArcGIS account username and password to sign in to ArcGIS Pro. You will need to use your course ArcGIS
account to license ArcGIS Pro and to access other software applications that are used throughout the MOOC exercises.
The DataEngineering_and_Visualization folder shows all the data files that you need to complete the exercises in this section. You will open the
ArcGIS Pro project from File Explorer.
c Sign in to ArcGIS Pro with the provided course ArcGIS account username that ends in _SDS, if necessary.
Note: The course ArcGIS account username and password are listed on the MOOC home page
under Lessons. Steps for accessing this information are available in Exercise 1: Prepare
your machine for course exercises.
After you have signed in, the ArcGIS Pro project opens to show the Data Engineering map. Next, you will explore the
DataEngineering_and_Visualization ArcGIS Pro project that you downloaded.
The top of your project displays the ArcGIS Pro ribbon. ArcGIS Pro uses this horizontal ribbon to display and organize functionality into a series of
tabs.
d On the ribbon, click the View tab, as shown in the following graphic.
You have reset your application panes to the Mapping default. Your ArcGIS Pro project is open to a gray reference map, which is called a
basemap. Because you are preparing U.S. election data, the basemap is currently focused on the contiguous United States.
On the Map tab is the Navigate group, which provides the tools that you need to navigate the map. The default tool is the Explore tool, which
you can use to pan and zoom in and out of maps. To explore different areas of the world on this basemap, pan the map by clicking your mouse
and holding down the button while you move the map. When you pan a map with the mouse, the pointer becomes a hand. Zoom in or out of the
map by using the mouse wheel or by using the Fixed Zoom In button or Fixed Zoom Out button in the Navigate group.
You have reset the panes to show the default mapping panes. To the left side of the map is the Contents pane, which lists the layers that have
been added to the map. To the right side of the map is the Catalog pane, which lists the items that are associated with this ArcGIS Pro project:
Maps, Toolboxes, Notebooks, Databases, Styles, Folders, and Locations.
Note: To learn more about the ArcGIS Pro interface, go to ArcGIS Pro Help: ArcGIS Pro user
interface ([Link]). To learn more about ArcGIS Pro projects,
go to ArcGIS Pro Help: Projects in ArcGIS Pro ([Link]).
You explored an ArcGIS Pro project. Next, you will explore the data that you will prepare for analysis.
In this exercise, you will perform data engineering tasks to prepare United States presidential election data for predictive analysis. You have
multiple data sources on election results and voting-age population.
You will begin by preparing the presidential election results table. You must address missing values, reformat data types, and restructure the
format of the data table.
a In the Catalog pane, expand Databases, and then expand the DataEngineering_and_Visualization geodatabase.
You have added a table to your ArcGIS Pro project. The table contains data by county for several United States presidential elections. First, you
will explore some fields in the elections table, using the Data Engineering view to inspect for inconsistent or missing values.
c In the Contents pane, locate the CountyPresElect table that you just added.
The Data Engineering view opens. The Data Engineering view is in a dockable window that can be moved and docked in the same way that you
dock maps, layouts, and attribute tables. In addition to the view, a Data Engineering contextual tab is available in the ribbon. The ribbon tab
provides access to commands that are used for data engineering.
The Data Engineering view contains two panels: a fields panel on the left and a statistics panel on the right. The fields panel allows you to explore
fields, change symbology, and produce charts for fields in the table. The statistics panel allows you to explore the values and distribution of your
data by viewing statistics and data quality metrics. The panel's statistics table is empty by default. You can add fields from the fields panel.
e In the Data Engineering view, in the statistics panel, click Add All Fields And Calculate.
f In the statistics panel, locate the column titled Nulls, then answer the following question.
- Answer
The FIPS field contains 1 null value, which makes up approximately 0.03 percent of the
records. Additionally, each field that ends in _2000 contains 1 null value.
A record in the elections table is missing information in the FIPS field, which uniquely identifies counties. You will later join this table with voting-
age population data based on the FIPS field, so these values cannot be null. You will identify which row has a null value and establish a strategy
for dealing with that null value.
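The null check that the statistics panel reports can also be sketched outside ArcGIS Pro. The following is a minimal illustration in plain Python, not the Data Engineering view's own implementation. The sample rows and the null_summary helper are hypothetical, and None stands in for <Null>.

```python
# Hypothetical sample rows; None stands in for <Null> in the table view.
rows = [
    {"FIPS": 1001, "totalvotes_2000": 10, "totalvotes_2008": 12},
    {"FIPS": None, "totalvotes_2000": 20, "totalvotes_2008": 25},
    {"FIPS": 1003, "totalvotes_2000": None, "totalvotes_2008": 30},
]


def null_summary(rows):
    """Return {field: (null_count, null_percent)} for every field."""
    summary = {}
    for field in rows[0]:
        nulls = sum(1 for row in rows if row[field] is None)
        summary[field] = (nulls, 100 * nulls / len(rows))
    return summary


summary = null_summary(rows)
for field, (count, percent) in summary.items():
    print(f"{field}: {count} null(s), {percent:.2f}% of records")
```

A field that will later serve as a join key, such as FIPS, should report zero nulls before the join is attempted.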
g Right-click the cell displaying the number of null values for FIPS, as shown in the following graphic.
You have applied a selection to any record(s) with a null FIPS value. You can narrow down your view to just this selection to look for patterns.
j Below the table, on the bottom left, click the Show Selected Records button.
- Answer
The record represents votes cast in the District of Columbia for multiple election years.
Removing this record would affect hundreds of thousands of votes and make it impossible to include Washington, D.C., in the predictive analysis.
Additional research indicates that the appropriate FIPS code for Washington, D.C., is 11001. You will add the correct FIPS code for the
Washington, D.C., record.
Because there is only one record, you will make this change directly in the table.
k In the FIPS column, double-click <Null> to activate editing for this record.
You activated an editing session for the selected record and changed a value. You must save your edits to confirm this change.
o Click the Data Engineering view tab, as shown in the following graphic.
p On the ribbon, from the Data Engineering contextual tab, in the Selection group, click Clear Selection.
q In the Data Engineering view, above the statistics panel, click Calculate to regenerate the statistics.
- Answer
The FIPS field now contains 0 null values. The fields ending in _2000 each contain 1 null
value.
r In the Nulls column, right-click the totalvotes_2000 cell and choose Select Null.
? How many fields contain null values for the selected record?
- Answer
Three fields in one record contain null values. All three fields represent data for the year
2000.
All remaining null values in the dataset are attributed to one record. The values appear only in fields representing election year 2000. Later in this
exercise, you will remove data for any election year prior to 2008. Because all other values in this record are valid, there is no need to make
changes to this record.
v Above the table, click the Clear button to clear the selection.
w Close the table view and Data Engineering view for CountyPresElect.
As you work, it is a good idea to periodically save your project. There are several ways that you can save your work in ArcGIS Pro. One way to
quickly save the project is to use the Quick Access Toolbar.
x In the upper-left corner of the ArcGIS Pro app, locate the Quick Access Toolbar and click the Save button, as shown in the following graphic.
You have addressed missing values in the elections data. Next, you will restructure the table to combine it with another source.
When preparing a dataset for analysis, it is important to consider the final format of the data and its fields. Later in this exercise, you will join the
elections data table with county voting-age population (CVAP) data. Then, you will join this data again with county geometries.
To match these sources, you will prepare the elections data table as follows:
· Your earliest voting-age population data source covers 2006 to 2010, which will represent voting-age populations for the 2008 election.
Therefore, the predictive analysis must be limited to election years 2008 and later. You can remove elections data for years prior to
2008.
· Counties in all three datasets are uniquely identified by a county FIPS field. In the data source containing county geometries, the FIPS
field data type is text, but in the elections data table, it is numeric. In the elections data table, you will create a new, compatible version of
this field.
You will modify the elections data table to account for these differences. First, you will create a text version of the numeric FIPS field.
b From the CountyPresElect table view, click the Calculate Field button.
You will convert your integer FIPS values to text and store the text values in a new field.
c In the Calculate Field dialog box, set or confirm the following parameters:
e Confirm that your tool matches the following graphic, and then click OK.
f In the table, scroll to the right, if necessary, to view the newly added field.
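The Calculate Field parameters for this conversion are not reproduced in this text. As a rough sketch of the idea, and assuming that the text version should be zero-padded to five characters (an assumption based on the later remark that FIPS_txt follows a 5-character syntax), the conversion could look like this in Python. The fips_to_text helper is hypothetical.

```python
def fips_to_text(fips):
    """Convert a numeric county FIPS code to a 5-character string.

    Assumption: codes shorter than five digits are left-padded with zeros.
    """
    return str(int(fips)).zfill(5)


print(fips_to_text(1001))   # a 4-digit code gains a leading zero
print(fips_to_text(11001))  # a 5-digit code is unchanged
```

Storing the result in a new text field, rather than overwriting the numeric field, keeps the original values available for comparison.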
Next, you will clean the dataset by removing unnecessary fields. The earliest voting-age population data that is available starts at the 2008
election. You will remove election data prior to 2008.
The fields view displays information about each field in the table.
i From the fields view, in the fields table, locate the row for totalvotes_2000.
j To the left of the row, click the empty space, as shown in the following graphic, to select this field.
k On your keyboard, press and hold the Ctrl key, then select the following fields:
· totalvotes_2004
· votes_dem_2000
· votes_dem_2004
· votes_gop_2000
· votes_gop_2004
You have selected six fields.
You have deleted fields for election years 2000 and 2004. You must save your edits to retain these changes.
m From the Fields tab, in the Manage Edits group, click Save.
You have prepared your election data for future join processes.
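The field cleanup above follows a simple rule: drop every field whose year suffix is earlier than 2008. A minimal sketch of that rule in plain Python follows. It assumes the <variable>_<year> naming seen in this table; the drop_early_years helper and the sample record are hypothetical.

```python
def drop_early_years(record, first_year=2008):
    """Return a copy of record without <variable>_<year> fields before first_year."""
    kept = {}
    for name, value in record.items():
        suffix = name.rsplit("_", 1)[-1]
        is_year = suffix.isdigit() and len(suffix) == 4
        if is_year and int(suffix) < first_year:
            continue  # skip fields such as totalvotes_2000 or votes_dem_2004
        kept[name] = value
    return kept


# Hypothetical record with made-up values.
record = {
    "FIPS_txt": "01001",
    "totalvotes_2000": 10,
    "votes_dem_2004": 4,
    "totalvotes_2008": 12,
}
print(sorted(drop_early_years(record)))
```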
Next, you will prepare county-level citizen voting-age population (CVAP) data tables to enrich the election dataset. The voting-age data is
separated into four tables, each with estimates of the number of citizens of voting age in each county for a particular time period. You will combine the
four tables into one table.
b Select all four CountyCVAP tables in the database, as shown in the following graphic.
- Hint
Click the first CountyCVAP table, then hold the Shift key and click the last CountyCVAP table.
c Right-click any of the four selected tables and choose Add To Current Map.
- Answer
All four tables have the same fields. However, the fields in two of the tables are in
lowercase, and the fields in the other two tables are in uppercase.
You will use the Merge geoprocessing tool to combine the four population data tables into one output table. You will use the Field Mapping feature
to ensure that fields are matched appropriately. In particular, you are interested in the field CVAP_EST, which contains the estimated number of
U.S. citizens 18 years of age or older for that county and year.
g From the Analysis tab, in the Geoprocessing group, click Tools to open the Geoprocessing pane.
Within ArcGIS Pro, geoprocessing refers to a suite of tools for performing analysis, data management, and automation. From the Geoprocessing
pane, you can find a tool by keyword using the search tool, or browse for it by toolbox.
j In the Geoprocessing pane, for Input Datasets, click the Add Many button.
k Check the box next to each of the four CountyCVAP tables, and then click Add.
The tool will automatically create an output dataset name that reflects the input. You can keep this name or modify it to be more meaningful for
your analysis.
Note: This parameter represents a file path that leads to the ArcGIS Pro project's file geodatabase
(DataEngineering_and_Visualization.gdb). In ArcGIS Pro, the Current Workspace
environment defaults to the project's default geodatabase.
m For Field Matching Mode, choose Use The Field Map To Reconcile Differences.
The field map is generated. A field map is a parameter that modifies how fields from input datasets are processed, written, or mapped to an output
dataset. Field mapping is a useful data engineering tool, allowing you to reconcile fields from different sources, adjust data types, add or remove
fields, and perform other field-level edits. You will use the field map's Field Properties dialog box to validate and clean up the data before the
merge.
The Field Properties dialog box opens. On the left, the Fields panel shows all fields that will appear in the output table. On the right, the Properties
panel shows information about the selected field. Below the Properties panel are two more panels: Table and Actions And Source Fields. These
panels provide tools for reconciling schema differences between input datasets.
You will use the Field Properties dialog box to remove unnecessary fields from the merge, verify field mapping, and modify output field names.
o On the left, in the Fields panel, point to LNTITLE, then click the Remove button to remove this field.
· GEONAME
· GEOID
· CVAP_EST
· YEAR
q On the left, in the Fields list, select GEONAME, if necessary.
The Actions And Source Fields panel displays the field that is mapped to the output. Even though the 2014-2018 dataset uses all lowercase field
names, ArcGIS Pro accurately mapped these fields to uppercase field names in the other tables.
You would like all voting-age population fields to be lowercase, so you will make that change before the merge.
s Under Properties, for Field Name, delete GEONAME and type geoname.
u Repeat this process for the remaining fields in the Fields panel so that all output field names and aliases are written in lowercase.
You have narrowed down the fields that will appear in the merged table, modified these fields, and confirmed that all four input tables are mapped
to these fields.
w Click OK.
You have combined tables and created the necessary fields for a table join.
- Hint
From the Contents pane, select the four tables, then right-click the selection and choose Remove.
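The merge with a field map can be thought of as three operations: match fields regardless of letter case, keep only the wanted fields, and stack the rows. The sketch below illustrates that idea in plain Python. It is not how the Merge tool is implemented; the merge_tables helper and the sample values (including the geoid strings) are hypothetical.

```python
# Output fields kept by the field map, written in lowercase.
KEEP = ("geoname", "geoid", "cvap_est", "year")


def merge_tables(tables):
    """Stack rows from all tables, matching field names case-insensitively."""
    merged = []
    for table in tables:
        for row in table:
            lowered = {name.lower(): value for name, value in row.items()}
            # Fields outside KEEP (for example, LNTITLE) are dropped.
            merged.append({name: lowered.get(name) for name in KEEP})
    return merged


# Hypothetical inputs: one table with uppercase names, one with lowercase.
upper = [{"GEONAME": "Example County, State", "GEOID": "05000US01001",
          "LNTITLE": "Total", "CVAP_EST": 100, "YEAR": 2008}]
lower = [{"geoname": "Example County, State", "geoid": "05000US01001",
          "lntitle": "Total", "cvap_est": 110, "year": 2012}]

merged = merge_tables([upper, lower])
print(len(merged), sorted(merged[0]))
```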
In the previous step, you concatenated several voting-age population tables together to form one merged table. Before joining the election data
and voting-age population data into one table, you must first ensure that there is a matching field to use for the join.
b From the table view, right-click the tab for CountyPresElect and choose New Vertical Tab Group, as shown in the following graphic.
c Explore the contents of the two tables, then answer the following questions.
? Which field(s) in the CountyCVAP table also appear in the CountyPresElect table?
- Answer
None of the fields in the CountyCVAP table appear in the CountyPresElect table.
? Are there any fields in the CountyCVAP table that contain similar information as
CountyPresElect?
- Answer
Values in the geoname field in CountyCVAP are similar to county_name in the
CountyPresElect table. The geoname field contains each county's name and state, but
the county_name field only contains the county name. The geoid field in CountyCVAP
also contains similar information as the FIPS field in CountyPresElect. For each geoid
string, the last five digits correspond with the same county's FIPS code.
You will use the geoid field in the CountyCVAP table to extract FIPS values into a new field.
d Activate the CountyCVAP table view, if necessary, then click the Calculate Field button.
e In the Calculate Field dialog box, set or confirm the following parameters:
g Confirm that your parameters match the following graphic, and then click OK.
Note: The provided expression uses slice notation to extract characters in a string based on their
position. The syntax for this notation is [start:stop]. This expression starts with the item at
index -5 (five characters away from the end of the string, when moving left). Because there is
no value after the colon, the extraction stops at the end of the string.
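The slice notation described in the note can be tried in any Python interpreter. In this small sketch, the geoid string is a hypothetical sample value, not a record quoted from the exercise data.

```python
geoid = "05000US11001"  # hypothetical geoid value

# Start five characters from the end; with no stop value, run to the end.
fips = geoid[-5:]
print(fips)

# An equivalent slice with an explicit start index.
same = geoid[len(geoid) - 5:]
print(same == fips)
```

Negative indexes are convenient here because the slice works even if the prefix in front of the FIPS digits varies in length.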
h In the CountyCVAP table, scroll to the right to see the new field, if necessary.
You have combined tables and created the necessary fields for a table join.
Next, you will restructure the voting-age population data table to match the election data table.
l In the statistics panel, click anywhere in the FIPS row to highlight it.
m Scroll to the right until the Count and Unique statistics columns are visible, then answer the following questions.
- Answer
12,882
- Answer
3,224
The format of the voting-age population data table will prevent a proper join onto the elections data table. Currently, each record in the table
corresponds to the voting-age population in each county for each election year. You need to reformat the table so that each county has one unique
record.
You will use the Pivot Table geoprocessing tool to perform this unstacking operation.
The Data Engineering view provides quick access to commonly used data engineering geoprocessing tools, including the Pivot Table tool.
n From the Data Engineering view, in the fields panel, click FIPS to select this field.
The Pivot Table tool dialog box opens. Some parameters have been pre-populated, including the Input Table and one of the Input Fields.
p In the Pivot Table dialog box, for Input Fields, in the blank field, choose geoname.
The Input Fields are the fields that will remain the same for each county in the pivot table.
The pivot field determines which values will become new fields. Each year in the voting-age population data will be used to uniquely distinguish
the new fields.
The value field determines which values will be reported for each of the new fields.
t Confirm that your settings match the following graphic, and then click OK.
Note: Any fields not provided in the Input, Pivot, and Value field parameters will be dropped from
the output pivot table.
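The roles of the input, pivot, and value fields can be illustrated with a small unstacking routine in plain Python. This is a sketch of the concept only, not the Pivot Table tool's implementation. The pivot helper and the sample rows are hypothetical, and the output field naming (pivot field name followed by the value) is an assumption based on the year2008 style names used later in this exercise.

```python
def pivot(rows, input_fields, pivot_field, value_field):
    """Return one record per unique combination of input_fields."""
    out = {}
    for row in rows:
        key = tuple(row[field] for field in input_fields)
        record = out.setdefault(key, dict(zip(input_fields, key)))
        # Each distinct pivot value becomes its own output field.
        record[f"{pivot_field}{row[pivot_field]}"] = row[value_field]
    return list(out.values())


# Hypothetical stacked rows: one row per county per year.
rows = [
    {"FIPS": "01001", "geoname": "Example County, State", "year": 2008, "cvap_est": 100},
    {"FIPS": "01001", "geoname": "Example County, State", "year": 2012, "cvap_est": 110},
    {"FIPS": "01003", "geoname": "Sample County, State", "year": 2008, "cvap_est": 200},
]

pivoted = pivot(rows, ["FIPS", "geoname"], "year", "cvap_est")
print(pivoted)
```

Because the grouping key is the combination of all input fields, two rows with the same FIPS code but different names would produce two output records.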
You can use the Data Engineering statistics panel to check your work.
w In the statistics panel, click anywhere in the FIPS row to highlight it.
x Scroll to the right until the Count and Unique statistics columns are visible, then answer the following questions.
- Answer
3,225
- Answer
3,224
When you ran the Pivot Table tool, you set the Input Fields to both FIPS and geoname. In the pivot result, there are more records in the table
than there are unique FIPS values. These results suggest that more than one county name exists for at least one FIPS code in this dataset. Later in the exercise,
you will locate these duplicate records and resolve them.
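One way to locate such duplicates is to group the name values by FIPS code and report any code with more than one name. The following plain-Python sketch assumes the name field is geoname; the helper and the sample rows are hypothetical.

```python
from collections import defaultdict


def fips_with_multiple_names(rows, key_field="FIPS", name_field="geoname"):
    """Return {fips: [names]} for codes that appear with more than one name."""
    names = defaultdict(set)
    for row in rows:
        names[row[key_field]].add(row[name_field])
    return {fips: sorted(found) for fips, found in names.items() if len(found) > 1}


# Hypothetical pivot output with one inconsistently named county.
rows = [
    {"FIPS": "01001", "geoname": "Example County, State"},
    {"FIPS": "01003", "geoname": "Sample County, State"},
    {"FIPS": "01003", "geoname": "Sample Cnty, State"},
]

duplicates = fips_with_multiple_names(rows)
print(duplicates)
```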
In the previous step, you pivoted values in the cvap_est field by their election year. The resulting field names have a specific syntax: year<4-digit
year> (for example, year2008). In this step, you will rename these fields to make them more informative. Then, you will create an enriched voting dataset by joining the
elections table with the voting-age population table.
b Search for and open Alter Fields (Multiple) (Data Management Tools).
The Alter Fields (Multiple) geoprocessing tool provides options for modifying field properties of multiple fields in a feature class or table.
d Under Field Properties, perform the following actions:
1. For Field Name, choose year2008.
2. For New Field Name, type cvap_est_2008.
3. Check the Clear Alias box.
4. Leave the remaining defaults, then click Add Another.
Note: The Clear Alias option will remove the field alias. During the join process, a new field alias
will be generated from the new field name.
e Repeat this process to rename the three remaining year fields as follows:
Field Name    New Field Name
year2012      cvap_est_2012
year2016      cvap_est_2016
year2020      cvap_est_2020
Confirm that your tool matches the following graphic, and then click Run.
You updated the field names for CountyCVAP_pivot. Next, you will join this table to the election data table.
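The renaming pattern is regular enough to express as a rule: a field named year followed by four digits becomes cvap_est_ followed by the same digits. A minimal Python sketch of that rule follows. It is an illustration only and does not call the Alter Fields (Multiple) tool; the rename_year_fields helper and the sample record are hypothetical.

```python
import re


def rename_year_fields(record, new_prefix="cvap_est_"):
    """Rename fields like year2008 to cvap_est_2008; leave others unchanged."""
    renamed = {}
    for name, value in record.items():
        match = re.fullmatch(r"year(\d{4})", name)
        new_name = (new_prefix + match.group(1)) if match else name
        renamed[new_name] = value
    return renamed


# Hypothetical pivoted record.
record = {"FIPS": "01001", "year2008": 100, "year2012": 110}
print(rename_year_fields(record))
```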
f From the Geoprocessing pane, click the Back button, then search for and open Join Field (Data Management Tools).
The Join Field geoprocessing tool joins data onto the input table based on common values in a field. You will join the data from CountyCVAP_pivot
onto CountyPresElect using their common FIPS code.
g In the Geoprocessing pane, fill out the parameters as follows, leaving any remaining defaults:
- Answer
3,115
? Will all records in the input table and join table be successfully matched?
- Answer
No. The input table has 3,153 records, which is more than the match number.
Later in the exercise, you will investigate any missing values between the two sources. For now, you will proceed with the join.
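The matching behavior described above, where every input record is kept and unmatched records receive null values, can be sketched in plain Python. This is a conceptual illustration, not the Join Field tool's implementation. The join_field helper, the key field names, and the sample rows are hypothetical.

```python
def join_field(input_rows, join_rows, input_key, join_key, fields):
    """Copy fields from join_rows onto input_rows; return the match count."""
    lookup = {row[join_key]: row for row in join_rows}
    matched = 0
    for row in input_rows:
        match = lookup.get(row[input_key])
        if match is not None:
            matched += 1
        for field in fields:
            # Unmatched input records keep their row but receive None (null).
            row[field] = match[field] if match is not None else None
    return matched


# Hypothetical tables with made-up values.
elections = [
    {"FIPS_txt": "01001", "totalvotes_2008": 12},
    {"FIPS_txt": "01003", "totalvotes_2008": 30},
    {"FIPS_txt": "99999", "totalvotes_2008": 5},
]
cvap = [
    {"FIPS": "01001", "cvap_est_2008": 100},
    {"FIPS": "01003", "cvap_est_2008": 200},
]

matched = join_field(elections, cvap, "FIPS_txt", "FIPS", ["cvap_est_2008"])
print(matched, "of", len(elections), "input records matched")
```

Comparing the match count with the input record count before running a join is a quick way to anticipate how many null values the joined fields will contain.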
You have enriched your elections data with voting-age population data.
Previously, you saw that not all records were matched between the elections data and the voting-age population data. This suggests that further
data validation may be needed. You will inspect these records using the Data Engineering statistics panel.
b In the Data Engineering view, next to Search Fields, click the Add Fields And Calculate Statistics button.
You have added the joined fields to the statistics panel and regenerated statistics.
c In the statistics panel, scroll down if necessary to view the statistics for the cvap_est fields, then answer the following question.
- Answer
The field cvap_est_2008 contains 39 null values, and the other three cvap_est fields
contain 38 null values.
d In the Nulls column, right-click the cell for cvap_est_2008, then choose Select Null.
- Answer
Several records for Alaska have null values in every cvap_est field. One record for
Louisiana has a null value for cvap_est_2008.
? For the selected records for Alaska, what do the values in the county_name field have in
common?
- Answer
All county_name values are called "DISTRICT", followed by a number.
The voting records for Alaska appear unusual because they are referencing a district rather than a county. Additional research shows that Alaska
has a unique government model that does not use counties. The election data is reported for regions that do not correspond to the counties in the
voting-age population sources.
Because of this data incompatibility, you will remove records from the state of Alaska.
i In the Select By Attributes dialog box, confirm that Input Rows is set to CountyPresElect and Selection Type is set to New Selection.
m Click Yes to confirm that you want to delete the data, if prompted.
You have removed records for Alaska. You must save to make your changes permanent.
n From the Edit tab, in the Manage Edits group, click Save.
You will check your work by regenerating the Data Engineering statistics panel.
p Close the table view and return to the Data Engineering view for CountyPresElect.
q Click Calculate.
- Answer
There is 1 null value for cvap_est_2008. All other cvap_est fields have 0 null values.
The record with null values for 2008 represents a small percentage of the total, and it is unlikely to change the overall analysis. You will continue to
the next data engineering step.
You removed incompatible records from your data, and you validated your work.
When working repeatedly with similar data, it can be useful to have tools customized to automate your process. In ArcGIS Pro, you can use
Python to create custom Python script tools. These tools execute a script or an executable file within ArcGIS Pro's Geoprocessing pane. Python
script tools are useful for automation, specific error messaging, custom parameter behavior, and more.
Earlier, you used the Calculate Field geoprocessing tool, which is a commonly used data management tool in ArcGIS. You will now use a script
tool to calculate additional statistics for the voting data.
a From the Catalog pane, expand Toolboxes, then expand Multi-Year Calculate [Link].
The script tool opens in the Geoprocessing pane. This custom script tool is built around the Calculate Field tool. The tool is designed to complete
simple algebraic expressions across multiple fields that share a common field name suffix. To use this tool, input variable field names must follow
this pattern: <variable>_<year>.
For the first statistic, you will count all major party voters. You will name these fields with the variable name votes_major, for example:
votes_major_2008.
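The script tool's source code is not shown in this exercise. To illustrate the general idea of detecting year suffixes and applying one expression per year, here is a plain-Python sketch. It assumes that the major party count is the sum of the votes_dem and votes_gop fields, which is an interpretation of the description above rather than a quoted expression. The helper names and sample values are hypothetical.

```python
import re


def detect_suffixes(field_names, prefix_a, prefix_b):
    """Return the sorted year suffixes shared by both variable prefixes."""
    def years(prefix):
        pattern = re.compile(re.escape(prefix) + r"_(\d{4})")
        found = set()
        for name in field_names:
            match = pattern.fullmatch(name)
            if match:
                found.add(match.group(1))
        return found

    return sorted(years(prefix_a) & years(prefix_b))


def multi_year_sum(record, new_prefix, prefix_a, prefix_b):
    """Add <new_prefix>_<year> = A + B for every detected year suffix."""
    for year in detect_suffixes(list(record), prefix_a, prefix_b):
        record[f"{new_prefix}_{year}"] = (
            record[f"{prefix_a}_{year}"] + record[f"{prefix_b}_{year}"]
        )
    return record


# Hypothetical record with made-up vote counts.
record = {
    "votes_dem_2008": 5, "votes_gop_2008": 7,
    "votes_dem_2012": 6, "votes_gop_2012": 6,
    "totalvotes_2008": 13,
}
suffixes = detect_suffixes(list(record), "votes_dem", "votes_gop")
result = multi_year_sum(record, "votes_major", "votes_dem", "votes_gop")
print(suffixes, result["votes_major_2008"], result["votes_major_2012"])
```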
A details window opens for the custom tool that you just ran.
f In the details window, click Messages, then answer the following question.
? Which field name suffixes did the custom script tool detect?
- Answer
The tool detected suffixes for years 2008, 2012, 2016, and 2020.
j In the Geoprocessing pane, repeat this process for the following statistics, paying careful attention to the New Field Type parameter.
New field prefix | New field type | Variable A prefix | Variable B prefix | Expression type
You used a custom script tool to batch calculate fields in a table. To learn more about script tools, go to ArcGIS Pro Help: Create A Script Tool
([Link]).
Lastly, you will use the Calculate Field tool's Code Block window to assign the winning party for each county.
o Click Run.
p Use the following table to repeat this process for the years 2012, 2016, and 2020. Leave all remaining parameters in the Calculate Field tool the
same.
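The code block used in this step is not reproduced in this text. As a rough sketch of what a winner-assignment function could look like in Python, consider the following. The function name, the party labels, and the handling of ties and null values are all hypothetical choices, not the exercise's own code.

```python
def winning_party(votes_dem, votes_gop):
    """Return a label for the party with more votes in a county."""
    if votes_dem is None or votes_gop is None:
        return None  # cannot decide a winner when a count is missing
    if votes_dem > votes_gop:
        return "DEM"
    if votes_gop > votes_dem:
        return "GOP"
    return "TIE"


print(winning_party(120, 95))
print(winning_party(80, 95))
```

In a Calculate Field code block, a function like this would be defined once and then called from the expression with the year-specific vote fields as arguments.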
You have calculated new attributes for raw difference, percent difference, and voter turnout. Your table is now ready for final validation.
You have enriched an election results table with citizen voting-age population (CVAP) data. Next, you must perform a final validation on the newly
calculated values and prepare the dataset for mapping.
First, you will inspect the values for voter turnout. Because these values represent a fraction (total votes divided by voting age population), you will
confirm that the values range between 0 and 1.
a In the table, right-click the voter_turnout_2008 heading and choose Explore Statistics.
b In the Data Engineering statistics panel, for voter_turnout_2008, locate the value for Max.
The maximum value in the column is above 1, indicating a voter turnout above 100%. Further investigation shows two reasons for this
discrepancy:
1. All counties with voter turnout above 1 have small populations, which makes it harder to estimate their population of citizens of voting
age with a high degree of accuracy. The small sizes of these counties suggest that the most likely source of the issue is an
underestimate of the population of citizens of voting age.
2. There is also a temporal mismatch between the two datasets used to calculate voter turnout. For the election results table, the number
of votes was calculated at a specific point in time. However, for the CVAP sources, the estimates for voting age were compiled from
American Community Survey results averaged over a five-year period.
To address the impossibly high voter turnout values for these counties, you will cap the voter turnout at 1
(100%).
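The capping rule described above is a one-line conditional. The sketch below wraps it in a plain Python function so that it can be tested outside ArcGIS Pro. The cap_turnout name is hypothetical, and the None check is an addition for records whose turnout could not be calculated.

```python
def cap_turnout(turnout):
    """Cap a voter turnout fraction at 1 (100%)."""
    if turnout is None:
        return None  # added guard: leave missing values untouched
    return 1 if turnout > 1 else turnout


for value in (0.63, 1.0, 1.2, None):
    print(value, "->", cap_turnout(value))
```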
c From the Geoprocessing pane, return to the Calculate Field geoprocessing tool.
d In the Calculate Field dialog box, set or confirm the following parameters:
f Under voter_turnout_2008 =, type the following expression: 1 if !voter_turnout_2008! > 1 else !voter_turnout_2008!
h Use the following table to modify the tool parameters and re-run this calculation for election years 2012, 2016, and 2020.
j Above the fields panel, click the Add All Fields And Calculate Statistics button to regenerate statistics.
- Answer
1
You have addressed inconsistent values in your table.
For spatial analysis, your data must include location information to determine where each county is located on a map. You will geoenable the
voting data by joining it with existing county geometries.
Several resources are available for finding geoenabled data. ArcGIS Living Atlas of the World is an authoritative source provided by Esri. You will
use a locally stored copy of an ArcGIS Living Atlas dataset that represents county geometries. You will join this feature class with the
CountyPresElect data table.
You have added a feature class to your project. You can now prepare the join between the county geometries and the voting data that you
prepared.
c Open the table view for US_Counties, then answer the following question.
- Hint
? Which field should be used to join the voting table onto these features?
- Answer
The fips field should be used. This field follows the same 5-character syntax as the
FIPS_txt field in the CountyPresElect table.
- Answer
3,112
You have geoenabled your voting data. The Join tool kept all county geometries and assigned voting data to each county in the feature class.
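The behavior described above (every county geometry is kept, and voting attributes are attached where the 5-character codes match) corresponds to a left join on the FIPS key. The following sketch shows that idea with plain Python dictionaries. The helper names normalize_fips and left_join, the zero-padding step, and the turnout values are hypothetical and are not part of the Join tool.

```python
def normalize_fips(code):
    """Return a FIPS code as a zero-padded 5-character string."""
    return str(code).strip().zfill(5)


def left_join(features, table_rows, feature_key="fips", table_key="FIPS_txt"):
    """Keep every feature; attach table attributes where the keys match."""
    lookup = {normalize_fips(row[table_key]): row for row in table_rows}
    extra_fields = [f for f in table_rows[0] if f != table_key] if table_rows else []
    joined = []
    for feature in features:
        match = lookup.get(normalize_fips(feature[feature_key]))
        merged = dict(feature)
        for field in extra_fields:
            # Unmatched features keep their geometry but get null attributes.
            merged[field] = match[field] if match else None
        joined.append(merged)
    return joined


# Hypothetical sample rows; the turnout values are made up for illustration.
counties = [{"fips": "01001"}, {"fips": "02013"}, {"fips": "06037"}]
voting = [
    {"FIPS_txt": "01001", "voter_turnout_2020": 0.63},
    {"FIPS_txt": "6037", "voter_turnout_2020": 0.58},  # missing leading zero
]

joined = left_join(counties, voting)
for row in joined:
    print(row)
```

The output has one row per county. The county with no matching voting record is still present, with a null turnout value.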
Before continuing with your data engineering task, you will set the geoprocessing environments. Environments are additional settings that affect
geoprocessing tools and provide a powerful way to ensure that geoprocessing is performed in a controlled environment.
The Environments dialog box opens. Here, you can set parameters that apply to geoprocessing tools, such as the processing extent that limits
processing to a specific geographic area, a coordinate system for all output geodatasets, or the cell size of output raster datasets.
c Under Processing Extent, click the Extent Of A Layer button and choose US_Counties, as shown in the following graphic.
Next, you will set the data source for the Enrich tool to use demographic variables from the United States because the United States is your study
area.
e Under Business Analyst, next to Data Source, click the Browse button.
The Business Analyst Data Source dialog box opens. In this dialog box, you can set the data source for geoenrichment to a specific country. You
will set the data source to the United States.
f In the Business Analyst Data Source dialog box, on the left side, under Portal, click All Countries And Regions.
g Scroll through the countries listed to see which countries and regions have demographic data available through Esri.
Esri's GeoEnrichment service enables you to query authoritative global data for more than 150 countries and regions. This extensive global data
portfolio allows you to integrate global demographics, business, behavioral, environmental, and places datasets into your own data.
i From the options that display under United States, click Esri 2024.
j Click OK.
Because your study area is the United States, you set the data source so that the demographic variables used to geoenrich your data come from the United States.
Note: For more information on demographic data from Esri, go to Esri Location Data Resources
([Link]
Geoenrichment uses the location of your data to add demographic variables as attributes to your feature class. The Data Engineering view in
ArcGIS Pro allows you to explore potential variables that you would like to add to the counties feature class.
b In the ribbon, from the Data Engineering contextual tab, in the Tools group, click Integrate and choose Enrich.
The four tool galleries on the Data Engineering contextual tab (Clean, Construct, Integrate, and Format) each contain a subset of geoprocessing
tools that can be used for data engineering tasks. You selected the Enrich tool, which enables you to add demographic variables as attributes to
your feature class. The Enrich tool lists the parameters that are required to run the tool. Parameters define the values that are used to run the tool
and its underlying algorithms. To run the Enrich tool, you will need to define the input feature class, a name for the output feature class, and the
variables that will be added to the output feature class.
You will review the workflow for geoenriching your data using the Enrich tool in the Data Engineering view.
d In the Enrich dialog box, confirm that the Input Features parameter is set to US_Counties.
In the Data Browser dialog box, you can explore demographic variables available for data enrichment. Esri provides demographic variables that
are regularly updated with the latest available data. For the United States, Esri also provides attributes from previous censuses (2000 and 2010)
that are recalibrated with the most current census (2020) geography. You can quickly add various demographic variables to your data using the
Enrich tool. You can also add variables that you created or that were shared with you.
g In the Data Browser dialog box, in the Search Variables field, type Median Age and press Enter.
On the left, you have the option to filter the available variables so that you can easily focus your search. To the right of the 2024 Median Age
variable, you see a hashtag and the word Index. For each variable, these icons, along with a percent sign icon, are used to specify whether you
want a total count (hashtag), index, or percentage (percent sign) for the variable.
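To make the three options concrete, the sketch below computes a count, a percentage, and an index for one hypothetical area. The exercise text does not define these formulas. The sketch assumes a common convention: a percentage is the count divided by a base total, and an index expresses the local value relative to a benchmark value, with 100 meaning equal to the benchmark. All function names and numbers are illustrative.

```python
def percent_of_base(count, base):
    """Percentage of a base total represented by a count."""
    return 100.0 * count / base


def index_vs_benchmark(local_value, benchmark_value):
    """Index where 100 means the local value equals the benchmark (assumed)."""
    return 100.0 * local_value / benchmark_value


# Hypothetical values for a single area and a benchmark region.
area_count = 1200          # total count (the hashtag option)
area_base = 8000           # base total for the area
benchmark_percent = 12.0   # same measure for the benchmark region

area_percent = percent_of_base(area_count, area_base)
area_index = index_vs_benchmark(area_percent, benchmark_percent)

print(area_count, area_percent, area_index)
```

Under this assumed convention, an index above 100 indicates that the area's value is higher than the benchmark value.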
The Selected Variables pane helps you keep track of the variables that you select. When a variable is selected, it is automatically listed in the
pane.
j Search for and select the Per Capita Income variable closest in time to 2020.
The variables that you selected are added in the Variables section.
l At the top of the Enrich dialog box, click the Estimate Credits link.
The Enrich tool consumes ArcGIS credits when it is run. When you click Estimate Credits, the banner displays an estimate of the number of ArcGIS
credits that the tool will consume, as well as the number of credits available to you. For this course, your ArcGIS Online organizational account is
allocated 300 credits. You will not enrich the data because an enriched data layer has been provided for you in the next exercise.
m At the top of the Enrich dialog box, click the Close button.
You explored the workflow for geoenriching your data using the Enrich tool.
By completing various data engineering tasks, you cleaned and prepared the election data. Geoenabling and geoenriching the data
provide demographic variables that you can use to model or predict voter turnout.
In the next exercise, you will use various visualization techniques to explore relationships between voter turnout and these variables. You will use
this information to identify potential variables to use in your prediction model later in the MOOC.
o If you would like to perform additional data engineering tasks, proceed to the optional stretch goals; otherwise, close the Data Engineering map,
save the project, and then exit ArcGIS Pro.
Throughout this course, you will see exercise stretch goals. These goals include ways that you can continue or enhance the work that you
completed during the exercise.
Stretch goals are community-supported (meaning that your fellow MOOC participants can assist you with the steps to complete the stretch goal
using the Lesson Forum), and they are a great opportunity to work together to learn.
If you would like to continue engineering your data, use ArcGIS Pro geoprocessing tools to perform the following tasks:
a Identify and remove records with null candidatevotes values in the election data.
b Apply a symbology layer ([Link]) to the 2020 election turnout feature class (out_2020_fc_name).
The [Link] file is located in the DataEngineering_And_Visualization folder. The ArcGIS Pro Help documentation Apply Symbology From Layer
(Data Management) ([Link] describes the process of applying a symbology layer.
Note: Alaska does not have counties. Research its administrative and political subdivisions to
determine how the data would need to be engineered to address this issue.
d Use the Lesson Forum to post your questions and observations. Be sure to include the #stretch hashtag in the posting title.
f If you intend to complete the next stretch goal, leave ArcGIS Pro open; otherwise, close the Data Engineering map, save the project, and then exit
ArcGIS Pro.
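The stretch goal asks for ArcGIS Pro geoprocessing tools, but the logic of the first task (identify, then remove, records with null candidatevotes values) can be sketched in pandas for reference. The sample rows are made up, and the year column and the variable names are assumptions for illustration. Only the candidatevotes and county_fips field names come from the exercise.

```python
import pandas as pd

# Hypothetical sample of election records; the values are made up.
elections_df = pd.DataFrame(
    {
        "county_fips": ["01001", "01001", "01003", "01003"],
        "year": [2020, 2020, 2020, 2020],
        "candidatevotes": [19838, None, 83544, None],
    }
)

# Identify the records whose candidatevotes value is null.
null_mask = elections_df["candidatevotes"].isna()
print(f"Records with null candidatevotes: {int(null_mask.sum())}")

# Remove those records and renumber the remaining rows.
cleaned_df = elections_df.loc[~null_mask].reset_index(drop=True)
print(cleaned_df)
```

Counting the null records before removing them gives you a number to check against the row count of the cleaned table.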
Earlier in the exercise, you learned how to prepare, combine, and validate data using ArcGIS Pro geoprocessing tools. Another option for data
engineering is Python. This exercise stretch goal uses ArcGIS Notebooks in ArcGIS Pro to complete a more detailed and automated version of the
exercise steps.
If you would like to recreate and expand on your data engineering workflow using Python code, complete the following tasks:
a In the Catalog pane, expand Notebooks, then open Data Engineering [Link].
First, you will import the necessary Python modules to execute the cells in the notebook. A Python module is a file that contains Python definitions
and statements. A module can define functions, classes, and variables, and it can include runnable code. You will use the import statement to
import the modules.
b In the notebook, locate the markdown cell titled Import Needed Modules.
The ArcGIS Notebooks interface is built on top of Jupyter Notebook, which structures Python content using cells. Code cells contain executable
Python code, and Markdown cells contain explanatory text and media.
c Under the Import Needed Modules markdown cell, insert a new code cell.
d In the new code cell, write code that uses the import statement to import the following modules: arcgis, pandas, os, and arcpy. Ensure that the
pandas module is imported as the alias pd.
e From the ArcGIS Notebooks toolbar, click Run or press Shift + Enter on your keyboard to run the code cell.
Next, you will specify the source dataset for the elections data. For this stretch goal, you will start with a CSV file containing United States
presidential elections data from the years 2000 to 2020. You must read this file into your notebook as a DataFrame object. You must also specify
that the county_fips attribute field in this data frame uses the object data type, so that the codes are read as text rather than as numbers.
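The effect of the dtype setting can be seen in a small sketch that reads the same CSV text twice, once with default type inference and once with county_fips forced to object. The in-memory CSV text, its values, and the variable names are hypothetical stand-ins. The sketch does not read the course file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the course file.
csv_text = """year,state_po,county_fips,candidatevotes
2020,AL,01001,100
2020,CA,06037,200
"""

# Default inference reads the codes as integers and drops leading zeros.
as_default = pd.read_csv(io.StringIO(csv_text))

# Forcing the object dtype keeps each code as a 5-character string.
as_object = pd.read_csv(io.StringIO(csv_text), dtype={"county_fips": object})

print(as_default["county_fips"].tolist())
print(as_object["county_fips"].tolist())
```

Keeping the codes as text matters later because the join with the county geometries relies on matching 5-character FIPS values.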
f Under the Read Data Into Python markdown cell, in the first code cell, write code that reads the file countypres_2000-[Link] into the notebook.
Take the following into consideration:
· The code should use the pandas read_csv() function to read the file as a DataFrame object.
· The variable name for the data frame should be: elections_complete_df.
· In the read_csv() function, set the dtype parameter to the following: {"county_fips":object}.
g After the closing parenthesis of the read_csv() function, press Enter to start a new line in the cell.
h In the new line, write code to preview the contents of the elections_complete_df data frame.
j In ArcGIS Pro, from the Notebook tab, in the Notebook group, click Save.
Although you did not write all the Python code, it is recommended that you carefully look at the Python syntax and logic in each cell. Reviewing
each cell can help familiarize you with the ArcGIS Notebooks interface and the relevant Python syntax. The notebook can also act as sample code
that you can reference for data engineering tasks.
l Run each cell from start to finish. Review the information in the preceding markdown cells as you run each code cell.
Note: You must run each code cell in the notebook before proceeding to the next.
m Use the Lesson Forum to post your questions, observations, and syntax examples. Be sure to include the #stretch hashtag in the posting title.
n When you are finished, close the Data Engineering map and notebook tabs, save the project, and then exit ArcGIS Pro.
Copyright © Esri
All rights reserved.
The information contained in this document is the exclusive property of Esri. This work is protected under United States copyright law and other
international copyright treaties and conventions. No part of this work may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying and recording, or by any information storage or retrieval system, except as expressly permitted in writing by Esri. All
requests should be sent to Attention: Director, Contracts and Legal, Esri, 380 New York Street, Redlands, CA 92373-8100, USA.
Export Notice: Use of these Materials is subject to U.S. export control laws and regulations including the U.S. Department of Commerce Export
Administration Regulations (EAR). Diversion of these Materials contrary to U.S. law is prohibited.
Commercial Training Course Agreement Terms: The Training Course and any software, documentation, course materials or data delivered with the
Training Course is subject to the terms of the Master Agreement for Products and Services found at the following website: [Link]/TrainingTerms.
The license rights in the Master Agreement strictly govern Licensee's use, reproduction, or disclosure of the software, documentation, course materials
and data. Training Course students may use the course materials for their personal use and may not copy or redistribute for any purpose.
Contractor/Manufacturer is Esri, 380 New York Street, Redlands, CA 92373-8100, USA.
Esri Marks: Esri marks and product names mentioned herein are subject to the terms of use found at the following website: [Link]/EsriMarks.
Other companies and products or services mentioned herein may be trademarks, service marks, or registered marks of their respective mark owners.