Lesson 1 PG2

This document provides an overview and table of contents for the course notes for SAS Programming 2: Data Manipulation Techniques. The course notes were developed by Stacey Syphus, Beth Hardin, and Michele Ensor along with several other contributors. The notes cover topics like controlling DATA step processing, summarizing data, manipulating data with functions, creating custom formats, combining tables, processing repetitive code using DO loops, and restructuring tables. The document lists the 7 lessons that make up the course notes.

SAS® Programming 2: Data Manipulation Techniques

Course Notes
SAS® Programming 2: Data Manipulation Techniques Course Notes was developed by Stacey
Syphus, Beth Hardin, and Michele Ensor. Additional contributions were made by Bruce Dawless,
Anita Hillhouse, Marty Hultgren, Mark Jordan, Eva-Maria Kegelmann, Gina Repole, Samantha
Rowland, Allison Saito, Prem Shah, Kristin Snyder, Peter Styliadis, and Kitty Tjaris. Instructional
design, editing, and production support was provided by the Learning Design and Development
team.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.

SAS® Programming 2: Data Manipulation Techniques Course Notes

Copyright © 2020 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States
of America. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise,
without the prior written permission of the publisher, SAS Institute Inc.

Book code E71658, course code LWPG2V2/PG2V2_001, prepared date 01Apr2020. LWPG2V2_001

ISBN 978-1-62960-550-0

Table of Contents

Lesson 1 Controlling DATA Step Processing
    1.1 Setting Up for This Course
    1.2 Understanding DATA Step Processing
        Demonstration: DATA Step Processing
        Practice
    1.3 Directing DATA Step Output
        Demonstration: Controlling Row Output
        Demonstration: Controlling Column Output
        Practice
    1.4 Solutions
        Solutions to Practices
        Solutions to Activities and Questions

Lesson 2 Summarizing Data
    2.1 Creating an Accumulating Column
        Demonstration: Creating an Accumulating Column
        Practice
    2.2 Processing Data in Groups
        Demonstration: Identifying the First and Last Row in Each Group
        Demonstration: Creating an Accumulating Column within Groups
        Practice
    2.3 Solutions
        Solutions to Practices
        Solutions to Activities and Questions

Lesson 3 Manipulating Data with Functions
    3.1 Understanding SAS Functions and CALL Routines
    3.2 Using Numeric and Date Functions
        Demonstration: Using Numeric Functions
        Demonstration: Shifting Date Values
        Practice
    3.3 Using Character Functions
        Demonstration: Using Character Functions to Extract Words from a String
        Practice
    3.4 Using Special Functions to Convert Column Type
        Demonstration: Using the INPUT and PUT Functions to Convert Column Types
    3.5 Solutions
        Solutions to Practices
        Solutions to Activities and Questions

Lesson 4 Creating Custom Formats
    4.1 Creating and Using Custom Formats
        Demonstration: Creating and Using Custom Formats
        Practice
    4.2 Creating Custom Formats from Tables
        Demonstration: Creating Custom Formats from Tables
        Practice
    4.3 Solutions
        Solutions to Practices
        Solutions to Activities and Questions

Lesson 5 Combining Tables
    5.1 Concatenating Tables
        Demonstration: Concatenating Tables
        Practice
    5.2 Merging Tables
        Demonstration: Merging Tables
    5.3 Identifying Matching and Nonmatching Rows
        Demonstration: Merging Tables with Nonmatching Rows
        Practice
    5.4 Solutions
        Solutions to Practices
        Solutions to Activities and Questions

Lesson 6 Processing Repetitive Code
    6.1 Using Iterative DO Loops
        Demonstration: Executing an Iterative DO Loop
        Demonstration: Using Iterative DO Loops
        Practice
    6.2 Using Conditional DO Loops
        Demonstration: Using Conditional DO Loops
        Demonstration: Combining Iterative and Conditional DO Loops
        Practice
    6.3 Solutions
        Solutions to Practices
        Solutions to Activities and Questions

Lesson 7 Restructuring Tables
    7.1 Restructuring Data with the DATA Step
        Demonstration: Creating a Narrow Table with the DATA Step
        Practice
    7.2 Restructuring Data with the TRANSPOSE Procedure
        Demonstration: Creating a Wide Table with PROC TRANSPOSE
        Practice
    7.3 Solutions
        Solutions to Practices
        Solutions to Activities and Questions



To learn more…
For information about other courses in the curriculum, contact the
SAS Education Division at 1-800-333-7660, or send e-mail to
[email protected]. You can also find this information on the web at
http://support.sas.com/training/ as well as in the Training Course
Catalog.

For a list of SAS books (including e-books) that relate to the topics
covered in these course notes, visit https://www.sas.com/sas/books.html or
call 1-800-727-0025. US customers receive free shipping to US
addresses.
Lesson 1 Controlling DATA Step Processing

1.1 Setting Up for This Course
1.2 Understanding DATA Step Processing
    Demonstration: DATA Step Processing
    Practice
1.3 Directing DATA Step Output
    Demonstration: Controlling Row Output
    Demonstration: Controlling Column Output
    Practice
1.4 Solutions
    Solutions to Practices
    Solutions to Activities and Questions

Copyright © 2020, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.

1.1 Setting Up for This Course

Course Overview

Access data → Explore data → Prepare data → Analyze and report on data → Export results

The complete SAS programming process includes accessing data, exploring and validating data,
preparing data, analyzing and reporting on data, and exporting results. But it is likely that the
majority of your time as a programmer is spent preparing data. For this reason, this course is
focused on the SAS DATA step and various procedures that expand your skills and help make you
more productive working with your data.


• This lesson, “Controlling DATA Step Processing,” digs deeper into the DATA step. You learn how
the DATA step processes data behind the scenes. Then you use this knowledge to control when
and where the DATA step outputs rows to new tables.
• In “Summarizing Data,” you learn how to create an accumulating column—in other words, how to
generate a running total. Then you learn how to process data in groups so that you can perform
an action when each group begins or ends.
• In “Manipulating Data with Functions,” you learn how to use some new functions that enable you
to manipulate numeric, date, and character values. In addition, you learn how to use functions that
change a column from one data type to another.
• In “Creating Custom Formats,” you learn how to create and use custom formats to enhance how
your data is displayed in a table or report.
• In “Combining Tables,” you learn how to concatenate tables, merge tables, and identify matching
and nonmatching rows.
• In “Processing Repetitive Code,” you learn how to save time by taking advantage of iterative and
conditional processing with DO loops.
• In “Restructuring Tables,” you learn techniques that can be used to transpose or restructure a
table.


Practicing in This Course

[Figure: data used in this course: international storm and weather data, US National Park data, European tourism and trade data, and the tables class, cars, and shoes]

In this course, you analyze mainly international storm and weather data. This is real data about
storms such as hurricanes, typhoons, and cyclones that has been collected since 1980. The
practices use various tables from US national parks and European tourism and trade. The course
also uses tables from the Sashelp library to illustrate new data manipulation techniques.
• The detailed international storm data can be found at
https://www.ncdc.noaa.gov/ibtracs/index.php?name=wmo-data as part of the International Best
Track Archive for Climate Stewardship (IBTrACS). The data has been summarized and cleansed
to use in this course.
• The US National Park data can be found at https://irma.nps.gov/Stats/Reports/National. The data
has been summarized and cleansed to use in this course.
• The European tourism data can be found at http://ec.europa.eu/eurostat/data/database. The data
has been summarized and cleansed to use in this course.
• SAS sample tables are provided in the Sashelp library. See
https://support.sas.com/documentation/tools/sashelpug.pdf for documentation about the available
tables.


Practicing in This Course


• Demonstration: Performed by your instructor as an example for you to observe
• Activity: Short practice opportunities for you to work in SAS, either independently or with the guidance of your instructor
• Practice: Extended practice opportunities for you to work independently
• Case Study: A comprehensive practice opportunity at the end of the class

Case studies can be accessed on the Extended Learning page for your course.

Choosing a Practice Level


• Level 1: Solve basic problems with step-by-step guidance.
• Level 2: Solve intermediate problems with defined goals.
• Challenge: Solve complex problems with SAS Help and documentation resources.

Choose one practice to do in class based on your interest and skill level.


SAS Programming Interfaces

• SAS Studio
• SAS Enterprise Guide
• SAS windowing environment

You can use the interface of your choice, but some demonstrations
in this course use features specifically in SAS Enterprise Guide.

SAS has several programming interfaces that you can use to interactively write and submit code.
These interfaces include the SAS windowing environment (the interface that is part of SAS), SAS
Enterprise Guide (a client application that runs on your PC and accesses SAS on a local or remote
server), and SAS Studio (a web-based interface to SAS that you can use on any computer).
Note: In this class, we use SAS Studio and SAS Enterprise Guide because they include the most
modern programming tools.

Accessing the Course Files

[Figure: the course files folder, which contains the activities, data, demos, and practices folders]

Make note of the location of your course files folder.


Programs in the activities, demos, and practices folders follow this naming convention.

p204d01.sas    Programming 2, Lesson 4, demo 1

These folders contain starter SAS programs for you to use. The file names follow this naming
convention: the name starts with p2 for Programming 2, followed by two digits for the lesson number.
Then the letter a, d, or p indicates activity, demo, or practice, followed by a sequential two-digit
number within the lesson. When you come to an activity, demo, or practice, the instructions indicate
the file that you need to open. There is also a solutions folder in the practices folder that has
complete solution programs.
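
As an illustration only, the naming convention can be decoded with a short DATA step. This is a minimal sketch, not part of the course files: the column names Course, Lesson, Code, Number, and Type are made up for the example, and the file name is the one shown on the slide (p204d01.sas).

```sas
data _null_;
    length Type $ 8;                          /* define the length before the first assignment */
    FileName="p204d01.sas";                   /* example file name from the slide */
    Course=substr(FileName,2,1);              /* second character: 2 = Programming 2 */
    Lesson=input(substr(FileName,3,2),2.);    /* two-digit lesson number */
    Code=substr(FileName,5,1);                /* a, d, or p */
    Number=input(substr(FileName,6,2),2.);    /* sequential number within the lesson */
    if Code="a" then Type="activity";
    else if Code="d" then Type="demo";
    else if Code="p" then Type="practice";
    putlog Course= Lesson= Type= Number=;     /* writes the decoded parts to the log */
run;
```

For this file name, the log line should read Course=2 Lesson=4 Type=demo Number=1.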

Creating the Course Data

[Figure: course files folder tree (activities, data, demos, practices), with the cre8data.sas program shown next to the data folder]


1.01 Activity (Required)


1. Navigate to the location of the course files.
   • SAS Studio: In the Navigation pane, expand Files and Folders.
   • SAS Enterprise Guide: In the Servers list, expand Servers > Local > Files.
2. Double-click the cre8data.sas file to open the program.
3. Find the %LET statement. As directed by your instructor, provide the
   path to your course files.
4. Run the program and verify that a report is created listing the generated
   tables.

1.02 Activity (Required)


1. Open the libname.sas program in the course folder. The path macro variable
   should be the folder where your course files are located.
2. Run the code and verify that the library was successfully assigned in the log.
3. Navigate to your list of libraries and expand the pg2 library. Open and view
   the storm_summary SAS table.
   Note: In Enterprise Guide, click Libraries > Refresh to update the library list.

Be sure to run the libname.sas program each time that the SAS session is restarted.
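
The actual contents of libname.sas are not shown in these notes. The following is a minimal sketch of what a program like it might do, assuming that the course tables live in a data folder under the course files location. The path value is hypothetical and must be replaced with your own.

```sas
/* Hypothetical location of the course files; replace with your own path */
%let path=/home/student/PG2;

/* Assumed layout: the course tables are in the data folder under that path */
libname pg2 "&path/data";

/* Optional check: list the tables in the pg2 library without column details */
proc contents data=pg2._all_ nods;
run;
```

A library assigned with a LIBNAME statement lasts only for the current SAS session, which is why the program needs to be rerun after a restart.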


Extending Your Learning

Use your Extended Learning page to download course files and access additional
videos, papers, and other helpful resources!


1.2 Understanding DATA Step Processing

DATA Step Review


data storm_complete;                                 /* read and write tables */
    set pg2.storm_summary_small;
    length Ocean $ 8;
    drop EndDate;                                    /* filter rows and columns */
    where Name is not missing;
    Basin=upcase(Basin);                             /* compute columns */
    StormLength=EndDate-StartDate;
    if substr(Basin,2,1)="I" then Ocean="Indian";    /* conditionally process */
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
run;

Program: p201a03

The DATA step is the primary tool that you use in the SAS programming language for manipulating
data. This DATA step reads and writes tables, filters rows, computes new columns, uses conditional
processing to assign values to a new column, and subsets columns. We know what this code does,
but now we will learn how these statements work behind the scenes to process data.


1.03 Activity
Open p201a03.sas from the activities folder and perform the following tasks:
1. Run the program and examine the log, PROC CONTENTS report, and
output table.
2. Move the DROP statement to the end of the DATA step, just before the
RUN statement. Run the program and examine the log, PROC CONTENTS
report, and output table. Did the results change?
3. Move the LENGTH statement between the DROP and RUN statements.
Run the program and examine the log, PROC CONTENTS report, and
output table. Did the results change?


DATA Step Processing

• Compilation: establish data attributes and rules for execution
• Execution: read, manipulate, and write data

What happens behind the scenes when a DATA step runs?

To truly understand the DATA step and take advantage of its many powerful and unique features,
you must understand exactly how the DATA step processes data behind the scenes.
The DATA step follows a very logical process that is easy to customize to your data processing
needs. When you run a DATA step, it goes through two phases: compilation and execution. In the
compilation phase, SAS prepares the code and establishes data attributes and the rules for
execution. In the execution phase, SAS follows those rules to read, manipulate, and write data.


DATA Step Processing: Compilation

Compilation
1) Check for syntax errors.
2) Create the program data vector (PDV), which includes all columns and attributes.
3) Establish the specifications for processing data in the PDV during execution.
4) Create the descriptor portion of the output table.

PDV
Season   Name   StartDate   Ocean
N 8      $ 25   N 8         $ 8

The PDV is the magic behind the DATA step's processing power!

In the compilation phase, SAS runs through the program to check for syntax errors. If there are no
errors, SAS builds a critical area of memory called the program data vector, or PDV for short. The
PDV includes each column referenced in the DATA step and its attributes, including the column
name, type, and length. The PDV is used in the execution phase to hold and manipulate one row of
data at a time. Also in the compilation phase, SAS establishes rules for the PDV based on the code,
such as which columns will be dropped, or which rows from the input table will be read into the PDV.
Finally, SAS creates the descriptor portion, or the table metadata.


DATA Step Processing: Compilation


/* Define the library and a name for the output table. */
data storm_complete;
    set pg2.storm_summary_small;
    length Ocean $ 8;
    drop EndDate;
    where Name is not missing;
    Basin=upcase(Basin);
    StormLength=EndDate-StartDate;
    if substr(Basin,2,1)="I" then Ocean="Indian";
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
run;

Program: p201d01

The DATA statement creates the output table Storm_Complete in the Work library.

DATA Step Processing: Compilation


data storm_complete;
    /* Columns are added to the PDV in the order in which they appear in the input table. */
    set pg2.storm_summary_small;
    length Ocean $ 8;
    drop EndDate;
    where Name is not missing;
    Basin=upcase(Basin);
    StormLength=EndDate-StartDate;
    if substr(Basin,2,1)="I" then Ocean="Indian";
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
run;

PDV
Name   Basin   MaxWind   StartDate   EndDate
$ 15   $ 2     N 8       N 8         N 8

Attributes are inherited from the input table.

To build the PDV, SAS passes through the DATA step sequentially, adding columns and their
attributes. The SET statement in this program is listed first, so all of the columns from the
storm_summary_small table are added to the PDV along with the required column attributes
name, type, and length. Optional attributes such as formats or labels might also be included for
columns that have them.


DATA Step Processing: Compilation


data storm_complete;
    set pg2.storm_summary_small;
    length Ocean $ 8;
    drop EndDate;
    where Name is not missing;
    Basin=upcase(Basin);
    StormLength=EndDate-StartDate;
    if substr(Basin,2,1)="I" then Ocean="Indian";
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
run;

The remaining columns are added to the PDV in the order in which they appear in the DATA step.
Each column must have at least a name, type, and length.

PDV
Name   Basin   MaxWind   StartDate   EndDate   Ocean   StormLength
$ 15   $ 2     N 8       N 8         N 8       $ 8     N 8

Any other statements that define new columns will add to the PDV. The LENGTH statement is next
after SET, and it explicitly defines the character column Ocean with a length of 8. StormLength is
the last new column, and, based on the arithmetic expression, it is defined as a numeric column with
a default length of 8.
Ocean appears in assignment statements later in the step. However, after a column and its
attributes are established in the PDV, they cannot be changed. That is why the LENGTH statement
must occur before the IF-THEN statements. Otherwise, the assignment statement Ocean="Indian"
would be the first statement that SAS would use to define Ocean with a length of 6. Remember, SAS
is not processing data at this point, so the IF expression is not evaluated. SAS is simply looking for the
first definition of any new column that it must add to the PDV.
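
The effect of leaving out the LENGTH statement can be seen in a small, self-contained sketch. The table name ocean_demo and the three inline Basin values are made up for illustration and are not part of the course data.

```sas
/* Sketch: no LENGTH statement, so the first assignment (Ocean="Indian") */
/* defines Ocean as a character column with a length of 6.              */
data ocean_demo;
    input Basin $;
    if substr(Basin,2,1)="I" then Ocean="Indian";
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
    datalines;
NA
SI
EP
;
run;

proc print data=ocean_demo;
run;
```

In the printed table, the values for NA and EP are truncated to Atlant and Pacifi, because only six characters fit in Ocean. Adding length Ocean $ 8; before the IF-THEN statements stores the full values.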


DATA Step Processing: Compilation


data storm_complete;
    set pg2.storm_summary_small;
    length Ocean $ 8;
    /* DROP or KEEP statements flag columns that will be excluded from the output table. */
    drop EndDate;
    where Name is not missing;
    Basin=upcase(Basin);
    StormLength=EndDate-StartDate;
    if substr(Basin,2,1)="I" then Ocean="Indian";
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
run;

PDV (D = drop flag)
Name   Basin   MaxWind   StartDate   EndDate   Ocean   StormLength
$ 15   $ 2     N 8       N 8         N 8       $ 8     N 8
                                     D

There are certain statements that are specific to the compilation phase and establish the behavior of
the PDV. The DROP statement does not remove a column from the PDV. Instead, SAS marks the
column with a drop flag so that it is dropped later in execution. In this program, EndDate will
eventually be dropped from the output data, but it is still available to use in the PDV for
calculating the column StormLength.
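
The following minimal sketch shows the drop flag in action on one made-up row. The table name drop_demo and the two dates are hypothetical and are used only so that the example runs without the course data.

```sas
data drop_demo;
    input StartDate :date9. EndDate :date9.;
    drop EndDate;                       /* flags EndDate; it stays in the PDV */
    StormLength=EndDate-StartDate;      /* EndDate is still available here */
    format StartDate date9.;
    datalines;
01JAN2020 04JAN2020
;
run;

proc print data=drop_demo;
run;
```

The output table contains only StartDate and StormLength, and StormLength is 3, even though the DROP statement appears before the calculation that uses EndDate.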

DATA Step Processing: Compilation


data storm_complete;
    set pg2.storm_summary_small;
    length Ocean $ 8;
    drop EndDate;
    /* The WHERE statement establishes conditions for which rows will be read */
    /* from the input table into the PDV.                                     */
    where Name is not missing;
    Basin=upcase(Basin);
    StormLength=EndDate-StartDate;
    if substr(Basin,2,1)="I" then Ocean="Indian";
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
run;

PDV (D = drop flag)
Name   Basin   MaxWind   StartDate   EndDate   Ocean   StormLength
$ 15   $ 2     N 8       N 8         N 8       $ 8     N 8
                                     D

The WHERE statement defines which rows are read from the input table into the PDV during
execution.
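
One way to observe that rows excluded by WHERE never reach the PDV is to record the automatic variable _N_, which counts DATA step iterations. This is a minimal sketch that assumes the Sashelp.Class sample table is available. The names where_demo and Iteration are made up for illustration.

```sas
data where_demo;
    set sashelp.class;
    where Sex="F";       /* compile-time rule: only matching rows are read */
    Iteration=_N_;       /* _N_ counts the iterations of the DATA step */
run;

proc print data=where_demo;
    var Name Sex Iteration;
run;
```

Iteration should run 1, 2, 3, and so on with no gaps, because the DATA step iterates only for rows that satisfy the WHERE expression.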


DATA Step Processing: Compilation


data storm_complete;
    set pg2.storm_summary_small;
    length Ocean $ 8;
    drop EndDate;
    where Name is not missing;
    Basin=upcase(Basin);
    StormLength=EndDate-StartDate;
    if substr(Basin,2,1)="I" then Ocean="Indian";
    else if substr(Basin,2,1)="A" then Ocean="Atlantic";
    else Ocean="Pacific";
run;

The descriptor portion is created for the output table.

work.storm_complete
Name   Basin   MaxWind   StartDate   Ocean   StormLength
$ 15   $ 2     N 8       N 8         $ 8     N 8

Finally, the descriptor portion of the output table is complete. Notice that the EndDate column is not
included in the descriptor portion of the output table.

1-18 Lesson 1 Controlling DATA Step Processing

DATA Step Processing: Execution

Execution
1) Initialize the PDV.
2) Read a row from the input table into the PDV.
3) Sequentially process statements and update values in the PDV.
4) At the end of the step, write the contents of the PDV to the output table.
5) Return to the top of the DATA step.

data output-table;
    set input-table;
    ...other statements...
run;
Implicit OUTPUT;
Implicit RETURN;

Automatic looping makes processing data easy!

When the compilation phase is complete, the program is ready for action in the execution phase. In
this phase, SAS reads data, processes it in the PDV, and outputs it to a new table.
DATA step execution acts like an automatic loop. The first time through the DATA step, the SET
statement reads the first row from the input table, and then processes any other statements in
sequence, manipulating the values in the PDV. When SAS reaches the end of the DATA step, there
is an implied OUTPUT action so that the contents of the PDV, minus any columns flagged for
dropping, are written as the first row in the output table. The DATA step then automatically loops
back to the top and executes the statements in order again, this time reading, manipulating and
outputting the next row. That implicit loop continues until all of the rows are read from the input table.
Compile-time statements such as DROP, LENGTH, and WHERE are not executed for each row.
However, because of the rules that they established in the compilation phase, their impact will be
observed in the output table.
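The implicit loop can be made visible with a short sketch. It assumes the sashelp.class sample table and a hypothetical output table named work.loop_demo; neither comes from the course files.

```sas
data work.loop_demo;
   set sashelp.class(obs=3);          /* read only the first three rows          */
   putlog "Iteration " _N_= Name=;    /* runs once per pass through the step     */
run;                                  /* implicit OUTPUT, then implicit RETURN   */
```

The log should contain one PUTLOG line per row read, with _N_ increasing by 1 each time, and the output table should have one row per iteration.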


DATA Step Processing in Action

You can watch execution happen one statement at a time in the Enterprise Guide DATA step debugger.


A great way to learn how data is processed in the execution phase is to watch it happen statement
by statement, and row by row. This is possible using an interactive debugging tool unique to SAS
Enterprise Guide. We will use the DATA step debugger to peek behind the scenes and watch the
impact of each statement on the values in the PDV as the step executes.
For more information about using the Enterprise Guide DATA step debugger, see
https://support.sas.com/resources/papers/proceedings17/SAS0447-2017.pdf.
The SAS windowing environment also provides an interactive DATA step debugger. It can be
accessed by adding the DEBUG option in the DATA statement:

DATA table / DEBUG;

Visit the Using the DATA Step Debugger page in SAS Help for more details.


DATA Step Processing

Scenario
Use the DATA step debugger in SAS Enterprise Guide to observe the process of execution.

Files
• p201d01.sas
• storm_summary_small – a SAS table that has one row per storm for the 1980 through 2016
storm seasons

Notes
• The DATA step is processed in two phases: compilation and execution.
• During compilation, SAS creates the program data vector (PDV) and establishes data attributes
and rules for execution.
• The PDV is an area of memory established in the compilation phase. It includes all columns that
will be read or created, along with their assigned attributes. The PDV is used in the execution
phase to hold and manipulate one row of data at a time.
• During execution, SAS reads, manipulates, and writes data. All data manipulation is performed in
the PDV.

Demo
Note: This demo must be performed in Enterprise Guide.
1. Open the p201d01.sas program in the demos folder.

2. The DATA step markers for debugging toolbar button enables debugging in the
program. If this option is enabled, you see the same icon and a green bar next to each DATA
step in your program.
3. Click the Debugger icon next to the DATA statement. The DATA Step Debugger window
appears.
4. At this point, the compilation phase is complete and the PDV is displayed on the right side of the
window. Notice that all columns read from the storm_summary_small table start with a missing
value.
5. Two additional columns are included in the PDV during execution. _ERROR_ is 0 by default but
is set to 1 whenever a data error is encountered, such as a value that cannot be read or
calculated. _N_ is initially set to 1. Each time the DATA step loops past the DATA statement, the
variable _N_ increments by 1. The value of _N_ represents the number of times that the DATA
step has iterated.

6. Click Step execution to next line to execute the highlighted SET statement and step to the
next executable statement. Recall that during the compilation phase, the WHERE statement
established a rule to read rows into the PDV only where Name is not missing. The first two rows
of the input table have missing values for Name, so the third row is read. However, because this
is the first iteration of the DATA step, _N_ is still equal to 1. Values for the Name, Basin,
MaxWind, StartDate, and EndDate columns are assigned in the PDV.


Note: Red text in the Value column represents data values that were updated with the
execution of the previously highlighted statement.
7. The assignment statement for Basin is the next executable statement. LENGTH, DROP, and
WHERE are compile-time statements. Click Step execution to next line twice to execute
the Basin and StormLength assignment statements. Notice that Basin was already in
uppercase and did not change, but a value of 6 was assigned to StormLength.

8. Click Step execution to next line to execute the IF, ELSE IF, and ELSE statements. After
line 10, Pacific is assigned to Ocean.

9. With the RUN statement highlighted, click Step execution to next line . As the concluding
step boundary for the DATA step, the RUN statement triggers an implicit output. The values in
the PDV are written as the first row in storm_complete. After the implicit output, the process
returns to the top of the DATA step.
Note: While debugging a program, the output table is not created. When the program runs
outside of the debugger, the implicit output writes rows to the output table.
10. Notice that _N_ is now 2, representing the second iteration of the DATA step. Columns read from
the SET table retain their values. However, the new computed columns, Ocean and
StormLength, are reset to missing. This action is called reinitializing the PDV.

11. Click Step execution to next line to step through the program until line 8. Notice that the
value of Basin is SI, so the IF condition is true. Execute the IF statement, and SAS assigns
Indian to the Ocean column. The remaining ELSE statements are skipped and RUN is
highlighted.
12. Execute the RUN statement. _N_ is increased to 3, and the PDV is reinitialized.

13. Click Start/continue debugger execution to proceed through the rest of execution. Close
the DATA step debugger.
Note: The DATA step debugger is available by default in other programs. To suppress the
debugger icon in the editor, click the DATA step markers for debugging toolbar button.
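Without the debugger, the reinitialization described in steps 10 and 12 can be approximated with PUTLOG statements. This is only a sketch: it assumes the sashelp.class sample table, and the names work.reinit_demo and AgeNextYear are hypothetical.

```sas
data work.reinit_demo;
   /* Top of each iteration: Age (read by SET) keeps its previous value,   */
   /* while the computed column AgeNextYear has been reset to missing.     */
   putlog "Top of step:  " _N_= Age= AgeNextYear=;
   set sashelp.class(keep=Age obs=2);
   AgeNextYear=Age+1;
   putlog "Before RUN:   " _N_= Age= AgeNextYear=;
run;
```

Because the first PUTLOG statement runs before SET, the log should show one extra "Top of step" line for the iteration in which SET finds no more rows and the step ends.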


Viewing Execution in the Log


PUTLOG _ALL_;       writes all columns and values in the PDV to the log
PUTLOG column=;     writes selected columns and values in the PDV to the log
PUTLOG "message";   writes a text string to the log

If you don’t have the interactive debugger, use the PUTLOG statement to write information about
execution to the log.


If you do not have access to the interactive DATA step debugger in Enterprise Guide, you can add
PUTLOG statements to your code so that you can examine the contents of the PDV at any time
during execution.
The _ALL_ keyword writes all columns in the PDV and their values to the log, and column= writes
one or more specific columns and their values to the log. A quoted "message" writes a text string
that you specify to the log.
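The three forms can be combined in one step, as in the following sketch. It assumes the sashelp.class sample table; DATA _NULL_ is used so that no output table is created.

```sas
data _null_;
   set sashelp.class(obs=2);
   putlog "Row read by SET";    /* text string             */
   putlog Name= Age=;           /* selected columns        */
   putlog _all_;                /* every column in the PDV */
run;
```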

Viewing Execution in the Log

The OBS= data set option limits the observations that are read.

data storm_complete;
    set pg2.storm_summary_small(obs=2);
    putlog "PDV after SET Statement";
    putlog _all_;
    ...

Log:
PDV after SET Statement
Name=AGATHA Basin=EP MaxWind=115 StartDate=09JUN1980
EndDate=15JUN1980 Ocean= StormLength=. _ERROR_=0 _N_=1
PDV after SET Statement
Name=ALBINE Basin=SI MaxWind=. StartDate=27NOV1979
EndDate=06DEC1979 Ocean= StormLength=. _ERROR_=0 _N_=2

1.04 Activity
Open p201a04.sas from the activities folder and perform the following tasks:
1. Examine the PUTLOG statements that are in the DATA step.
2. Add two PUTLOG statements before the RUN statement to print "PDV
before RUN statement" and write all columns in the PDV to the log. Run
the program.
3. View the log. What is the value of StormLength at the end of the second
iteration of the DATA step?
4. Type NOTE: (use uppercase and include the colon) inside the quotation
marks of the following PUTLOG statement. Run the program. What
changes in the log?
putlog "NOTE: PDV before RUN statement";

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
1. Using the DATA Step Debugger to Examine Execution Steps
Examine the National Park data that is used in most practices. Use the DATA step debugger to
follow the steps of execution in a DATA step that reads the np_final table.
Note: This practice must be performed in SAS Enterprise Guide to use the interactive DATA
step debugger. If you did not do the first activities in Enterprise Guide, first open and run
the libname.sas program.
a. In Enterprise Guide, use the Servers list to expand Servers > Local > Libraries > PG2.
Double-click np_final to open the table. The table includes one row per US national park.
Note that the first row in the table is Cape Krusenstern National Monument.
b. Become familiar with the following columns in the np_final table:
• Region (Alaska, Intermountain, Midwest, National Capital, Northeast, Pacific West, and
Southeast)
• Type (Monument, Park, Preserve, River, Seashore)
• ParkName (full name of national park)
• DayVisits (number of daily visitors in 2017)
• Campers (number of campers in 2017)
• OtherLodging (number of people in other lodging, including cabins and hotels, in 2017)
• Acres (total park size in acres)
c. Open p201p01.sas in the practices folder of the course files. Click DATA step markers for
debugging to enable debugging in the program. Click the Debugger icon next to
the DATA statement. The DATA Step Debugger window appears.
d. How many variables are in the PDV? What are the initial values?

e. Click Step execution to next line to execute the highlighted SET statement. Recall that
the first row of the np_final table is Cape Krusenstern National Monument. Why was the first
row not read into the PDV in the first iteration of the DATA step?

f. Click Step execution to next line to step through the remaining statements in the DATA
step. Which statements are executable? Which statements are compile-time only?
g. Exit the debugger and run the program to view the output table.
Note: The DATA step debugger is available by default in other programs. To suppress the
debugger icon in the editor, click DATA step markers for debugging .


Level 2
2. Using PUTLOG Statements to Examine Execution Steps
a. Open p201p02.sas in the practices folder of the course files. Examine the program and
answer the following questions:
1) Which statements are compile-time only?
2) What will be assigned for the length of Size?
b. Run the program and examine the results.
c. Modify the program to resolve the truncation of Size. Read the first five rows from the input
table.
d. Add PUTLOG statements to provide the following information in the log:
1) Immediately after the SET statement, write START DATA STEP ITERATION to the log
as a color-coded note.
2) After the Type= assignment statement, write the value of Type to the log.
3) At the end of the DATA step, write the contents of the PDV to the log.
e. Run the program and read the log to examine the messages written during execution.


1.3 Directing DATA Step Output

Controlling DATA Step Processing

data output-table;
    set input-table;
    ...other statements...
run;
Implicit OUTPUT;
Implicit RETURN;

You can alter the default DATA step processing rules to control how the steps of execution proceed.


Now that we have a better idea of what happens with the PDV during the compilation and execution
phases of DATA step processing, we can use the knowledge to our advantage. There are times
when the default processing rules of execution are perfectly fine for your needs. But there are other
times when you need to alter those rules to process your data in a different way. The DATA step
provides syntax that enables you to control exactly how the steps of execution proceed.


Controlling Output
Use the DATA step to create a table that includes a sales forecast for each of the next three years.

[Figure: the sashelp.shoes input table and the forecast output table]


Let’s start by focusing on the implicit output that occurs at the RUN statement. By default, SAS reads
one row from the input table, manipulates the values, and writes that updated row to the output
table. But what if you want to control exactly when and where each row is written?
To illustrate this, let’s look at the sashelp.shoes table. Each row includes the annual sales for
Region, Product, and Subsidiary. Suppose we want to create a table that includes a sales forecast
for each of the next three years, assuming that sales increase by 5% annually.

Controlling Output
[Figure: the sashelp.shoes input table and the forecast output table]

data forecast;
    set sashelp.shoes;
    keep Region Product Subsidiary Year ProjectedSales;
    format ProjectedSales dollar10.;
    Year=1;
    ProjectedSales=Sales*1.05;
    Year=2;
    ProjectedSales=ProjectedSales*1.05;
    Year=3;
    ProjectedSales=ProjectedSales*1.05;
run;

Will this program write three rows for every one row that it reads?

1.05 Activity
Enterprise Guide: Open p201a05a.sas from the activities folder.
1. Use the DATA step debugger to step through one iteration of the DATA
step. Observe the values of Year and ProjectedSales as they are updated.
2. Close the debugger and run the program. Examine the log and output
data. How many rows are in the input and output tables?

SAS Studio: Open p201a05b.sas from the activities folder.


1. Run the program. Observe the values of Year and ProjectedSales written
in the log.
2. How many rows are in the input and output tables?
Keep the program open for the next activity.

Note: If you are using the windowing environment, follow the steps for SAS Studio.


Implicit Output

data forecast;
set sashelp.shoes;
keep Region Product Subsidiary Year ProjectedSales;
format ProjectedSales dollar10.;
Year=1;
ProjectedSales=Sales*1.05;
Year=2;
ProjectedSales=ProjectedSales*1.05;
Year=3;
ProjectedSales=ProjectedSales*1.05;
run;
Implicit OUTPUT;
Implicit RETURN;


By default, SAS sequentially executes all appropriate statements in the DATA step, and when it
reaches the end of the DATA step, it implicitly outputs the data in the PDV as a row in the output
table. Then SAS automatically loops back to the top of the DATA step and goes through the same
process for the next row.
In this program, although the value of ProjectedSales is calculated for years 1, 2, and 3, the implicit
output occurs only once at the bottom of the loop. When SAS reaches the end of the DATA step, the
values in the PDV are for year 3, and those values are written to the output table.


Explicit Output

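OUTPUT;

Without an explicit OUTPUT statement:

data forecast;
    set sashelp.shoes;
    ...
run;
Implicit OUTPUT;
Implicit RETURN;

With an explicit OUTPUT statement:

data forecast;
    set sashelp.shoes;
    ...
    output;
run;
Implicit OUTPUT;   (suppressed by the explicit OUTPUT statement)
Implicit RETURN;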


You can use an explicit OUTPUT statement in the DATA step to force SAS to write the contents of
the PDV to the output table at specific points in the program.
If you use an explicit OUTPUT statement anywhere in a DATA step, there is no implicit output at the
end of the DATA step. The implicit return still returns processing to the top of the DATA step.
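A minimal sketch of explicit output follows. It assumes the sashelp.class sample table; the output name work.growth and the 2-inch increase are hypothetical values chosen only to make the two rows differ.

```sas
data work.growth;
   set sashelp.class(keep=Name Height obs=2);
   Year=0;
   output;              /* write the row as it was read                     */
   Year=1;
   Height=Height+2;     /* hypothetical growth of 2 inches                  */
   output;              /* write a second row from the same PDV             */
run;                    /* no implicit output: OUTPUT appears in the step   */
```

Each input row should produce two output rows, so reading two rows should yield four.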

1.06 Activity
Modify the p201a05 program that you have open from the previous activity.
1. Add an explicit OUTPUT statement after each ProjectedSales assignment
statement. Run the program. How many rows are in the output table?
2. Comment out the final OUTPUT statement and run the program again. Are
rows where Year=3 written to the new table?


Sending Output to Multiple Tables

DATA table1 <table2...>;

OUTPUT table1 <table2...>;

data sales_high sales_low;
    set sashelp.shoes;
    if Sales>100000 then output sales_high;
    else output sales_low;
run;


The OUTPUT statement controls when to output, but one of the great features is that it also controls
where to output. The DATA step can create multiple tables simultaneously simply by listing more
than one table in the DATA statement. You can use the OUTPUT statement followed by the name of
the table to indicate where SAS should write the contents of the PDV.
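The same pattern can be sketched on another table. This example assumes the sashelp.class sample table, and the table names work.teens and work.preteens are hypothetical.

```sas
data work.teens work.preteens;        /* both tables are listed in the DATA statement */
   set sashelp.class;
   if Age>=13 then output work.teens; /* OUTPUT names the destination table           */
   else output work.preteens;
run;
```

Every input row is written to exactly one of the two tables, so their row counts should add up to the number of rows read.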


Controlling Row Output

Scenario
Create multiple output tables in a single DATA step and use IF-THEN/ELSE logic to designate which
rows are written to each table.

Files
• p201d02.sas
• storm_summary – a SAS table that has one row per storm for the 1980 through 2016 storm
seasons

Notes
• By default, the end of a DATA step causes an implicit output, which writes the contents of the PDV
to the output table.
• The explicit OUTPUT statement can be used in the DATA step to control when and where each
row is written.
• If an explicit OUTPUT statement is used in the DATA step, it disables the implicit output at the end
of the DATA step.
• One DATA step can create multiple tables by listing each table name in the DATA statement.
• The OUTPUT statement followed by a table name writes the contents of the PDV to the specified
table.

Demo
1. Open the p201d02.sas program in the demos folder and find the Demo section. Modify the
DATA statement to create three tables named indian, atlantic, and pacific.
data indian atlantic pacific;


2. Modify the IF-THEN/ELSE conditional statements to write output to the appropriate table.
data indian atlantic pacific;
set pg2.storm_summary;
length Ocean $ 8;
Basin=upcase(Basin);
StormLength=EndDate-StartDate;
MaxWindKM=MaxWindMPH*1.60934;
if substr(Basin,2,1)="I" then do;
Ocean="Indian";
output indian;
end;
else if substr(Basin,2,1)="A" then do;
Ocean="Atlantic";
output atlantic;
end;
else do;
Ocean="Pacific";
output pacific;
end;
run;
3. Add a DROP statement to remove MaxWindMPH. Highlight the DATA step, run the selected
code, and examine the output tables. Notice that MaxWindMPH has been dropped from all three
tables.
drop MaxWindMPH;


Controlling Column Output


data sales_high
     sales_low;
    set sashelp.shoes;
    ...
    drop Inventory Returns;
run;

A DROP or KEEP statement applies to all output tables listed in the DATA statement.

PDV
Region   Product   Subsidiary   Stores   Sales   Inventory   Returns
$ 25     $ 14      $ 12         N 8      N 8     N 8         N 8
                                                 D           D


When you use a DROP or KEEP statement in the DATA step, the column is flagged for dropping or
keeping in the PDV, so the action applies to all of the tables listed in the DATA statement.

Controlling Column Output


table(DROP=col1 col2...)
table(KEEP=col1 col2...)

data sales_high(drop=Returns)
     sales_low(drop=Inventory);
    set sashelp.shoes;
    ...
    /* drop Inventory Returns;  replaced by the DROP= data set options above */
run;

A DROP= or KEEP= data set option applies to the output table that it follows.

PDV
Region   Product   Subsidiary   Stores   Sales   Inventory   Returns
$ 25     $ 14      $ 12         N 8      N 8     N 8         N 8
                                                 D           D


You can use the DROP= or KEEP= data set options to specify a unique list of columns for each table
listed in the DATA statement. The PDV keeps track of columns to drop from the specific output table.
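Combining the slide fragments above with the earlier IF-THEN/ELSE logic gives the following sketch. The work library prefix is an assumption added for clarity.

```sas
data work.sales_high(drop=Returns)      /* Returns is dropped only from sales_high  */
     work.sales_low(drop=Inventory);    /* Inventory is dropped only from sales_low */
   set sashelp.shoes;
   if Sales>100000 then output work.sales_high;
   else output work.sales_low;
run;
```

After the step runs, sales_high should still contain Inventory and sales_low should still contain Returns.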


Controlling Column Output

Scenario
Control which columns are read into the PDV and written to each output table with DROP= or KEEP= data set options.

Files
• p201d03.sas
• storm_summary – a SAS table that has one row per storm for the 1980 through 2016 storm
seasons

Notes
• DROP= or KEEP= data set options can be added on any table in the DATA statement.
• Columns that will be dropped are flagged in the PDV and are not dropped until the row is output to
the designated table. Therefore, dropped columns are still available for processing in the DATA
step.
• DROP= or KEEP= data set options can be added in the SET statement to control the columns that
are read into the PDV. If a column is not read into the PDV, it is not available for processing in the
DATA step.

Demo
Note: This demo must be performed in Enterprise Guide.
1. Open the p201d03.sas program in the demos folder and find the Demo section. Use the
DROP= data set option to drop MaxWindMPH from the indian table and MaxWindKM from the
atlantic table. Do not drop any columns from the pacific table.
data indian(drop=MaxWindMPH) atlantic(drop=MaxWindKM) pacific;
2. Start the DATA step debugger. Note that MaxWindMPH and MaxWindKM are included in the
PDV.
3. Close the debugger, run the program, and examine the three output tables. MaxWindMPH has
been dropped from the indian table, MaxWindKM has been dropped from the atlantic table,
and the pacific table has all columns.
4. Add a DROP= data set option in the SET statement to drop MinPressure. Start the debugger.
Notice that MinPressure is not included in the PDV.
set pg2.storm_summary(drop=MinPressure);
5. Close the debugger, run the program, and examine the three output tables. Confirm that
MinPressure has been dropped from each table.


Controlling Column Input

SET table(DROP=col1 col2...)
SET table(KEEP=col1 col2...)

If you use DROP= or KEEP= in the SET statement, the excluded columns are not added to the PDV.

[Diagram: input table, PDV]


It is important to think about whether you need columns for processing when you are deciding where
to drop or keep them. When you use a DROP= or KEEP= data set option on a table in the SET
statement, the excluded columns are not read into the PDV and are not available for processing. It
does not delete columns from the original data.
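A short sketch of controlling column input follows. It assumes the sashelp.class sample table; the names work.class_small and Ratio are hypothetical.

```sas
data work.class_small;
   set sashelp.class(keep=Name Height Weight);   /* only these columns enter the PDV */
   Ratio=Weight/Height;                          /* both inputs were kept, so this works */
   /* Age and Sex were never read, so they cannot be used for processing here. */
run;
```

The sashelp.class table itself is unchanged; the KEEP= option only limits what this step reads.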

Controlling Column Output

DATA table(DROP=col1 col2...)      DROP col1 col2...;
DATA table(KEEP=col1 col2...)      KEEP col1 col2...;

If you use DROP= or KEEP= in the DATA statement, the excluded columns are not added to output.

[Diagram: input table, PDV, output table]


When you use a DROP or KEEP statement or a DROP= or KEEP= data set option in the DATA
statement, the columns are included in the PDV and can be used for processing. They are flagged to
be dropped when an implicit or explicit output is reached.


1.07 Question

data indian(drop=MaxWindMPH)
     atlantic(drop=MaxWindKM)
     pacific;
    set pg2.storm_summary;
    StormLength=EndDate-StartDate;
    ...
run;

What would be the easiest way to drop EndDate from all three tables?


Beyond SAS Programming 2


What if you want to ...

. . . access SAS DATA step programming documentation?
• Visit the SAS Help Center and the DATA Step Programming section.

. . . learn about writing multi-threaded DATA step code in SAS Viya?
• Watch free videos about programming in SAS Viya.
• Take the Programming for SAS Viya course.

. . . use alternate syntax to IF-THEN/ELSE that is similar to SQL?
• Learn about the SELECT-WHEN-OTHERWISE statement in SAS Help.
• Complete the Challenge practice.




Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
3. Conditionally Creating Multiple Output Tables
The pg2.np_yearlytraffic table contains annual traffic counts at locations in national parks.
Parks are classified as one of five types: National Monument, National Park, National Preserve,
National River, and National Seashore.
a. Open the p201p03.sas program from the practices folder. Modify the DATA step to create
three tables: monument, park, and other. Use the value of ParkType as indicated above to
determine which table the row is output to.
b. Drop ParkType from the monument and park tables. Drop Region from all three tables.
c. Submit the program and verify the output.
The notes in the SAS log indicate how many rows are in each table.
NOTE: There were 478 observations read from the data set PG2.NP_YEARLYTRAFFIC.
NOTE: The data set WORK.MONUMENT has 84 observations and 3 variables.
NOTE: The data set WORK.PARK has 246 observations and 3 variables.
NOTE: The data set WORK.OTHER has 148 observations and 4 variables.

Level 2
4. Conditionally Creating Columns and Output Tables
The pg2.np_2017 table contains monthly public use figures for national parks.
a. Create a new program. Write a DATA step that creates temporary SAS tables named
camping and lodging and reads the pg2.np_2017 table.
b. Compute a new column, CampTotal, that is the sum of CampingOther, CampingTent,
CampingRV, and CampingBackcountry. Format CampTotal so that values are displayed
with commas.
c. The camping table has the following specifications:
1) includes rows if CampTotal is greater than zero
2) contains the ParkName, Month, DayVisits, and CampTotal columns
d. The lodging table has the following specifications:
1) includes rows where LodgingOther is greater than zero
2) contains only the ParkName, Month, DayVisits, and LodgingOther columns
e. Submit the program and verify the output. The notes in the SAS log indicate how many rows
are in each table.
NOTE: The data set WORK.CAMPING has 1374 observations and 4 variables.
NOTE: The data set WORK.LODGING has 383 observations and 4 variables.


Challenge
5. Processing Statements Conditionally with SELECT-WHEN Groups
SELECT and WHEN statements can be used in a DATA step as an alternative to IF-THEN
statements to process code conditionally.
a. Open the p201p05.sas program in the practices folder. The program contains the solution
programs for Practices 3 and 4.
b. Use SAS Help or online documentation to read about using SELECT and WHEN statements
in the DATA step.
c. Modify the Practice 3 program to use SELECT groups and WHEN statements.
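For orientation, the general shape of a SELECT group is sketched below on a different table. It assumes the sashelp.class sample table; the names work.class_groups and AgeGroup and the age cutoffs are hypothetical.

```sas
data work.class_groups;
   set sashelp.class;
   length AgeGroup $ 8;
   select;                                   /* the first true WHEN condition is used */
      when (Age<13) AgeGroup="Preteen";
      when (Age<16) AgeGroup="Teen";
      otherwise AgeGroup="Older";            /* runs when no WHEN condition is true   */
   end;
run;
```

To run more than one statement for a condition, a DO-END block can follow the WHEN or OTHERWISE keyword.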
