Functionality of PROC SQL in SAS
Anne Wolfley
Senior Programmer/Analyst
February 9, 2015
Topics for today's presentation
What to expect from this presentation
Why PROC SQL?
Syntax
Aliases
Joins and unions
Case logic
Summary functions
Macro variables
Resources and references
All Rights Reserved, Duke Medicine 2007
What to expect from this presentation
Basic information about several topics
Code to get you started
Resources and references so you can explore on
your own
Why PROC SQL?
The power of SQL with SAS functionality
Only way to correctly do a many-to-many merge
Don't need to sort datasets in order to merge (join)
them
Quick reporting
Summary functions are quick and easy
Create data-driven macro variables on the fly (great
with summary functions!)
Basic syntax
proc sql;
create table output-dataset as
select comma-separated variable list
from input-dataset
where condition
order by sort-variable-list;
quit;
Create table if you want an output dataset.
Eliminate this step if you just want to print to your .lst
file.
Quick report
proc sql;
select name, age, height, weight
from sashelp.class
where sex='F'
order by name;
quit;
------------------------------------------------
proc sort data=sashelp.class out=class; by name; run;
proc print data=class noobs;
var name age height weight;
where sex='F';
run;
Quick report - Results
Name      Age  Height  Weight
-------------------------------------
Alice      13    56.5      84
Barbara    13    65.3      98
Carol      14    62.8   102.5
Jane       12    59.8    84.5
Janet      15    62.5   112.5
Joyce      11    51.3    50.5
Judy       14    64.3      90
Louise     12    56.3      77
Mary       15    66.5     112
Quick note: Using SAS functionality to
achieve the same results
proc sql;
select name, age, height, weight
from sashelp.class (where=(sex='F'))
order by name;
quit;
SELECT UNIQUE / SELECT DISTINCT:
Selects only the unique values of variables in the select clause.
proc sql;
select unique sex
from sashelp.class;
quit;
OR
proc sql;
select distinct sex
from sashelp.class;
quit;
SELECT UNIQUE - Results
Sex
----
F
M
Joins
PROC SQL join = SAS data step merge*
Types of joins
Full join: All observations from both datasets
Inner join: Observations matched in both datasets
Left join: All observations from the left dataset +
matching observations from the right dataset
Right join: All observations from the right dataset +
matching observations from the left dataset (same
as a left join, just referencing the dataset listed on
the right side instead of the left side)
* Except for many-to-many merges
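The right join is the only join type listed above without its own example in this deck. A minimal sketch, assuming the same hypothetical ds1 (patient, age) and ds2 (patient, name) datasets used in the join examples:

```sas
/* Right join: keep every row of ds2, plus matching rows of ds1. */
/* ds1 and ds2 are the illustrative datasets from the other join */
/* examples; PATIENT is taken from ds2 because ds2 drives the    */
/* result, so it is never missing.                               */
proc sql;
   create table patientdata as
   select ds2.patient, ds1.age, ds2.name
   from ds1 right join ds2
   on ds1.patient = ds2.patient
   order by ds2.patient;
quit;
```

In a data step this would correspond to merge ds1 ds2(in=b); by patient; if b; after sorting both datasets.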
Full join: All observations from both datasets
proc sql;
create table patientdata as
select coalesce(ds1.patient, ds2.patient) as patient,
ds1.age, ds2.name
from ds1 full join ds2
on ds1.patient = ds2.patient
order by patient;
quit;
-------------------------------------------------------------
proc sort data=ds1; by patient; run;
proc sort data=ds2; by patient; run;
data patientdata;
merge ds1 ds2;
by patient;
keep patient age name;
proc sort;
by patient;
run;
Inner join: Observations matched in both datasets
proc sql;
create table patientdata as
select ds1.patient, ds1.age, ds2.name
from ds1, ds2
where ds1.patient = ds2.patient
order by ds1.patient;
quit;
-------------------------------------------------------------
proc sort data=ds1; by patient; run;
proc sort data=ds2; by patient; run;
data patientdata;
merge ds1(in=a) ds2(in=b);
by patient;
if a and b;
keep patient age name;
proc sort;
by patient;
run;
Left join: All observations in one dataset +
observations matched in other dataset
proc sql;
create table patientdata as
select ds1.patient, ds1.age, ds2.name
from ds1 left join ds2
on ds1.patient = ds2.patient
order by ds1.patient;
quit;
-------------------------------------------------------------
proc sort data=ds1; by patient; run;
proc sort data=ds2; by patient; run;
data patientdata;
merge ds1(in=a) ds2;
by patient;
if a;
keep patient age name;
proc sort;
by patient;
run;
Aliases
Aliases are nicknames for datasets, used as a
shortcut
proc sql;
create table patientdata as
select a.patient, a.age, b.name
from sasdata.patientage a, sasdata.patientname b
where a.patient = b.patient
order by a.patient;
quit;
Many-to-many join (Cartesian join)
data ds1;
input patient :8. date1 :date9.;
format date1 date9.;
datalines;
12345 09DEC2014
12345 01JAN2015
12345 15JAN2015
;
run;
data ds2;
input patient :8. date2 :date9. visit :8.;
format date2 date9.;
datalines;
12345 08DEC2014 1
12345 01JAN2015 2
12345 09JAN2015 3
12345 16JAN2015 4
;
run;
Many-to-many join (cont'd)
proc sql;
create table manytomany as
select coalesce(a.patient,b.patient) as patient,
date1, date2, visit
from ds1 a, ds2 b
where a.patient = b.patient
order by patient, date1, date2;
quit;
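For contrast, here is a sketch of what a data step MERGE does with the same ds1 and ds2. Because MERGE pairs observations one-to-one within each BY group, it should return 4 rows here (the size of the larger group) rather than the 12 combinations produced by the SQL join. The output name merged is illustrative only.

```sas
/* Data step merge on the same many-to-many data.                 */
/* Within patient 12345, rows are paired by position, and the     */
/* last ds1 row is carried forward once ds1 runs out of rows.     */
proc sort data=ds1; by patient; run;
proc sort data=ds2; by patient; run;

data merged;   /* hypothetical output dataset name */
   merge ds1 ds2;
   by patient;
run;
```

SAS also writes a note to the log when more than one dataset has repeats of the BY values, which is a useful warning sign that a SQL join may be what you need.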
Many-to-many join (cont'd)
patient  date1      date2      visit
12345    09DEC2014  08DEC2014  1
12345    09DEC2014  01JAN2015  2
12345    09DEC2014  09JAN2015  3
12345    09DEC2014  16JAN2015  4
12345    01JAN2015  08DEC2014  1
12345    01JAN2015  01JAN2015  2
12345    01JAN2015  09JAN2015  3
12345    01JAN2015  16JAN2015  4
12345    15JAN2015  08DEC2014  1
12345    15JAN2015  01JAN2015  2
12345    15JAN2015  09JAN2015  3
12345    15JAN2015  16JAN2015  4
Union
proc sql;
create table patientdata as
select patient, age, name
from ds1
union
select patient, age, name
from ds2
order by patient;
quit;
(Use UNION ALL instead of UNION to prevent removal of duplicate rows)
---------------------------------------------------------------------
data patientdata;
set ds1 ds2;
keep patient age name;
proc sort;
by patient;
run;
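The UNION ALL variant mentioned in the note above can be sketched as follows. It assumes, as the UNION example does, that both ds1 and ds2 contain patient, age, and name.

```sas
/* UNION ALL stacks the two query results and keeps duplicate rows. */
/* Plain UNION would remove rows that are identical on all three    */
/* selected columns.                                                */
proc sql;
   create table patientdata as
   select patient, age, name
   from ds1
   union all
   select patient, age, name
   from ds2
   order by patient;
quit;
```

Since the data step SET statement also keeps duplicates, UNION ALL is the closer match to the data step version shown above.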
Creating a new variable and controlling
variable attributes - Syntax
proc sql;
create table newvar as
select name, age format=8.1,
'Rolling Green Elementary' as school length=100
label='School Name'
from sashelp.class
order by name;
quit;
Case Logic - Syntax
case when logical-expression1 then new-variable-value1
when logical-expression2 then new-variable-value2
else new-variable-value3
end as new-variable
proc sql;
create table caselogic as
select name, age, sex,
case when sex='F' then 'Female'
when sex='M' then 'Male'
else ''
end as sex2
from sashelp.class
order by name;
quit;
Equivalent to if/then/else clause in SAS data step.
Case Logic - Results
Name     Age   School Name
Alfred   14.0  Rolling Green Elementary
Alice    13.0  Rolling Green Elementary
Barbara  13.0  Rolling Green Elementary
Carol    14.0  Rolling Green Elementary
Henry    14.0  Rolling Green Elementary
James    12.0  Rolling Green Elementary
Jane     12.0  Rolling Green Elementary
Janet    15.0  Rolling Green Elementary
Jeffrey  13.0  Rolling Green Elementary
John     12.0  Rolling Green Elementary
Joyce    11.0  Rolling Green Elementary
Judy     14.0  Rolling Green Elementary
Louise   12.0  Rolling Green Elementary
etc
IFC and IFN - Syntax
ifc(logical-expression, value-if-true, value-if-false,
value-if-missing) as new-variable
proc sql;
create table ifc as
select name, age, sex,
ifc(sex='F','Female','Male','') as sex2
from sashelp.class
order by name;
quit;
Similar to case when/then/else but only works for binaries.
IFC is for resultant character variables (e.g., sex2 above); IFN
is for resultant numeric variables.
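The slide shows IFC only. A minimal IFN sketch on the same sashelp.class data, where the table name ifn_example and the 0/1 flag variable female are hypothetical names chosen for illustration:

```sas
/* IFN returns a numeric value: 1 when sex is 'F', otherwise 0. */
/* The optional fourth argument (value-if-missing) is omitted.  */
proc sql;
   create table ifn_example as
   select name, age, sex,
          ifn(sex='F', 1, 0) as female
   from sashelp.class
   order by name;
quit;
```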
IFC and IFN - Results (look familiar?)
Name     Age  Sex  sex2
Alfred    14  M    Male
Alice     13  F    Female
Barbara   13  F    Female
Carol     14  F    Female
Henry     14  M    Male
James     12  M    Male
Jane      12  F    Female
Janet     15  F    Female
Jeffrey   13  M    Male
John      12  M    Male
Joyce     11  F    Female
Judy      14  F    Female
Louise    12  F    Female
etc
Summary functions
Summary functions summarize the data vertically, like PROC
MEANS or PROC UNIVARIATE.
Full list of functions can be found here.
MEDIAN is not an available summary function. You must use
PROC MEANS or PROC UNIVARIATE.
AVG|MEAN: arithmetic mean or average of values
COUNT|FREQ|N: number of non-missing values
MAX: largest value
MIN: smallest value
NMISS: number of missing values
STD: standard deviation
SUM: sum of values
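Several of the functions listed above can be combined in a single query. A short sketch against sashelp.class, with output column names chosen only for illustration:

```sas
/* One row per sex, with several summary statistics of AGE. */
proc sql;
   select sex,
          count(age) as n_age,      /* non-missing values  */
          nmiss(age) as nmiss_age,  /* missing values      */
          min(age)   as min_age,
          max(age)   as max_age,
          mean(age)  as mean_age,
          std(age)   as std_age,
          sum(age)   as sum_age
   from sashelp.class
   group by sex;
quit;
```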
Summary functions - Basic syntax
proc sql;
create table mean_sql as
select mean(age) as mean_sql
from sashelp.class;
quit;
------------------------------------------------
proc univariate noprint data=sashelp.class;
var age;
output out=mean_uni mean=mean_uni;
run;
Summary functions - Results
PROC SQL
mean_sql
13.3158
------------------------------------------------
PROC UNIVARIATE
mean_uni
13.3158
Summary functions - Using GROUP BY
proc sql;
create table mean_sql as
select sex, mean(age) as mean_sql
from sashelp.class
group by sex;
quit;
------------------------------------------------
proc univariate noprint data=sashelp.class;
class sex;
var age;
output out=mean_uni mean=mean_uni;
run;
Summary functions - Results
PROC SQL
Sex  mean_sql
F     13.2222
M     13.4000
------------------------------------------------
PROC UNIVARIATE
Sex  mean_uni
F     13.2222
M     13.4000
Summary functions - CAUTION!
Be careful that you do not confuse a SQL summary function with a
SAS function! If you list more than one variable within the
parentheses then it is a SAS function.
proc sql;
select mean(height,weight) as mean
from sashelp.class;
Quit;
This gives you the mean of the height and the weight for each
observation in the dataset.
Summary functions - CAUTION! (cont'd)
If you want the mean height and the mean weight across all
observations, give each variable its own summary function:
proc sql;
select mean(height) as mean_height, mean(weight)
as mean_weight
from sashelp.class;
Quit;
Macro variables
proc sql noprint;
select mean(age)
into :mean_age
from sashelp.class;
quit;
%put MEAN_AGE = &mean_age;
Results in log:
MEAN_AGE = 13.31579
Multiple macro variables
proc sql noprint;
select mean(age), min(age), max(age)
into :mean_age, :min_age, :max_age
from sashelp.class;
quit;
%put MEAN_AGE = &mean_age / MIN_AGE = &min_age / MAX_AGE =
&max_age;
Results in log:
MEAN_AGE = 13.31579 / MIN_AGE =       11 / MAX_AGE =       16
Macro variable tip (SAS v9.3 and up)
Use TRIMMED to remove leading and trailing spaces (like CALL SYMPUTX in a data step)
proc sql noprint;
select mean(age), min(age), max(age)
into :mean_age trimmed, :min_age trimmed,
:max_age trimmed
from sashelp.class;
quit;
%put MEAN_AGE = &mean_age / MIN_AGE = &min_age / MAX_AGE =
&max_age;
Results in log:
MEAN_AGE = 13.31579 / MIN_AGE = 11 / MAX_AGE = 16
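On releases before 9.3, where TRIMMED is not available, two commonly used workarounds are sketched below. Both are offered as alternatives under that assumption, not as part of the original slide.

```sas
/* Workaround 1: SEPARATED BY strips leading and trailing blanks, */
/* even when the query returns a single value.                    */
proc sql noprint;
   select min(age)
      into :min_age separated by ' '
   from sashelp.class;
quit;

/* Workaround 2: reassigning with %LET also drops the padding. */
proc sql noprint;
   select max(age)
      into :max_age
   from sashelp.class;
quit;
%let max_age = &max_age;

%put MIN_AGE = &min_age / MAX_AGE = &max_age;
```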
Macro variable lists
proc sql noprint;
select age
into :ages
separated by ", "
from sashelp.class;
quit;
%put AGES = &ages;
Results in log:
AGES = 14, 13, 13, 14, 14, 12, 12, 15, 13, 12, 11, 14, 12, 15,
16, 12, 15, 11, 15
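One way such a list can be used is inside an IN ( ) condition in a later query. The sketch below is a made-up use case: it collects the distinct ages of the girls in sashelp.class, then lists the boys whose age appears in that list.

```sas
/* Step 1: build a comma-separated list of the girls' distinct ages. */
proc sql noprint;
   select distinct age
      into :ages separated by ", "
   from sashelp.class
   where sex='F';
quit;

/* Step 2: reuse the list in a WHERE clause. */
proc sql;
   select name, sex, age
   from sashelp.class
   where sex='M' and age in (&ages)
   order by name;
quit;
```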
Resources and References
SAS website:
http://support.sas.com/documentation/cdl/en/sqlproc/65065/HTML/default/viewer.htm#n1oihmdy7om5rmn1aorxui3kxizl.htm
Lex Jansen website: www.lexjansen.com
Top 10 Most Powerful Functions for PROC SQL
Ready To Become Really Productive Using PROC
SQL?
Summary functions using HAVING clause
You have a dataset with multiple records per
PATIENT, each with different DATE values. You
want to select the most recent date per patient.
patient  date
12345    15JUN2012
12345    18SEP2013
12345    19JUN2014
23456    01FEB2011
23456    03MAR2012
23456    15FEB2013
23456    08FEB2014
Summary functions using HAVING clause (cont'd)
proc sql;
select patient, date
from ds1
group by patient
having date=max(date);
quit;
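For comparison, in the same spirit as the data step equivalents shown for the joins, a data step version might look like the sketch below. The names sorted and latest are hypothetical. One difference to keep in mind: if a patient has two records sharing the most recent date, the HAVING query returns both, while LAST.patient keeps only one.

```sas
/* Sort by patient and date so the latest date is last in each group. */
proc sort data=ds1 out=sorted;
   by patient date;
run;

/* Keep only the last (most recent) record for each patient. */
data latest;
   set sorted;
   by patient;
   if last.patient;
run;
```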
patient  date
------------------
12345    19JUN2014
23456    08FEB2014
Querying SAS Views (proc contents)
Dictionary tables (information about the dataset, e.g.
creation date, number of observations)
Dictionary columns (information about the variables
in the dataset, e.g. length, format, label)
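As a rough PROC CONTENTS-style sketch, the variable attributes named above can be pulled for a whole dataset in one query. Library and member names are stored in uppercase in the dictionary tables, so the comparison values are written in uppercase.

```sas
/* List name, type, length, format, and label for every variable */
/* in SASHELP.CLASS.                                             */
proc sql;
   select name, type, length, format, label
   from dictionary.columns
   where libname='SASHELP' and memname='CLASS';
quit;
```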
Querying SAS Views - Example 1
Get dataset creation date and write it to a macro
variable (to use in the file name, for example).
proc sql noprint;
select datepart(crdate) format=date9.
into :datadate
from dictionary.tables
where libname='SASHELP' and memname='CLASS';
quit;
%put &datadate;
Querying SAS Views - Example 2
Figure out whether a variable is numeric or character.
proc sql;
select type
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and
name='Age';
quit;
Results:
Column Type
-----------
num