Proc SQL

Proc SQL can be used either to create new variables or to merge datasets.

Proc SQL; /* create mean of a variable */
CREATE TABLE <new dataset> AS
SELECT *, mean(<var-name>) AS <new var-name>
FROM <existing dataset>;
Quit;

Proc SQL; /* merge two datasets by a variable */
CREATE TABLE <new dataset> AS
SELECT * FROM <dataset 1>, <dataset 2>
WHERE <dataset 1>.<Var 1> = <dataset 2>.<Var 2>;
Quit;

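Note: PROC SQL limits the number of tables (datasets) that can be merged
in one query. Older SAS releases allowed at most 16, against the normal
SAS limit of 100; later releases raise the PROC SQL limit (256 in SAS 9.4),
so check the documentation for your release. Also note that if you use
PROC SQL to merge two or more datasets, you need not sort any dataset
beforehand.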
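
The WHERE-based merge above behaves as an inner join. As a minimal sketch, the same merge can be written with explicit join keywords, which also makes a left join possible. The datasets work.demo_a and work.demo_b and the columns id, x, and y are hypothetical and exist only for illustration.

```sas
/* Hypothetical sample data; all names here are illustrative only */
data work.demo_a;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data work.demo_b;
  input id y;
  datalines;
2 200
3 300
4 400
;
run;

proc sql;
  /* inner join: keeps only ids present in both tables (2 and 3) */
  create table work.demo_inner as
  select a.id, a.x, b.y
  from work.demo_a as a
       inner join work.demo_b as b
       on a.id = b.id;

  /* left join: keeps every row of demo_a; y is missing for id 1 */
  create table work.demo_left as
  select a.id, a.x, b.y
  from work.demo_a as a
       left join work.demo_b as b
       on a.id = b.id;
quit;
```

Selecting a.id explicitly, rather than using SELECT *, avoids the warning SAS issues when both inputs contribute a column with the same name.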

Exercise
Ex 1: Given two datasets A and B in the folder, use PROC SQL to merge
them by a common ID variable and create a new dataset D.

Comparing Proc SQL and Datastep

Function                        Non-SQL Base SAS                PROC SQL
                                (statement/option)              (statement/clause)

Create a table                  DATA                            CREATE TABLE
Create a table from raw data    INPUT                           INSERT
Add columns                     Assignment statement            SELECT ... AS
Drop columns                    DROP, KEEP                      SELECT
Add rows                        OUTPUT                          INSERT INTO
Delete rows                     WHERE, IF/THEN DELETE           DELETE FROM
Sorting                         PROC SORT                       ORDER BY
De-dupe records                 NODUPLICATE                     DISTINCT
Establish a connection          LIBNAME                         CONNECT, DISCONNECT
  with a RDBMS
Send a RDBMS-specific                                           EXECUTE (CONNECTION TO
  non-query SQL statement                                         is used for queries)
  to a RDBMS
Concatenating                   SET                             OUTER UNION
Match merging                   MERGE/SET, BY,                  FROM, LEFT JOIN, RIGHT JOIN,
                                IF in1, IF in2, IF in1 & in2    FULL JOIN, WHERE/ON
Rename column                   RENAME                          AS
Displaying resultant table      PROC PRINT                      SELECT
Summarizing                     PROC/BY                         GROUP BY
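
To make two rows of the comparison concrete (Sorting and De-dupe records), here is a sketch of the same task written both ways. It assumes the trg.fltattnd dataset and its jobcode column from the examples in this document; the output names work.codes_sort and work.codes_sql are hypothetical.

```sas
/* Base SAS: sort by jobcode and keep one row per jobcode value */
proc sort data=trg.fltattnd(keep=jobcode) out=work.codes_sort nodupkey;
  by jobcode;
run;

/* PROC SQL: DISTINCT removes the duplicates, ORDER BY sorts the result */
proc sql;
  create table work.codes_sql as
  select distinct jobcode
  from trg.fltattnd
  order by jobcode;
quit;
```

Both steps should leave one row per distinct jobcode, in ascending order.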

10.3 SQL Examples


Using dataset FLTATTND
List all information in the table.
proc sql;
select * from trg.fltattnd;
quit;

Create a new table with empid and salary information.


proc sql;
create table new as
select empid, salary from trg.fltattnd;
quit;

SAS dataset options work within SQL (here DROP= leaves out the column annivmo).

proc sql;
create table new as
select * from trg.fltattnd (drop=annivmo);
quit;

Removing duplicates
Proc sql;
create table new as
select distinct jobcode from trg.fltattnd;
Quit;

Sorting using sql


Proc sql;
create table new as
select * from trg.fltattnd
order by jobcode, salary descending;
quit;

Subsetting and computing new fields

Proc sql;
create table new as
select salary, salary*0.05 as tax, hiredate format=date7. from trg.fltattnd
where salary > 10000
order by salary, hiredate descending;
quit;

(Any of the DATA step functions can be used to create new variables except
SOUND, DIF and LAG.)

Summarizing

Proc sql;
create table new as
/* 8.2 leaves room for five-digit salaries */
select count(*) as n, round(mean(salary),0.01) as salarymean format=8.2
from trg.fltattnd;
quit;

Proc sql;
create table new as
select jobcode, count(jobcode) as n, hiredate format=date7., salary,
max(salary) as salarymax,
round(salary/(calculated salarymax)*100,0.01) as salpct format=6.2
from trg.fltattnd
group by jobcode;
quit;

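Note: Summary functions are restricted to the SELECT and HAVING clauses.
With GROUP BY, the summary functions above are calculated once for each
jobcode group, not for each observation. Because the SELECT list also names
columns that are not in the GROUP BY clause (hiredate, salary), PROC SQL
remerges each group's statistics back onto every row of that group.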
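
For contrast, a minimal sketch of a query that returns exactly one row per group: every selected column is either the GROUP BY column or a summary function, so nothing is remerged. The output name work.jobstats and the HAVING threshold are illustrative assumptions.

```sas
proc sql;
  /* one output row per jobcode */
  create table work.jobstats as
  select jobcode,
         count(*)     as n,
         mean(salary) as salarymean format=10.2,
         max(salary)  as salarymax
  from trg.fltattnd
  group by jobcode
  having count(*) > 1   /* keep only job codes with more than one row */
  order by jobcode;
quit;
```

HAVING filters whole groups after the summary functions are computed, whereas WHERE filters individual rows before grouping.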

Proc sql;
create table new as
select jobcode, count(jobcode) as n, hiredate format=date7., salary,
max(salary) as salarymax
from trg.fltattnd
group by jobcode
having salary=calculated salarymax
order by calculated salarymax;
quit;

Proc sql;
create table new as
select lastname,
  case when salary > 30000 then 'high sal'
       when salary < 20000 then 'low salary'
       else 'error'
  end as saltype length=10 /* 10 so that 'low salary' is not truncated */
from trg.fltattnd
order by salary;
quit;

Note: The case expression can be used to create a new variable that is a
re-categorization of the values of another variable.

10.4 String Operations


SQL includes a string-matching operator, LIKE, for comparisons on
character strings. Patterns are described using two special
characters:
  percent (%). The % character matches any substring.
  underscore (_). The _ character matches any single character.
Find the names of all students whose course name includes the
substring EE.

select name
from student
where course like '%EE%';

SQL supports a variety of string operations, such as:
  concatenation (using ||)
  converting from upper to lower case (and vice versa)
  finding string length, extracting substrings, etc.
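
The operations listed above can be sketched in PROC SQL as follows. This assumes a table named student with character columns name and course, as in the LIKE example; the output column names are hypothetical.

```sas
proc sql;
  select name,
         upcase(name)         as name_upper,     /* convert to upper case   */
         lowcase(course)      as course_lower,   /* convert to lower case   */
         length(name)         as name_len,       /* length without trailing blanks */
         substr(course, 1, 2) as course_prefix,  /* first two characters    */
         trim(name) || ' (' || trim(course) || ')' as name_course /* concatenation */
  from student
  where course like '%EE%';
quit;
```

TRIM is used before || because SAS character values are padded with trailing blanks, which would otherwise appear inside the concatenated result.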

10.5 Nested Subqueries


SQL provides a mechanism for the nesting of subqueries.
A subquery is a select-from-where expression that is nested within
another query.
A common use of subqueries is to perform tests for set membership,
set comparisons, and set cardinality.
Example: Find all customers who have both an account and a loan at the bank.

select distinct customer_name
from borrower
where customer_name in (select customer_name from depositor);
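
Two further membership tests, sketched under the assumption that the borrower and depositor tables exist as SAS datasets and that the column is stored as customer_name (a hyphen is not allowed in an ordinary SAS variable name).

```sas
proc sql;
  /* customers who have a loan but no account */
  select distinct customer_name
  from borrower
  where customer_name not in (select customer_name from depositor);

  /* customers with both a loan and an account, written with EXISTS */
  select distinct b.customer_name
  from borrower as b
  where exists (select *
                from depositor as d
                where d.customer_name = b.customer_name);
quit;
```

The EXISTS form uses a correlated subquery: the inner query is evaluated against each row of borrower through the alias b.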

10.6 Merging Using SQL


10.6.1 Cartesian Join
A Cartesian join is when you join every row of one table to every row
of another table. You can also get one by joining every row of a table
to every row of itself.

Example. Run the following code on the dataset forsql and note the
results
proc sql;
create table matrix as select * from
(select ans as ans0001 from trg.forsql where var='0001'),
(select ans as ans0006 from trg.forsql where var='0006'),
(select ans as ans0003 from trg.forsql where var='0003')
order by ans0001, ans0006, ans0003;
quit;

A Cartesian join can also combine a table with itself.

Ex. Find the names of all branches that have greater assets
than some branch located in Brooklyn.

Proc Sql;
select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and
S.branch_city = 'Brooklyn';
quit;

10.6.3 Outer UNION


OUTER UNION is the PROC SQL equivalent of concatenation in Base SAS. However,
columns with the same name are kept as separate columns in the output window.
Example: the following query expression concatenates the A and B tables
but does not overlay like-named columns.
proc sql;
title 'A and B: OUTER UNION';
select * from trg.a outer union select * from trg.b;
quit;
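
If like-named columns should be overlaid instead, as a DATA step SET concatenation does, the CORR keyword can be added. A minimal sketch on the same trg.a and trg.b tables:

```sas
proc sql;
  title 'A and B: OUTER UNION CORR';
  /* CORR matches columns by name, so like-named columns are overlaid */
  select * from trg.a
  outer union corr
  select * from trg.b;
  title;   /* clear the title afterwards */
quit;
```

Columns that exist in only one of the two tables are still kept, with missing values for the rows that come from the other table.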

10.6.4 Merging Files with Different Names For Variables


Ex 4. There are various departments in IIT, and their attendance records
are available in two different formats. We are required to create a single
file holding the information from both departments.

Record Layout 1 Record Layout 2


Year Year
Cno Course_num
Rnum
S1-S10 S1-S10

You are required to map the source variables that hold analogous
information onto common names and copy the information from these two
files into a single file using SQL.
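
One possible approach, offered as a sketch rather than the intended solution: rename the differing variable with a dataset option so that both layouts share a name, then concatenate by name. The dataset names trg.att1 and trg.att2 and the output name work.attendance are hypothetical, and the sketch assumes Cno and Course_num hold the same kind of value with the same type.

```sas
proc sql;
  create table work.attendance as
  /* layout 1: rename Cno so it lines up with Course_num in layout 2 */
  select * from trg.att1(rename=(cno=course_num))
  outer union corr
  select * from trg.att2;
quit;
```

With OUTER UNION CORR, Year, Course_num, and S1-S10 are overlaid by name, and any variable present in only one layout is kept with missing values for rows from the other file.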
