stata notes

The document provides a comprehensive guide on using STATA for data analysis, covering data file formats, loading data, and basic commands for data manipulation and statistical analysis. It explains how to perform regression analysis, create programs for batch processing commands, and manage output files. Additionally, it includes a command appendix for generating variables, finding correlations, and conducting hypothesis testing.

Uploaded by

Vinci Chan

Basics of STATA

November 5, 2017

1 Data files
Variables within a data set are typically organized in columns, while rows
represent different observations of a given variable. An important feature of
data sets is their format. Our data sets will come either in the ASCII (text)
format or in STATA format. Both formats are compatible with STATA.
Data can either be stored in a separate file, which we will call DATA, or
typed in when using STATA in the interactive mode. Obviously, we won’t
be typing in long data sets each time we want to analyze them, so we will
prefer to store our data in a separate file. In STATA, text format data files
have the suffix .RAW, while STATA format data files bear the suffix
.DTA (text format data sets may bear another suffix, such as .TXT). So
assume that we have a data file, named either DATA.RAW or DATA.DTA.

2 Loading data into STATA

Launching STATA from the Windows menu allows you to enter the interactive
mode of the program, which means that you can type commands which
will be executed one by one. The prompt ’.’ (dot) indicates that you are
within STATA. The first thing you will have to do is to enter the data in
STATA’s memory:
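- If the data are in text (ASCII) format, the command to perform this
is:
INFILE VAR1 VAR2 USING c:\path\DATA
where VAR1 and VAR2 (and possibly VAR3...) are names you will give
to the variables (columns) which make up DATA. You must specify the drive
(in this example c:) and the path to the directory where the DATA file is
stored (here path). In current versions of STATA a variable name can be up
to 32 characters long; only older versions limited names to 8 characters.
Commands are printed in capitals in these notes for readability, but STATA
is case-sensitive: type them in lower case (infile, not INFILE).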
- If the data are in STATA format, the command is:
USE c:\path\DATA
Unlike text files, STATA format data files already contain variable names,
so you should not re-specify these. You can create a STATA format data
file from a text format file by first loading the text format data using the
INFILE command, and then typing:
SAVE c:\path\DATA
This will create a file called DATA.DTA in your directory.
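
The loading steps above can be put together as follows. This is only a sketch: c:\path\DATA.RAW is the placeholder file name used in these notes, and the assumption that it holds two whitespace-separated numeric columns is ours. Commands are typed in lower case, as STATA requires.

```stata
* Hypothetical file: c:\path\DATA.RAW is the placeholder from these notes,
* assumed to contain two whitespace-separated numeric columns.
clear
infile var1 var2 using "c:\path\DATA.RAW"
* Write the data in STATA format; this stops with an error if DATA.dta exists.
save "c:\path\DATA.dta"
* Start again from the STATA-format file; the variable names come with it.
clear
use "c:\path\DATA.dta"
describe
```

Quoting the file name is optional here, but it becomes necessary as soon as the path contains spaces.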

3 Getting started with data analysis

We are now ready to start analyzing our data set. The first thing we might
want to do is to make sure we have loaded the right file, and to get a rough
idea of its components. A convenient way to obtain this information is the
following command:
DESCRIBE
This gives you some information on the data which STATA has in its
memory (number of variables, number of observations, names of variables,
etc.).
We may also want to view the data directly on screen:
LIST VAR1 VAR2 displays the variables named VAR1 and VAR2.
LIST displays all the variables in STATA’s current memory.
LIST IN 1/10 displays the first 10 observations of all the variables in
the data set.
Going yet further, we may be interested in summary statistics about the
data:
SUMMARIZE provides statistics such as mean, standard deviation of the
variables in STATA’s memory.
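
A short session illustrating these inspection commands might look like the one below. It uses auto.dta, an example data set shipped with STATA, in place of the DATA file of these notes; that substitution, and the variables price and mpg, are assumptions made so the example can be run as is. Commands are typed in lower case.

```stata
* auto.dta ships with STATA; it stands in for the DATA file of the notes.
sysuse auto, clear
* Number of observations, variable names, storage types and labels.
describe
* First 10 observations of two variables.
list price mpg in 1/10
* Mean, standard deviation, minimum and maximum.
summarize price mpg
* The detail option adds percentiles, skewness and kurtosis.
summarize price, detail
```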
Other potentially useful commands are the following:
SORT VAR1 reorganizes the data in such a way that VAR1 will appear
in ascending order. For example, if we have a sample of individuals, we may
want to organize our data in ascending order of their income or education
levels. Never use this when dealing with time series, since sorting on
anything other than the time variable destroys the chronological order!
TABULATE VAR1 VAR2 provides frequency tables; if VAR1 is an age
group and VAR2 is the education level, it will tell you how many individuals
of each age group have a given education level.
CORRELATE VAR1 VAR2 VAR3 provides the correlation matrix of the
listed variables.
SCATTER VAR1 VAR2 provides a scatter plot of the data with VAR2 on
the x-axis and VAR1 on the y-axis (STATA 7 and earlier used GRAPH VAR1
VAR2 for this).
GENERATE NEWVAR=VAR1+VAR2 generates a new variable called NEWVAR,
which is the sum of VAR1 and VAR2, and stores it in the session’s
memory. Of course, you can create any combination of any number of
variables using +, -, *, /, etc.
DROP VAR1 removes VAR1 from the session’s memory.
REPLACE VAR1=VAR1+VAR2 replaces the values of VAR1 with the
sum of its old values plus VAR2. This is equivalent to (1) GENERATE
NEWVAR=VAR1+VAR2; (2) DROP VAR1; (3) RENAME NEWVAR VAR1.
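
The commands listed above can be tried on any data set. The sketch below uses auto.dta, which ships with STATA; the data set, its variables (price, mpg, weight, foreign, rep78) and the new variable name total are illustrative assumptions, not part of these notes.

```stata
sysuse auto, clear
* Put the observations in ascending order of price.
sort price
* Two-way frequency table: origin of the car by repair record.
tabulate foreign rep78
* Correlation matrix of three variables.
correlate price mpg weight
* Scatter plot with price on the y-axis and mpg on the x-axis.
scatter price mpg
* Create a variable, overwrite its values, then remove it.
generate total = price + weight
replace total = total + mpg
drop total
```

Note that generate refuses to create a variable whose name is already in use, which is why replace exists as a separate command.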
You can combine these commands with the qualifier if and the logical
operators & (and) and | (or). For example, you can use:
LIST VAR1 IF VAR1>100 & VAR2==1 which will display VAR1 whenever
the value of this variable is greater than 100 and the value of VAR2 is
equal to 1. Note that a single equal sign assigns values to a variable
(as in the GENERATE command) while two equal signs are needed to
test whether a variable equals a given value (as in VAR2==1).
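
A few examples of the if qualifier, again on the auto.dta example data shipped with STATA (the data set and the variable names are assumptions made for illustration):

```stata
sysuse auto, clear
* Both conditions must hold (&).
list make price if price > 10000 & foreign == 1
* At least one condition must hold (|).
list make price if price > 10000 | mpg > 35
* A single = assigns; the expression in parentheses is 1 if true, 0 if false.
generate byte expensive = (price > 10000)
* STATA treats missing values as larger than any number, so exclude them
* explicitly when using > on a variable with missing values such as rep78.
count if rep78 > 3 & !missing(rep78)
```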
More commands are of course available; you can get a complete list by
typing HELP in STATA, or by consulting the user’s manual. The best
way to learn STATA is through practice.

4 Introduction to statistical analysis using STATA
Least squares regression is one of the essential statistical methods we will
be studying in the course. As discussed in the first lecture, this consists of
minimizing the sum of squared vertical distances between the scattered data
points and the line we are trying to fit through them. Suppose that we wish
to predict VAR1 using VAR2 and VAR3:
$\mathrm{VAR1} = \beta_1 + \beta_2\,\mathrm{VAR2} + \beta_3\,\mathrm{VAR3} + \varepsilon$
To do this, and to produce many of the useful statistics that go with it,
STATA has a very convenient command:
REGRESS VAR1 VAR2 VAR3
You will immediately obtain estimated values for $\beta_1$, $\beta_2$, and $\beta_3$, as well as
their standard errors, confidence intervals and other useful statistics which
have been or will be introduced in class.
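
Written out, the least squares criterion behind REGRESS can be stated as below. The notation is an assumption on our part, since the notes do not spell it out: i indexes the n observations, b_1 is a candidate intercept, b_2 and b_3 are candidate slopes on VAR2 and VAR3, and hats denote the values STATA reports.

```latex
\[
(\hat\beta_1, \hat\beta_2, \hat\beta_3)
  = \arg\min_{b_1, b_2, b_3}
    \sum_{i=1}^{n}
    \bigl( \mathrm{VAR1}_i - b_1 - b_2\,\mathrm{VAR2}_i - b_3\,\mathrm{VAR3}_i \bigr)^2
\]
```

Each term in the sum is the squared vertical distance between an observed value of VAR1 and the value predicted for that observation.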
To obtain fitted values or regression residuals from this regression, type:
PREDICT FITTED stores the fitted values from the regression in a data
column (variable) called FITTED, and keeps it in memory.
PREDICT RESID, RESIDUALS stores the residuals from the regression in a
data column (variable) called RESID, and keeps it in memory.
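
As an illustration, the regression and the two PREDICT commands can be run on auto.dta, an example data set shipped with STATA. Using price, mpg and weight in the roles of VAR1, VAR2 and VAR3 is an assumption made only so the example is runnable.

```stata
sysuse auto, clear
* price plays the role of VAR1; mpg and weight play VAR2 and VAR3.
regress price mpg weight
* Fitted values (the default after regress) and residuals.
predict fitted
predict resid, residuals
* With a constant in the model, the residuals average to zero
* up to rounding error.
summarize resid
list price fitted resid in 1/5
```

predict always refers to the most recent estimation command, so it has to be issued before another regression is run.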

5 Writing programs and getting output files

The interactive mode requires you to enter commands one at a time, and
to get results one at a time. This may be inconvenient if you have many
commands to run, or if you make mistakes which require you to type a
whole chain of commands over again. In this case, you may prefer to write
a program which contains all the commands in order of desired execution.
You can then correct, modify and run the program whenever need arises.
The easiest way to write programs is the following: Write the program
in the Windows editor (notepad or any text editor), one command on each
line, and save the program file in a convenient directory. This file must bear
the suffix .DO. For example, it can be called PROGRAM1.DO.
An example of a possible STATA program is the following:

* commands must be typed in lower case: STATA is case-sensitive
infile var1 var2 var3 using c:\path\DATA
summarize
generate var4 = var1 - var2
regress var1 var2 var4
predict fitted
list fitted in 1/20
Save this program under the name PROGRAM1.DO, then enter STATA
and run the program by going to the File menu and choosing Do. This will
perform all the commands of PROGRAM1 in the order you typed them in,
and provide you with the output in the output window.
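
The same program can also be started from the STATA prompt instead of the menu. In the sketch below, c:\path is the placeholder directory used throughout these notes, so the file location is an assumption.

```stata
* Execute every command in the file and show the output.
do "c:\path\PROGRAM1.DO"
* Execute the same file without showing the output.
run "c:\path\PROGRAM1.DO"
```

If any command in the file fails, STATA stops at that line and the remaining commands are not executed.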
Another convenient tool is to store the output of your work (regression
results, statistics, transformed data ...) in an output file which can then be
printed. Sometimes, the output will be too long to fit on a single screen (as
in our example above), so it is convenient to store it in a text (ASCII)
file, which you can later view and print. To do this, you can start your
session or your program with:
LOG USING c:\path\OUTPUT, TEXT
This will create an output file in text format, called OUTPUT.LOG,
stored in the directory given in the command (here c:\path); if you give no
path, the file goes into STATA’s current working directory. The TEXT
option matters: without it, current versions of STATA write the log in
their own SMCL format (OUTPUT.SMCL). The file will contain
all the results from your session, as well as the commands you typed in
the interactive mode. To access this file, open it in any text editor or
word processor. Note that the LOG command will not keep track of graphs
(which you can print directly from STATA using the GRAPH PRINT command).
Note also:
LOG CLOSE stops logging a session and closes the text file containing the log.
LOG OFF temporarily stops the logging session without closing the log
file.
LOG ON resumes logging on the open log file.
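
A do-file that uses all four LOG commands might look like this. The file name output.log, the replace option and the auto.dta example data shipped with STATA are assumptions added for illustration; they do not appear in the notes.

```stata
* Close any log left open by an earlier run; capture ignores the error
* that log close would raise if no log is open.
capture log close
* Text-format log in the current working directory; replace overwrites
* an existing file of the same name.
log using "output.log", text replace
sysuse auto, clear
summarize
regress price mpg weight
* Suspend logging: the next command is not recorded.
log off
describe
* Resume logging in the same file.
log on
predict fitted
list fitted in 1/20
log close
```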
One more thing! STATA will not let you exit while there are unsaved changes
to your data, so save your data first (or discard the changes with EXIT, CLEAR):
SAVE c:\path\DATA will save the data in the session’s memory (including
new variables, transformed variables...) in a .DTA file (STATA format), on
the c: drive (or any other drive you specify) in the path directory. STATA
will not overwrite an existing data file unless you add the REPLACE option
(SAVE c:\path\DATA, REPLACE), so it is a good idea to delete
useless data files from your subdirectory, and keep only the initial data
set and/or useful transformed data sets. It’s also a good idea to delete old
output files, especially if the LOG command is written in your program (again,
STATA won’t overwrite unless you specify REPLACE).
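
A minimal sketch of saving transformed data is given below. The file name auto_work.dta and the use of the auto.dta example data shipped with STATA are assumptions; the replace option is what allows an existing file of the same name to be overwritten.

```stata
sysuse auto, clear
generate lnprice = ln(price)
* Saved in the current working directory; without replace, this stops
* with an error if auto_work.dta already exists.
save "auto_work.dta", replace
```

Saving under a new name, as here, leaves the initial data set untouched.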

6 Command appendix
Generate a log variable

– gen VARNAME=ln(var)

Generate a variable with constant value

– gen VARNAME = VALUE

Store residuals from regression

– regress x1 x2
– predict VARNAME, residuals

Find correlation

– correlate x1 x2

Estimate and save the predicted value

– regress y x1
– predict VARNAME, xb

Save the coefficient

– regress y x1
– gen VARNAME=_b[x1]

Test significance level (level(90) reports 90% confidence intervals)

– Method 1: regress y x1 x2, level(90)
– Method 2:
regress y x1 x2
lincom x2, level(90)
Note: type help lincom in stata to get more information.

Hypothesis testing

– regress y x1 x2
– test x1 = a (where a is the hypothesized value of the coefficient, e.g. test x1 = 0)
– Note: type help test in stata to get more information.
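
The appendix commands can be tried together in one short session. The sketch below uses auto.dta, an example data set shipped with STATA; the data set and the variable names lnprice, one and b_mpg are illustrative assumptions.

```stata
sysuse auto, clear
* Log variable and constant variable.
generate lnprice = ln(price)
generate one = 1
* Regression reporting 90% confidence intervals.
regress price mpg weight, level(90)
* Store the estimated coefficient on mpg in every observation.
generate b_mpg = _b[mpg]
* 90% confidence interval for a single coefficient.
lincom weight, level(90)
* Test that the coefficient on mpg equals 0.
test mpg = 0
* Test that the coefficients on mpg and weight are equal.
test mpg = weight
* Joint test that both coefficients are zero.
test mpg weight
```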
