Mengistu Yismaw (MSc.)
Department of Economics
Debre Markos University (Burie Campus)
Email: [email protected]
Chapter One: Introduction to Software and Data Management
Course objectives
After completing this course, students will be able to:
Develop skills in using advanced statistical software
Provide practical experience related to data management and analysis
Develop skills related to modern univariate and multivariate statistical analysis
Develop skills to deal with time series and panel data
Develop skills in summarizing information directed to practical decision making
Compute and generate results, and interpret them
Chapter Outline
Introduction to STATA
What is STATA?
Basic features of Stata
Do file
Opening a do-file
Running commands from the do-file
Log file
Opening a log file
Data Management
Importing data into STATA
• Labelling variables
• Labelling values
• Getting to know your data: browse, edit, list, codebook, summarize
• Generating and transforming a variable: natural logarithm, sum/difference
• Dealing with missing values
• Some of the Stata logical and relational operators
• Brief Overview of Main SPSS Features
Introduction to STATA
What is STATA?
• The name Stata is a combination of the words statistics and data
• Stata is a complete, integrated and powerful data analysis software package with capabilities for:
> Data management and manipulation
> Statistical analysis
> Data visualization
Basic features of Stata
• Tool bar: contains buttons that provide quick access to Stata's more commonly used features
• Working directory: shows the folder from which Stata loads files and to which it saves them
> Use the command cd to change the working directory (discussed in detail later)
• Variables window: once data are loaded, the variables in the dataset are listed with their labels, in the order in which they appear in the dataset
• Properties window: displays variable and dataset properties
• Command window: you can enter commands directly into the Command window
Note: Stata is case-sensitive, and command names must be typed in lowercase!
• Results window: contains the output of the commands you run, e.g. the echoed syntax and tables
Note: Double-clicking on a variable name will cause it to appear in the Command window
• Review window: lists previously issued commands
> Successful commands appear in black
> Unsuccessful commands appear in red
> Double-click a command to run it again
Menu Bar:
The menu bar shows the Stata menus used for data management, visualization and analysis.
The following is a description of the Stata menus:
File: Opens and saves Stata data files; opens and closes log files; saves or prints graphs; imports and exports files; exits Stata.
Edit: Allows you to copy output to a word processor or other application.
Data: Opens the Data Editor and Data Browser; summarizes data; labels datasets and variables; replaces and generates data.
Graphics: Contains all of Stata's graphing tools.
Statistics: Provides data summaries and all statistical tests.
User: A place to store any user-generated commands.
Window: Controls the windows opened in Stata.
Help: A good resource if you have questions about how to use Stata.
Do file
• Stata do-files are text files in which users store their commands so that they can be rerun, rather than retyped into the Command window. This gives:
> Reproducibility
> Easier debugging and changing of commands
• We recommend always using a do-file when using Stata
• The file extension .do is used for do-files
Opening a do-file
• Click the pencil-and-paper icon on the toolbar
• Or click File > Do > select the appropriate folder for the do-file > Open
• Or use the command doedit and then press Enter
> You will see the do-file open.
• Comments, which are not executed, are usually preceded by * or // and are colored green
• The do-file editor colors Stata commands blue
• Words in quotes (file names, string values) are colored red
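As a minimal sketch of how comments look inside a do-file: the example below uses the auto dataset that ships with Stata, and its variables price and mpg, purely for illustration. They are assumptions and not part of the course data.

```stata
* A line that starts with an asterisk is a comment and is not executed
// A line that starts with two slashes is also a comment
/* Text between these delimiters is a comment,
   and it may span several lines */
sysuse auto, clear      // load an example dataset shipped with Stata
summarize price mpg     // a comment can also follow a command on the same line
```

Note that // comments are meant for do-files; a line typed directly into the Command window is better left without them.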
Running commands from the do-file
• Highlight the command and click the rightmost icon (Execute) on the do-file editor toolbar
• Or highlight the command and press:
> Ctrl+D on Windows
> Shift+Cmd+D on Mac
Notes:
> Multiple commands can be selected and executed together
> Stata normally assumes that a newline signifies the end of a command
> You can extend a command over multiple lines by placing /// at the end of each line except the last
> Make sure to put a space before ///
Log file
• A log file saves a copy of everything that is sent to the Results window, with the exception of graphs.
• Click File > Log > Begin > select the appropriate folder in which to save the log file > Save
• A log file also retains your commands, although it does not save them in the same way a do-file does
• A new log file starts when you begin a log and ends when you close the log file.
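The same steps can be written as commands in a do-file. The sketch below is illustrative only: the file name chapter1_session.log is hypothetical, and the auto dataset (shipped with Stata) stands in for the course data. It also shows a command continued over two lines with ///.

```stata
* Hypothetical log file name; it is saved in the current working directory
log using "chapter1_session.log", text replace   // begin the log file

sysuse auto, clear                               // example dataset shipped with Stata

* One command written over two lines; note the space before ///
summarize price mpg weight ///
    length

log close                                        // end and close the log file
```

The text option writes a plain-text log that any editor can open; without it Stata uses its own .smcl format.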
Importing data into STATA
• Before new data can be loaded, memory must be cleared.
> Any dataset in memory must be removed, because by default Stata works with only one dataset in memory at a time.
Syntax: clear or clear all
• Change or adjust the working directory
> We have to tell Stata which folder we are working in
Syntax: cd "paste folder location"
Import Excel data
After you set the working directory to the folder where the data are stored, run the following command:
Syntax: import excel "Excel file name", sheet("sheet name") firstrow clear
> The firstrow option treats the first row as variable names; the clear option removes any dataset already in memory
Save data in Stata format
Syntax: save "data file name", replace
> The replace option overwrites an existing file with the same name
> Data files stored in Stata's format are known as .dta files
Use the saved data in Stata format
Syntax: use "data file name", clear
> The clear option removes any open dataset from memory
Notes:
> Double-clicking on a .dta file in Windows opens the data in a new instance of Stata (not in the current instance)
> Hence, be careful about having many instances of Stata open
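Put together, the import workflow might look like the sketch below. The folder path, the file name maize_survey.xlsx and the sheet name Sheet1 are hypothetical placeholders; substitute your own.

```stata
clear all                                         // remove any data from memory

* Hypothetical folder; paste your own folder location here
cd "C:/Users/yourname/Documents/stata_course"
pwd                                               // confirm the working directory

* Hypothetical Excel file and sheet name
import excel "maize_survey.xlsx", sheet("Sheet1") firstrow clear

save "maize_survey.dta", replace                  // store the data in Stata format
use "maize_survey.dta", clear                     // reload the saved .dta file
describe                                          // check that the variables arrived
```

Forward slashes in the path work on Windows as well as on Mac, which keeps the do-file portable.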
Using the menu to import Excel data
• To import data using the Stata menu, follow these steps:
> Select File -> Import -> "Excel spreadsheet (xls, xlsx)" or "Text data (delimited, *.csv, ...)"
> Tick "Import first row as variable names" -> OK
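The "Text data (delimited)" menu choice has a command equivalent, import delimited. A minimal sketch, assuming a hypothetical file maize_survey.csv in the working directory whose first row holds the variable names:

```stata
* Hypothetical CSV file; varnames(1) reads variable names from row 1
import delimited "maize_survey.csv", varnames(1) clear
describe
```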
Preparing data for import
• To get data into Stata cleanly, make sure the data in your file are:
> Rectangular: each column (variable) should have the same number of rows (observations)
> Free of graphs, sums, or averages
Getting to know your data
A. Browse or Edit the dataset
• Once the data are loaded, we can view the dataset as a spreadsheet using the command browse or edit
Syntax: browse or edit
Alternatively:
> Click on the magnifying glass with spreadsheet icon for browse
> Click on the pencil with spreadsheet icon for edit
Note: browse only lets you view the data, while edit also allows you to modify the data
Using either technique, you will see the data color-coded:
> Black columns are numeric
> Red columns are strings, and
> Blue columns are numeric with string labels
• The list command prints observations to the Stata Results window
> Simply issuing list will list all observations and variables
> Not usually recommended except for small datasets
• Specify variable names to list only those variables
Example: list Yield age fragment
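For larger datasets, browse and list can be restricted to a few variables or observations. The sketch below uses the variables Yield, age and fragment from the example above; the cut-offs (first 10 observations, age above 50) are arbitrary illustrations.

```stata
browse Yield age fragment           // view only three variables in the Data Browser
list Yield age fragment in 1/10     // list the first 10 observations only
* Stata treats missing as larger than any number, so exclude it explicitly
list Yield age if age > 50 & !missing(age)
```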
Labelling variables
• A variable label allows you to give the variable a longer, more complete description
• The variable label will sometimes be used in output and often in graphs
Syntax: label variable varname "label of the variable"
Examples:
> label variable Yield "Maize Output per total land size cultivated"
> label variable age "Age of respondents"
> label variable fragment "Number of plots"
Describe
• Short variable names make coding more efficient but can obscure the variable's meaning
> Hence, use the describe (or des) command to see the full meaning (label) of each variable
Notes:
> Simply issuing describe or des will describe all variables in the dataset
> You can specify variable names to describe only those variables
Example: describe Yield age
C. Labelling values
• Many statistical software packages, including Stata, only accept numeric variables for most analyses
• Value labels give descriptive text to the numeric values of nominal or ordinal variables
• To create a new set of value labels, use label define
Syntax: label define labelname value1 "label1" value2 "label2" ...
Examples:
> label define Gender 0 "Female" 1 "Male"
> label define EducLevel 0 "Illiterate" 1 "Primary" 2 "Secondary" 3 "Post-Secondary"
> label define Credit_Dummy 0 "No" 1 "Yes"
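Note that label define only creates a set of value labels; the set still has to be attached to a variable with label values. A minimal sketch, assuming Gender is already stored as a numeric variable coded 0/1, and using the hypothetical label name gender_lbl:

```stata
* gender_lbl is a hypothetical label name chosen for this illustration
label define gender_lbl 0 "Female" 1 "Male"
label values Gender gender_lbl    // attach the label set to the variable Gender
label list gender_lbl             // show the value-to-text mapping
tabulate Gender                   // the table now shows Female/Male instead of 0/1
```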
Codebook
• Used to inspect a variable or variables in more detail, including:
> Variable label
> Value label
> Type of the variable, i.e. string or numeric
> Missing values
> Range, std. dev., percentiles
• Specify the names of the variables you want to know more about
Example: codebook Yield Gender
Keep in mind!
E. Summarizing
• The summarize (or sum) command calculates summary statistics such as the mean, standard deviation, minimum and maximum
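A short sketch of summarize, using the variables Yield, age and Gender from the earlier examples. It assumes, for illustration only, that Gender is still stored as a string at this point.

```stata
summarize Yield age          // observations, mean, std. dev., min and max
summarize Yield, detail      // adds percentiles, variance, skewness and kurtosis
display r(mean)              // mean of Yield, stored by the last summarize
summarize Gender             // assumed to be a string variable here
```

If Gender is stored as a string, the last command reports zero observations and no statistics for it.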
Why?
F. Encode
• Helps to change a string variable into a numeric variable
Note: when we encode a variable, Stata automatically creates the value labels, and the numeric codes it assigns may differ from the ones you want.
> So, you have to adjust or change the value labels to whatever is appropriate for you
> Now you can see summary statistics for Gender, EducLevel and Credit.
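A minimal sketch of encode, assuming Gender is a string variable holding exactly the texts "Female" and "Male". The new variable names Gender_num and Gender_num2 and the label name gender_codes are hypothetical.

```stata
* Default behaviour: codes 1, 2, ... are assigned in alphabetical order
encode Gender, generate(Gender_num)
label list Gender_num               // inspect the codes encode created
tabulate Gender_num, nolabel        // show the underlying numbers

* To control the coding, define the label first and pass it to encode
label define gender_codes 0 "Female" 1 "Male"
encode Gender, generate(Gender_num2) label(gender_codes)
summarize Gender_num2               // summary statistics are now available
```

Defining the label beforehand is one way to obtain a 0/1 coding directly instead of adjusting the labels afterwards.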
G. Keep and drop
• keep keeps the variable(s) you need and removes all other variables
Syntax: keep var1 var2 ...
Example: keep Yield age
> Stata removes all variables except Yield and age
• drop removes the variable(s) you don't need and keeps all other variables
Syntax: drop var1 var2 ...
Example: drop Yield age
> Stata removes the variables Yield and age and keeps all other variables
H. Generating and transforming a variable
• Variables often do not arrive in the form that we need
• Use generate (often abbreviated gen or g) to create variables, usually from operations on existing variables such as:
> Sums/differences/products/squares of variables/natural logarithm
• To generate the natural logarithm of Yield
gen lnYield = ln(Yield)
• To generate a sum of two variables
generate LandFrag = land + fragment
• To generate a difference of two variables
generate LandFragement = land - fragment
• To generate the square of a variable
gen agesqu = age * age
gen agesqu2 = age^2
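When transforming variables it helps to check the result. The sketch below uses Yield and age from the examples above; the new names ln_yield and age_sq are hypothetical, chosen so they do not clash with variables already created.

```stata
count if Yield <= 0                   // ln() is undefined for these observations
generate ln_yield = ln(Yield)         // set to missing wherever Yield <= 0
generate age_sq   = age^2
summarize Yield ln_yield age age_sq   // compare the original and new variables
```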
Missing values
• Values of a variable can be missing for various reasons, such as non-response or refusal
• Missing numeric values in Stata are represented by a period (.)
• You can check for missing values by testing for equality to .
Syntax: count if varname==.
Example: count if Credit==.
• When we make any estimation or analysis with missing values, the analysis may be wrong.
> So, we have to deal with the missing values before making any estimation.
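Besides testing for equality to ., Stata provides the missing() function, which also catches the extended missing codes .a to .z, and the misstable command for an overview. A minimal sketch using the variable Credit from the example:

```stata
count if missing(Credit)      // counts ., .a, ..., .z for numeric variables
misstable summarize           // variables in the dataset that have missing values
list if missing(Credit)       // show the affected observations
```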
Dealing with missing values
• If a missing value can't be corrected by calling the respondent or returning the questionnaire to the respondent, we can correct the missing values manually, for example:
• Substitute an imputed response: the respondent's pattern of responses to other questions may be related to one another, so it might be possible to calculate or infer the answer to one question from the answer to another.
Suppose the following questionnaire:
1. Did you get credit during the previous cropping season? Yes / No
2. If you said yes to question no. 1, how much?
> Even though the respondent didn't answer Q1, an answer to Q2 gives a hint about the missing value, i.e. yes!
• Use a global constant, such as Null, unknown, not applicable, etc.
• Substitute a neutral value: use the mean response to the variable
• Ignore/delete the record with missing values, i.e.
> Case-wise (list-wise) deletion
Cases or respondents with any missing responses are discarded from the analysis (e.g. delete case 10).
drop if Credit_Dummy_nt==.
> Pair-wise deletion
The variable with any missing responses is discarded from the analysis (e.g. delete the variable Credit_Dummy).
drop Credit_Dummy_nt
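Two of the strategies above can be sketched in Stata as follows. The variable Credit comes from the earlier example; the name Credit_imp is hypothetical, and working on a copy is a choice made here so that the original values stay untouched.

```stata
* Substitute a neutral value: replace missing Credit with the sample mean
generate Credit_imp = Credit                           // work on a copy
summarize Credit, meanonly                             // stores the mean in r(mean)
replace Credit_imp = r(mean) if missing(Credit_imp)

* List-wise deletion: drop observations where Credit is missing
drop if missing(Credit)
```

Most Stata estimation commands, such as regress, already leave out observations with missing values in the variables used, so an explicit drop is not always required.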
Stata logical and relational operators
== equal to
< less than
> greater than
>= greater than or equal to
<= less than or equal to
& and
| or
!= not equal to
Examples:
keep if Gender
drop if age == 50
keep if age != 50
replace Credit = 500 if Credit
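The operators can be combined in an if qualifier. The sketch below uses count, so no data are changed; it assumes Gender is numeric with 1 as one of its codes, and the age thresholds are arbitrary illustrations.

```stata
count if age >= 18 & age <= 65        // and: both conditions must hold
count if Gender == 1 | age > 50       // or: at least one condition holds
count if age != 50                    // not equal to
count if !(age == 50)                 // ! negates a whole condition
count if age > 50 & !missing(age)     // missing counts as larger than any number
```

Trying a condition with count first is a simple way to see how many observations a later keep or drop would affect.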
Stata documentation
• For any further help, Stata has excellent documentation
• Select Help > PDF Documentation
> Stata will automatically open the PDF documentation
• Brief Overview of Main SPSS Features
• Entering and Saving Data in SPSS
Exercises
• Suppose you want to analyze only respondents aged 65 and older.
What is the syntax (command) used to keep the data for respondents aged 65 and older?