
Statistical Programming in SAS by A. John Bailer (ISBN 9780367357979, 9780367358006, 0367357976, 036735800X)


Statistical Programming in SAS
Second Edition

A. John Bailer
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-0-3673-5797-9 (Paperback)
International Standard Book Number-13: 978-0-3673-5800-6 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made
to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all
materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future
reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety
of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com
Contents

Preface
Acknowledgments
Author

1. Structuring, Implementing, and Debugging Programs to Learn about Data
1.1 Statistical Programming
1.2 Learning from Constructed, Artificial Data
    Processing a Particular Data Set—Extracting Variable Names from a Column of an Input Data Set
    Learning More about Unfamiliar Statistical Methods—Linear Mixed Effects Models
    Improving Your Intuition about Statistical Theory—Sampling Distribution of Means
1.3 Good Programming Practice
    Document Your Programs!
    Use Meaningful Variable Names
    Use a Variety of CaSeS in Program Statements
    Indent Program Statements That Naturally Go Together
1.4 SAS Program Structure
1.5 What Is a SAS Data Set?
1.6 Internally Documenting SAS Programs
1.7 Basic Debugging
1.8 Getting Help
    Using Help in SAS
    Getting Help from a Web Browser Search
1.9 Exercises

2. Reading, Creating, and Formatting Data Sets
2.1 What Does a SAS DATA Step Do?
2.2 Reading Data from External Files
    Reading Data Directly as Part of a Program—Anyone for Datalines?
    Reading Data Sets Saved as Text—INFILE Can Be Your Friend (PROC IMPORT Too!)
    Sometimes, Variables Are in Particular Columns or in Particular Formats
2.3 Reading CSV, Excel, and TEXT Files
2.4 Temporary versus Permanent Status of Data Sets
2.5 Formatting and Labeling Variables
    Using Formats to Read and Display Variable Values
    Internal Representations and Output Displays
    Character, Numeric, Time, and Date Formats
2.6 User-Defined Formatting
    Saving Formats for Later Use
2.7 Recoding and Transforming Variables in a DATA Step
    Indicator Variables
2.8 Writing Out a File or Making a Simple Report
    Simple Report Generation
    Exporting a File
2.9 Exercises

3. Programming a DATA Step
3.1 Writing Programs by Subdividing Tasks
    Estimate the Probability That a Randomly Selected 30- to 39-Year-Old Male Is Taller than a Randomly Selected Female of the Same Age
        Conditional Execution
        Looping to Repeat a Task
        Returning to the Height Probability Simulation
3.2 Ordering How Tasks Are Done
    Missing Data in Functions
3.3 Indexable Lists of Variables (Also Known as Arrays)
    Defining Values in the Variable List
    Inputting Values in the Variable List
    Reassign Missing Value Codes for Numeric Variables “.”
    Recoding Missing Values for All Numeric and Character Variables
3.4 Functions Associated with Statistical Distributions
3.5 Generating Variables Using Random Number Generators
3.6 Remembering Variable Values across Observations
    Processing Multiple Observations for a Single Observation
3.7 Case Study 1: Is the Two-Sample t-Test Robust to Violations of the Heterogeneous Variance Assumption?
    Case Study 1 (Revisited with DATA Step Programming)
3.8 Efficiency Considerations—How Long Does It Take?
3.9 Case Study 2: Monte Carlo Integration to Estimate an Integral
3.10 Case Study 3: Simple Percentile-Based Bootstrap
3.11 Case Study 4: Randomization Test for the Equality of Two Populations
3.12 Exercises

4. Combining, Extracting, and Reshaping Data
4.1 Adding Observations by SET-ing Data Sets
4.2 Adding Variables by MERGE-ing Data Sets
4.3 Working with Tables in PROC SQL
4.4 Converting Wide to Long Formats
4.5 Converting Long to Wide Formats
4.6 Case Study: Reshaping a World Bank Data Set
4.7 Building Training and Validation Data Sets
4.8 Exercises
4.9 Self-study Lab

5. Macro Programming
5.1 What Is a Macro and Why Would You Use It?
5.2 Motivation for Macros: Numerical Integration to Determine P(0 < Z < 1.645)
5.3 Processing Macros
5.4 Macro Variables, Parameters, and Functions
5.5 Conditional Execution, Looping, and Macros
    More Complicated Macro Variable Construction
    Changing Locations in a Macro during Execution
5.6 Debugging Macro Code and Programs
    Write Out Values of Macro Variables
    Useful SAS Options for Debugging Macros
5.7 Saving Macros
5.8 Functions and Routines for Macros
5.9 Case Study: Macro for Constructing Training and Test Data Set for Model Comparison
5.10 Case Study: Processing Multiple Data Sets
5.11 Exercises

6. Customizing Output and Generating Data Visualizations
6.1 Using the Output Delivery System
    Basic Ideas
    Destinations—RTF, HTML, PDF, and More!
    What’s Produced and How to Select It
    Another Destination That Stat Programmers Should Visit—OUTPUT
6.2 Graphics in SAS
6.3 ODS Statistical Graphics
6.4 Modifying Graphics Using the ODS Graphics Editor
6.5 Graphing with Styles and Templates
6.6 Statistical Graphics—Entering the Land of SG Procedures
    SGPLOT
    SGPANEL
    SGSCATTER
6.7 Case Study: Using the SG Procedures
6.8 Enhancing SG Displays—Options with SG Procedure Statements
6.9 Using Annotate Data Sets to Enhance SG Displays
6.10 Using Attribute Maps to Enhance SG Displays
6.11 Exercises

7. Processing Text
7.1 Cleaning and Processing Text Data
7.2 Starting with Character Functions
7.3 Processing Text
7.4 Case Study: Sentiment in State of the Union Addresses
7.5 Case Study: Reading Text from a Web Page
7.6 Regular Expressions
7.7 Case Study (Revisited)—Applying Regular Expressions
7.8 Exercises

8. Programming with Matrices and Vectors
8.1 Defining a Matrix and Subscripting
8.2 Using Diagonal Matrices and Stacking Matrices
8.3 Using Elementwise Operations, Repeating, and Multiplying Matrices
8.4 Importing a Data Set into SAS/IML and Exporting Matrices from SAS/IML to a Data Set
    Creating Matrices from SAS Data Sets and Vice Versa
8.5 Case Study 1: Monte Carlo Integration to Estimate π
8.6 Case Study 2: Bisection Root Finder
8.7 Case Study 3: Randomization Test Using Matrices Imported from PROC PLAN
8.8 Case Study 4: SAS/IML Module to Implement Monte Carlo Integration to Estimate π
8.9 Storing and Loading SAS/IML Modules
8.10 SAS/IML and R
8.11 Exercises

References

Index
Preface

Intended Audience
Upper-level undergraduates, beginning graduate students, or any professional interested
in solving data analytics problems using statistical programming will find this book useful.
People interested in realizing their full potential and achieving self-actualization
might find this book to be inspirational and life changing. (Reading a stat programming
book probably won’t result in self-actualization; however, this sentence tells you a little
about what you might expect from the tone and style of the book.)

Origin and Perspective of This Book


This book evolved from course notes from a statistical programming class. The second
edition reflects a major renovation based on a decade of experience in teaching using the
first edition, enhancements to the SAS system, and a change in a practitioner’s perspective
on learning from data that is a focus of data science and analytics practice.

What Makes This Book Different?


The intent of this book is to provide a holistic view of programming to answer questions
using data. Examples and case studies trace the formulation of programming solutions
from a starting framework to a final working program.
The author views the computer as the laboratory instrument for statisticians and data
scientists, and programming is a way to access the power of this instrument. Simulation
studies, computationally intensive statistical methods, and reshaping complicated data sets
all provide context for learning about statistical programming.
There are SAS books that give simple introductions to SAS; other books focus on
individual topics such as macros and matrix programming; and some other books help
with preparation for certification exams. This book focuses only on the process of
statistical programming in SAS: introductory ideas are presented, focused topics
are employed, and a collection of topics that are part of basics and certification exams
is touched on.


Extensive Examples and Case Studies


You learn programming by doing it. You learn statistical programming by implementing
data analysis tasks and investigating statistical methods. Case studies implemented in this
book include the following:

Case Study | Concept
Generating figures from World Bank data | Reshaping data sets from wide to long formats
Sampling distributions of means from Poisson data | Simulation to understand statistical concepts
Generating and analyzing data from linear mixed effects models | Constructing artificial data sets to gain insight when starting to use a new statistical method
Evaluating the robustness of the two-sample t-test to violations of the heterogeneous variance assumption | DATA step programming to implement a Monte Carlo simulation
Monte Carlo estimate of an integral | DATA step programming to implement numerical solution
Trapezoidal rule estimate of an integral | DATA step programming to implement numerical solution
Percentile-based bootstrap | Implementing a computationally intensive estimation method
Randomization test for two-population testing | Implementing a computationally intensive hypothesis-testing method
Sentiment in State of the Union addresses | Parsing text, removing stop words, and linking to sentiment for display
Names and levels of courses offered in academic department | Extracting data from web pages using text processing and using functions for regular expressions
Root finding | Implementing bisection algorithm using SAS/IML
Life expectancy changes over time | Data visualization using attribute maps and annotate data sets with SAS statistical graphics

A number of these case studies are implemented using different strategies, including
DATA step programming, SAS procedures with BY statements, and SAS/IML modules.
The efficiency of the different solutions is also discussed.

What Topics Are Covered in This Book?


This book covers statistical programming tasks, including

• Getting data into the SAS system, engineering new features, and formatting
variables
• Writing readable and well-documented code
• Structuring, implementing, and debugging programs
• Creating solutions to novel problems
• Combining data sources, extracting parts of data sets, and reshaping data sets as
needed for other analyses
• Generating general solutions using macros
• Customizing output
• Producing insight-inspiring data visualizations
• Parsing, processing, and analyzing text
• Programming solutions using matrices and connecting to R

Assumed Background of the Reader


A reader would be best served to have had some introduction to statistical modeling up
through multiple regression and one-way ANOVA models. A formal course in computer
programming would provide a useful foundation, although this is not strictly necessary.

Recommended Order for Reading


The book chapters are ordered so that working through the book sequentially
makes a lot of sense. In particular, I suggest that working through Chapters 1–4 in order
would be a great way to start with this book. With this foundation in place, the other
chapters could be selected based on the interest and focus of a reader or course instructor.
For example, some might want to jump to Chapter 6 to learn more about ODS and
data visualization while others might want to think about functions for text processing
(Chapter 7) before taking a look at macros (Chapter 5).

Code from the Book


There is a GitHub site for the code for each chapter, data, and, now, supplemental materials
such as the appendix. You can visit the site at https://github.com/baileraj/SPiSv2.
Acknowledgments

The first edition of this book grew out of course notes used to teach a statistical programming
class at Miami University. The second edition is a major renovation that reflects
years of using the first edition to teach this class and the feedback (and occasional grief)
from colleagues. My Miami colleagues Tom Fisher and Steve Wright provided feedback
and suggestions. In particular, I’d like to thank Steve Wright for discussions that impacted
the flow of content and presentation of material in this revision. Yes, I realize that you can
hear my voice when you read some of the sections. I still think that’s a good thing.
Thank you to generations of students who taught me how to be a more effective communicator
of statistical and programming content. I hope that this book reflects some of
the lessons that I’ve learned.
I would like to thank John Kimmel and his colleagues at Chapman & Hall/CRC Press.
Their support to publish a quality product is much appreciated.
I asked my wife, Jenny, to remind me about the pain and suffering associated with
book projects. She did what I asked, but I yielded to the book-writing temptation once
again. I am blessed to have the love and support of such a great life partner. I closed the
acknowledgments section of my last book with the statement swearing, for the third
time, that it would be the last book I write. Clearly, I shouldn’t take oaths related to book
writing.

Author

A. John Bailer, PhD, PStat®, is a University Distinguished Professor and a founding chair
of the Department of Statistics and an affiliate member of the Departments of Biology and
Sociology and Gerontology as well as the Institute for the Environment and Sustainability
at Miami University in Oxford, Ohio. He is President of the International Statistical
Institute (2019–2021). He previously served on the Board of Directors of the American
Statistical Association. He is a Fellow of the American Statistical Association, the Society
for Risk Analysis, and the American Association for the Advancement of Science. His
research has focused on quantitative risk estimation, and his collaborations have addressed
problems in toxicology, environmental health, and occupational safety. He received
the E. Phillips Knox Distinguished Teaching Award in 2018 after previously receiving the
Distinguished Teaching Award for Excellence in Graduate Instruction and Mentoring and
the College of Arts and Science Distinguished Teaching Award. He is also the co-founder
of and a continuing panelist on the Stats+Stories podcast (www.statsandstories.net).

1
Structuring, Implementing, and Debugging
Programs to Learn about Data

1.1 Statistical Programming


The computer is the exploratory instrument and experimental tool for data scientists, data
analysts, and statisticians. Analogous to chemists needing proficiency with mass spectrometers
and geneticists needing competence with gene and protein expression technologies,
statisticians and data scientists need skills using the computer as a tool for preparing
data for later analysis, structuring data sets, conducting analyses, fitting models, and
simulating stochastic systems. In the chapters that follow, you will develop an appreciation
for some of the questions that can be answered with statistical programming, and
how these questions can be answered using SAS. The emphasis of this book is statistical
programming. Statistical programming can be viewed as the coding required to conduct
an analysis of interest, which might include using existing procedures or customizing an
analysis not possible in existing procedures. In contrast, statistical computing might be
considered the efficient coding of statistical procedures (e.g., numerical methods, random
number generation). Data management might be conceptualized as the structure, storage,
and manipulation of data files.
There are a number of books that provide excellent basic introductions to SAS (e.g., Cody
and Pass 1995, Delwiche and Slaughter 2012). This book complements and extends those
other books by emphasizing the use of SAS to solve statistical programming problems.
In a world with a host of programming and software options for data analysis, why sta-
tistical programming in SAS? SAS is one of the leading data analysis platforms in terms of
data-cleaning capabilities, memory and storage management, interfacing with enterprise
databases, and turnkey computational efficiency. SAS has a comprehensive set of data pro-
cessing and management tools and a broad range of statistical procedures that are united
with a similar syntactical structure. It has decades of use, and backward compatibility is
maintained over new releases. It is well tested before new versions are released, and it
continues to innovate. SAS is used extensively in a broad set of areas with particular use
in finance and banking, pharmaceutical research and manufacturing, and government.
Ultimately, the modern data scientist and statistician will be a computational polyglot,
and I believe that SAS can be a useful component of this collection of analytic languages.
There are many ways to learn to swim. One way is a gentle, gradual entry into warm,
welcoming water with incremental instruction on swimming details. Another way is to
be thrown into the deep end of a pool of cold water. A little flailing of limbs is likely,
along with the hope that you don’t sink. In the next section, you will be thrown into the deep end
for statistical programming. While these motivating examples are complex, the goal of

2 Statistical Programming in SAS

introducing the examples early is to inspire and to prime your imagination with ways you
might apply the skills you will learn in subsequent chapters. So while you are being tossed
in the deep end in the next section, a lifesaver will be tossed in to assist and more gentle
instruction will be provided. If you are interested in delaying your swimming lesson,
skim Section 1.2 for motivation and move quickly to Section 1.3.

1.2 Learning from Constructed, Artificial Data


The computer is the laboratory instrument for statisticians. An important strategy in
statistical programming is to build examples with known characteristics. Such examples provide a basis
for developing solutions to complicated problems, for understanding the analysis of data
with a particular structure, and for exploring statistical theory. In this section, you will see
three examples that touch on each of these topics. In each example, a simple simulated
data example captures the essential features of the problem and thus provides this insight.

Processing a Particular Data Set—Extracting Variable Names
from a Column of an Input Data Set
You may want to process data sets whose structure requires reshaping before they
can serve as a useful analysis data set. You might be interested in analyzing
data from the World Bank (see www.worldbank.org). Data from this site often have
a format in which each row holds one variable for a particular country and the columns are
different years. For example, data on health, nutrition, and population statistics can be
downloaded from the World Bank (e.g., http://databank.worldbank.org/data/reports.aspx?source=health-nutrition-and-population-statistics-by-wealth-quintile#).
An edited subset of the data set extracted from this site is given below. You see a column
for country, variable (Series Name), and data from different years.

Country Name  Series Name                                              2001 [YR2001]  2002 [YR2002]
Angola        Antenatal care (% of women with a birth): Q1 (lowest)    47.1           ..
Angola        Antenatal care (% of women with a birth): Q2             60.9           ..
Angola        Antenatal care (% of women with a birth): Q3             58.5           ..
Angola        Antenatal care (% of women with a birth): Q4             70.8           ..
Angola        Antenatal care (% of women with a birth): Q5 (highest)   86.1           ..

A restructuring of the data would allow for better analysis and visualization options.
For the above data, this involves having a separate column for each “Series Name” as well
as year. The entries in the “Series Name” column are the names of variables in the data set.
For this subset, an analysis data set would have the following structure:

                    Antenatal Care     Antenatal Care     Antenatal Care     Antenatal Care
                    (% of women with   (% of women with   (% of women with   (% of women with
Country             a birth):          a birth):          a birth):          a birth):
Name      Year      Q1 (lowest)        Q2                 Q3                 Q4
Angola    2001      47.1               60.9               58.5               70.8
Angola    2002      .                  .                  .                  .
Angola    2003      .                  .                  .                  .

This data set has more than 87,000 rows and 32 columns. Rather than developing the
reshaping code directly on a data set of this size, you can build a smaller test data set that captures the key features
of the larger problem, develop a solution for this smaller problem, and then generalize
this solution to the larger problem.

WORTH NOTING: Programs included in this text will use macro variables to specify
folders and subfolders for storing. All of the output/tables from procedures are pro-
duced using ODS RTF output control. While this code will be included in the code
file online, these statements are omitted from the text.

Program 1.1: Constructing a SAS Data Set with Similar Structure to the World
Bank Data and Then Reshaping This into a Better Form for Analysis
/*
Need to extract variable names from a column of an input data set
*/

data colvarname;
input country $ variableName $ YR1960 YR1961;
datalines;
C1 a 10 15
C2 a 12 17
C3 a 14 19
C1 b 20 25
C2 b 22 27
C3 b 24 29
;
run;
title 'data set where reshaping was needed';

proc print data=colvarname;
run;

/* First: get the YR variables in a column with year and a column
with variable value */
proc sort data=colvarname;
by country variableName;
run;

proc transpose data=colvarname out=test let;
by country variableName;
var yr1960 yr1961;
run;

/* Second: extract the year (e.g. 1960) from the character value
(e.g. YR1960) and make it numeric
*/
data test2;
set test;
year = input(substr(_NAME_, 3), 4.); * converts the year text (e.g. 1960) to numeric;
drop _NAME_;
run;

proc print data=test2;
run;

/* Third: move the variableName column to separate distinct columns */
proc sort data=test2;
by country year;
run;

proc transpose data=test2 out=test3;
by country year;
var col1;
run;

proc print data=test3;
run;

/* Fourth: rename columns with variable names and remove unneeded
column */
data test4;
set test3 (rename=(COL1=a COL2=b));
drop _NAME_;
run;

proc print data=test4;
run;

The first part of Program 1.1


data colvarname;
input country $ variableName $ YR1960 YR1961;
datalines;
C1 a 10 15
C2 a 12 17
C3 a 14 19
C1 b 20 25
C2 b 22 27
C3 b 24 29
;
run;

This program constructs a simple data set with six rows of data representing the
combination of two variables (a, b) measured for three countries (C1, C2, C3). Four
columns are associated with these rows: country, variable, and measurements of the
variable for 2 years, the variables YR1960 and YR1961. This working example data
set is shown in Table 1.1.

TABLE 1.1
Print Out of Example Data Set
Obs country variableName YR1960 YR1961

1 C1 a 10 15
2 C2 a 12 17
3 C3 a 14 19
4 C1 b 20 25
5 C2 b 22 27
6 C3 b 24 29

The remainder of Program 1.1 was developed to convert this constructed prototype data set into
a form that is more useful for analysis and visualization. Executing this code
produces the reshaped data set shown in
Table 1.2.

TABLE 1.2
Reshaped Example Data Where Variables Now
Presented in Columns and Observations in Rows
Obs country year a b

1 C1 1960 10 20
2 C1 1961 15 25
3 C2 1960 12 22
4 C2 1961 17 27
5 C3 1960 14 24
6 C3 1961 19 29

The details associated with the DATA and PROC steps of this program are explored in
later chapters.
You could now generalize this coding strategy to reshape the original World Bank data
set. The details of how this code is used to reshape the World Bank Data are discussed in
later chapters. This example illustrates the value of constructing a small artificial data set
to prototype code for solving a larger, more complicated problem.
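
The same prototype can also be used to try out variations on the reshaping strategy. The following is a minimal sketch, not the approach used in Program 1.1: it assumes the ID statement of PROC TRANSPOSE is used so that the values of variableName (a, b) become the new column names directly, which avoids the final renaming step. The data set names long and wide and the variable name value are illustrative choices.

```sas
/* Sketch: prototype data as in Program 1.1 */
data colvarname;
  input country $ variableName $ YR1960 YR1961;
  datalines;
C1 a 10 15
C2 a 12 17
C3 a 14 19
C1 b 20 25
C2 b 22 27
C3 b 24 29
;
run;

proc sort data=colvarname;
  by country variableName;
run;

/* Long form: one row per country, variable, and year */
proc transpose data=colvarname out=long(rename=(col1=value));
  by country variableName;
  var YR1960 YR1961;
run;

data long;
  set long;
  year = input(substr(_NAME_, 3), 4.);  /* "YR1960" -> 1960 */
  drop _NAME_;
run;

proc sort data=long;
  by country year;
run;

/* Wide form: ID takes the new column names from variableName */
proc transpose data=long out=wide(drop=_NAME_);
  by country year;
  id variableName;
  var value;
run;

proc print data=wide;
run;
```

On this small example the printed data set should have the same layout as Table 1.2, with one row per country and year and one column for each of a and b. Whether this variation carries over cleanly to the World Bank data depends on how the long "Series Name" values are turned into valid SAS variable names, which is not addressed in this sketch.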

Learning More about Unfamiliar Statistical Methods—Linear Mixed Effects Models


You may want to learn more about statistical methods with data that possess known
characteristics. Generating a data set that captures the particular data structure that
meets the assumed structure of a statistical technique is a good way to understand how
to code the analysis for such data and will help with comparing the results of the output
of the analysis to the structure used to generate the data. Consider linear mixed effect
(LME) models (Littell et al. 2006) applied to a situation where repeated measurements
are taken on each subject. As an example, the growth of a set of subjects over time could
be the focus of such an analysis, and the height of each subject would be measured at
multiple points in time. In such data, we might start with the assumption that subjects
are independent, but measurements on the same subject are correlated. This correlation
could be represented in a model by having a random effect for each subject. One simple
form of this model is

$$Y_{ij} = \beta_0 + \beta_1\,\mathrm{time}_{ij} + b_0 + \varepsilon_{ij},$$

where $Y_{ij}$ is the response of the $i$th individual at the $j$th time, $\mathrm{time}_{ij}$. The $\beta$s represent
the population intercept and slope parameters, with the mean response at $\mathrm{time}_{ij}$ given by
$E(Y_{ij}) = \beta_0 + \beta_1\,\mathrm{time}_{ij}$. The random intercept $b_0$ and the error term $\varepsilon_{ij}$ are assumed to be normally distributed with different
standard deviations: $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon)$ and $b_0 \sim N(0, \sigma_0)$.
The SAS procedure MIXED is used to fit such a model. To explore this procedure, we
generate a data set based on the model $Y_{ij} = 20 + 4\,\mathrm{time}_{ij} + b_0 + \varepsilon_{ij}$ with $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon = 4)$ and
$b_0 \sim N(0, \sigma_0 = 2)$ for $i = 1, \ldots, 50$ (subjects) and $j = 1, 2, \ldots, 5$ (times). The code for constructing
this data set and fitting the mixed model is given in Program 1.2.
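
It can help to work out what correlation this model implies before fitting it. The derivation below is a sketch that assumes the random intercept is shared by all measurements on the same subject and is independent of the error terms; this assumption is standard for a random-intercept model but is not stated explicitly above. The numbers use the generating values of 2 for the random-intercept standard deviation and 4 for the error standard deviation.

```latex
\begin{aligned}
\operatorname{Var}(Y_{ij}) &= \sigma_0^2 + \sigma_\varepsilon^2 = 2^2 + 4^2 = 20, \\
\operatorname{Cov}(Y_{ij}, Y_{ik}) &= \sigma_0^2 = 4 \qquad (j \ne k), \\
\operatorname{Corr}(Y_{ij}, Y_{ik}) &= \frac{\sigma_0^2}{\sigma_0^2 + \sigma_\varepsilon^2} = \frac{4}{20} = 0.2 .
\end{aligned}
```

Under these assumptions, any two measurements on the same subject have correlation 0.2, while measurements on different subjects are uncorrelated.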

WORTH NOTING: A “seed” for the random number generation sequence will be
used in the next program. Random numbers generated by computers are not truly
random; they follow a deterministic pattern that appears random. Once a seed is specified, the
sequence of random numbers produced will be the same each time the program is run. This is useful for reproducible
examples, say in a book or in a simulation experiment.
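
The effect of a seed can be checked directly. The following is a minimal sketch: the seed 12345 and the data set names run1 and run2 are arbitrary choices made for illustration. Two DATA steps set the same seed with CALL STREAMINIT before the first call to RAND, and PROC COMPARE then reports whether the two generated data sets match.

```sas
/* Sketch: the same seed yields the same pseudorandom stream */
data run1;
  call streaminit(12345);        /* arbitrary illustrative seed */
  do i = 1 to 5;
    x = rand('normal', 0, 1);
    output;
  end;
run;

data run2;
  call streaminit(12345);        /* same seed as in run1 */
  do i = 1 to 5;
    x = rand('normal', 0, 1);
    output;
  end;
run;

/* PROC COMPARE reports any differences between the two data sets */
proc compare base=run1 compare=run2;
run;
```

Changing the seed in one of the two steps and rerunning is a quick way to see the comparison report differences instead.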

Program 1.2: Generating Linear Relationship with a Random Intercept


title "Linear Mixed Effects Model illustration";
data randomint;
call streaminit(450561641); * set the seed for the pseudorandom
variable generation, from inside the DATA step;
beta0=20;
beta1=4;
sigma=4;
do subject = 1 to 50;
  b0 = rand('normal',0,2); * random intercept;
  do time = 1 to 5;
    response = beta0 + b0 + beta1*time + rand('normal',0,sigma);
    output;
  end;
end;
run;
proc print data=randomint;
run;
/* plot the subject-specific trajectories */
proc sgplot data=randomint noborder;
series x=time y=response / group=subject;
run;
/* Fit LME model estimating the variance components */
proc mixed data=randomint;
class subject ;
model response = time / SOLUTION;
random Intercept / SUBJECT=subject type=UN;
run;

The line graph with the individual trajectories is given in Figure 1.1.

FIGURE 1.1
Subject-specific response profiles for simulated data.

The estimates of the fixed effects (β0, β1) (Table 1.3) and variance components (σε, σ0)
(Table 1.4) are produced in the output from PROC MIXED. It is encouraging to see that the
estimated fixed effects from our simulated data example are similar to the values used in
generating the data, namely β0 = 20 (estimate 19.85) and β1 = 4 (estimate 4.20).

TABLE 1.3
Estimates of the Fixed Effects (β0, β1) Obtained from PROC MIXED
Solution for Fixed Effects
Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   19.8506    0.6804           49    29.18     <0.0001
time        4.1964     0.1794           199   23.40     <0.0001

The estimated variance components from our simulated data example (Table 1.4) are similar to
the values used in generating the data, namely σε = 4 (sε = 4.01 = sqrt[16.0846]) and σ0 = 2
(s0 = 2.34 = sqrt[5.4533]).

TABLE 1.4
Variance Components (σε, σ0) Obtained
from PROC MIXED
Covariance Parameter Estimates
Cov Parm   Subject   Estimate
UN(1,1)    subject   5.4533
Residual             16.0846

For a data analyst faced with using a new statistical model, a simple strategy will provide
useful insights:

1. Construct an artificial data set with particular characteristics that are captured by
a statistical method,
2. Apply that method to the simulated data, and
3. Observe the results and compare them with the conditions used to simulate the data.
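
The three steps can be sketched in code so that the comparison in the last step is made on the same scale as the simulation settings. This is an illustrative sketch rather than the book's program: it assumes the generating values described for Program 1.2, uses the PROC MIXED ODS tables SolutionF and CovParms to capture the estimates as data sets, and uses the hypothetical data set names sim, fixed, vc, and vc_sd.

```sas
/* Step 1: construct artificial data with known characteristics */
data sim;
  call streaminit(450561641);
  do subject = 1 to 50;
    b0 = rand('normal', 0, 2);            /* random intercept, sd = 2 */
    do time = 1 to 5;
      response = 20 + 4*time + b0 + rand('normal', 0, 4);  /* error sd = 4 */
      output;
    end;
  end;
run;

/* Step 2: apply the method and capture its tables as data sets */
ods output SolutionF=fixed CovParms=vc;
proc mixed data=sim;
  class subject;
  model response = time / solution;
  random intercept / subject=subject;
run;

/* Step 3: put the variance estimates on the standard deviation scale */
data vc_sd;
  set vc;
  sd_estimate = sqrt(Estimate);  /* compare with 2 (Intercept) and 4 (Residual) */
run;

proc print data=fixed;           /* compare with beta0 = 20 and beta1 = 4 */
run;

proc print data=vc_sd;
run;
```

Taking the square root matters because PROC MIXED reports variances, while the simulation was specified in terms of standard deviations.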

Improving Your Intuition about Statistical Theory— Sampling Distribution of Means


The Central Limit Theorem (CLT) is likely the only named theorem mentioned in an intro-
ductory statistics class. This theorem has implications about the sampling distribution of
the sample mean. To paraphrase, the CLT implies that the sampling distribution of the
sample mean converges to a normal distribution as the sample size grows, for independent variables sampled from
a distribution with finite mean and variance. You might be interested in generating data
to explore this directly. Consider a discrete random variable that follows a Poisson distri-
bution with mean μ = 0.5. This is a non-negative random variable that assumes values 0, 1,
2, 3, … Program 1.3 does the following:

1. Generates the Pr(Y=y) for y=0,1, … 10 for Poisson with mean 0.5.
2. Graphs this probability mass function.
3. Generates 1000 samples of size 10, 30, and 50 from this Poisson (0.5) distribution.
4. Calculates the sample mean for each sample.
5. Produces a histogram for each sample size group with a superimposed kernel
density and normal density estimates.

Program 1.3: Generating Samples of Poisson Variables and
Displaying the Sampling Distribution of the Means
title 'Poisson (mu=0.5) distribution';

data PoissonDistrib;
mu=0.5;
do x=0 to 10;
ProbX = PDF("Poisson",x,mu);
output;
end;
run;

* Probability Function for Poisson(mu = 0.5) ......................;

proc sgplot data=PoissonDistrib noborder; * remove box around plot;
needle x=x y=ProbX;
xaxis offsetmin =.1; * create space between first tick
mark and x-axis;
run;

data RanPoi;
mu = 0.5;
do nsize = 10, 30, 50;
do sample = 1 to 1000;
sumobs = 0;
do obs = 1 to nsize;
sumobs = sumobs + rand("Poisson",mu);
end;
xbar = sumobs/nsize;
output;
end;
end;
run;

* Sampling Distribution for means of samples from Poisson(mu = 0.5) ..;

proc sgpanel data=RanPoi;
title2 'Sampling Distribution of Xbar for different sample sizes';
panelby nsize / layout=panel columns=1 noborder;
histogram xbar / binwidth=.1;
density xbar;
density xbar / type=kernel;
run;

Figure 1.2 clearly illustrates the discrete nature of this random variable along with its
strong right skew.

FIGURE 1.2
Probability mass function for Poisson (mean = 0.5) variable.

One thousand samples of size n = 10, 30, and 50 were selected from this distribution, and
the sample mean was calculated for each of these samples. The distribution of the sample
mean is shown in Figure 1.3 as a histogram for these different sample sizes along with a
superimposed normal density and kernel density estimate.

FIGURE 1.3
Distribution of the sample mean for samples from a Poisson (mean=0.5) distribution.

From this visualization, you can see that the sampling distribution of the sample mean
for a Poisson variable with mean 0.5 appears symmetric and unimodal for samples of
size n = 10 although the normal density and kernel density estimates don’t overlap until
sample sizes of n = 50. In later chapters, you will learn how to customize figures to avoid
horrible default axis labels such as BIN_XBAR_BINSTART__BINSTART___X as
seen in Figure 1.3.
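
The visual impression can be backed up with a few numbers. The sketch below repeats the simulation loop of Program 1.3 and compares simulated summaries of the sample mean with their theoretical values. It relies on the fact that the sum of n independent Poisson(μ) variables is Poisson(nμ), so the sample mean has mean μ, standard deviation sqrt(μ/n), and skewness 1/sqrt(nμ). The seed 20200101 and the data set names RanPoiCheck and theory are arbitrary choices for illustration.

```sas
/* Sketch: simulated sample means, as in Program 1.3 but with a seed */
data RanPoiCheck;
  call streaminit(20200101);     /* arbitrary illustrative seed */
  mu = 0.5;
  do nsize = 10, 30, 50;
    do sample = 1 to 1000;
      sumobs = 0;
      do obs = 1 to nsize;
        sumobs = sumobs + rand("Poisson", mu);
      end;
      xbar = sumobs/nsize;
      output;
    end;
  end;
  keep nsize sample xbar;
run;

/* Theoretical mean, standard deviation, and skewness of the sample mean */
data theory;
  mu = 0.5;
  do nsize = 10, 30, 50;
    mean_xbar = mu;
    sd_xbar   = sqrt(mu/nsize);
    skew_xbar = 1/sqrt(nsize*mu);
    output;
  end;
run;

proc print data=theory;
run;

/* Simulated counterparts, one row per sample size */
proc means data=RanPoiCheck n mean std skewness maxdec=3;
  class nsize;
  var xbar;
run;
```

The theoretical skewness falls from about 0.45 at n = 10 to 0.20 at n = 50, which is consistent with the normal and kernel density estimates agreeing more closely as the sample size increases.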

1.3 Good Programming Practice


Good programming practice needs to go beyond “I know it when I see it.” In this sec-
tion, some guidelines for good practice are suggested. These guidelines are influenced by
recommendations that I received in many programming classes over the years, and by
regrets that I experienced when I looked at my own inadequately documented code years
after it was constructed. As a final disclaimer, you should usually follow these guidelines.
However, in cases when you write a quick program for a simple, single-use application,
you might not follow these guidelines as closely. So, let’s start.

Document Your Programs!


Every program should start with the following header information:

file location and name


Provide the name of the program and the full directory path for a program. With
programs stored on office desktops and laptops, on servers and in the cloud, this
is key information when you try to locate a program years or months or days after
its initial construction.
date
When did you write this program? This is great information when you need to do
a date-restricted search to find your program.
author
Who wrote this program? Are you using someone else’s macro? This information
helps identify contacts when clarifying program code.
revision (Was it based on a previous program?)
I rarely complete complicated programs at one sitting. Revision tracking is helpful
when developing code. In addition, I often do this with file naming conventions
(e.g., chapter1-statprog-20may09.doc might be a useful name for tracking a ver-
sion of this chapter). More sophisticated methods of version tracking exist that are
particularly useful for larger programming projects that might be supported by
multiple developers. An example of this is GitHub.

WORTH NOTING: All of the SAS code from this book can be obtained from the
GitHub repository (www.github.com/baileraj/SPiSv2).

purpose of the program


A description of what a program does in a sentence or two is never wasted
documentation.
input variables and output variables
What input is required to run a program? What output does the program produce?
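
The header items listed above can be collected into a comment block at the top of a program. The following is a hypothetical template, not the header of any program in this book: the path, dates, author name, file names, and the small example data set are all invented for illustration.

```sas
/*
  File:      C:\projects\example\analysis-2020jan15.sas  (hypothetical path and name)
  Date:      15 January 2020                             (hypothetical)
  Author:    A. Analyst                                  (hypothetical)
  Revision:  based on analysis-2020jan08.sas             (hypothetical)
  Purpose:   Plot y against x with a fitted regression line.
  Input:     data set EXAMPLE with numeric variables x and y
  Output:    scatter plot with the fitted line superimposed
*/
data example;
  input x y;
  datalines;
1 2.1
2 3.9
3 6.2
4 7.8
;
run;

proc sgplot data=example;
  reg x=x y=y;
run;
```

Keeping the header as the first thing in the file means it is visible as soon as the program is opened, before any code has to be read.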

Program 1.4 provides an illustration of header documentation in an analysis program.

Program 1.4: A SAS Program That Fits a Multiple Regression


filename onweb url
"http://www.users.miamioh.edu/baileraj/classes/sta402/data/country.data";
data country;
title "country data analysis";
infile onweb;
input name $ area Popn_Size Pct_Urban lang $ Literacy Life_Men
Life_Women PC_GNP;
log_area = log10(area);
log_popn = log10(Popn_Size);
log_GNP = log10(PC_GNP);
speaks_english = (lang="English");
drop area Popn_Size PC_GNP;
run;

proc print data=country;
run;

proc reg data=country;
title "LITER and LOGGNP as predictors of Life expectancy of women";
model Life_Women = Literacy log_GNP/ tol vif collinoint;
output out=new p=yhat r=resid;
run;

* setting up macro variable for Folder;
%let Folder1 = C:\Users\baileraj\Documents\book-SPiS-2nd-ed\chapter01;

/* plot life expectancy of women vs. log(GNP) with a linear regression
fit and LOESS fit superimposed
*/
ods rtf file="&Folder1.\ch01-pgm1p4-output.rtf"
image_dpi=300
style=sasuser.customSapphire;
title "";
proc sgplot data=country;
reg y=Life_Women x=log_GNP;
loess y=Life_Women x=log_GNP;
run;
ods rtf close;

In the example, a block of comments is found at the beginning of this program (all text
between the /* and */ symbols). The name of the program is the first line of the comment
block. You may find it useful to name files with the date as part of the filename. The directory,
author information, and the previous program that is the foundation of this current program
follow in the next lines. If you use multiple machines, it is helpful to identify
this information here. Next, the purpose of the program is explained. After that, the name
of the input data file is provided. The directories of the external data files and the input