UNIVERSITY OF SOUTHERN MINDANAO
KIDAPAWAN CITY CAMPUS
Sudapin, Kidapawan City
Stat 121
Engineering Data Analysis
Espadera, Andre Paul V.
College of Engineering
Chapter 1
Obtaining Data
Methods of Data Collection
3 Methods for Collecting Data
Three major techniques for Collecting Data:
1. Questionnaires
2. Interviews
3. Observations
Using these data gathering methods
• Each method has advantages and problems. No single method can fully
measure the variables important to OD.
• Examples:
– Questionnaires and surveys are open to self-report biases, such as
respondents’ tendency to give socially desirable answers rather than
honest opinions.
– Observations are susceptible to observer biases, such as seeing what one
wants to see rather than what is actually there.
Use more than one
• Because of the biases inherent in any data-collection
method, it is best to use more than one method when
collecting diagnostic data.
• The data from the different methods can be compared,
and if consistent, it is likely the variables are being validly
measured.
Demographics
• Information about the people you are gathering data from is important.
• Collect the specific demographics necessary. Some examples:
– Age
– Gender
– Income level
– Ethnic background
– Status (student, teacher, visitor)
• Be careful not to collect demographics that are not specific to your data
collection purpose.
Questionnaires: Drawbacks
• Responses are limited to the questions asked in the instrument.
• They provide little opportunity to probe for additional data or ask
for points of clarification.
• They tend to be impersonal.
• They often elicit response biases: respondents tend to answer in a
socially acceptable manner.
Sample Employee / Management Relationship Survey
Team Goals and Objectives
Unclear; diverse; conflicting.
1 2 3 4 5
Clear; understood; shared by all.
Role Clarity
Employees are unclear about their roles; responsibilities and
authority are ambiguous.
1 2 3 4 5
Employees are clear about what is expected of them; they know their
responsibilities and authority.
Communications
Employees are guarded and cautious when communicating with management.
1 2 3 4 5
Employees are open and authentic when communicating with management.
Decision Making
Little opportunity for input; uninvolved; decisions made autonomously.
1 2 3 4 5
Decisions made jointly through group participation; plenty of
opportunity for input; persons affected asked for their opinion.
Interviews
• Interviews are probably the most widely used technique for
collecting data in OD.
• They permit the interviewer to ask the respondent direct
questions.
• Further probing and clarification is possible as the interview
proceeds.
• This flexibility is invaluable for gaining private views and feelings
about the organization and exploring new issues that emerge
during the interview.
Interviews
• Interviews may be highly structured, resembling
questionnaires, or highly unstructured, starting with general
questions that allow the respondent to lead the way.
• Interviews are usually conducted one-to-one but can be
carried out in a group.
• Group interviews save time and allow people to build on
others’ responses.
• Group interviews may, however, inhibit respondents’
answers if trust is an issue.
Interviews / Focus Groups
• A focus group is another unstructured group meeting, conducted by a
manager or a consultant.
• A small group of 10-15 people is selected to represent a
larger group of people.
• Group discussion is started by asking general questions
and group members are encouraged to discuss their
answers in some depth.
• The richness and validity of this information will depend
on the extent that trust exists.
Traditional Approach To Experimentation:
T8-POSS Example
• set T = 40 C, H2O concentration = 10%; try cSi = 0.1, 0.2, 0.3, 0.8, 0.9, 1.0 M
• set T = 60 C, cSi = 0.5 M, H2O concentration = 5, 10, 12.5, 15, 17.5, 20%
• …
This is called a One-Factor-At-a-Time (OFAT) or
Change-One-Separate-factor-at-a-Time (COST) strategy. Disadvantages:
• may lead to suboptimal settings (see next slide)
• requires too many runs to obtain good coverage of experimental region
(see later)
Drawbacks to Interviews
• They can consume a great deal of time if interviewers
take full advantage of the opportunity to hear respondents
out and change their questions accordingly.
• Personal biases can also distort the data.
• The nature of the question and the interactions between
the interviewer and the respondent may discourage or
encourage certain kinds of responses.
• It takes considerable skill to gather valid data.
Sample Interview Questions
1. How do management and non-management employees
interact in the office?
2. How do you know when you have done an excellent job?
3. How do non-management employees learn about
organizational change?
4. If you could change one or two things about the way
management and non-management personnel interact,
what would you change?
Observations
• Observing organizational behaviors in their functional
settings is one of the most direct ways to collect data.
• Observation can range from complete participant
observation, where the OD practitioner becomes a
member of the group under study, to more detached
observation, where the observer casually observes and notes
occurrences of specific kinds of behaviors.
Advantages to Observation:
• They are free of the biases inherent in the self-report
data.
• They put the practitioner directly in touch with the
behaviors in question.
• They involve real-time data, describing behavior
occurring in the present rather than the past.
• They are adaptive in that they can be modified depending
on what is being observed.
Problems with Observation
• Difficulties interpreting the meaning underlying the
observations.
• Observers must decide which people to observe and choose the
time periods, territory and events.
• Failure to attend to these sampling issues can result in a
biased sample of data.
Observation Protocol
• A decision needs to be made on what to observe.
• Example:
– Observe how managers and employees interact in the office.
– Observe who has lunch with whom. (Do managers and non-
managers eat together? Do executives have a private lunch
area?)
Planning and Conducting Surveys
Things to consider
• Characteristics of well-designed and well-conducted
surveys
• Population, samples and random selection
• Sources of bias in sampling and surveys
• Sampling methods, including simple random sampling,
stratified random sampling and cluster sampling
Population, Samples and Random Selection
• The population in a statistical study is the entire group of
individuals, scores, measurements, etc. about which we want
information.
• A sample is the part of the population from which we actually collect
information; it is used to draw conclusions about the whole.
• Random selection is a process of gathering a representative
sample for a particular study. Random means the people are
chosen by chance: each person has the same probability of being
chosen. With a truly random sample, you reduce the chance that the
results are due to particular characteristics of the participants in
the study.
Sources of Bias in Sampling and Surveys
• Convenience Samples, which use the individuals who are
easiest to reach, and Voluntary Response Samples, in which
respondents decide whether they want to be included, are
common methods of data collection that will usually
produce biased results. These sampling methods
usually favor one part of a population over another.
• If the high school guidance office wanted to know if
students are interested in an AP Statistics elective, would
the district get accurate information if the counselors
asked the Calculus teachers to survey their students?
Sources of Bias in Sampling and Surveys (cont.)
• Why would more accurate results be gathered in an
English or History class?
• Would asking students to stop by the office at the end of
the day to fill out a questionnaire regarding testing
policies in the district yield valid results?
• What could be changed to make this a more valid
sample?
Design Your Own Bad Sample
• The school administration wants to gather student
opinion about parking on campus. It is not practical
to contact every student.
1. Give an example of a way to choose a sample of
students that is poor practice because it depends on
voluntary response.
2. Give an example of a way to choose a sample of
students that is poor practice but does not depend on
voluntary response.
• A sample chosen by chance allows neither favoritism by
the sampler nor self-selection by respondents. All
individuals have an equal chance to be chosen.
• A Simple Random Sample allows all members of a
population an equal chance of being selected, avoiding
bias. Drawing names from a hat works for a small
population (students in a classroom) but would not be
practical when conducting a national survey.
• Computer-generated Random Digits can be used when
working with large populations.
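Drawing a simple random sample with computer-generated random numbers can be sketched as follows. This is a minimal Python illustration; the population of 500 numbered students, the sample size of 25, and the seed are assumptions made for the example.

```python
import random

# Hypothetical population: ID numbers 1..500 (an assumption for illustration).
population = list(range(1, 501))

rng = random.Random(2024)  # fixed seed only so the example is reproducible

# random.sample draws without replacement; every member is equally likely
# to be selected.
srs = rng.sample(population, k=25)

print(sorted(srs))
```

Because the draw is without replacement, no student can appear twice in the sample.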
• A Table of Random Digits is a long string of the digits
0,1,2,3,4,5,6,7,8,9 where each entry in the table is equally
likely to be any of the digits and the entries are
independent of each other.
• Systematic Sampling selects a starting point and then
selects every kth (such as 50th) element in the population.
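Systematic sampling can be sketched in a few lines of Python. The numbered population of 1000 elements and the function name `systematic_sample` are hypothetical, and choosing the starting point at random among the first k elements is an assumption; the slide only says that a starting point is selected.

```python
import random

def systematic_sample(items, k, rng):
    """Pick a random start among the first k items, then every kth item."""
    start = rng.randrange(k)
    return items[start::k]

population = list(range(1, 1001))  # hypothetical list of 1000 numbered elements
sample = systematic_sample(population, k=50, rng=random.Random(7))
print(sample)
```

With 1000 elements and k = 50, every possible starting point yields a sample of 20 elements.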
• Stratified Random Sampling subdivides the population
into at least two different subgroups (strata) so that
subjects within the same subgroup share the same
characteristics (gender, age), then draws a sample from
each. Ex. The Orange County DMV plans to test an online
registration system by using a sample consisting of 20
randomly selected men and 20 randomly selected
women.
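The DMV example can be sketched as a simple random sample drawn separately within each stratum. The registry sizes (300 men, 260 women), the labels, and the function name `stratified_sample` are made up for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical registry: 300 men and 260 women identified by made-up labels.
strata = {
    "men": [f"M{i}" for i in range(300)],
    "women": [f"W{i}" for i in range(260)],
}
sample = stratified_sample(strata, n_per_stratum=20, rng=random.Random(1))
print({name: len(chosen) for name, chosen in sample.items()})
```

Sampling within each stratum guarantees that both groups are represented, which a single simple random sample of 40 people would not.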
• Cluster Sampling divides the population into sections
(clusters), randomly selects some of those clusters, and then
chooses all members of the selected clusters. Ex. Pre-election
polls randomly select 30 precincts from a large number of
precincts, then survey all members of each of the selected precincts.
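The precinct example can be sketched as follows: whole clusters are chosen at random, and everyone inside a chosen cluster is surveyed. The electorate of 200 precincts with 40 voters each and the function name `cluster_sample` are assumptions for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical electorate: 200 precincts with 40 voters each.
precincts = {f"precinct_{p}": [f"voter_{p}_{v}" for v in range(40)]
             for p in range(200)}
selected = cluster_sample(precincts, n_clusters=30, rng=random.Random(3))
surveyed = [voter for members in selected.values() for voter in members]
print(len(selected), len(surveyed))
```

In contrast to stratified sampling, the randomness here is in which groups are picked, not in who is picked within a group.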
Planning and Conducting Experiments: Introduction to
Design of Experiments
Example of Experiment : Synthesis of T8-POSS
• context: development of new synthesis route for polymer
additive
• goal: optimize yield of reaction
• synthesis route consists of elements that are not uniquely
determined (control variables):
– time to let reaction run
– concentration water
– concentration silane
– temperature
– …
Issues in Example T8-POSS Synthesis
• how to measure yield
– what to measure (begin/end weight,…)
– when to measure (reaction requires at least one day)
• how to vary control variables
– which values of pH, concentrations, … (levels)
– which combinations of values
– equipment only allows 6 simultaneous reactions, all with the same
temperature
• how many combinations can be tested
– reaction requires at least one day
– only 4 experimentation days are available
Necessity of Careful Planning of Experiment
• limited resources
– time to carry out experiment
– costs of required materials/equipment
• avoid reaching suboptimal settings
• avoid missing interesting parts of experimental region
• protection against external uncontrollable/undetectable
influences
• getting precise estimates
[Figure: response contours (labelled 30, 40, 50, 60) over factors A and B,
marking the real maximum and the apparent maximum reached after factor A
and factor B have each been optimised separately.]
Statistical Terminology for Experiments:
Illustrated by T8-POSS Example
• response variable: yield
• factors: time, temperature, cSi, H2O concentration
• levels: actual values of factors (e.g., T=30 C, 40 C ,50 C)
• run: one combination of factor settings (e.g., T=30 C, cSi=0.5M,
H2O concentration = 15%)
• block: 6 simultaneous runs with same temperature in reaction
station
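The terms above can be made concrete by enumerating runs in Python. This is only a sketch: the temperature levels come from the slide, while the cSi and H2O levels are hypothetical, and the time factor is left out to keep the example small.

```python
from itertools import product

# Levels per factor. The temperatures come from the slide; the cSi and H2O
# levels are hypothetical values chosen for illustration.
levels = {
    "T_C": [30, 40, 50],
    "cSi_M": [0.1, 0.5, 0.9],
    "H2O_pct": [10, 15],
}

# A run is one combination of factor settings.
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]

# A block groups the runs that share a temperature (one reaction station load).
blocks = {}
for run in runs:
    blocks.setdefault(run["T_C"], []).append(run)

print(len(runs), {t: len(b) for t, b in blocks.items()})
```

With these assumed levels each temperature block holds exactly 6 runs, which matches the capacity of the reaction station.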
Modern Approach: DOE
• DOE = Design of Experiments
• key ideas:
– change several factors simultaneously
– carefully choose which runs to perform
– use regression analysis to obtain effect estimates
• statistical software (Statgraphics, JMP, SAS,…) allows one to
– choose or construct designs
– analyse experimental results
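The key ideas can be illustrated without statistical software. The sketch below builds a full two-level factorial design for two factors and recovers the effect estimates; the response function and its coefficients are hypothetical and noise-free. For a design with orthogonal ±1 columns, the least-squares regression coefficient of a column reduces to the sum of column times response, divided by the number of runs.

```python
from itertools import product

def response(x_a, x_b):
    """Hypothetical noise-free response in coded units (-1 = low, +1 = high)."""
    return 10 + 3 * x_a - 2 * x_b + 1.5 * x_a * x_b

# Full 2^2 factorial design: all four combinations of the coded levels.
design = list(product([-1, 1], repeat=2))
y = [response(a, b) for a, b in design]

def estimate(column):
    """Least-squares coefficient for an orthogonal +/-1 column."""
    return sum(c * yi for c, yi in zip(column, y)) / len(y)

coefficients = {
    "intercept": estimate([1] * len(design)),
    "A": estimate([a for a, _ in design]),
    "B": estimate([b for _, b in design]),
    "AB": estimate([a * b for a, b in design]),
}
print(coefficients)
```

Four runs in which both factors change simultaneously are enough to estimate both main effects and their interaction in this idealised setting.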
Example of Analysis
simple experiment:
– response is conversion
– goal is screening (are time and temperature influencing conversion?)
– 2 factors (time and temperature), each at two levels
– 5 centre points (both time and temperature at intermediate values)
Statgraphics demo with conversion.sfx. (choose Special ->
Experimental Design etc. from menu)
More advanced (5 factors, not all 2^5 combinations): colour.sfx
Example of Construction: T8-POSS Example
• 36 runs
– 2 reactors available each day (each reactor 6 places)
– 3 experimental days
• factors:
– H2O concentration
– temperature
– cSi
• goal is optimization of response
• choose in Statgraphics: Special -> Experimental Design -> Create
Design -> Response Surface
Goals in Experimentation
• there may be more than one goal, e.g.:
– yield
– required reaction time until equilibrium
– costs of required chemical substances
– impact on environment (waste)
• these goals may contradict each other
• goals must be converted to explicitly measurable quantities
Types of Experimental Designs
• “Screening Designs”
These designs are used to investigate which factors are important
(“significant”).
• “Response Surface Designs”
These designs are used to determine the optimal settings of the significant
factors.
Interactions
Factors may influence each other. E.g., the optimal setting of a factor may
depend on the settings of the other factors.
When factors are optimised separately, the overall result (as function of all
factors) may be suboptimal ...
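A small numerical sketch shows how separate optimisation can miss the optimum when factors interact. The response function, the five candidate levels per factor, and the starting value of B are all hypothetical.

```python
def response(a, b):
    """Hypothetical response with a strong A-B interaction (a ridge along a = b)."""
    return -(a - b) ** 2 - 0.1 * ((a - 4) ** 2 + (b - 4) ** 2)

levels = range(5)  # candidate settings 0..4 for each factor

# One-factor-at-a-time: optimise A with B fixed at its starting value,
# then optimise B with A fixed at the value just found.
b_start = 1
a_ofat = max(levels, key=lambda a: response(a, b_start))
b_ofat = max(levels, key=lambda b: response(a_ofat, b))

# Exhaustive search over all combinations for comparison.
a_best, b_best = max(((a, b) for a in levels for b in levels),
                     key=lambda ab: response(*ab))

print("OFAT:", (a_ofat, b_ofat), "full grid:", (a_best, b_best))
```

In this made-up case the separate optimisation stops at (1, 1), while the best combination on the grid is (4, 4): moving either factor alone makes the response worse, so only a joint change reveals the improvement.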
Interaction Effects
Cross terms in linear regression models cause interaction effects:
$Y = 3 + 2 x_A + 4 x_B + 7 x_A x_B$
$x_A \to x_A + 1 \implies Y \to Y + 2 + 7 x_B$,
so the increase depends on $x_B$. Likewise for $x_B \to x_B + 1$.
This explains the notation AB for the interaction of factors A and B.
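The dependence of the increase on the other factor can be checked numerically. The sketch below evaluates the regression model from this slide; only the function name `y` and the chosen evaluation points are additions.

```python
def y(x_a, x_b):
    """Regression model from the slide: Y = 3 + 2 xA + 4 xB + 7 xA xB."""
    return 3 + 2 * x_a + 4 * x_b + 7 * x_a * x_b

# Change in Y when xA increases by one unit, evaluated at several xB values.
for x_b in (0, 1, 2):
    increase = y(1, x_b) - y(0, x_b)
    print(x_b, increase, 2 + 7 * x_b)
```

Each printed row shows the observed increase next to the predicted value 2 + 7 xB; without the cross term the increase would be 2 at every setting of xB.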
[Figures: four interaction plots of Output against Factor A (low, high),
each with one line for "B low" and one line for "B high".
– No Interaction (values shown: 20, 25, 50, 55)
– Interaction I (values shown: 20, 45, 50, 55)
– Interaction II (values shown: 20, 45, 50, 55)
– Interaction III (values shown: 20, 20, 45, 55)]
Centre Points and Replications
If there are not enough measurements to obtain a
good estimate of the variance, then one can perform
replications. Another possibility is to add centre
points.
Adding centre points serves two purposes:
– better variance estimate
– allows testing for curvature using a lack-of-fit test
[Figure: square 2^2 design with runs (1), a, b, ab at the corners
(levels -1 and +1; horizontal axis A) and the centre point in the middle.]
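Both purposes of centre points can be sketched with a few numbers. The measurements below are made up purely for illustration, and the sketch stops short of a formal lack-of-fit test, which would compare the curvature against the pure-error variance with an F statistic.

```python
from statistics import mean, variance

# Hypothetical measurements (made up for illustration, not real data).
factorial_y = [40.0, 44.0, 46.0, 50.0]       # corner runs (1), a, b, ab
centre_y = [47.0, 48.0, 49.0, 48.0, 48.0]    # replicated centre point

# Replicated centre points give a pure-error variance estimate.
pure_error_variance = variance(centre_y)

# If the response were planar, the corner average would match the centre
# average; a large difference relative to the noise suggests curvature.
curvature = mean(centre_y) - mean(factorial_y)

print(pure_error_variance, curvature)
```

The comparison works because a model without quadratic terms predicts the centre response to equal the average of the corner responses.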
Multi-layered Experiments
Experiments are not one-shot adventures. Ideally one performs:
• an initial experiment
– check out experimental equipment
– get initial values for quantities of interest
• main experiment
– obtain results that support the goal of the experiment
• confirmation experiment
– verify results from main experiment
– use information from main experiment to conduct more focussed
experiment (e.g., near computed optimum)
Example
• testing method for material hardness:
[Figure: a force presses a pressure pin/tip into a strip of testing material.]
practical problem: 4 types of pressure pins;
do these yield the same results?
Experimental Design 1
                  pin 1   pin 2   pin 3   pin 4
testing strips      1       5       9      13
                    2       6      10      14
                    3       7      11      15
                    4       8      12      16
Problem: if the measurements of strips 5 through 8 differ, is this caused by
the strips or by pin 2?
Experimental Design 2
• Take 4 strips on which you measure (in
random order) each pressure pin once:
                 strip 1   strip 2   strip 3   strip 4
pressure pins       1         1         4         2
                    3         4         3         3
                    2         3         2         1
                    4         2         1         4
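A layout of this kind can be generated by shuffling the pins independently for each strip. The sketch below is one way to do that in Python; the function name `randomised_block_design` and the seed are assumptions, and the orders it produces will differ from the table above.

```python
import random

def randomised_block_design(treatments, blocks, rng):
    """For each block, return all treatments in an independently shuffled order."""
    return {block: rng.sample(treatments, k=len(treatments)) for block in blocks}

pins = [1, 2, 3, 4]
strips = ["strip 1", "strip 2", "strip 3", "strip 4"]
design = randomised_block_design(pins, strips, random.Random(11))

for strip, order in design.items():
    print(strip, order)
```

Every pin is used exactly once on every strip, so differences between strips affect all pins equally.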
Blocking
• Advantage of blocked experimental design 2:
differences between strips are filtered out
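• Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
– $\mu$: overall mean
– $\alpha_i$: factor effect (pressure pin $i$)
– $\beta_j$: block effect (strip $j$)
– $\varepsilon_{ij}$: error term
Primary goal: reduction of the error term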
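How blocking filters out strip differences can be sketched with an additive model consisting of an overall mean, a pin effect, and a strip effect. The data below are built from hypothetical, noise-free effects so that the recovered estimates can be checked by eye; real measurements would also contain an error term.

```python
from statistics import mean

# Hypothetical, noise-free effects used to build illustrative data.
mu = 9
alpha = [1, -1, 2, -2]   # pressure pin effects (sum to zero)
beta = [-3, 0, 1, 2]     # strip (block) effects (sum to zero)

# y[i][j] is the measurement of pin i on strip j.
y = [[mu + a + b for b in beta] for a in alpha]

grand_mean = mean(v for row in y for v in row)
pin_effects = [mean(row) - grand_mean for row in y]
strip_effects = [mean(col) - grand_mean for col in zip(*y)]

print(grand_mean, pin_effects, strip_effects)
```

Because each pin is measured on every strip, the strip effects cancel in each pin average, and the pin effects are recovered regardless of how much the strips differ.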
Short Checklist for DOE
• clearly state objective of experiment
• check constraints on experiment
– constraints on factor combinations and/or changes
– constraints on size of experiment
• make sure that measurements are obtained under constant external
conditions (if not, apply blocking!)
• include centre points to validate model assumptions
– check of constant variance
– check of non-linearity
• make a clear protocol for the execution of the experiment (including
randomised order of measurements)