K.A. Varghese, B.R. Ranwah, Nisha Varghese, Nikhil Varghese - Research Methodology and Quantitative Techniques - A Guide For Interdisciplinary Research-Routledge (2025)
Research Methodology and Quantitative Techniques is a guide tailored for students and research
scholars navigating the intricate landscape of research degrees across various disciplines.
From clearing coursework to formulating research synopses, selecting methodologies,
conducting analysis and penning impactful theses, this book is a roadmap for every stage of
the research journey. It empowers scholars to undertake original, quality research that not
only fulfils academic requirements but also contributes to the burgeoning pool of knowledge
in diverse fields. Uniquely structured to address the specific needs of researchers, this guide
goes beyond traditional boundaries, delving into areas like IPRs and research ethics often
overlooked in discipline-oriented texts. By offering comprehensive support, from topic selection
to publication, it aims to be the go-to resource for researchers seeking a seamless path from
inception to dissemination.
This book, Research Methodology and Quantitative Techniques, addresses every facet of
research with clarity and insight and serves as both a companion and a vital tool for scholars
poised to make a meaningful research impact in their fields.
K.A. Varghese is Statistician at Pacific Institute of Medical Sciences, Udaipur and retired
Professor (Statistics), Maharana Pratap University of Agriculture and Technology, Udaipur.
B.R. Ranwah is a former ICAR Emeritus Scientist and retired Professor and Head of the
Department of Genetics and Plant Breeding, Maharana Pratap University of Agriculture and
Technology, Udaipur.
Nisha Varghese is Associate Professor at Indira Gandhi National Open University, New Delhi.
Foreword
Preface
1 Introduction
Index
Foreword
It gives me immense pleasure to know that a book titled Research Methodology and Quantita-
tive Techniques: A Guide for Interdisciplinary Research written by very experienced former
professors/faculty members of Maharana Pratap University of Agriculture and Technology,
Udaipur and others is being published by Routledge. The importance of research in academic
institutions has gone up tremendously during the recent past and the New Education Policy of
the Government of India pinpoints the need to have quality research in Indian universities and
colleges. Problem-oriented and location-specific research has been getting utmost priority
in the ICAR system, including State Agricultural Universities, for more than four decades.
With the UGC directives to clear the entrance test as well as research methodology courses
by the scholars to get registered for research degrees from colleges and universities, the focus on
research in various disciplines has been further streamlined. While student research in academic
universities and colleges is only a partial fulfilment of degree programmes, it has tremendous
scope to interlace with the institutional research output being generated by various research
institutions in India. It will not only avoid duplication of research efforts but also raise the total
research output to a greater level.
Academic research can be perceived as a science as well as an art since scientific methods are
to be used for conducting research and choices among alternatives are to be made by scholars at
various stages in research, like selection of research topic, research methods/designs, research
hypothesis, quantitative techniques and so on. Besides, at the initial stage of research, the schol-
ars with limited exposure to research find a lot of problems at different stages of their research.
It is a fact that the books available on research methodology are discipline-oriented and confined
to selected topics related to research methodology, which limits the scope for researchers to
build a strong platform for carrying out research. The issues related to the role of research in the higher
education system, the linkages of research with other functional areas of the university system,
factors to be considered for the selection of topic, identification of appropriate research method
and design, quantitative techniques to be used, ethical issues associated with research, publi-
cations, IPR related issues are becoming more and more pertinent in different disciplines like
health and medical sciences, veterinary sciences, agriculture and others. Most scholars find it
difficult to select a research topic in their respective disciplines, to formulate the research synopsis
and to do meaningful review work at the initial stages of research. The collection of data and its
analysis using appropriate quantitative techniques are also vital for the timely completion
of successful research by scholars in universities and colleges.
The lack of comprehensive reference books addressing all these issues has been a major
problem for many researchers in various disciplines. The scholars have to refer to many books
to prepare themselves to clear the research methodology course and to make a platform to
pursue their research work on a scientific footing. I understand that this book being published
by Prof. K.A. Varghese, Prof. B.R. Ranwah, Dr. Nisha Varghese and Dr. Nikhil Varghese is a
comprehensive book on research methodology and quantitative techniques which will help the
researchers right from selection of research topic to publication of research papers after comple-
tion of their research. It can be a good reference book for faculty members, research scholars
and others engaged in the task of doing scientific research in various disciplines as the authors
have different academic backgrounds in different disciplines.
I congratulate and compliment the authors and publishers for their great efforts in bringing
out a comprehensive reference book on Research Methodology and Quantitative Techniques as
one volume which can be a useful reference material and a guide for research scholars, faculty
members, students and others across disciplines at various stages of their research.
Dr. N.S. Rathore
Former Deputy Director General (ICAR) Former Vice-Chancellor, MPUAT,
Udaipur and Former Vice-Chancellor, SKRAU, Jobner, and
Director, Geetanjali Institute of Technical Studies
Udaipur, Rajasthan, India
Preface
Research in higher education in all branches of learning has been getting increasing importance
during the recent past. In fact, research and extension education/community services are integral
parts of higher education in different branches of learning. Because research is a scientific and
systematic probe that is increasingly multidisciplinary in nature, and because statistics is applied at
different stages of research, it has become a difficult task for many scholars/students pursuing their
post-graduate and PhD studies. As the majority of students in biological sciences come through
the biological stream after the tenth class, it is natural for them to find subjects with mathemati-
cal applications difficult. In order to familiarize such students with research basics, online and
offline courses in research methodology are offered by different institutions in higher education.
Realizing the fact that there is a shortage of comprehensive reference books on research
methodology and there is a great need to have more reference books on research methodology
including statistical applications specially designed for all streams of students, this book has
been written by the authors. The book is designed in such a manner that the research scholars
can identify their research topic, conduct it and do the analysis work themselves or with the help
of a computer programmer/software expert.
The very purpose of writing this book is to encourage research scholars to do original quality
research so that the huge research outcome of PG and PhD research forms part of the knowledge
in different subject matter areas of different disciplines. Apart from being a
partial fulfilment of respective degree programmes, students' research must lead to the generation
of new knowledge, theories, concepts, practices, etc. in different disciplines. Besides,
student research can be a major part of location-specific problem solving as students can take up
location-specific problems for their research. The active involvement of students at all stages of
research will help to familiarize them with all issues in the conduct of meaningful research as
a means for expanding the knowledge components in different subject matter areas of different
disciplines.
The first three chapters are to familiarize students with research as a scientific activity.
Chapter 3 would give enough background to prepare the synopsis or research protocol as the
first step to conducting research. Chapter 4 gives the required knowledge on research meth-
odology in general and research methods and designs widely applicable in different areas of
study in particular which will help the students to identify the specific research methods and
designs as applicable to the topic finalized by them. Chapters 5 and 6 give an outline of differ-
ent designs available under experimental methods of research, mostly applicable in medical
sciences and agricultural sciences. Chapter 7 on observational and epidemiological research
covers the general research approaches for studies without interventions but based on field or
laboratory observations. Chapter 8 outlines designs used for interview and survey methods
including online surveys. Chapter 9 lists various designs applicable to qualitative research.
Chapter 10 is meant to give exposure to research scholars on different designs applicable to
sample selection. Chapter 11 outlines the scaling and scoring techniques applicable in research
for the quantification of qualitative variables. Chapter 12 covers the concept of different types
of research variables and research data. Chapters 13 to 17 cover the common statistical quan-
titative techniques used to analyze research data to generate results and findings of research.
Chapter 18 is just to acquaint the students with the choices available for computer analysis of data
including basic statistical analysis using options available under Microsoft Excel. Chapter 19 is
on the documentation part related to synopsis, thesis and research papers. Chapter 20 is on how
to write the bibliography and references at the end of the thesis and research paper as per the
standards specified for it. Chapter 21 is to give a brief exposure to ethical issues in research and
Chapter 22 is to familiarize researchers with IPR-based issues related to research.
While writing this book, the authors had to refer to a large number of books, journals, research
papers and other published and unpublished, online and offline sources including their own
developed teaching material in different streams of education. We are highly grateful to authors
and publishers of all such sources.
We have tried to briefly include various aspects of research methodology and statistical tech-
niques. The book is intended to guide post-graduate and PhD students on all matters related
to research as part of their degree programmes. The faculty members in different disciplines
may also find it a useful reference material. We aim to further improve and make this book a
comprehensive one for students, faculty members and other researchers. Suggestions from all
esteemed readers are most welcome.
1 Introduction
1.1 Introduction
Research in almost all branches of learning is closely associated with the development of the
area and enhancement of the quality of life of people. Basic and applied research in various
areas of study has made spectacular transformations in developing new theories, concepts,
knowledge, technologies, etc. Continued research makes it possible to add new theories and
information to the body of subject matter knowledge in various disciplines. Research focuses
on widening knowledge on the one hand and aims for improvement and refinement of ongoing
practices and techniques, search for newer techniques, interventions, processes, methods, etc.
on the other hand. There are many definitions of research, which is a systematic and scientific
probe for new knowledge or an intensive search to find a solution to a problem.
Research and development (R&D) is very crucial in business, as it aims for useful knowledge
which can pave the way for better business processes with increased production efficiency and
reduced cost. It can also help businesses to have new products and services to meet the ensuing
challenges from other players in the market. Research in all branches of learning not only keeps
the subject matter vibrant but also adds to the body of subject matter knowledge. It is a tool for
building knowledge and facilitates the learning process.
The word research is derived from the French term ‘recherche’ or ‘recerchier’ which means
‘to go about seeking’ or to look for something or to examine closely and carefully. Research can
be defined as a systematic and scientific probe in any field for new knowledge, new information,
new theory, new practice, new intervention, new invention, new product, new process or any
such matters so as to answer a question or solve a problem. Research always aims to bring out
something new. Research is a continuous process not only for the development of the subject
matter but also for the development of the people as the output of research leads to enhanced
quality of life of the people. Most of the institutional research aims to evolve new techniques
and technologies which in turn enhance the quality of life of the people and thereby the mission
of institutions gets popularized.
The close nexus of research with education and community/extension services makes the
higher education system more vibrant. The problems of society are the determining factors for
the research agenda in different areas of learning in higher education and the research output
becomes a solution to their problems, and hence the social relevance of higher education is mate-
rialized. These days excellence, equity and expansion are the focal points of our higher educa-
tion on the one side and on the other side, the system is striving for quality-oriented, job-linked
and demand-driven higher education. Therefore, problem-oriented, location-specific and
development-based research in our educational institutions is getting more and more emphasis.
The penetration of academic professionals into local communities through extension education
DOI: 10.4324/9781003527183-1
or community services helps to get firsthand information about emerging location-specific
researchable issues.
The complementary and supplementary association of institutional research with academic
research made it possible for the subject matter areas of various disciplines and subjects to
register spectacular growth in content and quality in areas of learning. The spillover effect of
the research-based development transforms the socio-economic status of people the world over
which is attributable to the evolution of need-based technologies in health, food, education,
business and other associated sectors. The research-based innovations in both products and
processes of these crucial sectors have paved the way to enhance quality improvement in the
day-to-day lives of people. In the health sector, the quality and quantity of health services have
increased tremendously leading to horizontal and vertical growth in institutional networks. In
agriculture, health and business sectors revolutionary changes could be brought out through
continued and systematic research. As a result, the magnitude of basic health indicators of peo-
ple the world over has registered a rising trend. The need-based high-quality research for inno-
vations and the people-friendly mechanism to transfer research outcomes across the borders of
countries in the world altogether transformed the quality of life of the present people which is
quite dynamic now as compared to what was available for just a few earlier generations.
Some of the definitions of research given in Research Methodology and Biostatistics—A
Comprehensive Guide for Health Care Professionals are as follows:
Research is defined as the creation of new knowledge and/or the use of existing knowl-
edge in a creative way so as to generate new concepts, methodologies, and understanding.
Research essentially is a problem-solving process, a systematic intensive study directed
towards full scientific knowledge of subject studies.
—Ruth M. French
Research may be defined as the planned, systematic search for information for the purpose of
increasing the total body of humanity’s knowledge.
—Archold Lancaster
[Figure: Research is a systematic and scientific probe for new knowledge, new information, new invention, new process, new product, new theory, new concept, new technology, new (something) or a solution to a problem.]
1.2 Types of Research
Based on the nature of the problem, purpose of study and research outcome, all research falls
under one of the following two broad categories:
Basic/Fundamental Research: Basic research and fundamental research are almost the same and aim
to answer how things work so as to evolve new theoretical postulates related to the subject
matter area of any subject. Basic research is more tedious, time-consuming and continuous
in nature. Basic research is the way to enhance the scope of any subject.
Applied Research: Applied research is deductive in nature and aims to find solutions to emerg-
ing problems or find answers to a vital question related to an area of study. It can be the appli-
cation of proven research results to solve a definite practical problem. It leads to generating
new facts or information and sometimes has commercial applications.
Based on the approach, manner of conducting the research and outcome variables, applied
research can fall into one of two categories: quantitative research or qualitative research.
Over the years the ambit of research has been expanded and the categorization of research
under different names based on the nature, scope, applications, method of conduct and so on has
become more pertinent. Hence, several research types with a number of prefix or suffix terms
have emerged over the years with different nomenclature. Some of these are:
Research with prefix of subject matter: Terms such as economic, social, agricultural, medical,
epidemiological, surgical, homeopathic, ayurvedic, management, business, commercial,
accounting, engineering, etc. research are widely used these days to highlight the subject matter
relevance of research in respective areas.
Research with focus on place and agency of conduct: Labels such as institutional,
inter-institutional, academic, college, departmental, faculty, student, industrial, collaborative,
multi-locational, etc. are used to emphasize where and how the research is managed
or carried out.
Research based on goals/objectives: Labels such as policy-oriented, problem-oriented, longitudinal,
comparative, exploratory, descriptive, correlational, evaluation studies, impact studies,
adaptive research, operational research, etc. relate to the aim, purpose or broad objective
of such research work.
Research based on the manner of conduct: It includes quantitative and qualitative research
types. Quantitative research may be experimental, quasi-experimental or pre-experimental, or
non-experimental, such as observational, survey, descriptive or analytical, case-control,
cohort, meta-analysis, etc. Qualitative research may be phenomenological, ethnographic, grounded
theory, case studies, historical studies, etc. These labels reflect the specific research methods
and designs used to carry out various research studies.
Besides, there are some specific types of research reflecting the objective of research, some of
which are:
Conceptual Research: Such research comes out with new concepts in any subject matter area.
It enhances the scope of the subject matter of a subject.
Empirical Research: It comes out with empirical evidence related to new concepts evolved.
Diagnostic Research: It aims to assess the reasons for an occurrence when these are not
already known.
Clinical Research: It is focused on remedial measures for any unwanted or harmful occurrence
of an event. It is also used to refer to research carried out in clinical units of medical colleges
and hospitals.
Academic Research: The research carried out in higher academic institutions as a regular activ-
ity by faculty members, students or both.
Developmental Research: Research programmes for improving the quality of a product
or a process, and thereby the quality of life of the people, fall under developmental research.
Action Research: Applied research for knowledge and socio-economic empowerment of peo-
ple forms action research. Research and action go together in action research.
Cross-Sectional Research: Research carried out to assess the status of a situation at a point
in time across various sections of society. It can be exploratory, descriptive or explanatory.
Longitudinal Research: It has a time dimension and is used to answer questions about tempo-
ral changes taking place over a specified time frame.
Time Series Study: The changes taking place over time in the values of a research variable with
or without an intervention.
Panel Study: It is the combination of time series with cross-sectional study.
Cohort Studies: These are studies held on groups of homogeneous subjects/objects to compare
outcomes across groups including a control group.
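The difference between cross-sectional, time series and panel data described above can be illustrated with a small sketch. The records, subject labels and values below are hypothetical and made up purely for illustration; the function names are likewise assumptions and not part of any standard package.

```python
# Hypothetical panel records: (subject, year, value). All figures are invented.
records = [
    ("A", 2020, 10), ("A", 2021, 12), ("A", 2022, 15),
    ("B", 2020, 20), ("B", 2021, 19), ("B", 2022, 23),
]

def cross_section(rows, year):
    """Cross-sectional view: all subjects observed at one point in time."""
    return {subject: value for subject, y, value in rows if y == year}

def time_series(rows, subject):
    """Time series view: one subject followed over time, ordered by year."""
    ordered = sorted(rows, key=lambda r: r[1])
    return [(y, value) for s, y, value in ordered if s == subject]

# The full list of records is the panel: several subjects, each observed in several years.
cs_2021 = cross_section(records, 2021)
ts_a = time_series(records, "A")
print(cs_2021)
print(ts_a)
```

A panel study works with the whole set of records, while a cross-sectional study uses one slice by time and a time series study uses one slice by subject.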
1.3 Quality of a Researcher
A researcher must have certain qualities to be called a true researcher. A researcher must be creative,
with a critical mind and an analytical approach, and must have a multi-disciplinary
aptitude and be honest. The capacity to anticipate research output and to identify the real beneficiary
of the research makes the researcher more focused.
1.4 Importance of Research
All research is development linked in one way or the other. The benefit of research is not only
confined to the professional growth and development of the researcher but also to the society
at large as most of the location-specific and problem-oriented research leads to the evolution of
new techniques and technologies which are meant to improve the quality of life of the people or
to reduce the burden of their life. Research, which is a continued activity, enhances the scope of
the study area and field of study. It develops a critical mind in the researcher on the one hand
and on the other hand, the mission of the institution where it is conducted gets popularized and
brings name and fame for the researcher and the institutions.
1.5 Research Project
Any project is a time-bound activity with stipulated goals and targets. It will have stated costs
and anticipated benefits. A research project is not an exception to these aspects of a project. The
benefit of research is through the outcome of research. There are a number of sponsoring agencies
to support the conduct of research as a project by competent implementing agencies. Sponsored
research enhances the technical output of the sponsor, helps the implementing agencies
to develop research infrastructure and research capability and helps society to derive an enhanced
quality of life.
• Research is a systematic and scientific probe for new knowledge, information, technology,
etc. which adds to the subject matter content and enhances the quality of life of the people.
• The social relevance of research makes it a developmental activity for the welfare of the
people.
• The broad types of research include basic and applied based on the goal and outcome of
research.
• Based on the nature, the way of conduct and the outcome, the research can be quantitative or
qualitative.
• Other categories of research are based on subject matter, place of conduct, manner of con-
duct, goals and objectives, research designs besides various other specific types.
• A researcher must be creative and honest with a critical mind and analytical approach.
• The benefit of research is not only confined to the professional growth and development of the
researcher but also to the society at large as most location-specific and problem-oriented
research leads to the evolution of new techniques and technologies which are meant to
improve the quality of life of the people or to reduce the burden of their life.
Suggested Readings
Ghosh, B. N., Scientific methods and social research, Sterling Publishers Private Ltd., New Delhi, 1984.
Kerlinger, F. N., and H. B. Lee, Foundations of behavioral research, 4th ed., Harcourt College, Fort Worth,
TX, 2000.
Sharma, B. S., Research methods in social sciences, Sterling Publishers Private Ltd., New Delhi, 1983.
Sharma, S. K., Research methodology and biostatistics—A comprehensive guide for health care profes-
sionals, Elsevier RELX India Pvt. Ltd., New Delhi, 2017.
Sidhu, K. S., Methodology of research in education, Sterling Publishers Private Ltd., New Delhi, 1985.
Trivedi, R. N., Research methodology, College Book Depot, Jaipur, 1991.
Williams, M. A., ‘Editorial: assumptions in research’, Research in Nursing & Health 3(2): 47–48, 1980.
2 Network of Institutional and Academic Research
2.1 Introduction
The total research output in different areas of study or operational areas comes from various
exclusive research institutions as well as from academic institutions where research is an inte-
gral activity. Research institutes or research centres are established at international, national,
state and regional levels by various governments and other agencies to focus research in various
mandated areas. Some institutes are meant for basic research while others focus on applied
research. Technology-based economic development was given adequate importance ever since
the planned economic development process was initiated in India after independence. A network
of research institutions came into existence in India under various ministries and departments
like the Indian Council of Agricultural Research (ICAR), Indian Council of Medical Research
(ICMR), Council of Scientific and Industrial Research (CSIR), Defence Research and Devel-
opment Organization (DRDO), Indian Council of Social Science Research (ICSSR), Indian
Council of Forestry Research and Education (ICFRE), Indian Council of Historical Research
(ICHR), Department of Science and Technology (DST), Department of Biotechnology (DBT)
and so on. There are also several research institutions in areas like chemical sciences, physi-
cal sciences, mathematics, earth sciences, engineering sciences and material sciences, minerals
and metallurgy, multi-disciplinary, etc. As technology-based development has been given prime
importance in the development strategy in India, there are research centres in almost all areas of
learning as well as related to production activities in India to augment the process of develop-
ment in different areas of study and development.
DOI: 10.4324/9781003527183-2
coordination and advancement of biomedical research. It is under the Department of Health
Research, Ministry of Health and Family Welfare, Government of India. The ICMR established
the Clinical Trials Registry - India in 2007 which serves as India’s national registry for clinical
trials. Research on specific health issues, such as AIDS, malaria, cholera, diarrheal diseases,
vector control, nutrition, food and drug toxicology, reproduction, haematology, oncology, medi-
cal statistics, etc. is undertaken by the 26 national institutes of the ICMR. The other six regional
medical research centres under ICMR focus on addressing region-specific health problems so
as to strengthen and generate research capabilities in different geographic areas of the country.
Research priorities of the Council are in line with the national health priorities which include
research on major non-communicable diseases like cancer, cardiovascular diseases, blindness,
diabetes and other metabolic and haematological disorders; research on mental health; and
research on drugs (including traditional remedies), control and management of communicable
diseases, fertility control, maternal and child health, control of nutritional disorders and devel-
opment of alternative strategies for health care delivery. The goals of these initiatives are to
improve population health and well-being of the people while lowering the overall burden of
disease. A variety of research initiatives conducted by medical students, faculty members and
institutions are sponsored by the ICMR.
• Dispersal of talent from more developed to less developed regions, especially to areas where
social science research is underdeveloped; and
• Development of quality of research and interdisciplinary research in social sciences to
improve the social science inputs into development.
The Council is, at present, assisting 30 research institutes and six regional centres in different
regions in India.
Apex academic bodies like UGC, AICTE and others also have provisions to support research
in different areas. The need-based technologies developed by these institutions, either directly
or through sponsored research projects through other institutions, not only add to the national
knowledge pool but also play a crucial role in easing the lives of the people by enhancing their
quality of life.
2.8 Academic Research
India has the third largest public education system in the world, after the USA and China, with
the University Grants Commission (UGC) as the apex body to regulate and monitor the higher
education system. There are about 15 accreditation agencies approved by the UGC for quality
assurance in various faculties of learning. As of 2020, India had about 1,000 universities, including
54 central universities, 416 state universities, 125 deemed-to-be universities and 361 private
universities. Additionally, there are about 160 institutes of national importance such as AIIMS,
IIMs, IIITs, IISERs, IITs, NITs, etc. In all, there are about 53,000 colleges in India. Every year
nearly 25,000 PhDs are awarded and nearly 200,000 scholars are enrolled for research degrees.
Apart from central, state and private universities, there is a network of institutions offering
PhD degrees in science, technology, agriculture, medicine and other areas under apex bodies
like CSIR, ICSSR, ICAR, ICMR, etc. Thus, there are adequate opportunities for those who are
desirous to do research in various subjects.
Because UGC regulations require PhD admission tests, regular coursework and its clearance,
PG and PhD research in academic institutions now serves as a novel form of training in research
methodology in different areas. The huge research efforts made by the educational institutions,
teaching and research faculty members and PhD scholars add to the research outcome
of the country. Academic research output is yet to be fully recognized as a vital
means of socio-economic development, of improving people's quality of life and
of developing the subject in most areas. However, there are institutions focusing
on problem-oriented and location-specific research which has proved pivotal in adding new
knowledge to the subject matter that is useful to society in one way or another. Hence the need
of the hour is to interlace the research in higher education institutions with the research efforts
carried out by research institutions of high repute in different areas in the country as a means
of development and of enhancing the quality of life of ordinary people.
[Figure: the three functions of a college teacher: Teaching, Research and Community Services/Extension Education]
When a college teacher discharges all these functions, which are complementary and supplementary
to each other, the social relevance of higher education becomes much higher on the one hand,
and the spectrum of higher education widens on the other. For example, a teacher
who teaches a topic from the latest edition of standard textbooks makes the teaching more
effective by linking it with his or her research and community service/extension education
experience. A teacher as a researcher must have the latest theoretical knowledge of
the subject matter and of the location-specific problems perceived through community services/
extension education. Similarly, the extension services provided by the teacher as an extension
specialist become most effective when he or she links theoretical knowledge with the latest research
outcomes in the relevant area. Over the years, the importance of integrated teaching,
research and extension/community services has been growing in India. In agricultural
sciences, medical sciences, etc., integrated teaching, research and extension/community
services are relatively more in practice. With the idea of all colleges and universities adopting
villages in rural areas, cost-effective and quality-of-life-enhancing technologies will get
popularized in rural areas.
The integrated teaching, research and clinical services/extension education in higher educa-
tion institutions is comparable to a growing TREE where ‘T’ stands for effective teaching, ‘R’
for problem-oriented research and ‘EE’ for extension education or the community services that
a college/university renders to the society.
A growing tree has many parts which broadly comprise the root system, the stem and
branches including leaves. The root system makes the tree stand firm on the ground even at
times of natural calamities and other adversities which is comparable to the teaching/education
part of a university/college. A strong education system keeps the college growing over the years.
The stem of a tree gives visibility to it from far-off places which is comparable to research in
a university/college. A university/college doing good research will have visibility from far-off
places through the quality of research-based publications it makes including publications by
faculty members and students. The branches and leaves of a tree give shelter and shade to people,
animals and birds, which is comparable to the community services and extension education
that a university/college gives to the local community. Thus the ‘TREE model’ of higher education is going to be more
relevant in the years to come where research plays a pivotal role in the growth and development
of a university/college. The academic research conducted in universities and colleges along with
the institutional research through the network of institutions in related areas makes the higher
education system more vibrant and sustainable.
In India, there are two broad systems of research in different areas. Firstly, institutional research
is carried out on a regular basis in mandatory areas by the research institutes having specific
research mandates. Secondly, academic research is carried out by PG and PhD students and
faculty members of higher educational institutes.
• The need of the hour is to interlink these parallel sets to avoid duplication and to make
research more comprehensive.
• Integrated teaching, research and extension/community services in universities and colleges
can make the higher education system more vibrant and effective.
• The teaching, research and community services/extension education in higher educational
institutions will have to become more and more integrated.
• The ‘TREE’ model of higher education in India where T stands for Teaching, R stands for
Research and EE stands for Extension Education/Community Services makes the higher
education system more vibrant and effective.
Suggested Readings
www.csir.res.in
www.drdo.org
www.icar.org.in
www.icmr.org.in
www.icssr.org.in
3 Research Methodology, Research Topic
and Research Synopsis
3.1 Introduction
Often the terms research methodology, research methods and research designs are used inter-
changeably. In the true sense, all these are different and each of these has a specific meaning.
Research methodology is a wider concept in relation to research methods and research designs.
In fact, research method is only a part of research methodology and research design is a part of
research method.
There are two broad ways of doing research, namely institutional research and academic
research. Firstly, those who do research as their main task in research institutions, to meet the
institutions' mandatory requirements, have little option to choose their topics for research, as these
are assigned to them in most cases. Secondly, those who do research as part of their academic activity
or in partial fulfilment of a degree programme have a choice in selecting their research topic. Researchers
in academic institutions are often confronted with the task of identifying a research topic,
an appropriate research method and a suitable design as well as adopting suitable quantitative
techniques befitting the research problem. Hence a researcher must be aware of ways to select
a topic, appropriate research methods and designs as well as analytical techniques so that he
or she can choose the most appropriate methods and techniques to best resolve the challenges
coming in the way of conducting the research. If the researcher has adequate knowledge of vari-
ous research methods and analytical techniques, then the task of doing research becomes easy
and interesting. Research methodology covers all those methods to identify a research topic and
conduct the research to draw final conclusions using appropriate techniques including publica-
tion of research outcomes.
Most universities have designed courses on research methodology for PG and PhD stu-
dents. The UGC has also made it mandatory for PhD students to undergo the course work
and its clearance is necessary for registering as a PhD scholar in any department of the uni-
versity or a college or institution. Research being a systematic process, the researcher must
have knowledge about the complete process, from selecting a topic and making the synopsis
to final submission of the thesis as well as publication of research results. In most universities,
the research methodology (RM) course also includes modules on computer applications. The
ethical considerations in research and research linked IPR issues also form part of RM courses
in different universities.
Research methodology is a wider term as it involves aspects like the selection of a
topic, review of literature, setting aims and objectives of research, fixing appropriate
research method and design, determination of sample size and selection of sample, collec-
tion and compilation/computerization of data, data analysis, writing results, discussion,
summary and conclusion, writing a bibliography, writing a thesis, publication of research
papers, etc.
DOI: 10.4324/9781003527183-3
Hence a comprehensive research methodology course generally includes aspects like:
Efforts have been made to cover all these topics briefly in this book.
(i) Planning
• Problem identification/topic finalization.
• Review of latest literature on the topic.
• Finalization of aim, objectives and research design.
• Synopsis drafting, submission and approval.
(ii) Execution
• Preparation of data collection format.
• Pre-test of data collection format.
• Sample selection.
• Pilot study.
• Revision of data collection format.
• Final data collection.
• Computerization of data.
• Validation of data.
3.3.1 Selection of a Problem
The selection of a topic for research by a scholar in any higher educational institution can be
done meaningfully with the following steps:
• Identify the research area of interest out of many areas of the subject.
• Extensive review of related work done in the identified area within the institution.
• Review of work done in the area of interest elsewhere.
• Visit websites of research organizations doing research on the same or similar areas.
• Close observation of field problems faced by people.
• Locate research gaps in the area and make out a tentative topic for research.
• Discussion with faculty members and others already doing research in the department.
• Listing and shortlisting of research problems.
• Finalize tentative aim and objectives of the identified research topic.
• Discuss the tentative topic with all concerned.
• One-to-one dialogue with major adviser/members of the advisory committee.
• Incorporate suggestions from peers and other researchers.
• Finalize tentative topic of research.
Thus, identification of a research problem is a tedious task for most researchers. The first
and foremost thing that a researcher has to do is to develop a platform for his/her research. For
that, one should be very clear about the level of research done in the past by researchers on the
same or similar topics in the same institution and elsewhere. In other words, the researcher must
be clear about the ‘what is’ of the topic. Then the researcher must ponder the question of ‘what
ought to be’ about the topic. This will help the researcher to identify the real research gap. The topic
of research so identified must seek a solution to an emerging problem or bridge a
knowledge gap. The problem must be convertible into specific research objectives which are
clear, precise and self-explanatory. In other words, the research problem must be a scientifically
probable statement expressed either as a question or as a hypothesis.
It is said that a research problem must be ‘FINER’, meaning that it must be feasible, interesting,
novel, ethical and relevant. Here, feasible means cost-effective, manageable, within the
technical competence of the researcher and capable of being completed on time. The research problem
must be interesting not only to the researcher but also to peers and society as a whole. The
novelty of the problem lies in its uniqueness and innovativeness. The ethical aspect of research
refers to the acceptance of the problem and the conduct of research in a manner acceptable to
the scientific community. The research problem is relevant if its output contributes to scientific
advancement and paves the way for further research (Table 3.1).
Table 3.1 The FINER Concept
F Feasible: for the researcher in terms of available time, required resources, self-competence,
accessible guidance and availability of study units.
I Interesting: to self, peers, professionals and society.
N Novel: new, different from what is known or done already.
E Ethical: meeting the ethical requirements like informed consent from subjects, privacy of data,
transparency, avoiding plagiarism, accuracy of data, etc.
R Relevant: to the subject matter, society, target beneficiaries, etc.
3.4 Research Synopsis
The primary step of every academic research is to develop a research synopsis or research
protocol or research outline as the roadmap for research execution. A well-thought, thoroughly
planned and drafted synopsis makes the task of the researcher easy and practically feasible. For
any new research to be a step forward in the field of study, the researcher has to ascertain
the terminal point of past research carried out by others in the same or similar areas, which can
serve as a platform from which to start the research. Hence a comprehensive review of past research
work, right from the same institution to anywhere else, is of very high relevance. A meaningful
synopsis will have the following items with some modifications in the nomenclature of these
items and sequences as decided by the respective institutions.
A synopsis made in haste by the researcher will cause enormous problems during the course
of conducting the research. Hence the time and energy put in by the researcher for preparation of
a synopsis will ease the task in a big way, leading to successful completion in time. Often,
research scholars do not give enough emphasis to finalizing a meaningful synopsis and end
up with a series of problems and hurdles.
The components of the research synopsis will have to be clearly understood by the researcher.
Right from the topic identification to finalization of the draft synopsis, all aspects will have to be
seriously treated and with the finalization of synopsis, the roadmap for research must be clear
and ready for execution by the researcher. Academic research is always a time-bound activity.
A crystal-clear synopsis makes it possible for the researcher to complete the task successfully
in time. A half-hearted way of drafting the synopsis leads to delays in completion and
sometimes leaves the task unaccomplished forever.
The topic selection in most cases will have to be done by the researcher. In some cases, the
research guide may assign or suggest a topic of his or her interest to the scholar. The sequence of steps
in the selection of research topic by the researcher is given in Figure 3.1. The remaining sections
in this chapter are devoted to explaining the components of a research synopsis so that a well
thought out synopsis is prepared by the researcher.
Figure 3.1 Sequence of Steps in Finalizing the Synopsis for Academic Research
3.5 Research Problem Formulation
It is the first technical section of a research synopsis and is also termed the importance of the study
or justification of the topic. Hence the researcher has to fully justify the importance of the topic.
One can start with the origin/definition/emergence of the problem and its gravity at the global,
national and local levels. The consequences and implications of the problem along with the
efforts made to overcome the adverse implications and to promote positive implications can be
stated. It would be useful to compare the current scenario with the ideal situation so as to identify
the research gaps. After listing all possible gaps, the researcher can identify the particular
aspect to be focused on in the proposed study and its importance at different
levels. The justification for the selection of the topic will have to be made clear in this section.
The objectives, research hypothesis and expected outcome flow from this section of the synopsis.
The research hypothesis restates the research aims as the expected concrete final outcome
of research in a concise manner. It generally matches with the alternative statistical hypothesis.
The aim is a concrete statement reflecting the expected final outcome of the research. The objec-
tives reflect the strategies or supporting activities to arrive at a definite conclusion.
3.7 Review of Literature
It is the concise documentation of published research work related to a specific area/topic that
the researcher has gone through and that is to be used at various stages of the research. The review of
literature can be a good source for formulating/finalizing the research problem, its objectives,
methodology, etc. In fact, it is the starting point of any research, as the researcher has to ascertain the
present stage of research in the identified area related to the selected topic and set a platform
for further research. Hence, it is advisable to search for the latest reference materials
from national and international published sources. Additionally, the researcher has to compare
the findings of his/her research with those of other researchers in the discussion section of the
thesis/report/paper, and the review of literature plays a major role here. In short, the review of
literature helps to take stock of past work done, to shortlist related areas for further research,
to identify a topic for research, to formulate the aim, objectives and research hypothesis, to
identify data needs and sources, to become acquainted with the quantitative techniques to be used to address
the stipulated objectives and, above all, to strengthen the content of research reports/theses. Each
review must be documented in a concise and precise manner.
Sources of review of literature include:
The researcher must do the review in a scientific manner by noting the following information in
hard or soft copy to be used later on for the thesis chapters like introduction, review of literature,
methodology, result and discussion and finally the bibliography to be given at the end.
• Author(s) name.
• Title of paper/book/report.
• Name of journal in the case of paper/title of the book.
• Volume and Issue number of the journal/publisher of the book and year of publication.
• Pages covered by the referred paper in the journal.
• A brief write-up covering the location of the study, the main objectives, methodology used
and major outcome and conclusion of the study.
The documentation of the review of literature on these lines from the very beginning will
help the researcher to write the report in the manner in which the reports/thesis/research papers
are required to be written. In all cases, the bibliography/references are required to be given at
the end of the thesis/paper in a specified style and the same is also based on reviews done by the
researcher.
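The record-keeping described above can be sketched as a small data structure. This is only an illustration: the class name, field names and reference format are assumptions, not a style prescribed by this book or by any institution, and the sample entries are placeholders rather than real publications.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    """One reviewed source, holding the items of information listed above."""
    authors: list[str]
    title: str
    source: str         # journal name, or title of the book
    volume_issue: str   # volume and issue number, or publisher for a book
    year: int
    pages: str
    summary: str        # location, objectives, methodology, outcome, conclusion

    def reference(self) -> str:
        """Format the record as one reference line (hypothetical style)."""
        names = ", ".join(self.authors)
        return (f"{names}, {self.title}, {self.source}, "
                f"{self.volume_issue}, {self.year}, pp. {self.pages}.")


def bibliography(records: list[ReviewRecord]) -> list[str]:
    """Return reference lines sorted by first author, then by year."""
    ordered = sorted(records, key=lambda r: (r.authors[0].lower(), r.year))
    return [r.reference() for r in ordered]


# Placeholder entries only; these are not real publications.
records = [
    ReviewRecord(["Author B"], "Title two", "Journal Y", "2(1)", 2020, "10-20",
                 "Placeholder summary of the second study."),
    ReviewRecord(["Author A"], "Title one", "Journal X", "1(2)", 2021, "1-9",
                 "Placeholder summary of the first study."),
]
for line in bibliography(records):
    print(line)
```

Keeping each review in one structured record from the beginning means the same notes can later feed the review chapter, the discussion and the bibliography without retyping.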
Research method is the method to generate data and evidence in support of the research
hypothesis which is based on the expected outcome of research. There are various research
methods and many designs under each method. The researcher will have to identify the best
befitting method and design for the research problem. This book is meant to give expo-
sure to researchers about the major research methods and designs applicable for different
disciplines.
3.8.2 Sampling Plan
Normally every applied research is focused on a target population. The target population may
not always be accessible to the researcher. Hence based on a defined sample population, a suit-
able sampling plan is used to select the sample for detailed investigation. The details of the
sampling plan are outlined in Chapter 10. Based on the nature of the problem and target popula-
tion, a suitable sampling plan will have to be identified and a sample of adequate size will have
to be selected.
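As a minimal sketch of selecting a sample from a defined sampling frame, the snippet below draws a simple random sample without replacement. Simple random sampling is assumed here purely for illustration; the appropriate plan depends on the problem and the population (see Chapter 10). The household labels, frame size and sample size are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement from the sampling frame."""
    units = list(frame)
    if n > len(units):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(units, n)


# Hypothetical sampling frame of 200 numbered households.
frame = [f"HH{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, n=20, seed=42)
print(sorted(sample))
```

Recording the seed alongside the sampling frame lets the selection be repeated and audited later.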
3.8.4 Data Collection
The quality of research data collected through the identified research method and design has a
great role in the accuracy of results obtained in any research. Hence utmost care and caution will
have to be taken to collect true and correct data from the respondents or units selected for the
study. Sampling errors, which arise because the study covers only a sample of the larger
population under study, can be minimized by having an appropriate sampling plan and sample
size for the study. Non-sampling errors, which are due to rectifiable causes, will have to be
minimized by using the most befitting scientific methods for data collection. In experimental
research, there may be standard equipment to record data of the subject of study at the required
point in time. In opinion and perception-based survey methods, the data or opinion may not be
always consistent over time even for the same respondent. In observational studies also the time
reference of data is very vital. Hence the likely non-sampling errors in data will have to be mini-
mized at the time of data collection. In all methods of research, there should be a pre-designed
data format for a pre-fixed time reference for data collection which is guided by the objectives
of the study. The data collection format may be prepared in such a way that all required data to
fulfil the stipulated objectives are collected from each of the respondents and no unwanted data
is collected.
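The link between sample size and sampling error can be illustrated with the standard error of a sample mean, which under simple random sampling from a large population is the standard deviation divided by the square root of the sample size. The sketch below assumes that setting and uses an illustrative standard deviation; it is not a sample size rule from this book.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


sd = 12.0  # assumed standard deviation, for illustration only
errors = {n: standard_error(sd, n) for n in (25, 100, 400)}
for n, se in errors.items():
    print(f"n = {n:3d}  standard error = {se:.2f}")
```

In this illustration, quadrupling the sample size halves the standard error, which is why larger samples reduce sampling error but with diminishing returns.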
3.8.5 Data Analysis
Broad statistical quantitative techniques include classification of data, tabulation of data,
graphical/diagrammatic presentation of data, measures of central tendency (mean, median, mode,
quartiles, deciles, percentiles), measures of dispersion (standard deviation, mean deviation, range),
measures of skewness (asymmetry), measures of kurtosis (flatness or peakedness), correlation and
regression analysis (associations and relationships of variables), analysis of variance (in case of
multiple samples), estimation (population parameters are estimated from sample statistics), testing
of hypotheses (parametric and non-parametric) and so on.
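A few of the descriptive measures named above can be computed with Python's standard `statistics` module, as in the sketch below. The function name `describe` and the data values are illustrative assumptions; the standard deviation shown is the sample standard deviation (divisor n - 1).

```python
import statistics


def describe(values):
    """Return basic measures of central tendency and dispersion."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "mean_deviation": sum(abs(x - mean) for x in values) / len(values),
        "sd": sd,
        "cv_percent": 100 * sd / mean,  # coefficient of variation
    }


# Illustrative observations, not data from any study.
data = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in describe(data).items():
    print(f"{name}: {value:.3f}")
```

The coefficient of variation expresses the standard deviation relative to the mean, so it can be used to compare variability across variables measured in different units.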
The application of appropriate research methods for data collection and quantitative tech-
niques for data analysis is very pertinent to draw meaningful inferences and conclusions
from research. Every subject matter has its own measurable indices, indicators and tech-
niques. In research, one can also make use of such measurements of other related subjects. In
multi-disciplinary research, it is very much required to use such concepts and measurements of
related subjects. The right selection of quantitative analytical techniques befitting the scope and
objectives of the study is quite vital.
Besides, statistics is such a subject that its quantitative techniques can be applied in the
research of many of the disciplines. Statistical techniques for descriptive and analytical objec-
tives of various research studies are available. Every researcher may not be well-versed with
all such techniques. Hence it is always better for the researcher to consult the statistician at
different stages of the study. It is especially appropriate to consult the statistician at the time of
preparation of synopsis for purposes like selection of sampling method, sample size, analytical
techniques for different objectives, etc. The broad type of data analysis using univariate, bivari-
ate and multivariate statistical techniques is summarized in Figure 3.2.
The analysis of data can be perceived under three broad heads:
(i) Analytical techniques related to the main subject matter area of the topic.
(ii) Analytical techniques/quantitative methods from statistical methods.
(iii) Analytical techniques from related subject matter of the topic.
The knowledge of various quantitative techniques helps the researchers to apply the most
appropriate techniques in different situations which in turn helps the researcher to come out
with meaningful inferences. In fact, the quantitative techniques must function in a cafeteria
mode so that there exists a proper interface between research problems and quantitative tech-
niques. Knowledge of various quantitative techniques will help the researcher to use the most
appropriate technique suitably to address the research objectives and also to arrive at concrete
conclusions.
In univariate studies, the measures of central tendency like mean, median, mode, etc. help to
describe the phenomenon under study for a meaningful understanding. In some cases, the vari-
ability in the values can be of great importance. The range, mean deviation, standard deviation,
coefficient of variation, etc. make it possible to assess variability or relative variability in the
• The research methodology course designed for PG and PhD students in universities and
research institutes is to prepare a scholar for systematic conduct of research.
• The broad stages in research include planning, execution, data analysis and reporting.
• Selecting a topic for PG and PhD research using the ‘FINER’ (feasible, interesting, novel,
ethical and relevant) approach can make the research more practical and meaningful and
support its timely completion.
• A systematically prepared synopsis makes the task easier and simpler for the researchers.
• Systematic review of literature can help the researcher from problem formulation/identifica-
tion to thesis writing and research publications.
• The research methodology course is a prerequisite for doing good research and includes con-
cepts, types, designs, selection of topics, review related issues, synopsis preparation, stages
of research, quantitative techniques in research, ethical issues, etc.
• It also includes basic statistical methods and quantitative data analysis including descriptive
and analytical techniques.
• It also includes computer applications for data management and analysis in research.
• Besides, ethical issues in research and publication, bibliography and references writing
methods, inclusion-exclusion criteria in sample-based studies are also part of RM.
• The quality of research data collected through the identified research method and design has
a great role in the accuracy of results obtained in any research.
• The sampling errors can be minimized by having the appropriate sampling plan and sample
size for the study.
• The non-sampling errors will have to be minimized by using the most befitting scientific
methods for data collection.
• The application of appropriate methods for data collection and quantitative techniques for
data analysis is very pertinent to draw meaningful inferences and conclusions from research.
• Every subject matter has its own measurable indices, indicators and techniques. In research,
one can make use of such measurements of other subjects including various statistical
techniques.
Suggested Readings
Bruce, C. S., ‘Supervising literature reviews’, in Quality in postgraduate education, edited by O.
Zuber-Skerritt and Y. Ryan, Kogan Page, London, 1994.
Cooper, H. M., Integrating research: a guide for literature reviews, 2nd ed., Sage Publications, Newbury
Park, CA, 1989.
Ghosh, B. N., Scientific methods and social research, Sterling Publishers Private Ltd., New Delhi, 1984.
Hart, C., Doing a literature review, Sage Publications, London, 1998.
Kerlinger, F. N., and H. B. Lee, Foundations of behavioral research, 4th ed., Harcourt College, Fort Worth,
TX, 2000.
Leedy, P. D., Practical research: planning and design, 6th ed., Merrill, Upper Saddle River, NJ, 1997.
Sharma, B. S., Research methods in social sciences, Sterling Publishers Private Ltd., New Delhi, 1983.
Trivedi, R. N., Research methodology, College Book Depot, Jaipur, 1991.
4 Research Methods and Designs
DOI: 10.4324/9781003527183-4
sociology, education, history, etc. The data collection methods in qualitative research
include direct observations, interviews, focus group discussions (participatory rural
appraisal, PRA; rapid rural appraisal, RRA; and so on), surveys with open-ended questions
and secondary sources like text, images, audio and video recordings, etc. Besides, online
research is carried out in areas like business and marketing. The outcome of
qualitative research is generally a description in text form.
• Mixed methods are the combination of both quantitative and qualitative methods.
Each of these methods has a number of different research designs. Research design is a specific
way of conducting research under different methods or situations as per the stipulated aim and
objectives of the study. A particular research problem can be investigated by applying different
research methods and designs, but for each problem, there can be one most appropriate
method and design. Hence the knowledge of various research methods and designs helps the
researcher to identify the best method and design to be used for the specific problem under a
specific situation. The selection of a research method and design depends on the nature and
size of the population under study, the nature of the research problem, the goal of the study,
researcher’s exposure to different research methods and designs, ethical issues, type of par-
ticipants and availability, access to required material, requirement of monetary and manpower
resources for the study, time available to complete the study and the extent of control that the
researcher can have on extraneous factors and many others. For some of the research problems,
alternative research methods and designs are possible. In any case, there is always a best-fit
research method and design for a specific research problem so as to ensure the authenticity and
precision of research results.
4.3 Experimental/Interventional Method
An experiment is a procedure used to accept or reject a research hypothesis on the basis of evi-
dence given by the outcome/dependent variable due to the intervention or manipulation made
on a particular factor or factors of the study. In experimental research normally there will be a
control so as to objectively compare, assess and quantify the impact or response of any specific
intervention or manipulation on the subjects of study. Both the experimental and control groups
are drawn randomly from the same population. It is generally applied in agriculture, educa-
tion, health and clinical research, biological sciences, etc. where one or more interventions
(as treatment/manipulation/addition/alteration) are made by the researcher on the subjects of
study to assess the corresponding impact on the outcome variable of the study. The interventions
can be a new method of teaching, a new crop variety, a new practice, a new drug/fertilizer, a
new combination of drugs/fertilizers, a new dose of a drug/fertilizers, withdrawal of any drug or
combinations, a new method of administering the inputs or drugs, a new method or process of
treatment or surgery, training, counselling and so on.
Validity of a research design: The authenticity of research results depends on the research
methods and design chosen for a research study. The validity of research results can be evalu-
ated by using internal and external criteria.
Internal validity concerns whether there is a real difference in the outcome variables
(interventional effect) in the experimental group compared to the control group or whether the
difference is due to some extraneous factors. If the experiment is able to establish that the effect
(on the dependent/outcome variable) is due to the independent variable (treatment or intervention), then we
say that the experiment has internal validity. Randomization, either in the formation of experimental
and control groups or in the random application of treatments to the units of study, is
the technique used to ensure internal validity of results. The extraneous factors that can adversely
affect the internal validity of results include proxy factors of treatment (irrigation in fertilizer trials), the
psychological effect of pre-test results on post-test values, change in instruments and measurement
methods between pre- and post-tests, replacement of dropouts in the study, non-homogeneity of
populations (treatment and control groups), non-random selection of subjects for the study and
similar factors.
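Randomization in forming the experimental and control groups can be sketched as below. This is a minimal illustration assuming two groups of (near) equal size formed by a simple shuffle; the function name, subject labels and seed are hypothetical, and specific designs may call for other allocation schemes.

```python
import random


def randomize_groups(subjects, seed=None):
    """Shuffle the subjects and split them into experimental and control groups."""
    rng = random.Random(seed)  # seeded generator so the allocation can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical list of 20 enrolled subjects.
subjects = [f"S{i:02d}" for i in range(1, 21)]
groups = randomize_groups(subjects, seed=7)
print("Experimental:", sorted(groups["experimental"]))
print("Control:     ", sorted(groups["control"]))
```

Because every subject has the same chance of landing in either group, systematic differences between the groups before the intervention become unlikely, which is what supports internal validity.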
External validity refers to the applicability of research findings to an extended population,
or the scope for generalization of specific research results to other subjects in the population. The
factors affecting external validity include the Hawthorne effect (giving positive responses
as a member of a study group), the researcher-respondent relationship, the pre-test knowledge effect,
socio-cultural differences of people, geographical and temporal effects and so on.
There are many research designs applicable under experimental research methods in areas like
agriculture, health sciences, education, psychology, business, etc. These designs are:
• True experiment design (TED) (post-test only design, pre-test post-test design, Solomon
four-group design).
• Randomized control trial (RCT).
• Completely randomized design (CRD).
• Randomized block design (RBD).
• Latin square design (LSD).
• Factorial design (FD).
• Quasi-experimental design (QED) (non-randomized control group design, time series design).
• Pre-experimental design (PED) (one-shot test design, one-shot pre-test post-test design).
More details about these designs are given in Chapter 5 and Chapter 6.
4.4 Non-Interventional Observation Methods
Non-interventional observation research is a major and widely used research method in social sci-
ences, business, commerce, education, health sciences, psychology, agriculture, etc. In medical
sciences, it is applicable in both clinical and non-clinical areas. The research in basic medical sci-
ence subjects (anatomy, physiology, biochemistry, etc.) and community-based research in areas
of community medicine, are mostly based on non-interventional observations only. Researchers
make no external or internal manipulations to subjects of study as part of non-interventional
observation research. It is widely used to assess the status of a situation in a population/system or
to correlate internal factors or independent variables of the population/system. In fact, there are
situations where independent variables cannot be manipulated due to ethical or other reasons and
non-interventional observational study remains the best option in such cases. Natural occurrence of
events or observations made through laboratory or field investigations, without any intervention
or deliberate control of events in any form, is the basis of non-interventional observation
research. The observations/data are collected through a suitably structured data format based on
personal observation, laboratory tests, medical test reports, OPD/IPD records, etc., and are
analyzed to arrive at findings and conclusions. There are specific designs befitting different
situations under the non-interventional observation method of research. There is wide applica-
tion of this method in applied areas. In these areas, the observations/data are generally collected
from a selected sample of the population. Observational studies, while providing the status of a
situation or relationship among variables under study, also pave the way for further research by
developing meaningful hypotheses. The observational studies may include physical observations
(personal or field), clinical observations (physical or laboratory), epidemiological observations
(community-based) and interview-based responses. Such studies are mostly cross-sectional in
nature and can be descriptive or analytical based on the objectives of the research.
The observations in numerical form, either physical or values, are widely used in applied research in
many areas like agriculture, health sciences, economics, sociology, commerce, business, education
and many others. Non-interventional studies are carried out either to describe a situation or to come
out with cause-effect relationships in the system. Generally random samples are drawn from the
population under study to collect information to estimate population parameters or relationships.
Most of the research in para-clinical sciences like anatomy, physiology, biochemistry, micro-
biology, pathology, forensic medicine, community medicine, etc. and also some studies in clini-
cal subjects are based on observations related to patients as recorded in OPD, IPD, laboratories
and operation theatres and even after hospital discharge. The observed changes in physiological
functions, anatomical factors and pathological and biochemical parameters in relation to various
morbidities/disabilities are the basis of research in these areas. Most of the non-interventional
observation research is descriptive or analytical in nature. These designs are used to describe
incidence, prevalence, causes, inter-relationship of various factors associated with diseases or
events. It can also be used to describe the frequency of occurrence of a phenomenon.
The research designs under non-interventional observation research methods are:
Epidemiological research forms a major area of research in health sciences. Epidemiology is the
study of distribution (with respect to people, place and time) and determinants of diseases. Both
interventional and non-interventional research designs are used in epidemiological studies. The
interventional research designs have been listed in the previous section. The non-interventional
observational research designs used in the epidemiological category of studies are:
• Case-control research.
• Cohort studies: A cohort is a group of homogenous individuals with respect to a phenomenon
which can be a case or exposure to a risk factor. Cohort studies include:
Prospective cohort
Retrospective cohort
Ambispective cohort studies
• Field trials.
• Community trials.
• Uncontrolled natural trials.
• Natural experiments.
• Before and after comparison trials.
4.5.1 Survey Method
A survey in the true sense is the process of collecting, analyzing and interpreting data from
many individuals/respondents with the aim to determine insights, perceptions and opinions
from a selected group of people about a specific item/event/product, etc. A survey goes much
deeper than a questionnaire and can involve more than one form of data collection. A survey is
a combination of questions, processes and methodologies that analyze data collected from the
participants. Most of the surveys involve questionnaires. The ultimate purpose of a survey is
to find out more about the opinions, insights and perceptions of a certain group of people on a
specified aspect. This is done for many reasons. For example, business surveys are used to find
out more about consumer behaviour. A single questionnaire is only one small part of a survey.
Otherwise, the approach for both survey and interview methods is more or less the same. Survey
research has different modes for conducting it i.e. e-mails, online, telephonic, face-to-face, etc.
The survey research has two broad designs:
4.5.2 Interview Method
Survey research and interview research are often treated as synonymous. The
basic difference is that interview research involves direct or indirect personal contact between
the researchers (or their representatives) and the respondents, whereas survey-based
research does not require it. A questionnaire is generally used for interviews. The purpose of the
questionnaire is to gather data from a target audience on specific items in a systematic manner.
It will include open-ended questions, closed-ended questions, multiple-choice questions, etc.
In interview research, the investigator will have to move directly or indirectly from one sub-
ject of study to the other to get the required data through direct or indirect contacts whereas in
survey-based research the required items of information are listed in the form of a questionnaire
and can be collected directly or indirectly from the respondents. Both methods are applicable
for quantitative and qualitative research. A variety of studies can be conducted using survey
and interview methods. These studies are possible for census/population-based studies, random
sample-based studies and non-random sample-based studies.
The interview-based research can be carried out under different modes or designs. These include:
• Personal interview.
• Telephonic interview.
• Email or web page interview.
Researchers can collect data from the respondents using various online research techniques.
They are often called internet research or web-based research methods. Many of these research
methods are already being used in one way or the other but are being revived for the online
mediums. The latest in this type of online research method is social media research as it offers
extended levels of complexity and thus, new avenues for research are created. Both quantita-
tive and qualitative research can be carried out using online options. Online research methods
broadly include the following:
Online marketing and business research covers aspects like customer satisfaction, new product
response tests, brand loyalty, employee satisfaction, etc. Under situations like the Covid-19
period, online research was widely used.
More details on survey, interview and online research designs are discussed in Chapter 8.
Methodological Studies: Studies aimed to find out subject matter-specific approaches, meas-
urements, techniques, etc.
Meta-Analysis: Studies by pooling the results of many similar studies conducted at many loca-
tions for wider applications.
Evaluation Studies: Studies conducted to evaluate already implemented programmes and
policies.
Operational Research: It is the application of scientific methods of investigations to study the
complex human organization and services. It relates to studies to generate macro level evi-
dence based on micro level research results.
Knowledge, Attitude and Practice (KAP studies): KAP studies are generally applied to
assess the status of any phenomenon having relevance to the people. The level of knowledge
and the attitude of the people are determining factors to the extent of practice. Such studies
are carried out using well-formulated questionnaires to quantify the level of knowledge, to
assess the attitude of the people and also the extent of practice followed by the subjects of
the study phenomenon. Usually, Likert scale-based scores are developed for each respond-
ent. Then it is possible to examine the association of knowledge and attitude with respect to
practice.
Implementation Research: It is defined as research on knowledge linked to medical practice
for better health of a community. It is a participatory form of research in medical sciences
where all stakeholders of health sciences join hands with each other at different stages from
planning, implementation and evaluation and action plan for the benefit of the community.
Translational Research: Translational research, also known as ‘Bench to Bedside Research’,
is the process by which the results of research done in the laboratory are directly used to
develop new ways to treat patients. It advocates the need to interface the clinical sciences and
basic sciences, and has emerged as a new interdisciplinary branch of medical sciences that
integrates bench side, bedside and community research. It aims to translate the findings of
basic fundamental research into medical practice and meaningful health outcomes.
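For the KAP studies described in this list, the scoring step can be sketched as follows. Likert item responses are summed into a knowledge, attitude and practice score for each respondent, and the association of knowledge and attitude with practice is then examined. The data are hypothetical, and the use of the Pearson correlation coefficient is an assumption of this example; a rank-based coefficient may suit ordinal scores better.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical responses on 5-point Likert items (three items per domain)
respondents = [
    {"knowledge": [4, 5, 4], "attitude": [3, 4, 4], "practice": [4, 4, 5]},
    {"knowledge": [2, 1, 2], "attitude": [2, 3, 2], "practice": [1, 2, 2]},
    {"knowledge": [3, 3, 4], "attitude": [4, 4, 3], "practice": [3, 3, 3]},
    {"knowledge": [5, 5, 5], "attitude": [5, 4, 5], "practice": [5, 4, 4]},
    {"knowledge": [1, 2, 2], "attitude": [3, 2, 2], "practice": [2, 2, 1]},
]

# One summed score per respondent and domain
knowledge = [sum(r["knowledge"]) for r in respondents]
attitude = [sum(r["attitude"]) for r in respondents]
practice = [sum(r["practice"]) for r in respondents]

r_kp = pearson_r(knowledge, practice)
r_ap = pearson_r(attitude, practice)
print(f"knowledge vs practice: r = {r_kp:.3f}")
print(f"attitude vs practice:  r = {r_ap:.3f}")
```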
Despite contextual differences between research methods and research designs, these terms
are interchangeably used by the researchers. What matters is the right interface between the
research problem and research method so as to scientifically generalize the evidence given by
the sample to the population under study. The broad research methods and different research
designs under each method are summarized in Table 4.1.
Table 4.1 Broad Quantitative Research Methods and Designs Under Each Method
Each of these methods is particularly suited for obtaining a specific type of data.
As shown in Table 4.2, the quantitative and qualitative research methods differ primarily in
their analytical objectives, the types of questions they pose, the types of data collection instru-
ments they use, the forms of data they produce and the degree of flexibility built into study
design.
One advantage of qualitative methods in exploratory research is that the use of open-ended
questions and probing give participants the opportunity to respond in their own words, rather
than forcing them to choose from given fixed responses/options. Open-ended questions have the
ability to evoke responses that are:
Table 4.2 Comparison of Quantitative and Qualitative Research Approaches
Another advantage of qualitative methods is that they allow the researcher the flexibility to
probe initial participant responses, that is, to ask why or how. The researcher must listen carefully
to what participants say, engage with them according to their individual personalities and
styles, and use ‘probes’ to encourage them to elaborate on their answers. The following features
of qualitative research are noteworthy.
4.8 Mixed Methods
There is no water-tight demarcation for quantitative and qualitative research methods and
designs. In some cases, both quantitative and qualitative methods are used by the researchers to
cover all the objectives of the study. Besides some of the designs in survey and interview meth-
ods are used in both quantitative and qualitative research approaches.
4.10 Multi-Disciplinary Research
Most of the research in applied areas is multi-disciplinary in nature. The research in agriculture
sciences, medical sciences, business, etc. is multi-disciplinary in nature. While the research
aims at cause-and-effect relationships, one has to go for support from other disciplines. Most of
the institutional-level studies are carried out by a multi-disciplinary team of researchers.
The factors of morbidity patterns in medical sciences, factors for low productivity in agri-
culture, or factors for the volume of sale of a product, etc. can be better assessed through a
multi-disciplinary research approach.
For most of the research problems, it may be possible to categorize under a specific method
followed by the research design under that method. But in some research problems, a combina-
tion of different research methods and designs may be most appropriate. In some other cases,
more than one research method or design may be required.
The research designs under different research methods, study purpose of each of these
designs and features are given in Table 4.3.
More details about experimental research designs generally applied in health sciences are
given in Chapter 5 and those in agricultural sciences are given in Chapter 6. The details of
observational research designs are given in Chapter 7. Different types of surveys, interviews
and online research designs are given in Chapter 8. The details of qualitative research designs
are given in Chapter 9.
Table 4.3 Research Methods, Research Designs, Study Purposes, Situations and Features at a Glance
• Research methods are the ways of conducting research for generating data, observations and
evidence as per stipulated goals and objectives and in support of results/findings and conclu-
sions/recommendations after the study is completed.
• The broad research methods include experimental (interventional), observational
(non-interventional), survey and interview and situation-based research methods.
• Experimental research methods generally used in health and agricultural sciences include
RCT, CRD, RBD, LSD, QED and PED.
• Observational methods are generally non-interventional methods used in health sciences
(clinical and non-clinical), psychology, agriculture, economics, sociology, etc.
• The observational studies in clinical and para-clinical areas include descriptive, case reports,
case series, cross-sectional, longitudinal and analytical studies.
• The observational epidemiological studies include case-control, cohort (prospective and ret-
rospective), field trials, community trials, uncontrolled natural trials, natural experiments, etc.
• The survey, interview and online research methods include cross-sectional and longitudinal
surveys, questionnaires, schedules, Google forms, telephonic interviews and online designs
like focus groups, text analysis, social network analysis etc.
• Other situation-specific research methods include methodological, meta-analysis, evaluation
studies, operational research, KAP studies, implementation research, translational
research, etc.
• The qualitative research methods include phenomenological, ethnographic, grounded theory,
historical models, narrative research, case study methods, etc.
5 Experimental (Interventional)
Research Designs
5.1 Introduction
The main features of experimental research include presence of the experimental group(s) for
one or more interventions, presence of a control group (with placebo, standard intervention or
no intervention), random formation of both experimental groups and control group from the
same population and collection of observations from all the groups. It has got wide range of
applications in applied research in medical and health sciences, education, psychology, etc. In
order to fit into application in different areas of research and also under different experimental
conditions there are many designs under experimental methods, and these are discussed in this
chapter.
DOI: 10.4324/9781003527183-5
In clinical trials, there can be three sources of bias leading to erroneous inference: (i) if the
patients are aware that they are subjects of a new intervention, they may tend to
report improvement; (ii) the observer can have a similar attitude; and (iii) the statistician can also
tend to report positively. To overcome these biases, a ‘blinding procedure’ is
adopted in experimental research. In ‘single-blind trials’ patients are not aware of the treatment;
in ‘double-blind trials’ the patients and researcher/study team are not aware of the treatments
given; and in ‘triple-blind trials’ the patients, the researcher and the data analyst are all blind to
the identity of patients receiving the specific treatment. The knowledge of receiving/giving
a new treatment can influence the respondent/researcher to distort the factual situation while
recording information on dependent/outcome variables. This effect is called the placebo effect.
Hence, a placebo intervention is given to overcome the placebo effect in research experiments.
In true experimental research/RCT, the researcher will have control over extraneous factors
so as to assess with confidence the effect of treatment/intervention on outcome (dependent vari-
able). The main features of the true experimental research design include:
The various types of other true experiments depending upon the situation are:
• Experimental and Control Groups are formed randomly from the homogenous group of sub-
jects of study. The treatment or intervention is applied to the experimental group.
• Example: If the effect of special coaching for a class is the problem under study, then two groups
of students are formed randomly from among the students of the class. Those students who
attend special coaching classes form the experimental group and those who do not attend
the special coaching classes form the control group. The same test is held for both the groups
and the marks obtained form the research data for further analysis.
• No pre-test is done before intervention.
• Post-test observations are recorded for both experimental and control groups and tested for
significance.
• The difference in mean values of experimental and control groups is tested using appropriate
statistical tests. The layout of the post-test-only control design is shown in Figure 5.1.
• The intended number of subjects is randomly grouped into experimental and control groups.
• All units of both the groups are assessed before intervention is made on the experimental
group which helps to confirm post-treatment effects with more confidence.
• Treatments (special coaching for the previous example) applied to subjects in the experimen-
tal group only.
• After giving special coaching to the treatment group both the groups are assessed for post-test
performance.
• The post-test mean values are calculated and tested for significance.
• Examples: To assess the effect of a new teaching method, effect of counselling, effect of a
drug to control an ailment. The layout of pre-test post-test design is given in Figure 5.2.
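The steps of the pre-test post-test design above can be sketched in Python. This example compares the mean gain (post-test minus pre-test) of the two groups; using gain scores is one common way to summarize such data and is an assumption of this sketch, as are the marks, which are hypothetical.

```python
from statistics import mean


def gain_scores(pre, post):
    """Per-subject change from pre-test to post-test."""
    return [after - before for before, after in zip(pre, post)]


# Hypothetical marks for five students per group
exp_pre, exp_post = [40, 45, 50, 38, 47], [55, 60, 62, 50, 58]
ctl_pre, ctl_post = [42, 44, 49, 39, 46], [45, 46, 52, 41, 47]

exp_gain = gain_scores(exp_pre, exp_post)
ctl_gain = gain_scores(ctl_pre, ctl_post)

# Estimated effect of coaching: extra gain of the experimental group
effect = mean(exp_gain) - mean(ctl_gain)
print(f"Mean gain (experimental): {mean(exp_gain):.1f}")
print(f"Mean gain (control):      {mean(ctl_gain):.1f}")
print(f"Estimated treatment effect: {effect:.1f}")
```

The pre-test values also allow a check that the two randomly formed groups were comparable before the intervention; the difference in gains would still need a significance test.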
Due to the difficulty in the formation of four random groups of more or less homogenous units,
this design is not frequently used by researchers. The layout of this design is shown in Figure 5.3.
• The same type of assessment measures are taken before and after the treatment or exposure to a
situation so as to assess the changes, if any, attributable to the treatment or condition.
• It is similar to quasi-experiments where instead of control and treatment group observations,
before and after observations of the same experimental group are taken.
• Example: Effect of a new teaching method upon a group of children or impact of counselling.
• The layout of the design is given in Figure 5.7.
• Experimental research designs are applied in areas like agriculture, health, education, psy-
chology and similar areas to assess the impact of one or more interventions (as treatment/
manipulation/addition/alteration) on the outcome variable of the study.
• The interventions can be a new practice, a new drug, a new combination of drugs, a new dose
of a drug, or withdrawal of any drug or combinations, a new method of administering the
inputs or drugs, a new method or process of treatment or surgery, a training, counselling and
so on.
• Randomly formed experimental group(s) with intervention and control group without inter-
vention from the same population are necessary.
• To avoid any form of bias by the researcher or subjects of study, a phased blinding
technique can be adopted.
• Different experimental designs are available for research in medical sciences, education,
psychology, etc., and in other areas as per situational requirements.
• In true experimental research, the researcher will have full control over extraneous factors to
assess with confidence the observed effect of treatment/intervention on outcome (dependent
variable).
• Randomized Control Trials (RCT) have wide application in many areas, especially in
clinical research, epidemiology, agriculture, etc.
• Post-Test Only Control Design: Experimental and control groups are randomly formed,
no pre-test is done before intervention, treatment/intervention is applied to the experimental
group and post-test observations are recorded for both groups for analysis.
• Pre-Test Post-Test Design: The intended number of subjects is randomly grouped as experi-
mental and control groups, pre-treatment observations are recorded for each subject in both
groups, treatments are applied to subjects in the experimental group and after-treatment
observations are recorded from subjects in both groups to assess the impact of treatment.
• Solomon Four-Group Design: Two experimental (E1 and E2) and two control (C1 and C2)
groups are formed and subjects are randomly assigned to these four groups followed by treat-
ments to all subjects of E1 and E2, post-test observations are taken from all subjects of all the
four groups to assess the effect of treatment.
• Quasi-Experimental Research (QER)/Non-Randomized Control Trials: It differs from
true experimental research with the absence of either randomization or a control group; even
if a control group is present, random allocation of subjects may not be possible for obvious
reasons.
• Time Series Quasi-Experiment Design: A design in which the treatment effect is observed
over a long period of time.
• Pre-Experimental Research Design: It is like a one-shot case study design. It has only one
experimental group on which treatment is applied and post-test observations are taken to
assess impact.
• One Group Pre-test Post-test Design: It has only one experimental group of study sub-
jects. Observations are taken before and after treatment. Normally the subjects are randomly
selected from the population to form the group.
6 Design of Experiments for Field
Research
(LSD). It is calculated by using the standard error of the difference, $SE_d = \sqrt{\frac{2V_E}{r}}$, as $CD = SE_d \times t_{[DF_e]}$.
If two treatment means differ by ≥ CD, the difference is statistically significant.
Testing Procedure:
• As usual, the null hypothesis (H0) and alternative hypothesis (H1) are formed for all the
sources of variation.
• Perform the test using the ‘F’ statistic/analysis of variance (ANOVA).
• If ‘F’ is significant, apply a post-hoc test for each source of variation separately to check
the difference between any two means of a source.
Based on the number of treatments and the precision required, one may select the experimental design.
One treatment case: If there is only one treatment replicated r times, the applied test statistic
is the t-test. In this case, we apply the treatment to a number of homogenous experimental units
and record observations on these units. For the five-replication layout, see Figure 6.1.
R1 R2 R3 R4 R5
Figure 6.1 Layout for One Treatment and Five Replication Case
In this case, all the five units are homogenous. No randomization is required as the same
treatment is applied to the units.
In this experiment, we can test the validity of the mean by applying the t-test $t_{[n-1]} = \frac{\bar{X}}{SE_m}$;
significance of this t suggests that the mean is valid and can be used for prediction. We can also use
this SE to compare this mean with the population mean µ: $t_{[n-1]} = \frac{\bar{X} - \mu}{SE_m}$, where
$SE_m = \sqrt{\frac{MS}{n}}$; $MS = \frac{SS}{n-1}$; $SS = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}$.
Significance of this t indicates that the observed
Figure 6.2 Layout for Two Treatment and Five Replication Case
Here we apply an independent sample t-test $t_{[n_1+n_2-2]} = \frac{\bar{X}_1 - \bar{X}_2}{SE_d}$. The value of $SE_d$ depends on
the homogeneity of variance of $X_1$ and $X_2$. Calculate the mean square for both the groups as
suggested and test the homogeneity of MS by the F-test, that is, $F_{[DF_H, DF_L]} = \frac{MS_H}{MS_L}$, where $DF_H$ and
$DF_L$ are the degrees of freedom of the higher and lower mean squares ($MS_H$ and $MS_L$) of mean 1 or 2.
If F is non-significant, MS is homogeneous; else it is heterogeneous. In the case of homogeneous
MS, the pooled mean square is $MS_p = \frac{SS_1 + SS_2}{DF_1 + DF_2}$ and $SE_d = \sqrt{MS_p\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$;
the calculated t is compared with the table t at $DF_1 + DF_2$. In the case of heterogeneous MS,
$SE_d = \sqrt{\frac{MS_1}{n_1} + \frac{MS_2}{n_2}}$ and the calculated t is compared with the average of t at $DF_1$ and t at $DF_2$. Significance of this
can also be tested by the F-test. Significance of t and F suggests a significant difference between
the two means.
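The two-treatment comparison can be sketched in Python as below. The sketch assumes the standard pooled and separate-variance forms of the standard error of the difference; the function names and the data are hypothetical. It returns the variance ratio F and both t statistics, which are then compared with table values at the stated degrees of freedom.

```python
from math import sqrt


def sum_sq(x):
    """Corrected sum of squares: sum(X^2) - (sum X)^2 / n."""
    return sum(v * v for v in x) - sum(x) ** 2 / len(x)


def two_sample_t(x1, x2):
    """Variance-ratio F and t statistics for two independent samples."""
    n1, n2 = len(x1), len(x2)
    df1, df2 = n1 - 1, n2 - 1
    ss1, ss2 = sum_sq(x1), sum_sq(x2)
    ms1, ms2 = ss1 / df1, ss2 / df2
    diff = sum(x1) / n1 - sum(x2) / n2

    # Homogeneous case: pool the two sums of squares
    ms_pooled = (ss1 + ss2) / (df1 + df2)
    se_pooled = sqrt(ms_pooled * (1 / n1 + 1 / n2))
    # Heterogeneous case: keep the two mean squares separate
    se_separate = sqrt(ms1 / n1 + ms2 / n2)

    return {
        "F": max(ms1, ms2) / min(ms1, ms2),   # higher MS over lower MS
        "t_pooled": diff / se_pooled,
        "df_pooled": df1 + df2,
        "t_separate": diff / se_separate,
    }


# Hypothetical observations from five units per treatment
result = two_sample_t([12, 14, 11, 15, 13], [9, 10, 8, 11, 12])
print(result)
```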
Paired t-test: The treatment is applied to n homogenous experimental units. Observations are
recorded prior to treatment and after the treatment. In such cases, the treatment can be age, a
before-and-after procedure, training, etc. The layout for paired observations is shown in Figure 6.3.
R1 R2 R3 R4 R5
Observations are recorded on each unit of study at the T1 and T2 stages and the paired t-test is applied.
The difference between each pair is calculated as $D_i = X_{i1} - X_{i0}$ and the t-test is applied on
this difference $D_i$. Calculate the mean and SE for the $D_i$ values as suggested above and apply the t-test, i.e.
$t_{[n-1]} = \frac{\bar{D}}{SE_m}$. Significance of this t suggests that the difference is consistent; else it varies from unit
to unit (pair to pair).
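A minimal Python sketch of the paired t-test follows. It computes the differences, their mean and standard error, and the t statistic with n − 1 degrees of freedom. The function name and the before/after values are hypothetical.

```python
from math import sqrt


def paired_t(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    d = [x1 - x0 for x0, x1 in zip(before, after)]   # D_i = X_i1 - X_i0
    n = len(d)
    d_bar = sum(d) / n
    ss = sum(v * v for v in d) - sum(d) ** 2 / n     # corrected sum of squares
    ms = ss / (n - 1)
    se_m = sqrt(ms / n)                              # standard error of mean difference
    return d_bar / se_m, n - 1


# Hypothetical scores of five units before and after a training
t_value, df = paired_t([60, 62, 58, 65, 61], [64, 65, 63, 68, 66])
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```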
Three or more treatments: If there are more than two means to be compared then one has to
apply the F-test. Each of the t treatments is applied on r uniform experimental units, i.e. each
treatment is replicated r times. The total number of experimental units is ‘rt’. If all the rt units
are homogenous, then we can apply CRD, where all the rt combinations (as each of the t treatments is replicated
r times) are randomly applied on all the rt units. If the units are not uniform, divide them into r groups
known as blocks or replications, where all the treatments appear in each block. Again, the total
number of units is rt. This design is known as RBD. If the units are still not uniform, divide the
rt (r is replication and t is treatments) units into rc (r is rows and c is columns) groups. Here
each treatment appears only once in a row and column. This experimental design is known as
row-column design or lattice design. Accordingly, these three are the basic designs. Details of
each design are explained:
A D B C E
B A C E D
E C D A B
Figure 6.4 Layout for CRD with Five Treatments and Three Replications
Layout of experiment: Say there are five treatments (A, B, C, D and E) replicated three times. The
layout is given in Figure 6.4. In this layout, the same treatment may appear in two adjoining units.
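A CRD layout like the one in Figure 6.4 can be generated by shuffling all rt treatment labels together, as in this minimal sketch. The function name `crd_layout` and the seed are assumptions made for illustration.

```python
import random


def crd_layout(treatments, replications, seed=None):
    """Randomly allot each treatment to `replications` units, with no restriction."""
    rng = random.Random(seed)
    units = [t for t in treatments for _ in range(replications)]
    rng.shuffle(units)   # unrestricted randomization over all rt units
    return units


layout = crd_layout("ABCDE", 3, seed=1)
for row in range(3):
    # Print the 15 units as three rows of five for display only
    print(" ".join(layout[row * 5:(row + 1) * 5]))
```

Since the shuffle is unrestricted, the same treatment can fall on adjoining units, as the text notes.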
Preparation of data sheet: Data recorded on these treatments are arranged as shown in
Table 6.1.
Table 6.1 Data Sheet for Replicated Treatments
Treatment R1 R2 Rr Total
• $DF_T = t - 1$
• $DF_E = TDF - DF_T$
• Mean for the ith treatment is $\bar{X}_i = \frac{\sum_{j=1}^{r_i} X_{ij}}{r_i}$
• Correction factor (CF): $CF = \frac{(X_{11} + X_{12} + \dots + X_{tr})^2}{n}$; $n = r_1 + r_2 + \dots + r_t$
• Or for an equal number of replications, the formula is $CF = \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}\right)^2}{tr}$ or $CF = \frac{GT^2}{tr}$
• Total sum of square (TSS) = $\sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}^2 - CF$
• Treatment sum of square (SST) = $\sum_{i=1}^{t} \frac{\left(\sum_{j=1}^{r_i} X_{ij}\right)^2}{r_i} - CF$ for unequal number of replications
• OR SST = $\frac{\sum_{i=1}^{t}\left(\sum_{j=1}^{r} X_{ij}\right)^2}{r} - CF$ or $\frac{T_1^2 + T_2^2 + \dots + T_k^2}{r} - CF$ for equal number of replications
• Error sum of square (SSE) = TSS − SST
• Treatment mean square (MST) = $\frac{SS_T}{DF_T}$
• Error mean square (MSE) = $\frac{SS_E}{DF_E}$
• F calculated: $F_{[DF_T, DF_E]} = \frac{MS_T}{MS_E}$
• Calculate P with the help of Excel. Type “=FDIST(F,DFT,DFE)”
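The computational steps listed above can be sketched in Python as follows. The function `crd_anova` and the data are hypothetical; total degrees of freedom are taken as n − 1, and the P value is left to a table or to software.

```python
def crd_anova(data):
    """One-way ANOVA for a CRD.

    data maps each treatment to its list of observations;
    the number of replications may differ between treatments.
    """
    all_obs = [x for obs in data.values() for x in obs]
    n, t = len(all_obs), len(data)

    cf = sum(all_obs) ** 2 / n                                     # correction factor
    tss = sum(x * x for x in all_obs) - cf                         # total SS
    sst = sum(sum(obs) ** 2 / len(obs) for obs in data.values()) - cf
    sse = tss - sst                                                # error SS

    df_t = t - 1
    df_e = (n - 1) - df_t                                          # TDF - DFT
    mst, mse = sst / df_t, sse / df_e
    return {"DFT": df_t, "DFE": df_e, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse}


# Hypothetical data: three treatments, three replications each
anova = crd_anova({"A": [10, 12, 11], "B": [14, 15, 16], "C": [20, 19, 21]})
print(anova)
```

If SciPy is installed, `scipy.stats.f.sf(F, DFT, DFE)` returns the upper-tail probability that Excel's FDIST reports.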
Put up all the values in the ANOVA table. The format is given in Table 6.2.
If P < 0.05, H0 is rejected and H1 is accepted, i.e. the treatment effect is significant and
the treatments fall into at least two groups. To identify the difference between any two treatment
means, apply the post-hoc test. The most common post-hoc test is the LSD test or CD.
The common critical difference (CD) for different levels of significance is
$CD_{5\%} = SE_d \times t_{DF_E,\,5\%}$ and $CD_{1\%} = SE_d \times t_{DF_E,\,1\%}$.
Where,
$SE_d = \sqrt{\frac{MS_E}{r_i} + \frac{MS_E}{r_j}}$ for unequal number of replications
$SE_d = \sqrt{\frac{2MS_E}{r}}$ for equal number of replications
$t_{DF_E,\,5\%}$ and $t_{DF_E,\,1\%}$ are the table values of t at error degrees of freedom and α = 0.05 and 0.01,
respectively.
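As a small illustration of the CD formulas, the sketch below computes SEd and CD from an error mean square, the replication numbers and a t value read from a table. The function name and all numbers are hypothetical; 2.447 is the two-tailed 5% table value of t for 6 degrees of freedom.

```python
import math


def critical_difference(ms_e, t_table, r_i, r_j=None):
    """CD = SEd x t, with SEd = sqrt(MSE/ri + MSE/rj).

    When r_j is omitted the two treatments are taken to have the same
    number of replications, which reduces SEd to sqrt(2 * MSE / r).
    """
    if r_j is None:
        r_j = r_i
    se_d = math.sqrt(ms_e / r_i + ms_e / r_j)
    return se_d * t_table


# Hypothetical values: MSE = 1.0 with 6 error DF, three replications each.
cd_5 = critical_difference(ms_e=1.0, t_table=2.447, r_i=3)

# Hypothetical treatment means compared pairwise against the CD.
means = {"A": 11.0, "B": 15.0, "C": 9.0}
labels = sorted(means)
for i, first in enumerate(labels):
    for second in labels[i + 1:]:
        diff = abs(means[first] - means[second])
        verdict = "significant" if diff >= cd_5 else "not significant"
        print(f"{first} vs {second}: difference {diff:.2f}, {verdict}")
```

The t value is passed in rather than computed so that the sketch follows the table-lookup procedure described in the text.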
R1 A D B C E
R2 B A C E D
R3 E C D A B
Figure 6.5 Layout for RBD With Five Treatments and Three Replication Case
Preparation of data sheet: Data recorded on these treatments are arranged as shown in
Table 6.3.
Table 6.3 Data Sheet for RBD
Treatment R1 R2 Rr Total
• TDF = rt −1
• DFR = r −1
• DFT = t −1
• DFE = TDF − DFT − DFR or (t − 1)(r − 1)
Put up all the values in the ANOVA table as shown in Table 6.4.
If PR and/or PT < 0.05, H0 is rejected and H1 is accepted for replications and/or treatments,
i.e. the replication and/or treatment effect(s) is/are significant. To identify the difference between
any two treatment means, apply the post-hoc test. The most common post-hoc test is the LSD test.
LSD or $CD_{5\%} = SE_d \times t_{DF_E,\,5\%}$ and $CD_{1\%} = SE_d \times t_{DF_E,\,1\%}$
Where, $SE_d = \sqrt{\frac{2MS_E}{r}}$
Generally, CD 5% is used. If a stricter test is required, CD at 1% or an even lower level of
significance can be used. Two treatment means whose difference is greater than or equal to the
CD differ significantly. On the basis of significance, letters may be assigned to the treatment
means for easy understanding: two treatment means with a non-significant difference are
assigned the same letter, otherwise different letters.
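The RBD analysis can be sketched in the same way. The function name, data layout and observations below are hypothetical. The replication and treatment sums of squares are written in the same form as the factorial formulas later in this chapter (squared totals divided by the number of observations in each total, minus CF), which is an assumption here because the RBD formulas themselves are not reproduced above.

```python
def rbd_anova(table):
    """ANOVA quantities for an RBD.

    table maps each treatment to its observations, ordered by replication
    (block), so every list must have the same length r.
    """
    treatments = list(table)
    t = len(treatments)
    r = len(table[treatments[0]])
    all_obs = [x for row in table.values() for x in row]
    cf = sum(all_obs) ** 2 / (r * t)
    tss = sum(x * x for x in all_obs) - cf
    # Treatment totals are each based on r observations
    sst = sum(sum(row) ** 2 for row in table.values()) / r - cf
    # Replication totals are each based on t observations
    rep_totals = [sum(table[k][j] for k in treatments) for j in range(r)]
    ssr = sum(total ** 2 for total in rep_totals) / t - cf
    sse = tss - sst - ssr
    df_r, df_t = r - 1, t - 1
    df_e = (r * t - 1) - df_t - df_r        # equals (t - 1)(r - 1)
    ms_e = sse / df_e
    return {"SSR": ssr, "SST": sst, "SSE": sse,
            "DFR": df_r, "DFT": df_t, "DFE": df_e,
            "FR": (ssr / df_r) / ms_e, "FT": (sst / df_t) / ms_e}


# Hypothetical data: three treatments in three blocks (R1, R2, R3).
data = {"A": [10, 12, 14], "B": [14, 15, 16], "C": [9, 8, 13]}
result = rbd_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

FR and FT, each with DFE as the second degrees of freedom, are the values whose P is then looked up.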
Some important points of RBD: The RBD design can be used for any experiment where
experimental units are divided into r homogenous blocks. Some of the important points about
RBD are:
Example:
(i) Effect of different drugs in controlling some illnesses of patients. Different wards may be
treated as replication.
(ii) Impact of different baby foods for balanced growth of underweight children. Group of chil-
dren from each selected location forms a replication.
• Two hypotheses tested in ANOVA i.e. one for replication and another for treatment:
• H0: No significant difference in the effect of drugs/baby food and no significant difference
between replications.
• H1: Significant effect of drugs/baby food and significant difference between replications.
Column/Row C1 C2 C3 C4 C5
R1 A D B C E
R2 B A C E D
R3 E C D A B
R4 D E A B C
R5 C B E D A
Treatment R1 R2 Ri Rr Total
Table 6.6 Data Sheet for LSD Rows and Column Effects
Columns
• TDF = rc −1
• DFR = t −1
• DFC = t −1
• DFT = t −1
• DFE = TDF − DFR − DFC − DFT or (t − 1)(t − 2)
• Correction factor (CF) = $\frac{\left(\sum_{i=1}^{c}\sum_{j=1}^{r} X_{ij}\right)^2}{cr}$ or $\frac{GT^2}{cr}$ using Table 6.6
• Total sum of square (TSS) = $\sum_{i=1}^{c}\sum_{j=1}^{r} X_{ij}^2 - CF$ or $X_{11}^2 + X_{12}^2 + \dots + X_{55}^2 - CF$ using Table 6.6
• Row sum of square (SSR) = $\frac{\sum_{j=1}^{r}\left(\sum_{i=1}^{c} X_{ij}\right)^2}{t} - CF$ or $\frac{R_1^2 + R_2^2 + \dots + R_r^2}{t} - CF$ using Table 6.6
• Column sum of square (SSC) = $\frac{\sum_{i=1}^{c}\left(\sum_{j=1}^{r} X_{ij}\right)^2}{t} - CF$ or $\frac{C_1^2 + C_2^2 + \dots + C_c^2}{t} - CF$ using Table 6.6
• Treatment sum of square (SST) = $\frac{\sum_{i=1}^{t}\left(\sum_{j=1}^{r} X_{ij}\right)^2}{r} - CF$ or $\frac{T_1^2 + T_2^2 + \dots + T_t^2}{t} - CF$ using Table 6.5
• In present case, r = c = t = 5 and T1 to T5 are the totals of treatments A, B, C, D and E
• Error sum of square (SSE) = TSS − SST − SSR − SSC
• Row mean square (MSR) = $\frac{SS_R}{DF_R}$
• Column mean square (MSC) = $\frac{SS_C}{DF_C}$
• Treatment mean square (MST) = $\frac{SS_T}{DF_T}$
• Error mean square (MSE) = $\frac{SS_E}{DF_E}$
• F calculated for row (FR): $F_{[DF_R,\,DF_E]} = \frac{MS_R}{MS_E}$
• F calculated for column (FC): $F_{[DF_C,\,DF_E]} = \frac{MS_C}{MS_E}$
• F calculated for treatments (FT): $F_{[DF_T,\,DF_E]} = \frac{MS_T}{MS_E}$
Put up all the values in the ANOVA table as shown in Table 6.7.
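The Latin square calculation can be sketched as follows, using the 5 × 5 layout shown earlier. The function name and the observations are hypothetical and serve only to show how row, column and treatment totals are formed from the same grid.

```python
def latin_square_anova(layout, data):
    """ANOVA quantities for a t x t Latin square.

    layout: t strings of t treatment letters, one string per row.
    data:   t lists of t observations in the same row/column positions.
    """
    t = len(layout)
    obs = [x for row in data for x in row]
    cf = sum(obs) ** 2 / (t * t)
    tss = sum(x * x for x in obs) - cf
    row_totals = [sum(row) for row in data]
    col_totals = [sum(data[i][j] for i in range(t)) for j in range(t)]
    trt_totals = {}
    for i in range(t):
        for j in range(t):
            key = layout[i][j]
            trt_totals[key] = trt_totals.get(key, 0.0) + data[i][j]
    ssr = sum(x * x for x in row_totals) / t - cf
    ssc = sum(x * x for x in col_totals) / t - cf
    sst = sum(x * x for x in trt_totals.values()) / t - cf
    sse = tss - sst - ssr - ssc
    df = t - 1                                # same DF for rows, columns, treatments
    df_e = (t * t - 1) - 3 * df               # equals (t - 1)(t - 2)
    ms_e = sse / df_e

    def f_ratio(ss):
        # A non-positive error mean square leaves F undefined; report infinity.
        return (ss / df) / ms_e if ms_e > 0 else float("inf")

    return {"SSR": ssr, "SSC": ssc, "SST": sst, "SSE": sse,
            "DF": df, "DFE": df_e,
            "FR": f_ratio(ssr), "FC": f_ratio(ssc), "FT": f_ratio(sst)}


layout = ["ADBCE", "BACED", "ECDAB", "DEABC", "CBEDA"]
# Hypothetical observations, one per experimental unit.
data = [
    [20, 25, 22, 24, 28],
    [23, 21, 25, 29, 26],
    [29, 24, 27, 20, 22],
    [26, 28, 21, 23, 24],
    [25, 22, 28, 26, 21],
]
for name, value in latin_square_anova(layout, data).items():
    print(f"{name}: {value:.3f}")
```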
Where, $SE_d = \sqrt{\frac{2MS_E}{r}}$
Example:
(i) Effect of different drugs in controlling some illnesses of patients of different ages and
weight groups.
(ii) Impact of different baby foods in balanced growth of underweight children in different age
and body weight groups.
Here each row has t patients of equal age, and each column has t patients of equal weight.
• Error estimated is more precise than CRD and RBD.
• Here we test three hypotheses: one for treatments, second for rows and third for columns:
• H0: No significant difference in the effect of drugs/baby food, between rows and between
columns.
• H1: Significant difference between drugs/baby food, between rows and between columns.
Apart from basic designs, designs are also classified on the basis of the number of sources
of variation such as treatments, year, location, environments, etc., and the treatments are
partitioned accordingly. Analysis of designs depends on the nature of the source of variation.
Say the experiment is repeated at different locations; then the number of replications may be the
same, but the effect of replications at one location may be different from that at another location
because the experimental units are different at the two locations. But, treatments or combinations of treatments
remain the same at all locations. So, the two sources are analyzed in different ways. Generally,
the treatment SS is partitioned according to the components of the treatments, namely, the drugs
and their doses, fertilizers and doses, etc.; that is, the effects of drug, concentration and their
interaction are tested. In health science, for example, to treat a disease, four drugs (first factor)
are used, each with three uniform concentrations (second factor), and all patients are homogenous;
that is, the basic design is CRD, and this design is known as a two-factor factorial CRD. If these
treatments are evaluated in RBD, it is known as a two-factor factorial RBD, where patients are
divided into r homogenous groups and each treatment is applied within each replication. If the
concentrations are not the same across drugs, that is, different drugs have different concentrations
even if the number of concentrations is the same, the design is known as nested: either a
two-factor nested CRD or RBD, depending on the basic design used. In this way, designs can
have any number of factors and are named accordingly.
6.5 Factorial Design
When all the members of one factor (drugs, D) are evaluated in combination with all members
of another factor (concentration, C), the design is known as a factorial design D × C. Here we
can assess the effects of factors D and C separately as well as the interaction of D and C. These
designs are required to assess variations in two or more factors simultaneously through the same
experiment. When two drugs (A and B) are combined, each with varying doses (a1 and a2 for A,
b1 and b2 for B), the combinations a1b1, a1b2, a2b1 and a2b2 are considered as separate treatments
and the experiment is planned accordingly, instead of conducting two separate experiments for
drugs A and B. The factorial design has the advantage of giving the interaction effect as well as
the main effects of the two factors. Factorial experiments are valid even if the factors are
independent without any interaction effect.
Layout: The layout depends on the selection of the basic design. As RBD is the most popular, we
select it. For four treatment combinations (A = a1b1, B = a1b2, C = a2b1 and D = a2b2) and three
replications, the layout is given in Figure 6.7. Treatment combinations are randomly allocated to
experimental units in each replication.
Preparation of data sheet: Data recorded on these treatments are arranged as shown in
Table 6.8.
To calculate the main effects and interactions, the total over replications are arranged in
another table having a rows and b columns as shown in Table 6.9.
Steps of calculation: Calculation of different values for ANOVA table:
• TDF = rt −1
• DFR = r −1
• DFT = ab −1
• DFA = a −1
R1 A D B C
R2 B A C D
R3 C D A B
Figure 6.7 Layout for Two-Factor Factorial Design with Three Replication Case
Table 6.8 Data Sheet for Factorial Design
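| Treatment | R1 | R2 | Rr | Total |
|---|---|---|---|---|
| a1b1 | X111 | X112 | X11r | $\sum_{k=1}^{r} X_{11k}$ |
| a1b2 | X121 | X122 | X12r | $\sum_{k=1}^{r} X_{12k}$ |
| a2b1 | X211 | X212 | X21r | $\sum_{k=1}^{r} X_{21k}$ |
| a2b2 | X221 | X222 | X22r | $\sum_{k=1}^{r} X_{22k}$ |
| Total | $\sum_{i=1}^{a}\sum_{j=1}^{b} X_{ij1}$ | $\sum_{i=1}^{a}\sum_{j=1}^{b} X_{ij2}$ | $\sum_{i=1}^{a}\sum_{j=1}^{b} X_{ijr}$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} X_{ijk}$ |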
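| Treatment | b1 | b2 | Total |
|---|---|---|---|
| a1 | $\sum_{k=1}^{r} X_{11k}$ | $\sum_{k=1}^{r} X_{12k}$ | $\sum_{j=1}^{b}\sum_{k=1}^{r} X_{1jk}$ |
| a2 | $\sum_{k=1}^{r} X_{21k}$ | $\sum_{k=1}^{r} X_{22k}$ | $\sum_{j=1}^{b}\sum_{k=1}^{r} X_{2jk}$ |
| Total | $\sum_{i=1}^{a}\sum_{k=1}^{r} X_{i1k}$ | $\sum_{i=1}^{a}\sum_{k=1}^{r} X_{i2k}$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} X_{ijk}$ |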
• DFB = b −1
• DFAB = (a −1)(b −1)
• DFE = TDF − DFT − DFR or (ab − 1)(r − 1)
• Replication sum of square (SSR) = $\frac{\sum_{k=1}^{r}\left(\sum_{i=1}^{a}\sum_{j=1}^{b} X_{ijk}\right)^2}{ab} - CF$
• Treatment sum of square (SST) = $\frac{\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\sum_{k=1}^{r} X_{ijk}\right)^2}{r} - CF$
• A sum of square (SSA) = $\frac{\sum_{i=1}^{a}\left(\sum_{j=1}^{b}\sum_{k=1}^{r} X_{ijk}\right)^2}{br} - CF$
• B sum of square (SSB) = $\frac{\sum_{j=1}^{b}\left(\sum_{i=1}^{a}\sum_{k=1}^{r} X_{ijk}\right)^2}{ar} - CF$
• A × B sum of square (SSAB) = SST − SSA − SSB
• Error sum of square (SSE) = TSS − SST − SSR
• Mean square of ith source (MSi) = $\frac{SS_i}{DF_i}$
• F calculated for ith source (Fi): $F_{[DF_i,\,DF_E]} = \frac{MS_i}{MS_E}$
• Calculate P for ith source with the help of Excel. Type “=FDIST (Fi, DFi, DFE)”
Put up all the values in the ANOVA table as shown in Table 6.10.
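The partition of the treatment sum of squares into A, B and A × B can be sketched as below for a factorial experiment laid out in RBD. The function name, the keying of cells by (i, j) and the observations are hypothetical.

```python
def factorial_rbd_ss(cells, a, b):
    """Sums of squares for an a x b factorial in RBD.

    cells maps (i, j), with i = 1..a and j = 1..b, to the list of r
    observations for combination a_i b_j, ordered by replication.
    """
    r = len(cells[(1, 1)])
    obs = [x for values in cells.values() for x in values]
    cf = sum(obs) ** 2 / (a * b * r)
    tss = sum(x * x for x in obs) - cf
    # Treatment-combination totals (over replications), each from r values
    sst = sum(sum(values) ** 2 for values in cells.values()) / r - cf
    # Replication totals, each from ab values
    rep_totals = [sum(values[k] for values in cells.values()) for k in range(r)]
    ssr = sum(x * x for x in rep_totals) / (a * b) - cf
    # Marginal totals of the two-way table of treatment totals
    a_totals = [sum(sum(cells[(i, j)]) for j in range(1, b + 1)) for i in range(1, a + 1)]
    b_totals = [sum(sum(cells[(i, j)]) for i in range(1, a + 1)) for j in range(1, b + 1)]
    ssa = sum(x * x for x in a_totals) / (b * r) - cf
    ssb = sum(x * x for x in b_totals) / (a * r) - cf
    ssab = sst - ssa - ssb
    sse = tss - sst - ssr
    return {"SSR": ssr, "SST": sst, "SSA": ssa, "SSB": ssb, "SSAB": ssab,
            "SSE": sse, "DFA": a - 1, "DFB": b - 1,
            "DFAB": (a - 1) * (b - 1), "DFE": (a * b - 1) * (r - 1)}


# Hypothetical 2 x 2 factorial with three replications.
cells = {(1, 1): [10, 11, 12], (1, 2): [14, 15, 17],
         (2, 1): [12, 14, 13], (2, 2): [20, 22, 21]}
result = factorial_rbd_ss(cells, a=2, b=2)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Each mean square and F value then follows by dividing the SS by its DF and by MSE, as in the last two bullets.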
If P < 0.05 the H0 is rejected and H1 is accepted for any source, that is, replication, treat-
ment, A, B and A × B effect is said to be significant. To identify the difference between any two
6.6 Nested Design
When the members of factor B vary with the levels of factor A in a factorial design, it is called
a nested design. For example, in health science dosages vary with drugs, and in agriculture levels
vary with fertilizers; such treatments are evaluated in nested designs. In this design, the sources
are between A and between B within A1, A2, A3, etc. There is no interaction effect. Levels of B
may or may not vary for different A’s. For drug or fertilizer A1 the levels of B are B1, B2, B3,
etc. and for A2 the levels are B1’, B2’, B3’, etc. For example, the dosages of drug A1 are 1, 10
and 15 mg/kg body weight and those of drug A2 are 2, 8 and 12 mg/kg body weight. So,
comparison between concentrations is not possible across the drugs; it is possible only within
each drug (within each Ai). Accordingly, the interaction A × B cannot be estimated either.
Layout: Layout of nested designs is the same as for factorial designs and depends on the selection
of the basic design. For two drugs with two levels of concentrations, four treatment combinations
are (A = a1b1, B =a1b2, C = a2b1’ and D = a2b2’) and a three replication layout is given in Figure 6.8.
Treatment combinations are randomly allocated to experimental units in each replication.
R1 A D B C
R2 B A C D
R3 C D A B
Figure 6.8 Layout for Two-Factor Nested Design with Three Replication Case
Preparation of data sheet: Data recorded are arranged in treatment by replication as shown in
Table 6.11.
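| Treatment | R1 | R2 | Rr | Total |
|---|---|---|---|---|
| a1b1 | X11 | X12 | X1r | $\sum_{j=1}^{r} X_{1j}$ |
| a1b2 | X21 | X22 | X2r | $\sum_{j=1}^{r} X_{2j}$ |
| a2b1’ | X31 | X32 | X3r | $\sum_{j=1}^{r} X_{3j}$ |
| a2b2’ | X41 | X42 | X4r | $\sum_{j=1}^{r} X_{4j}$ |
| Total | $\sum_{i=1}^{t} X_{i1}$ | $\sum_{i=1}^{t} X_{i2}$ | $\sum_{i=1}^{t} X_{ir}$ | $\sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}$ |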
To calculate the SS for different sources total over replications is arranged in Table 6.12 hav-
ing a rows and b columns but the total for each level of B over A is not required.
Table 6.12 Two-Way Table for Two-Factor Two-Level Nested Experimental Design
| Treatment | b1 | b2 | Total |
|---|---|---|---|
| a1 | $\sum_{k=1}^{r} X_{1k}$ | $\sum_{k=1}^{r} X_{2k}$ | $\sum_{j=1}^{b_1}\sum_{k=1}^{r} X_{1jk}$ |
| a2 | $\sum_{k=1}^{r} X_{3k}$ | $\sum_{k=1}^{r} X_{4k}$ | $\sum_{j=1}^{b_2}\sum_{k=1}^{r} X_{2jk}$ |
| Total | | | $\sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{r} X_{ijk}$ |
• TDF = rt −1
• DFR = r −1
• DFT = t −1
• DFA = a −1
• DFBi = bi −1
• DFE = TDF − DFT − DFR or (t − 1)(r − 1)
• Replication sum of square (SSR) = $\frac{\sum_{k=1}^{r}\left(\sum_{i=1}^{a}\sum_{j=1}^{b_i} X_{ijk}\right)^2}{\sum_{i=1}^{a} b_i} - CF$ or $\frac{\sum_{j=1}^{r}\left(\sum_{i=1}^{t} X_{ij}\right)^2}{t} - CF$
• Treatment sum of square (SST) = $\frac{\sum_{i=1}^{a}\sum_{j=1}^{b_i}\left(\sum_{k=1}^{r} X_{ijk}\right)^2}{r} - CF$ or $\frac{\sum_{i=1}^{t}\left(\sum_{j=1}^{r} X_{ij}\right)^2}{r} - CF$
• A sum of square (SSA) = $\sum_{i=1}^{a}\frac{\left(\sum_{j=1}^{b_i}\sum_{k=1}^{r} X_{ijk}\right)^2}{rb_i} - CF$
• Within Ai sum of square (SSWi) = $\frac{\sum_{j=1}^{b_i}\left(\sum_{k=1}^{r} X_{ijk}\right)^2}{r} - \frac{\left(\sum_{j=1}^{b_i}\sum_{k=1}^{r} X_{ijk}\right)^2}{rb_i}$
• Error sum of square (SSE) = TSS − SST − SSR
• Mean square of ith source (MSi) = $\frac{SS_i}{DF_i}$
• F calculated for ith source (Fi): $F_{[DF_i,\,DF_E]} = \frac{MS_i}{MS_E}$
• Calculate P for ith source with the help of Excel. Type “=FDIST (Fi, DFi, DFE)”
Put all the values in the ANOVA table as shown in Table 6.13.
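The split of the treatment SS into "between A" and "within each Ai" can be sketched as below, allowing a different number of B levels under each A. The function name, data layout and observations are hypothetical. The sketch also checks the identity that SSA plus the sum of the SSWi equals SST.

```python
def nested_ss(groups):
    """Treatment SS partition for a two-factor nested design.

    groups maps each level of A to a list of its B levels; each B level
    is the list of r observations (one per replication).
    """
    first_levels = next(iter(groups.values()))
    r = len(first_levels[0])
    all_obs = [x for levels in groups.values() for level in levels for x in level]
    cf = sum(all_obs) ** 2 / len(all_obs)
    # Treatment SS: every A-by-B combination total is based on r observations
    sst = sum(sum(level) ** 2
              for levels in groups.values() for level in levels) / r - cf
    ssa = -cf
    ssw = {}
    for name, levels in groups.items():
        b_i = len(levels)                          # number of B levels within this A
        a_total = sum(sum(level) for level in levels)
        between = a_total ** 2 / (r * b_i)
        ssa += between                             # builds sum of (A total)^2 / (r * b_i) - CF
        ssw[name] = sum(sum(level) ** 2 for level in levels) / r - between
    df_w = {name: len(levels) - 1 for name, levels in groups.items()}
    return {"SST": sst, "SSA": ssa, "SSW": ssw,
            "DFA": len(groups) - 1, "DFW": df_w}


# Hypothetical data: drug A1 with two doses, drug A2 with three doses,
# three replications per dose.
doses = {"A1": [[10, 11, 12], [14, 15, 17]],
         "A2": [[12, 14, 13], [20, 22, 21], [25, 24, 26]]}
out = nested_ss(doses)
print(out)
# The pieces add back up to the treatment SS.
print(abs(out["SST"] - (out["SSA"] + sum(out["SSW"].values()))) < 1e-9)
```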
If P < 0.05, then H0 is rejected and H1 is accepted for that source (replication, treatment, A
or within A’s), and that effect is said to be significant. To identify the difference between
any two treatment means, apply the post-hoc test. The most common post-hoc test is the LSD
test. The formula of SEd is different for different sources.
SEd for the difference between any two treatment means or within Ai: $SE_{dB} = \sqrt{\frac{2MS_E}{r}}$
SEd for the difference between Ai and Aj means: $SE_{dA} = \sqrt{\frac{MS_E}{rb_i} + \frac{MS_E}{rb_j}}$
$CD_{i\,5\%} = SE_{di} \times t_{DF_E,\,5\%}$ and $CD_{i\,1\%} = SE_{di} \times t_{DF_E,\,1\%}$
• Members of factor A may have varying numbers of b, that is, different drugs may have dif-
ferent concentrations, and the number of concentrations may also be different.
• Different concentrations of factor A are considered as different treatments.
• Treatment SS is split in A and within A’s SS.
• The abi treatment combinations are evaluated in any basic experimental design such as CRD
or RBD depending upon the requirement of experimental conditions.
• Apart from the hypotheses of the basic designs (treatment in CRD; replication and treatment
in RBD), the other hypotheses tested in ANOVA are:
• H0: No significant difference between A and within Ai.
• H1: Significant difference between A and within Ai.
• The difference between two means is tested with the respective CD.
• If two means have a difference greater than or equal to the CD, the difference is significant.
There are a number of advanced designs available for different experimental conditions but
those are beyond the scope of this book.
• Design of experiment is a research method to assess the outcome variable related to the treat-
ments on the experimental units under controlled conditions in such a way that the effect of
each source of variation in the outcome variable can be estimated separately.
• The basic principles of this method include randomization, replication and local control.
• The intervention or object of assessment is termed treatment.
• Experimental Material/Unit: Material unit on which experiments are conducted. Patients/
plants/animals/crop fields, etc.
• Experimental Error: This variation in outcome variable values due to uncontrolled factors
is called experimental error.
• Randomization: Random allocation of treatments on experimental units for the validity of
statistical tests or to avoid any biases in the experiment.
• Replication: Repetition of the same treatment to estimate experimental error based on which
final decisions are taken.
• Local Control: Arrangement of experimental units into homogenous groups/blocks so that
experimental units within the block are homogenous.
• Precision of Design: It is the ability with which a design detects the small real difference
between treatments.
• Degree of Freedom (DF): It is the difference between the number of observations used for
the analysis and the number of independent constraints. For a design with ‘k’ treatment, ‘r’
replication and ‘kr’ total observations ‘kr−1’ is the DF for total; ‘k−1’is the DF for treatment
‘r−1’ is the DF for replication and [(kr−1)−(k−1)−(r−1)] is the DF for error. The sum of DF
of different sources is equal to the total DF.
• ANOVA- Analysis of Variance: The total variance in the set of observations is split into
different components such as treatment, replication, error, etc.
• F test: The F value is the ratio of variances. Generally, treatment variance is compared with
error variance.
• Critical Difference (CD): If the null hypothesis is rejected by using ANOVA (F test), it
implies that not all treatment effects are the same. The researcher would then like to know
which pairs of treatments differ significantly, which can be found with a post-hoc test. One of
them is the CD. It uses the standard error of the difference between treatment means,
$SE_d = \sqrt{\frac{2V_E}{r}}$, where $V_E$ is the error variance. The critical
difference is the least significant difference above which the treatment mean differences are
statistically significant. The CD is calculated as $CD = SE_d \times t_{[DF_e,\,\alpha]}$.
• The basic designs of field experiments include completely randomized design (CRD),
randomized block design (RBD) and Latin square design (LSD).
• In CRD, the entire experimental area/units are to be homogenous. CRD experiments are
possible even for an unequal number of replications. All treatments including control are
randomly assigned to the experimental units. For the purpose of data analysis, the observa-
tions are arranged according to treatments and replications.
• The CRD design can be used for nutritional experiments like the effect of different products
on weight gain by children if an adequate number of children with the same age and initial
body weight are available.
• In CRD, all experimental units (‘n’ units) are homogenous; there are fixed number of treat-
ments t for comparison; when number of replication r for each treatment is same, then r = n/t
(number of replication can be different also); each treatment is applied randomly to the
experimental units; analysis under missing observation is possible; provides maximum DF
for error; H0: There is no significant difference between treatment effects and H1: Treatment
effects are significantly different.
• In RBD, experimental units are divided into blocks with homogenous units within each
block. Each block is used as a replication and all the treatments will appear in a block. The
total variability is divided into treatment, block/replication and error. The number of blocks
is equal to the number of replications and the number of experimental units in each block is
equal to the number of treatments. Treatments are randomly allocated to experimental units
in each replication.
• In RBD, the total variability is split into three sources, replication, treatments and error; the
null hypothesis in RBD is H0: No significant difference in the effect of treatments and no
significant difference between replications.
• LSD is a basic design where entire experimental units are divided into horizontal (rows) and
vertical (columns) blocks having homogenous units in both ways. In nutritional trials, the
children can be horizontally grouped according to initial body weight and vertically grouped
according to age.
• In LSD, the total variability is split into four viz., rows, columns, treatments and error; the
null hypothesis H0: No significant difference in the effect of treatments between rows and
between columns.
• Factorial design is used to assess variations in two or more factors simultaneously through
the same experiment. When two drugs (A and B) are combined, each with varying doses
(a1 and a2 for A, b1 and b2 for B), the combinations a1b1, a1b2, a2b1 and a2b2 are considered
as separate treatments and the experiment is planned accordingly, instead of
conducting two separate experiments for A and B.
• When the levels of factor B vary with the levels of factor A in a factorial design, it is called
a nested design. For example, in health science dosages vary with drugs, and in agriculture
levels vary with fertilizers; such treatments are evaluated in nested designs. In this design,
the sources are between A and between B within A1, A2, A3, etc. There is no interaction effect.
7 Non-Interventional Observation Research Designs
DOI: 10.4324/9781003527183-7
In health sciences, most of the research in para-clinical and basic science areas like anatomy,
physiology, biochemistry, microbiology, pathology, forensics, community medicine, etc. and
also some studies in clinical subjects are based on observations related to patients as recorded
in OPD, IPD, laboratories, operation theatres and even after hospital discharge. The observed
changes in physiological functions, anatomical factors and pathological and biochemical param-
eters in relation to various morbidities/disabilities are the basis of research in these areas. The
observational research can be descriptive or analytical in nature. The research designs under
observational research are:
These designs are used to identify and describe the situation, relationships, perception, aware-
ness, behaviour, attitude, etc. of people. It can also be used to describe the incidence and preva-
lence of various diseases and the frequency of occurrence of a phenomenon or to explain the
relationship among observable variables related to the subjects.
A descriptive case report is an in-depth study based on a single unique case, rare in nature with
complete details of the case. It may be a rare business case with specific features, a medical case
with a rare pattern, an extreme level of profitability/loss of a firm, etc. The researcher makes
possible interpretation of the outcome with possible causes for it. Such cases pave the way for
further detailed research in this direction.
If the researcher has a series of rare cases in any area of study, it forms the basis for descrip-
tive case series. It may be a group of patients having similar rare clinical features, a group of
business firms showing extraordinary profit/loss, or a group of students giving rare types of
performance, etc. The observations made are explained without any comparison with control or
other cases. It paves the way for further in-depth research based on the hypothesis developed
from the case series.
The descriptive cross-sectional research is a widely used research design under non-interventional
observation research method. A cross-sectional random sample representing the population
under reference is selected and the required information is collected from the selected respond-
ents to estimate the population parameters. The estimated statistic is tested for statistical sig-
nificance. In areas like economics, sociology, agriculture, the health sector, business, commerce
and many others, the detailed description in parametric form for the existing situation is often
required for identifying the areas for policy and development interventions, corrective steps
needed for upgradation of the situation, etc. In the health sector, the occurrence of a disease in
relation to associated risk factors at a given point in time for a specified population may be of
prime importance. The disease prevalence in epidemiology can be assessed when the present
status of a disease is the focus of such studies. In cross-sectional studies the exposure and out-
come are measured at the same point of time without any time lag. It is less expensive and less
time-consuming compared to other methods. It can cover a large-sized population with a scope
for disaggregated outcome analysis according to age, sex, etc.
Data from study subjects are collected over an extended time period and analyzed to assess
changes over time of a phenomenon. These include follow-up studies, trend studies, etc.
It is a major design under non-interventional observation research methods. Apart from the
magnitude, variability and mutual association among dependent variables, the interrelationship
of independent variables can also be studied under this design.
In statistics, univariate is used to describe the type of data which consists of only a single char-
acteristic or variable. It is used to assess the frequency of a specific event/disease in relation to
a specific situation. It is also applicable to identify perception, awareness, behaviour, attitude,
knowledge and practice related to an event/disease. It is also applicable to assess the prevalence
and incidence of diseases.
Example: Wages of daily workers across regions, the occurrence of disease across age classes.
Exploratory research design is used in both qualitative and quantitative research. It is used to
understand more about a particular topic or event of interest. This design is applicable when
the extent of cases and factors attributable to it are to be assessed for a defined population at a
given time. Univariate data can be better perceived using graphs, diagrams, measures of central
tendency, measures of dispersion, etc.
Example: Factors influencing the mental health of students, effect of online classes for pri-
mary school children, etc.
It is used to compare a common phenomenon in two populations using separate samples from
each population. The phenomenon may include mean values, proportions, or scores for knowl-
edge, perception, practices, attitudes, prevalence, incidence, etc. Researchers make attempts to
describe similarities as well as differences between groups. Measures of central tendency and
dispersions can be worked out and compared. Test of significance can also be applied to assess
statistical significance in difference of parameters.
Example: Difference in wages of male and female workers, prevalence of diseases in rural
and urban areas, etc.
It is a type of non-experimental observation research method used to study the strength of the
relationship between two or more variables in natural settings without manipulation or control.
The correlation between variables reflects the magnitude of the strength of association as well
as the direction of association as positive or negative between variables.
Example: Study time and marks in examination, salt intake and hypertension, food intake
and body mass index, physical work hours and body mass index, etc.
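As an illustration of magnitude and direction of association, the sketch below computes Pearson's product-moment correlation coefficient. The choice of Pearson's coefficient is an assumption made for this example, since the text does not name a specific measure, and the study-time and marks figures are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sum of cross-products of deviations divided by
    the square root of the product of the sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: daily study hours and marks (out of 5) for five students.
study_hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(study_hours, marks)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f}, direction of association: {direction}")
```

The sign of r gives the direction of the association and its absolute value, between 0 and 1, gives the strength.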
7.4 Epidemiological Research
There are three components of epidemiological studies:
Epidemiological studies are mostly community-based studies on the determinants and distribu-
tion of diseases among people in a specified area at a given time. Hence some of the demographic
statistics and vital statistics apply to epidemiology also. Some of the widely used measurements/
indicators in these areas are:
• Measurement of mortality.
• Measurement of morbidity.
• Measurement of disability.
• Measurement of fertility.
Disability rates are of two types: (i) Event type indicators which include the number of days of
restricted activity as bed disability days and work loss days; (ii) person type indicators which
include limitation of mobility and limitation of activity.
Disability-adjusted life years (DALY) is the sum of years of life lost (YLL) + years lost due
to disability (YLD).
7.5 Epidemiological Studies
These are studies related to distribution and determinants of diseases with respect to people,
place and time. Observational and analytical studies are carried out.
Such studies are used to describe the distribution of diseases with respect to time (short, medium
and long-term fluctuations), place (international, national, regional and local) and person
(description of situation, problem, status).
Yes a b
No c d
Total a+c b+d
If $\frac{a}{a+c} > \frac{b}{b+d}$ then exposure is associated with the case.
a+c b+d
Odds Ratio (OR) = ad/bc. The OR can be = 1, > 1 or < 1.
An odds ratio greater than 1 gives the strength of the association of exposure to risk factors.
Yes No
The situations are RR < 1, RR = 1 and RR > 1. If the relative risk (RR) is greater than one, it
implies a higher risk of incidence of the disease due to exposure to the risk factor.
Present a b a+b
Absent c d c+d
Total a+c b+d N
Odds for risk factor present = $\frac{a}{a+b} \div \frac{b}{a+b} = \frac{a}{b}$
Odds for risk factor absent = $\frac{c}{c+d} \div \frac{d}{c+d} = \frac{c}{d}$
Odds ratio = $\frac{a}{b} \div \frac{c}{d} = \frac{ad}{bc}$
OR > 1 implies an increased probability of having diseases for persons with risk factors, < 1
implies a decreased probability of having diseases for persons with risk factors and = 1 implies
no association of risk factors with disease.
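The odds calculation can be sketched as below, with a, b, c and d laid out as in the table above (risk factor present or absent in the rows, disease present or absent in the columns). The function name and the counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table.

    a, b: persons with the risk factor, with and without the disease.
    c, d: persons without the risk factor, with and without the disease.
    """
    odds_present = a / b      # a/(a+b) divided by b/(a+b)
    odds_absent = c / d       # c/(c+d) divided by d/(c+d)
    return odds_present / odds_absent   # algebraically equal to ad/bc


# Hypothetical counts.
a, b, c, d = 30, 70, 10, 90
value = odds_ratio(a, b, c, d)
if value > 1:
    reading = "increased probability of disease with the risk factor"
elif value < 1:
    reading = "decreased probability of disease with the risk factor"
else:
    reading = "no association"
print(f"OR = {value:.2f}: {reading}")
```

The sketch does not guard against zero cells; a table with b, c or d equal to zero would need separate handling.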
In epidemiology, interventional studies such as randomized controlled trials are carried out.
Two groups—study group and control group are formed randomly. Intervention (application of
drugs or withdrawal of risk factors) is made in the study group. No intervention is made in the
control group. After experimentation, the outcome is compared in both groups. The experiment
is under the control of the researcher. It involves cost, time and ethics. There are three types of
experimental studies in epidemiology.
Steps Defined
(i) Protocol: Study aim, two groups, sample size, intervention.
(ii) Reference population: The population where results are applicable.
(iii) Randomization: Selection of study and control groups.
(iv) Intervention: Newly identified intervention in the study group and status quo (placebo) in
the control group.
(v) Follow up: Examination of both the groups as required.
(vi) Elimination of bias (if needed).
(vii) Single blind trial-bias of respondents.
(viii) Double blind trial-bias of respondent and researcher.
(ix) Triple blind trial-bias of the preceding + statistician.
(x) Outcome assessment: The results are tested and compared.
• Uncontrolled trials: Evidence from rare study groups is compared with historical evidence.
• Natural experiments: Natural situations like earthquakes, famines, floods, etc. are treated as
experiments.
• Before and after comparison trials: Pre-intervention evidence from the group receiving the
intervention is treated as control.
• Screening for diseases among people in a community is a step toward prevention of diseases.
• Screening for diseases means searching for the presence of diseases/defects in apparently
healthy individuals by means of rapidly applied tests/medical examinations.
• Screening tests are different from diagnostic tests.
[Figure: screening flow chart with the panels "Apparently healthy (subclinical cases, carriers, unidentified …)", "Screening test" and "History, examination, diagnostic test"]
Types of Screening
• Mass screening (all school children for malnutrition).
• Selective screening (screening of immigrants for syphilis).
• Multi-purpose screening (two or more tests at one time—tests for syphilis and HIV).
Group A, those having and Group B, those not having the disease.
A summary of the results of screening test is shown in Table 7.4.
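| Screening test result | Disease: Yes | Disease: No | Total |
|---|---|---|---|
| Positive | a | b | a+b |
| Negative | c | d | c+d |
| Total | a+c | b+d | a+b+c+d |
Notes: ‘a’ true +ve; ‘b’ false +ve; ‘c’ false −ve; ‘d’ true −ve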
Sensitivity = a/(a + c) × 100
Specificity = d/(b + d) × 100
Predictive value of +ve test = a/(a + b) × 100
Predictive value of −ve test = d/(c + d) × 100
Percentage of false +ve test = b/(b + d) × 100
Percentage of false −ve test = c/(a + c) × 100
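The six screening measures can be computed together from the four cell counts. The sketch below is a minimal illustration with a hypothetical function name and hypothetical counts, where a, b, c and d are the true positives, false positives, false negatives and true negatives.

```python
def screening_metrics(a, b, c, d):
    """Screening-test measures (in percent) from a 2 x 2 table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    """
    return {
        "sensitivity": a / (a + c) * 100,
        "specificity": d / (b + d) * 100,
        "predictive value of +ve test": a / (a + b) * 100,
        "predictive value of -ve test": d / (c + d) * 100,
        "false +ve percentage": b / (b + d) * 100,
        "false -ve percentage": c / (a + c) * 100,
    }


# Hypothetical screening of 1,000 people, 100 of whom have the disease.
metrics = screening_metrics(a=90, b=30, c=10, d=870)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and the false −ve percentage always add up to 100, as do specificity and the false +ve percentage, which gives a quick check on the arithmetic.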
• Natural occurrence of events without any external factors or any deliberate control of events
in any form is the basis of observational research.
• Observational data collected through suitably structured data format based on interviews,
medical test reports, OPD/IPD records etc. are analyzed to have findings and arrive at a
conclusion.
• Most of the research in para-clinical and basic science areas like anatomy, physiology, bio-
chemistry, microbiology, pathology, forensics, community medicine, etc. and also some
studies in clinical subjects are based on observations related to patients as recorded in OPD,
IPD, laboratories, operation theatres and even after hospital discharge.
• Descriptive Research Case Reports: It is an in-depth descriptive study based on a single
unique case, rare in nature with complete details of the diagnostic test conducted, its results,
treatments made and its responses, etc.
• Descriptive Research Case Series: A group of patients having similar clinical features along
with diagnosis and interventions made are observed for a period of time. The observations so
made are explained without any comparison with control or other cases.
• Descriptive Research Cross-Sectional Studies: It falls under descriptive observational
study. It is used when the occurrence of a disease is related to associated factors at a given
point in time for a specified population.
• Exploratory Research Design: This design is applicable when the extent of cases and
factors attributable to it are to be assessed for a defined population at a given time.
• Comparative Research Design: Used to compare a common phenomenon in two
populations using separate samples from each population. The phenomenon may include
knowledge, perception, practices, attitudes, prevalence, incidence, etc.
• Co-Relational Research Design: It is used to study the strength of the relationship between
two or more variables in natural settings without manipulation or control.
• Developmental Cross-Sectional Research Design: Data at one point in time are collected
and analyzed to assess the developmental process.
• Developmental Longitudinal Research Design: Data collected over an extended time
period and analyzed to assess changes over time of a phenomenon. These include follow-up
studies, trend studies, etc.
• Epidemiological Descriptive Observational Studies: Such studies are used to describe the
distribution of diseases with respect to time (short, medium and long-term fluctuations), place
(international, national, regional and local) and person (description of situation/problem/
status).
• Analytical Observational Studies: It is used to confirm the determinants of the diseases.
• Case-Control Study: Cases are already available with the researcher and the risk factor is
to be identified. A case/disease group and a control group, comparable except for the disease,
are formed, and information on exposure to the suspected risk factor is ascertained from each
member of both groups. The exposure rate is calculated for each group and the two are compared.
If the exposure rate of cases exceeds the exposure rate of controls, the exposure is implied to be
a risk factor for the disease. The design is retrospective in nature, and the odds ratio is worked out.
• Prospective Cohort: The researcher begins with the exposure group and a comparable control
group and continues to monitor the individuals in both groups for future occurrence or
non-occurrence of the disease. Incidence rates in the exposure group (IREP) and control group
(IRCP) are worked out and compared.
• Screening for diseases means searching for the presence of diseases/defects in apparently
healthy individuals by means of rapidly applied tests/medical examinations.
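The comparisons described for the case-control and prospective cohort designs can be sketched as follows. This is an illustration under stated assumptions: all function names and counts are hypothetical, and the ratio of the two incidence rates (commonly called relative risk) is one usual way of comparing IREP and IRCP; the text itself only says that the rates are compared.

```python
def exposure_rate(exposed, total):
    """Proportion of a group that was exposed to the suspected risk factor."""
    return exposed / total


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


def incidence_rate(new_cases, group_size):
    """Proportion of a cohort group that develops the disease during follow-up."""
    return new_cases / group_size


# Hypothetical case-control data: 100 cases and 100 controls.
rate_cases = exposure_rate(exposed=60, total=100)
rate_controls = exposure_rate(exposed=30, total=100)
or_value = odds_ratio(60, 40, 30, 70)

# Hypothetical cohort data: 200 exposed and 200 unexposed individuals.
irep = incidence_rate(new_cases=30, group_size=200)   # exposure group
ircp = incidence_rate(new_cases=10, group_size=200)   # control group
rate_ratio = irep / ircp                              # assumption: ratio used for comparison

print(f"Exposure rate: cases {rate_cases:.2f}, controls {rate_controls:.2f}")
print(f"Odds ratio: {or_value:.2f}")
print(f"IREP {irep:.3f}, IRCP {ircp:.3f}, ratio {rate_ratio:.2f}")
```

In this made-up example the exposure rate of cases is higher than that of controls and the odds ratio is greater than one, which is the pattern the case-control bullet describes for a suspected risk factor.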
8 Survey, Interview and Online Research Designs
8.1 Introduction
The survey (offline), interview and online survey methods of research have a wide range of
applications in various areas of research. Online surveys, which have evolved in recent years,
have added to the scope and applications of survey methods. Under these methods, data are
collected from selected respondents by seeking their opinions and responses to items
identified as per the objectives of the study. The responses can be numerical information,
perceptions, insights or opinions, gathered through multiple-choice questions, dichotomous
questions, open-ended questions, closed-ended questions, scale-based marking, etc. These
methods are widely used in social sciences, humanities, commerce, marketing research, business
studies, economics, sociological studies, education, law, community-based and other health
studies and many other areas.
8.2 Survey Research
Survey research includes the use of schedules, questionnaires, forms and other survey-based
tools to gather data/information from respondents about their perceptions, opinions and
ideas. After the survey data are gathered, statistical analysis is performed to produce insightful
study findings and recommendations. Survey research is one of the most effective research
methods in many areas of study. Online surveys have become a very popular way to get
data from individuals or groups of people. They are prepared as pre-structured survey
questions meant to encourage participants to reply with a minimum time lag. Many companies
use survey research to gather precise data based on respondents’ perceptions. Conventional
survey research uses both qualitative and quantitative techniques to gather data, opinions,
perceptions and other information from a set of respondents through a series of survey questions.
It is usually the initial step towards more comprehensive and lengthier qualitative or
quantitative research procedures, such as focus groups and on-call interviews, and it is also a
means of quickly obtaining information about popular subjects through surveys and polls. In a
number of situations, researchers may combine quantitative and qualitative survey approaches
to further their research.
The choice of survey research design depends on two important factors: the tool used in
survey research and the amount of time needed to carry out the study. Depending on how survey
research is conducted, there are three main survey research designs.
DOI: 10.4324/9781003527183-8
Online/Email: Online survey research is one of the most widely used survey techniques today.
Internet-based surveys are inexpensive, quick to conduct and yield relatively reliable
responses. Tools such as Google Forms make data gathering and compilation quicker and easier.
Phone: Data from a wider range of target respondents can be gathered through computer-aided
telephonic interviews (CATI) or other survey research performed over the phone. Phone
surveys may cost more and take more time than other methods. However, the time and cost of
transporting the researcher/representative for a face-to-face survey are saved.
Face-to-face: When it is possible to meet with respondents one-on-one, researchers conduct
in-person, comprehensive interviews. This approach has the highest response rate, although
it can be expensive and time-consuming. Surveys on subjects where data privacy is impor-
tant, and which seek more accurate data are better suited for this method of data collection.
Based on the time taken, survey research can further be classified into two methods:
longitudinal and cross-sectional.
Additionally, survey research is divided into two categories based on the sampling technique
used to choose study samples: probability sampling and non-probability sampling. Under
probability sampling, each member of the population has a known, non-zero chance (an equal
chance under simple random sampling) of being included in the study sample, and the researcher
selects the respondents using probability theory. Probability sampling techniques include
simple random sampling, systematic sampling, stratified random sampling and cluster sampling. In
non-probability sampling, the researcher selects the sample for data collection based on his or her
expertise and experience. Non-probability methods include convenience sampling, snowball
sampling, sequential sampling, judgmental sampling and quota sampling.
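Three of the probability sampling techniques named above can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling frame of 100 numbered respondents, the urban/rural stratum rule and the function names are all hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit from the frame."""
    rng = random.Random(seed)
    k = len(frame) // n           # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical sampling frame: respondents numbered 1 to 100.
frame = list(range(1, 101))


def area(unit):
    """Hypothetical stratum rule: first 60 respondents urban, the rest rural."""
    return "urban" if unit <= 60 else "rural"


srs = simple_random_sample(frame, 10, seed=1)
systematic = systematic_sample(frame, 10, seed=1)
stratified = stratified_sample(frame, area, 0.10, seed=1)

print("Simple random:", sorted(srs))
print("Systematic:   ", systematic)
print("Stratified:   ", sorted(stratified))
```

All three functions need a complete list of the population (the sampling frame). Fixing the seed makes a draw reproducible, which helps when the selection has to be documented.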
Design Survey Questions: Grammatically and logically sound survey questions can be taken
from a standard questionnaire or drafted during a brainstorming session followed
by post-validation. It is very important to comprehend the purpose of the survey as well as
the anticipated results. In many surveys, learning about respondents’ preferences among the
alternatives offered is more valuable than knowing the details of their open choices. In these
circumstances, a researcher may use multiple-choice or closed-ended questions; on the other
hand, open-ended questions may be included in the questionnaire if perceptions or insights on
certain topics are needed. A survey ought to have a thoughtful mix of closed-ended and
open-ended questions. Scaled formats such as the Likert scale, the semantic differential scale
and the Net Promoter Score question can be used to grade responses and discourage fence-sitting.
Fixing Target Population: The researcher has to decide the target population, a list of
which (the sampling frame) must be available if a random sample representing the population
is intended. A pilot study is advisable to streamline the sequencing and presentation of the
questions. The results generated can then meet the requirements of the study and be
generalized to the entire population.
Conduct of Survey: Once the survey questionnaire is distributed to the selected respondents,
the researcher needs to wait patiently for the feedback. The nature of the target population
and the region in which the survey is to be conducted have to be kept in mind while
scheduling the survey. Besides the distribution of questionnaires, surveys can also be conducted
using social media, emails or website surveys in order to maximize the response.
Survey Data Analysis: Feedback should be analyzed in real time, using accepted statistical
methods, to spot trends. Techniques such as conjoint analysis, cross-tabulation, gap analysis
(the difference between actual and expected), TURF (total unduplicated reach and frequency,
used to assess market potential) and many others can be used to identify and provide insight
into respondents’ perceptions and behaviours. Researchers may use the findings to assess the
level of agreement with a statement or the extent of satisfaction with a service, and to evolve
remedial measures to raise employee and customer satisfaction, among other things. Research
results from economics and sociology are generally used for policy formulation.
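Two of the analysis techniques mentioned above, cross-tabulation and the gap between actual and expected ratings, can be illustrated with a few lines of code. This is a minimal sketch; the six responses, the field names and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (ratings on a 1-5 scale).
responses = [
    {"group": "student", "satisfaction": "high", "expected": 5, "actual": 4},
    {"group": "student", "satisfaction": "low", "expected": 4, "actual": 2},
    {"group": "student", "satisfaction": "high", "expected": 4, "actual": 4},
    {"group": "staff", "satisfaction": "low", "expected": 5, "actual": 3},
    {"group": "staff", "satisfaction": "high", "expected": 3, "actual": 4},
    {"group": "staff", "satisfaction": "low", "expected": 4, "actual": 3},
]


def cross_tabulate(rows, row_key, col_key):
    """Count respondents for every combination of two categorical answers."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_labels = sorted({r[row_key] for r in rows})
    col_labels = sorted({r[col_key] for r in rows})
    return {rl: {cl: counts[(rl, cl)] for cl in col_labels} for rl in row_labels}


def mean_gap(rows, expected_key, actual_key):
    """Average of (actual - expected); a negative value means expectations were not met."""
    gaps = [r[actual_key] - r[expected_key] for r in rows]
    return sum(gaps) / len(gaps)


table = cross_tabulate(responses, "group", "satisfaction")
gap = mean_gap(responses, "expected", "actual")

for group, row in table.items():
    print(group, row)
print(f"Mean gap (actual - expected): {gap:.2f}")
```

The cross-tabulation shows how satisfaction is distributed within each respondent group, while the mean gap summarizes in a single number how far the actual ratings fall short of (or exceed) what respondents expected.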
A questionnaire needs to be validated to ensure its suitability for any specific sample survey.
1. Validity by experts: The contents and sequence of items in the questionnaire are reviewed
by experts who are well-versed in the topic of research so that the effectiveness and
completeness of the questionnaire are ensured. Subsequently, the questionnaire is examined
for errors/difficulties/confusion in getting true responses from respondents to various
questions.
2. Pilot-testing: It is done on a subset of the intended population. The pilot sample size
should increase with the number of questions in the questionnaire. Pilot-testing helps to remove
irrelevant/confusing questions from the questionnaire. Additionally, the consistency of answers
to negatively phrased questions with answers to positively phrased questions can be assessed
while entering the data collected under the pilot study. In some cases, reversal of scale values
for negatively phrased questions may be required. Necessary cleaning of data can also be done
at this stage.
3. Principal component analysis (PCA): It helps to identify underlying components. The
component or factor loadings tell us what factors are being measured through the questions.
Questions that measure the same thing should load onto the same factor. Factor loadings range
from −1 to +1. Questions whose loadings on a factor are 0.60 or higher in absolute value can be
grouped. In some cases, a question will not load onto any factor. One must determine what each
factor represents by looking for common themes in the questions that load onto it. If ‘k’ factor
themes are identified, it means the survey is measuring at least ‘k’ things. Finally, questions
loading onto the same factor can be aggregated or combined and compared during the final analysis.
4. Internal consistency of questions loading: It checks the correlation between questions
loading onto the same factor. It is a measure of reliability as it checks whether the responses
are consistent. By including some reverse questions, whose expected answers are opposite to
those of earlier questions, the consistency and vigilance of the respondent can be ascertained.
A standard test for internal consistency is Cronbach’s Alpha (CA). The value of CA normally
ranges from zero to one. A value above 0.60 is acceptable and a value above 0.70 is good. If
the CA value is low, it can be improved by dropping one or two questions.
5. Revision of questionnaire: The questionnaire has to be revised based on information/
feedback from PCA and CA. A question may be retained even when it does not adequately
load onto a factor, if it is important in the overall framework; it can then be analyzed
separately.
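The reversal of scale values for negatively phrased questions (step 2) and the Cronbach's Alpha check (step 4) can be sketched as follows. The code uses the standard formula CA = [k/(k − 1)] × [1 − (sum of item variances)/(variance of total scores)], where k is the number of questions. The pilot responses, the 1 to 5 scale and the function names are hypothetical.

```python
from statistics import variance


def reverse_score(value, scale_min=1, scale_max=5):
    """Reverse a negatively phrased item, e.g. 2 becomes 4 on a 1-5 scale."""
    return scale_min + scale_max - value


def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, holding that respondent's k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                      # scores grouped by item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical pilot data: 5 respondents, 3 questions on one factor;
# the third question is negatively phrased.
raw = [
    [4, 5, 2],
    [3, 3, 3],
    [5, 5, 1],
    [2, 2, 4],
    [4, 4, 2],
]
cleaned = [[q1, q2, reverse_score(q3)] for q1, q2, q3 in raw]

alpha = cronbach_alpha(cleaned)
print(f"Cronbach's alpha: {alpha:.3f}")
```

If the negatively phrased question were left unreversed, it would correlate negatively with the other two and pull the alpha value down, which is why the reversal is done before the reliability check.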
The primary and most important benefit of employing surveys in research is that the researcher
may get data on the research hypotheses that have been established for the study. Depending on
the intended audience and purpose of the survey, the researcher may pose these questions in a
variety of ways. For a study to be appropriately constructed, planned and carried out to produce
certain results, the researcher must be clear about the goal of the investigation before beginning
to design it. A few things should be considered while planning a survey:
The four basic scales for the measurement of variables (nominal, ordinal, interval and ratio)
are also used in survey research. Depending upon the scales used, the researcher can plan the
nature and type of data analysis.
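How the measurement scale guides the analysis can be illustrated by the choice of a summary measure. The mapping below (mode for nominal data, median for ordinal data, mean for interval and ratio data) is a common convention and is an assumption of this sketch rather than a rule stated in the text; the function name and data are hypothetical.

```python
from statistics import mean, median, mode


def summarize(values, scale):
    """Return summary measures that are meaningful for the given scale."""
    if scale == "nominal":
        return {"mode": mode(values)}
    if scale == "ordinal":
        return {"mode": mode(values), "median": median(values)}
    if scale in ("interval", "ratio"):
        return {"median": median(values), "mean": mean(values)}
    raise ValueError(f"Unknown scale: {scale}")


# Hypothetical survey answers.
transport = ["bus", "car", "bus", "walk"]     # nominal categories
agreement = [1, 2, 2, 3, 5]                   # ordinal, coded Likert responses
income = [10, 20, 30]                         # ratio scale, in thousands

print(summarize(transport, "nominal"))
print(summarize(agreement, "ordinal"))
print(summarize(income, "ratio"))
```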
8.2.6 Survey Research Stages
Survey research design provides easy access to details at a limited cost. Researchers frequently
employ this technique to comprehend and evaluate the current state of affairs and anticipated
future demands in relation to a good or service. Information gathering via a carefully planned
survey inquiry may be far more efficient and fruitful. There are five stages of survey research
design:
Selecting the appropriate survey design may be the key to gathering the data that a researcher
needs to make important decisions based on study findings. Selecting an appropriate topic, the
right questions and a suitable design is crucial. Web-based software like Google Forms makes
every stage of the process simple and effective, including developing survey questions,
sending out questionnaires by email or links and evaluating the results.
• Set SMART goals: What is the survey intended to achieve, how can it be measured
promptly and what are the tentative/expected results?
• Choose the right questions: Prepare the questionnaire by choosing specific questions
relevant to the research.
• Begin the questionnaire with generalized questions: Preferably, start the questionnaire
with general information related to the respondent followed by basic questions on the topic
of study.
• Include topic-related questions: Choose a limited number (15–20) of the best, most
relevant questions, covering multiple-choice, rating scale, open-ended, yes or no type
questions, etc.
• Pre-testing: Once the questionnaire is created, test it and make corrections and
changes to make it more user-friendly.
• Distribution of questionnaire to respondents: Once the survey questionnaire is ready,
distribute it to the right audience through the selected online/offline media for their
response.
• Collect and analyze responses: Collect the filled questionnaires and enter the data in Excel
or a worksheet prepared for data entry, with all the necessary categories
mentioned.
• Prepare the report: Analyze the data and prepare the report in the prescribed report format.
8.3 Interview Methods
The important terms used in the interview method of research are:
The interview-based research can be carried out under different modes or designs. These include:
For a study where information can only be gathered by meeting and establishing a personal
connection with the target audience, a researcher must go for the interview method of data
collection. Using interviews, researchers may stimulate participants and get the detailed input
that they need. In research, there are three basic kinds of interviews:
Structured Interviews:
Structured interviews follow a rigid procedure and leave very little scope for prompting
participants beyond the planned questions. For this reason, a structured interview is sometimes
referred to as a standardized interview and works better in quantitative research. The questions of
the interview are pre-planned based on the specific details that are needed. Survey research
also makes extensive use of structured interviews to ensure consistency across all interview
sessions. The questions can be either open-ended or closed-ended, depending on the kind of
intended audience. Open-ended questions may be used to learn more about the respondent’s
perspective and insight, whereas closed-ended questions can be used to understand user
preferences from a list of possible answers.
Semi-Structured Interviews:
Semi-structured interviews preserve the fundamental interview framework while allowing
the researcher a great amount of flexibility in questioning the respondents. The conversation
between interviewee and researcher is guided, yet the researcher remains free to probe.
As long as the researcher keeps the format in mind, he or she is free to pursue any idea or
make creative use of the whole interview, which avoids the need for numerous rounds of
interviews. Gathering data for a research project often requires further questioning of
respondents, and this format allows it. Semi-structured interviews work best when a researcher
needs in-depth knowledge on a subject but lacks the time for extended research.
Unstructured Interviews:
Unstructured interviews, also known as open interviews, are generally defined as discussions
conducted with the intention of obtaining information for the research project. These
interviews are more like a conversation around a central theme and hence have the fewest
pre-set questions. Most researchers who use unstructured interviews hope to establish a rapport
with their subjects, which increases the likelihood that the subjects will be completely
honest in their responses. Because there are no fixed rules to follow, the researchers can
approach the participants in any ethical way to obtain the information needed for their
research topic. For the same reason, the researcher must be careful to ensure that the
respondents stay focused on the primary goal of the study.
The participant’s interests and abilities should be the main considerations throughout the
interview. The researcher should make every effort to stay within the acceptable boundaries of
the study, and all discussions should take place inside them. The researcher’s expertise and
experience should align with the interview’s objectives. Researchers should understand the dos
and don’ts of unstructured interviews.
Depending on the requirement of the research study, a research interview can be conducted in
any of three ways: in person, over the phone, or by email or through a website.
Choosing the type of interview best suited for data collection for research work depends on
the objective of the research.
(i) Online Focus Group: It is a subset of internet research methods that are employed in
consumer, political and business-to-business (B2B) research. The task of leading and
supervising the focus group falls to the moderator, who extends invitations to pre-selected
and qualified participants who fit a certain interest area to participate at a specified time.
Typically, participants get incentives for participating in the conversation.
(ii) Online Interview: Although the required standard practices, communication with
respondents and sampling methods differ from those of in-person interviews, the two
approaches are still quite similar. A variety of computer-mediated communications (CMC),
primarily SMS or email, are used to arrange online interviews. These
interviews are divided into synchronous and asynchronous approaches based on the
response time.
Asynchronous online interviews take place by email, and the replies are typically not
received in real time, whereas synchronous online interviews are conducted through
platforms like online chat. Similar to in-person interviews, online interviews delve into the
opinions and thoughts of participants about certain subjects to gain an understanding of
their backgrounds, perspectives and dispositions.
(iii) Online Qualitative Research: Besides the popular online focus groups and online
interviews, there are other forms of online research, particularly qualitative research. These
forms include communities, blogs and mobile diaries. These techniques are very practical
for collecting research data and help save money and time.
Because respondents may be recruited through surveys, panels or already-existing
databases, online qualitative research methods can offer a higher degree of sophistication than
traditional approaches.
(iv) Online Text Analysis: This method is an extension of text analysis. It brings together
different online research techniques used to extract knowledge
from information that is accessible online. It allows researchers to describe written, spoken
or visual communication in units such as documents, sentences, paragraphs,
quasi-sentences and web pages. Although
it is most frequently employed for quantitative research, researchers also employ
qualitative approaches to improve text interpretation.
(v) Social Network Analysis: The growing popularity of social networking platforms has led
to the acceptance of social network analysis as a newer online research method. Using graph
theory, a researcher can map and quantify flows and interactions between individuals,
groups, organizations, URLs and computers.
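The graph-theory idea behind social network analysis can be illustrated with a very small network. In this sketch, individuals are nodes, interactions are edges, and degree centrality (the share of other members a person is directly connected to) is one simple way to quantify interactions. The four-person network and the function names are hypothetical, and degree centrality is only one of several measures a researcher might choose.

```python
from collections import defaultdict

# Hypothetical interactions between four individuals.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_graph(edge_list):
    """Store an undirected network as node -> set of neighbours."""
    graph = defaultdict(set)
    for u, v in edge_list:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes to which each node is directly connected."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


graph = build_graph(edges)
centrality = degree_centrality(graph)

for node in sorted(centrality):
    print(f"{node}: {centrality[node]:.2f}")
most_connected = max(centrality, key=centrality.get)
print("Most connected:", most_connected)
```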
• A survey is the combination of questions, processes and methodologies that analyze data
about participants.
• The survey research tool for data collection consists of methods like online/emails, phones,
face-to-face interaction. A survey can be longitudinal or cross-sectional.
• Steps in survey research methods are to decide on survey questions, finalize a target
audience, conduct surveys via the chosen mediums and analyze survey results.
• In survey research nominal, ordinal, interval and ratio scale-based data can be collected.
• Surveys allow privacy/secrecy of respondents and responses, and are applicable to both
quantitative and qualitative research.
• The stages of survey methods are: fixing aims and objectives, selecting a sample from a target
population, identifying the survey method, designing the questionnaire, conducting the
survey, collecting data and analyzing responses.
• A research survey can be made successful by setting SMART goals, choosing the right
questions, starting with a few simple and general questions, offering varied answer options
(especially more questions with a ‘yes or no’ option), keeping the survey instruments
and electronic devices intact, distributing the questionnaire to respondents on time, collecting
and analyzing the data and finalizing the report.
• In interview research, the investigator has to move directly or indirectly from one
subject of study to another to get the required data through direct or indirect contacts.
• Interview Schedule/Questionnaire: An interview schedule is a structure used to gather the
data needed for any research. While the respondents often fill out the questionnaire, the
investigators typically fill up a schedule by talking with respondents or by asking questions.
• Validity of the questionnaire can be established using (i) face validity for content and
sequence by experts, (ii) a pilot study on a small group to assess lack of clarity, if any,
(iii) factor loadings through principal component analysis to group questions by factor,
(iv) a reliability test through Cronbach’s Alpha and (v) revision based on feedback from (i) to (iv).
• Interview-based research can be carried out under different modes or designs like the
questionnaire-based direct interview method, schedule-based indirect interview method,
internet-based Google Form method, telephonic interview method, etc.
• The types of interview research can broadly be grouped as structured interviews,
semi-structured interviews and unstructured interviews.
• Structured interviews are also known as standardized interviews and are more useful for
quantitative research. Questions in this interview are pre-decided according to the required
detail of information.
• While retaining the fundamental interview structure, semi-structured interviews provide the
researcher with a great deal of flexibility in questioning the participants.
• Open or unstructured interviews are conversations that are carried out with the intention of
gathering information for a research project.
• Research interviews can be conducted in three different ways: in person, over the phone, or
by email or through a website.
• There are five online research methods: online focus group, online interview, online
qualitative research, online text analysis and social network analysis.
9 Qualitative Research Designs
9.1 Introduction
In qualitative research, non-numerical information or data, such as verbal or written materials,
are gathered and analyzed to examine people’s ideas, perceptions, experiences, insights and
other aspects of a given research topic. The primary method of collecting data for
qualitative research is conversational or open-ended dialogue. This approach looks at ‘what’
individuals think about the things they do and ‘why’. Social science, business studies,
marketing research and other fields can benefit greatly from qualitative research. It focuses on
how people perceive, understand and behave in relation to one another. Qualitative research
methodologies were created to throw light on how the concerned audience
behaves and perceives a certain issue. Researchers also use qualitative methods
to capture in-depth, non-numerical information. Qualitative approaches yield more
descriptive results, and the data readily allow for the drawing of conclusions. The social and
behavioural sciences are the main fields that use qualitative research methodologies. Because of
the complexity of today’s environment, it can be challenging to comprehend how others think
and see situations. As online qualitative research methods are more detailed and expressive,
they facilitate such understanding.
An important approach for doing qualitative research is conducting in-person interviews. The
interviews are taken one respondent at a time. This approach is solely verbal and allows for
opportunities to get detailed information from the respondent.
The advantage of this approach is that it offers an excellent opportunity to collect accurate
information on people’s beliefs and motives. An experienced researcher can get insightful data
by posing the appropriate questions to the respondents. If the researchers require further data,
they should ask follow-up questions that will facilitate the collection of additional information.
The interviews can be held in person or by telephone. However, in-person interviews are more
effective as the researcher gets to observe the body language of the respondent and can match
it with the responses.
DOI: 10.4324/9781003527183-9
9.2.2 Focus Groups
Focus groups are a frequently used technique for collecting information from a group for
qualitative research. A focus group is usually made up of six to ten participants from the
target group. Finding answers to the questions ‘why’, ‘what’ and ‘how’ is the primary goal of
the focus group. The main benefit of focus groups is that the researchers do not always have to
speak with the participants face-to-face. Focus groups may now receive online surveys on a
variety of devices, and the replies can be collected for additional processing.
The focus group approach is among the more costly online qualitative research
methodologies. Focus groups are generally used to clarify intricate procedures. Testing of new
products or new concepts and market research can benefit greatly from focus groups.
This approach uses pre-existing, credible documents and related information sources as the data
source. Fresh research can take advantage of such data. It is like visiting a library, where one
can go through books and other publications to get data that are likely to be needed in the
study.
9.3 Process of Observations
It is a research process that uses subjective methodology to gather systematic information
or data. The main purpose of this method is to compare disparities in quality. The five main
senses—sight, smell, touch, taste and hearing—as well as how they work are the subjects of
qualitative observation. This is about traits or explanations rather than measures or statistics.
Researchers investigate a phenomenon or event based on the lived experiences of the
participants involved in it, as described by them. Discussions, interviews, observation,
surveys, etc. can be used to collect information on the phenomenon. In business, the
selling methods used by sales representatives can be assessed using this method.
The most thorough observational method for studying people in their natural settings is
ethnographic research. Here the researchers are required to adjust to the natural settings of the
study population, which may be any place from a large city to an organization or an isolated
place. Geographic restrictions may pose a problem for data collection. The goal of this study
design can be to understand the cultures, difficulties, motives and environments that arise.
Rather than depending just on conversations and interviews, the researchers actually visit the
natural environments. Because of its intensive observational methodology and on-site data
collection, this sort of study might span several days to many months. It is a difficult
and time-consuming process that depends heavily on the researcher’s expertise to observe,
analyze and infer from the situation. It mostly relates to the collection and analysis
of data about cultural groups and is used in studies on the life processes of people. Researchers
become part of the group being studied and learn its culture by living with its members. The
aim is basically to understand people’s behaviour in peculiar circumstances.
Example: An ethnographic study on the features, critical attributes, processes and benefits of
self-help groups (SHGs) of women living with liquor-addicted husbands.
Researchers collect data to study social processes and social structures and develop theories
inductively. It is basically to know the reasons behind actions being taken by people and to
develop theoretical models based on existing data in existing modes of genetic, biological or
psychological sciences. In business, this method is used for customer satisfaction surveys as to
why consumers use the company’s product which in turn helps the company to maintain cus-
tomer satisfaction.
Past events are critically analyzed to understand the present and to anticipate future choices.
The experience of the past is used to devise newer methods for more effectiveness in the future.
In this method, the researchers share stories to understand how participants perceive and make
sense of their experiences. It takes subjects to a starting point and reviews situations as problems
occur during the course. The business can make use of the results to devise innovations that
appeal to the target markets.
• Qualitative research relates to the collection and analysis of non-numerical data like verbal
or documented information to study concepts, perceptions, experiences, etc. of people.
• Qualitative research focuses on obtaining data through open-ended or conversational
communication.
• The strength of qualitative research is its ability to provide complex textual descriptions of
how people experience a given research issue.
• Qualitative research methods include in-depth interviews, focus groups, ethnographic
research, content analysis, case study research, etc.
• The three most common qualitative methods are participant observation, in-depth interviews
and focus groups.
• Qualitative research aims to study social and cultural phenomena based on the live experi-
ence of participants and it is an inductive approach to developing new concepts.
• Qualitative research designs emerge during the course of studies and not in advance.
• The in-person interview is one of the most commonly used techniques in qualitative methods.
Only one respondent is interviewed at a time.
• For the focus group, around six to ten respondents from the target population are taken as
a focus group. The main aim of the focus group is to find suitable answers to the research
questions.
• Record-keeping method uses reliable existing documents as the data source.
• The five sensory organs of seeing, smelling, tasting, hearing and touching are used in qualita-
tive observation.
• Data collection methods in qualitative research include written expression by the partici-
pants, observation by the researcher and interactive interviews with the participants by the
researcher.
• Ethnographic research is the study of the life process of people. Researchers become part of
the people who are studied to know their culture by living with them.
• Grounded theory is basically to know the reasons behind actions being taken by people and
to develop theoretical models based on existing data in existing modes of genetic, biological
or psychological sciences.
• In the historical model method, past events are critically analyzed to understand the present
and to anticipate future choices.
• Case study is a specific research methodology involving a close-up, in-depth and detailed
examination of a particular case or cases within the real world.
10 Design of Sample Surveys
10.1 Sampling Technique
Sampling is a scientific statistical technique widely used when one has to make a decision about
a population, universe or aggregate based on the evidence given by a representative sample
drawn from that population. Sampling is the method of choosing a part of the population to
represent it. It has a wide range of applications in different research methods like experimen-
tal, semi-experimental, observational, epidemiological, survey, interview and others in various
branches of studies. The decision taken by a doctor on the nature and type of illness of a person
based on a test of a few drops of blood taken from the person concerned, or tasting a bit of
vegetable being cooked are all day-to-day examples of decisions about a large universe based
on a small sample taken from it. Most of the research based on cross-sectional studies makes
use of scientific sampling techniques to make decisions about the population/universe under
study based on evidence from the sample. Hence sample-based studies form a major approach
of research in economics, sociology, education, commerce, agriculture, epidemiology, clinical
and other health-related studies. The aim of all sample-based studies is to estimate population
parameters that are either unknown or need updating. Obtaining the value of a population
parameter directly is tedious because of the population's large size and widespread nature, the
cost of collecting information from every unit and the time needed to collect and process the
data. In such cases, sample-based estimation of population parameters is a feasible option, as the
agreement of the sample estimate with the population parameter can be assessed using the
theory of probability-based statistical inference. Sample-based survey research is carried out in
almost all disciplines, especially in social sciences, commerce, business, management,
economics, etc. The use of an appropriate sampling plan, collection of quality data from a sample
of appropriate size and use of the most appropriate quantitative techniques are the
prerequisites for applied quality research.
10.3 Advantages of Sampling
A randomly selected sample, which is a true representative of the population under study, makes
it possible to get information about large populations with less cost, less field time and more
accuracy. Sampling methods are used when it is not possible to study the whole population due
to cost, time or other factors. Sample-based studies have obvious operational convenience over
a study of the entire population. It is also possible to make probability-based inferences about
population parameters using sample statistics. In other words, generalization of results based on
sample studies is possible for randomly selected samples.
10.4 Methods of Sampling
Broadly speaking, all sampling methods can be grouped into probability samples and
non-probability samples. The different sampling methods under these two broad groups
are shown in Figure 10.1.
10.5 Probability Sampling
Probability sampling is also known as random sampling and is a sampling technique in which
a sample from a population is selected using the method based on probability theory and every
unit in the population has a known probability of getting selected in the sample. Probability
sampling makes it possible to estimate population parameters from sample data and confirm
their validity through testing of the hypothesis.
[Figure 10.1: Sampling methods. Probability/random sample: (1) simple random sample, (2) stratified sample, (3) systematic sample, (4) cluster sample, (5) multi-stage sample. Non-probability/purposive sample: (1) judgement sample, (2) convenience sample, (3) snowball sample, (4) quota sample.]
In simple random sampling (SRS), every unit in the population has an equal (and hence known)
probability of being included in the sample. It is applicable when units are more or less
homogenous in the population and when the size of the population is known and a list of units
is available. When units with a larger size have a larger probability of being included in the
sample, it is called probability proportional to size (PPS) sampling (with or without replacement).
Example: Selection of households from a tribal area for nutritional/family planning adoption/
immunization level of children, etc.
Stratified sampling is the most commonly used sampling plan when the population is
heterogeneous and can be grouped into strata/classes which are homogenous within the classes.
Random samples are drawn independently from each stratum. Let N1, N2, ..., Nk be the sizes of
the strata in a population of size N (N1 + N2 + ... + Nk = N); then samples of sizes n1, n2, ..., nk
are drawn from the respective strata so that the total sample size is n (n1 + n2 + ... + nk = n).
This sampling plan has the advantage of ensuring the representation of each group in the
sample and of giving a more precise estimate of population parameters. It also generates
stratum-wise as well as overall estimates of the parameters under study. The allocation of sample
size to strata can be equal or proportional to the size of strata in the population.
Examples: Students according to classes, patients according to disease, households based on
social class, etc.
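The proportional allocation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the largest-remainder rounding rule and the use of Python's `random` module are assumptions made for the example, not part of the text.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of size n to strata in proportion to N_i / N."""
    N = sum(strata_sizes.values())
    exact = {name: n * size / N for name, size in strata_sizes.items()}
    # Round down first, then hand out the remaining units to the strata with
    # the largest fractional parts, so the allocations add up to exactly n.
    alloc = {name: int(share) for name, share in exact.items()}
    shortfall = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:shortfall]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata_units, n, seed=None):
    """Draw an independent simple random sample from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata_units.items()}
    alloc = proportional_allocation(sizes, n)
    return {name: rng.sample(units, alloc[name]) for name, units in strata_units.items()}

# Hypothetical strata: three classes of 500, 300 and 200 students, n = 100.
allocation = proportional_allocation({"A": 500, "B": 300, "C": 200}, 100)
```

With equal allocation, each stratum would instead simply receive n/k units.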
10.5.3 Systematic Sampling
It is used when some order exists in population units and the population is finite and known in
size (serially arranged medical records, patients visiting an OPD). In this plan the first unit is
selected at random and subsequent units are selected according to some pre-determined rule.
Let N = nk (N is the size of the population and n is the size of the sample, so that k = N/n);
then one unit is selected at random, say the rth unit, from the first k units in the population.
The units at positions r, r + k, r + 2k, ..., r + (n − 1)k will form the sample of size n.
Example: Selection of sample patients in a hospital during a certain period when the total
number of patients to arrive (N) by the end of the study period is known. If n is the sample,
k = N/n is worked out and one number is randomly selected (r) which is known as a random
start. Therefore, rth patient arriving for registration will be the first sample unit and subsequent
units will be r + k, r + 2k, from the registration list.
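The selection rule for systematic sampling can be sketched as follows. The function name and the handling of a population size that is not an exact multiple of n (integer division for k) are assumptions made for the illustration.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select units r, r + k, r + 2k, ... from an ordered list of units."""
    N = len(units)
    k = N // n                 # sampling interval; the text assumes N = n * k
    rng = random.Random(seed)
    r = rng.randint(1, k)      # random start among the first k units (1-based)
    return [units[r - 1 + j * k] for j in range(n)]

# Hypothetical registration list of 100 patients, sample of 10 (so k = 10).
registration_list = list(range(1, 101))
sample = systematic_sample(registration_list, 10, seed=42)
```

Only the random start r is left to chance; once it is fixed, the remaining units follow from the interval k.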
10.5.4 Cluster Sampling
The cluster sampling plan is used when the population under study is widespread and hence the
cost of collecting data from selected sample units will be higher. It may be possible to group
the final units of study as clusters. In cluster sampling, clusters having a group of final sampling
units will be selected randomly and all units in the selected cluster will be included in the final
sample. The variability in population will be addressed by selecting a greater number of clus-
ters. The data from all units of selected clusters will be collected and processed.
Example: For a district-level study of a rural health problem based on households, the rev-
enue villages which are well-defined can be treated as a cluster. All the households of selected
villages will form the final sample. So, the researcher must randomly select the required number
of villages to meet the sample size.
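A minimal sketch of cluster selection is given below, assuming the clusters (villages) are held in a dictionary that maps each cluster name to the list of its households. The data structure and the function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters at random and keep every unit of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical villages with their household identifiers.
villages = {
    "Village A": ["A1", "A2", "A3"],
    "Village B": ["B1", "B2"],
    "Village C": ["C1", "C2", "C3", "C4"],
}
selected = cluster_sample(villages, 2, seed=7)
final_sample = [household for units in selected.values() for household in units]
```

The final sample size is not fixed in advance here; it depends on the sizes of the clusters that happen to be selected.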
10.5.5 Multi-Stage Sampling
For a study covering a large area (a state or a country), it is tedious to consolidate a list of all
sampling units and to select a random sample from it. In such cases, multi-stage sampling can be
used, in which the ultimate sampling units are selected in stages.
For example, for a household-level health survey in a state, the selection of the sample can
be made in stages of districts, then blocks, then villages and finally households as ultimate
sampling units.
All the sampling plans mentioned above fall under a random sample and have the scope to
generalize results after the test of significance.
10.6 Purposive Sampling
The non-random samples, or purposive samples, are generally used to generate quick estimates
of unknown parameters which are required in many exigencies, especially in agriculture, health
and medical sciences, natural calamities, etc. The application of probability theory is not pos-
sible in purposive sampling and hence generalization of results is not feasible. However, the pre-
liminary information and quick results of practical utility can be generated for many purposes.
10.6.1 Snowball Sampling
Snowball sampling is used when a sampling frame is not readily available. As time passes, the
size of the sample goes on increasing like a rolling snowball. It is usually used for studies on
populations like sex workers, HIV patients, drug addicts, etc. The available units make it pos-
sible to have more new units as the study progresses due to their association with persons of
similar type. This method is also called the chain-referral sampling method. This sampling tech-
nique can go on and on, just like a snowball, increasing in size (in this case the sample size) till
the researcher has enough data to analyze and draw conclusive results.
10.6.2 Quota Sampling
The researchers select a sample according to certain traits or qualities. The population is ini-
tially segmented into mutually exclusive subgroups. Then based on judgment, units are selected
from different segments of a population when the sample frame is not readily available for the
population.
10.6.3 Judgement Sampling
The sample is selected based on the judgement of the investigator. In judgement sampling, the
sample units are chosen only on the basis of the researcher’s knowledge and judgment. It ena-
bles us to select cases that will answer the research question(s) and meet the objectives of the
study. Hence it is a purposive sampling method.
10.6.4 Convenience Sample
The sample is selected at the convenience of the investigator in terms of approach or contact.
The application of statistical inference is valid in the case of a random sample. In other
words, the generalization of results emerging out of the sample is more valid in the case of a
random sample.
10.7 Sample Size
Sample size means the number of sampling units selected/needed for a study. Sample size
determination is the method of selecting the optimum number of sampling units from which to
collect data/observations. An adequate sample size is important for making valid findings from
the sample that can be generalized to the population under study. Other things being equal, the
larger the sample size, the more precise the findings from a study. Generally, if the sample size is
30 or more (n ≥ 30), it is considered a large sample. For large samples, the sampling distributions
of statistics are approximately normal (Z distribution). If the sample size is less than 30, small
sample techniques are used. Sample size determination has a direct bearing on statistical
inference, both estimation and testing of hypothesis. Two questions arise in this context:
(i) Which sampling technique is to be used for a particular nature and type of population?
(ii) What would be the appropriate sample size to estimate precisely the population parameters?
The answer to the first question can be obtained from the description of the situations explained
above. Depending upon the required precision of the estimate to be obtained, the size of a sam-
ple for estimating the mean value or a proportion/prevalence can be worked out. A sample larger
than what is scientifically needed is a simple waste of time and resources. Similarly, a sample
smaller than what is theoretically required will pose problems to the precision of the estimate.
Before collecting data, it is important to determine how many samples are needed to perform
a reliable analysis. Sample size determination is the statistical assessment of the number of
population units to be included in the sample for the study. The sample size must be adequate
to represent the population. The determined size should be optimum and must be obtained by
the scientific method.
10.7.2 Factors Affecting the Size of the Sample
The sample size for a study depends on many factors. Some of these are:
• Nature of the Population: If the population is homogenous, fewer cases will be enough. If
the population is heterogeneous, a large size sample will be required.
• Availability of Resources: The resources available including time and money are to be con-
sidered before determining the size of the sample. Large samples can be taken if sufficient
time and money are available.
• Type of Sampling Method: If the sampling method is restricted to random sampling, a mod-
erate sample will be enough. In simple random sampling, more numbers may be selected to
ensure the representation of all. Cluster sampling demands more samples when compared to
stratified sampling.
• Degree of Accuracy Required: If a higher degree of accuracy is required, the size of the
sample should be large.
• Nature of Analysis: Sample size may be influenced by the statistical tools and tests a researcher
plans to use for the analysis. Complex multivariate statistical analysis needs larger samples.
• Factors Used for Sample Size Calculation: Sample size is influenced by the factors used
for sample size calculation such as size of population, margin of error, confidence level,
extent of variability, etc.
• Margin of Error: The margin of error is a statistic expressing the amount of random sampling
error in the results of a survey. The margin of error decreases with an increase in sample size.
• Confidence Level: It is the probability that a population parameter lies within a given margin
of error with respect to the sample estimate.
Optimum Sample Size: The determined size should be optimum and must be obtained by
scientific method. An optimum sample for a study may be defined as that sample which fulfils
the requirements of efficiency, representativeness, reliability and flexibility. The sample should
be small enough to avoid unnecessary expenses and large enough to reduce sampling errors.
The larger size can lead to ethical concerns, time consumption and financial wastage. Smaller
sample sizes may cause misrepresentation, inefficiency and insignificant results. An optimum
sample size is required to allow for appropriate analysis, to provide the desired level of accuracy
and to allow validity of significance test. The common factors considered for the sample size
calculations for different research designs are:
• The research hypothesis, null hypothesis (H0) and alternative hypothesis (H1) of the research
study.
• The two types of errors, Type I error and Type II error, and their probabilities (α) and (β).
• The precision of the estimate as interval estimate = estimate ± reliability coefficient × SE.
• If the study variable X follows N(μ, σ) and $\bar{X}$ is the sample mean of X, then the confidence interval for μ is $\bar{X} \pm Z_{1-\alpha/2}\,\sigma/\sqrt{n}$.
• If p is the sample estimate of the population proportion P and q = 1 − p, then the confidence interval for P is $p \pm Z_{1-\alpha/2}\sqrt{pq/n}$.
• Arbitrary Approach: It is the rule of thumb method which specifies a fixed percentage of
the population as sample size. According to this approach, a sample may be at least 5% of the
population and 10% is the ideal.
• Conventional Approach: According to this approach, the average sample size used in simi-
lar studies can be taken as sample size.
• Statistical Analysis Requirements Approach: Here, the sample size is determined based
on the proposed statistical analysis considerations. Sample size is determined by the require-
ments of the proposed statistical techniques for the analysis.
• Cost-Benefit Basis Method: In this approach, the sample size is determined based on the
availability of resources and the benefits expected. Generally, it is considered ideal for
non-probability sampling methods.
• Confidence Interval Approach: It applies the concept of variability, sampling distribution
and standard error of the estimate. Several formulas have been suggested by statisticians. The
elements generally needed to determine the sample size include the amount of variability in
the population, desired accuracy (acceptable sampling error), level of confidence or precision
level, etc.
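As a small illustration of the confidence interval approach, the interval estimates for a mean and for a proportion (estimate ± reliability coefficient × SE) can be computed as below. The function names are hypothetical, and the reliability coefficient is taken from the standard normal distribution through Python's `statistics.NormalDist`, which is an implementation choice for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence=0.95):
    """Standard normal reliability coefficient Z at 1 - alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_mean(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean: xbar +/- Z * sigma / sqrt(n)."""
    half_width = z_value(confidence) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def ci_proportion(p, n, confidence=0.95):
    """Confidence interval for a proportion: p +/- Z * sqrt(p * q / n)."""
    half_width = z_value(confidence) * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical values: sample mean 50, sigma 10, n = 100.
low, high = ci_mean(50, 10, 100)   # roughly 48.04 to 51.96
```

The half-width shrinks in proportion to 1/√n, which is why the required precision drives the sample size.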
Every researcher must identify the primary and secondary outcome variables of the study and
state the research hypothesis in order to assess the required sample size. The outcome variable
can be continuous or discrete, depending on whether it is a measurement or a count. The
researcher must identify a few published research papers comparable to the research topic under
study. The proportion/prevalence, mean and standard deviation of the outcome variable are taken
from these comparable published papers. When the standard deviation of the variable under
study is not available, at least the possible range of the data (maximum − minimum) as per
scientific norms may be used (since range ≈ 6 SD).
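For estimating a mean, the precision d and the sample size n are related by

$$d^2 = \frac{Z_\alpha^2 \sigma^2}{n} \quad \text{or} \quad n = \frac{Z_\alpha^2 \sigma^2}{d^2}$$

For estimating a proportion or prevalence,

$$n = \frac{Z_\alpha^2\, p q}{d^2}$$

Where, $Z_\alpha$ = standard normal value at the α level of significance (= 1.96 at α = 0.05, i.e. the 5% level of significance),
σ = standard deviation of the outcome variable (from comparable published studies, or approximated as range/6),
p = prevalence rate of the disease under study or proportion/probability in favour of a binary outcome
variable (to be obtained from other similar comparable published studies or secondary data
sources),
q = 1 − p as p + q = 1; d = precision level of the estimate (the margin of error on either side of
the estimated value, to be decided by the researcher or research team).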
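The two formulas for estimating a mean and a proportion can be turned into a short calculation. The sketch below uses Z = 1.96 as in the text. The function names and the rounding up to the next whole unit are assumptions made for the example.

```python
from math import ceil

def sd_from_range(minimum, maximum):
    """Rough SD when only the range is known (range is about 6 SD)."""
    return (maximum - minimum) / 6

def sample_size_mean(sigma, d, z=1.96):
    """n = Z^2 * sigma^2 / d^2, rounded up to a whole unit."""
    return ceil(z ** 2 * sigma ** 2 / d ** 2)

def sample_size_proportion(p, d, z=1.96):
    """n = Z^2 * p * q / d^2 with q = 1 - p, rounded up to a whole unit."""
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

# Hypothetical inputs: sigma = 10 with precision d = 2, and
# prevalence p = 0.30 with precision d = 0.05.
n_mean = sample_size_mean(sigma=10, d=2)            # 96.04 rounds up to 97
n_prop = sample_size_proportion(p=0.30, d=0.05)     # 322.69 rounds up to 323
```

Halving d quadruples the required sample, since d enters the formula as a square.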
The sample size for an RCT-based study is worked out by assuming the level of significance
or probability of Type I error (α = 0.05 or 0.01); the probability of Type II error (β = 0.20 or
β = 0.10), so that the power of the test is 1 − β; a reasonably assessed minimum important
difference (MID) in outcome values that would be worth detecting; the population standard
deviation (σ) based on other studies of comparable type; a carefully defined outcome of interest
for the study; equal allocation of study participants (n) between treatment and control groups;
and the statistical null hypothesis of equivalence of the outcome variable for the treatment and
control groups.
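For a continuous outcome variable (comparison of two means), the sample size n (per group) is

$$n = \frac{(Z_{\alpha/2} + Z_\beta)^2 \times 2\sigma^2}{(\mu_2 - \mu_1)^2}$$

For a binary outcome variable (comparison of two proportions),

$$n = \frac{2 (Z_{\alpha/2} + Z_\beta)^2\, p_m q_m}{(p_1 - p_2)^2}$$

where $p_m = \frac{p_1 + p_2}{2}$; $q_1 = 1 - p_1$; $q_2 = 1 - p_2$ and $q_m = 1 - p_m$.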
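A minimal sketch of the two RCT formulas above follows, with the commonly used values 1.96 for the 5% level of significance and 0.84 for 80% power as defaults. The function names, the default values and the rounding up are assumptions for illustration only.

```python
from math import ceil

def rct_size_means(sigma, mu1, mu2, z_alpha_2=1.96, z_beta=0.84):
    """n = (Z_a/2 + Z_b)^2 * 2 * sigma^2 / (mu2 - mu1)^2."""
    return ceil((z_alpha_2 + z_beta) ** 2 * 2 * sigma ** 2 / (mu2 - mu1) ** 2)

def rct_size_proportions(p1, p2, z_alpha_2=1.96, z_beta=0.84):
    """n = 2 * (Z_a/2 + Z_b)^2 * p_m * q_m / (p1 - p2)^2."""
    p_m = (p1 + p2) / 2
    q_m = 1 - p_m
    return ceil(2 * (z_alpha_2 + z_beta) ** 2 * p_m * q_m / (p1 - p2) ** 2)

# Hypothetical inputs: sigma = 10 and a minimum important difference of 5;
# response proportions of 0.5 and 0.3 in the two groups.
n_continuous = rct_size_means(sigma=10, mu1=20, mu2=25)    # 62.72 rounds up to 63
n_binary = rct_size_proportions(p1=0.5, p2=0.3)            # 94.08 rounds up to 95
```

Raising the power to 90% (z_beta = 1.28) increases both sample sizes.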
10.7.3.4 Sample Size for Case-Control Studies Based on Qualitative Variable
(Cancer as Disease and Smoking as Risk Factor)
In case-control studies, cases (the group with the disease) are compared with controls (a group
comparable in all respects except the disease) to assess the risk factor (smoking) for having the
disease (cancer). The sample size n is calculated as:
$$\text{Sample size } n = \frac{r + 1}{r} \times \frac{p^{*} q^{*} (Z_\beta + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$$

Where, r = ratio of size in control to size in cases (r = 1, when size of case and control is same);
$p_1$ = proportion of exposure to risk factors in cases (from previous studies);
$p_2$ = proportion of exposure to risk factors in control (from previous studies), normally $p_1 > p_2$;
$p^{*}$ = average of proportions exposed in case group and control group, $p^{*} = \frac{p_1 + p_2}{2}$;
$q^{*} = 1 - p^{*}$; $Z_\beta$ = SND variate at the power of the test (0.84 for 80% power and 1.28 for
90% power) and $Z_{\alpha/2}$ = SND variate at the α level of significance (1.96 at 5% and 2.58
at 1% level).
Suppose that the researcher is interested in studying diabetes in adulthood (as the case)
associated with birth weight (a quantitative variable) as the risk factor.
Birth weight is a continuous quantitative variable. The researcher may start with adults
with diabetes as cases and non-diabetic adults as controls. Both groups are then traced back for
their birth weight.
$$\text{Sample size } n = \frac{r + 1}{r} \times \frac{\sigma^2 (Z_\beta + Z_{\alpha/2})^2}{d^2}$$

Where, r = ratio of size in control to size in cases (r = 1, when size of case and control is
same),
σ = standard deviation of childhood birth weight (from previous studies),
d = mean difference in case and control of childhood birth weight (from previous studies),
$Z_\beta$ = SND variate at the power of the test (1 − β), that is, (0.84 for 80% power and 1.28 for
90% power),
$Z_{\alpha/2}$ = SND variate at the α level of significance (1.96 at 5% and 2.58 at 1% level).
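Both case-control formulas (for a qualitative and for a quantitative risk factor) share the factor (r + 1)/r, as the following sketch shows. The function names, the default Z values and the rounding up are assumptions for illustration. The numerical inputs are hypothetical and are not taken from any study.

```python
from math import ceil

def case_control_size_proportions(p1, p2, r=1.0, z_alpha_2=1.96, z_beta=0.84):
    """n = ((r + 1) / r) * p* q* (Z_b + Z_a/2)^2 / (p1 - p2)^2."""
    p_star = (p1 + p2) / 2
    q_star = 1 - p_star
    return ceil((r + 1) / r * p_star * q_star * (z_beta + z_alpha_2) ** 2 / (p1 - p2) ** 2)

def case_control_size_means(sigma, d, r=1.0, z_alpha_2=1.96, z_beta=0.84):
    """n = ((r + 1) / r) * sigma^2 (Z_b + Z_a/2)^2 / d^2."""
    return ceil((r + 1) / r * sigma ** 2 * (z_beta + z_alpha_2) ** 2 / d ** 2)

# Exposure (smoking) in 50% of cases and 30% of controls.
n_equal = case_control_size_proportions(0.5, 0.3)           # r = 1 gives 95
n_two_controls = case_control_size_proportions(0.5, 0.3, r=2)   # r = 2 gives 71

# Birth weight with sigma = 1 kg and a mean difference d = 0.5 kg.
n_birth_weight = case_control_size_means(sigma=1.0, d=0.5)  # 63
```

Taking more controls per case (a larger r) lowers the value of n given by the formula.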
In cohort studies, healthy subjects with and without exposure to risk factors are observed. The
researcher will start with two groups, one exposed to risk factors and another not exposed to risk
factors. Both groups are followed up for having cases either in the past (retrospective cohort) or
in the future (prospective cohort). The sample size is calculated as:
$$n = \frac{\left[ Z_\alpha \sqrt{\left(1 + \frac{1}{m}\right) \bar{p}\,\bar{q}} + Z_\beta \sqrt{\frac{p_0 (1 - p_0)}{m} + p_1 (1 - p_1)} \right]^2}{(p_0 - p_1)^2}$$

$Z_\alpha$ = SND variate at the α level of significance (1.96 at 5% and 2.58 at 1% level),
$Z_\beta$ = SND variate at the power of the test (1 − β), that is, (0.84 for 80% power and 1.28 for
90% power),
$p_0$ = probability of case/event in control,
$p_1$ = probability of case/event in risk factor exposed group,
m = number of control subjects per unit of risk-exposed group,

$$\bar{p} = \frac{p_1 + m p_0}{m + 1}, \qquad \bar{q} = 1 - \bar{p}$$
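The cohort formula is easier to check when written as code. The sketch below follows the expression term by term. The function name, the default Z values (1.96 and 0.84) and the rounding up are assumptions made for the example.

```python
from math import ceil, sqrt

def cohort_sample_size(p0, p1, m=1.0, z_alpha=1.96, z_beta=0.84):
    """Sample size for a cohort study comparing event probabilities p0 and p1."""
    p_bar = (p1 + m * p0) / (m + 1)      # pooled probability of the event
    q_bar = 1 - p_bar
    term_alpha = z_alpha * sqrt((1 + 1 / m) * p_bar * q_bar)
    term_beta = z_beta * sqrt(p0 * (1 - p0) / m + p1 * (1 - p1))
    return ceil((term_alpha + term_beta) ** 2 / (p0 - p1) ** 2)

# Hypothetical probabilities: 0.30 in the unexposed (control) group and
# 0.50 in the exposed group, with one control per exposed subject (m = 1).
n_cohort = cohort_sample_size(p0=0.30, p1=0.50)   # 92.9 rounds up to 93
```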
The sample size for correlation studies is worked out to test the population correlation coefficient ρ.
Here the null hypothesis is H0: ρ = 0 and the alternative hypothesis is H1: ρ ≠ 0.
The level of significance is α, the power of the test is 1 − β and the assumed sample correlation
coefficient is r. The sample size is calculated as

$$n = \left( \frac{Z_\alpha + Z_\beta}{c} \right)^2 + 3$$

where $c = 0.5 \times \ln\left(\frac{1 + r}{1 - r}\right)$,
$Z_\alpha$ = standard normal variate at the α level of significance,
$Z_\beta$ = standard normal variate at the 1 − β level of power of the test.
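The correlation formula can be worked through numerically as follows. The function name, the default Z values (1.96 for the 5% level and 0.84 for 80% power) and the rounding up are assumptions made for this sketch.

```python
from math import ceil, log

def correlation_sample_size(r, z_alpha=1.96, z_beta=0.84):
    """n = ((Z_a + Z_b) / c)^2 + 3 with c = 0.5 * ln((1 + r) / (1 - r))."""
    c = 0.5 * log((1 + r) / (1 - r))
    return ceil(((z_alpha + z_beta) / c) ** 2 + 3)

# Hypothetical assumed correlation r = 0.3:
# c is about 0.3095, so n is about 84.8, rounded up to 85.
n_corr = correlation_sample_size(0.3)
```

A weaker assumed correlation gives a smaller c and therefore a larger required sample.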
$$\text{Sample size for sensitivity } n = \frac{Z_{1-\alpha/2}^2 \times S_N (1 - S_N)}{L^2 \times \text{Prevalence}}$$

$$\text{Sample size for specificity } n = \frac{Z_{1-\alpha/2}^2 \times S_P (1 - S_P)}{L^2 \times (1 - \text{Prevalence})}$$
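The two formulas for diagnostic studies can be sketched as below. The text does not define L; in this sketch it is taken to be the desired precision of the estimate, which is an assumption, as are the function names, the default Z = 1.96 and the rounding up.

```python
from math import ceil

def sample_size_sensitivity(sn, precision, prevalence, z=1.96):
    """n = Z^2 * SN * (1 - SN) / (L^2 * prevalence)."""
    return ceil(z ** 2 * sn * (1 - sn) / (precision ** 2 * prevalence))

def sample_size_specificity(sp, precision, prevalence, z=1.96):
    """n = Z^2 * SP * (1 - SP) / (L^2 * (1 - prevalence))."""
    return ceil(z ** 2 * sp * (1 - sp) / (precision ** 2 * (1 - prevalence)))

# Hypothetical inputs: expected sensitivity 0.90, expected specificity 0.85,
# L = 0.05 and disease prevalence 0.20.
n_sens = sample_size_sensitivity(0.90, 0.05, 0.20)   # 691.5 rounds up to 692
n_spec = sample_size_specificity(0.85, 0.05, 0.20)   # 244.9 rounds up to 245
```

When both measures matter, a cautious choice is to take the larger of the two values.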
The sequence of steps in the selection of a sample for any study is depicted in Figure 10.2.
• Sampling is a scientific statistical technique widely used when one has to make a decision
about a population or universe or aggregate based on the evidence given by a representative
sample of that population.
• Sample-based studies aim to get estimates of population parameters which are tested for
confirmation.
• Population parameter: The statistical constants like mean, SD, variance and other measures
calculated for a population are called population parameters.
• Sample statistic: Statistical constants like mean, SD, variance and other such measures
calculated using a sample of observations selected from a population are called sample
statistics.
• Sampling makes it possible to get information about large populations with less cost, less
field time and more accuracy.
• The sampling methods can be grouped as probability samples and non-probability samples.
• In simple random sampling, every unit in the population has an equal/known probability
for being included in the sample, applicable when units are more or less homogenous in the
population and when the size of the population is known and a list of units is available.
• Stratified sampling is the most commonly used sampling plan when the population is het-
erogeneous and can be grouped into strata/classes which are homogenous within the classes
and random samples are drawn independently from each stratum.
• Systematic sampling is used when some order exists in population units and the popula-
tion is finite and known in size. The first unit is selected at random and subsequent units are
selected according to some pre-determined rule.
• In cluster sampling, clusters having a group of primary sampling units are selected randomly
and all units in the selected clusters are included in the final sample.
• Multistage sampling: When the area coverage of a study is relatively large, like country
or state, for studies at the household level, direct random selection of households is tedious
for many reasons. Hence the required sample households will be selected at stages like dis-
tricts, blocks, villages and then households from selected villages for which the list will be
available.
• For purposive sampling, random selection criteria are not followed.
• Snowball sampling: This purposive sampling method starts with readily available units
which makes it possible to have more new units as the study progresses due to their associa-
tion with persons of a similar type. This method is also called the chain-referral sampling
method.
• Quota sampling: The population is initially segmented into mutually exclusive subgroups,
then based on judgement units are selected from different segments.
• In judgement sampling, the sample units are chosen only based on the researcher’s knowl-
edge and judgment.
• Sample size: For sample-based studies of the population, the sample size must be optimum.
Based on the nature of the estimate as mean values or proportion or the study variable as dis-
crete or continuous, the optimum number of samples required must be calculated before the
start of the study. The procedure for the selection of samples for different situations is given
in Section 10.7.3.
11 Scaling, Coding and Scoring Techniques in Research
11.1 Scale
Scaling Techniques: Scaling is the method of placing respondents along a continuum of gradually
changing pre-assigned values, symbols or numbers, based on the features of a particular object
and according to defined rules.
All the scaling techniques are based on any or all of four aspects, i.e. description, order,
distance and origin. Research is highly dependent upon scaling techniques, without which
quantitative analysis is not possible in many situations.
A scale is a continuous measurement having a lowest and a highest point, with many points in
between. Scaling techniques in research help to express qualitative aspects in quantifiable
form. Aspects like attitude, attributes, behaviour, feelings, opinion, etc. can be measured
on such scales.
11.1.1 Reliability of a Scale
Reliability of a scale refers to consistency in measurement and can be assessed using any of the
following methods:
Re-Test Method: The same scale is applied twice to all units of a population to examine
variation, if any, in scaling the objects.
Multiple Form Method: Two forms of the scale are applied to the same group, and the order
of objects in the two scales is examined.
Split-Half Method: The scale is divided into two equal parts and both parts are applied to the
objects of a group. The correlation between the two sets of scores is then worked out; a high
correlation indicates that the scale is reliable.
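The split-half method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the respondent scores are hypothetical, the scale is split into odd- and even-numbered items (one of several possible ways of halving it), and the Spearman-Brown step-up to full-scale reliability is a commonly used extension that the text above does not mention.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(responses):
    """responses: one list of item scores per respondent (even number of items)."""
    half_a = [sum(r[0::2]) for r in responses]  # items 1, 3, 5, ...
    half_b = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r_half = pearson(half_a, half_b)            # correlation of the two half scores
    r_full = 2 * r_half / (1 + r_half)          # Spearman-Brown step-up (assumption)
    return r_half, r_full


# Hypothetical scores of five respondents on a six-item scale (1 to 5 per item)
responses = [
    [5, 4, 5, 4, 4, 5],
    [2, 2, 1, 2, 3, 2],
    [4, 4, 3, 4, 4, 3],
    [1, 2, 2, 1, 1, 2],
    [3, 3, 4, 3, 3, 4],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-score correlation = {r_half:.3f}, stepped-up reliability = {r_full:.3f}")
```

A correlation close to 1 between the two halves would be read as evidence of a reliable scale; what counts as "high" is left to the researcher's discipline.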
11.1.2 Validity of Scale
Validity of a scale refers to the accuracy in measurement and can be assessed using any of the
following methods:
Logical Validity: The scale must conform to common sense and logic.
Known Group Application: The scale can be applied to known objects and the results compared.
Jury’s Opinion: If several jurists give the same scores on the same object, then the scale is
valid.
Independent Method: Use two independent criteria to measure the scale value of objects and
if results are similar then the scale is valid.
DOI: 10.4324/9781003527183-11
11.1.3 Difficulties in Scaling
The difficulties in scaling include the non-quantifiable nature of qualitative aspects, conceptual
problems of qualitative aspects and inconsistent behaviour of the human mind. Additionally, the
scales are not universal.
11.1.4 Types of Scale
Table 11.1 will better clarify the characteristics of all four primary scaling techniques includ-
ing the scope for arithmetic operations and basic statistical calculations.
Scaling of objects can be used for a comparative study between two or more objects (products,
services, brands, events, etc.).
Table 11.1 Characteristics, Scope for Arithmetic Operation and Basic Statistical Calculation of Primary
Scaling Techniques
Following are the two categories under which other scaling techniques are used based on
their comparability:
• Comparative Scales: For comparing two or more objects of a study, a comparative scale
can be used by the respondents. Following are the different types of comparative scaling
techniques:
(i) Paired Comparison: It is used to select any one out of the two objects/products by
the respondents. This technique is mainly used for product testing, to facilitate the
consumers with a comparative analysis of the two major products (say, P and Q) in
the market. To compare more than two objects say comparing P, Q and R, one can
first compare P with Q and then the superior one (i.e. one with a higher percentage)
with R.
(ii) Rank Order Comparison Scale: In rank order scaling the respondent needs to rank or
arrange the given set of objects according to his or her preference.
For example, a soap manufacturing company conducted a rank order scaling to find
out the ordering preferences of consumers. The respondents were asked to rank four
soap brands in the sequence of their choice as in Table 11.2.
Table 11.2
Soap brand    Rank
V             4
X             2
Y             1
Z             3
The above scaling shows that soap ‘Y’ brand is the first and most preferred brand,
followed by soap ‘X’, then soap ‘Z’ and the least preferred one is soap ‘V’.
(iii) Constant Sum Comparison Scales: This is a scaling technique where a constant sum of
units is allocated by the respondents across the given features, attributes and importance
of a particular product or service.
For example, three respondents belonging to three different social groups are asked
to allocate 100 points to the following attributes of a soap product ‘S’. Their response
is given in Table 11.3.
Table 11.3 Response of Three Respondents on Features of New Soap ‘S’ Using Constant Sum Scale
Attribute          Respondent 1   Respondent 2   Respondent 3
Appearance         22             16             18
Skin Protection    22             24             16
Smell              14             22             24
Packaging          18             16             20
Unit Price         24             22             22
Total              100            100            100
From the above constant sum scaling analysis, we can see that:
Respondent 1 considers the competitive price of product ‘S’ as the major factor.
Respondent 2 preferred the product because of its skin protection.
Respondent 3 preferred the product because of its smell.
In such situations, constant sum scales can be used to identify the most preferred feature
out of all given features of the object, product or services.
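The reading of Table 11.3 can be checked with a short Python sketch. It uses the points from the table as given; the function names and the extra step of averaging the allocations across respondents are illustrative assumptions, not part of the original analysis.

```python
# Points allocated in Table 11.3 (each respondent distributes 100 points)
allocations = {
    "Appearance":      [22, 16, 18],
    "Skin Protection": [22, 24, 16],
    "Smell":           [14, 22, 24],
    "Packaging":       [18, 16, 20],
    "Unit Price":      [24, 22, 22],
}


def check_totals(table, total=100):
    """True for each respondent whose allocated points add up to the constant sum."""
    n = len(next(iter(table.values())))
    return [sum(points[i] for points in table.values()) == total for i in range(n)]


def top_attribute(table, respondent):
    """Attribute that received the most points from one respondent (0-based index)."""
    return max(table, key=lambda attr: table[attr][respondent])


def mean_allocation(table):
    """Average points per attribute across respondents (illustrative extra step)."""
    return {attr: sum(points) / len(points) for attr, points in table.items()}


print(check_totals(allocations))
for i in range(3):
    print(f"Respondent {i + 1}: {top_attribute(allocations, i)}")
print(mean_allocation(allocations))
```

Checking that every column adds up to the constant sum before interpreting it is a cheap guard against data-entry errors.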
(iv) Q-Sort Scaling: Q-sort scaling is a technique for sorting into groups of uniform nature
or preference out of a given large number of objects. It emphasizes the ranking of the
given objects in descending order to form similar piles based on specific attributes. It
is a modified form of rank-order scaling.
For example, the marketing manager of a garment manufacturing company sorts
the most efficient marketing executives based on their past performance, sales revenue
generation, dedication and growth. The Q-sort scaling can be performed on a group of
executives, and the marketing head can create a few groups based on their efficiency.
• Non-Comparative Scales: A non-comparative scale is used to analyze the performance or
choice of an individual product or object on different parameters. Following are some of the
most common types:
(i) Continuous Rating Scales
It is a graphical rating scale where the respondents are free to place the object in a posi-
tion of their choice. It is done by selecting and marking a point along the vertical or
horizontal line which ranges between two extreme points.
For example, a mattress manufacturing company used a continuous rating scale to find out the level
of customer satisfaction of its new comfy bedding. The response can be taken in the following ways:
Rating Scale: It gives position to individuals. There are two types of rating scales. (i) Graphic
rating, (ii) itemized rating.
• Graphic Rating: Rater indicates his rating position by ticking on a graphic scale such as
excellent (1), very good (2), good (3), average (4), below average (5), poor (6), very poor (7).
• Itemized Rating: The rater selects one of the limited choices which are given in terms of
scale position on a 0–5 or 0–10 scale.
(i) Itemized Rating Scale:
Itemized rating scale is a widely used technique under the non-comparative scales.
It emphasizes choosing a particular category among the various given categories
by the respondents. Each class is briefly defined by the researchers to facilitate such
selection.
The three most commonly used itemized rating scales are as follows:
(ii) Summated Scale (Likert Scale):
The Likert scale was developed in 1932 by the psychologist Rensis Likert. A definite
number of favourable and unfavourable statements are used. The respondent has to
react to each of these statements and express the degree of agreement or disagreement.
Each response is given a numerical score in the order of degree. The total
score for each respondent is worked out by adding the scores of all statements. In the
Likert scale, the degree of agreement is given scores as in Table 11.4.
Degree of agreement      Score
Strongly approve         5 or +2
Approve                  4 or +1
Undecided                3 or 0
Disapprove               2 or −1
Strongly disapprove      1 or −2
The Likert scale makes it possible to transform ordered qualitative aspects into quantitative
mode so that quantitative techniques applicable to numerical data can be used to analyze data
for further inferences.
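As a minimal sketch of how the summated score might be computed, the Python snippet below adds up the 5-to-1 scores listed for Table 11.4. The respondent's answers are hypothetical, and the optional reverse-scoring of unfavourable statements (6 minus the score) is a common convention assumed here for illustration; the text does not prescribe it.

```python
# Scores for each degree of agreement, following the 5-to-1 scheme of Table 11.4
SCORES = {
    "Strongly approve": 5,
    "Approve": 4,
    "Undecided": 3,
    "Disapprove": 2,
    "Strongly disapprove": 1,
}


def summated_score(answers, unfavourable=()):
    """Total Likert score for one respondent.

    answers: list of response labels, one per statement.
    unfavourable: indices of statements to reverse-score (assumed convention).
    """
    total = 0
    for i, label in enumerate(answers):
        score = SCORES[label]
        if i in unfavourable:
            score = 6 - score  # 5 <-> 1, 4 <-> 2, 3 stays 3
        total += score
    return total


# Hypothetical answers of one respondent to four statements
answers = ["Strongly approve", "Approve", "Undecided", "Disapprove"]
print(summated_score(answers))                    # all statements scored as given
print(summated_score(answers, unfavourable={3}))  # fourth statement reverse-scored
```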
In the Likert scale, the researcher provides some statements and asks the respondents to mark
their level of agreement/disagreement or satisfaction/dissatisfaction over these statements by
selecting any one of the given options from the alternatives.
For example, a shoe manufacturing company adopted the Likert scale technique for its new
sports shoe range named Z sports shoes. The purpose is to know the agreement or disagreement
of the respondents.
For this, the researcher asked the respondents to tick any one option representing the most
suitable answer according to them, out of the following options:
Table 11.5 Use of Likert Scale to Get the Degree of Agreement for a New Sports Shoe Range
Feature                   SD   D   NAND   A   SA
Very lightweight
Durable
Cost-effective
Extremely comfortable
Look too trendy
Recommend it to others
SD = strongly disagree, D = disagree, NAND = neither agree nor disagree, A = agree, SA = strongly agree.
The illustration in Table 11.5 will help the company to understand what the customers think
about its products and whether there is any need for improvement.
(iii) Semantic Differential Scale:
It is a scale used for attitudinal assessment. Attitudes of people on objects, concepts,
ideas, etc. are assessed using this scale. It consists of bipolar adjective pairs of extreme
situations like beneficial-harmful, good-bad, wise-foolish, etc. Hence it is a type of
differential scale used to derive the respondent’s attitude on certain objects, ideas, events,
etc. It is commonly used for purposes like customer satisfaction, employee satisfaction,
etc. Depending upon the nature of the commodity, idea, concept, object, etc., the commonly
used bipolar adjectives are good-bad, hard-soft, fast-slow, hot-cold, difficult-easy,
happy-sad, etc. The attitude of customers toward a new product can be assessed on a semantic
differential scale by selecting any value between +k and −k for aspects like
taste (good to bad), colour (bright to dull), keeping quality (long to short), etc.
In a bipolar seven-point non-comparative rating scale the respondent marks any of
the seven points for each given attribute of the object as per personal choice, thus
depicting the respondent’s attitude or perception towards the object.
For example: A well-known brand for watches carried out semantic differential
scaling to understand the customer’s attitude towards its product. The customers are
required to tick the most appropriate option for each of the attributes related to the
watch. The representation of this technique is given in Table 11.6.
From +3 +2 +1 0 -1 -2 -3 To
From the above diagram, one can analyze customers’ preferences and non-preferences
for the given feature of the product. Where more ticks on negative points are observed,
the company can make modifications to the product to make it more appealing or posi-
tive for such features.
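The idea of looking for features with more ticks on the negative side can be sketched in Python as follows. The attribute names and the ratings are hypothetical, and using the mean rating as the summary is one simple choice among several (a count of negative ticks would serve equally well).

```python
from statistics import mean

# Hypothetical ticks of five customers on a +3 to -3 semantic differential scale
ratings = {
    "design":     [3, 2, 2, 1, 3],
    "durability": [1, 0, -1, 2, 1],
    "price":      [-2, -1, -3, 0, -1],
}


def attribute_means(table):
    """Mean rating per attribute."""
    return {attr: mean(values) for attr, values in table.items()}


def needs_attention(table):
    """Attributes whose mean rating falls on the negative side of the scale."""
    return [attr for attr, m in attribute_means(table).items() if m < 0]


print(attribute_means(ratings))
print(needs_attention(ratings))
```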
(iv) Visual Analogue Scale (VAS):
The VAS is used to rate attributes/feelings/sensations like pain, shivering, cold,
etc. on a 10-point line depending upon intensity. The line can be horizontal or vertical.
VAS is a 10 cm or 100 mm line with extreme points at both ends like no pain to severe
pain, worst quality to best quality, highly alert to extreme dullness and so on. The VAS
is used in medical sciences to assess the magnitude of pain while undergoing
treatment or intervention as part of research work.
(v) Stapel Scale:
A Stapel scale is an itemized rating scale that measures the response, perception or
attitude of respondents toward a particular object through a unipolar rating. The range
of a Stapel scale is from −5 to +5, excluding zero, thus confining it to ten units.
For example, a tour and travel company asked respondents to rate its holiday package
in terms of value for money and user-friendly interface, as in Table 11.7.
Table 11.7
Value for money    User-friendly interface
+5                 +5
+4                 +4
+3                 +3
+2                 +2
+1                 +1
0                  0
−1                 −1
−2                 −2
−3                 −3
−4                 −4
−5                 −5
If a greater number of respondents tick positive values in the upper range for a particular
aspect, it shows higher customer satisfaction. On the other hand, if more respondents tick
negative values, radical changes are to be made for those aspects. Where low positive values
are ticked, special attention is also needed to improve the choices of the people/respondents.
There are a large number of scales and scores developed by medical scientists for specific pur-
poses, some of these are given in Table 11.8.
11.1.7 Errors in Scale
We know that the errors in sampling techniques include sampling and non-sampling sources.
The sampling errors can be minimized by increasing the sample size, whereas non-sampling
errors are to be minimized through training and supervision of data collection personnel and by
using error-free equipment to record such data. Measurement errors fall under non-sampling
errors. The errors in scale are a part of non-sampling error, which includes:
11.2 Coding
Coding is a technique for scientific communication between the human mind and the computer
system. It is the transformation of human language into computer language. Coding is a process
for identification and classification by the computer. Quite often, the data collected for
research purposes include both numerical and non-numeric information like names, attributes,
etc. Non-numeric data of this type have to be coded for the purpose of computer manipulation.
Coding can be numeric (01, 02) or alphanumeric (A-01, A-02). In the International Classification
of Diseases (ICD) all diseases are coded with alphanumeric codes having four characters as per
WHO guidelines. In the Farm Analysis Package (FARMAP) of FAO, all agricultural commodi-
ties, inputs, output, etc. are numerically coded according to record type. For entering research
data in MS EXCEL, researchers can develop codes for descriptive data and information for
items like name of respondent, location, sex, etc. along with other qualitative information.
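A codebook of the kind described can be built in a few lines of Python. This is a sketch under assumptions: the records, the field names and the helper `build_codebook` are hypothetical, and codes are assigned simply in order of first appearance, whereas a real study would usually fix its codebook before data entry.

```python
def build_codebook(values, prefix=""):
    """Assign codes 01, 02, ... (optionally prefixed) in order of first appearance."""
    codebook = {}
    for value in values:
        if value not in codebook:
            codebook[value] = f"{prefix}{len(codebook) + 1:02d}"
    return codebook


# Hypothetical survey records with descriptive (non-numeric) entries
records = [
    {"name": "R1", "sex": "Female", "location": "Rural"},
    {"name": "R2", "sex": "Male", "location": "Urban"},
    {"name": "R3", "sex": "Female", "location": "Urban"},
]

sex_codes = build_codebook(r["sex"] for r in records)                        # numeric codes
location_codes = build_codebook((r["location"] for r in records), "A-")      # alphanumeric codes

coded = [
    {"name": r["name"], "sex": sex_codes[r["sex"]], "location": location_codes[r["location"]]}
    for r in records
]
print(sex_codes, location_codes)
print(coded)
```

Keeping the codebook alongside the coded data lets the codes be translated back into their descriptive labels at the reporting stage.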
11.3 Scoring
These days scorecards are used for appointment, promotion, etc. of people based on academic
merit, experiences, etc. to make the process more objective-oriented and transparent. Scoring is
a method of generating numerical data as scores for research variables from the respondent’s
responses on multiple choice options, dichotomous or qualitative responses, etc. KAP (knowledge,
attitude and practice) study is a research method widely used in agricultural extension, community
health, education, etc. For each of the selected respondents of the study, scores are generated for
knowledge, attitude and practices using objective-type questions. Once the scores are generated
for knowledge, attitude and practice based on objective statements for each of the respondents,
then statistical analysis as applicable to qualitative data can be done. Sometimes the aggregate
of scale values of different aspects of an item is treated as a score. In KAP studies the scores for
knowledge, attitude and practice can be developed for studies like the KAP study on farmers for
organic farming, KAP study on family planning adoption by adults of reproductive age, KAP
study on prevention of COVID (mask, hand washing and social distancing) in rural areas, etc.
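A KAP score sheet can be sketched as follows. The answer key, the respondent's answers and the rule of one point per matching response are all hypothetical choices made for illustration; actual KAP instruments define their own items and weights.

```python
def kap_scores(responses, answer_key):
    """One point per response that matches the key, totalled per domain."""
    return {
        domain: sum(1 for given, expected in zip(responses[domain], key) if given == expected)
        for domain, key in answer_key.items()
    }


# Hypothetical key of desired responses to objective-type statements
answer_key = {
    "knowledge": ["yes", "no", "yes", "yes"],
    "attitude": ["agree", "agree", "disagree"],
    "practice": ["yes", "yes", "no"],
}

# Hypothetical responses of one respondent
respondent = {
    "knowledge": ["yes", "no", "no", "yes"],
    "attitude": ["agree", "disagree", "disagree"],
    "practice": ["yes", "yes", "no"],
}

scores = kap_scores(respondent, answer_key)
print(scores)
```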
Global Assessment of Functioning (GAF) Score: The GAF is a scoring system that mental health
professionals use to assess how effectively a person functions in daily life. It is also used to
measure the impact of psychiatric illness on the personal life, skills and abilities of a person.
The score ranges from zero to 100. It is based on the level of functioning in daily life and the
severity of mental illness of a person. Doctors determine the GAF score based on conversation
with the patient, their family members or caretakers, a review of medical records and an
examination of their behavioral history including police/court records. Though the score is
numerical, it is more subjective. As the score declines, the severity of mental illness increases.
Suggested Readings
Bentler, P. M., and D. G. Weeks., ‘Restricted multidimensional scaling models’, Journal of Mathematical
Psychology 17: 138–151, 1978.
Blalock, H. M., Social statistics, McGraw-Hill, New York, 1972.
Commandeur, J. J. F., and W. J. Heiser, Mathematical derivations in the proximity scaling (PROXSCAL) of
symmetric data matrices, Department of Data Theory, University of Leiden, Leiden, 1993.
Green, P. E., and V. Rao, Applied multidimensional scaling, Dryden Press, Hinsdale, IL, 1972.
Jones, L. E., and F. W. Young, ‘Structure of a social environment: longitudinal individual differences scal-
ing of an intact group’, Journal of Personality and Social Psychology 24: 108–121, 1972.
Kristof, W., ‘Estimation of true score and error variance for tests under various equivalence assumptions’,
Psychometrika 34(4): 489–507, 1969.
Nishisato, S., Analysis of categorical data: dual scaling and its applications, University of Toronto Press,
Toronto, 1980.
Schiffman, S. S., M. L. Reynolds, and F. W. Young, Introduction to multidimensional scaling: theory,
methods and applications, Academic Press, New York, 1981.
Torgerson, W. S., Theory and methods of scaling, Wiley, New York, 1958.
12 Research Variables and Research Data
12.1 Research Variables
The research methods and designs in quantitative research aim to generate research data or evi-
dence for the research variables applicable to a specific topic of research. Hence the knowledge
of broad classes of research variables and research data is important, especially in the context of
the application of statistical software like SPSS, SAS, R-programming, etc.
The broad classes of research variables and their meanings are:
Independent variable: It is that category of variables representing a specific and specified item
of measurement in research which other variables in the study cannot alter, but it can change/
influence the values of many other variables. For example, age, height, weight, etc. in health
studies; rainfall, inputs, etc. in agricultural production studies. In regression analysis of type
Y = F(X), X is the independent variable.
Dependent variable: It depends on the values of other variables. Normally, independent vari-
ables can change the values of dependent variables. For example, customer satisfaction as a
dependent variable (Y) depends on customer services as an independent variable (X), or crop
production (Y) as a function of the independent variable fertilizer (X). In regression analysis,
Y = F(X), Y is the dependent variable.
Explanatory variable: Independent variables in regression analysis are also called explanatory
variables (X). Values of these variables can explain the variations in values of the
dependent variable (Y). In cause-and-effect studies, the changes in the values of these variables
(X) cause a corresponding change in the values of the dependent variable (Y). If
$Y = F(X_1, X_2, \ldots, X_k)$, the variables $X_1, X_2, \ldots, X_k$ are explanatory variables.
Dummy/Proxy variable: In many studies, the explanatory variables can be qualitative vari-
ables like sex, religion, season, etc. which cannot be measured in numerical form, but can
influence the dependent variable of the study. A dummy variable is a constructed variable
to describe variation in the outcome/dependent variable due to such qualitative variables
of the study. Normally such variables are included as explanatory/independent variables in
regression analysis. When quantitative data like exact age is not available, age groups such
as young or old can be included as dummy explanatory variables.
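The construction of dummy variables can be sketched in Python as below. The season labels and age groups are hypothetical data, and leaving out one baseline category (so that k categories give k − 1 dummies) is the usual regression convention assumed here; the text itself does not state it.

```python
def dummy_encode(values, baseline):
    """Return {category: [0/1, ...]} for every category except the baseline."""
    categories = sorted(set(values) - {baseline})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical qualitative explanatory variable with three categories
season = ["kharif", "rabi", "zaid", "rabi", "kharif"]
season_dummies = dummy_encode(season, baseline="kharif")
print(season_dummies)

# Age recorded only as a group, as in the example of young versus old
age_group = ["young", "old", "old", "young"]
age_dummies = dummy_encode(age_group, baseline="young")
print(age_dummies)
```

Each resulting 0/1 column can then enter a regression as an ordinary explanatory variable, with its coefficient read relative to the baseline category.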
Lagged variables: In regression analysis like acreage response functions of crops, consumption
functions of families, etc. the current period values may depend on lagged (previous period)
values of the dependent variable itself as well as lagged and current values of other
explanatory variables. For example, the consumption function may be of the form
$C_t = F(C_{t-1}, C_{t-2}, Y_t, Y_{t-1}, X_{1t}, X_{2t}, \ldots)$, where $C_t$ is the $t$th year
consumption (dependent variable) and the independent/explanatory variables include one year
lagged consumption ($C_{t-1}$), two year lagged consumption ($C_{t-2}$), current year income
($Y_t$), one year lagged income ($Y_{t-1}$), current year family size ($X_{1t}$), current year
livestock population ($X_{2t}$), etc.
DOI: 10.4324/9781003527183-12
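Preparing lagged variables for such a function amounts to shifting each series by one or two periods. The Python sketch below shows this for consumption and income only; the yearly figures are hypothetical, and the helper `lag` is an illustrative name rather than a library function.

```python
def lag(series, k):
    """Shift a series back by k periods; the first k values are unknown (None)."""
    if k == 0:
        return list(series)
    return [None] * k + list(series[:-k])


# Hypothetical yearly data
consumption = [100, 104, 109, 115, 120]  # C_t
income = [150, 158, 163, 171, 180]       # Y_t

# One row per year for which every lagged value is available
rows = [
    {"C_t": c, "C_t-1": c1, "C_t-2": c2, "Y_t": y, "Y_t-1": y1}
    for c, c1, c2, y, y1 in zip(
        consumption, lag(consumption, 1), lag(consumption, 2), income, lag(income, 1)
    )
    if c1 is not None and c2 is not None and y1 is not None
]

for row in rows:
    print(row)
```

Note that each lag costs observations: with a two-year lag, the first two years cannot be used in the regression.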
Outcome variable: Most of the research, especially experimental research, is focused on an
outcome variable which depends on the values of independent variables/interventional vari-
ables. The academic performance/marks obtained by a student in an examination depend on
his/her attendance in class or daily hours of study at home/hostel. Here, marks obtained are
the outcome variable of a study on academic performance. Generally, the outcome variable
is the dependent variable of the study. In medical research, the onset time and duration of
sensory or motor block as a response to anaesthesia drugs are the outcome variables. Similarly,
the healing time in surgical methods or the time to discharge from hospital etc. are outcome
variables.
Endogenous variable: These are variables in a statistical model that are changed or determined
by their relationship with other variables; therefore endogenous variables are dependent
variables. In demand-supply studies, price is an endogenous variable as supply can cause
changes in prices.
Exogenous variable: These are variables not included in the system but influencing the
outcome variable from outside, such as environmental factors like atmospheric temperature,
hailstorm, wind, etc. in an input-output study of agricultural production.
Intervening variable: It is a mediator variable that links the dependent and independent vari-
ables in a study. For example, rainfall or sun intensity are intervening variables in a study on
fertilizer response to agricultural production.
Control variables: Control variables are those kept constant or fixed during the study. For
example, in the agricultural production study, the crop variety was kept the same for all the
fields as a control variable.
Moderating variables: A moderating variable can influence the relationship between the
dependent and independent variables through its presence. In a study on agricultural produc-
tion, the climatic factors work as a moderating variable.
Extraneous variables: Extraneous variables are those factors the researcher failed to consider
while planning an experiment. For example, the basic fertility level of the soil in a fertilizer
response study which the researcher has missed while planning the experiment is an extrane-
ous variable. It can influence the outcome variable and hence lead to erroneous conclusions.
Quantitative/Metric variables: Quantitative variables are those that can get values as numbers
of any type. For example, family size, family income, family expenditure, etc., in household
economy-related studies.
Discrete Quantitative variable: It includes those quantitative variables taking values as counts.
For example, family size, number of children, number of dependent family members, num-
ber of milk animals, etc. in rural household-based studies.
Continuous Quantitative variable: It includes those variables whose values are measurements,
such as height, weight, age, etc. For example, income and consumption expenditure in
income-expenditure-based studies, and blood pressure or sugar level in health studies.
Qualitative/Non-Metric/Categorical variable: It includes non-numerical values or data on a
nominal scale. For example, male or female; joint family or nuclear family; rural or urban
family, etc. in household-based studies.
Binary/Dichotomous variable: It includes variables that can fall only in any one of the two
possible categories. For example, the sex of newborn child as male or female, exam result as
pass or fail, dietary type as vegetarian or non-vegetarian, place of stay as rural or urban, etc.
Nominal variable: A nominal variable is a type of variable that is used to name, label, or cat-
egorize an attribute as a part of a research study. It includes qualitative variables that can fall
into categories without any order. For example, patients as male or female in health research,
literacy level of the person as literate or illiterate, etc.
Ordinal variable: An ordinal variable is a categorical variable with ordered values. It is a vari-
able that is in between categorical and quantitative variables. It includes variables that can
fall into three or more categories in some order. For example, level of satisfaction, level of
pain, etc. expressed in Likert and other scales.
Confounding variable: A confounding variable is an unmeasured third variable in a study
aimed at cause-and-effect relationships. It includes those variables that are not included in the
study but can disguise the effect of another variable. Such variables can distort the results by
making them biased. For example, in a study on the consumption of a commodity with the
income of the consumer, the climatic factors act as a confounding variable.
Composite variable: The composite variables are generally categorical in nature. When two or
more variables are combined to form a new variable, it becomes a composite variable. For
example, body mass index (BMI) is a composite variable of height and weight of a person as
BMI = weight in kg ÷ (height in meters)².
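As a worked illustration of a composite variable, the Python sketch below computes BMI using the standard definition, weight in kilograms divided by the square of height in metres. The function name and the sample values are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by the square of height in metres."""
    return weight_kg / height_m ** 2


# Hypothetical person: 70 kg, 1.75 m
value = bmi(70, 1.75)
print(f"BMI = {value:.1f}")
```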
12.2 Research Data
Data is the quantitative or qualitative identity of various physical, chemical and biological enti-
ties which are either measured or observed. Scales of measurements provide nature and magni-
tude to data with respect to any study unit or respondent of the study. The four basic scales of
measurement are nominal scale, ordinal scale, interval scale and ratio scale. The nature, type
and size of data collected for any research largely depend upon the aims and objectives included
in the plan of the research study. The numerical data plays a crucial role in applied research.
Research data can be considered as the values of quantitative or qualitative variables included
in the study. Data without processing is called raw data and after processing it becomes useful
information. The following are broad categories of data:
Primary data: Data collected directly for the first time from a source or unit of study are
primary data. Most of the research data generated through experimental, survey and
observation-based research are primary data. The applied research undertaken by research
scholars is mostly primary data-based.
Secondary data: The data collected for certain specified purposes or as mandatory require-
ments by official and non-official agencies and documented is called secondary data. Most of
the official data are available on a time series basis across units and are rich sources of data
for secondary data-based research to assess growth, trends, business cycles, etc.
Cross-Sectional data: Data collected for many subjects/units of study from a specified space
at one point in time is cross-sectional data. Most of the data generated using experimental,
survey and observation-based methods fall under cross-sectional data. A large number of
quantitative techniques are available for analysis of cross-sectional data which depends on
the aim and objectives of the study.
Categorical data: Qualitative data which cannot be measured numerically is called categorical data.
Such data are also called attributes. For example, race, sex, hair colour, educational level, etc.
Univariate data: Data/observation collected on a specific characteristic. It can be numeric or
categorical. For example, farmer-wise production of a crop.
Time Series data: The data related to a variable according to time periods is called time series
data. For example, the decadal population of India from 1951 to 2011 or the daily tempera-
ture of a patient.
Spatial data: It includes geospatial data or geographic data. It is used in geographic information
systems (GIS).
Ordered Data: It includes data of an ordered type like low, medium, high.
The broad categories of research data are quantitative data and qualitative data. The qualita-
tive data are non-metric, categorical or nominal in nature. The broad categories of quantitative
data are:
(i) Discrete data (countable) and continuous data (measurable).
(ii) Primary data (collected for the first time by the researcher) and secondary data (collected
by some agencies and used by the researcher).
(iii) Time series data (chronological data of a variable) and cross-sectional data (data for many
subjects at one point in time).
Most of the research data are either measured or observed or collected as responses from the
subject of study. The following methods of data collection for research have been discussed in
previous chapters:
(i) Quantitative Methods
• Experimental/interventional method (conduct experiments based on specific design befit-
ting the problem).
• Non-interventional observation method (data collected by the researcher or his/her rep-
resentatives from subjects of study like OPD and laboratory test data without having any
intervention).
• Interview schedule (to be filled by the researcher/investigator by interacting with the
respondent) or questionnaire (to be filled by the respondent).
• Online method of data collection (Google Form).
• Independent variables are specific and specified items of measurement in research which
other variables in the study cannot alter but can change/influence the values of other vari-
ables. Independent variables are also called explanatory variables.
• Dependent variable depends on values of other variables, that is, independent variables.
• Outcome variables are the focal variable of research which depends on the values of inde-
pendent variables of the study.
• Endogenous variables are variables causing changes by staying within the system.
• Exogenous variables are variables not included in the system but influencing the outcome
variable from outside.
• Control variables are those kept constant or fixed during the study.
• Extraneous variables are those factors that the researcher has failed to consider while plan-
ning an experiment.
• Quantitative/metric variables are those which can get values as numbers of any type.
• Discrete quantitative variables include those quantitative variables which take values as
counts.
• Continuous quantitative variables include those variables taking values as measurement.
• Qualitative/non-metric/categorical variables include non-numerical values or data on a nom-
inal scale.
• Nominal variables are used to name, label or categorize an attribute as a part of a research
study.
• An ordinal variable is a categorical variable with ordered values.
• A confounding variable is an unmeasured third variable in a study aimed at cause-and-effect
relationships.
• Data directly collected for the first time from a source are primary data.
• The data collected for certain specified purposes or as mandatory requirements by
official and non-official agencies and documented for use by researchers are called second-
ary data.
• Data collected for many subjects from a specified space at one point of time is cross-
sectional data.
• The broad categories of research data include discrete data (countable) and continuous data
(measurable); primary data (collected for the first time by the researcher) and secondary data
(collected by agencies and used by the researcher) and time series data (chronological data)
and cross-sectional data (data for many subjects at one point of time).
• The methods of quantitative research data collection include the experimental method,
observation method (in experiment-based research designs the required data is collected by
the researcher or his/her representatives), interview schedule (to be filled by the researcher/
investigator by interacting with respondent), questionnaire (to be filled by the respondent
against each question set) and online method of data collection.
• The methods of qualitative research data collection include one-to-one interviews, focus
groups, record keeping and process of observation.
• The data collection format for any research study must take into account the data requirement
to address the stated aims and objectives of the study. Besides, the researcher must have a
plan for data analysis while preparing the research synopsis or plan of research.
Suggested Readings
Bartholomew, D., M. Knott, and I. Moustaki, Latent variable models and factor analysis: a unified
approach, John Wiley & Sons, New York, 2011, https://doi.org/10.1002/9781119970583
Heiser, W. J., Unfolding analysis of proximity data, Department of Data Theory, University of Leiden,
Leiden, 1981.
Hoaglin, D. C., F. Mosteller, and J. W. Tukey, Exploring data tables, trends, and shapes, John Wiley &
Sons, New York, 1985.
Kass, G., ‘An exploratory technique for investigating large quantities of categorical data’, Applied Statis-
tics 29(2): 119–127, 1980.
Kinderman, A. J., and J. G. Ramage, ‘Computer generation of normal random variables (Correction: 85:
212)’, Journal of the American Statistical Association 71: 893–896, 1976.
13 Basic Statistical Methods for Research
13.1 Introduction
Applied statistics plays a big role at all stages of research, and its role is all the more important
at the stage of data analysis. Applied research, in different areas, invariably uses basic statistical
methods to describe data related to the situation/problem under study or to relate data for different
variables of study. Some of these concepts are briefly discussed in the forthcoming sections:
• Measures of central tendency (arithmetic mean, geometric mean, harmonic mean, median,
mode, quartiles, deciles, percentiles).
• Measures of dispersion (range, mean deviation, variance, standard deviation, interquartile
range, coefficient of variation (CV)).
• Skewness.
• Kurtosis.
• Probability and basic probability distributions.
These descriptive statistical measures along with frequency tables are widely used in descriptive
research methods to explain the research problem with facts and figures.
Median
• The median is the middle observation for a dataset sorted in ascending or descending order.
The median divides the data into a higher half and a lower half and hence is a positional
average.
DOI: 10.4324/9781003527183-13
• To calculate median, arrange the data in ascending or descending order. Then, in the case of
odd number of data, the middle value is the median and for even number of data, average of
the two middle values is the median.
• When there are extreme values in the dataset, median is the best alternative measure of cen-
tral tendency.
• In frequency distributions, the class having a median value is called the median class.
Mode
• Mode is the most repeated value in a dataset.
• The probability density function will have maximum probability at mode.
• There can be more than one mode in a set of data.
• For a symmetric unimodal distribution like the normal distribution, the mean, median and mode
will be the same.
• In a business dataset, mode is important for many decision-making purposes.
• In frequency distributions, the class having the highest frequency is called the modal class.
• Mode has a wide-range of practical applications in business, economics, education, health
sciences, etc.
• It is used to find averages of different rates of the same event at different times or places.
• It minimizes the effect of extreme values in a set of data while drawing the averages.
Quartiles
• Quartiles are also positional averages when data are arranged in ascending or descending
order. There are three quartiles—Q1, Q2 and Q3.
• Q1 includes the first 25%, Q2 includes the first 50% and Q3 covers the first 75% of the total
arranged data.
• Q2 and the median are the same.
• Useful when merit score-based decisions to cover the first 25%, 50% or 75% are to be
considered.
Deciles
• Deciles are also positional averages when data are arranged in ascending or descending
order. There are nine deciles—D1, D2. . . , D9.
• D1 includes the first 10%, D2 includes the first 20% . . . and D9 includes the first 90% of data.
• D5, Q2 and median are the same.
• Useful when merit score-based decisions are to be taken covering the first 10%, 20%, etc.
Percentiles
• Percentiles are also positional averages when data are arranged in ascending or descending
order. There are 99 percentiles—P1, P2 . . ., P99.
• P1 includes the first 1%, P2 the first 2%, . . . , P99 the first 99% of data.
• Useful when merit score-based decision is to be taken covering the first 1%, 2%, etc.
It may be noted that P50, D5, Q2 and the median are the same, as can be seen in Figure 13.1.
[Figure 13.1 Depiction of Situation Where Median, Q2, D5 and P50 Coincide: number lines from
Minimum to Maximum marked with the quartiles (Q1, Q2, Q3), selected deciles (D1, D3, D5, D7, D9)
and selected percentiles (P5, P50, P99)]
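The positional averages described above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the list `scores` is hypothetical data made up for illustration, and the interpolation rule used by `statistics.quantiles` (its default "exclusive" method) is only one of several conventions, so quartile values from other software may differ slightly.

```python
import statistics

# Hypothetical merit scores, used only for illustration.
scores = [35, 42, 47, 50, 53, 58, 58, 61, 66, 70, 95]

mean = statistics.mean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)  # a list, because there can be more than one mode

# statistics.quantiles returns the n - 1 cut points that split the sorted data into n parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)       # Q1, Q2, Q3
deciles = statistics.quantiles(scores, n=10)         # D1 ... D9
percentiles = statistics.quantiles(scores, n=100)    # P1 ... P99

d5 = deciles[4]        # fifth decile
p50 = percentiles[49]  # fiftieth percentile

print(f"mean={mean:.2f}, median={median}, modes={modes}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}, D5={d5}, P50={p50}")
```

In this sketch the median, Q2, D5 and P50 come out equal, which is the coincidence depicted in Figure 13.1.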
13.3 Measures of Dispersion
Measures of dispersion are also known as measures of variability. Without the knowledge of
variability in the dataset and using the measure of central tendency alone the spread of the data-
set cannot be meaningfully determined. In fact, with the knowledge of the average along with
knowledge of standard deviation, one can assess the nature of the dataset in terms of the central
value and the spread of the data around the average. There are four basic measures of dispersion:
• Range
• Quartile Deviation
• Mean Deviation
• Standard Deviation and Variance
Range:
• Range is the difference between the highest value and lowest value in the dataset.
• It does not give the idea of the spread of data around a central value.
• Range of the dataset is used to get an approximate value of standard deviation where it appears
in the formula to calculate sample size etc.; in normally distributed data, range ≈ 6 SD.
Quartile Deviation
• It helps to assess the spread of data about a measure of its central tendency. It gives the idea
of a spread of 50% of data around the median.
• It is half of the difference between Q3 and Q1 or QD = (Q3 − Q1)/2.
• Q3−Q1 is called inter-quartile range.
Mean Deviation
• It is the mean of absolute deviations of entire data with respect to the arithmetic mean.
• (Mean of difference of each data from AM, ignoring the sign): $MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$.
• It does not ignore extreme values which may have importance in some cases and hence more
useful in economics, commerce, business, etc.
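The measures of dispersion listed in this section can be computed side by side as a quick check. This is a minimal sketch on hypothetical data; it assumes the population form of the variance (divisor n), takes the coefficient of variation as SD divided by the mean times 100, and relies on the default quartile rule of `statistics.quantiles`, which is an implementation choice rather than something fixed by the text.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
mean = sum(data) / n

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Quartile deviation: half of the inter-quartile range Q3 - Q1.
q1, _, q3 = statistics.quantiles(data, n=4)
inter_quartile_range = q3 - q1
quartile_deviation = inter_quartile_range / 2

# Mean deviation: mean of absolute deviations from the arithmetic mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Variance (divisor n) and standard deviation as its square root.
variance = sum((x - mean) ** 2 for x in data) / n
sd = math.sqrt(variance)

# Coefficient of variation, expressed as a percentage of the mean.
cv = sd / mean * 100

print(data_range, quartile_deviation, mean_deviation, variance, sd, cv)
```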
Skewness
• It is the measure of the asymmetry of the probability distribution in relation to a symmetric
normal distribution. For symmetric distribution, the left half of the histogram or frequency
polygon will be a mirror image of its right half. For a skewed distribution the mean, median
and mode will not be equal or the same.
• While mode is at the peak of the distribution, for skewed distribution the mean and mode can
be on either side. For a positively skewed distribution, the mean and median will be on the
right side of the mode, and for a negatively skewed distribution, those are on the left side of
the mode.
• The skewness measures the departure from normality of the distribution of the dataset and is
measured as
$$\text{skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n} (X_i - \bar{X})^3}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{3/2}}.$$
• The measurement of skewness can be positive, negative and zero. For positively skewed
distribution, the longer tail is on the right, and for negatively skewed distribution, the longer
tail is on the left side.
Kurtosis
• It measures the peakedness or flatness of the distribution in relation to the bell-shaped normal
curve.
• The kurtosis of a normal distribution is three; such a distribution is called mesokurtic.
• A distribution with kurtosis < 3 is said to be platykurtic and a distribution with kurtosis > 3 is
said to be leptokurtic. A distribution with more kurtosis has more data in the tails compared to
the normal distribution.
• Kurtosis (measured in excess of the normal value of three) $= \frac{n \sum_{i=1}^{n} (X_i - \bar{X})^4}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2} - 3$.
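As an illustration of the two shape measures, the sketch below computes moment-based skewness and kurtosis in excess of three from sums of powered deviations. The function names and the data are hypothetical, and the formulas are the common population (divisor n) versions; sample-adjusted versions used by some packages give slightly different values.

```python
import math

def skewness(values):
    """Moment-based skewness: sqrt(n) * sum(d^3) / (sum(d^2))^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values)
    s3 = sum((x - mean) ** 3 for x in values)
    return math.sqrt(n) * s3 / s2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: n * sum(d^4) / (sum(d^2))^2 - 3."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values)
    s4 = sum((x - mean) ** 4 for x in values)
    return n * s4 / s2 ** 2 - 3

right_tailed = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with a longer right tail
symmetric = [1, 2, 3, 4, 5]              # hypothetical symmetric data

print(skewness(right_tailed), excess_kurtosis(right_tailed))
print(skewness(symmetric))
```

A positive result for the first dataset reflects its longer right tail, while the symmetric dataset gives a skewness of zero.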
• Arrange the data from the lowest to the highest on a horizontal axis.
• Draw a box above the horizontal axis in such a way that the left end of the box coincides with
Q1 (first quartile) and the right end with Q3 (third quartile).
• The box is divided into two parts by a vertical line marking Q2 (second quartile/median).
• Draw a horizontal line from the left end of the box to the point at the smallest value of the
dataset and another line from the right end of the box to the point at the largest value of the
dataset. These lines are called whisker lines. Figure 13.2 shows what the box and whisker
plot looks like.
[Figure 13.2: box and whisker plot, with whisker lines running from the lowest data value to the
highest data value]
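The steps for drawing the box and whisker plot amount to finding five numbers: the smallest value, Q1, Q2, Q3 and the largest value. A minimal sketch follows, with hypothetical data and function names. The outlier check uses fences at 1.5 times the inter-quartile range beyond Q1 and Q3; that multiplier is a widely used convention and is an assumption here, not a rule stated in the text.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) for the box and whisker plot."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]

def flag_outliers(values, multiplier=1.5):
    """Values lying beyond Q1 - k*IQR or Q3 + k*IQR (k = 1.5 is an assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

scores = [35, 42, 47, 50, 53, 58, 58, 61, 66, 70, 95]  # hypothetical data
summary = five_number_summary(scores)
outliers = flag_outliers(scores)
print(summary, outliers)
```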
13.4 Concept of Probability
Probability is associated with events that are the outcome of random experiments. A random
experiment is an experiment that can be repeated under homogeneous conditions, the result
of which is one of the various possible outcomes/combinations of outcomes but is not exactly
predictable. A random event is a set of favourable outcomes out of all possible outcomes. Every
event has a probability. Examples: the outcome of tossing a coin, the sex of a newborn child, the
outcome of the throw of one die or two dice, etc. There are three approaches to defining probability.
• Classical approach
• Empirical approach
• Axiomatic approach
13.4.1 Classical Probability
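Let a random experiment have ‘N’ possible outcomes/cases out of which ‘m’ are favourable to
the event ‘A’. Then the probability of event A is denoted as P(A) and
$$P(A) = \frac{\text{Number of cases favourable to A}}{\text{Total number of possible cases}} = \frac{m}{N}.$$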
13.4.2 Empirical Probability
Empirical probability is the limiting value of the relative frequency m/N of the event, as the
number of trials N becomes large:
$$P(A) = \lim_{N \to \infty} \frac{m}{N}.$$
Example: If four out of ten tosses of a coin are heads, the probability of getting heads = 0.4.
If 51 out of 100 tosses of a coin are heads, the probability of getting heads = 0.51.
If 499 out of 1000 tosses of a coin are heads, the probability of getting heads = 0.499.
If 5001 out of 10000 tosses of a coin are heads, the probability of getting heads = 0.5001.
Here, as the number of tosses increases, the probability of getting heads approaches 0.5.
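The contrast between the classical and the empirical approach can be illustrated with a short simulation. This is a minimal sketch: the function names are hypothetical, the coin is assumed to be fair, and the seed is fixed only so that repeated runs give the same figures.

```python
import random
from fractions import Fraction

def classical_probability(favourable, total):
    """Classical probability m / N for equally likely cases."""
    return Fraction(favourable, total)

def relative_frequency_of_heads(tosses, seed=0):
    """Empirical estimate: share of heads in a simulated run of fair coin tosses."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

# Classical: an even number on one throw of a die has 3 favourable cases out of 6.
p_even = classical_probability(3, 6)
print("P(even number on a die) =", p_even)

# Empirical: the relative frequency settles near 0.5 as the number of tosses grows.
for n_tosses in (10, 100, 1000, 10000, 100000):
    print(n_tosses, relative_frequency_of_heads(n_tosses))
```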
13.4.3 Axiomatic Probability
It was introduced by A.N. Kolmogorov (1933). The probability of an event A in a sample space
S of a random experiment satisfies the following axioms:
P(A) ≥ 0
P(S) = 1
13.6 Binomial Distribution
Let a random experiment be performed repeatedly a fixed number of times, say n times or n
trials. The experiment has only two disjoint outcomes, say success and failure. All trials are
independent; the preceding or succeeding outcome of the trial has nothing to do with the current
outcome. Let the probability of success = p and that of failure be q = 1 − p; then the probability
of x successes in n trials is given by $p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$, where x = 0, 1, 2, . . . , n. The mean of
binomial distribution = np and variance of binomial distribution = npq, so that variance < mean.
While dealing with proportions having two disjoint cases like success or failure, pass or fail,
male or female, etc. binomial distribution is the most appropriate one to calculate probabilities.
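The binomial probabilities, and the statements about the mean and variance, can be verified numerically. The following is a minimal sketch; `binomial_pmf` is a hypothetical helper name and the values n = 10 and p = 0.3 are chosen arbitrarily for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.3  # hypothetical number of trials and probability of success
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                             # should be 1
mean = sum(x * px for x, px in enumerate(pmf))               # should equal n * p
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))  # should equal n * p * q

print(f"total={total:.6f}, mean={mean:.4f}, variance={variance:.4f}")
```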
13.7 Poisson Distribution
It was derived in 1837 by Simeon D. Poisson. It is a limiting case of binomial distribution when
n is very large (n → ∞), the probability of success p is very small (p → 0) and np = m is a definite
number. Under these conditions, Binomial distribution tends to Poisson distribution. The
probability function of Poisson distribution is given by:
$$P(X = x) = \frac{e^{-m} m^{x}}{x!}, \quad x = 0, 1, 2, \ldots \quad \text{and} \quad x! = x \cdot (x-1) \cdot (x-2) \cdots 2 \cdot 1.$$
Mean = variance = m for the Poisson distribution.
Here np = m, hence $p = \frac{m}{n}$ and $q = 1 - \frac{m}{n}$.
Example: Number of printing mistakes in a book, number of twins delivered in a hospital, etc.
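The limiting relationship between the two distributions can be seen by comparing their probabilities for a large n and a small p. This is a minimal sketch with hypothetical helper names; n = 1000 and p = 0.002 (so m = 2) are arbitrary illustrative values.

```python
import math

def poisson_pmf(x, m):
    """Poisson probability e^(-m) * m^x / x!."""
    return math.exp(-m) * m ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 1000, 0.002  # large n, small p (hypothetical values)
m = n * p           # np = m

# Largest gap between the two probability functions over x = 0, 1, ..., 10.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(11))

# The Poisson probabilities add up to one (the tail beyond x = 49 is negligible here).
poisson_total = sum(poisson_pmf(x, m) for x in range(50))

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, m), 5))
print("largest gap:", max_gap)
```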
13.8 Normal Distribution
• It is a continuous probability distribution.
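• First derived by Abraham de Moivre in 1733 and later developed by Carl Friedrich Gauss. The
probability density function (pdf) of ND is given by:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty.$$
• π ≈ 22/7 ≈ 3.14 and e ≈ 2.718.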
• Parameters of a normal distribution are μ = mean and σ = SD.
• ND has a bell-shaped curve and probability range with respect to mean ± SD (Figure 13.3).
• Measure of central tendency is a central representative value for the whole set of data that is
either a calculated value or a positional average. Mean, median and mode are the most widely
used measures of central tendency.
• AM of x1, x2, . . . , xn is denoted as $\bar{x} = \frac{\text{Sum of all values}}{n} = \frac{\sum x_i}{n}$.
• The median is the middle observation for a dataset sorted in ascending or descending order.
The median divides the data into a higher half and a lower half and hence is a positional
average.
• Mode is the most repeated value in a set of data.
• GM of x1, x2, . . . , xn is $GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$.
• HM of n numbers is the reciprocal of the arithmetic mean of the reciprocals of all the numbers:
$HM = \frac{1}{\frac{1}{n}\sum \frac{1}{x_i}}$.
• Measures of dispersion are also known as measures of variability. It is the measure of the
spread of the data around an average.
• Range is the difference between the highest value and lowest value in the dataset.
• Quartile deviation is half of the difference between Q3 and Q1 or QD = (Q3 − Q1)/2.
• Mean deviation is the mean of absolute deviations of the entire data with respect to arithme-
tic mean.
• Standard deviation gives a measure of the average of the deviations of the entire data in rela-
tion to the arithmetic mean. A low SD indicates that the values are close to the mean and a
large SD indicates that the values are spread around the mean.
• SD of a population is denoted as σ and that of a sample as S or s.
• The variance is calculated as $\text{Variance} = \frac{\sum (X_i - \bar{X})^2}{n}$, and Variance = SD².
• Quartiles are also positional averages when data is arranged in ascending or descending
order. There are three quartiles—Q1, Q2 and Q3. Q1 includes the first 25%, Q2 includes the
first 50% and Q3 covers 75% of the arranged data. Q2 and median are the same.
• Deciles are also positional averages when data is arranged in ascending or descending order.
There are nine deciles—D1, D2. . . , D9. D1 includes the first 10%, D2 includes the first 20%, . . .
and D9 includes the first 90% data.
• Percentiles are positional averages when data is arranged in ascending or descending order.
There are 99 percentiles—P1, P2. . . , P99.
• P1 includes the first 1%, P2 the first 2%. . . , P99 the first 99% of data. P50, D5, Q2 and median
are the same.
• Skewness measures the departure from normality of the distribution of the dataset.
• Kurtosis is a measure of the shape of the tail of the distribution in relation to its overall shape.
• Outliers are very large or very small values in a set of data, which can be ascertained
using whisker plots based on the first and third quartiles.
• Probability is associated with events which are the outcome of random experiments.
• A random event is a set of favourable outcomes out of all possible outcomes. Every event has
probability.
• When a random experiment has ‘N’ possible outcomes/cases out of which ‘m’ are
favourable to the event ‘A’, the probability of event A is denoted as P(A) and
$P(A) = \frac{\text{Number of cases favourable to A}}{\text{Total number of possible cases}} = \frac{m}{N}$.
• Binomial distribution: If probability of success = p and that of failure is q = 1 − p, then
the probability of x successes in n trials is given by $p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$, where x = 0, 1, 2, . . . , n.
The mean of binomial distribution = np and variance of binomial distribution = npq and
variance < mean.
• Poisson distribution: The probability function of Poisson distribution is given by
$P(X = x) = \frac{e^{-m} m^{x}}{x!}$; x = 0, 1, 2, …; mean = variance = m for the Poisson distribution.
• The probability density function of normal distribution is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$.
• The graph of the curve is bell-shaped; the curve is symmetrical at x = μ; for ND,
mean = median = mode = μ; area of the curve before mean = area of the curve after mean.
• For ND, coefficient of skewness (asymmetry) = 0; coefficient of kurtosis (flatness) = 0; the
theoretical range of X values is −∞ to +∞, practically range = 6σ; the maximum probability
density is at X = μ and is equal to $\frac{1}{\sigma\sqrt{2\pi}}$; and ND is unimodal.
• Pr(μ − σ ≤ X ≤ μ + σ) = 0.6827; Pr(μ − 2σ ≤ X ≤ μ + 2σ) = 0.9545; Pr(μ − 3σ ≤ X ≤ μ + 3σ) = 0.9973.
• Put $Z = \frac{X - \mu}{\sigma}$, then the distribution of Z is the standard normal distribution (SND), N(0, 1).
• Pr(−1 ≤ Z ≤ +1) = 0.6827; Pr(−2 ≤ Z ≤ +2) = 0.9545 and Pr(−3 ≤ Z ≤ +3) = 0.9973.
• The standardized Z values are used to solve problems of normal distribution with given mean
and SD.
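The use of standardized Z values can be sketched in a few lines. The example below evaluates the normal density and the probability of an interval through the standard normal cumulative distribution, written with the error function from Python's `math` module. The function names and the values μ = 50 and σ = 10 are hypothetical and serve only as an illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def standard_normal_cdf(z):
    """Pr(Z <= z) for the standard normal distribution N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_probability(lower, upper, mu, sigma):
    """Pr(lower <= X <= upper), obtained by standardizing both limits to Z values."""
    z_lower = (lower - mu) / sigma
    z_upper = (upper - mu) / sigma
    return standard_normal_cdf(z_upper) - standard_normal_cdf(z_lower)

mu, sigma = 50.0, 10.0  # hypothetical mean and SD

# Probability of falling within 1, 2 and 3 SD of the mean.
within = {k: normal_probability(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)}
peak = normal_pdf(mu, mu, sigma)  # the density is highest at x = mu

for k, prob in within.items():
    print(f"within {k} SD: {prob:.4f}")
print(f"density at the mean: {peak:.5f}")
```

Because the limits are standardized first, the same three probabilities are obtained for any choice of mean and SD.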
Suggested Readings
Agresti, A., Categorical data analysis, 2nd ed., John Wiley & Sons, New York, 2002.
Bourke, G. J., J. McGilvary, and L. E. Daly, Interpretation and uses of medical statistics, Blackwell Sci-
entific, London, 1985.
Daniel, W. W., Biostatistics: basic concepts and methodology for the health sciences, Wiley Publications,
John Wiley & Sons. Reprint by Wiley India (P) Ltd, New Delhi, 2010.
Das, R., and P. N. Das, Instant medical biostatistics, Ane Books, New Delhi, 2009.
Dunn, O. J., Basic statistics, Wiley, New York, 1984.
Fisher, R. A., Statistical methods for research workers, 14th ed., Hafner Publishing Company,
New York, 1973.
Gupta, S. C., Fundamentals of statistics, Himalaya Publishing House, Mumbai, 1981.
Gut, A., An intermediate course in probability, Springer-Verlag, New York, 1995.
Haberman, S., The analysis of frequency data, University of Chicago Press, Chicago, IL, 1974.
Hays, W. L., Statistics for the social sciences, 3rd ed., Holt, Rinehart, and Winston, New York, 1981.
Larsen, R. J., and M. L. Marx, An introduction to mathematical statistics and its applications, 2nd ed.,
Prentice-Hall, Englewood Cliffs, NJ, 1986.
Mahajan, B. K., and A. B. Khankal, Methods in biostatistics for medical students and research workers,
7th ed., Jaypee, New Delhi, 2010.
Muzumdar, R. D., A. P. Kulkarni, and J. P. Baride, Manual biostatistics, 1st ed., Jaypee Brothers, New
Delhi, 2003.
Rees, D. G., Essentials of statistics, Chapman & Hall, London, 1989.
Rice, J. A., Mathematical statistics and data analysis, 2nd ed., Duxbury, Belmont, CA, 1995.
Sharma, A. K., The textbooks of elementary statistics, Discovery Publishing House, Delhi, 2005.
Snedecor, G. W., and W. G. Cochran, Statistical methods, 7th ed., Iowa University Press, Ames, IA, 1980.
Woolson, R. F., Statistical methods for the analysis of biomedical data, Wiley, New York, 1987.
14 Correlation and Regression Analysis
14.1 Introduction
When we consider two or more variables at a time for their association or relationship, we
use techniques like correlation and regression analysis. In many of the research methods and
designs, we collect data on the stated dependent/outcome variable (Y) as well as on a number
of independent/explanatory variables (Xs). Often researchers may be interested in examining
the relationship of Y with X and also the extent of association among X explanatory variables.
14.2 Correlation Analysis
Correlation is a measure of the strength as well as the direction of association between
variables. Numerically, the value of the linear correlation coefficient, also known as the
Karl Pearson correlation coefficient, lies between −1 and +1. A value between −1 and zero
implies a negative correlation or association and a value between zero and +1 implies a positive
correlation. A correlation closer to one in absolute value indicates a strong association, and a
correlation closer to zero from either direction indicates a weak association. The correlation
between the dependent and independent variables, and also between theoretically important
independent variables, can be a meaningful quantitative technique to confirm the emerging results
of the research. Correlation can be positive or negative as well as linear or non-linear. The
magnitude of the linear correlation coefficient and the direction of correlation between variables
can throw light on the nature of association between variables.
In most of the study areas, there are many variables having an association of one with the
other. Correlation analysis of two or more variables helps to assess the strength as well as direc-
tion of association between the variables.
• Correlation can be linear or non-linear (Y vs. X associated in linear or non-linear form).
• Correlation can be simple (ryx) or multiple (rY.X1X2 . . . Xk) or partial (ry.X1/X2 . . . Xk).
Generally, the population correlation coefficient is denoted as ‘ρ’ and the sample correlation
coefficient is denoted as ‘r’.
DOI: 10.4324/9781003527183-14
(i) Graphical Method of Correlation analysis: Scatter diagram (Figure 14.1)
[Figure 14.1: scatter diagrams of Y against X showing perfect, high and low correlation, each
positive and negative, and no correlation]
$$r = \frac{\text{Cov}(x, y)}{SD_x \times SD_y} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$$
where $SD_x$ is the standard deviation of X, $SD_y$ is the standard deviation of Y,
$x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.
$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where d = difference between ranks of attributes for each unit of study and n = number
of pairs of observations.
(iv) Rank Correlation (Kendall’s Tau Correlation Coefficient)
It is a nonparametric approach to measuring the correlation coefficient using rank data. It
varies from −1 to +1. Zero means no relation, +1 means perfect agreement between the two
rankings and −1 means perfect disagreement. For example, two teachers assigned ranks to ten
students based on their knowledge, so that each student has two ranks, one from each teacher.
Arrange the students in the ascending order of rank assigned by teacher one.
Mention the rank given by the second teacher against the name of each student.
Compare every pair of students and check whether the two teachers place them in the same
order or in the opposite order.
Count the pairs ranked in the same order (A, concordant pairs) and the pairs ranked in the
opposite order (B, discordant pairs).
Kendall’s Tau r = (A − B) / (A + B), when there are no tied ranks.
The data must be in an equal number of pairs in order to calculate ‘r’.
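The product-moment and rank-difference formulas given earlier in this section can be sketched as follows. The function names and the data are hypothetical; the Spearman function assumes ranks without ties, since the rank-difference formula holds only in that case.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def spearman_r(ranks_a, ranks_b):
    """Spearman rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical paired observations.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r_pearson = pearson_r(x_values, y_values)

# Hypothetical ranks given to five students by two teachers.
teacher_one = [1, 2, 3, 4, 5]
teacher_two = [2, 1, 4, 3, 5]
r_spearman = spearman_r(teacher_one, teacher_two)

print(round(r_pearson, 4), round(r_spearman, 4))
```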
14.2.2 Properties of ‘r’
14.2.3 Partial Correlation
The partial correlation coefficient is the correlation between two variables keeping the effect of
other variables constant. Let there be three variables X1, X2 and X3. There can be three partial
correlations:
Based on a simple correlation r12 , r13 and r23 the partial correlation coefficient can be calculated:
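For three variables, the standard first-order formula expresses each partial correlation through the three simple correlations, for example $r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$, with $r_{13.2}$ and $r_{23.1}$ obtained by permuting the subscripts. A minimal sketch follows; the function name and the three simple correlations are hypothetical values chosen for illustration.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b with the effect of c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical simple correlations among X1, X2 and X3.
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_corr(r12, r13, r23)  # X1 and X2, holding X3 constant
r13_2 = partial_corr(r13, r12, r23)  # X1 and X3, holding X2 constant
r23_1 = partial_corr(r23, r12, r13)  # X2 and X3, holding X1 constant

print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

When the third variable is uncorrelated with the other two, the partial correlation reduces to the simple correlation, which is a quick sanity check on the formula.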
14.2.4 Multiple Correlation
The correlation of variable Y with joint effect of X1, X2, . . ., Xk is called multiple correlation
and is denoted as rY.X1X2 . . . Xk. It is the square root of the coefficient of determination in multiple
regression analysis (R2).
14.2.5 Non-linear Correlation
The correlation of Y with X can also be non-linear, in which case it is called curvilinear
correlation, depending upon the nature of the curve formed by the plotted data on the X-Y plane.
14.3 Regression Analysis
The statistical method of establishing functional relationships between variables is called
regression analysis.
Y = f ( x ) is a simple regression model with one independent variable X and a dependent vari-
able Y.
Y = a + bX is a simple linear regression where ‘a’ is the intercept of the line and ‘b’ is the slope
of the line; a and b are the two basic constants of the line. The corresponding regression
model is Y = a + bX + e where ‘e’ is the error term. The values of a and b are estimated
using a given set of data of X and Y using ordinary least squares (OLS) or other methods of
estimation. Various regression models are shown in Table 14.1.
Regression     Model
Linear         Y = a + bX
Exponential    Y = ab^X
Logarithmic    Y = a + b log X
Quadratic      Y = aX² + bX + c
The regression analysis means estimating functional relationships between the dependent
variable and independent variables. In regression analysis the relationship of type Y = f(X) is
established and used for forecasting and prediction purposes. The regression equation can be
linear or non-linear, simple or multiple type.
Table 14.2 shows various examples of dependent and independent variables in simple and
multiple regressions. The first three are examples of simple regression and the last two are
examples of multiple regression.
Table 14.2 Examples of Dependent and Independent Variables in Simple and Multiple Regressions
When the value of Y increases or decreases at a constant rate with a change in the value of X, the
regression of Y on X will be linear. When the value of Y increases at a higher rate with a change
in the value of X, the regression of Y on X will be exponential. When the value of Y increases
at a lower rate with a change in the value of X, the regression of Y on X will be log-linear.
However, when the value of Y initially increases with the increase in the value of X, and after
reaching a maximum the value of Y starts declining with increasing the value of X, the regression
of Y on X will be quadratic, with the coefficient of X² negative. Similarly, when the value of
Y initially decreases with the increase in the value of X, and after reaching a minimum the value
of Y starts increasing with increasing the value of X, the regression of Y on X will be quadratic,
with the coefficient of X² positive. These situations are shown in Figure 14.2.
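[Figure 14.2: curves of Y against X for the linear (Y = a + bX), exponential (Y = ab^X),
logarithmic and quadratic (Y = aX² + bX + c, when a is positive and when a is negative) functions]
[Figure: Forms of Line, showing lines of Y against X for positive, negative and zero values of the
intercept a, and for positive, negative, zero and infinite values of the slope b]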
Simple: y = f(x); in linear form, y = a + bx.
Multiple: y = f(x1, x2, …, xk); in linear form, y = a + b1x1 + b2x2 + … + bkxk.
Let Y = A + BX be the simple linear regression existing in the population as a line representing
the scatter of (X, Y) points on the plane. In other words, corresponding to each point Xi on the
X-axis, there is a Yi on the (XY) plane and also there is a point Ŷi on the regression line. The
difference between Yi and Ŷi is called the error and is denoted as ‘ui’ for the population and ei for
the corresponding sample, as shown in Figure 14.4.
Hence the actual relationship in the population is $Y_i = A + BX_i + u_i$ and that in the sample
is $y_i = a + bx_i + e_i$, where ‘a’ and ‘b’ are estimates of the population regression coefficients
A and B.
There are different methods for estimating the regression coefficients. The most widely used
method is the OLS method. Taking $e_i = Y_i - \hat{Y}_i$, the OLS method estimates A and B
as ‘a’ and ‘b’ by minimizing $\sum e_i^2$.
The OLS estimates of A and B have the desired properties of being best, linear and unbiased (BLU)
and hence are preferred over other methods of estimation.
14.3.5 Violation of Assumptions
$$r(e_t, e_{t-1}) = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sqrt{\sum_{t=2}^{n} e_t^2 \times \sum_{t=2}^{n} e_{t-1}^2}}.$$
Consequences of autocorrelation:
• Even when error terms are autocorrelated the estimated parameters of the model can be
unbiased.
• With autocorrelated error terms, the OLS estimate of the variance of the parameter is likely
to be more.
• The variance of the error term may be under-estimated.
• Prediction based on such estimates may be inefficient.
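The correlation between successive error terms can be computed directly from a series of residuals. The sketch below is a minimal illustration with a hypothetical function name and made-up residual series; it uses the uncentred form of the coefficient, which assumes the residuals average to zero, as OLS residuals do when the model includes an intercept.

```python
import math

def lag1_autocorrelation(residuals):
    """Correlation between e_t and e_(t-1), summing over t = 2, ..., n."""
    current = residuals[1:]    # e_2, ..., e_n
    previous = residuals[:-1]  # e_1, ..., e_(n-1)
    numerator = sum(a * b for a, b in zip(current, previous))
    denominator = math.sqrt(sum(a * a for a in current) * sum(b * b for b in previous))
    return numerator / denominator

# Hypothetical residual series.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]  # sign flips every period
persistent = [2.0, 2.0, 2.0, 2.0]          # identical errors period after period

print(lag1_autocorrelation(alternating))   # strongly negative autocorrelation
print(lag1_autocorrelation(persistent))    # strongly positive autocorrelation
```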
Consequence of multi-collinearity:
• When $r(X_i, X_j) = 1$, the parameters of the model become indeterminate ($\frac{0}{0}$).
• On the contrary, if $r(X_i, X_j) = 0$, then there is no need to perform multiple regression analysis.
Detection:
Klein criteria: If $r^2(X_i, X_j) \geq R^2_{Y.X_1, X_2, \ldots, X_k}$, then collinearity is harmful for the
coefficients of Xi and Xj.
The linear equation is of the form y = a + bx; for example, sales expressed as a function of
advertisement cost. The corresponding model is Y = a + bx + e, where e is the error term. The
error is the difference between the observed value of Y and the estimated value of Y, denoted
as Ŷ. Corresponding to each value of Y there will be a Ŷ.
That is, $\hat{Y}_i = \hat{a} + \hat{b}X_i$, where $\hat{a}$ and $\hat{b}$ are the estimated values of a and b
respectively. We can define $e_i$ as:
$e_i = Y_i - \hat{Y}_i$.
The values for ‘a’ and ‘b’ are estimated using the principle of ordinary least squares using the
above relationship.
$e_i = Y_i - \hat{Y}_i$, or $e_i^2 = (Y_i - \hat{Y}_i)^2$, or $e_i^2 = [Y_i - (\hat{a} + \hat{b}X_i)]^2$.
By applying mathematical conditions for the minimum value of $\sum e_i^2$, two equations in a and b
are derived and by solving these equations the values of a and b are worked out as:
$$b = \frac{\sum xy}{\sum x^2}, \qquad a = \bar{Y} - b\bar{X},$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
14.3.6.2 Validity of Model
$$R^2 = \frac{\text{Explained variation in Y}}{\text{Total variation in Y}} = \frac{b\sum xy}{\sum y^2},$$
which is the explanatory power of the model to explain variation in Y due to that of X, and
$0 \leq R^2 \leq 1$.
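The OLS estimates of a and b and the explanatory power R² for the simple linear model can be computed from sums of deviations. This is a minimal sketch: the function name is hypothetical, and the advertisement cost and sales figures are made-up numbers used only to show the arithmetic.

```python
def fit_simple_ols(xs, ys):
    """Return (a, b, r_squared) for Y = a + bX using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    b = sum_xy / sum_x2                 # slope
    a = mean_y - b * mean_x             # intercept
    r_squared = b * sum_xy / sum_y2     # explained variation / total variation
    return a, b, r_squared

# Hypothetical advertisement cost (X) and sales (Y).
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

a, b, r_squared = fit_simple_ols(advertising, sales)
residuals = [y - (a + b * x) for x, y in zip(advertising, sales)]

print(f"a={a:.2f}, b={b:.2f}, R2={r_squared:.2f}")
print("sum of residuals:", round(sum(residuals), 10))
```

The residuals of a fitted line with an intercept add up to zero, which serves as a simple check on the estimates.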
For $Y = a + b_1X_1 + b_2X_2 + \ldots + b_kX_k$ the coefficients are to be estimated for forecasting Y for
given X values. Here it is assumed that:
$Y = a + b_1x_1 + b_2x_2 + \ldots + b_kx_k$ is the multiple linear equation.
$Y = a\,x_1^{b_1} x_2^{b_2} \ldots x_k^{b_k}$ is the multiple non-linear equation.
$Y = a + b_1x_1 + b_2x_2 + \ldots + b_kx_k + e_i$ is the multiple linear regression model.
$Y = a\,x_1^{b_1} x_2^{b_2} \ldots x_k^{b_k}\, e_i$ is the multiple non-linear regression model.
a, b1, b2, . . . bk are estimated by minimizing $\sum e_i^2$ with respect to a, b1, b2, . . . bk.
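$$Y_i = a + b_1X_{1i} + b_2X_{2i} + e_i$$
$$e_i = Y_i - a - b_1X_{1i} - b_2X_{2i}$$
$$b_1 = \frac{\sum x_2^2 \sum x_1y - \sum x_1x_2 \sum x_2y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1x_2\right)^2}$$
$$b_2 = \frac{\sum x_1^2 \sum x_2y - \sum x_1x_2 \sum x_1y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1x_2\right)^2}$$
$$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2$$
$$R^2 = \frac{b_1\sum x_1y + b_2\sum x_2y}{\sum y^2}$$
Here the lower-case x1, x2 and y denote deviations of X1, X2 and Y from their respective means.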
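The two-regressor formulas can be tried on a small dataset in which Y is built exactly from known coefficients, so that the estimates can be checked by eye. This is a minimal sketch; the function name and the data are hypothetical, and the dataset is constructed as Y = 1 + 2·X1 + 3·X2 with no error term.

```python
def fit_two_regressor_ols(x1s, x2s, ys):
    """Return (a, b1, b2, r_squared) for Y = a + b1*X1 + b2*X2 in deviation form."""
    n = len(ys)
    mean_1 = sum(x1s) / n
    mean_2 = sum(x2s) / n
    mean_y = sum(ys) / n
    d1 = [v - mean_1 for v in x1s]
    d2 = [v - mean_2 for v in x2s]
    dy = [v - mean_y for v in ys]
    s11 = sum(u * u for u in d1)               # sum of x1^2
    s22 = sum(u * u for u in d2)               # sum of x2^2
    s12 = sum(u * v for u, v in zip(d1, d2))   # sum of x1*x2
    s1y = sum(u * v for u, v in zip(d1, dy))   # sum of x1*y
    s2y = sum(u * v for u, v in zip(d2, dy))   # sum of x2*y
    syy = sum(u * u for u in dy)               # sum of y^2
    denominator = s11 * s22 - s12 ** 2         # zero under perfect collinearity
    b1 = (s22 * s1y - s12 * s2y) / denominator
    b2 = (s11 * s2y - s12 * s1y) / denominator
    a = mean_y - b1 * mean_1 - b2 * mean_2
    r_squared = (b1 * s1y + b2 * s2y) / syy
    return a, b1, b2, r_squared

# Hypothetical data constructed as Y = 1 + 2*X1 + 3*X2 without error.
x1_values = [1, 2, 3, 4, 5]
x2_values = [2, 1, 4, 3, 6]
y_values = [1 + 2 * u + 3 * v for u, v in zip(x1_values, x2_values)]

a, b1, b2, r_squared = fit_two_regressor_ols(x1_values, x2_values, y_values)
print(round(a, 6), round(b1, 6), round(b2, 6), round(r_squared, 6))
```

The common denominator becomes zero when X1 and X2 are perfectly correlated, which is the indeterminate case noted under multi-collinearity.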
Test of Significance
The estimated coefficients a, b1 and b2 are to be tested for their significance through a t-test.
$$t_a = \frac{\hat{a}}{\text{SE of } \hat{a}}, \qquad t_{b_1} = \frac{b_1}{\text{SE of } b_1}, \qquad t_{b_2} = \frac{b_2}{\text{SE of } b_2}$$
14.3.8 ANOVA in Regression
ANOVA is applied in regression for testing the overall significance of regression (R2). It is also
useful to assess the improvement in fit due to additional explanatory variables.
Equality of regression coefficients from different samples can also be assessed from ANOVA.
Format of ANOVA is given in Table 14.3.
Source of variation SS DF MS F
14.3.9 Logistic Regression
Regression analysis is a complex and powerful statistical technique for the prediction of a
dependent variable for any independent/explanatory variable based on an estimated relationship
using known values of dependent and independent/explanatory variables. In normal regression
analysis Y = F(X), the dependent variable is quantitative type (discrete or continuous), and the
independent variables can be discrete, continuous, ordinal or dichotomous/nominal. Regres-
sion analysis enables us to predict the dependent variable and to know the rate of change of the
dependent variable due to a unit change in the independent/explanatory variable or the elasticity
coefficient of Y with respect to X. When we have to use qualitative/categorical variable as an
independent variable in regression analysis, we use the technique of dummy variable (x = 0 or 1).
Stepwise regression is one of the effective ways to select the most important independent vari-
ables out of a set of available independent variables. The forward or backward selection of
independent variables is also possible in such situations.
Logistic regression is a special type of regression used when the dependent variable (Y) is
non-metric (binary/dichotomous) in nature and independent variables are discrete, dichotomous
or continuous in nature. Here, Y = 1 or 0 for the regression Y = F(X) . The outcome variable
can be cured or not cured, smoker or non-smoker, male or female, etc. Logistic regression is a
technique in which the dependent/outcome variable is dichotomous.
Let us consider the simple linear regression model of the following type:
Y = a + bX + e , (1)
where a is the intercept, b is the slope and e is the error term. The parameters ‘a’ and ‘b’ are
estimated using the ordinary least squares technique.
We know that at the means of X and Y, e = 0.
That is,
$$\bar{Y} = a + b\bar{X}. \qquad (2)$$
When Y is dichotomous, it takes the value one or zero, and the mean of Y lies between
zero and one.
Let p = Pr(Y = 1) be the probability of success. The ratio $\frac{p}{1-p}$ can take values between
0 and ∞, and $\ln\frac{p}{1-p}$ can take values between −∞ and +∞.
Hence
$$\ln\frac{p}{1-p} = a + bX. \qquad (3)$$
Equation (3) above is called the logistic regression model and the transformation of p to
$\ln\frac{p}{1-p}$ is called the logit transformation. Equation (3) can also be written as
$$p = \frac{\exp(a + bX)}{1 + \exp(a + bX)}, \qquad (4)$$
where exp is the inverse or antilog of the natural logarithm.
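The relation between the logit in Equation (3) and the probability in Equation (4) can be checked numerically. The following is a minimal sketch; the coefficient values a = −1.5 and b = 0.8 and the function names are hypothetical and chosen only for illustration.

```python
import math

def logit(p):
    """Logit transformation Ln(p / (1 - p)) for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def predicted_probability(a, b, x):
    """Equation (4): p = exp(a + bX) / (1 + exp(a + bX))."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical intercept and slope, not estimated from any data in this chapter.
a, b = -1.5, 0.8

for x in (0.0, 1.0, 2.0, 5.0):
    p = predicted_probability(a, b, x)
    # Applying the logit to p recovers the linear predictor a + bX of Equation (3).
    print(f"X = {x:3.1f}  p = {p:.4f}  logit(p) = {logit(p):.4f}  a + bX = {a + b * x:.4f}")
```

Whatever the value of X, the predicted p stays between 0 and 1, while its logit is linear in X.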
The logistic regression model is widely used in health science research. Epidemiologists may
use a logistic regression model to assess the probability or risk that a person will acquire a dis-
ease during some specified time interval during which he or she is exposed to risk factors related
to that disease.
The simplest case for the application of logistic regression is when both dependent and inde-
pendent variables are dichotomous. The values of the dependent/outcome variable (Y) indicate
whether or not the person acquired a disease. The values of the independent variable (X) indi-
cate the presence or absence of risk factors in the person. Hence both dependent and independ-
ent variables take the values of one or zero based on yes or no situations.
The data in such a situation can be presented in a 2 × 2 contingency table as given in Table 14.4.
Table 14.4 2 × 2 Contingency Table
The analysis of these data can be done through the odds ratio. According to the theory of
probability, the odds for success is the ratio of the probability of success to the probability of failure.
The odds ratio is a measure of how much greater or less the odds of experiencing a particular
outcome are for subjects possessing the risk factor than for subjects without it. The value of the
odds ratio reflects how many times higher the odds of the case/disease are among those exposed
to the risk factor as compared to those not exposed to the risk factor.
The computer package performing logistic regression generally provides the following:
Here the odds ratio = exp(b) which gives the number of times the exposure to the risk factor can
cause the case or disease as compared to those not exposed to the risk factor.
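When both variables are dichotomous, the odds ratio can be computed directly from the four cell counts of the 2 × 2 table, and its natural logarithm is the slope b that a logistic regression of Y on X would give. The sketch below assumes hypothetical cell counts and argument names; they are not taken from Table 14.4.

```python
import math

def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the disease among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed subjects acquired the disease.
or_value = odds_ratio(30, 70, 10, 90)

# Slope on the logit scale, so that exp(b) gives back the odds ratio.
b = math.log(or_value)

print(f"odds ratio = {or_value:.3f}, b = ln(odds ratio) = {b:.3f}, exp(b) = {math.exp(b):.3f}")
```

An odds ratio above 1 indicates higher odds of the disease among the exposed, and a value below 1 indicates lower odds.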
Here the dependent variable is dichotomous and the independent variable is continuous. Assum-
ing that the logistic regression is worked out using computer software, the computer output will
be similar to that shown previously.
The estimated equation will be $y_i = \hat{a} + \hat{b}\,x_i$, where $y_i = \ln\frac{\hat{p}}{1-\hat{p}}$ and $\hat{p}$ is the predicted
probability of having the risk for the value $x_i$ of the continuous variable.
$$P = \frac{\exp(a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k)}{1 + \exp(a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k)}.$$
The previous cases 1, 2 and 3 were for dichotomous dependent or outcome variables. There may
be cases where the response/outcome variable may have a situation with multiple categories.
For example, we may have outcomes as positive, negative and undetermined. In cases like the
study of body mass index (BMI) the outcome may be underweight, ideal weight, overweight
or obese. Such a situation is called an ordinal polytomous outcome or response. It is possible
to have computer software-based logistic regression analysis for such cases also, though it is
slightly complex.
• Correlation analysis of two or more variables is to assess the strength as well as direction of
association between the variables.
• Methods of studying correlation include the scatter diagram method, Karl Pearson coefficient
of correlation and rank correlation.
• Scatter diagram is the graphical method of correlation analysis.
• Karl Pearson coefficient of correlation is calculated as $r = \frac{\operatorname{Cov}(X, Y)}{SD_X \times SD_Y}$.
• The data of X and Y must be in an equal number of pairs. The value of r lies between −1
and +1.
• The correlation coefficient between two variables can be positive or negative depending
upon the sign of r. If X and Y are independent, then r = 0. If X and Y are perfectly correlated,
then r = +1 or −1. The correlation can be perfect, high or low depending upon the magnitude
of r.
• The partial correlation coefficient is the correlation between two variables keeping the effect
of other variables constant.
• The statistical method of establishing functional relationships between variables is called
regression analysis. Y = f(X) is a simple regression model with one independent variable X
and a dependent variable Y. The multiple regression model is of the type Y = F(X1, X2, …, Xk)
with k independent variables.
• Regression models can be linear or non-linear type, and the regression models also take
forms accordingly.
• The error term in the regression model stands for the difference between the observed value and
the corresponding estimated value on the regression line/curve: $e_i = Y_i - \hat{Y}_i$.
• The error term is assumed to follow a normal distribution with ‘0’ mean and constant vari-
ance. The error term at different values of explanatory variables is assumed to be independent.
• The explanatory variables (X1, X2, . . ., Xk) are assumed not to have high correlation for OLS
application.
• Ordinary least squares (OLS) method is that method wherein the parameters of the regression
model are estimated by minimizing $\sum e_i^2$.
• The OLS estimates have the desired properties of BLU (best, linear and unbiased) and hence
preferred over other methods of estimation.
• The violation of assumptions about error term and explanatory variables in the OLS method
leads to consequences like heteroscedasticity (violation of constant variance), autocorrela-
tion (violation of independent error term) and multi-collinearity (violation of not having a
high linear correlation between explanatory variables) which affect the test of significance of
estimated parameters.
• The validity of the model depends on the coefficient of determination (R²), where
$R^2 = \frac{\text{Explained variation in } Y}{\text{Total variation in } Y}$, which is the power of the model to explain variation in Y due
to that of X, and $0 \le R^2 \le 1$.
• Test of Significance of the estimated coefficients of the regression model is done using a
t-test.
• ANOVA is applied in regression for testing (i) overall significance of regression (R2), (ii)
improvement in fit due to additional explanatory variable, (iii) equality of regression coef-
ficients from different samples.
• Logistic regression is a special type of regression used when the dependent variable (Y) is
non-metric (binary/dichotomous) in nature and independent variables are discrete, dichoto-
mous or continuous in nature.
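Several of the points summarized above (the Karl Pearson coefficient, the OLS estimates, the error term and R²) can be tied together in a small numerical sketch. The five data pairs and the function names are hypothetical and serve only to illustrate the formulas.

```python
import math

def pearson_r(x, y):
    """Karl Pearson coefficient r = Cov(X, Y) / (SD_X * SD_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    # The common divisor n cancels between the covariance and the two standard deviations.
    return sxy / math.sqrt(sxx * syy)

def ols_simple(x, y):
    """OLS estimates (a, b) of Y = a + bX, obtained by minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
a, b = ols_simple(x, y)

# Error terms e_i = Y_i - estimated Y_i, and R2 = explained variation / total variation.
errors = [yi - (a + b * xi) for xi, yi in zip(x, y)]
my = sum(y) / len(y)
total = sum((yi - my) ** 2 for yi in y)
r2 = 1 - sum(e ** 2 for e in errors) / total

print(f"r = {r:.4f}, a = {a:.2f}, b = {b:.2f}, R2 = {r2:.2f}")
```

In simple linear regression R² equals the square of r, which the output of this sketch confirms.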
15 Statistical Inference—Parametric and
Non-Parametric Tests
15.1 Statistical Inference
The process of making a decision about population parameters based on evidence given by
a sample is called statistical inference. It has two parts. The first is the estimation of population
parameters based on sample data, and the second is the testing of the significance of sample
estimates for population parameters. The common assumptions for estimation and testing include
normality of population distributions, equality of variances and independence of samples.
There are two types of testing: parametric and non-parametric. The definitions of important
terms in parametric tests are:
Population: Aggregate of all units for a study having the focused features of the study is called
a population.
Sample: A representative part of the population, normally selected randomly from the population.
Parameter: The statistical constants of the population are called parameters. For example,
population mean (μ), population variance (σ²) or standard deviation (σ), population
correlation coefficient (ρ), etc.
Statistic: The statistical constants of the sample, like sample mean ($\bar{X}$), variance (S² or s²)
and correlation coefficient (r), are called statistics. These are the estimates of population
parameters.
Test of Significance/Testing of Hypothesis: The inductive inference of deciding about popula-
tion parameters based on sample statistics involves an element of risk—the risk in making
wrong decisions. Minimization of such risk using the theory of probability is possible. Mak-
ing decisions about population parameters using sample statistics by minimizing the risk is
called the test of significance/testing of hypothesis. The theory of the test of significance was
initiated by J. Neyman and E.S. Pearson.
Research Hypothesis: A research hypothesis is a statement of expectation or prediction of out-
come or results through the conduct of research. It is in the form of a research question and
its expected results. Normally, the research hypothesis resembles the alternative statistical
hypothesis.
Statistical Hypothesis: A statistical hypothesis is a statement about population parameters
(may or may not be true) which is tested based on evidence from a sample using the theory
of probability. Normally it relates to the magnitude or relationship of population parameters.
Null Hypothesis (H0): Null hypothesis is a hypothesis which is tested under the assumption that
it is true. It is generally the hypothesis of no difference. It is drawn based on a neutral or null
attitude by the researcher or decision maker.
Alternative Hypothesis (H1): It is that hypothesis which is accepted if the null hypothesis is
rejected.
DOI: 10.4324/9781003527183-15
Simple Hypothesis: If the hypothesis completely specifies the population, it is simple.
H0 : µ = µ 0
Composite Hypothesis: If the hypothesis does not completely specify the population, then it is
a composite hypothesis.
H1 : µ <µ 0 ; µ > µ 0 ; µ ≠ µ 0
Types of Errors: When we make inferences about population parameters based on evidence
given by the sample, two types of errors are possible (Table 15.1).
First, the sample evidence may lead to rejection of a null hypothesis, when it is actually true. It
is called Type-I error.
Second, the sample evidence may lead to acceptance of a null hypothesis, when it is actually
false. It is called Type-II error.
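Actual situation | Reject H0 | Accept H0
H0 is true | Type-I error | Correct decision
H0 is false | Correct decision | Type-II error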
Test Criteria: In a real-world situation, a Type-II error is more serious than a Type-I error. Both
the errors cannot be reduced simultaneously. The usual approach is therefore to fix the size of the
Type-I error at a low value and to devise test criteria which minimize the Type-II error.
Level of Significance: The size of the Type-I error, that is, P(rejecting H0 | H0 is true) = α.
It is the probability of rejecting a true null hypothesis. It is generally kept at 5% (α = .05) or 1%
(α = .01).
One-sided and Two-sided Tests: Depending upon the nature of the alternative hypothesis, a
test can be one-sided or two-sided. If the alternative hypothesis is ‘not equal to’, the rejection
region falls in both tails of the curve; if it is ‘greater than’ or ‘less than’, the rejection region
falls in the right tail or the left tail of the curve respectively, as shown (Figure 15.1 to Figure 15.3).
Figure 15.2 Acceptance and Rejection Region for One-Sided Left Test
Figure 15.3 Acceptance and Rejection Region for One-Sided Right Test
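The boundaries of the rejection regions for a Z statistic at a chosen level of significance can be obtained from the standard normal distribution. The following sketch uses Python's standard library; the function name and the labels "two-sided", "less" and "greater" are illustrative assumptions, not notation from this chapter.

```python
from statistics import NormalDist

def critical_values(alpha, alternative):
    """Return (lower, upper) critical values of Z; None means no rejection region on that side."""
    z = NormalDist()  # standard normal distribution N(0, 1)
    if alternative == "two-sided":   # H1: parameter is not equal to the hypothesized value
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if alternative == "less":        # H1: parameter is less than the hypothesized value
        return (z.inv_cdf(alpha), None)
    if alternative == "greater":     # H1: parameter is greater than the hypothesized value
        return (None, z.inv_cdf(1 - alpha))
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")

for alt in ("two-sided", "less", "greater"):
    print(alt, critical_values(0.05, alt))
```

At the 5% level the two-sided test splits α equally between the two tails, whereas a one-sided test places the whole of α in a single tail, so its critical value is smaller in magnitude.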
15.2 Sampling Distribution
Let the size of the population be N and that of the sample be n. Then there can be $\binom{N}{n} = k$
samples. Let t1, t2, . . ., tk be the values of the sample statistic ‘t’. The set of values of ‘t’ is called the
sampling distribution of the statistic ‘t’. Like any other variable, ‘t’ can also have a mean and a variance.
$$\text{Mean}(t) = E(t) = \bar{t}$$
$$\text{Var}(t) = \frac{1}{k}\sum (t - \bar{t})^2$$
The standard deviation of the sampling distribution of a statistic is called the standard error (S.E.):
$$\text{S.E.}(t) = \sqrt{\frac{1}{k}\sum (t - \bar{t})^2}.$$
Precision: The precision of a sample estimate for a population parameter is the reciprocal of
the S.E. of the estimate.
$$\text{Precision of estimate } t = \frac{1}{\text{S.E.}(t)}.$$
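For a very small population the sampling distribution can be listed in full. The sketch below enumerates every possible sample of size n = 2 from a hypothetical population of N = 5 values, takes the sample mean as the statistic ‘t’, and computes its mean, variance, standard error and precision as defined above.

```python
from itertools import combinations
from math import comb, sqrt

# Hypothetical population of N = 5 values and sample size n = 2.
population = [2, 4, 6, 8, 10]
n = 2

# All possible samples drawn without replacement: k = NCn of them.
samples = list(combinations(population, n))
k = len(samples)

# The statistic t is the sample mean; its k values form the sampling distribution.
t_values = [sum(s) / n for s in samples]

t_bar = sum(t_values) / k                              # Mean(t) = E(t)
var_t = sum((t - t_bar) ** 2 for t in t_values) / k    # Var(t)
se_t = sqrt(var_t)                                     # standard error of t
precision = 1 / se_t                                   # precision of the estimate

print(f"k = {k} (NCn = {comb(len(population), n)})")
print(f"Mean(t) = {t_bar}, Var(t) = {var_t}, S.E.(t) = {se_t:.4f}, precision = {precision:.4f}")
```

Here the mean of the sampling distribution coincides with the population mean of 6, which illustrates that the sample mean is an unbiased estimate.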
Let X1, X2, . . ., Xn be a random sample of size n from a normal population N(μ, σ). The
sample mean $\bar{X}$ is then taken as an estimate of the population mean μ.
According to the central limit theorem, $Z = \frac{t - E(t)}{S.E.(t)}$ will follow N(0, 1).
15.2.3 Sampling Distribution of Proportion
Let A and B be two qualitative attributes and the entire population units (N) fall in either of these
two attributes. Let there be ‘a’ units in A and ‘b’ units in B. That is, a + b = N or b = N − a .
The proportion ‘Pa’ possessing attribute A = a/N.
The proportion ‘Pb’ possessing attribute B = b/N = (N − a)/N.
Pa + Pb = 1 or Pb = 1 − Pa.
15.4.1 Point Estimation
15.4.2 Interval Estimation
To allow for the likely deviation of a point estimate from the population parameter, an interval
is formed with the point estimate at the centre and with a lower limit and an upper limit on
either side, based on the standard error and the tabled value of the corresponding test statistic
(t or Z), so that there is a fixed probability for the population parameter to lie within those
limits. Let c1 and c2 be the limits of the interval; then {c1 < μ < c2} is the confidence interval for μ.
If ‘t’ is the statistic used to estimate μ, then the 100(1 − α) per cent confidence interval for μ is
given by [t ± z_{α/2} SE(t)], where z_{α/2} is the tabled value of Z for a two-sided interval (for
example, 1.96 for α = .05).
Hence $Z = \frac{p - E(p)}{S.E.(p)}$ will follow N(0, 1),
that is, $Z = \frac{p - P}{\sqrt{PQ/n}}$ will follow N(0, 1).
Hence the 95% confidence interval for P is $p \pm 1.96\sqrt{pq/n}$.
$\bar{X}_1$ and $\bar{X}_2$ are the sample means and s1 and s2 are the sample standard deviations based on two
independent samples from two normal populations with means μ1 and μ2 and standard
deviations σ1 and σ2.
$$\text{95\% confidence interval for } \mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm 1.96\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$\text{99\% confidence interval for } \mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm 2.58\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Let p1 and p2 be the sample proportions based on two independent samples from two populations
with proportions P1 and P2.
$$\text{Hence the 95\% confidence interval for } P_1 - P_2 = p_1 - p_2 \pm 1.96\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
$$\text{and the 99\% confidence interval for } P_1 - P_2 = p_1 - p_2 \pm 2.58\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
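The interval formulas for the difference of two means and of two proportions can be sketched in a few lines. The function names and the numerical inputs below are hypothetical; the multiplier is taken from the standard normal distribution, so it is 1.96 for 95% and about 2.58 for 99%, as in the formulas above.

```python
from math import sqrt
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided tabled value of Z, for example 1.96 for a confidence level of 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def diff_means_ci(mean1, s1, n1, mean2, s2, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 from two independent samples."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    d = mean1 - mean2
    z = z_multiplier(confidence)
    return d - z * se, d + z * se

def diff_props_ci(p1, n1, p2, n2, confidence=0.95):
    """Large-sample confidence interval for P1 - P2 from two independent samples."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    z = z_multiplier(confidence)
    return d - z * se, d + z * se

# Hypothetical summary figures from two independent samples.
print(diff_means_ci(52, 8, 64, 48, 6, 36))      # means 52 and 48
print(diff_props_ci(0.60, 100, 0.50, 100))      # proportions 0.60 and 0.50
```

If the interval for a difference includes zero, the sample does not provide evidence of a difference at the corresponding level of significance.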
15.6 Parametric Tests
Parametric tests can be a large sample test (sample size n > 30) or a small sample test (n < 30).
The various parametric tests used for testing of hypothesis are:
(i) Standard normal distribution test (SND test)
(ii) Student’s t-test
(iii) Pearson’s Chi-square test for goodness-of-fit
(iv) F-test
15.6.1 Standard Normal Distribution Test (SND/Z Test)
• When X follows N(μ, σ), then $Z = \frac{X - \mu}{\sigma}$ follows N(0, 1).
• It is used for a large sample (n > 30).
• Used to test significance of single mean, single proportion, difference of two proportions,
difference of two means, difference of two standard deviations.
• The test statistic is $Z = \frac{t - E(t)}{SE(t)} \sim N(0, 1)$, where ‘t’ is the sample estimate.
(i) Sample is drawn from a normally distributed population with known variance (σ2)
(ii) Sample is drawn from a normally distributed population with unknown variance
(ii) Sampling from normally distributed populations with unknown but equal variances
H0: μ1 = μ2; H1: μ1 ≠ μ2. Here the test statistic is
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}.$$
(iii) Sampling from normally distributed populations with unknown and unequal
variances
• If X1, X2, . . ., Xn is a random sample of size n from N(μ, σ), then Student’s t is
defined as $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, which follows Student’s t distribution with n − 1 degrees of freedom,
• where $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$.
• It is used for a small sample (n < 30) when the population from which the sample is drawn is
normal, the sample observations are random and the population variance is unknown and is
suitably estimated using the sample variance.
• Used to test significance of single mean, difference of two means, difference of two standard
deviations.
• Also used to test significance of sample correlation coefficient, estimated sample regression
coefficients.
• Paired t-test is used when before and after means of the same sample are to be tested for
equality.
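The one-sample Student’s t statistic defined above can be computed as follows. This is a minimal sketch with a hypothetical sample and a hypothetical value μ0 under the null hypothesis; the calculated t is then compared with the tabled t value for n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation S, computed with divisor n - 1
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical small sample and hypothesized mean.
sample = [10, 12, 9, 11, 13]
t, df = one_sample_t(sample, mu0=10)
print(f"t = {t:.4f} with {df} degrees of freedom")
```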
15.6.4 F-Test (ANOVA)
• If X follows χ² with n1 degrees of freedom and Y follows χ² with n2 degrees of freedom, and
X and Y are independent, then the F statistic is defined as $F = \frac{X/n_1}{Y/n_2}$, which follows
G.W. Snedecor’s F distribution with (n1, n2) degrees of freedom.
• Used to test equality of two population variances.
• Test significance of an observed sample multiple correlation.
• Test significance of observed sample correlation ratio.
• Test linearity of regression.
• Test equality of three or more population means.
The selection of a statistical test depends on many factors, like the nature and type of the parent
distribution (normal or non-normal), whether the sample is random or purposive, the type of
parameters being tested (mean values or proportions), the study hypothesis and other assumptions
made by the researcher about the population under study. The available statistical tests are broadly
classified as parametric and non-parametric tests. A summary of commonly used parametric
tests is given in Table 15.2.
Table 15.2 Common Parametric Test Statistic for Different Situations/Purposes
The parametric tests make assumptions about the parameters of the population distribution(s)
from which the sample data are drawn. The non-parametric tests do not make any such
assumptions about the parent population. However, non-parametric statistical tests are relatively
less powerful than parametric tests. For example, a parametric correlation uses information
about the mean and deviation from the mean, while a non-parametric correlation uses only
the ordinal position of pairs of scores. When tests are not concerned with population parameters
or when we do not have knowledge about the sampled populations (distribution-free), we
apply non-parametric tests. The terms non-parametric and distribution-free are used
interchangeably here. Non-parametric tests can also be used when the data are in the form of
ranks or frequencies.
When to use non-parametric tests
• The inferences drawn from the parametric tests such as Z, t and F may be seriously affected
when the parent population’s distribution is not normal.
• The adverse effect could be more when sample size is small.
• Thus, whenever there is doubt about the distribution of the parent population, the
non-parametric tests should be used.
• In many situations, particularly in social and behavioral sciences, observations are difficult
or impossible to take on numerical scales and a suitable non-parametric test based on ordinal
values is an alternative under such situations.
Certain assumptions are made with most non-parametric statistical tests, but these are fewer
and weaker than those of parametric tests.
A summary of commonly used non-parametric tests according to the number of samples,
population parameter tested, and null and alternative hypotheses is given in Table 15.3.
While applying non-parametric tests, it must be kept in mind that these tests are less powerful
compared to parametric tests. However, when either the conditions for applying parametric tests
are not fulfilled or the nature of the data does not permit the application of parametric tests, the
option left is non-parametric tests.
The situation-wise non-parametric tests, based on the number of samples dealt with by the
researcher and the nature of the samples as dependent or independent, are listed below.
• Relationship between variables: Chi-square test, Spearman rank correlation test, binomial
test, run test, one sample Kolmogorov-Smirnov test.
• Test for difference between groups
Table 15.3 Summary of Commonly Used Non-Parametric Tests
S. No. | Name of Test | No. of Samples | Population Parameter Tested/Situation/Assumptions | Null and Alternative Hypotheses | Test Criterion/Test Statistic
1. | Sign Test | 01 (one set of ordinal data) | Median of rank of population. Used when the normality assumptions of the t-test are not fulfilled or the data is in rank form. | H0: M = M0; H1: M ≠ M0, M > M0 or M < M0 | Number of +ve and −ve signs for deviations of observed data from median = M0. The test statistic is a sufficiently small number of observed +ve or −ve signs or both, according to the nature of the alternative hypothesis. The p-value is calculated using the binomial distribution for x number of −ve or +ve signs in a sample of size n.
2. | Sign Test | 02 | Median of rank of X and rank of Y. | H0: Mx = My | Every matched score for Y is subtracted from that of X and the
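The one-sample sign test described in Table 15.3 can be sketched as follows: the deviations from the hypothesized median M0 are reduced to +ve and −ve signs, and the p-value is taken from the binomial distribution with probability 1/2. The data and the function name are hypothetical, and dropping observations equal to M0 is an assumption of this sketch (a common convention).

```python
from math import comb

def sign_test_two_sided(data, m0):
    """Return (+ve count, -ve count, two-sided p-value) for H0: M = m0 against H1: M != m0."""
    pos = sum(1 for x in data if x > m0)
    neg = sum(1 for x in data if x < m0)
    n = pos + neg                      # observations equal to m0 are left out (assumption)
    x = min(pos, neg)                  # the sufficiently small number of signs
    # P(X <= x) when X follows the binomial distribution with n trials and p = 1/2.
    tail = sum(comb(n, i) for i in range(x + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)

# Hypothetical observations and hypothesized median.
data = [12, 15, 11, 18, 20, 14, 17, 19, 16, 13]
pos, neg, p_value = sign_test_two_sided(data, m0=12.5)
print(f"+ve signs = {pos}, -ve signs = {neg}, two-sided p-value = {p_value:.4f}")
```

For a one-sided alternative, only the tail probability for the sign that H1 expects to be rare would be used, without doubling.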
16 Multivariate Statistical Techniques
16.1 Introduction
In most univariate analyses, a single variable which follows a normal distribution is generally
assumed. When many variables or measurements on each unit of study are considered
simultaneously, it forms a case for multivariate analysis. In agriculture, a farming system study may
include variables related to the farm family (family size, number of male members/female
members, children, etc.); farm animals (number of cows, buffaloes, goats, draught animals, etc.);
farm area (farm area, irrigated area, cropped area, etc.); farm household economy (income,
consumption expenditure, family savings, etc.) and so on. Each of these variables may follow
a normal distribution individually. In medical and health research, a set of observations
on a patient related to anthropometry (age, height, weight, etc.); haematology (blood pressure,
haemoglobin counts, platelet counts, etc.); and other biochemical, pathological and microbiological
measurements are considered for each patient as components of the random vector. Instead of a
random variable as in the univariate approach, a random vector in the form of a column matrix is
the basis of the multivariate approach. Multivariate analysis is also possible in other areas of
research like business, marketing, sociology, psychology and economics, where a set of
observations on each unit of study is considered as a random vector.
In multivariate analysis, a set of measurements on each unit of the study is taken and treated
as a vector. When similar observations are available for a given number of units of study and all
these observations are simultaneously considered for analysis, it forms the case of multivariate
analysis. In the cases stated previously, the given set of data of an agricultural household or a
patient is treated as a vector (a column matrix). If the vector has all the characteristics of a
random variable, it is called a random vector and denoted as [X] = [X1, X2, X3, . . ., Xp]T,
where T stands for the transpose. Similar to the normal distribution in the one-variable case of
univariate analysis, there is the multivariate normal distribution in the multivariate case:
$$f(x) = \frac{|\Sigma|^{-1/2}}{(2\pi)^{p/2}}\, e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}.$$
Here, μ is the mean vector of the individual variables X1, X2, . . ., Xp and Ʃ is the variance-covariance
matrix of the components of the vector [X]. Thus μ is a column matrix of the mean values of the
individual components, and Ʃ is a ‘p × p’ square matrix whose diagonal elements are the variances
of X1, X2, . . ., Xp and whose non-diagonal elements are the covariances of Xi and Xj, i ≠ j.
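For p = 2 the multivariate normal density can be evaluated by hand, since the determinant and inverse of a 2 × 2 variance-covariance matrix are easy to write out. The following sketch is restricted to this bivariate case; the function name and the numerical values are hypothetical.

```python
from math import exp, pi

def bivariate_normal_pdf(x, mu, sigma):
    """Multivariate normal density for p = 2, with sigma a 2 x 2 variance-covariance matrix."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21                      # determinant of Sigma
    inv = [[s22 / det, -s12 / det],
           [-s21 / det, s11 / det]]                  # inverse of Sigma
    d = [x[0] - mu[0], x[1] - mu[1]]                 # deviation vector (X - mu)
    # Quadratic form (X - mu)^T Sigma^{-1} (X - mu)
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    p = 2
    return det ** -0.5 / (2 * pi) ** (p / 2) * exp(-0.5 * quad)

# Hypothetical mean vector and variance-covariance matrix.
mu = [0.0, 0.0]
sigma = [[1.0, 0.5],
         [0.5, 2.0]]
print(bivariate_normal_pdf([0.5, -1.0], mu, sigma))
```

When the covariance is zero, this density reduces to the product of two univariate normal densities, which gives a simple check on the formula.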
DOI: 10.4324/9781003527183-16
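If there are p components in the vector variable X, then X = [X1, X2, . . ., Xp]T is the
multivariate random vector with population mean vector μ = [μ1, μ2, . . ., μp]T, and the population
variance-covariance matrix Ʃ is:
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}.$$
The corresponding sample mean vector is $\bar{X} = [\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p]^T$ and the sample
variance-covariance matrix is
$$S = \begin{pmatrix} S_{11} & S_{12} & \cdots & S_{1p} \\ \vdots & \vdots & & \vdots \\ S_{p1} & S_{p2} & \cdots & S_{pp} \end{pmatrix}.$$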
The variance-covariance matrix is a symmetric matrix meaning all Sij = Sji. In other words,
S12 = S21, S13 = S31. When households from rural and urban areas or patients with and without
disease are to be compared using multivariate analysis, two separate multivariate normal distri-
butions are assumed.
The multivariate analysis has a wide range of applications in research in areas like agricul-
ture, health and medical sciences, business, commerce, economics, social sciences and many
others. The major type of multivariate analysis includes the following:
The decision to accept or reject the null hypothesis is taken by comparing the calculated F value
with the tabled F value corresponding to the given degrees of freedom (p, n1 + n2 − p − 1).
16.3 Discriminant Analysis
It is a multivariate technique used to classify units of a multivariate population into defined classes
based on available multivariate measurements on subjects falling in these defined classes of the
said population. There should be at least two classes into which the units of the population under
study are classified, such as passing or failing an examination; being cured or not cured of a
disease after treatment; earning a profit or incurring a loss in a business and so on. For simplicity,
let us assume a two-class situation for which the dataset is available separately. These two classes
are assumed to be separate populations. The discriminant function and the rule for discrimination
can be worked out based on the mean vectors and variance-covariance matrices of the defined
classes. Once the function and the rule for classification are fixed, any new subject from the joint
population can be assigned in advance to one of the classes, prior to the actual realization of
the outcome, for example after treatment.
Let us assume that:
(i) (X1) and (X2) are the random vectors representing the units from the two groups as two
separate multivariate normal distributions with p common components of measurements.
Group I as (X1): (x11, x12, . . ., x1p)T with population mean (μ1) = (μ11, μ12, . . ., μ1p)T
Group II as (X2): (x21, x22, . . ., x2p)T with population mean (μ2) = (μ21, μ22, . . ., μ2p)T
(ii) Let the size of Group I with vector (X1) be n1 and for Group II with vector (X2) be n2.
(iii) The multivariate random vectors of the populations are assumed to have a common
variance-covariance matrix $(\sigma_{ij})$ which is estimated as $(S_{ij})$.
(iv) Compute the sample mean vectors of (X1) and (X2) as $(\bar{X}_1)$ and $(\bar{X}_2)$ respectively.
(v) Compute the inverse of the pooled variance-covariance matrix $(S_{ij})$ as $(S^{ij})$.
(vi) Compute the difference of the two mean vectors as $(D_j) = (\bar{X}_1 - \bar{X}_2)$.
(vii) The discriminant function is a linear function of the form $L = l_1 x_1 + l_2 x_2 + \dots + l_p x_p$,
where $l_i = \sum_j S^{ij} D_j$.
(viii) The grouping of the new units is made in Group I or Group II using the value of the func-
tion ‘L’ such that:
{X | L ≥ C} means the unit will fall in Group I
{X | L < C} means the unit will fall in Group II
where $C = \frac{1}{2}\left(\bar{X}_1^T (S^{ij}) \bar{X}_1 - \bar{X}_2^T (S^{ij}) \bar{X}_2\right) - \ln\left(P_1 / P_2\right)$, and
P1 (= n1/n) and P2 (= n2/n) are probabilities of units falling in Group I and Group II
respectively.
(ix) The statistical significance of the discriminant function can be tested using the F statistic as
$$F(p,\, n_1 + n_2 - p - 1) = \frac{n_1 + n_2 - p - 1}{p} \times \frac{n_1 n_2}{(n_1 + n_2)(n_1 + n_2 - 2)} \times M_p^2, \quad \text{where } M_p^2 = \sum_i l_i D_i,$$
and li and Di are as per (vii) and (vi) respectively.
(x) The discriminant function can be validated by examining the errors in classification of
known cases of n1 and n2 respectively.
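The steps above can be sketched for the simplest case of two groups and p = 2 measurements, where the inverse of the pooled variance-covariance matrix can be written out directly. The data, function names and group labels are hypothetical. The threshold is taken in the form C = ½(X̄1ᵀ S⁻¹ X̄1 − X̄2ᵀ S⁻¹ X̄2) − ln(P1/P2), which is the form that follows from two normal populations with a common variance-covariance matrix; this choice is an assumption of the sketch.

```python
from math import log

def mean_vector(rows):
    """Sample mean vector of a group with two measurements per unit."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """Sums of squares and cross-products about the group mean."""
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) for j in range(2)]
            for i in range(2)]

def fit_discriminant(g1, g2):
    """Return coefficients l and threshold C of the linear discriminant function."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean_vector(g1), mean_vector(g2)
    a1, a2 = scatter(g1, m1), scatter(g2, m2)
    # Pooled variance-covariance matrix and its inverse (2 x 2 case)
    s = [[(a1[i][j] + a2[i][j]) / (n1 + n2 - 2) for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    s_inv = [[s[1][1] / det, -s[0][1] / det],
             [-s[1][0] / det, s[0][0] / det]]
    d = [m1[j] - m2[j] for j in range(2)]                    # difference of mean vectors
    l = [sum(s_inv[i][j] * d[j] for j in range(2)) for i in range(2)]

    def quad(m):
        return sum(m[i] * s_inv[i][j] * m[j] for i in range(2) for j in range(2))

    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)
    c = (quad(m1) - quad(m2)) / 2 - log(p1 / p2)             # assumed form of the threshold
    return l, c

def classify(x, l, c):
    """Assign a new unit to Group I if L >= C, otherwise to Group II."""
    score = l[0] * x[0] + l[1] * x[1]
    return "Group I" if score >= c else "Group II"

# Hypothetical measurements on units already known to belong to each group.
group1 = [(6, 7), (7, 8), (8, 7), (7, 6)]
group2 = [(2, 3), (3, 4), (4, 3), (3, 2)]
l, c = fit_discriminant(group1, group2)
print(l, c, classify((6, 6), l, c), classify((2, 2), l, c))
```

Reclassifying the known units of both groups with the fitted rule gives the classification errors mentioned in step (x).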
The principal component analysis can be a solution in such cases. Instead of trying individual
variables, a set of linear combinations of the ‘m’ independent variables is used to establish the
relationship between dependent and independent variables. Such linear combinations are mutually
uncorrelated and are chosen so that each has the maximum possible variance.
Let ‘A’ be the (symmetric) correlation matrix of the ‘m’ explanatory variables and I be an identity matrix of size ‘m × m’.
The solutions of the determinantal equation $|A - \hat{b} I| = 0$ are called the characteristic roots, latent roots or eigenvalues, where $\hat{b}$ is a scalar, so that $\hat{b} I$ is a scalar matrix of size ‘m × m’. The equation is a polynomial of degree ‘m’ in $\hat{b}$ and gives ‘m’ values of $\hat{b}$. Let $\hat{b}_1, \hat{b}_2, \dots, \hat{b}_m$ be the characteristic roots arranged in descending order of size. Corresponding to each characteristic root (say $\hat{b}_j$), a characteristic vector or eigenvector ($\hat{a}_j$) of size m × 1 can be obtained by solving the following equation:
$$(A - \hat{b}_j I)\, \hat{a}_j = 0$$
where ‘A’ is the m × m matrix stated above, $\hat{b}_j$ is the scalar, I is an m × m identity matrix, $\hat{a}_j$ is the m × 1 column vector of unknown values corresponding to $\hat{b}_j$, and 0 is the m × 1 column vector of zeros.
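Now, normalize $\hat{a}_j$ so that $\hat{a}_j' \hat{a}_j = 1$.
After estimating $\hat{a}_j$, the principal component corresponding to the jth characteristic root ($\hat{b}_j$) is defined as the linear combination of the (standardized) explanatory variables weighted by the elements of $\hat{a}_j$:
$$PC_j = \hat{a}_j' X = \hat{a}_{1j} X_1 + \hat{a}_{2j} X_2 + \dots + \hat{a}_{mj} X_m$$
The ratio of $\hat{b}_j$ to $\sum_j \hat{b}_j$ is the proportion of total variation in all explanatory variables which is accounted for by the jth principal component.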
In practice, only a few principal components are finally selected according to their magnitude
which account for most of the variability in explanatory variables.
The correlations of each of these principal components with each of the individual explanatory variables are then calculated to find the explanatory variables most closely associated with each major principal component. These explanatory variables can be used in the regression analysis in place of all available explanatory variables. Hence principal component analysis can be used to select the most contributing explanatory variables in regression analysis, especially when the sample size is small and the number of explanatory variables is large.
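The steps above can be traced by hand in the smallest case, m = 2, where the correlation matrix is [[1, r], [r, 1]] and the characteristic equation has the closed-form roots 1 + |r| and 1 − |r|. The sketch below is only an illustration under that assumption; the data and the function names `correlation` and `pca_two_variables` are hypothetical and not taken from the text. For larger m a numerical eigenvalue routine would be needed.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pca_two_variables(x1, x2):
    """PCA on the 2 x 2 correlation matrix A = [[1, r], [r, 1]].

    |A - b*I| = 0 reduces to (1 - b)^2 - r^2 = 0, so b = 1 + |r| or 1 - |r|.
    """
    r = correlation(x1, x2)
    roots = [1 + abs(r), 1 - abs(r)]            # characteristic roots, descending
    s = 1.0 if r >= 0 else -1.0
    # Characteristic vectors, normalised so that a'a = 1.
    vectors = [(1 / math.sqrt(2), s / math.sqrt(2)),
               (1 / math.sqrt(2), -s / math.sqrt(2))]
    shares = [b / sum(roots) for b in roots]    # b_j / sum of all b_j
    return r, roots, vectors, shares


# Hypothetical observations on two explanatory variables.
x1 = [2, 4, 6, 8, 10]
x2 = [1, 3, 2, 5, 4]
r, roots, vectors, shares = pca_two_variables(x1, x2)
print(r, roots, shares)
```

With these illustrative data r = 0.8, so the first principal component accounts for about 90 per cent of the total variation of the two standardized variables.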
For a regression model $Y = f(X_1, X_2, \dots, X_m)$, when the sample size ‘n’ is low compared to ‘m’, the number of independent/explanatory variables (that is, n < m), or when there is a high degree of multi-collinearity among X1, X2, . . ., Xm, linear combinations of X1, X2, . . ., Xm are generated, which are called the principal components.
16.5 Canonical Correlation
It is a flexible multivariate technique which simultaneously correlates several independent and several dependent variables. Independent variables are metric, such as sales, measurements, etc., but nonmetric categorical variables can also be used; the method has very few restrictions.
In multiple linear regression of the type $Y = f(X_1, X_2, \dots, X_p)$, a linear combination of the vector of explanatory variables (X) is related to a single dependent variable Y so that the multiple correlation $R_{Y \cdot X_1 X_2 \dots X_p}$ is maximum.
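Canonical correlation is the correlation between two vectors of variables $Y = (Y_1, Y_2, \dots, Y_p)^T$ and $X = (X_1, X_2, \dots, X_p)^T$. Consider the linear combinations $Z_1 = a'Y$ and $Z_2 = b'X$, and let ρ denote the correlation between them.
We have to choose ‘a’ and ‘b’ in such a way that ρ is maximum; this maximum ρ is called the first canonical correlation between Y and X. Here Z1 and Z2 are called the first set of canonical variables associated with Y and X.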
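The canonical correlations and variables can be obtained by solving the constrained maximization problem:
$$\text{Maximize } \rho = \frac{a' \Sigma_{12} b}{\sqrt{(a' \Sigma_{11} a)(b' \Sigma_{22} b)}} \quad \text{subject to } a' \Sigma_{11} a = b' \Sigma_{22} b = 1$$
where $\Sigma_{11}$ and $\Sigma_{22}$ are the variance-covariance matrices of Y and X, and $\Sigma_{12} = \Sigma_{21}'$ is the matrix of covariances between them.
It can be shown that there are p canonical correlations and p canonical variables, one for each root of the equation:
$$\begin{vmatrix} -\lambda \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\lambda \Sigma_{22} \end{vmatrix} = 0$$
Here λ is called the Lagrange multiplier.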
Generally, out of these p roots, the highest one is chosen.
For estimation of canonical correlation or canonical variables, one has to estimate the
variance-covariance matrix or the correlation matrix of the variables.
16.7 Factor Analysis
Factor Analysis is a multivariate technique to reduce the ‘p’ dimensions of measurements of each object into a smaller number of factors using the original variance-covariance matrix.
Let X be a ‘p’ dimensional multivariate random variable, that is, $X = (X_1, X_2, \dots, X_p)'$, with E(X) = μ and variance-covariance matrix Σ.
A linear structure for each of the random variables Xi is assumed:
$$X_i = a_{i1} Y_1 + a_{i2} Y_2 + \dots + a_{im} Y_m + Z_i, \quad i = 1, 2, \dots, p$$
Here, Y1, Y2, . . ., Ym are hypothetical unobservable random variables which are common to the linear structure for each Xi.
Zi is a hypothetical unobservable random variable specific to each Xi.
aij (i = 1, 2, . . ., p and j = 1, 2, . . ., m) are the coefficients of the random variables Yj.
All random variables Y1, Y2, . . ., Ym appear in the linear model for each Xi, and Yj is the jth common factor, while:
‘m’ is called the complexity of the factor structure, as a small value of m implies less complexity;
Z1, Z2, . . ., Zp are called specific or unique factors;
aij are called the factor loadings of Xi on the common factor Yj.
Factor analysis is a technique in which ‘k’ unobservable factors F1, F2, . . ., Fk are generated out
of ‘p’ number of observable variables (Y1, Y2, . . ., Yp) of interest in a study.
As a simple case, let there be three variables Y1, Y2 and Y3 as marks obtained in three subjects
(science, mathematics and English). Data of five students whose marks obtained in science,
mathematics and English out of 10 (maximum marks) are available in Table 16.1.
Table 16.1
Student   Science (Y1)   Mathematics (Y2)   English (Y3)
1         3              6                  5
2         7              3                  3
3         10             9                  8
4         3              9                  7
5         10             6                  5
These marks are assumed to be functions of two factors, F1 (mathematical ability) and F2 (verbal ability).
Here each Y variable can be assumed to be linearly related to these two factors:
$$Y_i = B_{i0} + B_{i1} F_1 + B_{i2} F_2 + e_i, \quad i = 1, 2, 3$$
As the relationships are not exact, the errors e1, e2 and e3 are included in the above relationships.
The parameters Bij appearing in the relationships are called factor loadings of the variables of study.
Based on the assumptions that the error terms ei are independent of one another and that the unobservable factors Fj are independent of each other and of the error terms, the factor loadings can be worked out from the variance-covariance matrix of the given data.
Factor analysis is used to reduce a large number of variables to a small set of factors. It is an interdependence technique with no dependent variable. The variables are assumed to be normal and continuous. Generally, three to five variables load onto a factor. Sample size should be greater than 50, with a minimum of five observations per variable. High multi-collinearity between variables, as revealed by the correlation matrix of the variables, is assumed. The predictability of every variable by all other variables is assessed using Kaiser’s measure of sampling adequacy (MSA); MSA > 0.8 is good and MSA < 0.5 is poor.
There are two methods of factor analysis: (i) common factor analysis (CFA) and (ii) principal component analysis (PCA). CFA extracts factors based on the variance shared by the variables and is used to look for latent factors. PCA extracts factors based on the total variance of the variables, to find a small number of factors that explain the maximum variance. The first factor extracted explains the maximum variance, and factors are extracted as long as their eigenvalues are greater than 1.0. Factor loadings are the correlations between the factors and the variables. An orthogonal rotation assumes no correlation between factors, and a factor loading greater than 0.4 is required to attribute a specific variable to a factor.
16.8 Cluster Analysis
Cluster Analysis is a technique by which ‘n’ objects under study are grouped into homogenous
clusters based on similarities of ‘p’ measurable characteristics on each subject. Hence clustering
is the grouping of objects according to homogeneity of multivariate characteristics. Clustering is
empirically done based on similarity. There are a large number of methods for clustering and the
objective of all methods is to form clusters in such a way that objects within a cluster are more
or less homogenous and objects between clusters are heterogeneous.
The clustering problem as a multivariate statistical technique can be better stated as the grouping of ‘n’ objects (1, 2, . . ., n) into ‘k’ clusters (1, 2, . . ., k) which are homogenous with respect to ‘p’ measurements (X1, X2, . . ., Xp) on each object. Given the objects and measurements, similarity is assessed using a distance measure between each pair of objects, say d(x, y) as the distance between objects x and y. The smaller the value of d(x, y), the higher the similarity.
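As a rough illustration of distance-based grouping, the sketch below uses the Euclidean distance for d(x, y) and a simple single-linkage agglomerative rule, which repeatedly merges the two closest clusters until ‘k’ clusters remain. Both choices are assumptions made for the example, since the text does not prescribe a particular distance or method. The five objects are the students' marks from Table 16.1, reused here purely for convenience, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance d(x, y) between two objects with p measurements."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cluster_distance(objects, c1, c2):
    """Single linkage: distance between the closest members of two clusters."""
    return min(euclidean(objects[i], objects[j]) for i in c1 for j in c2)


def single_linkage(objects, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(objects))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(
                objects, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]          # b > a, so index a is unaffected
    return sorted(clusters)


# Marks of five students (science, mathematics, English) from Table 16.1.
marks = [(3, 6, 5), (7, 3, 3), (10, 9, 8), (3, 9, 7), (10, 6, 5)]
groups = single_linkage(marks, k=2)
print(groups)   # indices are zero-based: student 1 is index 0
```

For these data the two clusters are students 1 and 4 in one group and students 2, 3 and 5 in the other. A different distance measure or linkage rule could give a different grouping.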
It is used to reduce a large dataset to meaningful subgroups of objects based on the similarity of objects across a set of specified characteristics. Major problems arise from outliers and from too many irrelevant variables. The sample should be representative of the population, and it is desirable to have uncorrelated factors.
There are three clustering methods: (i) hierarchical, with a tree-like process, (ii) non-hierarchical, with a priori specification of the number of clusters and (iii) a combination of (i) and (ii).
Four rules apply in developing clusters: clusters should be different, reachable, measurable and profitable. (Cluster analysis is mostly done for market segmentation.)
The conceptual part of some of the major multivariate techniques is introduced here. The techniques of multivariate analysis are matrix-based, and procedures for them are available in SPSS and other software packages. Researchers aiming for system analysis based on multiple observations on each of the study units can apply the multivariate techniques.
• Multivariate statistical techniques are applied when multiple observations on each sample study unit are available in the form of a random vector, which is a column matrix of the observations of the study unit, that is, $[X] = [X_1, X_2, \dots, X_p]^T$.
• Similar to the univariate normal distribution, there is a multivariate normal distribution with a mean vector (of size p × 1) and a variance-covariance matrix (of size p × p) for the population as well as for the corresponding sample. The sample mean vector and variance-covariance matrix of a random sample are considered unbiased estimates of the population mean vector and variance-covariance matrix respectively.
• Like the t-test for equality of two population means in the univariate case, the equality of mean vectors of two multivariate normal populations can be tested using Hotelling’s T², assuming a common variance-covariance matrix for the two multivariate populations.
• Discriminant analysis is a multivariate technique used to classify new units of a population
in defined classes based on available multivariate measurements on subjects fallen in these
defined classes of the said population. There should be at least two classes to classify the
units of the population under study. The discriminant function and the rule for discrimina-
tion can be worked out based on the mean vectors and variance-covariance matrices for the
defined classes.
• Principal component analysis uses linear combinations of independent variables to model the relationship between dependent and independent variables. Such linear combinations are mutually uncorrelated, and each in turn accounts for the maximum possible share of the remaining variance. It is used when there is a high linear correlation among explanatory or independent variables, or when the number of observations (sample size) is not large compared to the number of explanatory variables.
• Canonical correlation is the correlation between two vectors of variables $Y = (Y_1, Y_2, \dots, Y_p)^T$ and $X = (X_1, X_2, \dots, X_p)^T$.
• Multivariate analysis of variance (MANOVA) is used in experimental design to assess the relationship between many categorical independent variables and two or more metric dependent variables. It examines the dependence relationship between a set of dependent variables across a set of groups.
• Factor analysis is a multivariate technique to reduce the ‘p’ dimensions of measurements of
each object into a smaller number of factors using the original variance-covariance matrix.
• Cluster analysis is a technique by which ‘n’ objects under study are grouped into homog-
enous clusters based on similarities of ‘p’ measurable characteristics on each subject.
Hence clustering is grouping of objects according to homogeneity of multivariate
characteristics.
17 Some Other Quantitative Techniques
in Research
17.1 Kappa Statistics
It is a measure of inter-observer/inter-method agreement in the interpretation of situations such as the consistency of two or more doctors’ diagnoses from medical test results, the consistency of judges’ ratings of participants’ performance, or similar events. Inferences based on the findings of physical examinations, radiographic interpretations or other lab tests by physicians/medical observers involve some degree of subjective interpretation. Similar variation exists in the judgements of two or more experts for cultural events, etc. Because agreement or disagreement between observers can sometimes arise simply by chance, statistical measures are available to assess the extent of agreement between two or more such observers. The Kappa statistic or Kappa coefficient is a commonly used measure for this purpose. The Kappa coefficient lies between −1 and +1. A value of +1 means perfect agreement among observers, a value of zero means the agreement is attributable simply to chance, and a value of −1 means complete disagreement among observers. Inter-observer agreement measures precision; the Kappa statistic therefore measures the magnitude of agreement between observers and is also a measure of the reliability of the test.
The opinions of two international tourists on whether they would choose to revisit each of 100 cities visited in India during their first visit are summarized in Table 17.1.
Here, (a) and (d) represent the numbers of cities on which the two tourists agree about the choice of revisit, while (b) and (c) indicate the numbers of cities on which the two tourists disagree. If there were no disagreement, (b) and (c) would be zero (and the observed agreement p0 = 1); if there were no agreement, (a) and (d) would be zero (and the observed agreement p0 = 0).
Table 17.1 Frequency of Usefulness of the Lectures Assessed by the Two Residents
Revisit No revisit
DOI: 10.4324/9781003527183-17
Table 17.2 Interpretation of Kappa Values
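For a 2 × 2 agreement table with cells (a), (b), (c) and (d), the usual form of Cohen's Kappa compares the observed agreement p0 = (a + d)/n with the agreement pe expected by chance from the marginal totals: Kappa = (p0 − pe)/(1 − pe). The sketch below applies that standard formula. The cell counts are hypothetical and are not the values of Table 17.1, and the function name `cohen_kappa` is chosen only for illustration.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's Kappa for a 2 x 2 agreement table.

    a: both observers say 'revisit'      b: observer 1 'revisit', observer 2 'no revisit'
    c: observer 1 'no revisit', observer 2 'revisit'      d: both say 'no revisit'
    """
    n = a + b + c + d
    p0 = (a + d) / n                                        # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    return (p0 - pe) / (1 - pe)


# Hypothetical counts for 100 cities (not the values of Table 17.1).
kappa = cohen_kappa(a=40, b=10, c=20, d=30)
print(round(kappa, 3))   # 0.4
```

Here p0 = 0.70 and pe = 0.50, so Kappa = 0.40, although the two observers agree on 70 of the 100 cities.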
17.2 Composite Index
The statistical procedure for estimation of the composite index was developed by Prem Narain et al. to classify districts based on social development indicators. The procedure is summarized below, taking states as the units:
Let [Xij] denote the data matrix of development-related indicators of states; i = 1, 2, . . ., n states and j = 1, 2, . . ., k indicators.
As the indicators in [Xij] are in different units of measurement, they cannot be added as such to get the required composite index. Hence the [Xij] are transformed to [Zij]:
$$Z_{ij} = \frac{X_{ij} - \bar{X}_j}{S_j}$$
Where,
$\bar{X}_j$ = mean of jth indicator
$S_j$ = standard deviation of jth indicator across states
$[Z_{ij}]$ = the matrix of standardized indicators.
From [Zij], identify the best value of each indicator. In the case of positive (pushing factors) indicators, the best value can be the maximum state value and in the case of negative (pulling factors) indicators, it can be the minimum state value, depending upon the direction of the impact of the indicator on the level of development. Let Z0j denote the best value (maximum for a positive indicator and minimum for a negative one) of the jth indicator. In order to get the pattern of development, first calculate Pij, where
$$P_{ij} = (Z_{ij} - Z_{0j})^2$$
The pattern of development Ci is given by:
$$C_i = \left[ \sum_{j=1}^{k} \frac{P_{ij}}{CV_j} \right]^{1/2}$$
Where,
CVj is the coefficient of variation of the jth indicator in matrix [Xij].
The composite index Di is given by
$$D_i = \frac{C_i}{C}$$
Where,
$$C = \bar{C} + 3\,SD$$
$$\text{Mean of } C_i = \bar{C} = \frac{\sum_{i=1}^{n} C_i}{n}$$
$$\text{Standard deviation of } C_i = SD = \sqrt{\frac{\sum_{i=1}^{n} (C_i - \bar{C})^2}{n}}$$
A smaller value of Di indicates a high level of development and a higher value of Di indicates a low level of development, as the composite indices are calculated from the deviations of the standardized indicators from the ideal standardized state value (maximum/minimum).
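The whole procedure can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the standard deviations use the divisor n, the coefficient of variation is taken as the ratio SD/mean (expressing it as a percentage would rescale every Ci by the same factor and leave Di unchanged), and the data for four states and two indicators are hypothetical. The function name `composite_index` is likewise only illustrative.

```python
import math
from statistics import mean, pstdev


def composite_index(X, positive):
    """Composite index D_i for each row (state) of the data matrix X.

    X        : list of rows, one row of k indicator values per state
    positive : list of k booleans, True where a higher value is better
    """
    n, k = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(k)]
    means = [mean(col) for col in cols]
    sds = [pstdev(col) for col in cols]              # divisor n (assumption)
    cvs = [s / m for s, m in zip(sds, means)]        # CV as a ratio (assumption)

    # Standardised indicators Z_ij and the best value Z_0j of each indicator.
    Z = [[(X[i][j] - means[j]) / sds[j] for j in range(k)] for i in range(n)]
    best = [max(Z[i][j] for i in range(n)) if positive[j]
            else min(Z[i][j] for i in range(n)) for j in range(k)]

    # Pattern of development C_i and the reference value C = mean + 3 SD.
    C = [math.sqrt(sum((Z[i][j] - best[j]) ** 2 / cvs[j] for j in range(k)))
         for i in range(n)]
    C_ref = mean(C) + 3 * pstdev(C)
    return [c / C_ref for c in C]


# Hypothetical data: literacy rate (positive) and infant mortality (negative).
data = [[90, 10], [70, 30], [60, 40], [80, 20]]
D = composite_index(data, positive=[True, False])
print([round(d, 3) for d in D])
```

The first state has the best value on both indicators, so its index is zero; the third state is furthest from the best values and gets the largest index.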
$Y = a + bt$ (linear)
$Y = ab^{t}$ (exponential)
$Y = a + bt + ct^{2}$ (quadratic)
where Y and t are as above; a, b and c are coefficients of the quadratic equation. Depending on whether the sign of c is positive or negative, the trend curve is U-shaped or inverted U-shaped.
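The linear and exponential forms can both be estimated by ordinary least squares, the exponential one after taking logarithms (log Y = log a + t log b). The following sketch assumes that approach; the function names and the two short series are hypothetical. The quadratic form needs a multiple regression on t and t² and is not covered here.

```python
import math


def fit_linear(t, y):
    """Least-squares estimates (a, b) for Y = a + b*t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    b = (sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
         / sum((ti - mt) ** 2 for ti in t))
    a = my - b * mt
    return a, b


def fit_exponential(t, y):
    """Estimates (a, b) for Y = a * b**t, fitted as a linear trend in log Y."""
    log_a, log_b = fit_linear(t, [math.log(v) for v in y])
    return math.exp(log_a), math.exp(log_b)


# Hypothetical series over five periods.
t = [1, 2, 3, 4, 5]
a_lin, b_lin = fit_linear(t, [5, 7, 9, 11, 13])       # exactly Y = 3 + 2t
a_exp, b_exp = fit_exponential(t, [6, 12, 24, 48, 96])  # exactly Y = 3 * 2**t
print(a_lin, b_lin, a_exp, b_exp)
```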
17.4 Production Function
Production function in Economics is a physical relationship between the output and the inputs of any production system. In agriculture, the output of any crop (physical production of a production unit) is a function of many inputs (seed, fertilizer, labour, irrigation water, plant protection (PP) chemicals, etc.) used in the production process. If Q stands for output and X1, X2, . . ., Xk stand for inputs, then the generally used linear and Cobb-Douglas production functions are of the following forms.
$Q = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$ is the linear production function, and b1, b2, . . ., bk are the marginal physical products of X1, X2, . . ., Xk with respect to output Q.
$Q = a X_1^{b_1} X_2^{b_2} \dots X_k^{b_k}$ is the Cobb-Douglas production function, and b1, b2, . . ., bk are the elasticities of output Q with respect to inputs X1, X2, . . ., Xk.
Using the concept of the output elasticity of an input,
$$\text{Elasticity} = \frac{\text{Marginal Physical Product}}{\text{Average Physical Product}} = \frac{dQ/dX}{Q/X},$$
the elasticity can be worked out from the estimated linear function and the marginal products can be worked out from the Cobb-Douglas production function.
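Both conversions follow directly from the elasticity ratio. For the linear function the marginal product of Xi is bi, so the elasticity at a given input level is bi·Xi/Q; for the Cobb-Douglas function the elasticity is bi, so the marginal product is bi·Q/Xi. A small sketch with hypothetical coefficients and input levels (the function names are illustrative only):

```python
def linear_output(a, b, x):
    """Q = a + b1*X1 + ... + bk*Xk."""
    return a + sum(bi * xi for bi, xi in zip(b, x))


def linear_elasticity(a, b, x, i):
    """Elasticity of input i in the linear function: MPP / APP = b_i * X_i / Q."""
    return b[i] * x[i] / linear_output(a, b, x)


def cobb_douglas_output(a, b, x):
    """Q = a * X1**b1 * ... * Xk**bk."""
    q = a
    for bi, xi in zip(b, x):
        q *= xi ** bi
    return q


def cobb_douglas_mpp(a, b, x, i):
    """Marginal physical product of input i: elasticity * APP = b_i * Q / X_i."""
    return b[i] * cobb_douglas_output(a, b, x) / x[i]


# Hypothetical coefficients and input levels.
e1 = linear_elasticity(a=10, b=[2, 3], x=[5, 10], i=0)      # 2*5/50 = 0.2
mpp1 = cobb_douglas_mpp(a=2, b=[0.5, 0.5], x=[4, 9], i=0)    # 0.5*12/4 = 1.5
print(e1, mpp1)
```

Unlike the constant Cobb-Douglas elasticities, the elasticity from a linear function changes with the input level at which it is evaluated, so it is usually reported at the mean input levels.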
Most data, especially economic data, show a positive trend for pushing factors and a negative trend for pulling factors. In other words, positive variables may have an increasing trend and negative variables a decreasing trend. Hence the consistency of time series data is better assessed after eliminating the trend effect from the original data. Using the estimated trend equation, we can calculate the trend value for each observation in the time series. If we assume an additive model, we subtract the trend values from the original data; if we assume a multiplicative model, we divide the original data by the corresponding trend values. Let Y* denote the de-trended values of the original data series Y; then the CV of the de-trended values is:
$$CV(Y^*) = \frac{SD(Y^*)}{AM(Y^*)} \times 100$$
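A short sketch of the calculation, assuming the trend values have already been estimated and that the standard deviation uses the divisor n. The series, the trend values and the function name `detrended_cv` are hypothetical. Note that with an additive model and a least-squares trend the de-trended values average to about zero, which makes the ratio unstable, so the example uses the multiplicative form.

```python
from statistics import mean, pstdev


def detrended_cv(y, trend, model="multiplicative"):
    """CV (%) of the de-trended series Y* = Y - T (additive) or Y / T (multiplicative)."""
    if model == "additive":
        y_star = [yi - ti for yi, ti in zip(y, trend)]
    elif model == "multiplicative":
        y_star = [yi / ti for yi, ti in zip(y, trend)]
    else:
        raise ValueError("model must be 'additive' or 'multiplicative'")
    am = mean(y_star)
    if am == 0:
        raise ValueError("mean of de-trended values is zero; CV is undefined")
    return pstdev(y_star) / am * 100


# Hypothetical observed values and their estimated trend values.
y = [10, 22, 27, 44]
trend = [10, 20, 30, 40]
cv = detrended_cv(y, trend)
print(round(cv, 2))   # 8.09
```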
17.6 Linear Programming
Linear programming is a technique for obtaining optimum values (maximum or minimum) subject to certain constraints or conditions which limit the values the variables can take. The term linear implies that the relationships involved in the problem must be linear (first-degree form of variables). Decisions on maximization of profit, minimization of cost, optimum allocation of resources and many similar issues can use linear programming when they are to be taken subject to linear conditions. The basic assumptions of LP are:
• A goal for optimization as an objective function which can be expressed in a linear form of
variables involved.
• Presence of structural conditions or constraints in the activities which again can be expressed
in linear form of variables involved.
• In economic problems the prices are kept constant.
• Divisibility of activity levels in fractional or integer terms.
• Non-negativity of constraint variables.
Example: Assume that a production unit is producing three products (X1, X2 and X3). Let x1, x2 and x3 be the numbers of units to be produced, having prices p1, p2 and p3. Suppose there is a capacity constraint (maximum possible at a time) on inputs in each of four uniform stages of the production process of each product. The problem can be written as in Table 17.3.
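A problem of this shape (three products, four stage capacities, non-negative output levels) is small enough to solve by checking every corner point of the feasible region, because the optimum of a bounded linear programme lies at a corner. The sketch below uses that brute-force corner-point enumeration rather than the simplex method that software packages normally use. The prices, input requirements and capacities are hypothetical and are not those of Table 17.3, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def solve_lp_by_vertices(prices, A, capacity):
    """Maximise sum(p_j * x_j) subject to A x <= capacity and x >= 0.

    Assumes the feasible region is bounded; suitable only for tiny problems.
    """
    n = len(prices)
    rows = [list(r) for r in A] + [[-1 if i == j else 0 for j in range(n)]
                                   for i in range(n)]       # -x_i <= 0
    rhs = list(capacity) + [0] * n
    best = None
    for idx in combinations(range(len(rows)), n):
        x = solve_square([rows[i] for i in idx], [rhs[i] for i in idx])
        if x is None:
            continue
        feasible = all(sum(a * v for a, v in zip(row, x)) <= r
                       for row, r in zip(rows, rhs))
        if feasible:
            value = sum(p * v for p, v in zip(prices, x))
            if best is None or value > best[0]:
                best = (value, x)
    return best


# Hypothetical prices, per-unit input needs in four stages, and stage capacities.
prices = [5, 4, 3]
A = [[2, 3, 1], [4, 1, 2], [3, 4, 2], [1, 1, 1]]
capacity = [5, 11, 8, 10]
value, x = solve_lp_by_vertices(prices, A, capacity)
print(value, x)
```

For these illustrative numbers the best plan is 2 units of the first product, none of the second and 1 of the third, with a total value of 13; the first and third stages are used to full capacity.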
$Y = a + bx + cx^{2}$.
To find the maximum of this production function, set $\frac{dy}{dx} = 0$ and check that $\frac{d^{2}y}{dx^{2}}$ is negative.
Hence $b + 2cx = 0$, or $x = -\frac{b}{2c}$, gives the level of input which maximizes production.
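A brief numerical check of this rule, with hypothetical coefficients; the second-order condition (second derivative 2c negative) is enforced by requiring c < 0. The function name is illustrative.

```python
def optimum_input(a, b, c):
    """Input level x = -b / (2c) that maximises Y = a + b*x + c*x**2 (needs c < 0)."""
    if c >= 0:
        raise ValueError("c must be negative for the function to have a maximum")
    x = -b / (2 * c)
    return x, a + b * x + c * x ** 2


# Hypothetical estimated coefficients.
x_opt, y_max = optimum_input(a=10, b=8, c=-2)
print(x_opt, y_max)   # 2.0 18.0
```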
17.8 Decomposition Analysis
When we have data at two points of time for a time series variable such as production, say Pn and P0, we may be interested in decomposing the change in production over time into an area effect and a yield effect.
We know that production (P) = Area (A) × Yield (Y), so that
$P_n = A_n \cdot Y_n$ and $P_0 = A_0 \cdot Y_0$
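One common way to split the change, offered here as an assumption rather than as the exact scheme intended in the text, follows from expanding the product: Pn − P0 = Y0(An − A0) + A0(Yn − Y0) + (An − A0)(Yn − Y0). The three terms are usually read as the area effect, the yield effect and the interaction effect. The sketch below checks the identity with hypothetical figures.

```python
def decompose_production_change(a0, y0, an, yn):
    """Split P_n - P_0 into area, yield and interaction effects."""
    area_effect = y0 * (an - a0)            # change in area at base-year yield
    yield_effect = a0 * (yn - y0)           # change in yield at base-year area
    interaction = (an - a0) * (yn - y0)     # joint change in area and yield
    return area_effect, yield_effect, interaction


# Hypothetical base-year and current-year area (ha) and yield (t/ha).
area_eff, yield_eff, inter = decompose_production_change(a0=100, y0=2.0, an=120, yn=2.5)
print(area_eff, yield_eff, inter)   # 40.0 50.0 10.0
```

The three effects add up to the total change of 100 tonnes (from 200 to 300), so each can be expressed as a percentage share of the change.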
$$\beta = \frac{Q}{p_1 x_1 + p_2 x_2 + \dots + p_k x_k}$$
Using the above formula, the TFP for two alternative crops or for the same crop at two points of time can be compared. The crop with the higher TFP will be more rewarding.
17.11 Diversification Index
Those who are engaged in agricultural research may come across problems of comparing agricultural diversification. Simpson’s diversity index (SDI) is a measure of diversity which takes into account the number of species present as well as the relative abundance of each species. As species richness and evenness increase, the SDI increases. The formula is set in such a way that the value of SDI lies between zero and one. SDI = 0 means no diversity and SDI = 1 means very high diversity. SDI is calculated as:
$$SDI = 1 - \frac{\sum n_i (n_i - 1)}{N(N - 1)}$$
where $n_i$ is the total number of organisms of a particular species and N is the total number of organisms of all species.
As an example, SDI can be calculated for a region where different crop species of different crop commodities are grown, as shown in Table 17.4.
Table 17.4 Example for Working Out Simpson’s Diversity Index (SDI)
Crop group    n1        n2        n1(n1 − 1)   n2(n2 − 1)
Cereals       6         3         30           6
Pulses        4         1         12           0
Oil crop      3         8         6            56
Spices        4         2         12           2
Vegetables    8         3         56           6
Fruits        2         0         2            0
Fodder        7         1         42           0
Total         N = 34    N = 18    160          70
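The index for each of the two sets of counts in Table 17.4 can be worked out from the column totals, for example as in this short sketch (the function name and the labels `first` and `second` are illustrative only):

```python
def simpson_diversity_index(counts):
    """SDI = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    total = sum(counts)
    return 1 - sum(n * (n - 1) for n in counts) / (total * (total - 1))


# Counts per crop group from the two count columns of Table 17.4.
first = [6, 4, 3, 4, 8, 2, 7]     # N = 34, sum of n(n - 1) = 160
second = [3, 1, 8, 2, 3, 0, 1]    # N = 18, sum of n(n - 1) = 70
sdi_first = simpson_diversity_index(first)
sdi_second = simpson_diversity_index(second)
print(round(sdi_first, 4), round(sdi_second, 4))   # 0.8574 0.7712
```

The first set of counts is spread more evenly across crop groups and therefore has the higher index (1 − 160/1122 against 1 − 70/306).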
Table 17.5 Research Methods, Research Designs and Some Basic Statistical/Analytical Techniques at a
Glance Some Important points
18 Computer Application in Research
18.1 Introduction
Computer application has become indispensable in various activities of research. From research topic identification to research publication, ICT application plays a crucial role and eases the research activities at various stages, as shown in Figure 18.1.
Topic Identification: Visiting the websites of related research institutes and searching the internet for important research topics in the researcher’s area of interest will help the researcher to set the platform for his/her proposed research work. After having a panel of possible research topics, the researcher can shortlist topics using the FINER (feasibility, interest, novelty, ethics and relevance) criteria. After discussions with the research guide/research board, he/she can reach a decision on the specific topic of study.
DOI: 10.4324/9781003527183-18
Review of Literature: The ICT application makes it possible for the researcher to have a
wide range of online reviews of literature of research journals, digitally published books
and other materials. In fact, most of the researchers initiate review of literature even before
topic finalization as it helps them to identify novel research topics. A comprehensive litera-
ture review will help the researcher right from topic identification to report finalization and
publication of results. A systematic review will always enhance the quality and relevance
of research.
Research Methodology: While reviewing past research work related to a specific area/topic the
researcher develops insights for the methodology of his/her identified topic of study. With
required modifications/improvements the research methodology for the identified topic can
be streamlined by the researcher.
Data Collection and Compilation: In most research studies, data are collected manually or with the help of specially designed machines, depending upon whether the research method is experimental, survey-based, observational or another method, including epidemiological studies. Notably, in clinical research, the data of ICU patients are collected from computer-based equipment/devices. Google Forms are used to collect opinion/survey-based data from educated and computer-literate respondents. Online surveys are gaining dominance these days in market research and other areas.
Data Analysis: Information collected through Google Forms can be easily transferred to Excel sheets. Manually collected data can also be entered on Excel sheets, either for preliminary analysis using the options available in Excel or for software-based analysis using SPSS, SAS, R, etc., as most of these programmes allow import from and export to Excel worksheets for further data analysis.
Reporting and Publications: MS Word has a wide range of options to prepare reports using
text matter, graphs, tables, charts, diagrams and other editing including plagiarism-checking
options of the manuscript. As there is a wide range of online journals, publication has become
faster and easier with ICT applications.
• The worksheet settings can be made by the researcher once the synopsis is ready with the
data collection format.
• The data entry can be simultaneously done with data collection and the problem of data
inconsistencies and missing data can be minimized.
• The manual validation of data in terms of permissible data range (outlier problems), missing
data, etc. can be done during data entry and by visual screening of data.
• Basic statistical data analysis can be done using the inbuilt options of Excel under <Formulas> <More Functions> <Statistical> or by typing = in a cell and selecting the appropriate function.
• It has facility to set and edit formulas based on cell identity and get results instantly for the
dataset or any part of the marked dataset.
The statistical functions may vary with the version of MS Office, but some of the inbuilt options for getting statistical results from a dataset in Excel are given in Tables 18.1 and 18.2.
Table 18.1 Some Functions Available in Excel
Is functions: ISBLANK, ISERR, ISERROR, ISEVEN, ISODD, ISFORMULA, ISLOGICAL, ISNA, ISNUMBER, ISREF, ISTEXT, ISNONTEXT
Conditional: AVERAGEIF, AVERAGEIFS, SUMIF, SUMIFS, COUNTIF, COUNTIFS, IF, IFERROR, IFNA
Mathematical: COUNT, COUNTA, COUNTBLANK, AVERAGE, AVERAGEA, MEDIAN, MOD, SUM, SUBTOTAL, SUMSQ, SUMPRODUCT, SQRT, POWER
Find and Search: FIND, SEARCH, SUBSTITUTE, REPLACE
Lookup: MATCH, LOOKUP, HLOOKUP, VLOOKUP
Reference: ADDRESS, CHOOSE, INDEX, INDIRECT, OFFSET
Date and Time: DATE, DATEVALUE, TIME, TIMEVALUE, NOW, TODAY, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEKDAY
Misc.: AREAS, CHAR, CODE, CLEAN, TRIM, LEN, COLUMN, ROW, EXACT, FORMULATEXT, LEFT, RIGHT, MID
Rank: RANK, RANK.AVG, RANK.EQ
Logical: AND, OR, XOR, NOT
Help with Excel functions is available from exceldemy.com or through a search engine such as google.com. Help is also available in Excel itself: after typing ‘=’ and the first few letters of the required function, a list of related functions appears from which the desired function can be selected.
Availability of functions may vary with the version of MS Office. Using a function in Excel is easy. Either type ‘=’ followed by the first few letters of the function and select it from the list that appears (Figure 18.2), or click on ‘fx’ to open a dialog box (Figure 18.3) and then either type the needed function in the shaded area of Figure 18.4 or select a category of function and choose one from the list of functions, as can be seen in Figure 18.4.
Enter the required information in the brackets () in the order expected by the function. For example, calculating an average requires the range of cells: =AVERAGE(A1:A10) returns the average of column A from rows one to ten. You may enter a block of any number of columns and rows, =AVERAGE(A1:D10), or several different blocks, =AVERAGE(A1:B10,E5:G15,K7:N21).
In this way, we can perform most calculations by selecting the appropriate function in the appropriate cell with the appropriate values. For data sets of the same size, the same function can be copied and reused. When there are a number of data sets of different sizes, calculation becomes cumbersome. In this situation, it is better to use other programmes like SPSS, R, SAS, etc.
Inbuilt options are available in Excel for doing basic statistical analysis. However, for advanced statistical analysis, one has to use appropriate computing software with care and caution, as ‘garbage in, garbage out’ applies to any use of advanced data analysis software with only partial knowledge. A large number of software packages are available for statistical analysis. Most of them are customized for specific fields of specialization. Some of the popular general-purpose programmes are SPSS, R and SAS. SPSS is the most popular because of its user-friendly nature.
When we click on any menu or ribbon, a list of functions is available in each menu
(Figure 18.6). Details of each menu are:
File: All file-related functions are available in this menu such as open, save, save as etc.
Edit: To edit any file, all functions like cut, copy, paste, etc. are available in this menu.
View: What is to be viewed while using the data file is to be selected from this menu.
Data: All data-related functions are available in this menu.
Transform: The functions related to generate new column or replace the column with the help
of existing data of column(s) are available in this menu.
Analyze: This is the most important menu as all functions related to statistical analysis of data
are available in this menu.
Graphs: All options to develop different types of graphs with the help of existing data functions
are available in this menu.
Utilities: Other related functions are available in this menu.
Add-Ons: Applications, services, statistical guides, etc. are available in this menu.
Window: The files opening and minimize or maximize functions are available in this menu.
Help: All sorts of help are available in this menu.
Figure 18.6 Different Options Under Each Menu on the Menu Bar of SPSS
To use this programme we have to follow these steps:
Preparation of data file: We may create the data file in SPSS itself or import a file prepared in another programme by selecting its format when opening the file from the File menu. Data may be available for different variables recorded on a number of samples or plots. Each variable occupies one column, and the values recorded on each sample are placed in rows in the respective cells of that variable. A number of file management tools are available in the software, such as splitting the data, merging the data, calculating new variables with the help of existing variables and changing the nature of data (nominal, ordinal and scale). The data file has two views: (a) Data view and (b) Variable view.
Data view: In this view, the data of all variables are visible and can be edited
(Figure 18.7).
Variable view: In this view, the name and nature of each variable can be changed. One can also
create a new variable along with its properties (Figure 18.8).
Data Analysis: After preparation of the data file, the data can be analyzed using the Analyze
menu. Some popular calculations are performed as follows:
Summary of Variables: This calculates summaries of variables such as sum, mean, number,
minimum, maximum, standard deviation, skewness, kurtosis and variance. Select
<Analyze>, <Reports> and <Report summaries in row>; a box will pop up (Figure 18.9).
Select the desired dependent variable(s) and transfer them to the <data column variable> box.
On clicking <Summary>, another box will pop up (Figure 18.10). Tick the statistics required
for the variable, such as <Mean>, <Minimum>, <Maximum>, etc., and click on <Continue>.
If summaries are required for different groups, select the grouping (independent) variable
and transfer it to the <Break Column Variable> box; otherwise, the summary will be calculated
for the entire column of the variable. Click on <OK> to get the result. The same result can
also be obtained through <Analyze>, <Reports>, <Case Summaries>. The rest of the procedure is
the same except that, instead of ticking the type of summary required, one clicks on
<Statistics>, selects the desired statistics and transfers them to the <Cell Statistics> box
(Figure 18.11). Click on <Continue>, which closes this box, and then click on <OK>. The
resulting statistics will be in the output file.
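For readers who prefer a command-based route, the same kind of summary can be sketched in a few lines of Python using only its standard library. This is an illustration, not the SPSS procedure itself: the data values, the variable names and the `summarize` helper are hypothetical, and skewness and kurtosis are left out because the standard library has no ready function for them. The grouping step plays the role of the break variable described above.

```python
import statistics
from collections import defaultdict

# Hypothetical data: plant height (cm) recorded on plots of two varieties.
records = [
    ("A", 52.0), ("A", 55.0), ("A", 49.0), ("A", 58.0),
    ("B", 61.0), ("B", 64.0), ("B", 59.0), ("B", 66.0),
]

def summarize(values):
    """Return basic summary statistics for one column of values."""
    return {
        "n": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "minimum": min(values),
        "maximum": max(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

# Collect the values under each level of the grouping (break) variable.
groups = defaultdict(list)
for variety, height in records:
    groups[variety].append(height)

# Summary for the entire column, then one summary per group.
overall = summarize([height for _, height in records])
by_group = {variety: summarize(values) for variety, values in groups.items()}

print("Overall:", overall)
for variety, stats in sorted(by_group.items()):
    print(variety, stats)
```

Leaving out the grouping step gives the summary for the entire column, which corresponds to running the report without a break variable.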
Compare means: Means can be compared by a one-sample t-test, independent-samples t-test,
paired-samples t-test or one-way ANOVA, depending on the situation. Click on <Analyze>,
<Compare Means> and select the desired test, such as <One-Sample t-test>; a box will pop up.
Select the test variable, such as plant height, and transfer it to the <Test Variable(s)> box.
Enter the test value if one is available; otherwise, the mean will be compared with zero. Click
on <OK>, and the results will appear in the output file (Figure 18.13). Other tests can be
obtained similarly by selecting the appropriate function. The <Compare Means> menu goes only as
far as one-way ANOVA (simple CRD); for more complicated mean comparisons, the <General Linear
Model> or <Generalized Linear Models> menu may be used.
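The arithmetic behind the one-sample t-test can be sketched as follows. This is a minimal illustration in Python with hypothetical plant height values and a hypothetical helper named `one_sample_t`. It returns only the t statistic and the degrees of freedom; the p-value that SPSS reports would have to be obtained from t-tables or a statistical library.

```python
import math
import statistics

def one_sample_t(values, test_value=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == test_value."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean uses the sample standard deviation (n - 1).
    std_err = statistics.stdev(values) / math.sqrt(n)
    return (mean - test_value) / std_err, n - 1

# Hypothetical plant heights (cm); the test value of 50 is also an assumption.
plant_height = [52.0, 55.0, 49.0, 58.0, 51.0, 53.0]
t_stat, df = one_sample_t(plant_height, test_value=50.0)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

When no test value is supplied, the default of zero mirrors the behaviour described above, where the mean is compared with zero.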
Correlation analysis: To calculate the correlation between a set of characters, click on
<Analyze>, <Correlate> and <Bivariate>; a box will pop up. Select the set of characters and
transfer them to the variable box. Select the type of correlation to be applied (Pearson,
Kendall’s Tau-b or Spearman), then the test of significance (one-tailed or two-tailed), then
<Flag significant correlations>, and click <OK> (Figure 18.14). The results will be in the
output file.
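The Pearson and Spearman coefficients mentioned above can be sketched in plain Python as shown below. The two data columns and the helper names (`pearson`, `ranks`, `spearman`, `t_for_r`) are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of average ranks, and Kendall's Tau-b is not covered. The t statistic for testing a zero correlation is included, but the corresponding p-value must be read from tables or obtained from a statistical library.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def t_for_r(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical scores on two characters for five plants.
height_score = [1.0, 2.0, 3.0, 4.0, 5.0]
yield_score = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson(height_score, yield_score)
rho = spearman(height_score, yield_score)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
print(f"t for r = {t_for_r(r, len(height_score)):.3f} on {len(height_score) - 2} df")
```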
18.4 R Programme
R is an interactive programming language that has most of the statistical and graphical
functions. It offers more flexibility in data input and output. The output format can be
changed as required, but this is for advanced users who have the required knowledge of
programme writing. It is a command-based programme. A screenshot of R is given in Figure 18.15.
18.5 SAS Programme
It is a very powerful programme having a number of modules for researchers of different
disciplines. It is a menu-based as well as a command-based programme. It has an active output
file whose results can be reused by the programme to calculate new values. However, it demands
a high level of skill and is therefore used mainly by advanced users. Although it has functions
for most calculations, it is not very popular among general researchers.
18.6 JAMOVI Programme
It is open-source software freely available on the www.jamovi.org website under the AGPL3
license. It is available for operating systems such as Windows, macOS, Linux and Chromebook,
and can be installed on Windows Vista (64-bit) onward. It is menu-based software similar to
SPSS. Functions developed and uploaded by different programmers are available to all in the
JAMOVI library. R programmes can also be run in JAMOVI. All help, functions, codes, etc. are
available on the website under the resource menu. Detailed help on exploration, t-tests, ANOVA,
regression, frequencies and factors is available. The programmes available under each heading
are:
• Exploration: It is provided under the heading Descriptive, where information about the data
such as count, mean, minimum, maximum, etc. is available.
• t-Test: The one-sample t-test, the independent-samples t-test and the paired-samples t-test
are available.
• ANOVA: One-way ANOVA, ANOVA, repeated measures ANOVA, ANCOVA, MANCOVA, one-way ANOVA
(non-parametric) and repeated measures ANOVA (non-parametric) are available.
• Regression: Correlation matrix, partial correlation, linear regression, binomial logistic
regression, multinomial logistic regression and ordinal logistic regression are available.
• Frequencies: Proportion test (two outcomes), proportion test (N outcomes), contingency table,
paired samples contingency table and log-linear regression are available.
• Factor: Reliability analysis, principal component analysis, exploratory factor analysis and
confirmatory factor analysis are available.
• Starting from topic identification for research to publication, ICT application eases the
research activities at various stages.
• MS Excel is a multi-purpose user-friendly programme for researchers. Every unit of data
occupies a cell having its identity specified by the row and column number.
• Basic statistical data analysis can be done using the inbuilt options of Excel under
<Formulas> <More Functions> <Statistical>, or by typing = in a cell and selecting the
appropriate function.
• SPSS is the most popular software among researchers in various fields and provides solutions
for most problems. To date, 28 versions have been released by the company. It has four types
of files: (1) data file, (2) syntax file, (3) output file and (4) script file. Different
functions are grouped and listed in the menu bar as File, Edit, View, Data, Transform, Analyze,
Graphs, Utilities, Add-Ons, Window and Help.
• R is an interactive programming language having most of the statistical and graphical func-
tions. It has more flexibility of data input and output.
• SAS is a very powerful programme having a number of modules for researchers of different
disciplines. It is a menu-based as well as a command-based programme.
• JAMOVI is open-source software freely available on the www.jamovi.org website under the
AGPL3 license. It is available for operating systems such as Windows, macOS, Linux and
Chromebook.
• The application of AI in research can have its role right from identification of most relevant
topic to publication of research results in journals of high repute.
Suggested Readings
Burns, P. R., ‘Multiple comparison methods in MANOVA’, in Proceedings of the 7th SPSS users and
co-ordinators conference, Northwestern University, Evanston, IL, 1984.
Dixon, W. J., BMD biomedical computer programs, University of California Press, Los Angeles, CA, 1973.
Horton, N. J., and S. R. Lipsitz, ‘Review of software to fit generalized estimating equation regression
models’, The American Statistician 53: 160–169, 1999.
Knuth, D. E., The art of computer programming, vol. 3: sorting and searching, Addison-Wesley, Reading,
MA, 1973.
Lee, E., and M. Desu, ‘A computer program for comparing k samples with right censored data’, Computer
Programs in Biomedicine 2: 315–321, 1972.
Watts, D. L., ‘Correction: computer selection of size-biased samples’, The American Statistician 45(2):
172–172, 1991.
19 Writing Research Synopsis,
Thesis and Papers
19.1 Introduction
Documentation is an integral part of research. Every stage, from preparation of the
synopsis/research protocol through periodical reports and the final report/thesis to publication
of research papers, requires skill in scientific documentation. Precision and clarity are two
factors which enhance the utility of scientific writing. Research-related writings require a
scientific mode of thinking and application of scientific methods. Hence, a critical and creative
mind is necessary for any researcher to make scientific writing cohesive, precise and concise.
DOI: 10.4324/9781003527183-19
19.2.1 Title/Topic of Research
The title of the research project should be brief but self-explanatory and depict the focal theme
addressed for research in 10 to 12 keywords. It can also be in two parts. The first part consists of
keywords depicting the focal theme of the study and the second part a few qualifying words to
specify the special features of the study, for example, ‘Impact of Organic Agriculture on Income
and Employment in Rajasthan—A Rural Perspective’.
The aims and objectives must flow from the justification, background and scope of the study
explained previously.
• Aim is a concrete and definite statement highlighting the expected outcome of research on
the topic and the objectives are sub-activities having specific relevance and outcome in sup-
port of the overall aim.
• While the aim is normally a single statement revealing the ultimate goal of the study, there
can be three to four specific objectives each drawn in a sequential manner having a definite
outcome in support of aim of the study.
• Objectives are normally statements starting with the infinitive form of verbs such as ‘to study’,
‘to assess’, ‘to examine’, ‘to explore’, ‘to relate’, ‘to correlate’, ‘to work out’, ‘to find out’,
‘to test’, ‘to search’, ‘to confirm’ and so on.
• The aim is written as a concise statement depicting the ultimate goal/outcome of the study.
The objectives set out the complementary results/sub-outcomes that support the aim or goal of
the study.
19.2.4 Review of Literature
• It is a very important part of a research synopsis which familiarizes the researcher and the
reviewers with the problem under study.
• It describes the work done by others at international, national and regional levels on the topic
or on similar topics during the recent period.
• It helps the researcher to understand the status of work carried out on related topics and to
identify the research gap to set his/her platform for the proposed research work.
• It also helps to know about difficulties faced by other researchers and the corrective steps
taken or modifications made by them. The researcher can anticipate similar or additional
problems during the study and the review of literature will help him/her in rectifying those
problems in advance.
• The review of literature in a synopsis needs to be up to date and relevant to the topic.
Literature can be reviewed using various sources of scientific information. These are journals
(national or international); publications and websites of research organizations; books;
computer-assisted searches; and personal communications with other researchers. The Internet
provides a vast opportunity for information gathering.
• Each review begins with the author’s name (reference year) and should reflect the aim of the
study, location of study, broad research method and design, major outcome, result and conclusion.
• (This section is normally written in past tense. Normally, five to ten most relevant and recent
reviews can serve the purpose in a synopsis.)
19.2.5 Research Methodology
• In a synopsis, the research methodology forms the core of the research project. The
methodology should cover aspects like the study method and design, study settings, target
population and sampled population, sampling plans, sample size, inclusion and exclusion criteria,
variables under study, data collection, data analysis techniques to be used, ethical clearance,
etc. which the researcher intends to apply during the conduct of research.
• It gives a clear picture of the proposed research during the execution part including a tenta-
tive data analysis plan, statistical tools to be used, etc.
• (This section is generally written in future tense.)
19.2.6 References
• All references quoted in the introduction, review of literature and anywhere else in the syn-
opsis should be listed here.
• There are various styles for writing references, like Vancouver style, Harvard style, Oxford
style and many others. Only one method as prescribed by the degree awarding university or
research institute will have to be followed by the researcher consistently.
• (Note: A synopsis must also contain information such as the name and designation of the
researcher, name and designation of the guide, name and designation of the head of
department/institution, name of the institution and signatures of all officials with seal.
Generally, every institution has its own format for this purpose.)
Research: It is a systematic and scientific probe for NEW information, knowledge, concepts,
theories, techniques, technologies, inventions, processes, products or solutions to a problem.
Academic research has four stages: planning (including synopsis preparation), execution
(data collection), data analysis (subject matter and statistical), and report (thesis) writing
and publication.
Selection of Research Topic: It is vital for the timely and successful completion of research
work. The researcher must select a topic of interest to him/her as well as to the professional
society. Emphasis may be given to location-specific and problem-oriented topics. The fol-
lowing steps are suggested:
• Efforts need to be made to select a topic of interest to the researcher which is
problem-oriented and location-specific to the extent possible.
• Read and thoroughly understand various concepts and researchable issues of the selected
subject matter area from standard textbooks and research journals related to the subject.
• Shortlist a few tentative topics and eliminate those topics recently taken up in the depart-
ment to avoid duplication of research.
• Based on the FINER approach, check the feasibility in terms of availability of cases, time,
resources and expertise; the interest of the researcher; the novelty of the topic; the ethical
issues and their gravity; and the relevance of the research topic in the contemporary context,
and then select a topic for research.
• Discuss the topic with the research supervisor, faculty members and peers in the department
to get clearance at the departmental level.
• Once the topic of research is cleared at the departmental level, start an extensive review of
research work done on similar topics at national and international levels, working backwards
from the latest, by searching in different journals of repute and by visiting websites of
different research institutes.
• Select about seven to eight of the latest published papers relevant to the identified topic
and read them thoroughly to understand the complexities of the topic, methodologies used, type
of analysis done, outcome of the research, present stage of research on the topic, research
gaps and emerging issues so as to proceed further with the synopsis writing including
modifications of the topic, if required.
Synopsis Preparation: The research synopsis is also called research proposal, research
protocol, research outline, etc. A well-prepared synopsis is the road map of research and makes
the task of the researcher easy at various stages of research. A good synopsis gives maximum
information in minimum words. A well-conceived synopsis goes a long way in helping the
researcher to perceive the problem and in convincing the reviewer about the ability of the
researcher to conduct the project. Where financial assistance is needed, a well-documented
synopsis will help the request to be considered favourably. Thus, research workers should make
all efforts to prepare a well-structured synopsis. A well-thought-out and well-written synopsis
is the crux of a scholarly research project, which makes the task easy and systematic for the
researcher. Some tips for writing the research synopsis are summarized in Table 19.1.
Table 19.1 Tips for Research Topic Selection and Synopsis Writing
7. Data Collection Format/Proforma and Master Sheet
• A data collection format should be prepared keeping in view the data needed to cover the aim
and objectives of the study and to write the results and their discussion with facts and
figures in the final thesis.
• No unwanted data should be collected, and no data required to cover the objectives should be
left out.
• A computerized master sheet for data entry may be developed before data collection starts so
that the researcher can enter the data on an Excel sheet as they are collected.
• A one-to-one correspondence may be ensured between the data format and the master sheet.
• Introduction
• Aim and Objectives
• Review of Literature
• Materials and Methods/Methodology
• Results
• Discussion
• Summary and Conclusion
• Bibliography
A research thesis of moderate size is welcome. However, there is no strict norm on the size of
a thesis, as it can vary according to the nature of the problem and the scope of research,
including research methods and design. Normally, one-third of the thesis may go to the
introduction, review of literature and methodology chapters. The results and discussion may
share the remaining size almost equally. Wherever the abstract of the thesis is mandatory, it
may be a one- or two-page summary covering the importance, objectives, methodology, results and
conclusion of the study.
19.3.1 Introduction
It is the enlarged version of the same section given in the synopsis with more facts and figures
on the gravity of the selected topic/problem in global, national and regional perspectives.
It may include efforts made elsewhere to tackle the problem and the gaps identified from
the already existing research. The topic of current research may be thoroughly justified as
an attempt to bridge the knowledge/information/process gap. The aim and objectives should
have a perfect match with the justification and importance of the topic selected for the study.
The research hypothesis focusing on the major outcome of the research may also form a part
of the chapter on introduction. Normally past and present tenses are used in the write-up of
this section.
19.3.2 Review of Literature
A classified review of literature based on the major objectives with appropriate sectional head-
ings is preferred. The reviews can be in support of or in contrast to the findings of the study.
A synthesized presentation of a review of the literature with a sectional summary at the end of
each section will add to the value of this chapter. Normally, a review starts with the name(s)
of the author(s) and the year of study in brackets, followed in brief by the theme of study,
location, methodology used, salient findings/outcome of study, etc. Generally, this chapter is
written in past tense.
19.3.4 Results
It is better to present the results of the study in separate sections in accordance with the objec-
tives of the research or research questions. The scientific way of presenting results using
well-structured tables, diagrams and graphs along with textual explanations is required. The
section-wise highlights of results at the end of each section may be meaningful. This chapter is
generally written in past tense.
19.3.5 Discussion
It is better to start with the research questions as per the objectives and describe how the
results answer the specific and general questions posed in the study. The inter-objective
analysis, along with evidence for or against the outcome of the study, may be explained with
reference to similar studies conducted by other researchers, citing proper references. This
chapter is also written in past tense.
The summary is a recapitulation of the study results in a concise form. The conclusion is the
affirmation of the aim of the study, with its implications stated precisely and with the
recommendations emerging from the study for future purposes. This chapter is generally written
in past tense except the recommendations, which are generally in future tense.
It is better to spell out, as a caution, the limitations observed/experienced and the
assumptions made before and during the course of the study.
19.3.8 Bibliography
A comprehensive list of materials referred to, from the formulation of the synopsis through the
discussion of results to the drawing of the final conclusion, may be presented at the end of the
thesis in accordance with the selected style of writing the bibliography.
The research thesis is to be prepared based on the plan of work as per the approved synopsis
only. Writing the thesis gives the research scholar practical training in writing technical
reports and papers, which is very important for any professional research degree. Some tips for
the preparation of the research thesis are given in Table 19.2.
1. Topic
• Short and self-explanatory, depicting the specific goal and aim of the research topic as per
the approved synopsis.
2. Introduction
• Definition and background of the problem, origin, spatial and temporal gravity/pattern,
current relevance, emerging researchable issues, already researched issues and outcomes by
others, research gaps, what is intended to be covered by the researcher, research hypothesis,
etc.
• This section should justify the topic selected for research.
• References, if any, taken from books and other published documents must be cited then and
there in brackets and be given under bibliography/references.
• Past, present and future tenses can be used in the write-up as the situation warrants.
• It may generally extend to 10 to 12 pages.
3. Aim, Objectives and Research Hypothesis
• Must flow from the problem/justification explained earlier.
• Aim is a single statement highlighting the expected outcome/research hypothesis (normally in
present tense).
• Objectives reflect strategies for results and evidence to support the main outcome of the
research.
• Objectives are presented mostly in the infinitive verb form like to examine, to explore, to
correlate, to study, etc.
• Research hypothesis is a statement depicting the expected conclusion in view of the aim of the
study.
• It may cover one to two pages.
4. Methodology
• It includes items like study population, type of research design used, sample size and its
calculation formula, inclusion and exclusion criteria, and data collection procedure as per
subject matter requirement and approved synopsis. A flow chart showing the steps followed in
collecting the data/information from respondents is welcome.
• The scaling and scoring technique, if any used, for quantifying qualitative data may be
explained.
• Plan of data analysis like estimation of mean values or proportion, standard deviation,
confidence interval, coefficient of variation (CV), correlation coefficient, test of
significance like Z-test, t-test, ANOVA, chi-square test, etc. with formula used may be
explained.
• To be written in past tense only. It may span six to eight pages.
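Some of the estimates named under the plan of data analysis can be illustrated with a short Python sketch. The data and the `describe` helper are hypothetical. The coefficient of variation is taken as the standard deviation expressed as a percentage of the mean, and the confidence interval shown is the large-sample (Z-based) interval for a mean; for small samples, the Z value would normally be replaced by the corresponding t value.

```python
import math
import statistics

def describe(values, confidence=0.95):
    """Mean, SD, CV (%) and a Z-based confidence interval for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation
    cv_percent = sd / mean * 100             # coefficient of variation
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": cv_percent,
        "ci": (mean - half_width, mean + half_width),
    }

# Hypothetical observations on one variable.
observations = [52.0, 55.0, 49.0, 58.0, 51.0, 53.0]
result = describe(observations)
print(f"Mean = {result['mean']:.2f}, SD = {result['sd']:.2f}")
print(f"CV = {result['cv_percent']:.2f}%")
print(f"95% CI = ({result['ci'][0]:.2f}, {result['ci'][1]:.2f})")
```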
19.4.1 Introduction
The section on introduction or importance of the research paper must give a clear definition of
the theme of the study and also the issues addressed in the paper. It can include a brief descrip-
tion of global/national/regional significance of the theme of the study, studies conducted else-
where in a nutshell and the need for the study in its present form. The objectives of the study
must emerge from this section and the research hypothesis covered in the paper may also be
indicated here.
The materials and methods must spell out the research method and design applied for the study
and also the way it was performed. Whether the study is quantitative or qualitative, basic or
applied, descriptive or analytical, prospective or retrospective may be mentioned. If it is a
sample-based study, the actual population sampled, the sample size, the method of selecting the
sample and the method of data collection, along with the quantitative techniques used for data
analysis, must be spelt out. If it is an interventional study, the description/profile of the
experimental group and control group may be given. The inclusion and exclusion criteria followed
for the final selection of the sample must be spelt out.
19.4.3 Results
The results are the outcome of research in quantitative or qualitative form supported with tables,
diagrams, graphs, charts, etc.
19.4.4 Discussion
The discussion part of a research paper must be more critical indicating what is special in the
outcome of the study as compared to other studies by giving references then and there. The
theoretical and practical importance of the findings is to be highlighted.
19.4.5 Conclusion
The conclusion is the most significant outcome of the study which can be used for policy inter-
ventions or for other follow-up actions.
19.4.6 References
The task of publication of research begins during or after a research work/project is completed.
Generally, a research project is teamwork by a team of researchers from the same discipline or
by a multi-disciplinary team. The idea of a research paper can come up from any one member or
from a sub-group of the research team. In any case, it is always advantageous to follow a series
of steps in writing research papers for early publication.
• Identify the Author(s): If the research work is done by any individual, he/she can be the
sole author of the research paper. In case any additional work is required to be done with
the help of someone else for the proposed paper such persons can also be co-author(s). But,
if the research paper is based on a research project carried out by a team of persons, then
the author can co-opt other members to form a team of co-authors based on their expected
contribution in finalizing the research paper.
• Planning: Advance planning for the research paper will enable the authors to include any
additional experiment/trial required for the paper to form part of the main project. It will save
time and extra work needed for the paper.
• Title and Abstract: Fixing a tentative title depicting the proposed content and coverage
of the research paper along with a tentative abstract will help to proceed with clarity and
time limit.
• Type of Publication: Any peer-reviewed journal can have different categories of papers such
as: (i) full-length research articles in IMRAD format (introduction, methods, results and
discussion), (ii) short communications or (iii) rapid communications. The length/size of each
of these categories will be as per the norms of the respective journals.
• Selection of Journal: The journal for publication of a research paper is normally selected
by judging (i) the focus area/subject matter of the journal (basic, theoretical,
applied, clinical, etc.), (ii) language (English, local language, Hindi, others), (iii) indexing
status of the journal (Scopus, PubMed, Medline, Google Scholar, Index Copernicus, Open-
Gate, etc.), (iv) availability of the journal (online/hard copy form, etc.), (v) reputation of
the journal (based on professional standing of the editorial board, impact factor, acceptance
rate, etc.), (vi) appearance of a published article (format such as font size, font type, etc.),
(vii) charges for publishing a paper, (viii) time taken to publish an accepted paper and
(ix) quality of figures, etc.
• Familiarize Yourself with the Requirements and Norms of the Selected Journal: The authors
must carefully read and understand the journal's instructions to authors on content, format and
other guidelines. This helps to minimize post-submission corrections and changes and to
achieve early publication of the paper.
• Prepare the Outline of the Paper: The outline as the roadmap may be prepared before one
starts writing the paper and the tasks to complete the paper may be enlisted.
• Task Distribution Among Authors: The specific task to be carried out for completion of the
paper by the members of the team may be finalized in the beginning.
• Collection of the Materials: All related matters for preparation of sections like introduction,
methods, results, discussion, references, etc. may be collected and stored in different folders.
• Prepare the Tables, Figures and Legends: Prepare all tables, figures, etc. before you start
writing the paper.
• Prepare the First Draft: As per the task agreed upon by the authors, each author will carry
out the assigned task and the first draft of the paper may be prepared. There can be a mixture
of styles as well as gaps in the first draft written by different authors which needs unification/
correction as per the requirements of the selected journal.
• Revise the Manuscript: This can be done by any one author who is well-versed with the
content and context of the paper. The gaps will have to be filled and unification as per require-
ments of the journal may be completed. Wherever restructuring is needed to have a logical
flow and order, the same must be done. The required corrections for grammar and spelling
will have to be made. Finally, format the document to make the manuscript attractive and
easy to read.
• Check the References: The citation made in the text will have to be thoroughly checked
with references for correctness.
• Finalize the Paper Title and Abstract: Based on the final version of the paper the title and
abstract can be finalized.
Timely submission of papers in the required format as per the guidelines of the journal is very
important for early publication. For refereed journals, if the comments of the referee call for
revision of the paper, the revision has to be completed within the time limit for resubmission.
All comments of the referee must be attended to meticulously before resubmission.
Indexing of journals serves as a means of quality assurance. It is the process of listing
journals by discipline, type of publication, etc. Such listings are also known as bibliographic
indexes or citation indexes. Indexing facilitates easy online searches of articles and acts as
a means of information retrieval for libraries. The inclusion of a journal in an indexed group
of journals involves rigorous scrutiny and assessment to ensure scholarly publishing standards
based on factors like the scope of the journal, its registration under International Standard
Serial Number (ISSN), regularity in publishing, competency of the editorial board, peer
reviewing of articles before publication, copyright licenses, etc.
Suggested Readings
Bero, L. A., R. Grilli, J. M. Grimshaw, E. Harvey, A. D. Oxman, and M. A. Thomson, ‘Closing the gap
between research and practice: an overview of systematic reviews of interventions to promote the
implementation of research findings’, British Medical Journal 317: 465–468, 1998.
Evans, D., and A. Pearson, ‘Systematic reviews: gatekeepers of nursing knowledge’, Journal of Clinical
Nursing 10(5): 593–599, 2001.
Mauch, J. E., and J. W. Birch, Guide to the successful thesis and dissertation: a handbook for students and
faculty, 5th ed., Marcel Dekker, New York, 2003.
20 Work Cited, References and
Bibliography
20.1 Introduction
When a thesis is written for the award of a research degree, or a research paper or book is
written for publication, the author has to refer to many authentic, published information
sources. The original source of each quotation or paraphrase is usually indicated at the point
of use in the text; this is known as in-text citation. A list of these sources must also be given
in a systematic manner at the end of the thesis, research paper or book. It enables readers to
go through the original sources and helps reviewers to verify points while assessing the
document for publication or the thesis for the award of a degree. This list at the end of the
document is termed a ‘list of works cited’, a ‘list of references’ or a ‘bibliography’, and it is a
mandatory requirement for academic publication. These terms are sometimes used
synonymously despite some definite differences among them.
In-Text Citation: The major styles of in-text citation include superscript numerals, numerals
in parentheses, author name and year in parentheses, footnote citation at the bottom of the
page, author and page number, and so on.
List of Work Cited: While writing or composing a research thesis or a research paper the
authors have to refer to a number of already published sources by others to substantiate their
statements or results either by quoting or by paraphrasing the same. A list of all such sources
each starting with the author’s name and arranged alphabetically or chronologically is called
a list of works cited. Normally a double space is left between successive entries. A hanging
indent is used for each source, meaning that the first line begins at the left margin of the
page and subsequent lines are indented by a uniform amount.
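The hanging indent described above can be produced mechanically. The following is a minimal sketch in Python, assuming a plain-text document; the function name, the line width and the indent width are illustrative choices and are not prescribed by any citation style.

```python
import textwrap


def hanging_indent(entry: str, width: int = 60, indent: int = 4) -> str:
    """Wrap one works-cited entry so that the first line starts at the
    left margin and every following line is indented by `indent` spaces."""
    return textwrap.fill(
        entry,
        width=width,
        initial_indent="",               # first line flush with the left margin
        subsequent_indent=" " * indent,  # remaining lines indented uniformly
    )


# Entry taken from the Suggested Readings of the previous chapter.
entry = (
    "Mauch, J. E., and J. W. Birch, Guide to the successful thesis and "
    "dissertation: a handbook for students and faculty, 5th ed., "
    "Marcel Dekker, New York, 2003."
)
print(hanging_indent(entry))
```

Word processors offer the same layout as a paragraph setting, so a sketch like this is mainly useful when a list is generated as plain text.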
List of References: A reference list is similar to a list of works cited meaning that only cited
sources in the main document are listed at the end of the document. Standard format for the
reference list is followed by different publishing groups. Hence different styles of preparing
reference lists are available.
Bibliography: It is a list of all the sources of information or published works consulted by the
authors in preparing the document (a thesis, a book or a research paper) irrespective of cita-
tion in the text of the document. In general, the elements of a bibliography include the author’s
name, title of the article, title of the publication, date of publication, the place and name of the
publisher, the volume number, issue number and relevant page numbers referred. Different
publishers have their own suggested style of writing the bibliography. The comparison of the
reference list/list of work cited, and bibliography and features are given in Table 20.1.
DOI: 10.4324/9781003527183-20
Table 20.1 Comparison of Features of Reference List and Bibliography
• Authors’/editors’ name.
• Title of article/book/publication.
• Name of journal/source in which published/publisher.
• Year of publication.
• Volume and issue number.
• Page numbers covered in the journal.
The sequence, pattern and format are different for various styles of presenting references. Some
online journals also provide a digital object identifier (DOI) for each article. A DOI is a string
of numbers, letters and symbols that uniquely identifies an article or document and gives it a
permanent web address (URL). It helps readers to locate the original document from the
reference/citation.
The features of a reference list and a bibliography compare as follows.
• References are the sources that the paper writer has cited in the paper, but the bibliogra-
phy includes the sources that the researcher has used irrespective of their citation in the
document.
• The references and bibliography appear at the end of the documents.
• There are various styles for the text citation of a document.
• A bibliography generally covers information like the author’s name, title of the work, name
and location of the institutional publisher, date/year of publication and the page numbers of
the article in the journal to help readers to track the original material.
• A reference consists of the previous information for the sources cited within the text of
the paper.
• The references are arranged chronologically or alphabetically based on the name of the
first author.
• The order of giving the information and the punctuation marks followed in different styles
are important for both reference and bibliography.
• The citation methods can be different for different journals as per the standard style of cita-
tion and writing references and bibliography.
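The distinction between a reference list and a bibliography can be sketched as a simple set relationship. The short source labels and the function names below are hypothetical and serve only as an illustration; an actual list would hold full entries formatted in the required style.

```python
# Hypothetical short labels for sources consulted while writing a document.
consulted = ["Bero 1998", "Evans 2001", "Mauch 2003", "Pears 2022"]

# Sources actually cited somewhere in the text of the document.
cited = ["Mauch 2003", "Bero 1998"]


def reference_list(cited_in_text):
    """Only the sources cited in the text, arranged alphabetically."""
    return sorted(set(cited_in_text))


def bibliography(consulted_sources, cited_in_text):
    """Every source consulted, whether or not it is cited in the text."""
    return sorted(set(consulted_sources) | set(cited_in_text))


print(reference_list(cited))
print(bibliography(consulted, cited))
```

Under this reading the reference list is always a subset of the bibliography.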
The information for each reference or bibliography may include author(s), title of article/book,
name of Journal/publisher of book, year of publication of book or journal, volume and issue number
of the journal, page numbers covered for the referred article, etc. For edited books the chapter title
and author(s) of the chapter may also be included. For online journals, the digital object identifier
(DOI) for each article which is a string of numbers, letters and symbols to uniquely identify an arti-
cle or document along with a permanent web address (URL) may also be given. For each style there
is a distinct way of presenting the citation, of sequencing the required information under
references/bibliography, and of handling the capitalization of words in the title, the name of
the journal or book, font size, font type, line spacing, etc. Some of the main styles and their
features for in-text citation and the reference list/bibliography are given next. For each
article/document, one of these styles is to be followed consistently, and authors are advised to
familiarize themselves with the required style for different types of sources by going through
the guides available online.
20.5.1 APA Style
It is the style followed by the American Psychological Association. It is used for social sciences
and behavioural sciences publications. APA guides are available for the citation method and
writing references for the users.
APA style follows author/date method of citation where author’s last name (generally surname)
followed by the year of publication is inserted at the appropriate place in the text of the paper.
The references given at the end of the paper must include all the information needed to locate
each source.
The reference list at the end of the paper includes only those references cited in the text of the
paper arranged in alphabetical order by the surname of the first author. APA references include
information about the author, publication date, title and source. The sequence of information
with respect to each reference, the spacing, the punctuation marks, capitalization, underlining,
etc. is to be followed as per guidelines. The reference materials may include authored and edited
books, journal papers and articles, newspaper articles, magazines, microforms, audiovisual
media, electronic media, internet sources, etc. The format for writing reference as per APA style:
• Authors’ name: Each author’s last name comes first, followed by the initials of the first
and middle names.
• Publication date: Given in parentheses as year, month and day, for example (2023, April 30).
• Article title: Only the first word of the title and proper names appearing in the title are
capitalized.
• Periodical title: The title of the periodical is given in italics with all major words
capitalized.
• Volume/issue: The volume number is given in italics. When an issue number is available, it
is given in parentheses immediately after the volume number and is not italicized.
• Page range: The first and last pages of the article in the referred periodical (source) are
given as the page range without ‘p’ or ‘pp’.
• DOI: It is the unique digital object identifier for online journal articles, if available.
The APA guidelines available online may be consulted for writing citations and making
reference lists for different types of sources like books, periodicals, journal papers and
articles, magazine articles, newspaper articles, electronic sources, etc.
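The sequence of elements listed above can be sketched as a small formatter for a journal article. This is only an illustration under stated assumptions: the function names are hypothetical, authors are supplied as (surname, initials) pairs, the title is passed with its capitalization already applied, and italics are not rendered because the output is plain text. The APA guidelines remain the authority on details.

```python
def apa_authors(authors):
    """Join (surname, initials) pairs as 'Surname, I. I.', with an
    ampersand before the last author."""
    names = [f"{surname}, {initials}" for surname, initials in authors]
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + ", & " + names[-1]


def apa_journal_reference(authors, year, title, journal, volume, pages,
                          issue=None, doi=None):
    """Assemble author, date, title and source in that order."""
    vol = f"{volume}({issue})" if issue else f"{volume}"
    ref = f"{apa_authors(authors)} ({year}). {title}. {journal}, {vol}, {pages}."
    if doi:
        # The DOI is appended as a URL only when one is available.
        ref += f" https://doi.org/{doi}"
    return ref


# Source taken from the Suggested Readings of the previous chapter.
ref = apa_journal_reference(
    authors=[("Evans", "D."), ("Pearson", "A.")],
    year=2001,
    title="Systematic reviews: Gatekeepers of nursing knowledge",
    journal="Journal of Clinical Nursing",
    volume=10,
    issue=5,
    pages="593–599",
)
print(ref)
```

Reference managers apply the same rules automatically; the sketch is meant only to make the order of the elements and the punctuation between them explicit.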
20.5.2 AMA Style
The AMA style for citation and referencing was devised by the American Medical Association
(AMA). This method is widely used in the field of medical sciences. It follows the reference
style starting with the author’s last name and initials, source title, information about the pub-
lisher followed by publication date. No indent is required if any list of references extends to the
second and subsequent lines. There are specific guidelines for different types of sources.
20.5.2.1 Citation Method
The in-text citation under AMA style is made by using superscript numerals where the number
is the serial number in the list of references given at the end of the document. The reference
number should be given after the fact or quotation, or idea cited in the text.
Example: ‘Significant differences in the level of immunization of infants according to their
gender were reported by Varghese et al.⁵’ Here the article by Varghese and others on the effect
of gender on immunization is the fifth source in the order of citation in a text on immunization
of children. When multiple consecutive sources (say references two to four) are cited at one
place, they are cited as superscript ²⁻⁴ at the appropriate place in the document. When the
sources cited for a statement are not consecutive, their serial numbers are given as superscripts
separated by commas, like ⁴,⁶,⁸.
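The rule for consecutive and non-consecutive reference numbers can be sketched as follows. The function name is hypothetical, and the treatment of exactly two consecutive numbers (written with a comma rather than as a range) is an assumption of this sketch rather than something stated in the text.

```python
def compress_citation_numbers(numbers):
    """Turn reference numbers into a citation string: runs of three or more
    consecutive numbers become a range, the rest are comma-separated."""
    nums = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        # Extend j to the end of the current run of consecutive numbers.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{nums[i]}–{nums[j]}")  # e.g. 2, 3, 4 -> 2–4
        else:
            parts.extend(str(n) for n in nums[i:j + 1])
        i = j + 1
    return ",".join(parts)


print(compress_citation_numbers([2, 3, 4]))
print(compress_citation_numbers([4, 6, 8]))
```

The resulting string would then be typeset as a superscript at the point of citation.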
20.5.2.2 Reference List
A numbered reference list is given at the end of the document. The reference list must include all
the cited sources in the sequential order of citation in the text meaning the list starts with the first
citation in the text. The purpose of references here is to acknowledge the author who has done the
original work and to facilitate the readers with more information on related work. In papers hav-
ing multiple authors, all the authors are responsible for having the combined reference list as per
stipulated norms. Only primary sources and references fully read by the authors are to be included.
Each reference starts with the author’s surname followed by initials without periods. When there
are six or fewer authors, all names are to be mentioned. If there are more than six authors,
the first three are individually mentioned followed by ‘et al.’ The title of the article is to be retained
as appears in the original title (spelling, abbreviation, capitalization of words, numbers, etc.). The
name of the journal as per the National Library of Medicine (NLM) abbreviation may be used for
different journals. The sequence for each reference is author(s), title of paper, journal name, year,
volume (issue) and page numbers. The reference list may include journal articles including online
(with URL, with DOI and published ahead of print), book chapters (print and online), books (online
and print), websites, monographs, government/organization reports (print and online), patents,
materials accepted for publication and conference presentations (print and online). All the
references are numbered in the order in which they are first cited in the document.
For more details about the use of capitalization, punctuation marks, font type, etc., the AMA
Manual may be consulted.
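The author rule given above (all names when there are six or fewer authors, otherwise the first three followed by ‘et al.’) can be sketched as below. The function name is hypothetical, and the seventh author in the second example is an invented placeholder used only to trigger the ‘et al.’ branch.

```python
def ama_author_list(authors):
    """authors: list of 'Surname Initials' strings without periods.
    Six or fewer authors are all listed; beyond six, only the first
    three are named, followed by 'et al.'"""
    if len(authors) <= 6:
        return ", ".join(authors) + "."
    return ", ".join(authors[:3]) + ", et al."


# Six authors, taken from the Suggested Readings of the previous chapter.
six = ["Bero LA", "Grilli R", "Grimshaw JM", "Harvey E", "Oxman AD", "Thomson MA"]
print(ama_author_list(six))

# Placeholder seventh author, purely hypothetical.
seven = six + ["Placeholder XY"]
print(ama_author_list(seven))
```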
20.5.3 MLA Style
This method of citation and referencing was developed by the Modern Language Association. It gives
details for text citation in the body of the paper and preparation of a list of ‘Works Cited’ at the end
of the document. This method of citation and referencing is widely used in the arts and humanities.
20.5.3.1 Citation Method
The citation in the text is made by giving the surname of the first author followed by the page
number in parentheses, without any punctuation between the two.
20.5.3.2 Reference List
The reference list is given under the heading ‘Works Cited’. It is a list of all sources from which
the ideas and information have been taken to prepare the manuscript. The list is prepared in
alphabetical order by the surname of the authors. The material in each reference will include
the author (surname), title of source (paper/article), title of container (name of journal/newspaper),
other contributors (editor/director), version (edition), number (volume/issue), publisher (agency
responsible for publication), publication date and location (location or page number). The works
cited may include books, chapters in books, journal articles, films, DVDs, video recordings,
websites, artworks, illustrations, etc. For more details, the MLA Quick Guide may be consulted.
20.5.4 IEEE Style
IEEE style is followed mainly in Electronics, Electrical Engineering and Computer Sciences
disciplines. There are modified versions of IEEE such as IEEE-Pervasive Comp, IEEE Micro
and IEEE ACM Trans Network.
20.5.4.1 Citation Method
Citation is made when writing academic documents to acknowledge any sources used by the
writer. IEEE citation style is numeric: each citation is a number in square brackets that
corresponds to a numbered entry in the reference list at the end of the document. The citation
number is placed at the appropriate place in the text immediately after the cited matter,
preceded by a space but without any punctuation. Once a source is cited with a number, the
same number is to be used at all subsequent places where the same source is cited.
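The numbering rule described above (a number is assigned at first citation and reused thereafter) can be sketched with a small helper. The class name, method names and source keys are hypothetical and are used for illustration only.

```python
class NumericCitations:
    """Assign a number to each source at its first citation and reuse
    that number whenever the same source is cited again."""

    def __init__(self):
        self._numbers = {}  # source key -> citation number

    def cite(self, key):
        if key not in self._numbers:
            self._numbers[key] = len(self._numbers) + 1
        return f"[{self._numbers[key]}]"

    def reference_order(self):
        """Source keys in the order in which they were first cited."""
        return sorted(self._numbers, key=self._numbers.get)


citations = NumericCitations()
print(citations.cite("source-a"))  # first source cited
print(citations.cite("source-b"))  # second source cited
print(citations.cite("source-a"))  # repeated citation keeps its number
print(citations.reference_order())
```

The order returned by `reference_order` is the order in which the entries would appear in the numbered reference list.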
20.5.4.2 Reference List
The references are given in numeric order at the end of the text, in the same order in which
they are cited in the text, as per IEEE formatting guidelines. As per these guidelines, the list should
be formatted by aligning references left, using single space for each entry and double space
between entries. The reference number is given in square brackets at the left margin. The refer-
ence materials may include books, book chapters, electronic books, journal articles, E-journal
articles, conference papers, reports, patents, standards, thesis/dissertations, datasheets, online
documents, websites, etc. The IEEE guidelines for each of these may be followed while com-
posing the references. The sequence of the IEEE reference list is: Author initials, last name, the
title of the source, place/city, publisher, publication date and DOI.
20.5.5 Harvard Style
It is one of the oldest referencing systems, developed in 1881 by Edward Laurens Mark, a
zoologist at Harvard University. The guidelines for the Harvard style of referencing are
available for both in-text citations and full references at the end of the document. It is a widely
accepted style of referencing used across many subjects.
20.5.5.1 Citation Method
The in-text citation in this method follows the author-date system, which includes the author’s
name and the publication date of the source referred to by the researcher. An in-text citation
for a source with only one author includes the author’s or editor’s name, year of publication
and page number(s). When there are two or three authors, the surnames of all authors with the
year of publication and page number(s) are to be given. However, when there are four or more
authors, the first author’s surname followed by ‘et al.’ may be used.
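The author-count rule above can be sketched as follows. Harvard style has many institutional variants, so the exact punctuation used here (commas between elements, ‘and’ before the last surname, ‘p.’ before the page number) is an assumption of this sketch, as are the function name and the page number in the last example.

```python
def harvard_in_text(surnames, year, page=None):
    """Build an author-date citation: all surnames for up to three
    authors, first surname plus 'et al.' for four or more."""
    if len(surnames) >= 4:
        names = f"{surnames[0]} et al."
    elif len(surnames) == 1:
        names = surnames[0]
    else:
        names = ", ".join(surnames[:-1]) + " and " + surnames[-1]
    citation = f"{names}, {year}"
    if page is not None:
        citation += f", p. {page}"
    return f"({citation})"


# Authors and years taken from the Suggested Readings of the previous chapter.
print(harvard_in_text(["Mauch", "Birch"], 2003))
print(harvard_in_text(
    ["Bero", "Grilli", "Grimshaw", "Harvey", "Oxman", "Thomson"], 1998))
# The page number below is illustrative only.
print(harvard_in_text(["Evans", "Pearson"], 2001, page=595))
```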
20.5.5.2 Reference List
The reference list covers all sources referred to while preparing the document showing informa-
tion like author, date of publication, title, etc. on separate page(s) at the end. It must be organ-
ized alphabetically by author or source title when there is no specific author. When multiple
works of the same author are cited, then the list is ordered by dates. Each reference must be
double-spaced with a blank space between the lines. Every source cited in the text must appear
in the reference list, which may include books, edited books, chapters in edited books, CD-ROMs,
e-mails, interviews, journal articles, newspaper articles, videos, DVDs, web pages, etc.
The online guide for the reference list and in-text citing as per Harvard style may be used for
each of these sources.
20.5.6 Vancouver Style
It was developed in Vancouver (Canada) in 1978 by the editors of medical journals. The Interna-
tional Committee of Medical Journal Editors (ICMJE) recommendations include guidelines for
the preparation and formatting of the manuscript for submission to biomedical journals for pub-
lication. These guidelines are followed by thousands of biomedical journals across the world.
20.5.6.1 Citation Method
In-text citation is made using numerals in parentheses, placed after the relevant part of the
sentence. References are numbered serially in the order in which they are first cited, and the
same number is used whenever a reference is cited again, throughout the text, tables and
legends of the document.
20.5.6.2 References List
The list of references is prepared and given at the end of the document based on the sequence of
citation numbers in the text matter. Some of the important points related to the reference list as
per Vancouver style are as follows:
• Each reference comes only once even when the same is cited a number of times in the text.
• Spacing within reference is single and between references is double.
• For research papers in journals, the sequence of writing the reference is: name of author(s);
title of paper; name of journal (which may be given in its standard abbreviated form)
followed by a period and space; year of publication followed by a semicolon; volume number
with the issue number in parentheses, followed by a colon; and page range of the paper
followed by a period. The page range is abbreviated, so pages 241 to 248 are written as 241–8.
• For books, the sequence of writing the reference is name of author(s), title of book, edition,
place of publication, name of publisher and year of publication.
• In the case of edited books, editor’s name may be given in place of author’s name followed
by word editor(s).
• For edited book with chapters written by different authors, the sequence of writing the
reference is name of author(s) of referred chapter, chapter title followed by ‘In:’, editor(s)
name, comma, space, word editor(s), period, book title followed by period, edition num-
ber followed by period, place of publication with colon, publisher, if copyright, put ‘c’
and year of publication without space else year of publication, period and page range of
the referred chapter.
• Pattern of writing name of author(s): The last name of each author appears first, followed by
a space and then the initials without spaces or periods. Put a comma and space between the
names for up to six authors and a period after the last author. If the number of authors
exceeds six, ‘et al.’ is used after the sixth author. In the case of edited books, the names of
the editors are followed by a comma, a space and the word editor(s).
• Pattern of writing title: Capitalize only the first letter of the first word of the title; the
remaining part of the title is written in lower case except for proper names. If the book is a
revised edition, it is shown after the title as ‘kth ed’.
• Pattern of writing publication information about a book: After the title of the book (and
edition, if applicable), place a period and a space, then the city of publication followed by a
colon. Give the name of the publisher followed by a semicolon, then the year of publication
followed by a period. If the publication has a date of copyright, give the year of copyright
preceded by the letter ‘c’.
Following are the examples for writing references for authored and edited books and journals
which are commonly used by the researchers:
Book:
Daniel WM. Biostatistics: basic concepts and methodology for the health sciences. 9th ed. New
Delhi: John Wiley & Sons, Inc; c2010.
Edited book:
Upadhyay R, Solanki D. Women empowerment in Thar desert region. In: Varghese Nisha, Burark
SS, Varghese KA, editors. Natural resource management in Thar desert region of Rajasthan.
Switzerland: Springer Nature; c2023. p. 235–47.
Journal:
Srivastava AK, Sud UC, Chandra H. Small area estimation-an application to national sample
survey data. J. Ind. Soc. Agril. Statist. 2007; 61(2): 249–54.
The reference material may also include electronic journal articles, electronic books,
dictionaries, online references, Wiki entries, newspaper articles (print and electronic), web
pages, streaming videos, electronic images, poster presentations at conferences, etc. For writing
references to any of the aforementioned sources, it is suggested to refer to the online guides
available for writing references according to Vancouver style.
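The journal pattern and the abbreviated page range described above can be sketched as below. The function names are hypothetical, and the spacing after the semicolon and colon follows the journal example given in this section; individual journals may differ on such details.

```python
def abbreviate_page_range(first, last):
    """Drop the leading digits of the last page that repeat those of the
    first page, e.g. 241 to 248 becomes 241–8."""
    a, b = str(first), str(last)
    if a == b:
        return a
    if len(a) != len(b):
        return f"{a}–{b}"
    i = 0
    # Skip shared leading digits, always keeping at least one digit of b.
    while i < len(a) - 1 and a[i] == b[i]:
        i += 1
    return f"{a}–{b[i:]}"


def vancouver_journal_reference(authors, title, journal, year, volume,
                                issue, first_page, last_page):
    """authors: list of 'Surname Initials' strings without periods."""
    pages = abbreviate_page_range(first_page, last_page)
    # rstrip avoids a doubled period when the abbreviation already ends in one.
    return (f"{', '.join(authors)}. {title}. {journal.rstrip('.')}. "
            f"{year}; {volume}({issue}): {pages}.")


ref = vancouver_journal_reference(
    authors=["Srivastava AK", "Sud UC", "Chandra H"],
    title="Small area estimation-an application to national sample survey data",
    journal="J. Ind. Soc. Agril. Statist.",
    year=2007,
    volume=61,
    issue=2,
    first_page=249,
    last_page=254,
)
print(ref)
```

With these inputs the sketch reproduces the journal example given earlier in this section.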
20.5.7 Bluebook Style
The Bluebook style is used in the American legal profession. This style is followed mostly in
legal writings in the United States by legal professionals. As far back as 1925 it was only an
eight-page booklet for Harvard Law and now it is a three-volume manual. It is designed in such
a way that the reader can easily find the cited sources.
20.5.7.1 Citation Method
The footnote citation of sources at the bottom of pages may include author(s), title, pub-
lisher, year, page/section number.
20.5.7.2 Reference List
The Bluebook reference list is known as the Bluebook bibliography which is an alphabetically
prepared list of all cited sources with the author’s name or in its absence by the initial word
of the title in a legal document. Every entry in the list may cover the author(s) as appeared in
original source, title in sentence case formatting with initial word and proper nouns capitalized,
publication information like publisher, place of publication, date of publishing, volume, issue
number, page number, etc.
20.5.8 MHRA Style
It was developed by the Modern Humanities Research Association for use in the arts and humanities.
20.5.8.1 Citation Method
According to this style, the sources are cited in footnotes on the corresponding pages of the
text and marked by superscript numbers in the text.
20.5.8.2 Reference List
The reference list at the end of the text includes all sources alphabetically in order by authors’
last name.
20.5.9 OSCOLA Style
The Oxford Standard Citation of Legal Authorities (OSCOLA) is a reference and citation style
developed at Oxford University.
20.5.9.1 Citation Method
OSCOLA follows a footnote citation system. At the end of a sentence requiring a citation, a
superscript number is placed after the punctuation.
20.5.9.2 Reference List
The complete references corresponding to all superscript numeric citations appearing on a page
are given at the bottom of those pages as a footnote.
20.5.9.3 Bibliography
The bibliography is given at the end of the document for all sources cited and referred to on
different pages. This list is prepared by type of source and, under each type, alphabetically by
the surname of the author.
20.5.10 Chicago Style
20.5.10.1 Citation Method
Chicago style offers two ways of citation. In the author-date system, the author’s surname and
the year of publication are given in parentheses at the appropriate place in the text,
irrespective of the type of source. In the ‘notes and bibliography’ system, a superscript number
is given at the appropriate place in the text, corresponding to a numbered footnote or endnote,
with the full details listed in the bibliography. The notes and bibliography system is used in
literature, history, the arts, etc., whereas the author-date system is followed in the sciences
and social sciences. All in-text citations must be included in the reference list. For in-text
citations, a uniform pattern is followed irrespective of whether the source is a book, journal
article, internet document, etc.
20.5.10.2 Reference List
The full details needed to locate the sources cited in the text are given at the end
of the document as a reference list. The reference list is prepared alphabetically by surname of
the first author.
20.5.11 ACS Style
20.5.11.1 Citation Method
The ACS (American Chemical Society) style of in-text citation follows one of three methods:
superscript numbers, numbers in parentheses, or author name and year of publication.
20.5.11.2 Reference List
The reference list under ACS style is given at the end of the document in the order in which
the sources are cited in the text. If a reference is cited more than once in the text, the same
number is used throughout the text.
• In-text citation is the way in which author(s) acknowledge, at the point of use, facts and
ideas borrowed from other publications while preparing the text of a scientific document.
• List of works cited or a reference list is the list of all publications and sources that the
author(s) used to prepare the document and cited somewhere in the text of the document.
• Bibliography: It is the list of all the sources of information or published works consulted by
the authors for preparing the document (a thesis or a research paper) irrespective of citation
in the text of the document.
• There are various standard styles for different disciplines/subjects to make in-text citations
and to prepare lists of references or bibliographies for documents like research papers
published in journals, theses and other such documents. These include styles like American
Psychological Association (APA), American Medical Association (AMA), Modern Language
Association (MLA), Institute of Electrical and Electronics Engineers (IEEE), Harvard, Vancouver,
Bluebook, Modern Humanities Research Association (MHRA), Oxford Standard for Cita-
tion of Legal Authorities (OSCOLA), Chicago style, American Chemical Society (ACS)
style, etc.
Suggested Readings
Alves dos Santos, E., S. Peroni, and M. L. Mucheroni, ‘An analysis of citing and referencing habits across
all scholarly disciplines: approaches and trends in bibliographic references and citing practices’, Journal
of Documentation 67(6), https://libguides.reading.ac.uk
Pears, R., and G. Shields, Cite them right: the essential referencing guide, 12th ed., Palgrave Macmil-
lan, 2022.
University of Reading Lib Guides, ‘Different styles and systems of referencing—citing references’, https://
libguides.reading.ac.uk
Williams, R. B., ‘Citation systems in the biosciences: a history, classification and descriptive terminology’,
Journal of Documentation 67(6): 995–1014, 2011.
21 Ethics in Research and Publications
21.1 Introduction
Ethics implies a set of rules applicable to self and others for formation of a better society. Ethics
relates to the value system of our society that covers the consensual agreement on what is right and
what is wrong. It is much more than what is legislatively defined as legal and illegal. The ethical prin-
ciples are important to maintain harmony within and between all sections of society. The scientific
community will also have to address and resolve ethical problems in their efforts to make a better
society. They, as a group, use members of other social groups to arrive at new knowledge and infor-
mation, especially in activities like research in different areas. In order to avoid any confrontation
between these groups, adherence to ethical norms is vital. An indifferent attitude to these problems
may result in crossing ethical barriers, during the phases of implementation of such activities.
Ethical codes are pre-conceived ideas for the behaviour of people as individuals and as a
society while doing activities for the benefit of the society. They are moral statements that can
be applied to particular situations to help us make decisions and guide our behaviour. They are
linked to cultural values at a defined time in our history and are subject to change as attitudes
and values change. What is considered insensitive today may have been the norm half a century
ago, and may be so again half a century from now. In doing research there may be a conflict
between the speedy conduct of a study by the researcher and the effort required to treat humans,
or even animals, with respect. Research ethics include guidelines for the conduct of research
and the dissemination of research results.
When research is based on human subjects, ethics becomes even more vital. The ethical issues
in research could be perceived as:
DOI: 10.4324/9781003527183-21
principles of conduct in research may include moral regulations related to privacy of data/infor-
mation, safety of the participants, anonymity of participants, confidentiality of data, avoidance
of bias at all stages, respect for respondents, maximization of benefits to participants etc. in addi-
tion to informed pre-consent of participants for voluntary participation and freedom to withdraw
from the proposed research at any stage without any ill will from the researchers. Besides, the
data must be interpreted honestly without distortion by the researcher. The participants must
have a due share of ownership and any benefits accruing from the research.
Researchers are focused on knowledge expansion and on the methodology of their projects
including personnel and equipment, statistical analysis, selection of subjects, research proto-
cols, sample size and many other technical aspects. At the same time, they must try as much
as possible to respect the research environment, which requires attention not only to physical
resources including funds, but also to animals and to the dignity of human beings used as
subjects of study.
Ethical considerations in research may help to decide whether a particular research is to be
done, and if it is so, how it should be pursued. Thus, it is vital to be reflective, transparent, sin-
cere and adhere to ethical guidelines in regard to research subjects.
Research involves the systematic process of collecting and analyzing data to increase our
understanding of the phenomenon under study. It is the duty of the researcher to contribute to
the understanding of the phenomenon and to communicate that understanding to the benefit of
the society.
Ethics refers to both morals and beliefs, ‘beliefs about what is right and what is wrong’. Ethi-
cal issues in research can be raised at all phases of research including the problem definition,
stating research objectives/hypotheses, literature review, choice of research design, question-
naire design, data collection procedures, data editing and cleaning, choice of statistical methods,
data analysis, conclusions and recommendations, and even referencing. Some cases of unethical
research are often associated with particular research methods, such as disguised observation
and deception in experiments. Hence ethics is relevant at every stage of the research.
Ethics is now an essential part of any research project, and adherence to ethical norms is
treated as a stage of research in its own right. Nowadays, doing ethical research is essential
to producing relevant and successful findings. As such, researchers’ ethical conduct is being
scrutinized like never before. In today’s society, any concerns regarding ethical practices
negatively influence attitudes about science, and the abuses committed by a few are often the
ones that receive widespread publicity. Clearly, researchers have responsibilities to their
profession, patrons and respondents, and are obliged to maintain high ethical standards to
ensure that neither the purpose of the research nor the information obtained is brought into
disrepute.
As a branch of philosophy, ethics deals with decision-making concerning what is right and
what is wrong. Research ethics covers requirements on day-to-day research work, the protection
of the dignity of subjects and the handling of information that is made public through the
research. A researcher can be a research scholar, psychologist, educationist, economist,
sociologist, medical doctor or anthropologist. Whatever the discipline, the researcher’s primary
responsibility is to protect participants, and the aim of the research should be clear. Informed
consent must be obtained wherever it is required, participants must be protected from any sort
of harm and the privacy of information should be maintained. Some of these concepts are
discussed below.
• Don’t tamper with the study’s natural environment. More importantly, participant and
non-participant observations are essential to qualitative research and are widely employed in
the domains of sociology, anthropology and education. But each raises different moral ques-
tions about deceit, consent and privacy.
• Privacy of information supplied by the study participants in group discussion and other quali-
tative methods is a major problem. The participants in research based on qualitative methods
may face problems like criticism, ill-will, counter arguments, etc. from family members and
society as well.
The quantitative and qualitative approaches differ from one another; neither is better than the
other; rather, both have acknowledged advantages and disadvantages and are best applied in tan-
dem. Identifying and trying to make sense of the conflict that exists between researchers about
quantitative and qualitative research might help to generate new and distinct lines of inquiry.
Ethical issues in research have two major aspects: first, issues related to the conduct of
research and, second, issues pertaining to publications.
• Protection of dignity, integrity, life, health and right to self-determination of each person/
animal.
• Privacy and confidentiality of all personal information.
• Biomedical researchers must consider the regulatory, legal and ethical norms and standards
for research based on human beings and animals.
• Biomedical researchers must ensure minimum harm to the environment.
• Research on human beings and animals must be carried out with full adherence to ethical and
scientific norms.
• A physician combining medical research with medical care of any person must ensure that
the participation of the patient would in no way adversely affect his/her health.
• Research that imposes undue risks and burdens on human subjects must be avoided.
• Risks and burdens borne by human subjects as a result of their participation in research must
be continuously monitored by the researcher.
The Declaration of Helsinki was developed by the World Medical Association (WMA) as a
statement of ethical principles for medical research involving human beings, including research
on identifiable human material and data. The declaration is addressed primarily to physicians,
in line with the WMA’s mandate. The WMA calls on other parties engaged in medical research
involving humans to follow these guidelines. The International Code of Medical Ethics states
that ‘A physician shall act in the
patient’s best interest when providing medical care’, and the WMA’s Declaration of Geneva
obligates doctors by stating that ‘The health of my patient will be my first consideration’. The
responsibility of the doctor is to advance and protect the rights, wellbeing and health of patients,
including those involved in medical research.
Medical researchers have an obligation to safeguard patients’ lives, health, integrity, dignity
and right to self-determination. Even when participants have given consent, the responsibility
for protecting them must always rest with the doctor or other healthcare providers, not with the
subjects themselves. When doing research with human subjects, doctors must consider both applicable
national and international standards and guidelines. No ethical, legal, or regulatory obligation,
whether national or international, shall lessen or eliminate the ethical considerations stated in
the WMA Declaration. Medical research should be carried out in a way that limits potential
environmental damage. Only those with the necessary ethical standards, scientific knowledge,
training and credentials should undertake medical research on human subjects. Research involv-
ing both healthy and sick participants must be carried out under the guidance of licensed medi-
cal experts. When a study has the potential to be preventive, diagnostic or therapeutic, doctors
who integrate medical research with patient care should only consent to have their patients par-
ticipate in it if they have good reason to believe that their participation won’t have a detrimental
effect on their health. It is essential to provide fair compensation and treatment to participants
who get harmed as a result of their participation in the study.
In medical practice and research, most procedures carry risks. In medical research, human
subjects may be involved only when the advantages of the study justify the risks and burdens
they face. Any medical research involving human subjects must be preceded by a thorough
assessment of the known risks to the persons and groups participating in the study, as well as
the known benefits to them and to other individuals or groups who may be affected by the
condition under examination. Risks must be minimized, and they need to be routinely monitored,
assessed and documented by the researcher. Doctors may not take part in a study that uses
humans as subjects unless they are confident that the associated risks can be adequately
managed. Physicians must decide whether to continue, amend or immediately stop the study when
it is determined that the risks outweigh the possible benefits or when there is conclusive
evidence of definitive outcomes.
21.5.3 Vulnerable Groups
Some people and groups are more susceptible than others to being harmed or injured during
an intervention. Such vulnerable groups and people should be given special consideration
for protection. Research on a vulnerable group can be justified only when it responds to the
health needs or goals of that group and cannot be carried out in a non-vulnerable group.
This group should also stand to gain from any knowledge, procedures, or treatments
that come out of the research.
Medical research involving human beings as subjects must adhere to commonly recognized
scientific standards and should be supported by competent laboratory and, wherever neces-
sary, animal testing, as well as a complete understanding of the scientific literature and other
pertinent sources of information. Respect must be shown for the wellbeing of animals used in
the research. A research protocol outlining and justifying the investigation’s design and conduct
is required for every study involving human beings.
• An explanation of the ethical issues raised and an indication of how the declaration’s tenets
have been taken into account should be included in the protocol.
• The protocol should also include information on funding, sponsors, institutional ties, any
conflicts of interest, subject incentives and procedures for treating and/or compensating
study participants who sustain injuries as a result of taking part in the research. The protocol
for clinical trials must also provide suitable preparations for post-trial provisions.
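The protocol contents listed above lend themselves to a simple completeness check before submission. The following is a minimal sketch only: the field names (such as `conflicts_of_interest`) and the function `missing_protocol_items` are hypothetical labels chosen for illustration, not part of the Declaration or of any ethics committee’s actual form.

```python
# Hypothetical field names; the items mirror the protocol contents listed in the text.
REQUIRED_PROTOCOL_ITEMS = [
    "ethical_considerations",
    "funding",
    "sponsors",
    "institutional_affiliations",
    "conflicts_of_interest",
    "incentives_for_subjects",
    "injury_treatment_and_compensation",
]

# Additional item the text requires for clinical trials.
CLINICAL_TRIAL_ITEMS = ["post_trial_provisions"]


def missing_protocol_items(protocol, is_clinical_trial=False):
    """Return the required items that are absent or left blank in a draft protocol."""
    required = list(REQUIRED_PROTOCOL_ITEMS)
    if is_clinical_trial:
        required += CLINICAL_TRIAL_ITEMS
    missing = []
    for item in required:
        value = protocol.get(item)
        # Treat a missing key, None, or an empty/whitespace-only entry as missing.
        if value is None or not str(value).strip():
            missing.append(item)
    return missing


# Example draft with only two items filled in (illustrative values).
draft = {"funding": "University grant", "sponsors": "None declared"}
gaps = missing_protocol_items(draft, is_clinical_trial=True)
```

Such a check only confirms that each heading has been addressed; whether the content is adequate remains a matter for the research ethics committee.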
Before any study in the medical sciences begins, the research protocol needs to be submitted
to the appropriate research ethics committee for assessment, comments, guidance and approval.
This committee needs to function transparently and be independent of the sponsor, the
researcher and any other undue influence. Along with any relevant international norms and
standards, the rules and regulations of the country or countries in which the study will be done
must also be considered.
The committee must have the authority to monitor ongoing research projects. The
committee must receive monitoring data from the researcher, including details regarding any
significant adverse events. The committee must review and approve any changes made to the
protocol before they may be implemented. Following the completion of the study, the investiga-
tor is required to present a final report to the committee that includes an overview of the results
and recommendations.
21.5.7 Informed Consent
In medical research, it is crucial that participants who are able to give informed consent
take part voluntarily. Although it may be appropriate to consult relatives or local authorities,
no one who can give informed consent may be involved in a study unless they willingly agree.
All participants who are able to give informed consent to participate in medical research
involving human subjects must be fully informed about the goals of the study, the methods, the
sources of funding, any potential conflicts of interest, the researcher’s institutional affiliations,
the expected benefits and risks, any potential discomfort, the post-study provisions and any
other relevant study details. The prospective participant must be informed of their freedom to
refuse to take part in the study or to withdraw consent at any time without having to face any
consequences. Consideration should be given to the ways in which the information is delivered,
as well as the particular information requirements of each possible subject.
Once it is ascertained that the subject has understood the information conveyed, the physician
or any other trained person must then obtain the informed consent of the subject, preferably
in writing. If consent cannot be given in writing, the non-written consent needs to be formally
documented and witnessed. Every participant in medical research should have the option of
being informed about the overall outcome and findings of the investigation. A physician must
use extra caution when obtaining informed consent from a potential subject for participation in
a research study if the individual is in a dependent relationship with the physician or may
consent under coercion. In these circumstances, informed consent must be sought by a suitably
qualified person who is completely independent of this relationship.
Research involving participants incapable of providing informed consent due to physical
or mental conditions (e.g., unconscious patients) may only be conducted if the physical or
mental condition that prevents consent is a necessary characteristic of the research group. In
these situations, the doctor has to seek informed consent from the legally authorized
representative.
The doctor must thoroughly explain to the patient any aspects of their care that are related
to the research. A patient’s refusal to participate in research, or decision to withdraw from
it, should never have a detrimental effect on the patient-physician relationship.
Before collecting, storing or reusing identifiable human material or data for medical
research, doctors must obtain informed consent. This includes research using information or
materials from biobanks or other similar repositories. In exceptional cases, obtaining consent
for this sort of study may be impossible or impracticable. Under these conditions, research
can only be carried out after assessment and approval by a research ethics committee.
Publishers, editors, sponsors, authors and researchers alike have ethical responsibilities
regarding the publication and distribution of research findings. Researchers are responsible for
the accuracy and completeness of their findings and have an obligation to make the results of
their work with human subjects available to the public. Everyone involved should follow
recognized standards for ethical reporting. Negative and inconclusive results, as well as
positive ones, need to be published or otherwise made available to the public. Publication
requirements include disclosing funding sources, institutional affiliations and conflicts of
interest.
21.8 Publication Ethics
The success of academic publishing largely hinges on trust. Readers trust the peer-review
process; authors trust editors to choose qualified peer reviewers; and editors trust peer
reviewers to offer unbiased assessments. Strong intellectual, commercial and occasionally
political interests are present in the academic publishing environment, and these interests
may conflict or compete with one another. A sustainable and effective publishing system will
be fostered by wise decisions and robust editorial processes intended to manage conflicting
interests. Academic societies, journal editors, authors, research funders, readers and
publishers will all gain from this system. Ethical publication practices do not emerge by
accident; they must be actively promoted in order to take hold. The following are the
fundamentals of publication ethics:
21.8.1 Transparency
Funding sources for studies and publications must always be disclosed. This should be
made clear in journals’ editorial policies. All manuscripts that authors prepare for publication
should routinely include information regarding research funding. Where available, a clinical
trial registration number ought to be mentioned.
21.8.2 Authorship Acknowledgement
According to The International Committee of Medical Journal Editors (ICMJE) authorship cri-
teria, authorship credit should be based on:
When writing a research paper, authors should indicate whether they had full access to the
study data used to support the publication. Not all contributors can be considered authors, but
they should still be listed in the acknowledgement and their specific roles explained. Decisions
about authorship ought to be made at the outset of the investigation. Editors are not
responsible for deciding on authorship. Editors ought to require clear and comprehensive
credits for all authors who have contributed to a work.
Editors ought to implement suitable mechanisms for notifying contributors about authorship
standards (if applicable) and/or acquiring precise data regarding individual contributions. Edi-
tors should request that authors include with their initial submission package a statement attest-
ing to the following: that all persons listed as authors have fulfilled the necessary requirements
for authorship, that no one who is eligible for authorship has been left off the list, that contribu-
tors and funding sources have been duly acknowledged, and that contributors and authors have
approved the acknowledgement of their contribution.
In cases where many authors report on behalf of a larger group of investigators, the International
Committee of Medical Journal Editors suggests that ‘the group should identify the individuals
who accept direct responsibility for the manuscript when the work has been conducted by a
large, multi-center group’. These people ought to fulfil all of the requirements for authorship
listed above. When submitting a work with many authors, the lead author must specify the
preferred citation and explicitly name each individual author along with the group name.
The members of the wider authorship group should be listed in an appendix to the
acknowledgement of the individual authors who take direct responsibility for the manuscript.
Journals ought to request that authors certify that the relevant institutional review board or
research ethics committee has approved the work they are submitting. Manuscripts involving
human subjects must state that the experiments were conducted with the participants’
knowledge and proper informed consent.
In cases where there is uncertainty about whether the proper procedures have been followed,
editors should retain the authority to reject publications. When a study is submitted from a
nation without an institutional review board, ethics committee, or other comparable review and
approval process, editors should rely on their own judgement when determining whether or
not to publish the work. If it is decided to publish a manuscript in these circumstances, a brief
explanation should be provided.
The accuracy of the data is a prerequisite for the validity of the research. Falsification and fabrica-
tion are significant problems in scientific ethics because they cast doubt on the accuracy of data.
The act of removing or changing research tools, information, or procedures so that the findings
are no longer accurately represented in the research record is known as falsification. The act of
fabricating data or findings and then documenting, reporting, or recording them in the scientific
record is known as fabrication. These two acts are considered the most serious transgressions,
as they cast doubt on the integrity of every component of scientific research.
21.9.2 Plagiarism
The word plagiarism originates from the Latin word plagium which means kidnapping a man.
Literally speaking, it is stealing or presenting something that was written by someone else as
your own.
Direct plagiarism (plagiarism of the text); mosaic plagiarism (taking concepts and view-
points from the original source and using a few exact words or phrases without giving credit to
the author); and self-plagiarism (reusing one’s own work without citations) are three main cat-
egories of plagiarism.
Researchers need to use their knowledge and discernment in order to interpret the published
data judiciously, integrate prior knowledge into new papers and distinguish novel concepts
and results from those of previously published papers. Authors must adhere to the moral, legal
and ethical standards that the scientific community accepts. They must place borrowed phrases
in quotation marks and credit borrowed ideas, correctly citing the published or unpublished
works from which they are taken. To put it simply, a passage taken verbatim from another
author’s work must be put in quotation marks (inverted commas) and attributed to its source.
21.9.3 Multiple Publication
Republication of previously published work is generally not allowed. However, journals may
accept reprints of content translated from an original publication in a foreign language, and
such reprints will not be considered ‘redundant’. Such journals need to ensure that prior
permissions are obtained, should clearly mention the source of the original publication and
should make clear to readers that the material is a translated reprint. It is more difficult to
justify republishing in the original language with the intention of reaching a different audience
when the original publication is electronic and thus easily accessible; however, editors should
take the same actions as for translation if they believe this is appropriate.
The Committee on Publication Ethics (COPE) is a body that advises on publication ethics.
Since its founding in 1997 by a group of UK medical journal editors, COPE has grown to include
nearly 7000 members from all academic disciplines worldwide. Academic journal editors and
anybody else with an interest in publication ethics can join. Elsevier, Wiley-Blackwell,
Springer, Taylor & Francis, Palgrave Macmillan and Wolters Kluwer are just a few of the
prominent publishers who have enrolled their journals as COPE members. Editors and publishers
can get guidance from COPE on any topic related to publishing ethics, including how to deal
with instances of research and publication misconduct.
Another organization, which represents the interests of more than fifty universities and
organizations devoted to scientific research, is the UK Research Integrity Office. It was
founded in 2006 with the following objectives: to provide confidential, unbiased and
knowledgeable advice and guidance about the conduct of academic, scientific and medical
research; to share best practices for addressing misconduct, poor practice and unethical
behaviour; and to promote good governance, management and conduct of research in these areas.
Medical professionals are accountable for protecting the lives and wellbeing of research
subjects when conducting scientific research.
One must keep in mind that producing a scientific work necessitates originality as well as
openness, honesty, trust and adherence to the ethical guidelines for preparing scientific papers.
If the researcher believes that the research could be harmful to the subject if it is continued,
then the researcher or the investigating team should stop the study. In research on human
beings, the interests of society and science should never take precedence over the welfare of
the subject. To avoid research misconduct, it is crucial to train researchers in research
ethics, in what constitutes research misconduct and in the gravity of its consequences.
• Research ethics includes guidelines for the conduct and dissemination of research results.
• It includes protection of life, health, dignity, integrity and the right to self-determination of
each person/animal.
• It also includes privacy and confidentiality of all personal information of research subjects.
• Biomedical researchers must consider the ethical, legal and regulatory norms and standards
for research based on human beings and animals.
• Biomedical researchers must ensure minimum harm to environment.
• Research on human beings and animals must be carried out with full adherence to ethical and
scientific norms.
• A physician combining medical research with medical care of any person must ensure that
the participation of the patient in no way adversely affects his/her health.
• Research that imposes undue risks and burdens on human subjects must be avoided.
• Researchers must constantly assess the risks and costs that human subjects face as a result of
their involvement in the study.
• Before the study commences, the research protocol needs to be submitted to the relevant
research ethics committee for review, approval and comments.
• Research subjects’ privacy and the security of their personal data must be protected at all
costs.
• There are ethical duties on publishers, editors, sponsors, writers, and researchers when it
comes to publishing and sharing research findings.
• Misconduct in scientific research and publishing includes fabricating data and findings;
falsifying or altering results; and plagiarizing, including self-plagiarism, as well as
fragmented, repeated and double publishing (duplicate publication).
Suggested Readings
Council for International Organizations of Medical Sciences, International ethical guidelines for biomedi-
cal research involving human subjects, Geneva, 1993.
Indian Council of Medical Research, Ethical guidelines for biomedical research on human subjects, New
Delhi, 2006.
Kher, S., ‘Informed consent process: protecting subjects rights’, in Basic principles of clinical research
and methodology, 1st ed., edited by S. K. Gupta, Jaypee Brothers, New Delhi, pp. 93–105, 2007.
Melo-Martin, I., and A. Ho, ‘Beyond informed consent: the therapeutic misconception and trust’, Journal
of Medical Ethics 34: 202–205, 2008.
National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, The
Belmont Report, US Government Printing Office, Washington DC, 1979.
Saini, R., and S. Singh, Clinical trials in India: need for bioethics, The Tribune, August 9, 2010.
22 Intellectual Property Rights and Research
22.1 Introduction
Intellectual property is a special type of property: the intangible creation of the human
intellect. Quite often, a research outcome, whether new knowledge or a solution to a problem,
leads to the creation of intellectual property.
The Liberalization, Privatization and Globalization (LPG) regime emerged in the early nine-
ties after the World Trade Organization (WTO) pushed the importance of intellectual property
across the countries of the world.
Research in all branches of learning leads to the creation of intellectual property, as
continued research paves the way to many technologies, inventions, products, processes,
etc. Following are the characteristics of intellectual property:
Intellectual property rights protect one’s work from being used unfairly by others. All kinds of
works including creative works, artistic works like images, symbols, etc., literary works like an
article or a book and new inventions like a new drug are protected by intellectual property rights
which ensure that the benefits accruing from one’s work are protected.
Mission-oriented and continued research often paves the way for IPR and patenting. Hence
a researcher must have basic knowledge of the nature and types of intellectual property and
of the method of patenting.
DOI: 10.4324/9781003527183-22
Copyright: It gives authors of books and articles the ability to protect their works from
misuse. Databases, reference works, computer software, books, technical drawings,
figures, images and other items are covered by copyright protection. Copyright
protection enables the owner of a work to prevent its unauthorized use by other people.
Anyone wishing to use copyrighted content in their articles or publications must first
seek the required formal permission. When referring to a book chapter or research paper,
one can paraphrase, summarize or quote, giving due attribution each time, in order to
avoid plagiarism. Remember that plagiarism is a serious offence. The original source must
be cited in such published work. A research thesis containing original research work of a
special nature has scope to be published as a book under copyright.
Industrial Property Rights: These cover the rights related to industrial property, which
include inventions, patents, industrial designs, trademarks, geographical indications,
layout designs/topographies of integrated circuits, trade secrets, protection of new plant
varieties and so on.
Invention: The invention is a successful technical solution to a technical problem. An inven-
tion can be granted a patent if it is new, non-obvious and capable of industrial/commercial
applications.
Patents: A patent is an exclusive right given for an innovation, such as a product or a
technique, that offers a novel approach to a problem or a new technical solution. It prevents
others from manufacturing the patented product without the permission of the original
inventor. The period of a patent is normally 20 years. Continued research paves the way for
new inventions or processes which have scope for commercial applications.
Industrial Design: What distinguishes and enhances a product is its industrial design. These
could be 2-D (lines or patterns) or 3-D (the shape or surface of an object) features. One exam-
ple of industrial design is the shape of your favourite perfume bottle.
Trademark: A trademark is a distinctive sign used to distinguish a good or service. It could
be a single word or a combination of words and numbers. Symbols, 3-D signs, and even
drawings can serve as trademarks. For instance, the trademark Google is well-known.
Depending on the extent of protection necessary, the trademark application may be submitted
at either the national or regional level.
Geographical Indication: A geographical indication identifies a product as originating from
a certain area and attributes its quality or reputation to that area. A watch labeled as
‘Swiss made’ implies that the technical development, assembly and final inspections have
taken place in Switzerland.
Layout Designs/Topographies of Integrated Circuits: These are three dimensional arrange-
ments of elements forming an integrated circuit intended for manufacturing. These arrange-
ments and ordering of elements follow from electronic function that the integrated circuit is
to perform.
Trade Secrets: These are a type of intellectual property which includes formula, practice, pro-
cess, designs, instruments, patterns, or compilations of information having economic value.
Protection of New Plant Varieties: The Protection of Plant Varieties and Farmers’ Rights Act
in India provides an efficient framework for the protection of plant varieties and of the
rights of farmers and plant breeders, and encourages the breeding and cultivation
of new plant varieties.
Novelty: A patent can only be issued for an invention if it has never been previously described
or revealed anywhere in the public domain. Disclosures about the invention made during talks
with third parties, on personal websites or in scholarly articles, abstracts and presentations
at scientific conferences can be detrimental. Hence the researcher should not publish or
otherwise disclose the invention before a patent application has been filed. In other words,
secrecy of the invention until filing is vital for patenting.
Inventive step: Even though an innovation may be new, it may not be considered to involve an
inventive step. An invention should be such that a person working in the same
field would not come up with the same invention using common knowledge or by employing
information specifically related to that field. This requirement cannot
be evaluated as simply as the novelty criterion. In reality, it is a matter of developing
arguments that will persuade the patent examiner (who ultimately decides whether a patent
is issued or not) of the inventive step. A patent attorney helps in determining whether an
invention involves an original thought or inventive step.
Industrial applicability: Any invention with practical industrial uses is eligible for patenting.
Most of the patent applications pertain to industrial applications.
In addition to the conditions for patentability specified above, the disclosure of the invention
must also meet a number of formal requirements. One of these is the requirement of support
for the invention: the disclosure must provide sufficient details about the
invention so that a person knowledgeable in the relevant field can reproduce it.
22.4 Patentable Inventions
Patentable inventions do not necessarily need to be sophisticated or even ingenious. Throughout
their careers, many scientists produce several inventions. To recognize that a piece of work
constitutes an invention and to bring it to attention, researchers must have a certain level of
awareness. In general, an innovation can be patented if it is new, not obvious, and has an
industrial use. A substance, mixture, formulation, tool, or procedure can all be considered
as inventions.
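The three conditions just stated (new, not obvious, industrial use) can be read as a checklist that must be satisfied in full. The sketch below is purely illustrative: the function name and its boolean inputs are hypothetical, and in practice each condition is a matter of legal assessment by a patent attorney and examiner rather than a yes/no flag.

```python
def unmet_patentability_criteria(is_new, is_non_obvious, has_industrial_use):
    """Return the names of the patentability conditions that are not satisfied.

    The three inputs are simplified yes/no judgements (an assumption made
    for illustration); an empty result means all three conditions are met.
    """
    checks = {
        "novelty": is_new,
        "inventive step": is_non_obvious,
        "industrial applicability": has_industrial_use,
    }
    return [name for name, satisfied in checks.items() if not satisfied]


# A new and industrially useful result that an expert in the field
# would find obvious fails on the inventive step alone.
result = unmet_patentability_criteria(True, False, True)
```

An invention qualifies under this simplified reading only when the returned list is empty.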
22.5 Grant of Patent
A patent is granted to an inventor by the national authorities after strict scrutiny of the
application. In a country like India, objections can be raised by other people
during the pre-grant and post-grant periods. A patent is applicable only within the boundaries
of the country granting it.
The international organizations dealing with intellectual property include the World Intellectual
Property Organization and the World Trade Organization. The patent office in India is
headquartered at Kolkata, with branch offices at Delhi, Mumbai and Chennai.
22.11 Inventorship
Scientists are often unaware of the respective rights of the applicant and the inventor. It is
crucial to stress that the notion of an ‘inventor’ may differ from that of a principal
investigator. In patent law, a person or group of people who contributed conceptually to the
claims detailed in the invention disclosure are considered inventors. In most cases, one cannot
claim to be an inventor merely by facilitating the reduction of the invention to practice. For
example, a department head who has provided the financing, lab infrastructure and technicians,
as well as the job and salary of the possible inventor, but has not contributed intellectually
does not qualify for a stake in inventorship. This is different from being a co-author of a
scientific publication. It is crucial for both political and legal reasons that (possible)
inventors thoroughly document all IP-related activities so that erroneous claims made by
individuals who are not entitled can be disproved.
However, in most countries, the research institute that employs the inventor is, under its
employment regulations, the applicant on the patent application and therefore retains
ownership of the patent.
22.11.1 Patent Filing
The funds available for filing and maintaining patents are typically limited, particularly
within universities. The decision to patent a particular invention is therefore not made lightly
and typically entails evaluation of the patentability criteria by a trained patent attorney as well
as commercial evaluation of the business case. Some questions that need to be raised before
applying for patent include:
Once it is decided to file a patent application, the next step is to decide when to file it.
Filing as soon as possible reduces the likelihood that someone else will come up with the
identical invention and file a patent application earlier. The patent system is built on the
first-to-file concept: the first person to file a patent application on an invention is
granted the patent, even if someone else came up with the idea first. It is also better to
file for a patent before submitting a journal article or presenting the invention at a
conference, because public disclosure of the findings may destroy the patentability of the
invention.
There are a few things that need to be considered before deciding on the time to file a patent
application. Firstly, once a patent application has been filed, the deadlines that depend on
the filing date, especially the patent’s expiration date, are fixed. Most development
expenditures are incurred in the early stages of a patent’s lifetime, but the majority of the
earnings are typically received towards the end. This is a strong justification for filing the
patent application as late as possible. Secondly, in the case of biomedical patent
applications, the scope of protection is limited by the quantity of technical evidence
provided. For instance, when a compound is claimed for use in the treatment of a new disease,
there may be similar compounds which have the same effect. However, in the absence of
supporting data, it may be more challenging to acquire broad protection for an entire class of
compounds. Additionally, a medical use claim requires in vivo data which demonstrate a
compound’s effectiveness in a treatment. Thus, the first filing can be postponed to allow for
the collection of such data, giving the institute a better chance that its investment in the
patent will result in a profit.
Usually, the patent attorney is in charge of writing the patent application. The writing of the
patent application should ideally begin with the formulation of the invention’s claims, typically
based on the manuscript on the invention which has been prepared to be submitted to a scientific
journal. Before filing the patent application, the patent attorney will invite the inventor(s) to
review the application a couple of times to respond to queries and offer further input. The entire
process can be completed in a few weeks.
Patent applicants have what is called the ‘priority right’ to file, within one year, a
subsequent patent application for the same invention. The first filing date is then used to
assess novelty and inventive step. This right is generally used to file an international
patent application. An international application is filed under the Patent Cooperation Treaty
(PCT), which is an agreement between more than 160 nations to recognize the priority of each
other’s first patent application. By filing a PCT application, one can postpone filing separate
applications in individual countries until 30 months from the first filing. Owing to the high
cost of a PCT application, applications which do not have sufficient data after the first year
are discontinued. Once the application enters the PCT phase, additional data providing further
support for the claims can be added.
22.11.3 National Phase
Within 30 or 31 months (depending on the territory) of the first filing of the patent
application, that is, 18 months after the subsequent PCT filing, the applicant must decide in
which nations or regions grant of the application should be pursued. The goal is to get the
application approved primarily in those jurisdictions where there is thought to be a sizable
market for the invention’s product(s) or where the main competitors are situated. This may be
highly expensive, depending on the number of jurisdictions elected. The expenses include filing
fees, patent attorney fees (a separate intermediary agent is necessary for each foreign
jurisdiction), and translation fees, which apply to applications that must be filed in, for
example, Chinese or Japanese.
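The timeline described above can be sketched as simple date arithmetic. The following is only
an illustrative sketch, assuming the figures given in the text (a 12-month priority period and
a national phase deadline 30 or 31 months after the first filing). The function names
add_months and filing_timeline are hypothetical, and real deadlines depend on the rules of each
patent office and should be confirmed with a patent attorney.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def filing_timeline(first_filing: date, national_phase_months: int = 30) -> dict:
    """Hypothetical helper: key dates counted from the first filing.

    Assumes the figures quoted in the text: a 12-month priority period and
    a national phase deadline of 30 or 31 months, depending on the territory.
    """
    if national_phase_months not in (30, 31):
        raise ValueError("the text mentions only 30 or 31 months")
    return {
        "first_filing": first_filing,
        "priority_deadline": add_months(first_filing, 12),
        "national_phase_deadline": add_months(first_filing, national_phase_months),
    }


if __name__ == "__main__":
    timeline = filing_timeline(date(2024, 3, 15))
    for name, when in timeline.items():
        print(f"{name}: {when.isoformat()}")
    # first_filing: 2024-03-15
    # priority_deadline: 2025-03-15
    # national_phase_deadline: 2026-09-15
```

In this sketch, a PCT application filed on the last day of the priority year leaves 18 months
before the 30-month deadline, which matches the two ways of counting given in the text.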
After the application enters the national phase, an examiner from the national patent office
examines it further in each jurisdiction. It is extremely uncommon for a patent to be issued
without any input from the examiner. Examiners frequently disagree with the breadth of the
claims, which forces the applicant either to develop counterarguments or to narrow the scope of
the claims. These exchanges usually take the form of written correspondence, known as ‘office
actions’. The intended result of these exchanges is the issuing of the patent. However, it is
possible that all of the claims will be rejected, which would prevent the patent from being
granted. Another possibility is that the claims that are ultimately approved are of no
commercial value, in which case the applicant may decide to withdraw the application. A rival
or possible patent infringer may also oppose the patent in an effort to have it cancelled, for
instance by claiming that the patent should not have been granted owing to a lack of novelty or
inventiveness.
The role of the researchers during this phase is limited to assisting the patent attorney in the
defence of the claims by responding to any arguments the opponents may have.
• The types of intellectual property include copyrights and rights related to industrial
property, which includes patents, industrial designs, trademarks, geographical indications,
layout designs (topographies) of integrated circuits, trade secrets, protection of new plant
varieties and so on.
• Research in all branches of learning leads to the creation of a number of intellectual properties.
• Intellectual property is a creation of the human mind (intellect) and is intangible.
• IPR is given by statutes, attended with limitations and exceptions, and is time-bound and
territorial; the right is ensured through patenting. A patent is an exclusive right granted for
a new invention. An invention can be a substance, a composition, a formulation, a device or a
method.
• An invention is a successful technical solution to a technical problem. An invention can be
granted a patent if it is new (not known to the public prior to the claim by the inventor),
non-obvious (the invention would not be obvious to a person with ordinary skill in the art) and
capable of industrial/commercial application (the invention can be made or used in any useful,
practical activity).
• Patents are granted by national patent offices after publication and substantive examination
of the applications. In India, provisions exist for pre-grant and post-grant opposition by others.
• A licence is a permission granted by an IP owner to another person to use the IP on agreed
terms and conditions, while the owner continues to retain ownership of the IP. Licensing
creates an income source for the owner.
• The prime reason that should motivate a researcher to engage in the development of IPR is
that it is very rewarding to see a technology originating from one’s own research developed
into a final product that ultimately benefits people.
• A significant bottleneck in patenting is that academic researchers often do not have the
awareness, business mindset, or in-depth knowledge of IP-related issues to efficiently pro-
ceed with patenting their invention in addition to publishing their data. The problem here is
that a public disclosure of research findings may destroy the patentability of any invention
arising from data contained in the publication.
• ‘Intellectual property rights’ is a term used to describe a variety of legal rights for different
types of creations of the mind. IP rights provide owners the right to exclude others from com-
mercializing their creations.
• The patent system was put in place to balance the interests of the inventor and society at
large: the inventor is granted 20 years of exclusive use of the invention, while the underlying
information is disclosed to the public.
Suggested Readings
Bindal, S., Intellectual property law—an introduction, 2nd ed., EBC Webstore, 2023, https://ebcwebstore.com
Chauhan, A., and K. Singh, ‘Intellectual property rights in the digital age: a scopus-based review of
research literature’, Journal of Emerging Technologies and Innovative Research (JETIR) 10(7): 2023.
‘Intellectual property rights: an overview and implications’, www.ncbi.nlm.nih.gov>pmc
‘Intellectual property rights: what researchers need to know’, https://www.enago.com.academy
Md. Mahfooez Nomani, Z., and F. Rahman, Intellectual Property Rights (IPRs) and economic
development, New Century Publications, New Delhi, 2018.
Pharmaceutical Research and Manufacturers of America (PhRMA), ‘Drug discovery and development:
understanding the R & D process’, 2007, www.phrma.org/sites/default/files/pdf/rd_brochure_022307.pdf
Rattan, J., Intellectual property rights, Volume 1, Bharat Publishers, New Delhi, 2024.
Index
Note: Page numbers in italics indicate a figure and page numbers in bold indicate a table on the
corresponding page.