0% found this document useful (0 votes)

179 views126 pages

Associate Analytics M1 Faculty Guide

Uploaded by

bejjanki.kumaraswamy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

179 views126 pages

Associate Analytics M1 Faculty Guide

Uploaded by

bejjanki.kumaraswamy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

ASSOCIATE ANALYTICS
FACILITATORS GUIDE
MODULE 1

This Facilitators Guidebook for the

Associate Analytics program contains
detailed facilitation guidelines as well
as the exhaustive course material for
the Associate Analytics program.
Page 1 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Page 2 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Facilitator’s Guide

Associate - Analytics
Powered by:

Page 3 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

NASSCOM
4E-Vandana Building (4th Floor)
11, Tolstoy Marg, Connaught Place
New Delhi 110 001, India
T 91 11 4151 9230; F 91 11 4151 9240
E [email protected]
W www.nasscom.in

Published by

Building Domain | Enhancing Careers

T: 91 70365 88888
E [email protected]
W www.mindmapconsulting.com

Disclaimer

The information contained herein has been obtained from sources reliable to
NASSCOM. NASSCOM disclaims all warranties as to the accuracy, completeness or
adequacy of such information. NASSCOM shall have no liability for errors, omissions,
or inadequacies, in the information contained herein, or for interpretations thereof.
Every effort has been made to trace the owners of the copyright material included in
the book. The publishers would be grateful for any omissions brought to their notice
for acknowledgements in future editions of the book.

No entity in NASSCOM shall be responsible for any loss whatsoever, sustained by any
person who relies on this material. The material in this publication is copyrighted. No
parts of this report can be reproduced either on paper or electronic media, unless
authorized by NASSCOM.

Page 4 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Foreword

Page 5 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Acknowledgements

Page 6 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Prologue

Page 7 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Table of Contents – Module 1

Introduction to QP Associate Analytics

Introduction to Associate Analytics 8

Career growth in Analytics 11
Qualification pack - Q/2101 Associate Analytics 12
Overall Associate Analytics Content Structure 20
Glossary of terms 22

CORE CONTENT

UNIT 1.1 Introduction to R and R Programming 27

UNIT 1.2 Manage your work to meet requirements 40
UNIT 2.1 Summarizing Data and Revisiting Probability 60
UNIT 2.2 Work effectively with Colleagues 73
UNIT 3.0 SQL using R 99
UNIT 4.0 Correlation and Regression Analysis 106
UNIT 5.0 Understanding the Verticals and Requirements Gathering 119

Page 8 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Introduction
Qualifications Pack-Associate –Associate Analytics SSC/Q2101

SECTOR: IT-ITeS

SUB-SECTOR: Business Process Management

OCCUPATION: Analytics

REFERENCE ID: SSC/Q2101

ALIGNED TO NCO CODE: TBD

Brief Job Description: Individuals at this job are responsible for building analytical packages
using Databases, Excel or other Business Intelligence (BI) tools

Personal Attributes: This job requires the individual to follow detailed instructions and
procedures with an eye for detail. The individual should be analytical and result oriented and
should demonstrate logical thinking.

Eligibility: Bachelor's Degree in Statistics/ Science/Technology, Master's Degree in

Science/Technology/Statistics

Work Experience:0-1 years of work experience/internship in analytics roles

Roles of Associate Analytics

(--------)

Page 9 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Analytics is a key occupation in the structure of the ITS Sub-Sector

Page 10 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Analytics excellent Vertical and Horizontal movements in their

tracks

Page 11 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Movement to Other Occupations, Sub-sectors and Industries:

Given the dynamic range of services that the BPM sub-sector is increasingly offering to its
clients in the industry, there are a variety of roles that employees are performing across the
entire spectrum of offerings. As such they become a valuable asset not only to the BPM sub-
sector, but also to all the client industries they are associated with.

Page 12 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

OVERALL QUALIFICATION PACK DETAILS

Qualifications Pack Code SSC/Q2101

Associate - Analytics
Job Details

Job Role This job role is applicable in both national and international
scenarios
Credits(NVEQF/NVQF/NSQF) Version number 0.1
Sector IT-ITeS Drafted on 30/04/13
Business Process 30/04/13
Sub-sector Last reviewed on
Management
30/06/14
Occupation Analytics Next review date

KA1. Job Role KA2. Associate - Analytics

(Business Analytics Associate/ Analyst)

KA4. Responsible for building analytical packages using

KA3. Role Description
Databases, Excel or other Business Intelligence (BI) tools
KA5. NVEQF/NVQF level KA10. 7
KA6.

KA7. Minimum Educational KA11. Bachelor's Degree in Statistics/ Science/Technology or any

Qualifications other course
KA8. KA12. Master's Degree in Science/Technology/Statistics or any
KA9. Maximum Educational other course
Qualifications
KA15. Courses in SPSS, SAS, STATA and/or Spreadsheets
KA13. Training KA16. RDBMS concepts, PL\SQL, OCA certification
KA14. (Suggested but not mandatory) KA17. Financial and accounting terminologies in respective
language & various accounting standards and GAAPs
KA18. Experience
KA20. 0-1 years of work experience/internship in analytics roles
KA19.
KA22. Compulsory:
1. SSC/ N 0703 (Create documents for knowledge sharing)
2. SSC/ N 2101 (Carry out rule-based statistical analysis)
3. SSC/ N 9001 (Manage your work to meet requirements)
4. SSC/ N 9002 (Work effectively with colleagues)
KA21. Applicable National 5. SSC/ N 9003 (Maintain a healthy, safe and secure working
Occupational Standards (NOS) environment)
6. SSC/ N 9004 (Provide data/information in standard formats)
7. SSC/ N 9005 (Develop your knowledge, skills and competence)
KA23.
KA24. Optional:
KA25. Not Applicable

KA26. Performance Criteria KA27. As described in the relevant OS units

Page 13 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

SSC/ N0703 - Create Documents for knowledge

sharing

Session Overview

In the Associate Analytics “Working with Documents”, the participant will learn about the most
prominently used documentation techniques in corporate organizations. The Documentation
types covered would include case studies, best practices, project artifacts, reports, minutes,
policies, procedures, work instructions etc.
This session is NOT intended to cover technical documents or documents to support the
deployment and use of products/applications, which are dealt with in different standards.

Session Goal

Participants should be able to have a good hands on understanding of MS Word and MS Visio,
where there will be required to draft various documents/reports. The goal of the session is for the
participant to be aware of the various documentation techniques which are used prominently in
organizations.

Session Objectives
Upon completion of both parts of this course, the participants will be able to:

PC1. establish with appropriate people the purpose, scope, formats and target audience for
the documents
PC2. access existing documents, language standards, templates and documentation tools from
your organization’s knowledge base
PC3. liaise with appropriate people to obtain and verify the information required for the
documents
PC4. confirm the content and structure of the documents with appropriate people
PC5. create documents using standard templates and agreed language standards
PC6. review documents with appropriate people and incorporate their inputs
PC7. submit documents for approval by appropriate people
PC8. publish documents in agreed formats
PC9. update your organization’s knowledge base with the documents
PC10. comply with your organization’s policies, procedures and guidelines when creating
documents for knowledge sharing

Note: The material for this NOS has been covered in the Associate Analytics Module 3 Book
(book 3) in Unit 5

Page 14 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

SSC/ N 2101 – Carry out rule-based statistical analysis

Session Overview

In the Associate Analytics Carry out rule based statistical analysis, the participants will go
through Business Analytics using R tool. The participants will also learn Applied Statistical
concepts like Descriptive Statistics and find their usage along with R. Furthermore, they willalso
have an overview of Big Data tools and their basic functioning.
Then they will learn about Machine Learning algorithm and their use in Data Mining and
Predictive Analytics. Finally the participants will learn about Data Visualization and gather
knowledge on Graphical representation of Data as well as results and reports.

Session Goal

The primary goal of the session is for the participants to learn the R tools and its various
functions and features. Then also learn about Big Data tools and Big Data Analytics. Students
will also learn about basic applied statistical concepts.

Session Objectives
To be competent, participants must be able to:

PC1. establish clearly the objectives and scope of the analysis

PC2. obtain guidance from appropriate people to identify suitable data sources to agree the
methodological approach
PC3. obtain and structure data using standard templates and tools
PC4. validate data accurately and identify anomalies
PC5. obtain guidance from appropriate people on how to handle anomalies in data
PC6. carry out rule-based analysis of the data in line with the analysis plan
PC7. validate the results of your analysis according to statistical guidelines
PC8. review the results of your analysis with appropriate people
PC9. undertake modifications to your analysis based on inputs from appropriate people
PC10. draw justifiable inferences from your analysis
PC11. present the results and inferences from your analysis using standard templates and tools
PC12. comply with your organization’s policies, procedures and guidelines when carrying out
rule-based quantitative analysis

Note: The material for this NOS has been covered in all the three Modules of Associate
Analytics
Page 15 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

SSC/ N 9001: Manage Your Work to Meet

Requirement
Session Overview
The Associate Analytics Manage your work to meet requirement module is designed to help
participants understand the importance of time in a professional environment and how to manage
multiple time bound requirements. It emphasizes on how time management is critical to work
management and completing requirements/deliverables.
Participants learn how to manage work and how to ensure deliverables are completed in
stipulated time in an organization by following tested principles to prevent/handle slippages
on timelines. The module also emphasizes the need to respect time for self as well as
colleagues.
Time management cannot override the qualitative aspect of the deliverable.

Session Goal
The primary goal of the session is for the participants to learn and manage time to be able to
complete their work as required. The requirements of a work unit may be further classified
into; activities, deliverable, quantity, standards and timelines. The session makes participants
to be aware of defining requirements of every work unit and then ensuring delivery.
Additionally, this session discusses practical application of planning and execution of work
plans to enable the participants to effectively deal with the failure points, minimize the
impact, if any. Equally critical is the escalation plan and root cause analysis of exceptions.
Successful candidates will be able to understand the inter-relationship of time, effort,
impact and cost.

Session Objectives
Upon completion of both parts of this course, the participants will be able to:
PC1. Establish and agree your work requirements with appropriate people
PC2. Keep your immediate work area clean and tidy
PC3. Utilize your time effectively
PC4. Use resources correctly and efficiently
PC5. Treat confidential information correctly
PC6. Work in line with your organization’s policies and procedures
PC7. Work within the limits of your job role
PC8. Obtain guidance from appropriate people, where necessary
PC9. Ensure your work meets the agreed requirements

Note: The material for this NOS has been covered in Unit 1 of Module 1. Much of the material
herein is going to be self-study for the participants

Page 16 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

SSC/ N 9002: Work Effectively With Colleagues

Session Overview
The Associate AnalyticsWork Effectively with Colleagues module is designed to help
participants understand the importance of teamwork in a professional environment. It
emphasizes on how relationship management is critical to work management. It also focuses
on the importance of personal grooming.
Participants learn how to manage cross functional relationships and how to nurture a good
working environment. The module also stresses on the need to respect colleagues.

Session Goal
The primary goal of the session is for the participants to understand the importance of
professional relationships with colleagues.Additionally, this session discusses importance of
personal grooming.
Successful candidates will be able to understand the inter-relationship of professionalism and
team-work.

Session Objectives
Upon completion of both parts of this course, the participants will be able to:
PC1. Communicate with colleagues clearly, concisely and accurately.
PC2. Work with colleagues to integrate your work effectively with theirs.
PC3. Pass on essential information to colleagues in line with organizational requirements.
PC4. Work in ways that show respect for colleagues.
PC5. Carry out commitments you have made to colleagues.
PC6. Let colleagues know in good time if you cannot carry out your commitments, explaining
the reasons.
PC7. Identify any problems you have working with colleagues and take the initiative to solve
these problems.
PC8. Follow the organization’s policies and procedures for working with colleagues.

Note: The material for this NOS has been covered in Unit 2 of Module 1. Much of the material
herein is going to be self-study for the participants

Page 17 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

SSC/ N 9003: Maintain a Healthy, Safe and Secure

working Environment
Session Overview
The Associate Analytics Health, Safety and Security module is designed to help participants
understand the importance of following safety rules and regulations at workplace.
Participants learn how to work safely in an organization by following guidelines to
prevent/handle any accidents or emergencies. The module also emphasizes the need of
security and the entities that can pose a threat to it.

Session Goal
The primary goal of the session is for the participants to be aware about the various hazards
that they may come across at workplace and what are the defined health, safety and security
measures that should be followed at the time of occurrence of such unpredictable events.
Additionally, this session discusses practical application of the health and safety procedures to
enable the participants to effectively deal with the hazardous events to minimize the impact, if
any.

Session Objectives
Upon completion of both parts of this course, the participants will be able to:
PC1. Comply with your organization’s current health, safety and security policies and procedures
PC2. Report any identified breaches in health, safety, and security policies and procedures to the
designated person
PC3. Identify and correct any hazards that you can deal with safely, competently and within the
limits of your authority
PC4. Report any hazards that you are not competent to deal with to the relevant person in line
with organizational procedures and warn other people who may be affected
PC5. Follow your organization’s emergency procedures promptly, calmly, and efficiently
PC6. Identify and recommend opportunities for improving health, safety, and security to the
designated person
PC7. Complete any health and safety records legibly and accurately

Note: The material for this NOS has been covered in Unit 2 of Module 1. Much of the material
herein is going to be self-study for the participants

Page 18 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

SSC/ N 9004: Provide data/information in standard

formats
Session Overview
The Associate AnalyticsProvide data/information in standard formats module is designed to
help participants understand the standard operating procedures in organizations pertaining to
reporting data in a logical sequence and arriving at conclusive decisions models after analysis
of data. This module is aimed at developing the sense of understanding in an individual when
the individual works with data, of how to take the data and present it as relevant information
in standardized formats.
Participants learn how to share information with other people inside or outside a specified
work group and also how to arrive at decisions regarding certain problem types.

Session Goal
The primary goal of the session is for the participants to analyze data and present it in a
suitable format, as is suitable for the given process or organization.
Successful candidates will be able to understand the process of standardized reporting and the
nuances of a publishing a report with a specified end objective in mind.

Session Objectives
Upon completion of both parts of this course, the participants will be able to:
PC1. establish and agree with appropriate people the data/information you need to provide,
the formats in which you need to provide it, and when you need to provide it
PC2. obtain the data/information from reliable sources
PC3. check that the data/information is accurate, complete and up-to-date
PC4. obtain advice or guidance from appropriate people where there are problems with the
data/information
PC5. carry out rule-based analysis of the data/information, if required
PC6. insert the data/information into the agreed formats
PC7. check the accuracy of your work, involving colleagues where required
PC8. report any unresolved anomalies in the data/information to appropriate people
PC9. provide complete, accurate and up-to-date data/information to the appropriate people in
the required formats on time

Page 19 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

SSC/ N 9005: Develop your knowledge, skills and

competence
Session Overview
The Associate Analytics develop your knowledge, skills and competence module is designed to
help participants understand the importance of skill development in a professional environment
and how to enhance skills in order to excel. It emphasizes on how enhance skills and knowledge
in a diversified professional environment.

Session Goal
The pimary goal of the session is to give a overview on how skills and competency can be
enhanced in a professional environment. It gives knowledge on organizational context, technical
knowledge, core skills/geneic skills, professional skills and technical skills.The session makes
participants to understand the need of skills improvement for personal and organizational
growth.
Successful candidates will be able ro understand the relationship between skill enhancement and
growth.

Session Objectives
Upon completion of both parts of this course, the participants will be able to:
PC1. obtain advice and guidance from appropriate people to develop their knowledge, skills
and competence
PC2. identify accurately the knowledge and skills you need for their job role
PC3. identify accurately their current level of knowledge, skills and competence and any
learning and development needs
PC4. agree with appropriate people a plan of learning and development activities to address
their learning needs
PC5. undertake learning and development activities in line with their plan
PC6. apply their new knowledge and skills in the workplace, under supervision
PC7. obtain feedback from appropriate people on their knowledge and skills and how
effectively they apply them
PC8. review their knowledge, skills and competence regularly and take appropriate action

Page 20 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Overall Associate Analytics Content Structure

Module 1 – Book 1
Subject I / SSC NASSCOM - NOS- 2101, 9001, 9002 NOS Hours Minutes
NOS
Unit – 1 2101/9001
Introduction to Analytics & R programming 6 360
Manage your work to meet requirements 4 240
NOS
Unit – 2 2101/9002
Summarizing Data & Revisiting Probability 6 360
Work effectively with Colleagues 4 240
Unit – 3 NOS 2101
SQL using R 9 510
Unit – 4 NOS 2101
Correlation and Regression Analysis 9 510
Unit – 5 NOS 2101
Understanding Verticals - Engg, Financial, others 6 390
Requirements Gathering 6 390
Total Hrs/Minutes 50 3000

Module 2 – Book 2
Subject II / SSC NASSCOM - NOS- 2010, 9003, 9004 NOS Hours Minutes
NOS
Unit – 1 2101/9003
Data Management 7 420
Maintain Healthy, Safe & Secure Working environment 4 240
NOS
Unit – 2 2101/9004
Big Data Tools 7 420
Provide Data/Information in Standard formats 4 240
Unit – 3 NOS 2101
Big Data Analytics 8 480
Unit – 4 NOS 2101
Machine Learning Algorithms 8 480
Unit – 5 NOS 2101
Data Visualization 6 360
Product Implementation 6 360
Total Hrs/Minutes 50 3000

Page 21 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 3 – Book 3
Subject III / SSC NASSCOM - NOS - 0703, 2101, 9005 NOS Hours Minutes
Unit – 1 NOS 2101
Introduction to Predictive Analytics 6 360
Linear Regression 6 360
Unit – 2 NOS 2101
Logistics Regression 9 540
NOS
Unit – 3 2101/9005
Objective Segmentation 6 360
Develop Knowledge Skill and competences 3 180
Unit – 4 NOS 2101
Time Series Methods/Forecasting, Feature Extraction 5 300
Project 5 300
Unit – 5 NOS 0703
Working with documents 10 600
Total Hrs/Minutes 50 3000

Page 22 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Glossary of Terms

Keywords /Terms Description

Sector Sector is a conglomeration of different business operations having similar

Definitions

businesses and interests. It may also be defined asa distinct subset of the
economy whose components share similar characteristics and interests.
Sub-sector Sub-sector is derived from a further breakdown based on the characteristics and
interests of its components.
Vertical Vertical may exist within a sub-sector representing different domain areas or the
client industries served by the industry.
Occupation Occupation is a set of job roles, which perform similar/related set of functions in
an industry.
Function Function is an activity necessary for achieving the key purpose of the sector,
occupation, or area of work, which can be carried out by a person or a group of
persons. Functions are identified through functional analysis and form the basis
of OS.
Sub-functions Sub-functions are sub-activities essential to fulfill the achieving the objectives of
the function.
Job role Job role defines a unique set of functions that together form a unique
employment opportunity in an organisation.
Occupational OS specify the standards of performance an individual must achieve when
Standards (OS) carrying out a function in the workplace, together with the knowledge and
understanding they need to meet that standard consistently. Occupational
Standards are applicable both in the Indian and global contexts.
Performance
Performance Criteria are statements that together specify the standard of
Criteria
performance required when carrying out a task.

National
Occupational NOS are Occupational Standards which apply uniquely in the Indian context.
Standards (NOS)
Qualifications Pack Qualifications Pack Code is a unique reference code that identifies a qualifications
Code pack.
Qualifications Qualifications Pack comprises the set of OS, together with the educational,
Pack(QP) training and other criteria required to perform a job role. A Qualifications Pack is
assigned a unique qualification pack code.
Unit Code Unit Code is a unique identifier for an OS unit, which can be denoted with either
an ‘O’ or an ‘N’.
Unit Title Unit Title gives a clear overall statement about what the incumbent should be
able to do.

Page 23 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Description Description gives a short summary of the unit content. This would be helpful to
anyone searching on a database to verify that this is the appropriate OS they are
looking for.
Scope Scope is the set of statements specifying the range of variables that an individual
may have to deal with in carrying out the function which have a critical impact on
the quality of performance required.
Knowledge and Knowledge and Understanding are statements which together specify the
Understanding technical, generic, professional and organisational specific knowledge that an
individual needs in order to perform to the required standard.
Organisational Organisational Context includes the way the organisation is structured and how it
Context operates, including the extent of operative knowledge managers have of their
relevant areas of responsibility.
Technical
Technical Knowledge is thespecificknowledgeneededto accomplish specific
Knowledge
designated responsibilities.

Core Skills/Generic Core Skills or Generic Skills are a group of skills that are key to learning and
Skills working in today's world. These skills are typically needed in any work
environment. In the context of the OS, these include communication related skills
that are applicable to most job roles.
Helpdesk Helpdesk is an entity to which the customers will report their IT problems. IT
Service Helpdesk Attendant is responsible for managing the helpdesk.
Keywords /Terms Description
IT-ITeS Information Technology - Information Technology enabled Services
BPM Business Process Management
BPO Business Process Outsourcing
KPO Knowledge Process Outsourcing
LPO Legal Process Outsourcing
IPO Information Process Outsourcing
BCA Bachelor of Computer Applications
B.Sc. Bachelor of Science
OS Occupational Standard(s)
NOS National Occupational Standard(s)
QP Qualifications Pack
UGC University Grants Commission
MHRD Ministry of Human Resource Development
MoLE Ministry of Labour and Employment
NVEQF National Vocational Education Qualifications Framework
NVQF National Vocational Qualifications Framework
NSQF National Skill Qualification Framework

Page 24 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Nomenclature for QP & NOS

UNITS________________________________________________________________________
_____
Qualifications Pack
9 characters
SSC/Q0101

SSC denoting Software &Services

Companies (IT-ITeSindustry)
QP number (2 numbers)

Q denoting Qualifications Pack

National Occupational Standard
Occupation (2 numbers)
9 characters
SSC/N0101

SSC denoting Software &Services

Companies (IT-ITeSindustry)
NOS number (2 numbers)

N denoting National Occupational

Standard Occupation (2 numbers)
Occupational Standard
9 characters
SSC/N0101

SSC denoting Software &Services

Companies (IT-ITeSindustry)
OS number (2 numbers)
O denoting Occupational Standard

Occupation (2 numbers)
It is important to note that an OS unit can be denoted with either an ‘O’ or an ‘N’.
 If an OS unit denotes ‘O’, it is an OS unit that is an international standard. An example of OS unit
denoting ‘O’ is SSC/O0101.
 If an OS unit denotes ‘N’, it is an OS unit that is a national standard and is applicable only for the
Indian IT-ITeS industry. An example of OS unit denoting ‘N’ is SSC/N0101

Page 25 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

The following acronyms/codes have been used in the nomenclature above:

Sub-Sector Range of Occupation numbers

IT Service (ITS) 01-20
Business Process Management (BPM) 21-40
Engg. and R&D (ERD) 41-60
Software Products (SPD) 61-80

Sequence Description Example

Three letters Industry name SSC
(Software & Service Companies )
Slash / /
Next letter Whether QP or NOS N
Next two numbers Occupation Code 01
Next two numbers OS number 01

Page 26 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 1: Unit– 1.1

Introduction to Analytics or R programming
Topic Session Goals

Introduction to Analytics or R By the end of this session, you will be able to:
programming 1. Understand R
2. Use functions of R

Material and Handouts

Facilitator Material Participant Material and Handouts

Facilitator Guide, Handouts  Participants’ Guide

Session Plan:
Activity Location
Knowing language R Classroom

Using R as calculator Classroom

Understanding components of R Classroom

Reading database using R Classroom

Importing & Exporting CSV Classroom

Working on Variables Classroom

Outliers and Missing Data treatment Classroom

Combining Data sets in R Classroom

Discuss Function and Loops Classroom

Check your understanding Classroom

Summary Classroom

Page 27 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Unit 1.1 : Step-by-Step

Key Points
Knowing the language “R”

R can be defined as below:-

•Programming language for graphics and statistical computations
•Available freely under the GNU public license
•Used in data mining and statistical analysis
•Included time series analysis, linear and nonlinear modeling among others
•Very active community and package contributions
•Very little programming language knowledge necessary
•Can be downloaded from http://www.r-project.org/

Did you know?

 R Studio is an IDE for R with advanced and more user-friendly GUI.

 R is the substrate on which we can mount various features using PACKAGES like
RCMDR- R Commander or R-Studio.
 R was started by Bell Laboratories as “S” for Fortran Library.

Look at R!

R-Commander Interface R-Studio Interface

Using R as calculator

R can be used as a calculator. For example, if we have to know what is 2+2 then-
> 2+2
Press enter and we get the answer as
[1] 4
R-Studio Interface
Similarly we can calculate anything as if done on calculator.

Page 28 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Calculate the following using R:

1. Log of 2
2. 23 X 32
3. e3

Understanding components of R
1. Data Type:

There are two types of data classified on very broad level. They are Numeric and Character data.
 Numeric Data: - It includes 0~9, “.” and “- ve” sign.
 Character Data: - Everything except Numeric data type is Character. For Example, Names, Gender
etc.

Data is also classified as Quantitative and Qualitative.

For Example, “1,2,3…” are Quantitative Data while “Good”, “Bad” etc. are Quantitative Data.

Although we can convert Qualitative Data into Quantitative Data using Ordinal Values.
For Example, “Good” can be rated as 9 while “Average” can be rated as 5 and “Bad” can be rated as 0.

2. Data Frame:
A data frame is used for storing data tables. It is a list of vectors of equal length.

For example, here is a built-in data frame in R, called mtcars.

The top line of the table, called the header, contains the column names. Each horizontal line afterward
denotes a data row, which begins with the name of the row, and then followed by the actual data. Each data
member of a row is called a cell. To retrieve data in a cell, we would enter its row and column coordinates in
the single square bracket "[]" operator. The two coordinates are separated by a comma. In other words, the
coordinates begins with row position, then followed by a comma, and ends with the column position. The
order is important.
For Example,
Here is the cell value from the first row, second column of mtcars.

Page 29 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

3. Array and Matrices:

We have two different options for constructing matrices or arrays. Either we use the creator
functions matrix () and
Array (), or you simply change the dimensions using the dim () function.
For example, you make an array with four columns, three rows, and two “tables” like this:

> my.array < - array(1:24, dim=c(3,4,2))

In the above example, “my.array” is the name of the array we have given. And “←” is the assignment
operator.
There are 24 units in this array mentioned as “1:24” and are divided in three dimensions “(3, 4, 2)”.
Note: - Although the rows are given as the first dimension, the tables are filled column-wise. So, for arrays,
R fills the columns, then the rows, and then the rest.
Alternatively, you could just add the dimensions using the dim ( ) function. This is a little hack that goes a
bit faster than using the array ( ) function; it’s especially useful if you have your data already in a vector.
(This little trick also works for creating matrices, by the way, because a matrix is nothing more than an array
with only two dimensions.)
Say you already have a vector with the numbers 1 through 24, like this:

> my.vector <- 1:24

You can easily convert that vector to an array exactly like my.array simply by assigning the dimensions, like
this:

> dim(my.vector) <- c(3,4,2)

Create an Array with name “MySales” with 30 observations using following

methods:

1. Defining the dimensions of the array as 3, 5 and 2.

2. By using Vector method.

You can check whether two objects are identical by using theidentical () function.

Page 30 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Reading Database using R

We can import Datasets from various sources having various files types for example,
 .csv format
 Big data tool – Impala
 CSV File

The sample data can also be in comma separated values (CSV) format. Each cell inside such data file is
separated by a special character, which usually is a comma, although other characters can be used as well.
The first row of the data file should contain the column names instead of the actual data. Here is a sample of
the expected format.

Col1,Col2,Col3
100,a1,b1
200,a2,b2
300,a3,b3

After we copy and paste the data above in a file named "mydata.csv" with a text editor, we can read the data
with the function read.csv.

> mydata = read.csv("mydata.csv") # read csv file

> mydata

Col1 Col2 Col3

1 100 a1 b1
2 200 a2 b2
3 300 a3 b3

In various European locales, as the comma character serves as the decimal point, the
functionread.csv2 should be used instead. For further detail of the read.csv and read.csv2 functions, please
consult the R documentation.

> help(read.csv)
 Big data tool – Impala

Cloudera 'Impala', which is a massively parallel processing (MPP) SQL query engine runs natively in
Apache Hadoop.

R package, RImpala, connects Impala to R.

Page 31 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

RImpala enables querying the data residing in HDFS and Apache HBase from R, which can be further
processed as an R object using R functions. RImpala is now available for download from the Comprehensive
R Archive Network (CRAN) under GNU General Public License (GPL3).

This package is developed and maintained by MuSigma.

To install RImpala :
We use following code to install RImpala package.

>install. packages("RImpala")

How RImpala works

Page 32 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Importing and Exporting CSV

•Loading data – data(dataset_name)

•read and write functions
•getwd() and setwd(dir)
•read and write functions use full path name
Example:
read.csv (“C:/Rtutorials/Sampledata.csv”). Similarly for writing dataset we use the write () function.

“Getwd” means get the working directory (wd) and “setwd” is used to set the working directory.

•Help - ?function_name -> It is used to get help on any function in R.

Import a CSV file in R and check the output.

Name, Age, Sex

Shaan, 21, M

Ritu, 24, F

Raj, 31, M

Working on Variables

Before learning about creating and modifying variables in R we will know the various operators in R.

There are 2 types of operators –Arithmetic and Logical.

Operator Description Operator Description
+ Addition < less than
<= less than or equal to
- Subtraction > greater than
* Multiplication >= greater than or equal to
== exactly equal to
/ Division
!= not equal to
^ or ** Exponentiation !x Not x
x|y x OR y
x %% y modulus (x mod y) 5%%2 is 1
x&y x AND y
x %/% y integer division 5%/%2 is 2 isTRUE(x) test if X is TRUE

Arithmetic Operators Logical Operators

Page 33 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

 Creating New variables:-

Use the assignment operator “<-” to create new variables.

For example,
mydata$sum <- mydata$x1 + mydata$x2
New variable is created using two already available variables.

 Modifying existing variable:-

We can rename the existing variable by rename() function.

For examples,
mydata<- rename(mydata, c(oldname="newname"))

We can also recode variables in R.

For example,
If we want to rename variable based on some criteria like below
mydata$agecat<- ifelse(mydata$age> 70, c("older"), c("younger"))

 Create a new variable Total_Sales using variables Sales_1 and Sales_2.

 Then, modify the above new variable name from Total_Sales to Sales_Total.

Outliers and Missing Data treatment

Inputting missing data using standard methods and algorithmic approaches (mice package R):
 In R, missing values are represented by the symbol NA (not available).
 Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number).
 Unlike SAS, R uses the same symbol for character and numeric data.

To test if there is any missing in the dataset we use is.na () function.

For Example,
We have defined “y” and then checked if there is any missing value. T or True means that there is a missing
value.
y <- c(1,2,3,NA)
is.na(y)
# returns a vector (F FF T)

Page 34 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Arithmetic functions on missing values yield missing values.

For Example,
x <- c(1,2,NA,3)
mean(x)
# returns NA

To remove missing values from our dataset we use na.omit() function.

For Example,
We can create new dataset without missing data as below: -

newdata<- na.omit(mydata)

Or, we can also use “na.rm=TRUE” in argument of the operator. From above example we use na.rm and get
desired result.
x <- c(1,2,NA,3)
mean(x, na.rm=TRUE)

# returns 2

MICE Package -> Multiple Imputation by Chained Equations

MICE uses PMM to impute missing values in a dataset.

PMM-> Predictive Mean Matching (PMM) is a semi-parametric imputation approach. It is similar to the
regression method except that for each missing value, it fills in a value randomly from among the a observed
donor values from an observation whose regression-predicted values are closest to the regression-predicted
value for the missing value from the simulated regression model.

Outliers:
Outlier is a point or an observation that deviates significantly from the other observations.
●Due to experimental errors or “special circumstances”
●Outlier detection tests to check for outliers
●Outlier treatment –
 Retention
 Exclusion
 Other treatment methods

We have “OUTLIER” package in R to detect and treat outliers in Data.
Normally we use BOX Plot and Scatter plot to find outliers from graphical representation.

Page 35 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Scatter Plot

Box Plot

Combining Data sets in R

 To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join
two data frames by one or more common key variables (i.e., an inner join).
For example,
To merge two data frames by ID:
total <- merge(data frameA,data frameB,by="ID")
 To merge on more than one criteria we pass the argument as follows
To merge two data frames by ID and Country:
total <- merge(data frameA,data frameB,by=c("ID","Country"))
 To join two data frames (datasets) vertically, use the rbind function. The two data frames must have
the same variables, but they do not have to be in the same order.

Page 36 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

For example,
total <- rbind(data frameA, data frameB)

Note:-
If data frameA has variables that data frameB does not, then either:
1. Delete the extra variables in data frameA or
2. Create the additional variables in data frameB and set them to NA (missing)
before joining them with rbind( ).

We use cbind() function to combine data by column the syntax is same as rbind().

Plyr package: Tools for Splitting, Applying and Combining Data.

We use rbind.fill() in plyr package in R. It binds or combines a list of data frames filling missing columns
with NA.
For example,
rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])

In this all the missing value will be filled with NA.

Discuss Function and Loops

Using for and ifelse in R :

“FOR” Loop:-
To repeat an action for every value in a vector by using a “for” loop.
We construct a “for” loop in R as follows:
for(i in values){
... do something ...
}

This for loop consists of the following parts:

 The keyword for, followed by parentheses.
 An identifier between the parentheses. In this example, we use i, but that can be any object name you
like.
 The keyword in, which follows the identifier.
 A vector with values to loop over. In this example code, we use the object values, but that again can be
any vector you have available.
 A code block between braces that has to be carried out for every value in the object values.
In the code block, you can use the identifier. Each time R loops through the code, R assigns the next value in
the vector with values to the identifier.

Page 37 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

“IFELSE” Function:-
When using R, sometimes we need our function to do something if a condition is true and something else if
it is not.
You could do this with two if statements, but there’s an easier way in R: an if...else statement.
An if…else statement contains the same elements as an if statement), and then some extra:
 The keyword else, placed after the first code block
 A second block of code, contained within braces, that has to be carried out if and only if the result of the
condition in the if() statement is FALSE

For example,
if(hours > 100) net.price<- net.price * 0.9
if(public) {tot.price<- net.price * 1.06 } else
{tot.price<- net.price * 1.12} round(tot.price)}

Or it can be written as also,

if(public) tot.price<- net.price * 1.06 else
tot.price<- net.price * 1.12
Check Your Understanding

1. Which of the following is not a functionality of R?

a. Linear modeling
b. Nonlinear modeling
c. Developing webapplications
2. Which of the following is the function to display the first 5 rows of data?

3. What is the difference between rbind and cbind?

4. What are 2 ways of looping in R?

Page 38 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Case Study:
Segregate Sepal length based on the “Sepal Length” and “Sepal Width”. Use the dataset named “IRIS”.

Data Set

Summary

 R is managed and maintained by CRAN.

 It is a freeware as well as open source software.
 We use “>” to start ant new code in R.
 We use “<-” as an assignment operator.
 A data frame is used for storing data tables.
 Rbind and Cbind are used to join 2 or more datasets.
 Outliers are extraordinary situations.
 IFELSE is used to 2 conditions simultaneously.

1. Divide the class into groups of 4-5 participants

2. Give the Dataset to the participants.
3. Give 10 minutes to the class for each group to discuss the steps that they
would follow for Missing data treatment and Outlier Treatment process
along with a discussion on the methods type which they would like to use
4. Each group presents their steps (5 min each)

Page 39 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 1: Unit– 1.2

Manage your work to meet requirements
Topic Topic Description Material Required

Time Management  Discuss the significance of time  Facilitator Guide

management  Student Handbook
 Create awareness on basic time
management techniques
 Summarize the appropriate
discussion points from the breakout
sessions

Work  Discuss importance prioritization  Facilitator Guide

Management and and planning.  Student Handbook
Prioritization  How to operationalize the plan.
 Create awareness on how to monitor
performance

Quality and  Discuss importance of expectation  Facilitator Guide

Standards setting  Student Handbook
Adherence  Develop understanding on defining
activities to be performed,
deliverables and yardsticks of
measuring output.
 Create awareness on the common
Service Level Agreements

Course  Validate learning objectives have n/a

Conclusion been met
 Make final summary remarks
 Conduct final Q&A
 Conclude the course

Page 40 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Facilitator Preparation
Responsibilities

 Review examples provided: reflect on your own experiences and determine when to share
them.

 Review all material – Facilitator Guide, Presentation, Guides and Handouts (if any)

 Make sure you havecopies of all the handouts.

 Make sure the learning resources are loaded on your computer.

 Conduct a run through of the content. Conduct a dress rehearsal of the session as you move
through the content. Make sure you are comfortable with the tools and interactions
recommended in the facilitator guide.

 Note that all examples are in italics to emphasize key learning points; however, you may use
your own professional experience to enhance the learning.

 Make sure you create folders for all breakout activities.

Page 41 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Principles of Facilitating
Personal Experiences
As a facilitator, you lead participants through prepared scenarios and discussions. During
this process, relate your own professional experience to add realism. Often, personal
experiences on how you helped a colleague through the career ownership process and guided
them to achieving work satisfaction are more memorable than step-by-step instructions on
following the career ownership process. Sharing experiences helps participants understand
how professionals work and think, and gives them the opportunity to apply those lessons to
their own work processes. Also, participants are more likely to remember answers if they
have to think and explore on their own. Your goal is to foster independent thinking and
action rather than having participants depend on your experience.

Experiential Learning
This workshop includes exercises designed to help participants discover the principles of
guiding the participants through the career ownership process and career satisfaction.
Encourage a free-wheeling discussion and call out important trends and insights. Make
liberal use of the whiteboard to capture and display critical participant insights.

Socratic Questions
Your goal throughout the session is to guide participants towards thinking through the
scenarios and discussion questions independently, rather than providing answer. For
example:

Rather than saying… Ask…

The Reality Check worksheet provides valuable What information can you gather from
information about how time is currently spent and the Reality Check worksheet and how
what it would look like in the best case scenario. can the information be used to move
towards career satisfaction?

Page 42 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Topic: Welcome and Introduction

Time Management
Welcome the participants to the course and move to the introductions.

Introductions
I am <Facilitator’s Name> and I am your facilitator today.”
Briefly review the roles of the Lead Facilitator and Support Facilitator, if any.
Give a brief of your own experience and background.

Why are you here today? [Course Objectives]

“Why are you here today?”
After reviewing and arranging responses, summarize the responses and map the responses to
the suggested course benefits below.

“Regardless of why you’re here today, we’re all going to walk away with some key benefits – let’s discuss
those briefly.”
Suggested Responses/Benefits to Debrief:
The benefits of this course include:
 Efficient and Effective time management
 Efficient – Meeting timelines
 Effective – Meeting requirement for desired output
 Awareness of the SSC environment and time zone understanding
 Awareness of the SSC environment and importance of meeting timelines to handoffs

Review the course objectives listed above.

“To fulfill these objectives today, we’ll be conducting a number of hands-on activities. Hopefully
we can open up some good conversations and some of you can share your experiences so that we
can make this session as interactive as possible. Your participation will be crucial to your
learning experience and that of your peers here in the session today.”

Ice Breaker – Open Discussion of 2 points

“Please share your thoughts on following.”

Page 43 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

After participants give their views, debrief and bring to consensus

Question: Please share your thoughts on following?

A. Time is perishable – Cannot be created or recovered
B. Managing is only option – Prioritize

Page 44 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Importance of Time Management

Provide a brief overview of the session. Discuss the importance of better utilization of time as the
only tool to prevention of slippages on timelines.
Open up the discussion for the session and ask participants to share their thoughts on
“time management”?

The first part of this session discusses the following:

 “Plan better avoid wastage”
 Understanding the timelines of the deliverables. Receiving the hand off from upstream teams
at right time is critical to start self contribution and ensure passing the deliverables to
downstream team.
 It is important to value others’ time as well to ensure overall organizational timelines are met
 Share the perspective of how important is time specifically in a global time zone mapping
scenario

Page 45 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Why Time Management?

Ask the question to the partipants and gather responses.

Discuss the responses with the group to understand the significance of time management.

Share the SSC model and how working along several time zones is important for the Shared Services
Center.

Activity Description:

1. Refer to the Aspects of Time Management table in the Student

Workbook and identify the rules that employees/workers must follow.
2. Refer to the Vocabulary Words table if you do not understand the meaning
of a word/term.

Suggested Responses:
 Time management has to be looked at an organizational level and not just individual level
 These Aspects teach us how to build the blocks of time management.

Time Management Aspects

Prompt participants to come up with some aspects and relate them back to here.

 Planning and goal setting

 Managing yourself
 Dealing with other people
 Your time
 Getting results

The first 4 Interconnect and Interact to give the 5th one – Results

Differentiate between Urgent and Important task

Urgent task

Page 46 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

 Assume importance as they demand immediate attention

Important Task
 May become urgent if left undone
 Usually have a long term effect
To judge importance vs. urgency, gauge tasks in terms of
 Impact of doing them
 Effect of not doing them

Main aim of prioritization is to avoid a crisis

We must

Schedule our Priorities

as opposed to
Prioritizing our Schedule

Time Management quadrants

1. Urgent and Important – Do Now
2. Not Urgent and Important – Schedule on your calendar
3. Urgent and Not Important – Delegate, Automate or Decline
4. Not Urgent Not Important – Delegate, Automate or Decline

Page 47 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Check Your Understanding

1. True or False? Time can be stored.

a. True
b. False

Suggested Responses:
False – Time once lost cannot be gotten back – hence important to plan time utilization properly

2. True or False? Time is perishable

a. True
b. False

Suggested Responses:
True – Time lost is lost for every – lost moments cannot be gotten back

3. True or False? Time management is required both at individual level and

organizational level.
a. True
b. False

Suggested Responses:
True – plan for activities organizational level and also at individual level

4. True or False? Activities should be judged basis Urgency and Importance

c. True
d. False

Suggested Responses:
True – prioritization should be based on 2x2 matrix of urgency and importance

Page 48 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Team Exercise

List the items and ask participants to classify them as per the quadrant.
Ask the participants to pick up the items listed below and place them in the
Urgent/Important quadrant. Discuss the rationale of their thoughts and categorization.

Activity Description:
1. Refer to the Time Management Quadrant on the display / in the Student
Workbook and categorize the below items.
2. Refer to the Vocabulary Words table if you do not understand the
meaning of a word/term.

Create teams of 2 participants and share the below list

Categorize the below items in the Time Management Quadrant

1. Wildly important goal

2. Last minute assignments from boss
3. Busy work
4. Personal health
5. Pressing problems
6. Crises
7. Planning
8. Time wasters
9. Professional development
10. Win-win performance agreement
11. Too many objectives
12. Vital customer call
13. Major Deadlines
14. Unimportant pre scheduled meetings
15. Meaningless management reports
16. Coaching and mentoring team
17. Low priority email
18. Other people’s minor issues

Page 49 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

19. Workplace gossip

20. Exercise
21. Needless interruptions
22. Defining contribution
23. Aimless Internet surfing
24. Irrelevant phone calls

Suggested Answers:
Depends on rationale shared

1. Wildly important goal – Q1

2. Last minute assignments from boss – Q1
3. Busy work – Q4 – Consumes time however not pressing
4. Personal health – Q4 – requires planning and care not pressing
5. Pressing problems – Q1 – has to be solved immediately
6. Crises – Q1 – have to tended to immediately
7. Planning – Q2 – Important but not urgent; should be done before crisis
8. Time wasters – Q4
9. Professional development – Q2
10. Win-win performance agreement – Q2 – Expectation setting part of planning
11. Too many objectives – Q3 – Prioritize further to establish which are important and
pressing
12. Vital customer call – Q1 – Customer centricity
13. Major Deadlines – Q1
14. Unimportant pre scheduled meetings – Q3
15. Meaningless management reports – Q3 – Prioritize further to establish which are
important and pressing
16. Coaching and mentoring team – Q2
17. Low priority email – Q3 – Prioritize further to establish which are important and
pressing
18. Other people’s minor issues – Q3 – May not be urgent but important for team building
19. Workplace gossip – Q4 – Non value add; occasionally creates negativity
20. Exercise – Q4 – Important for health and personal well being. To be done in spare and
leisure time. Cannot be ignored.
21. Needless interruptions – Q3
22. Defining contribution – Q2
23. Aimless Internet surfing – Q4
24. Irrelevant phone calls – Q4 – Reserve and avoid

Page 50 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Summary

 It is important to manage time.

 To manage time one must:
 Prioritize
 Define Urgency
 Define Importance

Page 51 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Work Management and Prioritization

Ask participants to define a job and split it into activities.

Preparing morning tea is a good example. Define time, no of family members, preparation
required at night and then in the morning. Perfect execution to ensure good morning tea
!!! with family.
Gather responses.
Start the session by connecting the course content to the candidate responses.

Work Management
Six steps for expectation setting with the stakeholders
1. Describe the jobs in terms of major outcomes and link to the organization’s need
The first step in expectation setting is to describe the job to the employees. Employees need to
feel there is a greater value to what they do. We need to feel out individual performance has an
impact on the organization’s mission.
Answer this question: My work is the key to ensuring the organization’s success because…
While completing the answer link it to
- Job Description
- Team and Organization’s need
- Performance Criteria

2. Share expectations in terms of work style

While setting expectation, it’s not only important to talk about the “what we do” but also on
“how we expect to do it”. What are the ground rules for communication at the organization?
Sample ground rules
- Always let your tam know where are the problems. Even if you have a solution, no one
likes surprises.
- Share concerns openly and look for solutions
- If you see your colleagues doing something well, tell them. If you see them doing
something poorly, tell them.
Sample work style questions
- Do you like to think about issues by discussing them in a meeting or having quite time
alone?
- How do you prefer to plan your day?

Page 52 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

3. Maximize Performance - Identify what is required to complete the work: Supervisor needs /
Employee needs. Set input as well as output expectations
In order to ensure employees are performing at their best, the supervisor needs to provide not
only the resource (time, infrastructure, desk, recognition etc.) but also the right levels of
direction (telling how to do the task) and support (engaging with employees about the task).

4. Establish priorities. Establish thresh holds and crisis plan

Use the time quadrant to establish priorities. Refer to earlier session.

5. Revalidate understanding. Create documentation and communication plan to establish all

discussion
When you are having a conversation about expectations with stakeholders, you’re covering lot
of details so you’ll need to review to make sure you both have a common understanding of the
commitments you have made.

6. Establish progress check

No matter how careful you have been in setting expectations, you’ll want to follow up since
there will be questions as work progresses.
Schedule an early progress check to get things started the right way, and agreed on
scheduled/unscheduled further checks. Acknowledge good performance and point your ways to
improve

Page 53 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Check Your Understanding

1. True or False? Setting expectations is best done after the employee

has worked for 6 months.
a. True
b. False

Suggested Responses:
False, work expectations have to be set from day 1 – so that roles & responsibilities are clear

2. True or False? Do not provide too many details when setting expectations.
a. True
b. False

Suggested Responses:
False, as much details with examples can help clarify all the expectations

3. True or False? Always check to make sure there is a common understanding of

expectations.
a. True
b. False

Suggested Responses:
True, asking the person to rearticulate the understanding of expectations is best way to ensure there is clear
understanding on both sides

4. True or False? Try not to ask too many questions while setting expectations.
a. True
b. False

Suggested Responses:
False, questions always to be encouraged to ensure any clarifications are responded

Page 54 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

5. True or False? Employees need to know what tasks to do and how to communicate,
appreciating work styles.
a. True
b. False

Suggested Responses:
True, provides clarity and enables response based on work styles

6. True or False? Employees do not need to know how their work contributes to
organizational results.
a. True
b. False

Suggested Responses:
False, linking efforts with common goals is very motivating and develops team effort

7. True or False? Employees need to know what their team members’ performance
problems are.
a. True
b. False

Suggested Responses:
True, knowing common problems bring teams focused towards solutions.

8. True or False? Employees how have work style different from the Boss/Peers need
to change.
a. True
b. False

Suggested Responses:
False, they need to adapt and respond based on the partners work style – understanding the work styles is very critical to
enhance team operating performance.

Page 55 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Summary

 Define work and activities:

 What

 How

 Define Stakeholders and participants:

 Whom to serve

 Who all are serving

 Plan, Execute and Monitor

Page 56 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Quality Standards Adherence

Let’s Get Started
Provide a brief overview of the session. Discuss key points on importance of defining quality
output. Re Iterate importance of Efficiency and Effectiveness
o Efficiency – Performing activities well
o Effectiveness – Performing activities right

Goals and Objectives compliant to SMART

S
Specific M
Work activities
should be
specific. Why Measurable A
R
and How to be The output
defined metrics and
yardsticks should Achievable

T
be defined. Should be
assigned to
those Realistic
responsible for Should be
achieving it. challenging yet
attainable. Have Time bound
a motivational Time period for
effect achievement is
clearly stated

Page 57 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Efficiency vs Effectiveness

In efficient Efficient
Goals / Doing the Right

Pursuing right goals but in efficient Pursuing right goals and efficient
Pursuit of Appropriate

Effective
Things

Pursuing wrong goals and inefficient. Pursuing Wrong goals but is Efficient
Effective
In

In efficient Efficient

Use of Resources / Doing Things Right

Service Level Agreements

Service Level Agreement (SLA) is a contract between a service provider and its internal or external
customers that documents what services the provider will furnish

SLA measures the service provider’s performance and quality in a number of ways.

Some sample metrics SLAs may specify or include

 Availability and uptime – the percentage of the time services will be available
 The no of users being served, the bandwidth or volume being addressed or the quantum f work
being performed in work units
 Specify performance benchmarks to which actual performance will be periodically compared
 Turnaround time

In addition to establishing performance metrics, an SLA may include a plan for addressing downtime and
documentation for how the service provider will compensate customers in the event of a contract breach.
SLAs, once established, should be periodically reviewed and updated to reflect changes in technology and
the impact of any new regulatory directives

Page 58 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Summary

 Every activity must have defined goals and objectives. These goals and objectives
should be SMART complaint (Specific, Measurable, Achievable, Realistic and
Time-bound).
 One must balance the efficiency and effectiveness while performing the tasks to
achieve the desired objectives.
 The Service Level Agreements should be clearly laid out to measure the quality
and performance.

Course Conclusion
Course Conclusion
“We’ve almost reached the end of the course! Before we wrap up, let’s review what
we’ve learned today”
Ask the participants to recall key learning points from the session and map these
learning points to the course objectives.

Thank You Note

Page 59 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 1 - Unit – 2.1

Summarizing Data and Revisiting Probability
Topic Session Goals

Summarizing Data, and By the end of this session, you will be able
to:
Revisiting Probability
1. Summarize Data
2. Work on Probability.

Material and Handouts

Facilitator Material Participant Material and Handouts

Facilitator Guide, Handouts  Participants’ Guide

Session Plan:
Activity Location
Summary statistics- summarizing data with R Classroom

Probability Classroom

Expected value Classroom

Random & Bivariate Random Variables Classroom

Probability Distribution Classroom

Normal Distribution Classroom

Central Limit Theorem Classroom

Random walk Classroom

Check your understanding Classroom

Summary Classroom

Page 60 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Summary statistics- summarizing data with R

•summary(data_frame)
•summary(iris)
•Output : Mean, Median, Minimum, Maximum, 1st and 3rd quartile

>summary(dataset)
For example ,

Probability

Probability is the chance of occurrence anything.

P(A)= S/P
Where S is sample size or no of positive outcomes and P is the population size or total no of outcomes.

A probability distribution describes how the values of a random variable are distributed.

For example, the collection of all possible outcomes of a sequence of coin tossing is known to follow
the binomial distribution. Whereas the means of sufficiently large samples of a data population are known
to resemble the normal distribution. Since the characteristics of these theoretical distributions are well
understood, they can be used to make

Page 61 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Statistical inferences on the entire data population as a whole.

For example,
Probability of ace of Diamond in a pack of 52 cards when 1 card is pulled out at random.

Now, “At Random” means that there is no biased treatment with any card and the result will be totally at
random.
So, No. of Ace of Diamond in a pack = S = 1
Total no of possible outcomes = Total no. of cards in pack = 52
Probability of positive outcome = S/P = 1/52
That is we have 1.92% chance that we will get positive outcome.
Expected value

The expected value of a random variable is intuitively the long-run average value of repetitions of the
experiment it represents.

For example, the expected value of a dice roll is 3.5 because, roughly speaking, the average of an
extremely large number of dice rolls is practically always nearly equal to 3.5.
Less roughly, the law of large numbers guarantees that the arithmetic mean of the values almost
surely converges to the expected value as the number of repetitions goes to infinity.

The expected value is also known as the expectation, mathematical expectation, EV, mean, or first
moment.

More practically, the expected value of a discrete random variable is the probability-weighted average of all
possible values. In other words, each possible value the random variable can assume is multiplied by its
probability of occurring, and the resulting products are summed to produce the expected value. The same
works for continuous random variables, except the sum is replaced by an integral and the probabilities
by probability densities. The formal definition subsumes both of these and also works for distributions
which are neither discrete nor continuous: the expected value of a random variable is the integral of the
random variable with respect to its probability measure.The expected value is a key aspect of how one
characterizes a probability distribution; it is a location parameter.

Page 62 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Random & Bivariate Random Variables

 Random Variable:

 A random variable, aleatory variable or stochastic variable is a variable whose value is subject to
variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on
a set of possible different values (similarly to other mathematical variables), each with an
associated probability, in contrast to other mathematical variables.

 A random variable is a real-valued function defined on the points of a sample space.

Random variables can be discrete, that is, taking any of a specified finite or countable list of values,
endowed with a probability mass function, characteristic of a probability distribution; or continuous, taking
any numerical value in an interval or collection of intervals, via a probability density function that is
characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable,
that is, the results of randomly choosing values according to the variable's probability distribution function,
are called random variates.

For Example,
If we toss a coin for 10 times and we get heads 8 times then we cannot say that the 11th time if coin is
tossed then we get a head or a tail. But we are sure that we will either get a head or a tail.

 Bivariate Random Variable:

Bivariate Random Variables are those variables having only 2 possible outcomes. For example flip of coin.

Probability Distribution

There are 2 types of Distribution Functions:-

1. Discrete
2. Continuous

Page 63 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Probability Distribution Function or PDF is the function that defines probability of outcomes based on
certain conditions.
Based on Conditions, there are majorly 5 types PDFs.

Types of Probability Distribution:

 Binomial Distribution
 Poisson Distribution
 Continuous Uniform Distribution
 Exponential Distribution
 Normal Distribution
 Chi-squared Distribution
 Student t Distribution
 F Distribution

Page 64 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Normal Distribution

We come now to the most important continuous probability density function and perhaps the most
important probability distribution of any sort, the normal distribution.
On several occasions, we have observed its occurrence in graphs from, apparently, widely differing sources:
the sums when three or more dice are thrown; the binomial distribution for large values of n; and in the
hyper geometric distribution.

There are many other examples as well and several reasons, which will appear here, to call this distribution
“normal.”
If

We say that X has a normal probability distribution. A graph of a normal distribution, where we have
chosen a = 0 and b = 1, appears in figure below:

Normal Curve or Bell Curve

The shape of a normal curve is highly dependent on the standarddeviation.

Importance of Normal Distribution:

Normal distribution is a continuous distribution that is “bell-shaped”. Data are often assumed to be normal.
Normal distributions can estimate probabilities over a continuous interval of data values.

Page 65 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Normal Distribution curves with different conditions

Also we can say A standard normal distribution is a normal distribution with :

 Mean=0 and
 Standard deviation = 1.

Standard Normal Distribution

The normal distribution f(x), with any mean μ and any positive deviation σ, has the following
properties:
 It is symmetric around the point x = μ, which is at the same time the mode, the median and the mean
of the distribution.
 It is unimodal: its first derivative is positive for x < μ, negative for x > μ, and zero only at x = μ.
 Its density has two inflection points (where the second derivative of is zero and changes sign),
located one standard deviation away from the mean, namely at x = μ − σ and x = μ + σ.
 Its density is log-concave.
 Its density is infinitely differentiable, indeed super smooth of order 2.
 Its second derivative f′′(x) is equal to its derivative with respect to its variance σ2.

Test of Normal Distribution:

Normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute
how likely it is for a random variable underlying the data set to be normally distributed.
• More precisely, the tests are a form of model selection, and can be interpreted several ways,
depending on one's interpretations of probability:
• In descriptive statistics terms, one measures a goodness of fit of a normal model to the data – if the
fit is poor then the data are not well modeled in that respect by a normal distribution, without making a
judgment on any underlying variable.
• In frequentist statistics statistical hypothesis testing, data are tested against the null hypothesis that it

Page 66 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

is normally distributed.
• In Bayesian statistics, one does not "test normality" per se, but rather computes the likelihood that
the data come from a normal distribution with given parameters μ,σ (for all μ,σ), and compares that with the
likelihood that the data come from other distributions under consideration, most simply using a Bayes factor
(giving the relative likelihood of seeing the data given different models), or more finely taking a prior
distribution on possible models and parameters and computing a posterior distribution given the computed
likelihoods.

1. Graphical methods:
An informal approach to testing normality is to compare a histogram of the sample data to a normal
probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble
the normal distribution. This might be difficult to see if the sample is small. In this case one might proceed
by regressing the data against the quartiles of a normal distribution with the same mean and variance as the
sample. Lack of fit to the regression line suggests a departure from normality.(see Anderson Darling
coefficient and Minitab)
A graphical tool for assessing normality is the normal probability plot, a quantile-quantile plot (QQ plot) of
the standardized data against the standard normal distribution. Here the correlation between the sample data
and normal quartiles (a measure of the goodness of fit) measures how well the data are modeled by a
normal distribution. For normal data the points plotted in the QQ plot should fall approximately on a
straight line, indicating high positive correlation. These plots are easy to interpret and also have the benefit
that outliers are easily identified.
Back-of-the-envelope test
Simple back-of-the-envelope test takes the sample maximum and minimum and computes their z-score, or
more properly t-statistic (number of sample standard deviations that a sample is above or below the sample
mean), and compares it to the 68–95–99.7 rule: if one has a 3σ event (properly, a 3s event) and substantially
fewer than 300 samples, or a 4s event and substantially fewer than 15,000 samples, then a normal
distribution will understate the maximum magnitude of deviations in the sample data.
This test is useful in cases where one faces kurtosis risk – where large deviations matter – and has the
benefits that it is very easy to compute and to communicate: non-statisticians can easily grasp that "6σ
events are very rare in normal distributions".

2. Frequentist tests:
Tests of univariate normality include D'Agostino's K-squared test, the Jarque–Bera test, the Anderson–
Darling test, the Cramér–von Mises criterion, the Lilliefors test for normality (itself an adaptation of the
Kolmogorov–Smirnov test), the Shapiro–Wilk test, the Pearson's chi-squared test, and the Shapiro–Francia
test. A 2011 paper from The Journal of Statistical Modeling and Analytics concludes that Shapiro-Wilk has
the best power for a given significance, followed closely by Anderson-Darling when comparing the
Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, and Anderson-Darling tests.
Some published works recommend the Jarque–Bera test. But it is not without weakness. It has low power
for distributions with short tails, especially for bimodal distributions. Other authors have declined to include

Page 67 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

its data in their studies because of its poor overall performance.

Historically, the third and fourth standardized moments (skewness and kurtosis) were some of the earliest
tests for normality. The Jarque–Bera test is itself derived fromskewness and kurtosis estimates. Mardia's
multivariate skewness and kurtosis tests generalize the moment tests to the multivariate case. Other early
test statistics include the ratio of the mean absolute deviation to the standard deviation and of the range to
the standard deviation.

More recent tests of normality include the energy test (Székely and Rizzo) and the tests based on the
empirical characteristic function (ecf) (e.g. Epps and Pulley, Henze–Zirkler, BHEP test). The energy and
the ecf tests are powerful tests that apply for testing univariate or multivariate normality and are statistically
consistent against general alternatives.

The normal distribution has the highest entropy of any distribution for a given standard deviation. There are
a number of normality tests based on this property, the first attributable to Vasicek.

3. Bayesian tests:

Kullback–Leibler divergences between the whole posterior distributions of the slope and variance do not
indicate non-normality. However, the ratio of expectations of these posteriors and the expectation of the
ratios give similar results to the Shapiro–Wilk statistic except for very small samples, when non-
informative priors are used.

Spiegelhalter suggests using a Bayes factor to compare normality with a different class of distributional
alternatives. This approach has been extended by Farrell and Rogers-Stewart.

Page 68 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Central Limit Theorem

The central limit theorem states that under certain (fairly common) conditions, the sum of many random
variables will have an approximately normal distribution.
More specifically, where X1, …, Xn are independent and identically distributed random variables with the
same arbitrary distribution, zero mean, and variance σ2; and Z is their mean scaled by

Then, as n increases, the probability distribution of Z will tend to the normal distribution with zero mean
and variance (σ2).

The central limit theorem also implies that certain distributions can be approximated by the normal
distribution, for example:

• The binomial distribution B(n, p) is approximately normal with mean np and variance np(1−p) for
large n and for p not too close to zero or one.
• The Poisson distribution with parameter λ is approximately normal with mean λ and variance λ, for
large values of λ.
• The chi-squared distribution χ2(k) is approximately normal with mean k and variance 2k, for large
k.
• The Student's t-distribution t(ν) is approximately normal with mean 0 and variance 1 when ν is
large.

Random walk

A random walk is a mathematical formalization of a path that consists of a succession of random steps.

For example, the path traced by a molecule as it travels in a liquid or a gas, the search path of a foraging
animal, the price of a fluctuating stock and the financial status of a gambler can all be modeled as random
walks, although they may not be truly random in reality. The term random walk was first introduced by
Karl Pearson in 1905.

Random walks have been used in many fields: ecology, economics, psychology, computer science, physics,
chemistry, and biology.

Page 69 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Random Walk Graphical Representation

Application of Random Walk:

Applying the random walk theory to finance and stocks suggests that stock prices change randomly, making
it impossible to predict stock prices. The random walk theory corresponds to the belief that markets
are efficient, and that it is not possible to beat or predict the market because stock prices reflect all available
information and the occurrence of new information is seemingly random as well

Check your understanding

1. What is central Limit Theorem?

2. What is Normal Distribution Curve and Why it is called as
Bell Curve?
3. How to find Summary statistics in R?
4. What are the various types of Probability Distribution curves?
5. Why is Normal curve so widely used?

Page 70 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Summary

 •summary(data_frame)
•summary(iris)
•Output: Mean, Median, Minimum, Maximum, 1st and 3 rd
 Normality tests are used to determine if a data set is well-modeled by a normal distribution and to
compute how likely it is for a random variable underlying the data set to be normally distributed.
 A random walk is a mathematical formalization of a path that consists of a succession of random steps.
 Probability Distribution Function or PDF is the function that defines probability of outcomes based on
certain conditions.
 A random variable, aleatory variable or stochastic variable is a variable whose value is subject to
variations due to chance

1. Divide the class into groups of 4-5 participants

2. Give the Dataset to the participants.
3. Give 10 minutes to the class for each group to discuss the various
random variable examples process along with a discussion on the
methods type which they would like to use
4. Each group presents their examples with justification. (5 min
each)

CASE STUDY – Binomial Dist.

Sachin buys a chocolate bar every day during a promotion that says one out of six chocolate bars
has a gift coupon within. Answer the following questions:

•What is the distribution of the number of chocolates with gift coupons in seven days?
•What is the probability that Amir gets no chocolates with gift coupons in seven days?
•Amir gets no gift coupons for the first six days of the week. What is the chance that he will get a
one on the seventh day?

Page 71 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

•Amir buys a bar every day for six weeks. What is the probability that he gets at least three gift
coupons?
•How many days of purchase are required so that Amir’s chance of getting at least one gift
coupon is 0.95 or greater?

Solution:
Steps:

Formula = nCrpr q n-r

Where
 n is the no. of trials
 r is the number of successful outcomes
 p is the probability of success, and
 q is the probability of failure.

Other important formulae include

p+q=1
Hence, q = 1 – p
Thus, p = 1/6
q = 5/6
1. Distribution of number of chocolates with gift coupons in 7 days: 7C r (1/6)r (5/6)7-r
2. Probability of failing 7 days: P(X=0) =(5/6)7
3. Probability of winning a coupon on the 7th day: 1/6
4. The number of winning at least 3 wrappers in six weeks:

P(X ≥3)=1 – P(X≤2)

=1 – (P(X=0) + P(X=1) + P(X=2)
=1 – (0.0005+0.0040+0.0163)
= 0.979

5. Number of purchase days required so that probability of success is greater than 0.95:

P(X ≥1) ≥ 0.95 = 1 – P(X ≤0) ≥ 0.95

= 1 – P(X=0) ≤ 0.05
= n ≥ 16.43 (applying log function)
= 17days.

Page 72 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 1 - Unit – 2.2

Work Effectively with Colleagues
Topic Topic description Material Required

Welcome and  Welcome participants to the n/a

Introduction course
 Introduce facilitators
 Recap of core skills through
questions and Polling Questions
 Review learning objectives

Team Work  Importance of team above  Facilitator Guide

individual  Student Handbook
 How to unlock group potential
 Compete as a team not within the
team

Professionalism  Being a Professional  Facilitator Guide

 Importance of Grooming  Student Handbook

Effective  What is effective communication  Facilitator Guide

Communication  Verbal and Written  Student Handbook
Communication
 Common Communication Barriers

Course  Validate learning objectives have n/a

Conclusion been met
 Make final summary remarks
 Conduct final Q&A
 Conclude the course

Page 73 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Facilitator Preparation
Responsibilities

 Review examples provided: reflect on your own experiences and determine when to share
them.

 Review all material – Facilitator Guide, Presentation, Guides and Handouts (if any)

 Make sure you havecopies of all the handouts.

 Make sure the learning resources are loaded on your computer.

 Conduct a run through of the content. Conduct a dress rehearsal of the session as you
move through the content. Make sure you are comfortable with the tools and interactions
recommended in the facilitator guide.

 Note that all examples are in italics to emphasize key learning points; however, you may
use your own professional experience to enhance the learning.

 Make sure you create folders for all breakout activities.

Page 74 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Principles of Facilitating
Personal Experiences
As a facilitator, you lead participants through prepared scenarios and discussions. During this
process, relate your own professional experience to add realism. Often, personal experienceson
how you helped a colleague through the career ownership process and guided them to achieving
work satisfaction are more memorable than step-by-step instructions on following the career
ownership process. Sharing experiences helps participants understand how professionals work and
think, and gives them the opportunity to apply those lessons to their own work processes. Also,
participants are more likely to remember answers if they have to think and explore on their own.
Your goal is to foster independent thinking and action rather than having participants depend on
your experience.

Experiential Learning
This workshop includes exercises designed to help participants discover the principles of guiding the
participants through the career ownership process and career satisfaction. Encourage a free-
wheeling discussion and call out important trends and insights. Make liberal use of the whiteboard
to capture and display critical participant insights.

Socratic Questions
Your goal throughout the session is to guide participants towards thinking through the scenarios
and discussion questions independently, rather than providing answer. For example:

Rather than saying… Ask…

Page 75 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Topic: Team Work

Working Effectively
Welcome the participants to the course and move to the introductions.

Why are you here today? [Course Objectives]

“Why are you here today?”
After reviewing and arranging responses, summarize the responses and map the responses to the suggested
course benefits below.

“Regardless of why you’re here today, we’re all going to walk away with some key benefits – let’s discuss those briefly.”

Debrief the following:

Why are teams more popular??
 Teams outperform individuals
 Teams use employee talent better
 Teams are more flexible and responsive to environmental changes in the organization.
 Teams facilitate employee involvement
 Teams are an excellent way to democratize an organization and increase motivation.

Page 76 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Topic: Team Work

Team Work

Ask participants to share their thoughts on:

 What is team work?

 How is it more advantageous?

What is a Team?
A team comprises a group of people linked in a common purpose.
Teams are especially appropriate for conducting tasks that are high in complexity and have many
interdependent subtasks.

Team Work
Coming together is a beginning, keeping together is progress and working together is success. A team is a number of people
associated together in work or activity. In a good team members create an environment that allows everyone to go beyond
their limitation.
Why do we need teamwork – The overriding need of all people working together for the same organization is to make the
organization profitable.

Team Building

Page 77 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Trust

Communic
Planning
ation

Decision Problem
Making Solving

Team work vs. Individual work

• Team Work • Individual work

o Agree on goals/ milestones o Work on tasks
o Establish tasks to be completed o Work on new / revised tasks
o Communicate / monitor progress
o Solve Problem
o Interpret Results
o Agree completion of projects

Team Development

Page 78 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Team building is any activity that builds and strengthens the team as a team. The teams that are
integrated in Spirit, Enthusiasm, Cohesiveness and Camaraderie are vitally important.
Team building fundamentals
 Clear Expectations – Vision/Mission
 Context – Background – Why participation in Teams?
 Commitment – dedication – Service as valuable to Organization & Own
 Competence – Capability – Knowledge
 Charter – agreement – Assigned area of responsibility
 Control – Freedom & Limitations
 Collaboration – Team work
 Communication
 Creative Innovation
 Consequences – Accountable for rewards
 Coordination
 Cultural Change
Roles of team member
 Communicate
 Don't Blame Others
 Support Group Member's Ideas
 No Bragging – No Full of yourself
 Listen Actively
 Get Involved
 Coach, Don't Demonstrate
 Provide Constructive Criticism
 Try To Be Positive
 Value Your Group's Ideas

Ice Breaker – Open Discussion of below points

“Please share your thoughts on following.”
After participants give their views, debrief and bring to consensus
Team Work: Pros and Cons

Page 79 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Check Your Understanding

1. True or False. The organizations that display a higher level fo team work are
generally more successful.

2. Which one of the following is NOT a key attribute of Team Work?

a. Commitment
b. Communication
c. Vague expectations
d. Transparency
Suggested Answer:
c. Vague expectations

Summary

 A team comprises a group of people linked in a common purpose.

 Team work is essential to the success of every organization. In a good
team, members create an environment that allows everyone to go
beyond their limitation.
 Some of the fundamentals on which a team is built are: Collaboration,
Clear Expectations and Commitment.

Page 80 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Key Points
Importance of Professionalism
Provide a brief overview of the session. Discuss the importance of professional behavior in the
organization.

Activity Description:
Ask the candidates “What does professionalism mean to you?”

Take a few minutes and write down your thoughts… as a definition

or description.
Summarize
Who is professional?
A person who has achieved an acclaimed level of proficiency in any trade and whose competencies
can be measured against fixed set of standards or guidelines. The following are the key
characteristics in a person that make him stand out as a professional.
 Positively proactive. Professionals demonstrate behaviours that are positive, proactive instead
of negative, and reactive.
 Respect. Through this ethic and value of respect, professionals are known and trusted within
and without their respective organizations.
 Opportunities to help others. Those who avow before understand they have a responsibility to
help others whether it is to grow self-leadership skills or provide some expert advice.
 Follow-up. No one likes to wait for un-returned phone calls or emails. Professionals make it a
habit to follow-up on everything and accept responsibility when they fail to engage in that
behavior.
 Empathy. Professionals know how to be empathetic. This characteristic is a one of the signs of
high emotional intelligence and a predictor for leadership success.
 Self-confident. When individuals are self confident, they do not have to put others down at
their own expense. These individuals have a high sense of balanced self-esteem and role
awareness.
 Sustainable. Professionals are truly sustainable in that they can continue forward when times
become difficult. Their ethics and beliefs keep them focused.
 Integrity. Integrity is putting your values into action; doing the right thing when no one else is
looking without personal gain or benefit; and accepting a potential personal cost.
 Optimize all interactions. This is critical because professionals do not negate the value of
people. They look to see how one interaction can benefit someone else even before himself or
herself.

Page 81 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

 Nimble. Being flexible and open to change allows these individuals to be quick on their feet
and nimble to the opportunities that they encounter on a daily basis.
 Awareness. Having a high level of awareness of themselves, the marketplace, the community
and even the world helps these individuals continually stay on top of things.
 Leadership. Last, but not least, professionals demonstrate exceptional leadership skills and
even more importantly self-leadership skill. For if you cannot lead yourself, you cannot lead
others.
What is professionalism?
 Professionalism is the competence or set of skills that are expected from a professional.
 Professionalism determines how a person is perceived by his employer, co-workers, and
casual contacts.

How long does it take for someone to form an opinion about you?
 Studies have proved that it just takes six seconds for a person to form an opinion about
another person.

How does someone form an opinion about you?

Eye Contact – Maintaining eye contact with a person or the audience says that you are confident. It says that you are
someone who can be trusted and hence can maintain contact with you.
Handshake – Grasp the other person’s hand firmly and shake it a few times. This shows that you are enthusiastic.
Posture – Stand straight but not rigid, this will showcase that you are receptive and not very rigid in your thoughts.
Clothing – Appropriate clothing says that you are a leader with a winning potential.
How to exhibit professionalism?

 Empathy
 Positive Attitude
 Teamwork
 Professional Language
 Knowledge
 Punctual
 Confident
 Emotionally stable

Grooming
What are the colours that one can opt for work wear?
A good rule of thumb is to have your pants, skirts and blazers in neutral colours. Neutrals are not only restricted to grey
brown and off white - you can also take advantage of the beautiful navies, forest greens, burgundies, tans and caramel

Page 82 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

tones around. Pair these neutrals with blouses, scarves or other accessories in accent colours - ruby red, purple, teal blue,
soft metallic and pinks are some examples.
Things to remember

 Wear neat clothes at work which are well ironed and do not stink.
 Ensure that the shoes are polished and the socks are clean
 Cut your nails on a regular basis and ensure that your hair is in place.
 Women should avoid wearing revealing clothes at work.
 Remember that the way one presents oneself plays a major role in the professional world.

Check Your Understanding

1. True or False? Polo T-Shirt is professional dress.

a. True
b. False

Suggested Responses:
False

2. True or False? I can wear sandals to office

a. True
b. False

Suggested Responses:
False

3. True or False? Well tailored Salwar Suit is not professional.

a. True
b. False

Suggested Responses:
False

Page 83 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Activity Description:
Ask the participants to pick up the items listed below and place them in
the Acceptable / Unacceptable category. Discuss the rationale of their
thoughts and categorization.

Categorize the below items in the Acceptable / Unacceptable

1. Polo T Shirt –
2. Golf Shoes –
3. Collared Shirt -
4. Suede Shoes –
5. Leather laced Shoes –
6. Matching Socks –
7. Backpacks –
8. Lynards in Pockets –
9. Jeans on weekdays –
10. Rolled Up Sleeves –
11. Matching Belt and Shoes –
12. Pressed Suit –
13. Knee Length Skirt –
14. Short Skirts –
15. Obvious Tatoos –

1. Polo T Shirt – Unacceptable

2. Golf Shoes – Unacceptable
3. Collared Shirt - Acceptable
4. Suede Shoes – Unacceptable
5. Leather laced Shoes – Acceptable
6. Matching Socks – Acceptable
7. Backpacks – Unacceptable
8. Lynards in Pockets – Unacceptable
9. Jeans on weekdays – Unacceptable
10. Rolled Up Sleeves – Unacceptable
11. Matching Belt and Shoes – Acceptable

Page 84 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

12. Pressed Suit – Acceptable

13. Knee Length Skirt – Acceptable
14. Short Skirts – Unacceptable
15. Obvious Tatoos – Unacceptable

Summary

 Professionalism determines how a person is perceived by his employer

andco-workers.
 Empathy, Positive Attitude, Teamwork, Professional Language,
Knowledge, Punctuality, Confidence are some of the key characteristics
that determine the professionalism of a person.
 The type of clothes you wear and grooming also plays an important role
in forming an impression of the person in his/her work environment.

Page 85 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Key Points
Effective Communication
Provide a brief overview of the session.
Prompt candidates to discuss the consequences of ineffective or unclear
communication.

We would probably all agree that effective communication is essential to workplace effectiveness.
And yet, we probably don’t spend much time thinking about how we communicate, and how we
might improve our communication skills. The purpose of building communication skills is to
achieve greater understanding and meaning between people and to build a climate of trust,
openness, and support. To a large degree, getting our work done involves working with other
people. And a big part of working well with other people is communicating effectively. Sometimes
we just don’t realize how critical effective communication is to getting the job done. So, let’s have
an experience that reminds us of the importance of effective communication. Actually, this
experience is a challenge to achieve a group result without any communication at all! Let’s give it a
shot.

Activity Description:
Ask the participants to share an experience that reminds them of the significance of
effective communication OR consequences of ineffective communication.

Page 86 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

What is Effective Communication?

We cannot not communicate.
The question is: Are we communicating what we intend to communicate?
Does the message we send match the message the other person receives?
Impression = Expression

Real communication or understanding happens only when the receiver’s impressionmatches what
the sender intended through his or her expression.So the goal of effective communication is a
mutual understanding of the message.

In simple terms, effective communication means this . . .

You say it.
I get it.

But how do we know if the other person “gets” our message? We don’t know until wecomplete the
communication.Until a message is complete, the best we can say about its meaning is this:

The meaning of a message is not what is intended by the sender, but what is
understood by the receiver.

So what does it take to complete communication? It takes completing the loop. It takesadding one
more step. It takes feedback.
In simple terms, complete or effective communication means . . .
You say it.
I get it.
You get that I got it.

So far, then, we’ve defined effective communication and what makes it complete. Let’snow
explore the process, or circle, of communication to see how, where, and why itbreaks down.

Forms of Communication:

Page 87 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

The most common way in which we communicate is by talking to the other person. What are the other possible ways in
which you communicate?

There are three main forms of Communication:

1. Verbal communication
2. Non verbal communication
3. Written communication

Verbal Communication
Verbal communication refers to the use of sounds and language to relay a message. It serves as a vehicle for expressing
desires, ideas and concepts and is vital to the processes of learning and teaching. In combination with nonverbal forms
of communication, verbal communication acts as the primary tool for expression between two or more people.

Types of verbal communication:

Interpersonal communication and public speaking are the two basic types of verbal
communication. Whereas public speaking involves one or more people delivering a
message to a group, interpersonal communication generally refers to a two-way
exchange that involves both talking and listening.
Verbal communication has many purposes, but its main function is relaying a
message to one or more recipients. It encompasses everything from simple one-
syllable sounds to complex discussions and relies on both language and emotion to
produce the desired effect. Verbal communication can be used to inform, inquire,
argue and discuss topics of all kinds.

Non Verbal Communication

How do we communicate without words???

 We communicate a lot to each other outside what we say.

 We create confusion when our verbal and nonverbal messages don’t match & When verbal and
nonverbal messages don’t match, we tend to “listen” to the nonverbal one.
(Intuitively, we generally view others’ “body language” as a more reliableindicator of their
attitudes and feelings than their words.)
 We can learn to read the meanings of nonverbal behaviors.
- The key is discovering an individual’s behavior patterns—there is predictability to their
meaning.
- However, be careful—people can mask their feelings.

Page 88 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

- Also, trying to read something into every movement others make can get in the way of
effective interactions.

Forms of non verbal communication

1. Ambulation is the way one walks. Whether the person switches, stomps, or swaggers can
indicate how that person experiences the environment.
2. Touching is possibly the most powerful nonverbal communication form. People
communicate trust, compassion, tenderness, warmth, and other feelings through touch.
Also, people differ in their willingness to touch and to be touched. Some people are
“touchers” and others emit signals not to touch them.
3. Eye contact is used to size up the trustworthiness of another. Counselors use this
communication method as a very powerful way to gain understanding and acceptance.
Speakers use eye contact to keep the audience interested.
4. Posturing can constitute a set of potential signals that communicate how a person is
experiencing the environment. It is often said that a person who sits with his/her arms
folded and legs crossed is defensive or resistant. On the other hand, the person may just be
cold.
5. Tics are involuntary nervous spasms that can be a key to indicate one is being threatened.
For example, some people stammer or jerk when they are threatened. But these mannerisms
can easily be misinterpreted.
6. Sub-vocals are the non-words one says, such as “ugh” or “um.” They are used when one is
trying to find the right word. People use a lot of non-words trying to convey a message to
another person. Another example is the use of "you know." It is used in place of the "ugh"
and other grunts and groans commonly used.
7. Distancing is a person’s psychological space. If this space is invaded, one can become
somewhat tense, alert, or “jammed up.” People may try to move back to reestablish their
personal space. The kind of relationship and the motives toward one another determines this
personal space.
8. Gesturing carries a great deal of meaning between people, but different gestures can mean
different things to the sender and the receiver. This is especially true between cultures. Still,

Page 89 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

gestures are used to emphasize our words and to attempt to clarify our meaning.
9. Vocalism is the way a message is packaged and determines the signal that is given to
another person. For example, the message, “I trust you,” can have many meanings. “I trust
you” could imply that someone else does not. “I trust you” could imply strong sincerity. “I
trust you” could imply that the sender does not trust others.

Written Communication
Written communication involves any type of message that makes use of
the written word. Written communication is the most important and the
most effective of any mode of business communication.

Examples of written communications generally used with clients or other

businesses include email, Internet websites, letters, proposals, telegrams,
faxes, postcards, contracts, advertisements, brochures, and news releases.

Advantages and disadvantages of written communication:

Advantages
- Creates permanent record
- Allows to store information for future reference
- Easily distributed
- All recipients receive the same information
- Written communication helps in laying down apparent principles, policies and rules for
running on an organization.
- It is a permanent means of communication. Thus, it is useful where record maintenance is
required.
- Written communication is more precise and explicit
- Effective written communication develops and enhances organization’s image
- It provides ready records and references
- Written communication is more precise and explicit.
- Effective written communication develops and enhances an organization’s image
- Necessary for legal and binding documents

Disadvantages of Written Communication

- Written communication does not save upon the costs. It costs huge in terms of stationery
and the manpower employed in writing/typing and delivering letters.

Page 90 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

- Also, if the receivers of the written message are separated by distance and if they need to
clear their doubts, the response is not spontaneous.
- Written communication is time-consuming as the feedback is not immediate. The encoding
and sending of message takes time.
- Effective written communication requires great skills and competencies in language and
vocabulary use. Poor writing skills and quality have a negative impact on organization’s
reputation.
- Too much paper work and e-mails burden is involved

Common Etiquettes In Written Communication

Continuing with the series of etiquettes in communication, language expert Preeti Shirodkar tells us about what we
need to keep in mind while communicating in writing.
While written communication affords greater flexibility, since it can be edited and both composed and read at leisure or
at one's pace, a great deal of care needs to be taken, in order to ensure its effectiveness; as it can serve as a point of
reference, which one can turn to time and again, thus creating a more lasting impact.

1 – Structuring of the Content

Introduction, Body and Conclusion: While writing one should ensure that the content is well organized, with the
overview/basic details comprising the introduction; all major points with their explanation and exemplification
constituting the body (preferably divided into a separate paragraph each for every new point, with titles and subtitles, if
necessary).

2 – Ensuring Connectivity
The content that comprises a piece of writing should reflect fluency and should be connected through a logical flow of
thought, in order to prevent misinterpretation and catch the attention of the reader.
Moreover, care should be taken to ensure that the flow is not brought about through a forced/deliberate use of
connectives, as this make the piece extremely uninteresting and artificial.

3 – Steering Clear of Short Form

People may not be aware of the meaning of various short forms and may thus find it difficult to interpret them.
Moreover, short forms can at time be culture specific or even organization specific and may thus unnecessarily
complicate the communication.

4 – Importance of Grammar, Spelling and Punctuation

Improper grammar can at worst cause miscommunication and at least results in unwanted humour and should be thus
avoided. So too, spellings can create the same effect or can even reflect a careless attitude on part of the sender.
Finally, effective use of punctuations facilitates reading and interpretation and can in rare cases even prevent a
completely different meaning, which can result in miscommunication.

Page 91 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

5 – Sensitivity to the Audience

One needs to be aware of and sensitive to the emotions, need and nature of the audience in choosing the vocabulary,
content, illustrations, formats and medium of communication, as a discomfort in the audience would hamper rather than
facilitate communication.

6 – Importance of Creativity
In order to hold the readers' attention one needs to be creative to break the tedium of writing and prevent monotony
from creeping in.
This is especially true in the case of all detailed writing that seeks to hold the readers' attention.

7 – Avoidance Excessive use of Jargons

Excessive use of jargon can put off a reader, who may not read further, as, unlike a captive audience, the choice of
whether to participate in the communication rests considerably with the reader.

Some Do’s and Don’ts of Writing

 Be Specific: Just like a reporter, communicate the “who, what, where, why, when and how”
of what needs to done. Stay objective and specific.
 Avoid the Passive Voice: Instead of writing “The program was planned by Dane,” write,
“Dane planned the program.”
 Be Concise: There’s no need to be long-winded. Get to the point. You’ll lose readers if you
spout off too long!
 Get Things Right: Take great care when spelling people’s names,, and other specifics. And
also make sure that you do a careful proof of your work.
 Know When Formal Language is Required: If you’re writing an informal note to group
members, it’s fine to use contractions (“don’t” instead of “do not”).However, if you’re
writing for a formal audience, like a proposal to the board of directors, be more formal with
your language.
 Read It Out Loud: One very effective way to self-proof your work is to read it out loud. This
will help you determine if you’ve used incorrect words, if your sentences run on too long, if
your tenses don’t match, and more.

Communication Barriers

Ask to the candidates “What Communication Barrier means to you?”

Take a few minutes and share your thoughts/ examples.

Common barriers to effective Communication:

Page 92 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

1. The use of jargons. Over Over-complicated, unfamiliar and/or technical terms.

2. Emotional barriers and taboos. Some people may find it difficult to express their emotions
and some topics may be completely 'off-limits' or
taboo.
3. Lack of attention, interest, distractions, or
irrelevance to the receiver.
4. Differences in perception and viewpoint.
5. Physical disabilities such as hearing problems or
speech difficulties.
6. Physical barriers to non verbal communication.
Not being able to see the non-verbal cues, gestures,
posture and general body language can make communication less effective. Accents.
7. Language differences and the difficulty in understanding unfamiliar accents.
8. Expectations and prejudices which may lead to false assumptions or
stereotyping. People often hear what they expect to hear rather than what is actually said and
jump to incorrect conclusions.
9. Culturaldifferences. The norms of social interaction vary greatly in different cultures, as do
the way in which emotions are expressed. For example, the concept of personal space varies
between cultures and between different social settings.

Check Your Understanding

1. True or False? A good definition of communication is the sending of

information from
one person to another.
a. True
b. False

Suggested Responses:
False

2. True or False? Good working relationships between people form an important

foundation for effective communication
a. True
b. False

Page 93 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Suggested Responses:
True

3. True or False? Empathy is one of the most important concepts in

communication.
a. True
b. False

Suggested Responses:
True

4. True or False? The best way to get feedback is to ask, “Do you have any
questions?”
a. True
b. False

Suggested Responses:
False

5. True or False? A person’s attitude toward the value of communication is

more important than the skills or methods used to communicate.
a. True
b. False

Suggested Responses:
True

6. True or False? Everyone should be responsible for effective upward,

downward, and
horizontal communication.
a. True
b. False

Page 94 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Suggested Responses:
True

7. True or False? A sender has failed to communicate unless the receiver

understands the
message the way the sender intended it.
a. True
b. False

Suggested Responses:
True

8. True or False? The grapevine is usually an accurate source of information,

and should
be used intentionally to communicate.
a. True
b. False
Suggested Responses:
False

9. True or False? If people don’t understand, they will usually indicate so by

asking
questions or by saying they don’t understand.
a. True
b. False

Suggested Responses:
False

Page 95 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

10. True or False? In order to have an effective communication program, top

management
must take an active part
a. True
b. False
Suggested Responses:
True

11. True or False? Where persuasion is needed, oral communication is better

than written
Communication.
a. True
b. False

Suggested Responses:
True

12. True or False? In keeping others informed, it is better to under-communicate

than over communicate
a. True
b. False

Suggested Responses:
False

13. True or False? The best way to be sure we understand a communication is to

repeat it
back to the communicator.
a. True
b. False

Suggested Responses:
True

Page 96 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

14. True or False? The use of effective visual aids by a speaker usually provides
a
significant increase in the audience’s understanding of the message
a. True
b. False

Suggested Responses:
True

15. True or False? The use of a large vocabulary helps greatly in a person’s
communication effectiveness.
a. True
b. False

Suggested Responses:
True

16. True or False? Most people can listen approximately four times faster than
they speak.
a. True
b. False

Suggested Responses:
True

17. True or False? Information is usually distorted when it is orally

communicated through more than two people.
a. True
b. False

Suggested Responses:
True

Page 97 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

18. True or False? In getting people to listen, subject content is more important
than the
manner in which the subject is communicated.
a. True
b. False

Suggested Responses:
True

19. True or False? People will accept a logical explanation even if it ignores
their personal
Feelings.
a. True
b. False

Suggested Responses:
True

Summary

 The purpose of effective communication skills is to achieve greater

understanding between people that builds a climate of trust, openness,
and support.
 There are three commonly used forms of communication: Verbal, Non
verbal and Written.
 Lack of attention and interest, use of jargons and language differences
are some of the common communication barriers. One must watch out
for these in order to effectively communicate.

Page 98 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 1: Unit – 3
SQL using R
Session Time: 180 minutes
Topic Activities

SQL using R By the end of this session, you will be able

to:
1. Understand NOSQL
2. Work on Excel and R integration.

Material and Handouts

Facilitator Material Participant Material and Handouts

Facilitator Guide, Handouts  Participants’ Guide

Session Plan:
Material Time
Activity Location
Needed (Minutes)
NO SQL Classroom Student Workbook
Student Workbook
Excel and R integration with R connector Classroom
Student Workbook
Check your understanding Classroom
Student Workbook
Summary Classroom

Page 99 of 126
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Step-by-Step
NO SQL

Before we understand about NO SQL we will see how SQL is used in R.

SQL using R:

It is sqldf, an R package for running SQL statements on data frames.

To load the “SQLDF” package we use below step

Library (sqldf)

# Use the titanic data set

data(titanic3, package=”PASWR”)
colnames(titanic3)
head(titanic3)

NO SQL:
A NoSQL (originally referring to "non SQL" or "non-relational") database provides a
mechanism for storage and retrieval of data that is modeled in means other than the tabular
relations used in relational databases.
NoSQL databases are increasingly used in big data and real-time web applications. NoSQL
systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like
query languages.
There have been various approaches to classify NoSQL databases, each with different categories
and subcategories, some of which overlap.

The Benefits of NoSQL

When compared to relational databases, NoSQL databases are more scalable and provide
superior performance, and their data model addresses several issues that the relational model is
not designed to address:

 Large volumes of structured, semi-structured, and unstructured data

 Agile sprints, quick iteration, and frequent code pushes
 Object-oriented programming that is easy to use and flexible
 Efficient, scale-out architecture instead of expensive, monolithic architecture

Page 100 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

A basic classification based on data model, with examples:

 Column: Accumulo, Cassandra, Druid, HBase, Vertica
 Document: Clusterpoint, Apache CouchDB, Couchbase, DocumentDB, HyperDex,
Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx
 Key-value: CouchDB, Oracle NoSQL Database, Dynamo, FoundationDB, HyperDex,
MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS
 Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
 Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB

NoSQL vs. SQL Summary

SQL Databases NOSQL Databases

One type (SQL database) with minor variations Many different types including key-value
Types stores, document databases, wide-column stores,
and graph databases

Developed in 1970s to deal with first wave of Developed in 2000s to deal with limitations of
Development
data storage applications SQL databases, particularly concerning scale,
History
replication and unstructured data storage

Examples MySQL, Postgres, Oracle Database MongoDB, Cassandra, HBase, Neo4j

Individual records (e.g., "employees") are Varies based on database type. For example, key-
stored as rows in tables, with each column value stores function similarly to SQL databases,
storing a specific piece of data about that record but have only two columns ("key" and "value"),
(e.g., "manager," "date hired," etc.), much like a with more complex information sometimes stored
Data Storage spreadsheet. Separate data types are stored in within the "value" columns. Document databases
Model separate tables, and then joined together when do away with the table-and-row model altogether,
more complex queries are executed. For storing all relevant data together in single
example, "offices" might be stored in one table, "document" in JSON, XML, or another format,
and "employees" in another. When a user wants which can nest values hierarchically.
to find the work address of an employee, the
database engine joins the "employee" and

Page 101 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

"office" tables together to get all the

information necessary.

Structure and data types are fixed in advance. Typically dynamic. Records can add new
To store information about a new data item, the information on the fly, and unlike SQL table rows,
entire database must be altered, during which dissimilar data can be stored together as necessary.
Schemas
time the database must be taken offline. For some databases (e.g., wide-column stores), it is
somewhat more challenging to add new fields
dynamically.

Vertically, meaning a single server must be Horizontally, meaning that to add capacity, a
made increasingly powerful in order to deal database administrator can simply add more
with increased demand. It is possible to spread commodity servers or cloud instances. The
Scaling
SQL databases over many servers, but database automatically spreads data across servers
significant additional engineering is generally as necessary.
required.

Development Mix of open-source (e.g., Postgres, MySQL) Open-source

Model and closed source (e.g., Oracle Database)

Supports Yes, updates can be configured to complete In certain circumstances and at certain levels (e.g.,
Transactions entirely or not at all document level vs. database level)

Specific language using Select, Insert, and Through object-oriented APIs

Data
Update statements, e.g. SELECT fields FROM
Manipulation
table WHERE…

Can be configured for strong consistency Depends on product. Some provide strong

Consistency consistency (e.g., MongoDB) whereas others offer

eventual consistency (e.g., Cassandra)

Page 102 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Excel and R integration with R connector

1 – Read Excel spreadsheet in R

gdata: it requires you to install additional Perl libraries on Windows platforms but it’s very powerful.
require(gdata)
myDf<- read.xls ("myfile.xlsx"), sheet = 1, header = TRUE)
RODBC: This is reported for completeness only. It’s rather dated; there are better ways to interact with
Excel nowadays.
XLConnect: It might be slow for large dataset but very powerful otherwise.
require (XLConnect)
wb<- loadWorkbook("myfile.xlsx")
myDf<- readWorksheet(wb, sheet = "Sheet1", header = TRUE)
xlsx: Prefer the read.xlsx2() over read.xlsx(), it’s significantly faster for large dataset.
require(xlsx)
read.xlsx2("myfile.xlsx", sheetName = "Sheet1")
xlsReadWrite: Available for Windows only. It’s rather fast but doesn’t support .xlsx files which is a serious
drawback. It has been removed from CRAN lately.
read.table(“clipboard”): It allows to copy data from Excel and read it directly in R. This is the quick and
dirty R/Excel interaction but it’s very useful in some cases.
myDf<- read.table("clipboard")
2 – Read R output in Excel
First create a csv output from an R data.frame then read this file in Excel. There is one function that you
need to know it’swrite.table. You might also want to consider: write.csv which uses “.” for the decimal point
and a comma for the separator and write.csv2 which uses a comma for the decimal point and a semicolon for
the separator.

x <- cbind(rnorm(20),runif(20))
colnames(x) <- c("A","B")
write.table(x,"your_path",sep=",",row.names=FALSE)
3 – Execute R code in VBA
RExcel is from my perspective the best suited tool but there is at least one alternative. You can run a batch
file within the VBA code. If R.exe is in your PATH, the general syntax for the batch file (.bat) is:

Page 103 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

R CMD BATCH [options] myRScript.R

Here’s an example of how to integrate the batch file above within your VBA code.

4 – Execute R code from an Excel spreadsheet

Rexcel is the only tool I know for the task. Generally speaking once you installed RExcel you insert the
excel code within a cell and execute from RExcel spreadsheet menu. See the RExcel references below for an
example.

5 – Execute VBA code in R

This is something I came across but I never tested it myself. This is a two steps process. First write a
VBscript wrapper that calls the VBA code. Second run the VBscript in R with the system or shell functions.
The method is described in full details here.

6 – Fully integrate R and Excel

RExcel is a project developped by Thomas Baier and Erich Neuwirth, “making R accessible from Excel and
allowing to use Excel as a frontend to R”. It allows communication in both directions: Excel to R and R to
Excel and covers most of what is described above and more. I’m not going to put any example of RExcel use
here as the topic is largely covered elsewhere but I will show you where to find the relevant information.
There is a wiki for installing RExcel and an excellent tutorial available here. I also recommend the
following two documents: RExcel – Using R from within Excel and High-Level Interface Between R and
Excel. They both give an in-depth view of RExcel capabilities.

Page 104 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Check your understanding

1. How do you read excel dataset in R?
2. What are the types of No SQL tools based on Data Models?
3. Why do we use No SQL?
4. Is No SQL a query language like SQL?

Summary
 For integration of SQL and R we use SQLDF package.
 A NoSQL (originally referring to "non SQL" or "non-relational") database provides a
mechanism for storage and retrieval of data that is modeled in means other than the tabular
relations used in relational databases.
 NoSQL support SQL like query language.
 NoSQL is used primarily in compliment to Big Data tools.
 Excel can be integrated with R using R-Connector.
 How to execute VBA code in R tool?

1. Divide the class into groups of 4-5 participants

2. Give the Dataset to the participants.
3. Give 10 minutes to the class for each group to discuss the various change points
between SQL and NO SQL along with a discussion on the methods type which
they would like to use
4. Each group presents their examples with justification. (5 min each)

Page 105 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 1 UNIT – 4
Correlation and Regression
Topic Session Goals

Correlation and Regression By the end of this session, you will be able
to:
1. Make Regression models
2. Find Correlation
3. Understand Multi Collinearity
4. Work on Multiple Regression
5. Work with Dummy variables

Material and Handouts

Facilitator Material Participant Material and Handouts

Facilitator Guide, Handouts  Participants’ Guide

Session Plan:
Activity Location
Basic Regression Analysis Classroom

OLS Regression Classroom

Regression Modeling Classroom

Regression residuals Classroom

Correlation Classroom

Heteroscedasticity Classroom

Autocorrelation& Multicollinearity Classroom

Introduction to Multiple Regression Classroom

Dummy Variables Classroom

Check your understanding Classroom

Page 106 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Basic Regression Analysis

Regression analysis is the statistical method you use when both the response variable and the
explanatory variable are continuous variables (i.e. real numbers with decimal places – things like
heights, weights, volumes, or temperatures).
In simple regression, we try to determine whether there is a relationship between two variables. It is
assumed that there is a high degree of correlation between the two variables chosen for use in
regression.

In R we use lm () function to do simple regression modeling.

For example,
> fit <- lm(data$petal_length ~ data$petal_width)
When we call “fit” as below
>fit

We get the intercept “C” and the slope “m” of the equation – Y=mX+C
The fit information displays four charts: Residuals vs. Fitted, Normal Q-Q, Scale-Location, and
Residuals vs. Leverage.
Below are the various graphs representing values of regression

Page 107 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

OLS Regression
OLS:- Ordinary least squares (OLS) or linear least squares is a method for estimating the unknown
parameters in a linear regression model, with the goal of minimizing the differences between the
observed responses in some arbitrary dataset and the responses predicted by the linear approximation of
the data.
This is applied in both simple linear and multiple regression where the common assumptions are
(1) The model is linear in the coefficients of the predictor with an additive random error term
(2) The random error terms are
 normally distributed with 0 mean and
 a variance that doesn't change as the values of the predictor covariates (i.e. IVs) change.

Regression Modeling

 Regression modeling or analysis is a statistical process for estimating the relationships among
variables. It includes many techniques for modeling and analyzing several variables, when the
focus is on the relationship between a dependent variable and one or more independent
variables (or 'predictors'). More specifically, regression analysis helps one understand how the
typical value of the dependent variable (or 'criterion variable') changes when any one of the
independent variables is varied, while the other independent variables are held fixed. Most
commonly, regression analysis estimates the conditional expectation of the dependent variable
given the independent variables – that is, the average value of the dependent variable when the
independent variables are fixed. Less commonly, the focus is on a quantile, or other location
parameter of the conditional distribution of the dependent variable given the independent
variables. In all cases, the estimation target is a function of the independent variables called the
regression function. In regression analysis, it is also of interest to characterize the variation of
the dependent variable around the regression function which can be described by a probability
distribution.

 Regression analysis is widely used for prediction and forecasting, where its use has substantial
overlap with the field of machine learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent variable, and to explore the forms
of these relationships. In restricted circumstances, regression analysis can be used to infer
causal relationships between the independent and dependent variables. However this can lead to
illusions or false relationships, so caution is advisable; for example, correlation does not imply
causation.

 Many techniques for carrying out regression analysis have been developed. Familiar methods
such as linear regression and ordinary least squares regression are parametric, in that the

Page 108 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

regression function is defined in terms of a finite number of unknown parameters that are
estimated from the data. Nonparametric regression refers to techniques that allow the regression
function to lie in a specified set of functions, which may be infinite-dimensional.

 The performance of regression analysis methods in practice depends on the form of the data
generating process, and how it relates to the regression approach being used. Since the true
form of the data-generating process is generally not known, regression analysis often depends to
some extent on making assumptions about this process. These assumptions are sometimes
testable if a sufficient quantity of data is available. Regression models for prediction are often
useful even when the assumptions are moderately violated, although they may not perform
optimally. However, in many applications, especially with small effects or questions of
causality based on observational data, regression methods can give misleading results.

 In a narrower sense, regression may refer specifically to the estimation of continuous response
variables, as opposed to the discrete response variables used in classification. The case of a
continuous output variable may be more specifically referred to as metric regression to
distinguish it from related problems.
Regression residuals
The residual of an observed value is the difference between the observed value and the estimated value
of the quantity of interest.

Because a linear regression model is not always appropriate for the data, you should assess the
appropriateness of the model by defining residuals and examining residual plots.

 Residuals

The difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is
called the residual (e). Each data point has one residual.

Residual = Observed value - Predicted value

e=y–ŷ

Both the sum and the mean of the residuals are equal to zero. That is, Σ e = 0 and e = 0.

 Residual Plots

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on
the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a
linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.

Below the table on the left shows inputs and outputs from a simple linear regression analysis, and the
chart on the right displays the residual (e) and independent variable (X) as a residual plot.

Page 109 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

x 60 70 80 85 95

y 70 65 70 95 85

ŷ 65.411 71.849 78.288 81.507 87.945

e 4.589 -6.849 -8.288 13.493 -2.945

The residual plot shows a fairly random pattern - the first residual is positive, the next two are negative,
the fourth ispositive, and the last residual is negative. This random pattern indicates that a linear model
provides a decent fit to the data.

Below, the residual plots show three typical patterns. The first plot shows a random pattern, indicating
a good fit for a linear model. The other plot patterns are non-random (U-shaped and inverted U),
suggesting a better fit for a non-linear model.

Random pattern Non-random: U-shaped Non-random: Inverted U

Correlation

•Measure of association between variables

•Positive and negative correlation, ranging between +1 and -1
•Positive correlation example:
•Earning and expenditure
•Negative correlation example
•Speed and time
•Parametric – normal distribution and homogenous variance
•Pearson correlation
•Non parametric – no assumptions, nominal variables
•Spearman correlation

Page 110 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Correlation Coefficients:-

•r : correlation coefficient
•+1 : Perfectly positive
•-1 : Perfectly negative
•0 – 0.2 : No or very weak association
•0.2 – 0.4 : Weak association
•0.4 – 0.6 : Moderate association
•0.6 – 0.8 : Strong association
•0.8 – 1 : Very strong to perfect association

Correlation and Covariance:

With two continuous variables, x and y, the question naturally arises as to whether their values are
correlated with each other (remembering, of course, that correlation does not imply causation).
Correlation is defined in terms of the variance of x, the variance of y, and the covariance of x and y (the
way the two vary together; the way they co-vary) on the assumption that both variables are normally
distributed. We have symbols already for the two variances, s2x and s2y.
We denote the covariance of x and y by cov(x, y), after which the correlation coefficient r is defined as

Page 111 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Heteroscedasticity:
A collection of random variables is heteroscedastic (or 'heteroskedastic' from Ancient Greek hetero
“different” and skedasis “dispersion”) if there are sub-populations that have different variabilities from
others. Here "variability" could be quantified by the variance or any other measure of statistical
dispersion. Thus heteroscedasticity is the absence of homoscedasticity.
The existence of heteroscedasticity is a major concern in the application of regression analysis,
including the analysis of variance, as it can invalidate statistical tests of significance that assume that
the modeling errors are uncorrelated and uniform—hence that their variances do not vary with the
effects being modeled. For instance, while the ordinary least squares estimator is stillunbiased in the
presence of heteroscedasticity, it is inefficient because the true variance and covariance are
underestimated. Similarly, in testing for differences between sub-populations using a location test,
some standard tests assume that variances within groups are equal.

Test of Heteroscedasticity:-

Tests in regression
• Levene's test
• Goldfeld–Quandt test
• Park test
• Glejser test
• Brown–Forsythe test
• Harrison–McCabe test
• Breusch–Pagan test
• White test
• Cook–Weisberg test

Tests for grouped data

• F-test of equality of variances

• Cochran's C test
• Hartley's test
These tests consist of a test statistic (a mathematical expression yielding a numerical value as a
function of the data), a hypothesis that is going to be tested (the null hypothesis), an alternative
hypothesis, and a statement about the distribution of statistic under the null hypothesis.

Page 112 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Fixes:-
There are four common corrections for heteroscedasticity. They are:
 View logarithmized data. Non-logarithmized series that are growing exponentially often appear
to have increasing variability as the series rises over time. The variability in percentage terms
may, however, be rather stable.

 Use a different specification for the model (different X variables, or perhaps non-linear
transformations of the X variables).

 Apply a weighted least squares estimation method, in which OLS is applied to transformed or
weighted values of X and Y. The weights vary over observations, usually depending on the
changing error variances. In one variation the weights are directly related to the magnitude of
the dependent variable, and this corresponds to least squares percentage regression.

 Heteroscedasticity-consistent standard errors (HCSE), while still biased, improve upon OLS
estimates. HCSE is a consistent estimator of standard errors in regression models with
heteroscedasticity. This method corrects for heteroscedasticity without altering the values of the
coefficients. This method may be superior to regular OLS because if heteroscedasticity is
present it corrects for it, however, if the data is homoscedastic, the standard errors are
equivalent to conventional standard errors estimated by OLS. Several modifications of the
White method of computing heteroscedasticity-consistent standard errors have been proposed as
corrections with superior finite sample properties.

Page 113 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Autocorrelation

Autocorrelation, also known as serial correlation or cross-autocorrelation, is the cross-correlation of a

signal with itself at different points in time (that is what the cross stands for). Informally, it is the
similarity between observations as a function of the time lag between them. It is a mathematical tool for
finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying
the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in
signal processing for analyzing functions or series of values, such as time domain signals.
In statistics, the autocorrelation of a random process describes the correlation between values of the
process at different times, as a function of the two times or of the time lag. Let X be some repeatable
process, and i be some point in time after the start of that process. (i may be an integer for a discrete-
time process or a real number for a continuous-time process.) Then Xi is the value (or realization)
produced by a given run of the process at time i. Suppose that the process is further known to have
defined values for mean μi and variance σi2 for all times i. Then the definition of the autocorrelation
between times s and t is

where "E" is the expected value operator.

Test: -
 The traditional test for the presence of first-order autocorrelation is the Durbin–Watson statistic
or, if the explanatory variables include a lagged dependent variable, Durbin's h statistic. The
Durbin-Watson can be linearly mapped however to the Pearson correlation between values and
their lags.

 A more flexible test, covering autocorrelation of higher orders and applicable whether or not the
regressors include lags of the dependent variable, is the Breusch–Godfrey test. This involves an
auxiliary regression, wherein the residuals obtained from estimating the model of interest are
regressed on (a) the original regressors and (b) k lags of the residuals, where k is the order of
the test. The simplest version of the test statistic from this auxiliary regression is TR2, where T
is the sample size and R2 is the coefficient of determination. Under the null hypothesis of no
autocorrelation, this statistic is asymptotically distributed as x2 with k degrees of freedom.

Page 114 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Multicollinearity
In statistics, multicollinearity (also collinearity) is a phenomenon in which two or more predictor
variables in a multiple regression model are highly correlated, meaning that one can be linearly
predicted from the others with a substantial degree of accuracy. In this situation the coefficient
estimates of the multiple regressions may change erratically in response to small changes in the model
or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a
whole, at least within the sample data set; it only affects calculations regarding individual predictors.
That is, a multiple regression model with correlated predictors can indicate how well the entire bundle
of predictors predicts the outcome variable, but it may not give valid results about any individual
predictor, or about which predictors are redundant with respect to others.
In case of perfect multicollinearity the predictor matrix is singular and therefore cannot be inverted.
Under these circumstances, for a general linear model y=Xβ+ε, the ordinary estimator

does not exist.

Test:-
Indicators that multicollinearity may be present in a model:
1) Large changes in the estimated regression coefficients when a predictor variable is added or
deleted
2) Insignificant regression coefficients for the affected variables in the multiple regression, but a
rejection of the joint hypothesis that those coefficients are all zero (using an F-test)
3) If a multivariable regression finds an insignificant coefficient of a particular explanator, yet a
simple linear regression of the explained variable on this explanatory variable shows its coefficient to
be significantly different from zero, this situation indicates multicollinearity in the multivariable
regression.
4) Some authors have suggested a formal detection-tolerance or the variance inflation factor (VIF)
for multicollinearity:

Where is the coefficient of determination of a regression of explanator j on all the other

explanators. A tolerance of less than 0.20 or 0.10 and/or a VIF of 5 or 10 and above indicates a
multicollinearity problem.

Page 115 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

5) Condition number test: The standard measure of ill-conditioning in a matrix is the condition
index. It will indicate that the inversion of the matrix is numerically unstable with finite-precision
numbers (standard computer floats and doubles). This indicates the potential sensitivity of the
computed inverse to small changes in the original matrix. The Condition Number is computed by
finding the square root of (the maximum eigenvalue divided by the minimum eigenvalue). If the
Condition Number is above 30, the regression may have significant multicollinearity; multicollinearity
exists if, in addition, two or more of the variables related to the high condition number have high
proportions of variance explained. One advantage of this method is that it also shows which variables
are causing the problem.
6) Farrar–Glauber test: If the variables are found to be orthogonal, there is no multicollinearity; if
the variables are not orthogonal, then multicollinearity is present. C. Robert Wichers has argued that
Farrar–Glauber partial correlation test is ineffective in that a given partial correlation may be
compatible with different multicollinearity patterns. The Farrar–Glauber test has also been criticized by
other researchers.
7) Perturbing the data. Multicollinearity can be detected by adding random noise to the data and
re-running the regression many times and seeing how much the coefficients change.
8) Construction of a correlation matrix among the explanatory variables will yield indications as to
the likelihood that any given couplet of right-hand-side variables is creating multicollinearity problems.
Correlation values (off-diagonal elements) of at least .4 are sometimes interpreted as indicating a
multicollinearity problem. This procedure is, however, highly problematic and cannot be
recommended. Intuitively, correlation describes a bivariate relationship, whereas collinearity is a
multivariate phenomenon.

Introduction to Multiple Regression

The general purpose of multiple regressions (the term was first used by Pearson, 1908) is to learn more
about the relationship between several independent or predictor variables and a dependent or criterion
variable.
For example,
A real estate agent might record for each listing the size of the house (in square feet), the number of
bedrooms, the average income in the respective neighborhood according to census data, and a
subjective rating of appeal of the house. Once this information has been compiled for various houses it
would be interesting to see whether and how these measures relate to the price for which a house is
sold. For example, you might learn that the number of bedrooms is a better predictor of the price for
which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating).

Page 116 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Dummy Variables

In regression analysis, a dummy variable (also known as an indicator variable, design variable, Boolean
indicator, categorical variable, binary variable, or qualitative variable) is one that takes the value 0 or 1
to indicate the absence or presence of some categorical effect that may be expected to shift the
outcome. Dummy variables are used as devices to sort data into mutually exclusive categories (such as
smoker/non-smoker, etc.).

In other words, Dummy variables are "proxy" variables or numeric stand-ins for qualitative facts in a
regression model. In regression analysis, the dependent variables may be influenced not only by
quantitative variables (income, output, prices, etc.), but also by qualitative variables (gender, religion,
geographic region, etc.). A dummy independent variable (also called a dummy explanatory variable)
which for some observation has a value of 0 will cause that variable's coefficient to have no role in
influencing the dependent variable, while when the dummy takes on a value 1 its coefficient acts to
alter the intercept.

For example,

Suppose Gender is one of the qualitative variables relevant to a regression. Then, female and male
would be the categories included under the Gender variable. If female is arbitrarily assigned the value
of 1, then male would get the value 0. Then the intercept (the value of the dependent variable if all
other explanatory variables hypothetically took on the value zero) would be the constant term for males
but would be the constant term plus the coefficient of the gender dummy in the case of females.

Page 117 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Check your understanding In the context of regression analysis, which of the following statements are
true?

I. When the sum of the residuals is greater than zero, the data set is nonlinear.
II. A random pattern of residuals supports a linear model.
III. A random pattern of residuals supports a non-linear model.

(A) I only
(B) II only
(C) III only
(D) I and II
(E) I and III

Summary Solution
 Regression is method of establishing relation between two or more variables.
 Correlation shows the extent of relation
The correct answer is (B). A random pattern of residuals supports a linear model; a
 Correlation coefficient lies between -1 and supports
non-random pattern 1. a non-linear model. The sum of the residuals is
 Ordinary least squares (OLS)
always or linear
zero, least the
whether squares is aismethod
data set linear orfor estimating the
nonlinear.
unknown parameters in a linear regression model.
 Multiple regressions find relationship between several independent or predictor variables
and a dependent or criterion variable.
 Multicollinearity (also collinearity) is a phenomenon in which two or more predictor
variables in a multiple regression model are highly correlated.
 A dummy variableis one that takes the value 0 or 1 to indicate the absence or presence of
some categorical effect.
 Autocorrelation, also known as serial correlation or cross-autocorrelation

Page 118 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Module 1 UNIT – 5
Understand the verticals and requirements gathering
Session Time: 180 minutes
Topic Activities

Understand the Verticals and Requirement Gathering By the end of this session, you will be
able to:
1. Solve Engg. & Manufac. Issues
2. Create Business Models

Material and Handouts

Facilitator Material Participant Material and Handouts

Facilitator Guide, Handouts  Participants’ Guide

Session Plan:
Activity Location
Understand systems viz. Engineering
Design, Manufacturing, smart utilities,
Classroom
production lines, Automotive industries,
Tech system

Understand the business problem Related to

engineering, Identify the critical issues. Set Classroom
business objectives.

Requirement gathering Classroom

Summary Classroom

Page 119 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Step-by-Step

Understand systems viz. Engineering Design, Manufacturing, smart utilities, production lines,
Automotive industries, Tech system
Engineering Design:
The engineering design process is a methodical series of steps that engineers use in creating
functional products and processes. The process is highly iterative - parts of the process often need
to be repeated many times before production phase can be entered - though the part(s) that get
iterated and the number of such cycles in any given project can be highly variable.
One framing of the engineering design process delineates the following stages: research,
conceptualization, feasibility assessment, establishing design requirements, preliminary
design, detailed design, production planning and tool design, and production.

Manufacturing:
Manufacturing is the production of merchandise for use or sale using labour and machines, tools,
chemical and biological processing, or formulation. The term may refer to a range of human
activity, from handicraft to high tech, but is most commonly applied to industrial production, in
which raw materials are transformed into finished goods on a large scale. Such finished goods may
be used for manufacturing other, more complex products, such as aircraft, household appliances or
automobiles, or sold to wholesalers, who in turn sell them to retailers, who then sell them to end
users – the "consumers".

Manufacturing takes turns under all types of economic systems. In a free market economy,
manufacturing is usually directed toward the mass production of products for sale to consumers at
a profit. In a collectivist economy, manufacturing is more frequently directed by the state to supply
a centrally planned economy. In mixed market economies, manufacturing occurs under some
degree of government regulation.

Modern manufacturing includes all intermediate processes required for the production and
integration of a product's components. Some industries, such as semiconductor and steel
manufacturers use the term fabrication instead.

The manufacturing sector is closely connected with engineering and industrial design. Examples
of major manufacturers in North America include General Motors Corporation, General Electric,
Procter & Gamble, General Dynamics, Boeing, Pfizer, and Precision Cast parts. Examples in
Europe include Volkswagen Group, Siemens, and Michelin. Examples in Asia include Sony,
Huawei, Lenovo, Toyota, Samsung, and Bridgestone.

SMART Utilities:

Page 120 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a

monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that
detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation
of hardware failures.

When S.M.A.R.T. data indicates a possible imminent drive failure, software running on the host
system may notify the user so stored data can be copied to another storage device, preventing data
loss, and the failing drive can be replaced.

Understand the business problem related to engineering, Identify the critical issues. Set business
objectives.
The BA process can solve problems and identify opportunities to improve business performance. In
the process, organizations may also determine strategies to guide operations and help achieve
competitive advantages. Typically, solving problems and identifying strategic opportunities to follow
are organization decision-making tasks. The latter, identifying opportunities can be viewed as a
problem of strategy choice requiring a solution.

Page 121 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Comparison of business analytics and organization decision-making processes

Requirements gathering : Gather all the Data related to Business objective

There are many different approaches that can be used to gather information about a business. They
include the following:

 Review business plans, existing models and other documentation

 Interview subject area experts
 Conduct fact-finding meetings
 Analyze application systems, forms, artifacts, reports, etc.

The business analyst should use one-on-one interviews early in the business analysis project to gage
the strengths and weaknesses of potential project participants and to obtain basic information about the
business. Large meetings are not a good use of time for data gathering.

Facilitated work sessions are a good mechanism for validating and refining “draft”
requirements. They are also useful to prioritize final business requirements. Group dynamics can
often generate even better ideas.

Primary or local data is collected by the business owner and can be collected by survey, focus group
or observation. Third party static data is purchased in bulk without a specific intent in mind. While
easy to get (if you have the cash) this data is not specific to your business and can be tough to sort
through as you often get quite a bit more data than you need to meet your objective. Dynamic data is
collected through a third party process in near real-time from an event for a specific purpose (read into
that VERY expensive).

Three key questions you need to ask before making a decision about the best method for your firm.

 What is the timeline required to accomplish your business objective?

 What is your required return on investment?

 Is the data collection for a stand-alone event or for part of a broader data collection effort?

How to interpret Data to make it useful for Business:-

Page 122 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into
meaningful and useful information for business analysis purposes. BI technologies are capable of
handling large amounts of unstructured data to help identify, develop and otherwise create new
strategic business opportunities. The goal of BI is to allow for the easy interpretation of these large
volumes of data. Identifying new opportunities and implementing an effective strategy based on
insights can provide businesses with a competitive market advantage and long-term stability.

BI technologies provide historical, current and predictive views of business operations. Common
functions of business intelligence technologies are reporting, online analytical processing, analytics,
data mining, process mining, complex event processing, business performance management,
benchmarking, text mining, predictive analytics and prescriptive analytics.

BI can be used to support a wide range of business decisions ranging from operational to strategic.
Basic operating decisions include product positioning or pricing. Strategic business decisions include
priorities, goals and directions at the broadest level. In all cases, BI is most effective when it combines
data derived from the market in which a company operates (external data) with data from company
sources internal to the business such as financial and operations data (internal data). When combined,
external and internal data can provide a more complete picture which, in effect, creates an
"intelligence" that cannot be derived by any singular set of data.

Business intelligence is made up of an increasing number of components including:

Page 123 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

 Multidimensional aggregation and allocation

 Denormalization, tagging and standardization
 Realtime reporting with analytical alert
 A method of interfacing with unstructured data sources
 Group consolidation, budgeting and rolling forecasts
 Statistical inference and probabilistic simulation
 Key performance indicators optimization
 Version control and process management
 Open item management

Business intelligence can be applied to the following business purposes, in order to drive business value.

 Measurement – program that creates a hierarchy of performance metrics (see also Metrics
Reference Model) and benchmarking that informs business leaders about progress towards
business goals (business process management).
 Analytics – program that builds quantitative processes for a business to arrive at optimal
decisions and to perform business knowledge discovery. Frequently involves: data mining,
process mining, statistical analysis, predictive analytics, predictive modeling, business process
modeling, data lineage, complex event processing and prescriptive analytics.
 Reporting/enterprise reporting – program that builds infrastructure for strategic reporting to
serve the strategic management of a business, not operational reporting. Frequently involves
data visualization, executive information system and OLAP.
 Collaboration/collaboration platform – program that gets different areas (both inside and
outside the business) to work together through data sharing and electronic data interchange.
 Knowledge management – program to make the company data-driven through strategies and
practices to identify, create, represent, distribute, and enable adoption of insights and
experiences that are true business knowledge. Knowledge management leads to learning
management and regulatory compliance.

In addition to the above, business intelligence can provide a pro-active approach, such as alert
functionality that immediately notifies the end-user if certain conditions are met. For example, if some
business metric exceeds a pre-defined threshold, the metric will be highlighted in standard reports, and
the business analyst may be alerted via e-mail or another monitoring service. This end-to-end process
requires data governance, which should be handled by the expert.

Data can be always gathered using surveys.

Your surveys should follow a few basic but important rules:

1. Keep it VERY simple. I recommend one page with 3-4 questions maximum. Customers are

Page 124 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

visiting to purchase or to have an experience, not to fill out surveys.

2. Choose only one objective for the survey. Don’t try to answer too many questions, ultimately
you won’t get much useful data that way because your customer will get confused and frustrated.
3. Don’t give the respondent any wiggle room. Open ended questions are tough to
manage. Specific choices that are broad enough to capture real responses gives you data that is
much easier to use.
4. Always gather demographics. Why not? But rather than name and e-mail (leading to concerns
with confidentiality and often less than truthful answers) gather gender, age and income; you
might be surprised at who is actually buying what.

Page 125 of 126

Facilitator Guide – SSC/ Q2101 – Associate Analytics

Check your understanding

1. What are various steps involved Organization Decision

making?
2. What are examples of SMART utilities?
3. What do you understand by Production Lines?
4. What are various components of Descriptive Statistics?

Summary
 Engineering design process is a methodical series of steps that engineers use in creating
functional products and processes.
 Manufacturing is the production of merchandise for use or sale using labor and machines,
tools, chemical and biological processing, or formulation.
 Assembly line or Production line concept was first used by Henry Ford in automobile
industry. It reduces production time drastically.
 Most of the critical Business problems are solved with help of Data Analytics.

1. Divide the class into groups of 4-5 participants

2. Give the Dataset to the participants.
3. Give 10 minutes to the class for each group to discuss on an issue of an
automobile giant X where the quality of Bikes is not as per standard. So solve this
issue by taking some Business Decisions. Along with a discussion on the methods
type which they would like to use
4. Each group presents their examples with justification. (5 min each)

Page 126 of 126

Fileviewpro 2020 Crack
0% (1)
Fileviewpro 2020 Crack
2 pages
Associate Analytics m3 SH Final
No ratings yet
Associate Analytics m3 SH Final
144 pages
SAT 250 SCALA - General Basics
No ratings yet
SAT 250 SCALA - General Basics
9 pages
Liferay Authorization Role Management
100% (1)
Liferay Authorization Role Management
21 pages
CAP Studyguide PDF
100% (3)
CAP Studyguide PDF
140 pages
Zabbix Performance Tuning - Sample Chapter
No ratings yet
Zabbix Performance Tuning - Sample Chapter
17 pages
FG_SSC_Q2215_Associate_Data_Entry_Operator_English
No ratings yet
FG_SSC_Q2215_Associate_Data_Entry_Operator_English
120 pages
TR-SSC-Q2101-Associate Analytics 0 PDF
No ratings yet
TR-SSC-Q2101-Associate Analytics 0 PDF
10 pages
HEU27 Debug
No ratings yet
HEU27 Debug
10 pages
Business Analyst Job
No ratings yet
Business Analyst Job
9 pages
MC English Q8102 AI - Business Intelligence Analyst v3.0
No ratings yet
MC English Q8102 AI - Business Intelligence Analyst v3.0
27 pages
Analyst, SAC Planning Developer
No ratings yet
Analyst, SAC Planning Developer
2 pages
NOCA - Power BI Analyst - Job Spec
No ratings yet
NOCA - Power BI Analyst - Job Spec
3 pages
3
No ratings yet
3
2 pages
#7 GIS Database QA QC Tutorial Whitepaper
No ratings yet
#7 GIS Database QA QC Tutorial Whitepaper
10 pages
AMZN DANA 015+Program+Launch
No ratings yet
AMZN DANA 015+Program+Launch
77 pages
C09 - SBS Campus JD 1
No ratings yet
C09 - SBS Campus JD 1
1 page
Data Analytics Roadmap Tips
No ratings yet
Data Analytics Roadmap Tips
14 pages
Rbs Finals Jd
No ratings yet
Rbs Finals Jd
2 pages
6 Functional Data Roles With Courses To Get Started
No ratings yet
6 Functional Data Roles With Courses To Get Started
20 pages
Business Data Analyst Job Description (February 2025)
No ratings yet
Business Data Analyst Job Description (February 2025)
4 pages
Alt-Shift X ( (Special:Random/Namespace) ) : Mediawiki Namespace Keyboard Shortcut Namespace
No ratings yet
Alt-Shift X ( (Special:Random/Namespace) ) : Mediawiki Namespace Keyboard Shortcut Namespace
1 page
Estabilizacion de Bases Suelo Cemento
No ratings yet
Estabilizacion de Bases Suelo Cemento
2 pages
Mad - Imp Questions
No ratings yet
Mad - Imp Questions
3 pages
AA Learning Objectives
No ratings yet
AA Learning Objectives
3 pages
MobileEntryPoint Datasheet v10.2 PDF
No ratings yet
MobileEntryPoint Datasheet v10.2 PDF
4 pages
Complete Notes of BA
100% (1)
Complete Notes of BA
22 pages
JD&PS English
No ratings yet
JD&PS English
7 pages
Data & Analytics JD - 202
No ratings yet
Data & Analytics JD - 202
5 pages
Business Analyst Job
No ratings yet
Business Analyst Job
9 pages
Certified Analytics Professional: Examination Study Guide
No ratings yet
Certified Analytics Professional: Examination Study Guide
141 pages
Companies Worldwide Are Investing Heavily in Analytics. It'S The Simple Reason Why You Should Be Doing The Same. NIIT's
100% (1)
Companies Worldwide Are Investing Heavily in Analytics. It'S The Simple Reason Why You Should Be Doing The Same. NIIT's
8 pages
ERAU Datcom User Guide
No ratings yet
ERAU Datcom User Guide
2 pages
Business Analytics & Modelling Curriculum
No ratings yet
Business Analytics & Modelling Curriculum
2 pages
Geethanjali College of Engineering and Technology
No ratings yet
Geethanjali College of Engineering and Technology
87 pages
Add A Filter To A Dataset (Report Builder and SSRS)
No ratings yet
Add A Filter To A Dataset (Report Builder and SSRS)
2 pages
T Associate Database L2
No ratings yet
T Associate Database L2
3 pages
Data Analytics and Visualization for Commerce Curriculum NOS Template (1)
No ratings yet
Data Analytics and Visualization for Commerce Curriculum NOS Template (1)
21 pages
Ahmed Rebai R-Basics
No ratings yet
Ahmed Rebai R-Basics
33 pages
DATA ANALYST JOB DESCRIPTION
No ratings yet
DATA ANALYST JOB DESCRIPTION
3 pages
Data Analytics
No ratings yet
Data Analytics
5 pages
9 Skills Every Business Analytics Professional Needs - Harvard Business Analytics Program
No ratings yet
9 Skills Every Business Analytics Professional Needs - Harvard Business Analytics Program
5 pages
Introduction To Spark
No ratings yet
Introduction To Spark
4 pages
Senior Insight Analyst JD - 2024
No ratings yet
Senior Insight Analyst JD - 2024
5 pages
TD Hive Guide V2.0
No ratings yet
TD Hive Guide V2.0
34 pages
Instructions
No ratings yet
Instructions
4 pages
FTK Imager User Guide
No ratings yet
FTK Imager User Guide
52 pages
JD PS-2
No ratings yet
JD PS-2
10 pages
JD - Product Pricing Analyst - 116212
No ratings yet
JD - Product Pricing Analyst - 116212
2 pages
Business Analytics
No ratings yet
Business Analytics
48 pages
Migrating Material Master Data Into Sap PDF
No ratings yet
Migrating Material Master Data Into Sap PDF
9 pages
Bat 401 Fba Reviewer
No ratings yet
Bat 401 Fba Reviewer
5 pages
Associate Data Scientist
No ratings yet
Associate Data Scientist
1 page
Advanced Data Analytics
No ratings yet
Advanced Data Analytics
114 pages
Submitted To Submitted by
No ratings yet
Submitted To Submitted by
44 pages
Associate Analytics Sample Paper
No ratings yet
Associate Analytics Sample Paper
9 pages
Advanced Pogramming (Java) Course: Topics
No ratings yet
Advanced Pogramming (Java) Course: Topics
48 pages
Job Profiles - Two Regional Positions
No ratings yet
Job Profiles - Two Regional Positions
6 pages
Basic Productivity Lesson Plan
No ratings yet
Basic Productivity Lesson Plan
1 page
Careers
No ratings yet
Careers
6 pages
SSC/ N 9001: Manage Your Work To Meet Requirement: Session Overview
No ratings yet
SSC/ N 9001: Manage Your Work To Meet Requirement: Session Overview
5 pages
Roles responsibilites and skillset
No ratings yet
Roles responsibilites and skillset
4 pages
Cloudera Overview PDF
No ratings yet
Cloudera Overview PDF
20 pages
Awrrpt 1 21724 21747
No ratings yet
Awrrpt 1 21724 21747
349 pages
SSCQ8102 AI-Business Intelligence Analyst V1
No ratings yet
SSCQ8102 AI-Business Intelligence Analyst V1
66 pages
Analytics-Career-Sheet
No ratings yet
Analytics-Career-Sheet
6 pages
Business Analyst Role Guideline
100% (1)
Business Analyst Role Guideline
4 pages
Dataanalyticsunit-1 (2) 104014
No ratings yet
Dataanalyticsunit-1 (2) 104014
51 pages
Business Analytics
No ratings yet
Business Analytics
10 pages
Business Analytics COMPLETE
No ratings yet
Business Analytics COMPLETE
8 pages
Lecture 1 - BISM7233 - AS - 2023
No ratings yet
Lecture 1 - BISM7233 - AS - 2023
62 pages
Cdb101 Assignment
No ratings yet
Cdb101 Assignment
10 pages
Annual Administraive Report 2015-16 - 0
No ratings yet
Annual Administraive Report 2015-16 - 0
21 pages
Document Read
No ratings yet
Document Read
4 pages
GW90 Manual
No ratings yet
GW90 Manual
616 pages
Deep Sea Electronics: Complex Solutions Made Simple
No ratings yet
Deep Sea Electronics: Complex Solutions Made Simple
57 pages
Online Analytical Processing System Providing Spatial Information To The Data Warehouse by Using Geographical Cube Methodology
No ratings yet
Online Analytical Processing System Providing Spatial Information To The Data Warehouse by Using Geographical Cube Methodology
5 pages
Hadoop: A Report Writing On
No ratings yet
Hadoop: A Report Writing On
13 pages
A Manual of Yacht and Boat Sailing PDF
No ratings yet
A Manual of Yacht and Boat Sailing PDF
805 pages
What Is Business Analytics
No ratings yet
What Is Business Analytics
10 pages
Associate Analytics m2 FG
No ratings yet
Associate Analytics m2 FG
140 pages
Associate Analytics M3 Final
0% (1)
Associate Analytics M3 Final
145 pages
TUTORIAL - AIS615.CHAPTER 1 To CHAPTER3
100% (1)
TUTORIAL - AIS615.CHAPTER 1 To CHAPTER3
3 pages
Associate Analytics Reference Notes PDF
No ratings yet
Associate Analytics Reference Notes PDF
112 pages
Analytics Techniques and Tool1
No ratings yet
Analytics Techniques and Tool1
6 pages
Associate Analytics M1 SH PDF
No ratings yet
Associate Analytics M1 SH PDF
112 pages
Java Lab
No ratings yet
Java Lab
51 pages
Career Plan Template Business Analyst
No ratings yet
Career Plan Template Business Analyst
4 pages
PROJECT REPORT Sample 6 Sem
No ratings yet
PROJECT REPORT Sample 6 Sem
70 pages
2018 Obe Customer Analytics Oac 106
No ratings yet
2018 Obe Customer Analytics Oac 106
7 pages
Virtual Hammond B3 Organ VST VST3 Audio Unit Plugins Plus EXS24 and KONTAKT Sample Libraries: Jazz, Blues, Prog Hard Rock, Soul, Gospel Organs
No ratings yet
Virtual Hammond B3 Organ VST VST3 Audio Unit Plugins Plus EXS24 and KONTAKT Sample Libraries: Jazz, Blues, Prog Hard Rock, Soul, Gospel Organs
20 pages
The Future Growth of A Career As A Business Analyst Its Role and Responsibilities
No ratings yet
The Future Growth of A Career As A Business Analyst Its Role and Responsibilities
23 pages
Careers - Data Analytics Training - Internshala Trainings PDF
No ratings yet
Careers - Data Analytics Training - Internshala Trainings PDF
2 pages
All-In-One D2K PDF
No ratings yet
All-In-One D2K PDF
123 pages
Business Analytics: A Comprehensive Guide
From Everand
Business Analytics: A Comprehensive Guide
Naila Hina
No ratings yet
Comprehensive Guide to BusinessObjects: Definitive Reference for Developers and Engineers
From Everand
Comprehensive Guide to BusinessObjects: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet