Dtsc Final

The document outlines the syllabus for Data Science courses for Classes XI and XII, detailing course objectives and outcomes aimed at imparting knowledge in computer fundamentals, programming, data analysis, and machine learning. It includes specific units covering topics such as Python programming, data visualization, database management, and business theory, along with practical components. The syllabus emphasizes developing skills in data management, statistical methods, and ethical considerations in data science.

WEST BENGAL COUNCIL OF HIGHER SECONDARY EDUCATION

SYLLABUS FOR CLASSES XI AND XII


SUBJECT : DATA SCIENCE ( DTSC )
Course Objectives

The objectives of the course are:


● To impart knowledge about basic computer fundamentals and programming languages for data science.
● To impart knowledge about mathematical and statistical methods for data analysis.
● To empower students with data visualization techniques and tools.
● To impart knowledge about the basics of data management and Business Theory.
● To impart knowledge about various machine learning techniques used for data analysis.
● To enable students to develop data-based machine learning models for solving real-world applications.
● To enable students to gain practical experience in programming languages and in statistical and machine learning tools for data science.

Course Outcomes

Upon completion of this course, the student should be able to:


● Explain the importance of, and be able to formulate, a data analysis problem
● Explain various data types and data formats, and identify and appropriately acknowledge sources of various types of data
● Apply mathematical and statistical methods in data science applications
● Apply basic data cleaning techniques to prepare data for analysis
● Demonstrate proficiency in using appropriate tools and technology to collect, process, transform, summarize, and visualize data
● Apply various machine learning algorithms in data-based decision-making applications, and draw accurate and useful conclusions through data analysis
● Demonstrate some skills in data retrieval using Structured Query Language (SQL)
● Explain the basics of Business Theory
● Demonstrate skill in basic exploratory data analysis using unsupervised learning
● Demonstrate proficiency in implementing supervised machine learning algorithms for predictive data analysis using the latest programming languages and software tools
● Differentiate between ethical and unethical uses of data science
CLASS - XI
SEMESTER – I
SUBJECT: DATA SCIENCE ( DTSC )
FULL MARKS: 35 CONTACT HOURS: 60 Hours
COURSE CODE: THEORY

UNIT NO. | SUB TOPICS | CONTACT HOURS | MARKS
Unit 1: Computer Fundamentals (15 marks)

1a. (Contact hours: 8, Marks: 5)
History of computer, basic computer hardware, input and output devices, basic computer architecture, input/output devices, memory and CPU, networking of machines (overview of LAN, MAN, WAN, Internet, Wi-Fi etc.), types of computer (workstation, desktop, smartphone, embedded system, etc.), overview of software (system software and application software with examples; mention names only), definition of Operating System and its functions (mention names of some popular operating systems like Windows, Linux, Android, etc.).

1b. (Contact hours: 6, Marks: 5)
Bit, Byte and Word, Number System (Base, Binary, Decimal, Octal, Hexadecimal), conversion of number systems, Boolean logic (Boolean gates), Boolean operators (OR, AND and NOT), ASCII code, concept of Algorithm and Flowchart.
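The number system conversions named in 1b can be illustrated with a minimal Python sketch. The helper names `to_base` and `from_base` are hypothetical and chosen only for this illustration; the approach is the usual repeated division by the base, and the sketch assumes non-negative integers and valid digit strings.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, base):
    """Convert a non-negative integer to a digit string in a base from 2 to 16."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)   # repeated division by the base
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))     # remainders appear least-significant first

def from_base(text, base):
    """Convert a digit string in the given base back to an integer."""
    value = 0
    for ch in text.upper():
        value = value * base + DIGITS.index(ch)
    return value

print(to_base(45, 2), to_base(45, 8), to_base(45, 16))  # binary, octal, hexadecimal
print(from_base("101101", 2))                           # back to decimal
print(ord("A"), chr(97))                                # ASCII code of 'A', character for code 97
print(True and False, True or False, not True)          # Boolean AND, OR, NOT
```

Python's built-in `bin()`, `oct()`, `hex()` and `int(text, base)` perform the same conversions; the hand-written version only makes the algorithm visible.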

1c. (Contact hours: 10, Marks: 5)
Basics of Computer Programming (three levels: high level language, assembly language, machine language; definition and block diagrams), overview of Compiler and Interpreter (definition, and names of major compiled languages, e.g., C, C++, and interpreted languages, e.g., Python), overview of procedural and object oriented programming (key features and just the basic differences; names of some popular procedural languages, e.g., BASIC, FORTRAN, C, and object oriented programming languages, e.g., C++, Java, Python).

Unit 2: Introduction to Python Programming (15 marks)

2a. (Contact hours: 12, Marks: 5)
Basics of Python programming (with a simple 'hello world' program, process of writing a program, running it, and print statement), concept of class and object, data types (integer, float, string), notion of a variable, operators (assignment, logical, arithmetic etc.), accepting input from console, conditional statements (if-else and nested if-else), collections (List, Tuple, Set and Dictionary), loops (for loop, while loop and nested loops), iterator, string and fundamental string operations (compare, concatenation, sub-string etc.), function, recursion.

2b. (Contact hours: 12, Marks: 6)
Overview of linear and nonlinear data structures (definition, schematic view and difference), array (1D, 2D and its relation with matrix; basic operations: access elements using index, insert, delete, search), stack (concept of LIFO; basic operations: push, pop, peek, size), queue (concept of FIFO; basic operations: enqueue, dequeue, peek, size), use of List methods in Python for basic operations on array, stack and queue, overview of NumPy library and basic array operations (arange(), shape(), ndim(), dtype() etc.), binary tree (definition and schematic view only).
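As a minimal sketch of the stack and queue operations listed in 2b, the two classes below wrap a Python list. The class and method names are hypothetical, chosen to mirror the operation names in the syllabus; `list.pop(0)` is used for dequeue only to stay with List methods, and `collections.deque` would be a more efficient choice for long queues.

```python
class Stack:
    """LIFO container built on a Python list."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)      # add on top

    def pop(self):
        return self._items.pop()      # remove and return the top item

    def peek(self):
        return self._items[-1]        # look at the top item without removing it

    def size(self):
        return len(self._items)


class Queue:
    """FIFO container built on a Python list."""
    def __init__(self):
        self._items = []

    def enqueue(self, item):
        self._items.append(item)      # add at the rear

    def dequeue(self):
        return self._items.pop(0)     # remove and return the front item

    def peek(self):
        return self._items[0]         # look at the front item without removing it

    def size(self):
        return len(self._items)


stack = Stack()
queue = Queue()
for value in (10, 20, 30):
    stack.push(value)
    queue.enqueue(value)

print(stack.pop(), queue.dequeue())   # last in vs. first in
```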

2c. (Contact hours: 4, Marks: 4)
Linear search and binary search algorithms, sorting algorithm (bubble sort only).
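The three algorithms in 2c can be sketched in a few lines of Python. The function names and the sample list are illustrative assumptions; binary search assumes its input is already sorted in ascending order.

```python
def linear_search(items, target):
    """Check every element in turn; return the index of target or -1."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

def binary_search(sorted_items, target):
    """Halve the search range each step; requires ascending sorted input."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

def bubble_sort(items):
    """Return a sorted copy by repeatedly swapping adjacent out-of-order pairs."""
    data = list(items)
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:               # no swaps means the list is already sorted
            break
    return data

marks = [42, 17, 8, 99, 23]           # hypothetical sample data
ordered = bubble_sort(marks)
print(linear_search(marks, 99), ordered, binary_search(ordered, 23))
```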

Unit 3: History of AI and Introduction to Linear Algebra (5 marks)

3a. (Contact hours: 2, Marks: 2)
History of AI: Alan Turing and cracking Enigma, Mark 1 machines, 1956 (the birth of the term AI), AI winter of the 1970s, expert systems of the 1980s, skipped journey of present-day AI. Distinction between the terms AI, Pattern Recognition and Machine Learning.
(Note: it should be taught as a story more than as a flow of information: World War 2, Enigma and Alan Turing, the birth of modern computers.)

3b. (Contact hours: 6, Marks: 3)
Basic matrix operations like matrix addition, subtraction, multiplication, transpose of matrix, identity matrix. A brief introduction to vectors, unit vector, normal vector, Euclidean space.
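The matrix and vector operations in 3b can be sketched with plain nested lists, without any library. All function names here are hypothetical, and the sketch assumes the matrices have compatible shapes.

```python
import math

def mat_add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def mat_sub(a, b):
    """Element-wise difference of two matrices of the same shape."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def transpose(a):
    """Rows become columns."""
    return [list(column) for column in zip(*a)]

def mat_mul(a, b):
    """Matrix product: entry (i, j) is the dot product of row i of a and column j of b."""
    columns_b = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_b] for row in a]

def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def unit_vector(v):
    """Scale a non-zero vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B), mat_mul(A, B), transpose(A))
print(mat_mul(A, identity(2)) == A)   # multiplying by the identity leaves A unchanged
print(unit_vector([3, 4]))
```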

NB : Additional 10 hours for Remedial and/or Tutorial classes


CLASS - XI
SEMESTER – II

SUBJECT: DATA SCIENCE ( DTSC )


FULL MARKS: 35 CONTACT HOURS: 60 HOURS
COURSE CODE: THEORY
UNIT NO. | SUB TOPICS | CONTACT HOURS | MARKS
Unit 4: History of data science and statistics (15 marks)

4a. (Contact hours: 6, Marks: 3)
Brief history of data science, data science as a conjunction of computer science, statistics and domain knowledge. Definition of data science, data science life cycle: capture, maintain, process, analyze, communicate.

4b. (Contact hours: 10, Marks: 5)
Probability distribution, frequency, mean, median and mode, variance and standard deviation, Gaussian distribution, random sampling by uniform distribution and Student's t-distribution, hypothesis testing, distance function, Euclidean norm, distance between two points in 2D and 3D and extension of the idea to n dimensions.
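The descriptive statistics and the distance function in 4b can be tried out with Python's standard `statistics` module. The sample data and the helper name `euclidean_distance` are assumptions made for illustration; the population forms of variance and standard deviation are used here.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical sample

print(statistics.mean(data))           # arithmetic mean
print(statistics.median(data))         # middle value of the sorted data
print(statistics.mode(data))           # most frequent value
print(statistics.pvariance(data))      # population variance
print(statistics.pstdev(data))         # population standard deviation

def euclidean_distance(p, q):
    """Distance between two points with the same number of coordinates (2D, 3D or n-D)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))              # 2D
print(euclidean_distance((1, 2, 3), (4, 6, 3)))        # 3D
print(euclidean_distance((1, 1, 1, 1), (2, 2, 2, 2)))  # 4D, same formula
```

The same `euclidean_distance` function works in any number of dimensions because it only sums squared coordinate differences.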

4c. (Contact hours: 12, Marks: 7)
Basic ideas of different Data Science toolkits: Excel, Weka, R.

Unit 5: Data Visualization (10 marks)

5. (Contact hours: 16, Marks: 10)
Types of data: textual data (reviews, comments, blogs), signal data (time series, audio, sensor data), visual data (image and video, remote sensing data, feeds etc.). Introduction to data dimension and modality, and their representations in computer science. Data cleaning.
● Representation of data in textual form: tokens, sentences, word histograms, reading from web pages using crawlers
● Representation format of audio data: uncompressed wav format and compressed mp3 format (just the description of the pipeline, no maths)
● Representation of visual data in RGB pixels, storing in raw format and compressed format (just the description of the pipeline, no maths)
● Representation of other forms of data like time series values from different sensors, remote sensing image data etc.
● Introduction to the concept of multimodality, i.e. different modes of data from the same information source (example: audio and video generated when filming)
● Data dimension (resolution for image, frequency bins and sampling rate for audio, word histograms for text)
● Concept of data cleaning: removal of abnormal, incomplete, corrupted or garbage data as a preprocessing stage
Unit 6: Database Management (5 marks)

6. (Contact hours: 8, Marks: 5)
Brief introduction to relational database, tables for keeping data, brief introduction to SQL.
● Introduction to the concept of database
● Relational database, table, schema as columns and tuples as rows
● Some basic SQL statements such as CREATE, SELECT, INSERT, UPDATE, DELETE (simple query examples)
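The basic SQL statements listed in Unit 6 can be tried without installing a database server by using Python's built-in `sqlite3` module. This is only a stand-in: the practical component names MySQL, whose SQL dialect differs in some details. The table name `student`, its columns and the sample rows are hypothetical.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: the schema defines the columns of the table.
cur.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# INSERT: each tuple becomes one row.
cur.executemany(
    "INSERT INTO student (roll, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 72), (2, "Bikram", 58), (3, "Chitra", 91)],
)

# UPDATE and DELETE change or remove the rows that match the WHERE clause.
cur.execute("UPDATE student SET marks = 65 WHERE roll = 2")
cur.execute("DELETE FROM student WHERE roll = 1")

# SELECT retrieves the rows that match a condition.
cur.execute("SELECT name, marks FROM student WHERE marks > 60 ORDER BY marks DESC")
rows = cur.fetchall()
print(rows)

conn.close()
```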

Unit 7: Basics of Business Theory (5 marks) [NO LAB COMPONENT]

7. (Contact hours: 8, Marks: 5)
Business theory basics: different business models (B2B, B2C). Aggregator type business, manufacturing type business, consultancy and turnkey service based businesses, social media type and general digital platform type business, content hosting businesses. Definitions of profit, loss, revenue, break-even, valuation etc.
● The basic business types: product based and service based
● Business classification by clients: the B2B and B2C models
● Types of business that use DS extensively: software product and service, aggregator (cab, food delivery, groceries, online market), manufacturing and banking
● Consultancy type business and service profiling
● Social media business and targeted advertising based business model
● Basic business terminologies (refer to https://getsling.com/blog/business-terms/)

NB : Additional 10 hours for Remedial and/or Tutorial classes


CLASS: XI
SUBJECT: DATA SCIENCE ( DTSC )
COURSE CODE: PRACTICAL
FULL MARKS: 30 CONTACT HOURS: 60 HOURS
Sub Topic
1. Computer Fundamentals [No marks] (6 hours)
● Visit to Computer Lab and familiarization with computers, peripherals and different networking devices (e.g., modem, switch, router).
● Opening of the CPU box/cabinet and identification of different parts (e.g., Motherboard, CPU/Processor, RAM, Hard Disk, power supply).

2. Introduction to Python Programming [10 Marks]

2a. (3 Marks, 4 hours)
● Introduction to installing Python and running Python code, with hello world and simple examples of reading user input from the console
● Menu driven arithmetic calculator
● Simple logical and mathematical programs (e.g., printing patterns, conversion of binary to decimal and vice versa, computing GCD of two numbers, finding prime numbers, generating Fibonacci sequence, computing factorial (iterative and recursive) etc.)
● Finding max, min, avg, sum, length of a list
● Use of basic string methods like upper(), lower(), count(), find(), join(), replace(), split() etc.
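A few of the small programs named in 2a can be sketched as follows. The function names are illustrative, and each is only one of several reasonable ways to write the exercise.

```python
def gcd(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return a

def factorial_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_recursive(n):
    # Base case stops the recursion at 0! = 1! = 1.
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0 and 1."""
    sequence = []
    a, b = 0, 1
    for _ in range(count):
        sequence.append(a)
        a, b = b, a + b
    return sequence

def is_prime(n):
    """Trial division up to the square root of n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

numbers = [12, 7, 3, 20]
print(max(numbers), min(numbers), sum(numbers), len(numbers), sum(numbers) / len(numbers))
print("data science".upper(), "Data".lower(), "banana".count("a"), "a,b,c".split(","))
print(gcd(48, 18), factorial_iterative(5), fibonacci(7), [n for n in range(20) if is_prime(n)])
```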

2b. (5 Marks, 4 hours)
● Use of Python List methods for Stack and Queue implementation, for example, append() and pop()
● Use of NumPy array methods: arange(), shape(), ndim(), size(), add(), subtract(), multiply(), divide(), mat() etc.
● Use of NumPy matrix multiplication methods: dot(), matmul(), multiply() etc.
● Linear search and binary search in an array
● Bubble sort in an array

2c. (2 Marks, 4 hours)
Creating a data frame from a .csv file, Excel sheet, Python dictionary, Python list or tuple; operations on data frames.
3. Foundation for AI and Data Science [5 Marks] (10 hours)
● Generation of random numbers in Python following a certain distribution and filling up random arrays
● Introduction to matplotlib to plot arrays as histograms
● Computation of mean, median and mode
● Computing CDF from PDF and plotting using matplotlib
● Plotting Gaussian distribution with a given mean and standard deviation
● Plotting mixture of Gaussian distributions

4. Data Visualization [10 Marks] (12 hours)
Using the SciPy, OpenCV and NLTK libraries, run code for the following:
● Visualization of audio data as spectrogram
● Visualization of image data by zooming into pixels
● Visualization of word histograms

5. Database Management [5 Marks] (8 hours)
● Use of MySQL database for creating tables
● Running retrieval, insertion, deletion and update queries

NB : Additional 10 hours for Remedial and/or Tutorial classes


CLASS - XII
SEMESTER – III
SUBJECT: DATA SCIENCE ( DTSC )
FULL MARKS: 35 CONTACT HOURS: 60 Hours
COURSE CODE: THEORY

UNIT NO. | SUB TOPICS | CONTACT HOURS | MARKS
Unit 1: Foundation of statistics for machine learning (5 marks)

1. (Contact hours: 10, Marks: 5)
Distance between distributions: Euclidean norm, Pearson's correlation coefficient, basic concepts of (not in detail) chi-square distance, Bayes' theorem and Bayesian probability.
● Real n-dimensional space (R^n) and vector algebra, dot product of two vectors, vector projections
● Product moment correlation coefficient (Pearson's coefficient) and its use in determining the relation between two sets of data
● Chi-square and its use in finding distance between two distributions
● Conditional probability and Bayes' theorem, conditional independence
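Pearson's product moment correlation coefficient can be computed directly from its definition, without a library routine. The sketch below is one possible way to do it; the function name `pearson` and the age and income figures are hypothetical, and the code assumes neither list is constant (otherwise the denominator would be zero).

```python
import math

def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

ages = [25, 32, 41, 50, 58]        # hypothetical ages in years
incomes = [18, 27, 35, 46, 52]     # hypothetical incomes in thousands
print(round(pearson(ages, incomes), 3))
```

A value close to +1 indicates that the two sets of data rise together, a value close to -1 that one falls as the other rises, and a value near 0 that there is little linear relation.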

Unit 2: Introduction to machine learning (15 marks)

2a. (Contact hours: 18, Marks: 10)
• What is Machine Learning?
• Difference between traditional programming and machine learning
• Relation of machine learning with AI
• Applications of machine learning
• Why should machines have to learn? Why not design machines to perform as desired in the first place?
● Types of Machine Learning (Supervised, Unsupervised, Semi-supervised and Reinforcement learning)
● Concept of training, testing and validation, concept of training examples, Linear Regression with one variable, hypothesis representation, hypothesis space, Learning Requires Bias, concept of Loss function
● Training methods for linear regression model: iterative trial-and-error process that machine learning algorithms may use to train a model, disadvantages of iterative training method, gradient descent algorithm
● Effect of learning rate on reducing loss. Importance of feature scaling (min-max normalization)
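Gradient descent for linear regression with one variable, together with min-max normalization, can be sketched as below. The mean squared error loss, the function names, the learning rate and the noise-free sample data are all assumptions made for illustration; the syllabus does not fix any of them.

```python
def min_max_scale(values):
    """Rescale values to the range [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def fit_line(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dLoss/dw
        grad_b = (2 / n) * sum(errors)                             # dLoss/db
        w -= learning_rate * grad_w    # step against the gradient
        b -= learning_rate * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]                   # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
print(min_max_scale([10, 20, 30]))
```

A learning rate that is too large makes the loss grow instead of shrink, and one that is too small makes training slow; scaling features to a common range is one way to make a single learning rate work reasonably for all of them.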
2b. (Contact hours: 10, Marks: 5)
• What is a feature or attribute?
• Definition and meaning of feature in various kinds of data (e.g., structured data, unstructured data (text data, image data))
• Types of features (continuous, categorical)
• Representation of training examples with multiple features
• Linear regression with multiple attributes (multiple features)
• Feature cross and polynomial regression

Unit 3: Supervised learning (15 marks)

3a. (Contact hours: 12, Marks: 7)
● Difference between regression and classification. Examples of some real world classification problems
● Linear classification and threshold classifier, concept of input space and linear separator, drawback of threshold classifier, use of logistic function in defining hypothesis function for logistic regression model
● Probabilistic interpretation of output of the logistic regression model, use of logistic regression model in binary classification task. Multi-class classification using One vs all strategy
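The role of the logistic function in 3a can be seen in a small sketch: a linear score is squashed into a value between 0 and 1 that is read as a probability, and a threshold turns it into a class label. The weight `w`, bias `b` and the 0.5 threshold below are hand-picked assumptions, not trained values.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the positive class for a single feature x."""
    return sigmoid(w * x + b)

def predict_label(x, w, b, threshold=0.5):
    """Binary decision obtained by thresholding the probability."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

w, b = 1.0, -2.0                       # hypothetical parameters; boundary at x = 2
for x in (0, 1, 2, 3, 4):
    print(x, round(predict_proba(x, w, b), 3), predict_label(x, w, b))
```

Unlike a hard threshold classifier, the output changes smoothly with `x`, which is what makes gradient-based training possible.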
3b. (Contact hours: 4, Marks: 3)
Probabilistic classifier:
● Bayesian Learning, conditional independence
● Naive Bayes classifier

3c. (Contact hours: 6, Marks: 5)
Measuring classifier performance:
● Confusion matrix, true positive, true negative, false positive, false negative, error, accuracy, precision, recall, F-measure, sensitivity and specificity
● K-fold cross validation
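The performance measures in 3c all follow from the four cells of the confusion matrix. The sketch below counts them for a binary problem and derives the measures; the function names and the label lists are hypothetical, and the code assumes every denominator is non-zero.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Derive the standard measures from the four counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "error": (fp + fn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_measure": 2 * precision * recall / (precision + recall),
    }

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # hypothetical true labels
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # hypothetical classifier output
counts = confusion_counts(actual, predicted)
print(counts)
print(metrics(*counts))
```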

NB : Additional 10 hours for Remedial and/or Tutorial classes


CLASS - XII
SEMESTER – IV
SUBJECT: DATA SCIENCE ( DTSC )
FULL MARKS: 35 CONTACT HOURS: 60 HOURS
COURSE CODE: THEORY
UNIT NO. | SUB TOPICS | CONTACT HOURS | MARKS
Unit 4: Decision tree learning and Unsupervised learning (10 marks)

4a. (Contact hours: 12, Marks: 5)
● Concept of entropy for measuring purity (impurity) of a collection of training examples, and information gain as a measure of the effectiveness of an attribute in classifying the training data (just basics and equation)
● Inducing decision tree from the training data using ID3 algorithm, an illustrative example showing how the ID3 algorithm works
● Concept of overfitting, reduced error pruning
● Discretizing continuous-valued attributes using information gain-based method (binary split only)
● Differences between supervised and unsupervised learning
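Entropy and information gain, the two quantities ID3 uses to choose an attribute, can be computed in a few lines. The function names and the tiny weather-style dataset are hypothetical; each training example is assumed to be a dictionary mapping attribute names to values.

```python
import math
from collections import Counter

def entropy(labels):
    """Impurity of a collection of class labels, in bits."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the examples on one attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(subset) / total * entropy(subset)
                    for subset in subsets.values())
    return entropy(labels) - remainder

rows = [                               # hypothetical training examples
    {"windy": True,  "humid": True},
    {"windy": True,  "humid": False},
    {"windy": False, "humid": True},
    {"windy": False, "humid": False},
]
labels = ["no", "no", "yes", "yes"]    # hypothetical class labels

print(entropy(labels))
print(information_gain(rows, labels, "windy"))   # splits the classes perfectly
print(information_gain(rows, labels, "humid"))   # tells nothing about the class
```

ID3 would pick `windy` as the root here, because it has the larger information gain.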
4b. (Contact hours: 10, Marks: 5)
● What is unsupervised learning?
● Difference between supervised and unsupervised learning
● What is clustering?
● Why is clustering an unsupervised learning technique?
● Some examples of real world applications of clustering
● Difference between clustering and classification
● K-means clustering algorithm. Simple use cases
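The k-means algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster. The sketch below is a minimal pure-Python version for 2D points. Taking the first k points as the initial centroids is a simplifying assumption made here for reproducibility (random initialization is more common), and the function names and sample points are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iterations=100):
    """Return (centroids, clusters) after repeated assign and update steps."""
    centroids = [points[i] for i in range(k)]    # assumption: first k points as seeds
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [mean_point(cluster) if cluster else centroids[i]
                         for i, cluster in enumerate(clusters)]
        if new_centroids == centroids:           # nothing moved, so stop
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]   # two visible groups
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

No class labels are used anywhere in the procedure, which is why clustering counts as unsupervised learning.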
Unit 5: Data visualization technique (10 marks)

5. (Contact hours: 12, Marks: 10)
● What is the need for data visualization?
● Visualization techniques: visualization of a small number of attributes (Stem and leaf plots, 1D Histogram and 2D Histogram, Box Plots, Pie chart, Scatter Plots)
● Visualizing spatio-temporal data (Contour plots, Surface plots)
● Visualizing higher dimensional data (Plot of data matrix)
● Heatmap visualization
● Introduction to data visualization platforms: Tableau and Google Charts
Unit 6: Artificial neural network (10 marks)

6. (Contact hours: 20, Marks: 10)
● Biological motivation for Artificial Neural Networks (ANN)
● A simple mathematical model of a neuron (McCulloch and Pitts (1943))
● Concept of activation function: threshold function and sigmoid function
● Perceptron as a linear classifier, perceptron training rule
● Representations of AND and OR functions of two inputs using threshold perceptron. Equation of a linear separator in the input space, representational power of perceptrons
● Training unthresholded perceptron using Delta rule, need for hidden layers, XOR example
● Why do we need non-linearity? Network structures: feed-forward networks and recurrent networks (basic concept only)
● Training multilayer feed-forward neural networks using Backpropagation algorithm (concepts only and no derivation)
● Generalization, overfitting, and stopping criterion, overcoming the overfitting problem using a set of validation data
● An illustrative example of an ANN architecture for handwritten digit recognition (only input representation, output representation and a block diagram of the network)
● Need for automatic feature learning, difference between the conventional feed-forward neural networks and CNN, role of convolution layer in CNN, an example of 2D convolution, function of pooling layer
● A block diagram illustrating CNN applied to handwritten digit recognition task
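A threshold perceptron for the AND and OR functions of two inputs can be written with hand-picked weights, or learned with the perceptron training rule. In the sketch below the hand-picked weights, the learning rate, the number of epochs and the function names are illustrative assumptions; the output is 1 when the weighted sum plus bias is greater than zero.

```python
def predict(weights, bias, inputs):
    """Threshold unit: fire (1) when the weighted sum plus bias is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train_perceptron(samples, learning_rate=0.1, epochs=50):
    """Perceptron training rule: nudge weights by learning_rate * error * input."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_SAMPLES = [(x, int(x[0] and x[1])) for x in INPUTS]
OR_SAMPLES = [(x, int(x[0] or x[1])) for x in INPUTS]

# Hand-picked representations: the separator is x1 + x2 = 1.5 for AND, x1 + x2 = 0.5 for OR.
print([predict([1, 1], -1.5, x) for x in INPUTS])
print([predict([1, 1], -0.5, x) for x in INPUTS])

# The same functions learned from examples.
and_weights, and_bias = train_perceptron(AND_SAMPLES)
or_weights, or_bias = train_perceptron(OR_SAMPLES)
print([predict(and_weights, and_bias, x) for x in INPUTS])
print([predict(or_weights, or_bias, x) for x in INPUTS])
```

No choice of two weights and a bias reproduces XOR, because its positive and negative examples cannot be separated by a single line; that limitation is the motivation for hidden layers.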
Unit 7: Case studies in data science (5 marks)

7. (Contact hours: 6, Marks: 5)
Some case studies:
● Weather forecasting using some statistical and machine learning tools (consider the ML algorithms covered in the theoretical subjects)
● Sentiment analysis using some machine learning tools (consider the ML algorithms covered in the theoretical subjects)
● A simple collaborative filtering-based recommendation system

NB : Additional 10 hours for Remedial and/or Tutorial classes


CLASS: XII
SUBJECT: DATA SCIENCE ( DTSC )
COURSE CODE: PRACTICAL
FULL MARKS: 30 CONTACT HOURS: 60 HOURS

SUB TOPICS | CONTACT HOURS | MARKS

1. Foundation of Statistics for Machine Learning [2 Marks] (4 hours)
Consider a table of data about n persons with two attributes, age and income, and find the Pearson correlation coefficient using a Python program. Do not use any built-in library function for directly calculating the Pearson correlation coefficient.

2. Introduction to Machine Learning [5 Marks]

2a. (2 Marks, 4 hours)
● Introduction to Python libraries like SciPy
● Revisit matrix operations using SciPy (basic matrix operations of addition, subtraction, multiplication, transpose)

2b. (3 Marks, 6 hours)
• Generation of random (x, y) pairs where y = f(x) + d (d is a random value varying from -r to +r), f being a linear function
• Linear regression or line fitting of the data
• Optimizing the function using gradient descent

3. Supervised Learning [7 Marks] (10 hours)
● Loading csv file-based datasets using file-read operations in Python
● Introduction to the pandas library and loading csv and json files
● Building a logistic regression model for binary classification of the Diabetes dataset downloadable from the UCI Machine Learning Repository
● Building a decision tree classifier and testing on the Diabetes data
● Introduction to the IRIS dataset, building a logistic regression model for multi-class classification and testing the model on the IRIS dataset downloadable from the UCI Machine Learning Repository
● Building a K-nearest neighbor classifier and testing on the IRIS dataset
(Use the Scikit-learn open source data analysis library for implementing the models)

4. Unsupervised Learning [3 Marks] (8 hours)
Using the Scikit-learn library to apply the k-means algorithm for clustering IRIS data, and its visualization.

5. Data Visualization techniques [5 Marks] (12 hours)
Introduction to the plotly library in Python and plotting different types of plots using the library (refer to https://plotly.com/python/plotly-express/):
• Stem and leaf plots
• 1D Histogram of four attributes of the IRIS dataset
• 2D Histogram (considering the IRIS dataset, plot a 2D histogram of petal length and width)
• Box Plots (considering the IRIS dataset, show the box plots of the IRIS attributes by species)
• Pie chart showing the distribution of IRIS flowers (use the IRIS dataset)
• Scatter Plots for each pair of attributes of the IRIS dataset
• Heatmap

6. Artificial Neural Network [5 Marks] (10 hours)
● Using MLP from the Scikit-learn library, develop a handwritten digit recognition model on the MNIST dataset
● Using CNN from the Keras library, develop a handwritten digit recognition model on the MNIST dataset

7. Case studies in Data Science [3 Marks] (6 hours)
Case study: sentiment analysis of movie reviews. Use machine learning tools from the Scikit-learn library and the IMDB dataset.
