
Final Data Science Syllabus WBCHSE XIandXII

The document outlines the curriculum for Data Science classes XI and XII, detailing the theoretical and practical components of the syllabus. It covers topics such as computer fundamentals, Python programming, AI and data science foundations, data visualization, database management, and machine learning concepts. The assessment structure includes full marks distribution for theory and practical components across various subjects.

Uploaded by

Tanweer Ahmed
Copyright © All Rights Reserved

Data Science

Class XI

Full Marks - 100


Theory - 70
Practical - 30

1. Computer Fundamentals [ 15 Marks ]

1a History of computer, Basic computer hardware, Basic computer architecture, input and output devices, 5
memory and CPU, networking of machines (overview of LAN, MAN, WAN, Internet, Wi-Fi etc.), types of
computer (workstation, desktop, smartphone, embedded system, etc.), Overview of Software (system
software and application software with examples (mention names only)), Definition of Operating
System and functions (mention names of some popular operating systems like Windows, Linux,
Android, etc.).

1b Bit, Byte and Word, Number System (Base, Binary, Decimal, Octal, Hexadecimal), Conversion of 5
number systems, Boolean logic (Boolean gates), Boolean operators (OR, AND, and NOT), ASCII
code, Concept of Algorithm and Flowchart
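
The number-system conversions listed in 1b can be sketched in Python, the language introduced in unit 2. This is a minimal illustration, not part of the syllabus text: the names `to_base` and `from_base` are hypothetical, and Python's built-ins `bin()`, `oct()`, `hex()` and `int(text, base)` cover the same ground.

```python
DIGITS = "0123456789ABCDEF"  # digit symbols for bases 2 to 16


def to_base(number, base):
    """Convert a non-negative decimal integer to a digit string in the given base."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(DIGITS[number % base])  # remainder gives the next digit, right to left
        number //= base
    return "".join(reversed(digits))


def from_base(text, base):
    """Convert a digit string in the given base back to a decimal integer."""
    value = 0
    for ch in text.upper():
        value = value * base + DIGITS.index(ch)  # shift left by one position, add the digit
    return value


print(to_base(25, 2))      # 11001
print(to_base(255, 16))    # FF
print(from_base("17", 8))  # 15
```

Repeated division by the base produces the digits from least to most significant, which is why the list is reversed before joining.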

1c Basics of Computer Programming (three levels: high level language, assembly language, machine 5
language, definition and block diagrams), Overview of Compiler and Interpreter (definition and
mention name of major compiled (e.g., C, C++) and interpreted languages (e.g., Python)),
Overview of procedural and object oriented programming (key features and just the basic
differences, mention names of some popular procedural (e.g., BASIC, FORTRAN, C) and object
oriented programming languages (e.g., C++, Java, Python)).

2. Introduction to Python Programming [15 Marks ]

2a Basics of Python programming (with a simple 'hello world' program, process of writing a program, 5
running it, and print statement), Concept of class and object, Data-types (integer, float, string),
notion of a variable, Operators (assignment, logical, arithmetic etc.), accepting input from console,
conditional statements (If else and Nested If else), Collections (List, Tuple, Sets and Dictionary),
Loops (For Loop, While Loop & Nested Loops), iterator, String and fundamental string operations
(compare, concatenation, sub string etc.), Function, recursion.

2b Overview of linear and nonlinear data structure (definition, schematic view and difference), array 6
(1D, 2D and its relation with matrix, basic operations: access elements using index, insert, delete,
search), stack (concept of LIFO, basic operations: Push, Pop, peek, size), queue (concept of FIFO,
basic operations: Enqueue, Dequeue, peek, size), use of List methods in python for basic
operations on array, stack and queue, overview of NumPy library and basic array operations
(arange(), shape, ndim, dtype etc.), binary tree (definition and schematic view only).
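
As a rough sketch of the stack and queue operations in 2b, the following uses only Python list methods, as the syllabus suggests. The variable names are illustrative. Note that `list.pop(0)` shifts every remaining element, so for large queues `collections.deque` is the usual alternative (an aside, not something the syllabus requires).

```python
# Stack (LIFO): the end of the list is the top of the stack
stack = []
stack.append(10)          # push
stack.append(20)          # push
stack.append(30)          # push
top = stack[-1]           # peek: look at the top without removing it
popped = stack.pop()      # pop: removes and returns the last item pushed
stack_size = len(stack)   # size

# Queue (FIFO): items join at the end and leave from the front
queue = []
queue.append("a")         # enqueue
queue.append("b")         # enqueue
queue.append("c")         # enqueue
front = queue[0]          # peek: look at the front without removing it
served = queue.pop(0)     # dequeue: removes and returns the oldest item
queue_size = len(queue)   # size

print(popped, stack)      # 30 [10, 20]
print(served, queue)      # a ['b', 'c']
```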

2c Linear search and binary search algorithm, sorting algorithm (bubble sort only) 4
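
The three algorithms named in 2c can be sketched as follows. This is one straightforward way to write them, with hypothetical function names. Binary search assumes the list is already sorted in ascending order.

```python
def linear_search(items, target):
    """Check each element in turn; return its index, or -1 if absent."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range of a sorted list on every step."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # target can only be in the right half
        else:
            high = mid - 1   # target can only be in the left half
    return -1


def bubble_sort(items):
    """Return a sorted copy by repeatedly swapping adjacent out-of-order pairs."""
    data = list(items)
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):   # the last i items are already in place
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:              # no swaps means the list is sorted
            break
    return data


marks = [42, 17, 88, 5, 63]
ordered = bubble_sort(marks)
print(ordered)                     # [5, 17, 42, 63, 88]
print(linear_search(marks, 88))    # 2
print(binary_search(ordered, 63))  # 3
```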
3. Foundation for AI and Data Science [ 20 Marks ]

3a History of AI: Alan Turing and cracking Enigma, Mark 1 machines, 1956 and the birth of the term AI, 2
the AI winter of the 1970s, expert systems of the 1980s, skipped journey of present day AI. Distinction
between the terms AI, Pattern Recognition and Machine Learning
(Note: it should be taught as a story rather than as a flow of information: World War II, Enigma
and Alan Turing, the birth of modern computers)

3b Brief history of data science, data science as a conjunction of computer science, statistics and 3
domain knowledge. Definition of data science, data science life cycle - capture, maintain, process,
analyze, communicate

3c Introduction to linear algebra and statistics for DS: 8
● Basic matrix operations like matrix addition, subtraction, multiplication, transpose of
matrix, identity matrix
● A brief introduction to vectors, unit vector, normal vector, Euclidean space
● Probability distribution, frequency, mean, median and mode, variance and standard
deviation, Gaussian distribution
● Random sampling from the uniform distribution and Student's t-distribution, hypothesis testing
● Distance function, Euclidean norm, distance between two points in 2D and 3D and
extension of the idea to n dimensions
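
The descriptive statistics and the distance function in 3c can be illustrated with Python's standard library. This is a small sketch on made-up numbers: the `scores` list and the function name `euclidean_distance` are assumptions for the example. Population variance and standard deviation are used here; `statistics.variance` and `statistics.stdev` give the sample versions.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = statistics.mean(scores)
median = statistics.median(scores)       # average of the two middle values for an even count
mode = statistics.mode(scores)           # most frequent value
variance = statistics.pvariance(scores)  # mean of squared deviations from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance


def euclidean_distance(p, q):
    """Distance between two points with the same number of coordinates (2D, 3D or n-D)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))        # 5.0
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0
```

The same `euclidean_distance` function works unchanged for any number of dimensions, which is the "extension to n dimensions" mentioned above.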

3d. Basic ideas of different Data Science Toolkit: Excel, Weka , R 7

4. Data Visualization [ 10 Marks ]

4 Types of data: textual data (reviews, comments, blogs), signal data (time series, audio, sensor data), 10
visual data (image and video, remote sensing data, feeds etc.). Introduction to data dimension and
modality, their representations in computer science. Data cleaning
● Representation of data in textual form, tokens, sentences, word histograms, reading from
web pages using crawlers
● Representation format of audio data, uncompressed wav format and compressed mp3
format (just the description of the pipeline, no maths)
● Representation of visual data in RGB pixels, storing in raw format and compressed
format (just the description of the pipeline, no maths)
● Representation of other forms of data like time series values from different sensors,
remote sensing image data etc.
● Introduction to the concept of multimodality i.e. different modes of data from the same
information source (example audio and video generated when filming)
● Data dimension (resolution for image, frequency bins and sampling rate for audio, word
histograms for text)
● Concept of data cleaning, removal of abnormal, incomplete and corrupted or garbage data
as a preprocessing stage.
5. Database Management [ 5 Marks ]

5 Brief introduction to relational database, tables for keeping data, brief introduction to SQL 5
● Introduction to the concept of database
● Relational database, table, schema as columns and tuple as rows
● Some basic SQL statements such as CREATE, SELECT, INSERT, UPDATE, DELETE
(Simple query examples)
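
The basic SQL statements above can be tried out as in the sketch below. It is an illustration under stated assumptions: the practical part of the syllabus uses MySQL, while this example swaps in Python's built-in `sqlite3` module so that it runs without a database server. The `student` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
cur = conn.cursor()

# CREATE: the schema names the columns; each row (tuple) is one student
cur.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# INSERT: add three rows
cur.executemany(
    "INSERT INTO student (roll, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 72), (2, "Bimal", 58), (3, "Chitra", 91)],
)

# UPDATE: change one value in an existing row
cur.execute("UPDATE student SET marks = 65 WHERE roll = 2")

# DELETE: remove a row
cur.execute("DELETE FROM student WHERE roll = 1")

# SELECT: retrieve what is left
cur.execute("SELECT roll, name, marks FROM student ORDER BY roll")
rows = cur.fetchall()
print(rows)  # [(2, 'Bimal', 65), (3, 'Chitra', 91)]

conn.close()
```

The SQL strings themselves are standard and should carry over to MySQL with little or no change.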

6. Basics of Business Theory [ 5 Marks ]

6 Business theory basics: Different business models B2B, B2C. Aggregator type business, 5
manufacturing type business, consultancy and turnkey service based businesses, social media type
and general digital platform type business, content hosting businesses.
Definitions of profit, loss, revenue, break-even, valuation etc. [NO LAB COMPONENT]
● The basic business types, product based and service based
● Business classification by clients, the B2B and B2C models
● Types of business who use DS extensively: software product and service, aggregator (cab,
food delivery, groceries, online market), manufacturing and banking
● Consultancy type business and service profiling
● Social media business and targeted advertisement based business model
● Basic business terminologies, refer to https://getsling.com/blog/business-terms/

Practical

1. Computer Fundamentals [ No marks ]

1a ● Visit to Computer Lab and familiarization with computers and peripherals and different
networking devices (e.g., modem, switch, router).
● Opening of the CPU box/cabinet and identification of different parts (e.g., Motherboard,
CPU/Processor, RAM, Hard Disk, power supply).

2. Introduction to Python Programming [ 10 Marks ]

2a ● Introduction to installing Python and running Python code, with hello world and simple 3
examples of accepting user input from the console.
● Menu driven arithmetic calculator
● Simple logical and mathematical programs (e.g., printing patterns, Conversion of binary
to decimal and vice versa, computing GCD of two numbers, Finding prime numbers,
Generating Fibonacci sequence, Computing factorial –iterative and recursive etc.)
● Finding max, min, avg, sum, length of a list
● Use of basic string methods like upper(), lower(), count(), find(), join(), replace(), split()
etc.
2b ● Use of Python List methods for Stack and Queue implementation, for examples, 7
append() and pop()
● Use of NumPy array functions and attributes: arange(), shape, ndim, size, add(), subtract(),
multiply(), divide(), mat() etc.
● Use of NumPy matrix multiplication methods: dot(), matmul(), multiply() etc.
● Linear search and binary search in an array
● Bubble sort in an array

3. Foundation for AI and Data Science [ 5 Marks ]

3 ● Generation of random numbers in Python following a certain distribution and filling up 5
random arrays
● Introduction to matplotlib to plot arrays as histograms
● Computation of mean, median and mode
● Computing CDF from PDF and plotting using matplotlib
● Plotting Gaussian distribution with a given mean and standard deviation
● Plotting mixture of Gaussian distributions

4. Data Visualization [ 10 marks ]

4 Using Scipy, opencv and NLTK libraries run codes for the following 10
● Visualization of audio data as spectrogram
● Visualization of image data by zooming into pixels
● Visualization of word histograms

5. Database Management [5 marks ]

5 ● Use of MySQL database for creating tables 5
● Running retrieval, insertion, deletion and update queries

Data Science
Class XII

Full Marks - 100


Theory - 70
Practical - 30

1. Foundation of Statistics for Machine Learning [ 5 Marks ]

1 Distance between distributions - Euclidean norm, Pearson correlation coefficient, basic concepts 5
(not in detail) of chi-square distance, Bayes' theorem and Bayesian probability
● Real n-dimensional space (R^n) and vector algebra, dot product of two vectors, vector
projections.
● Product moment correlation coefficient (Pearson's coefficient) and its use in determining the
relation between two sets of data
● Chi-square distance and its use in finding the distance between two distributions
● Conditional probability and Bayes' theorem, conditional independence
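
Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), can be made concrete with a short worked example. The sketch below is illustrative only: the function name `bayes_posterior` and the medical-test numbers are hypothetical, chosen to show how a small prior keeps the posterior low even for a fairly accurate test.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) computed with Bayes' theorem.

    prior               = P(condition)
    likelihood          = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Total probability of a positive test, over both cases
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have the condition, the test detects
# 90% of true cases and wrongly flags 5% of healthy people.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(posterior, 4))  # 0.1538
```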

2. Introduction to Machine Learning [ 15 Marks ]

2a • What is machine learning? 10
• Difference between traditional programming and machine learning
• Relation of machine learning with AI
• Applications of machine learning.
• Why should machines have to learn? Why not design machines to perform as desired in
the first place?
• Types of Machine Learning (Supervised, Unsupervised, Semi-supervised and
reinforcement learning),
● Concept of training, testing and validation, Concepts of training examples, Linear
Regression with one variable , hypothesis representation, hypothesis space, Learning
Requires Bias, Concept of Loss function
● Training methods for linear regression model: Iterative trial-and-error process that
machine learning algorithms may use to train a model, Disadvantages of iterative
training method, gradient descent algorithm.
● Effect of learning rate on reducing loss. Importance of feature scaling (min-max
normalization)
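
Training a one-variable linear regression model by gradient descent, together with min-max normalization, can be sketched as follows. This is a minimal illustration under stated assumptions: the hypothesis is h(x) = w*x + b, the loss is mean squared error, and the function names, data, learning rate and step count are all chosen for the example.

```python
def predict(w, b, x):
    """Hypothesis for linear regression with one variable."""
    return w * x + b


def mse_loss(w, b, xs, ys):
    """Mean squared error between predictions and targets."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly move w and b a small step against the gradient of the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def min_max_scale(values):
    """Rescale values to the range [0, 1]; assumes they are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))        # 2.0 1.0
print(min_max_scale([10, 20, 30]))     # [0.0, 0.5, 1.0]
```

A learning rate that is too large makes the updates overshoot and the loss grow; one that is too small makes training slow. Feature scaling helps because it keeps the gradients for different features on a comparable scale.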

2b • What is a feature or attribute? 5
• Definition and meaning of feature in various kinds of data (e.g., structured data,
unstructured data (text data, image data))
• Types of features (continuous, categorical)
• Representation of training examples with multiple features
• Linear regression with multiple attributes (multiple Features)
• Feature cross and polynomial regression
3. Supervised Learning [ 20 Marks ]

3a ● Difference between regression and classification. Examples of some real world 7
classification problems
● Linear classification and threshold classifier, Concept of input space and linear separator,
Drawback of threshold classifier, use of logistic function in defining hypothesis function
for logistic regression model.
● Probabilistic interpretation of output of the logistic regression model, use of logistic
regression model in binary classification task. Multi-class classification using the one-vs-all
strategy.
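
The role of the logistic function in 3a can be sketched in a few lines. This is an illustration, not a trained model: the weights, bias and feature values are hypothetical, and the 0.5 threshold is the conventional default.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Hypothesis of logistic regression: estimated probability that the class is 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def classify(weights, bias, features, threshold=0.5):
    """Binary decision: class 1 if the estimated probability reaches the threshold."""
    return 1 if predict_probability(weights, bias, features) >= threshold else 0


weights, bias = [1.0, -1.0], 0.0  # hypothetical parameters, not learned from data
print(sigmoid(0))                              # 0.5
print(classify(weights, bias, [2.0, 1.0]))     # 1
print(classify(weights, bias, [1.0, 3.0]))     # 0
```

For the one-vs-all strategy, one such model is trained per class and the class whose model gives the highest probability is chosen.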

3b Decision tree learning: 8
● Concept of entropy for measuring the purity (impurity) of a collection of training examples,
and information gain as a measure of the effectiveness of an attribute in classifying the
training data (just basics and equation).
● Inducing a decision tree from the training data using the ID3 algorithm, an illustrative example
showing how the ID3 algorithm works.
● Concept of overfitting, reduced error pruning
● Discretizing continuous-valued attributes using an information gain-based method
(binary split only)
Probabilistic classifier:
● Bayesian learning, conditional independence
● Naive Bayes classifier
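
Entropy and information gain, the two quantities ID3 uses to choose an attribute, can be computed as in the sketch below. The function names and the four-row weather-style dataset are hypothetical and only meant to show the arithmetic.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits: 0 for a pure collection, 1 for an even two-class split."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute_index):
    """Reduction in entropy obtained by splitting the examples on one attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the subsets produced by the split
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical training examples: (outlook, wind) -> play?
rows = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"), ("rain", "strong")]
labels = ["no", "no", "yes", "yes"]

print(entropy(labels))                    # 1.0
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0
```

Here the first attribute separates the classes perfectly, so ID3 would place it at the root; the second attribute tells us nothing about the label.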

3c Measuring classifier performance: 5
● Confusion matrix, true positive, true negative, false positive, false negative, error,
accuracy, precision, recall, F-measure, sensitivity and specificity
● K-fold cross validation
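
The performance measures in 3c all follow from the four confusion-matrix counts. The sketch below shows the calculations on a made-up set of ten predictions. The function names are hypothetical, and the code assumes none of the denominators is zero.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, true negatives, false positives and false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn


def classifier_metrics(tp, tn, fp, fn):
    """Standard measures; assumes no denominator is zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "error": (fp + fn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Hypothetical labels for ten test examples
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
metrics = classifier_metrics(*counts)
print(counts)               # (3, 5, 1, 1)
print(metrics["accuracy"])  # 0.8
```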

4. Unsupervised Learning [ 5 Marks ]

4 • What is unsupervised learning? 5
• Difference between supervised and unsupervised learning.
• What is clustering?
• Why is clustering an unsupervised learning technique?
• Some examples of real world applications of clustering
• Difference between clustering and classification
• K-means clustering algorithm. Simple use cases
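
The k-means algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a plain-Python illustration under stated assumptions: the initial centroids are passed in explicitly so the result is reproducible (in practice they are usually chosen at random), and the six 2D points are made up.

```python
import math


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, centroids, max_iterations=100):
    """Return final centroids and the cluster index assigned to each point."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        assignments = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # nothing moved, so the algorithm has converged
            break
        centroids = new_centroids
    return centroids, assignments


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

No labels are used anywhere in the procedure, which is what makes clustering an unsupervised technique.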
5. Data visualization techniques [10 Marks ]

5 ● What is the need for data visualization? 10
● Visualization techniques: visualization of a small number of attributes (Stem and leaf
plots, 1D Histogram and 2D Histogram, Box Plots, Pie chart, Scatter Plots)
● Visualizing Spatio-temporal Data (Contour plots, Surface plots)
● Visualizing higher dimensional data (Plot of data matrix)
● Heatmap visualization
● Introduction to data visualization platforms: Tableau and Google Charts

6. Artificial Neural Network [ 10 Marks ]

6 ● Biological motivation for Artificial Neural Networks (ANN) 10
● A simple mathematical model of a neuron (McCulloch and Pitts(1943))
● Concept of activation function: threshold function and Sigmoid function,
● Perceptron as a linear classifier, perceptron training rule
● Representations of AND and OR functions of two inputs using threshold perceptron.
Equation of a linear separator in the input space, Representational power of perceptrons
● Training unthresholded perceptron using the Delta rule, Need for hidden layers, XOR
example
● Why do we need non-linearity? Network structures: feed forward networks and recurrent
networks (basic concept only)
● Training multilayer feed-forward neural networks using the Backpropagation algorithm
(Concepts only and no derivation).
● Generalization, overfitting, and stopping criterion, overcoming the overfitting problem
using a set of validation data
● An Illustrative example of an ANN architecture for hand written digit recognition (Only
input representation, output representation and a block diagram of the network)
● Need for automatic feature learning, difference between the conventional feed-forward
neural networks and CNN, role of convolution layer in CNN, An example of 2D
convolution, function of pooling layer
● A block diagram illustrating CNN applied to handwritten digit recognition task
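
The threshold perceptron and its training rule, applied to the AND and OR functions of two inputs, can be sketched as follows. This is a minimal illustration: the function names, the zero initial weights, the learning rate of 0.5 and the epoch count are assumptions made for the example.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def perceptron_output(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, learning_rate=0.5, epochs=20):
    """Perceptron training rule: w <- w + rate * (target - output) * x."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron_output(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_samples = list(zip(inputs, [0, 0, 0, 1]))
or_samples = list(zip(inputs, [0, 1, 1, 1]))

and_weights, and_bias = train_perceptron(and_samples)
or_weights, or_bias = train_perceptron(or_samples)

and_outputs = [perceptron_output(and_weights, and_bias, x) for x in inputs]
or_outputs = [perceptron_output(or_weights, or_bias, x) for x in inputs]
print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```

The learned weights and bias define a straight line, w1*x1 + w2*x2 + b = 0, that separates the two classes in the input space. No such line exists for XOR, which is why a single perceptron cannot represent it and hidden layers are needed.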

7. Case studies in Data Science [ 5 Marks ]

7 Some case studies: 5
• Weather forecasting using some statistical and machine learning tools (consider the ML
algorithms covered in the theoretical subjects)
• Sentiment Analysis using some machine learning tools (consider the ML algorithms
covered in the theoretical subjects)
• A simple collaborative filtering-based recommendation System
Practical
1. Foundation of Statistics for Machine Learning [ 2 marks ]

1 Consider a table of data about n persons with two attributes-age and income and find Pearson 2
correlation coefficient using a python program. Do not use any built-in library function for
directly calculating Pearson correlation coefficient.
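
One possible shape for this program is sketched below. It computes the Pearson coefficient from its definition, without a library correlation function. The function name `pearson` and the age and income figures are hypothetical, and the code assumes neither attribute is constant (otherwise the denominator is zero).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each attribute
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical table of five persons
ages = [25, 30, 35, 40, 45]
incomes = [20000, 26000, 31000, 38000, 45000]

r = pearson(ages, incomes)
print(round(r, 3))  # 0.998
```

A value close to +1 indicates that income rises almost linearly with age in this made-up table; a value close to -1 would indicate the opposite trend, and a value near 0 no linear relation.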

2. Introduction to Machine Learning [ 5 Marks ]

2a ● Introduction to Python libraries like SciPy 2
● Revisit matrix operations using SciPy (basic matrix operations of addition, subtraction,
multiplication, transpose)

2b • Generation of random (x, y) pairs where y = f(x) + d (d being a random value varying from -r to +r), 3
f being a linear function
• Linear regression or line fitting of the data
• Optimizing the function using gradient descent

3. Supervised Learning [ 7 Marks ]

3a ● Loading CSV file-based datasets using file-read operations in Python 7
● Introduction to pandas library and loading csv and json files
● Building Logistic regression model for binary classification of Diabetes
Data set downloadable from the UCI machine learning repository
● Building a decision tree classifier and testing on the Diabetes Data
● Introduction to the IRIS dataset, building a logistic regression model for multi-class
classification and testing the model on the IRIS dataset downloadable from UCI Machine
Learning Repository
● Building K-nearest neighbor classifier and testing on the IRIS dataset
(Use Scikit-learn open source data analysis library for implementing the models)

4. Unsupervised Learning [ 3 Marks ]

4 Using Scikit-learn library to use k-means algorithm for clustering IRIS data and its visualization 3
5. Data Visualization techniques [ 5 Marks ]

5 Introduction to the plotly library in Python and plotting different types of plots using the library 5
(refer to https://plotly.com/python/plotly-express/):
• Stem and leaf plots
• 1D Histogram of four attributes of the IRIS dataset
• 2D Histogram( considering the IRIS dataset, plot 2D histogram of petal length and
width)
• Box Plots (considering the IRIS dataset, show the box plots of each attribute grouped by
species)
• Plot the Pie chart, showing the distribution of IRIS flowers (use IRIS dataset)
• Scatter Plots for each pair of attributes of the IRIS dataset
• Heatmap

6. Artificial Neural Network [ 5 Marks ]

6 ● Using MLP from the Scikit-learn library, develop a handwritten digit recognition model on the 5
MNIST dataset
● Using CNN from the Keras library, develop a handwritten digit recognition model on the
MNIST dataset

7. Case studies in Data Science [3 Marks]

7 Case Study: sentiment analysis of movie reviews. Use machine learning tools from Scikit-learn 3
library and the IMDB dataset

Text Books:
1. Fundamentals of Computers, E. Balagurusamy, McGraw Hill, 2009
2. Artificial Intelligence: A Modern Approach, Stuart Russell, Peter Norvig, Pearson Education
3. Machine Learning, Peter Flach, Cambridge University Press
4. Machine Learning, Tom Mitchell, McGraw Hill, 1997
5. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures, Claus Wilke,
O'Reilly Media, 2019
6. Introduction to Machine Learning with Python, Andreas C. Müller, Sarah Guido, O'Reilly Media
