Class 22TT – Term I/2024-2025
Course: CS420 – Artificial Intelligence
Homework 04
Submission Notices:
Conduct your homework by filling answers into the placeholders given in this file (in Microsoft Word format).
Questions are shown in black color, instructions/hints are shown in italic and blue color, and your content
should use any color that is different from those.
After completing your homework, prepare the file for submission by exporting the Word file (filled with
answers) to a PDF file, whose filename follows the following format,
<StudentID-1>_<StudentID-2>_HW04.pdf (Student IDs are sorted in ascending order)
E.g., 2312001_2312002_HW04.pdf
and then submit the file to Moodle directly WITHOUT any kinds of compression (.zip, .rar, .tar, etc.).
Note that you will get zero credit for any careless mistake, including, but not limited to, the following things.
1. Wrong file/filename format, e.g., not a PDF file, using “-” instead of “_” as the separator, etc.
2. Disordered arrangement of problems and answers
3. Homework not written in English
4. Cheating, i.e., copying other students’ work or letting other student(s) copy your work.
Problem 1. (2.5pts) Answer each of the following questions with a detailed explanation.
Please write your answers in the table.
Questions (0.5pt each) and Answers

1. Name and briefly describe three advanced technologies in AI mentioned in the slides.
+ Generative AI: AI models that can generate new, original content using generative models.
+ Text generation: models that generate relevant text based on a given prompt.
+ Text-to-image: text-to-image AI converts natural-language descriptions into artistic images.
2. What is the key difference between supervised learning and unsupervised learning?
A supervised learning model learns from labeled examples to map inputs to outputs, while an unsupervised learning model learns from unlabeled examples to describe patterns and insights in the data.
3. What is the primary function of the perceptron algorithm in machine learning, and how does it update its weights during the training process?
The perceptron algorithm is mainly used for binary classification: it divides the n-dimensional input space into two decision regions by a hyperplane defined by a linear separating function. During training, each time an example (x, y) is misclassified, the weights are moved toward classifying it correctly: w ← w + y·x.
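The update rule above can be illustrated with a minimal sketch (a toy example of my own, not taken from the slides): a perceptron for labels in {−1, +1} that only touches the weights when a point is misclassified.

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Perceptron for labels in {-1, +1}; the bias is folded into the weights."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified (or exactly on the hyperplane)
                w += yi * xi        # move the hyperplane toward the example
    return w

# Linearly separable toy data: the sign of the first coordinate decides the class.
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -1.5]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
preds = np.sign(np.hstack([np.ones((len(X), 1)), X]) @ w)
```

On linearly separable data such as this, the update rule is guaranteed to converge to a separating hyperplane.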
4. In the context of backpropagation, explain how the chain rule of calculus is applied to compute the gradients of the loss function with respect to each weight in a neural network. How does this process help in updating the weights during the training of a multi-layer neural network using gradient descent?
The chain rule of calculus is essential for this process, as it allows the gradients to be calculated layer by layer in a multi-layer network. It computes the gradient of the loss function with respect to each weight by breaking it into smaller components.
+ Forward pass: the network computes the output y by applying weights and activation functions to the input data.
+ Backward pass: to update a weight w, we calculate the gradient of the loss with respect to w. Using the chain rule, we can break that gradient into a product of partial derivatives:
∂L/∂w = (∂L/∂z) · (∂z/∂w),
where z is the weighted input to the next layer.
The gradients are then propagated backward layer by layer: the gradient at each layer depends on the gradients of the subsequent layers. Gradient descent then uses these gradients to move every weight in the direction that decreases the loss.
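The layer-by-layer factorisation can be checked numerically. Below is a minimal sketch (a hypothetical two-weight network of my own, not from the slides): the gradients obtained via the chain rule are compared against finite-difference approximations.

```python
import numpy as np

# Tiny two-layer network: h = tanh(w1 * x), yhat = w2 * h, loss L = (yhat - y)^2.
def loss(x, y, w1, w2):
    return (w2 * np.tanh(w1 * x) - y) ** 2

def gradients(x, y, w1, w2):
    z = w1 * x
    h = np.tanh(z)
    yhat = w2 * h
    dL_dyhat = 2 * (yhat - y)           # dL/dyhat
    dL_dw2 = dL_dyhat * h               # chain rule: dL/dw2 = dL/dyhat * dyhat/dw2
    dL_dz = dL_dyhat * w2 * (1 - h**2)  # propagate back through tanh
    dL_dw1 = dL_dz * x                  # chain rule: dL/dw1 = dL/dz * dz/dw1
    return dL_dw1, dL_dw2

x, y, w1, w2, eps = 0.5, 1.0, 0.3, -0.7, 1e-6
g1, g2 = gradients(x, y, w1, w2)
num1 = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
num2 = (loss(x, y, w1, w2 + eps) - loss(x, y, w1, w2 - eps)) / (2 * eps)
```

The analytic and numerical gradients agree to high precision, which is exactly the layer-by-layer decomposition described above.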
5. Explain the differences between batch gradient descent, stochastic gradient descent (SGD), and mini-batch gradient descent.
The difference between these algorithms lies in how much data is used for each parameter update.
+ Batch gradient descent: it computes the gradient over the entire dataset to perform a single update per iteration.
+ Stochastic gradient descent: it computes the gradient using a single randomly chosen sample per update.
+ Mini-batch gradient descent: it computes the gradient using a small batch of k samples per update.
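The three variants can be sketched with a single training loop in which only the batch size changes (a noiseless least-squares toy problem of my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noiseless least-squares toy problem: minimise mean((X w - y)^2).
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def gradient(w, Xb, yb):
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

def train(batch_size, steps=1000, lr=0.05):
    w = np.zeros(3)
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        w -= lr * gradient(w, X[idx], y[idx])
    return w

w_batch = train(batch_size=len(X))  # batch GD: the whole dataset per update
w_sgd = train(batch_size=1)         # SGD: one random sample per update
w_mini = train(batch_size=16)       # mini-batch GD: k = 16 samples per update
```

All three recover w_true here; they differ in the cost per update and in how noisy each step is.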
Problem 2. (2pts) For each of the statements below, choose either true or false, and then provide a detailed explanation.
Please fill in your answers in the table below.
Statements (0.5pt each), True/False, and Explanations

1. In reinforcement learning, a reward is a number returned at a certain step of the Markov Decision Process. The reward is not allowed to depend on state and action.
False. The reward can depend on the state and the action: the reward function determines the immediate feedback received in state s after taking action a and transitioning to state s’.
2. In reinforcement learning, a human user is typically needed to provide feedback to assess whether the predicted value is correct or incorrect. This feedback helps the algorithm learn and refine its understanding of the problem space.
False. A human user is not typically needed to provide feedback during the learning process: the algorithm learns by interacting with the environment and receiving rewards as feedback.
3. Inverse reinforcement learning (which allows helicopters to fly autonomously) focuses mainly on learning the expert’s reward function.
True. Inverse reinforcement learning primarily aims to learn the expert’s reward function from observed trajectories; the core idea is to understand what motivates the expert’s actions.
4. Reinforcement contingencies significantly influence which behaviors individuals are likely to engage in voluntarily.
True. Reinforcement contingencies play a crucial role in shaping voluntary behavior, because individuals are more likely to engage in actions that lead to positive outcomes or rewards.
Problem 3. (2.5pts) The following table is a dataset used for training a decision tree. The labels of each
sample can be “yes” or “no” given three features X, Y, and Z. Your task is to build an ID3 decision tree
by splitting by information gain (draw the resulting tree), then answer the questions.
X Y Z Label
0 0 0 yes
0 0 1 yes
0 1 0 yes
0 1 1 no
1 0 0 no
1 0 1 no
1 1 0 yes
1 1 1 no
a. (1pt) Draw the decision tree below:
Your tree (reconstructed from the computations in parts b and c; at ties in information gain, either of the tying features may be chosen):

X
├─ 0 → Y
│      ├─ 0 → yes
│      └─ 1 → Z
│             ├─ 0 → yes
│             └─ 1 → no
└─ 1 → Y
       ├─ 0 → no
       └─ 1 → Z
              ├─ 0 → yes
              └─ 1 → no
b. (1pt) Which features can be the root of the tree? Explain why.
Answer:
H(S) = −(4/8)·log2(4/8) − (4/8)·log2(4/8) = 1

AE_X = (4/8)·[−(3/4)·log2(3/4) − (1/4)·log2(1/4)] + (4/8)·[−(1/4)·log2(1/4) − (3/4)·log2(3/4)] = 0.81127
I_X = 1 − 0.81127 = 0.18873

AE_Y = (4/8)·[−(2/4)·log2(2/4) − (2/4)·log2(2/4)] + (4/8)·[−(2/4)·log2(2/4) − (2/4)·log2(2/4)] = 1
I_Y = 1 − 1 = 0

AE_Z = (4/8)·[−(3/4)·log2(3/4) − (1/4)·log2(1/4)] + (4/8)·[−(1/4)·log2(1/4) − (3/4)·log2(3/4)] = 0.81127
I_Z = 1 − 0.81127 = 0.18873

(AE denotes the average entropy after splitting on the feature; I denotes the information gain.)
We can see that the information gains of “X” and “Z” are both 0.18873, the highest, while “Y” gives 0. Therefore, either X or Z can be the root; I choose “X” as the root of the decision tree.
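The hand computation above can be cross-checked in code. This sketch recomputes each feature's information gain on the Problem 3 dataset:

```python
import numpy as np

# Problem 3 dataset: columns X, Y, Z, label (1 = yes, 0 = no).
data = np.array([
    [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0],
    [1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1], [1, 1, 1, 0],
])

def entropy(labels):
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]                  # 0 * log2(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def info_gain(col):
    feat, labels = data[:, col], data[:, 3]
    gain = entropy(labels)        # H(S) = 1 for this dataset
    for v in (0, 1):
        mask = feat == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

gains = {name: info_gain(i) for i, name in enumerate("XYZ")}
print(gains)   # X and Z tie at ~0.1887, Y gives 0
```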
c. (0.5pt) How many edges are there on the longest branch? Indicate the branch.
Answer:
There are 3 edges on the longest branch: the path that goes from X to Y along edge “0”, from Y to Z along edge “1”, and from Z to the leaf “no” along edge “1”.
Problem 4. (3pts) Answer the following question about Gradient Descent.
a. (1pt) Let ϕ(x): R ↦ R^d, w ∈ R^d. Consider the following objective function (loss function):

Loss(x, y, w) = 1 − 2(w·ϕ(x))y,  if (w·ϕ(x))y ≤ 0;
Loss(x, y, w) = (1 − (w·ϕ(x))y)²,  if 0 < (w·ϕ(x))y ≤ 1;
Loss(x, y, w) = 0,  otherwise,

where y ∈ R. Compute the gradient ∇_w Loss(x, y, w).
Answer:
Case 1: (w·ϕ(x))y ≤ 0
Loss(x, y, w) = 1 − 2(w·ϕ(x))y
∇_w Loss(x, y, w) = −2ϕ(x)y

Case 2: 0 < (w·ϕ(x))y ≤ 1
Loss(x, y, w) = (1 − (w·ϕ(x))y)²
∇_w Loss(x, y, w) = −2ϕ(x)y(1 − (w·ϕ(x))y)

Case 3: (w·ϕ(x))y > 1
Loss(x, y, w) = 0
∇_w Loss(x, y, w) = 0
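As a sketch, the three cases can be implemented directly (the function name and the example values below are my own choices):

```python
import numpy as np

def loss_and_grad(x, y, w, phi):
    """The piecewise loss above and its gradient with respect to w."""
    f = phi(x)
    m = (w @ f) * y                      # the margin (w . phi(x)) y
    if m <= 0:                           # case 1: linear part
        return 1 - 2 * m, -2 * f * y
    if m <= 1:                           # case 2: quadratic part
        return (1 - m) ** 2, -2 * f * y * (1 - m)
    return 0.0, np.zeros_like(f)         # case 3: zero loss, zero gradient

# Example with phi(x) = [1, x] and w = [0, 1/2] (the values used later in this problem):
phi = lambda x: np.array([1.0, x])
w = np.array([0.0, 0.5])
loss1, grad1 = loss_and_grad(-2.0, 1.0, w, phi)    # margin -1  -> case 1
loss2, grad2 = loss_and_grad(-1.0, -1.0, w, phi)   # margin 1/2 -> case 2
```

Note that the loss and its gradient are continuous at the case boundaries (m = 0 and m = 1), which is what makes this a smoothed variant of the hinge loss.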
b. (1pt) Write out the Gradient Descent update rule for the function TrainingLoss (w):R d ↦ R .
Answer:
We need to find the gradient of the loss function TrainingLoss(w):

∇_w TrainingLoss(w) = ∂TrainingLoss(w)/∂w

After that, we update the weights using the formula:

w ← w − η ∇_w TrainingLoss(w),

where η is the learning rate.
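The update rule can be sketched as a loop; the quadratic training loss below is an assumption of mine, purely for illustration:

```python
import numpy as np

# Assumed example loss: TrainingLoss(w) = ||w - w_star||^2, whose gradient
# is 2 (w - w_star); the update rule drives w toward the minimiser w_star.
w_star = np.array([1.0, -3.0])

def training_loss_grad(w):
    return 2 * (w - w_star)

eta = 0.1                                 # learning rate
w = np.zeros(2)
for _ in range(100):
    w = w - eta * training_loss_grad(w)   # w <- w - eta * grad
```

After enough steps, w converges to the minimiser w_star.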
c. (1pt) Let d = 2, and ϕ(x) = [1, x]. Consider the following loss function:

TrainingLoss(w) = (1/2)(Loss(x1, y1, w) + Loss(x2, y2, w)).
Compute ∇_w TrainingLoss(w) for the following values of x1, y1, x2, y2, w:

w = [0, 1/2], x1 = −2, y1 = 1, x2 = −1, y2 = −1.
Answer:
We compute ϕ(x1) and ϕ(x2):
ϕ(x1) = [1, −2]
ϕ(x2) = [1, −1]

Then we compute w·ϕ(x1) and w·ϕ(x2):
w·ϕ(x1) = [0, 1/2]·[1, −2] = 0·1 + (1/2)·(−2) = −1
w·ϕ(x2) = [0, 1/2]·[1, −1] = 0·1 + (1/2)·(−1) = −1/2
Then we compute the loss and its gradient in each case.

For x1, y1, w:
(w·ϕ(x1))y1 = (−1)·1 = −1 ≤ 0 → case 1
Loss(x1, y1, w) = 1 − 2(w·ϕ(x1))y1 = 1 − 2·(−1)·1 = 3
∇_w Loss(x1, y1, w) = −2ϕ(x1)y1 = −2·[1, −2]·1 = [−2, 4]

For x2, y2, w:
(w·ϕ(x2))y2 = (−1/2)·(−1) = 1/2 → case 2
Loss(x2, y2, w) = (1 − (w·ϕ(x2))y2)² = (1 − 1/2)² = 1/4
∇_w Loss(x2, y2, w) = −2ϕ(x2)y2(1 − (w·ϕ(x2))y2) = −2·[1, −1]·(−1)·(1 − 1/2) = [1, −1]
Finally, we compute the gradient of TrainingLoss(w):

∇_w TrainingLoss(w) = (1/2)(∇_w Loss(x1, y1, w) + ∇_w Loss(x2, y2, w)) = (1/2)([−2, 4] + [1, −1]) = [−1/2, 3/2]
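The hand computation can be verified numerically; this self-contained sketch recomputes both case gradients and their average:

```python
import numpy as np

# Values from the problem: w = [0, 1/2], phi(x) = [1, x].
w = np.array([0.0, 0.5])
phi = lambda x: np.array([1.0, x])

def grad_loss(x, y):
    f = phi(x)
    m = (w @ f) * y                  # margin (w . phi(x)) y
    if m <= 0:                       # case 1
        return -2 * f * y
    if m <= 1:                       # case 2
        return -2 * f * y * (1 - m)
    return np.zeros_like(f)          # case 3

g1 = grad_loss(-2.0, 1.0)            # case 1 gradient
g2 = grad_loss(-1.0, -1.0)           # case 2 gradient
g_total = (g1 + g2) / 2              # gradient of TrainingLoss
print(g_total)                       # [-0.5  1.5]
```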