Assignment Mid

The document covers key concepts and techniques in Generative AI, including definitions, machine learning methods, and applications in various fields like healthcare. It also discusses advanced retrieval techniques, vector storage, and embedding models, along with practical examples of linear regression in Python. Ethical concerns, model evaluation metrics, and the importance of data preprocessing are highlighted throughout the content.


Generative AI for BSCS-AI

Fill-in-the-Blank Questions (from document: 0_Gen_AI_Adnan.pdf)

1. Generative AI refers to a subset of AI techniques that generate new data instances resembling the ________.
2. AI's capabilities include data interpretation, pattern recognition, and ________ processes.
3. The journey of AI from Turing’s era to modern applications is marked by periods known
as ________.
4. Narrow AI is designed for ________ tasks, while General AI aims to replicate human-like
cognitive abilities.
5. AI’s role in healthcare includes improving diagnostic accuracy and personalized ________.
6. Machine Learning is a branch of AI where algorithms learn from ________ to make
predictions.
7. Supervised learning uses ________ datasets to train algorithms.
8. Unsupervised learning finds hidden patterns in ________ data.
9. Reinforcement learning involves taking actions and learning from the ________.
10. Classification algorithms are used to categorize data into predefined ________.
11. Regression algorithms predict ________ outcomes based on input variables.
12. Dimensionality reduction techniques, such as PCA, reduce the number of ________
variables.
13. Neural networks consist of layers of interconnected nodes called ________.
14. The ________ function decides whether a neuron should be activated in a neural network.
15. Backpropagation is the process of adjusting ________ in a neural network based on error
rates.
16. Deep Learning involves neural networks with multiple ________ layers.
17. CNNs are specialized for processing data with a grid-like topology, such as ________.
18. Pooling layers in CNNs reduce the ________ size of the representation.
19. RNNs are designed to recognize patterns in ________ data.
20. LSTMs address the challenge of ________ dependencies in traditional RNNs.
21. Deep Learning models require large ________ for training to improve accuracy.
22. Transformer models use ________ mechanisms to process words in relation to all other
words in a sentence.
23. LLMs are trained using techniques like Masked Language Modeling and ________
Prediction.
24. Generative AI can create novel outputs such as images, text, or ________.
25. Discriminative models focus on differentiating between different ________ or categories.
26. GANs consist of a generator and a ________ that evaluate outputs in an adversarial
process.
27. VAEs encode inputs into a compressed representation and then ________ the input.
28. The ________ space in VAEs captures the essence of the data for manipulation.
29. Probability theory is central to Generative AI for modeling ________ in data.
30. Bayesian methods update the model's beliefs based on ________ data.
31. Sampling methods like Monte Carlo are used to approximate complex ________.
32. Information theory concepts like ________ quantify the information captured by generative
models.
33. Optimization techniques like ________ are used to find the best parameters for generative
models.
34. Regularization techniques such as dropout prevent ________ in generative models.
35. The ________ Score is a metric used to evaluate the performance of generative models.
36. AI winters refer to periods of reduced ________ in AI research.
37. The ________ matrix is used to evaluate the performance of classification models.
38. Overfitting occurs when a model learns the ________ data too well.
39. The F1-Score balances ________ and recall in model evaluation.
40. Transfer Learning allows deep learning models to be pre-trained on one task and
finetuned for ________.
41. The ________ layer in a neural network processes the final output.
42. Sigmoid and ReLU are examples of ________ functions in neural networks.
43. The ________ process in deep learning involves adjusting weights to minimize errors.
44. LLMs like GPT-3 have hundreds of billions of ________ for nuanced language processing.
45. Generative AI can automate ________ processes, such as art and music creation.
46. The ethical concerns of Generative AI include the creation of ________.
47. The ________ distance is a metric used to assess the quality of generative models.
48. The ________ method combines multiple algorithms to improve performance.
49. The ________ function in VAEs helps reconstruct the input from its compressed
representation.
50. The ________ theory provides a framework for handling uncertainty in generative models.

Answers:

1. training data
2. decision-making
3. AI winters
4. specific
5. treatment plans
6. data
7. labeled
8. unlabeled
9. results
10. classes
11. continuous
12. input
13. neurons
14. activation
15. weights
16. hidden
17. images
18. spatial
19. sequential
20. long-term
21. datasets
22. self-attention
23. Next Sentence
24. sound
25. classes
26. discriminator
27. reconstruct
28. latent
29. uncertainty
30. observed
31. distributions
32. entropy
33. gradient descent
34. overfitting
35. Inception
36. funding
37. confusion
38. training
39. precision
40. another
41. output
42. activation
43. backpropagation
44. parameters
45. creative
46. deepfakes
47. Fréchet Inception
48. ensemble
49. decoder
50. Bayesian

Technical Short Questions and Answers:

1. Q: What is the purpose of Retrieval-Augmented Generation (RAG)?
A: To enable AI systems to retrieve contextual documents for answering questions.
2. Q: Which loader is used to transcribe YouTube audio into text?
A: OpenAIWhisperParser with GenericLoader.
3. Q: What does FileSystemBlobLoader do?
A: Fetches raw binary data (e.g., audio files) from a local directory.
4. Q: Why is chunk_overlap used in text splitting? [CLO2]
A: To preserve context across adjacent chunks by sharing overlapping text.
5. Q: What is the key difference between RecursiveCharacterTextSplitter and CharacterTextSplitter? [CLO2]
A: Recursive splits hierarchically (paragraphs → words), while Character splits at fixed character counts.

6. Q: Which separator is prioritized first in RecursiveCharacterTextSplitter?
A: \n\n (paragraph breaks).
7. Q: How does MarkdownHeaderTextSplitter preserve document structure?
A: By retaining header metadata (e.g., {"Header 1": "Title"}) in chunks.
8. Q: What is a typical chunk_size value for production RAG pipelines? [CLO3]
A: 500–1500 characters.
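
To make the splitting questions above concrete, here is a minimal sketch using LangChain's RecursiveCharacterTextSplitter; the file name and parameter values are illustrative assumptions, not taken from the lab:

# A minimal sketch of RecursiveCharacterTextSplitter; chunk_size and
# chunk_overlap are assumed illustrative values (1000 chars, 20% overlap).
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,                     # target characters per chunk
    chunk_overlap=200,                   # shared text between adjacent chunks
    separators=["\n\n", "\n", " ", ""],  # tried in order: paragraphs first
)

with open("lecture_notes.txt") as f:     # hypothetical input file
    text = f.read()

chunks = splitter.split_text(text)       # returns a list of strings
print(len(chunks), "chunks created")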

Vector Storage & Embeddings (Lab 3)

9. Q: What embedding model is used by default in OpenAIEmbeddings?
A: text-embedding-ada-002.
10. Q: How does ChromaDB handle duplicate documents?
A: It stores them but does not deduplicate automatically.
11. Q: What metric is used for semantic search in vector databases?
A: Cosine similarity between query and document vectors.
12. Q: Why might a query about "regression in Lecture 3" fail in basic similarity search?
A: Without metadata filtering, it retrieves irrelevant chunks from other lectures.
13. Q: What is the dimensionality of OpenAI’s text embeddings?
A: 1536 dimensions.
14. Q: How can you check the number of documents stored in ChromaDB?
A: vectordb._collection.count().
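
As a minimal sketch of how these pieces fit together (assuming the classic LangChain API; the document text, metadata, and persist directory below are hypothetical):

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.schema import Document

# Hypothetical pre-split chunks, tagged with lecture metadata
docs = [Document(page_content="Regression predicts continuous outcomes.",
                 metadata={"source": "Lecture03.pdf"})]

embedding = OpenAIEmbeddings()  # text-embedding-ada-002 by default (1536-d)

vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory="docs/chroma",  # hypothetical path
)

print(vectordb._collection.count())  # number of stored documents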
Advanced Retrieval Techniques (Lab 4)

15. Q: What does k=3 specify in similarity_search?
A: The top 3 most relevant results to return.
16. Q: How does MMR (max_marginal_relevance_search) improve retrieval?
A: Balances relevance and diversity by selecting non-redundant results.
17. Q: What is the purpose of fetch_k in MMR? [CLO2]
A: Defines the candidate pool size before diversifying results.
18. Q: How can you filter search results to only include "Lecture [Link]" content? [CLO3]
A: Use filter={"source": "[Link]"}.
19. Q: What does SelfQueryRetriever automate? [CLO2]
A: Extracts metadata filters (e.g., lecture number) from natural language queries.
20. Q: How does ContextualCompressionRetriever reduce noise? [CLO3]
A: Extracts only relevant parts of retrieved documents using an LLM.
21. Q: Name two lightweight alternatives to vector databases for retrieval.
A: TFIDFRetriever and SVMRetriever.
22. Q: Why is a 20% chunk_overlap recommended for production? [CLO3]
A: Ensures context continuity without excessive redundancy.
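
A minimal sketch of the retrieval calls above, assuming vectordb is the Chroma store from the previous sketch and that the query and filter values are illustrative:

question = "what did they say about regression in the third lecture?"

# Plain similarity search: top 3 most relevant chunks (k=3)
docs = vectordb.similarity_search(question, k=3)

# MMR: fetch 10 candidates, then keep 3 diverse, non-redundant ones
docs_mmr = vectordb.max_marginal_relevance_search(question, k=3, fetch_k=10)

# Metadata filter: restrict results to one lecture (hypothetical source)
docs_filtered = vectordb.similarity_search(
    question, k=3, filter={"source": "Lecture03.pdf"}
)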

General Concepts

23. Q: What is the primary advantage of deep learning in generative AI?
A: Automatic feature extraction from raw data.
24. Q: What problem does "mode collapse" refer to in GANs? [CLO2]
A: The generator produces limited varieties of outputs.
25. Q: Which activation function is commonly used in CNNs?
A: ReLU (Rectified Linear Unit).
26. Q: What is the role of backpropagation in neural networks?
A: Adjusts weights based on prediction errors during training.
27. Q: How does a Transformer model process language differently from RNNs? [CLO2]
A: Uses self-attention to weigh word relationships globally (not sequentially).
28. Q: What is the key ethical concern with generative AI?
A: Misuse for creating deepfakes or copyrighted content.
29. Q: What does "latent space" represent in VAEs? [CLO2]
A: A compressed representation of input data for generation/manipulation.
30. Q: Which metric evaluates generative model outputs quantitatively?
A: Fréchet Inception Distance (FID).
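
A tiny NumPy sketch of the activation and weight-update ideas covered above (all numbers are made up for illustration):

import numpy as np

def relu(z):
    return np.maximum(0, z)      # common activation in CNN hidden layers

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes any input into (0, 1)

print(relu(np.array([-1.0, 2.0])))  # negative inputs become 0

# One gradient-descent step on a single weight (the core of backpropagation)
w, x, y_true, lr = 0.5, 2.0, 1.0, 0.1
y_pred = sigmoid(w * x)                               # forward pass
grad = (y_pred - y_true) * y_pred * (1 - y_pred) * x  # chain rule
w = w - lr * grad                                     # adjust weight to reduce error
print(y_pred, w)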

31. Q: Explain the graph:

Ans: Code is here:

# Step 1. Remove blank rows and columns
# Step 2. Save the file as CSV
# Step 3. Upload it via the Colab file-upload dialog
# Step 4. Run the code below

import os
print(os.listdir())  # Lists all files in the current directory
print(os.getcwd())   # Prints the current working directory

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import math
from google.colab import files

uploaded = files.upload()  # Use the UI to select your file

# Read the .csv into a DataFrame
# dataset = pd.read_csv('C:\\Users\\User\\Desktop\\house_prices.csv')
dataset = pd.read_csv('Dataset_House_Price.csv')
size = dataset['sqft_living']
price = dataset['price']

# Machine learning models handle arrays, not DataFrames
x = np.array(size).reshape(-1, 1)
y = np.array(price).reshape(-1, 1)

# We use Linear Regression; fit() is the training step
model = LinearRegression()
model.fit(x, y)

# MSE and R squared value
regression_model_mse = mean_squared_error(y, model.predict(x))
print("MSE: ", math.sqrt(regression_model_mse))
print("R squared value:", model.score(x, y))

# We can get the b values after the model fit
# This is b1 (the slope) in our model
print(model.coef_[0])
# This is b0 (the intercept)
print(model.intercept_[0])

# Visualize the dataset with the fitted model
plt.scatter(x, y, color='green')
plt.plot(x, model.predict(x), color='black')
plt.title("Linear Regression")
plt.xlabel("Size")
plt.ylabel("Price")
plt.show()

# Predicting the prices
print("Prediction by the model:", model.predict([[2000]]))

Explanation:
Code Explanation & Graph Output Analysis
This Python script performs Linear Regression on a dataset of house prices based on square
footage (sqft_living). Below is a breakdown of each step, along with the expected graph
output.

Step-by-Step Explanation
1. Data Loading & Preprocessing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import math
from google.colab import files
uploaded = files.upload()  # Uploads the CSV file in Google Colab
dataset = pd.read_csv('Dataset_House_Price.csv')  # Loads the dataset
• Purpose:
o Uploads a CSV file (Dataset_House_Price.csv) containing house data (size in
sqft & price).
o Uses pandas to read the CSV into a DataFrame.

2. Data Extraction & Reshaping


size = dataset['sqft_living'] # Independent variable (X)
price = dataset['price'] # Dependent variable (Y)
# Convert to NumPy arrays and reshape for sklearn
x = np.array(size).reshape(-1, 1)  # Reshapes into a 2D array (required by sklearn)
y = np.array(price).reshape(-1, 1)
• Purpose:
o Extracts sqft_living (features) and price (target).
o Reshapes them into 2D arrays since scikit-learn requires this format.

3. Linear Regression Model Training


model = LinearRegression() # Initialize the model
model.fit(x, y)  # Train the model on (X, Y)
• Purpose:
o Fits a linear regression model to predict price based on sqft_living.
o The equation learned:

Price = b0 + b1 × Size
4. Model Evaluation (MSE & R²)
regression_model_mse = mean_squared_error(y, model.predict(x))
print("MSE: ", math.sqrt(regression_model_mse))  # RMSE
print("R squared value:", model.score(x, y))  # R² (0 to 1, higher is better)
# Coefficients (slope & intercept)
print("Slope (b1):", model.coef_[0]) # b1 (price per sqft)
print("Intercept (b0):", model.intercept_[0]) # Base price when size=0
• Output Interpretation:
o RMSE (Root Mean Squared Error): Measures prediction error (lower =
better).
o R² (R-squared): Explains how well the model fits the data (0-1, 1 = perfect fit).
o Slope (b1): Price increase per additional sqft.
o Intercept (b0): Estimated price when sqft_living = 0.
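
For reference, these two metrics have the standard definitions (not spelled out in the lab itself):

RMSE = sqrt( (1/n) × Σ (yᵢ − ŷᵢ)² )
R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)²

where ŷᵢ is the model's predicted price for house i, ȳ is the mean price, and n is the number of houses.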

5. Visualization (Scatter Plot + Regression Line)


plt.scatter(x, y, color='green')  # Actual data points
plt.plot(x, model.predict(x), color='black')  # Predicted regression line
plt.title("Linear Regression: House Price vs. Size")
plt.xlabel("Size (sqft)")
plt.ylabel("Price ($)")
plt.show()
• Graph Output:
o Green dots: Actual house prices vs. size.
o Black line: Predicted trend from the model.
o Interpretation:
▪ If the line fits well, the model is accurate.
▪ Outliers (dots far from the line) indicate prediction errors.

6. Making a Prediction
print("Prediction for 2000 sqft:", [Link]([[2000]]))
• Output Example:
o If the model predicts $500,000 for a 2000 sqft house, the output will be:
Prediction for 2000 sqft: [500000.]

Expected Graph (Example)


• X-axis: House size (sqft).
• Y-axis: Price ($).
• Black line: Best-fit regression line.
• Green dots: Actual data points.

Key Takeaways
1. Model Accuracy:
o High R² (~0.6-0.9) means the model explains price well.
o High RMSE means predictions may be off by thousands of dollars.
2. Prediction:
o The model can estimate prices for new house sizes.
3. Improvements Possible?
o Adding more features (bedrooms, location) could improve accuracy via Multiple
Linear Regression; a sketch follows below.
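
As a minimal sketch of that improvement (assuming the dataset also contains 'bedrooms' and 'bathrooms' columns — hypothetical names, adjust to whatever your CSV actually has):

import pandas as pd
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv('Dataset_House_Price.csv')
features = dataset[['sqft_living', 'bedrooms', 'bathrooms']]  # several inputs (X)
target = dataset['price']                                     # price (y)

model = LinearRegression()
model.fit(features, target)

print("R squared value:", model.score(features, target))
print("Coefficient per feature:", model.coef_)
# Predict for a 2000 sqft house with 3 bedrooms and 2 bathrooms
print("Prediction:", model.predict([[2000, 3, 2]]))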