FAM Prelims
Uploaded by Aariz Fakih

FAM (ANSWERS)

PRELIMINARY EXAMS
1. Components of AI.
 Learning: AI learns from experience, employing trial-and-error
methods to improve performance, such as finding optimal moves in
games.
 Reasoning: AI employs logical thinking to draw conclusions or make
decisions based on provided information, relying on deductive or
inductive reasoning.
 Problem Solving: AI addresses diverse challenges, from strategic
decision-making in games to complex tasks like image recognition,
utilizing specialized or general methods.
 Perception: AI perceives its surroundings through sensors like
cameras, interpreting scenes and extracting object features and
relationships to interact with the environment effectively.

2. Types of agents in AI
 Simple Reflex Agent: Simple reflex agents are the most basic,
functioning best in clear environments. They make decisions
without considering the past, limiting their intelligence.
 Model-based Reflex Agent: These agents handle less clear
situations by using a "model" of how things work and tracking
what they've seen to make better decisions.
 Goal-based Agents: Goal-based agents have specific objectives
and choose actions to achieve them. They plan ahead to reach
their goals, making them proactive.
 Utility-based Agents: These agents not only have goals but also
look for the most efficient ways to achieve them. They use a
"utility" function to determine the best actions.
 Learning Agents: Learning agents improve over time by learning
from experiences. They consist of components for learning,
feedback, decision-making, and suggesting new actions for better
performance.
3. Explain heuristic search techniques in detail:
(1) Generate and test
(2) Hill climbing
(3) A*
(4) BFS

1. Generate and Test:


Generate and Test is a basic heuristic search technique where the system
generates possible solutions iteratively and tests each one to see if it
satisfies the problem's requirements.
Generation: The process begins by generating potential solutions based on
some predefined rules or heuristics.
Testing: Each generated solution is then tested to determine if it satisfies
the problem constraints or criteria. If a solution is found, the search process
stops, and the solution is returned. If not, the generation and testing process
continue until a satisfactory solution is found or a predefined search limit is
reached.
Advantages:
 Simple and easy to implement.
 Can handle a wide range of problems.
Disadvantages:
 Inefficient for complex problems as it does not use any information
from previous attempts to guide the search.
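As a minimal sketch (not part of the exam answer), generate-and-test can be shown on a toy constraint problem: candidate digit pairs are generated one by one, and each is tested against the goal conditions. The specific constraints here are illustrative assumptions.

```python
from itertools import product

def generate_and_test(target_sum, target_product):
    """Generate candidate (x, y) digit pairs, then test each against the goal."""
    for x, y in product(range(10), repeat=2):        # generation step
        if x + y == target_sum and x * y == target_product:  # testing step
            return (x, y)                            # satisfactory solution found
    return None                                      # search limit reached

solution = generate_and_test(target_sum=7, target_product=12)  # → (3, 4)
```

Note how no information from failed candidates guides later generation, which is exactly the inefficiency listed above.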
2. Hill Climbing:
Hill climbing is a local search algorithm that continuously moves in the
direction of increasing elevation (or value) to find the peak of the mountain,
which represents the optimal solution in the search space.
Evaluation: The algorithm evaluates the current state to determine its
quality using a heuristic function, which estimates how close the state is to
the goal.
Selection: It then selects a neighboring state (a state adjacent to the current
state) based on some predefined rules. The neighboring state is chosen if it
has a higher heuristic value than the current state.
Iteration: The algorithm iteratively moves to the neighboring state with the
highest heuristic value until it reaches a peak where no neighboring state
has a higher value.
Advantages:
 Simple and easy to understand.
 Memory efficient as it doesn't store the entire search space.
Disadvantages:
 Can get stuck in local optima, meaning it might not find the global
optimum if the search space has multiple peaks.
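The evaluate-select-iterate loop above can be sketched on a one-dimensional search space; the objective function and integer step size are illustrative assumptions.

```python
def hill_climb(f, start, step=1):
    """Greedy local search: move to the better neighbour until none improves."""
    current = start
    while True:
        neighbours = [current - step, current + step]
        best = max(neighbours, key=f)        # selection: highest heuristic value
        if f(best) <= f(current):            # no neighbour improves: a (local) peak
            return current
        current = best                       # iteration: climb to the better state

# Heuristic: higher f means closer to the goal; the peak of -(x-3)^2 is x = 3.
peak = hill_climb(lambda x: -(x - 3) ** 2, start=0)  # → 3
```

With a multi-peaked function the same loop would stop at whichever peak is nearest to `start`, illustrating the local-optimum disadvantage.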

3. A* (A Star):
A* is a widely used heuristic search algorithm that combines the benefits of
both Dijkstra's algorithm (which guarantees the shortest path) and greedy
best-first search (which uses heuristics to guide the search). It uses a
heuristic function to estimate the cost from the current state to the goal and
guides the search towards the most promising paths.
Evaluation: A* evaluates each state based on the total cost, which is the sum
of the cost to reach the current state (from the initial state) and the heuristic
estimate of the cost to reach the goal state from the current state.
Selection: It selects the state with the lowest total cost for further
exploration.
Iteration: The algorithm continues exploring states with lower total costs
until it reaches the goal state.
Advantages:
 Completeness (will find a solution if one exists).
 Optimality (finds the shortest path if the heuristic is admissible and
consistent).
Disadvantages:
 Memory intensive, especially for large search spaces, as it needs
to store and manage information about all the visited states.
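A compact sketch of A* on a small grid, assuming unit move costs and the Manhattan distance as the (admissible) heuristic; the grid size and wall layout are made up for illustration.

```python
import heapq

def a_star(start, goal, walls, size=5):
    """A* on a grid: total cost f(n) = g(n) (cost so far) + h(n) (heuristic)."""
    def h(p):  # admissible heuristic: Manhattan distance to the goal
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]   # priority queue ordered by total cost
    g = {start: 0}                      # cheapest known cost to each state
    while frontier:
        _, cost, node = heapq.heappop(frontier)  # select lowest total cost
        if node == goal:
            return cost
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in walls:
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None

steps = a_star((0, 0), (4, 4), walls={(1, 0), (1, 1), (1, 2), (1, 3)})  # → 8
```

The `g` dictionary storing every visited state is what makes A* memory intensive on large spaces.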

4. BFS (Breadth-First Search):

BFS is a simple uninformed (blind) search technique that explores all the
neighbor nodes at the present depth before moving to nodes at the next depth
level in a graph. It starts at the root (initial state) and works outward
level by level, using no heuristic to guide the search.
Exploration: BFS explores the search space level by level, considering all
nodes at the current depth before moving to nodes at the next depth.
Queue: It uses a queue data structure to keep track of the nodes to be
explored next. Nodes are added to the queue as they are discovered, and
they are processed in a first-in-first-out manner.
Advantages:
 Guarantees the shortest path in terms of the number of steps.
 Complete (will find a solution if one exists) for finite state spaces.
Disadvantages:
 Memory intensive, especially for large and deep search spaces, as
it needs to store all nodes at the current depth level.
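The queue-based exploration described above can be sketched on a small adjacency-list graph (the graph itself is a made-up example):

```python
from collections import deque

def bfs(graph, start, goal):
    """Explore level by level with a FIFO queue; first arrival is the shortest path."""
    queue = deque([(start, 0)])          # (node, depth) pairs awaiting exploration
    visited = {start}
    while queue:
        node, depth = queue.popleft()    # first-in, first-out
        if node == goal:
            return depth                 # number of steps on the shortest path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
depth = bfs(graph, "A", "E")  # → 3 (A → B → D → E)
```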

4. ML life cycle

1. Data Collection: Gather relevant data from various sources, ensuring it's
comprehensive and unbiased.
2. Data Preparation: Cleanse and preprocess the data, handling missing
values and outliers. Split the data into training and testing sets.
3. Feature Selection/Engineering: Identify significant features or create new
ones to enhance the model's performance.
4. Model Selection: Choose an appropriate machine learning algorithm
based on the problem (e.g., regression, classification) and data
characteristics.
5. Training: Feed the training data into the selected algorithm. The model
learns from the data to make predictions.
6. Evaluation: Test the model using the testing data to assess its accuracy,
precision, recall, or other relevant metrics.
7. Tuning: Fine-tune the model by adjusting hyperparameters for optimal
performance.
8. Deployment: Integrate the model into the application, allowing it to make
predictions on new, unseen data.
9. Monitoring and Maintenance: Continuously monitor the model's
performance in real-world scenarios. Retrain or update the model as
needed to maintain accuracy.
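Step 2 (splitting data into training and testing sets) can be sketched in plain Python; the helper name, the 25% test ratio, and the fixed seed are illustrative assumptions, not part of the lifecycle description above.

```python
import random

def train_test_split(data, test_ratio=0.25, seed=42):
    """Shuffle records, then split them into training and testing sets."""
    rng = random.Random(seed)            # fixed seed for a reproducible split
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(8)))  # 6 training rows, 2 test rows
```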

5. Data cleaning (missing data and outliers)


Data cleaning is a crucial step in the data analysis process. It involves
identifying and correcting errors or inconsistencies in datasets to improve
their quality and reliability. Missing data and outliers are two common
issues that require special attention during data cleaning.
Missing Data:
Missing data occurs when no information is available for certain
observations or attributes in a dataset. Dealing with missing data is
essential to avoid biased or inaccurate analysis results. Several techniques
can be used to handle missing data:
 Removing Rows: One straightforward approach is to remove rows with
missing values. However, this method can lead to a loss of valuable
information, especially if a large portion of the data is missing.
 Filling Missing Values: Missing values can be filled using various
techniques such as mean, median, mode imputation, or using more
advanced methods like regression imputation, where missing values
are predicted based on other variables.
 Interpolation: Interpolation methods estimate missing values
based on existing data points. Linear interpolation, spline
interpolation, or time-based interpolation methods can be used
depending on the nature of the data.
 Multiple Imputation: This method involves creating multiple
imputed datasets with different imputed values for missing data.
Statistical analysis is then performed on each dataset, and the
results are combined to provide more accurate estimates and
uncertainty measures.
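A minimal sketch of mean imputation, the simplest of the filling techniques above (using `None` to stand in for a missing value):

```python
def mean_impute(values):
    """Fill missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

filled = mean_impute([10.0, None, 20.0, None, 30.0])
# → [10.0, 20.0, 20.0, 20.0, 30.0]
```

Median or mode imputation would follow the same pattern with a different statistic.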
Outliers:
Outliers are data points that significantly deviate from the rest of the
dataset. They can distort statistical analyses and lead to misleading
conclusions. Identifying and handling outliers is crucial for maintaining the
integrity of the analysis. Here are some methods to deal with outliers:
 Removing: Outliers can be removed from the dataset, but this
should be done cautiously, considering the impact on the analysis.
 Transforming: Applying mathematical transformations can
sometimes make the data more robust to outliers. Common
transformations include logarithm, square root, or Box-Cox
transformations.
 Binning: Grouping outliers into a separate category/bin can be
useful, especially in categorical data.
 Winsorizing: Replacing extreme outliers with the nearest less
extreme value within a certain range.
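Winsorizing, the last technique above, can be sketched as clipping values to a chosen range; the bounds here are assumed for illustration (in practice they are often set from percentiles).

```python
def winsorize(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]

data = [1, 2, 3, 4, 100]                        # 100 is an extreme outlier
cleaned = winsorize(data, lower=1, upper=10)    # → [1, 2, 3, 4, 10]
```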
6. Supervised vs Unsupervised

Supervised                           | Unsupervised
-------------------------------------|-------------------------------------
Labeled data                         | Unlabeled data
Used in email spam classification,   | Used in clustering customer
handwriting recognition              | segments, anomaly detection
The algorithm receives feedback      | No feedback loop; the model
through labeled data to adjust and   | explores patterns without
improve predictions.                 | supervision.
Metrics: Accuracy, Precision,        | Metrics: Silhouette Score, Inertia,
Recall, F1-score                     | Davies-Bouldin Index

7. Define AUC – ROC Curve

An ROC (Receiver Operating Characteristic) curve is a graphical
representation used in binary classification tasks to assess the
performance of a classification model: it plots the true positive rate
against the false positive rate at every decision threshold. The AUC
(Area Under the Curve) summarizes the ROC curve as a single number
between 0 and 1, where 1 indicates a perfect classifier and 0.5 is no
better than random guessing. ROC curves are particularly useful when
the classes in the dataset are imbalanced, meaning one class occurs
much more frequently than the other.
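A sketch of how one point of the ROC curve is computed for a single threshold (the tiny dataset and the 0.5 threshold are hypothetical); sweeping the threshold over all scores would trace the full curve.

```python
def roc_point(y_true, scores, threshold):
    """True positive rate and false positive rate at one decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, y_true))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, y_true))
    return tp / (tp + fn), fp / (fp + tn)

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
tpr, fpr = roc_point(y_true, scores, threshold=0.5)  # → (0.5, 0.0)
```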
8. Define log loss curve
Log loss, also known as logistic loss or cross-entropy loss, is a loss function
used in classification tasks, especially when dealing with probabilities. The
log loss measures the performance of a classification model where the
prediction output is a probability value between 0 and 1. It quantifies how
well the predicted probabilities align with the true class labels.
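The log loss definition above can be written out directly; the two-sample dataset is a made-up illustration, and the clipping constant guards against log(0).

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Cross-entropy: -(1/n) * Σ [ y*log(p) + (1-y)*log(1-p) ]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip probabilities to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

loss = log_loss([1, 0], [0.9, 0.1])  # confident, correct predictions → low loss
```

Confident but wrong predictions (e.g. probability 0.9 for a true label of 0) would be penalized heavily, which is the point of the metric.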
9. Cross validation in ML
Cross-validation is a technique used in machine learning to assess how well
a predictive model will generalize to an independent dataset. It is
particularly important when the available dataset is limited, as it allows for
a more robust evaluation of the model's performance.
Steps of Cross-Validation:
 Data Splitting: The dataset is divided into k subsets of approximately
equal size. Each of these subsets is called a fold.
 Model Training and Validation: The model is trained on k−1 of the folds
and validated on the remaining fold. This process is repeated k times,
each time using a different fold as the validation set and the remaining
folds for training.
 Performance Metric Calculation: The performance metric (such as
accuracy, mean squared error, or log loss) is calculated for each
validation iteration.
 Average Performance: The average of the performance metrics
obtained from all k validation iterations is taken as the overall
performance metric of the model.
Advantages:
 Reliable Estimate: Provides a reliable performance estimate,
especially with limited data or sensitive metrics.
 Data Utilization: Maximizes data use for both training and validation,
ensuring comprehensive evaluation.
 Model Selection: Helps choose the best model among candidates,
aiding in decision-making.
 Bias & Variance Analysis: Offers insights into model bias and variance,
vital for generalization understanding.
Disadvantages:
 Computational Intensity: Can be computationally expensive,
particularly for large datasets or complex models.
 Interpretability Challenge: Interpreting results can be complex,
especially when explaining to non-technical stakeholders.
 Data Dependency: Effectiveness relies on the assumption of
independent and identical data distribution, which might not always
hold.
 Hyperparameter Overfitting: May lead to overfitting hyperparameters
to specific validation sets, especially without proper techniques.
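The data-splitting step of k-fold cross-validation can be sketched as follows (indices only; training and scoring a real model on each split is omitted):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_size, remainder = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    folds = k_fold_indices(n, k)
    for i, val in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, val

splits = list(cross_validate(n=6, k=3))
# splits[0] → ([2, 3, 4, 5], [0, 1]): fold 0 validates, the rest train
```

In practice the metric from each of the k iterations would be averaged, as described in the steps above.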
10. Explain linear regression
Linear Regression is a fundamental statistical method and a popular
machine learning algorithm used for predicting a continuous target variable
based on one or more input features. It assumes a linear relationship
between the input variables (independent variables) and the output variable
(dependent variable). In simple terms, linear regression finds the best-
fitting straight line through the data points in a way that minimizes the sum
of the squared differences between the observed and predicted values.
Types:
 Simple Linear Regression: Involves predicting a target variable using
a single input variable.
 Multiple Linear Regression: Predicts the target variable using multiple
input variables.
Pros (Advantages):
 Simplicity: Easy to understand and simple to implement, making it a
good starting point for predictions.
 Interpretability: Provides insights into how each input variable affects
the output, making it easy to interpret results.
 Speed: It's computationally fast, making it efficient for large datasets
and quick analyses.
Cons (Disadvantages):
 Assumption of Linearity: Assumes a linear relationship, which might
not always hold true in real-world scenarios.
 Limited Complexity: Might not capture complex relationships between
variables, leading to less accurate predictions in some cases.
 Sensitive to Outliers: Outliers can significantly influence the
regression line, affecting the accuracy of predictions.
 Overfitting: If too many input variables are used without proper
validation, it might lead to overfitting, where the model performs well
on training data but poorly on new data.
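Simple linear regression has a closed-form least-squares solution, sketched here on a perfectly linear toy dataset (the data points are illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope*x + intercept (minimizes squared error)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # exactly y = 2x + 1
```

A single extreme point added to `ys` would pull the fitted line noticeably, illustrating the outlier sensitivity listed above.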
11. Explain error measurement metrics
 Mean Absolute Error (MAE):
Definition: MAE is a regression metric that measures the average
absolute difference between the predicted values and the actual
values.
Formula: MAE = (1/n) * Σ|Y_pred - Y_actual|
 Mean Squared Error (MSE):
Definition: MSE is another regression metric that measures the average
squared difference between the predicted values and the actual values.
Formula: MSE = (1/n) * Σ(Y_pred - Y_actual)^2
 Root Mean Squared Error (RMSE):
Definition: RMSE is a variation of MSE that provides the square root of
the average squared difference between predicted and actual values.
Formula: RMSE = √(MSE)
 MAPE (Mean Absolute Percentage Error):
Definition: MAPE is a percentage-based metric that assesses the
relative accuracy of predictive models by measuring the average
percentage difference between predicted and actual values, commonly
applied in finance, economics, and supply chain management.
Formula: MAPE = (1/n) * Σ(|(Y_actual - Y_pred) / Y_actual|) * 100
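The four formulas above translate directly to code; the two-sample dataset is a made-up worked example.

```python
import math

def regression_metrics(y_actual, y_pred):
    """Compute MAE, MSE, RMSE, and MAPE from actual and predicted values."""
    n = len(y_actual)
    errors = [p - a for p, a in zip(y_pred, y_actual)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mape = sum(abs((a - p) / a) for a, p in zip(y_actual, y_pred)) / n * 100
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

m = regression_metrics(y_actual=[100, 200], y_pred=[110, 190])
# → MAE 10.0, MSE 100.0, RMSE 10.0, MAPE 7.5
```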

12. Explain ML issues


 Overfitting:
Issue: Overfitting occurs when a model learns the training data too well,
capturing noise and specific patterns that don't generalize well to new,
unseen data. As a result, the model performs poorly on test data.
Solution: Regularization techniques, cross-validation, and using more
training data can help prevent overfitting. Choosing simpler models and
avoiding excessively complex ones also mitigate this issue.
 Underfitting:
Issue: Underfitting happens when the model is too simplistic to capture the
underlying patterns in the data. It performs poorly both on the training and
test data.
Solution: Increasing the model's complexity, using more relevant features,
and employing more advanced algorithms can help address underfitting.
 Data Quality:
Issue: Poor-quality or inconsistent data, including missing values and
outliers, can significantly impact model performance. Models learn from the
data provided, and if the data is erroneous or biased, the predictions will be
flawed.
Solution: Careful data preprocessing, cleaning, and validation are crucial.
Handling missing values, outliers, and ensuring data consistency are
essential steps.
 Imbalanced Data:
Issue: In datasets where one class significantly outnumbers the others
(class imbalance), the model tends to favor the majority class, leading to
biased predictions for the minority class.
Solution: Techniques like oversampling the minority class, undersampling
the majority class, or using algorithms designed for imbalanced data (e.g.,
SMOTE) can mitigate class imbalance issues.
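A sketch of plain random oversampling of the minority class (note this is simpler than SMOTE, which generates synthetic samples rather than duplicating existing ones; the dataset and seed are illustrative assumptions):

```python
import random

def oversample_minority(rows, labels, minority_label, seed=0):
    """Duplicate minority-class rows (with replacement) until classes balance."""
    rng = random.Random(seed)
    minority = [(r, l) for r, l in zip(rows, labels) if l == minority_label]
    majority = [(r, l) for r, l in zip(rows, labels) if l != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    return [r for r, _ in combined], [l for _, l in combined]

# 3 majority rows (label 0) vs 1 minority row (label 1) → balanced 3 vs 3
X, y = oversample_minority([[1], [2], [3], [4]], [0, 0, 0, 1], minority_label=1)
```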
