NLP and ES
This comprehensive study material covers the key topics from the final module of the Artificial Intelligence course. It has been designed to provide clear
explanations, analytical content, and visual aids to help students prepare effectively for their examinations.
Table of Contents
1. Natural Language Processing
2. Learning
2.1 Forms of Learning
2.2 Inductive Learning
2.3 Learning Decision Trees
2.4 Explanation-Based Learning
2.5 Learning Using Relevance Information
2.6 Neural Net Learning
2.7 Genetic Learning
3. Expert Systems
3.1 Representing and Using Domain Knowledge
3.2 Expert System Shells
3.3 Knowledge Acquisition
1. Natural Language Processing
Natural Language Processing (NLP) is concerned with goals such as:
Understanding the meaning of text and speech
Generating human-like responses
Translating between languages
Extracting information from text
Answering questions based on textual content
Typical applications include:
Machine Translation (e.g., Google Translate)
Sentiment Analysis
Chatbots and Virtual Assistants
Information Extraction
Text Summarization
Speech Recognition
Fig 1.1: The NLP pipeline: Text Input → Tokenization → Syntactic Analysis → Semantic Analysis → Output
Challenges in NLP
Natural language processing faces several inherent challenges due to the complex nature of human language, including ambiguity at the word and sentence level, dependence on context and world knowledge, and wide variation in how the same meaning can be expressed.
NLP represents one of the most challenging yet promising frontiers in artificial intelligence. While rule-based systems dominated early NLP
approaches, modern techniques rely heavily on statistical methods and deep learning. The shift towards data-driven approaches has significantly
improved performance but raised questions about language understanding versus statistical pattern recognition. A critical analysis reveals that while
current NLP systems excel at many practical applications, they still lack true language comprehension comparable to humans, highlighting the gap
between performance and genuine understanding.
Parsing Techniques
Context-Free Grammar (CFG): Formal grammar used to model the syntax of languages.
Dependency Parsing: Analyzing grammatical structure through word relationships.
Statistical Parsing: Using statistical models to determine the most likely parse.
             S
           /   \
         NP     VP
        /  \   /  \
       D    N  V   NP
       |    |  |  /  \
     The  cat ate D    N
                  |    |
                 the  food
Fig 1.2: Parse tree for the sentence "The cat ate the food"
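The parse tree above can be reproduced with a small context-free grammar. The sketch below uses the NLTK library; the grammar is a toy one written only for this sentence and is not taken from the text.

import nltk

# A toy CFG covering only "The cat ate the food" (illustrative, not a general English grammar)
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> D N
VP -> V NP
D -> 'The' | 'the'
N -> 'cat' | 'food'
V -> 'ate'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("The cat ate the food".split()):
    tree.pretty_print()   # prints a tree equivalent to Fig 1.2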
Sentence: "The quick brown fox jumps over the lazy dog."
POS Tags: [('The', 'DET'), ('quick', 'ADJ'), ('brown', 'ADJ'), ('fox', 'NOUN'), ('jumps', 'VERB'), ('over', 'ADP'), ('the', 'DET'), ('lazy', 'ADJ'), ('dog', 'NOUN')]
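A comparable tagging can be produced with NLTK's part-of-speech tagger mapped to the universal tagset. This is a sketch only: the download resource names can differ between NLTK versions, and the exact tags may vary slightly from those shown above.

import nltk
nltk.download('averaged_perceptron_tagger', quiet=True)  # tagger model (resource name may vary by NLTK version)
nltk.download('universal_tagset', quiet=True)             # mapping to DET/ADJ/NOUN/... tags

tokens = "The quick brown fox jumps over the lazy dog".split()
print(nltk.pos_tag(tokens, tagset='universal'))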
The field of syntactic processing has evolved from rule-based approaches to statistical and neural methods. While early systems relied heavily on
hand-crafted rules and formal grammars, modern approaches use dependency parsing and transformer-based models that learn syntactic
relationships from data. This shift has improved robustness but raises questions about the balance between linguistic theory and data-driven
approaches. An effective analysis would consider how different syntactic processing methods perform across languages with varying grammatical
structures and complexities.
Semantic Analysis
Word Sense Disambiguation: Determining which meaning of a word is used in a specific context.
Semantic Role Labeling: Identifying the relationship between words and their semantic roles (agent, patient, etc.).
Named Entity Recognition (NER): Identifying and classifying named entities (persons, organizations, locations, etc.).
Semantic Parsing: Converting natural language into a formal representation of its meaning.
Example (semantic role labeling) for the sentence "John gave Mary a book yesterday":
John - Agent
gave - Predicate
Mary - Recipient
a book - Theme
yesterday - Temporal
Lexical Semantics: Studying the meaning of words and the relationships between them.
Compositional Semantics: Understanding how meanings of individual words combine to form the meaning of sentences.
Statistical Semantics: Using statistical methods to analyze word co-occurrences and distributions.
Distributional Semantics: Representing word meanings based on their distribution in large text corpora.
Modern semantic analysis often relies on representing words and phrases as dense vectors in a high-dimensional space, where semantically similar words
have similar vector representations.
Word2Vec: Creates word embeddings by training neural networks to predict words from their context.
GloVe: Generates embeddings based on global word co-occurrence statistics.
Contextual Embeddings: Models like BERT and GPT create context-dependent representations.
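As a rough illustration, the gensim library can train Word2Vec embeddings on a corpus of tokenized sentences. The three-sentence corpus and hyperparameters below are made up purely for demonstration and are far too small to yield meaningful vectors.

from gensim.models import Word2Vec

corpus = [
    ["she", "deposited", "the", "cheque", "at", "the", "bank"],
    ["the", "bank", "approved", "the", "loan"],
    ["they", "sat", "on", "the", "river", "bank"],
]

# vector_size, window, and epochs are illustrative hyperparameter choices
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv["bank"][:5])                   # first few dimensions of the "bank" vector
print(model.wv.most_similar("bank", topn=3))  # nearest neighbours within this tiny corpus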
The word "bank" has different meanings in these sentences based on context.
Semantic analysis remains one of the most challenging aspects of NLP. While syntactic structures can be effectively analyzed with formal grammars,
meaning is more elusive and context-dependent. Modern approaches using distributional semantics and neural networks have made significant
progress, but still struggle with nuanced aspects like metonymy, metaphor, and figurative language. A critical evaluation would consider how well
current semantic models capture human-like understanding versus statistical pattern recognition. The emergence of large language models has
shown impressive capabilities in semantic tasks, yet questions remain about their true "understanding" versus sophisticated pattern matching.
Discourse Analysis
Discourse analysis examines how sentences connect and relate to each other within larger texts:
Cohesion: Linguistic devices that tie text together (pronouns, conjunctions, etc.).
Coherence: Logical connections between ideas that make text meaningful as a whole.
Discourse Structure: Hierarchical organization of text (e.g., introduction, arguments, conclusion).
Anaphora Resolution: Identifying what entities pronouns and other references point to.
Pragmatic Processing
Speech Acts: Analyzing what actions speakers perform through their utterances (promising, requesting, asserting).
Presuppositions: Assumptions embedded in statements that are taken for granted.
Conversational Implicatures: Meanings implied beyond what is literally stated.
Context and Common Ground: Shared knowledge between participants in a conversation.
Coreference Resolution: Determining when different expressions refer to the same entity.
Topic Modeling: Identifying the main themes or subjects in a document.
Rhetorical Structure Theory (RST): Analyzing how parts of text relate to each other hierarchically.
Dialogue Act Classification: Categorizing utterances based on their communicative function.
Example (anaphora resolution): "John called Mary on Friday. He wanted to ask her about the meeting." The system must recognize that "He" refers to "John" and "her" refers to "Mary".
Example (conversational implicature): A: "Are you coming to the party tonight?" B: "I have an exam tomorrow morning." While B doesn't explicitly say "no," the implicature is that they cannot attend due to the exam.
Discourse and pragmatic processing represent the frontier of NLP research, as they require systems to understand language beyond the sentence
level. The challenges include tracking entities across text, understanding implicit connections, and interpreting non-literal meanings. Traditional NLP
approaches struggled with these aspects, but transformer-based models have shown some ability to capture longer-range dependencies. However,
truly understanding pragmatics requires world knowledge and reasoning capabilities that remain difficult to implement. A critical analysis reveals that
while progress has been made in narrow domains, general-purpose discourse and pragmatic understanding still falls short of human capabilities,
particularly for nuanced social contexts, cultural references, and conversational dynamics.
2. Learning
2.1 Forms of Learning
Supervised Learning: Learning from labeled examples (input-output pairs). Goal: Predict outputs for new inputs. Examples: Classification, Regression.
Unsupervised Learning: Learning from unlabeled data. Goal: Find patterns or structure in data. Examples: Clustering, Dimensionality Reduction.
Reinforcement Learning: Learning through interaction with an environment. Goal: Learn optimal actions to maximize rewards. Examples: Game playing, Robot control.
Selecting the appropriate learning approach depends on several factors including data availability, problem complexity, and computational resources.
Supervised learning provides strong guarantees when high-quality labeled data is abundant, but becomes impractical when labeling is expensive or
scarce. Unsupervised learning offers insights into data structure without requiring labels but may not directly optimize for the desired task.
Reinforcement learning is powerful for sequential decision-making but often requires extensive interaction samples. A critical analysis would consider
the trade-offs between these approaches and how they might be combined (e.g., using unsupervised pre-training followed by supervised fine-tuning) to
leverage the strengths of each paradigm while mitigating their limitations.
2.2 Inductive Learning
The basic inductive learning setting: Training Examples → Learning Algorithm → Hypothesis/Model → Predictions on New Data
Task: Learn a concept for "will play tennis" based on weather conditions.
Final hypothesis: The features "High" humidity and "Weak" wind must be present to play tennis.
Inductive learning mirrors the scientific method, where specific observations lead to general theories. However, induction faces philosophical
challenges, such as the "problem of induction" raised by David Hume: past patterns may not necessarily predict future events. In machine learning, this
manifests as the generalization problem - how well will a model perform on unseen data? The No Free Lunch theorem suggests that no learning
algorithm can outperform others across all possible problems without incorporating domain-specific assumptions. This highlights the importance of
inductive bias in machine learning - the set of assumptions that allows a learner to generalize beyond the training data. A critical analysis would
consider how different inductive biases affect learning outcomes and how to select appropriate biases for specific domains.
2.3 Learning Decision Trees
Root Node: The starting point that represents the entire population.
Internal Nodes: Decision points where the data is split based on feature values.
Branches: Outcomes of decisions representing possible paths.
Leaf Nodes: Terminal nodes that provide the final classification or prediction.
               Outlook
             /    |    \
        Sunny  Overcast  Rain
         /         |        \
    Humidity      Yes       Wind
     /     \               /    \
   High   Normal       Strong   Weak
    |       |             |       |
    No     Yes            No     Yes
ID3 (Iterative Dichotomiser 3): Uses Information Gain based on entropy to select attributes.
C4.5: Extension of ID3 that handles continuous attributes and missing values.
CART (Classification and Regression Trees): Uses Gini impurity for classification or mean squared error for regression.
Entropy before split: Entropy(S) = -(9/14) log₂(9/14) - (5/14) log₂(5/14) ≈ 0.94
Weighted average entropy after splitting on Outlook: 5/14 × 0.97 + 4/14 × 0 + 5/14 × 0.97 ≈ 0.69
Information Gain(Outlook) = 0.94 - 0.69 ≈ 0.25
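These numbers can be checked with a few lines of Python, assuming the split counts behind them are the classic 14-example play-tennis dataset (9 Yes / 5 No overall; Sunny 2 Yes / 3 No, Overcast 4 Yes, Rain 3 Yes / 2 No after splitting on Outlook).

from math import log2
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

before = entropy(["Yes"] * 9 + ["No"] * 5)                 # ≈ 0.94
subsets = [["Yes"] * 2 + ["No"] * 3,                       # Sunny
           ["Yes"] * 4,                                    # Overcast
           ["Yes"] * 3 + ["No"] * 2]                       # Rain
after = sum(len(s) / 14 * entropy(s) for s in subsets)     # ≈ 0.69
print(round(before, 2), round(after, 2), round(before - after, 2))  # gain ≈ 0.25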
Decision trees represent a fundamental approach to machine learning that balances interpretability with predictive power. Their hierarchical structure
makes them particularly well-suited for problems where understanding the decision process is as important as the outcome itself, such as in medical
diagnosis or credit approval. However, individual decision trees often suffer from high variance and can overfit the training data. This limitation has led
to the development of ensemble methods like Random Forests and Gradient Boosted Trees, which aggregate multiple trees to improve performance
while sacrificing some interpretability. A critical analysis would consider how to balance the trade-off between interpretability and performance,
especially in domains where transparency is required for ethical or regulatory reasons. Additionally, one might examine how decision trees compare to
other interpretable models like rule-based systems or sparse linear models in terms of expressiveness and accuracy.
2.4 Explanation-Based Learning
Domain Theory: Prior knowledge about the domain represented as rules or constraints.
Explanation: Logical proof or reasoning that explains why an example belongs to a concept.
Generalization: Creating broader rules from specific explanations.
Operationalization: Converting explanations into efficiently executable rules.
Inputs (a training example and a domain theory) → Process (explanation, generalization, operationalization) → Output: an operational concept definition that covers the example and can be efficiently applied
EBL Algorithm
1 Explanation: Use domain theory to explain why the training example satisfies the target concept.
2 Generalization: Identify the general conditions under which the explanation holds.
3 Operationalization: Transform the generalized explanation into an efficient form that satisfies operationality criteria.
Domain Theory: A cup is a stable, liftable container that can hold liquids
Generalized Rule: Any object with a flat bottom, handle, concave shape, and impermeable material can be classified as a cup.
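The operationalized rule can be read as a simple executable test. In the sketch below, the attribute names (flat_bottom, handle, concave, impermeable) are illustrative stand-ins for the structural features named in the generalized rule.

def is_cup(obj):
    # An object is classified as a cup if it has all four structural features
    return all(obj.get(attr) for attr in ("flat_bottom", "handle", "concave", "impermeable"))

print(is_cup({"flat_bottom": True, "handle": True, "concave": True, "impermeable": True}))   # True
print(is_cup({"flat_bottom": True, "handle": False, "concave": True, "impermeable": True}))  # False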
Advantages:
Requires fewer examples than purely inductive learning
Incorporates domain knowledge
Produces understandable rules
Can learn from a single example
Disadvantages:
Requires comprehensive domain theories
Domain theory must be correct
May not generalize well if domain theory is incomplete
Computational complexity in explanation generation
Explanation-Based Learning represents an important bridge between purely inductive learning approaches and knowledge-based systems. While
modern machine learning has largely shifted toward data-driven approaches that require minimal prior knowledge, EBL's principles remain relevant for
domains with limited training data but rich domain theories. The integration of prior knowledge with learning systems has seen a resurgence in neuro-
symbolic approaches that combine neural networks with symbolic reasoning. A critical analysis might examine how EBL concepts could enhance
modern deep learning by incorporating domain knowledge to improve sample efficiency, interpretability, and robustness. Additionally, as explainable AI
(XAI) becomes increasingly important, EBL's focus on generating understandable explanations offers valuable insights for developing interpretable
learning systems.
2.5 Learning Using Relevance Information
Feature Relevance: Indicating which attributes are most important for classification or prediction.
Example Relevance: Identifying which training examples are most informative or representative.
Domain Constraints: Prior knowledge about relationships or constraints in the problem domain.
Relevance Feedback: User input about the relevance of results or predictions.
Consider a dataset for predicting customer churn with features: Age, Subscription Duration, Monthly Bill, and Customer Service Calls.
Based on this information gain analysis, "Customer Service Calls" is the most relevant feature for predicting churn, followed by "Subscription Duration".
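Feature relevance of this kind can be estimated automatically, for example with scikit-learn's mutual information scores. The miniature churn dataset below is synthetic and purely illustrative, so the printed scores only demonstrate the mechanism, not the analysis referred to above.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Columns: Age, Subscription Duration (months), Monthly Bill, Customer Service Calls (made-up values)
X = np.array([
    [25,  3, 40, 5], [42, 36, 70, 0], [31,  6, 55, 4], [55, 60, 90, 1],
    [29,  2, 35, 6], [47, 48, 80, 0], [38, 12, 60, 3], [50, 54, 85, 1],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = churned, 0 = stayed (made-up labels)

scores = mutual_info_classif(X, y, random_state=0)
for name, score in zip(["Age", "Subscription Duration", "Monthly Bill", "Customer Service Calls"], scores):
    print(f"{name}: {score:.3f}")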
Learning with relevance information represents an important middle ground between purely data-driven and purely knowledge-driven approaches. By
incorporating domain expertise about which features matter most, learners can achieve better performance with less data and computational
resources. However, identifying truly relevant features is itself a challenging problem. Methods like filters and wrappers make different trade-offs
between computational efficiency and the quality of feature selection. A more nuanced analysis would consider how relevance information might
introduce biases into the learning process if the assumed relevance does not match the actual importance in the data. Furthermore, in some domains,
complex interactions between features mean that individually irrelevant features may become highly relevant when considered in combination.
Modern approaches increasingly focus on automatically learning relevance, as seen in attention mechanisms in deep learning, which discover where
to focus computational resources during learning without explicit human guidance about feature relevance.
2.6 Neural Net Learning
Neurons: Basic processing units that receive inputs, apply an activation function, and produce outputs.
Weights: Parameters that determine the strength of connections between neurons.
Layers: Groups of neurons operating together.
Input Layer: Receives the initial data
Hidden Layers: Intermediate processing layers
Output Layer: Produces the final prediction or classification
Activation Functions: Non-linear functions that determine neuron output (e.g., Sigmoid, ReLU, Tanh).
Fig: A feedforward neural network with an input layer (x₁, x₂, x₃), one hidden layer (h₁, h₂, h₃, h₄), and an output layer (y₁, y₂).
1 Initialization: Set the network weights, typically to small random values.
2 Forward Propagation: Pass input data through the network to generate predictions.
3 Error Calculation: Compute the difference between predictions and actual targets.
4 Backward Propagation: Use backpropagation to compute gradients of the error with respect to each weight.
5 Weight Update: Adjust weights using an optimization algorithm (e.g., gradient descent).
Loss Function: Measure of prediction error (e.g., Mean Squared Error, Cross-Entropy)
Learning Rate: Controls the size of weight updates during training
Backpropagation: Algorithm for computing gradients of the error with respect to weights
Epoch: One complete pass through the entire training dataset
Batch Learning: Updating weights after processing a group of training examples (the entire dataset in full-batch mode, or a smaller subset in mini-batch mode)
Feedforward Neural Networks: Information flows in one direction; no loops or cycles; used for classification and regression.
Convolutional Neural Networks (CNN): Specialized for processing grid-like data; use convolutional filters; common in image processing.
Recurrent Neural Networks (RNN): Contain feedback connections; maintain internal state/memory; used for sequential data.
The XOR function is a classic example that demonstrates the need for hidden layers in neural networks.
x₁  x₂  x₁ XOR x₂
0   0   0
0   1   1
1   0   1
1   1   0
A single-layer perceptron cannot learn this pattern because XOR is not linearly separable. A network with at least one hidden layer is required to solve
this problem.
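A minimal NumPy sketch of this idea is shown below. The architecture (two inputs, four hidden sigmoid units, one output), learning rate, and epoch count are illustrative choices rather than values from the text, and the training loop follows the five steps listed earlier.

import numpy as np

rng = np.random.default_rng(0)

# XOR truth table from the example above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: random weight initialization
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(10000):
    h = sigmoid(X @ W1 + b1)              # Step 2: forward propagation
    out = sigmoid(h @ W2 + b2)
    err = out - y                         # Step 3: error calculation
    d_out = err * out * (1 - out)         # Step 4: backpropagation of gradients
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)              # Step 5: gradient-descent weight update
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should approach [[0], [1], [1], [0]]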
Neural networks have undergone a remarkable transformation from theoretical models with limited practical applications to the driving force behind
many recent AI advances. This resurgence, often called the "deep learning revolution," can be attributed to several factors: increased computational
power, larger datasets, algorithmic improvements, and architectural innovations. However, neural network learning still faces significant challenges,
including interpretability concerns, data requirements, and optimization difficulties. Traditional neural networks are often viewed as "black boxes,"
making it difficult to understand their decision-making process, which poses challenges in critical applications like healthcare and autonomous
vehicles. Emerging research in explainable AI attempts to address these concerns by developing methods to interpret network decisions. Additionally,
while deep neural networks have achieved impressive results on many tasks, they can struggle with causality, common sense reasoning, and out-of-
distribution generalization. A critical analysis would consider how neural networks might be combined with other AI approaches, such as symbolic
reasoning, to address these limitations while building on their strengths in pattern recognition and representation learning.
2.7 Genetic Learning
1 Initialization: Generate an initial population of candidate solutions (chromosomes), typically at random.
2 Fitness Evaluation: Evaluate each individual with a fitness function.
3 Selection: Select individuals for reproduction (higher fitness individuals have higher probability).
4 Crossover: Combine genetic material from pairs of parents to create offspring.
5 Mutation: Randomly alter parts of some offspring to maintain diversity.
6 Replacement: Form a new population by replacing some or all of the original population.
Problem: Find the shortest route that visits all cities exactly once and returns to the starting city.
Representation: A chromosome is a permutation of city indices (e.g., [3, 1, 4, 2, 5] represents a specific route).
Crossover: Order Crossover (OX) - preserves relative order of cities from both parents.
Mutation: Swap Mutation - randomly swap positions of two cities in the route.
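A compact, illustrative genetic algorithm for this formulation is sketched below. The city coordinates, population size, mutation rate, and generation count are made-up values chosen only to keep the example small.

import random

random.seed(1)
cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3)]   # hypothetical coordinates

def tour_length(tour):
    # Length of the closed route that returns to the starting city
    total = 0.0
    for i in range(len(tour)):
        (x1, y1), (x2, y2) = cities[tour[i]], cities[tour[(i + 1) % len(tour)]]
        total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total

def order_crossover(p1, p2):
    # OX: copy a slice from parent 1, fill the remaining cities in parent 2's order
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    fill = [c for c in p2 if c not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(tour, rate=0.2):
    # Randomly swap two cities with probability `rate`
    tour = tour[:]
    if random.random() < rate:
        i, j = random.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

# Initialization: a population of random permutations of city indices
population = [random.sample(range(len(cities)), len(cities)) for _ in range(20)]

for generation in range(100):
    population.sort(key=tour_length)          # fitness evaluation: shorter tours are fitter
    parents = population[:10]                 # selection: keep the better half
    children = []
    while len(children) < 10:                 # crossover + mutation refill the population
        p1, p2 = random.sample(parents, 2)
        children.append(swap_mutation(order_crossover(p1, p2)))
    population = parents + children           # replacement

best = min(population, key=tour_length)
print(best, round(tour_length(best), 2))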
Genetic learning approaches offer several unique advantages compared to traditional optimization methods. They can explore non-convex,
discontinuous, and multi-modal search spaces that challenge gradient-based methods. The population-based approach allows for implicit parallelism
and maintenance of solution diversity. Furthermore, genetic algorithms have demonstrated remarkable versatility across domains from engineering
design to scheduling problems. However, their effectiveness often depends heavily on implementation details like representation choice, operator
design, and parameter settings. A critical analysis would consider when evolutionary approaches are most appropriate compared to alternatives. For
problems with well-defined gradients and smooth landscapes, traditional optimization methods may be more efficient. Genetic learning shines when
problems involve complex constraints, multiple competing objectives, or deceptive landscapes with many local optima. Modern research increasingly
explores hybrid approaches that combine evolutionary methods with other techniques, such as using local search to refine solutions found by genetic
algorithms or incorporating domain knowledge to guide the evolutionary process.
3. Expert Systems
3.1 Representing and Using Domain Knowledge
Production Rules: IF-THEN production rules (condition-action pairs); easy to understand and modify.
Frames: Structured objects with slots, default values, and inheritance; good for stereotypical situations.
Semantic Networks: Nodes representing concepts and edges representing relationships; provide a visual representation of relationships.
Logic-Based Representation: Using formal logic (propositional, first-order) to represent knowledge and derive conclusions.
Ontologies: Formal representations of knowledge with classes, properties, and relationships within a domain.
Case-Based Representation: Storing specific problem instances and their solutions for future reference.
Probabilistic Models: Representing uncertain knowledge using Bayesian networks or other probabilistic frameworks.
Inference Mechanisms
Once knowledge is represented, expert systems need mechanisms to reason with this knowledge:
IF temperature > 101°F AND cough AND sore throat THEN possible_flu (CF = 0.7)
IF possible_flu AND body_aches THEN flu (CF = 0.9)
IF temperature > 101°F AND earache THEN possible_ear_infection (CF = 0.6)
IF possible_ear_infection AND ear_discharge THEN ear_infection (CF = 0.8)
Facts: temperature = 102°F, cough = true, sore throat = true, body_aches = true
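A stripped-down forward-chaining pass over these rules can be written in a few lines of Python. This sketch ignores the certainty factors and simply fires any rule whose conditions are satisfied until no new conclusions appear.

facts = {"temperature": 102, "cough": True, "sore_throat": True, "body_aches": True}
derived = set()

rules = [
    (lambda f, d: f["temperature"] > 101 and f.get("cough") and f.get("sore_throat"), "possible_flu"),
    (lambda f, d: "possible_flu" in d and f.get("body_aches"), "flu"),
    (lambda f, d: f["temperature"] > 101 and f.get("earache"), "possible_ear_infection"),
    (lambda f, d: "possible_ear_infection" in d and f.get("ear_discharge"), "ear_infection"),
]

changed = True
while changed:                        # keep firing rules until nothing new is derived
    changed = False
    for condition, conclusion in rules:
        if conclusion not in derived and condition(facts, derived):
            derived.add(conclusion)
            changed = True

print(derived)   # expected: {'possible_flu', 'flu'}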
The choice of knowledge representation formalism involves significant trade-offs that impact system performance, maintainability, and capabilities.
Rule-based systems offer simplicity and modularity but may become unwieldy as the rule set grows, leading to challenges in maintaining consistency
and detecting conflicts. Semantic networks and frames provide more structured representations that can capture hierarchical relationships efficiently
but may struggle with procedural knowledge. Logic-based representations offer formal preciseness and the ability to prove properties but often face
computational complexity issues with large knowledge bases. A critical analysis reveals that no single representation is universally superior; the
optimal choice depends on domain characteristics, reasoning requirements, and development constraints. Modern expert systems increasingly adopt
hybrid approaches, combining multiple representation formalisms to leverage their complementary strengths. Furthermore, the rise of machine
learning has led to interesting research on integrating statistical approaches with symbolic knowledge representation, attempting to combine the
flexibility and learning capacity of statistical models with the transparency and explicability of symbolic representations.
3.2 Expert System Shells
Inference Engine: Applies reasoning methods to draw conclusions from the knowledge base.
Knowledge Base Editor: Tools for creating and modifying the domain knowledge.
Explanation Facility: Provides explanations of how conclusions were reached.
User Interface: Interface for interacting with the expert system.
Knowledge Acquisition Module: Assists in gathering and organizing domain knowledge.
Fig: Typical expert system shell architecture.
Domain-Specific Knowledge: added by domain experts for each application.
Inference Engine: forward/backward chaining, conflict resolution, uncertainty handling.
User Interface: input/output processing, query handling, results presentation.
CLIPS (C Language Integrated Production System): Rule-based programming environment developed by NASA.
JESS (Java Expert System Shell): Rule engine and scripting environment written in Java.
Drools: Business rule management system with forward and backward chaining.
Prolog-based shells: Leverage Prolog's logic programming capabilities.
PyKE (Python Knowledge Engine): Knowledge-based inference engine in Python.
Expert system shells emerged during the heyday of symbolic AI but their relevance has evolved with the changing AI landscape. While traditional rule-
based expert systems have been overshadowed by machine learning approaches in many domains, the core principles of separating domain
knowledge from reasoning mechanisms remain valuable. Modern expert system shells have adapted by incorporating probabilistic reasoning,
integrating with machine learning components, and providing better interoperability with contemporary software ecosystems. A critical analysis
reveals that expert system shells still offer unique advantages in domains requiring transparent reasoning and explicit knowledge representation, such
as regulatory compliance, medical diagnosis, and configuration systems. However, their limitations in handling uncertainty, scaling to very large
knowledge bases, and adapting to changing environments have restricted their applicability. The future of expert system shells likely lies in hybrid
systems that combine the explainability and knowledge representation capabilities of traditional expert systems with the adaptability and pattern
recognition strengths of machine learning approaches.
3.3 Knowledge Acquisition
Sources of Knowledge
Domain knowledge is typically gathered from human experts, technical documentation, databases, and records of previously solved cases.
Challenges in Knowledge Acquisition
Knowledge Articulation Problem: Experts often cannot easily articulate their tacit knowledge.
Expert Availability: Domain experts typically have limited time available.
Knowledge Inconsistencies: Different experts may provide contradicting information.
Communication Gap: Knowledge engineers may lack domain understanding.
Knowledge Evolution: Domain knowledge continuously changes and evolves.
Verification: Ensuring the acquired knowledge is accurate and complete.
Raw Protocol:
"First, I check the patient's vital signs... temperature of 102°F is concerning. Then I look at respiratory rate—it's elevated at 24 breaths per minute. The patient complains
of cough and chest pain—that's typical for respiratory infections. Now I listen to the lungs... there's crackles in the lower right lobe. That's highly suspicious for
pneumonia. I'd order a chest X-ray to confirm, but based on these findings, I'm already thinking bacterial pneumonia. If the white blood cell count is elevated, that would
further support this diagnosis."
Extracted Rules:
IF temperature > 101°F AND elevated respiratory rate THEN consider respiratory infection
IF respiratory infection AND cough AND chest pain THEN suspect pneumonia
IF suspect pneumonia AND lung crackles THEN high probability of pneumonia
IF high probability of pneumonia AND elevated WBC THEN bacterial pneumonia
Knowledge acquisition has long been recognized as the primary bottleneck in expert system development, often requiring more resources than any
other development phase. This challenge arises from the fundamental difficulty of converting human expertise—which is often tacit, intuitive, and
context-dependent—into explicit, formalized representations. While various techniques have been developed to address this challenge, each has
limitations: interviews depend heavily on the expert's ability to articulate their knowledge; observation may miss critical cognitive processes; and
automated approaches may extract superficial patterns without capturing deeper principles. A critical analysis reveals that successful knowledge
acquisition typically requires combining multiple complementary methods and establishing an iterative process of extraction, formalization, validation,
and refinement. Modern approaches increasingly leverage machine learning to support knowledge acquisition, either by pre-processing large volumes
of domain information or by learning initial patterns that human experts can refine. However, even with these advances, the integration of human
expertise remains essential for developing robust expert systems, particularly in domains requiring nuanced judgment, ethical considerations, or
handling of exceptional cases.
References and Further Reading
Machine Learning
Mitchell, T. M. (1997). Machine Learning. McGraw Hill.
Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Expert Systems
Giarratano, J. C., & Riley, G. D. (2004). Expert Systems: Principles and Programming (4th ed.). Course Technology.
Jackson, P. (1998). Introduction to Expert Systems (3rd ed.). Addison Wesley.
Luger, G. F. (2008). Artificial Intelligence: Structures and Strategies for Complex Problem Solving (6th ed.). Addison-Wesley.
Online Resources
Stanford NLP Group: https://nlp.stanford.edu/
Association for Computational Linguistics: https://www.aclweb.org/
Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
CLIPS Documentation: https://www.clipsrules.net/