
Introduction to Natural Language Processing & Applications

Huan Vu, Faculty of DS&AI, NEU


Week 1 - Foundations and Modern Approaches
A comprehensive exploration of how computers understand, interpret, and generate human language — and why it matters for your career in data science.
Course Agenda
1 Fundamentals & Evolution: Understanding NLP's journey from rule-based systems to modern transformers
2 Real-World Applications: Exploring how NLP powers technologies we use daily
3 Practical Tools & Implementation: Hands-on experience with Python libraries that make NLP accessible
4 Building Your First NLP Pipeline: From text preprocessing to implementing your first NLP model
What is Natural Language Processing?
NLP is the branch of artificial intelligence focused on giving computers the ability to understand, interpret, and generate human language in a way that is both meaningful and useful. Its core capabilities span text understanding, contextual interpretation, and language generation.

It sits at the intersection of:

• Linguistics
• Computer Science
• Artificial Intelligence
• Cognitive Science
Why NLP is Challenging
Ambiguity
"I saw a man on a hill with a telescope."
• Who has the telescope?
• Is the telescope being used to see the man?
• Multiple valid interpretations

Context Dependency
"The bank is closed."
• Financial institution?
• River bank?
• Meaning depends on context
• Cultural references

Structural Complexity
Languages have complex grammars and exceptions.
• Irregular verbs
• Nested clauses
Real-World NLP Applications
NLP powers many technologies we interact with daily, often without realizing it.

Conversational AI
Virtual assistants (Siri, Alexa), customer service chatbots, and interactive voice response systems that understand and respond to human requests.

Sentiment Analysis
Tools that analyze customer reviews, social media posts, and survey responses to determine positive, negative, or neutral sentiment.

Machine Translation
Systems like Google Translate that convert text from one language to another while preserving meaning and context.
More NLP Applications

Information Retrieval
Search engines that understand queries in natural language and return relevant results, even with spelling errors or synonyms.

Text Summarization
Tools that condense long documents into shorter versions while retaining key information and main points.

Content Generation
Systems that create human-like text for emails, reports, articles, and more based on prompts or templates.

Healthcare NLP
Applications that extract information from medical records, identify trends in patient data, and assist with clinical documentation.
The Evolution of NLP
1950s-1980s: Rule-Based Era
Hand-crafted linguistic rules and dictionaries
• ELIZA (1966) - early chatbot using pattern matching
• Focus on syntax and grammar rules
• Limited by rigid structures

1980s-2010s: Statistical Era
Probability and machine learning
• Hidden Markov Models
• Conditional Random Fields
• N-gram language models

2010-2017: Neural Era
Deep learning revolution
• Word2Vec embeddings (2013)
• Recurrent Neural Networks
• Sequence-to-sequence models

2017-Present: Transformer Era
Attention mechanisms
• BERT, GPT, T5
• Few-shot and zero-shot learning
• Multimodal models (text + vision)
Rule-Based NLP (1950s-1980s)
Key Characteristics

• Hand-crafted linguistic rules
• Pattern matching
• Dictionaries and thesauri
• Syntax parsing based on grammar rules

Limitations

• Couldn't handle exceptions well
• Required extensive linguistic expertise
• Difficult to maintain and scale
• Struggled with ambiguity

ELIZA (1966)

One of the earliest NLP systems, ELIZA simulated conversation by pattern matching and substitution. It could mimic a psychotherapist by turning statements into questions:

Human: "I am feeling sad."
ELIZA: "Why do you feel sad?"

Despite its simplicity, ELIZA created a surprising illusion of understanding.
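
To make the pattern-matching idea concrete, here is a toy ELIZA-style responder in Python. It is a sketch: the rules and the respond function are invented for illustration, not taken from ELIZA's actual script.

import re

# Each rule pairs a regex pattern with a response template that
# reuses the captured text (illustrative rules only).
RULES = [
    (re.compile(r"\bi am feeling (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
]

def respond(utterance):
    """Return the first matching rule's response, or a generic fallback."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return "Please go on."

print(respond("I am feeling sad."))     # Why do you feel sad?
print(respond("I need a break."))       # Why do you need a break?
print(respond("The weather is nice."))  # Please go on.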
Statistical NLP (1980s-2010s)
Statistical approaches shifted focus from rigid rules to probabilities derived from large text corpora.

Key Technologies

• N-gram language models
• Hidden Markov Models (HMMs)
• Maximum Entropy Models
• Conditional Random Fields (CRFs)
• Support Vector Machines (SVMs)

Advantages

• Data-driven rather than rule-based
• Better handling of ambiguity
• Could learn from examples
• More robust to unexpected inputs
• Captured patterns humans might miss

Limitations

• Limited by feature engineering
• Struggled with long-term dependencies
• Required large amounts of training data
• Poor semantic understanding
• Context window limitations
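
To see what "probabilities derived from large text corpora" means in practice, here is a toy bigram language model that estimates P(word | previous word) by counting adjacent pairs. The corpus and function names are invented for the example.

from collections import Counter, defaultdict

# A tiny, illustrative corpus (real systems use millions of sentences)
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def prob(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(prob("cat", "the"))  # 0.25: "the" is followed by cat/mat/dog/rug
print(prob("sat", "cat"))  # 1.0: "cat" is always followed by "sat"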
Neural NLP (2010-2017)
The neural revolution began with word embeddings that captured semantic relationships between words as vectors in a high-dimensional space.

Word2Vec (2013): Represented words as dense vectors where similar words appear close together. The famous example: king - man + woman ≈ queen.

Neural Architectures

• Recurrent Neural Networks (RNNs)
• Long Short-Term Memory (LSTM)
• Gated Recurrent Units (GRU)
• Sequence-to-sequence models

Major Improvements

• Better handling of semantics
• Improved language generation
• Ability to capture longer dependencies
• Reduced need for feature engineering
• Transfer learning capabilities

Limitations

• Vanishing gradient problem in long sequences
• Sequential processing (slow)
• Limited context window
• High computational requirements
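
You can reproduce the king - man + woman analogy with pretrained embeddings. A minimal sketch using gensim, assuming the library is installed; it uses GloVe vectors as a lightweight stand-in for the original Word2Vec ones, downloaded on first use.

import gensim.downloader as api

# Pretrained 100-dimensional GloVe vectors (sizeable one-time download)
vectors = api.load("glove-wiki-gigaword-100")

# king - man + woman -> nearest neighbours in embedding space
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" is typically the top result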
Transformer Revolution (2017-Present)
The paper "Attention Is All You Need" (2017) introduced the Transformer architecture, fundamentally changing NLP.

1 Attention Mechanism
Unlike RNNs, Transformers process entire sequences at once through self-attention, weighing the importance of each word relative to all others. This parallelization enables training on massive datasets.

2 Pretraining & Fine-tuning
Models are first pretrained on vast amounts of text (billions of words) and then fine-tuned for specific tasks with much smaller datasets. This transfer learning approach dramatically improved performance across all NLP tasks.

3 Massive Scale
Transformer models have grown from BERT's 340M parameters to GPT-4's reported trillion+ parameters, capturing increasingly subtle patterns in language and demonstrating emergent capabilities not explicitly trained for.
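
To show what self-attention actually computes, here is scaled dot-product attention in plain NumPy, with random toy matrices standing in for the learned query/key/value projections.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_k = 4, 8                  # 4 tokens, 8-dimensional queries/keys
rng = np.random.default_rng(0)
Q = rng.normal(size=(seq_len, d_k))  # queries
K = rng.normal(size=(seq_len, d_k))  # keys
V = rng.normal(size=(seq_len, d_k))  # values

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
scores = Q @ K.T / np.sqrt(d_k)   # similarity of every token to every other
weights = softmax(scores)         # each row sums to 1
output = weights @ V              # each token: weighted mix of all values

print(weights.round(2))  # row i = token i's attention over the whole sequence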
Key Transformer Models

2018: BERT
Bidirectional Encoder Representations from Transformers, by Google. Revolutionized NLP by considering context from both directions. Excels at understanding tasks like classification and named entity recognition.

2019: GPT-2
Generative Pretrained Transformer 2, by OpenAI. Auto-regressive model trained to predict next words. Notable for high-quality text generation capabilities that raised ethical concerns.

2020: T5
Text-to-Text Transfer Transformer, by Google. Unified all NLP tasks into a text-to-text format. Demonstrated how a single model architecture could handle multiple tasks with state-of-the-art results.

2022+: Modern LLMs
GPT-4, Claude, Llama 2, etc. Exhibit emergent abilities like reasoning, code generation, and multi-step problem solving not present in smaller models.
The NLP Pipeline
Despite advances in end-to-end learning, most NLP applications still follow a structured pipeline.

Text Acquisition
Gathering raw text from sources like websites, documents, databases, or APIs

Preprocessing
Cleaning text, handling encoding issues, removing HTML tags, normalizing text

Tokenization
Breaking text into words, subwords, characters, or other meaningful units

Feature Extraction
Converting tokens to numerical representations (embeddings, TF-IDF, etc.)

Modeling
Applying algorithms to perform specific NLP tasks
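
A minimal end-to-end sketch of these five stages, assuming scikit-learn is installed; the toy review dataset and labels are purely illustrative.

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 1. Text acquisition (here: a hard-coded toy corpus of labeled reviews)
texts = ["Great product, works well!", "Terrible, broke after a day.",
         "Really love it, highly recommend.", "Awful experience, do not buy."]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# 2. Preprocessing: lowercase, keep only letters and spaces
clean = [re.sub(r"[^a-z\s]", "", t.lower()) for t in texts]

# 3-4. Tokenization + feature extraction: TfidfVectorizer does both
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(clean)

# 5. Modeling: a simple classifier over the TF-IDF features
model = LogisticRegression().fit(X, labels)
print(model.predict(vectorizer.transform(["love this great product"])))  # expect [1]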
Essential NLP Tasks

Tokenization
Breaking text into tokens (words, subwords, characters)

Part-of-Speech Tagging
Identifying word types (noun, verb, adjective, etc.)

Named Entity Recognition
Finding and classifying named entities (person, organization, location)

Dependency Parsing
Analyzing grammatical structure and word relationships

Sentiment Analysis
Determining emotional tone (positive, negative, neutral)

Machine Translation
Converting text between languages
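
NER and dependency parsing are demonstrated with spaCy later in this deck; as a quick taste here, part-of-speech tagging with NLTK. This is a sketch that assumes the tokenizer and tagger resources download successfully (resource names can vary across NLTK versions).

import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("The striped cats are sleeping on the mat")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('striped', 'JJ'), ('cats', 'NNS'), ('are', 'VBP'), ...]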
Course Tools: Python for NLP
Why Python?

• Clear, readable syntax
• Rich ecosystem of NLP libraries
• Strong academic and industry adoption
• Excellent for prototyping and production
• Extensive documentation and community support

Python has become the de facto standard for NLP and machine learning work, with an estimated 70% of practitioners using it as their primary language.

Setting Up Your Environment

We recommend using Anaconda for this course. Create a dedicated environment with:

conda create -n nlp_course python=3.10
conda activate nlp_course
pip install nltk spacy transformers torch datasets
python -m spacy download en_core_web_sm  # English model used in the spaCy examples
NLTK: Natural Language Toolkit
Overview

NLTK is one of the oldest and most comprehensive Python libraries for NLP, developed primarily for education and research.

Key Features

• Extensive corpus access (50+ corpora and lexical resources)
• Complete text processing pipeline tools
• Support for classification, tokenization, stemming, tagging, and parsing
• Detailed documentation and accompanying book

Sample Code

import nltk
nltk.download('punkt')
nltk.download('wordnet')

from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

text = "The quick brown foxes jumped over the lazy dogs"
tokens = word_tokenize(text)
print(tokens)

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(token) for token in tokens]
print(lemmas)  # 'foxes' -> 'fox', 'dogs' -> 'dog'
spaCy: Industrial-Strength NLP
Overview

spaCy is designed for production use, focusing on efficiency and ease of use.

Key Features

• Built for speed and production environments
• Pre-trained models for multiple languages
• End-to-end pipeline with a single API
• Integrated with deep learning frameworks
• Visualization tools

Sample Code

import spacy

# Load the small English model
nlp = spacy.load("en_core_web_sm")

# Process text
doc = nlp("Apple is looking to buy U.K. startup for $1 billion")

# Named Entity Recognition
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")

# Dependency parsing
for token in doc:
    print(f"{token.text}: {token.dep_} -> {token.head.text}")
Hugging Face: Transformers Made Easy
Overview

Hugging Face has become the central hub for state-of-the-art NLP models and tools.

Key Components

• Transformers: Library for pre-trained models
• Datasets: Standardized access to NLP datasets
• Tokenizers: Fast tokenization implementations
• Model Hub: Community platform for sharing models
• Spaces: Interactive demos for models

Sample Code

from transformers import pipeline

# Sentiment analysis
sentiment_analyzer = pipeline("sentiment-analysis")
result = sentiment_analyzer("I love this course, it's amazing!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation")
text = generator("Natural language processing is", max_length=30)
print(text[0]['generated_text'])
Course Project Preview: Building an NLP Pipeline
Throughout the course, you'll build a complete NLP system piece by piece, applying what you learn each week.

1 Week 1-2: Data Collection & Preprocessing
Gather text data from various sources and build a robust preprocessing pipeline including cleaning, normalization, and tokenization.

2 Week 3-4: Feature Engineering
Implement different text representation techniques, from TF-IDF to modern embeddings, and analyze their effectiveness.

3 Week 5-7: Model Development
Train and evaluate models for specific NLP tasks, starting with classical approaches and progressing to transformer-based solutions.

4 Week 8-10: Integration & Deployment
Combine components into a complete application solving a real-world NLP problem and prepare it for deployment.
Key Takeaways

NLP is Transforming Industries
From healthcare to customer service, NLP is fundamentally changing how businesses operate and how humans interact with technology.

Rapid Evolution
The field has progressed from simple rule-based systems to sophisticated transformer models in just a few decades, with the pace of innovation accelerating.

Accessibility
Modern tools and libraries have democratized NLP, making powerful techniques accessible to developers without specialized linguistics knowledge.

Practical Skills Matter
This course will equip you with both theoretical understanding and hands-on experience using industry-standard tools like NLTK, spaCy, and Hugging Face.

Next week: We'll dive into text preprocessing techniques and build our first NLP components!
