About this ebook
NLP is a large and multidisciplinary field, so this course can only provide a very general introduction. The first chapter is designed to give an overview of the main subareas and a very brief idea of the main applications and the methodologies which have been employed. The history of NLP is briefly discussed as a way of putting this into perspective. The next three chapters describe some of the main subareas in more detail. The organisation is roughly based on increased 'depth' of processing, starting with relatively surface-oriented techniques and progressing to considering meaning of sentences and meaning of utterances in context. Each chapter will consider the subarea as a whole and then go on to describe one or more sample algorithms which tackle particular problems. The algorithms have been chosen because they are relatively straightforward to describe and because they illustrate a specific technique which has been shown to be useful, but the idea is to exemplify an approach, not to give a detailed survey (which would be impossible in the time available). However, other approaches will sometimes be discussed briefly. The final chapter brings the preceding material together in order to describe the state of the art in sample applications.
By the end of this book, students should:
1. be able to describe the architecture of and basic design for a generic NLP system 'shell'.
2. be able to discuss the current and likely future performance of several NLP applications, such as machine translation and email response.
3. be able to briefly describe a fundamental technique for processing language for several subtasks, such as morphological analysis, syntactic parsing, word sense disambiguation etc.
4. understand how these techniques draw on and relate to other areas of (theoretical) computer science, such as formal language theory, formal semantics of programming languages, or theorem proving.
Ajit Singh
Assistant Professor, Patna Women's College, Bihar, India. More than 20 years of solid teaching experience in undergraduate and postgraduate computer science courses at several colleges of Patna University and NIT Patna, Bihar, India. Memberships: 1. Internet Society (2168607) - Japan/France/Delhi/Trivendrum chapters 2. IEEE (95539159) 3. International Association of Engineers (IAENG-233408) 4. Eurasia Research STRA-M19371 5. ORCID https://orcid.org/0000-0002-6093-3457 6. Python Software Foundation 7. Data Science Association 8. Nonfiction Authors Association (NFAA-21979)
Natural Language Processing - Ajit Singh
Copyrighted Material
Natural Language Processing
Copyright © 2019 by Ajit Singh. All Rights Reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means electronic, mechanical, photocopying, recording or otherwise without prior written permission from the author, except for the inclusion of brief quotations in a review.
For information about this title or to order other books and/or electronic media, contact the publisher:
Ajit Singh
http://www.ajitvoice.in
Published by Ajit Singh at Smashwords.
Library of Congress Control Number: (N/A)
ISBN: A/F
Cover and Interior design: Ajit Singh.
Smashwords Edition, License Notes
This ebook is licensed for your personal enjoyment only. This ebook may not be re-sold or given away to other people. If you would like to share this book with another person, please purchase an additional copy for each recipient. If you’re reading this book and did not purchase it, or it was not purchased for your use only, then please return to your favorite ebook retailer and purchase your own copy. Thank you for respecting the hard work of this author.
Preface
NLP is a large and multidisciplinary field, so this book can only provide a very general introduction. The first chapter is designed to give an overview of the main subareas and a very brief idea of the main applications and the methodologies which have been employed. The history of NLP is briefly discussed as a way of putting this into perspective. The next three chapters describe some of the main subareas in more detail. The organization is based on increased 'depth' of processing, starting with relatively surface-oriented techniques and progressing to considering meaning of sentences and meaning of utterances in context. Each chapter will consider the subarea as a whole and then go on to describe one or more sample algorithms which tackle particular problems. The algorithms have been chosen because they are relatively straightforward to describe and because they illustrate a specific technique which has been shown to be useful, but the idea is to exemplify an approach, not to give a detailed survey (which would be impossible in the time available). However, other approaches will sometimes be discussed briefly. The final chapter brings the preceding material together in order to describe the state of the art in sample applications.
This book aims to introduce the fundamental techniques of natural language processing, to develop an understanding of the limits of those techniques and of current research issues, and to evaluate some current and potential applications.
Objectives
By the end of this book, students should:
Be able to describe the architecture of and basic design for a generic NLP system 'shell'.
Be able to discuss the current and likely future performance of several NLP applications, such as machine translation and email response.
Be able to briefly describe a fundamental technique for processing language for several subtasks, such as morphological analysis, syntactic parsing, word sense disambiguation etc.
Understand how these techniques draw on and relate to other areas of (theoretical) computer science, such as formal language theory, formal semantics of programming languages, or theorem proving.
Key Features
Discusses the main problems involved in language processing by means of examples taken from NLP applications, introduces some methodological distinctions, and puts the applications and methodology into some historical context.
Discusses morphology, concentrating mainly on English morphology. The concept of a lexicon in an NLP system is discussed with respect to morphological processing. Spelling rules are introduced and the use of finite state transducers to implement spelling rules is explained.
Introduces some simple statistical techniques and illustrates their use in NLP for prediction of words and part-of-speech categories. It starts with a discussion of corpora, and then introduces word prediction. Word prediction can be seen as a way of (crudely) modeling some syntactic information (i.e., word order).
NLP with Python
DIY Corpus
Chapter 1
Introduction to NLP
People communicate in many different ways: through speaking and listening, making gestures, using specialized hand signals (such as when driving or directing traffic), using sign languages for the deaf, or through various forms of text.
By text we mean words that are written or printed on a flat surface (paper, card, street signs and so on) or displayed on a screen or electronic device in order to be read by their intended recipient (or by whoever happens to be passing by).
This book will focus only on the last of these: we will be concerned with various ways in which computer systems can analyze and interpret texts, and we will assume for convenience that these texts are presented in an electronic format. This is of course quite a reasonable assumption, given the huge amount of text we can access via the World Wide Web and the increasing availability of electronic versions of newspapers, novels, textbooks and indeed subject guides. This chapter introduces some essential concepts, techniques and terminology that will be applied in the rest of the course. Some material in this chapter is a little technical but no programming is involved at this stage.
We will begin by considering texts as strings of characters which can be broken up into sub-strings, and introduce some techniques for informally describing patterns of various kinds that occur in texts. Subsequently we will begin to motivate the analysis of texts in terms of hierarchical structures in which elements of various kinds can be embedded within each other, in a comparable way to the elements that make up an HTML web document. This section introduces some technical machinery such as: finite-state machines (FSMs), regular expressions, regular grammars and context-free grammars.
Basic concepts
Tokenized text and Pattern matching
One of the more basic operations that can be applied to a text is tokenizing: breaking up a stream of characters into words, punctuation marks, numbers and other discrete items. So for example the character string
“Dr. Watson, Mr. Sherlock Holmes”, said Stamford, introducing us.
can be tokenized as in the following example, where each token is enclosed in single quotation marks:
'“' 'Dr.' 'Watson' ',' 'Mr.' 'Sherlock' 'Holmes' '”' ',' 'said' 'Stamford' ',' 'introducing' 'us' '.'
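The tokenization shown above can be approximated with a single regular expression. The following is only a minimal sketch, not a general-purpose tokenizer: the names `ABBREVIATIONS`, `TOKEN_PATTERN` and `tokenize` are hypothetical, and the short abbreviation list is an assumption made so that titles such as Dr. and Mr. keep their full stop while the sentence-final full stop becomes a token of its own.

```python
import re

# Hypothetical, deliberately tiny list of abbreviations that keep their full stop.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.", "Prof.")

# Alternatives are tried left to right, so abbreviations are matched first.
TOKEN_PATTERN = re.compile(
    "|".join(re.escape(abbr) for abbr in ABBREVIATIONS)
    + r"|\w+"      # a run of letters, digits or underscores
    + r"|[^\w\s]"  # any single character that is neither word nor whitespace
)


def tokenize(text):
    """Return the word, number and punctuation tokens of text, in order."""
    return TOKEN_PATTERN.findall(text)


# \u201c and \u201d are the opening and closing double quotation marks.
sentence = "\u201cDr. Watson, Mr. Sherlock Holmes\u201d, said Stamford, introducing us."
print(tokenize(sentence))
```

Whitespace is simply skipped, because none of the alternatives matches it. A real tokenizer would need a much longer abbreviation list, or a statistical model, to decide when a full stop ends a sentence.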
At this level, words have not been classified into grammatical categories and we have very little indication of syntactic structure. Still, a fair amount of information may be obtained from relatively shallow analysis of tokenized text. For example, suppose we want to develop a procedure for finding all personal names in a given text. We know that personal names always start with capital letters, but that is not enough to distinguish them from names of countries, cities, companies, racehorses and so on, or from capitalization at the start of a sentence. Some additional ways to identify personal names include
Use of a title: Dr., Mr., Mrs., Miss, Professor and so on.
A capitalized word or words followed by a comma and a number, usually below 100: this is a common way of referring to people in news reports, where the number stands for their age, for example Pierre