Showing posts with label IR. Show all posts

Wednesday, November 21, 2007

YIKES! or The New Information Extraction

The term information extraction may be taking on a whole new meaning in the wider world, one quite different from what computational linguists intend. As someone working in the field of NLP, I think of information extraction as in line with the Wikipedia definition:

information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents.

But my colleague pointed out a whole new meaning to me a couple weeks ago, the day after an episode of the NBC sitcom My Name Is Earl aired (11/1/2007: Our Other Cops Is On!). Thanks to the wonders of The Internets, I managed to find a reference to the sitcom’s usage at TV Fodder.com:

Information extraction in a post-9/11 world involves delving into the nether regions of suspected terrorists....

In other words: TORTURE! The law of unintended consequences has brought the world of NLP and the so-called War on Terror into sudden intersection (yes, there are "other" intersections... shhhhhhh, we don't talk about those). Perhaps the term IE is obsolete in CL anyway. Wikipedia describes it as a subfield of IR. Manning & Schütze’s new book on the topic is called Introduction to Information Retrieval, not Introduction to Information Extraction. They define IR, on the link above, essentially as finding material that satisfies information needs (note: I'm not quoting directly because the book is not yet out).

Quibbling over names and labels of subfields is often entertaining, but it’s ultimately a fruitless endeavor. I defer to Manning & Schütze on all things NLP. Information Retrieval it is.

Wednesday, September 26, 2007

Intro to CL Books ...

Bob Carpenter has blogged about a new Intro to IR book online here. I'm looking forward to skimming it this weekend. I would also recommend the Python-based NLTK (Natural Language Toolkit).

Books and resources like these are generally geared towards people with an existing programming background. If a linguist with no programming skills is interested in learning some computational linguistics, Mike Hammond has written a couple of novice's intro books called Programming For Linguists. A novice would be wise to start with Hammond's books, move to the NLTK tutorials, then move on to a more serious book like Manning et al.

And if you're at all curious about what a linguist might DO once she has worked through all that wonderful material, you might could go to my own most wonderful List of Companies That Hire Computational Linguists page here.

And if you're not challenged by any of that above, I dare you to read Bob's Type-Logical Semantics. Go on, you think yer all smart and such. I dare ya! I read it the summer of 1999 with a semanticist, a logician, and a computer scientist and it made all of our heads hurt. I still have Chapter 10 nightmares.
