Introduction to Natural Language Processing (NLP)
Reference https://blog.algorithmia.com/introduction-natural-language-processing-nlp/
Definition -- what does NLP do
Summary of Definitions
NLP is the field of study at the intersection of human language and computers.
"NLP is a field that covers computer understanding and manipulation of human languages." -- Anthony Pesce
Related Fields: computer science, computational linguistics, artificial intelligence
How NLP is Used
Automatic Summarization
The motivation is information overload. Summarization is relevant not only for condensing the meaning of documents, but also for understanding the emotional meaning within the information.
e.g. extracting information from blog posts and news while avoiding redundancy across multiple sources and maximizing the diversity of the content obtained.
Machine Translation
Named Entity Recognition
Relationship Extraction
Sentiment Analysis
The goal is to identify sentiment across several posts, even where emotions are not explicitly expressed.
e.g. to derive opinions about a product and the purchasing decisions related to it.
Parts-of-Speech Tagging
Topic Segmentation
Text Classification
Assign predefined categories to a document, organizing it so that information is easier to find.
Question Answering
Delivered through a text-only interface or a spoken dialogue system.
Important Idea
"Apart from common word processor operations that treat text like merely sequence of symbols, NLP considers the hierarchy structure of language: several words make a phrase, several phrases make a sentence, ultimately sentences convey ideas." -- John Rehling
Challenges
In General
- Human language is rarely precise or plainly spoken.
- The task is not just to understand words, but also the concepts behind them and how they are linked to create meaning.
- Ambiguity of language.
For Various Applications
Machine Translation: The challenge is not in translating individual words, but in understanding the meaning of whole sentences well enough to produce a faithful translation.
Question Answering: The challenge also lies in search engines.
Real-World Applications
Reference https://machinelearningmastery.com/applications-of-deep-learning-for-natural-language-processing/
Text Classification
The GOAL is to classify the topic or theme of a text.
Examples of Text Classification
- Sentiment analysis, where class labels represent the emotional tone of the source text, e.g. positive / negative.
- Spam filtering, classifying email text as spam or not.
- Language identification, classifying the language used in the source text.
- Genre classification, classifying the genre of a fictional story.
Note: multi-label classification, where a text can receive more than one label, is also desirable.
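The spam filtering example above can be sketched with a small Naive Bayes classifier. This is only an illustration: Naive Bayes is one common choice and is not named in the referenced article, and the function names (`train_naive_bayes`, `classify`) and the four training messages are made up for this sketch.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (text, label) pairs. Returns the counts needed to classify."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen (word, label) pairs.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration only.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_naive_bayes(training_data)
print(classify(model, "free prize money"))
print(classify(model, "monday team meeting"))
```

Tokenizing on whitespace keeps the sketch short; a real filter would need proper tokenization and far more training data.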
Language Modeling
...the problem is to predict the next word given the previous words. The task is fundamental to speech or optical character recognition, and is also used in spelling correction, handwriting recognition and statistical machine translation.
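As a minimal sketch of next-word prediction, the following counts which word follows which in a tiny corpus (a bigram model, so only the single previous word is used as context). The corpus, the `<s>` start marker, and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each previous word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(model, prev_word):
    """Return the most frequent follower of prev_word, or None if it was never seen."""
    counts = model.get(prev_word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def next_word_probability(model, prev_word, word):
    """Maximum-likelihood estimate of P(word | prev_word)."""
    counts = model.get(prev_word.lower())
    if not counts:
        return 0.0
    return counts[word.lower()] / sum(counts.values())

# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
print(predict_next(model, "the"))                  # cat
print(next_word_probability(model, "the", "cat"))  # 2 of the 6 words seen after "the"
```

Counting like this assigns zero probability to any word pair absent from the corpus, which is why practical language models add smoothing or use neural networks.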
Speech Recognition or Automatic Speech Recognition (ASR)
The task of speech recognition is to map an acoustic signal containing a spoken natural language utterance into the corresponding sequence of words intended by the user.
Input: audio file
Output: human-readable text
Examples of Speech Recognition
- Generating text captions of videos.
- Issuing commands while driving.
Caption Generation
Caption generation is the problem of describing the contents of an image.
Machine Translation
Convert source text from one language to another language.
Document Summarization
Create a short description of a text document.
Question Answering
...question answering systems try to answer a user query that is formulated in the form of a question by returning the appropriate NOUN, such as a location, time or person.
Deep Learning in NLP
Word Embedding
Reference: https://www.quora.com/What-is-word-embedding-in-deep-learning https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
Definition of Word Embedding
Building a low-dimensional vector representation from a corpus of text that preserves the contextual similarity of words.
Advantageous Properties of Word Embedding
- Dimension Reduction -- a more efficient representation
- Contextual Similarity -- a more expressive representation
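The contextual-similarity property can be illustrated with plain co-occurrence counts and cosine similarity. Note the assumptions: this is not a learned, low-dimensional embedding such as word2vec; the vectors here are sparse counts of neighboring words, and the corpus, window size, and function names are invented for the sketch. It only shows that words used in similar contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words appearing within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "stocks fell on monday",
]
vectors = cooccurrence_vectors(corpus)
print(cosine_similarity(vectors["cat"], vectors["dog"]))     # 0.75: shared contexts
print(cosine_similarity(vectors["cat"], vectors["stocks"]))  # 0.0: no shared contexts
```

Learned embeddings add the dimension-reduction property by compressing this kind of context information into short dense vectors.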
Tutorial on training Chinese word2vec: http://zake7749.github.io/2016/08/28/word2vec-with-gensim/
Stemming and Lemmatization
Reference https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
Definition: The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a base form.
Stemming
Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes.
- SSES => SS -- caresses => caress
- IES => I -- ponies => poni
- SS => SS -- caress => caress
- S => (nothing) -- cats => cat
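The four suffix rules above can be sketched as an ordered list that is tried from the longest suffix to the shortest, with the first match winning. This covers only these four rules, not a complete stemmer, and the names `RULES` and `strip_plural` are made up for the example.

```python
# Suffix rules from the notes, ordered so that longer suffixes are tried first.
RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (protects "ss" from the final rule)
    ("s", ""),       # cats     -> cat
]

def strip_plural(word):
    """Apply the first matching suffix rule to a lowercased word."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # no rule applies

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "=>", strip_plural(w))
```

The order matters: if the `s` rule ran before `ss`, "caress" would be wrongly cut to "cares". The output "poni" also shows why stemming is called crude: the result need not be a dictionary word.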
Useful Resources (Blog Posts, Courses, Videos, etc.)
word segmentation, part-of-speech tagging, named entity recognition (demonstration)
http://blog.csdn.net/u010102264/article/details/78370058
Chinese NLP Open Source Tools
https://www.zhihu.com/question/19929473
https://github.com/isnowfy/snownlp
https://github.com/FudanNLP/fnlp
https://github.com/crownpku/Awesome-Chinese-NLP
Curated List of NLP, How to Get Started
https://towardsdatascience.com/how-to-get-started-in-nlp-6a62aa4eaeff
https://www.quora.com/How-do-I-learn-Natural-Language-Processing
https://blog.ycombinator.com/how-to-get-into-natural-language-processing/