Introduction to Natural Language Processing (NLP)

Reference https://blog.algorithmia.com/introduction-natural-language-processing-nlp/

Definition -- what does NLP do

Summary of Definitions

  • A field of study focused on the intersection of human language and computers.

  • "NLP is a field that covers computer understanding and manipulation of human languages." -- Anthony Pesce

Related Fields: computer science, computational linguistics, artificial intelligence

How NLP is Used

  • Automatic Summarization

    The motivation is information overload. Summarization is relevant not only for condensing the meaning of documents, but also for understanding the emotional meaning within the information.

    e.g. extracting information from blog posts and news while avoiding redundancy across multiple sources and maximizing the diversity of the content obtained.

  • Machine Translation

  • Named Entity Recognition

  • Relationship Extraction

  • Sentiment Analysis

    The goal is to identify sentiment across several posts, including cases where emotion is not explicitly expressed.

    e.g. to infer opinions about a product and the purchasing decisions related to it.

  • Part-of-Speech Tagging

  • Topic Segmentation

  • Text Classification

    Assigns predefined categories to a document and organizes it so that information is easier to find.

  • Question Answering

    Delivered through a text-only interface or a spoken dialogue system.

    Important idea:

    "Apart from common word processor operations that treat text like a mere sequence of symbols, NLP considers the hierarchical structure of language: several words make a phrase, several phrases make a sentence, and ultimately sentences convey ideas." -- John Rehling

Challenges

In General

  • Human language is rarely precise or plainly spoken.
  • Understanding requires not just the words, but also the concepts and how they are linked to create meaning.
  • Language is ambiguous.

For Various Applications

  • Machine Translation: The challenge is not in translating individual words, but in understanding the meaning of whole sentences in order to produce an accurate translation.

  • Question Answering: Understanding questions is also a challenge for search engines.

Real-World Applications

Reference https://machinelearningmastery.com/applications-of-deep-learning-for-natural-language-processing/

Text Classification

The GOAL is to classify the topic or theme of a text.

Examples of Text Classification

  • Sentiment analysis, where class labels represent the emotional tone of the source text, e.g. positive / negative.
  • Spam filtering, classifying email text as spam or not.
  • Language identification, classifying the language used in the source text.
  • Genre classification, classifying the genre of a fictional story.

Note: the multi-label variant of the problem, where one text can carry several labels at once, is also of interest.
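
As a rough illustration of the spam filtering example above, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is a technique chosen for this sketch, not one named in the referenced article, and the training sentences, the labels "spam" / "ham", and the function names `train` and `predict` are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(TRAINING)
print(predict(model, "free money"))      # spam
print(predict(model, "monday meeting"))  # ham
```

The same structure applies to the other examples in the list (language identification, genre classification): only the labels and the training texts change.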

Language Modeling

...the problem is to predict the next word given the previous words. The task is fundamental to speech or optical character recognition, and is also used in spelling correction, handwriting recognition and statistical machine translation.
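
A minimal sketch of next-word prediction, assuming the simplest possible case: a bigram model that conditions on only one previous word and estimates probabilities from raw counts. The toy corpus and the function names are made up for illustration; practical language models use longer contexts and smoothing.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Map each word to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def next_word_probability(model, prev, nxt):
    """Estimate P(nxt | prev) as count(prev, nxt) / count(prev, *)."""
    candidates = model.get(prev.lower())
    if not candidates:
        return 0.0
    return candidates[nxt.lower()] / sum(candidates.values())


# Toy corpus, invented for illustration only.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

model = train_bigram_model(CORPUS)
print(predict_next(model, "the"))                  # cat
print(next_word_probability(model, "the", "cat"))  # 2 of the 6 followers of "the"
```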

Speech Recognition or Automatic Speech Recognition (ASR)

The task of speech recognition is to map an acoustic signal containing a spoken natural language utterance into the corresponding sequence of words intended by the user.

Input: audio file

Output: human-readable text

Examples of Speech Recognition

  • Generating text captions of videos.
  • Issuing commands while driving.

Caption Generation

Caption generation is the problem of describing the contents of an image.

Machine Translation

Convert source text in one language into another language.

Document Summarization

Create a short description of a text document.

Question Answering

...question answering systems try to answer a user query that is formulated in the form of a question by returning the appropriate NOUN, such as a location, time or person.

Deep Learning in NLP

Word Embedding

Reference:

  • https://www.quora.com/What-is-word-embedding-in-deep-learning
  • https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/

Definition of Word Embedding

Building a low-dimensional vector representation of words from a corpus of text, such that the contextual similarity of words is preserved.

Advantageous Properties of Word Embedding

  • Dimension Reduction -- a more efficient representation
  • Contextual Similarity -- a more expressive representation
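
To make "contextual similarity" concrete, here is a minimal sketch that represents each word by counts of its neighbouring words and compares words with cosine similarity. This is not an embedding method such as word2vec: the vectors are sparse counts and no dimension reduction is performed. It only illustrates the idea that words appearing in similar contexts end up with similar vectors. The toy corpus, the window size, and the function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(corpus, window=2):
    """Represent each word by counts of the words within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell on the market",
]

vectors = cooccurrence_vectors(CORPUS)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_stocks = cosine(vectors["cat"], vectors["stocks"])
print(cat_dog > cat_stocks)  # True: "cat" and "dog" share contexts
```

Embedding methods go one step further and compress this kind of contextual information into short dense vectors, which is where the dimension reduction property comes from.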

Tutorial: Training Chinese word2vec http://zake7749.github.io/2016/08/28/word2vec-with-gensim/

Stemming and Lemmatization

Reference https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

Definition: The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a base form.

Stemming

Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes.

Rule -- Example

  SSES => SS -- caresses => caress
  IES => I -- ponies => poni
  SS => SS -- caress => caress
  S => (empty) -- cats => cat
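
The four suffix rules above can be sketched as a small function. This sketch assumes that rules are tried from the longest suffix to the shortest and that only the first matching rule is applied. It covers only these four rules and is not a complete stemmer, and the names `SUFFIX_RULES` and `strip_plural_suffix` are made up for the example.

```python
# (suffix, replacement) pairs, ordered so that longer suffixes are tried first.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("s", ""),
]


def strip_plural_suffix(word):
    """Apply the first matching suffix rule; return the word unchanged otherwise."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for example in ["caresses", "ponies", "caress", "cats"]:
    print(example, "=>", strip_plural_suffix(example))
```

The "SS => SS" rule looks like a no-op, but it matters: it matches before the bare "S" rule, so "caress" is not wrongly cut down to "cares".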

Useful Resources (Blog Posts, Courses, Videos, etc.)

Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition (Demonstration)

http://blog.csdn.net/u010102264/article/details/78370058

Chinese NLP Open Source Tools

  • https://www.zhihu.com/question/19929473
  • https://github.com/isnowfy/snownlp
  • https://github.com/FudanNLP/fnlp
  • https://github.com/crownpku/Awesome-Chinese-NLP

Curated List of NLP, How to Get Started

  • https://towardsdatascience.com/how-to-get-started-in-nlp-6a62aa4eaeff
  • https://www.quora.com/How-do-I-learn-Natural-Language-Processing
  • https://blog.ycombinator.com/how-to-get-into-natural-language-processing/
