Smart (?) Question Answering System

keywords: Natural Language Processing (NLP), Information Retrieval (IR), Question Answering (QA)

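Key aspects for the abstract: problem description; characteristics (how a question answering system differs from an information retrieval system); disciplines involved (knowledge representation, information retrieval, natural language processing, intelligent reasoning, etc.); classification of question answering systems; system architecture (the problem-solving pipeline); key techniques in the pipeline; classification of those techniques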

Different Models Used in Question Answering Systems

Reference: http://zake7749.github.io/2016/12/17/how-to-develop-chatbot/

  • Rule-based model (template-based)
  • Retrieval-based model
  • Generative model

Rule-based Model

Hand-written rules match patterns in the user's input and trigger canned responses.

# `user` and `chatbot` are assumed to be objects supplied by the bot framework.
if 'weather' in user.query:
    chatbot.say('What lovely weather today!')

Retrieval-based Model

Questions = q1, q2, ..., qn
Answers = a1, a2, ..., an

Idea

Compare the user's question with every question in Questions, compute a similarity score for each, and pick the most similar one, say qk.

Then return the paired answer ak.
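The idea above can be sketched in a few lines of Python. This is only a minimal illustration: it assumes bag-of-words count vectors and cosine similarity as the similarity measure, and the stored questions, answers, and the `retrieve` helper are all invented for the example.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy Questions / Answers lists, invented for illustration.
questions = [
    "what is the weather today",
    "who won the game last night",
    "how do i reset my password",
]
answers = [
    "It is sunny.",
    "The home team won.",
    "Use the reset link on the login page.",
]

def retrieve(query):
    """Return (k, a_k) for the stored question q_k most similar to the query."""
    q_vec = bag_of_words(query)
    scores = [cosine(q_vec, bag_of_words(q)) for q in questions]
    k = max(range(len(questions)), key=scores.__getitem__)
    return k, answers[k]

print(retrieve("how is the weather"))
```

In practice the count vectors would likely be replaced by TF-IDF weights or learned embeddings, but the compare-then-fetch structure stays the same.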

Where Machine Learning Comes in

The idea is to map each question to a topic, using a classifier trained on labeled examples.

Training set of (text, label) pairs:

('Will Lebron James leave Cleveland this summer and fly to the Bay?', 'Sports')
("Call Me By Your Name won't win the Academy Award, but it is a brilliant and beautiful movie by all means", 'Film')

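For example, when the user asks "What is the Yankees' score today?", we can tell that the intent of the question is to ask about a sports event. The question also contains three features: "Yankees", "today", and "score".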

Intent classification --> Machine Learning
Feature extraction --> Named Entity Recognition (NER)
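As a rough sketch of the classification half, the snippet below trains a tiny multinomial Naive Bayes classifier with add-one smoothing on (text, label) pairs. The training sentences and the `NaiveBayes` class are invented for illustration; a real system would use far more data and probably an existing library.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training set, invented for illustration.
training_set = [
    ("will the team win the game tonight", "Sports"),
    ("the player scored in the final game", "Sports"),
    ("a brilliant and beautiful movie", "Film"),
    ("the director won an award for the movie", "Film"),
]

model = NaiveBayes().fit(training_set)
print(model.predict("what is the score of the game"))
print(model.predict("a beautiful movie"))
```

The predicted topic can then be used to narrow down which candidate questions or answer sources to search.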

Representation of Words & Sentences

word2vec: cosine similarity between word vectors

sentence2vec: cosine similarity between sentence vectors
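A small sketch of both comparisons follows. The 3-dimensional vectors are made up (real word2vec vectors are learned and have hundreds of dimensions), and averaging word vectors is just one simple way to get a sentence vector, assumed here for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, invented for illustration.
word_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "pear":  [0.2, 0.1, 0.8],
}
DIM = 3

def sentence_vector(sentence):
    """Average the vectors of known words; zero vector if none are known."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * DIM
    return [sum(column) / len(vecs) for column in zip(*vecs)]

# Related words score higher than unrelated ones.
print(cosine(word_vectors["king"], word_vectors["queen"]))
print(cosine(word_vectors["king"], word_vectors["apple"]))

# The same measure applied to sentence vectors.
print(cosine(sentence_vector("king queen"), sentence_vector("apple pear")))
```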

Note: Word vectors or sentence vectors can be fed into an RNN / LSTM for classification and sequence labeling problems. Another idea to explore: arrange word / sentence vectors as an image-like matrix and apply a CNN (search for "CNN NLP").

Workflow

Explanation Version 1

Reference: https://www.quora.com/How-do-I-make-a-natural-language-processing-question-answering-system-What-software-or-toolkits-can-be-used

  1. Preprocess question

    parse, part-of-speech tagging, named entity recognition

  2. Question analysis component (question classification)

Features:

- bag-of-words
- bag-of-ngrams

Machine learning algorithms:

- Naive Bayes
- Decision Tree
- Support Vector Machine
- Artificial Neural Network
- K-Nearest Neighbor
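To make the feature and algorithm lists concrete, here is a hedged sketch that extracts bag-of-words and bag-of-ngrams features and classifies a question with k-nearest neighbors. The question-type labels, the example questions, and the overlap-based similarity are assumptions chosen for illustration.

```python
from collections import Counter

def bag_of_ngrams(text, n=1):
    """Count the n-grams of a whitespace-tokenized text (n=1 is bag-of-words)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def features(text):
    """Combine unigram and bigram counts into one feature bag."""
    return bag_of_ngrams(text, 1) + bag_of_ngrams(text, 2)

def overlap(a, b):
    """Similarity: number of shared n-gram occurrences (multiset intersection)."""
    return sum((a & b).values())

def knn_classify(query, labeled, k=1):
    """Vote among the k labeled questions most similar to the query."""
    feats = features(query)
    ranked = sorted(labeled, key=lambda s: overlap(feats, features(s[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy labeled questions, invented for illustration.
labeled_questions = [
    ("who wrote this book", "PERSON"),
    ("who is the president", "PERSON"),
    ("where is the station", "LOCATION"),
    ("where can i find a bank", "LOCATION"),
    ("when does the train leave", "TIME"),
    ("when is the meeting", "TIME"),
]

print(knn_classify("who is the author", labeled_questions))
print(knn_classify("where is the nearest bank", labeled_questions))
```

Any of the other listed algorithms could consume the same feature bags; k-NN is used here only because it needs no training step.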

Explanation Version 2

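Reference: Zong Chengqing, Statistical Natural Language Processing (统计自然语言处理), 2nd edition, Chapter 14

System Architecture

User question ==> Question processing module ==> Retrieval module ==> Answer extraction module ==> Answer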
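The pipeline above can be mocked up as three small functions chained together. Everything here is a toy assumption made for illustration: the stopword list, the documents, keyword overlap as the retrieval score, and the very crude "first unseen content word" answer heuristic.

```python
# Toy stopword list and document collection, invented for illustration.
STOPWORDS = {"what", "who", "where", "is", "the", "of", "a", "in", "on"}
DOCUMENTS = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Mount Everest is the highest mountain on Earth.",
]

def process_question(question):
    """Question processing: reduce the question to content keywords."""
    tokens = question.lower().rstrip("?").split()
    return [t for t in tokens if t not in STOPWORDS]

def retrieve(keywords, documents):
    """Retrieval: return the document sharing the most keywords."""
    def overlap(doc):
        doc_tokens = set(doc.lower().rstrip(".").split())
        return len(doc_tokens & set(keywords))
    return max(documents, key=overlap)

def extract_answer(keywords, document):
    """Answer extraction: first content word not already in the question."""
    for token in document.rstrip(".").split():
        if token.lower() not in STOPWORDS and token.lower() not in keywords:
            return token
    return None

def answer(question):
    keywords = process_question(question)
    document = retrieve(keywords, DOCUMENTS)
    return extract_answer(keywords, document)

print(answer("What is the capital of France?"))
```

A real system would replace each stage with something much stronger (question classification, an inverted index, NER-based candidate extraction), but the data flow between the modules has this shape.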

Techniques & Algorithms Used in Sub-tasks

Reference: https://github.com/rockingdingo/deepnlp

Word Segmentation

Linear-chain CRF (conditional random field)

POS (Part-of-Speech Tagging)

LSTM / Bi-LSTM / LSTM-CRF network

NER (Named Entity Recognition)

LSTM / Bi-LSTM / LSTM-CRF network

Parse

Arc-standard System with Feed-forward Neural Network
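For reference, the arc-standard transition system itself is simple to simulate. In the sketch below the transition sequence is supplied by hand; in the parser named above, the feed-forward network would be the component that predicts each transition from the current stack and buffer. The function name and the example sentence are invented for illustration.

```python
def arc_standard(words, transitions):
    """Apply arc-standard transitions; return a set of (head, dependent) arcs.

    Words are numbered from 1; index 0 is the artificial ROOT node.
    """
    stack = [0]
    buffer = list(range(1, len(words) + 1))
    arcs = set()
    for t in transitions:
        if t == "SHIFT":
            # Move the next word from the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":
            # Second-from-top becomes a dependent of the top and is removed.
            dependent = stack.pop(-2)
            arcs.add((stack[-1], dependent))
        elif t == "RIGHT-ARC":
            # Top becomes a dependent of the second-from-top and is removed.
            dependent = stack.pop()
            arcs.add((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {t}")
    return arcs

words = ["She", "eats", "fish"]
transitions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
arcs = arc_standard(words, transitions)

# Print each arc as head -> dependent, using ROOT for index 0.
names = ["ROOT"] + words
for head, dependent in sorted(arcs):
    print(f"{names[head]} -> {names[dependent]}")
```

Here "eats" ends up as the root of the sentence, with "She" and "fish" as its dependents.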

Automatic Summarization

Seq2Seq with Attention Mechanism

Question Answering Projects Online

Facebook bAbI dataset: https://smerity.com/articles/2015/keras_qa.html https://github.com/Smerity/keras_qa

Useful Resources

Blog Posts

Blog Post: Text-based question answering https://deeplearn.school.blog/2017/01/07/text-based-question-answering-system/

Blog Post: Dynamic Coattention Network https://einstein.ai/research/state-of-the-art-deep-learning-model-for-question-answering

Academic Papers

Paper: word2vec (Efficient Estimation of Word Representations in Vector Space) https://arxiv.org/abs/1301.3781

Open Source Tools & Projects

Open Source Tool: gensim (Topic Modeling) https://github.com/RaRe-Technologies/gensim

Open Source Tool: spaCy https://github.com/explosion/spaCy

Open Source Project: Chatbot (a scenario-based chatbot built on vector matching) https://github.com/zake7749/Chatbot

Open Source Project: Seq2Seq_Chatbot_QA (Tensorflow, Seq2Seq) https://github.com/qhduan/Seq2Seq_Chatbot_QA

Open Source Project: QA https://github.com/S-H-Y-GitHub/QA
