QA with Phrase Retrieval

NLP/MRC 2021. 10. 18. 14:46

Limitations of the Retriever-Reader Approach

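Error propagation: only 5-10 documents are passed to the reader, so if the retriever misses the document containing the answer, the reader cannot recover it.
Query-dependent encoding: the encoding of a candidate answer span changes with the query, so it cannot be computed in advance.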

Phrase Search

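In the conventional approach, the function F has to be recomputed every time a new question comes in. Phrase search instead splits F into two separate encoders, G and H, one for the question and one for the phrase, so that phrase encodings can be computed and indexed ahead of time.

Decomposability gap: F encodes the question, passage, and answer jointly, and it may not decompose cleanly into G and H.

→ There is no attention between the question and the passage.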
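
The decomposition can be sketched as below. This is a minimal sketch, assuming the score is an inner product between a question vector and a precomputed phrase vector; `encode_phrase`, `encode_question`, `VOCAB`, and the two toy passages are hypothetical stand-ins, not trained encoders or real data.

```python
# Toy phrase search: phrase vectors are built once, questions are encoded on arrival.
# encode_phrase / encode_question are hypothetical bag-of-words stand-ins for trained encoders.

VOCAB = ["paris", "berlin", "capital", "france", "germany"]

def bag_of_words(text):
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]

def encode_phrase(phrase, passage):
    # Sees only the passage, never the question (no question-passage attention).
    # In this toy version the phrase argument itself is unused.
    return bag_of_words(passage)

def encode_question(question):
    return bag_of_words(question)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Offline: index every candidate phrase once.
CANDIDATES = [
    ("Paris", "Paris is the capital of France"),
    ("Berlin", "Berlin is the capital of Germany"),
]
INDEX = [(phrase, encode_phrase(phrase, passage)) for phrase, passage in CANDIDATES]

def answer(question):
    # Online: one question encoding plus inner products against the fixed index.
    q = encode_question(question)
    best_phrase, _ = max(INDEX, key=lambda item: dot(q, item[1]))
    return best_phrase

print(answer("what is the capital of france"))
print(answer("what is the capital of germany"))
```

The point of the design is that `INDEX` never changes when a new question arrives. In a real system the `max` over the index would typically be replaced by a maximum inner product search library.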

Dense Vector vs Sparse Vector

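Dense vector: effective at capturing syntactic and semantic information
Sparse vector: effective at capturing lexical information

The dense and sparse vectors are combined into a single phrase embedding.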
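
One way to read the combination, sketched under the assumption that "combined" means concatenation and that scoring is an inner product: the score of the concatenated vectors is then simply the dense score plus the sparse score. The vectors, term weights, and function names below are made up for illustration.

```python
# Dense part: short list of floats. Sparse part: dict of term -> weight,
# so only non-zero lexical dimensions are stored.

def dense_dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sparse_dot(u, v):
    # Only terms present in both vectors contribute.
    return sum(weight * v[term] for term, weight in u.items() if term in v)

def combined_score(q_dense, q_sparse, p_dense, p_sparse):
    # Inner product of [dense; sparse] concatenations, computed piecewise.
    return dense_dot(q_dense, p_dense) + sparse_dot(q_sparse, p_sparse)

def concatenate(dense, sparse, vocab):
    # Explicit concatenation, used here only to check the piecewise score.
    return dense + [sparse.get(term, 0.0) for term in vocab]

# Hypothetical toy values.
vocab = ["capital", "france", "paris"]
q_dense, q_sparse = [0.5, -1.0], {"capital": 1.0, "france": 2.0}
p_dense, p_sparse = [2.0, 0.5], {"france": 1.5, "paris": 1.0}

piecewise = combined_score(q_dense, q_sparse, p_dense, p_sparse)
explicit = dense_dot(concatenate(q_dense, q_sparse, vocab),
                     concatenate(p_dense, p_sparse, vocab))
print(piecewise, explicit)
```

Computing the two parts separately avoids materializing the sparse dimensions, which can be as large as the vocabulary.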

Dense Representation

Dense Vector
- Uses a pre-trained LM (e.g. BERT)
- Reuses the start and end vectors, so vectors are stored per token rather than per phrase
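
The reuse of start and end vectors can be sketched as follows. This is a minimal sketch, assuming each token has one start vector and one end vector and that a phrase spanning tokens i..j is represented by concatenating the start vector of token i with the end vector of token j; the token vectors and `max_len` are hypothetical toy values.

```python
def enumerate_spans(n_tokens, max_len):
    # All (start, end) token index pairs with at most max_len tokens.
    return [(i, j)
            for i in range(n_tokens)
            for j in range(i, min(i + max_len, n_tokens))]

def phrase_vector(start_vecs, end_vecs, i, j):
    # No per-phrase storage: the phrase vector is assembled from token vectors.
    return start_vecs[i] + end_vecs[j]

tokens = ["Paris", "is", "the", "capital"]
# Hypothetical per-token vectors; a real system would take these from the LM.
start_vecs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
end_vecs = [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6], [1.7, 1.8]]

spans = enumerate_spans(len(tokens), max_len=3)
vec = phrase_vector(start_vecs, end_vecs, 2, 3)  # span "the capital"
print(len(spans), "spans from", len(start_vecs) + len(end_vecs), "stored vectors")
print(vec)
```

With n tokens and a maximum phrase length L there are up to n * L candidate spans but only 2n stored vectors, so the saving grows with L.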

Coherency Vector
- Indicates whether a phrase corresponds to a single constituent of the sentence
- Used to filter out spans that do not form a proper phrase
- Computed from the start vector and the end vector
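
The coherency component can be illustrated as below. This is a minimal sketch, assuming the coherency score is a single scalar obtained as an inner product between a dedicated slice of the start token's vector and a dedicated slice of the end token's vector, appended to the dense phrase vector; the slice layout and all numbers are illustrative assumptions.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def coherency(start_slice, end_slice):
    # Higher values are meant to indicate that the span forms one constituent.
    return dot(start_slice, end_slice)

def dense_phrase_vector(start_vec, end_vec, start_slice, end_slice):
    # [start vector; end vector; coherency scalar]
    return start_vec + end_vec + [coherency(start_slice, end_slice)]

# Hypothetical slices for two spans in "Paris is the capital".
good = coherency([1.0, 0.0], [2.0, 0.5])   # span "the capital"
bad = coherency([0.0, 1.0], [1.0, -1.0])   # span "is the"

vec = dense_phrase_vector([0.5], [1.5], [1.0, 0.0], [2.0, 0.5])
print(good, bad, vec)
```

Because the scalar is part of the phrase vector, a span with low coherency is penalized in the same inner product used for retrieval, with no separate filtering pass.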

Question Embedding

Sparse Representation

https://arxiv.org/abs/1911.02896

Contextualized Sparse Representations for Real-Time Open-Domain Question Answering


https://arxiv.org/abs/1906.05807

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

