Matching parse thickets for open domain question answering
Traditional parse trees are combined together and enriched with anaphora and rhetoric information to form a unified representation for a paragraph of text. We refer to these representations as parse thickets. They are introduced to support answering complex questions, which include multiple sentences, to tackle as many constraints expressed in this question as possible. The question answering system is designed so that an initial set of answers, which is obtained by a TF*IDF or other keyword search model, is re-ranked. Passage re-ranking is performed using matching of the parse thickets of answers with the parse thicket of the question. To do that, a graph representation and matching technique for parse structures for paragraphs of text have been developed. We define the operation of generalization of two parse thickets as a measure of the distance between paragraphs of text to be the maximal common sub-graph of these parse thickets. A partial case of parse thickets, a rhetoric map of an answer, allows leveraging discourse for relevance in a rule-based manner. Passage re-ranking improvement via parse thickets is evaluated in a variety of search domains with long questions. Using parse thickets improves search accuracy compared with the bag-of words, the pairwise matching of parse trees for sentences, and the tree kernel approaches. As a baseline, we use a web search engine API, which provides much more accurate search results than the majority of search benchmarks, such as TREC. A comparative analysis of the impact of various sources of discourse information on the search accuracy is conducted. An open source plug-in for SOLR is developed so that the proposed technology can be easily integrated with industrial search engines.
유료 다운로드의 경우 해당 사이트의 정책에 따라 신규 회원가입, 로그인, 유료 구매 등이 필요할 수 있습니다. 해당 사이트에서 발생하는 귀하의 모든 정보활동은 NDSL의 서비스 정책과 무관합니다.
NDSL에서는 해당 원문을 복사서비스하고 있습니다. 위의 원문복사신청 또는 장바구니 담기를 통하여 원문복사서비스 이용이 가능합니다.
- 이 논문과 함께 출판된 논문 + 더보기