Thesis Details

An Efficient Graph-based Approach for Translating Emerging Named Entity (최신 개체명의 번역을 위한 효율적인 그래프 기반 방법론)

  • Author

    김진한

  • Degree-granting institution

    Pohang University of Science and Technology (포항공과대학교), Graduate School

  • Degree type

    Doctoral (domestic)

  • Department

    Computer Science and Engineering, Database and Information Retrieval

  • Advisor

    황승원

  • Year of publication

    2014

  • Total pages

    106

  • Keywords

    data analysis; named entity translation; graph-based methodology

  • Language

    eng

  • Full-text URL

    http://www.riss.kr/link?id=T13533523&outLink=K  

  • Abstract

    Named entities (NEs) refer to concepts such as the names of people, locations, organizations, and products. As large quantities of new named entities (emerging named entities) appear every day in newspapers, websites, and TV programs, NE analysis is becoming increasingly important in the data mining and information retrieval communities. Information on NEs can be extracted from (a) structured sources such as databases and tables, (b) semi-structured sources such as knowledge bases (also called ontologies), or (c) unstructured sources such as text corpora. Among the many research topics related to NE analysis, such as ontology integration, named entity linking, and named entity translation, this dissertation addresses the problem of mining NE translations from comparable corpora, specifically English and Chinese NE translations. I observe that existing approaches use one or more of the following NE similarity metrics: entity name similarity, entity context similarity, and entity relationship similarity. Motivated by this observation, this dissertation proposes a new holistic approach that (1) combines all of these similarity types and (2) additionally considers a new similarity measure, relationship context similarity between pairs of NEs, which is a missing quadrant in the taxonomy of similarity metrics. I abstract the NE translation problem as the matching of two NE graphs extracted from the comparable corpora. Specifically, two monolingual NE graphs are first constructed from the comparable corpora to extract relationships between NEs. Entity name similarity and entity context similarity are then calculated for every pair of bilingual NEs to compute the initial pairwise NE similarity. A reinforcing method is then used to reflect relationship similarity and relationship context similarity between NEs.
    I also discover corpus “latent” features that are lost in the graph extraction process and integrate them into the proposed framework, and I improve the relationship-based similarities by overcoming the asymmetry of comparable corpora and considering other types of NEs. According to the experimental results, the proposed holistic graph-based approach and its enhancements are highly effective, and the proposed framework significantly outperforms previous state-of-the-art approaches.
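
    The reinforcement step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything below is an assumption made for illustration: the function name `reinforce`, the weight `alpha`, the best-match averaging over neighbours, and the toy entities `a1`, `a2`, `b1`, `b2` are all hypothetical. The sketch models only relationship similarity, not relationship context similarity, and it takes the initial pairwise scores (name plus context similarity) as given.

```python
def reinforce(graph_x, graph_y, initial, alpha=0.5, iterations=10):
    """Propagate similarity between related entity pairs (illustrative only).

    graph_x, graph_y: dict mapping each entity to the set of entities it is
        related to in its own monolingual graph.
    initial: dict mapping (x, y) pairs to an initial similarity in [0, 1].
    alpha: hypothetical weight on the relational term.
    """
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for (x, y), base in initial.items():
            neighbours_x = graph_x.get(x, set())
            neighbours_y = graph_y.get(y, set())
            if neighbours_x and neighbours_y:
                # For each neighbour of x, take its best-matching neighbour
                # of y, then average over the neighbours of x.
                relational = sum(
                    max(scores.get((u, v), 0.0) for v in neighbours_y)
                    for u in neighbours_x
                ) / len(neighbours_x)
            else:
                relational = 0.0
            # Blend the fixed initial score with the current relational score.
            updated[(x, y)] = (1 - alpha) * base + alpha * relational
        scores = updated
    return scores


# Toy graphs: a1 is related to a2, and b1 is related to b2.
graph_x = {"a1": {"a2"}, "a2": {"a1"}}
graph_y = {"b1": {"b2"}, "b2": {"b1"}}

# a1 is ambiguous on the initial scores alone; a2 clearly matches b2.
initial = {
    ("a1", "b1"): 0.5,
    ("a1", "b2"): 0.5,
    ("a2", "b1"): 0.1,
    ("a2", "b2"): 0.9,
}

scores = reinforce(graph_x, graph_y, initial)
best_for_a1 = max(["b1", "b2"], key=lambda y: scores[("a1", y)])
print(best_for_a1)
```

    In this toy case the initial scores cannot separate `b1` from `b2` as a translation of `a1`. Because `a1` is related to `a2`, and `a2` strongly matches `b2`, which is in turn related to `b1`, the relational term raises the score of the pair (`a1`, `b1`) above that of (`a1`, `b2`).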

