한영번역기의 번역률 개선을 위한 복합명사의 효율적 분해
An Efficient Segmentation of Compound Nouns for Improving the Translation Ratio of a Korean-English Translator
Keywords: Korean-English translator; translation ratio improvement; compound noun; computer science
In machine translation, dictionary information is indispensable for processing natural language, so a compound noun that is missing from the dictionary greatly degrades translator performance. Korean compound nouns are especially problematic: because Hangul orthography concatenates the component nouns, the independent nouns inside a compound cannot be looked up in the dictionary directly. This dissertation proposes and evaluates an algorithm that utilizes preference ratios together with a segmentation algorithm based on prefix and suffix rules for compound nouns. Segmentation uses an independent-noun dictionary made up of nouns of more than two syllables, plus a prefix and suffix dictionary for handling affixes. In experiments on compound nouns drawn from a Korean dictionary, an encyclopedia, and internet searches, the segmentation ratio was 99.3%, somewhat better than that of other segmentation algorithms. The compound nouns that still failed to segment mostly contained proper nouns or special characters such as hyphens and parentheses; failures caused by typing errors or errors in word distinction were rare. Applying a proper-noun guessing algorithm and a grammar error correction algorithm corrected 34.01% of the segmentation-failed compound nouns. When unregistered compound nouns were analyzed with preference information before error correction, the segmentation ratio was 88.49%; when the segmentation algorithm was run on the compound nouns after error correction, the ratio improved to 89.92%. However, the segmentation ratio fell as the number of syllables grew, and it was very low when a compound noun included foreign words or more than two unregistered words.
Accordingly, further study is needed on how to raise the segmentation ratio of compound nouns.
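As a rough illustration of the kind of dictionary-based segmentation the abstract describes, the following Python sketch splits a compound into dictionary nouns, allows a leading prefix or trailing suffix, and uses preference values to break ties. It is not the dissertation's algorithm: the dictionaries `NOUNS`, `PREFIXES`, and `SUFFIXES`, the preference values, and the ranking rule (fewest pieces first, then highest total preference) are all assumptions made for this example.

```python
from functools import lru_cache

# Hypothetical toy dictionaries; the preference values are invented.
NOUNS = {"대학": 0.6, "대학생": 0.9, "생선": 0.5, "선교회": 0.8,
         "교회": 0.7, "한영": 0.7, "번역": 0.9}
PREFIXES = {"신", "비"}        # assumed prefix dictionary
SUFFIXES = {"기", "적", "화"}  # assumed suffix dictionary


def segment(word):
    """Return the preferred segmentation of `word` as a list, or None."""

    @lru_cache(maxsize=None)
    def best(start):
        # Best (piece_count, -total_preference, pieces) for word[start:].
        if start == len(word):
            return (0, 0.0, ())
        candidates = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in NOUNS:
                pref = NOUNS[piece]
            elif start == 0 and end < len(word) and piece in PREFIXES:
                pref = 0.0  # a prefix may only open the compound
            elif start > 0 and end == len(word) and piece in SUFFIXES:
                pref = 0.0  # a suffix may only close the compound
            else:
                continue
            rest = best(end)
            if rest is not None:
                candidates.append(
                    (1 + rest[0], rest[1] - pref, (piece,) + rest[2]))
        # Assumed ranking: fewest pieces, then highest total preference.
        return min(candidates) if candidates else None

    result = best(0)
    return list(result[2]) if result is not None else None


print(segment("대학생선교회"))
print(segment("한영번역기"))
print(segment("한영xyz"))
```

With this toy data, the ambiguous compound 대학생선교회 can be read as 대학생 + 선교회 or as 대학 + 생선 + 교회; the assumed ranking rule selects the two-piece reading. A compound containing a piece found in none of the dictionaries yields `None`, which corresponds loosely to the segmentation failures the abstract attributes to unregistered words.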