SVM을 이용한 한국어 절 경계 인식
Clause boundary identification in Korean text using support vector machines
Natural language parsing mostly deals with complex sentences rather than simple ones. Complex sentences, however, are hard to parse because they are often syntactically ambiguous. Partial parsing has therefore been studied as a way to reduce the complexity of full parsing. This thesis is concerned with a deeper level of partial parsing, clause identification. We propose a method for identifying clause boundaries in Korean sentences using Support Vector Machines (SVMs). Clause identification is formulated as a classification problem, and the system consists of two base classifiers: the first searches for clause ending points, and the second searches for clause starting points using the ending points predicted in the earlier step. The features for identifying starting points are more extensive than those for ending points, owing to the SOV word order of Korean sentences. Experimental results show that the proposed method achieves an F-score of 86.87% and a word-level accuracy of 96.63%. These results can make syntactic analysis more practical and provide deeper syntactic information to many NLP applications.
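The two-step arrangement described in the abstract, where ending points are predicted first and then fed into the search for starting points, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the feature names (`word`, `next`, `prev`, `dist_to_next_end`), the function names, and the example sentence are assumptions made for illustration, and the two toy rule functions are stand-ins for the trained SVM classifiers, whose actual features and kernels the abstract does not specify.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, object]
# Stand-in type for a trained binary SVM decision function (assumption).
Classifier = Callable[[Features], bool]


def end_features(tokens: List[str], i: int) -> Features:
    """Hypothetical ending-point features: local context only."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {"word": tokens[i], "next": nxt}


def start_features(tokens: List[str], i: int, ends: List[int]) -> Features:
    """Hypothetical starting-point features: local context plus
    information derived from the ending points predicted in step 1."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    following = [e for e in ends if e >= i]
    dist = following[0] - i if following else None
    return {"word": tokens[i], "prev": prev, "dist_to_next_end": dist}


def identify_boundaries(
    tokens: List[str], is_end: Classifier, is_start: Classifier
) -> Tuple[List[int], List[int]]:
    """Run the cascade: ending points first, then starting points."""
    ends = [i for i in range(len(tokens)) if is_end(end_features(tokens, i))]
    starts = [
        i for i in range(len(tokens))
        if is_start(start_features(tokens, i, ends))
    ]
    return starts, ends


# Toy rules standing in for the two SVMs (illustrative only).
def toy_is_end(f: Features) -> bool:
    # Treat words with these final syllables as clause-final predicates.
    return str(f["word"]).endswith(("면", "다"))


def toy_is_start(f: Features) -> bool:
    # A start is only proposed if some predicted end lies at or after it.
    if f["dist_to_next_end"] is None:
        return False
    return f["prev"] == "<s>" or str(f["word"]).endswith("가")


tokens = ["나는", "비가", "오면", "집에", "간다"]
starts, ends = identify_boundaries(tokens, toy_is_end, toy_is_start)
print(starts, ends)  # [0, 1] [2, 4]
```

Because the second step takes the predicted ending points as input, its feature set is naturally the richer of the two, which matches the abstract's remark that starting-point features are more extensive. In a real system the toy rules would be replaced by classifiers trained on annotated data.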