
Thesis Details

Semi-supervised learning using frequent itemset and ensemble learning for SMS classification

  • Author

    Ishtiaq Ahmed

  • Degree-granting institution

    Kyung Hee University

  • Degree type

    Master's (domestic)

  • Department

    Department of Computer Engineering

  • Advisor

  • Year of publication

    2014

  • Total pages

    77 p.

  • Keywords

  • Language

    English

  • Full-text URL

    http://www.riss.kr/link?id=T13536662&outLink=K  

  • Abstract

    Short Message Service (SMS) has become one of the most important media of communication, owing to the rapid growth in mobile users and its easy-to-use operating mechanism. This flood of SMS traffic brings with it the problem of spam SMS generated by spurious users. Detecting spam SMS has attracted growing attention from researchers in recent years and has been addressed with a number of different machine learning approaches. The supervised machine learning approaches used so far demand large amounts of labeled data, which are not always available in real applications. Traditional semi-supervised methods can alleviate this problem, but when they are given only positive and unlabeled data, they may not produce good results. In this thesis, we propose a novel semi-supervised learning method that uses frequent itemsets and ensemble learning (FIEL) to overcome these limitations. In this approach, the Apriori algorithm is used to find frequent itemsets, while decision tree, Naive Bayes, and SVM classifiers serve as the base learners of an ensemble that combines their outputs by majority voting. The proposed approach achieves higher accuracy when only a small amount of positive data and varying amounts of unlabeled data are available. Extensive experiments on the UCI SMS Spam Collection Data Set show significant improvements in accuracy with a very small amount of positive data. We compared FIEL with the existing SPY-EM and PEBL approaches, and the results show that our approach is more stable than the compared approaches with respect to minimum support and relative minimum support.
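    The abstract names two building blocks without giving implementation details: Apriori frequent-itemset mining and a majority-voting ensemble. The following is a minimal, stdlib-only Python sketch of those two pieces, not the thesis's implementation. The function names `apriori` and `majority_vote`, the toy tokenized messages, and the 0.5 support threshold are hypothetical. The three base learners (decision tree, Naive Bayes, SVM) are not implemented; their predicted labels are simply passed in. How FIEL uses the mined itemsets to label the unlabeled data is not described in the abstract, so that step is not shown.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Mine every itemset whose support (fraction of transactions containing
    it) is at least min_support, using Apriori's level-wise search."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct item (here, a message token).
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in sets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates,
        # then prune any candidate that has an infrequent k-item subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def majority_vote(labels):
    """Return the label predicted by the most base learners for one message.
    With three learners and two classes, a tie cannot occur."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: tokenized positive (spam) messages.
    spam = [
        ["free", "win", "prize"],
        ["free", "win", "call"],
        ["free", "prize", "call"],
        ["win", "prize"],
    ]
    freq = apriori(spam, min_support=0.5)
    for itemset, support in sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), support)
    # Stand-in outputs of a decision tree, Naive Bayes, and SVM for one message.
    print(majority_vote(["spam", "ham", "spam"]))  # spam
```

    On the toy data, `{free, win, prize}` is pruned at support 0.25 while all single tokens and four token pairs survive. Lowering `min_support` yields more and longer itemsets, which is why the abstract evaluates stability against that threshold.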
