
Journal/Proceedings Details

Journal of biomedical informatics (20 items)

  1. [International article]   Symptom severity prediction from neuropsychiatric clinical records: Overview of 2016 CEGS N-GRID shared tasks Track 2   SCI SCIE

    Filannino, Michele (University at Albany, State University of New York, Albany, NY, USA), Stubbs, Amber (Simmons College, Boston, MA, USA), Uzuner, Özlem (University at Albany, State University of New York, Albany, NY, USA)
    Journal of biomedical informatics, v.75 suppl., pp. S62-S70, 2017, ISSN 1532-0464

    Abstract

    The second track of the CEGS N-GRID 2016 natural language processing shared tasks focused on predicting symptom severity from neuropsychiatric clinical records. For the first time, initial psychiatric evaluation records have been collected, de-identified, annotated and shared with the scientific community. One hundred ten researchers organized in twenty-four teams participated in this track and submitted sixty-five system runs for evaluation. The top ten teams each achieved an inverse normalized macro-averaged mean absolute error score over 0.80. The top-performing system employed an ensemble of six different machine learning-based classifiers to achieve a score of 0.86. The task proved generally easy, with the exception of two specific classes of records: records with very few but crucial positive valence signals, and records describing patients predominantly affected by negative rather than positive valence. Those cases proved very challenging for most of the systems. Further research is required before the task can be considered solved. Overall, the results of this track demonstrate the effectiveness of data-driven approaches to the task of symptom severity classification.

    Highlights: Results from 110 researchers in 24 teams and 65 submissions. The best system performs comparably to the least experienced annotator. Positive domain symptom severity classification can be tackled automatically. Systems fail when patients show signs of both negative and positive valence.

  2. [International article]   De-identification of clinical notes via recurrent neural network and conditional random field   SCI SCIE

    Liu, Zengjian (corresponding author), Tang, Buzhou, Wang, Xiaolong, Chen, Qingcai
    Journal of biomedical informatics, v.75 suppl., pp. S34-S42, 2017, ISSN 1532-0464

    Abstract

    De-identification, that is, identifying information such as protected health information (PHI) present in clinical data, is a critical step to enable data to be shared or published. The 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-scale and RDoC Individualized Domains (N-GRID) clinical natural language processing (NLP) challenge contains a track on de-identifying electronic medical records (EMRs), namely track 1. The challenge organizers provide 1000 annotated mental health records for this track, 600 of which are used as a training set and 400 as a test set. We develop a hybrid system for the de-identification task on the training set. First, four individual subsystems are used to identify PHI instances: a subsystem based on bidirectional LSTM (long short-term memory, a variant of recurrent neural network), a subsystem based on bidirectional LSTM with features, a subsystem based on conditional random field (CRF), and a rule-based subsystem. Then, an ensemble learning-based classifier is deployed to combine all PHI instances predicted by the three machine learning-based subsystems. Finally, the results of the ensemble learning-based classifier and the rule-based subsystem are merged. Experiments conducted on the official test set show that our system achieves the highest micro F1-scores of 93.07%, 91.43% and 95.23% under the “token”, “strict” and “binary token” criteria respectively, ranking first in the 2016 CEGS N-GRID NLP challenge. In addition, on the dataset of the 2014 i2b2 NLP challenge, our system achieves the highest micro F1-scores of 96.98%, 95.11% and 98.28% under the “token”, “strict” and “binary token” criteria respectively, outperforming other state-of-the-art systems. All these experiments demonstrate the effectiveness of our proposed method.

    Highlights: We propose a hybrid method based on RNN and CRF for de-identification. We extend the LSTM-based model by adding context features to the neural network. An ensemble classifier is deployed to combine the results of different methods. Our system achieves a strict F1-score of 91.43% on the N-GRID corpus, which ranks first.
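
    The three-stage combination described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: a plain majority vote stands in for their learned ensemble classifier, the rule for merging rule-based spans (keep only those that do not overlap an ensemble span) is an assumption, and all spans and labels below are made-up examples.

```python
from collections import Counter

# A PHI span is (start offset, end offset, label); hypothetical representation.
Span = tuple[int, int, str]


def majority_vote(predictions: list[set[Span]], min_votes: int = 2) -> set[Span]:
    """Keep spans predicted by at least `min_votes` subsystems.

    Stand-in for the learned ensemble classifier named in the abstract.
    """
    votes = Counter(span for spans in predictions for span in spans)
    return {span for span, n in votes.items() if n >= min_votes}


def overlaps(a: Span, b: Span) -> bool:
    """True when the character ranges of two spans intersect."""
    return a[0] < b[1] and b[0] < a[1]


def merge_with_rules(ensemble: set[Span], rules: set[Span]) -> set[Span]:
    """Add rule-based spans that do not overlap any ensemble span (assumed policy)."""
    merged = set(ensemble)
    for span in rules:
        if not any(overlaps(span, kept) for kept in ensemble):
            merged.add(span)
    return merged


# Made-up outputs of the three machine learning subsystems and the rule-based one.
lstm = {(0, 4, "NAME"), (10, 20, "DATE")}
lstm_with_features = {(0, 4, "NAME"), (30, 35, "CITY")}
crf = {(0, 4, "NAME"), (10, 20, "DATE")}
rule_based = {(12, 18, "DATE"), (40, 52, "PHONE")}

ensemble = majority_vote([lstm, lstm_with_features, crf])
final = merge_with_rules(ensemble, rule_based)
print(sorted(final))
```

    In this toy run the CITY span is dropped because only one subsystem proposed it, and the rule-based DATE span is discarded because it overlaps a span the ensemble already accepted.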

  3. [International article]   Exploring associations of clinical and social parameters with violent behaviors among psychiatric patients   SCI SCIE

    Dai, Hong-Jie (Department of Computer Science and Information Engineering, National Taitung University, Taitung, Taiwan), Su, Emily Chia-Yu (Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan), Uddin, Mohy (King Abdullah International Medical Research Center, King Saud Bin Abdulaziz University for Health Sciences, Publication Office, King Abdulaziz Medical City, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia), Jonnagaddala, Jitendra (School of Public Health and Community Medicine, UNSW Sydney, Australia), Wu, Chi-Shin (Department of Psychiatry, National Taiwan University Hospital and College of Medicine, National Taiwan University, Taipei, Taiwan), Syed-Abdul, Shabbir (Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan)
    Journal of biomedical informatics, v.75 suppl., pp. S149-S159, 2017, ISSN 1532-0464

    Abstract

    Evidence has revealed interesting associations of clinical and social parameters with violent behaviors of patients with psychiatric disorders. Men are more violent preceding and during hospitalization, whereas women are more violent than men throughout the 3 days following a hospital admission. It has also been shown that mental disorders may be a consistent risk factor for the occurrence of violence. In order to better understand violent behaviors of patients with psychiatric disorders, it is important to investigate both the clinical symptoms and psychosocial factors that accompany violence in these patients. In this study, we utilized a dataset released by the Partners Healthcare and Neuropsychiatric Genome-scale and RDoC Individualized Domains project of Harvard Medical School to develop a unique text mining pipeline that processes unstructured clinical data in order to recognize clinical and social parameters such as age, gender, history of alcohol use, and violent behaviors, and explored the associations between these parameters and violent behaviors of patients with psychiatric disorders. The aim of our work was to demonstrate the feasibility of mining factors that are strongly associated with violent behaviors among psychiatric patients from unstructured psychiatric evaluation records using clinical text mining. Experimental results showed that stimulants, followed by a family history of violent behavior, suicidal behaviors, and financial stress, were strongly associated with violent behaviors. Key aspects explicated in this paper include employing our text mining pipeline to extract clinical and social factors linked with violent behaviors, generating association rules to uncover possible associations between these factors and violent behaviors, and lastly ranking the top rules associated with violent behaviors using statistical analysis and interpretation.

    Highlights: Text mining can be used to explore parameters associated with violent behaviors in unstructured clinical notes. Mental disorders are a significant risk factor for violent behavior among the patients. Stimulants and suicidal tendency were also strongly associated with the patients’ violent behavior.
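
    The association-rule step mentioned in this abstract can be illustrated with the three standard rule measures: support, confidence, and lift. The abstract does not say which measures the authors used to rank rules, so the choice here is an assumption, and the records and factor names are invented for the example.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `records` is a list of sets of extracted factors, one set per patient record.
    """
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_consequent = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)

    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Invented toy data: each set holds the factors found in one record.
records = [
    {"stimulants", "violence"},
    {"stimulants", "violence", "financial_stress"},
    {"stimulants"},
    {"financial_stress"},
    {"violence"},
    {"alcohol"},
    {"alcohol", "financial_stress"},
    {"stimulants", "violence"},
]

support, confidence, lift = rule_metrics(records, {"stimulants"}, {"violence"})
print(support, confidence, lift)
```

    A lift above 1 means the two factors co-occur more often than they would if they were independent, which is one common way to rank candidate rules.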

  4. [International article]   Cover 2: Editorial Board   SCI SCIE

    Journal of biomedical informatics, v.75 suppl., pp. IFC-IFC, 2017, ISSN 1532-0464

  5. [International article]   Automatic classification of RDoC positive valence severity with a neural network   SCI SCIE

    Clark, Cheryl (corresponding author at: The MITRE Corporation, 202 Burlington Rd., Bedford, MA 01730, USA), Wellner, Ben, Davis, Rachel, Aberdeen, John, Hirschman, Lynette
    Journal of biomedical informatics, v.75 suppl., pp. S120-S128, 2017, ISSN 1532-0464

    Abstract

    Objective: Our objective was to develop a machine learning-based system to determine the severity of Positive Valence symptoms for a patient, based on information included in their initial psychiatric evaluation. Severity was rated by experts on an ordinal scale of 0–3 as follows: 0 (absent = no symptoms), 1 (mild = modest significance), 2 (moderate = requires treatment), 3 (severe = causes substantial impairment).

    Materials and methods: We treated the task of assigning Positive Valence severity as a text classification problem. During development, we experimented with regularized multinomial logistic regression classifiers, gradient boosted trees, and feedforward, fully connected neural networks. We found both regularization and feature selection via mutual information to be very important in preventing models from overfitting the data. Our best configuration was a neural network with three fully connected hidden layers with rectified linear unit activations.

    Results: Our best-performing system achieved a score of 77.86%. The evaluation metric is an inverse normalization of the Mean Absolute Error presented as a percentage between 0 and 100, where 100 means the highest performance. Error analysis showed that 90% of the system errors involved neighboring severity categories.

    Conclusion: Machine learning text classification techniques with feature selection can be trained to recognize broad differences in Positive Valence symptom severity with a modest amount of training data (in this case 600 documents, 167 of which were unannotated). An increase in the amount of annotated data can increase the accuracy of symptom severity classification by several percentage points. Additional features and/or a larger training corpus may further improve accuracy.

    Highlights: We trained a machine learning-based system to determine psychiatric symptom severity. Regularization and feature selection via mutual information reduced overfitting. Increasing the amount of annotated data increased accuracy by several percent.
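
    The evaluation metric described in this abstract can be sketched as follows. The abstract only states that the score is an inverse normalization of the mean absolute error on a 0 to 100 scale; the per-class normalization by the largest possible error and the macro-averaging over gold classes are assumptions based on the metric's name in the other entries of this listing, so the official task definition should be checked before relying on this.

```python
def inverse_normalized_macro_mae(gold, pred, max_label=3):
    """Score in [0, 100]; 100 means every severity label was predicted exactly.

    Assumed definition: per gold class, the mean absolute error is divided by
    the largest error possible for that class, the normalized errors are
    averaged over the classes present in `gold`, and the result is inverted.
    """
    per_class = []
    for label in sorted(set(gold)):
        errors = [abs(g - p) for g, p in zip(gold, pred) if g == label]
        worst = max(label, max_label - label)  # e.g. 3 for class 0, 2 for class 1
        per_class.append(sum(errors) / len(errors) / worst)
    return 100.0 * (1.0 - sum(per_class) / len(per_class))


# Made-up severity labels on the 0-3 scale from the abstract.
gold = [0, 0, 1, 2, 3, 3]
pred = [0, 1, 1, 3, 3, 0]
score = inverse_normalized_macro_mae(gold, pred)
print(round(score, 2))
```

    Under this definition, confusing neighboring categories costs far less than confusing absent with severe, which matches the ordinal nature of the scale.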

  6. [International article]   Predictive modeling for classification of positive valence system symptom severity from initial psychiatric evaluation records   SCI SCIE

    Posada, Jose D. (Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd., Pittsburgh, PA 15206, United States), Barda, Amie J. (Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd., Pittsburgh, PA 15206, United States), Shi, Lingyun (Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd., Pittsburgh, PA 15206, United States), Xue, Diyang (Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd., Pittsburgh, PA 15206, United States), Ruiz, Victor (Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd., Pittsburgh, PA 15206, United States), Kuan, Pei-Han (Institute of Manufacturing Information and System, National Cheng-Kung University, Tainan, Taiwan), Ryan, Neal D. (Department of Psychiatry, University of Pittsburgh, 3811 O'Hara St., Pittsburgh, PA 15213, United States), Tsui, Fuchiang (Rich) (Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd., Pittsburgh, PA 15206, United States)
    Journal of biomedical informatics, v.75 suppl., pp. S94-S104, 2017, ISSN 1532-0464

    Abstract

    In response to the challenges set forth by the CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing, we describe a framework to automatically classify initial psychiatric evaluation records into one of four positive valence system severities: absent, mild, moderate, or severe. We used a dataset provided by the event organizers to develop a framework comprising natural language processing (NLP) modules and 3 predictive models (two decision tree models and one Bayesian network model) used in the competition. We also developed two additional predictive models for comparison purposes. To evaluate our framework, we employed a blind test dataset provided by the 2016 CEGS N-GRID. The predictive scores, measured by the macro-averaged inverse normalized mean absolute error score, from the two decision trees and the naive Bayes model were 82.56%, 82.18%, and 80.56%, respectively. The framework proposed in this paper can potentially be applied to other predictive tasks for processing initial psychiatric evaluation records, such as predicting 30-day psychiatric readmissions.

    Highlights: Proposed a method to automatically classify symptom severity in psychiatric reports. Question-answers from reports are the most important source of information. Best predictive models automatically selected features prevalent in literature.

  7. [International article]   Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes   SCI SCIE

    Dehghan, Azad (School of Computer Science, University of Manchester, Manchester, UK), Kovacevic, Aleksandar (Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia), Karystianis, George (Macquarie University, Australian Institute of Health Innovation, Australia), Keane, John A (School of Computer Science, University of Manchester, Manchester, UK), Nenadic, Goran (School of Computer Science, University of Manchester, Manchester, UK)
    Journal of biomedical informatics, v.75 suppl., pp. S28-S33, 2017, ISSN 1532-0464

    Abstract

    De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of the 2016 CEGS N-GRID Shared Tasks Track 1, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). The methods we used rely on machine learning on either a large or small feature space, with additional strategies, including two-pass tagging and multi-class models, which both proved to be beneficial. The results show that the integration of the proposed methods can identify Health Insurance Portability and Accountability Act (HIPAA) defined PHIs with overall F1-scores of ∼90% and above. Yet, some classes (Profession, Organization) again proved to be challenging given the variability of expressions used to reference the given information.

    Highlights: We present machine-learning methods for automatic de-identification of clinical narratives. We propose and validate a two-pass tagging method to improve entity recognition on non-longitudinal clinical narratives. The methods are validated on a set of psychiatric evaluation notes.
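
    One plausible reading of the two-pass tagging named in this abstract is that strings tagged in a first pass are used to find further mentions of the same strings in the same note. The abstract does not describe the mechanism, so this interpretation is an assumption; the regex-based first pass below is a hypothetical stand-in for the authors' machine-learning taggers, and the note text is invented.

```python
import re


def first_pass(text):
    """Stand-in tagger: only finds names that are introduced by a title."""
    pattern = r"\b(?:Dr|Mr|Ms)\. ([A-Z][a-z]+)"
    return [(m.start(1), m.end(1), "NAME") for m in re.finditer(pattern, text)]


def second_pass(text, spans):
    """Tag every other mention of the strings found in the first pass."""
    found = set(spans)
    for start, end, label in spans:
        surface = text[start:end]
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            found.add((m.start(), m.end(), label))
    return sorted(found)


note = "Dr. Smith saw the patient. Smith recommended follow-up."
pass_one = first_pass(note)
pass_two = second_pass(note, pass_one)
print([note[start:end] for start, end, _ in pass_two])
```

    The second mention of the name has no title in front of it, so the first pass misses it and only the second pass recovers it.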

  8. [International article]   De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1   SCI SCIE

    Stubbs, Amber (Simmons College, School of Library and Information Science, 300 The Fenway, Boston, MA 02115, United States), Filannino, Michele (University at Albany, United States), Uzuner, Özlem (University at Albany, United States)
    Journal of biomedical informatics, v.75 suppl., pp. S4-S18, 2017, ISSN 1532-0464

    Abstract

    The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records. This track tackled de-identification in two sub-tracks. Track 1.A was a “sight unseen” task, where nine teams ran existing de-identification systems, without any modifications or training, on 600 new records in order to gauge how well systems generalize to new data. The best-performing system for this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, where 15 teams had two months to train their systems on the new data, then test them on an unannotated test set. The best-performing system from this track scored an F1 of 0.914. The scores for Track 1.A show that unmodified existing systems do not generalize well to new data without the benefit of training data. The scores for Track 1.B are slightly lower than those of the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.

    Highlights: NLP shared task with a new set of 1000 de-identified psychiatric records. “Sight unseen” task: top F1 of 0.799 using out-of-the-box systems on new data. “Standard” task: top F1 of 0.914 on test data after 2 months of development. Hybrid systems were most effective, but often missed PHI requiring world knowledge or context.
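
    The F1 figures quoted in this abstract can be made concrete with a sketch of a micro-averaged, entity-level F1. This is not the official evaluation script: it assumes that a predicted entity counts as correct only when its offsets and PHI type both match the gold annotation exactly, and the spans are made-up examples.

```python
def micro_f1(gold_docs, pred_docs):
    """Micro-averaged F1 over documents, counting exact (start, end, type) matches.

    `gold_docs` and `pred_docs` are parallel lists of sets of spans.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # exact matches
        fp += len(pred - gold)  # predicted but not in gold
        fn += len(gold - pred)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two made-up documents; the DATE prediction has the wrong end offset.
gold_docs = [{(0, 4, "NAME"), (10, 20, "DATE")}, {(5, 9, "CITY")}]
pred_docs = [{(0, 4, "NAME"), (10, 18, "DATE")}, {(5, 9, "CITY")}]

f1 = micro_f1(gold_docs, pred_docs)
print(round(f1, 3))
```

    Because counts are pooled across documents before precision and recall are computed, frequent PHI types dominate the score, and a boundary error such as the DATE span above is penalized as both a false positive and a false negative.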

  9. [International article]   The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge   SCI SCIE

    Bui, Duy Duc An (corresponding author), Wyatt, Mathew, Cimino, James J.
    Journal of biomedical informatics, v.75 suppl., pp. S54-S61, 2017, ISSN 1532-0464

    Abstract

    Clinical narratives (the text notes found in patients’ medical records) are important information sources for secondary use in research. However, in order to protect patient privacy, they must be de-identified prior to use. Manual de-identification is considered to be the gold standard approach but is tedious, expensive, slow, and impractical for use with large-scale clinical data. Automated or semi-automated de-identification using computer algorithms is a potentially promising alternative. The Informatics Institute of the University of Alabama at Birmingham is applying de-identification to clinical data drawn from the UAB hospital’s electronic medical records system before releasing them for research. We participated in the regular de-identification track of a shared task challenge by the Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID) to gain experience developing our own automatic de-identification tool. We focused on the popular and successful methods from previous challenges: rule-based, dictionary-matching, and machine-learning approaches. We also explored new techniques such as disambiguation rules and term ambiguity measurement, and used a multi-pass sieve framework at a micro level. For the challenge’s primary measure (strict entity), our submissions achieved competitive results (f-measures: 87.3%, 87.1%, and 86.7%). For our preferred measure (binary token HIPAA), our submissions achieved superior results (f-measures: 93.7%, 93.6%, and 93%). With those encouraging results, we gained the confidence to improve and use the tool for the real de-identification task at the UAB Informatics Institute.

    Highlights: We described an automatic de-identification (de-id) system for clinical texts. We used three de-id methods: pattern-matching, dictionary-matching, and machine-learning. Dictionary-matching with disambiguation remained a useful de-id approach. We also explored a multi-pass sieve framework, term ambiguity measurement and disambiguation rules. The system achieved competitive results in the CEGS N-GRID 2016 challenge, de-id regular track.

  10. [International article]   Cover 1/Spine   SCI SCIE

    Journal of biomedical informatics, v.75 suppl., pp. OFC-OFC, 2017, ISSN 1532-0464
