Panda: Knowledge Extraction and Exchange from Electronic Health Records
The increasing use of electronic health records (EHRs) has made it possible to analyze patient data at an unprecedented scale. The rich content of EHRs can be processed to provide a variety of services to the physician, such as risk classification and summarization. However, accessing the unstructured textual content locked away in EHRs presents several challenges. One of the foremost is developing a classification model that properly represents a patient’s status and how it correlates with potential outcomes (mortality, AKI-RIFLE, ICU duration, etc.). Because every patient’s condition is complex and unique, the model must cope with the high dimensionality and sparsity of the underlying data. Our approach is to use the medical concepts found within EHRs to represent a patient’s health. To refine these concepts, we perform extensive feature selection that reduces the dimensionality to only the most important contributing features. Lastly, to reinforce each concept’s role in an outcome, we solicit physicians’ feedback on the condition of the patient and the top-k contributing factors.
Knowledge extraction from medical notes
We extend an initial bag-of-words model to a bag-of-concepts model, which uses cTAKES and UMLS to extract medical terms and concepts from medical notes. We also extend cTAKES to improve the knowledge extraction. The extracted concepts then pass through a feature selection step, in which we rank each concept by its importance to outcome prediction. Weakly ranked concepts are blacklisted to reduce dimensionality. We plan to further reduce sparsity by using medical ontology hierarchies to collapse infrequent concepts into their more frequent ancestor concepts.
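The following is a minimal sketch of the ranking, blacklisting, and collapsing steps described above, not the pipeline itself. It assumes the concepts have already been extracted and are given as plain string identifiers. The chi-square statistic stands in for the ranking criterion, which the text does not specify, and all function names, the threshold, and the parent map are hypothetical.

```python
from collections import Counter


def bag_of_concepts(concept_ids):
    """Count how often each concept identifier occurs in one patient's notes."""
    return Counter(concept_ids)


def chi_square(presence, labels):
    """Chi-square statistic of a binary feature against a binary outcome.

    Assumption: chi-square is used here only as an illustrative ranking score.
    """
    n = len(labels)
    a = sum(1 for p, y in zip(presence, labels) if p and y)          # present, positive
    b = sum(1 for p, y in zip(presence, labels) if p and not y)      # present, negative
    c = sum(1 for p, y in zip(presence, labels) if not p and y)      # absent, positive
    d = n - a - b - c                                                # absent, negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_concepts(bags, labels):
    """Return (concept, score) pairs, strongest association first."""
    vocab = sorted({concept for bag in bags for concept in bag})
    scores = {
        concept: chi_square([concept in bag for bag in bags], labels)
        for concept in vocab
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def blacklist(ranked, threshold):
    """Concepts whose score falls below a (hypothetical) cut-off."""
    return {concept for concept, score in ranked if score < threshold}


def collapse(bag, parent_of, rare):
    """Fold each rare concept into its parent from a (hypothetical) ontology map."""
    collapsed = Counter()
    for concept, count in bag.items():
        target = parent_of.get(concept, concept) if concept in rare else concept
        collapsed[target] += count
    return collapsed


if __name__ == "__main__":
    # Toy data with made-up concept names; outcome 1 marks the positive class.
    bags = [
        bag_of_concepts(["kidney_injury", "fever"]),
        bag_of_concepts(["kidney_injury"]),
        bag_of_concepts(["fever"]),
        bag_of_concepts(["cough"]),
    ]
    labels = [1, 1, 0, 0]
    ranked = rank_concepts(bags, labels)
    print(ranked)
    print(blacklist(ranked, threshold=1.0))
```

Ranking on concept presence rather than raw counts keeps the score simple. A real system would likely need a criterion that also handles count-valued features and multi-class outcomes.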
Because extraction errors and sparse data can introduce inconsistencies, we propose to incorporate expert knowledge to improve the model further. Soliciting knowledge from experts can boost both feature quality and classification accuracy. As a first step, we built a web interface that presents the predictor’s knowledge to the physician and gathers feedback on it. We evaluated this interface in a pilot study with several physicians and found that the predictor outperformed the physicians, but also that the physicians improved over time. We plan to incorporate the physicians’ opinions on the top-k features into the predictive model to improve its accuracy.
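To make the idea of top-k contributing features concrete, the sketch below assumes a linear predictor, so that a feature's contribution for one patient is its weight times its value. The feedback step is purely illustrative: the text leaves the incorporation method open, and the multiplicative update, the `step` parameter, and both function names are hypothetical.

```python
import heapq


def top_k_contributions(weights, features, k):
    """Return the k features with the largest absolute contribution (weight * value)."""
    contributions = {
        name: weights.get(name, 0.0) * value for name, value in features.items()
    }
    return heapq.nlargest(k, contributions.items(), key=lambda kv: abs(kv[1]))


def apply_feedback(weights, feedback, step=0.1):
    """Return a copy of the weights adjusted by physician feedback.

    Assumption: feedback maps a feature name to True (physician agrees the
    feature matters) or False (disagrees). Endorsed weights are scaled up by
    `step`, rejected weights are scaled down. This rule is hypothetical.
    """
    updated = dict(weights)
    for name, agrees in feedback.items():
        if name in updated:
            updated[name] *= (1 + step) if agrees else (1 - step)
    return updated


if __name__ == "__main__":
    # Toy weights and one patient's concept values, with made-up names.
    weights = {"kidney_injury": 2.0, "fever": 0.5, "cough": -1.0}
    patient = {"kidney_injury": 1, "fever": 1, "cough": 1}
    print(top_k_contributions(weights, patient, k=2))
    print(apply_feedback(weights, {"cough": False}, step=0.5))
```

Returning a new weight dictionary instead of mutating the input makes it easy to compare the model before and after a round of feedback.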