Cancer Institute: A National Cancer Institute-designated cancer center

Henry J. Lowe, MD

Publication Details

  • Discretization of Continuous Features in Clinical Datasets

    Maslove DM, Podchiyska T, Lowe HJ. J Am Med Inform Assoc (in press). 2012.

    Objective: The increasing availability of clinical data from electronic medical records (EMR) has created opportunities for secondary uses of health information. For use in machine learning classification, many data features must first be transformed by discretization. We evaluated six discretization strategies, both supervised and unsupervised, using EMR data.

    Materials and Methods: We classified laboratory data (arterial blood gas [ABG] measurements) and physiologic data (cardiac output [CO] measurements) from adult intensive care unit patients using decision trees and naïve Bayes classifiers. Continuous features were partitioned using two supervised and four unsupervised discretization strategies. The resulting classification accuracy was compared with that obtained using the original, continuous data.

    Results: Supervised methods were more accurate and consistent than unsupervised methods, but tended to produce larger decision trees. Among the unsupervised methods, equal frequency and k-means performed well overall, while equal width was significantly less accurate.

    Discussion: We believe this is the first dedicated evaluation of discretization strategies using EMR data. It is unlikely that any one discretization method applies universally to EMR data. Performance was influenced by the choice of class labels and, for unsupervised methods, by the number of intervals. In selecting the number of intervals, there is generally a trade-off between greater accuracy and greater consistency.

    Conclusion: In general, supervised methods yield higher accuracy but are constrained to a single, specific application. Unsupervised methods do not require class labels and can produce discretized data usable for multiple purposes.
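    The unsupervised strategies named in the abstract differ mainly in where they place interval boundaries for a chosen number of intervals k. The following is a minimal sketch, not the authors' implementation, of equal width, equal frequency, and one-dimensional k-means on a single feature. For contrast, it also includes a simple supervised split that picks the one cut point minimizing class entropy. The abstract does not name the two supervised methods evaluated, so this entropy split is an illustrative stand-in that may not match either of them. All function names, the k-means initialization, and the toy data are assumptions.

```python
"""Toy discretization of one continuous feature (illustrative sketch only).

Assumptions: all function names, the k-means initialization, and the
entropy-based supervised split are hypothetical choices for illustration;
they are not the authors' implementation.
"""
import math
from bisect import bisect_right
from collections import Counter


def equal_width_edges(values, k):
    """Unsupervised: split [min, max] into k intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]  # k - 1 inner cut points


def equal_frequency_edges(values, k):
    """Unsupervised: place cut points so each interval holds ~len(values)/k items."""
    s = sorted(values)
    n = len(s)
    return [s[(n * i) // k] for i in range(1, k)]


def kmeans_1d_edges(values, k, iters=100):
    """Unsupervised: Lloyd's k-means in 1-D; cuts are midpoints between centroids."""
    s = sorted(values)
    # Hypothetical choice: start centroids at evenly spaced quantiles.
    centroids = [s[(len(s) * (2 * i + 1)) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in s:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    centroids.sort()
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised (illustrative stand-in): the single cut that minimizes the
    size-weighted class entropy of the two resulting intervals."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best_cut


def discretize(values, edges):
    """Map each value to an interval index 0..len(edges) given sorted cut points."""
    return [bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    # Made-up toy data (not from the study): one outlier at 100.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    y = ["low"] * 4 + ["high"] * 6  # hypothetical class labels
    for name, edges in [
        ("equal width", equal_width_edges(x, 3)),
        ("equal frequency", equal_frequency_edges(x, 3)),
        ("k-means", kmeans_1d_edges(x, 3)),
        ("entropy cut", [best_entropy_cut(x, y)]),
    ]:
        print(f"{name:16} cuts={edges} bins={discretize(x, edges)}")
```

    On this toy input, equal width puts nine of the ten values in the lowest interval and leaves the middle one empty, because the single outlier stretches the range. Equal frequency and k-means spread the values more evenly. This is a property of the made-up data, not a reproduction of the study's results. Only the supervised split needs class labels. The unsupervised functions instead take `k`, the number of intervals the abstract identifies as a tuning choice.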
