Machine Learning Models Can Predict Persistence of Early Childhood Asthma

Machine learning models can be trained on electronic health record data to differentiate between transient and persistent cases of early childhood asthma.

Machine learning models can be trained on electronic health record (EHR) data to differentiate between transient and persistent cases of early childhood asthma, according to the results of an analysis published in PLoS One. Researchers conducted a retrospective cohort study using data derived from the Pediatric Big Data (PBD) resource at the Children’s Hospital of Philadelphia (CHOP), a pediatric tertiary academic medical center located in Pennsylvania.

The researchers sought to develop machine learning models that could identify children diagnosed with asthma at 5 years of age or younger whose symptoms will persist and who will thus continue to have asthma-related visits. They trained 5 machine learning models to distinguish children without any subsequent asthma-related visits (transient asthma diagnosis) from those who did have asthma-related visits from 5 to 10 years of age (persistent asthma diagnosis), based on clinical information available for these children up to 5 years of age.

The PBD resource used in the current study included data obtained from the CHOP Care Network (a primary care network of more than 30 sites) and from CHOP Specialty Care and Surgical Centers. The study cohort included children with an incident asthma diagnosis between 2 and 5 years of age that was captured during an in-person health care encounter (ie, an inpatient hospital stay, an ambulatory visit, or an emergency department visit) between January 1, 2005, and December 31, 2016. To ensure that children were not lost to follow-up, all participants needed to have had at least 1 health care visit with an International Classification of Diseases (ICD) diagnosis (which was not necessarily related to asthma) every year from 5 years until 11 years of age.

Based on study inclusion criteria, a total of 9934 children were included in the current study, 8802 of whom were in the persistent asthma group and 1132 of whom were in the transient asthma group. The resultant dataset thus comprised approximately 89% positive (persistent) instances and 11% negative (transient) instances. In the present study, a child was considered to have persistent asthma if he or she fulfilled all of these conditions: (1) the initial diagnosis of asthma was made between 2 and 5 years of age; (2) at least 1 additional asthma diagnosis was reported between 5 and 10 years of age; and (3) an asthma-related medication was prescribed at least 1 time at a visit that coincided with or followed the child’s initial diagnosis of asthma and that occurred after 2 years of age.
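The labeling rule and class balance described above can be illustrated with a short Python sketch. This is not the investigators' code: the function name, its arguments, and the exact handling of the age boundaries (for example, whether a diagnosis at exactly 5 years counts toward the first or second window) are assumptions made here for illustration only.

```python
def is_persistent(first_dx_age, asthma_dx_ages, asthma_rx_ages):
    """Apply the 3 persistence conditions to one child (ages in years).

    Hypothetical helper: the inclusive/exclusive age boundaries below are
    assumptions, since the summary does not specify them.
    """
    # (1) Initial asthma diagnosis between 2 and 5 years of age.
    initial_in_window = 2 <= first_dx_age <= 5
    # (2) At least 1 additional asthma diagnosis between 5 and 10 years of age.
    later_diagnosis = any(5 < age <= 10 for age in asthma_dx_ages)
    # (3) At least 1 asthma medication prescribed at or after the initial
    #     diagnosis, at a visit occurring after 2 years of age.
    medicated = any(age >= first_dx_age and age > 2 for age in asthma_rx_ages)
    return initial_in_window and later_diagnosis and medicated


# Class balance from the counts reported in the study.
n_persistent = 8802
n_transient = 1132
n_total = n_persistent + n_transient

persistent_pct = 100 * n_persistent / n_total
transient_pct = 100 * n_transient / n_total
print(f"persistent: {persistent_pct:.1f}%, transient: {transient_pct:.1f}%")
```

The imbalance matters for evaluation: because transient cases are the minority class, metrics focused on the negative class are more informative here than overall accuracy.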

The 5 machine learning algorithms that were trained to distinguish between persistent and transient asthma diagnoses were: (1) Naive Bayes, (2) Logistic Regression, (3) K-Nearest Neighbors, (4) Random Forest, and (5) Gradient Boosted Trees (XGBoost). According to negative predictive value (NPV)-specificity curves, each of the machine learning models performed significantly better than random chance, although markedly poorer performance was observed with the Naive Bayes model. Based on average NPV-specificity area (ANSA), all of the models performed significantly better than random chance, with the best performance obtained with XGBoost (mean ANSA, 0.43).
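For readers unfamiliar with these metrics, NPV is the fraction of children predicted to be transient who truly are transient, and specificity is the fraction of truly transient children who are predicted to be transient. The Python sketch below computes both at a single threshold, along with one plausible way to summarize the NPV-specificity curve as an area. The area calculation is an assumption modeled on average precision with the negative class as the target; this summary does not describe how the investigators computed ANSA, and the function names are illustrative.

```python
def npv_specificity(y_true, y_score, threshold):
    """Return (NPV, specificity); scores >= threshold are predicted persistent (1)."""
    tn = fn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 0 and label == 0:
            tn += 1  # transient child correctly predicted transient
        elif predicted == 0 and label == 1:
            fn += 1  # persistent child wrongly predicted transient
        elif predicted == 1 and label == 0:
            fp += 1  # transient child wrongly predicted persistent
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return npv, specificity


def average_npv_specificity_area(y_true, y_score):
    """Step-wise area under the NPV-specificity curve (assumed definition).

    Children are ranked from lowest to highest predicted score; each time a
    truly transient child is reached, specificity rises by 1 / total_neg and
    the NPV at that rank is added with that weight. Tied scores are not
    handled specially.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0])
    total_neg = sum(1 for label in y_true if label == 0)
    tn = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 0:
            tn += 1
            area += (tn / rank) / total_neg
    return area


# Toy data: 0 = transient, 1 = persistent.
labels = [0, 0, 1, 1, 1]
scores = [0.2, 0.6, 0.4, 0.7, 0.9]
npv, spec = npv_specificity(labels, scores, threshold=0.5)
area = average_npv_specificity_area(labels, scores)
```

For a classifier whose scores carry no information about the outcome, NPV is expected to sit near the prevalence of the negative class (about 0.11 in this cohort), which gives some context for the reported mean ANSA of 0.43.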

Feature importance analysis of the XGBoost model demonstrated that the total number of asthma-related visits and the age at the last asthma diagnosis before 5 years of age were the most clinically relevant features. Other important features identified by this model included a diagnosis of allergic rhinitis, the presence of eczema, and patients self-identifying their race as Black.

The investigators concluded that before the XGBoost model can be implemented as a clinical decision support tool, additional research is warranted to examine and improve model generalizability by adding other input features and assessing the model on external databases. Future studies designed to explore the interpretability of the models may play a key role in translating their use into clinical practice.

Bose S, Kenyon CC, Masino AJ. Personalized prediction of early childhood asthma persistence: a machine learning approach. PLoS One. 2021;16(3):e0247784. doi:10.1371/journal.pone.0247784