Rahimi Research Lab

Inventory Papers

Paper 1

Paper Title: Type 2 diabetes screening test by means of a pulse oximeter

Authors or developers

Moreno, E.
Lujan, M. J.
Anyo Lujan, M.
Torres Rusinol, M.
Juarez Fernandez, P.
Nunez Manrique, P.
Aragon Trivino, C.
Miquel, M.
Rodriguez, M.
Gonzalez Burguillos, M. J.

Year of Publication

2016

Full reference of the study

Moreno, Enrique Monte, et al. “Type 2 diabetes screening test by means of a pulse oximeter.” IEEE Transactions on Biomedical Engineering 64.2 (2016): 341-351.

Abstract

In this paper, we propose a method for screening for the presence of type 2 diabetes by means of the signal obtained from a pulse oximeter. The screening system consists of two parts; the first analyses the signal obtained from the pulse oximeter, and the second consists of a machine-learning module. The system consists of a front end that extracts a set of features from the pulse oximeter signal. These features are based on physiological considerations. The set of features were the input of a machine-learning algorithm that determined the class of the input sample, i.e. whether the subject had diabetes or not. The machine-learning algorithms were random forests, gradient boosting, and linear discriminant analysis as benchmark. The system was tested on a database of 1,157 subjects (two samples per subject) collected from five community health centres. The mean receiver operating characteristic (ROC) area found was 69.4% (median value 71.9% and range [75.4%-61.1%]), with a specificity=64% for a threshold that gave a sensitivity=65%. We present a screening method for detecting diabetes that has a performance comparable to the glycated haemoglobin (haemoglobin A1c HbA1c) test, does not require blood extraction, and yields results in less than five minutes.
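
Note: the abstract reports specificity at the threshold that gives a chosen sensitivity. The sketch below shows, under stated assumptions, how both values follow from classifier scores once a threshold is fixed. The scores, labels and threshold are hypothetical illustration values, not data from the study, and the function name is our own.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)  # diabetic, flagged
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)   # diabetic, missed
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)   # healthy, cleared
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifier outputs and true labels (1 = diabetes); not study data.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold raises sensitivity at the cost of specificity; sweeping it over all values traces the ROC curve whose area the abstract reports.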

Country of Research

Spain

Design of Study

Screening trial

Duration of Study

Not specified (spring of 2013)

Name of Condition

Type 2 Diabetes

Artificial Intelligence Technique Used

Random forest, gradient boosting

Provider’s involvement in

Developing

Accuracy of the AI Intervention

Not specified

Patient-oriented Outcomes Assessed

Screening method for detecting diabetes that has a performance comparable to the glycated haemoglobin (haemoglobin A1c HbA1c) test, does not require blood extraction, and yields results in less than five minutes.

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Not specified

Maintenance

Key Conclusions

This work has presented a method of screening and diagnosis of diabetes based on the signal obtained from a photoplethysmogram [PPG]. One of the advantages of this method is that it is a fast test, the results can be obtained in two or three minutes, and the price is low because it needs only a PPG sensor and the computation can be done using either a low-cost computer or a smart phone. The procedure consists of obtaining a sample of the pulse oximeter signal of one minute in duration, which does not require qualified personnel, and the whole process (including the computation of the results) takes less than five minutes (including possible repetitions of the measure). This is in contrast to plasma glucose measurements or glycated haemoglobin tests, which require the extraction of a blood sample and laboratory measurements.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	?

Color Code

Low

Unclear

High

Paper 2

Paper Title: ProPath - A guideline based software for the implementation into the medical environment

Authors or developers	S. Klausner; K. Entacher; S. Kranzer; A. Sönnichsen; M. Flamm; G. Fritsch
Year of Publication	2014
Full reference of the study	S. Klausner, K. Entacher, S. Kranzer, A. Sönnichsen, M. Flamm and G. Fritsch, “ProPath – A guideline based software for the implementation into the medical environment,” 2014 IEEE Canada International Humanitarian Technology Conference – (IHTC), Montreal, QC, 2014, pp. 1-6, doi: 10.1109/IHTC.2014.7147551.
Abstract	Over the last decades, the amount of medical information has been growing rapidly. Online platforms such as patients’ and doctors’ blogs, forums and medical databases are widely and easily accessible to medical professionals as well as to the public. However, researching, filtering and evaluating the quality of these often-overwhelming amounts of data remain a challenge. Moreover, existing guidelines in the medical context are extensive and hardly applicable in the clinical context since reading and translation into clinical practice is time consuming. Due to growing critical awareness among patients towards their medical treatment, there is an increased demand from internists, general practitioners, and other specialists, to explain medical conditions, treatment options and procedures in a more comprehensive fashion. In addition, this discussion should be supported by the current state of clinical research. Expert systems could provide valuable support to fulfill these needs. Initial prototypes of expert systems in the inpatient arena were already implemented in the 1960’s in the context of clinical trials. The main goal of these systems was to improve medical care by assisting in the medical decision process. However, most of these systems did not remain in clinical practice for a prolonged period of time. In most cases, the user interface of the software was too complex for daily use. Appropriate application and a detailed insight into these systems require a lot of handbook knowledge. Therefore, the initial hurdles for the integration of software into specific clinical application, faced by the potential users, were too cumbersome. The main purpose of the project Pro-Path was to eliminate these issues and at the same time provide optimal clinical practice for the health care system in a variety of medical topics. 
Both in the outpatient and inpatient scenario, there is an increasing demand to support communication and to improve the distribution of published knowledge and the application of practical experiences within the medical field. The main challenge to achieve that objective is to design an intuitive, user-friendly software product that can be integrated into the current standard network environments. An example of successful implementation of a medical information system into clinical practice is the PROP system [4]. It is a medical decision support system, which has been designed, developed and implemented in Austria in the course of Reform pool project, in order to optimize the preoperative process. Since 2008, it is applied by general practitioners, pediatricians, clinicians and internists, in the state of Salzburg and was externally evaluated by the Paracelsus Medical University (PMU) in Salzburg. This paper provides an overview on how acquired knowledge can be utilized to reduce the complexity of designing and implementing clinical pathways (ProPath), supported by medical information or expert systems. Finally, statistical results evaluating PROP user-behavior are described.
Country of Research	Austria
Design of Study	Unclear (Not mentioned)
Duration of Study	2 years
Name of Condition	Not applicable
Artificial Intelligence Technique Used	Expert system (ProPath)
Provider’s involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	Not specified
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Integration of Information and Communication Technology (ICT) Systems into a common network environment can improve medical care
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 14986 visits in outpatient department
Implementation	Not specified
Maintenance	Yes
Key Conclusions	The expert system provides increased communication between inpatient and outpatient medical professionals. Moreover, the system can translate medical information in a testing proposal for each individual patient.

Paper 3

Paper Title: Tackling Missing Data in Community Health Studies Using Additive LS-SVM Classifier

Authors or developers	G. Wang; Z. Deng; K. S. Choi
Year of Publication	2018
Full reference of the study	Wang, G., Deng, Z., & Choi, K. S. (2016). Tackling missing data in community health studies using additive LS-SVM classifier. IEEE Journal of Biomedical and Health Informatics.
Abstract	Missing data is a common issue in community health and epidemiological studies. Direct removal of samples with missing data can lead to reduced sample size and information bias, which deteriorate the significance of the results. While data imputation methods are available to deal with missing data, they are limited in performance and could introduce noise into the dataset. Instead of data imputation, a novel method based on additive least square support vector machine (LS-SVM) is proposed in this paper for predictive modeling when the input features of the model contain missing data. The method also determines simultaneously the influence of the features with missing values on the classification accuracy using the fast leave-one-out cross-validation strategy. The performance of the method is evaluated by applying it to predict the quality of life (QOL) of elderly people using health data collected in the community. The dataset involves demographics, socioeconomic status, health history, and the outcomes of health assessments of 444 community-dwelling elderly people, with 5% to 60% of data missing in some of the input features. The QOL is measured using a standard questionnaire of the World Health Organization. Results show that the proposed method outperforms four conventional methods for handling missing data-case deletion, feature deletion, mean imputation, and K-nearest neighbor imputation, with the average QOL prediction accuracy reaching 0.7418. It is potentially a promising technique for tackling missing data in community health research and other applications.
Country of Research	China
Design of Study	Unclear
Duration of Study	1 year
Name of Condition	Not specified (Addressing missing data)
Artificial Intelligence Technique Used	LS-SVM: Least squares support vector machine (1. LS-SVM classifier 2. Case deletion 3. Feature deletion 4. Mean imputation 5. KNN imputation)
Provider’s involvement in Developing	A least square support vector machine classifier was developed to tackle the issues of missing data that are common in community health research
Accuracy of the AI Intervention	Accuracy, mean (S.D.): 1. LS-SVM classifier: 0.743 (0.021) 2. Case deletion: 0.714 (0.04) 3. Feature deletion: 0.718 (0.040) 4. Mean imputation: 0.689 (0.016) 5. KNN imputation: 0.705 (0.029)
Patient-related Outcomes Assessed	World Health Organization Questionnaire on Quality of Life: Short Form Hong Kong version
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Not specified
Adoption	Yes (number of providers i.e. PHC participating) : Data collected from a nurse-led mobile health center that provides primary and preventive healthcare services in the community
Implementation	Additive LS-SVM classifier was developed to tackle the issues of missing data that are common in community health research. The handling of missing data and the construction of pattern classification model were carried out at the same time.
Maintenance	Not specified (Unclear)
Key Conclusions	The proposed additive LS-SVM classifier outperformed four conventional methods for handling missing data (case deletion, feature deletion, mean imputation, and K-nearest neighbor imputation) when predicting the quality of life of community-dwelling elderly people. It is potentially a promising technique for tackling missing data in community health research.
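
Note: two of the conventional baselines named in this record, case deletion and mean imputation, are simple enough to sketch. The example below is a minimal illustration under stated assumptions: the toy rows and function names are hypothetical, missing values are encoded as None, and this is not the authors' additive LS-SVM method.

```python
def case_deletion(rows):
    """Drop every row (sample) that contains at least one missing value."""
    return [row for row in rows if None not in row]


def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical toy data: two features per person, None marks a missing entry.
rows = [[70, 1.0], [82, None], [None, 3.0], [76, 2.0]]

deleted = case_deletion(rows)    # keeps only the two complete rows
imputed = mean_imputation(rows)  # keeps all four rows, fills gaps with column means
print(deleted)
print(imputed)
```

Case deletion halves this toy sample, which illustrates the loss of sample size the abstract warns about, while mean imputation keeps every row but inserts values that were never observed.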

Paper 4

Paper Title: Rapid identification of familial hypercholesterolemia from electronic health records: the SEARCH study

Authors or developers	Safarova, M. S.; Liu, H.; Kullo, I. J.
Year of Publication	2016
Full reference of the study	Safarova, Maya S., Hongfang Liu, and Iftikhar J. Kullo. “Rapid identification of familial hypercholesterolemia from electronic health records: the SEARCH study.” Journal of clinical lipidology 10.5 (2016): 1230-1239.
Abstract	Background: Little is known about prevalence, awareness, and control of familial hypercholesterolemia (FH) in the United States. Objective: To address these knowledge gaps, we developed an ePhenotyping algorithm for rapid identification of FH in electronic health records (EHRs) and deployed it in the Screening Employees And Residents in the Community for Hypercholesterolemia (SEARCH) study. Methods: We queried a database of 131,000 individuals seen between 1993 and 2014 in primary care practice to identify 5,992 (mean age 52 +/- 13 years, 42% men) patients with low-density lipoprotein cholesterol (LDL-C) >190 mg/dL, triglycerides <400 mg/dL and without secondary causes of hyperlipidemia. Results: Our EHR-based algorithm ascertained the Dutch Lipid Clinic Network criteria for FH using structured data sets and natural language processing for family history and presence of FH stigmata on physical examination. Blinded expert review revealed positive and negative predictive values for the SEARCH algorithm at 94% and 97%, respectively. The algorithm identified 32 definite and 391 probable cases with an overall FH prevalence of 0.32% (1:310). Only 55% of the FH cases had a diagnosis code relevant to FH. Mean LDL-C at the time of FH ascertainment was 237 mg/dL; at follow-up, 70% (298 of 423) of patients were on lipid-lowering treatment with 80% achieving an LDL-C <100 mg/dL. Of treated FH patients with premature CHD, only 22% (48 of 221) achieved an LDL-C <70 mg/dL. Conclusions: In a primary care setting, we found the prevalence of FH to be 1:310 with low awareness and control. Further studies are needed to assess whether automated detection of FH in EHR improves patient outcomes. Copyright 2016 National Lipid Association.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	21 years (June 1993-Dec 2014)
Name of Condition	Familial hypercholesterolemia (Additional history provided regarding familial history, secondary causes, personal history)
Artificial Intelligence Technique Used	Mayo SEARCH algorithm (Natural language processing) (Algorithm mined lipid and non-lipid criteria for familial hypercholesterolemia.)
Provider’s involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	Sensitivity: 97%, Specificity: 94%, Positive predictive value: 94%, Negative predictive value: 97%
Patient-related Outcomes Assessed	Dutch Lipid Clinic Network scoring system score (Definite familial hypercholesterolemia: 32 (Dutch Lipid Clinic Network scoring system score: 10.2, S.D.: 1.7); Probable familial hypercholesterolemia: 391 (Dutch Lipid Clinic Network scoring system score: 6.1, S.D.: 0.4))
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Mayo Employee and Community Health system that delivers primary care to residents of Olmsted County and southeastern Minnesota
Implementation	Not specified
Maintenance	Not specified (Unclear)
Key Conclusions	The study reveals that a natural language processing algorithm (Mayo SEARCH) had a high accuracy in ascertaining familial hypercholesterolemia cases among individuals with severe hypercholesterolemia.
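
Note: the prevalence and treatment figures quoted in the abstract of this record can be checked with simple arithmetic. The sketch below assumes that the 131,000 individuals in the queried database are the denominator for prevalence; that assumption reproduces the reported 0.32% (1:310), but the record does not state it explicitly.

```python
# Counts taken from the abstract of the SEARCH study record.
definite_cases = 32
probable_cases = 391
database_size = 131_000   # assumed denominator for prevalence
on_treatment = 298

total_cases = definite_cases + probable_cases   # definite + probable FH
prevalence = total_cases / database_size        # fraction of the database
one_in = round(database_size / total_cases)     # "1 in N" form
treated_share = on_treatment / total_cases      # share on lipid-lowering treatment

print(f"cases={total_cases}, prevalence={prevalence:.2%}, 1:{one_in}")
print(f"on treatment at follow-up: {treated_share:.0%}")
```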

Paper 5

Paper Title: A bioinformatics approach to identify patients with symptomatic peanut allergy using peptide microarray immunoassay

Authors or developers

Lin, J.
Bruni, F. M.
Fu, Z.
Maloney, J.
Bardina, L.
Boner, A. L.
Gimenez, G.
Sampson, H. A.

Year of Publication

2012

Full reference of the study

Lin, Jing, et al. “A bioinformatics approach to identify patients with symptomatic peanut allergy using peptide microarray immunoassay.” Journal of allergy and clinical immunology 129.5 (2012): 1321-1328.

Abstract

Background: Peanut allergy is relatively common, typically permanent, and often severe. A double-blind, placebo-controlled food challenge is considered the gold standard for the diagnosis of food allergy-related disorders. However, the complexity and potential of a double-blind, placebo-controlled food challenge to cause life-threatening allergic reactions affects its clinical application. A laboratory test that could accurately diagnose symptomatic peanut allergy would greatly facilitate clinical practice. Objective: We sought to develop an allergy diagnostic method that could correctly predict symptomatic peanut allergy by using peptide microarray immunoassays and bioinformatic methods. Methods: Microarray immunoassays were performed by using the sera from 62 patients (31 with symptomatic peanut allergy and 31 who had outgrown their peanut allergy or were sensitized but were clinically tolerant to peanuts). Specific IgE and IgG₄ binding to 419 overlapping peptides (15 mers, 3 offset) covering the amino acid sequences of Ara h 1, Ara h 2, and Ara h 3 were measured by using a peptide microarray immunoassay. Bioinformatic methods were applied for data analysis. Results: Individuals with a peanut allergy showed significantly greater IgE binding and broader epitope diversity than did peanut-tolerant individuals. No significant difference in IgG₄ binding was found between groups. By using machine learning methods, four peptide biomarkers were identified and prediction models that can predict the outcome of double-blind, placebo-controlled food challenges with high accuracy were developed by using a combination of the biomarkers. Conclusions: In this study, we developed a novel diagnostic approach that can predict peanut allergy with high accuracy by combining the results of a peptide microarray immunoassay and bioinformatic methods. Further studies are needed to validate the efficacy of this assay in clinical practice. 
2012 American Academy of Allergy, Asthma & Immunology.

Country of Research

USA

Design of Study

Screening trial

Duration of Study

6 years, 2001-2007

Name of Condition

Peanut allergy

Artificial Intelligence Technique Used

Supervised machine learning models: Decision tree, support vector machine (develop/train the prediction models and select a combination of the least number of peptides with the highest accuracy in classifying patients). For decision trees, the tree was pruned to four nodes to reach the best generalization performance. For support vector machines, the Gaussian kernel was selected, SD of the Gaussian kernel was set to three, and Cbound was set to 10. Both classifiers were used for feature selection and subjected to five-fold cross-validation.

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Specificity: 94%, Sensitivity: 87%, Accuracy: 90%
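
Note: with 31 peanut-allergic and 31 peanut-tolerant patients (group sizes from the abstract), the three percentages above are mutually consistent. The sketch below illustrates this with counts of 27 and 29 correctly classified patients. These two counts are an assumption chosen to match the rounded percentages; they are not reported in this record.

```python
n_allergic = 31   # symptomatic peanut allergy (from the abstract)
n_tolerant = 31   # outgrown or sensitized but tolerant (from the abstract)

# Hypothetical counts of correct classifications, chosen to match the
# rounded percentages in this record; not taken from the paper.
true_positives = 27
true_negatives = 29

sensitivity = true_positives / n_allergic
specificity = true_negatives / n_tolerant
accuracy = (true_positives + true_negatives) / (n_allergic + n_tolerant)

print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}, accuracy={accuracy:.1%}")
```

Because the two groups are the same size, accuracy is simply the average of sensitivity and specificity.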

Patient-related Outcomes Assessed

Serum sIgE levels to peanut allergens Ara h 1, Ara h 2, and Ara h 3 were determined by using ISAC. According to the ISAC data reported in the study, both Ara h 1 and Ara h 3 were bound by approximately 13% of peanut-allergic patients and approximately 9% of peanut-tolerant patients and had very low or no classification power. Ara h 2 was bound by 20 peanut-allergic patients and only 1 peanut-tolerant patient. It had the highest classification power among Ara h 1, Ara h 2, and Ara h 3, and it reached 74% sensitivity and 96% specificity in distinguishing peanut-allergic groups from peanut-tolerant groups.

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Data gathered from a primary care centre

Implementation

Not specified

Maintenance

Not specified

Key Conclusions

A novel peanut allergy diagnostic approach with higher accuracy than current allergy tests was developed by employing peptide microarray immunoassays and bioinformatics. This method may be useful for clinical allergy testing in the future.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	–

Color Code

Low

Unclear

High

Paper 6

Paper Title: Medicine In Words And Numbers: A Cross-Sectional Survey Comparing Probability Assessment Scales

Authors or developers	Witteman, C. L.; Renooij, S.; Koele, P.
Year of Publication	2007
Full reference of the study	Witteman, C. L., Renooij, S., & Koele, P. (2007). Medicine in words and numbers: a cross-sectional survey comparing probability assessment scales. BMC Medical Informatics and Decision Making, 7(1), 13.
Abstract	BACKGROUND: In the complex domain of medical decision making, reasoning under uncertainty can benefit from supporting tools. Automated decision support tools often build upon mathematical models, such as Bayesian networks. These networks require probabilities which often have to be assessed by experts in the domain of application. Probability response scales can be used to support the assessment process. We compare assessments obtained with different types of response scale. METHODS: General practitioners (GPs) gave assessments on and preferences for three different probability response scales: a numerical scale, a scale with only verbal labels, and a combined verbal-numerical scale we had designed ourselves. Standard analyses of variance were performed. RESULTS: No differences in assessments over the three response scales were found. Preferences for type of scale differed: the less experienced GPs preferred the verbal scale, the most experienced preferred the numerical scale, with the groups in between having a preference for the combined verbal-numerical scale. CONCLUSION: We conclude that all three response scales are equally suitable for supporting probability assessment. The combined verbal-numerical scale is a good choice for aiding the process, since it offers numerical labels to those who prefer numbers and verbal labels to those who prefer words, and accommodates both more and less experienced professionals.
Country of Research	Netherlands
Design of Study	Screening trial
Duration of Study	Not specified
Name of Condition	Not applicable
Artificial Intelligence Technique Used	Real life Bayesian network
Provider’s involvement in	Developing : Not specified, Testing : GP, Validating : GP
Accuracy of the AI Intervention	Not tested
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Diagnostic, prognostic or therapeutic alternatives judged on the basis of probability questions (verbal, numerical, verbal-numerical). A vignette represented a medical situation followed by three probabilistic choices that the GP could make, i.e. what was the probability of the patient suffering from this disease? Moreover, the preference regarding verbal, numerical, numerical-verbal scale was also inquired.
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes : Not specified
Adoption	Yes (number of providers i.e. PHC participating) : Not specified
Implementation	Not implemented. The study acquired data for developing an automated decision-making model.
Maintenance	Not specified (Unclear)
Key Conclusions	The study reported that all three probability response scales (numerical, verbal label, combined numerical-verbal scale) supported probability assessment. However, the combined verbal-numerical scale would be preferred for developing an automatic decision support tool with a Bayesian network, as it accommodates the needs of both more experienced and less experienced GPs in primary care.

Paper 7

Paper Title: Development of an Automatic Diagnostic Algorithm for Pediatric Otitis Media

Authors or developers

Tran, T. T.
Fang, T. Y.
Pham, V. T.
Lin, C.
Wang, P. C.
Lo, M. T.

Year of Publication

2018

Full reference of the study

Tran, Thi-Thao, et al. “Development of an automatic diagnostic algorithm for pediatric otitis media.” Otology & Neurotology 39.8 (2018): 1060-1065.

Abstract

HYPOTHESIS: The artificial intelligence and image processing technology can develop automatic diagnostic algorithm for pediatric otitis media (OM) with accuracy comparable to that from well-trained otologists. BACKGROUND: OM is a public health issue that occurs commonly in pediatric populations. Caring for OM may incur a significant indirect cost that stems mainly from loss of school or working days seeking for medical consultation. In this study, we aim to develop an automatic diagnostic algorithm for pediatric OM. METHODS: A total of 1,230 otoscopic images were collected. Among them, 214 images diagnosed of acute otitis media (AOM) and otitis media with effusion (OME) are used as the database for image classification in this study. For the OM image classification system, the image database is randomly partitioned into the test and train subsets. Of each image in the train and test sets, the desired eardrum image region is first segmented, then multiple image features such as color and shape are extracted. The multitask joint sparse representation-based classification to combine different features of the OM image is used for classification. RESULTS: The multitask joint sparse representation algorithm was applied for the classification of the AOM and OME images. The approach is able to differentiate the OME from AOM images and achieves the classification accuracy as high as 91.41%. CONCLUSION: Our results demonstrated that this automatic diagnosis algorithm has acceptable accuracy to diagnose pediatric OM. The cost-effective algorithm can assist patients for early detection and continuous monitoring at home to decrease consequence of the disease.

Country of Research

Taiwan

Design of Study

Screening trial

Duration of Study

Not specified

Name of Condition

Pediatric otitis media

Artificial Intelligence Technique Used

Machine learning-based image classification approaches

Providers’ involvement in

Developing

Accuracy of the AI Intervention

Classification accuracy as high as 91.41%.

Patient-related Outcomes Assessed

The automatic diagnosis algorithm has acceptable accuracy to diagnose pediatric otitis media. The cost-effective algorithm can assist parents with early detection and continuous monitoring at home to decrease the consequences of the disease.

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

The diagnosis algorithm can be installed in a portable device to assist parents with initial detection and to monitor disease progress, reducing the burden and expense of medical consultation.

Reached Target Population?

Yes

Adoption

Not mentioned

Implementation

Yes

Maintenance

Key Conclusions

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	?	?

Color Code

Low

Unclear

High

Paper 8

Paper Title: Using natural language processing for identification of herpes zoster ophthalmicus cases to support population-based study

Authors or developers

Zheng, C.
Luo, Y.
Mercado, C.
Sy, L.
Jacobsen, S. J.
Ackerson, B.
Lewin, B.
Tseng, H. F.

Year of Publication

2018

Full reference of the study

Zheng, Chengyi, et al. “Using natural language processing for identification of herpes zoster ophthalmicus cases to support population‐based study.” Clinical & experimental ophthalmology 47.1 (2019): 7-14.

Abstract

IMPORTANCE: Diagnosis codes are inadequate for accurately identifying herpes zoster (HZ) ophthalmicus (HZO). There lacks significant population-based studies on HZO due to the high expense of manual review of medical records. BACKGROUND: To assess whether HZO can be identified from the clinical notes using natural language processing (NLP). To investigate the epidemiology of HZO among HZ population based on the developed approach. DESIGN: A retrospective cohort analysis. PARTICIPANTS: A total of 49914 southern California residents aged over 18 years, who had a new diagnosis of HZ. METHODS: An NLP-based algorithm was developed and validated with the manually curated validation data set (n =461). The algorithm was applied to over 1 million clinical notes associated within the study population. HZO versus non-HZO cases were compared by age, sex, race and co-morbidities. MAIN OUTCOME MEASURES: We measured the accuracy of NLP algorithm. RESULTS: NLP algorithm achieved 95.6% sensitivity and 99.3% specificity. Compared to the diagnosis codes, NLP identified significantly more HZO cases among HZ population (13.9% vs. 1.7%). Compared to the non-HZO group, the HZO group was older, had more males, had more Caucasians and more outpatient visits. CONCLUSIONS AND RELEVANCE: We developed and validated an automatic method to identify HZO cases with high accuracy. As one of the largest studies on HZO, our finding emphasizes the importance of preventing HZ in the elderly population. This method can be a valuable tool to support population-based studies and clinical care of HZO in the era of big data.

Country of Research

USA

Design of Study

Cohort study, Unclear, Retrospective

Duration of Study

3 years (1 January 2012 to 31 December 2014)

Name of Condition

Herpes zoster ophthalmicus (Additional morbidities also mentioned including asthma, allergy)

Artificial Intelligence Technique Used

Natural language processing (The machine learning modules included sentence splitting, tokenization, part-of-speech tagging, parsing and indexing)

Provider’s involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Sensitivity: 95.6%, specificity: 99.3%, positive predictive value: 93.5%, negative predictive value: 99.5%, positive likelihood ratio: 132.5, negative likelihood ratio: 0.04
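
Note: likelihood ratios follow directly from sensitivity and specificity (LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity). The sketch below applies these standard formulas to the rounded percentages in this record. With rounded inputs the positive ratio comes out near 136.6 rather than the reported 132.5; the gap would be expected if the study computed it from unrounded counts, which is an assumption on our part.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test given sensitivity and specificity as fractions."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values from this record (95.6% sensitivity, 99.3% specificity).
lr_pos, lr_neg = likelihood_ratios(0.956, 0.993)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Because specificity sits so close to 100%, small rounding changes in it move LR+ substantially, so the rounded-input value should be read as approximate.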

Patient-related Outcomes Assessed

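Prevalence of herpes zoster ophthalmicus among the herpes zoster population: 13.9% identified by natural language processing vs. 1.7% by diagnosis codes (Hispanics, African Americans and people of mixed race were less likely to have herpes zoster ophthalmicus compared to Caucasians)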

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Not implemented

Maintenance

Not specified (Unclear)

Key Conclusions

A natural language processing algorithm was tested and validated to identify herpes zoster ophthalmicus cases with high accuracy.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	+

Color Code

Low

Unclear

High

Paper 9

Paper Title: A Machine Learning Recommender System to Tailor Preference Assessments to Enhance Person-Centered Care Among Nursing Home Residents

Authors or developers	Gannod, G. C.; Abbott, K. M.; Van Haitsma, K.; Martindale, N.; Heppner, A.
Year of Publication	2018
Full reference of the study	Gannod, G. C., Abbott, K. M., Van Haitsma, K., Martindale, N., & Heppner, A. (2019). A Machine Learning Recommender System to Tailor Preference Assessments to Enhance Person-Centered Care Among Nursing Home Residents. The Gerontologist, 59(1), 167-176.
Abstract	Background and Objectives: Nursing homes (NHs) using the Preferences for Everyday Living Inventory (PELI-NH) to assess important preferences and provide person-centered care find the number of items (72) to be a barrier to using the assessment. Research Design and Methods: Using a sample of n = 255 NH resident responses to the PELI-NH, we used the 16 preference items from the MDS 3.0 Section F to develop a machine learning recommender system to identify additional PELI-NH items that may be important to specific residents. Much like the Netflix recommender system, our system is based on the concept of collaborative filtering whereby insights and predictions (e.g., filters) are created using the interests and preferences of many users. The algorithm identifies multiple sets of “you might also like” patterns called association rules, based upon responses to the 16 MDS preferences that recommends an additional set of preferences with a high likelihood of being important to a specific resident. Results: In the evaluation of the combined apriori and logistic regression approach, we obtained a high recall performance (i.e., the ratio of correctly predicted preferences compared with all predicted preferences and non-preferences) and high precision (i.e., the ratio of correctly predicted rules with respect to the rules predicted to be true) of 80.2% and 79.2%, respectively. Discussion and Implications: The recommender system successfully provides guidance on how to best tailor the preference items asked of residents and can support preference capture in busy clinical environments, contributing to the feasibility of delivering person-centered care.
Country of Research	USA
Design of Study	Unclear
Duration of Study	Not specified
Name of Condition	Not specified; tailor-made care approach to enhance person-centered care
Artificial Intelligence Technique Used	Combined Apriori algorithm and a logistic regression configured with a generalized linear regression model
Providers’ involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	70%, Recall = .8021, precision = .7919, accuracy = .6979, and F1-score = .7953.
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 255
Implementation	Not specified
Maintenance	Not specified
Key Conclusions	The recommender system discussed in the study provides guidance on how to best tailor the preference items asked of residents and can support preference capture in busy clinical environments, contributing to the feasibility of delivering person-centered care
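
The abstract of Paper 9 describes "you might also like" association rules mined from resident responses. As a rough illustration of how such a rule is scored, the sketch below computes support and confidence on a toy data set. The item names and responses are hypothetical, and this is only the rule-scoring step, not the authors' combined Apriori and logistic regression pipeline.

```python
# Hypothetical toy data: each set holds the preference items that one
# resident rated as important. Item names are invented for illustration.
responses = [
    {"choose_clothes", "listen_music", "go_outside"},
    {"choose_clothes", "listen_music"},
    {"listen_music", "go_outside"},
    {"choose_clothes", "listen_music", "go_outside"},
    {"choose_clothes"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return support(antecedent | consequent, rows) / support(antecedent, rows)


# Rule: residents who value choosing their clothes also value listening to music.
rule_conf = confidence({"choose_clothes"}, {"listen_music"}, responses)
print(f"support = {support({'choose_clothes', 'listen_music'}, responses):.2f}")
print(f"confidence = {rule_conf:.2f}")
```

An Apriori-style miner would keep only itemsets whose support exceeds a chosen threshold before forming rules; the threshold used in the study is not given in this inventory.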

Paper 10

Paper Title: A web-based prediction score for head and neck cancer referrals

Authors or developers

Lau, K.
Wilkinson, J.
Moorthy, R.

Year of Publication

2018

Full reference of the study

Lau, K., Wilkinson, J., & Moorthy, R. (2018). A web‐based prediction score for head and neck cancer referrals. Clinical Otolaryngology, 43(4), 1043-1049.

Abstract

OBJECTIVE: Following the announcement of the NHS Cancer Plan in 2000, anyone suspected of having cancer must see a specialist within two weeks of referral. Since this introduction, studies have shown that only 6.3%-14.6% of two-week referrals were diagnosed with a head and neck cancer and that the majority of the cancer diagnoses were from other referral routes. These studies suggest that the referral scheme is not currently cost-effective. Our aim is to develop a scoring system that determines the risk of head and neck cancer in a patient, which can then be used to aid GP referrals. DESIGN: Retrospective data were collected from 1,075 patients with two-week head and neck cancer referrals from general practitioners. The retrospective data collected included patients’ demographics, risk factors and relevant investigations. The data were used as input into a logistic regression to arrive at our model. Our approach included data analysis, machine learning techniques, statistical inference and model validation metrics to arrive at the best performing model. The model was then tested with more data from 235 prospective patients. RESULTS: Using our results from the logistic regression, we created a web-based tool that GPs can use to calculate their patient’s probability of cancer and use this result to assist in their decision regarding referral. CONCLUSION: We have created a prototype scoring system that can be hosted online to assist GPs with their referrals with a sensitivity of 31% and specificity of 92%. While we acknowledge that there are several limitations to our model, we believe we have created a novel preliminary scoring system that has the potential to be improved dramatically with further data and be very helpful for GPs in a long run.

Country of Research

United Kingdom

Design of Study

Cohort study, Unclear : Retrospective cohort study

Duration of Study

Birmingham site: 1 year, 01/07/2009–01/07/2010; Slough site: 4 months, 1/4/2013–31/8/2013

Name of Condition

Head and neck cancer referral. Additionally, subgroup distribution was made on the basis of the following habits and symptoms: smoking, alcohol consumption, hoarse voice, neck lump, dysphagia, weight loss, oral ulcer etc.

Artificial Intelligence Technique Used

Logistic regression

Providers’ involvement in

Developing : Not specified, Testing : General Physicians referral, Validating : Not specified

Accuracy of the AI Intervention

Not specified. 16 patients were referred during validation (true positive: 5 (31%), false positive: 11, false negative: 18 (78%)). Thereafter, the model was revised on the basis of one symptom, i.e. thyroid swelling, and the false negative rate on re-evaluation fell to 70%.
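
The percentages in the validation counts above can be reproduced with simple arithmetic. The sketch below assumes that 31% is the share of referred patients who were true positives and that 78% is the share of cancer cases that were missed; these denominators are inferred from the counts and are not stated in the inventory.

```python
# Counts as reported in the validation entry above.
true_positive = 5
false_positive = 11
false_negative = 18

referred = true_positive + false_positive      # patients referred by the tool
with_cancer = true_positive + false_negative   # assumed denominator for the 78%

# Assumption: 31% = true positives among referred patients.
tp_share_of_referred = true_positive / referred
# Assumption: 78% = missed cases among all patients with cancer.
fn_share_of_cancers = false_negative / with_cancer

print(f"referred = {referred}")
print(f"true positives among referred = {tp_share_of_referred:.0%}")
print(f"false negatives among cancers = {fn_share_of_cancers:.0%}")
```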

Patient-related Outcomes Assessed

True positive and false positive for cancer referral

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified : General physicians from 2 different hospitals in the UK participated in the study (number not specified)

Implementation

Not specified

Maintenance

Yes : The authors mentioned that additional data are needed to further validate, modify and advance the web-based referral platform.

Key Conclusions

This study has described the process of developing a web-based referral tool that GPs can use when referring a patient to the head and neck 2-week wait clinic

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	+

Color Code

Low

Unclear

High

Paper 11

Paper Title: Long-term outcomes of a large, prospective observational cohort of older adults with back pain

Authors or developers

Jarvik, J. G.
Gold, L. S.
Tan, K.
Friedly, J. L.
Nedeljkovic, S. S.
Comstock, B. A.
Deyo, R. A.
Turner, J. A.
Bresnahan, B. W.
Rundell, S. D.
James, K. T.
Nerenz, D. R.
Avins, A. L.
Bauer, Z.
Kessler, L.
Heagerty, P. J.

Year of Publication

2018

Full reference of the study

Jarvik, J. G., Gold, L. S., Tan, K., Friedly, J. L., Nedeljkovic, S. S., Comstock, B. A., … & James, K. T. (2018). Long-term outcomes of a large, prospective observational cohort of older adults with back pain. The Spine Journal, 18(9), 1540-1551.

Abstract

BACKGROUND CONTEXT: Although back pain is common among older adults, there is relatively little research on the course of back pain in this age group. PURPOSE: Our primary goals were to report 2-year outcomes of older adults initiating primary care for back pain and to examine the relative importance of patient factors versus medical interventions in predicting 2-year disability and pain. STUDY DESIGN/SETTING: This study used a predictive model using data from a prospective, observational cohort from a primary care setting. PATIENT SAMPLE: The study included patients aged >=65 years at the time of new primary care visits for back pain. OUTCOME MEASURES: Self-reported 2-year disability (Roland-Morris Disability Questionnaire [RDQ]) and back pain (0-10 numerical rating scale [NRS]). METHODS: We developed our models using a machine learning least absolute shrinkage and selection operator approach. We evaluated the predictive value of baseline characteristics and the incremental value of interventions that occurred between 0 and 90 days, and the change in patient disability and pain from 0 to 90 days. Limitations included confounding by indication and unmeasured confounding. RESULTS: Of 4,665 patients (89%) with follow-up, both RDQ (from mean 9.6 [95% confidence interval {CI} 9.4-9.7] to mean 8.3 [95% CI 8.0-8.5]) and back pain NRS (from mean 5.0 [95% CI 4.9-5.1] to mean 3.5 [95% CI 3.4-3.6]) scores improved slightly. Only 16% (15%-18%) reported no back pain-related disability or back pain at 2 years after initial visits. Regression model parameters explained 40% of the variation (R²) in 2-year RDQ scores, and the addition of 0- to 3-month change in RDQ score and pain improved prediction (R²=51%). The most consistent predictors of 2-year RDQ scores and back pain NRS scores were 0- to 90-day change in each respective outcome and patient confidence in improvement. 
Patients experienced 50% and 43% improvement in back pain and disability, respectively, 2 years after their initial visit. However, fewer than 20% of patients had complete resolution of their back pain and disability at that time. CONCLUSIONS: Baseline patient factors were more important than early interventions in explaining disability and pain after 2 years.

Country of Research

USA

Design of Study

Cohort study, Observational study, Unclear, prospective

Duration of Study

Not specified

Name of Condition

Low back pain

Artificial Intelligence Technique Used

Machine learning least absolute shrinkage and selection operator (LASSO) approach; minimization of the Schwarz Bayesian Criterion used to select the final predictive model

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Prediction of 2-year 30% back pain improvement: AUC Model 1, 2: 0.66; AUC Model 2, 3: 0.69. Prediction of 2-year 30% Roland-Morris disability improvement: Model 1: 0.67, Model 2: 0.69, Model 3: 0.76.

Patient-related Outcomes Assessed

Pain-related characteristics: modified Roland-Morris Disability Questionnaire, the average back pain intensity and average leg pain intensity in the past week on a 0-10 numerical rating scale, the Brief Pain Inventory (BPI) Activity Interference Scale. Psychological distress: the four-item PHQ-4 (0-12) measure of anxiety and depressive symptoms. Health-related quality of life (HRQoL): European Quality of Life 5 Dimension (EQ5D), including both the quality of life index (0-1) (European Quality of Life 5 Dimension Index [EQ5D Index]) and the visual analog scale (European Quality of Life 5 Dimension Visual Analog Scale [EQ5D VAS]). Falls: the number of falls in the past three weeks and how many resulted in injury, from the Behavioral Risk Factor Surveillance System (BRFSS) survey, body mass index, Quan comorbidity score, baseline diagnosis, number of relative value units, spine related interventions, opioid prescriptions, days from index visit to consent, chronic pain risk score, back complaints in the elders trial. Of 4,665 patients (89%) with follow-up, both Roland-Morris disability questionnaire (from mean 9.6 [95% confidence interval {CI} 9.4-9.7] to mean 8.3 [95% CI 8.0-8.5]) and back pain numerical rating scale (from mean 5.0 [95% CI 4.9-5.1] to mean 3.5 [95% CI 3.4-3.6]) scores improved slightly. Only 16% (15%-18%) reported no back pain-related disability or back pain at 2 years after initial visits.

Primary Healthcare Worker Related Outcomes Assessed

Not specified, Our primary outcome was 2-year RDQ score (continuous variable), and secondary outcomes were back pain intensity rating (continuous variable), dichotomous 30% RDQ improvement, and dichotomous 30% back pain intensity improvement.

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified. Data were included from primary care centers (Harvard Vanguard (Boston), Henry Ford Health System (Detroit), and Kaiser-Permanente Northern California)

Implementation

Not specified

Maintenance

No : Not specified

Key Conclusions

The study explains that the baseline patient factors were more important than early interventions in explaining disability and pain after 2 years.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	+

Paper 12

Paper Title: Innovative Informatics Approaches for Peripheral Artery Disease: Current State and Provider Survey of Strategies for Improving Guideline-Based Care

Authors or developers	Chaudhry, A. P.; Afzal, N.; Abidian, M. M.; Mallipeddi, V. P.; Elayavilli, R. K.; Scott, C. G.; Kullo, I. J.; Wennberg, P. W.; Pankratz, J. J.; Liu, H.; Chaudhry, R.; Arruda-Olson, A. M.
Year of Publication	2018
Full reference of the study	Chaudhry, Alisha P., et al. “Innovative informatics approaches for peripheral artery disease: current state and provider survey of strategies for improving guideline-based care.” Mayo Clinic Proceedings: Innovations, Quality & Outcomes 2.2 (2018): 129-136.
Abstract	Objective: To quantify compliance with guideline recommendations for secondary prevention in peripheral artery disease (PAD) using natural language processing (NLP) tools deployed to an electronic health record (EHR) and investigate provider opinions regarding clinical decision support (CDS) to promote improved implementation of these strategies. Methods: Natural language processing was used for automated identification of moderate to severe PAD cases from narrative clinical notes of an EHR of patients seen in consultation from May 13, 2015, to July 27, 2015. Guideline-recommended strategies assessed within 6 months of PAD diagnosis included therapy with statins, antiplatelet agents, angiotensin-converting enzyme inhibitors or angiotensin receptor blockers, and smoking abstention. Subsequently, a provider survey was used to assess provider knowledge regarding PAD clinical practice guidelines, comfort in recommending secondary prevention strategies, and potential role for CDS. Results: Among 73 moderate to severe PAD cases identified by NLP, only 12 (16%) were on 4 guideline-recommended strategies. A total of 207 of 760 (27%) providers responded to the survey; of these 141 (68%) were generalists and 66 (32%) were specialists. Although 183 providers (88%) managed patients with PAD, 51 (25%) indicated they were uncomfortable doing so; 138 providers (67%) favored the development of a CDS system tailored for their practice and 146 (71%) agreed that an automated EHR-derived mortality risk score calculator for patients with PAD would be helpful. Conclusion: Natural language processing tools can identify cases from EHRs to support quality metric studies. Findings of this pilot study demonstrate gaps in application of guideline-recommended strategies for secondary risk prevention for patients with moderate to severe PAD. Providers strongly support the development of CDS systems tailored to assist them in providing evidence-based care to patients with PAD at the point of care.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	3 months, May 13, 2015, to July 27, 2015
Name of Condition	Peripheral artery disease; diabetes and hypertension mentioned as additional comorbidities
Artificial Intelligence Technique Used	Natural language processing
Providers’ involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	Not specified
Patient-related Outcomes Assessed	Patient-centered outcomes addressed: Study demonstrates gaps in application of guideline-recommended strategies for secondary risk prevention for patients with moderate to severe peripheral artery disease
Primary Healthcare Worker Related Outcomes Assessed	The provider survey had an overall response rate of 27% (207 of 760 providers). Among these 207 responders, 123 (59%) were staff physicians, 58 (28%) were nurse practitioners or physician assistants, and 26 (13%) were residents or fellows. Within the responder group, 141 (68%) were generalists and 66 (32%) were specialists (25% cardiology, 5% vascular medicine, and 2% vascular surgery). A total of 183 (88%) respondents currently cared for patients with peripheral arterial disease, and 129 (62%) reported seeing an average of 1 to 5 patients with peripheral arterial disease per month.
Healthcare System-related Outcomes Assessed	This pilot study demonstrates gaps in application of guideline-recommended strategies for secondary risk prevention for patients with moderate to severe peripheral artery disease. Providers strongly support the development of clinical decision support systems tailored to assist them in providing evidence-based care to patients with peripheral artery disease at the point of care.
Reached Target Population?	Yes
Adoption	Not specified
Implementation	Yes
Maintenance	Not specified
Key Conclusions	The natural language processing tool used in the study identified cases from electronic health records, allowing the application of guideline-recommended strategies for secondary risk prevention to be assessed in patients with peripheral arterial disease. Moreover, the provider survey reported strong support for the development of a clinical decision support system for such patients.

Paper 13

Paper Title: A new computational intelligence approach to detect autistic features for autism screening

Authors or developers

Thabtah, F.
Kamalov, F.
Rajab, K.

Year of Publication

2018

Abstract

Autism Spectrum Disorder (ASD) is one of the fastest growing developmental disability diagnosis. General practitioners (GPs) and family physicians are typically the first point of contact for patients or family members concerned with ASD traits observed in themselves or their family member. Unfortunately, some families and adult patients are unaware of ASD traits that may be exhibited and as a result do not seek out necessary diagnostic services or contact their GP. Therefore, providing a quick, accessible, and simple tool utilizing items related to ASD to these families may increase the likelihood they will seek professional assessment and is vital to the early detection and treatment of ASD. This study aims at identifying fewer, albeit influential, features in common ASD screening methods in order to achieve efficient screening as demands on evaluating the items’ influences on ASD within existing tools are urgent. To achieve this aim, a computational intelligence method called Variable Analysis (Va) is proposed that considers feature-to-class correlations and reduces feature-to-feature correlations. The results of the Va have been verified using two machine learning algorithms by deriving automated classification systems with respect to specificity, sensitivity, positive predictive values (PPVs), negative predictive values (NPVs), and predictive accuracy. Experimental results using cases and controls related to items in three common screening methods, along with features related to individuals, have been analysed and compared with results obtained from other common filtering methods. The results exhibited that Va was able to derive fewer numbers of features from adult, adolescent, and child screening methods yet maintained competitive predictive accuracy, sensitivity, and specificity rates.

Full reference of the study

Thabtah, Fadi, Firuz Kamalov, and Khairan Rajab. “A new computational intelligence approach to detect autistic features for autism screening.” International journal of medical informatics 117 (2018): 112-124.

Country of Research

United Kingdom (University of Cambridge)

Design of Study

Unclear

Duration of Study

Not specified

Name of Condition

Autism spectrum disorder, 194 cases with family members diagnosed with ASD and 165 cases of individuals born with jaundice

Artificial Intelligence Technique Used

Variable Analysis (Va); 5 filtering methods compared (Chi-square, Information gain, Correlation feature set, Correlation, and all of them)

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Adolescent: specificity: 87.3%, sensitivity: 80.95%, positive predictive values: 80.95%, negative predictive values: 84.13%, and positive predictive accuracy: 80%. Adult: sensitivity: 80.95-82.54% (C4.5 and RIPPER algorithms)

Patient-related Outcomes Assessed

Not specified

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified : The tool was developed to be used in a primary care setting i.e. by general practitioners, patients, medical staff etc.

Implementation

Not specified

Maintenance

Not specified

Key Conclusions

A self-administered autism spectrum disorder assessment tool that could be used by patients, caregivers or medical staff has been described in the study. The results showed that variable analysis was able to derive fewer features from adult, adolescent, and child screening methods yet maintained competitive predictive accuracy, sensitivity, and specificity rates.

Risk of Bias

Participants	Predictors	Outcome	Analysis
–	–	?	–

Paper 14

Paper Title: Chronic obstructive lung disease ‘expert system’: validation of a predictive tool for assisting diagnosis

Authors or developers

Braido, F.
Santus, P.
Corsico, A. G.
Di Marco, F.
Melioli, G.
Scichilone, N.
Solidoro, P.

Year of Publication

2018

Full reference of the study

Braido, F., Santus, P., Corsico, A. G., Di Marco, F., Melioli, G., Scichilone, N., & Solidoro, P. (2018). Chronic obstructive lung disease ‘expert system’: validation of a predictive tool for assisting diagnosis. International journal of chronic obstructive pulmonary disease, 13, 1747.

Abstract

Purpose: The purposes of this study were development and validation of an expert system (ES) aimed at supporting the diagnosis of chronic obstructive lung disease (COLD). Methods: A questionnaire and a WebFlex code were developed and validated in silico. An expert panel pilot validation on 60 cases and a clinical validation on 241 cases were performed. Results: The developed questionnaire and code validated in silico resulted in a suitable tool to support the medical diagnosis. The clinical validation of the ES was performed in an academic setting that included six different reference centers for respiratory diseases. The results of the ES expressed as a score associated with the risk of suffering from COLD were matched and compared with the final clinical diagnoses. A set of 60 patients were evaluated by a pilot expert panel validation with the aim of calculating the sample size for the clinical validation study. The concordance analysis between these preliminary ES scores and diagnoses performed by the experts indicated that the accuracy was 94.7% when both experts and the system confirmed the COLD diagnosis and 86.3% when COLD was excluded. Based on these results, the sample size of the validation set was established in 240 patients. The clinical validation, performed on 241 patients, resulted in ES accuracy of 97.5%, with confirmed COLD diagnosis in 53.6% of the cases and excluded COLD diagnosis in 32% of the cases. In 11.2% of cases, a diagnosis of COLD was made by the experts, although the imaging results showed a potential concomitant disorder. Conclusion: The ES presented here (COLD_ES) is a safe and robust supporting tool for COLD diagnosis in primary care settings.

Country of Research

Italy

Design of Study

Unclear

Duration of Study

Not specified

Name of Condition

Chronic obstructive lung disease

Artificial Intelligence Technique Used

The expert system is based on frame rules (representing the knowledge base) driving the system itself and on forms for input and output. Coded in WebFlex. The expert system was developed on the basis of rules defined by expert pulmonologists using specific symptomatic patterns, lung function and X-ray.

Providers’ involvement in

Developing : Not specified, Testing : Pulmonologists, Validating : General physicians, pulmonologists

Accuracy of the AI Intervention

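Pilot expert panel validation (60 cases): accuracy 94.7% when both experts and the system confirmed chronic obstructive lung disease, 86.3% when it was excluded. Clinical validation (241 cases): expert system accuracy 97.5%, with chronic obstructive lung disease confirmed in 53.6% of cases and excluded in 32% of cases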

Patient-related Outcomes Assessed

Not specified

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes : The expert system identified patients with chronic obstructive lung disease

Adoption

Yes (number of providers i.e. PHC participating) : Not specified

Implementation

The study was applied in a clinical setting and validated by expert pulmonologists

Maintenance

Key Conclusions

The expert system is a safe and robust supporting tool for chronic obstructive lung disease diagnosis in primary care settings

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	–

Paper 15

Paper Title: A machine learning based approach to identify protected health information in Chinese clinical text

Authors or developers	Du, L.; Xia, C.; Deng, Z.; Lu, G.; Xia, S.; Ma, J.
Year of Publication	2018
Full reference of the study	Du, L., Xia, C., Deng, Z., Lu, G., Xia, S., & Ma, J. (2018). A machine learning based approach to identify protected health information in Chinese clinical text. International journal of medical informatics, 116, 24-32.
Abstract	BACKGROUND: With the increasing application of electronic health records (EHRs) in the world, protecting private information in clinical text has drawn extensive attention from healthcare providers to researchers. De-identification, the process of identifying and removing protected health information (PHI) from clinical text, has been central to the discourse on medical privacy since 2006. While de-identification is becoming the global norm for handling medical records, there is a paucity of studies on its application on Chinese clinical text. Without efficient and effective privacy protection algorithms in place, the use of indispensable clinical information would be confined. OBJECTIVES: We aimed to (i) describe the current process for PHI in China, (ii) propose a machine learning based approach to identify PHI in Chinese clinical text, and (iii) validate the effectiveness of the machine learning algorithm for de-identification in Chinese clinical text. METHODS: Based on 14,719 discharge summaries from regional health centers in Ya’an City, Sichuan province, China, we built a conditional random fields (CRF) model to identify PHI in clinical text, and then used the regular expressions to optimize the recognition results of the PHI categories with fewer samples. RESULTS: We constructed a Chinese clinical text corpus with PHI tags through substantial manual annotation, wherein the descriptive statistics of PHI manifested its wide range and diverse categories. The evaluation showed with a high F-measure of 0.9878 that our CRF-based model had a good performance for identifying PHI in Chinese clinical text. CONCLUSION: The rapid adoption of EHR in the health sector has created an urgent need for tools that can parse patient specific information from Chinese clinical text. Our application of CRF algorithms for de-identification has shown the potential to meet this need by offering a highly accurate and flexible solution to analyzing Chinese clinical text.
Country of Research	China
Design of Study	Unclear : Not described but a cohort study in reviewer’s opinion
Duration of Study	Not specified
Name of Condition	Not applicable
Artificial Intelligence Technique Used	Conditional random fields. The authors built a Chinese clinical text corpus with protected health information (PHI) tags via substantial manual annotation and also ran the descriptive statistics of protected health information in the corpus from level-specific (Tertiary, Secondary, and Primary) medical institutions. Jieba tool: a tool that splits Chinese text into a sequence of words according to the predefined word segmentation rules. Medical dictionaries used: ICD-10 (the International Classification of Disease, 10th Revision) and surgery operation concepts in ICD-9-CM-3 (the International Classification of Diseases, 9th Revision, Clinical Modification, 3rd Revision). Lexical features: Lexical features contain the current token and its part of speech (POS) tag, four previous and four next tokens and their corresponding POS tags, which are always a helpful indicator for identifying the boundaries of PHI. For example, prepositions such as “at” and “to” often appear in front of LOCATION and INSTITUTION entities. Dictionary features: We extracted the PHI entities of PROVINCE, CITY, COUNTY, and INSTITUTION in the training data and affixed them with webpages of province, city, county, and medical institution to generate the dictionaries, wherein all of the elements, rather than the phrases, are taken as tokens. The full names and abbreviations of 34 provinces in China, 21 cities, 181 counties in Sichuan and 212 medical institutions in Ya’an City, Sichuan were incorporated in the PROVINCE, CITY, COUNTY and INSTITUTION dictionaries, respectively, to facilitate further classification of PHI entities.
Providers’ involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	Precision [Conditional random field + Lexicon: 98.5%, Conditional random field + Lexicon + dictionary: 99.27%, Conditional random field + rules: 99.27%], Recall [Conditional random field + Lexicon: 96.96%, Conditional random field + Lexicon + dictionary: 98.19%, Conditional random field + rules: 98.29%], F-measure [Conditional random field + Lexicon: 97.73%, Conditional random field + Lexicon + dictionary: 98.73%, Conditional random field + rules: 98.78%]
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 180 primary care institutions participated in Ya’an city
Implementation	Data from 14,719 discharge summaries from the regional health information platform (180 primary care institutions, 30 secondary care institutions, 2 tertiary care institutions) in Ya’an City, Sichuan
Maintenance	Not specified
Key Conclusions	The rapid adoption of electronic health record in the health sector has created an urgent need for tools that can parse patient specific information from Chinese clinical text. Our application of conditional random fields algorithms for de-identification has shown the potential to meet this need by offering a highly accurate and flexible solution to analyzing Chinese clinical text.
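
The F-measure values listed for Paper 15 can be cross-checked against the listed precision and recall. The sketch below assumes the balanced F-measure (harmonic mean of precision and recall); the inventory does not state which weighting the authors used, and recomputing from rounded inputs can differ from the reported figure in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) as listed in the accuracy row.
reported = {
    "CRF + lexicon": (98.5, 96.96, 97.73),
    "CRF + lexicon + dictionary": (99.27, 98.19, 98.73),
    "CRF + rules": (99.27, 98.29, 98.78),
}

for name, (p, r, f_reported) in reported.items():
    print(f"{name}: computed {f_measure(p, r):.2f}, reported {f_reported}")
```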

Paper 16

Paper Title: Automatic address validation and health record review to identify homeless Social Security disability applicants

Authors or developers	Erickson, J.; Abbott, K.; Susienka, L.
Year of Publication	2018
Full reference of the study	Jennifer Erickson, Kenneth Abbott, Lucinda Susienka, Automatic address validation and health record review to identify homeless Social Security disability applicants, Journal of Biomedical Informatics, Volume 82, 2018, Pages 41-46, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2018.04.012.
Abstract	OBJECTIVE: Homeless patients face a variety of obstacles in pursuit of basic social services. Acknowledging this, the Social Security Administration directs employees to prioritize homeless patients and handle their disability claims with special care. However, under existing manual processes for identification of homelessness, many homeless patients never receive the special service to which they are entitled. In this paper, we explore address validation and automatic annotation of electronic health records to improve identification of homeless patients. MATERIALS AND METHODS: We developed a sample of claims containing medical records at the moment of arrival in a single office. Using address validation software, we reconciled patient addresses with public directories of homeless shelters, veterans’ hospitals and clinics, and correctional facilities. Other tools annotated electronic health records. We trained random forests to identify homeless patients and validated each model with 10-fold cross validation. RESULTS: For our finished model, the area under the receiver operating characteristic curve was 0.942. The random forest improved sensitivity from 0.067 to 0.879 but decreased positive predictive value to 0.382. DISCUSSION: Presumed false positive classifications bore many characteristics of homelessness. Organizations could use these methods to prompt early collection of information necessary to avoid labor-intensive attempts to reestablish contact with homeless individuals. Annually, such methods could benefit tens of thousands of patients who are homeless, destitute, and in urgent need of assistance. CONCLUSION: We were able to identify many more homeless patients through a combination of automatic address validation and natural language processing of unstructured electronic health records.
Country of Research	USA
Design of Study	Qualitative study
Duration of Study	1 year
Name of Condition	Homeless Social Security disability
Artificial Intelligence Technique Used	Natural language processing
Providers’ involvement in	Developing : na,Testing : na,Validating : na
Accuracy of the AI Intervention	AUC: 0.942; random forest improved sensitivity from 0.067 to 0.879; positive predictive value decreased to 0.382
Patient-related Outcomes Assessed	A combination of automatic address validation and health record review can help identify homeless disability applicants
Primary Healthcare Worker Related Outcomes Assessed	na
Healthcare System-related Outcomes Assessed	na
Reached Target Population?	Yes : Social service agencies may benefit from implementation of similar methods.
Adoption	Yes (number of providers i.e. PHC participating) : Minnesota Disability Determination Services
Implementation	not specified
Maintenance	Not specified
Key Conclusions	The authors identified many more homeless patients than existing manual processes did, through a combination of automatic address validation and natural language processing of unstructured electronic health records.
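
The sensitivity and positive predictive value reported for this paper follow from a confusion matrix. The sketch below shows how the two metrics are defined; the class name, function names, and counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing model flags with reference labels."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def sensitivity(c: ConfusionCounts) -> float:
    # Share of truly homeless applicants that the model flags.
    return c.true_pos / (c.true_pos + c.false_neg)


def positive_predictive_value(c: ConfusionCounts) -> float:
    # Share of flagged applicants that are truly homeless.
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical counts for illustration only (NOT from the paper).
example = ConfusionCounts(true_pos=80, false_pos=120, true_neg=780, false_neg=20)
print(sensitivity(example))                # 0.8
print(positive_predictive_value(example))  # 0.4
```

With these illustrative counts, a model can reach high sensitivity while its positive predictive value stays modest, which is the same trade-off the record above describes.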

Paper 17

Paper Title: Quantifying the incidence and burden of herpes zoster in New Zealand general practice: a retrospective cohort study using a natural language processing software inference algorithm

Authors or developers	Turner, N. M. MacRae, J. Nowlan, M. L. McBain, L. Stubbe, M. H. Dowell, A.
Year of Publication	2018
Full reference of the study	Turner NM, MacRae J, Nowlan ML, et al. Quantifying the incidence and burden of herpes zoster in New Zealand general practice: a retrospective cohort study using a natural language processing software inference algorithm. BMJ Open 2018;8:e021241. doi:10.1136/bmjopen-2017-021241
Abstract	OBJECTIVE: To investigate the incidence of primary care presentations for herpes zoster (zoster) in a representative New Zealand population and to evaluate the utilisation of primary healthcare services following zoster diagnosis. DESIGN: A cross-sectional retrospective cohort study used a natural language processing software inference algorithm to identify general practice consultations for zoster by interrogating 22 million electronic medical record (EMR) transactions routinely recorded from January 2005 to December 2015. Data linking enabled analysis of the demographics of each case. The frequency of doctor visits was assessed prior to and after the first consultation diagnosing zoster to determine health service utilisation. SETTING: General practice, using EMRs from two primary health organisations located in the lower North Island, New Zealand. PARTICIPANTS: Thirty-nine general practices consented interrogation of their EMRs to access deidentified records for all enrolled patients. Out-of-hours and practice nurse consultations were excluded. MAIN OUTCOME MEASURES: The incidence of first and repeated zoster-related visits to the doctor across all age groups and associated patient demographics. To determine whether zoster affects workload in general practice. RESULTS: Overall, for 6,189,019 doctor consultations, the incidence of zoster was 48.6 per 10000 patient-years (95%CI 47.6 to 49.6). Incidence increased from the age of 50 years to a peak rate of 128 per 10000 in the age group of 80-90 years and was significantly higher in females than males (p<0.001). Over this 11-year period, incidence increased gradually, notably in those aged 80-85 years. Only 19% of patients had one or more follow-up zoster consultations within 12 months of a zoster index consultation. The frequency of consultations, for any reason, did not change between periods before and after the diagnosis. 
CONCLUSIONS: Zoster consultations in general practice are rare, and the burden of these cases on overall general practice caseload is low.
Country of Research	New Zealand
Design of Study	Cohort study, Unclear (Cross-sectional)
Duration of Study	11 years (January 2005 to December 2015)
Name of Condition	Herpes zoster
Artificial Intelligence Technique Used	Natural language processing software inference algorithm to identify herpes zoster (zoster) presentation rates and service utilisation using primary care electronic medical records over an 11-year period
Provider’s involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	The natural language algorithm had a positive predictive value of 0.82 (95% CI 0.72 to 0.92), specificity of 0.9998 (95% CI 0.9997 to 0.9999) and sensitivity of 0.84 (95% CI 0.74 to 0.92). This was more accurate than using keywords only (positive predictive value:0.66, specificity 0.9994 and sensitivity 1.0) or using a single clinical expert (positive predictive value:0.53, specificity 0.9991 and sensitivity 0.93). Incidence increased from the age of 50 years to a peak rate of 128 per 10,000 in the age group of 80-90 years and was significantly higher in females than males (p<0.001).
Patient-related Outcomes Assessed	Despite a low frequency of zoster cases, the large data set enabled analysis of rates of zoster incidence by age bands and different demographics across the whole time period. The algorithm was designed in the study to maximise specificity and accuracy, thereby generating a conservative estimate of the burden of zoster presentations in primary care by keeping false positives to a minimum.
Primary Healthcare Worker Related Outcomes Assessed	The overall age-adjusted apparent rate of zoster index consultations was 42.7 per 10,000 person-years observed (95% CI 41.9 to 43.5), with an estimated true rate of 48.6 (95% CI 47.6 to 49.6). There were 10,316 index consultations for zoster and 3,060 zoster-related follow-up consultations. The apparent rate for zoster index consultations was 16.7 per 10,000 doctor consultations (95% CI 16.3 to 17.0), with an estimated true rate of 17.5 (95% CI 17.1 to 17.9). This was equivalent to one in 571 doctor consultations. The rate of consultations was much higher in older age groups, with the highest rate in the age group of 80-90 years at 128 consultations per 10,000 person-years.
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 39 general practices,Not specified
Implementation	Not specified
Maintenance	No
Key Conclusions	The study concludes that the natural language processing algorithm was more accurate than using keywords only or a single clinical expert. Moreover, the study finds that zoster consultations in general practice are rare, and that the burden of these cases on overall general practice caseload is low.
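
The per-consultation figures in this record can be checked with simple arithmetic. The sketch below assumes that the denominator of the apparent rate is the 6,189,019 doctor consultations reported in the abstract; the function names are illustrative.

```python
def rate_per_10000(events: int, denominator: float) -> float:
    """Events per 10,000 units of the denominator."""
    return 10_000 * events / denominator


def one_in_n(rate_per_10k: float) -> float:
    """Convert a rate per 10,000 into a 'one in N' figure."""
    return 10_000 / rate_per_10k


index_consultations = 10_316       # zoster index consultations (from the record)
doctor_consultations = 6_189_019   # assumed denominator (from the abstract)

apparent = rate_per_10000(index_consultations, doctor_consultations)
print(round(apparent, 1))      # 16.7 per 10,000 doctor consultations
print(round(one_in_n(17.5)))   # 571, from the estimated true rate of 17.5
```

The apparent rate reproduces the reported 16.7 per 10,000, and the estimated true rate of 17.5 per 10,000 corresponds to the reported one in 571 consultations.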

Paper 18

Paper Title: Methods for estimating kidney disease stage transition probabilities using electronic medical records

Authors or developers

Luo, L.
Small, D.
Stewart, W. F.
Roy, J. A.

Year of Publication

2013

Full reference of the study

Luo, Lola, et al. “Methods for estimating kidney disease stage transition probabilities using electronic medical records.” eGEMs 1.3 (2013).

Abstract

Chronic diseases are often described by stages of severity. Clinical decisions about what to do are influenced by the stage, whether a patient is progressing, and the rate of progression. For chronic kidney disease (CKD), relatively little is known about the transition rates between stages. To address this, we used electronic health records (EHR) data on a large primary care population, which should have the advantage of having both sufficient follow-up time and sample size to reliably estimate transition rates for CKD. However, EHR data have some features that threaten the validity of any analysis. In particular, the timing and frequency of laboratory values and clinical measurements are not determined a priori by research investigators, but rather, depend on many factors, including the current health of the patient. We developed an approach for estimating CKD stage transition rates using hidden Markov models (HMMs), when the level of information and observation time vary among individuals. To estimate the HMMs in a computationally manageable way, we used a “discretization” method to transform daily data into intervals of 30 days, 90 days, or 180 days. We assessed the accuracy and computation time of this method via simulation studies. We also used simulations to study the effect of informative observation times on the estimated transition rates. Our simulation results showed good performance of the method, even when missing data were non-ignorable. We applied the methods to EHR data from over 60,000 primary care patients who have chronic kidney disease (stage 2 and above). We estimated transition rates between six underlying disease states. The results were similar for men and women.
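
The "discretization" step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the last observation within an interval is kept and that intervals without any observation are marked as missing, neither of which is stated in the abstract.

```python
from typing import Dict, List, Optional


def discretize(observations: Dict[int, int],
               interval_days: int,
               total_days: int) -> List[Optional[int]]:
    """Collapse day-indexed stage observations into fixed-width intervals.

    observations maps a 0-based day number to the observed CKD stage.
    Returns one entry per interval; None marks an interval with no
    observation (assumption: the last observation in an interval wins).
    """
    n_intervals = -(-total_days // interval_days)  # ceiling division
    result: List[Optional[int]] = [None] * n_intervals
    for day in sorted(observations):
        if 0 <= day < total_days:
            # Later days overwrite earlier ones within the same interval.
            result[day // interval_days] = observations[day]
    return result


# Hypothetical patient: stage observed on four irregular days.
obs = {3: 2, 45: 2, 50: 3, 170: 4}
print(discretize(obs, 30, 180))  # [2, 3, None, None, None, 4]
print(discretize(obs, 90, 180))  # [3, 4]
```

Wider intervals yield shorter sequences with fewer missing entries, at the cost of discarding within-interval changes.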

Country of Research

USA

Design of Study

Unclear

Duration of Study

6 years, 6 months (July 30th, 2003 to Dec. 31st, 2009)

Name of Condition

Kidney Disease Stage Transition

Artificial Intelligence Technique Used

Hidden Markov models; missing data mechanism (missing at random) generated as an indicator variable for missing data.

Provider’s involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Not specified, For the missing data mechanism the absolute bias is 0.007 in the 30-day interval, 0.013 in the 90-day interval, and 0.023 in the 180-day interval.

Patient-related Outcomes Assessed

Data from over 60,000 primary care patients with chronic kidney disease (stage 2 and above) were analysed. Transition rates between six underlying disease states were estimated. The results were similar for men and women.

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Not specified

Maintenance

Not specified (Unclear)

Key Conclusions

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	+	+

Color Code

Low

Unclear

High

Paper 19

Paper Title: Enabling Stroke Rehabilitation in Home and Community Settings: A Wearable Sensor-Based Approach for Upper-Limb Motor Training

Authors or developers	Lee, S. I. Adans-Dester, C. P. Grimaldi, M. Dowling, A. V. Horak, P. C. Black-Schaffer, R. M. Bonato, P. Gwin, J. T.
Year of Publication	2018
Full reference of the study	Enabling Stroke Rehabilitation in Home and Community Settings: A Wearable Sensor-Based Approach for Upper-Limb Motor Training
Abstract	High-dosage motor practice can significantly contribute to achieving functional recovery after a stroke. Performing rehabilitation exercises at home and using, or attempting to use, the stroke-affected upper limb during Activities of Daily Living (ADL) are effective ways to achieve high-dosage motor practice in stroke survivors. This paper presents a novel technological approach that enables 1) detecting goal-directed upper limb movements during the performance of ADL, so that timely feedback can be provided to encourage the use of the affected limb, and 2) assessing the quality of motor performance during in-home rehabilitation exercises so that appropriate feedback can be generated to promote high-quality exercise. The results herein presented show that it is possible to detect 1) goal-directed movements during the performance of ADL with a c-statistic of 87.0% and 2) poorly performed movements in selected rehabilitation exercises with an F-score of 84.3%, thus enabling the generation of appropriate feedback. In a survey to gather preliminary data concerning the clinical adequacy of the proposed approach, 91.7% of occupational therapists demonstrated willingness to use it in their practice, and 88.2% of stroke survivors indicated that they would use it if recommended by their therapist.
Country of Research	USA
Design of Study	Unclear
Duration of Study	Not specified
Name of Condition	Stroke
Artificial Intelligence Technique Used	Logistic regression classification model
Providers’ involvement in	Developing : Not specified,Testing : Occupational therapist,Validating : Occupational therapist
Accuracy of the AI Intervention	87% AUC, True positive rate: 79%, True negative rate: 78%.
Patient-related Outcomes Assessed	88.2% of stroke survivors indicated that they would use it
Primary Healthcare Worker Related Outcomes Assessed	91.7% of occupational therapists demonstrated willingness to use it
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Occupational therapists
Implementation	Not specified
Maintenance	Not specified
Key Conclusions	The authors presented a novel technological approach that uses two wearable sensors to detect goal-directed movements during activities of daily living and to determine appropriate feedback during in-home rehabilitation exercises.

Paper 20

Paper Title: External validation of ADO, DOSE, COTE and CODEX at predicting death in primary care patients with COPD using standard and machine learning approaches

Authors or developers

Morales, D. R.
Flynn, R.
Zhang, J.
Trucco, E.
Quint, J. K.
Zutis, K.

Year of Publication

2018

Full reference of the study

Morales, D., Flynn, R., Zhang, J., Trucco, E., & Quint, J. K. (2018). External validation of ADO, DOSE, COTE and CODEX at predicting death in primary care patients with COPD using standard and machine learning approaches. Respiratory Medicine, 138, 150-155. https://doi.org/10.1016/j.rmed.2018.04.003

Abstract

BACKGROUND: Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. METHODS: We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender and discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years measured using logistic regression and a support vector machine learning (SVM) method of analysis. RESULTS: The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality. ADO performed the best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO + COTE; c-statistic 0.727 DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. CONCLUSION: In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities suggesting better models may be generated with additional data facilitated using novel approaches.
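
The c-statistic used to compare the indices is the probability that a randomly chosen patient who died received a higher risk score than a randomly chosen patient who survived. The sketch below computes it by direct pairwise comparison, counting ties as one half; the scores and outcomes are hypothetical, and the function is an illustration rather than the study's code.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], died: Sequence[bool]) -> float:
    """Concordance between risk scores and a binary outcome.

    Compares every (died, survived) pair; a pair is concordant when the
    patient who died has the higher score. Ties contribute 0.5.
    """
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical index scores and 1-year outcomes (NOT from the paper).
scores = [7, 5, 5, 3, 2, 1]
died = [True, True, False, False, True, False]
print(round(c_statistic(scores, died), 3))  # 0.722
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which gives a scale for reading the reported values of roughly 0.65 to 0.78.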

Country of Research

United Kingdom

Design of Study

Cohort study

Duration of Study

14 years, 01.01.2000 to 01.04.2014

Name of Condition

Chronic obstructive pulmonary disease

Artificial Intelligence Technique Used

Logistic regression and a support vector machine learning (SVM) method

Providers’ involvement in

Developing : Not specified,Testing : Not specified,Validating : Not specified

Accuracy of the AI Intervention

ADO index score performed the best, with a 1-year c-statistic of 0.723; DOSE index score: 1-year c-statistic of 0.654; COTE index score: 0.650; CODEX index score: 0.651. Class weighting was used to improve prediction accuracy, and 95% confidence intervals were generated through bootstrapping.

Patient-related Outcomes Assessed

The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality.

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Our study has shown that predictive performance can be improved by incorporating more clinical information, in this example using COTE comorbidities. However, incorporating larger amounts of data may be infeasible to interpret through human factors alone. Using a limited feature set, SVM performs as well as standard logistic regression, helping to validate this approach, which could now be applied to a large data set that includes far more clinical data (such as blood results, prescriptions, pattern of health care access and social care data).

Maintenance

Key Conclusions

Complete data were available for 54879 patients to predict 1-year mortality. ADO performed the best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO+COTE; c-statistic 0.727 DOSE+COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using the support vector machine learning (SVM) method of analysis. In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities, suggesting better models may be generated with additional data facilitated using novel approaches.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	–	+

Color Code

Low

Unclear

High

Paper 21

Paper Title: Automatic infection detection based on electronic medical records

Authors or developers

Tou, H.
Yao, L.
Wei, Z.
Zhuang, X.
Zhang, B.

Year of Publication

2018

Full reference of the study

Tou, Huaixiao, et al. “Automatic infection detection based on electronic medical records.” BMC bioinformatics 19.5 (2018): 117.

Abstract

BACKGROUND: Making an accurate patient care decision, as early as possible, is a constant challenge, especially for physicians in the emergency department. The increasing volumes of electronic medical records (EMRs) open new horizons for automatic diagnosis. In this paper, we propose to use machine learning approaches for automatic infection detection based on EMRs. Five categories of information are utilized for prediction, including personal information, admission note, vital signs, diagnostic test results and medical image diagnostic. RESULTS: Experimental results on a newly constructed EMRs dataset from emergency department show that machine learning models can achieve a decent performance for infection detection with area under the receiver operator characteristic curve (AUC) of 0.88. Out of all the five types of information, admission note in text form makes the most contribution with the AUC of 0.87. CONCLUSIONS: This study provides a state-of-the-art EMRs processing system to automatically make medical decisions. It extracts five types of features associated with infection and achieves a decent performance on automatic infection detection based on machine learning models.

Country of Research

China

Design of Study

Cohort study

Duration of Study

4 years, 2012-2016

Name of Condition

Infection, Abscess, Necrosis, Gangrene, Pyogenic, Sepsis, Erysipelatous, Pneumonia, Pyothorax, Mastitis, Perforation, Peritonitis, Acute cholecystitis, Gangrenous cholecystitis, Acute attacking of chronic cholecystitis, Acute cholangitis, Acute suppurative cholangitis, Acute gangrenous cholangitis, Biliary pancreatitis, Acute appendicitis, Acute suppurated appendicitis, Acute gangrened appendicitis, Acute purulent gangrenous appendicitis, Acute phlegmonous appendicitis, Systemic inflammatory response syndrome, Sepsis, Septic shock, Acute attacking of chronic appendicitis

Artificial Intelligence Technique Used

Machine learning: Random forest, logistic regression CV, Bernoulli NB, Gradient boosting classifier

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Random forest: 0.84, logistic regression CV: 0.87, Bernoulli NB: 0.68, Gradient boosting classifier: 0.88

Patient-related Outcomes Assessed

Not specified

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Not implemented; studied on a cohort of electronic medical records.

Maintenance

Key Conclusions

The study demonstrates a state-of-the-art electronic medical records processing system that automatically makes medical decisions. The single-factor correlation analysis shows that the processing system is able to identify indicative factors for the detection of infection. The study also analyzes the effectiveness of different types of features for infection detection and reveals the effectiveness of text-based features. The system using all features achieves the best performance, with an AUC of 0.88.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	+

Color Code

Low

Unclear

High

Paper 22

Paper Title: Detecting Motor Impairment in Early Parkinson's Disease via Natural Typing Interaction With Keyboards: Validation of the neuroQWERTY Approach in an Uncontrolled At-Home Setting

Authors or developers

Arroyo-Gallego, T.
Ledesma-Carbayo, M. J.
Butterworth, I.
Matarazzo, M.
Montero-Escribano, P.
Puertas-Martin, V.
Gray, M. L.
Giancardo, L.
Sanchez-Ferro, A.

Year of Publication

2018

Full reference of the study

Arroyo-Gallego, Teresa, et al. “Detecting motor impairment in early Parkinson’s disease via natural typing interaction with keyboards: validation of the neuroQWERTY approach in an uncontrolled at-home setting.” Journal of medical Internet research 20.3 (2018): e89.

Abstract

BACKGROUND: Parkinson’s disease (PD) is the second most prevalent neurodegenerative disease and one of the most common forms of movement disorder. Although there is no known cure for PD, existing therapies can provide effective symptomatic relief. However, optimal titration is crucial to avoid adverse effects. Today, decision making for PD management is challenging because it relies on subjective clinical evaluations that require a visit to the clinic. This challenge has motivated recent research initiatives to develop tools that can be used by nonspecialists to assess psychomotor impairment. Among these emerging solutions, we recently reported the neuroQWERTY index, a new digital marker able to detect motor impairment in an early PD cohort through the analysis of the key press and release timing data collected during a controlled in-clinic typing task. OBJECTIVE: The aim of this study was to extend the in-clinic implementation to an at-home implementation by validating the applicability of the neuroQWERTY approach in an uncontrolled at-home setting, using the typing data from subjects’ natural interaction with their laptop to enable remote and unobtrusive assessment of PD signs. METHODS: We implemented the data-collection platform and software to enable access and storage of the typing data generated by users while using their computer at home. We recruited a total of 60 participants; of these participants 52 (25 people with Parkinson’s and 27 healthy controls) provided enough data to complete the analysis. Finally, to evaluate whether our in-clinic-built algorithm could be used in an uncontrolled at-home setting, we compared its performance on the data collected during the controlled typing task in the clinic and the results of our method using the data passively collected at home. 
RESULTS: Despite the randomness and sparsity introduced by the uncontrolled setting, our algorithm performed nearly as well in the at-home data (area under the receiver operating characteristic curve [AUC] of 0.76 and sensitivity/specificity of 0.73/0.69) as it did when used to evaluate the in-clinic data (AUC 0.83 and sensitivity/specificity of 0.77/0.72). Moreover, the keystroke metrics presented a strong correlation between the two typing settings, which suggests a minimal influence of the in-clinic typing task in users’ normal typing. CONCLUSIONS: The finding that an algorithm trained on data from an in-clinic setting has comparable performance with that tested on data collected through naturalistic at-home computer use reinforces the hypothesis that subtle differences in motor function can be detected from typing behavior. This work represents another step toward an objective, user-convenient, and quasi-continuous monitoring tool for PD.

Country of Research

Spain

Design of Study

Cohort study, Unclear : Longitudinal study

Duration of Study

6 months

Name of Condition

Parkinson Disease

Artificial Intelligence Technique Used

neuroQWERTY, nQi is the output of a computational algorithm that uses the information contained in the sequences of hold times

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

AUC at home: 0.76 [0.66-0.88]; AUC in clinic: 0.83 [0.74-0.92]

Patient-related Outcomes Assessed

neuroQWERTY index (nQi) performance comparison. Parkinson: mean (SD): clinic 0.092 (0.058), home 0.09 (0.048). Healthy: mean (SD): clinic 0.092 (0.058), home 0.09 (0.048).

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Not specified

Maintenance

Key Conclusions

This study validated the neuroQWERTY algorithm in an uncontrolled at-home setting and found its performance to be comparable with that obtained on the in-clinic data.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	–

Color Code

Low

Unclear

High

Paper 23

Paper Title: Home Health Care: Nurse-Physician Communication, Patient Severity, and Hospital Readmission

Authors or developers	Pesko, M. F. Gerber, L. M. Peng, T. R. Press, M. J.
Year of Publication	2018
Full reference of the study	Pesko, Michael F., et al. “Home health care: nurse-physician communication, patient severity, and hospital readmission.” Health services research 53.2 (2018): 1008-1024.
Abstract	OBJECTIVE: To evaluate whether communication failures between home health care nurses and physicians during an episode of home care after hospital discharge are associated with hospital readmission, stratified by patients at high and low risk of readmission. DATA SOURCE/STUDY SETTING: We linked Visiting Nurse Services of New York electronic medical records for patients with congestive heart failure in 2008 and 2009 to hospitalization claims data for Medicare fee-for-service beneficiaries. STUDY DESIGN: Linear regression models and a propensity score matching approach were used to assess the relationship between communication failure and 30-day readmission, separately for patients with high-risk and low-risk readmission probabilities. DATA COLLECTION/EXTRACTION METHODS: Natural language processing was applied to free-text data in electronic medical records to identify failures in communication between home health nurses and physicians. PRINCIPAL FINDINGS: Communication failure was associated with a statistically significant 9.7 percentage point increase in the probability of a patient readmission (32.6 percent of the mean) among high-risk patients. CONCLUSIONS: Poor communication between home health nurses and physicians is associated with an increased risk of hospital readmission among high-risk patients. Efforts to reduce readmissions among this population should consider focusing attention on this factor.
Country of Research	USA
Design of Study	Unclear
Duration of Study	1 year, 2008-2009
Name of Condition	Congestive heart failure
Artificial Intelligence Technique Used	Natural language processing, Linear regression models and a propensity score matching approach
Providers’ involvement in	Developing : Not specified,Testing : Not specified,Validating : Not specified
Accuracy of the AI Intervention	Sensitivity analysis: Analysis 1: 7.9% increased likelihood of readmission due to communication failure, Analysis 2: 6% increased likelihood of readmission due to communication failure
Patient-related Outcomes Assessed	Patient readmission rate
Primary Healthcare Worker Related Outcomes Assessed	Physician-nurse communication
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 778 nurses, Not specified : Nurse years of experience: one communication = 7.11 years (SD = 6.89), all communication = 7.08 years (SD = 7.22). One communication (Physician: 0.59, Non-surgeon subspecialist: 0.35, Surgeon: 0.06). All communication (Physician: 0.51, Non-surgeon subspecialist: 0.41, Surgeon: 0.07)
Implementation	Not implemented, tested on electronic medical records
Maintenance	Not specified
Key Conclusions	The study applies a natural language processing algorithm to home health care electronic medical records, linked with hospitalization claims data, to examine how failures in communication between nurses and physicians during an episode of home health care influence patients’ probability of 30-day readmission.

Paper 24

Paper Title: Examining Healthcare Utilization Patterns of Elderly and Middle-Aged Adults in the United States

Authors or developers	Zayas, C. E. He, Z. Yuan, J. Maldonado-Molina, M. Hogan, W. Modave, F. Guo, Y. Bian, J.
Year of Publication	2016
Full reference of the study	Zayas, Cilia E., et al. “Examining Healthcare Utilization Patterns of Elderly and Middle-Aged Adults in the United States.” The Twenty-Ninth International Flairs Conference. 2016.
Abstract	Elderly patients, aged 65 or older, make up 13.5% of the U.S. population, but represent 45.2% of the top 10% of healthcare utilizers, in terms of expenditures. Middle-aged Americans, aged 45 to 64 make up another 37.0% of that category. Given the high demand for healthcare services by the aforementioned population, it is important to identify high-cost users of healthcare systems and, more importantly, ineffective utilization patterns to highlight where targeted interventions could be placed to improve care delivery. In this work, we present a novel multi-level framework applying machine learning (ML) methods (i.e., random forest regression and hierarchical clustering) to group patients with similar utilization profiles into clusters. We use a vector space model to characterize a patient’s utilization profile as the number of visits to different care providers and prescribed medications. We applied the proposed methods using the 2013 Medical Expenditures Panel Survey (MEPS) dataset. We identified clusters of healthcare utilization patterns of elderly and middle-aged adults in the United States, and assessed the general and clinical characteristics associated with these utilization patterns. Our results demonstrate the effectiveness of the proposed framework to model healthcare utilization patterns. Understanding of these patterns can be used to guide healthcare policy-making and practice.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	1 year, 2013
Name of Condition	Diabetes, cancer, coronary heart disease, angina, heart attack, other heart disease, stroke
Artificial Intelligence Technique Used	Random forest regression, hierarchical clustering
Providers’ involvement in	Developing : presented a novel multi-level framework applying machine learning methods (i.e., random forest regression and hierarchical clustering) to group patients with similar utilization profiles into clusters
Accuracy of the AI Intervention	Random forest regression model: R-squared: 0.46, NRMSE: 1.68. Silhouette scores for k = 20, 100, and 150: -0.592, 0.007, and 0.065, respectively
Patient-related Outcomes Assessed	Healthcare utilization patterns of elderly and middle-aged adults in the United States
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Patients with similar utilization profiles were grouped into clusters, which may help optimize the healthcare system
Reached Target Population?	Yes
Adoption	Not specified
Implementation	Not specified
Maintenance	No
Key Conclusions	This study presented a simple but novel vector space model of patients’ utilization profiles. The evaluations, using the 2013 MEPS dataset, demonstrate the usefulness of the proposed approaches in identifying meaningful utilization patterns of elderly and middle-aged adults in the United States.
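
The vector space model described for this paper can be illustrated with a minimal sketch. It assumes a small, hypothetical set of event categories (the actual MEPS features are not listed in this entry) and uses cosine similarity as one possible way to compare profiles; the study's own distance measure is not stated here.

```python
from collections import Counter
from math import sqrt

# Hypothetical event categories, for illustration only.
CATEGORIES = ["office_visit", "er_visit", "inpatient_stay", "prescription"]

def utilization_vector(events):
    """Count events per category, in the fixed order of CATEGORIES."""
    counts = Counter(events)
    return [counts[c] for c in CATEGORIES]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Two made-up patients described by their recorded events.
patient_a = utilization_vector(["office_visit", "office_visit", "prescription"])
patient_b = utilization_vector(["office_visit", "prescription", "prescription", "er_visit"])
print(patient_a)  # [2, 0, 0, 1]
print(patient_b)  # [1, 1, 0, 2]
print(round(cosine_similarity(patient_a, patient_b), 3))
```

Profiles built this way could then be passed to a clustering routine; the choice of similarity measure affects which patients end up grouped together.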

Paper 25

Paper Title: Data-based Decision Rules to Personalize Depression Follow-up

Authors or developers

Lin, Y.
Huang, S.
Simon, G. E.
Liu, S.

Year of Publication

2018

Full reference of the study

Lin, Y., Huang, S., Simon, G.E. et al. Data-based Decision Rules to Personalize Depression Follow-up. Sci Rep 8, 5064 (2018). https://doi.org/10.1038/s41598-018-23326-1

Abstract

Depression is a common mental illness with complex and heterogeneous progression dynamics. Risk grouping of depression treatment population based on their longitudinal patterns has the potential to enable cost-effective monitoring policy design. This paper establishes a rule-based method to identify a set of risk predictive patterns from person-level longitudinal disease measurements by integrating the data transformation, rule discovery and rule evaluation. We further extend the identified rules to create rule-based monitoring strategies to adaptively monitor individuals with different disease severities. We applied the rule-based method on an electronic health record (EHR) dataset of depression treatment population containing person-level longitudinal Patient Health Questionnaire (PHQ)-9 scores for assessing depression severity. Twelve risk predictive rules are identified, and the rule-based prognostic model based on identified rules enables more accurate prediction of disease severity than other prognostic models including RuleFit, logistic regression and Support Vector Machine. Two rule-based monitoring strategies outperform the latest PHQ-9 based monitoring strategy by providing higher sensitivity and specificity. The rule-based method can lead to a better understanding of disease dynamics, achieving more accurate prognostics of disease progressions, personalizing follow-up intervals, and designing cost-effective monitoring of patients in clinical practice.

Country of Research

USA

Design of Study

Unclear

Duration of Study

5 years, 2007-2012

Name of Condition

Depression

Artificial Intelligence Technique Used

Rule-based prognostic model, RuleFit, logistic regression, support vector machine

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Rule-based prognostic model: 0.83, RuleFit: 0.81, Logistic regression: 0.81, Support vector machine: 0.81

Patient-related Outcomes Assessed

The rule-based method can lead to a better understanding of disease dynamics, achieving more accurate prognostics of disease progressions, personalizing follow-up intervals, and designing cost-effective monitoring of patients in clinical practice

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Extended use of Electronic Health Record (EHR) provides an abundance of clinical measurements that may help to predict patients’ disease progressions. Leveraging this rich information can accelerate the transition from one-size-fits-all monitoring guidelines to personalized monitoring strategies

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Not specified

Implementation

Implemented rule-based method on an electronic health record (EHR) dataset of depression treatment population containing person-level longitudinal Patient Health Questionnaire (PHQ)-9 scores for assessing depression severity. Twelve risk predictive rules are identified, and the rule-based prognostic model based on identified rules enables more accurate prediction of disease severity than other prognostic models including RuleFit, logistic regression and Support Vector Machine

Maintenance

Not specified

Key Conclusions

The work identified 12 risk predictive rules from a depression treatment population that can segment individuals into risk subgroups based on their longitudinal patterns. It also developed and evaluated adaptive monitoring strategies based on these rules, and established a rule-based analytic framework that automatically leverages the sparse, irregular and time-varying measurements in electronic health record data to support monitoring strategy design by integrating data transformation, rule discovery and rule evaluation. More generally, the proposed method can lead to a better understanding of disease dynamics, more accurate prognostics of disease progressions, and efficient monitoring of a treatment population in clinical practice.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	?

Color Code

Low

Unclear

High

Paper 26

Paper Title: A risk score including body mass index, glycated haemoglobin and triglycerides predicts future glycaemic control in people with type 2 diabetes

Authors or developers

Hertroijs, D. F. L.
Elissen, A. M. J.
Brouwers, M. C. G. J.
Schaper, N. C.
Kohler, S.
Popa, M. C.
Asteriadis, S.
Hendriks, S. H.
Bilo, H. J.
Ruwaard, D.

Year of Publication

2018

Full reference of the study

Hertroijs DFL, Elissen AMJ, Brouwers MCGJ, et al. A risk score including body mass index, glycated haemoglobin and triglycerides predicts future glycaemic control in people with type 2 diabetes. Diabetes Obes Metab. 2018;20:681-688. https://doi.org/10.1111/dom.13148

Abstract

AIM: To identify, predict and validate distinct glycaemic trajectories among patients with newly diagnosed type 2 diabetes treated in primary care, as a first step towards more effective patient-centred care. METHODS: We conducted a retrospective study in two cohorts, using routinely collected individual patient data from primary care practices obtained from two large Dutch diabetes patient registries. Participants included adult patients newly diagnosed with type 2 diabetes between January 2006 and December 2014 (development cohort, n=10,528; validation cohort, n=3,777). Latent growth mixture modelling identified distinct glycaemic five-year trajectories. Machine learning models were built to predict the trajectories using easily obtainable patient characteristics in daily clinical practice. RESULTS: Three different glycaemic trajectories were identified: (1) stable, adequate glycaemic control (76.5% of patients); (2) improved glycaemic control (21.3% of patients); and (3) deteriorated glycaemic control (2.2% of patients). Similar trajectories could be discerned in the validation cohort. Body mass index and glycated haemoglobin and triglyceride levels were the most important predictors of trajectory membership. The predictive model, trained on the development cohort, had a receiver-operating characteristic area under the curve of 0.96 in the validation cohort, indicating excellent accuracy. CONCLUSIONS: The developed model can effectively explain heterogeneity in future glycaemic response of patients with type 2 diabetes. It can therefore be used in clinical practice as a quick and easy tool to provide tailored diabetes care.

Country of Research

Netherlands

Design of Study

Cohort study (retrospective)

Duration of Study

5 years, January 1, 2009 to December 31, 2014

Name of Condition

Type 2 diabetes

Artificial Intelligence Technique Used

Latent growth mixture modelling (LGMM), with model selection by the Akaike Information Criterion, the Bayesian Information Criterion and the Lo-Mendell-Rubin likelihood ratio test.

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

AUC: 0.96

Patient-related Outcomes Assessed

Not specified

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

The developed model can effectively explain heterogeneity in future glycaemic response of patients with type 2 diabetes. It can therefore be used in clinical practice as a quick and easy tool to provide tailored diabetes care.

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : 95 primary care practices in Maastricht

Implementation

Three different glycaemic trajectories were identified: (1) stable, adequate glycaemic control (76.5% of patients); (2) improved glycaemic control (21.3% of patients); and (3) deteriorated glycaemic control (2.2% of patients). Similar trajectories could be discerned in the validation cohort. Body mass index and glycated haemoglobin and triglyceride levels were the most important predictors of trajectory membership. The predictive model, trained on the development cohort, had a receiver-operating characteristic area under the curve of 0.96 in the validation cohort, indicating excellent accuracy

Maintenance

Key Conclusions

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	+

Color Code

Low

Unclear

High

Paper 27

Paper Title: Predictive modeling of colorectal cancer using a dedicated pre-processing pipeline on routine electronic medical records

Authors or developers

Kop, R.
Hoogendoorn, M.
Teije, A. T.
Buchner, F. L.
Slottje, P.
Moons, L. M.
Numans, M. E.

Year of Publication

2016

Full reference of the study

Kop, Reinier, et al. “Predictive modeling of colorectal cancer using a dedicated pre-processing pipeline on routine electronic medical records.” Computers in biology and medicine 76 (2016): 30-38.

Abstract

Over the past years, research utilizing routine care data extracted from Electronic Medical Records (EMRs) has increased tremendously. Yet there are no straightforward, standardized strategies for pre-processing these data. We propose a dedicated medical pre-processing pipeline aimed at taking on many problems and opportunities contained within EMR data, such as their temporal, inaccurate and incomplete nature. The pipeline is demonstrated on a dataset of routinely recorded data in general practice EMRs of over 260,000 patients, in which the occurrence of colorectal cancer (CRC) is predicted using various machine learning techniques (i.e., CART, LR, RF) and subsets of the data. CRC is a common type of cancer, of which early detection has proven to be important yet challenging. The results are threefold. First, the predictive models generated using our pipeline reconfirmed known predictors and identified new, medically plausible, predictors derived from the cardiovascular and metabolic disease domain, validating the pipeline’s effectiveness. Second, the difference between the best model generated by the data-driven subset (AUC 0.891) and the best model generated by the current state of the art hypothesis-driven subset (AUC 0.864) is statistically significant at the 95% confidence interval level. Third, the pipeline itself is highly generic and independent of the specific disease targeted and the EMR used. In conclusion, the application of established machine learning techniques in combination with the proposed pipeline on EMRs has great potential to enhance disease prediction, and hence early detection and intervention in medical practice.

Country of Research

Netherlands

Design of Study

Unclear

Duration of Study

4 years (2007-2011)

Name of Condition

Colorectal cancer

Artificial Intelligence Technique Used

CART, Logistic regression, random forest

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

CART: age & gender (AUC: 0.83, 95% CI: 0.81-0.84), Bristol-Birmingham equation (AUC: 0.85, 95% CI: 0.83-0.86); Logistic regression: age & gender (AUC: 0.83, 95% CI: 0.82-0.85), Bristol-Birmingham equation (AUC: 0.86, 95% CI: 0.85-0.87); Random forest: age & gender (AUC: 0.83, 95% CI: 0.82-0.84), Bristol-Birmingham equation (AUC: 0.88, 95% CI: 0.87-0.89)

Patient-related Outcomes Assessed

Application of established machine learning techniques in combination with the proposed pipeline on EMRs has great potential to enhance disease prediction, and hence early detection and intervention in medical practice

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Implementation

Results demonstrated on a dataset of routinely recorded data in general practice EMRs of over 260,000 patients, in which the occurrence of colorectal cancer (CRC) is predicted using various machine learning techniques (i.e., CART, LR, RF) and subsets of the data. CRC is a common type of cancer, of which early detection has proven to be important yet challenging.

Maintenance

Key Conclusions

The authors describe a pre-processing pipeline used with three machine learning approaches (random forest, CART and logistic regression) and report great potential for enhancing the prediction and detection of colorectal cancer. Main results: 1. The predictive models generated using the pipeline reconfirmed known predictors and identified new, medically plausible predictors derived from the cardiovascular and metabolic disease domain, validating the pipeline’s effectiveness. 2. The difference between the best model generated by the data-driven subset (AUC 0.891) and the best model generated by the current state of the art hypothesis-driven subset (AUC 0.864) is statistically significant at the 95% confidence interval level. 3. The pipeline itself is highly generic and independent of the specific disease targeted and the EMR used. In conclusion, the application of established machine learning techniques in combination with the proposed pipeline on EMRs has great potential to enhance disease prediction, and hence early detection and intervention in medical practice.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	+	+

Color Code

Low

Unclear

High
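
Several entries in this inventory, including Paper 27, report accuracy as an area under the ROC curve (AUC). As a reading aid, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. The labels and scores are made up for illustration and are not taken from any of the studies.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Runs in O(n_pos * n_neg), which is fine for
    small illustrative inputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels (1 = case, 0 = non-case) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for comparing values such as 0.864 and 0.891.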

Paper 28

Paper Title: Natural language processing improves identification of colorectal cancer testing in the electronic medical record

Authors or developers	Denny, J. C. Choma, N. N. Peterson, J. F. Miller, R. A. Bastarache, L. Li, M. Peterson, N. B.
Year of Publication	2012
Full reference of the study	Denny, J. C., Choma, N. N., Peterson, J. F., Miller, R. A., Bastarache, L., Li, M., & Peterson, N. B. (2012). Natural Language Processing Improves Identification of Colorectal Cancer Testing in the Electronic Medical Record. Medical Decision Making, 32(1), 188-197. https://doi.org/10.1177/0272989X11400418
Abstract	BACKGROUND: Difficulty identifying patients in need of colorectal cancer (CRC) screening contributes to low screening rates. OBJECTIVE: To use Electronic Health Record (EHR) data to identify patients with prior CRC testing. DESIGN: A clinical natural language processing (NLP) system was modified to identify four CRC tests (colonoscopy, flexible sigmoidoscopy, fecal occult blood testing, and double contrast barium enema) within electronic clinical documentation. Text phrases in clinical notes referencing CRC tests were interpreted by the system to determine whether testing was planned or completed and to estimate the date of completed tests. SETTING: Large academic medical center. PATIENTS: 200 patients >= 50 years old who had completed >= 2 non-acute primary care visits within a 1-year period. MEASURES: Recall and precision of the NLP system, billing records, and human chart review were compared to a reference standard of human review of all available information sources. RESULTS: For identification of all CRC tests, recall and precision were as follows: NLP system (recall 93%, precision 94%), chart review (74%, 98%), and billing records review (44%, 83%). Recall and precision for identification of patients in need of screening were: NLP system (recall 95%, precision 88%), chart review (99%, 82%), and billing records (99%, 67%). LIMITATIONS: Small sample size and requirement for a robust EHR. CONCLUSIONS: Applying NLP to EHR records detected more CRC tests than either manual chart review or billing records review alone. NLP had better precision but marginally lower recall to identify patients who were due for CRC screening than billing record review.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	Not Specified
Name of Condition	Colorectal Cancer
Artificial Intelligence Technique Used	Natural language processing (NLP)
Providers’ involvement in	Developing : The NLP system identifies Unified Medical Language System (UMLS) concepts in biomedical text documents and produces XML-tagged output containing lists of the UMLS concepts found in each sentence, with relevant context
Accuracy of the AI Intervention	Not specified
Patient-related Outcomes Assessed	Recall and precision for identification of all CRC tests: NLP system (recall 93%, precision 94%), chart review (74%, 98%), and billing records review (44%, 83%). Recall and precision for identification of patients in need of screening: NLP system (recall 95%, precision 88%), chart review (99%, 82%), and billing records (99%, 67%).
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	No
Implementation	Recall and precision for identification of all CRC tests: NLP system (recall 93%, precision 94%), chart review (74%, 98%), and billing records review (44%, 83%). Recall and precision for identification of patients in need of screening: NLP system (recall 95%, precision 88%), chart review (99%, 82%), and billing records (99%, 67%).
Maintenance	No
Key Conclusions	Applying natural language processing (NLP) to electronic health records (EHRs) detected more colorectal cancer (CRC) tests than either manual chart review or billing records review alone. NLP had better precision but marginally lower recall than billing record review in identifying patients who were due for CRC screening.
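
The recall and precision figures quoted for this paper follow the standard definitions, shown below as a short sketch. The counts are hypothetical and chosen only to make the arithmetic easy to check; they are not the study's data.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts.

    precision = TP / (TP + FP): share of flagged items that are correct.
    recall    = TP / (TP + FN): share of real items that were flagged.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Hypothetical counts: 90 tests correctly found, 10 spurious, 30 missed.
precision, recall = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.90, recall=0.75
```

A method can score high on one measure and low on the other, which is why the entry reports both for each of the three approaches.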

Paper 29

Paper Title: Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis

Authors or developers

Zhou, S. M.
Fernandez-Gutierrez, F.
Kennedy, J.
Cooksey, R.
Atkinson, M.
Denaxas, S.
Siebert, S.
Dixon, W. G.
O’Neill, T. W.
Choy, E.
Sudlow, C.
UK Biobank Follow-up and Outcomes Group
Brophy

Year of Publication

2016

Full reference of the study

Zhou S-M, Fernandez-Gutierrez F, Kennedy J, Cooksey R, Atkinson M, Denaxas S, et al. (2016) Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis. PLoS ONE 11(5): e0154515. doi:10.1371/journal.pone.0154515

Abstract

OBJECTIVES: 1) To use a data-driven method to examine clinical codes (risk factors) of a medical condition in primary care electronic health records (EHRs) that can accurately predict a diagnosis of the condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis using primary care EHRs. METHODS: This study linked routine primary and secondary care EHRs in Wales, UK. A machine learning based scheme was used to identify patients with rheumatoid arthritis from primary care EHRs via the following steps: i) selection of variables by comparing relative frequencies of Read codes in the primary care dataset associated with disease case compared to non-disease control (disease/non-disease based on the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method; iii) induction of decision rules from decision tree model. The proposed method was then extensively validated on an independent dataset, and compared for performance with two existing deterministic algorithms for RA which had been developed using expert clinical knowledge. RESULTS: Primary care EHRs were available for 2,238,360 patients over the age of 16 and of these 20,667 were also linked in the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record were discovered more frequently in those with versus those without RA. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified eight predictors related to diagnostic codes for RA, medication codes, such as those for disease modifying anti-rheumatic drugs, and absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the expert clinical knowledge-based methods. 
CONCLUSION: Data-driven scheme, such as ensemble machine learning methods, has the potential of identifying the most informative predictors in a cost-effective and rapid way to accurately and reliably classify rheumatoid arthritis or other complex medical conditions in primary care EHRs.

Country of Research

UK (Wales)

Design of Study

Unclear

Duration of Study

ABMU region: March 2009 to October 2012; Cardiff: October 2013 to July 2014; SAIL databank: 14 years (1999-2013)

Name of Condition

Rheumatoid Arthritis

Artificial Intelligence Technique Used

Random forest method (reduction of predictors) and decision tree model (induction of decision rules). The final algorithm identified 8 predictors related to diagnostic codes for rheumatoid arthritis, medication codes, such as those for disease modifying anti-rheumatic drugs, and absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the expert clinical knowledge-based methods

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Overall accuracy of 92.29%, Positive predictive value: 85.6%, specificity: 94.6%, sensitivity: 86.2%

Patient-related Outcomes Assessed

27% prevalence of rheumatoid arthritis in the assessed population

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Not specified

Adoption

Implementation

Not implemented but electronic medical records used.

Maintenance

No : The approach was not implemented.

Key Conclusions

The study proposed a data-driven method which performed as well as the expert clinical knowledge-based methods to detect rheumatoid arthritis (RA). The findings of this work demonstrate how machine learning methods can be utilized to create reliable disease phenotypes in electronic health records. This method may be particularly valuable for large population-based research cohorts to give simple algorithms with good performance that are transparent and easy to apply. The paper also compared the data-driven method with the two existing RA algorithms available for research using UK datasets, and so offers a comparison of performance that researchers can use to decide which algorithm is most appropriate to their research.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	+

Color Code

Low

Unclear

High

Paper 30

Paper Title: Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer

Authors or developers

Hoogendoorn, M.
Szolovits, P.
Moons, L. M. G.
Numans, M. E.

Year of Publication

2016

Full reference of the study

Hoogendoorn, Mark, et al. “Utilizing un-coded consultation notes from electronic medical records for predictive modeling of colorectal cancer.” Artificial intelligence in medicine 69 (2016): 53-61.

Abstract

OBJECTIVE: Machine learning techniques can be used to extract predictive models for diseases from electronic medical records (EMRs). However, the nature of EMRs makes it difficult to apply off-the-shelf machine learning techniques while still exploiting the rich content of the EMRs. In this paper, we explore the usage of a range of natural language processing (NLP) techniques to extract valuable predictors from un-coded consultation notes and study whether they can help to improve predictive performance. METHODS: We study a number of existing techniques for the extraction of predictors from the consultation notes, namely a bag of words based approach and topic modeling. In addition, we develop a dedicated technique to match the un-coded consultation notes with a medical ontology. We apply these techniques as an extension to an existing pipeline to extract predictors from EMRs. We evaluate them in the context of predictive modeling for colorectal cancer (CRC), a disease known to be difficult to diagnose before performing an endoscopy. RESULTS: Our results show that we are able to extract useful information from the consultation notes. The predictive performance of the ontology-based extraction method moves significantly beyond the benchmark of age and gender alone (area under the receiver operating characteristic curve (AUC) of 0.870 versus 0.831). We also observe more accurate predictive models by adding features derived from processing the consultation notes compared to solely using coded data (AUC of 0.896 versus 0.882) although the difference is not significant. The extracted features from the notes are shown to be equally predictive (i.e. there is no significant difference in performance) compared to the coded data of the consultations. CONCLUSION: It is possible to extract useful predictors from un-coded consultation notes that improve predictive performance. 
Techniques linking text to concepts in medical ontologies to derive these predictors are shown to perform best for predicting CRC in our EMR dataset.

Country of Research

Netherlands

Design of Study

Unclear

Duration of Study

4.5 years, Comments : July 1, 2006 to December 31, 2011

Name of Condition

Colorectal cancer, Comments : consultation notes

Artificial Intelligence Technique Used

Natural language processing (NLP) techniques: a benchmark (bag of words), unsupervised methods to extract information from text (topic modeling), and specifically designed approaches for the case at hand (the remaining approaches, including matching the notes with a medical ontology). The choices for specific techniques within these categories are based on observations from the literature

Providers’ involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

AUC: 0.87, Comments : Processing consultation notes: 0.896

Patient-related Outcomes Assessed

Not specified

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Dutch general practitioners

Implementation

Not specified

Maintenance

Key Conclusions

The paper studies several natural language processing (NLP) techniques to extract predictors from un-coded data in electronic medical records (EMRs). Some techniques are well-known while others were developed specifically for this research. The approaches were applied to a large dataset covering 90,000 patients in general practices. The focus is on predictive modelling of colorectal cancer, which is a challenging disease to study because it is a common type of cancer whose symptoms are very non-specific. The results show that some of the NLP techniques studied can complement the coded EMR data and hence result in improved predictive models.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	–	–

Color Code

Low

Unclear

High

Paper 31

Paper Title: Forecasting outpatient visits using empirical mode decomposition coupled with back-propagation artificial neural networks optimized by particle swarm optimization

Authors or developers	Huang, D. Wu, Z.
Year of Publication	2017
Full reference of the study	Huang D, Wu Z (2017) Forecasting outpatient visits using empirical mode decomposition coupled with back-propagation artificial neural networks optimized by particle swarm optimization. PLoS ONE 12(2): e0172539. doi:10.1371/journal.pone.0172539
Abstract	Accurately predicting the trend of outpatient visits by mathematical modeling can help policy makers manage hospitals effectively, reasonably organize schedules for human resources and finances, and appropriately distribute hospital material resources. In this study, a hybrid method based on empirical mode decomposition and back-propagation artificial neural networks optimized by particle swarm optimization is developed to forecast outpatient visits on the basis of monthly numbers. The data outpatient visits are retrieved from January 2005 to December 2013 and first obtained as the original time series. Second, the original time series is decomposed into a finite and often small number of intrinsic mode functions by the empirical mode decomposition technique. Third, a three-layer back-propagation artificial neural network is constructed to forecast each intrinsic mode functions. To improve network performance and avoid falling into a local minimum, particle swarm optimization is employed to optimize the weights and thresholds of back-propagation artificial neural networks. Finally, the superposition of forecasting results of the intrinsic mode functions is regarded as the ultimate forecasting value. Simulation indicates that the proposed method attains a better performance index than the other four methods.
Country of Research	China
Design of Study	Unclear
Duration of Study	8 years, January 2005-December 2013
Name of Condition	Not specified
Artificial Intelligence Technique Used	Three-layer back-propagation artificial neural network optimized by particle swarm optimization, coupled with empirical mode decomposition
Providers’ involvement in	Developing : Not specified,Testing : Not specified,Validating : Not specified
Accuracy of the AI Intervention	Wavelet decomposition-Particle Swarm Optimization-back propagation artificial neural network: R: 0.975, RMSE: 8.156e+3, MAPE: 0.13, SSE: 8.68e+8; Empirical mode decomposition-Particle Swarm Optimization-back propagation artificial neural network: R: 0.991, RMSE: 3.1653e+3, MAPE: 0.0241, SSE: 1.3024e+8; Empirical mode decomposition-Genetic algorithm-back propagation artificial neural network: R: 0.9442, RMSE: 12.661e+3, MAPE: 0.34, SSE: 208e+8; Particle Swarm Optimization-back propagation artificial neural network: R: 0.992, RMSE: 3.7786e+3, MAPE: 0.11, SSE: 1.8561e+8; Genetic algorithm-back propagation artificial neural network: R: -0.0687, RMSE: 36.824e+3, MAPE: 1.13, SSE: 176.28e+8
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Forecasting outpatient visits
Implementation	Electronic medical records used
Maintenance	Not specified
Key Conclusions	A new forecasting method that combines empirical mode decomposition and back-propagation artificial neural network based on particle swarm optimization is proposed to forecast outpatient visits. Simulation results show that this method can improve forecasting and thus help policy makers manage hospitals effectively.
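
The error measures listed for this paper (SSE, RMSE, MAPE) can be computed as in the sketch below. It assumes the usual textbook definitions and that MAPE is expressed as a fraction rather than a percentage; the paper's exact conventions are not stated in this entry, and the visit counts are made up.

```python
from math import sqrt

def forecast_errors(actual, predicted):
    """Return SSE, RMSE and MAPE (as a fraction) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    sse = sum(r * r for r in residuals)            # sum of squared errors
    rmse = sqrt(sse / len(actual))                 # root mean squared error
    # Mean absolute percentage error; requires non-zero actual values.
    mape = sum(abs(r / a) for r, a in zip(residuals, actual)) / len(actual)
    return {"SSE": sse, "RMSE": rmse, "MAPE": mape}

# Made-up monthly visit counts and forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
errors = forecast_errors(actual, predicted)
print(errors["SSE"])             # 200.0
print(round(errors["RMSE"], 3))
print(round(errors["MAPE"], 3))  # 0.05
```

SSE and RMSE scale with the size of the series, so they are only comparable between models fitted to the same data, while MAPE is scale-free.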

Paper 32

Paper Title: Comparison of the Effectiveness of Interactive Didactic Lecture Versus Online Simulation-Based CME Programs Directed at Improving the Diagnostic Capabilities of Primary Care Practitioners

Authors or developers	McFadden, P. Crim, A.
Year of Publication	2016
Full reference of the study	McFadden, Pam FACEHP; Crim, Andrew CHCP, FACEHP. Comparison of the Effectiveness of Interactive Didactic Lecture Versus Online Simulation-Based CME Programs Directed at Improving the Diagnostic Capabilities of Primary Care Practitioners. Journal of Continuing Education in the Health Professions: Winter 2016 – Volume 36 – Issue 1 – p 32-37. doi: 10.1097/CEH.0000000000000061
Abstract	INTRODUCTION: Diagnostic errors in primary care contribute to increased morbidity and mortality, and billions in costs each year. Improvements in the way practicing physicians are taught so as to optimally perform differential diagnosis can increase patient safety and lower the costs of care. This study represents a comparison of the effectiveness of two approaches to CME training directed at improving the primary care practitioner’s diagnostic capabilities against seven common and important causes of joint pain. METHODS: Using a convenience sampling methodology, one group of primary care practitioners was trained by a traditional live, expert-led, multimedia-based training activity supplemented with interactive practice opportunities and feedback (control group). The second group was trained online with a multimedia-based training activity supplemented with interactive practice opportunities and feedback delivered by an artificial intelligence-driven simulation/tutor (treatment group). RESULTS: Before their respective instructional intervention, there were no significant differences in the diagnostic performance of the two groups against a battery of case vignettes presented with joint pain. Using the same battery of case vignettes to assess postintervention diagnostic performance, there was a slight but not statistically significant improvement in the control group’s diagnostic accuracy (P = .13). The treatment group, however, demonstrated a significant improvement in accuracy (P < .02; Cohen d, effect size = 0.79). DISCUSSION: These data indicate that within the context of a CME activity, a significant improvement in diagnostic accuracy can be achieved by the use of a web-delivered, multimedia-based instructional activity supplemented by practice opportunities and feedback delivered by an artificial intelligence-driven simulation/tutor.
Country of Research	USA
Design of Study	Case control study
Duration of Study	Not specified
Name of Condition	Joint pain, rheumatoid arthritis
Artificial Intelligence Technique Used	AI-driven diagnostic training simulator/tutor called Knowledge-Based Inference Tool (KBIT); practitioners trained with the software and diagnostic accuracy was assessed after the AI training
Providers’ involvement in	Developing : Not specified, Testing : General practitioners, Validating : Not specified
Accuracy of the AI Intervention	Pretraining vs posttraining accuracy: Rheumatoid arthritis: 93.9% vs 69.5%, Osteoarthritis: 62.2% vs 48.8%, Fibromyalgia: 92.7% vs 82.9%, Gout: 87.8% vs 72%, Pseudogout: 63.4% vs 37.8%, Psoriatic arthritis: 96.3% vs 85.4%, Lupus: 78% vs 73.2%
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Pretraining and posttraining accuracy in diagnosing various orthopedic conditions
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 68
Implementation	There were no significant differences in the diagnostic performance of the two groups against a battery of case vignettes presenting with joint pain. Using the same battery of case vignettes to assess postintervention diagnostic performance, there was a slight but not statistically significant improvement in the control group’s diagnostic accuracy (P = .13). The treatment group, however, demonstrated a significant improvement in accuracy (P < .02; Cohen d, effect size = 0.79).
Maintenance	No
Key Conclusions	Data indicate that a significant improvement in diagnostic accuracy can be achieved by the use of a web-delivered, multimedia-based instructional activity supplemented by practice opportunities and feedback delivered by an artificial intelligence-driven simulation/tutor.

Paper 33

Paper Title: Modeling obesity using abductive networks

Authors or developers	Abdel-Aal, R. E. Mangoud, A. M.
Year of Publication	1997
Full reference of the study	Abdel-Aal RE, Mangoud AM. Modeling obesity using abductive networks. Computers and Biomedical Research, an International Journal. 1997 Dec;30(6):451-471. DOI: 10.1006/cbmr.1997.1460.
Abstract	This paper investigates the use of abductive-network machine learning for modeling and predicting outcome parameters in terms of input parameters in medical survey data. Here we consider modeling obesity as represented by the waist-to-hip ratio (WHR) risk factor to investigate the influence of various parameters. The same approach would be useful in predicting values of clinical parameters that are difficult or expensive to measure from others that are more readily available. The AIM abductive network machine learning tool was used to model the WHR from 13 other health parameters. Survey data were collected for a randomly selected sample of 1100 persons aged 20 yr and over attending nine primary health care centers at Al-Khobar, Saudi Arabia. Models were synthesized by training on a randomly selected set of 800 cases, using both continuous and categorical representations of the parameters, and evaluated by predicting the WHR value for the remaining 300 cases. Models for WHR as a continuous variable predict the actual values within an error of 7.5% at the 90% confidence limits. Categorical models predict the correct logical value of WHR with an error in only two of the 300 evaluation cases. Analytical relationships derived from simple categorical models explain global observations on the total survey population to an accuracy as high as 99%. Simple continuous models represented as analytical functions highlight global relationships and trends. Results confirm the strong correlation between WHR and diastolic blood pressure, cholesterol level, and family history of obesity. Compared to other statistical and neural network approaches, AIM abductive networks provide faster and more automated model synthesis. A review is given of other areas where the proposed modeling approach can be useful in clinical practice.
Country of Research	Saudi Arabia
Design of Study	Unclear
Duration of Study	6 months
Name of Condition	Obesity
Artificial Intelligence Technique Used	AIM abductive network machine learning tool
Providers’ involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	Accuracy 99%
Patient-related Outcomes Assessed	Results confirm the strong correlation between WHR and diastolic blood pressure, cholesterol level, and family history of obesity. Compared to other statistical and neural network approaches, AIM abductive networks provide faster and more automated model synthesis.
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	NS
Reached Target Population?	Yes : Assisting the clinician in selecting the most appropriate course of intervention based on the effectiveness of alternative treatment methods for various classes of patients, as modeled using available historical data. For example, this approach can aid in choosing between bypass surgery, angioplasty, or medication and rehabilitation for the treatment of coronary artery disease, based on models for their short-term mortality performance.
Adoption	Yes (number of providers i.e. PHC participating) : nine PHC centers in Al-Khobar, Saudi Arabia
Implementation	Categorical models predict the correct logical value of WHR with an error in only 2 of the 300 evaluation cases. Analytical relationships derived from simple categorical models explain global observations on the total survey population to an accuracy as high as 99%. Simple continuous models represented as analytical functions highlight global relationships and trends. Results confirm the strong correlation between WHR and diastolic blood pressure, cholesterol level, and family history of obesity. Compared to other statistical and neural network approaches, AIM abductive networks provide faster and more automated model synthesis
Maintenance	Not specified (Unclear)
Key Conclusions	Investigates the use of abductive-network machine learning for modeling and predicting outcome parameters in terms of input parameters in medical survey data. Here we consider modeling obesity as represented by the waist-to-hip ratio (WHR) risk factor to investigate the influence of various parameters. The same approach would be useful in predicting values of clinical parameters that are difficult or expensive to measure from others that are more readily available. The AIM abductive network machine learning tool was used to model the WHR from 13 other health parameters. Survey data were collected for a randomly selected sample of 1,100 persons aged 20 years and over attending nine primary health care centers

Paper 34

Paper Title: Identifying acute kidney injury in the community--a novel informatics approach

Authors or developers

Xu, G.
Player, P.
Shepherd, D.
Brunskill, N. J.

Year of Publication

2016

Full reference of the study

Xu, Gang, et al. “Identifying acute kidney injury in the community--a novel informatics approach.” Journal of Nephrology 29.1 (2016): 93-98.

Abstract

BACKGROUND: Acute kidney injury (AKI) is a serious and common problem that is associated with high mortality. Currently nearly all efforts at improving outcomes in AKI have been focused on secondary care. We now know that a large number of patients most likely develop the condition in primary care. To our knowledge there have been no previous attempts to approach this topic from the primary care perspective. AIM: To test the utility of novel informatics software to identify patients with AKI in the community. SETTING AND METHOD: We carried out a retrospective audit of patients in one urban practice in Leicestershire using novel informatics software. The audit data were run on two occasions, once for high-risk patients from 4th July 2010 until 30th September 2013, and once for low-risk patients from 27th October 2011 until 21st January 2014. RESULTS: During the period of the data collection the average practice list size was 12,420, with 235 and 19 AKI episodes in the high- and low-risk groups respectively. The annual AKI incidence was 27.9/1,000 in the high-risk group, 1.22/1,000 in the low-risk group, and 10.6/1,000 overall. The most common associated factor was sepsis in 170 patients, followed by dehydration in 54 patients. CONCLUSION: We have shown it is possible to identify patients with AKI in the community using informatics software. Our data suggest that AKI in the community is much more common than previously thought and demonstrate the need to better understand this condition from the primary care perspective.

Country of Research

UK

Design of Study

Cohort study, Unclear : Retrospective

Duration of Study

43 months

Name of Condition

Acute kidney injury

Artificial Intelligence Technique Used

IMPAKT-EVOLVE-AKI, a web-based software tool

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Not specified. Characteristics of the identified AKI episodes:
Comorbidities identified (%): Chronic kidney disease 49.4, Hypertension 43.8, Diabetes mellitus 34.4, Ischaemic heart disease 31.1, Malignancy 26.6, Atrial fibrillation 20.3, Chronic airways disease 19.5, Cerebrovascular disease 18.7, Heart failure 15.4, Urinary obstruction 3.3
Causal factors (%): Sepsis 43.0, Dehydration 13.7, Cancer 11.9, Surgery 10.1, Drugs 10.9, Falls/collapse 9.4
Prescribed medicines (%): Statin 42.0, Loop or thiazide diuretic 38.2, Angiotensin-converting-enzyme inhibitor 28.4, Beta-blocker 20.5, Insulin 16.2, Antibiotics 15.9, Steroids 15.7, Angiotensin II receptor blockers 14.4, Proton pump inhibitor 11.6, NSAID 11.4, Metformin 11.4, Sulphonylurea 10.4, K+ sparing diuretic 8.9, Coumarin 8.6

Patient-related Outcomes Assessed

Annual acute kidney injury incidence was 27.9/1,000 in the high-risk group, 1.22/1,000 in the low risk group, and 10.6/1,000 overall. The most common associated factor was sepsis in 170 patients, followed by dehydration in 54 patients

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Primary health providers have shown it is possible to identify patients with AKI in the community using informatics software.

Implementation

Not specified

Maintenance

Not specified

Key Conclusions

The study has shown it is possible to identify patients with AKI in the community using informatics software. Our data suggests that AKI in the community is much more common than previously thought and demonstrates the need to better understand this condition from the primary care perspective.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	+	+

Color Code

Low

Unclear

High

Paper 35

Paper Title: Identifying influenza-like illness presentation from unstructured general practice clinical narrative using a text classifier rule-based expert system versus a clinical expert

Authors or developers

MacRae, J.
Love, T.
Baker, M. G.
Dowell, A.
Carnachan, M.
Stubbe, M.
McBain, L.

Year of Publication

2015

Full reference of the study

MacRae, J., Love, T., Baker, M.G. et al. Identifying influenza-like illness presentation from unstructured general practice clinical narrative using a text classifier rule-based expert system versus a clinical expert. BMC Med Inform Decis Mak 15, 78 (2015). https://doi.org/10.1186/s12911-015-0201-3

Abstract

BACKGROUND: We designed and validated a rule-based expert system to identify influenza like illness (ILI) from routinely recorded general practice clinical narratives to aid a larger retrospective research study into the impact of the 2009 influenza pandemic in New Zealand. METHODS: Rules were assessed using pattern matching heuristics on routine clinical narrative. The system was trained using data from 623 clinical encounters and validated using a clinical expert as a gold standard against a mutually exclusive set of 901 records. RESULTS: We calculated a 98.2 % specificity and 90.2 % sensitivity across an ILI incidence of 12.4 % measured against clinical expert classification. Peak problem list identification of ILI by clinical coding in any month was 9.2 % of all detected ILI presentations. Our system addressed an unusual problem domain for clinical narrative classification; using notational, unstructured, clinician entered information in a community care setting. It performed well compared with other approaches and domains. It has potential applications in real-time surveillance of disease, and in assisted problem list coding for clinicians. CONCLUSIONS: Our system identified ILI presentation with sufficient accuracy for use at a population level in the wider research study. The peak coding of 9.2 % illustrated the need for automated coding of unstructured narrative in our study.

Country of Research

New Zealand

Design of Study

Cohort study, Unclear : Retrospective research study

Duration of Study

3 years, 2007-2010

Name of Condition

Influenza infection

Artificial Intelligence Technique Used

Pattern matching heuristics

Providers’ involvement in

Developing : NS, Testing : NS, Validating : Clinical expert

Accuracy of the AI Intervention

Incidence: 0.136, sensitivity: 0.883, specificity: 0.983, Positive predictive value: 0.893, F-measure: 0.888

Patient-related Outcomes Assessed

Not Specified

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

This paper describes the successful use of a heuristic rule-based expert system for identifying presentations of ILI in general practice from routine clinical narrative. The system performed to a standard sufficient to use for population based research similar to the best performing systems documented for other similar problems. This research demonstrates that a text classifier can be applied successfully to a clinical narrative that is highly abbreviated and contains substantial spelling and typographical errors. General practice captures a large volume of medical information in narrative form on patients and the application of the techniques described can unlock this previously difficult to access information source. Furthermore this type of approach is computationally simple enough to apply to millions of records or in real time in general practice surgeries.

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : 99 healthcare practices

Implementation

Yes

Maintenance

Not specified

Key Conclusions

The system identified ILI presentation with sufficient accuracy for use at a population level in the wider research study. The peak coding of 9.2 % illustrated the need for automated coding of unstructured narrative in our study.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	+

Color Code

Low

Unclear

High
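
The Paper 35 accuracy row above lists a positive predictive value, a sensitivity and an F-measure. Assuming the F-measure is the usual balanced F1 (the harmonic mean of precision, i.e. positive predictive value, and recall, i.e. sensitivity), the three figures can be cross-checked with a short sketch. The function name `f_measure` is illustrative only and is not taken from the study.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values copied from the Paper 35 accuracy row:
# positive predictive value 0.893, sensitivity 0.883.
paper35_f = f_measure(precision=0.893, recall=0.883)
print(round(paper35_f, 3))  # 0.888, matching the reported F-measure
```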

Paper 36

Paper Title: Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method

Authors or developers	Patel, R. Jayatilleke, N. Broadbent, M. Chang, C. K. Foskett, N. Gorrell, G. Hayes, R. D. Jackson, R. Johnston, C. Shetty, H. Roberts, A. McGuire, P. Stewart, R.
Year of Publication	2015
Full reference of the study	Patel R, Jayatilleke N, Broadbent M, et al. Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method. BMJ Open 2015;5:e007619. doi:10.1136/bmjopen-2015-007619
Abstract	OBJECTIVES: To identify negative symptoms in the clinical records of a large sample of patients with schizophrenia using natural language processing and assess their relationship with clinical outcomes. DESIGN: Observational study using an anonymized electronic health record case register. SETTING: South London and Maudsley NHS Trust (SLaM), a large provider of inpatient and community mental healthcare in the UK. PARTICIPANTS: 7,678 patients with schizophrenia receiving care during 2011. MAIN OUTCOME MEASURES: Hospital admission, readmission and duration of admission. RESULTS: 10 different negative symptoms were ascertained with precision statistics above 0.80. Forty one percent of patients had two or more negative symptoms. Negative symptoms were associated with younger age, male gender and single marital status, and with increased likelihood of hospital admission (OR 1.24, 95% CI 1.10 to 1.39), longer duration of admission (beta-coefficient 20.5 days, 7.6-33.5), and increased likelihood of readmission following discharge (OR 1.58, 1.28 to 1.95). CONCLUSIONS: Negative symptoms were common and associated with adverse clinical outcomes, consistent with evidence that these symptoms account for much of the disability associated with schizophrenia. Natural language processing provides a means of conducting research in large representative samples of patients, using data recorded during routine clinical practice.
Country of Research	UK
Design of Study	Observational study
Duration of Study	Not specified (NS)
Name of Condition	Schizophrenia
Artificial Intelligence Technique Used	Natural language processing (NLP). NLP information extraction allows structured information to be obtained from unstructured text records. The study used NLP to detect statements in the correspondence fields of clinical records that referred to prespecified negative symptoms. The training data set was then used to construct an application (CRIS Negative Symptoms Scale, CRIS-NSS) using a hybrid classification model consisting of a support vector machine (SVM) learning algorithm and rule-based text matching, built with the Generalised Architecture for Text Engineering (GATE) software package.
Providers’ involvement in	Developing : NS
Accuracy of the AI Intervention	NS
Patient-related Outcomes Assessed	This is the largest known study (over 7,000 participants) to investigate the relationship of negative symptoms with clinical outcomes in people with schizophrenia. Our findings demonstrate that negative symptoms are present in a substantial number of people with schizophrenia and are associated with increased hospital admission, readmission and duration of inpatient stay.
Primary Healthcare Worker Related Outcomes Assessed	Findings are based on data recorded by clinicians delivering routine mental healthcare who were not specifically ascertaining negative symptoms. It is therefore possible that negative symptoms were not comprehensively documented in the electronic health records from which they were identified leading to an inaccurate estimate of their prevalence in the analysed sample.
Healthcare System-related Outcomes Assessed	NS
Reached Target Population?	Not specified
Adoption	Yes (number of providers i.e. PHC participating) : To identify negative symptoms in the clinical records of a large sample of patients with schizophrenia using natural language processing and assess their relationship with clinical outcomes.
Implementation	Yes
Maintenance	No
Key Conclusions	This is the largest known study (over 7,000 participants) to investigate the relationship of negative symptoms with clinical outcomes in people with schizophrenia. Our findings demonstrate that negative symptoms are present in a substantial number of people with schizophrenia and are associated with increased hospital admission, readmission and duration of inpatient stay. Further, negative symptoms were common and associated with adverse clinical outcomes, consistent with evidence that these symptoms account for much of the disability associated with schizophrenia. Natural language processing provides a means of conducting research in large representative samples of patients, using data recorded during routine clinical practice.

Paper 37

Paper Title: Accessing primary care Big Data: the development of a software algorithm to explore the rich content of consultation records

Authors or developers	MacRae, J. Darlow, B. McBain, L. Jones, O. Stubbe, M. Turner, N. Dowell, A.
Year of Publication	2015
Full reference of the study	MacRae J, Darlow B, McBain L, et al. Accessing primary care Big Data: the development of a software algorithm to explore the rich content of consultation records. BMJ Open 2015;5:e008160. doi:10.1136/bmjopen-2015-008160
Abstract	OBJECTIVE: To develop a natural language processing software inference algorithm to classify the content of primary care consultations using electronic health record Big Data and subsequently test the algorithm’s ability to estimate the prevalence and burden of childhood respiratory illness in primary care. DESIGN: Algorithm development and validation study. To classify consultations, the algorithm is designed to interrogate clinical narrative entered as free text, diagnostic (Read) codes created and medications prescribed on the day of the consultation. SETTING: Thirty-six consenting primary care practices from a mixed urban and semirural region of New Zealand. Three independent sets of 1,200 child consultation records were randomly extracted from a data set of all general practitioner consultations in participating practices between 1 January 2008 and 31 December 2013 for children under 18 years of age (n=754,242). Each consultation record within these sets was independently classified by two expert clinicians as respiratory or non-respiratory, and subclassified according to respiratory diagnostic categories to create three ‘gold standard’ sets of classified records. These three gold standard record sets were used to train, test and validate the algorithm. OUTCOME MEASURES: Sensitivity, specificity, positive predictive value and F-measure were calculated to illustrate the algorithm’s ability to replicate judgements of expert clinicians within the 1,200 record gold standard validation set. RESULTS: The algorithm was able to identify respiratory consultations in the 1,200 record validation set with a sensitivity of 0.72 (95% CI 0.67 to 0.78) and a specificity of 0.95 (95% CI 0.93 to 0.98). The positive predictive value of algorithm respiratory classification was 0.93 (95% CI 0.89 to 0.97). The positive predictive value of the algorithm classifying consultations as being related to specific respiratory diagnostic categories ranged from 0.68 (95% CI 0.40 to 1.00; other respiratory conditions) to 0.91 (95% CI 0.79 to 1.00; throat infections). CONCLUSIONS: A software inference algorithm that uses primary care Big Data can accurately classify the content of clinical consultations. This algorithm will enable accurate estimation of the prevalence of childhood respiratory illness in primary care and resultant service utilisation. The methodology can also be applied to other areas of clinical care.
Country of Research	New Zealand
Design of Study	Cohort study, Unclear : Algorithm development and validation study
Duration of Study	6 years, 1 January 2008 to 31 December 2013
Name of Condition	Childhood respiratory illness
Artificial Intelligence Technique Used	Natural language processing: the Child Respiratory Algorithm, using PROSAIC software
Providers’ involvement in	Developing : NS, Testing : Consultation records were randomly extracted from a data set of all general practitioner consultations, Validating : NS
Accuracy of the AI Intervention	Sensitivity: 0.72 (95% CI 0.67 to 0.78), specificity: 0.95 (95% CI 0.93 to 0.98), positive predictive value: 0.93 (95% CI 0.89 to 0.97)
Patient-related Outcomes Assessed	A natural language processing software inference algorithm that analyses the content of clinical consultation records, diagnostic classifications and prescription information is able to classify child-GP consultations related to respiratory conditions with similar accuracy to clinical experts. This algorithm will enable accurate estimation of the prevalence of childhood respiratory illnesses in primary care and the resultant service utilisation.
Primary Healthcare Worker Related Outcomes Assessed	Not Specified
Healthcare System-related Outcomes Assessed	Not Specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Thirty-six consenting primary care practices from a mixed urban and semirural region of New Zealand
Implementation	The algorithm demonstrated excellent specificity and positive predictive values for detecting respiratory conditions
Maintenance	Not specified
Key Conclusions	A natural language processing software inference algorithm that analyses the content of clinical consultation records, diagnostic classifications and prescription information is able to classify child-general practitioner consultations related to respiratory conditions with similar accuracy to clinical experts. This algorithm enables accurate estimation of the prevalence of childhood respiratory illnesses in primary care and the resultant service utilisation.

Paper 38

Paper Title: Automatic Detection of Skin and Subcutaneous Tissue Infections from Primary Care Electronic Medical Records

Authors or developers

Gu, Y.
Kennelly, J.
Warren, J.
Nathani, P.
Boyce, T.

Year of Publication

2015

Full reference of the study

Gu, Y., Kennelly, J., Warren, J., Nathani, P., & Boyce, T. (2015). Automatic detection of skin and subcutaneous tissue infections from primary care electronic medical records. Stud Health Technol Inform, 214, 74-80.

Abstract

INTRODUCTION: Skin and subcutaneous tissue infections (SSTI) are common conditions that cause avoidable hospitalisation in New Zealand. As part of a program to improve the management of SSTI in primary care, electronic medical records (EMR) of four Auckland general practices were analysed to identify SSTI occurrences in the last three years. METHODS: An ontology for SSTI risks, manifestation and treatment was created based on literature and guidelines. An SSTI identification algorithm was developed examining EMR data for skin swab tests, diagnoses (READ codes) and textual clinical notes. RESULTS: High occurrence and recurrence rates in those aged 20 or younger were found. Due to low usage of READ coding and laboratory tests, 65% of SSTI occurrences were identified by notes. However, 91% of all identified SSTI occurrences were appropriately treated with oral/topical antibiotics according to prescription records in the EMR. The F1 score of the analysis algorithm is 0.76 using manual review as gold standard. DISCUSSION AND CONCLUSION: The SSTI identification algorithm shows a reasonable accuracy, suggesting the feasibility of automatically detecting SSTI occurrences using clinical data that are routinely collected in healthcare delivery.

Country of Research

New Zealand

Design of Study

Unclear

Duration of Study

3 years, 2011-2014

Name of Condition

Skin and subcutaneous tissue infections

Artificial Intelligence Technique Used

Natural language processing; READ codes and textual clinical notes used

Providers’ involvement in

Developing : Not specified, Testing : Senior general practitioner (JK), Validating : Not specified

Accuracy of the AI Intervention

Positive predictive value: 64%, sensitivity: 94%, specificity: 97%, negative predictive value: 99.6%, F1 score: 0.76

Patient-related Outcomes Assessed

Occurrence of skin and subcutaneous tissue infection: 230 per 100 person-years.

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Not specified : The authors mentioned that the analyzed patient list cannot be generalized to the existing population.

Adoption

Yes (number of providers i.e. PHC participating) : Electronic medical record from primary practice was used

Implementation

Not implemented; tested on electronic medical records.

Maintenance

Not specified

Key Conclusions

The authors report that an automatic detection method using READ codes on general practice electronic medical record data showed it is feasible to automatically detect skin and subcutaneous tissue infections from the data collected as part of routine primary care delivery.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	–	–

Color Code

Low

Unclear

High
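
The Paper 38 accuracy row above reports sensitivity, specificity and both predictive values. Predictive values depend on how common the condition is in the evaluated sample, so the four figures can be checked for internal consistency with Bayes' rule. The sketch below is an illustration under that assumption only: the helper names are hypothetical, and the prevalence it derives is implied by the reported figures rather than stated in the source.

```python
def ppv(sens, spec, prev):
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prev):
    """Negative predictive value from sensitivity, specificity, prevalence."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


def implied_prevalence(sens, spec, ppv_value):
    """Prevalence that makes the PPV formula above yield ppv_value."""
    fp_rate = 1 - spec
    return ppv_value * fp_rate / (sens * (1 - ppv_value) + ppv_value * fp_rate)


# Values copied from the Paper 38 accuracy row.
sens, spec, reported_ppv = 0.94, 0.97, 0.64

prev = implied_prevalence(sens, spec, reported_ppv)  # roughly 5% of records
check_ppv = ppv(sens, spec, prev)                    # recovers 0.64
check_npv = npv(sens, spec, prev)                    # close to the reported 99.6%
print(round(prev, 3), round(check_ppv, 2), round(check_npv, 4))
```

A low implied prevalence is what lets a high-specificity algorithm still show a modest positive predictive value, which is consistent with the reported F1 score of 0.76.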

Paper 39

Paper Title: Postdischarge Communication Between Home Health Nurses and Physicians: Measurement, Quality, and Outcomes

Authors or developers	Press, M. J. Gerber, L. M. Peng, T. R. Pesko, M. F. Feldman, P. H. Ouchida, K. Sridharan, S. Bao, Y. Barron, Y. Casalino, L. P.
Year of Publication	2015
Full reference of the study	Press, Matthew J., et al. “Postdischarge communication between home health nurses and physicians: measurement, quality, and outcomes.” Journal of the American Geriatrics Society 63.7 (2015): 1299-1305.
Abstract	OBJECTIVES: To use natural language processing (NLP) of text from electronic medical records (EMRs) to identify failed communication attempts between home health nurses and physicians, to identify predictors of communication failure, and to assess the association between communication failure and hospital readmission. DESIGN: Retrospective cohort study. SETTING: Visiting Nurse Service of New York (VNSNY), the nation’s largest freestanding home health agency. PARTICIPANTS: Medicare beneficiaries with congestive heart failure who received home health care from VNSNY after hospital discharge in 2008-09 (N = 5,698). MEASUREMENTS: Patient-level measures of communication failure and risk-adjusted 30-day all-cause readmission. RESULTS: Identification of failed communication attempts using NLP had high external validity (kappa = 0.850, P < .001). A mean of 8% of communication attempts failed per episode of home care; failure rates were higher for black patients and lower for patients from higher median income ZIP codes. The association between communication failure and readmission was not significant with adjustment for patient, nurse, physician, and hospital factors. CONCLUSION: NLP of EMRs can be used to identify failed communication attempts between home health nurses and physicians, but other variables mostly explained the association between communication failure and readmission. Communication failures may contribute to readmissions in more-serious clinical situations, an association that this study may have been underpowered to detect.
Country of Research	USA
Design of Study	Cohort study, Unclear, Retrospective cohort study
Duration of Study	1 year (2008-2009)
Name of Condition	Heart failure
Artificial Intelligence Technique Used	Natural language processing
Provider’s involvement in	Developing : NS, Testing : NS, Validating : Home health nurses’ interpretation
Accuracy of the AI Intervention	NS. Linear regression model (standard error in parentheses). All conditions: Model 1: linear term -0.0977 (0.0972), quadratic term 0.1823 (0.1168); Model 2: 0.0180 (0.0226); Model 3: 0.0234 (0.0248). CHF condition: Model 1: linear term -0.0295 (0.0519), quadratic term 0.0519 (0.1867); Model 2: 0.0034 (0.0325); Model 3: 0.0042 (0.0327)
Patient-related Outcomes Assessed	Patient-level measures of communication failure and risk-adjusted 30-day all-cause readmission.
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	Identification of failed communication attempts using NLP had high external validity (kappa = 0.850, P < .001). A mean of 8% of communication attempts failed per episode of home care; failure rates were higher for black patients and lower for patients from higher median income ZIP codes. The association between communication failure and readmission was not significant with adjustment for patient, nurse, physician, and hospital factors. (NLP = natural language processing; EMR = electronic medical record)
Reached Target Population?	Yes, Not specified
Adoption	Not specified
Implementation	NLP of EMRs can be used to identify failed communication attempts between home health nurses and physicians, but other variables mostly explained the association between communication failure and readmission.
Maintenance	No
Key Conclusions	NLP of EMRs can be used to identify failed communication attempts between home health nurses and physicians, but other variables mostly explained the association between communication failure and readmission. Communication failures may contribute to readmissions in more-serious clinical situations, an association that this study may have been underpowered to detect. (NLP = natural language processing; EMR = electronic medical record)

Paper 40

Paper Title: Regular expression-based learning to extract bodyweight values from clinical notes

Authors or developers	Murtaugh, M. A. Gibson, B. S. Redd, D. Zeng-Treitler, Q.
Year of Publication	2015
Full reference of the study	Murtaugh, Maureen A., et al. “Regular expression-based learning to extract bodyweight values from clinical notes.” Journal of biomedical informatics 54 (2015): 186-190.
Abstract	BACKGROUND: Bodyweight related measures (weight, height, BMI, abdominal circumference) are extremely important for clinical care, research and quality improvement. These and other vitals signs data are frequently missing from structured tables of electronic health records. However they are often recorded as text within clinical notes. In this project we sought to develop and validate a learning algorithm that would extract bodyweight related measures from clinical notes in the Veterans Administration (VA) Electronic Health Record to complement the structured data used in clinical research. METHODS: We developed the Regular Expression Discovery Extractor (REDEx), a supervised learning algorithm that generates regular expressions from a training set. The regular expressions generated by REDEx were then used to extract the numerical values of interest. To train the algorithm we created a corpus of 268 outpatient primary care notes that were annotated by two annotators. This annotation served to develop the annotation process and identify terms associated with bodyweight related measures for training the supervised learning algorithm. Snippets from an additional 300 outpatient primary care notes were subsequently annotated independently by two reviewers to complete the training set. Inter-annotator agreement was calculated. REDEx was applied to a separate test set of 3561 notes to generate a dataset of weights extracted from text. We estimated the number of unique individuals who would otherwise not have bodyweight related measures recorded in the CDW and the number of additional bodyweight related measures that would be additionally captured. RESULTS: REDEx’s performance was: accuracy=98.3%, precision=98.8%, recall=98.3%, F=98.5%. In the dataset of weights from 3561 notes, 7.7% of notes contained bodyweight related measures that were not available as structured data. In addition, two additional bodyweight related measures were identified per individual per year. CONCLUSION: Bodyweight related measures are frequently stored as text in clinical notes. A supervised learning algorithm can be used to extract this data. Implications for clinical care, epidemiology, and quality improvement efforts are discussed.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	2 years (October 1, 2011 through September 30, 2013)
Name of Condition	NS
Artificial Intelligence Technique Used	Regular Expression Discovery Extractor (REDEx), a supervised learning algorithm
Provider’s involvement in	Developing : Regular Expression Discovery Extractor (REDEx), a supervised learning algorithm that generates regular expressions from a training set. The regular expressions generated by REDEx were then used to extract the numerical values of interest.
Accuracy of the AI Intervention	Accuracy = 98.3%, precision = 98.8%, recall = 98.3%, F = 98.5%
Patient-related Outcomes Assessed	Bodyweight related measures (weight, height, BMI, abdominal circumference) are extremely important for clinical care, research and quality improvement. These and other vitals signs data are frequently missing from structured tables of electronic health records. However they are often recorded as text within clinical notes. In this project we sought to develop and validate a learning algorithm that would extract bodyweight related measures from clinical notes in the Veterans Administration (VA) Electronic Health Record to complement the structured data used in clinical research.
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	NS
Reached Target Population?	Yes, Not specified
Adoption	Yes (number of providers i.e. PHC participating) : Veterans Administration (VA) Electronic Health Record, Not specified
Implementation	Regular Expression Discovery Extractor (REDEx), a supervised learning algorithm that generates regular expressions from a training set. The regular expressions generated by REDEx were then used to extract the numerical values of interest
Maintenance	No
Key Conclusions	Bodyweight related measures are frequently stored as text in clinical notes. A supervised learning algorithm can be used to extract this data. Implications for clinical care, epidemiology, and quality improvement efforts are discussed.
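
The extraction step described for Paper 40 can be illustrated with a minimal Python sketch. It assumes a single hand-written pattern (`WEIGHT_PATTERN`, a hypothetical name) rather than the patterns REDEx learns from annotated snippets, and the note text in the usage example is invented. The `f_measure` helper assumes that the reported F is the harmonic mean of precision and recall (F1), which the record does not state explicitly; under that assumption the reported precision (98.8%) and recall (98.3%) are consistent with F = 98.5%.

```python
import re

# Hypothetical hand-written pattern for illustration only.
# REDEx itself learns its regular expressions from an annotated training set.
WEIGHT_PATTERN = re.compile(
    r"\b(?:weight|wt)\b\.?\s*[:=]?\s*(\d{2,3}(?:\.\d+)?)\s*(lbs?|pounds|kg)\b",
    re.IGNORECASE,
)


def extract_weights(note):
    """Return (value, unit) pairs for weight mentions found in a free-text note."""
    return [(float(value), unit.lower()) for value, unit in WEIGHT_PATTERN.findall(note)]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example note, not taken from the study corpus.
    note = "Weight: 182.5 lbs, BP 120/80. Wt 83 kg at last visit."
    print(extract_weights(note))
    print(round(f_measure(0.988, 0.983), 3))
```

A single fixed pattern like this would miss many phrasings found in real notes, which is the gap a learned set of expressions is meant to close.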

Paper 41

Paper Title: Monitoring suicidal patients in primary care using electronic health records

Authors or developers	Anderson, H. D. Pace, W. D. Brandt, E. Nielsen, R. D. Allen, R. R. Libby, A. M. West, D. R. Valuck, R. J.
Year of Publication	2015
Full reference of the study	Anderson, H. D., Pace, W. D., Brandt, E., Nielsen, R. D., Allen, R. R., Libby, A. M., … & Valuck, R. J. (2015). Monitoring suicidal patients in primary care using electronic health records. The Journal of the American Board of Family Medicine, 28(1), 65-71.
Abstract	INTRODUCTION: Patients at risk for suicide often come into contact with primary care providers, many of whom use electronic health records (EHRs) for charting. It is not known, however, how often suicide ideation or attempts are documented in EHRs. METHODS: We used retrospective analyses of de-identified EHR data from a distributed health network of primary care organizations to estimate the frequency of using diagnostic codes to record suicidal ideation and attempts. Data came from three sources: a clinician notes field processed using natural language processing; a suicidal ideation item on a patient-reported depression severity instrument (9-item Patient Health Questionnaire [PHQ-9]); and diagnostic codes from the EHR. RESULTS: Only 3% of patients with an indication of suicidal ideation in the notes field had a corresponding International Classification of Diseases, 9th Revision (ICD-9), code (kappa = 0.036). Agreement between an indication of suicidal ideation from item 9 of the PHQ-9 and an ICD-9 code was slightly higher (kappa = 0.068). Suicide attempt indicated in the notes field was more likely to be recorded using an ICD-9 code (19%; kappa = 0.18). CONCLUSIONS: Few cases of suicidal ideation and attempt were documented in patients’ EHRs using diagnostic codes. Increased documentation of suicidal ideation and behaviors in patients’ EHRs may improve their monitoring in the health care system.
Country of Research	USA
Design of Study	Cohort study, Unclear, Retrospective
Duration of Study	13 years (1998-2011)
Name of Condition	Suicidal tendencies
Artificial Intelligence Technique Used	Natural language processing
Provider’s involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	Not specified
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	No : The authors mentioned that the retrospectively studied data may have included a higher proportion of younger patients and that the results may not be generalizable to older population groups.
Adoption	Yes (number of providers i.e. PHC participating) : 8 primary care practices
Implementation	Not implemented, Electronic medical records were retrospectively studied.
Maintenance	Not specified (Unclear)
Key Conclusions	The study reports a natural language processing approach for monitoring suicidal patients in primary care using electronic medical records. The study reports that few cases of suicidal ideation and attempt were documented in patients’ electronic health records using diagnostic codes. The study also mentions that increased documentation of suicidal ideation and behaviors in patients’ electronic health records may improve their monitoring in the health care system.
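
The kappa values reported for Paper 41 measure agreement between two indicators for the same patients (for example, an NLP flag from the notes field and the presence of an ICD-9 code). A minimal sketch of Cohen's kappa for two binary indicators follows; the function name and the example data are hypothetical, and this is not the authors' analysis code.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) indicators."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of patients where both indicators match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each indicator's marginal rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented indicators for four patients: notes flag vs. diagnostic code.
    notes_flag = [1, 1, 0, 0]
    icd9_code = [1, 0, 0, 0]
    print(cohens_kappa(notes_flag, icd9_code))
```

Values near 0, such as those in this record, indicate agreement close to what chance alone would produce.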

Paper 42

Paper Title: Measuring physician adherence with gout quality indicators: a role for natural language processing

Authors or developers	Kerr, G. S. Richards, J. S. Nunziato, C. A. Patterson, O. V. DuVall, S. L. Aujero, M. Maron, D. Amdur, R.
Year of Publication	2015
Full reference of the study	Kerr, Gail S., et al. “Measuring physician adherence with gout quality indicators: a role for natural language processing.” Arthritis care & research 67.2 (2015): 273-279.
Abstract	OBJECTIVE: To evaluate physician adherence with gout quality indicators (QIs) for medication use and monitoring, and behavioral modification (BM). METHODS: Gout patients were assessed for the QIs as follows: QI 1: initial allopurinol dosage <300 mg/day for patients with chronic kidney disease (CKD); QI 2: uric acid within six months of allopurinol start; and QI 3: complete blood count and creatinine phosphokinase within six months of colchicine initiation. Natural language processing (NLP) was used to analyze clinical narrative data from electronic medical records (EMRs) of overweight (body mass index >=28 kg/m(2) ) gout patients for BM counseling on gout-specific dietary restrictions, weight loss, and alcohol consumption (QI 4). Additional data included sociodemographic, comorbidities, and number of rheumatology and primary care visits. QI compliance versus noncompliance was compared using chi-square analyses and independent-groups t-test. RESULTS: In 2,280 gout patients, compliance with QI was as follows: QI 1: 92.1%, QI 2: 44.8%, and QI 3: 7.7%. Patients compliant with QI 2 had more rheumatology visits at 3.5 versus 2.6 visits (P < 0.001), while those compliant with QI 3 had more CKD (P < 0.01). Of 1,576 eligible patients, BM counseling for weight loss occurred in 1,008 patients (64.0%), low purine diet in 390 (24.8%), alcohol abstention in 137 (8.7%), and all three elements in 51 patients (3.2%). Regular rheumatology clinic visits correlated with frequent advice on weight loss and gout-specific diet (P < 0.0001). CONCLUSION: Rheumatology clinic attendance was associated with greater QI compliance. NLP proved a valuable tool for measuring BM as documented in the clinical narrative of EMRs.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	2 years (January 1, 2008 to December 31, 2010)
Name of Condition	Gout
Artificial Intelligence Technique Used	Natural language processing. Natural language processing (NLP) comprises a wide range of computational algorithms and methods that enable information documented in text to be identified and used for analysis. Clinical NLP is an effective approach to retrieving patient data contained in the narrative portion of an electronic medical record (EMR). The variability between EMR implementations in clinical practice precludes development of a single, general-purpose NLP system capable of performing all text processing needs.
Provider’s involvement in	Developing : NS, Testing : Clinical natural language processing (NLP) is an effective approach to retrieving patient data contained in the narrative portion of an electronic medical record (EMR). The variability between EMR implementations in clinical practice precludes development of a single, general-purpose NLP system capable of performing all text processing needs, Validating : NS
Accuracy of the AI Intervention	System accuracy was measured by comparing the output of the natural language processing system identifying patients who received behavioral management counseling to the reference standard
Patient-related Outcomes Assessed	In 2,280 gout patients, compliance with quality indicators (QIs) was as follows: QI 1: 92.1%, QI 2: 44.8%, and QI 3: 7.7%. Patients compliant with QI 2 had more rheumatology visits at 3.5 versus 2.6 visits (P < 0.001), while those compliant with QI 3 had more chronic kidney disease (CKD) (P < 0.01). Of 1,576 eligible patients, behavioral modification (BM) counseling for weight loss occurred in 1,008 patients (64.0%), low purine diet in 390 (24.8%), alcohol abstention in 137 (8.7%), and all 3 elements in 51 patients (3.2%). Regular rheumatology clinic visits correlated with frequent advice on weight loss and gout-specific diet (P < 0.0001).
Primary Healthcare Worker Related Outcomes Assessed	Adherence with gout quality indicators directed at medication toxicity monitoring was subpar and correlated poorly with behavior modification counseling.
Healthcare System-related Outcomes Assessed	Adherence with behavior modification counseling was low and directed by comorbidities.
Reached Target Population?	Yes
Adoption	No
Implementation	Natural language processing successfully identified documentation of behavior modification counseling in gout patients. Adherence with behavior modification counseling was low and directed by comorbidities
Maintenance	No
Key Conclusions	Rheumatology clinic attendance was associated with greater quality indicator compliance. Natural language processing proved a valuable tool for measuring behavioral modification as documented in the clinical narrative of electronic medical records, and it successfully identified documentation of behavior modification counseling in gout patients. Adherence with behavior modification counseling was low and directed by comorbidities, and adherence with gout quality indicators directed at medication toxicity monitoring was subpar and correlated poorly with behavior modification counseling.

Paper 43

Paper Title: Prevalence of heart failure signs and symptoms in a large primary care population identified through the use of text and data mining of the electronic health record

Authors or developers	Vijayakrishnan, R. Steinhubl, S. R. Ng, K. Sun, J. Byrd, R. J. Daar, Z. Williams, B. A. deFilippi, C. Ebadollahi, S. Stewart, W. F.
Year of Publication	2014
Full reference of the study	Vijayakrishnan, Rajakrishnan, et al. “Prevalence of heart failure signs and symptoms in a large primary care population identified through the use of text and data mining of the electronic health record.” Journal of cardiac failure 20.7 (2014): 459-464.
Abstract	BACKGROUND: The electronic health record (EHR) contains a tremendous amount of data that if appropriately detected can lead to earlier identification of disease states such as heart failure (HF). Using a novel text and data analytic tool we explored the longitudinal EHR of over 50,000 primary care patients to identify the documentation of the signs and symptoms of HF in the years preceding its diagnosis. METHODS AND RESULTS: Retrospective analysis consisted of 4,644 incident HF cases and 45,981 group-matched control subjects. Documentation of Framingham HF signs and symptoms within encounter notes were carried out with the use of a previously validated natural language processing procedure. A total of 892,805 affirmed criteria were documented over an average observation period of 3.4 years. Among eventual HF cases, 85% had >=1 criterion within 1 year before their HF diagnosis, as did 55% of control subjects. Substantial variability in the prevalence of individual signs and symptoms were found in both case and control subjects. CONCLUSIONS: HF signs and symptoms are frequently documented in a primary care population as identified through automated text and data mining of EHRs. Their frequent identification demonstrates the rich data available within EHRs that will allow for future work on automated criterion identification to help develop predictive models for HF.
Country of Research	USA
Design of Study	Cohort study, Unclear, Retrospective analysis
Duration of Study	9 years (2001-2010)
Name of Condition	Heart Failure
Artificial Intelligence Technique Used	Natural language processing (an NLP application was developed and validated for identifying affirmations and denials of 14 of the 17 Framingham criteria)
Provider’s involvement in	Testing
Accuracy of the AI Intervention	Natural Language Processing tool developed by our group to extract signs and symptoms potentially consistent with Heart Failure based on the Framingham criteria has good accuracy compared with expert human adjudication
Patient-related Outcomes Assessed	Documentation of heart failure (HF) signs and symptoms within encounter notes was carried out with the use of a previously validated natural language processing procedure. A total of 892,805 affirmed criteria were documented over an average observation period of 3.4 years. Among eventual HF cases, 85% had >=1 criterion within 1 year before their HF diagnosis, as did 55% of control subjects. Substantial variability in the prevalence of individual signs and symptoms was found in both case and control subjects.
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	The work explored the application of a validated, sophisticated text- and data-mining tool to identify the presence of Framingham heart failure (HF) signs and symptoms criteria in the EHRs of a large primary care population. We found that the Framingham signs and symptoms were frequently documented in both case and control populations, albeit much more frequently among the eventual HF cases. These findings are novel and, to our knowledge, the first automated evaluation of HF signs and symptoms using electronic health record (EHR) data.
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 41 community practice clinics
Implementation	Applying automated text and data mining of EHRs for HF signs and symptoms is feasible, and these signs and symptoms are frequently documented in case subjects years before a clinical diagnosis, though also frequently identified in control subjects. While further refinement is necessary, these results support the future potential to improve patient care by informing physicians that additional measures through biomarkers, electrocardiography, or even imaging may be necessary to optimize early identification of those at highest risk for HF. (HF = heart failure; EHR = electronic health record)
Maintenance	No, Not specified (Unclear)
Key Conclusions	HF signs and symptoms are frequently documented in a primary care population as identified through automated text and data mining of EHRs. Their frequent identification demonstrates the rich data available within EHRs that will allow for future work on automated criterion identification to help develop predictive models for HF. (HF = heart failure; EHR = electronic health record)

Paper 44

Paper Title: Multilevel temporal Bayesian networks can model longitudinal change in multimorbidity

Authors or developers

Lappenschaar, M.
Hommersom, A.
Lucas, P. J.
Lagro, J.
Visscher, S.
Korevaar, J. C.
Schellevis, F. G.

Year of Publication

2013

Full reference of the study

Lappenschaar, Martijn, et al. “Multilevel temporal Bayesian networks can model longitudinal change in multimorbidity.” Journal of clinical epidemiology 66.12 (2013): 1405-1416.

Abstract

OBJECTIVES: Although the course of single diseases can be studied using traditional epidemiologic techniques, these methods cannot capture the complex joint evolutionary course of multiple disorders. In this study, multilevel temporal Bayesian networks were adopted to study the course of multimorbidity in the expectation that this would yield new clinical insight. STUDY DESIGN AND SETTING: Clinical data of patients were extracted from 90 general practice registries in the Netherlands. One and half million patient-years were used for analysis. The simultaneous progression of six chronic cardiovascular conditions was investigated, correcting for both patient and practice-related variables. RESULTS: Cumulative incidence rates of one or more new morbidities rapidly increase with the number of morbidities present at baseline, ranging up to 47% and 76% for 3- and 5-year follow-ups, respectively. Hypertension and lipid disorders, as health risk factors, increase the cumulative incidence rates of both individual and multiple disorders. Moreover, in their presence, the observed cumulative incidence rates of combinations of cardiovascular disorders, that is, multimorbidity differs significantly from the expected rates. CONCLUSION: There are clear synergies between health risks and chronic diseases when multimorbidity within a patient progresses over time. The method used here supports a more comprehensive analysis of such synergies compared with what can be obtained by traditional statistics.

Country of Research

Netherlands

Design of Study

Cohort study

Duration of Study

9 years (2002-2011)

Name of Condition

Multiple disorders: diabetes mellitus, ischemic heart disease, retinopathy, heart failure, stroke, hypertension

Artificial Intelligence Technique Used

Multilevel temporal Bayesian networks, adapted to study the course of multimorbidity in order to yield new clinical insight.

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

NS. Probabilities are reported: persistence of individual chronic diseases and probability of having comorbid combinations.

Patient-related Outcomes Assessed

The overall multimorbidity rate of chronic cardiovascular (related) disorders rapidly increases when multimorbidity is already present at baseline.

Primary Healthcare Worker Related Outcomes Assessed

The urbanization level of a general practice is associated with the cumulative incidence of chronic cardiovascular conditions, in particular those with a high prevalence, that is, obesity, hypertension, dyslipidemia, diabetes mellitus, and ischemic heart disease.

Healthcare System-related Outcomes Assessed

When multimorbidity progresses over time, certain disease combinations develop more quickly than what can be expected from individual disease progression. This synergistic effect happens particularly in the presence of hypertension and dyslipidemia

Reached Target Population?

Yes : Cumulative incidence rates of one or more new morbidities rapidly increase with the number of morbidities present at baseline, ranging up to 47% and 76% for 3- and 5-year follow-ups, respectively. Hypertension and lipid disorders, as health risk factors, increase the cumulative incidence rates of both individual and multiple disorders. Moreover, in their presence, the observed cumulative incidence rates of combinations of cardiovascular disorders, that is, multimorbidity differs significantly from the expected rates

Adoption

Yes (number of providers i.e. PHC participating) : 90 general practice registries in the Netherlands

Implementation

Not specified

Maintenance

Not specified (Unclear)

Key Conclusions

There are clear synergies between health risks and chronic diseases when multimorbidity within a patient progresses over time. The method used here supports a more comprehensive analysis of such synergies compared with what can be obtained by traditional statistics; The urbanization level of a general practice is associated with the cumulative incidence of chronic cardiovascular conditions, in particular those with a high prevalence, that is, obesity, hypertension, dyslipidemia, diabetes mellitus, and ischemic heart disease. The overall multimorbidity rate of chronic cardiovascular (related) disorders rapidly increases when multimorbidity is already present at baseline. When multimorbidity progresses over time, certain disease combinations develop more quickly than what can be expected from individual disease progression. This synergistic effect happens particularly in the presence of hypertension and dyslipidemia.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	+	+

Color Code

Low

Unclear

High

Paper 45

Paper Title: Patient-tailored prioritization for a pediatric care decision support system through machine learning

Authors or developers	Klann, J. G. Anand, V. Downs, S. M.
Year of Publication	2013
Full reference of the study	Klann, J. G., Anand, V., & Downs, S. M. (2013). Patient-tailored prioritization for a pediatric care decision support system through machine learning. Journal of the American Medical Informatics Association, 20(e2), e267-e274.
Abstract	OBJECTIVE: Over eight years, we have developed an innovative computer decision support system that improves appropriate delivery of pediatric screening and care. This system employs a guideline evaluation engine using data from the electronic health record (EHR) and input from patients and caregivers. Because guideline recommendations typically exceed the scope of one visit, the engine uses a static prioritization scheme to select recommendations. Here we extend an earlier idea to create patient-tailored prioritization. MATERIALS AND METHODS: We used Bayesian structure learning to build networks of association among previously collected data from our decision support system. Using area under the receiver-operating characteristic curve (AUC) as a measure of discriminability (a sine qua non for expected value calculations needed for prioritization), we performed a structural analysis of variables with high AUC on a test set. Our source data included 177 variables for 29,402 patients. RESULTS: The method produced a network model containing 78 screening questions and anticipatory guidance (107 variables total). Average AUC was 0.65, which is sufficient for prioritization depending on factors such as population prevalence. Structure analysis of seven highly predictive variables reveals both face-validity (related nodes are connected) and non-intuitive relationships. DISCUSSION: We demonstrate the ability of a Bayesian structure learning method to ‘phenotype the population’ seen in our primary care pediatric clinics. The resulting network can be used to produce patient-tailored posterior probabilities that can be used to prioritize content based on the patient’s current circumstances. CONCLUSIONS: This study demonstrates the feasibility of EHR-driven population phenotyping for patient-tailored prioritization of pediatric preventive care services.
Country of Research	USA
Design of Study	Unclear, Not specified but could be a cohort study since electronic health records were used
Duration of Study	6 years (2005-2011)
Name of Condition	Asthma
Artificial Intelligence Technique Used	Bayesian structure learning
Provider’s involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	AUC: 0.65, SE: 0.020
Patient-related Outcomes Assessed	Not specified
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Electronic health record (EHR) and input from patients and caregivers
Implementation	No, implemented on electronic medical record.
Maintenance	Not specified (Unclear)
Key Conclusions	The study defines the ability of a Bayesian structure learning method to phenotype the population seen in primary care pediatric clinics.
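
The AUC reported for Paper 45 can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and example values are hypothetical, and the study's own networks and evaluation code are not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented posterior probabilities and observed answers for four patients.
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

An AUC of 0.5 corresponds to chance-level ranking, so the reported average of 0.65 indicates modest discriminability.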

Paper 46

Paper Title: Automatic generation of case-detection algorithms to identify children with asthma from large electronic health record databases

Authors or developers

Afzal, Z.
Engelkes, M.
Verhamme, K. M.
Janssens, H. M.
Sturkenboom, M. C.
Kors, J. A.
Schuemie, M. J.

Year of Publication

2013

Full reference of the study

Afzal, Zubair, et al. “Automatic generation of case‐detection algorithms to identify children with asthma from large electronic health record databases.” Pharmacoepidemiology and drug safety 22.8 (2013): 826-833.

Abstract

PURPOSE: Most electronic health record databases contain unstructured free-text narratives, which cannot be easily analyzed. Case-detection algorithms are usually created manually and often rely only on using coded information such as International Classification of Diseases version 9 codes. We applied a machine-learning approach to generate and evaluate an automated case-detection algorithm that uses both free-text and coded information to identify asthma cases. METHODS: The Integrated Primary Care Information (IPCI) database was searched for potential asthma patients aged 5-18 years using a broad query on asthma-related codes, drugs, and free text. A training set of 5,032 patients was created by manually annotating the potential patients as definite, probable, or doubtful asthma cases or non-asthma cases. The rule-learning program RIPPER was then used to generate algorithms to distinguish cases from non-cases. An over-sampling method was used to balance the performance of the automated algorithm to meet our study requirements. Performance of the automated algorithm was evaluated against the manually annotated set. RESULTS: The selected algorithm yielded a positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 for the scenario identifying definite, probable, and doubtful asthma cases. CONCLUSIONS: The automated algorithm shows good performance in detecting cases of asthma utilizing both free-text and coded data. This algorithm will facilitate large-scale studies of asthma in the IPCI database.

Country of Research

The Netherlands

Design of Study

Cohort study

Duration of Study

12 years (1 January 2000 to 31 January 2012)

Name of Condition

Asthma

Artificial Intelligence Technique Used

Rule-learning algorithm RIPPER

Provider’s involvement in

Developing : NS, Testing : The rule-learning algorithm RIPPER was used on the training set to automatically generate rules for each of the asthma case definitions. The RIPPER algorithm produces an ordered set of decision rules. The advantages of such machine-learning algorithms are their ability to produce output that is understandable by humans, their ease of use, and their applicability to a wide range of problems, Validating : NS

Accuracy of the AI Intervention

Positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 for the scenario identifying definite, probable, and doubtful asthma cases.
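
The gap between the high sensitivity and specificity and the more modest PPV in these figures is what one would expect when true cases make up a small share of the screened set. The following is a minimal Python sketch, not taken from the paper, that applies Bayes' rule to the reported (rounded) values to back out the case prevalence each scenario would imply. The function names are hypothetical, and the resulting prevalences are rough derived estimates, not figures reported by the study.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(ppv_value, sensitivity, specificity):
    """Invert the PPV formula to recover the prevalence it implies."""
    numerator = ppv_value * (1.0 - specificity)
    denominator = sensitivity * (1.0 - ppv_value) + numerator
    return numerator / denominator


# (PPV, sensitivity, specificity) as listed in the record above.
reported = {
    "definite": (0.66, 0.98, 0.95),
    "definite + probable": (0.82, 0.96, 0.90),
    "definite + probable + doubtful": (0.57, 0.95, 0.67),
}

for label, (p, se, sp) in reported.items():
    prev = implied_prevalence(p, se, sp)
    print(f"{label}: implied prevalence of about {prev:.2f}")
```

Because the inputs are rounded to two decimals, the implied prevalences should be read as order-of-magnitude checks only.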

Patient-related Outcomes Assessed

The selected algorithm yielded a positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 for the scenario identifying definite, probable, and doubtful asthma cases.

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

An automated case detection algorithm can reduce the workload of manual annotation and allow large-scale epidemiological studies

Reached Target Population?

Yes : NS

Adoption

Yes (number of providers i.e. PHC participating) : Integrated Primary Care Information (IPCI) database

Implementation

The Integrated Primary Care Information (IPCI) database was searched for potential asthma patients aged 5–18 years using a broad query on asthma-related codes, drugs, and free text. A training set of 5,032 patients was created by manually annotating the potential patients as definite, probable, or doubtful asthma cases or non-asthma cases. The rule-learning program RIPPER was then used to generate algorithms to distinguish cases from non-cases. An over-sampling method was used to balance the performance of the automated algorithm to meet our study requirements. Performance of the automated algorithm was evaluated against the manually annotated set.

Maintenance

Not specified

Key Conclusions

The automated algorithm shows good performance in detecting cases of asthma utilizing both free-text and coded data. This algorithm will facilitate large-scale studies of asthma in the IPCI database.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	–	+

Color Code

Low

Unclear

High

Paper 47

Paper Title: The use of data-mining to identify indicators of health-related quality of life in patients with irritable bowel syndrome

Authors or developers	Penny, K. I.; Smith, G. D.
Year of Publication	2012
Full reference of the study	Penny, Kay I., and Graeme D. Smith. “The use of data‐mining to identify indicators of health‐related quality of life in patients with irritable bowel syndrome.” Journal of Clinical Nursing 21.19-20 (2012): 2761-2771.
Abstract	AIM: To examine the health-related quality of life in a cohort of individuals with irritable bowel syndrome and to explore the use of several data-mining methods to identify which socio-demographic and irritable bowel syndrome symptoms are most highly associated with impaired health-related quality of life. BACKGROUND: Health-related quality of life can be adversely affected by irritable bowel syndrome. Little is presently known about the predictive factors that may influence the quality of life in these patients. DESIGN: Cross-sectional survey design involving the general population of the UK. METHODS: Individuals with symptoms of irritable bowel syndrome were recruited to a longitudinal cohort survey via a UK-wide newspaper advert. Health-related quality of life was measured using a battery of validated questionnaires. Several data-mining models to determine which factors are associated with impaired health-related quality of life are considered in this study and include logistic regression, a classification tree and artificial neural networks. RESULTS: As well as irritable bowel syndrome symptom severity, results indicate that psychological morbidity and socio-demographic factors such as marital status and employment status also have a major influence on health-related quality of life in irritable bowel syndrome. CONCLUSION: Health-related quality of life is impaired in community-based individuals in the UK with irritable bowel syndrome. Although not always as easily interpreted as logistic regression, data-mining techniques indicate subsets of factors that are highly associated with impaired quality of life. These models tend to include subsets of irritable bowel syndrome symptoms and psychosocial factors. RELEVANCE TO CLINICAL PRACTICE: Identification of the role of psychological and socio-demographic factors on health-related quality of life may provide more insight into the nature of irritable bowel syndrome. Greater understanding of these factors will facilitate more flexible and efficient nursing assessment and management of this patient group.
Country of Research	UK
Design of Study	Cohort study, Unclear, Prospective study
Duration of Study	6 months
Name of Condition	Irritable Bowel Syndrome (measurement of health-related quality of life, HRQoL)
Artificial Intelligence Technique Used	Logistic regression, a classification tree (CT) and three different artificial neural networks (ANNs)
Providers’ involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	Classification accuracy. Pain: logistic regression 76.9%, CT 73.9%, ANN (2 layers, alpha 0.9) 66%, ANN (2 layers, alpha 0.7) 67.1%, ANN (3 layers, alpha value not given) 66.9%. Anxiety/depression: logistic regression 59.1%, CT 66%, ANN (2 layers, alpha 0.9) 58.9%, ANN (2 layers, alpha 0.7) 56.2%, ANN (3 layers, alpha value not given) 60.8%. Sensitivity, specificity, PPV and NPV are also reported.
Patient-related Outcomes Assessed	Health-related quality of life can be adversely affected by irritable bowel syndrome (IBS). The aims of this study were to examine the health-related quality of life in a cohort of individuals with IBS and to determine which socio-demographic and IBS symptoms are independently associated with reduced health related quality of life.
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	Several factors associated with impaired health-related quality of life were identified; these include psychological morbidity, marital status and employment status.
Reached Target Population?	Yes : The study aimed to examine the health-related quality of life in a cohort of individuals with IBS and to determine which socio-demographic and IBS symptoms are independently associated with reduced health related quality of life.
Adoption	Yes (number of providers i.e. PHC participating) : A research nurse screened individuals
Implementation	Several data-mining models to determine which factors are associated with impaired health-related quality of life are considered in this study and include logistic regression, a classification tree and artificial neural networks
Maintenance	No
Key Conclusions	The study identified several factors associated with impaired health-related quality of life; these include psychological morbidity, marital status and employment status. Understanding the health-related problems of patients with irritable bowel syndrome enables health professionals to gain increased insight into the nature of the condition, and may help them plan appropriate interventions to enhance the well-being of patients. Further study of these factors and of how health-related quality of life changes over time in a community-based population is merited.

Paper 48

Paper Title: Using Medical Text Extraction, Reasoning and Mapping System (MTERMS) to process medication information in outpatient clinical notes

Authors or developers	Zhou, L.; Plasek, J. M.; Mahoney, L. M.; Karipineni, N.; Chang, F.; Yan, X.; Chang, F.; Dimaggio, D.; Goldman, D. S.; Rocha, R. A.
Year of Publication	2011
Full reference of the study	Zhou, L., Plasek, J. M., Mahoney, L. M., Karipineni, N., Chang, F., Yan, X., … & Rocha, R. A. (2011). Using Medical Text Extraction, Reasoning and Mapping System (MTERMS) to process medication information in outpatient clinical notes. In AMIA Annual Symposium Proceedings (Vol. 2011, p. 1639). American Medical Informatics Association.
Abstract	Clinical information is often coded using different terminologies, and therefore is not interoperable. Our goal is to develop a general natural language processing (NLP) system, called Medical Text Extraction, Reasoning and Mapping System (MTERMS), which encodes clinical text using different terminologies and simultaneously establishes dynamic mappings between them. MTERMS applies a modular, pipeline approach flowing from a preprocessor, semantic tagger, terminology mapper, context analyzer, and parser to structure inputted clinical notes. Evaluators manually reviewed 30 free-text and 10 structured outpatient clinical notes compared to MTERMS output. MTERMS achieved an overall F-measure of 90.6 and 94.0 for free-text and structured notes respectively for medication and temporal information. The local medication terminology had 83.0% coverage compared to RxNorm’s 98.0% coverage for free-text notes. 61.6% of mappings between the terminologies are exact match. Capture of duration was significantly improved (91.7% vs. 52.5%) from systems in the third i2b2 challenge.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	2 years (2009-2010)
Name of Condition	Medical Text Extraction, Reasoning and Mapping
Artificial Intelligence Technique Used	Natural language processing system, Medical Text Extraction, Reasoning and Mapping System (MTERMS). Comment : Overall, there were 1,108 free-text note terms from 30 charts and 1,035 structured note terms from 10 charts for a combination of findings types
Providers’ involvement in	Developing : NS, Testing : NS, Validating : a physician (NK) and a clinical informatician, Doctor of Pharmacy candidate (DD)
Accuracy of the AI Intervention	MTERMS achieved an overall F-measure of 90.6 and 94.0 for free-text and structured notes respectively
Patient-related Outcomes Assessed	NS
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	The gap in knowledge addressed is an automated approach to mapping terminologies that can be used to extract clinical concepts from notes to increase the interoperability and utility of clinical information.
Reached Target Population?	Not specified
Adoption	No : free-text outpatient clinical notes created mainly by patients’ primary care physicians
Implementation	Yes
Maintenance	Not specified (Unclear)
Key Conclusions	The study developed a general natural language processing (NLP) system, called Medical Text Extraction, Reasoning and Mapping System (MTERMS), which encodes clinical text using different terminologies and simultaneously establishes dynamic mappings between them. MTERMS applies a modular, pipeline approach flowing from a preprocessor, semantic tagger, terminology mapper, context analyzer, and parser to structure inputted clinical notes. A further finding was that combining automated NLP methodologies for processing clinical notes with a terminology mapper and a temporal reasoning system can be used to extract, encode and reason about clinically relevant information. The gap in knowledge addressed is an automated approach to mapping terminologies that can be used to extract clinical concepts from notes to increase the interoperability and utility of clinical information.

Paper 49

Paper Title: Application of artificial neural networks to a study of nursing burnout

Authors or developers	Ladstatter, F.; Garrosa, E.; Badea, C.; Moreno, B.
Year of Publication	2010
Full reference of the study	Ladstätter, Felix, et al. “Application of artificial neural networks to a study of nursing burnout.” Ergonomics 53.9 (2010): 1085-1096.
Abstract	Nursing is generally considered to be a profession with high levels of emotional and physical stress that tend to increase. These high stress levels lead to a high risk of burnout. The objective was to assess whether artificial neural network (ANN) paradigms offer greater predictive accuracy than statistical methodologies, which are commonly used in the field of burnout. A radial basis function (RBF) network and hierarchical stepwise regression was used to assess burnout. The comparison of the two methodologies was carried out by analysing a sample of 462 nurses and student nurses. The subjects were from three hospitals in Madrid (Spain), who completed the ‘Nursing Burnout Scale’ survey. A RBF network was better suited for the analysis of burnout than hierarchical stepwise regression. The outcomes indicate furthermore that the relationship with the burnout process of the predictive variables age, job status, workload, experience with pain and death, conflictive interaction, role ambiguity and hardy personality is not entirely linear. The usage of ANNs in the field of burnout has been justified due to their superior ability to capture non-linear relationships, which is relevant for theory development. STATEMENT OF RELEVANCE: Due to the superior ability to capture non-linear relationships, ANNs are better suited to explain and predict burnout and its subdimensions than common statistical methods. From this perspective, more specific programmes to prevent burnout and its consequences in the workplace can be designed.
Country of Research	Spain
Design of Study	Unclear
Duration of Study	Not Specified
Name of Condition	Nursing burnout
Artificial Intelligence Technique Used	Artificial neural networks, radial basis function (RBF) network and hierarchical stepwise regression
Providers’ involvement in	Developing : Together with other disciplines such as rule-based expert systems, fuzzy expert systems, evolutionary computation and hybrid intelligent systems, artificial neural networks (ANNs) fall within the field of artificial intelligence (AI). Most of the literature on AI, and particularly on ANNs, is written in computer science terminology and packed with complex matrix algebra and differential equations. This obviously lends AI and ANNs an aura of respectability and is probably the reason why intelligent systems have been, until recently, scarcely applied by non-computer scientists even though most ideas behind intelligent systems are amazingly simple and straightforward and there are numerous user-friendly software packages.
Accuracy of the AI Intervention	R-squared values. Hierarchical stepwise regression: modelling 0.51, 0.42, 0.46, 0.46; validation 0.41, 0.39, 0.37, 0.39. Radial basis function network: training 0.68, 0.61, 0.59, 0.63; validation 0.50, 0.38, 0.48, 0.45
Patient-related Outcomes Assessed	Not Specified
Primary Healthcare Worker Related Outcomes Assessed	Burnout model data (mean, SD, IQR): Workload 2.71, 0.47, 0.64; Experience with pain 3.05, 0.48, 0.62; Conflictive interaction 2.53, 0.46, 0.55; Role ambiguity 2.32, 0.50, 0.66; Emotional exhaustion 2.29, 0.57, 0.72; Depersonalisation 1.73, 0.47, 0.57; Lack of personal accomplishment 1.72, 0.47, 0.66; Hardy personality 3.03, 0.33, 0.52
Healthcare System-related Outcomes Assessed	Not Specified
Reached Target Population?	Yes : Nursing is generally considered to be a profession with high levels of emotional and physical stress that tend to increase. These high stress levels lead to a high risk of burnout. The objective was to assess whether artificial neural network (ANN) paradigms offer greater predictive accuracy than statistical methodologies, which are commonly used in the field of burnout.
Adoption	Yes (number of providers i.e. PHC participating) : 462 nurses from three Spanish hospitals
Implementation	Not Specified
Maintenance	No
Key Conclusions	Due to their superior ability to capture non-linear relationships, artificial neural networks (ANNs) are better suited to explain and predict burnout and its subdimensions than common statistical methods. From this perspective, more specific programmes to prevent burnout and its consequences in the workplace can be designed. Further, a radial basis function (RBF) network and hierarchical stepwise regression were used to assess burnout. The comparison of the two methodologies was carried out by analysing a sample of 462 nurses and student nurses. The subjects were from three hospitals in Madrid (Spain), who completed the ‘Nursing Burnout Scale’ survey. An RBF network was better suited for the analysis of burnout than hierarchical stepwise regression. The outcomes indicate furthermore that the relationship with the burnout process of the predictive variables age, job status, workload, experience with pain and death, conflictive interaction, role ambiguity and hardy personality is not entirely linear. The usage of ANNs in the field of burnout has been justified due to their superior ability to capture non-linear relationships, which is relevant for theory development.

Paper 50

Paper Title: Data mining of tuberculosis patient data using multiple correspondence analysis

Authors or developers	Rennie, T. W.; Roberts, W.
Year of Publication	2009
Full reference of the study	Rennie, T., & Roberts, W. (2009). Data mining of tuberculosis patient data using multiple correspondence analysis. Epidemiology and Infection, 137(12), 1699-1704. doi:10.1017/S0950268809002787
Abstract	The aim of this study was to demonstrate the epidemiological use of multiple correspondence analysis (MCA), as applied to tuberculosis (TB) data from North East London. Data for TB notifications in North East London primary care trusts (PCTs) between the years 2002 and 2007 were used. TB notification data were entered for MCA allowing display of graphical data output (n=4947); MCA analyses were performed on the whole dataset, by PCT, and by year of notification. Graphical MCA output displayed variance of data categories; clustering of variable categories in MCA output signified association. Clustering patterns in MCA output demonstrated different associations by year of notification, within PCTs and between PCTs. MCA is a useful technique for displaying association of variable categories used in TB epidemiology. Results suggest that MCA could be a useful tool in informing commissioning of TB services.
Country of Research	UK
Design of Study	Cohort study
Duration of Study	5 years, 2002-2007
Name of Condition	Tuberculosis
Artificial Intelligence Technique Used	multiple correspondence analysis (MCA)
Providers’ involvement in	Testing
Accuracy of the AI Intervention	Not Specified
Patient-related Outcomes Assessed	Not Specified
Primary Healthcare Worker Related Outcomes Assessed	Not Specified
Healthcare System-related Outcomes Assessed	Not Specified
Reached Target Population?	Yes : Yes
Adoption	Yes (number of providers i.e. PHC participating) : TB notifications in North East London primary care trusts
Implementation	Yes
Maintenance	Not specified
Key Conclusions	In this cohort of patients, male gender was slightly more common and the three most common ethnicities were Black African, Indian Asian, and Pakistani Asian; only 18.3% of patients were born in the UK. A minority of patients (11.7%) had their consumption of treatment supervised by directly observed treatment (DOT) and over a third of patients were hospitalized. For three variables in particular, the analytical technique provides an iterative procedure which estimates the noise variance and the parameters that capture the longitudinal (temporal) correlation structure, and thus allows analysis of multiple datasets that can contain different data types. The developed tool can be used as an epidemiological method to inform commissioning priorities in healthcare such as tuberculosis service provision. Whilst users should be aware of the limitations, multiple correspondence analysis (MCA) is an efficient technique that effectively produces a data map displaying association.

Paper 51

Paper Title: Agreement between patient-reported symptoms and their documentation in the medical record

Authors or developers	Pakhomov, S. V.; Jacobsen, S. J.; Chute, C. G.; Roger, V. L.
Year of Publication	2008
Full reference of the study	Pakhomov, Serguei V et al. “Agreement between patient-reported symptoms and their documentation in the medical record.” The American journal of managed care vol. 14,8 (2008): 530-9.
Abstract	OBJECTIVES: To determine the agreement between patient-reported symptoms of chest pain, dyspnea, and cough and the documentation of these symptoms by physicians in the electronic medical record. METHODS: Symptoms reported on patient-provided information forms between January 1, 2006, and June 30, 2006, were compared with those identified by natural language processing of the text of clinical notes from care providers. Terms that represent the three symptoms were used to search clinical notes electronically with subsequent manual identification of the context (e.g. affirmative, negated, family history) in which they occurred. Results were reported using positive and negative agreement, and kappa statistics. RESULTS: Symptoms reported by 1119 patients age 18 years or older were compared with the non-negated terms identified in their clinical notes. Positive agreement was 74, 70, and 63 for chest pain, dyspnea, and cough, while negative agreement was 78, 76, and 75, respectively. Kappa statistics were 0.52 (95% confidence interval [CI] = 0.44, 0.60) for chest pain, 0.46 (95% CI = 0.37, 0.54) for dyspnea, and 0.38 (95% CI = 0.28, 0.48) for cough. Positive agreement was higher for older men (P >.05), and negative agreement was higher for younger women (P >.05). CONCLUSIONS: We found discordance between patient self-report and documentation of symptoms in the medical record. This discordance has important implications for research studies that rely on symptom information for patient identification and may have clinical implications that must be evaluated for potential impact on quality of care, patient safety, and outcomes.
Country of Research	USA
Design of Study	Cohort study
Duration of Study	6 months, January 1, 2006 and June 30, 2006
Name of Condition	Angina pectoris, Chest pain, dyspnea and cough
Artificial Intelligence Technique Used	Natural language processing (NLP)
Providers’ involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	Chest pain: sensitivity: 88 (84-90), specificity: 92 (91-94) Dyspnea: sensitivity: 84 (80-87), specificity: 93 (91-94) Cough: sensitivity: 83 (78-88), specificity: 91 (90-93)
Patient-related Outcomes Assessed	The study found discordance between patient self report and documentation of symptoms in the medical record. This has important implications for research studies that rely on symptom information for patient identification and may have clinical implications that must be evaluated for potential impact on quality of care, patient safety and outcomes.
Primary Healthcare Worker Related Outcomes Assessed	The current study is the first step towards understanding the implications of symptom documentation practices for both primary and secondary uses of the EMR.
Healthcare System-related Outcomes Assessed	Not Specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : clinical notes in this study sample represent a variety of clinical specialties including primary care (20%)
Implementation	Yes
Maintenance	Not specified
Key Conclusions	The study found discordance between patient self report and documentation of symptoms in the medical record. This has important implications for research studies that rely on symptom information for patient identification and may have clinical implications that must be evaluated for potential impact on quality of care, patient safety and outcomes.
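
The positive agreement, negative agreement and kappa statistics cited in the Paper 51 abstract can all be computed from a single 2x2 table that crosses patient report against the clinical note. The Python sketch below shows the standard formulas. The function name and the cell counts are hypothetical placeholders chosen for illustration, since this record does not give the study's underlying table.

```python
def agreement_stats(both_yes, patient_only, record_only, both_no):
    """Cohen's kappa plus positive/negative agreement for a 2x2 table."""
    a, b, c, d = both_yes, patient_only, record_only, both_no
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the row and column marginals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return {
        "kappa": (observed - expected) / (1.0 - expected),
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
    }


# Hypothetical counts, NOT data from the study.
stats = agreement_stats(both_yes=40, patient_only=10, record_only=10, both_no=40)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Kappa corrects the raw agreement for what chance alone would produce, which is why it can sit well below the positive and negative agreement figures.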

Paper 52

Paper Title: An expert system for headache diagnosis: the Computerized Headache Assessment tool (CHAT)

Authors or developers

Maizels, M.
Wolfe, W. J.

Year of Publication

2008

Full reference of the study

Maizels M, Wolfe WJ. An expert system for headache diagnosis: the Computerized Headache Assessment tool (CHAT). Headache. 2008;48(1):72‐78. doi:10.1111/j.1526-4610.2007.00918.x

Abstract

BACKGROUND: Migraine is a highly prevalent chronic disorder associated with significant morbidity. Chronic daily headache syndromes, while less common, are less likely to be recognized, and impair quality of life to an even greater extent than episodic migraine. A variety of screening and diagnostic tools for migraine have been proposed and studied. Few investigators have developed and evaluated computerized programs to diagnose headaches. OBJECTIVES: To develop and determine the accuracy and utility of a computerized headache assessment tool (CHAT). CHAT was designed to identify all of the major primary headache disorders, distinguish daily from episodic types, and recognize medication overuse. METHODS: CHAT was developed using an expert systems approach to headache diagnosis, with initial branch points determined by headache frequency and duration. Appropriate clinical criteria are presented relevant to brief and longer-lasting headaches. CHAT was posted on a web site using Microsoft active server pages and a SQL-server database server. A convenience sample of patients who presented to the adult urgent care department with a headache, and patients in a family practice waiting room, were solicited to participate. Those who completed the on-line questionnaire were contacted for a diagnostic interview. RESULTS: One hundred and thirty-five patients completed CHAT and 117 completed a diagnostic interview. CHAT correctly identified 35/35 (100%) patients with episodic migraines and 42/49 (85.7%) of patients with transformed migraines. CHAT also correctly identified 11/11 patients with chronic tension-type headaches, 2/2 with episodic tension-type headaches, and 1/1 with an episodic cluster headache. Medication overuse was correctly recognized in 43/52 (82.7%). The most common misdiagnoses by CHAT were seen in patients with a transformed migraine or a new daily persistent headache. 
Fifty patients were referred to their primary care physician and 62 to the headache clinic. Of 29 patients referred to the PCP with a confirmed diagnosis of a migraine, 25 made a follow-up appointment, the PCP diagnosed migraine in 19, and initiated migraine-specific therapy or prophylaxis in 17. CONCLUSION: The described expert system displays high diagnostic accuracy for migraines and other primary headache disorders, including daily headache syndromes and medication overuse. As part of a disease management program, CHAT led to patients receiving appropriate diagnoses and therapy. Limitations of the system include patient willingness to utilize the program, introducing such a process into the culture of medical care, and the difficult distinction of transformed migraines.

Design of Study

Cohort study

Duration of Study

Not mentioned

Name of Condition

Migraine and other primary headache disorders

Artificial Intelligence Technique Used

Computerized headache assessment tool (CHAT, multilevel Bayesian networks)

Providers’ involvement in

Developing : Computerized headache assessment tool (CHAT). CHAT was designed to identify all of the major primary headache disorders, distinguish daily from episodic types, and recognize medication overuse, Testing : NS, Validating : NS

Accuracy of the AI Intervention

The diagnostic accuracy for all headache diagnoses, including unclassifiable (but not including medication overuse) was 104/117 (88.9%).
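
As a quick arithmetic check, the short Python sketch below recomputes the percentages quoted for CHAT from the counts given in the abstract and in the line above. The dictionary labels and the helper name are illustrative only.

```python
# (correct, total) counts as quoted in the abstract and accuracy field.
counts = {
    "episodic migraine": (35, 35),
    "transformed migraine": (42, 49),
    "medication overuse": (43, 52),
    "all headache diagnoses": (104, 117),
}


def percent_correct(correct, total):
    """Share of correctly identified cases, as a percentage to one decimal."""
    return round(100.0 * correct / total, 1)


for label, (correct, total) in counts.items():
    print(f"{label}: {correct}/{total} = {percent_correct(correct, total)}%")
```

The recomputed values match the quoted 100%, 85.7%, 82.7% and 88.9%.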

Patient-related Outcomes Assessed

The described expert system displays high diagnostic accuracy for migraines and other primary headache disorders, including daily headache syndromes and medication overuse. As part of a disease management program, CHAT led to patients receiving appropriate diagnoses and therapy.

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Patient data in primary care registries

Implementation

Yes

Maintenance

Not specified

Key Conclusions

An internet-based headache assessment tool has demonstrated a high degree of accuracy in recognizing primary headache disorders, distinguishing chronic daily from episodic headaches, and recognizing medication overuse. No other computer-assisted headache diagnostic program described in the medical literature has a similar scope of diagnosis, or demonstrated accuracy. The major challenge of computer-assisted headache diagnosis at present is not in developing better programs, but in facilitating patients to use such programs, and encouraging their use in primary care settings. Further development might integrate online headache assessment with education about headache treatment, identification of headache triggers, and innovative online behavioral modification programs.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	+	–

Color Code

Low

Unclear

High

Paper 53

Paper Title: The K-nearest neighbor algorithm predicted rehabilitation potential better than current Clinical Assessment Protocol

Authors or developers

Zhu, M.
Chen, W.
Hirdes, J. P.
Stolee, P.

Year of Publication

2007

Full reference of the study

Zhu M, Chen W, Hirdes JP, Stolee P. The K-nearest neighbor algorithm predicted rehabilitation potential better than current Clinical Assessment Protocol. J Clin Epidemiol. 2007;60(10):1015‐1021. doi:10.1016/j.jclinepi.2007.06.001

Abstract

OBJECTIVE: There may be great potential for using computer-modeling techniques and machine-learning algorithms in clinical decision making, if these can be shown to produce results superior to clinical protocols currently in use. We aim to explore the potential to use an automatic, data-driven, machine-learning algorithm in clinical decision making. STUDY DESIGN AND SETTING: Using a database containing comprehensive health assessment information (the interRAI-HC) on home care clients (N=24,724) from eight community-care regions in Ontario, Canada, we compare the performance of the K-nearest neighbor (KNN) algorithm and a Clinical Assessment Protocol (the “ADLCAP”) currently used to predict rehabilitation potential. For our purposes, we define a patient as having rehabilitation potential if the patient had functional improvement or remained at home over a follow-up period of approximately 1 year. RESULTS: The KNN algorithm has a lower false positive rate in all but one of the eight regions in the sample, and lower false negative rates in all regions. Compared to using likelihood ratio statistics, KNN is uniformly more informative than the ADLCAP. CONCLUSION: This article illustrates the potential for a machine-learning algorithm to enhance clinical decision making.

Country of Research

Canada

Design of Study

Unclear, Comparative study

Duration of Study

Name of Condition

NS, Potential to use an automatic, data-driven, machine-learning algorithm in clinical decision making

Artificial Intelligence Technique Used

The K-nearest neighbor (KNN) algorithm and a Clinical Assessment Protocol (the “ADLCAP”)

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

False positive: ADLCAP: 0.334, KNN: 0.277, false negative: ADLCAP: 0.629, KNN: 0.415
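
The abstract compares the two methods using likelihood ratio statistics. As a rough illustration, the Python sketch below derives positive and negative likelihood ratios from the false positive and false negative rates listed above. These are back-of-envelope values computed from the rates in this record, not the figures reported in the paper, and the function name is hypothetical.

```python
def likelihood_ratios(false_positive_rate, false_negative_rate):
    """Return (LR+, LR-) from false positive and false negative rates."""
    sensitivity = 1.0 - false_negative_rate
    specificity = 1.0 - false_positive_rate
    lr_positive = sensitivity / false_positive_rate
    lr_negative = false_negative_rate / specificity
    return lr_positive, lr_negative


# (false positive rate, false negative rate) as listed in the record.
rates = {"ADLCAP": (0.334, 0.629), "KNN": (0.277, 0.415)}

for method, (fpr, fnr) in rates.items():
    lr_pos, lr_neg = likelihood_ratios(fpr, fnr)
    print(f"{method}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

A larger LR+ and a smaller LR- both indicate a more informative test, so on these derived values KNN comes out ahead on both counts, in line with the abstract's conclusion.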

Patient-related Outcomes Assessed

Primary Healthcare Worker Related Outcomes Assessed

The work is relevant to all those working to make better use of standardized health information in clinical decision making and service planning.

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes : NS

Adoption

Yes (number of providers i.e. PHC participating) : Eight community-care regions in Ontario, Canada

Implementation

The study illustrates the potential for a machine-learning algorithm to enhance clinical decision making. As use of computerized health information systems, such as those based on the interRAI instruments, becomes more widespread, there is great potential to use these algorithms to direct therapy and services at those patients most likely to benefit.

Maintenance

Not specified (Unclear)

Key Conclusions

The work illustrates the potential for a machine-learning algorithm to enhance clinical decision making.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	?	+

Color Code

Low

Unclear

High

Paper 54

Paper Title: Neural networks for longitudinal studies in Alzheimer's disease

Authors or developers

Tandon, R.
Adak, S.
Kaye, J. A.

Year of Publication

2006

Full reference of the study

Tandon, R., Adak, S., & Kaye, J. A. (2006). Neural networks for longitudinal studies in Alzheimer’s disease. Artificial Intelligence in Medicine, 36(3), 245-255.

Abstract

OBJECTIVE: Alzheimer’s disease affects a growing population of elderly people today. The predictions about the course of the disease is a key component of health care decision making for patients with Alzheimer’s. The physician’s prognosis and predicted trajectory of cognitive decline often form the basis of treatment and health care decisions taken by patients and their families. These predictions are difficult to make because of the high variability and non-linearity exhibited by individual patterns of cognitive decline. This paper presents a new method of predicting the course of a disease using longitudinal data collected through multiple clinic visits. Longitudinal databases are similar to temporal databases, with some important differences–data is collected at irregular time intervals that are patient-specific and also a varying number of observations are made for each patient, depending upon the number of times the patient visited the clinic. We propose a new type of neural network called the mixed effects neural network (MENN) model that can incorporate this type of longitudinal information. MATERIAL AND METHODS: We have used longitudinal data on 704 subjects enrolled at the Layton aging and research center (LAARC) at Oregon Health and Science University. A back-propagation algorithm, modified for longitudinal data is used to obtain the weight parameters of the MENN. The modified back-propagation algorithm is further embedded in an iterative procedure that estimates the noise variance and the parameters that capture the longitudinal (temporal) correlation structure. RESULTS: We have compared the performance of the MENN with linear mixed effects models and standard neural networks (NN). MENN show better performance (misclassification rate = 0.13 and relative MSE = 0.35) as compared to standard NN (misclassification rate = 0.34 and relative MSE = 2.74) and linear mixed effects models (misclassification rate = 0.14 and relative MSE = 0.4). 
CONCLUSION: The results show that this method can be a useful tool for predicting non-linear disease trajectories and uncovering significant prognostic factors in longitudinal databases.

Country of Research

India

Design of Study

Cohort study

Duration of Study

14 years (1988-2002)

Name of Condition

Alzheimer’s disease

Artificial Intelligence Technique Used

Mixed effects neural network (MENN), compared with linear mixed effects models and a standard neural network

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Mean (SD), with relative mean squared error reported as the accuracy measure

Measure	Linear mixed effects	Standard neural network	Mixed effects neural network
R2	0.94 (0.03)	0.81 (0.07)	0.95 (0.02)
MSE	0.2 (0.16)	7.06 (7.33)	0.37 (0.34)
Relative MSE	0.4 (0.11)	2.74 (0.77)	0.35 (0.11)
Misclassification rate	0.14 (0.04)	0.34 (0.07)	0.13 (0.04)

Patient-related Outcomes Assessed

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Physician’s prognosis for Alzheimer disease, No

Implementation

Yes

Maintenance

Not specified (Unclear)

Key Conclusions

The results show that this method can be a useful tool for predicting non-linear disease trajectories and uncovering significant prognostic factors in longitudinal databases

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	–	–

Color Code

Low

Unclear

High

Paper 55

Paper Title: Intention to adopt a smoking cessation expert system within a self-selected sample of Dutch general practitioners

Authors or developers	Hoving, C. Mudde, A. N. de Vries, H.
Year of Publication	2006
Full reference of the study	Hoving C, Mudde AN, de Vries H. Intention to adopt a smoking cessation expert system within a self-selected sample of Dutch general practitioners. Eur J Cancer Prev. 2006;15(1):82‐86. doi:10.1097/01.cej.0000186633.81753.8b
Abstract	To investigate intention to adopt a new smoking cessation expert system as well as outline perceived barriers by general practitioners (GPs) to adopt this expert system, a written questionnaire was sent to 771 registered GPs. Respondents, representing 34.8% of the registered GPs, were classified as adopters (34.2%), doubters (36.2%) or non-adopters (29.2%). Adopters and doubters were less negative about the time investment for the GP when adopting the expert system than non-adopters. Adopters expected a more positive reaction from their patients than non-adopters. Smoking cessation was mostly considered to be a task for the practice assistant. The authors discuss the relevance of barriers mentioned not to implement the expert system and give recommendations for further steps into implementing primary prevention activities in Dutch general practice.
Country of Research	Netherlands
Design of Study	Unclear
Duration of Study	8 weeks
Name of Condition	Not specified (intention to adopt a smoking cessation expert system)
Artificial Intelligence Technique Used	Expert system
Providers’ involvement in	Developing : NS, Testing : General physician, Validating : NS
Accuracy of the AI Intervention	Not Specified
Patient-related Outcomes Assessed	Adopters and doubters were less negative about the time investment for the GP when adopting the expert system than non-adopters. Adopters expected a more positive reaction from their patients than non-adopters. Smoking cessation was mostly considered to be a task for the practice assistant
Primary Healthcare Worker Related Outcomes Assessed	Not Specified
Healthcare System-related Outcomes Assessed	The authors discuss the relevance of the barriers mentioned for not implementing the expert system and give recommendations for further steps in implementing primary prevention activities in Dutch general practice.
Reached Target Population?	Yes
Adoption	Not specified
Implementation	Yes
Maintenance	No
Key Conclusions	The GP’s role concerning this expert system is that of an intermediate. As intermediates are the key to reach the target population, it is highly important to have insight into the determinants of adoption of this expert system. This study focused on the relation between GPs’ perceptions and their intention to adopt this expert system. Results indicate that GPs feel they are too busy to adopt the expert system and have other priorities.

Paper 56

Paper Title: Translating research into practice: organizational issues in implementing automated decision support for hypertension in three medical centers

Authors or developers	Goldstein, M. K. Coleman, R. W. Tu, S. W. Shankar, R. D. O’Connor, M. J. Musen, M. A. Martins, S. B. Lavori, P. W. Shlipak, M. G. Oddone, E. Advani, A. A. Gholami, P. Hoffman, B. B.
Year of Publication	2004
Full reference of the study	Goldstein MK, Coleman RW, Tu SW, et al. Translating research into practice: organizational issues in implementing automated decision support for hypertension in three medical centers. J Am Med Inform Assoc. 2004;11(5):368‐376. doi:10.1197/jamia.M1534
Abstract	Information technology can support the implementation of clinical research findings in practice settings. Technology can address the quality gap in health care by providing automated decision support to clinicians that integrates guideline knowledge with electronic patient data to present real-time, patient-specific recommendations. However, technical success in implementing decision support systems may not translate directly into system use by clinicians. Successful technology integration into clinical work settings requires explicit attention to the organizational context. We describe the application of a “sociotechnical” approach to integration of ATHENA DSS, a decision support system for the treatment of hypertension, into geographically dispersed primary care clinics. We applied an iterative technical design in response to organizational input and obtained ongoing endorsements of the project by the organization’s administrative and clinical leadership. Conscious attention to organizational context at the time of development, deployment, and maintenance of the system was associated with extensive clinician use of the system.
Country of Research	USA
Design of Study	Unclear
Duration of Study	15 months
Name of Condition	Hypertension
Artificial Intelligence Technique Used	ATHENA DSS, a decision support system for the treatment of hypertension
Provider’s involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	In order to maintain the system accuracy over time, the system must send alerts to the developers for drug names that do not match a drug already recognized by the system (for example, when a new drug is added to the formulary). Updating the knowledge base must be possible without reinstallation on the clinic computers. The system must monitor each clinic computer for activity. The system must allow clinician-users to provide free-text feedback that can be monitored for early identification of problems.
Patient-related Outcomes Assessed	NS
Primary Healthcare Worker Related Outcomes Assessed	Information technology-based implementation of clinical research findings in practice settings addresses the quality gap in health care by providing automated decision support to clinicians that integrates guideline knowledge with electronic patient data to present real-time, patient-specific recommendations.
Healthcare System-related Outcomes Assessed	A “sociotechnical” approach was applied to the integration of ATHENA DSS, a decision support system for the treatment of hypertension, into geographically dispersed primary care clinics. The authors applied an iterative technical design in response to organizational input and obtained ongoing endorsements of the project from the organization’s administrative and clinical leadership. Conscious attention to organizational context at the time of development, deployment, and maintenance of the system was associated with extensive clinician use of the system.
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 91 primary care clinics
Implementation	Yes
Maintenance	Not specified (Unclear)
Key Conclusions	The authors report that an approach to implementing new information technology that addresses both social-organizational issues and technical informatics issues in an interrelated manner can be applied to cross-platform and cross-institution implementations in other settings.

Paper 57

Paper Title: Using an artificial neural network to predict healing times and risk factors for venous leg ulcers

Authors or developers	Taylor, R. J. Taylor, A. D. Smyth, J. V.
Year of Publication	2002
Full reference of the study	Taylor RJ, Taylor AD, Smyth JV. Using an artificial neural network to predict healing times and risk factors for venous leg ulcers. J Wound Care. 2002;11(3):101‐105. doi:10.12968/jowc.2002.11.3.26381
Abstract	OBJECTIVES: This study aimed to identify the risk factors that influence the healing process of venous leg ulcers treated with compression bandaging, with a view to predicting healing time. METHOD: A retrospective cohort study was performed on data collected prospectively on 325 consecutive patients presenting with 345 venous ulcers at the Salford Primary Care Trust leg ulcer clinic between January 1997 and December 1999. Use of an artificial neural network (ANN) technique accurately predicted the healing times for 68% of the patients. RESULTS: The ANN demonstrated that healing was significantly related to a history of previous leg ulceration, ‘quite wet’ ulcer exudate, high body mass index, large initial total ulcer area, increasing age and male gender. CONCLUSION: The ability to identify at presentation ulcers that might be resistant to standard therapy would allow early consideration of more radical treatments such as hospitalisation, wound debridement or venous surgery.
Country of Research	UK
Design of Study	Cohort study, Unclear (Retrospective cohort)
Duration of Study	3 years (January 1997 to December 1999)
Name of Condition	Venous leg ulcers
Artificial Intelligence Technique Used	Artificial neural network
Provider’s involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	Accuracy: 68% (artificial neural network)
Patient-related Outcomes Assessed	The ANN demonstrated that healing was significantly related to a history of previous leg ulceration, ‘quite wet’ ulcer exudate, high body mass index, large initial total ulcer area, increasing age and male gender.
Primary Healthcare Worker Related Outcomes Assessed	The ability to identify, at presentation, ulcers that might be resistant to standard therapy would allow early consideration of more radical treatments such as hospitalisation, wound debridement or venous surgery.
Healthcare System-related Outcomes Assessed	Retrospective cohort study was performed on data collected prospectively on 325 consecutive patients presenting with 345 venous ulcers at the Salford Primary Care Trust leg ulcer clinic between January 1997 and December 1999. Use of an artificial neural network (ANN) technique accurately predicted the healing times for 68% of the patients
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Not specified. The study, set in a primary care leg ulcer clinic, aimed to identify the risk factors that influence the healing process of venous leg ulcers treated with compression bandaging, with a view to predicting healing time.
Implementation	ANN demonstrated that healing was significantly related to a history of previous leg ulceration, ‘quite wet’ ulcer exudate, high body mass index, large initial total ulcer area, increasing age and male gender
Maintenance	NS
Key Conclusions	This study describes how an artificial neural network (ANN) can be used to predict leg ulcer healing. An ANN is a computer modelling program inspired by the way in which the brain processes information. It can detect patterns or trends that are too complex to be noticed by humans or other computer techniques. The ANN was used to calculate the influence of 45 risk factors on three outcomes (occurrence of leg ulcer healing within 12 weeks, 12–24 weeks and after 24 weeks). During a validation study, it was ‘trained’ to reduce prediction error and had an accuracy of 68%. The data set used was taken from 504 consecutive patients with 557 leg ulcers treated at a leg ulcer clinic. The ANN generally underestimated the time taken for an ulcer to heal, with only 3% of cases healing quicker than predicted. The most significant risk factors identified were: history of previous leg ulceration; ‘wet’ ulcer exudate; high BMI; large initial total ulcer area; advanced age at presentation; male gender.

Paper 58

Paper Title: Validation of a knowledge based reminder system for diagnostic test ordering in general practice

Authors or developers	Bindels, R. Winkens, R. A. Pop, P. van Wersch, J. W. Talmon, J. Hasman, A.
Year of Publication	2001
Full reference of the study	Bindels R, Winkens RA, Pop P, van Wersch JW, Talmon J, Hasman A. Validation of a knowledge based reminder system for diagnostic test ordering in general practice. Int J Med Inform. 2001;64(2-3):341‐354. doi:10.1016/s1386-5056(01)00207-6
Abstract	We describe the validation of a real-time automated reminder system that assists General Practitioners (GP) in appropriate test ordering. We compared the comments of human experts with the comments of the reminder system using a retrospective random selection of 253 request forms. A panel of three expert physicians judged the requested tests independently based on their interpretations of the practice guidelines. The majority assessment of the physicians was compared with the assessment of the reminder system. In case the system’s output differed from the majority assessment, the written practice guidelines were consulted. On average, 1.75 reminders were produced per form. In total, 32 of the 442 given reminders (7%) were given incorrectly. The amount of information and the level of detail (the specificity of the terms) in which the GP describes the patients’ medical status are crucial for the reminder system to react correctly.
Country of Research	Netherlands
Design of Study	Unclear
Duration of Study	NS
Name of Condition	NS
Artificial Intelligence Technique Used	GRIF automated reminder system, Comment : Validation of a knowledge based reminder system
Provider’s involvement in	Developing : NS, Testing : NS, Validating : GPs
Accuracy of the AI Intervention	82.5%
Patient-related Outcomes Assessed	NS
Primary Healthcare Worker Related Outcomes Assessed
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : The primary health care providers describe the validation of a real-time automated reminder system that assists General Practitioners (GP) in appropriate test ordering
Implementation	Majority assessment of the physicians was compared with the assessment of the reminder system. In case the system’s output differed from the majority assessment the written practice guidelines were consulted. On average 1.75 reminders were produced per form. In total 32 of the 442 given reminders (7%) were given incorrectly. The amount of information and the level of detail (the specificity of the terms) in which the GP describes the patients’ medical status are crucial for the reminder system to react correctly.
Maintenance	Not specified (Unclear)
Key Conclusions	Study described the validation of a real-time automated reminder system that assists General Practitioners (GP) in appropriate test ordering. The work compared the comments of human experts with the comments of the reminder system using a retrospective random selection of 253 request forms. A panel of three expert physicians judged the requested tests independently based on their interpretations of the practice guidelines. The majority assessment of the physicians was compared with the assessment of the reminder system. In case the system’s output differed from the majority assessment the written practice guidelines were consulted. On average 1.75 reminders were produced per form. In total 32 of the 442 given reminders (7%) were given incorrectly. The amount of information and the level of detail (the specificity of the terms) in which the GP describes the patients’ medical status are crucial for the reminder system to react correctly.
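
The counts quoted in this entry can be cross-checked with simple arithmetic. The sketch below recomputes the reminders per form and the share of incorrect reminders from the figures in the abstract. The variable names are illustrative. It does not attempt to reproduce the 82.5% accuracy recorded above, whose basis is not stated in this entry.

```python
# Counts as quoted in the abstract of this entry.
request_forms = 253        # retrospectively selected request forms
reminders_given = 442      # reminders produced by the system
reminders_incorrect = 32   # reminders judged to be incorrect

reminders_per_form = reminders_given / request_forms
incorrect_share = reminders_incorrect / reminders_given

print(f"Reminders per form: {reminders_per_form:.2f}")
print(f"Incorrect reminders: {incorrect_share:.1%}")
# Reminders per form: 1.75
# Incorrect reminders: 7.2%
```

Both values agree with the rounded figures reported in the abstract (1.75 reminders per form and 7% incorrect).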

Paper 59

Paper Title: The use of a computer-based decision support system facilitates primary care physicians' management of chronic pain

Authors or developers	Knab, J. H. Wallace, M. S. Wagner, R. L. Tsoukatos, J. Weinger, M. B.
Year of Publication	2001
Full reference of the study	Knab JH, Wallace MS, Wagner RL, Tsoukatos J, Weinger MB. The use of a computer-based decision support system facilitates primary care physicians’ management of chronic pain. Anesth Analg. 2001;93(3):712-720. doi:10.1097/00000539-200109000-00035
Abstract	We tested whether computer-based decision support (CBDS) could enhance the ability of primary care physicians (PCPs) to manage chronic pain. Structured summaries were generated for 50 chronic pain patients referred by PCPs to a pain clinic. A pain specialist used a decision support system to determine appropriate pain therapy and sent letters to the referring physicians outlining these recommendations. Separately, five board-certified PCPs used a CBDS system to “treat” the 50 cases. A successful outcome was defined as one in which new or adjusted therapies recommended by the software were acceptable to the PCPs (i.e., they would have prescribed it to the patient in actual practice). Two pain specialists reviewed the PCPs’ outcomes and assigned medical appropriateness scores (0 = totally inappropriate to 10 = totally appropriate). One year later, the hospital database provided information on how the actual patients’ pain was managed and the number of patients re-referred by their PCP to the pain clinic. On the basis of CBDS recommendations, the PCP subjects “prescribed” additional pain therapy in 213 of 250 evaluations (85%), with a medical appropriateness score of 5.5 +/- 0.1. Only 25% of these chronic pain patients were subsequently re-referred to the pain clinic within one year. The use of a CBDS system may improve the ability of PCPs to manage chronic pain and may also facilitate screening of consults to optimize specialist utilization.
Country of Research	USA
Design of Study	Unclear
Duration of Study	8 months (March to October 1997)
Name of Condition	Chronic Pain
Artificial Intelligence Technique Used	Computer-based decision support system (Pain Management Advisor)
Provider’s involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	NS
Patient-related Outcomes Assessed	NS
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	The hospital database provided information on how the actual patients’ pain was managed and the number of patients re-referred by their PCP to the pain clinic. On the basis of CBDS recommendations, the PCP subjects “prescribed” additional pain therapy in 213 of 250 evaluations (85%), with a medical appropriateness score of 5.5 ± 0.1. Only 25% of these chronic pain patients were subsequently re-referred to the pain clinic within 1 year. The use of a CBDS system may improve the ability of PCPs to manage chronic pain and may also facilitate screening of consults to optimize specialist utilization.
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Primary care providers tested whether computer-based decision support (CBDS) could enhance the ability of primary care physicians (PCPs) to manage chronic pain, No
Implementation	The study tested whether computer-based decision support (CBDS) could enhance the ability of primary care physicians (PCPs) to manage chronic pain. Structured summaries were generated for 50 chronic pain patients referred by PCPs to a pain clinic.
Maintenance	Not specified (Unclear)
Key Conclusions	The results suggest that the use of CBDS could significantly improve the ability of PCPs to manage chronic pain. Additionally, such a decision aid could possibly permit Pain Clinic personnel to prescreen pain consults and provide feedback of standard management algorithms to PCPs before specialist referral. These novel approaches may lead to improved patient and physician satisfaction and lower costs. A much larger prospective controlled clinical trial with a parallel placebo control group will be necessary to validate whether the use of CBDS in chronic pain management will improve quality of care and reduce specialist referrals.

Paper 60

Paper Title: Initial use of a computer system for assisting dermatological diagnosis in general practice

Authors or developers

Smith, H. R.
Ashton, R. E.
Brooks, G. J.

Year of Publication

2000

Full reference of the study

H. R. Smith, R. E. Ashton & G. J. Brooks (2000) Initial use of a computer system for assisting dermatological diagnosis in general practice, Medical Informatics and the Internet in Medicine, 25:2, 103-108, DOI: 10.1080/14639230050058284

Abstract

The accuracy of skin lesion description by 24 general practitioners was assessed as they used a computer diagnosis assistance system (DERMIS). Descriptive accuracy determines the quality of advice. Only four doctors showed a trend to accurate description over 25 hospital referrals.

Country of Research

Design of Study

Unclear

Duration of Study

1 year

Name of Condition

Accuracy of skin lesion description

Artificial Intelligence Technique Used

DERMIS expert system, Bayes’ theorem and expert rules

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Accuracy: 93%

Patient-related Outcomes Assessed

Two hundred and eighty patients were referred to the hospital using DERMIS over a one-year period by 20 doctors. The range of referrals was from 2 to 42 with a mean of 14. Four doctors did not refer to a hospital using the computer system. The doctors were divided into two groups by frequency of referral. Sixteen doctors made an average of DERMIS referrals to hospital (low referrers). Four doctors averaged at least 25 DERMIS referrals (high referrers). We did not record the non-referred use of the computer system by the trial doctors.

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Not specified. The study aimed to assess the initial use of a computer system for assisting dermatological diagnosis in general practice.

Implementation

Six primary care centres installed DERMIS on their doctors’ desktop computers. The programme was demonstrated to the doctors at the initial installation. DERMIS was available to 24 doctors for a one year period. The general practitioners were free to use DERMIS as they wished. Referrals to the dermatology department were accepted with or without the use of DERMIS. If used the DERMIS referral letter listed the clinical findings and the programme derived best diagnoses for that description.

Maintenance

Not specified

Key Conclusions

In this trial, accurate use of DERMIS by a specialist put the correct diagnosis in the top three of its differentials in 93% of cases. Four general practitioners (GPs) were accurate enough by the end of the trial to benefit from close to this level of DERMIS performance. We hypothesize that this quality of diagnostic advice could benefit patient care by suggesting an overlooked condition or strengthening a diagnosis. The further development of DERMIS and general practice IT may increase the number of GPs who could accurately use DERMIS.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	–	–

Color Code

Low

Unclear

High

Paper 61

Paper Title: Electronic surveillance of disease states: a preliminary study in electronic detection of respiratory diseases in a primary care setting

Authors or developers

Hung, J.
Posey, J.
Freedman, R.
Thorton, T.

Year of Publication

1998

Full reference of the study

Hung J, Posey J, Freedman R, Thorton T. Electronic surveillance of disease states: a preliminary study in electronic detection of respiratory diseases in a primary care setting. Proc AMIA Symp. 1998;688-692.

Abstract

The present project is a step-by-step description of the creation of a computerized surveillance system using historical information derived from automated expert system acquisition. Since historical information is in many cases not sufficient for establishing an individual’s medical diagnosis, the accuracy of surveillance is measured against the “gold standard” diagnosis provided by a panel of physicians. It was possible to survey within acceptable limits of accuracy in the conditions of the project. The results reveal a high level of sensitivity by computer surveillance as well as an accurate ability of electronic tracking of disease incidence over a period of time. However, further investigation into the accuracy of electronic surveillance and selection of symptoms used to define a disease should be studied. The feasibility of employing electronic historical medical information to survey disease has potential in providing real-time epidemiological data.

Country of Research

USA

Design of Study

Unclear

Duration of Study

2 years

Name of Condition

Respiratory diseases

Artificial Intelligence Technique Used

Computerized surveillance system using historical information derived from automated expert system acquisition.

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

90% of cases reported for otitis media

Patient-related Outcomes Assessed

Not Specified

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

The most prevalent symptoms identified were “cough” and “difficulty with runny nose” (rhinitis). The software engine identified the predominant symptom area by clinic patients as being a predominant “respiratory problem” (75% positive responses from respiratory review) with 220 identified patient encounters. Five hundred and thirty-one patient encounters recorded respiratory difficulty with reading the criteria of a predominant problem. Given the prevalence of respiratory disease within the patient data identified, it was determined that studying patients with respiratory diseases would provide the maximum data set. Additionally, the symptoms “cough” and “difficulty with runny nose” were selected from other possible respiratory symptoms in the database to provide a manageable number of charts for physician review. Cough and rhinitis were predominating symptoms linked with many other upper respiratory phenomena. To develop a workable data set, a preliminary sort was performed on the database to extract those patients with symptoms of cough and rhinitis.

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : The study demonstrated the technique of surveillance and measured the sensitivity of surveillance of respiratory disease in a primary care physician’s office. Not specified

Implementation

Computerized surveillance system using historical information derived from automated expert system acquisition. Since historical information is in many cases not sufficient for establishing an individual’s medical diagnosis, the accuracy of surveillance is measured against the “gold standard” diagnosis provided by a panel of physicians.

Maintenance

Not specified

Key Conclusions

Given financial, geographical and other practical limitations, an electronic method of disease recognition may be advantageous as a surveillance tool for broad-based epidemiological studies. Current epidemiological studies require a time lag between acquisition and analysis of data as well as a high level of physician participation through their clinical diagnoses. The system proposed in this study eliminates these requirements. The work has taken patient historical data from an automated acquisition system and used it to survey a disease. The accuracy of surveillance varies with disease and the selection of the associated symptom criteria. Electronic detection of disease can occur in real time as the data is acquired. Information on outbreaks and other movements of diseases among populations can be perceived immediately. Also, dependence on physician diagnosis can be minimized. This frees the epidemiological information from human bias and inconsistency. However, the need to further investigate and test electronic methods still remains. Only with more advanced and sophisticated techniques will electronic disease detection be accepted as an accurate epidemiological tool.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	+

Color Code

Low

Unclear

High

Paper 62

Paper Title: Modeling obesity using abductive networks

Authors or developers

Abdel-Aal, R. E.
Mangoud, A. M.

Year of Publication

1997

Full reference of the study

Abdel-Aal RE, Mangoud AM. Modeling obesity using abductive networks. Computers and Biomedical Research, an International Journal. 1997 Dec;30(6):451-471. DOI: 10.1006/cbmr.1997.1460.

Abstract

This paper investigates the use of abductive-network machine learning for modeling and predicting outcome parameters in terms of input parameters in medical survey data. Here we consider modeling obesity as represented by the waist-to-hip ratio (WHR) risk factor to investigate the influence of various parameters. The same approach would be useful in predicting values of clinical parameters that are difficult or expensive to measure from others that are more readily available. The AIM abductive network machine learning tool was used to model the WHR from 13 other health parameters. Survey data were collected for a randomly selected sample of 1100 persons aged 20 years and over attending nine primary health care centers at Al-Khobar, Saudi Arabia. Models were synthesized by training on a randomly selected set of 800 cases, using both continuous and categorical representations of the parameters, and evaluated by predicting the WHR value for the remaining 300 cases. Models for WHR as a continuous variable predict the actual values within an error of 7.5% at the 90% confidence limits. Categorical models predict the correct logical value of WHR with an error in only two of the 300 evaluation cases. Analytical relationships derived from simple categorical models explain global observations on the total survey population to an accuracy as high as 99%. Simple continuous models represented as analytical functions highlight global relationships and trends. Results confirm the strong correlation between WHR and diastolic blood pressure, cholesterol level, and family history of obesity. Compared to other statistical and neural network approaches, AIM abductive networks provide faster and more automated model synthesis. A review is given of other areas where the proposed modeling approach can be useful in clinical practice.
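
The evaluation design in the abstract (train on a random 800 of the 1100 cases, evaluate on the remaining 300) can be illustrated with a minimal sketch. It does not reproduce the AIM tool; the function names and the random seed are assumptions made for illustration, and only the case counts and the two misclassified evaluation cases come from the abstract.

```python
import random

def holdout_split(n_cases, n_train, seed=0):
    """Randomly partition case indices into a training set and an evaluation set."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, hypothetical choice
    return indices[:n_train], indices[n_train:]

def categorical_accuracy(n_eval, n_errors):
    """Fraction of evaluation cases whose categorical value was predicted correctly."""
    return (n_eval - n_errors) / n_eval

# Counts reported in the abstract: 1100 cases, 800 for training, 2 errors on evaluation.
train_idx, eval_idx = holdout_split(1100, 800)
accuracy = categorical_accuracy(len(eval_idx), 2)
print(len(train_idx), len(eval_idx), round(100 * accuracy, 2))  # 800 300 99.33
```

Two errors in 300 evaluation cases correspond to a categorical accuracy of about 99.3%.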

Country of Research

Saudi Arabia

Design of Study

Unclear

Duration of Study

6 months

Name of Condition

Obesity

Artificial Intelligence Technique Used

AIM abductive network machine learning tool

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Accuracy 99%

Patient-related Outcomes Assessed

Results confirm the strong correlation between waist-to-hip ratio (WHR) and diastolic blood pressure, cholesterol level, and family history of obesity. Compared to other statistical and neural network approaches, AIM abductive network machine learning provides faster and more automated model synthesis.

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes: Assisting the clinician in selecting the most appropriate course of intervention based on the effectiveness of alternative treatment methods for various classes of patients, as modeled using available historical data. For example, this approach can aid in choosing between bypass surgery, angioplasty, or medication and rehabilitation for the treatment of coronary artery disease, based on models for their short-term mortality performance.

Adoption

Yes (number of providers i.e. PHC participating) : nine PHC centers in Al-Khobar, Saudi Arabia

Implementation

Categorical models predict the correct logical value of WHR with an error in only 2 of the 300 evaluation cases. Analytical relationships derived from simple categorical models explain global observations on the total survey population to an accuracy as high as 99%. Simple continuous models represented as analytical functions highlight global relationships and trends. Results confirm the strong correlation between WHR and diastolic blood pressure, cholesterol level, and family history of obesity. Compared to other statistical and neural network approaches, AIM abductive networks provide faster and more automated model synthesis.

Maintenance

Not specified (Unclear)

Key Conclusions

This paper investigates the use of abductive-network machine learning for modeling and predicting outcome parameters in terms of input parameters in medical survey data. Here we consider modeling obesity as represented by the waist-to-hip ratio (WHR) risk factor to investigate the influence of various parameters. The same approach would be useful in predicting values of clinical parameters that are difficult or expensive to measure from others that are more readily available. The AIM abductive network machine learning tool was used to model the WHR from 13 other health parameters. Survey data were collected for a randomly selected sample of 1,100 persons aged 20 years and over attending nine primary health care centers.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	–	–

Color Code

Low

Unclear

High

Paper 63

Paper Title: A diagnostic support system in general practice: is it feasible?

Authors or developers

Ridderikhoff, J.
van Herk, E.

Year of Publication

1997

Full reference of the study

Ridderikhoff J, van Herk E. A diagnostic support system in general practice: is it feasible? Int J Med Inform. 1997;45(3):133‐143. doi:10.1016/s1386-5056(97)00022-1

Abstract

A medical diagnostic decision support system (DDSS) has been developed for and tested in general practice. Two major issues have been addressed: diagnostic support and usefulness. The diagnostic support pertains to the ability of the system to generate diagnostic hypotheses from a set of patient data. The usefulness is approached by creating a computer system which can be used simultaneously with the doctor-patient consultation. The support function operates by matching symptoms from the patient data base with symptom configurations contained in the knowledge base. The support is presented as a list of diagnostic hypotheses ranked by degree of concordance. A user-friendly interface has been constructed with a comprehensive set of clinical terms within which the doctor can locate a desired symptom and store it with a single keystroke. With another keystroke the doctor can check the stored data and ask for support at any moment during the process. The overall purpose is to invite the doctor to rethink and re-examine his/her steps and to reconsider possible alternatives in the light of the presented diagnostic information. In our view it has to be the doctor who makes the final judgement. A test with the system in general practice revealed good performance of the system and an astonishing proficiency of the participating doctors in its use during the consultation. Twenty doctors solved five patient cases, entering 2,000 clinical items within acceptable limits of consultation time. In 96% of the cases the correct diagnosis appeared in the differential diagnosis list. The doctors’ diagnostic accuracy was 43%. The use of standardised terminology as an option for further development is discussed. The role of the doctor in computer-aided diagnostics remains open to debate. A computer-aided diagnostic support system in general practice appears to be feasible.

Country of Research

The Netherlands

Design of Study

Unclear

Duration of Study

Not Specified

Name of Condition

Not Specified

Artificial Intelligence Technique Used

Medical diagnostic decision support system (DDSS)

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

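In 96% of the cases the correct diagnosis appeared in the system’s differential diagnosis list. The doctors’ diagnostic accuracy was 43%.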

Patient-related Outcomes Assessed

Not Specified

Primary Healthcare Worker Related Outcomes Assessed

The availability of immediate support, the option of retracing and reconsidering particular steps, and an open discussion with the patient provide opportunities for the practising doctor to improve the quality of his/her cure and care.

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes : A test with the system in general practice revealed good performance of the system and an astonishing proficiency of the participating doctors in its use during the consultation. Twenty doctors solved five patient cases, entering 2,000 clinical items within acceptable limits of consultation time. In 96% of the cases the correct diagnosis appeared in the differential diagnosis list. The doctors’ diagnostic accuracy was 43%.

Adoption

Not specified

Implementation

The support function operates by matching symptoms from the patient data base with symptom configurations contained in the knowledge base. The support is presented as a list of diagnostic hypotheses ranked by degree of concordance. A user-friendly interface has been constructed with a comprehensive set of clinical terms within which the doctor can locate a desired symptom and store it with a single keystroke. With another keystroke the doctor can check the stored data and ask for support at any moment during the process. The overall purpose is to invite the doctor to rethink and re-examine his/her steps and to reconsider possible alternatives in the light of the presented diagnostic information. In our view it has to be the doctor who makes the final judgement.
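
The matching-and-ranking idea described above can be sketched as follows. This is an illustration only: the paper's actual concordance measure is not given here, so the sketch assumes concordance is the fraction of a diagnosis's symptom configuration that is present in the patient data, and the knowledge base entries are hypothetical.

```python
def rank_hypotheses(patient_symptoms, knowledge_base):
    """Rank diagnoses by concordance between patient symptoms and stored configurations.

    Concordance is assumed (not taken from the paper) to be the share of a
    diagnosis's symptom configuration that the patient presents. Every
    configuration is assumed to be non-empty.
    """
    patient = set(patient_symptoms)
    scored = []
    for diagnosis, configuration in knowledge_base.items():
        matched = patient & set(configuration)
        scored.append((diagnosis, len(matched) / len(configuration)))
    # Highest concordance first; ties broken alphabetically for a stable list.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored

# Hypothetical knowledge base, for illustration only.
knowledge_base = {
    "asthma": ["wheeze", "cough", "dyspnoea"],
    "common cold": ["cough", "runny nose", "sore throat"],
    "urinary tract infection": ["dysuria", "frequency"],
}
ranking = rank_hypotheses(["cough", "wheeze"], knowledge_base)
for diagnosis, concordance in ranking:
    print(f"{diagnosis}: {concordance:.2f}")
```

The ranked list is offered as support only; the final judgement stays with the doctor, as the text emphasises.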

Maintenance

Not specified

Key Conclusions

A medical diagnostic decision support system (DDSS) has been developed for and tested in general practice. Two major issues have been addressed: diagnostic support and usefulness. The diagnostic support pertains to the ability of the system to generate diagnostic hypotheses from a set of patient data. The usefulness is approached by creating a computer system which can be used simultaneously with the doctor-patient consultation. The support function operates by matching symptoms from the patient data base with symptom configurations contained in the knowledge base. The support is presented as a list of diagnostic hypotheses ranked by degree of concordance. A user-friendly interface has been constructed with a comprehensive set of clinical terms within which the doctor can locate a desired symptom and store it with a single keystroke. With another keystroke the doctor can check the stored data and ask for support at any moment during the process. The overall purpose is to invite the doctor to rethink and re-examine his/her steps and to reconsider possible alternatives in the light of the presented diagnostic information. In our view it has to be the doctor who makes the final judgement. A test with the system in general practice revealed good performance of the system and an astonishing proficiency of the participating doctors in its use during the consultation. Twenty doctors solved five patient cases, entering 2,000 clinical items within acceptable limits of consultation time. In 96% of the cases the correct diagnosis appeared in the differential diagnosis list. The doctors’ diagnostic accuracy was 43%.

Risk of Bias

Participants	Predictors	Outcome	Analysis
–	?	+	–

Color Code

Low

Unclear

High

Paper 64

Paper Title: Comparison of an expert system with other clinical scores for the evaluation of severity of asthma

Authors or developers

Gautier, V.
Redier, H.
Pujol, J. L.
Bousquet, J.
Proudhon, H.
Michel, C.
Daures, J. P.
Michel, F. B.
Godard, P.

Year of Publication

1996

Full reference of the study

Gautier V, Rédier H, Pujol JL, et al. Comparison of an expert system with other clinical scores for the evaluation of severity of asthma. Eur Respir J. 1996;9(1):58‐64. doi:10.1183/09031936.96.09010058

Abstract

“Asthmaexpert” was produced at the special request of several clinicians in order to obtain a better understanding of the medical decisions taken by clinical experts in the management of asthmatic patients. In order to assess the severity of asthma, a new score called Artificial Intelligence score (AI score), produced by Asthmaexpert, was compared with three other scores (Aas, Hargreave and Brooks). One hundred patients were enrolled prospectively in the study during their first consultation in the out-patient clinic. Distribution of severity level according to the different scores was studied, and the reliability between AI and other scores was evaluated by Kappa and McNemar tests. Correlations with functional parameters were performed. The AI score assessed higher levels of severity than the other scores (Kappa = 18, 28 and 10% for Aas, Hargreave and Brooks, respectively) with significant McNemar test in all cases. There was a significant correlation between AI score and forced expiratory volume in one second (FEV1) (r = 0.73). These data indicate that the AI score is a severity score which defines higher levels of severity than the chosen scores. Correlations for functional parameters are good. This score appears easy to use for the first consultation of an asthmatic patient.
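
The agreement statistics named in the abstract (Kappa and McNemar) can be illustrated with a minimal sketch for the simplest paired case, in which two scores each classify the same patients as severe or not severe. The dichotomisation and all counts below are hypothetical assumptions; they are not the study data, and the paper's own handling of multi-level severity scores is not described here.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a paired 2x2 table.

    a: both scores say severe        b: score 1 severe, score 2 not
    c: score 1 not, score 2 severe   d: both scores say not severe
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)

def mcnemar_statistic(b, c):
    """McNemar chi-square statistic (1 degree of freedom, no continuity correction)."""
    return (b - c) ** 2 / (b + c)

# Hypothetical counts for 100 patients, for illustration only.
a, b, c, d = 30, 30, 5, 35
kappa = cohen_kappa(a, b, c, d)
statistic = mcnemar_statistic(b, c)
significant = statistic > 3.841  # chi-square critical value, 1 df, alpha = 0.05
print(round(kappa, 2), round(statistic, 2), significant)
```

A low kappa together with a significant McNemar statistic is the pattern the abstract reports: the two scores agree poorly, and the disagreements fall mostly in one direction (one score rating patients as more severe than the other).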

Country of Research

France

Design of Study

Unclear

Duration of Study

Not Specified

Name of Condition

Asthma

Artificial Intelligence Technique Used

Asthmaexpert

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Not Specified

Patient-related Outcomes Assessed

One hundred patients were enrolled prospectively in the study during their first consultation in the out-patient clinic. Distribution of severity level according to the different scores was studied, and the reliability between AI and other scores was evaluated by Kappa and McNemar tests. Correlations with functional parameters were performed. The AI score assessed higher levels of severity than the other scores (Kappa=18, 28 and 10% for Aas, Hargreave and Brooks, respectively) with significant McNemar test in all cases. There was a significant correlation between AI score and forced expiratory volume in one second (FEV1) (r=0.73).

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes : “Asthmaexpert” was produced at the special request of several clinicians in order to obtain a better understanding of the medical decisions taken by clinical experts in the management of asthmatic patients. In order to assess the severity of asthma, a new score called Artificial Intelligence score (AI score), produced by Asthmaexpert, was compared with three other scores (Aas, Hargreave and Brooks).

Adoption

Not specified

Implementation

Maintenance

Not mentioned

Key Conclusions

“Asthmaexpert” was produced at the special request of several clinicians in order to obtain a better understanding of the medical decisions taken by clinical experts in the management of asthmatic patients. In order to assess the severity of asthma, a new score called Artificial Intelligence score (AI score), produced by Asthmaexpert, was compared with three other scores (Aas, Hargreave and Brooks). One hundred patients were enrolled prospectively in the study during their first consultation in the out-patient clinic. Distribution of severity level according to the different scores was studied, and the reliability between AI and other scores was evaluated by Kappa and McNemar tests. Correlations with functional parameters were performed. The AI score assessed higher levels of severity than the other scores (Kappa=18, 28 and 10% for Aas, Hargreave and Brooks, respectively) with significant McNemar test in all cases. There was a significant correlation between AI score and forced expiratory volume in one second (FEV1) (r=0.73). These data indicate that the AI score is a severity score which defines higher levels of severity than the chosen scores. Correlations for functional parameters are good. This score appears easy to use for the first consultation of an asthmatic patient.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	+

Color Code

Low

Unclear

High

Paper 65

Paper Title: An expert system for performance-based direct delivery of published clinical evidence

Authors or developers

Balas, E. A.
Li, Z. R.
Spencer, D. C.
Jaffrey, F.
Brent, E.
Mitchell, J. A.

Year of Publication

1996

Full reference of the study

Balas EA, Li ZR, Spencer DC, Jaffrey F, Brent E, Mitchell JA. An expert system for performance-based direct delivery of published clinical evidence. J Am Med Inform Assoc. 1996;3(1):56‐65. doi:10.1136/jamia.1996.96342649

Abstract

OBJECTIVE: To develop a system for clinical performance improvement through rule-based analysis of medical practice patterns and individualized distribution of published scientific evidence. METHODS: The Quality Feedback Expert System (QFES) was developed by applying a Level-5 expert system shell to generate clinical direct reports for performance improvement. The system comprises three data and knowledge bases: 1) a knowledge base of measurable clinical practice parameters; 2) a practice pattern database of provider-specific numbers of patients and clinical activities; and 3) a management rule base comprising “redline rules” that identify providers whose practice styles vary significantly. Clinical direct reports consist of a table of practice data highlighting individual utilization vs recommendation and selected pertinent statements from medical literature. RESULTS: The QFES supports integration of recommendations from several guidelines into a comprehensive and measurable quality improvement plan, analysis of actual practice patterns and comparison with accepted recommendations, and generation of a confidential individualized direct report to those who significantly deviate from clinical recommendations. The feasibility of the practice pattern analysis by the QFES was demonstrated in a sample of 182 urinary tract infection cases from a primary care clinic. In a set of clinical activities, four questions/procedures were associated with significant (p < 0.001) and unexplained variation. CONCLUSION: The QFES provides a flexible tool for the implementation of clinical practice guidelines in diverse and changing clinical areas without the need for special program development. Preliminary studies indicate utility in the analysis of clinical practice variation and deviations. Using data obtained through a retrospective chart audit, the QFES was able to detect overutilization, and to identify nonrandom differences in practice patterns.

Country of Research

USA

Design of Study

Unclear : Randomized clinical study

Duration of Study

Not Specified

Name of Condition

Cystitis, pyelonephritis, urinary tract infection

Artificial Intelligence Technique Used

Quality Feedback Expert System (QFES)

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Not Specified

Patient-related Outcomes Assessed

Not Specified

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

The QFES provides a flexible tool for the implementation of clinical practice guidelines in diverse and changing clinical areas without the need for special program development. Preliminary studies indicate utility in the analysis of clinical practice variation and deviations. Using data obtained through a retrospective chart audit, the QFES was able to detect overutilization, and to identify nonrandom differences in practice patterns.

Reached Target Population?

Yes : The QFES provides a flexible tool for the implementation of clinical practice guidelines in diverse and changing clinical areas without the need for special program development. Preliminary studies indicate utility in the analysis of clinical practice variation and deviations.

Adoption

Yes (number of providers i.e. PHC participating) : 182 urinary tract infection cases from a primary care clinic.

Implementation

The Quality Feedback Expert System (QFES) was developed by applying a Level-5 expert system shell to generate clinical direct reports for performance improvement. The system comprises three data and knowledge bases: 1) a knowledge base of measurable clinical practice parameters; 2) a practice pattern database of provider-specific numbers of patients and clinical activities; and 3) a management rule base comprising “redline rules” that identify providers whose practice styles vary significantly.

Maintenance

Not specified

Key Conclusions

The Quality Feedback Expert System (QFES) provides a flexible tool for the implementation of clinical practice guidelines in diverse and changing clinical areas without the need for special program development. Preliminary studies indicate utility in the analysis of clinical practice variation and deviations. Using data obtained through a retrospective chart audit, the QFES was able to detect overutilization, and to identify nonrandom differences in practice patterns.

Paper 66

Paper Title: Categorization of major depression in an outpatient sample

Authors or developers

Haslam, N.
Beck, A. T.

Year of Publication

1993

Full reference of the study

Haslam N, Beck AT. Categorization of major depression in an outpatient sample. J Nerv Ment Dis. 1993;181(12):725-731. doi:10.1097/00005053-199312000-00003

Abstract

Intake Beck Depression Inventory (BDI) item scores of 400 outpatient major depressives were submitted to a categorization algorithm developed for artificial intelligence applications. The algorithm maximizes a function of “category utility” that is preferable in several respects to available clustering methods, and has demonstrated its capacity to locate the most informative, or “basic,” level of categorization. The analysis yielded four syndromal subtypes: a common, general depressive type; a common and relatively severe melancholic type; an infrequent type characterized by self-critical features, generalized anxiety, and an absence of melancholic features; and an infrequent, mild type distinguished by enervation and anhedonic features. Implications for the classification of depression are discussed.
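
The “category utility” function mentioned in the abstract can be sketched in its standard form for nominal attributes: the average, over clusters, of the cluster's probability times the gain in summed squared attribute-value probabilities relative to the whole sample. This is an assumption-laden illustration, not the study's analysis: it handles nominal values only, it scores a given partition rather than searching for one, and the item responses are hypothetical.

```python
from collections import Counter

def sum_squared_probs(rows):
    """Sum over attributes and values of P(attribute = value)^2 within rows."""
    n = len(rows)
    total = 0.0
    for column in zip(*rows):  # one column per attribute
        counts = Counter(column)
        total += sum((count / n) ** 2 for count in counts.values())
    return total

def category_utility(clusters):
    """Category utility of a partition, given as a list of clusters of rows."""
    all_rows = [row for cluster in clusters for row in cluster]
    n = len(all_rows)
    baseline = sum_squared_probs(all_rows)
    gain = sum(
        (len(cluster) / n) * (sum_squared_probs(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)

# Hypothetical two-item responses, for illustration only.
separated = [[("high", "high"), ("high", "high")], [("low", "low"), ("low", "low")]]
mixed = [[("high", "high"), ("low", "low")], [("high", "high"), ("low", "low")]]
cu_separated = category_utility(separated)
cu_mixed = category_utility(mixed)
print(cu_separated, cu_mixed)  # 0.5 0.0
```

A partition that groups similar response profiles scores higher than one that mixes them, which is the sense in which the algorithm prefers informative categories.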

Country of Research

USA

Design of Study

Unclear

Duration of Study

6 years, 1985-1991

Name of Condition

Depression

Artificial Intelligence Technique Used

COBWEB/3 analysis

Providers’ involvement in

Testing

Accuracy of the AI Intervention

Not Specified

Patient-related Outcomes Assessed

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : 400 outpatients

Implementation

Intake Beck Depression Inventory (BDI) item scores of 400 outpatient major depressives were submitted to a categorization algorithm developed for artificial intelligence applications. The algorithm maximizes a function of “category utility” that is preferable in several respects to available clustering methods, and has demonstrated its capacity to locate the most informative, or “basic,” level of categorization. The analysis yielded four syndromal subtypes: a common, general depressive type; a common and relatively severe melancholic type; an infrequent type characterized by self-critical features, generalized anxiety, and an absence of melancholic features; and an infrequent, mild type distinguished by enervation and anhedonic features. Implications for the classification of depression are discussed.

Maintenance

Not specified

Key Conclusions

Intake Beck Depression Inventory (BDI) item scores of 400 outpatient major depressives were submitted to a categorization algorithm developed for artificial intelligence applications. The algorithm maximizes a function of “category utility” that is preferable in several respects to available clustering methods, and has demonstrated its capacity to locate the most informative, or “basic,” level of categorization. The analysis yielded four syndromal subtypes: a common, general depressive type; a common and relatively severe melancholic type; an infrequent type characterized by self-critical features, generalized anxiety, and an absence of melancholic features; and an infrequent, mild type distinguished by enervation and anhedonic features. Implications for the classification of depression are discussed.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	+

Color Code

Low

Unclear

High

Paper 67

Paper Title: Predicting suicidal ideation in primary care: An approach to identify easily assessable key variables

Authors or developers

Jordan, P.
Shedden-Mora, M. C.
Löwe, B.

Year of Publication

2018

Full reference of the study

Jordan P, Shedden-Mora MC, Löwe B. Predicting suicidal ideation in primary care: An approach to identify easily assessable key variables. Gen Hosp Psychiatry. 2018;51:106‐111. doi:10.1016/j.genhosppsych.2018.02.002

Abstract

Objective: To obtain predictors of suicidal ideation, which can also be used for an indirect assessment of suicidal ideation (SI). To create a classifier for SI based on variables of the Patient Health Questionnaire (PHQ) and sociodemographic variables, and to obtain an upper bound on the best possible performance of a predictor based on those variables. Methods: From a consecutive sample of 9,025 primary care patients, 6,805 eligible patients (60% female; mean age = 51.5 years) participated. Advanced methods of machine learning were used to derive the prediction equation. Various classifiers were applied and the area under the curve (AUC) was computed as a performance measure. Results: Classifiers based on methods of machine learning outperformed ordinary regression methods and achieved AUCs around 0.87. The key variables in the prediction equation comprised four items – namely feelings of depression/hopelessness, low self-esteem, worrying, and severe sleep disturbances. The generalized anxiety disorder scale (GAD-7) and the somatic symptom subscale (PHQ-15) did not enhance prediction substantially. Conclusions: In predicting suicidal ideation, researchers should refrain from using ordinary regression tools. The relevant information is primarily captured by the depression subscale and should be incorporated in a nonlinear model. For clinical practice, a classification tree using only four items of the whole PHQ may be advocated.
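
The performance measure used in the abstract, the area under the ROC curve (AUC), can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below illustrates that definition; the scores and labels are hypothetical and are not the study data.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as one half).

    labels use 1 for cases with the condition and 0 for cases without it.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores, for illustration only.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)
print(round(example_auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported value of about 0.87 means a positive case outranks a negative case in roughly 87% of pairs.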

Country of Research

Germany

Design of Study

Unclear

Duration of Study

Not specified

Name of Condition

Suicidal Ideation

Artificial Intelligence Technique Used

Machine learning. Advanced methods of machine learning were used to derive the prediction equation. Various classifiers were applied and the area under the curve (AUC) was computed as a performance measure.

Providers’ involvement in

Developing : From a consecutive sample of 9025 primary care patients, 6805 eligible patients (60% female; mean age = 51.5 years) participated. Advanced methods of machine learning were used to derive the prediction equation. Various classifiers were applied and the area under the curve (AUC) was computed as a performance measure.

Accuracy of the AI Intervention

AUC 0.87

Patient-related Outcomes Assessed

The present study indicates that, with a 12.6% two-week prevalence, suicidal ideation (SI) is a relevant issue in primary care, in line with the literature. Results of this study also indicate which symptoms are closely interrelated with SI and might therefore point to potential SI: feeling down, depressed or hopeless, low self-esteem, worrying, and severe sleep disturbances were identified as core indicators. AUCs near 0.87 are achievable for the purpose of predicting SI in a primary care setting based on sociodemographic variables and variables pertaining to the Patient Health Questionnaire.

Primary Healthcare Worker Related Outcomes Assessed

Classifiers based on methods of machine learning outperformed ordinary regression methods and achieved AUCs around 0.87. The key variables in the prediction equation comprised four items – namely feelings of depression/hopelessness, low self-esteem, worrying, and severe sleep disturbances.

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Assessment performed in 33 primary care practices

Implementation

Information

Maintenance

Not specified (Unclear)

Key Conclusions

In predicting suicidal ideation, researchers should refrain from using ordinary regression tools. The relevant information is primarily captured by the depression subscale and should be incorporated in a nonlinear model. For clinical practice, a classification tree using only four items of the whole Patient Health Questionnaire (PHQ) may be advocated.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	?	+

Color Code

Low

Unclear

High

Paper 68

Paper Title: Predicting out-of-office blood pressure in the clinic for the diagnosis of hypertension in primary care: An economic evaluation

Authors or developers

Monahan, M.
Jowett, S.
Lovibond, K.
Gill, P.
Godwin, M.
Greenfield, S.
Hanley, J.
Hobbs, F. D. R.
Martin, U.
Mant, J.
McKinstry, B.
Williams, B.
Sheppard, J. P.
McManus, R. J.

Year of Publication

2018

Full reference of the study

Monahan, M., Jowett, S., Lovibond, K., Gill, P., Godwin, M., Greenfield, S., … & McKinstry, B. (2018). Predicting out-of-office blood pressure in the clinic for the diagnosis of hypertension in primary care: an economic evaluation. Hypertension, 71(2), 250-261.

Abstract

Clinical guidelines in the United States and United Kingdom recommend that individuals with suspected hypertension should have ambulatory blood pressure (BP) monitoring to confirm the diagnosis. This approach reduces misdiagnosis because of white coat hypertension but will not identify people with masked hypertension who may benefit from treatment. The Predicting Out-of-Office Blood Pressure (PROOF-BP) algorithm predicts masked and white coat hypertension based on patient characteristics and clinic BP, improving the accuracy of diagnosis while limiting subsequent ambulatory BP monitoring. This study assessed the cost-effectiveness of using this tool in diagnosing hypertension in primary care. A Markov cost–utility cohort model was developed to compare diagnostic strategies: the PROOF-BP approach, including those with clinic BP ≥130/80 mm Hg who receive ambulatory BP monitoring as guided by the algorithm, compared with current standard diagnostic strategies including those with clinic BP ≥140/90 mm Hg combined with further monitoring (ambulatory BP monitoring as reference, clinic, and home monitoring also assessed). The model adopted a lifetime horizon with a three-month time cycle, taking a UK Health Service/Personal Social Services perspective. The PROOF-BP algorithm was cost-effective in screening all patients with clinic BP ≥130/80 mm Hg compared with current strategies that only screen those with clinic BP ≥140/90 mm Hg, provided healthcare providers were willing to pay up to £20 000 ($26 000)/quality-adjusted life year gained. Deterministic and probabilistic sensitivity analyses supported the base-case findings. The PROOF-BP algorithm seems to be cost-effective compared with the conventional BP diagnostic options in primary care. Its use in clinical practice is likely to lead to reduced cardiovascular disease, death, and disability.
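
The structure of a Markov cost-utility cohort comparison with a three-month cycle can be sketched as below. This is not the published model: the three health states, every transition probability, cost and utility value, the 40-year horizon, and the omission of discounting are all hypothetical simplifications. Only the three-month cycle length and the £20 000 per quality-adjusted life year threshold come from the abstract.

```python
def run_markov(transitions, state_costs, state_utilities, n_cycles,
               cycle_years=0.25, upfront_cost=0.0):
    """Return (total cost, total QALYs) per person for a cohort starting in state 0.

    state_costs are per cycle; state_utilities are annual weights scaled by
    cycle_years. Discounting is omitted for brevity.
    """
    n_states = len(transitions)
    cohort = [1.0] + [0.0] * (n_states - 1)
    total_cost, total_qalys = upfront_cost, 0.0
    for _ in range(n_cycles):
        total_cost += sum(p * c for p, c in zip(cohort, state_costs))
        total_qalys += sum(p * u for p, u in zip(cohort, state_utilities)) * cycle_years
        cohort = [
            sum(cohort[i] * transitions[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return total_cost, total_qalys

def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical states: well, cardiovascular disease, dead (rows sum to 1).
standard = [[0.985, 0.010, 0.005], [0.0, 0.95, 0.05], [0.0, 0.0, 1.0]]
guided = [[0.988, 0.007, 0.005], [0.0, 0.95, 0.05], [0.0, 0.0, 1.0]]
costs = [50.0, 800.0, 0.0]       # hypothetical cost per three-month cycle
utilities = [0.85, 0.60, 0.0]    # hypothetical annual utility weights
n_cycles = 160                   # 40 years of three-month cycles (assumed horizon)

cost_std, qaly_std = run_markov(standard, costs, utilities, n_cycles)
cost_new, qaly_new = run_markov(guided, costs, utilities, n_cycles, upfront_cost=60.0)
ratio = icer(cost_new, qaly_new, cost_std, qaly_std)
# A negative ratio means one strategy dominates and needs separate interpretation.
cost_effective = ratio <= 20_000  # threshold per QALY gained, from the abstract
```

The decision rule compares the incremental cost per quality-adjusted life year against the willingness-to-pay threshold; sensitivity analyses of the kind the abstract mentions would repeat this calculation with varied inputs.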

Country of Research

Design of Study

Cohort study

Duration of Study

Name of Condition

Out of office Hypertension

Artificial Intelligence Technique Used

Markov cost–utility cohort model; the model adopted a lifetime horizon with a 3-month time cycle

Providers’ involvement in

Accuracy of the AI Intervention

Patient-related Outcomes Assessed

The study assessed the cost-effectiveness of using this tool in diagnosing hypertension in primary care. A Markov cost–utility cohort model was developed to compare diagnostic strategies: the PROOF-BP approach, including those with clinic BP ≥130/80 mmHg who receive ambulatory BP monitoring as guided by the algorithm, compared with current standard diagnostic strategies including those with clinic BP ≥140/90 mmHg combined with further monitoring (ambulatory BP monitoring as reference, clinic, and home monitoring also assessed). The prevalence of masked hypertension was increased and decreased by 25%.

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Health Survey for England and adjusted clinic BPs

Implementation

Predicting Out-of-Office Blood Pressure (PROOF-BP) algorithm predicts masked and white coat hypertension based on patient characteristics and clinic BP, improving the accuracy of diagnosis while limiting subsequent ambulatory BP monitoring. This study assessed the cost-effectiveness of using this tool in diagnosing hypertension in primary care. A Markov cost–utility cohort model was developed to compare diagnostic strategies: the PROOF-BP approach, including those with clinic BP ≥130/80 mmHg who receive ambulatory BP monitoring as guided by the algorithm, compared with current standard diagnostic strategies including those with clinic BP ≥140/90 mmHg combined with further monitoring (ambulatory BP monitoring as reference, clinic, and home monitoring also assessed).

Maintenance

Not specified (Unclear)

Key Conclusions

Clinical guidelines in the United States and United Kingdom recommend that individuals with suspected hypertension should have ambulatory blood pressure (BP) monitoring to confirm the diagnosis. This approach reduces misdiagnosis because of white coat hypertension but will not identify people with masked hypertension who may benefit from treatment. The Predicting Out-of-Office Blood Pressure (PROOF-BP) algorithm predicts masked and white coat hypertension based on patient characteristics and clinic BP, improving the accuracy of diagnosis while limiting subsequent ambulatory BP monitoring. This study assessed the cost-effectiveness of using this tool in diagnosing hypertension in primary care. A Markov cost–utility cohort model was developed to compare diagnostic strategies: the PROOF-BP approach, including those with clinic BP ≥130/80 mmHg who receive ambulatory BP monitoring as guided by the algorithm, compared with current standard diagnostic strategies including those with clinic BP ≥140/90 mmHg combined with further monitoring (ambulatory BP monitoring as reference, clinic, and home monitoring also assessed). The model adopted a lifetime horizon with a 3-month time cycle, taking a UK Health Service/Personal Social Services perspective. The PROOF-BP algorithm was cost-effective in screening all patients with clinic BP ≥130/80 mmHg compared with current strategies that only screen those with clinic BP ≥140/90 mmHg, provided healthcare providers were willing to pay up to £20 000 ($26000)/quality-adjusted life year gained. Deterministic and probabilistic sensitivity analyses supported the base-case findings. The PROOF-BP algorithm seems to be cost-effective compared with the conventional BP diagnostic options in primary care. Its use in clinical practice is likely to lead to reduced cardiovascular disease, death, and disability.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	?

Color Code

Low

Unclear

High

Paper 69

Paper Title: Artificial neural network based prediction of malaria abundances using big data: A knowledge capturing approach

Authors or developers

Thakur, S.
Dharavath, R.

Year of Publication

2018

Full reference of the study

Thakur, Santosh, and Ramesh Dharavath. “Artificial neural network based prediction of malaria abundances using big data: A knowledge capturing approach.” Clinical Epidemiology and Global Health 7.1 (2019): 121-126.

Abstract

Background and objective: Malaria is one of the most prevalent diseases in urban areas. Malaria flourishes in sub-tropical countries and affects the public health. The impact is very high, where health monitoring facilities are very limited. To minimize the impact of malaria population in sub-tropical domains, a suitable disease prediction model is required. The objective of this study is to determine the malaria abundances using clinical and environmental variables with Big Data on the geographical location of Khammam district, Telangana, India. Methods: The prediction model is based on the data collected from primary health centres in the department of vector borne diseases (DVBD) of Khammam district and satellite data such as rainfall, relative humidity, temperature and vegetation taken from 1995-2014. In this study, we test the efficacy of the artificial neural network (ANN) for mosquito abundance prediction. The prediction model was developed during 2015 using a feed forward neural network and compared with the observed values. Results and conclusions: The results vary from area to area based on clinical variables and rainfall in the prediction model corresponding to areas. The average error of the prediction model ranges from 18% to 117%. Clinical data such as number of patients treated with symptoms and without symptoms can improve the prediction level when combined with environmental variables. We perform preliminary findings of malaria abundances by collecting clinical big data across different seasons. Further, more exploration is required in prediction of malaria using big data to improve the accuracy in real practice. In this manuscript, we perform some preliminary findings of malaria abundances by collecting larger data across different seasons. Until today, many models have been developed to examine the malaria prediction with different approaches, but malaria prediction with environmental and clinical data is a new approach with big data analysis.

Country of Research

India

Design of Study

Unclear

Duration of Study

20 years, 1995-2014

Name of Condition

Malaria

Artificial Intelligence Technique Used

Artificial neural network (ANN). The ANN model used in this study is a feed-forward network implemented in RStudio. A prediction model is presented to predict mosquito abundances using the simple backpropagation method. A feed-forward neural network with a single hidden layer provides time series modelling, which is widely used in forecasting.

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

The RMSPE was analyzed: the geographical location of Khammam had the highest error rate at 117% and Venkatapuram had the lowest at 18.3%.
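
For readers unfamiliar with the metric, the sketch below assumes RMSPE stands for root mean square percentage error under its common definition; the inventory entry does not spell the formula out, and the function name and sample values are hypothetical. It also shows why the error can exceed 100%: a prediction that overshoots the observed value by more than the observed value itself yields a relative error above 1.

```python
import math


def rmspe(observed, predicted):
    """Root mean square percentage error, in percent (common definition, assumed here)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("RMSPE is undefined when an observed value is zero")
    squared_relative_errors = [((o - p) / o) ** 2 for o, p in zip(observed, predicted)]
    return 100.0 * math.sqrt(sum(squared_relative_errors) / len(squared_relative_errors))


# Hypothetical monthly case counts: each prediction is off by 10% of the observed value.
print(rmspe([100, 200], [110, 180]))
# A single prediction of 25 against an observed 10 overshoots by 150%.
print(rmspe([10], [25]))
```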

Patient-related Outcomes Assessed

The results vary from area to area based on clinical variables and rainfall in the prediction model corresponding to areas. The average error of the prediction model ranges from 18% to 117%. Clinical data such as number of patients treated with symptoms and without symptoms can improve the prediction level when combined with environmental variables.

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Primary health centres of department of vector borne diseases (DVBD) of Khammam district

Implementation

Yes

Maintenance

Not mentioned

Key Conclusions

This study has observed how climatic conditions and clinical treatment can play a significant role in malaria prediction. Until now, many models have been developed to examine the malaria prediction with different approaches, but malaria prediction with environmental and clinical data is a new approach to big data analysis.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	–	+

Color Code

Low

Unclear

High

Paper 70

Paper Title: On an algorithm for decision-making for the optimization of disease prediction at the primary health care level using neural network clustering

Authors or developers

Selskyy, P.
Vakulenko, D.
Televiak, A.
Veresiuk, T.

Year of Publication

2018

Full reference of the study

Selskyy P, Vakulenko D, Televiak A, Veresiuk T. On an algorithm for decision-making for the optimization of disease prediction at the primary health care level using neural network clustering. Family Medicine & Primary Care Review. 2018;20(2):171-175. doi:10.5114/fmpcr.2018.76463.

Abstract

Background. A number of studies have been aimed at solving the problems of introducing information technology and systems, but the questions of informatization in rural medicine are not completely resolved. It is important to optimize the prognosis of diseases using available and inexpensive information methods for the improvement of primary healthcare. Objectives. The aim of our study was to develop an algorithm to optimize the decision-making prognosis of disease at the primary health care level based on information methods. Material and methods. The data used for analysis originated from the survey results of 63 patients with hypertension in educational and practical centers of primary health care (EPCPHC) of Ternopil region (Ukraine). For a deeper analysis and clustering, the neural network approach was used with the NeuroXL Classifier add-in application for Microsoft Excel. Results. Thirteen (19.40%) patients experienced health deterioration and the development of complications. It has been established that neural network clustering could effectively and objectively allocate patients to the appropriate category in terms of the average survey results. Cluster analysis results have shown that the combination of high blood pressure (systolic, diastolic and pulse) gave reason to anticipate the deterioration of patients’ conditions. Conclusions. A decision algorithm was created in order to optimize the prediction of diseases at the primary health care level, and also to correct examination and treatment based on an analysis of average values of patients’ examination and the use of neural network clustering.

Country of Research

Ukraine

Design of Study

Unclear

Duration of Study

2 years (2011-2012)

Name of Condition

Hypertension

Artificial Intelligence Technique Used

NeuroXL Classifier, Add-in application for Microsoft Excel

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

NS, Correlation analysis showed a direct correlation between the number of hemodynamic values during the first and second surveys of the patients in the group with stable hypertension course (pulse ± 0.5, blood pressure, diastolic ± 0.3, pulse ± 0.1) and deterioration (pulse ± 0.6, blood pressure systolic ± 0.5, diastolic ± 0.7 pulse ± 0.3).

Patient-related Outcomes Assessed

Thirteen (19.40%) patients experienced health deterioration and the development of complications. It has been established that neural network clustering could effectively and objectively allocate patients to the appropriate category in terms of the average survey results. Cluster analysis results have shown that the combination of high blood pressure (systolic, diastolic and pulse) gave reason to anticipate the deterioration of patients’ conditions. The average heart rate was (78.24 ± 1.15) beats per minute. Blood pressure values were as follows: systolic (154.76 ± 2.29) mm Hg, diastolic (92.94 ± 1.04) mm Hg, pulse pressure (61.83 ± 1.95) mm Hg.

Primary Healthcare Worker Related Outcomes Assessed

As the results of the study show, the simple analysis based on average values and the correlation coefficients between age, the position of the electrical axis of the heart, and series of hemodynamic parameters is the primary tool among researchers. But such simple data processing makes it impossible to determine a combination of changes for certain parameters in order to predict disease course and outcome, such as further deterioration or improvement. At the same time, neural network clustering can effectively and objectively allocate patients to the appropriate category, either deterioration or stable outcome. Cluster analysis results have shown that the combination of high blood pressure (systolic, diastolic and pulse) gave reason to anticipate the deterioration of patients’ conditions, while the combination of high age and heart rate (tachycardia) was essential, but not a priority for the prediction. A phased analysis of examination indicators for patients with hypertension has been proposed in the form of a decision-making algorithm, in order to optimize the prediction of disease and to improve patient examination and treatment.

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Educational and practical centers of primary health care (EPCPHC) of Ternopil region (Ukraine)

Implementation

The data used for analysis originated from the survey results of 63 patients with hypertension in educational and practical centers of primary health care (EPCPHC) of Ternopil region (Ukraine). For a deeper analysis and clustering, the neural network approach was used with the NeuroXL Classifier add-in application for Microsoft Excel

Maintenance

Key Conclusions

This article describes a method of analysis consisting of the survey results of patients with hypertension in educational and practical centers of primary health care units based on primary cardiac function indicators, their average values, correlation coefficients, and algorithms of network clustering. Neural network clustering has been used for the purpose of effective and objective allocation of patients to relevant categories in terms of certain indicators obtained during observation, which enables the determination of a combination of changes in certain parameters for prognosis of the disease towards deterioration or improvement. The algorithm for decision-making has been created in order to optimize the prediction of diseases at the primary level, and for correct examination and treatment through the use of neural network clustering.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	+	–

Color Code

Low

Unclear

High

Paper 71

Paper Title: Machine Learning Detection of Cognitive Impairment in Primary Care

Authors or developers

Levy, B.
Hogan, J.
Hess, C.
Greenspan, S.
Hogan, M.
Gable, S.
Falcon, K.
Elber, A.
O’Connor, M.
Driscoll, D.
Hashmi, A.

Year of Publication

2018

Full reference of the study

Levy B, Gable S, Tsoy E, et al. (2017) Machine Learning Detection of Cognitive Impairment in Primary Care. Alzheimers Dis Dement 1(2):38-46

Abstract

In an effort to decrease these impediments, the current study evaluated the validity of a screening procedure that had been specifically designed to impose minimal burden on the clinic.

Country of Research

USA

Design of Study

Unclear

Duration of Study

9 months

Name of Condition

Cognitive Impairment

Artificial Intelligence Technique Used

Support Vector Machines (SVM)

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Patient-related Outcomes Assessed

The classifying algorithm correctly assigned participants to their respective groups at a probability of 0.945 through a leave-one-out validation procedure.
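
Leave-one-out validation holds out each participant in turn, trains on the rest, and checks whether the held-out participant is classified correctly. The sketch below illustrates the procedure only. The study used a support vector machine; here a simple nearest-centroid rule stands in for it so the example stays self-contained, and the function names and toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (not an SVM): predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], query))


def leave_one_out_accuracy(samples, labels, predict):
    """Hold out each sample once, train on the others, and return the share classified correctly."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical one-feature screening scores; label 1 = impaired, 0 = not impaired.
scores = [[0.0], [0.1], [0.2], [1.0], [1.1], [0.05]]
groups = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(scores, groups, nearest_centroid_predict))
```

Any classifier with the same `predict(train_x, train_y, query)` shape could be passed in place of the stand-in.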

Primary Healthcare Worker Related Outcomes Assessed

CNS Screen may offer a pragmatic alternative to clinician-administered procedures, while maintaining the validity required for clinical practice. Implications for patient care in primary care settings are discussed.

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Routine cognitive screenings in primary care settings

Implementation

Maintenance

Key Conclusions

Routine cognitive screenings in primary care settings can benefit patient care and preventive medicine in multiple ways; however, their integration to the protocol of physical exams, as a standard of care, may be hampered by systemic considerations related to labor and cost. In an effort to decrease these impediments, the current study evaluated the validity of a screening procedure that had been specifically designed to impose minimal burden on the clinic. Further, the findings suggest that instruments such as the CNS Screen may offer a pragmatic alternative to clinician-administered procedures, while maintaining the validity required for clinical practice.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	+	?

Color Code

Low

Unclear

High

Paper 72

Paper Title: Development and validation of clinical prediction models: Marginal differences between logistic regression, penalized maximum likelihood estimation, and genetic programming

Authors or developers

Janssen, K. J. M.
Siccama, I.
Vergouwe, Y.
Koffijberg, H.
Debray, T. P. A.
Keijzer, M.
Grobbee, D. E.
Moons, K. G. M.

Year of Publication

2012

Full reference of the study

Janssen, Kristel JM, et al. “Development and validation of clinical prediction models: marginal differences between logistic regression, penalized maximum likelihood estimation, and genetic programming.” Journal of clinical epidemiology 65.4 (2012): 404-412.

Abstract

Objective: Many prediction models are developed by multivariable logistic regression. However, there are several alternative methods to develop prediction models. We compared the accuracy of a model that predicts the presence of deep venous thrombosis (DVT) when developed by four different methods. Study Design and Setting: We used the data of 2,086 primary care patients suspected of DVT, which included 21 candidate predictors. The cohort was split into a derivation set (1,668 patients, 329 with DVT) and a validation set (418 patients, 86 with DVT). Also, 100 cross-validations were conducted in the full cohort. The models were developed by logistic regression, logistic regression with shrinkage by boot-strapping techniques, logistic regression with shrinkage by penalized maximum likelihood estimation, and genetic programming. The accuracy of the models was tested by assessing discrimination and calibration. Results: There were only marginal differences in the discrimination and calibration of the models in the validation set and cross-validations. Conclusion: The accuracy measures of the models developed by the four different methods were only slightly different, and the 95% confidence intervals were mostly overlapped. We have shown that models with good predictive accuracy are most likely developed by sensible modeling strategies rather than by complex development methods. (C) 2012 Elsevier Inc. All rights reserved.

Country of Research

The Netherlands

Design of Study

Cohort study

Duration of Study

4 years, 2001-2005

Name of Condition

Deep venous thrombosis (DVT)

Artificial Intelligence Technique Used

Logistic regression (without shrinkage, with inherent shrinkage, with one shrinkage factor); genetic programming

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Logistic regression without shrinkage: 0.904 (0.558-0.922); with inherent shrinkage: 0.904 (0.885-0.922); with one shrinkage factor: 0.902 (0.883-0.92); genetic programming: 0.91 (0.89-0.928)
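
Discrimination of a prediction model is commonly summarised by the c-statistic (equivalent to the area under the ROC curve for a binary outcome). This entry does not state which measure the figures above represent, so the sketch below is only an illustration of how a c-statistic is computed from predicted probabilities; the function name and data are hypothetical.

```python
def c_statistic(probabilities, outcomes):
    """Share of (case, non-case) pairs in which the case has the higher predicted risk.

    Tied predictions count as half a concordant pair.
    """
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_cases = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_non_case in non_cases:
            if p_case > p_non_case:
                concordant += 1.0
            elif p_case == p_non_case:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted DVT probabilities for four patients (1 = DVT present).
print(c_statistic([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases.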

Patient-related Outcomes Assessed

The study used the data of 2,086 primary care patients suspected of DVT, which included 21 candidate predictors. The cohort was split into a derivation set (1,668 patients, 329 with DVT) and a validation set (418 patients, 86 with DVT). Also, 100 cross-validations were conducted in the full cohort. The models were developed by logistic regression, logistic regression with shrinkage by bootstrapping techniques, logistic regression with shrinkage by penalized maximum likelihood estimation, and genetic programming. The accuracy of the models was tested by assessing discrimination and calibration. Results: There were only marginal differences in the discrimination and calibration of the models in the validation set and cross validations.

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes : Many prediction models are developed by multivariable logistic regression. However, there are several alternative methods to develop prediction models. We compared the accuracy of a model that predicts the presence of deep venous thrombosis (DVT) when developed by four different methods.

Adoption

Yes (number of providers i.e. PHC participating) : 2,086 primary care patients suspected of DVT

Implementation

Used the data of 2,086 primary care patients suspected of DVT, which included 21 candidate predictors. The cohort was split into a derivation set (1,668 patients, 329 with DVT) and a validation set (418 patients, 86 with DVT). Also, 100 cross validations were conducted in the full cohort. The models were developed by logistic regression, logistic regression with shrinkage by bootstrapping techniques, logistic regression with shrinkage by penalized maximum likelihood estimation, and genetic programming. The accuracy of the models was tested by assessing discrimination and calibration

Maintenance

Not specified

Key Conclusions

The accuracy measures of the models developed by the four different methods were only slightly different, and the 95% confidence intervals were mostly overlapped. We have shown that models with good predictive accuracy are most likely developed by sensible modeling strategies rather than by complex development methods

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	+	+

Color Code

Low

Unclear

High

Paper 73

Paper Title: A hybrid knowledge-based approach to supporting the medical prescription for general practitioners: Real case in a Hong Kong medical center

Authors or developers	Ting, S. L. Kwok, S. K. Tsang, A. H. C. Lee, W. B.
Year of Publication	2011
Full reference of the study	Ting, Jacky S. L., S. K. Kwok, Albert H. C. Tsang and W. B. Lee. “A hybrid knowledge-based approach to supporting the medical prescription for general practitioners: Real case in a Hong Kong medical center.” Knowl. Based Syst. 24 (2011): 444-456.
Abstract	Objective: With the increased complexity and uncertainty in drug information, issuing medical prescriptions has become a vexing issue. As many as 240,000 medicines are available on the market, so this paper proposes a novel approach to the issuing of medical prescriptions. The proposed process will provide general practitioners (GPs) with medication advice and suggest a range of medicines for specific medical conditions by taking into consideration the collective pattern as well as the individual preferences of physicians’ prescription decisions. Methods and material: A hybrid approach is described that uses a combination of case-based reasoning (CBR) and Bayesian reasoning. In the CBR process, all the previous knowledge retrieved via similarity measures is made available for the reference of physicians as to what medicines have been prescribed (to a particular patient) in the past. After obtaining the results from CBR, Bayesian reasoning is then applied to model the prescription experience of all physicians within the organization. By comparing the two sets of results, more refined recommendations on a range of medicines are suggested along with the ranking for each recommendation. Results: To validate the proposed approach, a Hong Kong medical center was selected as a testing site. Through application of the hybrid approach in the medical center for a period of one month, the results demonstrated that the approach produced satisfactory performance in terms of user satisfaction, ease of use, flexibility and effectiveness. In addition, the proposed approach yields better results and a faster learning rate than when either CBR or Bayesian reasoning is applied alone. Conclusion: Even with the help of a decision support system, the current approach to anticipating what drugs are to be prescribed is not flexible enough to cater for individual preferences of GPs, and provides little support for managing complex and dynamic changes in drug information. Therefore, with the increase in the amount of information about drugs, it is extremely difficult for physicians to write a good prescription. By integrating CBR and Bayesian reasoning, the general practitioners’ prescription practices can be retrieved and compared with the collective prescription experience as modeled by probabilistic reasoning. As a result, physicians can select the drugs which are supported by informed evidential decisions. That is, they can take into consideration the pattern of decisions made by other physicians in similar cases. (C) 2011 Elsevier B.V. All rights reserved.
Country of Research	China
Design of Study	Unclear : Case study
Duration of Study	Not Specified
Name of Condition	Not Specified, Medication advice
Artificial Intelligence Technique Used	Hybrid approach combining case-based reasoning (CBR) and Bayesian reasoning
Providers’ involvement in	Developing : Hybrid approach is described that uses a combination of case-based reasoning (CBR) and Bayesian reasoning. In the CBR process, all the previous knowledge retrieved via similarity measures is made available for the reference of physicians as to what medicines have been prescribed (to a particular patient) in the past. After obtaining the results from CBR, Bayesian reasoning is then applied to model the prescription experience of all physicians within the organization
Accuracy of the AI Intervention	A hybrid knowledge-based decision support approach capable of extracting comprehensible individual and collective prescription behavior with good accuracy in medical prescription is proposed.
Patient-related Outcomes Assessed	Not Specified
Primary Healthcare Worker Related Outcomes Assessed	The satisfactory results demonstrate the potential for adoption of this method in various medical organizations. However, there is still room for further development. Further research will consider more factors to determine the recommended drug lists. For example, combining the drug supply chain concept can further improve the results of drug selection. In addition, mining the relationships between drugs can generate more precise drug lists. Thus, we will extend our hybrid approach for medical prescription to take more and different factors into consideration.
Healthcare System-related Outcomes Assessed	A hybrid approach is described that uses a combination of case-based reasoning (CBR) and Bayesian reasoning. In the CBR process, all the previous knowledge retrieved via similarity measures is made available for the reference of physicians as to what medicines have been prescribed (to a particular patient) in the past. After obtaining the results from CBR, Bayesian reasoning is then applied to model the prescription experience of all physicians within the organization. By comparing the two sets of results, more refined recommendations on a range of medicines are suggested along with the ranking for each recommendation
Reached Target Population?	Yes
Adoption	Not specified
Implementation	The proposed hybrid approach has been validated in a medical center. The satisfactory results demonstrate the potential for adoption of this method in various medical organizations.
Maintenance	Not specified
Key Conclusions	The proposed hybrid approach has been validated in a medical center. The satisfactory results demonstrate the potential for adoption of this method in various medical organizations. However, there is still room for further development. Further research will consider more factors to determine the recommended drug lists. For example, combining the drug supply chain concept can further improve the results of drug selection. In addition, mining the relationships between drugs can generate more precise drug lists. Thus, we will extend our hybrid approach for medical prescription to take more and different factors into consideration.

Paper 74

Paper Title: RACER: Rule-Associated CasE-based Reasoning for supporting General Practitioners in prescription making

Authors or developers	Ting, S. L. Wang, W. M. Kwok, S. K. Tsang, A. H. C. Lee, W. B.
Year of Publication	2010
Full reference of the study	Ting, S. L., et al. “RACER: Rule-Associated CasE-based Reasoning for supporting General Practitioners in prescription making.” Expert systems with applications 37.12 (2010): 8079-8089.
Abstract	Prescription is an important element in the medical practice. An appropriate drug therapy is complex in which the decision of prescribing is influenced by many factors. Any discrepancy in the prescription making process can lead to serious consequences. In particular, the General Practitioners (GPs), who need to diagnose and treat a wide range of health conditions and diseases, must be knowledgeable enough in deciding what type of medicines should be given to the patients. With the widespread computerization of medical records, GPs now can make use of accumulated historic clinical data in retrieving similar decisions in therapeutic treatment for treating the new situation. However, the applications of decision support tools are rarely found in the prescription domain due to the complex nature of the domain and limitations of the existing tools. It was argued that existing tools can only solve a small amount of the cases on the real world dataset. This paper proposes a new revised Case-based Reasoning (CBR) mechanism, named Rule-Associated CasE-based Reasoning (RACER), which integrates CBR and association rules mining for supporting GPs’ prescription. It aims at leveraging the two most common techniques in the field and dealing with the complex multiple values solution. Eight hundred real cases from a medical organization are collected and used for evaluating the performance of RACER. The proposed method was also compared with CBR and association rules mining for testing. The results demonstrate that the combination leads to increases in both recall and precision in various settings of parameters. The performance of RACER remains stable by using different sets of parameters, which shows that the most important element of the mechanism is self-determined. (C) 2010 Elsevier Ltd. All rights reserved.
Country of Research	Hong Kong
Design of Study	Unclear
Duration of Study	NS
Name of Condition	Different Healthcare problem
Artificial Intelligence Technique Used	Case-based Reasoning (CBR) mechanism, named Rule-Associated CasE-based Reasoning (RACER)
Provider’s involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	NS, (CBR does not have the tendency to overgeneralize, and thus CBR can achieve excellent accuracy provided that it generates the solutions from the memorized cases)
Patient-related Outcomes Assessed	NS
Primary Healthcare Worker Related Outcomes Assessed	This paper presents a hybrid approach, RACER, which integrates CBR and association rules mining for supporting the prescription making of GPs. By taking the specific experiential knowledge (i.e. from cases) and general knowledge in the medical records (i.e. from the associative relationship between clinical findings and medicines being prescribed) into consideration, the proposed approach is able to leverage and compensate both kinds of knowledge, so as to provide better decision support.
Healthcare System-related Outcomes Assessed	NS
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : GPs but number not specified
Implementation	This paper proposes a new revised Case-based Reasoning (CBR) mechanism, named Rule-Associated CasE-based Reasoning (RACER), which integrates CBR and association rules mining for supporting GPs’ prescription. It aims at leveraging the two most common techniques in the field and dealing with the complex multiple values solution. Eight hundred real cases from a medical organization are collected and used for evaluating the performance of RACER.
Maintenance	Not specified (Unclear)
Key Conclusions	This paper presents a hybrid approach, RACER, which integrates CBR and association rules mining for supporting the prescription making of GPs. By taking the specific experiential knowledge (i.e. from cases) and general knowledge in the medical records (i.e. from the associative relationship between clinical findings and medicines being prescribed) into consideration, the proposed approach is able to leverage and compensate both kinds of knowledge, so as to provide better decision support. This paper also introduces a new ranking measurement for assigning a likelihood ratio for each medicine extracted from the cases. A series of experiments has been carried out for measuring the performance of RACER against CBR and association rules mining by using real prescription data. The results showed that RACER outperforms the other two approaches in various settings. The performance of RACER remains stable by using different sets of parameters, which shows that it is not necessary to know what the appropriate settings for the RACER are in advance. The next research stage will be to select the appropriate features and parameters in order to optimize the accuracy of the algorithm and also to design a user-friendly interface for the GPs to apply RACER in their daily operations.
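
The association rules mining component mentioned for Paper 74 relies on the standard support and confidence measures for a rule of the form "clinical finding → medicine". The sketch below shows only those two textbook measures on hypothetical case records; it is not the RACER ranking measurement, and the field names, findings, and drug labels are invented for illustration.

```python
def rule_metrics(cases, finding, medicine):
    """Return (support, confidence) for the rule: finding -> medicine.

    support    = share of all cases containing both the finding and the medicine
    confidence = share of cases with the finding that also contain the medicine
    """
    if not cases:
        raise ValueError("at least one case is required")
    with_finding = [c for c in cases if finding in c["findings"]]
    with_both = [c for c in with_finding if medicine in c["medicines"]]
    support = len(with_both) / len(cases)
    confidence = len(with_both) / len(with_finding) if with_finding else 0.0
    return support, confidence


# Hypothetical prescription records (not data from the study).
cases = [
    {"findings": {"cough", "fever"}, "medicines": {"drug_A", "drug_B"}},
    {"findings": {"fever"}, "medicines": {"drug_A"}},
    {"findings": {"cough"}, "medicines": {"drug_B"}},
    {"findings": {"fever", "headache"}, "medicines": {"drug_C"}},
]
support, confidence = rule_metrics(cases, "fever", "drug_A")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```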

Paper 75

Paper Title: THE POTENTIAL FOR COMPUTER-AIDED DIAGNOSIS OF TROPICAL DISEASES IN DEVELOPING-COUNTRIES - AN EXPERT SYSTEM CASE-STUDY

Authors or developers

Doukidis, G. I.
Forster, D.

Year of Publication

1990

Full reference of the study

Georgios I. Doukidis, Dayo Forster, The potential for computer-aided diagnosis of tropical diseases in developing countries: An expert system case study, European Journal of Operational Research, Volume 49, Issue 2, 1990, Pages 271-278

Abstract

In industrialised countries, although extensive research into medical diagnosis expert systems over the past two decades proved initially promising, few expert systems are in routine use. In the context of developing countries, a different environment exists in which computer-aided diagnosis could have a substantial impact. Better resource allocation and management is required to alleviate the problems imposed on health systems by financial and personnel constraints. This paper proposes the development and implementation of ESTROPID, an Expert system on Tropical Diseases, to assist paramedical staff during training and in clinical practice. This forms part of a project designed to investigate the potential of various health information systems. The current ESTROPID prototype is evaluated and suggestions for future developments are given. The authors remain optimistic about the eventual realisation of this project.

Country of Research

Design of Study

Unclear : An expert system case study

Duration of Study

Name of Condition

Tropical diseases

Artificial Intelligence Technique Used

ESTROPID, an Expert system

Provider’s involvement in

Developing : Development and implementation of ESTROPID, an Expert system on Tropical Diseases, to assist paramedical staff during training and in clinical practice, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Overall accuracy of 92.5%.

Patient-related Outcomes Assessed

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Better resource allocation and management is required to alleviate the problems imposed on health systems by financial and personnel constraints. This paper proposes the development and implementation of ESTROPID, an Expert system on Tropical Diseases, to assist paramedical staff during training and in clinical practice.

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : Primary Health Care (PHC) villages.

Implementation

Development and implementation of ESTROPID, an Expert system on Tropical Diseases, to assist paramedical staff during training and in clinical practice. This forms part of a project designed to investigate the potential of various health information systems.

Maintenance

Key Conclusions

A lot of interest is being generated concerning the introduction of cost-effective and appropriate information technology applications into developing countries. A case study of a successful implementation of an information system in the Gambian environment could be a model of benefit to developers and policy makers. As yet, ESTROPID is in its infant stages and is to be regarded as part of a wider reaching project. Its diagnostic capabilities are rather limited at present and future work will incorporate the issues raised in the previous Section and also consider further aspects such as: computer hardware, staff requirements and training, the range of other possible applications, and a proposal for operating a compatible system nationwide. The overall concern is that from this project would emerge a practical system that has taken the economic, social and educational aspects into consideration and will ultimately be of use. The initial indications are positive.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	–	–

Color Code

Low

Unclear

High

Paper 76

Paper Title: Applying a human factors approach to improve usability of a decision support system in tele-nursing

Authors or developers	Tariq, Amina; Westbrook, Johanna; Byrne, Mary; Robinson, Maureen; Baysari, Melissa T.
Year of Publication	2017
Full reference of the study	Tariq, A., Westbrook, J., Byrne, M., Robinson, M., & Baysari, M. T. (2017). Applying a human factors approach to improve usability of a decision support system in tele-nursing. Collegian, 24(3), 227-236.
Abstract	Aim: To evaluate usability of a decision support system for telephone triage nurses. Background: Telephone triage by nurses has become an internationally accepted form of health service delivery to cope with increasing demands on primary and emergency care. Decision support software systems are used by nurses to facilitate the telephone triage process, yet the usability of these systems is rarely assessed. Method: We applied a multi-method human factors approach to evaluate the usability of decision support software used by Healthdirect Australia nurses during telephone triage. Methods included: (1) stakeholder discussions; (2) heuristic analysis by two independent experts across ten usability heuristics; and (3) interviews with system end users (n = 9). A list of heuristic violations with their severity ratings was developed. Qualitative content analysis of the interview transcripts was undertaken to validate the results of the heuristic evaluation. Findings: Forty-one unique heuristic violations were identified in the interface design of the decision support software with median severity of 2.25 (range 0-4, with 0 = no problem to 4 = catastrophic problem). The highest number of violations was observed for flexibility and efficiency of use (n = 12, median severity = 2.5) and for aesthetic and minimalist design (n = 11, median severity = 2). Interviews with nurses verified many of the violations identified in the heuristic analysis. Improving the navigational design of the system for flexibility and efficiency of use was identified as necessary by both the experts and end users. Conclusion: In adopting a multi-method human factors approach, we identified a number of system design features which may be impacting the safety and efficiency of the nurse telephone triage process. Addressing the identified usability issues and using feedback from end-users to modify the decision support system would optimise system use and so improve the triage process.
Country of Research	Australia
Design of Study	Unclear : Multi-method human factors approach employing three methods
Duration of Study	3 months, December 2014- Feb 2015
Name of Condition	Type 2 diabetes mellitus
Artificial Intelligence Technique Used	Call Enhance Call Centre software
Providers’ involvement in	Developing : Study applied a multi-method human factors approach to evaluate the usability of decision support software used by Healthdirect Australia nurses during telephone triage. Methods included: (1) stakeholder discussions; (2) heuristic analysis by two independent experts across ten usability heuristics; and (3) interviews with system end users (n = 9). A list of heuristic violations with their severity ratings was developed. Qualitative content analysis of the interview transcripts was undertaken to validate the results of the heuristic evaluation.
Accuracy of the AI Intervention	Design of the CeCC system to address these issues may improve efficiency and accuracy of patient documentation, the provision of information and care advice to callers, and the overall alignment of the software with actual triage processes. 41 unique violations were identified with a median severity of 2.25 (range 0-4, with 0 = no problem to 4 = catastrophic problem)
Patient-related Outcomes Assessed	Not Specified
Primary Healthcare Worker Related Outcomes Assessed	Personal preference via interviews regarding individual nurse perception of the system
Healthcare System-related Outcomes Assessed	Not Specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : 9 nurses.
Implementation	Yes
Maintenance	No
Key Conclusions	The authors identified a number of system design features which may be impacting on the safety and efficiency of the nurse telephone triage process.

Paper 77

Paper Title: Simple Prediction of Type 2 Diabetes Mellitus via Decision Tree Modeling

Authors or developers

Sayadi, Mehrab
Zibaeenezhad, Mohammadjavad
Taghi Ayatollahi, Seyyed Mohammad

Year of Publication

2017

Full reference of the study

Sayadi, M., Zibaeenezhad, M., & Ayatollahi, S. T. (2017). Simple prediction of type 2 diabetes mellitus via decision tree modeling. Int Cardiovasc Res J, 11(2), 71-76.

Abstract

Background: Type 2 Diabetes Mellitus (T2DM) is one of the most important risk factors in cardiovascular disorders considered as a common clinical and public health problem. Early diagnosis can reduce the burden of the disease. Decision tree, as an advanced data mining method, can be used as a reliable tool to predict T2DM. Objectives: This study aimed to present a simple model for predicting T2DM using decision tree modeling. Materials and Methods: This analytical model-based study used a part of the cohort data obtained from a database in Healthy Heart House of Shiraz, Iran. The data included routine information, such as age, gender, Body Mass Index (BMI), family history of diabetes, and systolic and diastolic blood pressure, which were obtained from the individuals referred for gathering baseline data in Shiraz cohort study from 2014 to 2015. Diabetes diagnosis was used as binary datum. Decision tree technique and J48 algorithm were applied using the WEKA software (version 3.7.5, New Zealand). Additionally, Receiver Operator Characteristic (ROC) curve and Area Under Curve (AUC) were used for checking the goodness of fit. Results: The age of the 11,302 cases obtained after data preparation ranged from 18 to 89 years with the mean age of 48.1 ± 11.4 years. Additionally, 51.1% of the cases were male. In the tree structure, blood pressure and age were placed where most information was gained. In our model, however, gender was not important and was placed on the final branch of the tree. Total precision and AUC were 87% and 89%, respectively. This indicated that the model had good accuracy for distinguishing patients from normal individuals. Conclusions: The results showed that T2DM could be predicted via decision tree model without laboratory tests. Thus, this model can be used in pre-clinical and public health screening programs.

Country of Research

Iran

Design of Study

Cohort study

Duration of Study

1 year,(2014-2015)

Name of Condition

Type 2 diabetes mellitus (Additional information about blood pressure (systolic/diastolic), BMI provided)

Artificial Intelligence Technique Used

Decision tree modelling,(J48 algorithm applied with WEKA software)

Provider’s involvement in

Developing : Not specified, Testing : Not specified, Validating : Not specified

Accuracy of the AI Intervention

Accuracy: 88%,(Overall average: True positive: 0.89, false positive: 0.52, precision: 87%, recall: 89%, F measure: 0.88, ROC area: 0.89 for healthy, diabetic)
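
The F measure reported above can be checked against the reported precision and recall. The following is a minimal sketch, assuming the F measure is the balanced harmonic mean of precision and recall (F1); the function name `f_measure` is illustrative and does not come from the paper or from WEKA.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall averages reported for the J48 decision tree.
reported_precision = 0.87
reported_recall = 0.89

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))  # 0.88, consistent with the reported F measure
```

Under that assumption, the three reported figures are mutually consistent.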

Patient-related Outcomes Assessed

Logistic regression analysis (coefficient): age: 0.061, gender: -0.024, BMI: 0.06, BP (diastolic): 0.085, BP (systolic): 0.016, family history: 0.695,(Type 2 diabetes prevalence: 12.1%)
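
Logistic regression coefficients are easier to read once converted to odds ratios. The sketch below assumes the values listed above are log-odds coefficients from a standard logistic regression, so that the odds ratio per unit of a predictor is exp(coefficient). The units of each predictor are not stated in this record, and the dictionary and function names are illustrative only.

```python
import math

# Coefficients as listed in this record (assumed to be log-odds).
coefficients = {
    "age": 0.061,
    "gender": -0.024,
    "BMI": 0.06,
    "BP (diastolic)": 0.085,
    "BP (systolic)": 0.016,
    "family history": 0.695,
}


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


for name, beta in coefficients.items():
    print(f"{name}: odds ratio = {odds_ratio(beta):.3f}")
```

On this reading, a family history of diabetes roughly doubles the odds (exp(0.695) is about 2.0), while the coefficient for gender corresponds to an odds ratio close to 1.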

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Yes

Adoption

Yes (number of providers i.e. PHC participating) : The study applied decision tree using real data and variables in primary healthcare and surveillance system.

Implementation

The study used electronic medical data.

Maintenance

Not specified (Unclear)

Key Conclusions

The study defines a decision tree model that can predict type 2 diabetes mellitus effectively without laboratory tests

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	+	+

Color Code

Low

Unclear

High

Paper 78

Paper Title: An annotation and modeling schema for prescription regimens

Authors or developers	Aberdeen, J.; Bayer, S.; Clark, C.; Keybl, M.; Tresner-Kirsch, D.
Year of Publication	2019
Full reference of the study	Aberdeen, J., Bayer, S., Clark, C. et al. An annotation and modeling schema for prescription regimens. J Biomed Semant 10, 10 (2019). https://doi.org/10.1186/s13326-019-0201-9
Abstract	BACKGROUND: We introduce TranScriptML, a semantic representation schema for prescription regimens allowing various properties of prescriptions (e.g. dose, frequency, route) to be specified separately and applied (manually or automatically) as annotations to patient instructions. In this paper, we describe the annotation schema, the curation of a corpus of prescription instructions through a manual annotation effort, and initial experiments in modeling and automated generation of TranScriptML representations. RESULTS: TranScriptML was developed in the process of curating a corpus of 2,914 ambulatory prescriptions written within the Partners Healthcare network, and its schema is informed by the content of that corpus. We developed the representation schema as a novel set of semantic tags for prescription concept categories (e.g. frequency); each tag label is defined with an accompanying attribute framework in which the meaning of tagged concepts can be specified in a normalized fashion. We annotated a subset (1,746) of this dataset using cross-validation and reconciliation between multiple annotators, and used Conditional Random Field machine learning and various other methods to train automated annotation models based on the manual annotations. The TranScriptML schema implementation, manual annotation, and machine learning were all performed using the MITRE Annotation Toolkit (MAT). We report that our annotation schema can be applied with varying levels of pairwise agreement, ranging from low agreement levels (0.125 F for the relatively rare REFILL tag) to high agreement levels approaching 0.9 F for some of the more frequent tags. We report similarly variable scores for modeling tag labels and spans, averaging 0.748 F-measure with balanced precision and recall. The best of our various attribute modeling methods captured most attributes with accuracy above 0.9. CONCLUSIONS: We have described an annotation schema for prescription regimens, and shown that it is possible to annotate prescription regimens at high accuracy for many tag types. We have further shown that many of these tags and attributes can be modeled at high accuracy with various techniques. By structuring the textual representation through annotation enriched with normalized values, the text can be compared against the pharmacist-entered structured data, offering an opportunity to detect and correct discrepancies.
Country of Research	USA
Design of Study	Unclear
Duration of Study	Not Specified
Name of Condition	Not Specified
Artificial Intelligence Technique Used	Conditional random field
Providers’ involvement in	Developing : NS, Testing : NS, Validating : NS
Accuracy of the AI Intervention	Accuracies of 83.7% for dose, 88.0% for route of administration, and 83.2% for frequency, Training with a bias towards precision boosts precision significantly (to 0.996), at the expense of recall (0.407). Surprisingly, training with a bias towards recall fails to boost recall (0.726) but does lower precision (0.651)
Patient-related Outcomes Assessed	Not Specified
Primary Healthcare Worker Related Outcomes Assessed	Not Specified
Healthcare System-related Outcomes Assessed	Not Specified
Reached Target Population?	Yes : The study introduces TranScriptML, a semantic representation schema for prescription regimens allowing various properties of prescriptions (e.g. dose, frequency, route) to be specified separately and applied (manually or automatically) as annotations to patient instructions. The paper describes the annotation schema, the curation of a corpus of prescription instructions through a manual annotation effort, and initial experiments in modeling and automated generation of TranScriptML representations.
Adoption	Not specified
Implementation	TranScriptML was developed in the process of curating a corpus of 2,914 ambulatory prescriptions written within the Partners Healthcare network, and its schema is informed by the content of that corpus. We developed the representation schema as a novel set of semantic tags for prescription concept categories (e.g. frequency); each tag label is defined with an accompanying attribute framework in which the meaning of tagged concepts can be specified in a normalized fashion. We annotated a subset (1,746) of this dataset using cross-validation and reconciliation between multiple annotators, and used Conditional Random Field machine learning and various other methods to train automated annotation models based on the manual annotations. The TranScriptML schema implementation, manual annotation, and machine learning were all performed using the MITRE Annotation Toolkit (MAT). We report that our annotation schema can be applied with varying levels of pairwise agreement, ranging from low agreement levels (0.125 F for the relatively rare REFILL tag) to high agreement levels approaching 0.9 F for some of the more frequent tags. We report similarly variable scores for modeling tag labels and spans, averaging 0.748 F-measure with balanced precision and recall. The best of our various attribute modeling methods captured most attributes with accuracy above 0.9.
Maintenance	Not specified
Key Conclusions	We have described an annotation schema for prescription regimens, and shown that it is possible to annotate prescription regimens at high accuracy for many tag types. We have further shown that many of these tags and attributes can be modeled at high accuracy with various techniques. By structuring the textual representation through annotation enriched with normalized values, the text can be compared against the pharmacist-entered structured data, offering an opportunity to detect and correct discrepancies.

Paper 79

Paper Title: Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices

Authors or developers

Abramoff, M. D.
Lavin, P. T.
Birch, M.
Shah, N.
Folk, J. C.

Year of Publication

2018

Full reference of the study

Abràmoff, M. D., Lavin, P. T., Birch, M., Shah, N., & Folk, J. C. (2018). Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ digital medicine, 1(1), 1-8.

Abstract

Artificial Intelligence (AI) has long promised to increase healthcare affordability, quality and accessibility but FDA, until recently, had never authorized an autonomous AI diagnostic system. This pivotal trial of an AI system to detect diabetic retinopathy (DR) in people with diabetes enrolled 900 subjects, with no history of DR at primary care clinics, by comparing to Wisconsin Fundus Photograph Reading Center (FPRC) widefield stereoscopic photography and macular Optical Coherence Tomography (OCT), by FPRC certified photographers, and FPRC grading of Early Treatment Diabetic Retinopathy Study Severity Scale (ETDRS) and Diabetic Macular Edema (DME). More than mild DR (mtmDR) was defined as ETDRS level 35 or higher, and/or DME, in at least one eye. AI system operators underwent a standardized training protocol before study start. Median age was 59 years (range, 22-84 years); 47.5% of participants were male; 16.1% were Hispanic, 83.3% not Hispanic; 28.6% African American and 63.4% were not; 198 (23.8%) had mtmDR. The AI system exceeded all pre-specified superiority endpoints at sensitivity of 87.2% (95% CI, 81.8-91.2%) (>85%), specificity of 90.7% (95% CI, 88.3-92.7%) (>82.5%), and imageability rate of 96.1% (95% CI, 94.6-97.3%), demonstrating AI’s ability to bring specialty-level diagnostics to primary care settings. Based on these results, FDA authorized the system for use by health care providers to detect more than mild DR and diabetic macular edema, making it the first FDA authorized autonomous AI diagnostic system in any field of medicine, with the potential to help prevent vision loss in thousands of people with diabetes annually. ClinicalTrials.gov NCT02963441.

Country of Research

USA

Design of Study

Observational study

Duration of Study

7 months,(January 2017-July 2017)

Name of Condition

Diabetic retinopathy

Artificial Intelligence Technique Used

Image quality algorithm

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

AUC: 0.980 (95% CI 0.968-0.992), Sensitivity of 87.2% (95% CI, 81.8-91.2%) (>85%), specificity of 90.7% (95% CI, 88.3-92.7%) (>82.5%), and imageability rate of 96.1% (95% CI, 94.6-97.3%)

Patient-related Outcomes Assessed

First FDA authorized autonomous AI diagnostic system in any field of medicine, with the potential to help prevent vision loss in thousands of people with diabetes annually. Prevalence of more than mild diabetic retinopathy (mtmDR) in this representative sample was 23.8%.

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

This physiologically plausible system, with explicit, multiple, partially dependent detectors and a separate module for higher level clinical decisions, has parallels in the human and primate ventral visual cortex, with specific sub-regions dedicated to the detection of particular categories of objects.

Reached Target Population?

Yes

Adoption

Not specified

Implementation

The clinically inspired algorithm has independent, validated detectors for the lesions characteristic of diabetic retinopathy, including microaneurysms, hemorrhages and lipoprotein exudates, the outputs of which are then fused into a disease-level output using a separately trained and validated machine learning algorithm. The detectors have been implemented as multilayer convolutional neural networks (CNN), except the microaneurysm detector, which is a multiscale feature bank detector, with substantially improved performance on a standardized laboratory dataset. In a laboratory study, its area under the receiver operating characteristic curve (AUC) of 0.980 (95% CI 0.968-0.992) was not statistically different from a perfect algorithm.

Maintenance

Not specified (Unclear)

Key Conclusions

Based on the results of this research, the FDA authorized the system for use by health care providers to detect more than mild diabetic retinopathy and diabetic macular edema, making it the first FDA authorized autonomous AI diagnostic system in any field of medicine, with the potential to help prevent vision loss in thousands of people with diabetes annually.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	–	+

Color Code

Low

Unclear

High

Paper 80

Paper Title: TOWARD EVIDENCE-BASED PRACTICE. Predicting Common Maternal Postpartum Complications: Leveraging Health Administrative Data and Machine Learning

Authors or developers

Adams, Ellise

Year of Publication

2019

Full reference of the study

Betts KS, Kisely S, Alati R. Predicting common maternal postpartum complications: leveraging health administrative data and machine learning. BJOG 2019;126:702-709.

Abstract

Not specified

Country of Research

Australia

Design of Study

Unclear

Duration of Study

Six years nine months,(Administrative health data of all inpatient live births (n = 422,509) in the Australian state of Queensland between January 2009 and October 2015.)

Name of Condition

Maternal Postpartum Complications

Artificial Intelligence Technique Used

Gradient boosting trees

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

AUC-ROC: Development: Hypertension: 0.922, haemorrhage: 0.700, sepsis: 0.643, wound infection: 0.873; Validation: Hypertension: 0.879 (0.846, 0.912), haemorrhage: 0.690 (0.663, 0.716), sepsis: 0.660 (0.630, 0.690), wound infection: 0.856 (0.838, 0.873); five-fold cross-validation

Patient-related Outcomes Assessed

Importance of 20 predictors defined

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Gradient boosted trees were used with five-fold cross-validation to compare model performance. The best performing models for each outcome were then assessed in the independent validation data using the area under the receiver operating characteristic curve (AUC-ROC).
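
The evaluation side of this procedure (splitting data into five folds and scoring each fold by AUC-ROC) can be sketched with the standard library alone. This is an illustration only, not the study's pipeline: no gradient boosted model is trained here, the labels and scores are synthetic placeholders, and the function names `auc_roc` and `k_fold_indices` are hypothetical.

```python
import random


def auc_roc(labels, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case, with ties counted as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Synthetic placeholder data: every third case is positive, and the
# "model score" is a noisy value that tends to be higher for positives.
rng = random.Random(42)
labels = [1 if i % 3 == 0 else 0 for i in range(200)]
scores = [0.5 * y + rng.random() for y in labels]

fold_aucs = []
for fold in k_fold_indices(len(labels), k=5):
    fold_aucs.append(auc_roc([labels[i] for i in fold],
                             [scores[i] for i in fold]))

print([round(a, 3) for a in fold_aucs])
```

In a full cross-validation run, a model would be refitted on the other four folds before each held-out fold is scored; the placeholder scores above stand in for those predictions.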

Maintenance

Not specified (Unclear)

Key Conclusions

Study suggests that routinely collected health data have the potential to play an important role in helping determine women’s risk of common postpartum complications leading to hospital admission. This information can be presented to clinical staff after delivery to help guide immediate postpartum care, delayed discharge, and post-discharge patient follow up. For such a system to be effective and valued, it must produce accurate predictions, and our findings suggest areas where routine data collection could be strengthened to this end.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	?	+

Color Code

Low

Unclear

High

Paper 81

Paper Title: Risk Assessment for Parents Who Suspect Their Child Has Autism Spectrum Disorder: Machine Learning Approach

Authors or developers

Ben-Sasson, Ayelet
Robins, Diana L.
Yom-Tov, Elad

Year of Publication

2018

Full reference of the study

Ben-Sasson A, Robins DL, Yom-Tov E. Risk Assessment for Parents Who Suspect Their Child Has Autism Spectrum Disorder: Machine Learning Approach. J Med Internet Res. 2018;20(4):e134. Published 2018 Apr 24. doi:10.2196/jmir.9496

Abstract

Background: Parents are likely to seek Web-based communities to verify their suspicions of autism spectrum disorder markers in their child. Automated tools support human decisions in many domains and could therefore potentially support concerned parents. Objective: The objective of this study was to test the feasibility of assessing autism spectrum disorder risk in parental concerns from Web-based sources, using automated text analysis tools and minimal standard questioning. Methods: Participants were 115 parents with concerns regarding their child’s social-communication development. Children were 16- to 30-months old, and 57.4% (66/115) had a family history of autism spectrum disorder. Parents reported their concerns online, and completed an autism spectrum disorder-specific screener, the Modified Checklist for Autism in Toddlers-Revised, with Follow-up (M-CHAT-R/F), and a broad developmental screener, the Ages and Stages Questionnaire (ASQ). An algorithm predicted autism spectrum disorder risk using a combination of the parent’s text and a single screening question, selected by the algorithm to enhance prediction accuracy. Results: Screening measures identified 58% (67/115) to 88% (101/115) of children at risk for autism spectrum disorder. Children with a family history of autism spectrum disorder were three times more likely to show autism spectrum disorder risk on screening measures. The prediction of a child’s risk on the ASQ or M-CHAT-R was significantly more accurate when predicted from text combined with an M-CHAT-R question selected (automatically) than from the text alone. The frequently automatically selected M-CHAT-R questions that predicted risk were: following a point, make-believe play, and concern about deafness. Conclusions: The internet can be harnessed to pre-screen for autism spectrum disorder using parental concerns by administering a few standardized screening questions to augment this process.

Country of Research

Israel

Design of Study

Unclear

Duration of Study

Name of Condition

Autism

Artificial Intelligence Technique Used

Machine Learning Automatic Assessment of Autism Spectrum Disorder Risk From Text

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

AUC: M-CHAT-R risk (R/F: 0.39, R: 0.54), ASQ factor (risk: 0.55, communication: 0.60, personal social: 0.36, communication or personal social: 0.49)

Patient-related Outcomes Assessed

Modified Checklist for Autism in Toddlers-Revised, with Follow-up assessment,(Spearman correlation expert rating (M-CHAT-R: 0.43, M-CHAT-R/F: 0.36, ASQ communication: -0.21, ASQ personal-social: -0.26, ASQ gross motor: -0.26, ASQ fine motor: -0.21, ASQ problem solving: -0.29))

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Yes

Maintenance

Not specified (Unclear)

Key Conclusions

This study aimed to test the possibility of automatically estimating a child’s risk for ASD based on his/her parent’s description of concerns. Utilizing ML methods, the study showed satisfactory performance of prediction models, relying on the parent’s text and a particular M-CHAT-R question that complemented that unique text.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	?	?

Color Code

Low

Unclear

High

Paper 82

Paper Title: AUTOMATIC DIAGNOSIS OF RHEUMATOID ARTHRITIS FROM HAND RADIOGRAPHS USING CONVOLUTIONAL NEURAL NETWORKS

Authors or developers

Betancourt-Hernandez, M.
Viera-Lopez, G.
Serrano-Munoz, A.

Year of Publication

2018

Full reference of the study

Betancourt-Hernández, M., Viera-López, G., & Serrano-Muñoz, A. (2018). Automatic Diagnosis of Rheumatoid Arthritis from Hand Radiographs using Convolutional Neural Networks. Revista Cubana de Física, 35(1), 39-43.

Abstract

The traditional diagnosis method of Rheumatoid Arthritis (RA) consists in the evaluation of hand and foot radiographs. However, even for medical specialists this is a complex task, because the correct diagnosis of the disease often depends on detecting changes that are very subtle to the human eye. In this work, we developed a system based on Artificial Intelligence (AI), using Convolutional Neural Networks (CNN) for the automatic detection of RA from hand radiographs. The model efficiency was measured with 15 cases, achieving an accuracy of 100%. The results of the experiments conducted showed a superior performance compared to similar state-of-the-art systems reported in the consulted bibliography. This model would be useful for Cuban medicine as a diagnostic tool.

Country of Research

Cuba

Design of Study

Unclear

Duration of Study

Not Specified

Name of Condition

Rheumatoid arthritis

Artificial Intelligence Technique Used

Convolutional neural network, LeNet, Network in Network, SqueezeNet

Providers’ involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Accuracy: LeNet: 93%, Network in network: 93%, Squeezenet: 100%
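
Because the abstract reports that the model was measured on 15 cases, these accuracy figures carry wide uncertainty. The sketch below computes a 95% Wilson score interval for a proportion; it is a reader's illustration, not part of the paper. Treating 93% as 14 correct out of 15 is an assumption (the record does not give the counts), and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# 15 of 15 correct (reported 100%); 14 of 15 is an assumed reading of 93%.
for correct in (15, 14):
    low, high = wilson_interval(correct, 15)
    print(f"{correct}/15 correct: 95% interval {low:.3f} to {high:.3f}")
```

With only 15 cases, even a perfect score is compatible with a true accuracy of roughly 80%, which is worth keeping in mind when comparing these figures with larger studies.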

Patient-related Outcomes Assessed

Not Specified

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes : NS

Adoption

Implementation

Not Specified

Maintenance

Yes

Key Conclusions

The results of the experiments conducted showed a superior performance compared to similar state-of-the-art systems reported in the consulted bibliography.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	+	–

Color Code

Low

Unclear

High

Paper 83

Paper Title: Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections

Authors or developers	Burton, R. J.; Albur, M.; Eberl, M.; Cuff, S. M.
Year of Publication	2019
Full reference of the study	Burton, R. J., Albur, M., Eberl, M., & Cuff, S. M. (2019). Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections. BMC medical informatics and decision making, 19(1), 171.
Abstract	BACKGROUND: A substantial proportion of microbiological screening in diagnostic laboratories is due to suspected urinary tract infections (UTIs), yet approximately two thirds of urine samples typically yield negative culture results. By reducing the number of query samples to be cultured and enabling diagnostic services to concentrate on those in which there are true microbial infections, a significant improvement in efficiency of the service is possible. METHODOLOGY: Screening process for urine samples prior to culture was modelled in a single clinical microbiology laboratory covering three hospitals and community services across Bristol and Bath, UK. Retrospective analysis of all urine microscopy, culture, and sensitivity reports over one year was used to compare two methods of classification: a heuristic model using a combination of white blood cell count and bacterial count, and a machine learning approach testing three algorithms (Random Forest, Neural Network, Extreme Gradient Boosting) whilst factoring in independent variables including demographics, historical urine culture results, and clinical details provided with the specimen. RESULTS: A total of 212,554 urine reports were analysed. Initial findings demonstrated the potential for using machine learning algorithms, which outperformed the heuristic model in terms of relative workload reduction achieved at a classification sensitivity > 95%. Upon further analysis of classification sensitivity of subpopulations, we concluded that samples from pregnant patients and children (age 11 or younger) require independent evaluation. First the removal of pregnant patients and children from the classification process was investigated but this diminished the workload reduction achieved. The optimal solution was found to be three Extreme Gradient Boosting algorithms, trained independently for the classification of pregnant patients, children, and then all other patients. When combined, this system granted a relative workload reduction of 41% and a sensitivity of 95% for each of the stratified patient groups. CONCLUSION: Based on the considerable time and cost savings achieved, without compromising the diagnostic performance, the heuristic model was successfully implemented in routine clinical practice in the diagnostic laboratory at Severn Pathology, Bristol. Our work shows the potential application of supervised machine learning models in improving service efficiency at a time when demand often surpasses resources of public healthcare providers.
Country of Research	UK
Design of Study	Cohort study, Unclear, Retrospective
Duration of Study	1 year (October 2016-October 2017)
Name of Condition	Urinary tract infection
Artificial Intelligence Technique Used	Random Forest, Neural Network, Extreme Gradient Boosting (XGBoost); heuristic model as comparator
Provider’s involvement in	Developing : NS, Testing : A machine learning approach testing three algorithms (Random Forest, Neural Network, Extreme Gradient Boosting) whilst factoring in independent variables including demographics, historical urine culture results, and clinical details provided with the specimen, Validating : NS
Accuracy of the AI Intervention	Accuracy: heuristic model (63.92%), random forest (71.96%), neural network (85%), neural network with resampling (79.35%), XGBoost (65.68%). AUC: random forest (0.908), neural network (0.906), neural network with resampling (0.904), XGBoost (0.910)
Patient-related Outcomes Assessed	Amongst the groupings generated from clinical details, Pregnant and Persistent/Recurrent Infection contributed to the largest proportion of the overall data, with all other groups consisting of less than 12% of the data set
Primary Healthcare Worker Related Outcomes Assessed	NS
Healthcare System-related Outcomes Assessed	NS
Reached Target Population?	Yes
Adoption	Not specified
Implementation	Screening process for urine samples prior to culture was modelled in a single clinical microbiology laboratory covering three hospitals and community services across Bristol and Bath, UK. Retrospective analysis of all urine microscopy, culture, and sensitivity reports over one year was used to compare two methods of classification: a heuristic model using a combination of white blood cell count and bacterial count, and a machine learning approach testing three algorithms (Random Forest, Neural Network, Extreme Gradient Boosting) whilst factoring in independent variables including demographics, historical urine culture results, and clinical details provided with the specimen.
Maintenance	Not specified (Unclear)
Key Conclusions	Based on the considerable time and cost savings achieved, without compromising the diagnostic performance, the heuristic model was successfully implemented in routine clinical practice in the diagnostic laboratory at Severn Pathology, Bristol. Our work shows the potential application of supervised machine learning models in improving service efficiency at a time when demand often surpasses resources of public healthcare providers.
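
The trade-off reported for this paper (workload reduction at a fixed classification sensitivity) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function name, the toy scores, and the rule that samples scoring below the threshold skip culture are all assumptions made for the example.

```python
def workload_reduction_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Return (threshold, sensitivity, workload_reduction).

    scores: predicted probability of a positive culture (hypothetical model output)
    labels: 1 = culture positive, 0 = culture negative
    Assumption: samples with score < threshold are not sent for culture.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive sample")
    best = None
    # Try each observed score as a threshold, from lowest to highest.
    for threshold in sorted(set(scores)):
        flagged = [s >= threshold for s in scores]
        true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        sensitivity = true_pos / positives
        if sensitivity < target_sensitivity:
            break  # raising the threshold further can only lower sensitivity
        skipped = sum(1 for f in flagged if not f)
        best = (threshold, sensitivity, skipped / len(scores))
    return best


# Toy data (hypothetical): ten samples, five of them culture positive.
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
result = workload_reduction_at_sensitivity(toy_scores, toy_labels)
```

On the toy data the highest threshold that keeps sensitivity at or above 95% is 0.40, which leaves four of the ten samples uncultured.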

Paper 84

Paper Title: Design of a Clinical Decision Support System for Predicting Erectile Dysfunction in Men Using NHIRD Dataset

Authors or developers

Chen, Y. F.
Lin, C. S.
Hong, C. F.
Lee, D. J.
Sun, C.
Lin, H. H.

Year of Publication

2019

Full reference of the study

Chen YF, Lin CS, Hong CF, Lee DJ, Sun C, Lin HH. Design of a Clinical Decision Support System for Predicting Erectile Dysfunction in Men Using NHIRD Dataset. IEEE J Biomed Health Inform. 2019;23(5):2127-2137. doi:10.1109/JBHI.2018.2877595

Abstract

Erectile dysfunction (ED) affects millions of men worldwide. Men with ED generally complain of failure to attain or maintain an adequate erection during sexual activity. The prevalence of ED is strongly correlated with age, affecting about 40% of men at age 40 and nearly 70% at age 70. A variety of chronic diseases, including diabetes, ischemic heart disease, congestive heart failure, hypertension, depression, chronic renal failure, obstructive sleep apnea, prostate disease, gout, and sleep disorder, were reported to be associated with ED. In this study, data retrieved from a subset of the National Health Insurance Research Database of Taiwan were used for designing the clinical decision support system (CDSS) for predicting ED incidences in men. The positive cases were male patients aged 20-65 who were diagnosed with ED between January 2000 and December 2010 confirmed by at least three outpatient visits or at least one inpatient visit, while the negative cases were randomly selected from the database without a history of ED and were frequency (1:1), age, and index year matched with the ED patients. Data of a total of 2,832 ED patients and 2,832 non-ED patients, each consisting of 41 features including index age, 10 comorbidities, and 30 other comorbidity-related variables, were retrieved for designing the predictive models. Integrated genetic algorithm and support vector machine was adopted to design the CDSSs with two experiments of independent training and testing (ITT) conducted to verify their effectiveness. In the first ITT experiment, data extracted from January 2000 until December 2005 (61.51%, 1,742 positive cases and 1,742 negative cases) were used for training and validating; the data retrieved from January 2006 until December 2007 were used for testing (38.49%); in the second ITT experiment, data in the training set (77.78%) were extracted from January 2000 until December 2007 and those in the testing set (22.22%) were retrieved afterward. 
Tenfold cross-validation and three different objective functions were adopted for obtaining the optimal models with best predictive performance in the training phase. The testing results show that the CDSSs achieved a predictive performance with accuracy, sensitivity, specificity, g-mean, and area under ROC curve of 74.72%-76.65%, 72.33%-83.76%, 69.54%-77.10%, 0.7468-0.7632, and 0.766-0.817, respectively. In conclusion, the CDSSs designed based on cost-sensitive objective functions as well as salient comorbidity-related features achieve satisfactory predictive performance for predicting ED incidences.

Country of Research

Taiwan

Design of Study

Unclear

Duration of Study

10 years 11 months, Jan 2000-Dec 2010

Name of Condition

Erectile dysfunction (ED)

Artificial Intelligence Technique Used

Genetic Algorithm (GA), Support Vector Machine (SVM), Logistic regression

Providers’ involvement in

Testing: integrated genetic algorithm (GA) and support vector machine (SVM), namely IGS, were used for designing the CDSSs with the former adopted for selecting salient features and adjusting the SVM parameters (cost value and kernel parameter), whereas the latter for classifying different classes and calculating fitness values based on the objective functions.

Accuracy of the AI Intervention

The testing results show that the CDSSs achieved a predictive performance with accuracy, sensitivity, specificity, g-mean, and area under ROC curve (AUC) of 74.72-76.65%, 72.33-83.76%, 69.54-77.10%, 0.7468-0.7632, and 0.766-0.817, respectively
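
Two of the figures above can be checked or illustrated with a few lines of arithmetic. The sketch below assumes the usual definition of g-mean as the square root of sensitivity times specificity. The function names are hypothetical, and the sensitivity/specificity pair is an illustrative value, not one of the reported models (the source gives only ranges across models).

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both as fractions in [0, 1])."""
    return math.sqrt(sensitivity * specificity)


def share_percent(part, total):
    """Percentage of `total` made up by `part`, rounded to two decimals."""
    return round(100.0 * part / total, 2)


# Training share in the first ITT experiment, from the counts in the abstract:
# 1,742 positive + 1,742 negative cases out of 2,832 + 2,832 records.
train_share = share_percent(1742 + 1742, 2832 + 2832)

# Illustrative (hypothetical) sensitivity/specificity pair, not a reported model.
example_g_mean = g_mean(0.80, 0.72)
```

The computed training share reproduces the 61.51% stated in the abstract, leaving 38.49% for testing.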

Patient-related Outcomes Assessed

Not Specified

Primary Healthcare Worker Related Outcomes Assessed

Doctors

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Maintenance

Not specified

Key Conclusions

The clinical decision support system, designed based on cost-sensitive objective functions as well as salient comorbidity-related features, achieves satisfactory predictive performance for predicting ED incidences.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	?	+

Color Code

Low

Unclear

High

Paper 85

Paper Title: Predicting atrial fibrillation in primary care using machine learning

Authors or developers

Hill, N. R.
Ayoubkhani, D.
McEwan, P.
Sugrue, D. M.
Farooqui, U.
Lister, S.
Lumley, M.
Bakhai, A.
Cohen, A. T.
O’Neill, M.
Clifton, D.
Gordon, J.

Year of Publication

2019

Full reference of the study

Hill, N. R., Ayoubkhani, D., McEwan, P., Sugrue, D. M., Farooqui, U., Lister, S., … & Clifton, D. (2019). Predicting atrial fibrillation in primary care using machine learning. PloS one, 14(11), e0224582.

Abstract

BACKGROUND: Atrial fibrillation (AF) is the most common sustained heart arrhythmia. However, as many cases are asymptomatic, a large proportion of patients remain undiagnosed until serious complications arise. Efficient, cost-effective detection of the undiagnosed may be supported by risk-prediction models relating patient factors to AF risk. However, there exists a need for an implementable risk model that is contemporaneous and informed by routinely collected patient data, reflecting the real-world pathology of AF. METHODS: This study sought to develop and evaluate novel and conventional statistical and machine learning models for risk-prediction of AF. This was a retrospective, cohort study of adults (aged >=30 years) without a history of AF, listed on the Clinical Practice Research Datalink, from January 2006 to December 2016. Models evaluated included published risk models (Framingham, ARIC, CHARGE-AF), machine learning models, which evaluated baseline and time-updated information (neural network, LASSO, random forests, support vector machines), and Cox regression. RESULTS: Analysis of 2,994,837 individuals (3.2% AF) identified time-varying neural networks as the optimal model achieving an AUROC of 0.827 vs. 0.725, with number needed to screen of 9 vs. 13 patients at 75% sensitivity, when compared with the best existing model CHARGE-AF. The optimal model confirmed known baseline risk factors (age, previous cardiovascular disease, antihypertensive medication usage) and identified additional time-varying predictors (proximity of cardiovascular events, body mass index (both levels and changes), pulse pressure, and the frequency of blood pressure measurements). CONCLUSION: The optimal time-varying machine learning model exhibited greater predictive performance than existing AF risk models and reflected known and new patient risk factors for AF.

Country of Research

Design of Study

Cohort study, Observational study, Unclear, retrospective cohort

Duration of Study

10 years (January 2006 to December 2016)

Name of Condition

Atrial fibrillation

Artificial Intelligence Technique Used

Machine learning models evaluating baseline and time-updated information (neural network, LASSO, random forests, support vector machines); comparators: published risk models (Framingham, ARIC, CHARGE-AF) and Cox regression.

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

Final machine learning model: specificity 74.9%, positive predictive value 11.5%, AUROC 0.827; CHARGE-AF risk model: specificity 61%, positive predictive value 7.9%, AUROC 0.725; Logistic regression model: specificity 52%, positive predictive value 6.5%, AUROC 0.695
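
The positive predictive values above can be related to the "number needed to screen" quoted in the abstract (9 vs. 13). The sketch below assumes the simple reading NNS = 1 / PPV, rounded up to a whole patient. That formula and the function name are assumptions for illustration, since the source does not state how the authors computed it.

```python
import math


def number_needed_to_screen(ppv_percent):
    """Patients screened per case found, assuming NNS = 1 / PPV, rounded up."""
    if not 0 < ppv_percent <= 100:
        raise ValueError("PPV must be in (0, 100]")
    return math.ceil(100.0 / ppv_percent)


# Positive predictive values (%) as listed in this entry.
reported_ppv = {
    "Final machine learning model": 11.5,
    "CHARGE-AF": 7.9,
    "Logistic regression": 6.5,
}
nns = {name: number_needed_to_screen(ppv) for name, ppv in reported_ppv.items()}
```

Under this assumption the first two values come out at 9 and 13, matching the abstract.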

Patient-related Outcomes Assessed

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Not specified

Implementation

This study sought to develop and evaluate novel and conventional statistical and machine learning models for risk-prediction of AF. This was a retrospective cohort study of adults (aged >=30 years) without a history of AF, listed on the Clinical Practice Research Datalink, from January 2006 to December 2016. Models evaluated included published risk models (Framingham, ARIC, CHARGE-AF), machine learning models, which evaluated baseline and time-updated information (neural network, LASSO, random forests, support vector machines), and Cox regression.

Maintenance

Key Conclusions

The optimal time-varying machine learning model exhibited greater predictive performance than existing AF risk models and reflected known and new patient risk factors for AF.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	+

Color Code

Low

Unclear

High

Paper 86

Paper Title: Evaluation of Artificial Intelligence-Based Grading of Diabetic Retinopathy in Primary Care

Authors or developers

Kanagasingam, Y.
Xiao, D.
Vignarajan, J.
Preetham, A.
Tay-Kearney, M. L.
Mehrotra, A.

Year of Publication

2018

Full reference of the study

Kanagasingam, Y., Xiao, D., Vignarajan, J., Preetham, A., Tay-Kearney, M. L., & Mehrotra, A. (2018). Evaluation of artificial intelligence-based grading of diabetic retinopathy in primary care. JAMA network open, 1(5), e182665-e182665.

Abstract

Importance: There has been wide interest in using artificial intelligence (AI)-based grading of retinal images to identify diabetic retinopathy, but such a system has never been deployed and evaluated in clinical practice. Objective: To describe the performance of an AI system for diabetic retinopathy deployed in a primary care practice. Design, Setting, and Participants: Diagnostic study of patients with diabetes seen at a primary care practice with four physicians in Western Australia between December 1, 2016, and May 31, 2017. A total of 193 patients consented for the study and had retinal photographs taken of their eyes. Three hundred eighty-six images were evaluated by both the AI-based system and an ophthalmologist. Main Outcomes and Measures: Sensitivity and specificity of the AI system compared with the gold standard of ophthalmologist evaluation. Results: Of the 193 patients (93 [48%] female; mean [SD] age, 55 [17] years [range, 18-87 years]), the AI system judged 17 as having diabetic retinopathy of sufficient severity to require referral. The system correctly identified two patients with true disease and misclassified 15 as having disease (false-positives). The resulting specificity was 92% (95% CI, 87%-96%), and the positive predictive value was 12% (95% CI, 8%-18%). Many false-positives were driven by inadequate image quality (eg, dirty lens) and sheen reflections. Conclusions and Relevance: The results demonstrate both the potential and the challenges of using AI systems to identify diabetic retinopathy in clinical practice. Key challenges include the low incidence rate of disease and the related high false-positive rate as well as poor image quality. Further evaluations of AI systems in primary care are needed.

Country of Research

Australia

Design of Study

Unclear : Diagnostic study

Duration of Study

6 months, December 1 2016 – May 31 2017

Name of Condition

Diabetic retinopathy

Artificial Intelligence Technique Used

Deep learning model: Deep convolutional neural network

Providers’ involvement in

Developing : Engineers, Testing : Nurses, primary care physicians

Accuracy of the AI Intervention

86.24% (not provided by the authors; calculated by the reviewer)
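
The positive predictive value quoted in the abstract follows directly from the counts given there (17 patients flagged, 2 with true disease, 15 false positives). A minimal sketch, with a hypothetical function name; the abstract does not give the full confusion matrix, so the reviewer-calculated accuracy is not recomputed here.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of flagged patients who truly have the disease."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged")
    return true_positives / flagged


# Counts from the abstract: 2 true positives and 15 false positives among 17 referrals.
ppv = positive_predictive_value(2, 15)
ppv_percent = round(100 * ppv)
```

The result rounds to the 12% reported by the authors.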

Patient-related Outcomes Assessed

Diagnosis of diabetic retinopathy

Primary Healthcare Worker Related Outcomes Assessed

Not specified

Healthcare System-related Outcomes Assessed

Not specified

Reached Target Population?

Not specified

Adoption

Yes (number of providers i.e. PHC participating) : 6

Implementation

Two trained nurses, four primary care physicians implemented the deep learning approach

Maintenance

Not specified

Key Conclusions

The study demonstrates high accuracy of a deep learning approach to diagnose diabetic retinopathy at the primary care level, which led to reduced referral to an ophthalmologist.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	+	–	–

Color Code

Low

Unclear

High

Paper 87

Paper Title: Prognostic Modeling and Prevention of Diabetes Using Machine Learning Technique

Authors or developers

Perveen, S.
Shahbaz, M.
Keshavjee, K.
Guergachi, A.

Year of Publication

2019

Full reference of the study

Perveen, Sajida, et al. “Prognostic Modeling and Prevention of Diabetes Using Machine Learning Technique.” Scientific reports 9.1 (2019): 1-9.

Abstract

Stratifying individuals at risk for developing diabetes could enable targeted delivery of interventional programs to those at highest risk, while avoiding the effort and costs of prevention and treatment in those at low risk. The objective of this study was to explore the potential role of a Hidden Markov Model (HMM), a machine learning technique, in validating the performance of the Framingham Diabetes Risk Scoring Model (FDRSM), a well-respected prognostic model. Can HMM predict eight year risk of developing diabetes in an individual effectively? To our knowledge, no study has attempted use of HMM to validate the performance of FDRSM. We used Electronic Medical Record (EMR) data, of 172,168 primary care patients to derive the 8-year risk of developing diabetes in an individual using HMM. The Area Under Receiver Operating Characteristic Curve (AROC) in our study sample of 911 individuals for whom all risk factors and follow up data were available is 86.9% compared to AROCs of 78.6% and 85% reported in a previously conducted validation study of FDRSM in the same Canadian population and the Framingham study respectively. These results demonstrate that the discrimination capability of our proposed HMM is superior to the validation study conducted using the FDRSM in a Canadian population and in the Framingham population. We conclude that HMM is capable of identifying patients at increased risk of developing diabetes within the next 8-years.

Country of Research

Canada

Design of Study

Observational study

Duration of Study

12 years (August 5, 2003 to June 30, 2015)

Name of Condition

Diabetes

Artificial Intelligence Technique Used

Hidden Markov Model (HMM)

Provider’s involvement in

Developing : NS, Testing : NS, Validating : NS

Accuracy of the AI Intervention

AROC (%): Hidden Markov Model: 86.9; Framingham simple clinical model: 85; validation of FDRSM: 78.6
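
For readers comparing the AROC values above: the area under the ROC curve can be read as the probability that a randomly chosen patient who develops the condition receives a higher risk score than a randomly chosen patient who does not. The sketch below computes it by direct pairwise comparison. The function name and toy data are hypothetical, and this is not the authors' HMM or evaluation code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy risk scores (hypothetical): two cases and two non-cases.
toy_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In the toy data three of the four case/non-case pairs are ranked correctly, giving an AUROC of 0.75.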

Patient-related Outcomes Assessed

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Electronic Medical Record (EMR) data, of 172,168 primary care patients

Maintenance

Not specified (Unclear)

Key Conclusions

The objective of this study was to explore the potential role of a Hidden Markov Model (HMM), a machine learning technique, in validating the performance of the Framingham Diabetes Risk Scoring Model (FDRSM), a well-respected prognostic model. Electronic Medical Record (EMR) data of 172,168 primary care patients were used to derive the 8-year risk of developing diabetes in an individual using HMM. The Area Under Receiver Operating Characteristic Curve (AROC) in the study sample of 911 individuals for whom all risk factors and follow-up data were available is 86.9%, compared to AROCs of 78.6% and 85% reported in a previously conducted validation study of FDRSM in the same Canadian population and the Framingham study, respectively. These results demonstrate that the discrimination capability of the proposed HMM is superior to the validation study conducted using the FDRSM in a Canadian population and in the Framingham population. In conclusion, the HMM is capable of identifying patients at an increased risk of developing diabetes within the next eight years.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	?	+	?

Color Code

Low

Unclear

High

Paper 88

Paper Title: Scoring algorithms for a computer-based cognitive screening tool: An illustrative example of overfitting machine learning approaches and the impact on estimates of classification accuracy

Authors or developers

Ursenbach, J.
O’Connell, M. E.
Neiser, J.
Tierney, M. C.
Morgan, D.
Kosteniuk, J.
Spiteri, R. J.

Year of Publication

2019

Full reference of the study

Ursenbach, J., O’Connell, M. E., Neiser, J., Tierney, M. C., Morgan, D., Kosteniuk, J., & Spiteri, R. J. (2019). Scoring algorithms for a computer-based cognitive screening tool: An illustrative example of overfitting machine learning approaches and the impact on estimates of classification accuracy. Psychological assessment.

Abstract

Computerized cognitive screening tools, such as the self-administered Computerized Assessment of Memory Cognitive Impairment (CAMCI), require little training and ensure standardized administration and could be an ideal test for primary care settings. We conducted a secondary analysis of a data set including 887 older adults (M age = 72.7 years, SD = 7.1 years; 32.1% male; M years education = 13.4, SD = 2.7 years) with CAMCI scores and independent diagnoses of mild cognitive impairment (MCI). A study by the CAMCI developers used a portion of this data set with a machine learning decision tree model and suggested that the CAMCI had high classification accuracy for MCI (sensitivity = 0.86, specificity = 0.94). We found similar support for accuracy (sensitivity = 0.94, specificity = 0.94) by overfitting a decision tree model, but we found evidence of lower accuracy in a cross-validation sample (sensitivity = 0.62, specificity = 0.66). A logistic regression model, however, discriminated modestly in both training (sensitivity = 0.72, specificity = 0.80) and cross-validation data sets (sensitivity = 0.69, specificity = 0.74). Evidence for strong accuracy when overfitting a decision tree model and substantially reduced accuracy in cross-validation samples was replicated across 500 bootstrapped samples. In contrast, the evidence for accuracy of the logistic regression model was similar in the training and cross-validation samples. The logistic regression model produced accuracy estimates consistent with other published CAMCI studies, suggesting evidence for classification accuracy of the CAMCI for MCI is likely modest. This case study illustrates the general need for cross-validation and careful evaluation of the generalizability of machine learning models. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

Country of Research

Canada

Design of Study

Unclear

Duration of Study

Name of Condition

Mild cognitive impairment (MCI), screened with the Computerized Assessment of Memory Cognitive Impairment (CAMCI)

Artificial Intelligence Technique Used

Logistic regression model; decision tree approach in the part package

Provider’s involvement in

Developing : NS, Testing : Computerized Assessment of Memory Cognitive Impairment, Validating : NS

Accuracy of the AI Intervention

Predictive accuracy (sensitivity 0.94, specificity 0.94)

Patient-related Outcomes Assessed

Primary Healthcare Worker Related Outcomes Assessed

Healthcare System-related Outcomes Assessed

Reached Target Population?

Yes

Adoption

Not specified

Implementation

Yes

Maintenance

Yes

Key Conclusions

Using an archival database, a decision tree machine learning method demonstrated overfitting and had substantially reduced evidence for classification accuracy when measured in cross-validation, whereas the statistical method (logistic regression) produced similar evidence of classification accuracy in training and cross-validation samples. The evidence for classification accuracy of the Computerized Assessment of Mild Cognitive Impairment for cognitive impairment is modest, and this has clinical implications.
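
The gap between training and cross-validation accuracy described above can be reproduced in miniature. The sketch below is an illustration only. It swaps the paper's decision tree for a 1-nearest-neighbour "memorizer" (a deliberately overfit stand-in) and uses synthetic labels that are unrelated to the single feature, so the data, names, and seed are all assumptions.

```python
import random


def make_noise_data(n, rng):
    """Features and labels that are unrelated, so nothing can truly be learned."""
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]


def nearest_label(x, train):
    """1-nearest-neighbour memorizer: copy the label of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(data, train):
    """Share of rows in `data` whose label matches the memorizer's prediction."""
    hits = sum(1 for x, y in data if nearest_label(x, train) == y)
    return hits / len(data)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = make_noise_data(200, rng)
held_out = make_noise_data(200, rng)

train_acc = accuracy(train, train)        # evaluated on the data it memorized
held_out_acc = accuracy(held_out, train)  # evaluated on unseen data
```

The memorizer scores perfectly on its own training data but only around chance level on held-out data, which is the pattern the paper warns about when accuracy is reported without cross-validation.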

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	?	?

Color Code

Low

Unclear

High

Paper 89

Paper Title: Diagnostic Accuracy of a Device for the Automated Detection of Diabetic Retinopathy in a Primary Care Setting

Authors or developers

Verbraak, F. D.
Abramoff, M. D.
Bausch, G. C. F.
Klaver, C.
Nijpels, G.
Schlingemann, R. O.
van der Heijden, A. A.

Year of Publication

2019

Full reference of the study

Verbraak, F. D., Abramoff, M. D., Bausch, G. C., Klaver, C., Nijpels, G., Schlingemann, R. O., & van der Heijden, A. A. (2019). Diagnostic accuracy of a device for the automated detection of diabetic retinopathy in a primary care setting. Diabetes care, 42(4), 651-656.

Abstract

OBJECTIVE: To determine the diagnostic accuracy in a real-world primary care setting of a deep learning-enhanced device for automated detection of diabetic retinopathy (DR). RESEARCH DESIGN AND METHODS: Retinal images of people with type 2 diabetes visiting a primary care screening program were graded by a hybrid deep learning-enhanced device (IDx-DR-EU-2.1; IDx, Amsterdam, the Netherlands), and its classification of retinopathy (vision-threatening [vt]DR, more than mild [mtm]DR, and mild or more [mom]DR) was compared with a reference standard. This reference standard consisted of grading according to the International Clinical Classification of DR by the Rotterdam Study reading center. We determined the diagnostic accuracy of the hybrid deep learning-enhanced device (IDx-DR-EU-2.1) against the reference standard. RESULTS: A total of 1,616 people with type 2 diabetes were imaged. The hybrid deep learning-enhanced device’s sensitivity/specificity against the reference standard was, respectively, for vtDR 100% (95% CI 77.1-100)/97.8% (95% CI 96.8-98.5) and for mtmDR 79.4% (95% CI 66.5-87.9)/93.8% (95% CI 92.1-94.9). CONCLUSIONS: The hybrid deep learning-enhanced device had high diagnostic accuracy for the detection of both vtDR (although the number of vtDR cases was low) and mtmDR in a primary care setting against an independent reading center. This allows its safe use in a primary care setting.

Country of Research

Netherlands

Design of Study

Cohort study, Unclear : Retrospective

Duration of Study

1 year, Jan 2015-Dec 2015

Name of Condition

Diabetic retinopathy

Artificial Intelligence Technique Used

Hybrid deep learning enhanced technique

Providers’ involvement in

Developing : NS, Testing: Retinal images of people with type 2 diabetes visiting a primary care screening program were graded by a hybrid deep learning enhanced device (IDx-DR-EU-2.1; IDx, Amsterdam, the Netherlands), and its classification of retinopathy (vision-threatening [vt]DR, more than mild [mtm]DR, and mild or more [mom]DR) was compared with a reference standard, Validating : NS

Accuracy of the AI Intervention

Vision-threatening diabetic retinopathy (vtDR): sensitivity 100%, specificity 97.8%; more than mild diabetic retinopathy (mtmDR): sensitivity 79.4%, specificity 93.8%

Patient-related Outcomes Assessed

Not Specified

Primary Healthcare Worker Related Outcomes Assessed

Not Specified

Healthcare System-related Outcomes Assessed

Not Specified

Reached Target Population?

Not specified

Adoption

Yes (number of providers i.e. PHC participating) : Retinal images of people with type 2 diabetes visiting a primary care screening program were graded

Implementation

The hybrid deep learning enhanced device had high diagnostic accuracy for the detection of both vtDR (although the number of vtDR cases was low) and mtmDR in a primary care setting against an independent reading center

Maintenance

Not specified

Key Conclusions

The results show that a hybrid lesion-based device, with deep learning enhancements, for the automated detection of DR achieved high diagnostic accuracy in a primary care setting in a study with a predetermined protocol and an independent reference standard. These results confirm corresponding results in an earlier study of essentially the same algorithm in a laboratory setting. Specifically, the device achieved high sensitivity (100%) in people with vtDR, as the device did not miss any vtDR, or DME, according to the ICDR grading system. It also achieved high specificity (97.8%). However, the number of vtDR cases, although representative for the studied patient population, was low and prevents definite conclusions. The device also had a high sensitivity to detect mtmDR of 79.4%, at a specificity of 93.8%. Abbreviations: DR, diabetic retinopathy; vtDR, vision-threatening diabetic retinopathy.

Risk of Bias

Participants	Predictors	Outcome	Analysis
+	–	+	–

Color Code

Low

Unclear

High

Paper 90

Paper Title: Does machine learning improve prediction of VA primary care reliance?

Authors or developers	Wong, E. S.; Schuttner, L.; Reddy, A.
Year of Publication	2020
Full reference of the study	Wong, E. S., Schuttner, L., & Reddy, A. (2020). Does machine learning improve prediction of VA primary care reliance?. The American Journal of Managed Care, 26(1), 40-44.
Abstract	OBJECTIVES: The Veterans Affairs (VA) Health Care System is among the largest integrated health systems in the United States. Many VA enrollees are dual users of Medicare, and little research has examined methods to most accurately predict which veterans will be mostly reliant on VA services in the future. This study examined whether machine learning methods can better predict future reliance on VA primary care compared with traditional statistical methods. STUDY DESIGN: Observational study of 83,143 VA patients dually enrolled in fee-for-service Medicare using VA and Medicare administrative databases and the 2012 Survey of Healthcare Experiences of Patients. METHODS: The primary outcome was a dichotomous measure denoting whether patients obtained more than 50% of all primary care visits (VA + Medicare) from VA. We compared the performance of six candidate models-logistic regression, elastic net regression, decision trees, random forest, gradient boosting machine, and neural network-in predicting 2013 reliance as a function of 61 patient characteristics observed in 2012. We measured performance using the cross-validated area under the receiver operating characteristic (AUROC) metric. RESULTS: Overall, 72.9% and 74.5% of veterans were mostly VA reliant in 2012 and 2013, respectively. All models had similar average AUROCs, ranging from 0.873 to 0.892. The best-performing model used gradient boosting machine, which exhibited modestly higher AUROC and similar variance compared with standard logistic regression. CONCLUSIONS: The modest gains in performance from the best-performing model, gradient boosting machine, are unlikely to outweigh inherent drawbacks, including computational complexity and limited interpretability compared with traditional logistic regression.
Country of Research	USA
Design of Study	Observational study
Duration of Study	1 year, 2012-2013
Name of Condition	Prediction of future reliance on Veterans Affairs (VA) primary care
Artificial Intelligence Technique Used	Logistic regression, elastic net, decision tree, random forest, gradient boosting machine, neural network
Providers’ involvement in	Developing : Not specified, Testing : Not specified, Validating : Not specified
Accuracy of the AI Intervention	AUROC: logistic regression 0.89, elastic net 0.89, decision tree 0.87, random forest 0.88, gradient boosting machine 0.89, neural network 0.88; Specificity: logistic regression 0.75, elastic net 0.75, decision tree 0.72, random forest 0.74, gradient boosting machine 0.75, neural network 0.74; Sensitivity: logistic regression 0.92, elastic net 0.92, decision tree 0.91, random forest 0.92, gradient boosting machine 0.91, neural network 0.91
Patient-related Outcomes Assessed	2012: 72.9% of veterans were mostly reliant on VA primary care; 2013: 74.5% of veterans were mostly reliant on VA primary care
Primary Healthcare Worker Related Outcomes Assessed	Not specified
Healthcare System-related Outcomes Assessed	Not specified
Reached Target Population?	Yes
Adoption	Yes (number of providers i.e. PHC participating) : Veterans Affairs (VA) primary care analysis evaluated
Implementation	Not implemented, the data utilized were extracted from a Medicare administrative database.
Maintenance	Not specified
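Key Conclusions	The study compared six candidate models for predicting future reliance on VA primary care. Gradient boosting machine was the best-performing model, but the authors concluded that its modest gains are unlikely to outweigh its drawbacks (computational complexity and limited interpretability) compared with traditional logistic regression.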
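
The primary outcome of this paper is a dichotomous flag: whether a patient obtained more than 50% of all primary care visits (VA + Medicare) from VA. A minimal sketch of that rule follows. The function name and the choice to raise an error for patients with no visits are assumptions, as the source does not say how such patients were handled.

```python
def mostly_va_reliant(va_visits, medicare_visits):
    """True if more than 50% of all primary care visits were obtained from VA."""
    total = va_visits + medicare_visits
    if total == 0:
        # Assumption: reliance is undefined without any primary care visits.
        raise ValueError("no primary care visits recorded")
    return va_visits / total > 0.5


# Hypothetical visit counts for three patients: (VA visits, Medicare visits).
examples = [(3, 2), (2, 2), (1, 3)]
flags = [mostly_va_reliant(va, medicare) for va, medicare in examples]
```

An exact 50/50 split is not counted as reliant, because the rule requires strictly more than 50%.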