JDRF/Helmsley Big Data in T1D Workshop

August 15-16, 2018; Reston, VA; Day #1 Highlights – Draft

Executive Highlights

  • Today at the first-ever Helmsley Charitable Trust/JDRF Meeting on Big Data, we loved hearing updates from the JDRF-IBM collaboration to explore type 1 diabetes risk and onset, as well as a review of Verily’s work on automated retinopathy detection. IBM Research Manager of Health Analytics Dr. Kenney Ng provided a progress report on the JDRF-IBM collaboration’s four main aims, including that the duo has “several” manuscripts in progress (quite the feat since the partnership was conceived a year ago) related to phenotyping and predicting onset of type 1 diabetes. Although Verily’s Dr. Howard Zisser did not share any updates on the company’s retinopathy work, he did dispel three common myths of AI – well worth the read! He shared optimism for AI overall, noting it will help us “stop treating to the mean.” Nice!

  • We also heard ample discussion on the myriad applications for big data in diabetes: primary prevention, diagnosis, treatment, and reduction of healthcare costs. Post-diagnosis, Barbara Davis Center’s Dr. Marian Rewers asserted that big data and artificial intelligence have the potential to reduce outpatient visits, saving patients and providers time and money. The big caveat – as noted by Dr. Rewers, Western Michigan’s Dr. Craig Beam, Verily’s Dr. Howard Zisser, and others – is that data and insights need to be clinically relevant, interoperable, and easy to use. Indeed, we’d note that with CGM values coming in every five minutes, along with connected insulin delivery devices, “Big Data” can easily become “Overwhelming Data.”

  • Helmsley Charitable Trust’s Ms. Deniz Dalton shared that the Jaeb-coordinated, multi-site clinical observational study in type 1 diabetes to gather data on exercise (T1-DEXI) will enroll 800-1,000 individuals (ages 14-70) and will hopefully commence in “early 2019.” The T1-DEXI pilot study (see ADA), which will inform the larger version, is currently underway. Given the challenges of managing type 1 around exercise and sorting real patterns from noise, this is exciting!

  • On the disease prediction front, University of Copenhagen’s Dr. Søren Brunak, Notre Dame’s Dr. Nitesh Chawla, and Exeter’s Dr. Richard Oram all described their work predicting type 1 onset, establishing linkages and trajectories between and among diseases, and distinguishing between disease types.

Greetings from Reston, Virginia, where day #1 of the Helmsley Charitable Trust- and JDRF-funded Big Data in T1D Workshop is in the books! JDRF Research Director Dr. Jessica Dunne introduced this first-ever meeting, emphasizing its focus on automated insulin delivery and type 1 prevention because they are “low-hanging fruit,” while also encouraging attendees to think broadly about the applications of big data across the spectrum of type 1 diabetes prevention, treatment, and cure research.

This ~90-person meeting is taking place in a small Hyatt ballroom packed with familiar faces – as one speaker said, a “meeting of the top clinical and computational minds” – all brought together by one common cause. Read on for our top highlights from the first (half) day, which started at 1 pm!


Day #1 Highlights

1. JDRF-IBM T1D Big Data Project Has “Several” Manuscripts In Progress, One Year In; Preliminary Work in Predicting T1D Disease Onset (very cool)

Nearly a year after JDRF and IBM announced a partnership to explore type 1 diabetes risk and onset – and with “several” manuscripts in the works – IBM Research Manager of Health Analytics Dr. Kenney Ng provided a progress report. Preceding a very interesting Q&A featuring the likes of Drs. Des Schatz and Marian Rewers (see below), Dr. Ng shared the status of the collaboration’s four main aims: (i) data aggregation and curation; (ii) phenotyping type 1 diabetes classes; (iii) type 1 diabetes onset prediction; and (iv) type 1 diabetes progression modeling.

  • Data aggregation and curation: The data set has swelled to 81,000+ total clinic visits, 22,000+ subjects, and 650+ cases of type 1 diabetes, and includes auto-antibodies, clinical, physical, genetic, family history, socio-demographic, and environmental data. As a reminder, the data comes from data sets from previous type 1 diabetes trials: DAISY, DiPiS, DEW-IT1/DEW-IT2, and the most recent addition, DIPP. Not surprisingly, Dr. Ng noted challenges in aggregating the data sets due to differences in place, time, patient population, method of classifying features, etc. – this was a main conversation point during Q&A.

  • Phenotyping classes: Researchers have identified three clusters of individuals that have vastly different probabilities of developing type 1 diabetes. In the graph below, the blue group (very few auto-antibodies) has a low risk of developing type 1 diabetes (4% at 10 years); meanwhile the green and red groups, both of which have more auto-antibodies, have much greater risk of developing type 1 (59% and 83%, respectively, at 10 years). These clusters are highly associated with type 1 development, with very strong sensitivity of 90% and specificity of 85%. Western Michigan’s Dr. Craig Beam suggested that these measures of sensitivity/specificity are great, but wondered whether they are clinically relevant (i.e., should there be a higher bar, and how does this clustering affect clinical practice?). IBM-JDRF’s work is still early stage, but we can easily imagine how stratifying individuals by their probability of developing type 1 diabetes early on could impact clinical care and research.

  • Researchers also sub-typed type 1 diabetes by auto-antibody transition patterns (below). The algorithms found, unsurprisingly, that the most significant predictor of type 1 diabetes was persistent presence of antibodies to insulin and GAD65. The second most significant predictor was persistence of antibodies to GAD65, protein tyrosine phosphatase, and zinc transporter 8, and so forth. When the model was trained on transition patterns in the DAISY data set and tested on the DiPiS data set, the ROC area under the curve was a very strong 86%. In the flipped scenario, when the model was trained on the DiPiS set and tested on DAISY, the ROC area under the curve was also a strong 81%.

  • Onset prediction: IBM’s model (“RankSvx”) accurately predicted time from seroconversion (when auto-antibodies first appear in the blood) to type 1 diabetes onset and correctly ranked the relative risks: high importance of auto-antibodies and genetic risk, and low importance of factors like age and gender.

  • Progression modeling: Work is ongoing here, so there are no results to share. However, the goal is to answer population-level questions (e.g., What are the underlying progression states for the disease? What are the most/least likely progression pathways?) and patient-level questions (e.g., What is the current state for a patient? What is the patient’s most likely progression pathway going forward?).
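For readers less familiar with the metrics quoted above, here is a minimal Python sketch of how sensitivity, specificity, and ROC area under the curve are computed from labels and model scores. This is not IBM's code: the function names and the toy inputs are illustrative assumptions, and the AUC uses the standard pairwise-ranking definition (the probability that a randomly chosen case scores higher than a randomly chosen non-case).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions.

    Assumes y_true contains at least one case (1) and one non-case (0).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true cases that were flagged
    specificity = tn / (tn + fp)  # share of non-cases correctly left unflagged
    return sensitivity, specificity


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct ranking.
    """
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    non_cases = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))
```

The cross-cohort checks described above (train on DAISY, test on DiPiS, and the reverse) amount to computing an AUC like this on the cohort the model did not see during training.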

Selected Questions and Answers

Dr. Desmond Schatz (University of Florida): Regarding data aggregation and analysis challenges – different formats, different bits of data – do you actually have the raw, longitudinal data just to make sure that it’s accurate and valid? Hopefully you can validate in TEDDY and other sets. I struggle with the data accuracy. How confident are you in the models you’re producing?

Dr. Ng: There are definitely a lot of challenges as you can imagine, and we’ve been working closely with the clinical experts from the sites, multiple calls per month to try to resolve those issues. We’re relying on a collaborative effort to validate that the data we get is the data the clinicians expected and that the analyses are consistent with their understanding. We try to make sure that data is faithfully received and analyzed by us. We also have a collaborative workshop where we share initial results to make sure interpretation and use of data is consistent.

Dr. Schatz: But that’s not the raw data?

Dr. Marian Rewers (Barbara Davis Center): Excellent question. Our study (DAISY) provided the raw data at the highest level of resolution we could provide. But there are issues. Starting with HLA classification, each study had different classification. Three-quarters used DR, whereas one used DQ as the primary classifier. It took us three months to figure this out and agree on categories depending on application. For auto-antibodies – anyone who participated in antibody standardization can see how difficult it is – but we agreed for the time being to use “positive” and “negative.” I have to say that for all of the studies it was a very nice experience. It’s moving very well, we’ve outlined five manuscripts so far.

Prof. Chantal Mathieu (KU Leuven): I was thinking the same, with the definition of diabetes, when you try to predict the time – is the definition of diabetes the same? Are some studies using OGTT, random glycemia? Is that also taken care of?

Dr. Ng: I think onset was provided by each of the sites. I didn’t share this work, but we did actually look at defining the development of antibodies with different definitions, performed an extensive sensitivity analysis, with different definitions of similar events along the progression of type 1 diabetes. Does it really matter? At some level it’s too fine-grained, but could be interesting to know. There are a lot of interesting questions in this space.

Dr. Craig Beam (Western Michigan University): A number of different approaches will be taken to deal with big data. I’m wondering if it’s not essential for us, early in the development stage, to come up with a standard way of assessing success and clinical relevance. Sensitivity and specificity have to be clinically relevant at some point – is there a clinician- and health outcomes-based way to derive standards for proof?

Dr. Ng: You’re totally right in terms of the early research vs. clinical deployment phases. There are a lot of challenges there.

Q: Will you need additional data types to get the answers you’re looking for?

Dr. Ng: That’s definitely a yes. This is a baseline. We’re starting with common data features from the data sets. Some of the work we’re doing now is to enlarge the features space to include as much information as we have available – the challenge is they don’t all have all of the features.

Q: You don’t feel that the data you have now will be limiting with regards to preliminary answers?

Dr. Ng: I think there are insights we can still extract from the existing data. There’s enough there to lead to some more interesting work.

2. Verily’s Dr. Howard Zisser: AI helps us stop “treating to the mean”; Busting 3 AI Myths; Google’s Healthcare AI Publications

Verily’s Dr. Howard Zisser showed off his AI expertise – a new hat after years driving AID forward – in a 30-minute talk, dispelling three myths in AI, highlighting two recent Google publications, and reviewing Verily’s work with automated retinopathy detection (no updates on this front). He also shared some comical examples of where AI is not perfect: (i) A dinosaur picture with a scale below that a computer labeled “a dinosaur on top of a surfboard”; and (ii) a Google Translate translation of a Spanish sign to “Recent attack of shark” – was the shark attacked, or did it attack a human? But at the end of the day, he said, using AI in healthcare helps us to stop “treating to the mean” – it allows us to look at a big population, stratify based on risk, and then to derive insights at the level of the individual. “What can I ask that will help me take care of patients better? What can help me to look at populations? Who should be getting what and when? But also when talking to individual, to make predictions based on big data sets…That’s what patients want, it’s more precise, and it’s better for populations as well.”

  • Dr. Zisser explained why three myths about AI are in fact not true: 

    • Myth #1: “Artificial Intelligence = Machine Learning.” Dr. Zisser has previously dispelled this idea – machine learning is in fact a subset of artificial intelligence, and deep learning is a subset of machine learning. The two are often used interchangeably, but are not in fact the same thing.

    • Myth #2: “AI is a monolith.” AI is often a top layer built on a heuristic infrastructure. In other words, AI is part of the system, not the whole system. For example, with an algorithm that identifies retinopathy and tells the provider whether to refer the patient to an ophthalmologist, the system includes the payer, the patient, the doctor, to name a few – not just the algorithm. Workflow matters, which is why Verily was undertaking workflow-based studies for AI retinopathy screening in India (as of November 2017).

    • Myth #3: “AI doesn’t need a human.” Noted Dr. Zisser, “It needs supervision in design, training, evaluation, iterations, and often when addressing errors. Basically at every single step. All models are wrong, but some are useful.” (The latter is a famous quote from statistician George Box.)

  • Dr. Zisser referenced two recent Google papers in the Nature family, including one published just two days ago (which attracted a fair amount of media attention)! In March, Poplin et al. (including Verily Head of CV Innovations Dr. Michael McConnell) showed that an algorithm could predict age, gender, smoker status, A1c, BMI, and blood pressure with high accuracy just from retinal fundus photographs. The DeepMind paper published on Monday – picked up by outlets such as STAT, Business Insider, and The Inquirer – showed that an algorithm could accurately detect 50 types of eye disease just by looking at optical coherence tomography scans. Taken together, the larger Google healthcare family seems to be driving toward harvesting as much health information as possible from retinae – we’re envisioning a single-image diagnostic panel that could, in real-time, inform next steps with respect to all sorts of eye disease and CV risk. This also makes us hark back to the no-longer-updated Verily/Novartis glucose-sensing contact lens – is it still in development?

  • Apart from the vision to have automated retinopathy screening at PCP offices and in pharmacies and to “have a diagnosis by the time you end a screen,” there were no updates on Verily’s work in this arena. This part of the talk was basically identical to the one Dr. Zisser delivered at DTM 2017 – he discussed shortages of eye specialists (particularly in India), a 2016 JAMA paper speaking to Verily’s algorithm’s strong sensitivity/specificity, and a general overview of the algorithm and its features.

3. BDC’s Dr. Marian Rewers Calls for Increased Clinical Utility and Interoperability of Diabetes Technology, Applauds Glooko and Tidepool

Barbara Davis Center’s Dr. Marian Rewers brought a much-needed clinical perspective to the day’s big data discussion, urging industry players, particularly those in CGM, to focus on clinical utility and interoperability. He expressed frustration with the lack of interoperability between different manufacturers’ software, which he says complicates efforts to identify overall clinic trends and patterns in CGM data (for purposes of quality improvement). Restricting CGM data to device-specific interfaces, he explained, limits the value of data to clinicians and patients. To this end, he pointed to Tidepool and Glooko as “extremely important” (management from both teams were in the room). However, software interoperability is but one component of the big data bottleneck for clinicians – EMR compilation, registry integration, and business information also contribute to the data profile of each individual patient. Together, these factors create a complicated, inefficient system, which Dr. Rewers hopes can be streamlined to allow for patient care.

  • As Dr. Rewers astutely explained, “data reduction” is fundamentally in opposition to “big data” efforts, yet it remains integral to the improvement of diabetes technology. He asserted that the software interfaces of CGMs are often thorough to a fault, including data that are of no use to clinicians or patients; alternatively, he would prefer for dashboards to be reduced to simple, easily understood figures that highlight trends. He suggested that Dexcom has made headway in this regard, showing a Clarity display on a slide following one from a previous generation – indeed, we find Clarity’s main display to be very intuitive and a strong step forward from Dexcom Studio. Offering an alternative account, Dr. Chantal Mathieu suggested following the talk that Dr. Rewers’ opinion on CGM dashboards is not necessarily generalizable to all patients and clinicians. In particular, she finds that many of her younger patients find value in the charts and graphs provided in CGM dashboards, which makes us wonder if customizable software (e.g. the option to select a “simple,” “classic,” or “detailed” dashboard) may be useful. Of course, adding a bunch of customizability also adds complexity, so this is a tough balancing act. The standard one-page AGP (Ambulatory Glucose Profile) – now licensed by all major CGM companies – has made increasing strides in CGM standardization as a simple, base view for all CGM display, while each manufacturer also offers other displays of varying granularity and content. However, we’d note that AGP does not do pattern recognition – e.g., “high pattern between 3-8 am” – which we believe is the most clinically actionable piece of CGM data. (When is someone going high or low, why is that occurring, and what can be done about it?) While the AGP does display the modal day chart and someone can easily see where highs or lows are occurring, having a prioritized list of patterns by time of day would be quite useful to many. (Dexcom Clarity and Medtronic CareLink both do this. We haven’t seen Abbott’s LibreView software in quite a while, but this FAQ suggests it also gives similar patterns.)

  • Dr. Rewers provocatively (and “tweet-ably”) said, “I really would like artificial pancreas and artificial intelligence to replace me and my colleagues.” In his experience, the current healthcare system in which patients pay up to $800 for a one-hour outpatient visit is both outlandish and inaccessible, especially when time constraints allow for only “limited insights.” While we agree that the cost of care needs to come down, we expect that most endocrinologists and primary care providers would jump at the opportunity for a one-hour session – at one of Dr. Irl Hirsch’s AACE 2018 talks, the majority of attendees indicated that they are allotted 15 minutes or less with their patients. Dr. Rewers hopes that AI-based technology can reduce burden on the healthcare system while conferring better outcomes for patients. To that, we’d add better use of the system’s resources – spend in-person time on the complicated cases, and use more remote monitoring and telemedicine for simpler cases. In the mid/long-term, we think this is inevitable, given the trends in endocrinologists and PCPs vs. people with diabetes. However, it is important to note that hybrid closed loop and AI are still in their infancy, so training and implementation will likely add burden before they decrease it.

  • According to Dr. Rewers, unless big data can be used to reduce burden (including the high cost of healthcare), it remains an academician’s pipedream. He cited a 2014 article from the New York Times, in which one woman said she paid ~$4,000 in out-of-pocket costs (~$26,000 without insurance) to manage her type 1 diabetes for just one year. Dr. Rewers said this number would be 30% higher today. In his view, big data has an opportunity to reduce patient burden on the healthcare system, the savings from which must ultimately be passed down to patients; we wholeheartedly support this sentiment, and believe the companies that best collect and use data to cut overall costs will certainly capture a lot of value in the system. While he noted AID as one example, this is not yet a classic application of big data, as most of the decision-making is n=1 based and fairly moment to moment. (Over time, however, systems should improve to higher-level insights informed by populations – “patients like you.”) Decision support tools, hypoglycemia risk predictors, and other prevention strategies also come to mind as exciting options.
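To make the idea of a prioritized list of CGM patterns by time of day (raised earlier in this section) more concrete, here is a rough Python sketch. It is our own illustration, not how Clarity, CareLink, or any other product works: the function name, the 70 and 180 mg/dL cutoffs, and the 50% flagging threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def hourly_patterns(readings, low=70, high=180, min_fraction=0.5):
    """Flag hours of the day where readings are frequently out of range.

    readings: iterable of (hour_of_day, glucose_mg_dl) pairs.
    low/high/min_fraction: illustrative thresholds, not clinical guidance.
    Returns a list of (hour, "high" or "low", fraction), most frequent first.
    """
    by_hour = defaultdict(list)
    for hour, glucose in readings:
        by_hour[hour].append(glucose)

    patterns = []
    for hour, values in by_hour.items():
        frac_high = sum(1 for v in values if v > high) / len(values)
        frac_low = sum(1 for v in values if v < low) / len(values)
        if frac_high >= min_fraction:
            patterns.append((hour, "high", frac_high))
        if frac_low >= min_fraction:
            patterns.append((hour, "low", frac_low))

    # Prioritize by how often the pattern occurs, then by hour for stable output.
    patterns.sort(key=lambda p: (-p[2], p[0]))
    return patterns
```

A real tool would work on weeks of five-minute readings and would likely merge adjacent hours into windows such as "3-8 am"; this sketch only shows the grouping and ranking step.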

4. Update on HCT/Jaeb T1-DEXI Exercise Study: Hope to Launch Observational Study in “Early 2019” with N=800-1,000 (Ages 14-70)

Helmsley Charitable Trust’s Ms. Deniz Dalton reiterated plans for a Jaeb-coordinated, multi-site, n=800-1,000 (ages 14-70) clinical observational study in type 1 diabetes to gather data on exercise (T1-DEXI). The hope is to commence the study in “early 2019.” This study is expected to have an initial four-week observational period, followed by a break and another four-week period (total data collection period of eight weeks). Main outcomes monitored in the study will be insulin delivery, exercise, and CGM (for participants that do not use CGM at baseline, a CGM will be provided). All participants will also receive exercise videos “to be able to characterize certain exercise types” and a T1-DEXI study app that collects information about activity and food intake. Aligning with Helmsley’s ever-strengthening focus on data quantity, quality, access, and interoperability, the goal of this project is to broadly share data with the community, which will then allow researchers to test existing hypotheses and create new ones. Put simply: “A lot of data will be coming out of this project.” We’re delighted to see a growing focus on data aggregation around exercise in type 1 diabetes. Notably, the expected “n” of 800-1,000 shared today by Ms. Dalton is significantly larger than that of “300-500” previously shared at ADA. Presumably this could lead to some meaningful decision support around exercise, which remains challenging in type 1 diabetes.

  • The T1-DEXI pilot study, described by OHSU’s Dr. Jessica Castle at ADA, is currently underway. This pilot study will enroll 60 individuals between ages 15-70 with type 1 diabetes, and collect one month of insulin, CGM, food, and physical activity data. Data will be collected with Dexcom G5, DiabNext’s Clipsulin dose capture device (an unconventional choice), a Garmin activity tracker, and a custom app developed at OHSU for food photos and exercise logging. Participants were randomized to complete two in-clinic and four home sessions of either aerobic, anaerobic, or high intensity interval exercise. Every seven days, providers review CGM and insulin data, and make insulin dose recommendations. This pilot will eventually inform the larger T1-DEXI study.

  • Helmsley also recently launched a data-sharing initiative – going forward, whenever it funds a clinical trial, it will embed a data-sharing policy in the agreement. Ms. Dalton also alluded to the organization’s investment in CDISC to develop type 1 diabetes-related data standards, something we’ll hear more about tomorrow. Hats off to the T1D team at Helmsley for really driving forward the data ecosystem in type 1!

  • We noted more than 15 mentions of the word “data” in this very short (~5-minute) talk – this certainly seems to be a bigger and bigger Trust priority. We agree this is a place where it can have real leverage and drive the ecosystem in a coordinated direction.

5. Genetic Risk Score for Type 1 Diabetes (Based on HLA Allele Copy) is Useful for Research but May Not Be Ready for Population Screening

University of Exeter’s Dr. Richard Oram emphasized that use of a continuous genetic risk score (GRS) for type 1 diabetes poses a strong advantage for investigators, but may not be suitable at an individual level. The GRS is calculated using the number of HLA allele copies present – the HLA gene region harbors common variants strongly associated with type 1 diabetes. The GRS alone was shown to be “strongly discriminatory” between type 1 diabetes and type 2 diabetes. Moreover, Dr. Oram found that when GRS is combined with age of diagnosis, BMI, and presence of autoantibodies into a single continuous variable, the discrimination is “close to perfect.” He sees the GRS being particularly effective in mitigating the costs of early intervention trials, as it is critical to restrict them only to very high-risk populations. As an example, he determined that identifying those at 10% vs. 5% risk of multiple autoantibodies could halve the cost of an early intervention trial. On the other hand, if a population screening approach is taken, he suggested using GRS to identify the 10% of the population at greatest risk of developing type 1 diabetes, as that group contains 78% of cases.

  • Dr. Oram detailed ongoing work leveraging combined modeling to improve type 1 diabetes prediction in those with first degree relatives with type 1 diabetes. As part of the TrialNet Pathway to Prevention program, ~7,000 individuals with first degree relatives with type 1 diabetes are being followed, ~1,000 of whom have already been genotyped. After stratifying by single or multiple autoantibodies, the results indicated that GRS can identify type 1 diabetes risk more effectively than the number of autoantibodies. In fact, Dr. Oram noted that some individuals with a single autoantibody had the same type 1 diabetes risk as those with multiple autoantibodies (we would hazard that these single-autoantibody individuals were typically on their way to developing multiple).

  • Audience members were very interested to hear Dr. Oram’s thoughts on whether the GRS could be effectively implemented for population screening. As one attendee pointed out, genetic variants will differ based on ethnicity, and the GRS in question was generated using Caucasian genomes. Furthermore, a specific risk threshold would have to be determined in identifying when to actually approach parents with the result of a “high genetic risk score.” Dr. Oram asserted that the GRS is not yet suited for population screening, maintaining that “at the moment it’s a research tool.” He acknowledged that there is a need to test the GRS in non-Caucasian populations, although he’s been surprised by the solid prediction accuracy demonstrated in initial analyses investigating different ethnicities. Still, he noted a tradeoff – is it better to make 50 different GRSs for unique situations or one broader, simplistic metric? The former is obviously more sensitive, but it could be less translatable. Dr. Oram also described the complexities surrounding counseling, explaining that if disease information is provided, so, too, must appropriate information – and this is further complicated in cases of polygenic risk, where progression to disease is even more uncertain than in diseases concerning one gene. Determining “the right amount of risk” at which to intervene is also difficult, as it’s currently unknown what kind of impact clinicians might have for high-risk patients. Interventions to delay onset of type 1 diabetes, such as oral insulin, require more work to determine reproducibility and value.
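As a rough illustration of the arithmetic in this section, the Python sketch below shows a weighted allele-count score, the share of cases captured by the highest-scoring slice of a population, and why doubling baseline risk halves the number of people a trial must enroll to see a given number of events. The variant names and weights are hypothetical placeholders, not Dr. Oram's published score, and the function names are our own.

```python
# Hypothetical variants and weights, for illustration only.
HYPOTHETICAL_WEIGHTS = {"variant_a": 1.5, "variant_b": 0.8, "variant_c": 0.3}


def genetic_risk_score(allele_copies, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted sum of risk-allele copies (0, 1, or 2 per variant)."""
    return sum(w * allele_copies.get(variant, 0) for variant, w in weights.items())


def fraction_of_cases_in_top(scores, is_case, top_fraction=0.10):
    """Share of all cases that fall in the highest-scoring top_fraction of people."""
    n_top = max(1, round(len(scores) * top_fraction))
    ranked = sorted(zip(scores, is_case), key=lambda pair: pair[0], reverse=True)
    cases_in_top = sum(case for _, case in ranked[:n_top])
    total_cases = sum(is_case)
    return cases_in_top / total_cases if total_cases else 0.0


def participants_needed(target_events, risk):
    """Expected enrollment needed to observe target_events at a given event risk."""
    return target_events / risk
```

Under this simple model, enrollment scales as one over the risk, so recruiting from a group at 10% risk rather than 5% needs half as many participants for the same number of events, which is consistent with the cost-halving example above.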

6. The Subjectivity of Diabetes Prevention: Why Algorithmic Prediction Requires Human Insight, and the Connection to WWII Fighter Planes

In a highly technical talk, University of Washington’s Dr. Shuai Huang underscored the potential and pitfalls of rule-based algorithms for type 1 diabetes prediction. Our understanding is that Dr. Huang’s algorithms predict outcomes (such as type 1 diabetes development) based on a set of rules (linear combinations of risk factors) derived from biomarker data. In this way, data patterns and insights are built into the machine learning processes used for prediction at a fundamental level, forcing algorithms to be adaptive (e.g., risk is different for patients with HLA-DR3 vs. HLA-DR4) and conditional (e.g., HLA risk changes depending on whether autoimmunity is present). The rules that inform these algorithms must be formulated in a very cautious manner, incorporating human insights. Dr. Huang provided a useful parable, hearkening back to a statistician named Abraham Wald, who was tasked with placing armor on WWII military planes. He couldn’t coat the entire aircraft with armor due to (i) resource scarcity and (ii) weight. He parked himself on the military base’s runway and observed the location of bullet holes in the returning planes. At first glance, one would think to reinforce the spots that attracted the most fire – as did most people in the audience. However, Dr. Huang argued, this rationale is not necessarily right in this case: One should instead reinforce the parts of planes that do not have any bullet holes, as planes that did receive fire in these areas never made it back to the base (i.e., they were shot down). Dr. Huang drew a connection to algorithmic prediction: In order to provide the greatest benefit, machine learning must be married to data insights, which are often not intuitive and require humans to contextualize.
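The Wald parable can be reproduced with a few lines of Python. The sortie records below are entirely made up for illustration, and the function names are our own; the point is only that counting holes on returning planes hides the locations where a hit was fatal.

```python
from collections import Counter

# Made-up sorties: (locations hit, whether the plane returned to base).
SORTIES = [
    (["wing"], True),
    (["wing", "tail"], True),
    (["tail"], True),
    (["fuselage"], True),
    (["engine"], False),
    (["engine", "wing"], False),
    (["engine"], False),
]


def hole_counts(sorties, survivors_only):
    """Count hits per location, optionally using only planes that returned."""
    counts = Counter()
    for hits, returned in sorties:
        if survivors_only and not returned:
            continue
        counts.update(hits)
    return counts


def loss_rate_by_location(sorties):
    """Fraction of planes hit in each location that did not return."""
    hit = Counter()
    lost = Counter()
    for hits, returned in sorties:
        for location in hits:
            hit[location] += 1
            if not returned:
                lost[location] += 1
    return {location: lost[location] / hit[location] for location in hit}
```

In this toy data the runway observer sees no engine holes at all, yet every plane hit in the engine was lost. A model trained only on the observed (surviving) records would learn the wrong rule, which is the kind of context Dr. Huang argued humans must supply.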

7. Generating Personalized Disease Risk Profiles from Health Records; Algorithm Predicts Likelihood and Rate of Developing T2D Complications (Under Review)

Notre Dame’s Dr. Nitesh Chawla detailed his team’s work leveraging patients’ medical histories to predict their individual risk of developing specific diseases. The results of this effort, called CARE (Collaborative Assessment and Recommendation Engine), were published in 2013 in the Journal of General Internal Medicine – an impressive feat, especially given the absence of MDs in the byline (though perhaps a broader reflection on the entry of “non-traditional players” into healthcare). CARE functions via a Netflix- or YouTube-like “collaborative filtering” method, which identifies patient similarities and generates personalized disease risk profiles for individuals. Importantly, these similarities stretch far beyond traditional factors. As Dr. Chawla noted, healthcare (i.e., access to care, quality of care) is responsible for only ~20% of an individual’s overall health and wellness state. To account for the myriad other social health determinants, the algorithm sorts through shared diseases, symptoms, family histories, lab results, urban/rural residencies, occupation, demographics, and more. While a disease risk profile is certainly a step in the right direction, Dr. Chawla acknowledged that if such a solution were to ever actually be implemented in the clinic, meaningful contextualization would be required. To this end, Dr. Chawla and his team provided physicians with a list of the top 20 diseases for which a given individual is likely at risk. Impressively, follow-up visits confirmed that the CARE model was able to capture ~51% of future diseases in the top 20 list. Dr. Chawla admitted that 51% may not seem like a strong proportion, but as he pointed out, if even a small portion of these diseases can be addressed in a timely fashion, the savings in cost and lives could prove substantial. We do wonder to what degree (if any) these savings might be offset by any preventative measures taken regarding diseases that were inappropriately flagged.

  • Dr. Chawla just wrapped up work using CARE to predict the likelihood of developing complications (e.g., myocardial infarction, heart failure, kidney disease, liver disease, retinopathy, stroke) for individuals diagnosed with type 2 diabetes – results are currently under review for publication. Critically, this algorithm will not only describe whether a given complication is likely to occur, but also the speed at which progression might take place.

  • To enhance the CARE model, Dr. Chawla is working to integrate medication data. For example, an individual moving to a different state may not establish a new health record, but it is likely that the individual’s prescription will be transferred and thus recorded. To this end, Dr. Chawla found that when health record and medication data were combined, CARE accurately captured 66% of diseases (up from 62% with health record alone).

  • Dr. Chawla emphasized the ability to translate a research paper into a useful tool as the “biggest challenge.” He underscored the importance of social scientists and design thinkers, cautioning that “silos will fail us.”

  • This approach to predicting disease is similar to that employed by GNS Healthcare, Cardinal Analytics, and Base Health.
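For a sense of how a “collaborative filtering” approach to disease risk might work in principle, here is a deliberately simplified Python sketch. It is not the CARE algorithm: we assume each patient is represented only by a set of diagnoses, use Jaccard overlap as the similarity measure, and sum similarity-weighted “votes” for diseases the target patient does not yet have. All names are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two diagnosis sets (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_disease_risk(target_history, other_histories, top_k=20):
    """Rank diseases the target patient lacks by similarity-weighted votes.

    target_history: set of diagnoses for the patient of interest.
    other_histories: list of diagnosis sets for other patients.
    Returns up to top_k (disease, score) pairs, highest score first.
    """
    scores = {}
    for history in other_histories:
        similarity = jaccard(target_history, history)
        if similarity == 0.0:
            continue  # patients with nothing in common contribute no evidence
        for disease in history - target_history:
            scores[disease] = scores.get(disease, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

The default `top_k=20` mirrors the top-20 list given to physicians; a production system would draw on far richer features (labs, medications, demographics) than the diagnosis sets assumed here.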

8. Dr. Søren Brunak on the Potential for Big Data in Primary Prevention; Modeling Connections Between Diseases to Predict Risks (and When Diseases Will Arise)

Opening the meeting’s official agenda, University of Copenhagen’s Dr. Søren Brunak highlighted the utility of big data in understanding relationships between diseases and predicting diagnoses. According to Dr. Brunak, current systems of acute diagnosis are inefficient and reactive whereas predictive models can help raise awareness of associations (genetic, epidemiological, etc.) between diseases, alerting patients and clinicians to risks before diagnosis. One such model – the focus of Dr. Brunak’s work – connects diseases in stratified trajectories, demonstrating how each is connected so that clinicians and patients can take proactive steps to prevent further diagnoses if possible. Moving forward, Dr. Brunak said that data from the healthy domain (e.g., A1c trends before diabetes diagnosis) would help to complete the diagnostic picture, improving models and providing insights for primary prevention.

  • Dr. Brunak suggested that advances in big data could make primary prevention possible, stating that his group is even beginning to understand when patients will be diagnosed with diseases based on trajectories. We wonder how understanding an individual’s time to diagnosis might be valuable in diabetes – for type 2 diabetes, could the timescale inform the intensity and/or selection of pharmacotherapy? Dr. Desmond Schatz expressed his belief that primary prevention is an attractive prospect for type 1 diabetes last month at Friends For Life, and GPPAD’s currently-recruiting POInT trial is the first primary prevention trial for oral insulin in the type 1 diabetes population. Regarding diabetes-related CV complications, Lilly’s highly-anticipated REWIND CVOT (for GLP-1 agonist dulaglutide) is the first such study to include a large (69%) primary prevention cohort, which will provide excellent insight into the CV primary prevention capabilities of the once-weekly GLP-1 agonist.

  • Dr. Brunak underscored the need for standardized, longitudinal data to construct accurate disease trajectories, but noted that such a database is rare with clinical big data still in its infancy. However, he touted data from his home country of Denmark as the next best thing, explaining that the personal identification number (PIN), 20-30 years of digital EMR data, and single-payer system of Denmark create the perfect (good) storm for comprehensive, linkable data. Together, the standardized medical and socioeconomic information associated with each PIN allows for a holistic analysis of associations between diseases, including the idiosyncrasies attributed to social health determinants like race, income, and education – factors which we know to be significantly associated with diabetes and obesity.
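One basic building block of a disease trajectory, as we understand the idea, is counting how often one diagnosis precedes another across many patients. The Python sketch below illustrates only that counting step under simplifying assumptions (each history is a list of (year, diagnosis) pairs with distinct years); it is not Dr. Brunak's method, and the function name is our own.

```python
from collections import Counter
from itertools import combinations


def directed_pair_counts(histories):
    """Count patients in whom diagnosis A was recorded before diagnosis B.

    histories: list of patient histories, each a list of (year, diagnosis).
    Returns a Counter keyed by (earlier_diagnosis, later_diagnosis).
    """
    counts = Counter()
    for history in histories:
        ordered = [diagnosis for _, diagnosis in sorted(history)]
        seen = set()  # count each ordered pair at most once per patient
        for earlier, later in combinations(ordered, 2):
            pair = (earlier, later)
            if earlier != later and pair not in seen:
                seen.add(pair)
                counts[pair] += 1
    return counts
```

Pairs that occur in one direction far more often than the other are candidates for chaining into longer trajectories; a real analysis would also need to test whether the ordering is more frequent than chance and adjust for age, sex, and follow-up time.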


-- Brian Levine, Peter Rentzepis, Maeve Serino, Adam Brown, and Kelly Close