“Do Currently Available Blood Glucose Meters Meet Regulatory Standards?”

May 21, 2013: Arlington, VA Full Commentary

Executive Highlights

Diabetes Technology Society’s (DTS) May 21 meeting posed the question, “Do currently available blood glucose meters (BGM) meet regulatory standards?” The conference served as an intimate forum for individuals to voice concerns about BGM accuracy and system evaluation to the FDA, which had a strong presence at the meeting. Representatives from academia and industry consistently identified low-cost meters as the source of device inaccuracies on the market. Industry was largely represented by the Big Four BGM companies (Abbott, J&J, Bayer, Roche), and we would have been interested to hear the perspective of low-cost suppliers on how best to ensure that meters perform well for patients.

Speakers agreed that marketed meters do not uniformly meet current ISO standards. Drs. Ronald Brazg (Rainier Clinical Research Center, Renton, WA), Guido Freckmann (Institut für Diabetes Technologie, Ulm, Germany), and Andreas Pfützner (IKFE, Mainz, Germany) reviewed post-market evaluations of commercially available meters, drawing attention to inaccurate systems and lot-to-lot variability. The speakers suggested that performance deviated according to whether the meter was “branded” or “low-cost.” However, despite differential quality, formularies and macroeconomic challenges place significant constraints on patients’ meter choice. Dr. Brazg urged payers to take into account the adverse effects that can result from meter inaccuracy, as opposed to just initial costs; specific clinical implications of meter error were discussed by Dr. Brad Karon (Mayo Clinic, Rochester, MN), Dr. Boris Kovatchev (University of Virginia, Charlottesville, VA), Terry Lumber (AADE, Fairfax, VA) and Dr. Howard Zisser (Sansum Diabetes Research Institute, Santa Barbara, CA). Dr. Jan Krouwer (Krouwer Consulting, Sherborn, MA) and FDA’s Katie Serrano highlighted many reasons why meters do not perform according to pre-market trials, which are company-conducted, small, and well controlled. The need for post-market quality assurance was clear; however, as both FDA and industry representatives noted, some companies devote more resources to the task than others. A few attendees suggested that independent institutions be tasked with conducting quality evaluations, which Dr. Lutz Heinemann (Science & Co., Düsseldorf, Germany) also recommended for the EU market during his afternoon talk.

A parallel theme was that patients and HCPs should be aware of the limitations of BGMs, especially in the critical care environment. Drs. George Cembrowski (University of Alberta, Alberta, Canada) and Martha Lyon (Royal University Hospital, Saskatchewan, Canada) highlighted the unique physiological challenges presented by this patient group. The talks underscored the potential benefits of a system, be it on the hospital level or national level, to verify the performance of point-of-care BGMs used in the ICU.

DTS’ Dr. David Klonoff succinctly captured the day’s discussion in his concluding remarks. “We're not here to find a solution, we're here to identify a problem,” he said. “Do currently available blood glucose meters meet regulatory standards? My answer is all too frequently, no they do not.” We noted a sense of frustration that the regulatory process for BGM remains largely unchanged, three years after FDA’s 2010 Public Meeting on BGM accuracy (2013 could be a pivotal year, since both CLSI and ISO have released new, tighter standards in recent months). We hope that DTS’ meeting will intensify efforts on this front and create more forums so that any new regulations are designed as well as possible. Multiple individuals at the meeting suggested that CMS’ competitive bidding could create a dynamic where decisions are made on price alone, putting patients at risk. Indeed, we hope that payers will recognize that quality diabetes care is worth its financial cost, to avoid the much greater costs and suffering of complications. And, we hope it becomes harder for lower-cost providers to compete.




Detailed Discussion and Commentary

Session 1: How is BG Monitor Performance Determined?


Mitch Scott, PhD (Washington University, St Louis, MO)

Dr. Mitch Scott reviewed the current standards for BGM accuracy as defined by the FDA, ISO, CLIA/CAP, ADA, and – in greatest detail – CLSI. He explained that the CLSI guidelines address most potential sources of meter error that would occur in the real world. Also, he showed that many older meters (names unspecified) have not met accuracy standards in post-market evaluations, whereas a few of the newest meters to market (names unspecified) seem to perform well within the new CLSI requirements.

  • Dr. Scott began by answering the specific question in his talk’s title, with a clear and brief overview of current BGM accuracy guidelines. He also mentioned CLIA’s standards for central laboratory glucose measurements: within 10% for values above 60 mg/dl, and within 6 mg/dl for values below 60 mg/dl.

    • To approve new glucose meters, the FDA uses ISO’s 2003 standards (and CLSI’s previous standards, which specify the same targets as the ISO standards). These standards require that 95% of all measurements be within 20% of reference (for values >75 mg/dl) or within 15 mg/dl of reference (for values <75 mg/dl).

    • In January 2013, CLSI tightened its standards so that 95% of all values must fall within 12.5% of reference (for values >100 mg/dl) or within 12 mg/dl of reference (for values <100 mg/dl). The new standards also specify that 98% of measurements must meet the old CLSI criteria. (Still, Dr. Scott observed, 2% of meter results “could be anywhere.”)

    • In April 2013, ISO voted to approve its own tighter set of accuracy standards, which are available for sale online. A member of the standards meeting spoke up from the audience to say that the document would be officially published in “another month or two.”

    • The CLIA/CAP standards for hospital-use meters require BG measurements to be within 20% for samples >60 mg/dl (or within 12 mg/dl for samples <60 mg/dl). Dr. Scott noted that the comparator in these studies is not a reference method, but rather the mean of measurements with peer devices – a relatively easy mark to hit.

    • The American Diabetes Association suggests a total analytic error below 5%. This standard is rarely discussed, Dr. Scott noted, because not even central laboratory measurements are so accurate.

  • Dr. Scott spoke in detail about the newest CLSI guideline, officially titled POCT12-A3. These standards specify that split-sample studies should be performed in the intended patient populations. He said that these standards account for both analytic performance and (if the studies are designed in accordance with the guidelines) potential clinical interferences. They are unlikely to include as much user error as would occur in a large, busy hospital; however, Dr. Scott said that no standards can really address this. Another issue that he said is beyond the scope of the guidelines is the physiological difference between capillary and arterial glucose values (which can be exacerbated in critical care patients, e.g., those with hypotension).

  • Several recent studies have demonstrated that newer meters (names unspecified) are up to FDA and/or CLSI accuracy standards. Dr. Scott said that two meters recently cleared in the US not only performed well overall, but also were less susceptible than older meters to hematocrit effects and under- or overfilling (Chan et al., Clin Biochem 2009; Meynaard et al., Crit Care Med 2009; Hopf et al., Diab Technol Ther 2011). More recently, Dr. Scott and his colleagues analyzed a new meter in accordance with POCT12-A3. The researchers used 1,200 meter results from 600 patients, who were distributed roughly evenly among 15 different critical care areas. The meter posted 98.8% of values within the new CLSI standard (12.5% / 12 mg/dl) and 99.8% within the old ones (20% / 15 mg/dl). Of values below 100 mg/dl (n=278), 98.6% were within 12 mg/dl (Mitsios et al., J Diabetes Sci Technol 2013).
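As a rough illustration of how the thresholds Dr. Scott described translate into a pass/fail check, the sketch below evaluates paired meter and reference readings against the ISO 2003 criteria and the January 2013 CLSI criteria as summarized above. The function names and sample data are hypothetical, and the handling of values exactly at a cutoff (75 or 100 mg/dl) is an assumption, since the summary above does not specify it.

```python
def within_limits(meter, reference, cutoff, abs_limit, pct_limit):
    """True if one reading is within the limit that applies at its reference level."""
    error = abs(meter - reference)
    if reference < cutoff:
        # Below the cutoff the limit is absolute, in mg/dl.
        return error <= abs_limit
    # At or above the cutoff (assumed) the limit is a percentage of reference.
    return error <= pct_limit / 100.0 * reference


def fraction_within(pairs, cutoff, abs_limit, pct_limit):
    """Share of (meter, reference) pairs that fall within the given limits."""
    hits = sum(within_limits(m, r, cutoff, abs_limit, pct_limit) for m, r in pairs)
    return hits / len(pairs)


def meets_iso_2003(pairs):
    # 95% within 20% (above 75 mg/dl) or 15 mg/dl (below 75 mg/dl).
    return fraction_within(pairs, 75, 15, 20) >= 0.95


def meets_clsi_2013(pairs):
    # 95% within 12.5% (above 100 mg/dl) or 12 mg/dl (below 100 mg/dl),
    # and 98% within the older 20% / 15 mg/dl limits.
    tight = fraction_within(pairs, 100, 12, 12.5) >= 0.95
    wide = fraction_within(pairs, 75, 15, 20) >= 0.98
    return tight and wide


# Made-up data: 96 close readings plus 4 readings that miss even the old limits.
sample = [(160, 150)] * 96 + [(190, 150)] * 4
print(meets_iso_2003(sample), meets_clsi_2013(sample))
```

With this made-up data set, 96% of readings fall inside both sets of limits, so the ISO 2003 check passes while the CLSI check fails on its 98% requirement. That is the practical effect of the second CLSI criterion: it caps the number of large outliers rather than only tightening the central band.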


Pat Bernhardt (Scientific Reviewer, Office of In Vitro Diagnostic Device Evaluation and Safety, CDRH)

Pat Bernhardt gave a high-level overview of the FDA’s regulation of blood glucose meters. She discussed the myriad factors the Agency considers when evaluating a device, including precision, accuracy, linearity, interference, environmental effects, labeling, and the cleaning/disinfection process. Notably, in her discussion of FDA’s accuracy evaluation, Ms. Bernhardt remarked that the FDA was part of the ISO revision discussion and voted to move the new ISO standards forward; however, the Agency did so because it recognized how many countries rely on ISO requirements to set evaluation criteria. Said Ms. Bernhardt, “We don’t believe that [the new ISO standard] goes far enough to meet the needs of users. The device criteria for the hypoglycemia end hasn’t really changed much.” As a reminder, while the FDA chose to adopt the 2003 ISO 15197 standards, it is not legally required to adopt the revised ISO requirements. In the group discussion that followed, Ms. Bernhardt indicated that the Agency wanted to produce its own guideline for accuracy requirements, but could not provide clarity on a possible timeline to publication.


Moderator: Katie Serrano (FDA, Silver Spring, MD)

Panelists: Mitch Scott, PhD (Washington University, St Louis, MO) and Pat Bernhardt (Scientific Reviewer, Office of In Vitro Diagnostic Device Evaluation and Safety, CDRH)

Q: The FDA has made it clear that POCT12 and ISO for accuracy are not sufficient. Why won’t FDA articulate what is sufficient so that we as developers can have that target and can design to it?

Ms. Bernhardt: We are in the process of developing guidance. We wish it were available as soon as possible, but we don’t know when that will be. We are trying to assess all the comments from the 2010 meeting to come up with standards. [Editor’s note: for our coverage of FDA’s 2010 Public Meeting on blood glucose meter accuracy, see our Day #1 report at http://www.closeconcerns.com/knowledgebase/r/e7dfc468 and our Day #2 report at https://closeconcerns.box.com/s/de57ce534d6c39ccbce3.]

Q: So we should expect to see FDA’s expectations in this pending guidance?

Ms. Bernhardt: Yes. As I said, we wish it could be available immediately, but we don’t know exactly when it will be.

Dr. Brad Karon (Mayo Clinic, Rochester, MN): We took stable, hospitalized patients’ venous blood by venipuncture and dosed meters, and multiple meters met criteria. But when we tested blood from central venous catheters, the meters had poor performance. It’s clear that venous catheter blood can lead to overestimation of glucose values. Have you considered using venous catheter blood as the evaluation site as opposed to considering all venous blood to be the same?

Ms. Bernhardt: That may be something we have to think about in the future.

Dr. Scott: This is for Brad. I know your study in Diabetes Tech 2009. Have you figured out what it is that caused the poor performance? Is it a plasticizer in the lines?

Dr. Karon: We know it’s not exogenous glucose contaminating the samples. AJCP 2008 was our first study and this was a follow up study of 50 stable, hospitalized patients with diabetes. We drew specimens from the central venous catheter. We performed a venous puncture and sent that to the lab. So we had circulating venous and catheter venous blood samples. One meter was sensitive to hematocrit and one was not. The one that was sensitive to hematocrit performed more poorly. Probably, when blood sits in the venous catheter, this affects the measurement. It seems reasonable that it’s a plasma water viscosity or hematocrit effect.

Ms. Serrano: We know current ISO and potential future ISO guidelines are for over-the-counter devices. These are the standards that are used by manufacturers to evaluate these devices even though we know these are commonly used in hospitals. What are the effects of that?

Dr. Scott: POCT12 specifically addresses health care institutions. It is expressly for health care institutions. That 12.5% - I think it is reasonable based on well-done modeling studies. The academics on that writing committee were pushing for 10%. Industry was pushing for 15%. On any consensus document, there’s compromise. But it is within the range of what the modeling studies suggest accuracy needs to be to avoid increased incidence of hypoglycemia.

Dr. Courtney Lias (FDA, Silver Spring, MD): I think POCT12 is a good document and is good for hospitals. What is your impression – you worked on the document – as to whether hospitals look at the document as verification of meter performance or whether they look at it as a way of establishing meter performance in their population? Manufacturers are doing evaluations meant for monitoring at home. They are not addressing hospital use. How do hospitals perceive POCT12?

Dr. Scott: I have to qualify my comments with the fact that I have large academic hospital blinders. The institutions I’m familiar with are using POCT12 and are using it to evaluate meters in the critical care setting. They are using split sample design. Are community hospitals doing this? Probably not. But I suspect that they are doing a smaller version of it. But again, I don’t know. The other compromise on POCT12 was that 2% of values can be anywhere. Those from hospitals wanted it to be 99%. Industry wanted 95%. Consensus is a compromise.

Dr. Karon: I think until recently most smaller environments didn’t do a lot of evaluation. Until recently, I would have guessed that very few facilities would have done that. But by adding the disclaimer to hospital-use meters, it will create a situation where they will do something in that general realm, even if it is smaller scale. I think you’re going to see more sites doing on-site evaluations because of the limitations of meters.

Dr. Scott: Will this be added to the College of American Pathologists (CAP) checklist?

Dr. Karon: I don’t speak for CAP. I think it’s going to be the FDA that causes sites to do in-house evaluations as opposed to an accreditation body.

Ms. Serrano: It’s not a new statement on the glucose meters. In the last 7 years, they’ve all included the statement. It is not a new statement. The old, old meters that were cleared over 10 years ago might not have had it, but for many years we have included it.

Dr. Scott: Is that true? The original disclaimer said it was not for use in critical care?

Ms. Serrano: That exact language was included.

Dr. Karon: You’re right. It’s not a new concept, but the awareness is going to drive people like me who direct hospital point-of-care programs to do some version of that. Certainly, point-of-care is under-resourced and not every site will do a good study.

Ms. Serrano: Meters are cleared for over-the-counter use, but are used in many scenarios that they are not cleared for. It’s always been the responsibility of the user to validate that use, but it came to the forefront with critically ill patients. But even using it in folks that don’t have diabetes – the devices are not evaluated for that use. Perhaps, we just weren’t paying enough attention before.

Comment: Most smaller hospitals cannot do good evaluations. In Canada, we have larger health systems and it’s incumbent on the lab director to select the best system. I would suggest newer ones with hematocrit insensitivity are the ones we should go towards. We should all read the guidelines of CLSI and ISO and heed what is in there. Most studies of accuracy really will not be any good because it is difficult to get the right samples and reference tests.

Dr. David Klonoff (Mills-Peninsula Health Services, San Mateo, CA): When looking at the users in accuracy studies, do you specify what type of training these patients might have?

Ms. Bernhardt: We don’t specify that the patient should be with or without diabetes, but we do specify it should be an untrained user.

Session 2: Do Approved BG Monitors Perform According to ISO Standards?


Ronald Brazg, MD, FACE (Rainier Clinical Research Center, Renton, WA)

Dr. Ronald Brazg reviewed a Roche-funded comparison of the Accu-Chek Aviva Plus with six other “unknown” or “less-known” meters (Brazg et al., J Diabetes Sci Technol 2013). The study enrolled adult patients with type 1 or type 2 diabetes (total n=100). Blood glucose measurements took place within a five-minute window and compared three different strip lots for four different meters, in duplicate. Each measurement period began and ended with the reference measurement specified in the meter’s label (perchloric acid hexokinase for Accu-Chek, YSI for the other systems). Dr. Brazg and his colleagues found that only three BG systems met the current ISO standards for all three strip lots. The Accu-Chek Aviva Plus was the only system that met the new ISO standard, and its lot-to-lot variability was lowest. They also observed that five of the meters overestimated glucose levels when hematocrit was low (implying that hypoglycemia might be masked in patients with renal insufficiency or anemia). Dr. Brazg concluded that payers should make formulary decisions that take into account the adverse effects of inaccurate measurements, and not just cost; he emphasized that such formulary decisions are often the key input in the decision of which meter a patient uses.

  • Dr. Brazg noted that his group’s accuracy study met all the criteria to enable comparisons between different BGM accuracy studies, as specified by Thorpe and colleagues in a recent review (Diab Technol Ther 2013). Thorpe and colleagues recommended that researchers use manufacturer-recommended reference methods, a sufficiently high sample number (at least 100 patients in duplicate), a glucose concentration spread across the “bins” specified by ISO (though Dr. Brazg’s study included only 3% of results under 50 mg/dl, instead of 5%), and multiple strip lots. Recommendations also included that the research be performed by an independent clinical research center rather than by a manufacturer directly, and that the study be conducted in keeping with ISO specifications.


Guido Freckmann, MD (Institut für Diabetes Technologie, Ulm, Germany)

Dr. Guido Freckmann’s presentation underscored that a CE Mark does not guarantee blood glucose meter accuracy. Backtracking momentarily, Dr. Freckmann reminded the audience that accuracy is a combination of precision and trueness; consequently, blood glucose systems (defined as both the meter and the strip lot) can fail to meet ISO standards by either metric. Turning to commercialized systems, Dr. Freckmann showed that in his team’s evaluation of nine marketed systems, only seven fulfilled the prerequisites of the ISO standard and only five were within the limits of the new proposed ISO standard (Baumstark et al., German Diabetes Association Poster 2013). Further, in his team’s evaluation of five systems (four strip lots per system), he found noticeable differences between strip lots (Baumstark et al., J Diabetes Sci Technol 2012). The maximum lot-to-lot difference between any two of the four evaluated test strip lots for a single blood glucose system ranged from 1.0% to 13.0%. Only two of the five systems evaluated met the current ISO criteria for every strip lot tested. With these data in mind, Dr. Freckmann argued that postmarket evaluations are important to ensure meter systems’ adherence to quality and accuracy standards. He pressed that blood glucose meters should be judged not only on their price, but also on their quality and system accuracy. Multiple speakers expressed this sentiment during the meeting.
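To make the idea of a "maximum lot-to-lot difference" concrete, the sketch below computes one plausible version of it. The talk summary does not say how the cited study calculated the figure, so this is an assumption: each lot is summarized by its mean relative bias against the reference (a measure of trueness), and the result is the largest gap between any two lots. The function names, lot labels, and readings are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_relative_bias(pairs):
    """Mean signed difference from reference, in percent, for (meter, reference) pairs."""
    return mean(100.0 * (m - r) / r for m, r in pairs)


def max_lot_to_lot_difference(lots):
    """Largest gap in mean relative bias between any two strip lots (assumed definition)."""
    biases = {name: mean_relative_bias(pairs) for name, pairs in lots.items()}
    return max(abs(biases[a] - biases[b]) for a, b in combinations(biases, 2))


# Made-up readings for three strip lots of one system.
lots = {
    "lot_A": [(102, 100), (204, 200)],   # reads about 2% high
    "lot_B": [(95, 100), (190, 200)],    # reads about 5% low
    "lot_C": [(100, 100), (200, 200)],   # matches reference
}
print(max_lot_to_lot_difference(lots))
```

In this made-up example the widest gap is between the lot that reads 2% high and the lot that reads 5% low, about 7 percentage points. A system can therefore look accurate on one lot and miss the standard on another, which is the case for testing multiple lots.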


Andreas Pfützner, MD, PhD (IKFE, Mainz, Germany)

Reviewing a plethora of BG trials performed at his clinical research center, Dr. Andreas Pfützner shared insights on how accuracy evaluations should be performed and interpreted. He concluded that branded meters used by experienced investigators generally appear to comply with ISO criteria, while lower-cost meters generally seem to require improvement. He also noted that ISO accuracy standards may not be appropriate for studies that are not designed in accordance with ISO specifications, and he highlighted the potential benefits of post-marketing evaluations to ensure that companies are performing quality assurance. Another emphasis was the need for appropriate user training. At ADA 2013, Dr. Pfützner’s group (Demircik et al.) will present a study in which inexperienced BGM operators initially tested in a way that yielded a MARD of 9%, which improved to a MARD of 5% after training.

  • Dr. Pfützner issued a general caveat on industry-sponsored research: studies with better results are more likely to be published (publication bias). For this reason, he suggested that clinical researchers should always report their sponsors.

  • In a Sanofi-sponsored trial, several brand-name meters all performed with 97-100% of values in the Clarke Error Grid A Zone and a mean absolute relative difference of 5-8% relative to YSI (Pfützner et al., Curr Med Res Opin 2012). (Dr. Pfützner commented that 10 years ago, the MARD would have been more like 10-15% – which he said is also the level at which today’s cheaper meters would perform, “in all likelihood.”) The study included 106 patients and six meters: BGStar, iBGStar, Accu-Chek Aviva, Contour, FreeStyle Lite, and Ultra2.

    • In a post-hoc analysis by Borchert et al. that will be presented as a poster at ADA 2013, the study’s results were re-analyzed according to the new ISO criteria. All the meters met the new criteria except for the Contour. (However, Dr. Pfützner noted that he has found Bayer’s recently released Contour Next strips to be much more accurate.)

  • In an Abbott-sponsored study designed to more closely simulate real-world conditions, several brand-name meters again performed well (Tack et al., Diab Technol Ther 2012). The study’s patients (n=453) were roughly evenly split between people with type 1 and type 2 diabetes; most tested themselves four times per day. For each of five brand-name meters, four strip lots were purchased through regular distribution channels. Based on a YSI reference method, the meters ranged in MARD from lowest to highest as follows: FreeStyle Lite (MARD ~5%), FreeStyle Freedom Lite, Accu-Chek Aviva, Contour, and OneTouch Ultra Easy (MARD ~9%). Dr. Pfützner also showed an analysis by the old and new ISO standards. Not all the meters met the standards, but he emphasized that this result was not meaningful: the standards should be used only for studies designed according to ISO specifications (which this one was not).

  • Speaking to the need for post-marketing surveillance, Dr. Pfützner showed that one low-cost BGM product performed well when the meter and strips were supplied by the manufacturer, but much worse in off-the-shelf testing. The meter in question was the IME iDia, manufactured in Taiwan. In a manufacturer-sponsored study in which the manufacturer supplied the strips and meters, MARD was 9.6% – “an acceptable value for a low-cost meter in Germany” (Funke et al., German Diabetes Conference poster 2011). One year later, in a Sanofi-sponsored study in which meters and strips were purchased from pharmacies, Dr. Pfützner’s team found the IME iDia to have a MARD of 22.6% – the worst result of several meters included in the study (Schipper et al., Diabetes Metab Heart 2012). Dr. Pfützner attributed the discrepancy to issues of storage and shipping.
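Since MARD figures recur throughout these studies, a minimal sketch of the calculation may help. It follows the standard definition of mean absolute relative difference (the average of |meter − reference| / reference, expressed as a percentage). The function name and the sample readings are hypothetical and are not taken from any of the studies cited above.

```python
def mard(pairs):
    """Mean absolute relative difference, in percent, over (meter, reference) pairs."""
    relative_errors = [abs(m - r) / r for m, r in pairs]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Made-up paired readings in mg/dl: (meter value, reference value).
readings = [(95, 100), (210, 200), (138, 150), (66, 60)]
print(round(mard(readings), 1))  # 7.0
```

Because MARD averages unsigned errors, it says nothing about direction: a meter that always reads 7% high and one that scatters 7% in both directions produce the same figure. MARD also does not show how many individual readings fall outside a limit, which is the property that the ISO and CLSI criteria measure.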


Jan Krouwer, PhD (Krouwer Consulting, Sherborn, MA)

With almost 20 years of experience as the person in charge of clinical trials for a diagnostic company, Dr. Jan Krouwer spoke with authority on the type of biases that exist in blood glucose meter evaluations. He cautioned that his discussion addressed only accuracy trials that companies pay hospitals to perform. Said Dr. Krouwer, “We envisioned these as company experiments done off site.” He did not recommend changes to the approval process, explaining that some bias is unavoidable. Rather, he suggested that the scientific community take advantage of the wealth of postmarket data on glucometers to understand how these devices are performing and changing over time. To demonstrate, he remarked that an insulin-using patient who tests three times per day would average 7.9 billion blood glucose tests over his or her lifetime. Further, there are ~12,000 adverse events reported per year on blood glucose monitors – a “tremendous amount of data that needs to be understood,” said Dr. Krouwer. He asked the scientific community to develop a set of goals for postmarket evaluation to make available data and postmarket quality assessments more meaningful.

  • Conflict of interest: Dr. Krouwer explained that in clinical trials whereby companies pay hospitals to perform evaluations, generic bias exists. Companies might supply the protocol, do data analysis, and write or edit the publication, he said.

  • User error: User error, explained Dr. Krouwer, is unrelated to the meter being tested (e.g., patients spill Coke on their hand, affecting test results). Instrument error is due to the meter itself. However, said Dr. Krouwer, it is difficult to uniformly consider user error as being devoid of any problem with the meter because of the interaction between the user and instrument (e.g., when a user gives too small a sample, is it a failure of the user or a failure of the meter to detect that the sample was too small?).

  • Incidental bias: A common case of this, explained Dr. Krouwer, is that clinical evaluators often receive more training than routine staff, which helps to minimize user error during testing.

  • Reagent bias: As raw materials and vendor processes change, explained Dr. Krouwer, lot-to-lot biases begin to emerge. As such, the first few lots tend to have lower bias and these are the ones included in initial quality evaluations.


Katie Serrano (FDA, Silver Spring, MD)

Ms. Katie Serrano discussed several reasons why meters seem to perform better in pre-market trials than in post-market settings. The pre-market studies required for FDA clearance tend to be small, performed under well-controlled conditions by well-trained operators, and conducted by manufacturers with a strong financial interest in reporting good results. (During Q&A Ms. Serrano explained that FDA is working to make its regulatory review more strict but that this process is difficult to achieve legally: as class 2 medical devices, meters need only to show substantial equivalence to marketed devices, which obviously were approved under the existing system.) Then once BG systems are on the market, their performance can be affected by quality-assurance and shipping-and-handling issues – areas where some companies devote more resources than others, as mentioned several times during the day.


Moderator: Robert Vigersky, MD (Walter Reed National Military Medical Center, Bethesda, MD)

Panelists: Ronald Brazg, MD, FACE (Rainier Clinical Research Center, Renton, WA); Guido Freckmann, MD (Institut für Diabetes Technologie, Ulm, Germany); Andreas Pfützner, MD, PhD (IKFE, Mainz, Germany); Jan Krouwer, PhD (Krouwer Consulting, Sherborn, MA); and Katie Serrano (FDA, Silver Spring, MD)

Q: [Representative from Roche] Katie, everything we’ve heard here suggests that there is no refuting that there is differential performance between lower cost meters and branded meters. I would assume that this would raise enough concern that the amount of inspection for low cost providers should go up. But data suggest that these manufacturers are not being investigated. Why not and when will they be investigated? The inspection arm of the FDA, which is vast, certainly can be looking at these things. It is unbelievable to see that these manufacturers have no Medical Device Report (MDR) data.

Ms. Serrano: We are concerned about it. I would say the manufacturers we are particularly concerned about are not in the US. It is hard for us to do inspections outside of the US, particularly in Asia. We have done it. Getting good data out of those inspections is a challenge, but it is something that we are concerned about.

Q: [Representative from Roche] This is for the panel’s practicing physicians. You have studied the differential meters. In your practice, would you tell people to use those meters that are clearly performing differentially for patient care?

Dr. Pfützner: My answer is clearly no. Unfortunately, our system in Germany is that if we are part of the government’s reimbursed system – and it’s the same with drugs – we are forced to use ~20% of meters that are in the lower cost class. That is part of the reason why I stay in private practice. I would definitely stick with branded meters.

Dr. Brazg: I agree that ideally, and it’s not different for medications, we’d like to stick with meters with high quality and accuracy. But patients are under a lot of pressure. Formularies often dictate what meter patients are going to use. Patients are under financial pressure to use something that isn’t going to put them in the hole.

Dr. Barry Ginsberg (Diabetes Technology Consultants, Wyckoff, NJ): I consult for a lot of BGM companies. First, a general comment: I did a review of one of the Asian BGM companies and walked into their facility. Their procedure for issuing a medical device report (MDR) was that, basically, a patient had to die before they would submit an MDR. My second comment is that we are developing a two-tier system of “haves” and “have-nots.” The have-nots are because CMS is going to competitive bidding, with prices that will be difficult if not impossible for branded-meter manufacturers to meet. The have-nots will be stuck with meters that met 20% standards several years ago but probably couldn’t meet 40% standards in postmarket analyses today. I have previously recommended that we need to re-think how we measure accuracy, perhaps by doing something similar to Europe’s notified body system. All the regulatory studies would be done by independent bodies, and the studies would be performed every year – if a manufacturer missed the standard one year, they would have to improve the next year. Then we would get away from the two-tier system.

Ms. Serrano: FDA has been working on revising its regulatory system; we have taken input, for example, at the 2010 meeting. However, the legal bar is substantial equivalence, so it is difficult for us to enforce changes in the standard. We are working on this, but it’s not there yet.

Dr. Martha Lyon (Royal University Hospital, Saskatoon, Saskatchewan, Canada): A recurring theme is the evaluation of accuracy. When you evaluate accuracy, the question is to which method do you compare? To a predicate device, YSI, hexokinase? I found it interesting in your presentation that the meter compared to perchloric acid hexokinase passed ISO standards now and the proposed ISO standards. I thought the study was great. My question is, how did the two methods used for accuracy compare? And if you use the meter that was compared to perchloric hexokinase, how well did it perform compared to YSI?

Dr. Brazg: Those comparison methods were used because it was according to manufacturer guidelines. Roche uses hexokinase, whereas all other meters used YSI for approval.

Dr. Lyon: If you measured the results of patients using hexokinase and YSI, what kind of variance would we see in those results?

Dr. Pfützner: We did a recent comparison. The difference depends on the sample, but it can be up to 8% difference in the standard. His study was a fair comparison because it’s what the manufacturer compared to.

Dr. Lyon: Maybe that’s where we have to reassess our accuracy evaluations.

Dr. Pfützner: There is no isotope dilution mass spectrometry that can be used widely at this point. As an analytical chemist, it is actually crazy that these two methods are allowed.

Q: I think that Dr. Lyon has exposed a problem in Gary Thorpe’s article. The ISO criteria are not set up to design comparative studies, but to compare meters to a standard for market approval. With a good design, like Dr. Brazg’s, you add the statistical problem that the meter at the center of the study is double-sampled, making its data more valid. Also, different references were used. Also, in evaluation studies there is potential doping of the samples to get values in the required bins. Perhaps we ought to put together standards for comparative studies, because ISO does not provide for that.

I have a question for Ms. Serrano. You cannot anticipate all the issues; manufacturers can’t be responsible for hand-washing or improper storage. So how do you propose that manufacturers design lot-release criteria?

Ms. Serrano: Typically three control samples are tested by manufacturers for lot-release, with ISO standards used. ISO is supposed to be total accuracy – 100 samples tested in the hands of lay users, across the sampling range. In this case the manufacturer is using ISO only for the manufacturing process, even though you know that other factors will occur in the real world. Probably as a manufacturer you need to do an evaluation to see how much tighter to make the lot-release criteria, to ensure that real-world results are up to standard.

Comment: The science of industrial quality control is pretty well felt. Other experts in the field can put together criteria for lot release. That is something that needs to be done. It makes a difference whether MARD is 2%, 3%, or 5%, so let’s look at MARD as a function of glucose level and bias as a function of glucose level. We should get good lot release data from that.
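As a rough illustration of the commenter's suggestion, the sketch below computes MARD (mean absolute relative difference) and mean bias within glucose ranges from paired meter and reference readings. The function name, the bin edges (75, 180, and 600 mg/dl), and the sample values are assumptions made for the example; they do not come from the meeting or from any lot-release standard.

```python
from collections import defaultdict


def mard_and_bias_by_bin(pairs, bin_edges=(0, 75, 180, 600)):
    """Summarize meter error per reference-glucose bin.

    pairs: iterable of (meter_mg_dl, reference_mg_dl) readings.
    bin_edges: hypothetical bin boundaries in mg/dl, chosen for illustration only.
    Returns {(low, high): {"n", "mard_pct", "bias_pct"}} for non-empty bins.
    """
    grouped = defaultdict(list)
    for meter, ref in pairs:
        if ref <= 0:
            continue  # a relative error is undefined without a positive reference
        for low, high in zip(bin_edges, bin_edges[1:]):
            if low <= ref < high:
                # Signed relative error in percent, measured against the reference
                grouped[(low, high)].append((meter - ref) / ref * 100.0)
                break

    summary = {}
    for bin_range, rel_errors in sorted(grouped.items()):
        n = len(rel_errors)
        summary[bin_range] = {
            "n": n,
            "mard_pct": sum(abs(e) for e in rel_errors) / n,  # size of the error
            "bias_pct": sum(rel_errors) / n,                  # direction of the error
        }
    return summary


if __name__ == "__main__":
    # Made-up readings: (meter, reference) in mg/dl
    example = [(66, 60), (54, 60), (110, 100), (220, 200)]
    for bin_range, stats in mard_and_bias_by_bin(example).items():
        print(bin_range, stats)
```

Reporting bias alongside MARD matters because errors that cancel out on average (bias near zero) can still produce a large MARD within a bin.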

Dr. Mitch Scott: First a clarifying point – there are only two reference methods in the world recognized by JCTLM (Joint Committee for Traceability in Laboratory Medicine) – isotope dilution mass spectrometry and perchlorate hexokinase. YSI’s use as predicate is historical – it was cleared before the 1976 guidelines.

These issues are not just restricted to glucose meters. In my career I’ve done over 100 studies for sponsors for FDA submission; none have been glucose meters. The FDA package insert for a major manufacturer’s 10% CV for troponin was performed in my laboratory by a longtime employee of mine, with a single instrument under ideal conditions. I teach our residents and fellows always to take package inserts with a grain of salt. I disagree with Katie Serrano – I don’t think manufacturers are fudging data, but simply that the studies are performed under ideal conditions. In the real world in our facility, we are running troponin assays on four different instruments with many different operators – the cutoff is now more like 40%, not 10%. It’s just the way the system is set up today; I don’t think there is much we can do, short of postmarket evaluation. The CAP criterion for acceptability is 20% vs. a peer – hard to fail.

Ms. Serrano: I agree with you. I don’t think that most data is fudged. I do think we have probably seen some submissions where it might not have been generated by the company and that is difficult for us to assess.

Dr. Karon: Release criteria and the concept that evaluations are performed on early lots and need to be looked at over time – that’s certainly not limited to glucose meters. Manufacturers rightly point out that there are matrix effects in PTA (editor’s note – we believe this is Product Test Authorization) surveys, but perhaps they wrongly point out that they are all due to matrix effects. How do we look at the long-term traceability of assays over time?

Dr. Krouwer: I don’t have an answer, but I do have a story. Someone came to us and asked to pay extra for lots of material that were more tightly controlled, those that had tighter acceptance criteria than standard acceptance. It is something that companies struggle with and really, it is an economic issue.

Dr. Lias: You are right – these issues are not only for BG meters, though meters are what we are talking about today. At FDA when we think about whether to address potential issues, we think about all these things: quality control, proficiency testing, laboratory use. With regard to lab use, a lot of you may be using these types of devices in your facilities. In most cases that is waived testing, not proficiency testing. End users tend not to be as well trained in how to use the devices, and they tend not to be aware of the gaps between the labeled results and real-world use. To address another earlier point – we have seen fraudulent data on inspection of some centers; it’s rare, but it happens. It’s hard to detect fraudulent data without an inspection, and in some countries, inspections are not unannounced. With one manufacturer that I will not name, there was a big issue with calibration drift over time; we try to address such issues with post-marketing controls in place. Most manufacturers do a good job of following traceability over time, but some do not.

Dr. Vigersky: I’m a clinician and what I want to know is whether this meter is going to make a difference in outcomes. We’re talking about analytical accuracy, but from the clinical point of view, is their A1c better? Do they have less hypoglycemia? Less hospital admissions? In this era, where there may be “haves” and “have-nots” in the US – there may not be in Germany – isn’t there an opportunity to study outcomes in your patient population, real hard outcomes, and give us an answer that will help the clinician make a decision?

Dr. Pfützner: It is not nice to be part of a field experiment. Yes, we have opportunity to test this, but if my assumption is right, it is unethical to do such a comparison. I would like to reformulate my statement. For me, it’s clear that the branded meters will consistently meet the standards. In our system, we have the patient pay system and the government reimbursed system, which meets all reimbursement but only for a defined quality of care. If my assumption might be correct, some patients may have to die before we change the rules. We all drive cars and we have a system where cars are checked every two years. Why don’t we have the same thing for devices? It’s simply a request from FDA to do so and then we could get some more clarification and comparisons would get incentivized. Right now, honestly, they are incentivized not to do it because it increases cost if you have quality assurance systems in place. We need to satisfy safety needs and accuracy needs in a way that doesn’t force us to be 100% with both. Every percentage is really requiring such a huge increase in pricing. Everyone says with 5% outside, it can be everywhere, but we know that it is not. It’s the extreme outliers that are really endangering patients. It doesn’t mean that all the inaccuracies have clinical implications – that’s a weakness of the Clarke Error Grid. We need to find a way to practically handle that issue and I don’t want to wait five years.

Dr. Klonoff: The types of clinical studies we would like to see would not be ethical to perform. Thus, the Diabetes Technology Society believes that modeling studies are very important to perform. We provided a grant to the group at the University of Virginia, who published a modeling study in our journal a few years ago. Later today Dr. Boris Kovatchev will present new data funded by a subsequent DTS grant, and Dr. Brad Karon will present data also.

Dr. Simmons: Germany’s B-meter system – a price-only tiered system – has been disrupted in terms of its validity to distinguish between higher- and lower-performing meters. We have put our new highest-performance meter in the B meter system simply to be competitive in the market.

Dr. Pfützner: Please don’t take Germany as a representative country. We are going back to the Stone Age. Our system is debating the value of blood glucose testing at all; it wants us to go back to urine strips, or even tasting. The Germans here have an unusual perspective, because we work internationally. Our colleagues who work only in Germany have a very different set of concerns than the comparative accuracy of different blood glucose meters.

Dr. Simmons: It’s a confusion of apples to oranges. It feels almost as if in this room it’s a dirty industry secret that we test in the best conditions. But that’s science. Let’s separate out these other factors, because at the end of the day, the only way you can know what the best performance of the meter is, is to look at it in the best circumstances.

Dr. Vigersky: Is it necessary to have as strict accuracy criteria in patients who are not as susceptible to hypoglycemia as insulin-using patients are – for instance, those only on diet and exercise control?

Dr. Pfützner: In my point of view, this is a good question. In Germany, strips for non-insulin-using patients with type 2 diabetes are not reimbursed. My clinical perspective is that I want to still have a couple blood glucose tests per month to see where my patients are. A1c is a poor metric in type 2 diabetes, because it masks variability. When I see that blood glucose deteriorates over time, I make a change; changes in A1c are seen too late.

Ms. Serrano: FDA believes that meters should be designed and evaluated as appropriate for the intended-use population. Right now the intended-use population is broadly defined as over-the-counter (OTC) users. Some of the OTC end users may require tighter or less-tight standards, but no manufacturer is seeking such specific label claims.

Dr. Scott: Based on comments from the FDA meeting in 2010, I sort of expected two-tiered criteria to come out of that meeting, with different criteria for meters in critical care versus for type 2 patients doing self-monitoring. I don’t know if anyone from the FDA would like to comment on whether that is pending or still being considered.

Ms. Serrano: From the 2010 meeting that was a very strong statement by the community and one we took seriously. Without guidance documents to talk publicly on the matter, we are not able to say how we would like to move forward. But I hope someday soon we can have that discussion.

Session 3: How do Approved BG Monitors Perform When Used by Patients and HCPs?


Richard Louie, PhD (University of California, Davis, Sacramento, CA)

Dr. Richard Louie described the effects of temperature and humidity on the performance of point-of-care glucose meters, a topic of conversation particularly relevant to medical care during natural disasters. Dr. Louie showed that static and dynamic thermal stressors can affect the performance of glucose test strips and that these effects vary between products. Even short-term exposure to thermal and humidity stressors can affect glucose measurements. Further, the impact of environmental stressors on meters and the impact on strips appear to be compounding. To mitigate the errors and uncertainties introduced by thermal stressors, Dr. Louie drew attention to the need for proper handling and storage of reagents, the need to monitor reagents for potential adverse exposures, and the importance of understanding how environmental stresses impact point-of-care performance. He suggested that the field needs standards for validating the performance of point-of-care reagents during dynamic stresses and that new, more robust reagents and packaging might be needed. He proposed a simple temperature lockout method whereby meters would no longer work if exposed to harmful temperatures. Of particular importance in disaster response, he recommended that deliberate supply deployment and resupply plans could minimize the exposure of point-of-care devices to extreme stressors.

  • Anecdotal reporting suggests that temperature and humidity affect point-of-care testing in the field. During Hurricane Katrina, new shipments of point-of-care meters reportedly failed after one week of use. In Massachusetts, paramedics complained that cold temperatures caused meter analyses to shut down during emergency responses.

  • Laboratory analysis has shown that thermal stress impairs strip accuracy. Strips exposed to static heat stress (40°C for four weeks) generally over-reported glucose values compared to strips stored at room temperature, whereas strips exposed to cold stress (-21°C for up to four weeks) generally under-reported glucose values (Louie et al., Disaster Med Public Health Prep 2009).

  • Both foil-based packaging and vial-based packaging appear to withstand the effects of static humidity stress. No statistically significant performance differences were found between strips in foil packaging exposed to humidity vs. strips in foil packaging in control conditions, nor between strips in vials exposed to humidity vs. strips in vials in control conditions (Truong et al., Proceedings of the 24th Annual Undergraduate Research, Scholarship and Creative Activities Conference 2013).

  • Dr. Louie showed one-week-old data from his lab to demonstrate that even short-term environmental stress exposure affects meter system performance. Meters and strips exposed to 15 minutes of extreme stress in a chamber (41.9°C; relative humidity of 82.5%) were compared to a meter system kept at room temperature conditions. Systems were tested with aqueous control solutions with a glucose concentration of 112 mg/dl. Meters that were “shock” stressed and tested with control strips showed a significant difference from the control system, as did strips that were “shock” stressed and tested with control meters (measured by mean glucose paired difference). Further, the effects were compounded: “shock” stressed systems (where both the meter and strip were exposed) performed even worse against control systems than when only one variable (meter or strip) was exposed.
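The temperature lockout that Dr. Louie proposed could be sketched roughly as follows. This is only an illustration of the idea, not a description of any existing meter: the class name, the method names, and the 4°C to 40°C limits are hypothetical, and a real device would take its limits from the manufacturer's labeled storage and operating range.

```python
from dataclasses import dataclass


@dataclass
class TemperatureLockout:
    """Latching lockout: once a harmful temperature is seen, testing stays disabled."""

    min_c: float = 4.0    # hypothetical lower limit, in degrees Celsius
    max_c: float = 40.0   # hypothetical upper limit, in degrees Celsius
    locked: bool = False

    def record(self, temp_c: float) -> bool:
        """Log one temperature sample and return whether the device is now locked."""
        if temp_c < self.min_c or temp_c > self.max_c:
            self.locked = True  # never cleared by later in-range samples
        return self.locked

    def can_test(self) -> bool:
        """A test is allowed only if no out-of-range exposure has been recorded."""
        return not self.locked


if __name__ == "__main__":
    lockout = TemperatureLockout()
    for sample in (22.0, 41.9, 22.0):  # room temperature, heat shock, room temperature
        lockout.record(sample)
    print("testing allowed:", lockout.can_test())
```

The lockout latches on purpose: in the data above, a brief exposure changed results even after the system was back in normal conditions, so returning to room temperature is not treated as clearing the fault.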

Questions and Answers

Q: On vial versus foil packaging, I didn’t see a lot of difference.

A: There were no statistically significant differences between stressed vials vs. room temperature vials or between foil packaging that was stressed vs. foil packaging that was kept at room temperature.

Q: The vials were closed?

A: Yes.

Q: Did you have any consideration on the environment that the control sample was in up until the point it reached you? Was there variability during transit? Your assumption was that it had been in a controlled environment up until that time.

A: We don’t have documents on what it was exposed to while in transit.



George Cembrowski, MD, PhD (University of Alberta, Edmonton, Alberta, Canada)

Dr. George Cembrowski discussed blood glucose monitoring in the ICU with a focus on potential sources of error (e.g., hydrogen-peroxide-based cleaning products, extreme hematocrit values) and the perceptions of nurses who actually use the meters. He has found newer meters (anonymized) to be better than his center’s previous meter (LifeScan SureStep) in terms of accuracy (based in part on retrospective comparisons to blood gas analyzer tests taken within an hour of the meter test), robustness, and usability. He concluded that available BG meters are sufficiently accurate in the hyper- and normoglycemic ranges, but that data in hypoglycemia are limited (in part because patients with low hematocrit have artificially inflated glucose measurements). Also, Dr. Cembrowski recommended that attention be paid to other factors, such as the source of blood in the sample and the acuity of patients’ conditions.

Questions and Answers

Dr. Mitch Scott: Do you think that one hour might be too long to consider measurements paired? Particularly at the extreme values, because the patient may have corrected their glucose level by then.

A: I looked at the data with respect to time and I am not convinced that this was important.

Q: I work at LifeScan, and I would say that clearly hematocrit-insensitive meters are the way to go. SureStep has been around since the late 80s, and it’s a product that we have end-of-lifed – we supply only the hospitals that want to continue with it. However, I would say that it’s been used in places like Portland, and they’ve had excellent results with it despite its limitations.

A: When we switched to LifeScan across the system, it was great. I have worked closely with many people from your company.


Martha Lyon, PhD (Royal University Hospital, Saskatoon, Saskatchewan, Canada)

Complementing her presentation at the Third Annual International Hospital Diabetes Meeting, Dr. Martha Lyon drew attention to the shortcomings of blood glucose meters in critical care populations. (For discussion on her International Hospital Diabetes Meeting Presentation, see page nine of our full report: https://closeconcerns.box.com/s/6no7s4mazxvtptfjj7av.) Backtracking momentarily, Dr. Lyon explained that relative glucose molality is translated to molarity according to a 1.11 correction factor; however, the correction factor assumes 1) 43% hematocrit; 2) 93% plasma water fraction; and 3) 71% red blood cell water fraction. She stressed that these values are not necessarily representative of all patient populations. Critical care patients tend to have substantially lower hematocrit and higher plasma water. The red blood cell water fraction assumption tends to be more representative. However, in each category there was marked inter-individual variability. She argued that the differences in individuals’ correction factors translate to clinically meaningful differences in glucose measurements. In a study of an ICU patient population in St. Pierre Hospital (Brussels, Belgium), Dr. Lyon found that percentage error in glucose measurement resulting from individual correction factor variations ranged from -4% to +7%. Importantly, this error is on top of error introduced from precision and medication-related inaccuracies. As such, she said there was a pressing need to establish a verification system for meters to better address the unique challenges posed by critically ill populations.
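The arithmetic behind the 1.11 factor can be checked with a short sketch. It assumes the factor is the ratio of plasma water to whole-blood water, with whole-blood water taken as a hematocrit-weighted mix of red cell water and plasma water. The function names are illustrative, and the error function further assumes that the meter effectively measures a whole-blood concentration and scales it by the fixed factor, which (as the comment in the Q&A below points out) is a simplification that does not hold for every strip design.

```python
def plasma_equivalent_factor(hematocrit, plasma_water=0.93, rbc_water=0.71):
    """Ratio of plasma water to whole-blood water.

    All inputs are fractions (e.g. 0.43 for 43% hematocrit). The defaults are
    the assumptions Dr. Lyon listed; an individual patient's values may differ.
    """
    whole_blood_water = hematocrit * rbc_water + (1.0 - hematocrit) * plasma_water
    return plasma_water / whole_blood_water


def percent_error_from_fixed_factor(individual_factor, fixed_factor=1.11):
    """Percent error if the fixed factor is applied where the individual one belongs.

    Assumes (for illustration) reported = whole_blood * fixed_factor and
    true plasma value = whole_blood * individual_factor.
    """
    return (fixed_factor / individual_factor - 1.0) * 100.0


if __name__ == "__main__":
    standard = plasma_equivalent_factor(0.43)
    print(f"factor at 43% hematocrit: {standard:.3f}")

    # Hypothetical critical-care patient: low hematocrit, higher plasma water
    patient = plasma_equivalent_factor(0.25, plasma_water=0.95)
    print(f"patient factor: {patient:.3f}")
    print(f"error from using 1.11: {percent_error_from_fixed_factor(patient):+.1f}%")
```

Under these assumptions a patient with low hematocrit has a smaller individual factor than 1.11, so applying the fixed factor overstates glucose, which matches the direction of the hematocrit effect described in the talks above.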

Questions and Answers

Comment: You gave a simplified analysis. You are assuming the strip lyses the cells. In reality, it varies meter to meter. Some meters have webs and pull out the cells and others do lyse the cells. It’s a very complex system and you’ve described only one small part of it.

A: I agree, absolutely.


Moderator: Courtney Lias, PhD (FDA, Silver Spring, MD)

Panelists: Richard Louie, PhD (University of California, Davis, Sacramento, CA); George Cembrowski, MD, PhD (University of Alberta, Edmonton, Alberta, Canada); and Martha Lyon, PhD (Royal University Hospital, Saskatoon, Saskatchewan, Canada)

Dr. Lias: What do each of you think are the most important considerations to take into account for patient use of meters and for hospital use? And as a follow-up, what are your suggestions on ways that either the community or manufacturers could address those issues?

Dr. Cembrowski: There is a great paper by a Dutch clinical chemist that said there’s not much difference between the first and second drop of blood. Nurses always say that they are supposed to run the second drop but we often run the first. Which drop of blood gives the best number? There was total resistance to doing this study. We are so locked into doing things according to protocols that we don’t even explore. Maybe if there was a huge study of 3,000 in-hospital patients we could have a better idea of where these faults occur.

Dr. Louie: With respect to environmental stress effects, I think it is important that HCPs and patients be educated about the limitations that are out there and understand the environment in which these devices are being used. If you are operating outside of the manufacturer’s specifications, you could introduce error. A short-term solution might be as simple as incorporating a lockout feature such that if devices are exposed to that type of environment, it would prevent the device from being used.

Dr. Lyon: Speaking to the issue of SMBG versus hospital meters, when we think of folks using meters in the community, we think of them as being healthy. But data from our lab pull showed large variances in hematocrit. People in the hospital – they are a sick group, but we can’t assume the people out in the community don’t demonstrate a large distribution in a variety of parameters as well. How can we verify meters in the hospital group? We don’t have verification materials right now. If we could establish a system, that would be the way to go. It would be helpful for manufacturers and in the laboratory.

Dr. Mitch Scott: The hematocrit issue has pretty much been solved by the newer meters. You say we need verifications; I would submit that we already have them. If you use samples from a sufficiently large population of critically ill patients, you can verify the hematocrit insensitivity, even if you don’t know the exact water content.

Dr. Lyon: This works well in large hospital settings, but not necessarily in smaller communities.

Dr. Scott: Yes, but studies that are done and published can be consulted by smaller centers.

Dr. Vigersky: How do these factors affect CGM sensors? We might be combining the errors, because we are calibrating with a biased meter, and then the sensor itself might be biased.

Dr. Louie: I don’t have any data.

Dr. Lias: Dr. Howard Zisser will be giving a talk on this topic later today.

Dr. Brad Karon: I have a question for Dr. Louie. You are looking at stress testing for disasters. For hospital or home use, have you done any “domestic stress-testing” where you might fluctuate within a less-extreme range of temperature and humidity?

Dr. Louie: We have looked mainly at tropical climates, extreme cold, and post-disaster conditions. We plan to look at other conditions.

Dr. Klonoff: Do you see areas where additional or different regulation by the FDA would improve monitors or something by industry to improve product performance in your environments?

Dr. Cembrowski: Manufacturers have so much data available and supposedly, CLSI has guidelines where according to rates of breakage your quality control (QC) testing can be adjusted up or down. I cannot alter my QC at all.

Dr. Lias: It would be an uphill battle convincing CMS that you don’t have to do QC testing. In our opinion, QC may be one mechanism to detect problems with devices that might occur.

Dr. Barry Ginsberg: I would like to repeat my call for full disclosure to the users involved. One system that I worked with in the past – if you kept it at 42 degrees Celsius for a month, it would increase the glucose level by 300. That is a tremendous failure. For a car, you would expect to know the horsepower, along with a lot of other data. Patients have a right to know how these systems can fail, and yet for meters and strips we have almost no data. I would like labeling that shows inaccuracy at multiple ISO levels: 15%, 20%, etc.

Dr. Lias: That wouldn’t necessarily get to the point of environmental contributions to error.

Dr. Ginsberg: I have had a patient say that they are taking their strips on a trip with them and that it will be fine, because the label says that it’s good up to 38 degrees Celsius. If I knew that keeping them at 40 degrees Celsius might cause an error of 300, I would handle the strip differently.

Dr. Lias: One message that I am hearing today is awareness. If laboratorians have to take the package insert with a grain of salt, that will contribute to risk mitigation. If patients know how their strips will be affected by storage in a glove compartment for a month, that will contribute to risk mitigation. Do you think that nurses, and hospital workers, and patients at home are aware?

Dr. Cembrowski: Often in the evening there is one nurse who runs all of the controls on all of the systems, thereby thwarting what we’re trying to control, but that’s a different problem. Barry, three years ago you said we needed a strip with a built in control.

Dr. Ginsberg: One of my companies is coming up with that.

Dr. Lyon: I think there are limitations with a variety of devices – glucose meters and other devices to measure glucose. I’m talking about blood gas analyzers. There is the perception that the glucose meter is problematic but blood gas analyzers are considered a gold standard, when in fact the blood gas analyzer is not the gold standard. It is problematic as well, but our clinicians don’t understand that.

Dr. Robert Vigersky: Many of you know that I am at Walter Reed. Many of our soldiers, and non-military contractors, get deployed to environments that are hostile in many ways. Iraq and Afghanistan are very hostile in terms of temperature. If patients need to test blood sugars, we provide paperwork saying that they should not be deployed in these zones. However, their commanders often overrule this. We have no data to say how inaccurate the devices become in this environment; this is to ask for such data.

Q: If a manufacturer in a country outside the US sends a product to the US and it sits in a warehouse in Miami for three months, then goes to a patient and sits on the doorstep, how do we protect against that? Education is important, but we already know it.

Dr. Lias: Even foreign manufacturers are required by law to meet manufacturing practices and need to distribute products under test conditions by which they are evaluated. Like you said, someone could leave a shipment on the doorstep though. Is the patient aware that that is not a good thing? It happens, but I think your question is, how do we increase patients’ awareness.

Comment: Thank you for that clarification. I wasn’t sure how we maintain the integrity of that labeling.

Dr. Scott: Richard, your studies were performed with quality control material for the most part, yes? I think that the answers are to instruct your patients to run quality control (QC) – clearly, they pick up a lot of these environmental things.

Dr. Ginsberg: We do run QC and recommend that patients do so; still, they often don’t.

Dr. Cembrowski: Wouldn’t you rather work with some nice whole blood, warm blood?

Dr. Lias: In general or in glucose meters?

[Audience laughter.]

Dr. Louie: We’ve seen trends with aqueous control material where it can be replicated with whole blood samples.

Gary Puckrein (National Minority Quality Forum): I am wondering more broadly about inter-patient variability. Forty percent of people with diabetes in the US are minorities – mainly African American or Hispanic. If you have sickle-cell trait, there is variability. Have the meters been standardized for people of different races?

Dr. Cembrowski: African Americans have much lower hemoglobins than Mexican-Americans or non-Hispanic whites. Maybe this issue has not been looked at sufficiently. However, Dr. Lyon has helped to produce hemograms that show insensitivity of different meters to low hematocrit. Another issue is that most people who come from Africa have very low neutrophils; for them, the printed ranges are artificially high. But there is controversy around how to record race in hospital records.

Mr. Puckrein: The American Hospital Association has been wrestling with this issue of patient identity. The National Institutes of Health is doing work on blood glucose measurements in African Americans. I think that the ADA’s new guidelines have suggested not using A1c in African Americans. The nature of the population is changing, and these issues are broad and will become bigger over time.

Session 4: What Could be the Consequence of Poor BG Monitor Performance?


Olga Claudio, PhD (FDA, Silver Spring, MD)

Dr. Olga Claudio gave an overview of the FDA’s adverse event reporting system for medical devices and gave examples of how the system has been applied to blood glucose meters. By federal regulation (21 CFR Part 803), device user facilities such as labs and nursing homes must report serious adverse events that a medical device may have caused or contributed to. From here, manufacturers assess the risk level, decide whether to file a medical device report (MDR) to the FDA, and how to categorize any events they report. Dr. Claudio said that over-the-counter glucose meters account for more MDRs than any other device regulated by the FDA’s CDRH: more than 32,000 MDRs per year. (During Q&A, Dr. Claudio noted that most of these MDRs are filed by brand-name manufacturers.) Sometimes MDRs lead to device recalls, which are carried out by manufacturers but classified in terms of their severity by the FDA (with class I most severe and class III least severe). Dr. Claudio observed that glucose products have a relatively high rate of class I recalls due to the potential consequences of meter error; for example, she mentioned the recent class I recalls of J&J’s OneTouch Verio IQ and Abbott’s FreeStyle InsuLinx.

Questions and Answers

Chris Parkin (President, CG Parkin Communications, Inc.): Am I correct in assuming that most of your MDRs are from the four branded meters in the US?

Dr. Claudio: Yes.

Mr. Parkin: Are we seeing an increase in the number that comes from unbranded meters?

Dr. Lias: We see fewer from these meters.

Dr. Claudio: Over the years we haven’t seen a change in the relative rate of MDRs of unbranded meters.

Mr. Parkin: You mentioned the recent recalls of the OneTouch Verio and FreeStyle InsuLinx. Have there been recalls of unbranded meters?

Dr. Lias: Yes.



Lutz Heinemann, PhD (Science & Co., Düsseldorf, Germany)

After reviewing the current CE marking system in the EU, the esteemed Dr. Lutz Heinemann shared his view for an improved process in diabetes device regulation. He proposed that an independent research institute be tasked with supporting notified bodies (NBs) with expert knowledge and specialized evaluation in specific areas (e.g., diabetes devices). For background, NBs are responsible for assessing device safety and performance, but they do not evaluate effectiveness. How serious can the quality evaluation of medical devices be, asked Dr. Heinemann, when NBs are tasked with evaluating all types of technical devices? Quoting Trevor Jackson (British Medical Journal 2012), he quipped, “medical devices...need only a simple quality certificate (CE Mark) to gain access to the market, putting them on the same footing as domestic appliances such as toasters.”

  • Dr. Heinemann pressed that the CE Mark process has important limitations. The CE Mark has been around since 1993 and is required for all medical devices in Europe before they can be marketed. Notified bodies (NBs) are accredited by a member country to assess device safety and performance, but they do not evaluate device effectiveness (a process that would require more clinical data). Further, manufacturers conduct their own data collection and submission, rather than an independent testing authority.
  • Clinical evaluations of blood glucose systems demonstrate that CE Mark does not guarantee that a system will meet the current ISO accuracy standards. Moreover, ~50% of currently available meters will not meet the new proposed ISO requirements. Dr. Heinemann wondered whether such meters should be pulled from the market.




Study | Journal | Number of Systems Evaluated | Number Fulfilling ISO Standards
Freckmann et al. | Diabetes Technol Ther | |
Chih-Yi Kuo et al. | Diabetes Technol Ther | |
Pfützner et al. | | |
Baumstark et al. | J Diabetes Sci Technol | 5 systems, 4 lots each | 16 lots
Brazg et al. | J Diabetes Sci Technol | 7 systems, ≥3 lots each | 3 systems, 20 lots

* All systems were Group A (higher cost category)

  • Dr. Heinemann next reviewed the EU commission’s recently proposed changes to the CE Mark process. The new guidelines will start at the end of 2013 and be fully implemented by the end of 2014. Importantly, the regulations do not require individual countries to pass legislation, which Dr. Heinemann believes will avoid problems in the interpretation and application of the rules. The commission’s proposal, said Dr. Heinemann, is a clear statement that the CE Mark process works in principle, but has issues in its implementation at the country level. Dr. Heinemann discussed the changes put forth by the commission in detail during his EASD presentation – see page 29 of our EASD Regulations and Reimbursement report: https://closeconcerns.box.com/s/vs5cx4hcoxmddpmmvgn2.

  • He called for an independent research institute to support NBs in the CE Mark process. To this end, Drs. Heinemann, Freckmann, and Koschinsky just published an article in the Journal of Diabetes Science and Technology arguing for the establishment of a European Institute for Technology Evaluation and Quality Control (EITEQC). The institution would perform more rigorous quality evaluation of diabetes devices both pre- and post-approval. He believes that manufacturers and insurance companies could provide the financial support for EITEQC – SMBG represents a two to three billion euro market in Europe, said Dr. Heinemann.

  • Dr. Heinemann argued for a greater role of EASD in the CE Mark process, suggesting that EASD and a specialized institution like EITEQC could work together to support NBs. EASD could help establish standards and substantiate a new process with a “quality seal” for approved diabetes devices. He looked forward to an upcoming symposium at EASD 2013 on device regulation and drew attention to EASD’s recent position statement that called for the overhaul of the CE Mark process. For discussion on the statement, see our March 15 Closer Look email: https://closeconcerns.box.com/s/af8pcu797hsw3qduw0ti.

  • His talk underscored the importance of collaboration. Said Dr. Heinemann, “constructive collaboration of all parties interested is needed in order to make sure that patients with diabetes have effective and safe devices.”

Questions and Answers

Q: If you keep notified bodies (NBs) and now suggest that there be an institute to evaluate glucose meters, why can’t you just put standards to NBs instead of having an independent group?

A: Not all NBs are allowed to approve blood glucose meters, only a subset. Who is performing these evaluation studies? NBs are supposed to evaluate data presented by the manufacturer. Studies differ in how they are performed, how the data are analyzed, and so on. The question is whether there should be additional evaluation after market approval. Should companies approach the institute and show the new meter, and then the institute will perform the evaluation so the NB can be sure there is sufficient quality?



Howard Zisser, MD (Sansum Diabetes Research Institute, Santa Barbara, CA)

Dr. Howard Zisser summarized his talk in three simple takeaways about calibration of continuous glucose monitors (CGMs). First, he said that the more accurate the meter used for calibration, the better. Second, he noted that CGM calibration algorithms have many moving parts that are invisible to end users, making the exact effects of BGM accuracy difficult to quantify. Finally, he encouraged the development of ‘smart’ CGM algorithms to help minimize some of the errors inherent in calibration (e.g., the separate dynamics of blood and tissue glucose levels, glycemic fluctuations, meter inaccuracy). Such algorithms might prompt the user to check their blood glucose level at a particular time, for instance.

  • Continuous glucose monitors measure interstitial glucose levels, but they are calibrated with blood glucose meters, are designed to provide an estimate of blood glucose, and are typically evaluated according to a blood glucose measurement other than the meter (e.g., YSI). However, tissue glucose measurements tend to lag behind blood glucose, and glucose is constantly fluctuating in both compartments. Thus Dr. Zisser quipped that continuous glucose monitoring is like trying to hit a moving target (fluctuating glucose levels) with a moving gun (a calibration method of varying accuracy).

  • Borrowing from Lee Dubois’ The Art of Control (lifeafterdx.blogspot.com), Dr. Zisser shared two general principles on interpreting continuous glucose monitoring data. Mr. Dubois’ first principle is to “watch the flow of the water, not the stones in the stream” – i.e., to focus more on the glycemic trends than on the specific numbers. Dr. Zisser endorsed this concept and agreed with Mr. Dubois that neither the meter reading nor the CGM reading should be interpreted as a “true” glucose measurement. However, he does not fully agree with Mr. Dubois’ idea that “only changes in glucose matter,” given that extreme hypoglycemia can have adverse consequences.

  • Dr. Zisser said that calibrations for some CGMs can be much more accurate when glucose levels are stable than when they are changing, as suggested by a recent paper about the Medtronic Guardian CGM (Zueger et al., Diab Technol Ther 2012) and an unpublished custom simulation designed by Bill Van Antwerp. However, as he suggested and as emphasized during Q&A, some CGMs have algorithms “under the hood” that enable calibrations to be accurate even while glucose levels are changing.
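
The contrast between calibrating during stable and changing glucose can be illustrated with a deliberately simple thought experiment. The sketch below assumes that interstitial glucose is just blood glucose delayed by a fixed lag and that blood glucose is changing linearly. The function name, the 10-minute lag, and the rates of change are illustrative assumptions, not values from the talk or from any CGM's actual algorithm.

```python
def calibration_mismatch(rate_mg_dl_per_min: float, lag_min: float) -> float:
    """Gap (mg/dl) between blood glucose and lagged interstitial glucose.

    Assumes interstitial glucose equals blood glucose from `lag_min` minutes
    earlier and that blood glucose changes linearly at the given rate.
    """
    return rate_mg_dl_per_min * lag_min


# Hypothetical scenarios: stable, slowly rising, and rapidly rising glucose.
for rate in (0.0, 1.0, 2.0):
    gap = calibration_mismatch(rate, lag_min=10.0)
    print(f"rate {rate:.1f} mg/dl/min -> mismatch at calibration {gap:.0f} mg/dl")
```

Under these assumptions, a calibration taken while glucose is stable carries no lag-related mismatch, whereas one taken during a rapid change does; this is the kind of gap that a 'smart' algorithm would need to account for.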

Questions and Answers

Dr. David Klonoff: Dr. Karon, didn’t you find something similar about the importance of directionality of blood glucose monitor bias?

Dr. Brad Karon: Not exactly – the direction of bias didn’t matter in terms of analytical accuracy, but underestimation of glucose was safer from a clinical perspective, since it was less likely to lead to insulin overdose.

Dr. Klonoff: Dr. Zisser, to what extent do you think that better calibration will improve the performance of the artificial pancreas?

A: There are many sources of error, including exercise and meals. I think that we will also have a challenge turning the system on if the system doesn’t know what the patient’s already been up to that day. We do much better with a more accurate sensor. That said, when there is a problem with the sensor reading, the sensor usually reads low – e.g., a compression artifact overnight. We used to see hyperglycemic readings that lasted a long time after meals and could lead to excessive insulin doses. However, the Dexcom G4 performs better in this regard, and we also have implemented insulin-on-board constraints in our closed-loop algorithm.

Dr. David Price (Dexcom, San Diego, CA): One problem with the simulation you showed is that it assumes the BG value used for calibration at a rapid rate of change was not adjusted by an algorithm in the CGM. If you have a smart algorithm, this may be accounted for.

A: Yes, there are things under the hood of CGMs that we don’t have access to – this was just a thought experiment.



The expected panel discussion took the form of individual presentations by each panelist followed by brief Q&A.

Brad Karon, MD, PhD (Mayo Clinic, Rochester, MN)

Dr. Brad Karon discussed blood glucose accuracy requirements in the hospital environment based on simulation modeling. First, he discussed his team’s published work on tight glycemic control (TGC) modeling (Karon et al., Clinical Chemistry 2010). Dr. Karon used 29,920 glucose values from patients at Mayo Clinic under tight glycemic control to assign each value to an insulin dosing category. He then applied two different error simulation models (one based on a distribution of bias and precision and one based on a Gaussian distribution) to determine how meter inaccuracy could affect dosing decisions. Total allowable error (TEa) was set at 10%, 15%, and 20%. Dr. Karon and colleagues calculated the percentage of simulated values that differed by one, two, and ≥three dosing categories. In both models, one-category errors were unavoidable, 20% TEa was the only criterion at which errors of ≥three categories really occurred, and 10% TEa tended to eliminate two-category errors. Next, Dr. Karon discussed an unpublished study, which applied similar modeling techniques to moderate glycemic control (MGC) protocols (target range 110-150 mg/dl), which were implemented at Mayo Clinic in 2010. He found rates of insulin dosing errors similar to those predicted under TGC. Looking back at the rates of hypoglycemia during Mayo Clinic’s TGC and MGC periods, unsurprisingly there were higher rates of severe and moderate hypoglycemia during TGC than MGC. He concluded that the impact of large dosing errors (≥three-category errors) depends on the hospital protocol, but that these errors only occur when TEa is ≥20%.
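
The general shape of such an error simulation can be sketched in a few lines. The example below is a minimal illustration, not Dr. Karon's model: the dosing-category cut points are hypothetical, the "true" glucose values are synthetic rather than patient data, and the meter error is assumed to be zero-mean Gaussian with 95% of errors falling inside ±TEa.

```python
import bisect
import random
from collections import Counter

# Hypothetical dosing-category cut points (mg/dl); NOT the Mayo Clinic protocol.
CUTS = [110, 150, 200, 250, 300, 350]


def category(glucose: float) -> int:
    """Index of the insulin dosing category that a glucose value falls into."""
    return bisect.bisect_right(CUTS, glucose)


def category_shift_rates(true_values, tea_percent, seed=0):
    """Fraction of values shifted by 0, 1, 2 and >=3 dosing categories.

    Assumption: meter error is zero-mean Gaussian, with 95% of errors
    inside +/- tea_percent (so SD = tea_percent / 1.96).
    """
    rng = random.Random(seed)
    sd = tea_percent / 1.96 / 100.0
    counts = Counter()
    for g in true_values:
        measured = g * (1.0 + rng.gauss(0.0, sd))
        shift = abs(category(measured) - category(g))
        counts[min(shift, 3)] += 1  # key 3 collects shifts of three or more
    n = len(true_values)
    return {k: counts[k] / n for k in range(4)}


# Synthetic "true" glucose values; the published study used 29,920 patient values.
value_rng = random.Random(1)
values = [value_rng.uniform(60, 400) for _ in range(20000)]
for tea in (10, 15, 20):
    rates = category_shift_rates(values, tea)
    print(tea, {k: round(v, 4) for k, v in rates.items()})
```

Because the cut points and glucose distribution here are invented, the printed rates are not comparable to the study's results; the sketch only shows how a TEa setting translates into counts of one-, two-, and ≥three-category dosing shifts.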

  • In tight glycemic control (TGC) modeling simulations, three-category errors occurred mainly with meter total allowable error (TEa) of 20%. One-category insulin errors were unavoidable and two-category errors decreased as TEa decreased. The table below lists the percent of simulated glucose values that fell within each error classification category. The data are segmented according to TEa and the type of error simulation model used, either 1) a model where bias and precision were considered separately (Bias/CV) or 2) a model based on a Gaussian distribution.



Percent of Simulated Glucose Values, by Error Condition and TEa

Error Condition | 10% TEa | 15% TEa | 20% TEa
1 category | Up to 60% | Up to 80% | Up to 90%
2 category | | Up to 5% | Up to 20%
≥ 3 category | | |







  • Mayo Clinic adopted new moderate glycemic control (MGC) protocols in 2010. Glycemic target range was 110-150 mg/dl and little or no insulin was given if glucose was below 110 mg/dl.

  • MGC modeling resulted in findings similar to those of Dr. Karon’s TGC study. One-category insulin errors were unavoidable, two-category errors decreased as total allowable error (TEa) decreased, and ≥ three-category errors occurred with 20% TEa.



Percent of Simulated Glucose Values, by Error Condition and TEa

Error Condition | 10% TEa | 15% TEa | 20% TEa
1 category | Up to 60% | Up to 80% | Up to 90%
2 category | | Up to 5% | Up to 20%
≥ 3 category | | |







  • During TGC, severe hypoglycemia rates (<40 mg/dl) were 1.7% compared to 0.26% over the study period for MGC. Moderate hypoglycemia rates (40-60 mg/dl) were 7.7% compared to 2.2% over the study period for MGC. Dr. Karon said that this change was observed without any change to the meters used, personnel, or percent dosing errors.


Boris Kovatchev, PhD (University of Virginia, Charlottesville, VA)

Using the 100 in silico type 1 diabetes “patients” from the FDA-accepted UVa/Padova metabolic simulator, Dr. Boris Kovatchev estimated how BG monitoring errors might affect the incidence and detection of hypoglycemia. In one experiment, Dr. Kovatchev and colleagues simulated a range of meter errors between 0% and 20% total error in a normal distribution (roughly corresponding to MARDs of 0% to 8%); in another, a bimodal error of 0% to 20% was simulated to emphasize large meter deviations. In the first simulation, the researchers showed that less-accurate meters were less likely to detect hypoglycemia below 70 mg/dl. He explained that at a reference value of 60 mg/dl, a meter with 20% total error would give a reading over 70 mg/dl one-tenth of the time (Breton & Kovatchev, J Diabetes Sci Technol 2010). A subsequent simulation was conducted to predict how BGM errors would affect the likelihood of going hypoglycemic (as represented by the lower 95% bound of glucose excursions from 45,000 separate simulated meals). Dr. Kovatchev showed that if the meter was perfect, roughly 8% of simulated patients went below 70 mg/dl, but if the meter had a total error of 20%, then 15% of simulated patients went below 70 mg/dl – i.e., hypoglycemia became roughly twice as prevalent.
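
The rough correspondence between 20% total error and a MARD of 8% can be checked with a short calculation. The sketch below assumes that "total error" means the bound containing 95% of errors for a zero-mean normal distribution; that reading, and the function name, are assumptions made for illustration rather than definitions given in the talk.

```python
import math


def mard_from_total_error(total_error_percent: float) -> float:
    """MARD (%) implied by a zero-mean Gaussian error model.

    Assumptions: "total error" is read as the bound containing 95% of
    errors (1.96 standard deviations), and MARD is the mean absolute
    relative error, which equals SD * sqrt(2 / pi) for a zero-mean
    normal distribution.
    """
    sd = total_error_percent / 1.96
    return sd * math.sqrt(2.0 / math.pi)


for te in (0, 5, 10, 15, 20):
    print(f"total error {te:>2}% -> MARD about {mard_from_total_error(te):.1f}%")
```

Under these assumptions, a 20% total error works out to a MARD of about 8%, consistent with the correspondence quoted above.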

  • Dr. Kovatchev described his recent set of in silico experiments to determine how meter inaccuracy would affect patients’ likelihood of going hypoglycemic. The basic design of the experiment was that an in silico patient would eat three meals a day, using their meter to calculate mealtime insulin dosage and any necessary correction doses two hours after the meal. (The glucose meter was also used whenever the simulated reference measurement sank below 70 mg/dl; if the meter confirmed hypoglycemia, then 16 g carbohydrates were taken.) The meals were given at slightly varying times each day and included slightly varying carbohydrate content (roughly 0.55 CHO/kg body weight at breakfast, up to roughly 1.3 CHO/kg body weight at dinner). Altogether, each simulated patient was evaluated over 30 days. Finally, this entire set of experiments was repeated with five different simulated meters that had errors of 0%, 5%, 10%, 15%, and 20%. The total number of meals was therefore 45,000 (100 simulated patients x 3 meals per day x 30 days x 5 levels of meter accuracy).

    • Dr. Kovatchev noted that the simulation assumed that carbohydrate counting was performed perfectly – highly unrealistic in the real world, though he suggested during Q&A that future experiments could be designed to incorporate errors in carbohydrate counting.

  • As a proxy for how often hypoglycemia would occur with different meters, Dr. Kovatchev looked at the lower bound of the lower confidence interval for each post-meal glycemic excursion. With a meter that had perfect accuracy, this value was below 70 mg/dl for 8% of simulated patients and below 50 mg/dl for <2% of simulated patients. However, for the meter with 20% total error, excursions below 70 mg/dl occurred in 15% of patients, and excursions below 50 mg/dl occurred in >5% of patients. In other words, hypoglycemia was roughly twice as prevalent when the meter had a MARD of 8%, as compared to the perfectly accurate meter.
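
The arithmetic behind the experiment's size and the "roughly twice" comparison can be reproduced directly from the figures reported above; the variable names are ours.

```python
patients = 100                            # in silico patients
meals_per_day = 3
days = 30
meter_error_levels = [0, 5, 10, 15, 20]   # percent total error

total_meals = patients * meals_per_day * days * len(meter_error_levels)
print(total_meals)  # 45000

# Percent of simulated patients whose lower bound fell below 70 mg/dl,
# as reported for the perfect meter (0%) and the 20%-total-error meter.
below_70_percent = {0: 8, 20: 15}
ratio = below_70_percent[20] / below_70_percent[0]
print(ratio)  # 1.875, i.e. roughly twice as prevalent
```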

Questions and Answers

Dr. David Klonoff: Dr. Kovatchev, what is next?

Dr. Kovatchev: There are several experiments to be completed along the same lines, to look at other functions of meter errors. Then the intent is to wrap this up and publish it.

Dr. Barry Ginsberg: In your simulation, you are assuming perfect knowledge of carbohydrate and reliable absorption of insulin. In real life, the errors in both of these are 20-30%.

Dr. Kovatchev: The errors in insulin absorption are already built into the model. We can simulate errors in carb counts if we choose, though we didn’t do it in this study. If we introduce errors in carb counts, obviously the performance will get worse.


Terry Lumber, MSN, RN, CNS, CDE, BC-ADM, FAADE (AADE, Fairfax, VA)

Terry Lumber delivered a clinically driven presentation structured around AADE’s seven self-care behaviors to improve diabetes management. She described how blood glucose meter accuracy can affect patients’ ability to appropriately engage in each behavior: healthy eating, healthy coping, physical activity, monitoring, taking medication, problem solving, and reducing risk. Of particular relevance is the impact of erroneous glucose readings on patients’ ability to take medication. She gave the example of how a 20% error in blood glucose value can lead to substantial differences in calculated insulin dose. Further, she put forth that patients may be discouraged from testing in the first place when meter inaccuracy of up to 20% is acceptable. Her discussion underscored the clinical relevance of meter accuracy both with respect to immediate treatment decisions and longer-term behaviors.
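
To make the dose example concrete, the sketch below applies a commonly taught correction-dose formula (reading minus target, divided by an insulin sensitivity factor) to a reading that is 20% low, exact, and 20% high. The patient values are hypothetical and are not taken from Ms. Lumber's presentation.

```python
def correction_dose(glucose_mg_dl: float, target_mg_dl: float,
                    sensitivity_mg_dl_per_unit: float) -> float:
    """Correction insulin (units) from a simple textbook-style formula.

    dose = (glucose - target) / insulin sensitivity factor, floored at zero.
    """
    return max(0.0, (glucose_mg_dl - target_mg_dl) / sensitivity_mg_dl_per_unit)


# Hypothetical patient: true glucose 250 mg/dl, target 100 mg/dl,
# and 1 unit of insulin assumed to lower glucose by 50 mg/dl.
true_glucose, target, isf = 250.0, 100.0, 50.0
for label, reading in (("-20%", true_glucose * 0.8),
                       ("true", true_glucose),
                       ("+20%", true_glucose * 1.2)):
    dose = correction_dose(reading, target, isf)
    print(f"{label}: reading {reading:.0f} mg/dl -> {dose:.1f} U")
```

With these made-up numbers, a meter reading 20% low or high (200 or 300 mg/dl instead of 250 mg/dl) moves the calculated correction from 3 units to 2 or 4 units, a one-third change in dose.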

Questions and Answers

Dr. David Klonoff: Based on what we heard from Brad’s presentation, how concerned about these step errors are you?

Ms. Lumber: Anyone working in diabetes would be concerned about two- and three-step errors.

Dr. Brad Karon: For many years we have agreed to evaluate any meter that an endocrinologist requests. We have all this beautiful data on hospital use of meters so they can recommend it to patients.

Dr. Zisser: How many do you check each year?

Dr. Karon: Three to four meters per year – we use one meter; it’s a quick and dirty test.


Personal Experience with a Medicare-Provided Glucose Meter

Dr. David Klonoff played an audio recording of a diabetes educator who was set to appear on the panel but was unable to be at the meeting in person. She described the experience of Bob (a person with diabetes whom she lives with), who got a new meter from Medicare. She said that the meter gave “the most obnoxiously high glucose readings that we had never seen before” and that it was much less accurate than Bob’s old “wonderful” FreeStyle Lite. After the recording was played, Terry Lumber commented that the experience was “not unusual” to her, and that it exemplified concerns with competitive bidding.


Public Comments


Larry Ellingson (President, Global Diabetes Consulting)

“The type 2 diabetes population is going to explode,” warned Larry Ellingson. Thus, he said, “we need to make sure that they are using the appropriate meters that are accurate and reflect the state of our technology.” He encouraged FDA to look at different strip lots prior to approval, and/or to conduct post-marketing surveillance. Such post-marketing surveillance ought to be informed by continued vigilance around adverse events, he said; he also suggested that new guidelines for this purpose be developed in conjunction with industry and the Diabetes Technology Society. Mr. Ellingson also noted that after the meeting, he would give the DTS a set of resolutions written by the National Diabetes Volunteer Leadership Council (a group of past ADA officers).



Rolf Hinzmann, MD, PhD (Head of Global Medical and Scientific Affairs, Roche Diabetes Care)

“We need the help of editors and peer reviewers,” said Dr. Rolf Hinzmann. There are poorly performing blood glucose meters on the market, and meter evaluations, as demonstrated in this meeting, are often of poor quality. Dr. Hinzmann urged journals to impose checklists, similar to the one proposed by Dr. Gary Thorpe (Diabetes Technol Ther 2013), that require meter evaluations to meet ISO and CLSI guidelines in order to be published. He acknowledged that such a requirement would not improve poor meter performance, but held that it would make the literature a more reliable source of information for decision makers.



Yaron Keidar (VP of Glucose Monitoring, Edwards Lifesciences)

Yaron Keidar began by noting that insulin is an important but highly risky drug in the ICU. Thus he proposed that if continuous glucose monitoring systems are to be a first-line monitoring system in the ICU, those CGMs can and should perform comparably to the widely used hospital standard, blood gas analyzers (which have a total error of ~7-8% and MARD of ~3%). Mr. Keidar said that in a small recent study in Europe, Edwards/Dexcom’s prototype inpatient CGM device posted a MARD of roughly 5% – a remarkable performance if it can be maintained in larger studies and clinical use.



Chris Parkin (President, CG Parkin Communications, Inc.)

“We have the development of a perfect storm,” said Chris Parkin. “Meters of questionable accuracy are being marketed towards the most vulnerable patients.” He explained that meters of low accuracy are often distributed through mail order, which targets Medicare patients. Further, these patients often don’t receive face-to-face instruction on how to use the meter or how to respond to glucose values. Mr. Parkin believes that the impact of this could be overwhelming if a patient makes a harmful treatment decision based on an erroneous glucose reading. He urged the people in the room to act by 1) tightening accuracy requirements and 2) ensuring that adequate instruction on blood glucose meter use is provided to all patients.



David Simmons, MD (CMO, Bayer Diabetes Care)

Speaking specifically about outpatient blood glucose monitoring, Dr. David Simmons said that he generally agreed with Chris Parkin’s comments but does not think that analytic standards need to be tightened beyond the latest CLSI and ISO standards. Rather, he believes that the major issue is the discrepancy between pre-market studies and post-market performance. Some manufacturers commit time and resources to quality assurance, he said, but others may not. He concluded that these issues will be exacerbated when CMS introduces a competitive bidding process for which any FDA-approved device is eligible, regardless of its exact performance specifications: “We are about to see a dramatic increase in the probability that our patients will be provided with meters that do not perform at the level where we would like them to perform.”



Jared Watkin (Divisional Vice President, R&D, Abbott)

Jared Watkin echoed comments by Dr. David Simmons (but in a slightly different accent, he joked). He shares concerns about market accuracy and about newly approved systems tailing off in performance over time due to the cost of sustained quality assurance programs. Without independent evaluations, said Mr. Watkin, “glucose monitoring as an industry is in a fairly difficult place.”



-- by Joseph Shivers, Kira Maker, and Kelly Close




Editor’s note: This piece was updated on June 3 to correct for our mistake transcribing the Group Discussion on page 11; we now correctly quote Dr. Mitch Scott’s clarifying point that “there are only two reference methods in the world recognized by JCTLM.” Previously, we said, “recognized by ISO.” We apologize for our error.