Lacey Harbour, regulatory and validation specialist for Ken Block Consulting, evaluates the explainability of Artificial Intelligence (AI) wearables and healthcare devices in the United States (US).
I walk into Andersen Bakery within Ueno Station in Tokyo, select a few pieces of bread, and hand the delectable treats to the young lady behind the check-out station. Each piece of bread’s outline is highlighted and labeled on the screen so that I, the end user, can confirm that the system is correct. After a quick confirmation, I pay using the automatic payment system. The whole process took less than three minutes from bread selection to leaving the very busy bakery, and with this naturally explainable AI, I felt confident that I had paid the correct amount for my purchase.
With AI being integrated into medical devices, Industry 4.0, and quality management systems, developers need to understand how much and what kind of explanation is necessary for a system’s intended use, as well as the associated risk. To explore this, let’s look at some recent devices that have received Food and Drug Administration (FDA) marketing authorisation.
Within the US, the diabetic retinopathy detection system IDx-DR was granted a de novo decision under the product code PIB on April 11th, 2018 (DEN180001). With this decision, the IDx-DR became the first legally marketed, fully autonomous, deep learning medical device in the US to provide screening decisions without a clinician. This device was deemed to be reliable due to its explainability, or explicability as the British Standards Institution (BSI)/Association for the Advancement of Medical Instrumentation (AAMI) standard development group calls it. Like the Andersen Bakery system, this system utilises images to observe markers (or biomarkers). However, it does not directly report the observed biomarkers to the end users, only the screening decision. This system’s explainability was considered sufficient for the FDA to grant the de novo.
On March 8th, 2018, Medtronic’s Guardian Connect Continuous Glucose Monitoring (CGM) System (approval letter: P160007) was approved with the indication of continuously monitoring glucose levels within interstitial fluid for patients with diabetes mellitus. Glucose concentration data detected by the wearable sensor are tracked in real time through the Guardian Connect app. Using the accumulated data, the system is designed to alert the user of potential glucose excursions up to an hour in advance of the predicted event. A second app called Sugar.IQ (using IBM Watson) can be used to provide personalised recommendations about diet and exercise based on sensor and other collected data. The Guardian system was approved only as a predictive indicator and event alert, prompting end users to test their glucose levels with a cleared blood glucose monitor. Explainability of the AI used in both the Guardian and IDx-DR systems was demonstrated through traditional clinical evaluations.
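To make the idea of a trend-based alert concrete, the short Python sketch below extrapolates recent sensor readings forward and flags a predicted excursion. It is only an illustration under simplifying assumptions: the predict_excursion function, the linear extrapolation, and the 70–180 mg/dL thresholds are hypothetical and do not represent Medtronic’s proprietary algorithm.

# Toy, trend-based excursion alert; NOT Medtronic's algorithm.
def predict_excursion(readings, horizon_min=60, low=70, high=180):
    """Extrapolate the latest linear trend and flag a predicted excursion.

    readings: list of (minutes, mg/dL) sensor samples, oldest first.
    """
    if len(readings) < 2:
        return None
    (t0, g0), (t1, g1) = readings[-2], readings[-1]
    slope = (g1 - g0) / (t1 - t0)          # mg/dL per minute
    projected = g1 + slope * horizon_min   # naive linear projection
    if projected < low:
        return "Low glucose predicted within %d min - confirm with a meter" % horizon_min
    if projected > high:
        return "High glucose predicted within %d min - confirm with a meter" % horizon_min
    return None

# Example: samples every five minutes, trending downward
print(predict_excursion([(0, 120), (5, 112), (10, 104)]))

Even a toy model like this makes the shape of the indication clear: the output explains when the user should confirm with a cleared meter, not what therapy adjustment to make.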
As end users, we expect the burden of explainability to be addressed by regulatory agencies like the FDA. Recently, AliveCor, the maker of the Kardia Band System ECG (K171816), gained clearance for the KardiaAI system on March 11th, 2019 (K181823). Though the AliveCor website had no information on the new product at the time of my search, the product was cleared with product codes DQK (programmable diagnostic computer) and DPS (electrocardiograph), with the primary classification under ‘Computer, Diagnostic, Programmable’. Product code DQK has been used for products like Abbott’s EnSite Velocity Cardiac Mapping System (K182644 in 2018), which uses intelligent automation based on live 3D images collected during electrophysiology studies.
DQK has also been used for products like the VivoMetrics LifeShirt Real-Time (K043604 in 2005), which has the intended use of recording physiological data for later analysis by a physician. As there is no specific product code that identifies a system as using predictive algorithms, are these existing product codes enough to describe what is really going on with the device? In addition, if one were to trace predicate ‘K numbers’ for most devices that have several iterations, one would reasonably find simpler indications as the predicates lead back in time. For example, the legacy Abbott EnSite system (K071818 in 2007) had the indication to create colour-coded isopotential maps for the physician’s use, and the even older EnSite 3000 system (K983456 in 1999) had an indication for the storage and display of intracardiac electrograms only.
Is it appropriate to group AI or continuous learning algorithms with those product codes that do not have the history to support AI safety and efficacy claims? Do FDA reviewers across the board have the training to recognise AI devices, understand the limits of explainability, and determine the appropriateness of the explainability for the intended use?
The Defense Advanced Research Projects Agency (DARPA) states that explainable AI will be essential if users are to understand, appropriately trust, and effectively manage this incoming generation of artificially intelligent partners. This statement is equally true for intelligent or augmented algorithms used in devices.
However, is the current method of device evaluation providing enough confidence? Maybe. For example, the IDx-DR had been validated against the Fundus Photography Reading Centre (FPRC) reference standard with a sensitivity of 87% and a specificity of 91% for the locked algorithm. Therefore, even though the biomarkers are not fully explained to the end user, that end user, a nurse in the doctor’s office, would neither require nor desire that level of explanation. As the device is indicated for use only with patients already diagnosed with diabetes, the patient risk associated with this device’s level of explainability could be considered mild.
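For readers less familiar with these metrics, the short Python sketch below shows how sensitivity and specificity are derived from a confusion matrix. The counts are hypothetical values chosen only to reproduce the reported 87% and 91%; they are not the actual IDx-DR trial data.

# Sensitivity and specificity from a confusion matrix.
# Hypothetical counts, NOT the IDx-DR pivotal trial data.
def sensitivity(tp, fn):
    """True positive rate: share of diseased cases the system flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of disease-free cases the system clears."""
    return tn / (tn + fp)

tp, fn = 174, 26   # retinopathy present: detected vs. missed
tn, fp = 546, 54   # retinopathy absent: correctly cleared vs. falsely flagged

print("Sensitivity: {:.0%}".format(sensitivity(tp, fn)))  # 87%
print("Specificity: {:.0%}".format(specificity(tn, fp)))  # 91%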
The Guardian Connect CGM presents a similar situation in that the device is not for making therapy adjustments but alerts the user to check their blood glucose levels; therefore, the level of required explainability would be lower even though the risk is still high enough to merit the premarket approval pathway.
To address this non-linear relationship between risk and the need for explainability, the BSI/AAMI AI standard development working group will undertake an exercise to address risk management strategies for AI Software as a Medical Device (SaMD) and the mapping of essential principles for safety and performance. Other working groups, like the Xavier AI Summit, are exploring what explainability is and how it relates to confidence and adoptability. Finally, stakeholders are watching to see if the FDA’s Pre-Cert program will help improve explainability and confidence for AI in devices.