Date: February 27, 2026
Re: Docket No. FDA-2025-P-5560 for “Medical Devices; Exemption from Premarket Notification: Radiology Computer-Aided Detection and/or Diagnosis Devices and Computer-Aided Triage and Notification Devices.”
The National Center for Health Research (NCHR) appreciates the opportunity to comment on the petition proposing exemption from 510(k) premarket notification requirements for certain radiology computer-aided detection (CADe), diagnosis (CADx), and triage/notification devices.
NCHR is a nonprofit, nonpartisan think tank that evaluates medical and consumer products and health policies to ensure they are supported by strong scientific evidence and benefit public health. For the last 27 years, our experts have analyzed clinical trial data, FDA regulatory decisions, post-market safety
concerns, and real-world evidence related to drugs, medical devices, and digital health technologies. NCHR frequently submits comments to the FDA, testifies before federal advisory committees, and publishes evidence-based analyses on the safety and effectiveness of medical products.
Our expertise in regulatory science, clinical evidence evaluation, and patient-centered health policy uniquely positions us to assess the potential risks and benefits of modifying premarket oversight for the radiology devices that are included in Docket No. FDA-2025-P-5560.
We note that the Docket refers broadly to computer-aided devices. While some such devices incorporate artificial intelligence (AI) or machine learning (ML), many rely on rule-based algorithms, predefined statistical models, image-processing heuristics, or fixed signal-detection thresholds. All of these fall
within the scope of “computer-aided” radiology tools and can materially influence diagnostic interpretation and triage decisions.
NCHR supports responsible innovation in ML/AI and other software-based medical technologies. However, given FDA's long-standing staffing shortages, we are very concerned that the agency's regulatory modernization efforts will weaken the evidentiary safeguards that protect patients. Based on current peer-reviewed evidence and established regulatory science principles, we agree with the many experts who believe that patients would be harmed by broad exemptions from premarket review for any AI or computer-aided device that directly influences diagnostic interpretation or clinical triage for serious diseases.
Key Clinical and Regulatory Concerns
1. Real-World Generalizability Remains Insufficiently Demonstrated
Recent empirical evidence raises serious concerns about the generalizability of FDA-authorized software-based medical devices. A 2025 JAMA Network Open analysis found that many AI devices cleared by the FDA lack robust external validation across diverse clinical environments and populations.1 Clinical AI performance is highly sensitive to dataset shift: changes in imaging equipment, acquisition protocols, patient demographics, and disease prevalence between development and deployment. A 2024 NEJM perspective emphasized that dataset shift remains a central risk to safe deployment of clinical algorithms.2 Likewise, research in Nature Medicine demonstrates that even models optimized for fairness may fail to maintain equitable performance under real-world distribution shifts.3
Although the 2025 JAMA Network Open analysis focused on AI-enabled systems, generalizability concerns are not unique to AI. Traditional CADe and CADx systems, such as mammography CADe tools that highlight suspicious calcifications and masses, or CADx systems that estimate the likelihood of malignancy based on predefined image features, have long been used to assist radiologists. As described in the clinical literature, CADx systems typically rely on engineered image features and fixed statistical decision rules rather than adaptive learning models.4 These tools are developed and validated under specific imaging conditions and patient populations.
A 2023 npj Digital Medicine review of translational and implementation challenges in clinical decision-support systems highlights that performance often shifts when tools are deployed outside their original development settings, due to differences in imaging equipment, acquisition protocols, calibration parameters, disease prevalence, and workflow integration.5 Variability in scanner type, reconstruction settings, and case mix can materially influence performance even when the underlying algorithm is fixed and not self-updating. Thus, concerns about site-dependent performance and limited external validation apply broadly to computer-aided radiology tools, not only to AI-enabled systems.
This is particularly concerning for radiology tools that may be deployed nationally across heterogeneous health systems. Premarket notification currently provides a structured opportunity for FDA to assess whether performance claims are adequately supported across the intended use environments, whether a device is computer-aided or ML/AI-aided. Exempting any of these devices could remove a critical safeguard before generalizability concerns are sufficiently resolved.
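One piece of simple arithmetic makes the site-dependence concrete: even when an algorithm's sensitivity and specificity are genuinely fixed, the positive predictive value of its flags depends heavily on local disease prevalence, which varies across the heterogeneous settings described above. The sketch below is a minimal illustration using hypothetical numbers, not data from any cited study:

    # Hypothetical illustration: with sensitivity and specificity held fixed,
    # the positive predictive value (PPV) of an algorithm's flag still varies
    # sharply with local disease prevalence, one reason site-level validation
    # matters. All numbers are illustrative, not drawn from any cited study.
    sensitivity, specificity = 0.90, 0.90

    for prevalence in (0.005, 0.05, 0.20):  # screening vs. enriched referral settings
        true_positive_mass = sensitivity * prevalence
        false_positive_mass = (1 - specificity) * (1 - prevalence)
        ppv = true_positive_mass / (true_positive_mass + false_positive_mass)
        print(f"prevalence={prevalence:.1%}  PPV={ppv:.1%}")

Under these assumed operating characteristics, PPV ranges from roughly 4% in a low-prevalence screening setting to roughly 69% in an enriched setting, so identical accuracy claims can translate into very different clinical experiences at different sites.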
2. Diagnostic Accuracy Does Not Equal Clinical Benefit
Most published AI imaging studies rely on retrospective diagnostic accuracy metrics such as the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. These measures reflect performance on controlled datasets and do not demonstrate real-world clinical effectiveness, impact on patient outcomes, workflow integration, or safety under changing practice conditions. Importantly, improvements in statistical discrimination do not necessarily translate into improved patient outcomes, reduced morbidity, or better quality of care.
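The following minimal sketch (with hypothetical labels and scores, not taken from any cited study) illustrates why these are surrogate measures: every quantity is computable from an archived test set alone, with no information about how patients were subsequently managed or how they fared.

    # Minimal illustration: AUC, sensitivity, and specificity are computed
    # entirely from archived reference labels and algorithm scores. Nothing
    # below measures whether patients fared better when the tool was used.
    # All data are hypothetical.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]   # 1 = disease present on reference standard
    scores = [0.91, 0.74, 0.42, 0.65, 0.30, 0.22, 0.15, 0.08]  # algorithm outputs
    threshold = 0.5                      # operating point chosen by the developer

    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # AUC: probability that a randomly chosen diseased case outscores a
    # randomly chosen non-diseased case (ties count half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

    print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUC={auc:.2f}")

A tool can score well on every line of this sketch and still have no demonstrated effect on recall rates, treatment decisions, morbidity, or mortality; those questions can only be answered by prospective, outcome-oriented studies.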
The 2025 JAMA AI Summit Report stresses that regulatory oversight should prioritize patient-centered outcomes and meaningful clinical impact, not surrogate performance metrics alone.6 Similarly, a 2024 scoping review in The Lancet Digital Health found that randomized controlled trials evaluating AI tools remain limited, and evidence of improved patient-level outcomes is still emerging.7
This limitation is not unique to ML and AI imaging tools. Traditional CADe and CADx systems have similarly shown improved detection metrics without corresponding evidence of meaningful clinical benefit in routine practice. For example, a large analysis of digital screening mammography interpreted with and without traditional CADe found no improvement in sensitivity, specificity, or overall cancer detection rate, and even showed reduced sensitivity among some radiologists using CADe in community practice.8,9 Likewise, a 2025 commentary in the New England Journal of Medicine noted that the addition of new analytic tools to standard mammography has not been shown to yield clinically meaningful improvements, reinforcing that enhanced performance metrics alone are not sufficient evidence of patient benefit.9
For tools that influence high-stakes decisions such as stroke triage, cancer detection, or emergent notification, outcome evidence is especially important. A broad exemption for any specific type of device risks accelerating the deployment of tools without adequate prospective validation demonstrating patient benefit.
3. Differences in Performance Across Patient Groups
Computer-aided devices with and without ML/AI should work equally well for all patients. However, research published in Nature Medicine shows that even when developers try to reduce bias during model training, differences in performance across patient groups can reappear when the tool is used in real-world
settings, especially when patient populations or clinical conditions change.3
Performance differences across patient groups have been documented in computer-aided devices well before the introduction of AI-enabled tools. For example, even standard screening mammography (without AI) performs differently across subgroups: the US Preventive Services Task Force (USPSTF)
notes that dense breasts are associated with reduced sensitivity and specificity of mammography, underscoring that baseline imaging performance varies by patient characteristics and can propagate to any computer-aided tool layered on top of that workflow.10
NCHR is particularly concerned that reduced premarket scrutiny could inadvertently exacerbate disparities in diagnostic accuracy and triage prioritization.
4. Performance Drift and Lifecycle Oversight Challenges
Computer-aided radiology devices, including both AI-enabled systems and traditional rule-based software, may experience changes in clinical performance over time, even when the underlying code remains unchanged. Software-based medical devices operate within evolving clinical environments: changes in imaging hardware, acquisition protocols, calibration settings, patient demographics, disease prevalence, and workflow integration can materially affect diagnostic accuracy or triage performance.
Research published in Nature Communications shows that it can be difficult to detect when performance starts to shift.11 Even if the algorithm itself is not updated, its accuracy can decline over time as patient populations change, diseases evolve, imaging machines are upgraded, or clinical practices shift.
FDA’s Total Product Lifecycle (TPLC) framework applies broadly to medical devices, including traditional computer-aided devices and ML/AI-enabled systems. The agency’s AI/ML Action Plan emphasizes continuous post-market performance monitoring and real-world evidence generation. However, reducing premarket oversight without strengthening post-market infrastructure may undermine this lifecycle model.
Analyses of real-world AI adoption patterns indicate that AI-enabled medical devices can scale rapidly across health systems.12 In this context, early detection of safety or effectiveness degradation is essential. Exemptions should not proceed without enforceable lifecycle safeguards.
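As one illustration of what enforceable lifecycle safeguards require in practice, the sketch below shows a simple and widely used drift signal: comparing the distribution of a deployed algorithm's output scores in a recent window against a reference window from the validation period. It is a minimal sketch under assumed inputs (the score distributions are simulated), not a description of any specific vendor's or FDA's monitoring system.

    # Minimal sketch of one common drift signal: compare the distribution of a
    # deployed algorithm's output scores in a recent window against a reference
    # window from validation. An input shift (new scanner, new case mix) often
    # appears in score distributions before labeled outcomes are available.
    # The score arrays here are simulated for illustration.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    reference_scores = rng.beta(2.0, 5.0, size=2000)  # validation-period outputs
    recent_scores = rng.beta(2.6, 4.4, size=500)      # outputs after, e.g., a scanner upgrade

    statistic, p_value = ks_2samp(reference_scores, recent_scores)
    if p_value < 0.01:  # the alert threshold is a policy choice, not a technical constant
        print(f"Possible drift: KS={statistic:.3f}, p={p_value:.1e}; trigger clinical review")
    else:
        print("No distributional shift detected in this window")

Unsupervised signals of this kind detect changes in inputs, not confirmed declines in accuracy; confirming degradation still requires periodically re-adjudicating a sample of cases, which is part of why the drift-detection experiments cited above found monitoring to be nontrivial.11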
5. Criteria for Exemptions
FDA has described the criteria for exemptions. Unfortunately, these criteria are flawed, as described below:
The FDA requires that “(1) the device does not have a significant history of false or misleading claims or of risks associated with inherent characteristics of the device.”
However, even if a previous version of the device (made by the same or a different company) does not have a “significant history” of such problems, that does not mean the new device will perform as well. It may be as good, better, or worse. Moreover, given the voluntary nature of adverse event reporting and the shortage of well-designed post-market studies of medical devices, it is often impossible to know whether there is a “significant history” of problems with any FDA-cleared device. Treating an unknown history of problems as if it were a clean history may be routine CDRH practice, but when a device has life-saving or life-threatening implications, it is not an appropriate basis for exempting the device from FDA regulatory review.
FDA also requires that “(2) characteristics of the device necessary for its safe and effective performance are well established.”
As noted above, even if the characteristics necessary for safe and effective performance have been established for one or more previous versions of a device, a new device made by the same or a different company does not necessarily share those characteristics. Again, the new device may function better, worse, or the same.
FDA also requires that “(3) changes in the device that could affect safety and effectiveness will either (a) be readily detectable by users by visual examination or other means such as routine testing, before causing harm, or (b) not materially increase the risk of injury, incorrect diagnosis, or ineffective treatment.”
Visual examination is obviously not relevant to these devices. Routine testing is also inadequate, because too many patients could be harmed before an increased risk of injury, incorrect diagnosis, erroneous triage, or missed notification is identified.
The FDA also requires that “(4) any changes to the device would not be likely to result in a change in the device’s classification. FDA may also consider that, even when exempting devices from the 510(k) requirements, these devices would still be subject to the general limitations on exemptions (see 21 CFR 892.9).”
This criterion is not appropriate for the types of computer-aided or AI-aided devices that are the subject of this request for comments.
Alignment with FDA’s AI/ML Action Plan
FDA’s 2021 AI/ML-Based Software as a Medical Device (SaMD) Action Plan articulated key pillars:
• A risk-based regulatory framework
• Emphasis on real-world performance monitoring
• Transparency and good machine learning practices
• Strengthened evidence generation
NCHR strongly supports these principles, and we support the same principles for any computer-aided device. However, broad exemption from 510(k) review may conflict with the risk-based approach if applied to devices that materially influence diagnosis or triage.
Regulatory efficiency should not be conflated with deregulation. Modernization must preserve reasonable assurance of safety and effectiveness, particularly for tools that directly affect patient management.
Recommendations
If FDA proceeds with considering exemptions, NCHR urges adoption of the following safeguards:
1. Strict Risk Stratification: Exemptions should be limited to clearly low-risk tools with narrowly defined intended use and established technological maturity. Devices influencing high-acuity triage or diagnostic decisions should remain subject to premarket notification unless supported by robust
prospective evidence of safety and benefit.6,7
2. Mandatory Multisite External Validation: Eligibility for exemption should require multisite validation across diverse health systems and patient populations, with transparent subgroup performance reporting (see the illustrative sketch following these recommendations).1,3
3. Prospective Clinical Impact Evidence: For tools integrated into clinical decision pathways, FDA should require evidence demonstrating meaningful patient-centered outcomes, not solely retrospective accuracy metrics.6,7
4. Enforceable Post-Market Surveillance: Exempted devices should be subject to:
• Defined real-world performance monitoring
• Structured adverse event reporting
• Data drift detection protocols
• Publicly accessible performance summaries
Empirical evidence shows that detecting performance degradation in computer-aided medical imaging systems of any type can be technically challenging and requires active monitoring infrastructure.11 Prior experience with traditional computer-aided mammography systems demonstrated widespread adoption before robust evidence of improved patient outcomes was established, and post-market evaluation later revealed increased recall rates and false positives without a clear mortality benefit.13 These historical lessons underscore that performance and clinical impact cannot be assumed at scale and may only become apparent through structured post-market evaluation.
5. Transparency and Reporting Standards: Manufacturers should adhere to recognized reporting frameworks such as TRIPOD+AI and DECIDE-AI to ensure transparent disclosure of training data characteristics, intended use boundaries, performance limitations, and known failure modes.14,15
Transparency and reporting standards should apply to all computer-aided radiology devices, not only to artificial intelligence (AI)-enabled systems. Frameworks such as STARD (Standards for Reporting Diagnostic Accuracy Studies) require clear disclosure of study populations, reference standards, subgroup analyses, and performance metrics, and apply equally to traditional computer-aided detection (CADe) and computer-aided diagnosis (CADx) tools.16
6. Clear Public Communication: FDA should explicitly clarify that “exemption” does not imply diminished safety expectations. Public messaging must reinforce that exempted devices remain subject to rigorous lifecycle oversight.
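To make the transparent subgroup performance reporting called for in recommendation 2 concrete, the sketch below shows the kind of stratified tabulation we have in mind: the same device's flags scored separately for each site and patient subgroup, so that a strong pooled result cannot conceal weak performance in any one stratum. All cases, site names, and group names are hypothetical.

    # Hypothetical illustration of transparent subgroup performance reporting:
    # compute sensitivity and specificity separately per site and subgroup so
    # a strong pooled result cannot mask a weak stratum. Data are invented.
    from collections import defaultdict

    # (site, subgroup, reference_label, algorithm_flag) for each case
    cases = [
        ("site_A", "dense_breast", 1, 1), ("site_A", "dense_breast", 1, 0),
        ("site_A", "non_dense",    1, 1), ("site_A", "non_dense",    0, 0),
        ("site_B", "dense_breast", 1, 0), ("site_B", "dense_breast", 0, 0),
        ("site_B", "non_dense",    1, 1), ("site_B", "non_dense",    0, 1),
    ]

    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for site, group, label, flag in cases:
        cell = counts[(site, group)]
        if label and flag:
            cell["tp"] += 1
        elif label:
            cell["fn"] += 1
        elif flag:
            cell["fp"] += 1
        else:
            cell["tn"] += 1

    for (site, group), c in sorted(counts.items()):
        sens = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else float("nan")
        spec = c["tn"] / (c["tn"] + c["fp"]) if (c["tn"] + c["fp"]) else float("nan")
        print(f"{site:7s} {group:13s} sensitivity={sens:.2f} specificity={spec:.2f}")

Publishing this kind of table for every validation site, rather than a single pooled figure, would allow FDA and clinicians to see whether a device's performance holds up for the patients in front of them.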
Conclusions
AI-enabled and computer-assisted radiology tools hold substantial promise to enhance detection and efficiency. However, recent evidence consistently demonstrates that generalizability, equity, and real-world effectiveness cannot be assumed based on retrospective validation alone.1-6,11 As a public health agency, FDA should base its decisions on scientific evidence, not on efforts to get products to market more quickly by ignoring the safeguards that made the FDA the gold standard. Premarket notification remains an important evidentiary safeguard. If exemptions are granted, they must be narrowly tailored, evidence-based, and accompanied by strengthened lifecycle oversight consistent with FDA’s AI/ML Action Plan. Patient safety must remain the guiding principle of regulatory modernization.
Respectfully submitted,
National Center for Health Research (NCHR)
Washington, D.C.
References:
1. Windecker, D., Baj, G., Shiri, I., Kazaj, P. M., Kaesmacher, J., Gräni, C., & Siontis, G. C. (2025). Generalizability of FDA-approved AI-enabled medical devices for clinical use. JAMA Network Open, 8(4), e258052.
2. Lea, A. S., & Jones, D. S. (2024). Mind the gap—machine learning, dataset shift, and history in the age of clinical algorithms. New England Journal of Medicine, 390(4), 293-295.
3. Yang, Y., Zhang, H., Gichoya, J. W., Katabi, D., & Ghassemi, M. (2024). The limits of fair medical imaging AI in real-world generalization. Nature Medicine, 30(10), 2838-2848.
4. Jorritsma, W., Cnossen, F., & van Ooijen, P. M. (2015). Improving the radiologist-CAD interaction: designing for appropriate trust. Clinical Radiology, 70(2), 115–122. https://doi.org/10.1016/j.crad.2014.09.017
5. Leming, M. J., Bron, E. E., Bruffaerts, R., Ou, Y., Iglesias, J. E., Gollub, R. L., & Im, H. (2023). Challenges of implementing computer-aided diagnostic models for neuroimages in a clinical setting. npj Digital Medicine, 6(1), 129. https://doi.org/10.1038/s41746-023-00868-x
6. Angus, D. C., Khera, R., Lieu, T., Liu, V., Ahmad, F. S., Anderson, B., … & Bibbins-Domingo, K. (2025). AI, health, and health care today and tomorrow: the JAMA Summit report on artificial intelligence. JAMA, 334(18), 1650-1664.
7. Han, R., Acosta, J. N., Shakeri, Z., Ioannidis, J. P., Topol, E. J., & Rajpurkar, P. (2024). Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. The Lancet Digital Health, 6(5), e367-e373.
8. Lehman, C. D., Wellman, R. D., Buist, D. S., Kerlikowske, K., Tosteson, A. N., Miglioretti, D. L., & Breast Cancer Surveillance Consortium (2015). Diagnostic Accuracy of Digital Screening Mammography With and Without Computer-Aided Detection. JAMA Internal Medicine, 175(11), 1828–1837. https://doi.org/10.1001/jamainternmed.2015.5231
9. Hyams, B., Kerlikowske, K., & Redberg, R. F. (2025). New Mammography Tools – The Need for Clinically Meaningful Assessment Standards. New England Journal of Medicine, 393(3), 211–213. https://doi.org/10.1056/NEJMp2500274
10. US Preventive Services Task Force, Nicholson, W. K., Silverstein, M., Wong, J. B., Barry, M. J., Chelmow, D., Coker, T. R., Davis, E. M., Jaén, C. R., Krousel-Wood, M., Lee, S., Li, L., Mangione, C. M., Rao, G., Ruiz, J. M., Stevermer, J. J., Tsevat, J., Underwood, S. M., & Wiehe, S. (2024). Screening for Breast Cancer: US Preventive Services Task Force Recommendation Statement. JAMA, 331(22), 1918–1930. https://doi.org/10.1001/jama.2024.5534
11. Kore, A., Abbasi Bavil, E., Subasri, V., Abdalla, M., Fine, B., Dolatabadi, E., & Abdalla, M. (2024). Empirical data drift detection experiments on real-world medical imaging data. Nature Communications, 15(1), 1887.
12. Wu, K., Wu, E., Theodorou, B., Liang, W., Mack, C., Glass, L., … & Zou, J. (2024). Characterizing the clinical adoption of medical AI devices through US insurance claims. NEJM AI, 1(1), AIoa2300030.
13. Fenton, J. J., Taplin, S. H., Carney, P. A., Abraham, L., Sickles, E. A., D’Orsi, C., Berns, E. A., Cutter, G., Hendrick, R. E., Barlow, W. E., & Elmore, J. G. (2007). Influence of computer-aided detection on performance of screening mammography. New England Journal of Medicine, 356(14), 1399–1409. https://doi.org/10.1056/NEJMoa066099
14. Collins, G. S., Moons, K. G., Dhiman, P., Riley, R. D., Beam, A. L., Van Calster, B., … & Logullo, P. (2024). TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ, 385, e078378.
15. Vasey, B., Nagendran, M., Campbell, B., Clifton, D. A., Collins, G. S., Denaxas, S., … & McCulloch, P. (2022). Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. BMJ, 377, e070904.
16. Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L., Lijmer, J. G., Moher, D., Rennie, D., de Vet, H. C., Kressel, H. Y., Rifai, N., Golub, R. M., Altman, D. G., Hooft, L., Korevaar, D. A., Cohen, J. F., & STARD Group (2015). STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ, 351, h5527.


