FRIDAY, April 17, 2026 (HealthDay News) -- Notes generated by artificial intelligence (AI) received lower quality scores than those generated by humans across five standardized care cases, according to a study published online April 17 in the Annals of Internal Medicine to coincide with the Internal Medicine Meeting, the annual meeting of the American College of Physicians, held from April 16 to 18 in San Francisco.

Ashok Reddy, M.D., from the University of Washington in Seattle, and colleagues compared the quality of AI-generated clinical notes with that of human-produced notes for standardized primary care clinical cases in the Veterans Health Administration. Participants included 11 AI scribe tools, 18 human note takers, and 30 human raters. Five standardized primary care cases were audio-recorded, and notes were generated from the audio files. All notes were assessed by blinded raters using the modified Physician Documentation Quality Instrument (PDQI-9), which measures 10 domains of note quality on a 5-point Likert scale (maximum score, 50 points).

The researchers found that human-generated notes received higher overall modified PDQI-9 scores than AI-generated notes across all five clinical cases. The difference was largest in the acute low back pain case (43.8 for human-generated versus 20.3 for AI-generated). In a pooled domain analysis, AI-generated notes scored lower across all 10 domains, with the largest deficits in the thorough, organized, and useful domains (−1.23, −1.06, and −1.03, respectively).

"Although ambient AI scribes hold promise for reducing clinician burden, rigorous and ongoing evaluation of their quality is essential to ensure that these tools enhance rather than compromise the quality of clinical care," the authors write.

Abstract/Full Text (subscription or payment may be required)

Editorial (subscription or payment may be required)