FRIDAY, April 17, 2026 (HealthDay News) -- Chatbots perform poorly when answering questions about misinformation-prone health topics, according to a study published online April 14 in BMJ Open.

Nicholas B. Tiller, Ph.D., from Harbor-UCLA Medical Center in Torrance, California, and colleagues audited chatbot responses for health-related questions prone to misinformation. Ten questions from five categories (cancer, vaccines, stem cells, nutrition, and athletic performance) were used as prompts in five popular chatbots: Gemini (Google), DeepSeek (High-Flyer), Meta AI (Meta), ChatGPT (OpenAI), and Grok (xAI) in February 2025, with two experts rating responses.

The researchers found that nearly half (49.6 percent) of responses were problematic (30 percent somewhat problematic and 19.6 percent highly problematic). Quality of responses was similar among chatbots (P = 0.566), although Grok generated significantly more highly problematic responses than would be expected under a random distribution (z-score, +2.07). For vaccines (mean z-score, –2.57) and cancer (–2.12), performance was strongest, while it was weakest in stem cells (+1.25), athletic performance (+3.74), and nutrition (+4.35). The quality of references was poor, with a median completeness score of 40 percent. No chatbot produced a fully accurate reference list due to hallucinations and fabricated citations. Readability of responses was graded as "difficult," equivalent to college sophomore–senior level.

"By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences," the authors write. "They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments. This behavioral limitation means that chatbots can reproduce authoritative-sounding but potentially flawed responses."