ChatGPT vs. Google: Who Provides Better Health Advice? Discover the Best Source for Your Wellness Needs!


Can AI chatbots give better medical answers than Google? A recent study suggests they can, but it all depends on how you ask.

How trustworthy are AI chatbots and search engines when it comes to health questions? A study published in npj Digital Medicine by Spanish researchers looked into this by testing four popular search engines and seven large language models (LLMs), including ChatGPT and GPT-4, on 150 medical questions. The results revealed some surprising insights about accuracy and about how prompt wording shapes the answers.

People increasingly turn to the internet for health advice, but search engines often surface incomplete or misleading information. LLMs offer a different approach, generating coherent answers based on extensive training. While LLMs have been studied in specific medical fields such as fertility, little attention has been paid to how they stack up against traditional search engines on general health queries.

The study tested four major search engines (Yahoo!, Bing, Google, and DuckDuckGo) along with seven LLMs. GPT-4 and ChatGPT performed best, while Flan-T5 lagged behind. The questions covered a wide range of medical topics and were phrased consistently, allowing the researchers to compare responses across systems.

Search engine results were assessed by examining the top 20 links each engine returned. Simulating user behavior, the researchers found that a user who accepted the first answer (the "lazy" user) achieved accuracy similar to one who checked multiple sources (the "diligent" user). This raises the question of whether it is wise to trust top search results without further verification.
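To make the two reading strategies concrete, here is a minimal sketch in Python of how such simulated users could be scored. The labels, helper names, and decision rules (first committed answer versus a majority vote over the top 20 links) are illustrative assumptions, not the paper's exact protocol.

```python
from collections import Counter

# Illustrative sketch only: each retrieved page is pre-labeled "yes",
# "no", or "irrelevant" against the ground-truth answer. The study's
# exact decision rules may differ.

def lazy_user(labels):
    """Accept the first page that commits to an answer."""
    for label in labels:
        if label != "irrelevant":
            return label
    return None  # no page answered the question

def diligent_user(labels, k=20):
    """Majority vote over the top-k pages that commit to an answer."""
    votes = Counter(l for l in labels[:k] if l != "irrelevant")
    return votes.most_common(1)[0][0] if votes else None

# Example ranking where the first hit disagrees with the consensus:
ranked = ["no", "irrelevant", "yes", "yes", "irrelevant", "yes"]
print(lazy_user(ranked))      # -> "no"
print(diligent_user(ranked))  # -> "yes"
```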

The LLMs were tested under several prompting conditions. Phrasing the prompt in plain language versus expert framing influenced how well the models responded, and adding example questions improved performance for some models, though not all benefited equally. Interestingly, feeding the models search engine results before they replied often enhanced accuracy, especially for smaller models.
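As an illustration of how such prompting conditions differ, the sketch below builds plain, expert-framed, and few-shot variants of the same yes/no health question. The templates and their wording are hypothetical; the study's actual prompts are not reproduced here.

```python
# Hypothetical prompt templates; the study's actual wording differs.
QUESTION = "Does vitamin D deficiency cause fatigue? Answer yes or no."

# Plain condition: the bare question.
plain_prompt = QUESTION

# Expert condition: role framing that nudges toward medical consensus.
expert_prompt = (
    "You are a medical expert. Answer according to current medical "
    "consensus.\n" + QUESTION
)

# Few-shot condition: worked examples prepended to the question.
few_shot_prompt = (
    "Q: Is smoking a risk factor for lung cancer? Answer yes or no.\n"
    "A: yes\n"
    "Q: Does cracking your knuckles cause arthritis? Answer yes or no.\n"
    "A: no\n"
    f"Q: {QUESTION}\n"
    "A:"
)

for name, prompt in [("plain", plain_prompt),
                     ("expert", expert_prompt),
                     ("few-shot", few_shot_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
```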

In terms of overall performance, LLMs generally outperformed search engines, answering about 80% of questions correctly, whereas search engines managed only 50-70% accuracy. However, LLM responses varied significantly with how questions were phrased. Prompts that steered the models toward medical consensus were usually the most reliable, although they sometimes produced vaguer answers.

Bing was noted for comparatively reliable results, but it was not notably different from Google, Yahoo!, or DuckDuckGo. Many search engine responses contained irrelevant pages; excluding those raised precision to about 80-90%, yet 10-15% of the remaining answers were still wrong.

The study highlighted that "lazy" users often achieved similar or better accuracy with less effort: a double-edged finding, since trusting the first result saves time but offers no guarantee of correctness.

Moreover, using retrieval-augmented approaches helped some LLMs perform better, showing that combining LLMs with quality search engine outputs can enhance accuracy, as long as the retrieved content is relevant and correct. For instance, integrating snippets from search engines boosted some smaller models to match the performance of larger ones like GPT-4. However, adding lower-quality results can diminish performance, making the quality of retrieved information crucial.
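A retrieval-augmented setup of this kind can be sketched as a simple prompt-assembly step: retrieved snippets are prepended to the question so the model can ground its answer in them. The function below is an illustrative assumption about how that might look, not the authors' implementation; capping and curating the snippets matters, since low-quality passages can hurt rather than help.

```python
def build_rag_prompt(question, snippets, max_snippets=3):
    """Assemble a retrieval-augmented prompt by prepending retrieved
    snippets as evidence before the question. Hypothetical format;
    snippet quality is the crucial variable."""
    evidence = "\n".join(f"- {s}" for s in snippets[:max_snippets])
    return (
        "Use the following search results to answer the question.\n"
        f"Search results:\n{evidence}\n\n"
        f"Question: {question}\n"
        "Answer yes or no:"
    )

# Example with made-up snippets:
snippets = [
    "Fact sheet: fatigue is a commonly reported symptom of vitamin D deficiency.",
    "Review: low vitamin D levels are associated with tiredness and low mood.",
]
print(build_rag_prompt("Does vitamin D deficiency cause fatigue?", snippets))
```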

The researchers noted different error patterns in LLM responses, such as misunderstandings of medical conditions or ambiguous phrasing. Some health inquiries proved challenging for both search engines and LLMs, further emphasizing the need for careful consideration when interpreting results.

Ultimately, LLMs showed better overall accuracy, yet their dependence on phrasing and potential for misinformation means careful use is essential. While blending LLMs with search engines seems promising, the need for accuracy in retrieved data remains a challenge. Future work should explore how to improve the reliability of LLMs and reduce misinformation in health-related contexts.

In conclusion, both search engines and LLMs have strengths and weaknesses in answering health questions. While LLMs often achieve higher accuracy, they require careful prompting and retrieval methods to ensure quality responses. Understanding these dynamics can help users make more informed decisions when seeking medical information online.

Journal reference:
Fernández-Pichel, M., Pichel, J. C., & Losada, D. E. (2025). Evaluating search engines and large language models for answering health questions. npj Digital Medicine, 8, 153. https://doi.org/10.1038/s41746-025-01546-w



