{"PubmedArticle":{"MedlineCitation":{"@attributes":{"Status":"Publisher","Owner":"NLM"},"PMID":{"@attributes":{"Version":"1"},"@text":"41910547"},"DateRevised":{"Year":"2026","Month":"03","Day":"30"},"Article":{"@attributes":{"PubModel":"Print-Electronic"},"Journal":{"ISSN":{"@attributes":{"IssnType":"Electronic"},"@text":"1536-3708"},"JournalIssue":{"@attributes":{"CitedMedium":"Internet"},"PubDate":{"Year":"2026","Month":"Mar","Day":"30"}},"Title":"Annals of plastic surgery","ISOAbbreviation":"Ann Plast Surg"},"ArticleTitle":"Benchmarking Large Language Models Against Web of Science: A Comparative Bibliometric Analysis on Panniculectomy.","ELocationID":[{"@attributes":{"EIdType":"doi","ValidYN":"Y"},"@text":"10.1097\/SAP.0000000000004730"}],"Abstract":{"AbstractText":[{"@attributes":{"Label":"BACKGROUND","NlmCategory":"BACKGROUND"},"@text":"As large language models (LLMs) grow in sophistication, their potential role in scientific writing is being explored with growing interest and caution. However, LLMs vary in their performance, contextual accuracy, and reliability. This study compares the outputs of 3 leading LLMs (ChatGPT-4o, Deepseek, and Claude 3.7) against a manually curated bibliometric analysis of the most highly cited panniculectomy articles."},{"@attributes":{"Label":"METHODS","NlmCategory":"METHODS"},"@text":"The 50 most highly cited panniculectomy publications were manually extracted from Web of Science (WoS) to serve as a reference data set. ChatGPT-4o, Deepseek, and Claude 3.7 Sonnet were each prompted to generate their own list of the 50 most cited panniculectomy articles. Outputs were compared across citation totals and averages, publication year trends, journal distribution, author co-occurrence, and article authenticity."},{"@attributes":{"Label":"RESULTS","NlmCategory":"RESULTS"},"@text":"The manual data set totaled 2494 citations (density: 49.8). ChatGPT-4o, Deepseek, and Claude 3.7 produced 2111 (42.2), 4736 (94.7), and 8592 (171.8) citations, respectively. Overlap with the manual list was limited: ChatGPT-4o (14.00%), Claude 3.7 (4.00%), Deepseek (0.00%) (P&lt;0.001). \"Plastic and Reconstructive Surgery\" was the most cited journal across all outputs. Unique authors: manual (241), ChatGPT-4o (114), Deepseek (72), and Claude 3.7 (129). Article accuracy: ChatGPT-4o had 34.00% accurate, 26.00% confabulated, and 40.00% hallucinated articles. Claude 3.7: 4.00% accurate, 26.00% confabulated, and 70.00% hallucinated. Deepseek: 100.00% hallucinated (P&lt;0.001). Year trends and journal representation varied notably from the manual set."},{"@attributes":{"Label":"CONCLUSIONS","NlmCategory":"CONCLUSIONS"},"@text":"Current LLMs struggle to replicate accurate bibliometric data. ChatGPT-4o performed best but still showed major limitations. WoS remains the gold standard, and LLM-generated outputs should be treated cautiously in bibliometric analyses."}],"CopyrightInformation":"Copyright \u00a9 2026 Wolters Kluwer Health, Inc. All rights reserved."},"AuthorList":{"@attributes":{"CompleteYN":"Y"},"Author":[{"@attributes":{"ValidYN":"Y"},"LastName":"Mangal","ForeName":"Rohan","Initials":"R","AffiliationInfo":[{"Affiliation":"University of Miami Miller School of Medicine."}]},{"@attributes":{"ValidYN":"Y"},"LastName":"Jessup","ForeName":"Mark","Initials":"M","Identifier":[{"@attributes":{"Source":"ORCID"},"@text":"0009-0005-4682-9353"}],"AffiliationInfo":[{"Affiliation":"University of Miami Miller School of Medicine."}]},{"@attributes":{"ValidYN":"Y"},"LastName":"Desai","ForeName":"Anshumi","Initials":"A","AffiliationInfo":[{"Affiliation":"DeWitt Department of Surgery, Division of Plastic Surgery, University of Miami Miller School of Medicine."}]},{"@attributes":{"ValidYN":"Y"},"LastName":"Khandekar","ForeName":"Archan","Initials":"A","AffiliationInfo":[{"Affiliation":"Desai Sethi Urology Institute, University of Miami Miller School of Medicine, Miami, FL."}]},{"@attributes":{"ValidYN":"Y"},"LastName":"Prasad","ForeName":"Soumil","Initials":"S","AffiliationInfo":[{"Affiliation":"University of Miami Miller School of Medicine."}]},{"@attributes":{"ValidYN":"Y"},"LastName":"Tadisina","ForeName":"Kashyap Komarraju","Initials":"KK","AffiliationInfo":[{"Affiliation":"DeWitt Department of Surgery, Division of Plastic Surgery, University of Miami Miller School of Medicine."}]},{"@attributes":{"ValidYN":"Y"},"LastName":"Singh","ForeName":"Devinder","Initials":"D","AffiliationInfo":[{"Affiliation":"DeWitt Department of Surgery, Division of Plastic Surgery, University of Miami Miller School of Medicine."}]}]},"Language":["eng"],"PublicationTypeList":{"PublicationType":[{"@attributes":{"UI":"D016428"},"@text":"Journal Article"}]},"ArticleDate":[{"@attributes":{"DateType":"Electronic"},"Year":"2026","Month":"03","Day":"30"}]},"MedlineJournalInfo":{"Country":"United States","MedlineTA":"Ann Plast Surg","NlmUniqueID":"7805336","ISSNLinking":"0148-7043"},"CitationSubset":["IM"],"KeywordList":[{"@attributes":{"Owner":"NOTNLM"},"Keyword":[{"@attributes":{"MajorTopicYN":"N"},"@text":"LLM"},{"@attributes":{"MajorTopicYN":"N"},"@text":"artificial intelligence"},{"@attributes":{"MajorTopicYN":"N"},"@text":"bibliometric"},{"@attributes":{"MajorTopicYN":"N"},"@text":"panniculectomy"}]}],"CoiStatement":"Conflicts of interest and sources of funding: none declared."},"PubmedData":{"History":{"PubMedPubDate":[{"@attributes":{"PubStatus":"received"},"Year":"2025","Month":"10","Day":"2"},{"@attributes":{"PubStatus":"accepted"},"Year":"2026","Month":"2","Day":"21"},{"@attributes":{"PubStatus":"medline"},"Year":"2026","Month":"3","Day":"30","Hour":"12","Minute":"49"},{"@attributes":{"PubStatus":"pubmed"},"Year":"2026","Month":"3","Day":"30","Hour":"12","Minute":"49"},{"@attributes":{"PubStatus":"entrez"},"Year":"2026","Month":"3","Day":"30","Hour":"10","Minute":"42"}]},"PublicationStatus":"aheadofprint","ArticleIdList":{"ArticleId":[{"@attributes":{"IdType":"pubmed"},"@text":"41910547"},{"@attributes":{"IdType":"doi"},"@text":"10.1097\/SAP.0000000000004730"},{"@attributes":{"IdType":"pii"},"@text":"00000637-990000000-01132"}]},"ReferenceList":[{"Reference":[{"Citation":"Clapp B, Ponce J, Corbett J, et al. American Society for Metabolic and Bariatric Surgery 2022 estimate of metabolic and bariatric procedures performed in the United States. Surg Obes Relat Dis. 2024;20:425\u2013431."},{"Citation":"Pestana IA, Campbell D, Fearmonti RM, et al. \u201cSupersize\u201d panniculectomy: indications, technique, and results. Ann Plast Surg. 2014;73:416\u2013421."},{"Citation":"Payer M, Youngberg B, Pfister S. Panniculectomy\u2014an option for people who are morbidly obese. AORN J. 2003;77:782\u2013794; quiz 795, 797-798."},{"Citation":"Meyerowitz BR, Gruber RP, Laub DR. Massive abdominal panniculectomy. JAMA. 1973;225:408\u2013409."},{"Citation":"Pratt JH, Irons GB. Panniculectomy and abdominoplasty. Am J Obstet Gynecol. 1978;132:165\u2013168."},{"Citation":"Hernandez AB, Medina LG, Hueber PA, et al. Robotic simple prostatectomy plus panniculectomy and giant umbilical hernia repair. Int Braz J Urol. 2019;45:641."},{"Citation":"Aitzetm\u00fcller MM, Klietz ML, Dermietzel AF, et al. Robotic-assisted microsurgery and its future in plastic surgery. J Clin Med. 2022;11:3378."},{"Citation":"Mansoor M, Ibrahim AF. The transformative role of artificial intelligence in plastic and reconstructive surgery: challenges and opportunities. J Clin Med. 2025;14:2698."},{"Citation":"Telenti A, Auli M, Hie BL, et al. Large language models for science and medicine. Eur J Clin Invest. 2024;54:e14183."},{"Citation":"Joseph A, Joseph K, Joseph A. A pilot evaluation of the diagnostic accuracy of ChatGPT-3.5 for multiple sclerosis from case reports. Transl Neurosci. 2024;15:20220361."},{"Citation":"Athaluri SA, Manthena SV, Kesapragada VSRKM, et al. Exploring the boundaries of reality: investigating the phenomenon of artificial intelligence hallucination in scientific writing through ChatGPT references. Cureus. 2023;15:e37432."},{"Citation":"Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15:e35179."},{"Citation":"Dumit J, Roepstorff A. AI hallucinations are a feature of LLM design, not a bug. Nature. 2025;639:38."},{"Citation":"Aljamaan F, Temsah MH, Altamimi I, et al. Reference hallucination score for medical artificial intelligence chatbots: development and usability study. JMIR Med Inform. 2024;12:e54345."},{"Citation":"Manley K, Salingaros S, Fuchsman AC, et al. Using ChatGPT to write a literature review on autologous fat grafting. J Plast Reconstr Aesthet Surg. 2025;105:292\u2013304."},{"Citation":"Pranckut\u0117 R. Web of Science (WoS) and Scopus: the titans of bibliographic information in today\u2019s academic world. Publications. 2021;9:12."},{"Citation":"Li K, Rollins J, Yan E. Web of Science use in published research and review papers 1997-2017: a selective, dynamic, cross-domain, content-based analysis. Scientometrics. 2018;115:1\u201320."},{"Citation":"Manoj Kumar L, George RJ, Anisha PS, et al. Bibliometric analysis for medical research. Indian J Psychol Med. 2023;45:277\u2013282."},{"Citation":"Farquhar S, Kossen J, Kuhn L, et al. Detecting hallucinations in large language models using semantic entropy. Nature. 2024;630:625\u2013630."},{"Citation":"Yilmaz BE, Gokkurt Yilmaz BN, Ozbey F. Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis. BMC Oral Health. 2025;25:573."},{"Citation":"Liu X, Duan C, Kim MK, et al. Claude 3 Opus and ChatGPT with GPT-4 in dermoscopic image analysis for melanoma diagnosis: comparative performance analysis. JMIR Med Inform. 2024;12:e59273."},{"Citation":"Clusmann J, Kolbinger FR, Muti HS, et al. The future landscape of large language models in medicine. Commun Med (Lond). 2023;3:141."},{"Citation":"Keating M, Bollard SM, Potter S. Assessing the quality, readability, and acceptability of AI-generated information in plastic and aesthetic surgery. Cureus. 2024;16:e73874."},{"Citation":"Swisher AR, Wu AW, Liu GC, et al. Enhancing health literacy: evaluating the readability of patient handouts revised by ChatGPT\u2019s large language model. Otolaryngol Head Neck Surg. 2024;171:1751\u20131757."},{"Citation":"Gomez-Cabello CA, Borna S, Pressman SM, et al. Large language models for intraoperative decision support in plastic surgery: a comparison between ChatGPT-4 and Gemini. Medicina (Kaunas). 2024;60:957."},{"Citation":"Brochu BM, Mirsky NA, Thaller SR. Evaluating ChatGPT\u2019s efficacy in addressing common patient questions in plastic surgery consultations. Art Int Surg. 2024;4:411\u2013426."},{"Citation":"Lim B, Seth I, Xie Y, et al. Exploring the unknown: evaluating ChatGPT\u2019s performance in uncovering novel aspects of plastic surgery and identifying areas for future innovation. Aesthetic Plast Surg. 2024;48:2580\u20132589."},{"Citation":"Najafali D, Reiche E, Araya S, et al. Artificial intelligence augmentation: performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination. Plast Reconstr Surg Glob Open. 2025;13:e6645."}]}]}}}