Prompt Engineering for Biomedical NLI: An Exploratory Study
AUT Journal of Modeling and Simulation
Article in Press, Accepted Manuscript, Available Online from 16 Azar 1404 (7 December 2025)
Article Type: Research Article
DOI: 10.22060/miscj.2025.24382.5421
Authors
Hasan Fadhil Qasim Alkhawaf* 1; Heshaam Faili 2
1 Ministry of Higher Education and Scientific Research / University of Maysan
2 School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Abstract
Biomedical Natural Language Inference (BioNLI) is a core task in biomedical NLP: deciding whether a biomedical premise entails a given hypothesis. Prompt-based methods are gaining traction as one of the simplest and fastest ways to apply large language models (LLMs) to such inference tasks without complex, time-consuming fine-tuning. However, biomedical NLI poses a significant challenge for prompt engineering because of its heavy reliance on domain-specific terminology. Zero-shot and few-shot prompts built from predefined examples often lack the contextual information needed for entailment decisions and generalize poorly to the heterogeneity of biomedical texts. In this work, we present a comprehensive evaluation of a variety of prompting methods (zero-shot, static few-shot, dynamic few-shot, Chain-of-Thought, self-consistency, and Tree-of-Thought) with two LLMs, DeepSeek-R1-Distill-Qwen-14B and LLaMA-3.1-8B-Instruct, from a prompt-engineering perspective. We applied these methods to the BioNLI dataset and report key evaluation metrics for all of them. Our results show that dynamic, contextual in-context prompting combined with structured reasoning produces high-quality inference in this setting. Across all models and configurations, few-shot Tree-of-Thought prompting with the DeepSeek model produced the best results, reaching a macro-F1 score of 71.05 and even outperforming the retrieval-augmented models reported in prior studies. These findings show that prompt engineering alone can handle complex biomedical reasoning effectively, without retrieval or full fine-tuning.
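Since the paper publishes no code, the minimal Python sketch below only illustrates the general shape of two of the evaluated methods, few-shot prompting combined with self-consistency majority voting. The `generate` stub, the exemplar pairs, and the binary label set are all assumptions for illustration, not the authors' implementation; Tree-of-Thought would additionally branch over intermediate reasoning steps and is omitted for brevity.

```python
from collections import Counter

# Hypothetical stand-in for a completion call to one of the evaluated
# models (e.g. DeepSeek-R1-Distill-Qwen-14B); wire this to your backend.
def generate(prompt: str, temperature: float = 0.7) -> str:
    raise NotImplementedError("connect this to an LLM API of your choice")

# Static few-shot exemplars (invented here for illustration). A "dynamic"
# variant would instead retrieve exemplars similar to the query pair.
FEW_SHOT = [
    ("Aspirin inhibits platelet aggregation.",
     "Aspirin affects blood clotting.",
     "entailment"),
    ("The trial enrolled only adult patients.",
     "The trial included children.",
     "non-entailment"),
]

def build_prompt(premise: str, hypothesis: str) -> str:
    """Assemble a few-shot, Chain-of-Thought-style NLI prompt."""
    lines = [
        "Decide whether the premise entails the hypothesis.",
        "Think step by step, then answer with exactly one word:",
        "entailment or non-entailment.",
        "",
    ]
    for p, h, label in FEW_SHOT:
        lines += [f"Premise: {p}", f"Hypothesis: {h}", f"Answer: {label}", ""]
    lines += [f"Premise: {premise}", f"Hypothesis: {hypothesis}", "Answer:"]
    return "\n".join(lines)

def classify(premise: str, hypothesis: str, n_samples: int = 5) -> str:
    """Self-consistency: sample several answers, majority-vote the label."""
    votes = []
    for _ in range(n_samples):
        out = generate(build_prompt(premise, hypothesis)).lower()
        votes.append("non-entailment" if "non-entail" in out else "entailment")
    return Counter(votes).most_common(1)[0][0]
```

In this sketch, a call such as `classify(premise, hypothesis)` would draw five sampled completions and return the majority label; a macro-F1 score like the one reported in the abstract would then be computed over such predictions for the whole test set.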
Keywords
Chain-of-Thought; Tree-of-Thought; self-consistency prompting; few-shot reasoning; DeepSeek-R1-Distill-Qwen-14B