Efficient Arabic Hate Speech Detection via LLaMA-3: A Prompting and Instruction-Tuning Approach
AUT Journal of Modeling and Simulation
Article in press, accepted, available online from 07 Aban 1404
Article type: Research Article
DOI: 10.22060/miscj.2025.24262.5414
Authors
Anas Khudhur Abbass 1; Hesham Faili* 2
1 University of Tehran, Pardis Alborz, Department of Computer Engineering
2 Full Professor, College of Electrical and Computer Engineering, University of Tehran; Head of the Intelligent Text Processing and Natural Language Processing Lab (NLP Lab), Iran
Abstract
Cyberspace produces enormous volumes of user-generated content every day; while it enables freedom of expression, it can also spread hate speech and endanger minorities. Detecting hate speech as early as possible is therefore essential to limit its spread. This is particularly challenging for Arabic, given its rich morphology and the scarcity of high-quality linguistic resources. In this article, we examine zero-shot and few-shot prompting for detecting Arabic hate speech with the LLaMA-3-8B language model, and we further improve performance via supervised fine-tuning on a custom instruction-based dataset. In the zero-shot setting, the model produces unstructured textual responses, so we pass these responses through a lightweight TF-IDF + Logistic Regression classifier that assigns each one to a predefined hate speech category. To improve classification, we construct the instruction-based training set by creating tweet embeddings with Arabic-BERT and applying K-Means clustering to enforce semantic and topical variety. We then use the GPT-4o model to generate representative instructions for each cluster, yielding an instruction-based fine-tuning dataset. Finally, we fine-tune LLaMA-3-8B with QLoRA, which allows fine-tuning with a lower memory footprint. The experimental results show that zero-shot and few-shot prompting achieve relatively low F1-scores of 42.2% and 45.0%, respectively, while instruction-based fine-tuning reaches an F1-score of 90.1%, exceeding strong baselines such as AraBERT. These results demonstrate the advantage of instruction tuning and QLoRA-based fine-tuning over prompting-based approaches in low-resource settings such as Arabic.
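As a rough illustration of the post-processing step described in the abstract, the sketch below shows how unstructured model responses could be mapped to predefined categories with a TF-IDF + Logistic Regression pipeline built on scikit-learn. This is a minimal sketch, not the authors' released code: the example responses, the binary label set, and the classifier settings are placeholder assumptions; in the paper's pipeline the classifier would be trained on labelled responses and then applied to the zero-shot outputs of LLaMA-3-8B.

```python
# Minimal sketch: classify unstructured LLM responses into hate-speech
# categories using TF-IDF features and Logistic Regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical free-text responses from the prompted model, paired with
# gold category labels (placeholder data, not from the paper).
responses = [
    "The tweet contains offensive language directed at a religious group.",
    "This text is a neutral statement about daily news.",
    "The post insults people based on their nationality.",
    "No hateful content is present in this tweet.",
]
labels = ["hate", "not_hate", "hate", "not_hate"]

# TF-IDF turns each response into a sparse weighted bag-of-words vector;
# Logistic Regression learns a linear decision boundary over those vectors.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(responses, labels)

# Map a new unstructured response to one of the predefined categories.
print(clf.predict(["The reply mocks a minority community."]))
```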
Keywords
Hate speech detection in Arabic; Zero-shot prompting; Few-shot prompting; Supervised fine-tuning; LLaMA-3-8B