Hate speech detection in low resource languages using large language models

Mohades Khorasani, Ali; Almahdi, Asseel Jabbar; Akbari, Mohammad; Heidary, Soroush

doi:10.22060/ajmc.2025.23590.1270

Amirkabir University of Technology


AUT Journal of Mathematics and Computing
Articles in Press, Accepted Manuscript, Available Online from 30 Dey 1404 (20 January 2026)
Article type: Original Article
Digital Object Identifier (DOI): 10.22060/ajmc.2025.23590.1270
Authors
Ali Mohades Khorasani*; Asseel Jabbar Almahdi; Mohammad Akbari; Soroush Heidary
Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran
Abstract
Hate speech is increasingly prevalent on social media platforms, and detecting it has drawn considerable attention from the research community. Recent studies have focused on refining language models (LMs) to identify hate speech, with notable gains in performance. Nonetheless, most of these studies address hate speech only in English, disregarding the vast amount of hateful content produced in other languages, notably low-resource languages. Building a classifier that detects hate speech effectively in a low-resource language with limited data is a formidable challenge. To address this gap, we perform a comparative study of five large language models and three parameter-efficient fine-tuning methods to determine which model and method perform best at detecting hate speech in two low-resource languages. Specifically, we evaluate three approaches: sequence classification-based fine-tuning (SEQ_CLS), causal language modeling-based fine-tuning (CLM), and in-context learning (ICL). Our findings highlight the ability of generative models to address the challenges of data scarcity and to improve performance through these methods and approaches.
Keywords
Hate speech detection; Large language models; Sequence classification; Causal language modeling; In-context learning
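
The three approaches named in the abstract differ mainly in how a labeled example is presented to the model. The following minimal sketch illustrates that difference only; it is not the authors' implementation. The prompt template, the "yes"/"no" label words, the 0/1 label scheme, and all function names are assumptions introduced here for illustration, and the sample texts are neutral placeholders.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: int  # assumed scheme: 1 = hateful, 0 = not hateful


# Hypothetical label words used to verbalize class ids for generative models.
LABEL_WORDS = {0: "no", 1: "yes"}


def seq_cls_format(ex: Example) -> dict:
    """SEQ_CLS: the model sees the raw text and a classification head
    is trained to predict the integer class id."""
    return {"input": ex.text, "target": ex.label}


def clm_format(ex: Example) -> dict:
    """CLM: the label is turned into a word that follows a prompt, so the
    model is fine-tuned to generate the label as the next tokens."""
    prompt = f"Text: {ex.text}\nHateful: "
    return {"input": prompt, "target": LABEL_WORDS[ex.label]}


def icl_prompt(demos: list, query: str) -> str:
    """ICL: no weights are updated; labeled demonstrations are placed
    in the prompt ahead of the unlabeled query."""
    shots = "".join(
        f"Text: {d.text}\nHateful: {LABEL_WORDS[d.label]}\n\n" for d in demos
    )
    return shots + f"Text: {query}\nHateful: "


demos = [Example("sample post A", 0), Example("sample post B", 1)]
print(seq_cls_format(demos[1]))
print(clm_format(demos[0]))
print(icl_prompt(demos, "sample post C"))
```

In this sketch the first two functions produce training pairs for fine-tuning, while the third only builds an inference-time prompt, which is why in-context learning is attractive when labeled data is scarce.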
