Lipschitzness effect of a loss function on generalization performance of deep neural networks trained by Adam and AdamW optimizers

Lashkari, Mohammad; Gheibi, Amin

doi:10.22060/ajmc.2023.22182.1139

AUT Journal of Mathematics and Computing
Article 6, Volume 5, Issue 4, 2024, Pages 361-375
Article type: Original Article
Authors
Mohammad Lashkari*; Amin Gheibi
Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran
Abstract
The generalization performance of deep neural networks with respect to the optimization algorithm is a major concern in machine learning, and it can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of the loss function is an important factor in reducing the generalization error of the model produced by Adam or AdamW. The results can serve as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we choose the human age estimation problem in computer vision. To assess generalization more rigorously, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that a loss function with a lower Lipschitz constant and a lower maximum value improves the generalization of the model trained by Adam or AdamW.
Keywords
Generalization error; Adam algorithm; Lipschitz constant
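
To make the notion of a loss function's Lipschitz constant in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes two illustrative losses (absolute error and squared error) and a hypothetical prediction range of 0 to 100 years for age estimation; the abstract does not say which losses or ranges the paper uses. The helper `empirical_lipschitz` is a name introduced here for illustration: it estimates the constant as the steepest slope of the loss between neighbouring grid points.

```python
def absolute_error(pred, target):
    """Absolute error; its slope in pred is never steeper than 1."""
    return abs(pred - target)


def squared_error(pred, target):
    """Squared error; its slope in pred grows with the distance to the target."""
    return (pred - target) ** 2


def empirical_lipschitz(loss, target, lo, hi, steps=1000):
    """Estimate the Lipschitz constant of loss(., target) on [lo, hi].

    Returns the largest |loss(a) - loss(b)| / |a - b| over adjacent grid
    points. This is a lower bound on the true constant, and it tightens
    as the grid gets finer.
    """
    width = (hi - lo) / steps
    grid = [lo + i * width for i in range(steps + 1)]
    best = 0.0
    for a, b in zip(grid, grid[1:]):
        slope = abs(loss(a, target) - loss(b, target)) / (b - a)
        best = max(best, slope)
    return best


if __name__ == "__main__":
    # Hypothetical setting: ages in [0, 100], worst-case target at one end.
    for name, loss in [("absolute", absolute_error), ("squared", squared_error)]:
        constant = empirical_lipschitz(loss, target=0.0, lo=0.0, hi=100.0)
        largest = max(loss(0.0, 100.0), loss(100.0, 0.0))
        print(f"{name}: Lipschitz estimate {constant:.1f}, maximum value {largest:.1f}")
```

On this assumed range, the absolute error has Lipschitz constant 1 and maximum value 100, while the squared error has Lipschitz constant 2 × 100 = 200 and maximum value 10,000. This is the kind of gap between candidate losses that the abstract's guideline refers to.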