
Present and Future of (L)LMs: considerations on reliability and efficiency

Speaker: 
Federico Errica
Event date: 
Thursday, 3 July 2025 - 14:00
Location: 
B203
Contact: 
moroni@diag.uniroma1.it
Abstract: The paradigm shift brought about by Large Language Models (LLMs) is partly due to reducing time-to-content by orders of magnitude. While we can generally be content with using LLMs for simple, risk-free tasks, at their core LLMs remain autoregressive neural networks: we can neither control their “hallucinations” nor quantify when distribution shift happens. This makes it very hard for software engineers to use LLMs as a practical and reliable tool in their everyday work. Moreover, training LLMs remains an insurmountable task for those with fewer resources, which makes NLP research harder. This is interesting, because there is no guarantee that the current sizes of LLMs are optimal. The cost of model selection is far too high to justify a proper exploration of the architectural space, so there is a tangible possibility that the right size is much smaller than what we are used to, potentially leading to a democratization of (L)LM research.
 
In this talk, we will address two separate topics. First, we will briefly see how to quantitatively measure the “stability” of LLMs under prompt rewritings, so that software engineers can at least select and debug LLMs according to two newly introduced metrics. Then, we will spend more time on a new technique for training neural networks so that their size adapts to the task within a single training run, up to an infinite size. We will show how the neural network “orders information by its importance”, opening a whole new set of considerations when discussing neural network training.
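To make the first topic concrete, here is a minimal sketch of what one such stability measurement could look like: a pairwise agreement rate over semantically equivalent rewritings of the same prompt. The names (query_llm, pairwise_agreement) are hypothetical, and this is not necessarily one of the two metrics introduced in the talk.

    # Hypothetical sketch: quantifying an LLM's "stability" under prompt
    # rewritings as the fraction of prompt pairs that yield the same answer.
    from itertools import combinations
    from typing import Callable, List

    def pairwise_agreement(prompts: List[str], query_llm: Callable[[str], str]) -> float:
        """Fraction of pairs of semantically equivalent prompt rewritings
        for which the model returns the same answer."""
        answers = [query_llm(p) for p in prompts]
        pairs = list(combinations(answers, 2))
        if not pairs:
            return 1.0
        return sum(a == b for a, b in pairs) / len(pairs)

    # Usage with a stubbed model (replace the lambda with a real API call):
    rewritings = [
        "Is this review positive? Answer yes or no: 'Great product.'",
        "Answer yes or no: does this review express a positive opinion? 'Great product.'",
        "'Great product.' -- positive review? Reply with yes or no.",
    ]
    print(pairwise_agreement(rewritings, query_llm=lambda p: "yes"))  # -> 1.0

A score near 1.0 suggests the model is robust to rephrasing on this task; a low score flags a prompt-sensitive model that may need different selection or debugging.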
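For the second topic, the sketch below only illustrates the general intuition of a network whose effective width emerges during training, using a generic per-unit gating mechanism; it is a stand-in for the idea, not the technique presented in the talk, which extends to unbounded width.

    # Hypothetical sketch: each hidden unit has a learned "importance" gate,
    # and an L1-style penalty on the gates during training switches off units
    # the task does not need, implicitly ordering units by importance.
    import torch
    import torch.nn as nn

    class AdaptiveWidthLayer(nn.Module):
        def __init__(self, in_dim: int, max_width: int):
            super().__init__()
            self.linear = nn.Linear(in_dim, max_width)
            # One gate per hidden unit; sigmoid keeps it in (0, 1).
            self.gate_logits = nn.Parameter(torch.zeros(max_width))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            gates = torch.sigmoid(self.gate_logits)
            return torch.relu(self.linear(x)) * gates

        def effective_width(self, threshold: float = 0.5) -> int:
            # Units whose gate stays above the threshold are "active".
            return int((torch.sigmoid(self.gate_logits) > threshold).sum())

    # During training, adding torch.sigmoid(layer.gate_logits).sum() to the
    # loss penalizes unused width, so the effective size adapts to the task.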
Bio: Federico Errica is a Senior Research Scientist at NEC Laboratories Europe, which he joined in 2022. He received his PhD in Computer Science from the University of Pisa in 2022, with a thesis on Bayesian graph machine learning supervised by Davide Bacciu and Alessio Micheli. His current interests revolve around understanding the behavior of deep graph networks and the ability of neural networks to adapt their architecture during training. Federico is part of the Intelligent Software Systems group, which conducts basic research on computational science and AI agents.