The research projects I work on generally aim to build useful artefacts for the community, such as lightweight but powerful models, high-quality datasets, or open recipes. You can find a more exhaustive list on the Hugging Face Science page.
BigCode: We started working on code LLMs while finishing the NLP with Transformers book, which resulted in CodeParrot, a GPT-2-like model trained on GitHub code. When Copilot was released, we decided to scale the project up in a community effort called BigCode to build fully open LLMs for code. With over 1,000 community members we built The Stack v1 and The Stack v2, both terabyte-scale code datasets for pretraining, and trained the fully open models StarCoder and StarCoder2.
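To give a feel for how these artefacts get used, here is a minimal sketch of code completion with a StarCoder2 checkpoint through the standard transformers generation API; the exact checkpoint name (bigcode/starcoder2-3b) and generation settings are assumptions, not a fixed recommendation:

```python
# Sketch: complete a code snippet with a StarCoder2 checkpoint (assumed name).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"  # assumed checkpoint id on the Hub
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Give the model the start of a function and let it continue the code.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```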
SmolLM: The SmolLM model family is a set of models with maximal performance at small size that can run locally or on-device. We have released three generations so far (1, 2, and 3), along with the full training pipeline. The models were also adapted to images and videos with SmolVLM and SmolVLM2, and to robotics with SmolVLA. In collaboration with IBM we built SmolDocling specifically for OCR tasks.
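Running these models locally takes only a few lines with transformers; the sketch below assumes the HuggingFaceTB/SmolLM2-1.7B-Instruct checkpoint and its chat template, but any SmolLM chat checkpoint should work the same way:

```python
# Sketch: run a SmolLM2 instruct checkpoint locally (checkpoint name assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Format a chat message with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Write a haiku about small models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```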
TRL: The Transformer Reinforcement Learning (TRL) library originally started as a reproduction project in 2020 to get myself into NLP and has since become a popular fine-tuning library for transformer models, with 15k GitHub stars and over 1M monthly pip installs. It serves as the foundation for many projects, such as our Zephyr model and the Open-R1 project replicating the DeepSeek-R1 pipeline.
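A typical supervised fine-tuning run with TRL looks roughly like the sketch below; the model and dataset names are placeholders and the exact trainer arguments can differ between TRL versions:

```python
# Sketch: supervised fine-tuning with TRL's SFTTrainer
# (model and dataset ids are placeholders, not a fixed recipe).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any dataset with a conversational or plain-text format works here.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",      # assumed small base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm2-sft"),
)
trainer.train()
```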
FineDatasets: Large, high-quality datasets are the foundation of the success of LLMs; however, they are rarely released these days. Similar to The Stack datasets, we worked on FineWeb, FineWeb-Edu, FineWeb2, FineVideo, and others to enable more people to train great models.
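Because these datasets are terabyte-scale, streaming is usually the easiest way to take a look at them; the sketch below assumes the HuggingFaceFW/fineweb-edu dataset id and its sample-10BT subset:

```python
# Sketch: stream a few FineWeb-Edu documents without downloading the full dataset
# (dataset id and subset name are assumptions).
from datasets import load_dataset

fw = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",
    split="train",
    streaming=True,
)

# Print the start of the first few documents.
for doc in fw.take(3):
    print(doc["text"][:200])
```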
Agents: There is a lot of interest (and hype) around agents, but there are not proportionally many quality resources out there yet. We experiment with what useful agents could do and built, for example, Jupyter Agents and Computer Use Agents as first projects.