Distributed LLM Fine-Tuning & Inference on HPC Systems
The Norwegian Research Infrastructure Services (NRIS) is hosting an in-person, hands-on course in Bergen. Over two days, you will gain practical experience with single-GPU fine-tuning, multi-GPU scaling, and optimized LLM inference on a high-performance computing (HPC) system, building applied skills in optimizing large language models in HPC environments.
Content: In this course, you will learn to:
- Implement parameter-efficient fine-tuning using LoRA and QLoRA (see the brief sketches after this list)
- Configure and launch distributed training workloads across multiple GPUs
- Perform distributed LLM inference
- Monitor and analyze GPU utilization and profile GPU memory
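The announcement does not specify the exact tooling, so the following is only a minimal sketch of the first topic, parameter-efficient fine-tuning with LoRA, using the Hugging Face Transformers and PEFT libraries as one common approach; the model name and hyperparameters are illustrative assumptions, not course material.

```python
# Illustrative sketch only: model name and hyperparameters are assumptions,
# not the course's official material.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.2-1B"  # hypothetical example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# LoRA: freeze the base weights and train small low-rank adapter matrices instead.
# For QLoRA, the base model would additionally be loaded in 4-bit precision
# (e.g. via a bitsandbytes quantization config) before attaching the adapters.
lora_config = LoraConfig(
    r=16,                                  # rank of the adapter matrices
    lora_alpha=32,                         # scaling factor applied to the adapters
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # typically well under 1% of all parameters
```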
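For multi-GPU scaling and memory monitoring, a typical pattern is to launch one process per GPU (for example with torchrun) and wrap the model in PyTorch DistributedDataParallel, then read back PyTorch's CUDA memory statistics per rank. This is a generic sketch, not Olivia-specific configuration.

```python
# Generic sketch: launch with e.g.  torchrun --nproc_per_node=4 ddp_sketch.py
# Exact launch commands and module/environment setup depend on the HPC system.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")        # one process per GPU, NCCL for GPU comms
local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
torch.cuda.set_device(local_rank)

# Small stand-in model; in practice this would be the (PEFT-wrapped) LLM
model = DDP(torch.nn.Linear(1024, 1024).to(local_rank), device_ids=[local_rank])
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device=local_rank)    # dummy batch
loss = model(x).pow(2).mean()
loss.backward()                                # gradients are all-reduced across ranks
optimizer.step()

# Simple memory profiling: peak GPU memory allocated by this rank
peak_gb = torch.cuda.max_memory_allocated(local_rank) / 1e9
print(f"rank {dist.get_rank()}: peak GPU memory {peak_gb:.2f} GB")

dist.destroy_process_group()
```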
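For distributed LLM inference, the announcement does not name an engine; vLLM is one widely used option, and a minimal tensor-parallel sketch with an assumed model and settings might look like this.

```python
# Illustrative sketch: vLLM is an assumption here, not necessarily the course's chosen engine.
from vllm import LLM, SamplingParams

# tensor_parallel_size shards the model's weights across GPUs on the node
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)

prompts = ["Explain what an HPC scheduler does in one sentence."]
outputs = llm.generate(prompts, SamplingParams(temperature=0.7, max_tokens=64))

for out in outputs:
    print(out.outputs[0].text)
```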
HPC System: Olivia Supercomputer
Target audience: This course is ideal for researchers, developers, and students with Python experience who want hands-on skills in scalable LLM training and inference on an HPC system.
Prerequisites:
- Familiarity with machine learning (ML) frameworks (e.g. PyTorch)
- Basic understanding of large language models (LLMs)
Registration: Register here
Practical information: The course is free of charge and has a maximum capacity of 30 participants. Food will be served on both days: some light pastries and coffee/tea.
Instructor: Hicham Agueny
Coordinator: Eirik Skjerve