Tutorial - Megatron-SWIFT and Qwen2.5 Installation
This tutorial is based largely on my experience with my company's servers, so some steps may not apply to your setup. Leave a comment if there is anything you would like to discuss.
Purpose
I am trying to fine-tune a model with the Mixture-of-Experts (MoE) (Sanseviero et al., 2023) methodology. I chose Megatron-LM (Shoeybi et al., 2019) and SWIFT (Zhao et al., 2025) as the framework.
The tutorial I am following is: Megatron-SWIFT Training.
Prerequisites
- Operating System: Linux
- Python should be pre-installed. Check if your OS already has Python.
python3 --version
- If your OS doesn't have Python yet, run the command below to install it (this applies to Ubuntu; if you use a different distro, search for the equivalent steps for your OS).
sudo apt install python3 python3-pip
- Using a virtual environment is good practice for Python; I use `anaconda` for this. Follow this guide to install `anaconda` on Linux.
- If you want to train on GPU, you need to install `cuda`: CUDA Installation Guide for Linux. Recommended version: `12.1.0`.
- With this framework, you also need to install `cuDNN`: Installing cuDNN Backend on Linux. Recommended version: `9`.
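After installing, a quick sanity check helps confirm the CUDA toolkit is actually visible; this is just a convenience check I use, and the `nvcc` location varies by distro:

```shell
# Check whether the CUDA compiler is on PATH (often /usr/local/cuda/bin
# needs to be added to PATH manually after installation)
if command -v nvcc >/dev/null 2>&1; then
  CUDA_OK=yes
  nvcc --version        # should report "release 12.1" for the recommended version
else
  CUDA_OK=no
  echo "nvcc not found -- check that /usr/local/cuda/bin is on your PATH"
fi
echo "cuda_on_path=$CUDA_OK"
```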
Install Megatron-SWIFT
First, we will create a virtual environment with `conda`:
conda create --name <ENV_NAME> python=3.10
conda activate <ENV_NAME>
Then we will install `torch` and `torchvision` first (note the pip package is named `torch`, not `pytorch`):
pip install torch==2.3.0 torchvision==0.18.0
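Once the install finishes, a quick import check inside the activated environment confirms that PyTorch works and can see the GPU (an optional check, not part of the original setup):

```shell
# Optional sanity check: does torch import, and can it see a GPU?
if python3 -c 'import torch; print(torch.__version__, torch.cuda.is_available())'; then
  TORCH_OK=yes
else
  TORCH_OK=no
  echo "torch did not import -- is the conda env activated?"
fi
echo "torch_importable=$TORCH_OK"
```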
Next we need to install `apex`, `transformer-engine`, and `ms-swift`.
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
pip install transformer-engine
pip install ms-swift
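To confirm `ms-swift` installed correctly, you can try importing it; the pip package `ms-swift` imports as `swift` in Python (my understanding of the distribution, so adjust if your version differs):

```shell
# ms-swift installs a Python package that imports as `swift`
if python3 -c 'import swift; print(swift.__version__)' 2>/dev/null; then
  SWIFT_OK=yes
else
  SWIFT_OK=no
  echo "could not import swift -- is the conda env activated?"
fi
echo "ms_swift_importable=$SWIFT_OK"
```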
Download Qwen2.5
Install git lfs
Check availability of `git` and `git lfs`.
git --version
git lfs version
If your environment doesn't have `git-lfs` yet, you need to install it:
conda install conda-forge::git-lfs
Clone model repo
Qwen2.5 (Team, 2024) is the model I use to train. First, visit Hugging Face to create an account. Then go to Profile > Access Tokens > Create new token and choose the Write token type. Remember to copy the token.
Return to our activated conda environment and install `huggingface_hub`:
pip install huggingface_hub
Then log in with your `huggingface` token:
huggingface-cli login --token <your-token>
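You can verify the token was accepted with `huggingface-cli whoami`, which prints your username when you are logged in:

```shell
# Should print your Hugging Face username if the login succeeded
if huggingface-cli whoami 2>/dev/null; then
  HF_OK=yes
else
  HF_OK=no
  echo "not logged in (or huggingface-cli is not on PATH)"
fi
echo "hf_logged_in=$HF_OK"
```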
Finally, we can clone the model repo to our folder. Example: Qwen2.5-7B-Instruct
cd <model folder>
git lfs install
git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
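A clone can silently leave tiny LFS pointer files behind instead of the real weights, so it is worth checking the folder size afterwards (for a 7B model the weights are on the order of 15 GB; the exact size may differ):

```shell
# Verify the weights actually downloaded: real safetensors shards are
# several GB each, while LFS pointer files are only ~130 bytes
MODEL_DIR=Qwen2.5-7B-Instruct
if [ -d "$MODEL_DIR" ]; then
  du -sh "$MODEL_DIR"
else
  echo "run this inside your model folder after cloning"
fi
```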
Test
Create a `test.sh` file to run the test.
CUDA_VISIBLE_DEVICES=0 \
swift export \
--model <model_dir>/Qwen2.5-7B-Instruct \
--to_mcore true \
--torch_dtype bfloat16 \
--output_dir Qwen2.5-7B-Instruct-mcore
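One way to create and check the script is with a heredoc (this just wraps the snippet above; replace `<model_dir>` with your actual path before running it):

```shell
# Write test.sh and check its syntax, then run it manually once paths are set
cat > test.sh <<'EOF'
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --model <model_dir>/Qwen2.5-7B-Instruct \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --output_dir Qwen2.5-7B-Instruct-mcore
EOF
bash -n test.sh && echo "test.sh syntax OK"
# then: bash test.sh
```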
I wrote this tutorial after finishing the setup, so I may have forgotten some version incompatibilities or tweak steps. Leave a comment if you can't follow the tutorial.
References
- Sanseviero et al. Mixture of Experts Explained. Sep 2023.
- Shoeybi et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv preprint arXiv:1909.08053, Sep 2019.
- Zhao et al. SWIFT: A Scalable Lightweight Infrastructure for Fine-Tuning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.
- Qwen Team. Qwen2.5: A Party of Foundation Models. Sep 2024.