Tutorial - Megatron-SWIFT and Qwen2.5 Installation

This tutorial is based largely on my experience with my company’s servers, so some steps may not apply to your setup. Leave a comment if there is anything you want to discuss.

Purpose

I am trying to fine-tune a model with the Mixture-of-Experts (MoE) (Sanseviero et al., 2023) methodology. I chose Megatron-LM (Shoeybi et al., 2019) and SWIFT (Zhao et al., 2025) as the framework.

The tutorial I am following is: Megatron-SWIFT Training.

Prerequisites

  • Operating System: Linux
  • Python should be pre-installed. Check if your OS already has Python.
    python3 --version
    
  • If your OS doesn’t have Python yet, run the command below to install it (this applies to Ubuntu; if you use a different distro, look up the instructions for your OS).
    sudo apt install python3 python3-pip
    
  • Using a virtual environment is good practice for Python; I use Anaconda for this. Follow this guide to install Anaconda on Linux.
  • If you want to train on a GPU, you need to install CUDA: CUDA Installation Guide for Linux. Recommended version: 12.1.0.
  • With this framework, you also need to install cuDNN: Installing cuDNN Backend on Linux. Recommended version: 9.
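Before moving on, it can save time to confirm the toolchain is actually visible. A quick, non-fatal check (nvcc and nvidia-smi only appear after a CUDA install):

```shell
# Report which required tools are on PATH; prints a hint instead of failing.
for tool in python3 nvcc nvidia-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: NOT found - install it before continuing"
    fi
done
```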

Install Megatron-SWIFT

First, we will create a virtual environment with conda:

conda create --name <ENV_NAME> python=3.10
conda activate <ENV_NAME>

Then we install PyTorch and torchvision:

pip install torch==2.3.0 torchvision==0.18.0
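A quick sanity check that the install worked and that PyTorch can see your GPU. The helper function below is just for illustration, not part of any package:

```python
def check_torch():
    """Return a one-line report of the installed torch version and CUDA visibility."""
    try:
        import torch
        return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"
    except ImportError:
        return "torch is not installed in this environment"

print(check_torch())
```

If CUDA shows as unavailable here, fix the CUDA/driver setup before installing apex, since apex builds its CUDA extensions against the local toolkit.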

Next we need to install apex, transformer-engine, and ms-swift.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
pip install transformer-engine
pip install ms-swift
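To confirm the packages installed cleanly, you can query pip. A small sketch, using the package names from the commands above:

```shell
# Print the installed version of each package, or a warning if it is missing.
for pkg in ms-swift transformer-engine; do
    if pip show "$pkg" >/dev/null 2>&1; then
        pip show "$pkg" | grep -E '^(Name|Version):'
    else
        echo "$pkg: NOT installed"
    fi
done
```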

Download Qwen2.5

Install git lfs

Check availability of git and git lfs.

git --version
git lfs version

If your environment does not have git-lfs yet, install it:

conda install conda-forge::git-lfs

Clone model repo

Qwen2.5 (Qwen Team, 2024) is the model I train. First, visit Hugging Face and create an account. Then go to Profile > Access Tokens > Create new token, choose Write as the token type, and remember to copy the token.

Return to our activated conda environment. Install huggingface_hub:

pip install huggingface_hub

Then log in with your Hugging Face token:

huggingface-cli login --token <your-token>
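You can verify the login took effect with `huggingface-cli whoami` (part of huggingface_hub); the guard below just keeps the check non-fatal if the CLI isn’t installed yet:

```shell
# Print the logged-in username, or a hint if login/install hasn't happened.
if command -v huggingface-cli >/dev/null 2>&1; then
    huggingface-cli whoami || echo "not logged in yet - rerun the login command"
else
    echo "huggingface-cli not found - did pip install huggingface_hub succeed?"
fi
```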

Finally, clone the model repo into your model folder. Example: Qwen2.5-7B-Instruct.

cd <model folder>
git lfs install
git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
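If git-lfs wasn’t active, a clone can silently leave ~130-byte LFS pointer stubs instead of real weights, so it’s worth checking the size on disk. A defensive sketch (the ~15 GB figure assumes the 7B bf16 checkpoint):

```shell
# Sanity-check the download; run from the parent model folder.
MODEL_DIR="Qwen2.5-7B-Instruct"
if [ -d "$MODEL_DIR" ]; then
    du -sh "$MODEL_DIR"                  # a 7B bf16 checkpoint is roughly 15 GB
    ls -lh "$MODEL_DIR"/*.safetensors    # shards should be GB-sized, not tiny LFS pointers
else
    echo "$MODEL_DIR not found - run the git clone step first"
fi
```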

Test

Create a test.sh file to run a quick conversion test:

CUDA_VISIBLE_DEVICES=0 \
swift export \
    --model <model_dir>/Qwen2.5-7B-Instruct \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --output_dir Qwen2.5-7B-Instruct-mcore
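One way to write the script above to disk and prepare it to run (a sketch; `<model_dir>` must still be replaced with your actual path):

```shell
# Create test.sh with the export command and make it executable.
cat > test.sh <<'EOF'
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --model <model_dir>/Qwen2.5-7B-Instruct \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --output_dir Qwen2.5-7B-Instruct-mcore
EOF
chmod +x test.sh
# Then run: bash test.sh
```

On success, the converted checkpoint appears under Qwen2.5-7B-Instruct-mcore.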

Since I wrote this tutorial after finishing the setup, I may have forgotten some version incompatibilities or tweak steps. Please comment if you can’t follow the tutorial.

References

  1. Mixture of Experts Explained
    Omar Sanseviero, Lewis Tunstall, Philipp Schmid, and 3 more authors
    Sep 2023
  2. Megatron-LM: Training multi-billion parameter language models using model parallelism
    Mohammad Shoeybi, Mostofa Patwary, Raul Puri, and 3 more authors
    arXiv preprint arXiv:1909.08053, Sep 2019
  3. SWIFT: A scalable lightweight infrastructure for fine-tuning
    Yuze Zhao, Jintao Huang, Jinghan Hu, and 8 more authors
    In Proceedings of the AAAI Conference on Artificial Intelligence, Sep 2025
  4. Qwen2.5: A Party of Foundation Models
    Qwen Team
    Sep 2024


