Unsloth: Fast Fine-Tuning with Memory Optimization

Unsloth is a library for fine-tuning LLMs with ~2x faster training and ~60% less VRAM usage. It is particularly useful when GPU memory is limited or when speed is critical.

GitHub: unslothai/unsloth
Docs: unsloth.ai/docs

When to Use Unsloth

Use Unsloth when explicitly instructed to, or when one of the following use cases applies:

| Use Case | Recommendation |
|---|---|
| Standard text LLM fine-tuning | TRL is sufficient, but Unsloth is faster |
| Limited GPU memory | Use Unsloth - 60% less VRAM |
| Need maximum speed | Use Unsloth - 2x faster |
| Large models (>13B) | Use Unsloth - memory efficiency critical |
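
As a rough illustration of why memory becomes the limiting factor for larger models, the sketch below estimates the memory taken by the model weights alone at different precisions. This is back-of-the-envelope arithmetic, not an Unsloth API: `weight_memory_gb` is a hypothetical helper, and the estimate ignores activations, gradients, optimizer state, and quantization overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Compare a 13B-parameter model at common precisions (weights only).
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"13B params at {label}: ~{weight_memory_gb(13e9, bits):.1f} GB")
```

Actual training memory is higher than these figures, since they cover only the stored weights.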

Supported Models

Unsloth supports many popular models including:

Text LLMs: Llama 3/3.1/3.2/3.3, Qwen 2.5/3, Mistral, Phi-4, Gemma 2/3, LFM2/2.5
Vision LLMs: Qwen3-VL, Gemma 3, Llama 3.2 Vision, Pixtral

Use Unsloth's pre-optimized model variants when available:

# Unsloth-optimized models load faster and use less memory (pick one)
model_id = "unsloth/LFM2.5-1.2B-Instruct"      # Text model
model_id = "unsloth/gemma-3-4b-pt"            # Vision model
model_id = "unsloth/Qwen3-VL-8B-Instruct"     # Vision model

Installation

# /// script
# dependencies = [
#     "unsloth",
#     "trl",
#     "datasets",
#     "trackio",
# ]
# ///

Basic Usage: Text LLM

from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2.5-1.2B-Instruct",
    max_seq_length=4096,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "in_proj", "w1", "w2", "w3"],
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Load dataset
dataset = load_dataset("trl-lib/Capybara", split="train")

# Train with TRL
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # replaces the older `tokenizer=` argument in recent TRL releases
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="./output",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=500,
        learning_rate=2e-4,
        report_to="trackio",
    ),
)

trainer.train()
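
To make the `r` and `lora_alpha` settings above concrete, the following sketch works through standard LoRA arithmetic: the adapter update is scaled by `lora_alpha / r`, and each adapted linear layer gains `r * (in_features + out_features)` trainable parameters. The helper names and the 2048x2048 layer shape are illustrative assumptions, not values read from LFM2.5 or from Unsloth.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (r, in_features) and B has shape (out_features, r).
    """
    return r * (in_features + out_features)


# Settings from the text example: r=16, lora_alpha=16
print("scaling:", lora_scaling(r=16, lora_alpha=16))

# Hypothetical 2048x2048 projection layer (assumed shape, for illustration only)
added = lora_param_count(2048, 2048, r=16)
full = 2048 * 2048
print(f"LoRA adds {added} params vs {full} in the frozen layer "
      f"({100 * added / full:.2f}%)")
```

With `lora_alpha` equal to `r`, the update is applied at scale 1.0; the VLM example later in this document uses `lora_alpha=32` with `r=16`, which doubles that scale.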

LFM2.5 Specific Settings

For LFM2.5 inference, use these recommended generation parameters:

Instruct models:

temperature = 0.1
top_k = 50
top_p = 0.1
repetition_penalty = 1.05

Thinking models:

temperature = 0.05
top_k = 50
repetition_penalty = 1.05
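
One way to apply these values is to keep them as keyword-argument dictionaries and pass them to `model.generate`. The sketch below assumes the Hugging Face `generate` sampling arguments (`do_sample`, `temperature`, `top_k`, `top_p`, `repetition_penalty`); the dictionary and helper names are hypothetical, and `do_sample=True` is an addition that is not part of the recommendations above.

```python
# Values copied from the recommendations above.
# do_sample=True is an addition: Hugging Face `generate` only applies
# temperature/top_k/top_p when sampling is enabled.
INSTRUCT_SAMPLING = {
    "do_sample": True,
    "temperature": 0.1,
    "top_k": 50,
    "top_p": 0.1,
    "repetition_penalty": 1.05,
}

THINKING_SAMPLING = {
    "do_sample": True,
    "temperature": 0.05,
    "top_k": 50,
    "repetition_penalty": 1.05,
}


def sampling_kwargs(model_kind: str) -> dict:
    """Return a copy of the sampling settings for 'instruct' or 'thinking'."""
    presets = {"instruct": INSTRUCT_SAMPLING, "thinking": THINKING_SAMPLING}
    if model_kind not in presets:
        raise ValueError(f"unknown model kind: {model_kind!r}")
    return dict(presets[model_kind])


# Usage sketch (assumes `model` and tokenized `inputs` already exist):
# outputs = model.generate(**inputs, max_new_tokens=256, **sampling_kwargs("instruct"))
```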

Vision-Language Models (VLMs)

Unsloth provides specialized support for VLMs with FastVisionModel:

from unsloth import FastVisionModel, get_chat_template
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load VLM with Unsloth
model, processor = FastVisionModel.from_pretrained(
    "unsloth/gemma-3-4b-pt",  # or "unsloth/Qwen3-VL-8B-Instruct"
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Add LoRA for all modalities
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,      # Train vision encoder
    finetune_language_layers=True,    # Train language model
    finetune_attention_modules=True,  # Train attention
    finetune_mlp_modules=True,        # Train MLPs
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
)

# Apply chat template (required for base models)
processor = get_chat_template(processor, "gemma-3")

# Load VLM dataset (with images and messages)
dataset = load_dataset("your-vlm-dataset", split="train", streaming=True)

# Enable training mode
FastVisionModel.for_training(model)

# Train with VLM-specific collator
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=processor.tokenizer,
    data_collator=UnslothVisionDataCollator(model, processor),
    args=SFTConfig(
        output_dir="./vlm-output",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=500,
        learning_rate=2e-4,
        # VLM-specific settings
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        report_to="trackio",
    ),
)

trainer.train()

Key Differences from Standard TRL

| Aspect | Standard TRL | Unsloth |
|---|---|---|
| Model loading | `AutoModelForCausalLM.from_pretrained()` | `FastLanguageModel.from_pretrained()` |
| LoRA setup | `LoraConfig` + `get_peft_model()` | `FastLanguageModel.get_peft_model()` |
| VLM loading | Limited support | `FastVisionModel.from_pretrained()` |
| VLM collator | Manual | `UnslothVisionDataCollator` |
| Memory usage | Standard | ~60% less |
| Training speed | Standard | ~2x faster |

VLM Dataset Format

VLM datasets should have:

images: List of PIL images or image paths
messages: Conversation format with image references

{
    "images": [<PIL.Image>, ...],
    "messages": [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image"}
        ]},
        {"role": "assistant", "content": "This image shows..."}
    ]
}
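
A small sketch of how a record in this format might be built and sanity-checked before training. Both helpers are hypothetical and not part of Unsloth or TRL. The check that the number of `{"type": "image"}` placeholders equals the number of entries in `images` is an assumption about what a well-formed sample looks like, not a documented requirement.

```python
def make_vlm_sample(image, question: str, answer: str) -> dict:
    """Wrap one image (PIL image or path) and a Q/A pair in the format above."""
    return {
        "images": [image],
        "messages": [
            {"role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ]},
            {"role": "assistant", "content": answer},
        ],
    }


def count_image_placeholders(messages: list) -> int:
    """Count {"type": "image"} parts across all messages with list content."""
    count = 0
    for message in messages:
        content = message["content"]
        if isinstance(content, list):
            count += sum(1 for part in content if part.get("type") == "image")
    return count


def check_vlm_sample(sample: dict) -> None:
    """Raise ValueError if the sample does not match the expected layout."""
    if not isinstance(sample.get("images"), list):
        raise ValueError("'images' must be a list")
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("'messages' must be a non-empty list")
    # Assumption: one placeholder per image.
    if count_image_placeholders(messages) != len(sample["images"]):
        raise ValueError("image placeholders do not match number of images")


sample = make_vlm_sample("photo.jpg", "Describe this image", "This image shows...")
check_vlm_sample(sample)  # passes silently for a well-formed sample
```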

Streaming Datasets

For large VLM datasets, use streaming to avoid disk space issues:

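from datasets import load_dataset
from trl import SFTConfig

dataset = load_dataset(
    "your-vlm-dataset",
    split="train",
    streaming=True,  # Stream from Hub
)

# A streamed dataset has no known length, so set max_steps
# instead of relying on epoch-based training
args = SFTConfig(
    output_dir="./vlm-output",
    max_steps=500,
    # ... remaining settings
)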
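
Because a streamed dataset has no length, `max_steps` together with the batch settings determines how many samples a run consumes. The sketch below does that arithmetic for the values used in this document; `samples_consumed` is a hypothetical helper, and a single GPU is assumed unless `num_devices` is changed.

```python
def samples_consumed(
    max_steps: int,
    per_device_train_batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int = 1,  # assumption: single-GPU run
) -> int:
    """Number of training samples read over the whole run."""
    effective_batch = (
        per_device_train_batch_size * gradient_accumulation_steps * num_devices
    )
    return max_steps * effective_batch


# Settings from the examples in this document
total = samples_consumed(
    max_steps=500,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
)
print(f"500 optimizer steps read {total} samples from the stream")
```

If the stream holds fewer samples than this, the run may end early or repeat data depending on how the dataset is configured, so it is worth comparing the figure against the dataset size when it is known.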

Saving Models

Save LoRA Adapter

# `processor` is the VLM processor; for text models, save `tokenizer` instead
model.save_pretrained("./adapter")
processor.save_pretrained("./adapter")

# Push to Hub
model.push_to_hub("username/my-vlm-adapter")
processor.push_to_hub("username/my-vlm-adapter")

Merge and Save Full Model

# Merge LoRA weights into base model
model = model.merge_and_unload()

# Save merged model
model.save_pretrained("./merged")
tokenizer.save_pretrained("./merged")
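
Conceptually, merging folds the low-rank update into the frozen weight: in standard LoRA, `W_merged = W + (lora_alpha / r) * B @ A`. The pure-Python sketch below checks on a tiny example that the merged weight gives the same output as the base weight plus the adapter path. It illustrates the arithmetic only and is not how PEFT or Unsloth implement merging internally; all names and matrix values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy shapes: W is (out=2, in=3), A is (r=1, in=3), B is (out=2, r=1)
W = [[1, 0, 2], [0, 1, 1]]
A = [[1, 2, 3]]
B = [[1], [2]]
r, lora_alpha = 1, 2
x = [[1], [1], [1]]  # column vector input

# Adapter path kept separate: W x + scale * B (A x)
scale = lora_alpha / r
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
separate = [[b[0] + scale * a[0]] for b, a in zip(base_out, adapter_out)]

# Adapter folded into the weight: W_merged x
merged = matmul(merge_lora(W, A, B, r, lora_alpha), x)

assert merged == separate
```

After merging, inference no longer needs the adapter weights, which is why the merged model can be saved and loaded like an ordinary checkpoint.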

Convert to GGUF

Unsloth models can be converted to GGUF for llama.cpp/Ollama:

# Save in 16-bit for GGUF conversion
model.save_pretrained_gguf("./gguf", tokenizer, quantization_method="f16")

# Or directly quantize
model.save_pretrained_gguf("./gguf", tokenizer, quantization_method="q4_k_m")

Qwen3-VL Specific Settings

For Qwen3-VL models, use these recommended settings: