# Unsloth: Fast Fine-Tuning with Memory Optimization

**Unsloth** is a fine-tuning library that trains LLMs roughly 2x faster and with roughly 60% less VRAM than standard TRL training. It is particularly useful when GPU memory is limited or training speed is critical.

- **GitHub**: [unslothai/unsloth](https://github.com/unslothai/unsloth)
- **Docs**: [unsloth.ai/docs](https://unsloth.ai/docs)

## When to Use Unsloth

Use Unsloth when instructed to do so, or when one of the following use cases applies:

| Use Case | Recommendation |
|----------|----------------|
| Standard text LLM fine-tuning | TRL is sufficient, but Unsloth is faster |
| Limited GPU memory | **Use Unsloth** - 60% less VRAM |
| Need maximum speed | **Use Unsloth** - 2x faster |
| Large models (>13B) | **Use Unsloth** - memory efficiency critical |

## Supported Models

Unsloth supports many popular models, including:
- **Text LLMs**: Llama 3/3.1/3.2/3.3, Qwen 2.5/3, Mistral, Phi-4, Gemma 2/3, LFM2/2.5
- **Vision LLMs**: Qwen3-VL, Gemma 3, Llama 3.2 Vision, Pixtral

Use Unsloth's pre-optimized model variants when available:
```python
# Unsloth-optimized variants load faster and use less memory. Pick one:
model_id = "unsloth/LFM2.5-1.2B-Instruct"      # Text LLM
model_id = "unsloth/gemma-3-4b-pt"            # Vision model
model_id = "unsloth/Qwen3-VL-8B-Instruct"     # Vision model
```

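## Installation

Declare the dependencies as inline script metadata (PEP 723) at the top of the training script so that `uv run` installs them automatically: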

```python
# /// script
# dependencies = [
#     "unsloth",
#     "trl",
#     "datasets",
#     "trackio",
# ]
# ///
```

## Basic Usage: Text LLM

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2.5-1.2B-Instruct",
    max_seq_length=4096,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "in_proj", "w1", "w2", "w3"],
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Load dataset
dataset = load_dataset("trl-lib/Capybara", split="train")

# Train with TRL
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="./output",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=500,
        learning_rate=2e-4,
        report_to="trackio",
    ),
)

trainer.train()
```
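
To make the LoRA hyperparameters in the example above concrete, here is a minimal sketch of the arithmetic behind `r` and `lora_alpha`. It assumes the standard LoRA formulation, in which the low-rank update is scaled by `lora_alpha / r`. The 2048x2048 layer shape and both helper names are hypothetical and are not the real LFM2.5 dimensions.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Settings from the example above.
scaling = lora_scaling(r=16, lora_alpha=16)

# Hypothetical layer shape, chosen only for illustration.
added = lora_trainable_params(d_in=2048, d_out=2048, r=16)
frozen = 2048 * 2048

print(f"scaling = {scaling}")                               # scaling = 1.0
print(f"adapter params per layer = {added}")                # adapter params per layer = 65536
print(f"fraction of frozen layer = {added / frozen:.6f}")   # fraction of frozen layer = 0.015625
```

With `r=16` the adapter adds under 2% of the parameters of a square layer of this size, which is why LoRA training fits in far less memory than full fine-tuning.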

## LFM2.5 Specific Settings

For LFM2.5 inference, use these recommended generation parameters:

**Instruct models:**
```python
temperature = 0.1
top_k = 50
top_p = 0.1
repetition_penalty = 1.05
```

**Thinking models:**
```python
temperature = 0.05
top_k = 50
repetition_penalty = 1.05
```
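
One way to apply these values is to collect them into keyword arguments for a Hugging Face `generate` call. The helper below is a hypothetical convenience, not part of Unsloth. It assumes sampling is enabled with `do_sample=True`, since greedy decoding does not use temperature, `top_k`, or `top_p`.

```python
def lfm25_generation_kwargs(thinking: bool = False) -> dict:
    """Return the recommended LFM2.5 sampling settings as generate() kwargs."""
    kwargs = {
        "do_sample": True,  # assumption: sampling must be on for these to apply
        "top_k": 50,
        "repetition_penalty": 1.05,
    }
    if thinking:
        kwargs["temperature"] = 0.05
    else:
        kwargs["temperature"] = 0.1
        kwargs["top_p"] = 0.1
    return kwargs


instruct_kwargs = lfm25_generation_kwargs(thinking=False)
thinking_kwargs = lfm25_generation_kwargs(thinking=True)
print(instruct_kwargs)
print(thinking_kwargs)

# Typical use with a loaded model and tokenized inputs (not executed here):
# outputs = model.generate(**inputs, max_new_tokens=256, **instruct_kwargs)
```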

## Vision-Language Models (VLMs)

Unsloth provides specialized support for VLMs with `FastVisionModel`:

```python
from unsloth import FastVisionModel, get_chat_template
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load VLM with Unsloth
model, processor = FastVisionModel.from_pretrained(
    "unsloth/gemma-3-4b-pt",  # or "unsloth/Qwen3-VL-8B-Instruct"
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Add LoRA for all modalities
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,      # Train vision encoder
    finetune_language_layers=True,    # Train language model
    finetune_attention_modules=True,  # Train attention
    finetune_mlp_modules=True,        # Train MLPs
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
)

# Apply chat template (required for base models)
processor = get_chat_template(processor, "gemma-3")

# Load VLM dataset (with images and messages)
dataset = load_dataset("your-vlm-dataset", split="train", streaming=True)

# Enable training mode
FastVisionModel.for_training(model)

# Train with VLM-specific collator
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=processor.tokenizer,
    data_collator=UnslothVisionDataCollator(model, processor),
    args=SFTConfig(
        output_dir="./vlm-output",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=500,
        learning_rate=2e-4,
        # VLM-specific settings
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        report_to="trackio",
    ),
)

trainer.train()
```

## Key Differences from Standard TRL

| Aspect | Standard TRL | Unsloth |
|--------|--------------|---------|
| Model loading | `AutoModelForCausalLM.from_pretrained()` | `FastLanguageModel.from_pretrained()` |
| LoRA setup | `PeftModel` / `LoraConfig` | `FastLanguageModel.get_peft_model()` |
| VLM loading | Limited support | `FastVisionModel.from_pretrained()` |
| VLM collator | Manual | `UnslothVisionDataCollator` |
| Memory usage | Standard | ~60% less |
| Training speed | Standard | ~2x faster |

## VLM Dataset Format

VLM datasets should have:
- `images`: List of PIL images or image paths
- `messages`: Conversation format with image references

```python
{
    "images": [<PIL.Image>, ...],
    "messages": [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image"}
        ]},
        {"role": "assistant", "content": "This image shows..."}
    ]
}
```
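
Before training, it can help to sanity-check a few samples against this layout. The validator below is a hypothetical helper. Its rule that the number of `{"type": "image"}` placeholders should equal the number of entries in `images` is an assumption about how images are paired with text, so adjust it if your dataset differs.

```python
def validate_vlm_sample(sample: dict) -> list[str]:
    """Return a list of problems found in one sample (empty if it looks fine)."""
    problems = []
    images = sample.get("images")
    messages = sample.get("messages")

    if not isinstance(images, list):
        problems.append("'images' must be a list")
        images = []
    if not isinstance(messages, list) or not messages:
        problems.append("'messages' must be a non-empty list")
        messages = []

    placeholders = 0
    for i, message in enumerate(messages):
        if message.get("role") not in {"system", "user", "assistant"}:
            problems.append(f"message {i}: unexpected role {message.get('role')!r}")
        content = message.get("content")
        if isinstance(content, list):
            # Structured content: count the image placeholders.
            placeholders += sum(1 for part in content if part.get("type") == "image")
        elif not isinstance(content, str):
            problems.append(f"message {i}: content must be a string or a list")

    # Assumption: one placeholder per image.
    if placeholders != len(images):
        problems.append(f"{placeholders} image placeholder(s) but {len(images)} image(s)")
    return problems


sample = {
    "images": ["cat.png"],  # an image path stands in for a PIL image here
    "messages": [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image"},
        ]},
        {"role": "assistant", "content": "This image shows..."},
    ],
}
print(validate_vlm_sample(sample))  # []
```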

## Streaming Datasets

For large VLM datasets, use streaming to avoid disk space issues:

```python
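from datasets import load_dataset
from trl import SFTConfig

dataset = load_dataset(
    "your-vlm-dataset",
    split="train",
    streaming=True,  # Stream from Hub
)

# Streamed datasets have no known length, so set max_steps instead of epochs
args = SFTConfig(output_dir="./vlm-output", max_steps=500)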
```
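
Because a streamed dataset has no known length, `max_steps` has to be derived from the number of examples you want the run to see. The sketch below shows that arithmetic, assuming every optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` examples per GPU. The helper name and the 4,000-example target are illustrative.

```python
import math


def steps_for_examples(target_examples: int, per_device_batch_size: int,
                       gradient_accumulation_steps: int, num_gpus: int = 1) -> int:
    """Optimizer steps needed to consume at least `target_examples` examples."""
    examples_per_step = per_device_batch_size * gradient_accumulation_steps * num_gpus
    return math.ceil(target_examples / examples_per_step)


# Batch settings from the training examples in this document.
max_steps = steps_for_examples(4000, per_device_batch_size=2, gradient_accumulation_steps=4)
print(max_steps)  # 500
```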

## Saving Models

### Save LoRA Adapter

```python
# For text models, save the tokenizer instead of the processor
model.save_pretrained("./adapter")
processor.save_pretrained("./adapter")

# Push to Hub
model.push_to_hub("username/my-vlm-adapter")
processor.push_to_hub("username/my-vlm-adapter")
```

### Merge and Save Full Model

```python
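# Merge LoRA weights into base model
# (with a 4-bit base model, Unsloth's
# model.save_pretrained_merged("./merged", tokenizer, save_method="merged_16bit")
# is an alternative that writes merged 16-bit weights directly)
model = model.merge_and_unload()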

# Save merged model
model.save_pretrained("./merged")
tokenizer.save_pretrained("./merged")
```

### Convert to GGUF

Unsloth models can be converted to GGUF for llama.cpp/Ollama:

```python
# Save in 16-bit for GGUF conversion
model.save_pretrained_gguf("./gguf", tokenizer, quantization_method="f16")

# Or directly quantize
model.save_pretrained_gguf("./gguf", tokenizer, quantization_method="q4_k_m")
```

## Qwen3-VL Specific Settings

For Qwen3-VL inference, use these recommended generation parameters:

**Instruct models:**
```python
temperature = 0.7
top_p = 0.8
presence_penalty = 1.5
```

**Thinking models:**
```python
temperature = 1.0
top_p = 0.95
presence_penalty = 0.0
```
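
`presence_penalty` is a sampling parameter in OpenAI-compatible chat completion APIs, so one way to apply these settings is when the model is served behind such an endpoint. The sketch below only builds the request body and sends nothing. The helper and the served model name are hypothetical.

```python
import json


def qwen3_vl_request(prompt: str, thinking: bool = False,
                     model: str = "Qwen3-VL-8B-Instruct") -> dict:
    """Build an OpenAI-style chat completion body with the recommended settings."""
    if thinking:
        sampling = {"temperature": 1.0, "top_p": 0.95, "presence_penalty": 0.0}
    else:
        sampling = {"temperature": 0.7, "top_p": 0.8, "presence_penalty": 1.5}
    return {
        "model": model,  # hypothetical served model name
        "messages": [{"role": "user", "content": prompt}],
        **sampling,
    }


body = qwen3_vl_request("Describe the attached chart.")
print(json.dumps(body, indent=2))
```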

## Hardware Requirements

| Model | Min VRAM (Unsloth 4-bit) | Recommended GPU |
|-------|--------------------------|-----------------|
| 2B-4B | 8GB | T4, L4 |
| 7B-8B | 16GB | A10G, L4x4 |
| 13B | 24GB | A10G-large |
| 30B+ | 48GB+ | A100 |
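
The table can be turned into a small lookup for scripts that need to pick a GPU size. This is a sketch under stated assumptions: the helper is hypothetical, and model sizes that fall between the table's rows (for example 5B or 20B) are rounded up to the next tier, which the table itself does not specify.

```python
# (upper bound in billions of parameters, minimum VRAM in GB) from the table above.
VRAM_TIERS = [(4, 8), (8, 16), (13, 24)]
LARGE_MODEL_MIN_VRAM_GB = 48  # the "30B+" row, used here as the fallback tier


def min_vram_gb(params_billion: float) -> int:
    """Return the minimum VRAM of the first table tier that covers the model size."""
    for upper_bound, vram in VRAM_TIERS:
        if params_billion <= upper_bound:
            return vram
    return LARGE_MODEL_MIN_VRAM_GB


for size in (1.2, 8, 13, 32):
    print(f"{size}B -> at least {min_vram_gb(size)} GB")
```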

## Example: Full VLM Training Script

See `scripts/unsloth_sft_example.py` for a complete production-ready example that includes:
- Unsloth VLM setup
- Streaming dataset support
- Trackio monitoring
- Hub push
- CLI arguments

Run locally:
```bash
uv run scripts/unsloth_sft_example.py \
    --dataset trl-lib/Capybara \
    --max-steps 500 \
    --output-repo username/my-model
```

Run on HF Jobs:
```python
hf_jobs("uv", {
    "script": "<script content>",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

## See Also

- `scripts/unsloth_sft_example.py` - Complete Unsloth training example
- [Unsloth Documentation](https://unsloth.ai/docs)
- [LFM2.5 Guide](https://unsloth.ai/docs/models/tutorials/lfm2.5)
- [Qwen3-VL Guide](https://unsloth.ai/docs/models/qwen3-vl-how-to-run-and-fine-tune)
- [Unsloth GitHub](https://github.com/unslothai/unsloth)