Usage Examples
This document provides practical examples for running evaluations locally against Hugging Face Hub models.
What this skill covers
- inspect-ai local runs
- inspect-ai with vllm or Transformers backends
- lighteval local runs with vllm or accelerate
- smoke tests and backend fallback patterns
What this skill does NOT cover
- model-index
- .eval_results
- community eval publication workflows
- model-card PR creation
- Hugging Face Jobs orchestration
If you want to run these same scripts remotely, use the hugging-face-jobs skill and pass one of the scripts in scripts/.
Setup
cd skills/hugging-face-evaluation
export HF_TOKEN=hf_xxx
uv --version
For local GPU runs:
nvidia-smi
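The setup checks above can be bundled into one preflight function. This is a minimal sketch and not part of the skill's scripts: the name `preflight` is hypothetical, and it only checks that `HF_TOKEN` is non-empty and that each command named on its argument list is on `PATH`.

```shell
# Hypothetical helper: preflight CMD...
# Prints one line per problem found and returns the number of problems.
preflight() {
  pf_problems=0
  # Setup exports HF_TOKEN, so treat an empty or unset value as a problem.
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set"
    pf_problems=$((pf_problems + 1))
  fi
  # Check that every requested command can be found on PATH.
  for pf_cmd in "$@"; do
    if ! command -v "$pf_cmd" >/dev/null 2>&1; then
      echo "missing command: $pf_cmd"
      pf_problems=$((pf_problems + 1))
    fi
  done
  return "$pf_problems"
}

# Example calls (commented out so that sourcing this file runs nothing):
# preflight uv               # CPU-only run
# preflight uv nvidia-smi    # local GPU run
```

Passing the command names as arguments keeps the GPU check optional, since `nvidia-smi` only matters for local GPU runs.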
inspect-ai examples
Quick smoke test
uv run scripts/inspect_eval_uv.py \
--model meta-llama/Llama-3.2-1B \
--task mmlu \
--limit 10
Local GPU with vLLM
uv run scripts/inspect_vllm_uv.py \
--model meta-llama/Llama-3.1-8B-Instruct \
--task gsm8k \
--limit 20
Transformers fallback
uv run scripts/inspect_vllm_uv.py \
--model microsoft/phi-2 \
--task mmlu \
--backend hf \
--trust-remote-code \
--limit 20
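The fallback can also be scripted as "try vLLM first, retry with Transformers on failure". The sketch below rests on two assumptions the source does not confirm: that `scripts/inspect_vllm_uv.py` uses vLLM when `--backend` is omitted, and that it exits non-zero when that backend cannot start. The function name `run_with_fallback` and the `RUNNER` variable are hypothetical; `RUNNER` exists only so the command can be swapped for a stub in a dry run.

```shell
# Hypothetical wrapper: try the default backend, then retry the same
# arguments with --backend hf if the first run exits non-zero.
# RUNNER defaults to the command used in the examples above.
RUNNER="${RUNNER:-uv run scripts/inspect_vllm_uv.py}"

run_with_fallback() {
  # "$@" carries the shared arguments, e.g. --model, --task, --limit.
  # $RUNNER is left unquoted on purpose so it splits into command + args.
  if $RUNNER "$@"; then
    return 0
  fi
  echo "first run failed; retrying with --backend hf" >&2
  $RUNNER "$@" --backend hf
}

# Example call (commented out so that sourcing this file runs nothing):
# run_with_fallback --model microsoft/phi-2 --task mmlu --trust-remote-code --limit 20
```

The retry reruns the whole evaluation, so it is best paired with a small `--limit` while checking which backend works.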
lighteval examples
Single task
uv run scripts/lighteval_vllm_uv.py \
--model meta-llama/Llama-3.2-3B-Instruct \
--tasks "leaderboard|mmlu|5" \
--max-samples 20
Multiple tasks
uv run scripts/lighteval_vllm_uv.py \
--model meta-llama/Llama-3.2-3B-Instruct \
--tasks "leaderboard|mmlu|5,leaderboard|gsm8k|5" \
--max-samples 20 \
--use-chat-template
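The `--tasks` value in these examples is a comma-separated list of entries that appear to follow lighteval's `suite|task|few-shot count` form. When the list grows, it can be easier to assemble the string than to type it. This is a minimal sketch: `join_tasks` is a hypothetical helper, and it assumes every task shares one suite and one few-shot count, as the examples here do.

```shell
# Hypothetical helper: join_tasks SUITE FEWSHOT TASK...
# Prints "SUITE|TASK|FEWSHOT" entries joined by commas.
join_tasks() {
  jt_suite=$1
  jt_fewshot=$2
  shift 2
  jt_out=""
  for jt_task in "$@"; do
    # Prepend a comma for every entry after the first.
    jt_out="${jt_out:+$jt_out,}${jt_suite}|${jt_task}|${jt_fewshot}"
  done
  printf '%s\n' "$jt_out"
}

TASKS=$(join_tasks leaderboard 5 mmlu gsm8k)
echo "$TASKS"   # leaderboard|mmlu|5,leaderboard|gsm8k|5
```

The result can then be passed to the script as `--tasks "$TASKS"`; the quotes keep the shell from treating `|` as a pipe.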
accelerate fallback
uv run scripts/lighteval_vllm_uv.py \
--model microsoft/phi-2 \
--tasks "leaderboard|mmlu|5" \
--backend accelerate \
--trust-remote-code \
--max-samples 20
Hand-off to Hugging Face Jobs
When local hardware is not enough, switch to the hugging-face-jobs skill and run one of these scripts remotely. Keep the script path and args; move the orchestration there.