Pipeline Options Reference
Guide to configuring model loading and inference using the PretrainedModelOptions parameter in the pipeline() function.
Table of Contents
- Overview
- Basic Options
- Model Loading Options
- Device and Performance Options
- Common Configuration Patterns
Overview
The pipeline() function accepts three parameters:
import { pipeline } from '@huggingface/transformers';
const pipe = await pipeline(
'task-name', // 1. Task type (e.g., 'sentiment-analysis')
'model-id', // 2. Model identifier (optional, uses default if null)
options // 3. PretrainedModelOptions (optional)
);
The third parameter, options, allows you to configure how the model is loaded and executed.
Available Options
interface PretrainedModelOptions {
// Progress tracking
progress_callback?: (info: ProgressInfo) => void;
// Model configuration
config?: PretrainedConfig;
// Cache and loading
cache_dir?: string;
local_files_only?: boolean;
revision?: string;
// Model-specific settings
subfolder?: string;
model_file_name?: string;
// Device and performance
device?: DeviceType | Record<string, DeviceType>;
dtype?: DataType | Record<string, DataType>;
// External data format (large models)
use_external_data_format?: boolean | number | Record<string, boolean | number>;
// ONNX Runtime settings
session_options?: InferenceSession.SessionOptions;
}
Basic Options
Progress Callback
Track model download and loading progress. Note: Models consist of multiple files (model weights, config, tokenizer, etc.), and each file reports its own progress:
const fileProgress = {};
const pipe = await pipeline('sentiment-analysis', null, {
progress_callback: (info) => {
if (info.status === 'progress') {
fileProgress[info.file] = info.progress;
console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
}
if (info.status === 'done') {
console.log(`✓ ${info.file} complete`);
}
}
});
Progress Info Types:
type ProgressInfo = {
status: 'initiate' | 'download' | 'progress' | 'done' | 'ready';
name: string; // Model id or path
file: string; // File being processed
progress?: number; // Percentage (0-100, only for 'progress' status)
loaded?: number; // Bytes downloaded (only for 'progress' status)
total?: number; // Total bytes (only for 'progress' status)
};
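Because each file reports its own progress, a single overall percentage has to be aggregated by the caller. The sketch below shows one way to do that, assuming the loaded and total fields are present on every 'progress' event. createProgressTracker is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of @huggingface/transformers):
// combines per-file 'progress' events into one byte-weighted percentage.
function createProgressTracker() {
  const files = new Map(); // file name -> { loaded, total }

  return {
    // Record the latest byte counts for a file; other statuses are ignored.
    callback(info) {
      if (info.status !== 'progress') return;
      files.set(info.file, { loaded: info.loaded ?? 0, total: info.total ?? 0 });
    },
    // Overall percentage across all files seen so far (0 when nothing is known).
    percent() {
      let loaded = 0;
      let total = 0;
      for (const f of files.values()) {
        loaded += f.loaded;
        total += f.total;
      }
      return total > 0 ? (loaded / total) * 100 : 0;
    },
  };
}

// Usage sketch:
// const tracker = createProgressTracker();
// const pipe = await pipeline('sentiment-analysis', null, {
//   progress_callback: (info) => {
//     tracker.callback(info);
//     console.log(`overall: ${tracker.percent().toFixed(1)}%`);
//   },
// });
```

The figure only covers files that have already started downloading, so it can drop when a new, large file reports its first event.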
Example: Browser Loading UI with Multiple Files
const statusDiv = document.getElementById('status');
const progressContainer = document.getElementById('progress-container');
const fileProgressBars = {};
const pipe = await pipeline('image-classification', null, {
progress_callback: (info) => {
if (info.status === 'progress') {
// Create progress bar for each file if not exists
if (!fileProgressBars[info.file]) {
const fileDiv = document.createElement('div');
fileDiv.innerHTML = `
<div class="file-name">${info.file}</div>
<div class="progress-bar">
<div class="progress-fill" style="width: 0%"></div>
</div>
`;
progressContainer.appendChild(fileDiv);
fileProgressBars[info.file] = fileDiv.querySelector('.progress-fill');
}
// Update progress bar
fileProgressBars[info.file].style.width = `${info.progress}%`;
const mb = (info.loaded / 1024 / 1024).toFixed(2);
const totalMb = (info.total / 1024 / 1024).toFixed(2);
statusDiv.textContent = `${info.file}: ${mb}/${totalMb} MB`;
}
if (info.status === 'ready') {
statusDiv.textContent = 'Model ready!';
}
}
});
For a minimal console-only variant, see Development under Common Configuration Patterns.
Custom Configuration
Override the model's default configuration:
import { pipeline } from '@huggingface/transformers';
const pipe = await pipeline('text-generation', 'model-id', {
config: {
max_length: 512,
temperature: 0.8,
// ... other config options
}
});
Use cases:
- Override default generation parameters
- Adjust model-specific settings
- Test different configurations without modifying model files
Model Loading Options
Cache Directory
Specify where to cache downloaded models:
// Node.js: Custom cache location
const pipe = await pipeline('sentiment-analysis', 'model-id', {
cache_dir: './my-custom-cache'
});
Default behavior:
- If not specified, uses env.cacheDir (default: ./.cache)
- Only applies when env.useFSCache = true (Node.js)
- Browser cache uses Cache API (configured via env.cacheKey)
Local Files Only
Prevent any network requests:
const pipe = await pipeline('sentiment-analysis', 'model-id', {
local_files_only: true
});
Use cases:
- Offline applications
- Air-gapped environments
- Testing with pre-downloaded models
- Production deployments with bundled models
Important:
- Model must already be cached or available locally
- Throws error if model not found locally
- Requires env.allowLocalModels = true
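Since a local-only load throws when the model is missing, applications that are usually (but not always) offline may want to try local files first and fall back to the network. The following is a minimal sketch of that pattern. loadPreferLocal is a hypothetical helper, and load stands in for the pipeline function, passed as an argument so the logic can be exercised without the library.

```javascript
// Hypothetical helper: try a local-only load first, then retry with network access.
// `load` is expected to have the same signature as pipeline(task, modelId, options).
async function loadPreferLocal(load, task, modelId, options = {}) {
  try {
    return await load(task, modelId, { ...options, local_files_only: true });
  } catch (err) {
    console.warn(`Local load failed (${err && err.message}); retrying with network access.`);
    return await load(task, modelId, { ...options, local_files_only: false });
  }
}

// Usage sketch:
// import { pipeline } from '@huggingface/transformers';
// const pipe = await loadPreferLocal(pipeline, 'sentiment-analysis', 'model-id');
```

For strictly air-gapped deployments the fallback is unwanted; in that case call pipeline() with local_files_only: true directly and let the error surface.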
Model Revision
Specify a specific model version (git branch, tag, or commit):
const pipe = await pipeline('sentiment-analysis', 'model-id', {
revision: 'v1.0.0' // Use specific version
});
// Or use a branch
const pipe = await pipeline('sentiment-analysis', 'model-id', {
revision: 'experimental'
});
// Or use a commit hash
const pipe = await pipeline('sentiment-analysis', 'model-id', {
revision: 'abc123def456'
});
Default: 'main' (latest version)
Use cases:
- Pin to stable release for production
- Test experimental features
- Reproduce results with specific model version
- Work with models under development
Important:
- Only applies to remote models (Hugging Face Hub)
- Ignored for local file paths
- Each revision is cached separately
Model Subfolder
Specify the subfolder within the model repository:
const pipe = await pipeline('sentiment-analysis', 'model-id', {
subfolder: 'onnx' // Default: 'onnx'
});
Default: 'onnx'
Use cases:
- Custom model repository structure
- Multiple model variants in same repo
- Organizational preferences
Model File Name
Specify a custom model file name (without .onnx extension):
const pipe = await pipeline('text-generation', 'model-id', {
model_file_name: 'decoder_model_merged'
});
// Loads: decoder_model_merged.onnx
Use cases:
- Models with non-standard file names
- Select specific model variant
- Encoder-decoder models with separate files
Note: Currently only valid for encoder-only or decoder-only models.
Device and Performance Options
Device Selection
Choose where to run the model:
// Run on CPU (WASM - default)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
device: 'wasm'
});
// Run on GPU (WebGPU)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
device: 'webgpu'
});
Common devices:
- 'wasm' - WebAssembly (CPU, most compatible)
- 'webgpu' - WebGPU (GPU, faster in browsers)
- 'cpu' - CPU
- 'gpu' - Auto-detect GPU
- 'cuda' - NVIDIA CUDA (Node.js with GPU)
See the full list in the devices.js source.
Per-component device selection:
For models with multiple components (encoder-decoder, vision-encoder-decoder, etc.):
const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
device: {
encoder: 'webgpu', // Run encoder on GPU
decoder: 'wasm' // Run decoder on CPU
}
});
WebGPU Requirements:
- Chrome/Edge 113+
- Enable chrome://flags/#enable-unsafe-webgpu (if needed)
- Adequate GPU memory
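Given these requirements, it can help to choose the device at runtime instead of hard-coding 'webgpu'. The sketch below checks for the navigator.gpu entry point defined by the WebGPU specification and falls back to 'wasm'. chooseDevice is a hypothetical helper, and the nav parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: pick a value for the `device` option.
// `nav` defaults to the global navigator when one exists (browsers).
function chooseDevice(nav = typeof navigator !== 'undefined' ? navigator : undefined) {
  // The WebGPU specification exposes its entry point as navigator.gpu.
  const hasWebGPU = Boolean(nav && 'gpu' in nav && nav.gpu);
  return hasWebGPU ? 'webgpu' : 'wasm';
}

// Usage sketch:
// const pipe = await pipeline('sentiment-analysis', 'model-id', {
//   device: chooseDevice(),
// });
```

The presence of navigator.gpu does not guarantee that a usable adapter exists, so this check is a first filter and not a full capability test.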
Data Type (Quantization)
Control model precision and size:
// Full precision (largest, most accurate)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
dtype: 'fp32'
});
// Half precision (balanced)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
dtype: 'fp16'
});
// 8-bit quantization (smaller, faster)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
dtype: 'q8'
});
// 4-bit quantization (smallest, fastest)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
dtype: 'q4'
});
Common data types:
- 'fp32' - 32-bit floating point (full precision)
- 'fp16' - 16-bit floating point (half precision)
- 'q8' - 8-bit quantized (good balance)
- 'q4' - 4-bit quantized (maximum compression)
- 'int8' - 8-bit integer
- 'uint8' - 8-bit unsigned integer
See the full list in the dtypes.js source.
Per-component data type:
const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
dtype: {
encoder: 'fp32', // Encoder at full precision
decoder: 'q8' // Decoder quantized
}
});
Trade-offs:
| Data Type | Model Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| fp32 | Largest | Slowest | Highest | Research, maximum quality |
| fp16 | Medium | Medium | High | Production, GPU inference |
| q8 | Small | Fast | Good | Production, CPU inference |
| q4 | Smallest | Fastest | Acceptable | Edge devices, real-time apps |
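The trade-offs in the table can be encoded as a small selection rule. This is only a sketch of one reasonable policy (fp16 on GPU, q8 on CPU, q4 when memory is tight). chooseDtype and its lowMemory flag are hypothetical names, and the best choice still depends on which dtype variants the model repository actually ships.

```javascript
// Hypothetical helper: map a device to a dtype following the trade-off table.
function chooseDtype(device, { lowMemory = false } = {}) {
  if (lowMemory) return 'q4';             // smallest footprint, acceptable accuracy
  if (device === 'webgpu') return 'fp16'; // GPU inference
  return 'q8';                            // CPU (wasm) inference
}

// Usage sketch:
// const device = 'webgpu';
// const pipe = await pipeline('sentiment-analysis', 'model-id', {
//   device,
//   dtype: chooseDtype(device),
// });
```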
External Data Format
For models >= 2GB, ONNX uses external data format:
// Automatically detect and load external data
const pipe = await pipeline('text-generation', 'large-model-id', {
use_external_data_format: true
});
// Specify number of external data chunks
const pipe = await pipeline('text-generation', 'large-model-id', {
use_external_data_format: 5 // Load 5 chunks (model.onnx_data_0 to _4)
});
How it works:
- Models >= 2GB split weights into separate files
- Main file: model.onnx (structure only)
- Data files: model.onnx_data or model.onnx_data_0, model.onnx_data_1, etc.
Default behavior:
- false - No external data (models < 2GB)
- true - Load external data automatically
- number - Load this many external data chunks
Maximum chunks: 100 (defined by MAX_EXTERNAL_DATA_CHUNKS)
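The three accepted value shapes can be reduced to a single chunk count and validated before a model load is attempted. The sketch below does this under two assumptions that the library's internal logic may not share: that true is equivalent to one external data file, and that exceeding the limit should raise an error. externalDataChunkCount is a hypothetical name, and the helper handles only scalar values, not per-component records.

```javascript
const MAX_EXTERNAL_DATA_CHUNKS = 100; // limit quoted in this section

// Hypothetical helper: normalize use_external_data_format to a chunk count.
function externalDataChunkCount(value) {
  const count = Number(value ?? false); // undefined/false -> 0, true -> 1
  if (!Number.isInteger(count) || count < 0) {
    throw new TypeError(`Expected a boolean or non-negative integer, got: ${value}`);
  }
  if (count > MAX_EXTERNAL_DATA_CHUNKS) {
    throw new RangeError(`At most ${MAX_EXTERNAL_DATA_CHUNKS} chunks are supported, got: ${count}`);
  }
  return count;
}
```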
Per-component external data:
const pipe = await pipeline('text-generation', 'large-model-id', {
use_external_data_format: {
encoder: true,
decoder: 3 // Decoder has 3 external data chunks
}
});
Session Options
Advanced ONNX Runtime configuration:
const pipe = await pipeline('sentiment-analysis', 'model-id', {
session_options: {
executionProviders: ['webgpu', 'wasm'],
graphOptimizationLevel: 'all',
enableCpuMemArena: true,
enableMemPattern: true,
executionMode: 'sequential',
logSeverityLevel: 2,
logVerbosityLevel: 0
}
});
Common session options:
| Option | Description | Default |
|---|---|---|
| executionProviders | Ordered list of execution providers | ['wasm'] |
| graphOptimizationLevel | Graph optimization: 'disabled', 'basic', 'extended', 'all' | 'all' |
| enableCpuMemArena | Enable CPU memory arena for faster memory allocation | true |
| enableMemPattern | Enable memory pattern optimization | true |
| executionMode | 'sequential' or 'parallel' | 'sequential' |
| logSeverityLevel | 0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal | 2 |
| freeDimensionOverrides | Override dynamic dimensions (e.g., { batch_size: 1 }) | - |
Use cases:
- Fine-tune performance for specific hardware
- Debug model execution issues
- Override dynamic shapes
- Control memory usage
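For the debugging use case, it is often convenient to derive a verbose variant of an existing session configuration without mutating it. The sketch below does that by copying the options and lowering logSeverityLevel to 0 (Verbose, per the table above). withVerboseLogging is a hypothetical helper name.

```javascript
// Hypothetical helper: return a copy of the session options with verbose logging.
// The input object is left untouched so the production configuration can be reused.
function withVerboseLogging(sessionOptions = {}) {
  return {
    ...sessionOptions,
    logSeverityLevel: 0, // 0 = Verbose
  };
}

// Usage sketch:
// const base = { executionProviders: ['webgpu', 'wasm'] };
// const pipe = await pipeline('sentiment-analysis', 'model-id', {
//   session_options: withVerboseLogging(base),
// });
```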
Common Configuration Patterns
Development
Fast iteration with progress tracking:
import { pipeline } from '@huggingface/transformers';
const pipe = await pipeline('sentiment-analysis', null, {
progress_callback: (info) => {
if (info.status === 'progress') {
console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
}
}
});
Production (GPU)
Use WebGPU with fp16 for better performance:
const pipe = await pipeline('sentiment-analysis', 'model-id', {
device: 'webgpu',
dtype: 'fp16'
});
Production (CPU)
Use quantization for smaller size and faster CPU inference:
const pipe = await pipeline('sentiment-analysis', 'model-id', {
dtype: 'q8' // or 'q4' for even smaller
});
Offline/Local
Prevent network requests, use only local models:
import { pipeline, env } from '@huggingface/transformers';
env.allowLocalModels = true;
env.localModelPath = './models/';
const pipe = await pipeline('sentiment-analysis', 'model-id', {
local_files_only: true
});
Per-Component Settings
For encoder-decoder models, configure each component separately:
const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
device: {
encoder: 'webgpu',
decoder: 'wasm'
},
dtype: {
encoder: 'fp16',
decoder: 'q8'
}
});
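The patterns in this section can be collected into named presets so that an application switches between them with a single string. This is a sketch of one possible arrangement, not something the library provides. PRESETS and optionsFor are hypothetical names, and the preset values simply mirror the examples above.

```javascript
// Hypothetical presets mirroring the patterns in this section.
const PRESETS = {
  gpu: { device: 'webgpu', dtype: 'fp16' },
  cpu: { dtype: 'q8' },
  // Also requires env.allowLocalModels = true, as noted under Local Files Only.
  offline: { local_files_only: true },
};

// Hypothetical helper: build an options object from a preset plus overrides.
function optionsFor(preset, overrides = {}) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, preset)) {
    throw new Error(`Unknown preset: ${preset}`);
  }
  // Overrides win over preset values; the preset object itself is never mutated.
  return { ...PRESETS[preset], ...overrides };
}

// Usage sketch:
// const pipe = await pipeline('sentiment-analysis', 'model-id',
//   optionsFor('cpu', { dtype: 'q4' }));
```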
Related Documentation
- Configuration Reference - Environment configuration with env object
- Text Generation Guide - Text generation options and streaming
- Model Architectures - Supported models and selection tips
- Main Skill Guide - Getting started with Transformers.js
Best Practices
- Progress Callbacks: Use progress_callback for large models to show download progress
- Quantization: Use q8 or q4 for CPU inference to reduce size and improve speed
- Device Selection: Use webgpu for better performance when available
- Offline-First: Use local_files_only: true in production to avoid runtime downloads
- Version Pinning: Use revision to pin model versions for reproducible deployments
- Memory Management: Always dispose pipelines with pipe.dispose() when done
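The memory management practice is easy to forget on error paths. One way to make it automatic is a small wrapper that disposes the pipeline in a finally block. The sketch below assumes only that the pipeline object exposes the dispose() method mentioned above. withPipeline is a hypothetical helper, and createPipeline is any async factory supplied by the caller.

```javascript
// Hypothetical helper: run `fn` with a pipeline and always dispose it afterwards,
// whether `fn` returns normally or throws.
async function withPipeline(createPipeline, fn) {
  const pipe = await createPipeline();
  try {
    return await fn(pipe);
  } finally {
    await pipe.dispose();
  }
}

// Usage sketch:
// import { pipeline } from '@huggingface/transformers';
// const result = await withPipeline(
//   () => pipeline('sentiment-analysis', 'model-id'),
//   (pipe) => pipe('I love this!'),
// );
```

This suits one-off jobs; long-running applications that reuse a pipeline across requests would dispose it once at shutdown instead.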
This document covers all available options for the pipeline() function. For environment-level configuration (remote hosts, global cache settings, WASM paths), see the Configuration Reference.