Pipeline Options Reference

Guide to configuring model loading and inference using the PretrainedModelOptions parameter in the pipeline() function.

Overview
Basic Options
Model Loading Options
Device and Performance Options
Common Configuration Patterns

Overview

The pipeline() function accepts three parameters:

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline(
  'task-name',           // 1. Task type (e.g., 'sentiment-analysis')
  'model-id',            // 2. Model identifier (optional, uses default if null)
  options                // 3. PretrainedModelOptions (optional)
);

The third parameter, options, allows you to configure how the model is loaded and executed.

Available Options

interface PretrainedModelOptions {
  // Progress tracking
  progress_callback?: (info: ProgressInfo) => void;
  
  // Model configuration
  config?: PretrainedConfig;
  
  // Cache and loading
  cache_dir?: string;
  local_files_only?: boolean;
  revision?: string;
  
  // Model-specific settings
  subfolder?: string;
  model_file_name?: string;
  
  // Device and performance
  device?: DeviceType | Record<string, DeviceType>;
  dtype?: DataType | Record<string, DataType>;
  
  // External data format (large models)
  use_external_data_format?: boolean | number | Record<string, boolean | number>;
  
  // ONNX Runtime settings
  session_options?: InferenceSession.SessionOptions;
}

Basic Options

Progress Callback

Track model download and loading progress. Note: Models consist of multiple files (model weights, config, tokenizer, etc.), and each file reports its own progress:

const fileProgress = {};

const pipe = await pipeline('sentiment-analysis', null, {
  progress_callback: (info) => {
    if (info.status === 'progress') {
      fileProgress[info.file] = info.progress;
      console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
    }
    
    if (info.status === 'done') {
      console.log(`✓ ${info.file} complete`);
    }
  }
});

Progress Info Types:

type ProgressInfo = {
  status: 'initiate' | 'download' | 'progress' | 'done' | 'ready';
  name: string;       // Model id or path
  file: string;       // File being processed
  progress?: number;  // Percentage (0-100, only for 'progress' status)
  loaded?: number;    // Bytes downloaded (only for 'progress' status)
  total?: number;     // Total bytes (only for 'progress' status)
};

Example: Browser Loading UI with Multiple Files

const statusDiv = document.getElementById('status');
const progressContainer = document.getElementById('progress-container');
const fileProgressBars = {};

const pipe = await pipeline('image-classification', null, {
  progress_callback: (info) => {
    if (info.status === 'progress') {
      // Create progress bar for each file if not exists
      if (!fileProgressBars[info.file]) {
        const fileDiv = document.createElement('div');
        fileDiv.innerHTML = `
          <div class="file-name">${info.file}</div>
          <div class="progress-bar">
            <div class="progress-fill" style="width: 0%"></div>
          </div>
        `;
        progressContainer.appendChild(fileDiv);
        fileProgressBars[info.file] = fileDiv.querySelector('.progress-fill');
      }
      
      // Update progress bar
      fileProgressBars[info.file].style.width = `${info.progress}%`;
      
      const mb = (info.loaded / 1024 / 1024).toFixed(2);
      const totalMb = (info.total / 1024 / 1024).toFixed(2);
      statusDiv.textContent = `${info.file}: ${mb}/${totalMb} MB`;
    }
    
    if (info.status === 'ready') {
      statusDiv.textContent = 'Model ready!';
    }
  }
});

Both examples key their state on info.file, so each file's progress is tracked independently.
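Because each file reports its own progress, a single overall percentage has to be computed by the caller. The following is a minimal sketch, assuming every 'progress' event carries loaded and total as in the ProgressInfo type above. createProgressTracker is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: aggregates per-file events into one overall percentage.
function createProgressTracker() {
  const files = new Map(); // file name -> { loaded, total }

  return {
    // Pass this as `progress_callback` in the pipeline options.
    callback(info) {
      if (info.status === 'progress') {
        files.set(info.file, { loaded: info.loaded ?? 0, total: info.total ?? 0 });
      }
    },
    // Byte-weighted percentage over every file seen so far (0 when nothing is known).
    overall() {
      let loaded = 0;
      let total = 0;
      for (const f of files.values()) {
        loaded += f.loaded;
        total += f.total;
      }
      return total > 0 ? (loaded / total) * 100 : 0;
    },
  };
}

// Usage (sketch):
//   const tracker = createProgressTracker();
//   const pipe = await pipeline('sentiment-analysis', null, {
//     progress_callback: (info) => {
//       tracker.callback(info);
//       console.log(`${tracker.overall().toFixed(1)}% overall`);
//     },
//   });
```

The overall figure can move backwards when a new file starts reporting, because its total is only added once its first event arrives.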

Custom Configuration

Override the model's default configuration:

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('text-generation', 'model-id', {
  config: {
    max_length: 512,
    temperature: 0.8,
    // ... other config options
  }
});

Use cases:

Override default generation parameters
Adjust model-specific settings
Test different configurations without modifying model files

Model Loading Options

Cache Directory

Specify where to cache downloaded models:

// Node.js: Custom cache location
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  cache_dir: './my-custom-cache'
});

Default behavior:

If not specified, uses env.cacheDir (default: ./.cache)
Only applies when env.useFSCache = true (Node.js)
Browser cache uses Cache API (configured via env.cacheKey)
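The lookup order described above can be sketched as a small function. This is an illustration under stated assumptions, not the library's code: resolveCacheDir is a hypothetical name, and the env argument only mirrors the two env fields mentioned in this section.

```javascript
// Sketch of the lookup order described above. `env` mirrors the library's
// env object; only the two fields used here are assumed to exist.
function resolveCacheDir(options = {}, env = { useFSCache: true, cacheDir: './.cache' }) {
  if (!env.useFSCache) return null; // file-system cache is not in use
  return options.cache_dir ?? env.cacheDir; // per-call option wins over the global default
}

// Usage (sketch):
//   resolveCacheDir({ cache_dir: './my-custom-cache' }) returns './my-custom-cache'
//   resolveCacheDir() returns './.cache'
```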

Local Files Only

Prevent any network requests:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  local_files_only: true
});

Use cases:

Offline applications
Air-gapped environments
Testing with pre-downloaded models
Production deployments with bundled models

Important:

Model must already be cached or available locally
Throws error if model not found locally
Requires env.allowLocalModels = true
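Since local_files_only throws when the model is missing, an application that prefers local files but can tolerate a download may catch the error and retry. The sketch below assumes that behaviour. loadOfflineFirst is a hypothetical helper, and the pipeline function is passed in as an argument so the helper can be exercised without network access.

```javascript
// Hypothetical helper. `pipelineFn` is the library's pipeline() function,
// injected so the helper can be tested with a stub.
async function loadOfflineFirst(pipelineFn, task, model, options = {}) {
  try {
    // First attempt: never touch the network.
    return await pipelineFn(task, model, { ...options, local_files_only: true });
  } catch (err) {
    console.warn(`Local load failed (${err.message}); retrying with network access.`);
    // Second attempt: allow downloads.
    return pipelineFn(task, model, { ...options, local_files_only: false });
  }
}

// Usage (sketch):
//   import { pipeline } from '@huggingface/transformers';
//   const pipe = await loadOfflineFirst(pipeline, 'sentiment-analysis', 'model-id');
```

This pattern is not suitable for air-gapped deployments, where the fallback request would fail as well.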

Model Revision

Specify a specific model version (git branch, tag, or commit):

// Pin a tagged release
const taggedPipe = await pipeline('sentiment-analysis', 'model-id', {
  revision: 'v1.0.0'
});

// Or use a branch
const branchPipe = await pipeline('sentiment-analysis', 'model-id', {
  revision: 'experimental'
});

// Or use a commit hash
const commitPipe = await pipeline('sentiment-analysis', 'model-id', {
  revision: 'abc123def456'
});

Default: 'main' (latest version)

Use cases:

Pin to stable release for production
Test experimental features
Reproduce results with specific model version
Work with models under development

Important:

Only applies to remote models (Hugging Face Hub)
Ignored for local file paths
Each revision is cached separately
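Branches and tags are movable git refs, while a commit hash identifies exactly one snapshot. A deployment script could flag unpinned revisions along these lines. Both function names are hypothetical, and the check assumes a full 40-character hexadecimal hash.

```javascript
// Hypothetical check: a full 40-character hex string is treated as a commit hash.
function isCommitHash(revision) {
  return /^[0-9a-f]{40}$/i.test(revision);
}

// Returns a warning string when a production build relies on a movable ref,
// or null when there is nothing to report.
function checkRevision(revision = 'main', { production = false } = {}) {
  if (production && !isCommitHash(revision)) {
    return `Revision "${revision}" is a branch or tag and may change; pin a commit hash for reproducible builds.`;
  }
  return null;
}

// Usage (sketch):
//   const warning = checkRevision('experimental', { production: true });
//   if (warning) console.warn(warning);
```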

Model Subfolder

Specify the subfolder within the model repository:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  subfolder: 'onnx'  // Default: 'onnx'
});

Default: 'onnx'

Use cases:

Custom model repository structure
Multiple model variants in same repo
Organizational preferences

Model File Name

Specify a custom model file name (without .onnx extension):

const pipe = await pipeline('text-generation', 'model-id', {
  model_file_name: 'decoder_model_merged'
});
// Loads: decoder_model_merged.onnx

Use cases:

Models with non-standard file names
Select specific model variant
Decoder-only models whose weights are stored under a name such as decoder_model_merged

Note: Currently only valid for encoder-only or decoder-only models.
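To see how revision, subfolder, and model_file_name combine, the sketch below builds the location of the model file for a remote model. The URL template is an assumption about the default Hugging Face Hub layout rather than something this document specifies, resolveModelUrl is a hypothetical helper, and it ignores any suffix the library may add for the selected dtype.

```javascript
// Sketch only: the URL template is an assumption about the default Hub layout.
function resolveModelUrl({
  model,
  revision = 'main',          // default revision, as documented above
  subfolder = 'onnx',         // default subfolder, as documented above
  model_file_name = 'model',  // assumed default file name
}) {
  const host = 'https://huggingface.co';
  const file = `${model_file_name}.onnx`; // the .onnx extension is appended, not passed in
  return `${host}/${model}/resolve/${encodeURIComponent(revision)}/${subfolder}/${file}`;
}

// Usage (sketch):
//   resolveModelUrl({ model: 'org/name', model_file_name: 'decoder_model_merged' });
```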

Device and Performance Options

Device Selection

Choose where to run the model:

// Run on CPU via WebAssembly (the default in browsers)
const wasmPipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'wasm'
});

// Run on GPU (WebGPU)
const gpuPipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'webgpu'
});

Common devices:

'wasm' - WebAssembly (CPU, most compatible)
'webgpu' - WebGPU (GPU, faster in browsers)
'cpu' - CPU
'gpu' - Auto-detect GPU
'cuda' - NVIDIA CUDA (Node.js with GPU)

See the full list in the devices.js source.

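Per-component device selection:

For models with multiple components (encoder-decoder, vision-encoder-decoder, etc.), pass an object keyed by component. The keys are the ONNX file names without the extension (for example, encoder_model and decoder_model_merged; check the model repository for the actual names):

const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
  device: {
    encoder_model: 'webgpu',        // Run encoder on GPU
    decoder_model_merged: 'wasm'    // Run decoder on CPU
  }
});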

WebGPU Requirements:

Chrome/Edge 113+
Enable chrome://flags/#enable-unsafe-webgpu (if needed)
Adequate GPU memory
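Since WebGPU is not available everywhere, the device is often chosen at runtime. A minimal sketch, assuming the standard navigator.gpu entry point is the signal to look for. pickDevice is a hypothetical helper, and the navigator object is injectable so the function can be tested outside a browser.

```javascript
// Returns 'webgpu' when the environment exposes the WebGPU entry point,
// otherwise falls back to 'wasm'. `nav` is injectable for testing.
function pickDevice(nav = globalThis.navigator) {
  return nav?.gpu ? 'webgpu' : 'wasm';
}

// Usage (sketch):
//   const pipe = await pipeline('sentiment-analysis', 'model-id', {
//     device: pickDevice(),
//   });
```

The presence of navigator.gpu does not guarantee a usable adapter. A stricter check would also await navigator.gpu.requestAdapter() and fall back when it resolves to null.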

Data Type (Quantization)

Control model precision and size:

// Full precision (largest, most accurate)
const fp32Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'fp32'
});

// Half precision (balanced)
const fp16Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'fp16'
});

// 8-bit quantization (smaller, faster)
const q8Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q8'
});

// 4-bit quantization (smallest, fastest)
const q4Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q4'
});

Common data types:

'fp32' - 32-bit floating point (full precision)
'fp16' - 16-bit floating point (half precision)
'q8' - 8-bit quantized (good balance)
'q4' - 4-bit quantized (maximum compression)
'int8' - 8-bit integer
'uint8' - 8-bit unsigned integer

See the full list in the dtypes.js source.

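Per-component data type:

// Keys are the ONNX file names without the extension;
// check the model repository for the actual names.
const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
  dtype: {
    encoder_model: 'fp32',        // Encoder at full precision
    decoder_model_merged: 'q8'    // Decoder quantized
  }
});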

Trade-offs:

| Data Type | Model Size | Speed | Accuracy | Use Case |
| --- | --- | --- | --- | --- |
| `fp32` | Largest | Slowest | Highest | Research, maximum quality |
| `fp16` | Medium | Medium | High | Production, GPU inference |
| `q8` | Small | Fast | Good | Production, CPU inference |
| `q4` | Smallest | Fastest | Acceptable | Edge devices, real-time apps |
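The size column can be made concrete with simple arithmetic: weight storage is roughly the parameter count times the bits per weight. The sketch below is an estimate only. estimateWeightBytes is a hypothetical helper, and it ignores quantization metadata (scales, zero points) and any layers left unquantized, so real files tend to be somewhat larger.

```javascript
// Nominal bits stored per weight for the dtypes in the table above.
const BITS_PER_WEIGHT = { fp32: 32, fp16: 16, q8: 8, q4: 4 };

// Rough estimate of weight storage in bytes for a model with `numParams` parameters.
function estimateWeightBytes(numParams, dtype) {
  if (!Object.prototype.hasOwnProperty.call(BITS_PER_WEIGHT, dtype)) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return (numParams * BITS_PER_WEIGHT[dtype]) / 8;
}

// Usage (sketch): compare the dtypes for a 100-million-parameter model.
//   for (const dtype of Object.keys(BITS_PER_WEIGHT)) {
//     console.log(dtype, (estimateWeightBytes(100e6, dtype) / 1e6).toFixed(0), 'MB');
//   }
```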

External Data Format

For models >= 2GB, ONNX uses external data format:

// Automatically detect and load external data
const pipe = await pipeline('text-generation', 'large-model-id', {
  use_external_data_format: true
});

// Specify number of external data chunks
const chunkedPipe = await pipeline('text-generation', 'large-model-id', {
  use_external_data_format: 5  // Load 5 chunks (model.onnx_data_0 to _4)
});

How it works:

Models >= 2GB split weights into separate files
Main file: model.onnx (structure only)
Data files: model.onnx_data or model.onnx_data_0, model.onnx_data_1, etc.

Default behavior:

false - No external data (models < 2GB)
true - Load external data automatically
number - Load this many external data chunks

Maximum chunks: 100 (defined by MAX_EXTERNAL_DATA_CHUNKS)
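The three accepted value forms can be reduced to a single chunk count before loading. The sketch below shows one way to do that, under the assumption that true stands for a single external data file. externalDataChunkCount is a hypothetical helper, and only the limit of 100 comes from this document.

```javascript
const MAX_EXTERNAL_DATA_CHUNKS = 100; // limit quoted above

// Hypothetical normalizer: false -> 0 chunks, true -> 1 chunk (assumed),
// number -> that many chunks.
function externalDataChunkCount(value = false) {
  const n = Number(value);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`Invalid use_external_data_format: ${value}`);
  }
  if (n > MAX_EXTERNAL_DATA_CHUNKS) {
    throw new Error(`Too many external data chunks: ${n} > ${MAX_EXTERNAL_DATA_CHUNKS}`);
  }
  return n;
}

// Usage (sketch):
//   externalDataChunkCount(5) yields a count of 5; 101 throws.
```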

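Per-component external data:

// Keys are the ONNX file names without the extension;
// check the model repository for the actual names.
const pipe = await pipeline('text2text-generation', 'large-model-id', {
  use_external_data_format: {
    encoder_model: true,
    decoder_model_merged: 3  // Decoder has 3 external data chunks
  }
});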

Session Options

Advanced ONNX Runtime configuration:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  session_options: {
    executionProviders: ['webgpu', 'wasm'],
    graphOptimizationLevel: 'all',
    enableCpuMemArena: true,
    enableMemPattern: true,
    executionMode: 'sequential',
    logSeverityLevel: 2,
    logVerbosityLevel: 0
  }
});

Common session options:

| Option | Description | Default |
| --- | --- | --- |
| `executionProviders` | Ordered list of execution providers | `['wasm']` |
| `graphOptimizationLevel` | Graph optimization: `'disabled'`, `'basic'`, `'extended'`, `'all'` | `'all'` |
| `enableCpuMemArena` | Enable CPU memory arena for faster memory allocation | `true` |
| `enableMemPattern` | Enable memory pattern optimization | `true` |
| `executionMode` | `'sequential'` or `'parallel'` | `'sequential'` |
| `logSeverityLevel` | 0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal | `2` |
| `freeDimensionOverrides` | Override dynamic dimensions (e.g., `{ batch_size: 1 }`) | - |

Use cases:

Fine-tune performance for specific hardware
Debug model execution issues
Override dynamic shapes
Control memory usage
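When only one or two settings differ from the defaults, it can help to keep a base object and overlay the changes. A minimal sketch, with the defaults copied from the table above and treated as illustrative. buildSessionOptions is a hypothetical helper.

```javascript
// Defaults copied from the table above; treat them as illustrative.
const DEFAULT_SESSION_OPTIONS = {
  graphOptimizationLevel: 'all',
  enableCpuMemArena: true,
  enableMemPattern: true,
  executionMode: 'sequential',
  logSeverityLevel: 2,
};

// Hypothetical helper: overlays caller overrides on the defaults
// without mutating either object.
function buildSessionOptions(overrides = {}) {
  return { ...DEFAULT_SESSION_OPTIONS, ...overrides };
}

// Example: verbose logging and a fixed batch size while debugging.
const debugSession = buildSessionOptions({
  logSeverityLevel: 0,
  freeDimensionOverrides: { batch_size: 1 },
});

// Usage (sketch):
//   const pipe = await pipeline('sentiment-analysis', 'model-id', {
//     session_options: debugSession,
//   });
```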

Common Configuration Patterns

Development

Fast iteration with progress tracking:

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('sentiment-analysis', null, {
  progress_callback: (info) => {
    if (info.status === 'progress') {
      console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
    }
  }
});

Production (GPU)

Use WebGPU with fp16 for better performance:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'webgpu',
  dtype: 'fp16'
});

Production (CPU)

Use quantization for smaller size and faster CPU inference:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q8'  // or 'q4' for even smaller
});

Offline/Local

Prevent network requests, use only local models:

import { pipeline, env } from '@huggingface/transformers';

env.allowLocalModels = true;
env.localModelPath = './models/';

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  local_files_only: true
});

Per-Component Settings

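For encoder-decoder models, configure each component separately. The keys are the ONNX file names without the extension (check the model repository for the actual names):

const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
  device: {
    encoder_model: 'webgpu',
    decoder_model_merged: 'wasm'
  },
  dtype: {
    encoder_model: 'fp16',
    decoder_model_merged: 'q8'
  }
});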
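The patterns in this section can be collected into named presets so that call sites stay short. This is a sketch: the preset names and the optionsFor helper are hypothetical, and the option values are the ones shown in the patterns above.

```javascript
// Hypothetical presets mirroring the patterns in this section.
const PRESETS = {
  'production-gpu': { device: 'webgpu', dtype: 'fp16' },
  'production-cpu': { dtype: 'q8' },
  offline: { local_files_only: true },
};

// Returns a fresh options object for the preset, with optional extra settings on top.
function optionsFor(preset, extra = {}) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, preset)) {
    throw new Error(`Unknown preset: ${preset}`);
  }
  return { ...PRESETS[preset], ...extra };
}

// Usage (sketch):
//   const pipe = await pipeline('sentiment-analysis', 'model-id',
//     optionsFor('production-cpu', { revision: 'v1.0.0' }));
```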

Related Documentation

Configuration Reference - Environment configuration with env object
Text Generation Guide - Text generation options and streaming
Model Architectures - Supported models and selection tips
Main Skill Guide - Getting started with Transformers.js

Best Practices

Progress Callbacks: Use progress_callback for large models to show download progress
Quantization: Use q8 or q4 for CPU inference to reduce size and improve speed
Device Selection: Use webgpu for better performance when available
Offline-First: Use local_files_only: true in production to avoid runtime downloads
Version Pinning: Use revision to pin model versions for reproducible deployments
Memory Management: Always dispose pipelines with pipe.dispose() when done
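The disposal advice is easiest to follow when cleanup is tied to a try/finally block. A minimal sketch: withPipeline is a hypothetical wrapper, and createPipe stands for any async factory that returns an object with a dispose() method, such as a pipeline.

```javascript
// Hypothetical wrapper: creates a pipeline, runs `fn` with it, and always disposes.
// `createPipe` is any async factory, e.g. () => pipeline('sentiment-analysis').
async function withPipeline(createPipe, fn) {
  const pipe = await createPipe();
  try {
    return await fn(pipe);
  } finally {
    await pipe.dispose(); // runs whether fn returned or threw
  }
}

// Usage (sketch):
//   import { pipeline } from '@huggingface/transformers';
//   const result = await withPipeline(
//     () => pipeline('sentiment-analysis', 'model-id', { dtype: 'q8' }),
//     (pipe) => pipe('I love this!'),
//   );
```

This suits one-off jobs. A long-running application that reuses one pipeline would instead call dispose() once at shutdown.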

This document covers all available options for the pipeline() function. For environment-level configuration (remote hosts, global cache settings, WASM paths), see the Configuration Reference.
