Pipeline Options Reference

Guide to configuring model loading and inference using the PretrainedModelOptions parameter in the pipeline() function.

Table of Contents

  1. Overview
  2. Basic Options
  3. Model Loading Options
  4. Device and Performance Options
  5. Common Configuration Patterns
  6. Best Practices

Overview

The pipeline() function accepts three parameters:

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline(
  'task-name',           // 1. Task type (e.g., 'sentiment-analysis')
  'model-id',            // 2. Model identifier (optional, uses default if null)
  options                // 3. PretrainedModelOptions (optional)
);

The third parameter, options, allows you to configure how the model is loaded and executed.

Available Options

interface PretrainedModelOptions {
  // Progress tracking
  progress_callback?: (info: ProgressInfo) => void;
  
  // Model configuration
  config?: PretrainedConfig;
  
  // Cache and loading
  cache_dir?: string;
  local_files_only?: boolean;
  revision?: string;
  
  // Model-specific settings
  subfolder?: string;
  model_file_name?: string;
  
  // Device and performance
  device?: DeviceType | Record<string, DeviceType>;
  dtype?: DataType | Record<string, DataType>;
  
  // External data format (large models)
  use_external_data_format?: boolean | number | Record<string, boolean | number>;
  
  // ONNX Runtime settings
  session_options?: InferenceSession.SessionOptions;
}

Basic Options

Progress Callback

Track model download and loading progress. Note: Models consist of multiple files (model weights, config, tokenizer, etc.), and each file reports its own progress:

const fileProgress = {};

const pipe = await pipeline('sentiment-analysis', null, {
  progress_callback: (info) => {
    if (info.status === 'progress') {
      fileProgress[info.file] = info.progress;
      console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
    }
    
    if (info.status === 'done') {
      console.log(`✓ ${info.file} complete`);
    }
  }
});

Progress Info Types:

type ProgressInfo = {
  status: 'initiate' | 'download' | 'progress' | 'done' | 'ready';
  name: string;       // Model id or path
  file: string;       // File being processed
  progress?: number;  // Percentage (0-100, only for 'progress' status)
  loaded?: number;    // Bytes downloaded (only for 'progress' status)
  total?: number;     // Total bytes (only for 'progress' status)
};
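Because each file reports its own progress, an overall percentage has to be computed by summing bytes across files. The following is a minimal sketch that assumes the ProgressInfo fields listed above; createProgressTracker is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of @huggingface/transformers): combines
// per-file 'progress' events into one overall percentage.
function createProgressTracker(onUpdate) {
  const files = new Map(); // file name -> { loaded, total }

  return function progressCallback(info) {
    // Only 'progress' events carry loaded/total byte counts.
    if (info.status !== 'progress') return;

    files.set(info.file, { loaded: info.loaded ?? 0, total: info.total ?? 0 });

    let loaded = 0;
    let total = 0;
    for (const entry of files.values()) {
      loaded += entry.loaded;
      total += entry.total;
    }

    // Guard against division by zero when totals are unknown.
    const percent = total > 0 ? (loaded / total) * 100 : 0;
    onUpdate({ percent, loaded, total, fileCount: files.size });
  };
}
```

The returned function can be passed directly as progress_callback. The combined percentage may drop when a new file starts reporting, since its total only becomes known at that point.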

Example: Browser Loading UI with Multiple Files

const statusDiv = document.getElementById('status');
const progressContainer = document.getElementById('progress-container');
const fileProgressBars = {};

const pipe = await pipeline('image-classification', null, {
  progress_callback: (info) => {
    if (info.status === 'progress') {
      // Create progress bar for each file if not exists
      if (!fileProgressBars[info.file]) {
        const fileDiv = document.createElement('div');
        fileDiv.innerHTML = `
          <div class="file-name">${info.file}</div>
          <div class="progress-bar">
            <div class="progress-fill" style="width: 0%"></div>
          </div>
        `;
        progressContainer.appendChild(fileDiv);
        fileProgressBars[info.file] = fileDiv.querySelector('.progress-fill');
      }
      
      // Update progress bar
      fileProgressBars[info.file].style.width = `${info.progress}%`;
      
      const mb = (info.loaded / 1024 / 1024).toFixed(2);
      const totalMb = (info.total / 1024 / 1024).toFixed(2);
      statusDiv.textContent = `${info.file}: ${mb}/${totalMb} MB`;
    }
    
    if (info.status === 'ready') {
      statusDiv.textContent = 'Model ready!';
    }
  }
});

Custom Configuration

Override the model's default configuration:

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('text-generation', 'model-id', {
  config: {
    max_length: 512,
    temperature: 0.8,
    // ... other config options
  }
});

Use cases:

  • Override default generation parameters
  • Adjust model-specific settings
  • Test different configurations without modifying model files

Model Loading Options

Cache Directory

Specify where to cache downloaded models:

// Node.js: Custom cache location
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  cache_dir: './my-custom-cache'
});

Default behavior:

  • If not specified, uses env.cacheDir (default: ./.cache)
  • Only applies when env.useFSCache = true (Node.js)
  • Browser cache uses Cache API (configured via env.cacheKey)

Local Files Only

Prevent any network requests:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  local_files_only: true
});

Use cases:

  • Offline applications
  • Air-gapped environments
  • Testing with pre-downloaded models
  • Production deployments with bundled models

Important:

  • Model must already be cached or available locally
  • Throws error if model not found locally
  • Requires env.allowLocalModels = true
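Since local_files_only throws when the model is missing, an application that prefers local files but can tolerate a download may want a fallback. A minimal sketch, assuming the caller passes in the library's pipeline() function; loadPreferLocal is a hypothetical name.

```javascript
// Hypothetical helper: tries the local copy first and only allows network
// access if that fails. `loadPipeline` is expected to be the library's
// pipeline() function, passed in so this helper has no hard dependency.
async function loadPreferLocal(loadPipeline, task, model, options = {}) {
  try {
    return await loadPipeline(task, model, { ...options, local_files_only: true });
  } catch (localError) {
    console.warn(`Local load failed (${localError.message}); retrying with network access.`);
    return loadPipeline(task, model, { ...options, local_files_only: false });
  }
}
```

Usage would look like loadPreferLocal(pipeline, 'sentiment-analysis', 'model-id'). The catch is deliberately broad, so any local loading error triggers the retry, not only a missing file. In air-gapped deployments the fallback should be left out.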

Model Revision

Specify a specific model version (git branch, tag, or commit):

// Use a tag
const pinned = await pipeline('sentiment-analysis', 'model-id', {
  revision: 'v1.0.0'  // Use specific version
});

// Or use a branch
const fromBranch = await pipeline('sentiment-analysis', 'model-id', {
  revision: 'experimental'
});

// Or use a commit hash
const fromCommit = await pipeline('sentiment-analysis', 'model-id', {
  revision: 'abc123def456'
});

Default: 'main' (latest version)

Use cases:

  • Pin to stable release for production
  • Test experimental features
  • Reproduce results with specific model version
  • Work with models under development

Important:

  • Only applies to remote models (Hugging Face Hub)
  • Ignored for local file paths
  • Each revision is cached separately

Model Subfolder

Specify the subfolder within the model repository:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  subfolder: 'onnx'  // Default: 'onnx'
});

Default: 'onnx'

Use cases:

  • Custom model repository structure
  • Multiple model variants in same repo
  • Organizational preferences

Model File Name

Specify a custom model file name (without .onnx extension):

const pipe = await pipeline('text-generation', 'model-id', {
  model_file_name: 'decoder_model_merged'
});
// Loads: decoder_model_merged.onnx

Use cases:

  • Models with non-standard file names
  • Select specific model variant

Note: Currently only valid for encoder-only or decoder-only models. It does not apply to encoder-decoder models, which load separate encoder and decoder files.

Device and Performance Options

Device Selection

Choose where to run the model:

// Run on CPU via WebAssembly (the default in browsers)
const wasmPipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'wasm'
});

// Run on GPU (WebGPU)
const gpuPipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'webgpu'
});

Common devices:

  • 'wasm' - WebAssembly (CPU, most compatible)
  • 'webgpu' - WebGPU (GPU, faster in browsers)
  • 'cpu' - CPU
  • 'gpu' - Auto-detect GPU
  • 'cuda' - NVIDIA CUDA (Node.js with GPU)

See the full list in the devices.js source.

Per-component device selection:

For models with multiple components (encoder-decoder, vision-encoder-decoder, etc.):

const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
  device: {
    // Keys are the model file names in the repository (without .onnx)
    encoder_model: 'webgpu',        // Run encoder on GPU
    decoder_model_merged: 'wasm'    // Run decoder on CPU
  }
});

WebGPU Requirements:

  • Chrome/Edge 113+
  • Enable chrome://flags/#enable-unsafe-webgpu (if needed)
  • Adequate GPU memory
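Given these requirements, it is usually safer to detect WebGPU at runtime than to hard-code device: 'webgpu'. A minimal sketch using the standard navigator.gpu.requestAdapter() call; pickDevice is a hypothetical helper, and the navigator object is passed as a parameter only to make the function easy to test.

```javascript
// Hypothetical helper: returns 'webgpu' when a WebGPU adapter is available,
// otherwise falls back to 'wasm'. `nav` defaults to the global navigator.
async function pickDevice(nav = globalThis.navigator) {
  // No WebGPU entry point at all (older browsers, most non-browser runtimes).
  if (!nav || !nav.gpu) return 'wasm';

  try {
    // requestAdapter() resolves to null when no suitable GPU is found.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}
```

The result can be passed straight through, for example as { device: await pickDevice() }.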

Data Type (Quantization)

Control model precision and size:

// Full precision (largest, most accurate)
const fp32Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'fp32'
});

// Half precision (balanced)
const fp16Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'fp16'
});

// 8-bit quantization (smaller, faster)
const q8Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q8'
});

// 4-bit quantization (smallest, fastest)
const q4Pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q4'
});

Common data types:

  • 'fp32' - 32-bit floating point (full precision)
  • 'fp16' - 16-bit floating point (half precision)
  • 'q8' - 8-bit quantized (good balance)
  • 'q4' - 4-bit quantized (maximum compression)
  • 'int8' - 8-bit integer
  • 'uint8' - 8-bit unsigned integer

See the full list in the dtypes.js source.

Per-component data type:

const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
  dtype: {
    // Keys are the model file names in the repository (without .onnx)
    encoder_model: 'fp32',        // Encoder at full precision
    decoder_model_merged: 'q8'    // Decoder quantized
  }
});

Trade-offs:

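| Data Type | Model Size | Speed   | Accuracy   | Use Case                     |
|-----------|------------|---------|------------|------------------------------|
| fp32      | Largest    | Slowest | Highest    | Research, maximum quality    |
| fp16      | Medium     | Medium  | High       | Production, GPU inference    |
| q8        | Small      | Fast    | Good       | Production, CPU inference    |
| q4        | Smallest   | Fastest | Acceptable | Edge devices, real-time apps |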
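The size column can be made concrete with a back-of-the-envelope estimate: parameter count times bits per weight. The sketch below is an approximation only. It assumes every weight is stored at the chosen precision and ignores quantization metadata (scales, zero-points) and any layers left unquantized; estimateModelSizeMB is a hypothetical helper.

```javascript
// Approximate bits stored per weight for each dtype. Quantized formats also
// store scales and zero-points, which this estimate ignores.
const BITS_PER_WEIGHT = { fp32: 32, fp16: 16, q8: 8, int8: 8, uint8: 8, q4: 4 };

// Hypothetical helper: rough download size in MiB for a given parameter count.
function estimateModelSizeMB(parameterCount, dtype) {
  const bits = BITS_PER_WEIGHT[dtype];
  if (bits === undefined) throw new Error(`Unknown dtype: ${dtype}`);
  return (parameterCount * bits) / 8 / 1024 / 1024;
}
```

For a 100-million-parameter model this gives roughly 381 MB at fp32, 191 MB at fp16, 95 MB at q8, and 48 MB at q4. Actual file sizes on the Hub will differ somewhat.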

External Data Format

For models >= 2GB, ONNX uses external data format:

// Load a single external data file (model.onnx_data)
const pipe = await pipeline('text-generation', 'large-model-id', {
  use_external_data_format: true
});

// Specify number of external data chunks
const chunkedPipe = await pipeline('text-generation', 'large-model-id', {
  use_external_data_format: 5  // Load 5 chunks (model.onnx_data, then model.onnx_data_1 to _4)
});

How it works:

  • Models >= 2GB split weights into separate files
  • Main file: model.onnx (structure only)
  • Data files: model.onnx_data for the first chunk, then model.onnx_data_1, model.onnx_data_2, etc.

Accepted values:

  • false - No external data (models < 2GB)
  • true - Load one external data file (model.onnx_data)
  • number - Load this many external data chunks

Maximum chunks: 100 (defined by MAX_EXTERNAL_DATA_CHUNKS)

Per-component external data:

const pipe = await pipeline('text2text-generation', 'large-model-id', {
  use_external_data_format: {
    // Keys are the model file names in the repository (without .onnx)
    encoder_model: true,
    decoder_model_merged: 3  // Decoder has 3 external data chunks
  }
});
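The three accepted shapes (boolean, number, per-component record) all reduce to a chunk count for each model file. The sketch below illustrates that normalization under two assumptions: true counts as one chunk, and a component missing from the record gets no external data. resolveChunkCount is a hypothetical helper, not a library function.

```javascript
const MAX_EXTERNAL_DATA_CHUNKS = 100; // limit stated in this section

// Hypothetical helper: normalizes a use_external_data_format value to a chunk
// count for one model file. Assumes false -> 0 chunks and true -> 1 chunk.
function resolveChunkCount(value, fileName) {
  let setting = value ?? false;

  // Per-component record: look up this file, defaulting to no external data.
  if (typeof setting === 'object') {
    setting = Object.prototype.hasOwnProperty.call(setting, fileName)
      ? setting[fileName]
      : false;
  }

  const chunks = Number(setting);
  if (!Number.isInteger(chunks) || chunks < 0) {
    throw new TypeError(`Invalid use_external_data_format value: ${setting}`);
  }
  if (chunks > MAX_EXTERNAL_DATA_CHUNKS) {
    throw new RangeError(`Chunk count ${chunks} exceeds ${MAX_EXTERNAL_DATA_CHUNKS}`);
  }
  return chunks;
}
```

Running a check like this before calling pipeline() surfaces a mistyped chunk count early, before any download starts.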

Session Options

Advanced ONNX Runtime configuration:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  session_options: {
    executionProviders: ['webgpu', 'wasm'],
    graphOptimizationLevel: 'all',
    enableCpuMemArena: true,
    enableMemPattern: true,
    executionMode: 'sequential',
    logSeverityLevel: 2,
    logVerbosityLevel: 0
  }
});

Common session options:

| Option                 | Description                                                  | Default      |
|------------------------|--------------------------------------------------------------|--------------|
| executionProviders     | Ordered list of execution providers                          | ['wasm']     |
| graphOptimizationLevel | Graph optimization: 'disabled', 'basic', 'extended', 'all'   | 'all'        |
| enableCpuMemArena      | Enable CPU memory arena for faster memory allocation         | true         |
| enableMemPattern       | Enable memory pattern optimization                           | true         |
| executionMode          | 'sequential' or 'parallel'                                   | 'sequential' |
| logSeverityLevel       | 0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal               | 2            |
| freeDimensionOverrides | Override dynamic dimensions (e.g., { batch_size: 1 })        | -            |

Use cases:

  • Fine-tune performance for specific hardware
  • Debug model execution issues
  • Override dynamic shapes
  • Control memory usage
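Because session_options is passed through as a plain object, a typo in an enumerated value is easy to miss. The sketch below merges overrides onto the defaults listed in this section and rejects values outside the documented ranges. buildSessionOptions is a hypothetical helper, and the validation rules are taken only from the table above, not from ONNX Runtime itself.

```javascript
// Allowed values taken from the session options table in this section.
const GRAPH_OPTIMIZATION_LEVELS = ['disabled', 'basic', 'extended', 'all'];
const EXECUTION_MODES = ['sequential', 'parallel'];

// Hypothetical helper: merges overrides onto the documented defaults and
// rejects values the table does not list.
function buildSessionOptions(overrides = {}) {
  const options = {
    graphOptimizationLevel: 'all',
    executionMode: 'sequential',
    logSeverityLevel: 2,
    ...overrides,
  };

  if (!GRAPH_OPTIMIZATION_LEVELS.includes(options.graphOptimizationLevel)) {
    throw new TypeError(`Unknown graphOptimizationLevel: ${options.graphOptimizationLevel}`);
  }
  if (!EXECUTION_MODES.includes(options.executionMode)) {
    throw new TypeError(`Unknown executionMode: ${options.executionMode}`);
  }
  const level = options.logSeverityLevel;
  if (!Number.isInteger(level) || level < 0 || level > 4) {
    throw new RangeError(`logSeverityLevel must be an integer from 0 to 4, got ${level}`);
  }
  return options;
}
```

For debugging, buildSessionOptions({ logSeverityLevel: 0, graphOptimizationLevel: 'disabled' }) yields verbose logs with graph optimizations turned off, and the result can be passed as session_options.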

Common Configuration Patterns

Development

Fast iteration with progress tracking:

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('sentiment-analysis', null, {
  progress_callback: (info) => {
    if (info.status === 'progress') {
      console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
    }
  }
});

Production (GPU)

Use WebGPU with fp16 for better performance:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'webgpu',
  dtype: 'fp16'
});

Production (CPU)

Use quantization for smaller size and faster CPU inference:

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q8'  // or 'q4' for even smaller
});

Offline/Local

Prevent network requests, use only local models:

import { pipeline, env } from '@huggingface/transformers';

env.allowLocalModels = true;
env.localModelPath = './models/';

const pipe = await pipeline('sentiment-analysis', 'model-id', {
  local_files_only: true
});

Per-Component Settings

For encoder-decoder models, configure each component separately:

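const pipe = await pipeline('automatic-speech-recognition', 'model-id', {
  // Keys are the model file names in the repository (without .onnx)
  device: {
    encoder_model: 'webgpu',
    decoder_model_merged: 'wasm'
  },
  dtype: {
    encoder_model: 'fp16',
    decoder_model_merged: 'q8'
  }
});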
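The single-setting patterns in this section can be collected into a small preset lookup so that application code selects a deployment profile by name. This is a sketch only. The preset names and the optionsFor helper are hypothetical, while the option values are the ones shown in the patterns above.

```javascript
// Presets mirroring the patterns in this section. The preset names are
// hypothetical; the option values come from the examples above.
const PRESETS = {
  'production-gpu': { device: 'webgpu', dtype: 'fp16' },
  'production-cpu': { dtype: 'q8' },
  offline: { local_files_only: true },
};

// Returns a fresh options object for the named preset, with optional overrides
// applied last so callers can adjust individual fields.
function optionsFor(preset, overrides = {}) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, preset)) {
    throw new Error(`Unknown preset: ${preset}`);
  }
  return { ...PRESETS[preset], ...overrides };
}
```

A call such as optionsFor('production-cpu', { dtype: 'q4' }) returns { dtype: 'q4' }, which can then be passed as the third argument to pipeline().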

Best Practices

  1. Progress Callbacks: Use progress_callback for large models to show download progress
  2. Quantization: Use q8 or q4 for CPU inference to reduce size and improve speed
  3. Device Selection: Use webgpu for better performance when available
  4. Offline-First: Use local_files_only: true in production to avoid runtime downloads
  5. Version Pinning: Use revision to pin model versions for reproducible deployments
  6. Memory Management: Always dispose pipelines with pipe.dispose() when done
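For the memory management point, a try/finally wrapper is one way to ensure dispose() runs even when inference throws. A minimal sketch, assuming the caller passes in the library's pipeline() function; withPipeline is a hypothetical helper.

```javascript
// Hypothetical helper: loads a pipeline, runs `fn` with it, and always calls
// dispose() afterwards, even if `fn` throws. `loadPipeline` is expected to be
// the library's pipeline() function.
async function withPipeline(loadPipeline, task, model, options, fn) {
  const pipe = await loadPipeline(task, model, options);
  try {
    return await fn(pipe);
  } finally {
    await pipe.dispose();
  }
}
```

A call would look like withPipeline(pipeline, 'sentiment-analysis', 'model-id', { dtype: 'q8' }, (pipe) => pipe('I love this!')). This suits one-off jobs. A long-lived server or page would normally keep the pipeline loaded and dispose of it on shutdown instead.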

This document covers all available options for the pipeline() function. For environment-level configuration (remote hosts, global cache settings, WASM paths), see the Configuration Reference.