# YouTube reference

Subcommands: `youtube-search-api`, `youtube-video-api`, `youtube-channel-api`, `youtube-transcript-api`. 10 credits each.

`*-search-api` is keyword-driven; the others target a specific video, channel, or transcript. Most workflows chain search → video → transcript.

---

## youtube-search-api

```bash
hasdata youtube-search-api --q "QUERY" [--sort-by relevance|date|views|rating|popularity] \
  [--length under4|between420|plus20] [--date hour|today|week|month|year] \
  [--video-type video|shorts|channel|playlist|movie] \
  [--gl us] [--hl en] [--device-type desktop|mobile] \
  [--pagination-token TOKEN] --raw | jq .
```

Common flags:
- `--q TEXT` (required)
- `--sort-by relevance|date|views|rating|popularity`
- `--date hour|today|week|month|year` — upload-recency window
- `--length under4|between420|plus20` — duration bucket (`<4m`, `4–20m`, `>20m`)
- `--video-type video|shorts|channel|playlist|movie`
- `--filters hd,k4,hdr,subtitles,cc,d3,d360,vr180,live,bought,location` — feature flags ANDed
- `--gl`, `--hl` — country / language
- `--pagination-token` — copy from previous response's `pagination.nextPageToken`
- `--sp` — raw YouTube `sp=` filter token (overrides `sort-by` / `date` / `video-type` / `length` / `filters`)

Top-level response keys: `videoResults`, `shortsResults`, `channelResults`, `playlistResults`, `adsResults`, `sponsoredResults`, `searchInformation`, `pagination`.

Per-video result: `videoId`, `title`, `link`, `channel`, `description`, `length`, `views`, `viewsOriginal`, `publishedDate`, `thumbnail`, `positionOnPage`.

```bash
# Latest videos for a topic
hasdata youtube-search-api --q "$Q" --sort-by date --date week --raw \
  | jq -c '.videoResults[] | {title, channel: .channel.name, views, publishedDate, link}'
```

## youtube-video-api

```bash
hasdata youtube-video-api --v-param VIDEO_ID [--gl us] [--hl en] --raw | jq .
```

- `--v-param` (required) — 11-char video ID (the `v=` part of the watch URL)
- `--device-type desktop|mobile`
- `--gl`, `--hl`

Top-level fields: `videoId`, `title`, `description`, `channel`, `views`, `extractedViews`, `likes`, `extractedLikes`, `lengthSeconds`, `publishedDate`, `keywords[]`, `captions`, `socialLinks`, `music`, `category`, `thumbnail`, `isFamilySafe`, `isUnlisted`, `relatedVideos[]`, `relatedShorts[]`, `endScreenVideos[]`.

```bash
# Quick stats
hasdata youtube-video-api --v-param "$VID" --raw \
  | jq '{title, views: .extractedViews, likes: .extractedLikes, length: .lengthSeconds, published: .publishedDate, channel: .channel.name}'
```

## youtube-channel-api

```bash
hasdata youtube-channel-api --channel-id "@HANDLE_OR_UCID" \
  [--tab featured|videos|shorts|streams|playlists|posts|community|podcasts|releases|about|store] \
  [--gl us] [--hl en] [--pagination-token TOKEN] --raw | jq .
```

- `--channel-id` (required) — `@handle`, `UC…` canonical ID, or legacy `/c/<custom>` / `/user/<name>` URL slug
- `--tab` — which tab to scrape; default `featured` (Home page)
- `--pagination-token` — for tabs that paginate (`videos`, `shorts`, etc.)

Top-level response: `channelInfo`, `featuredVideo`, `sections[]`.

`channelInfo` carries: `name`, `handle`, `channelId`, `channelUrl`, `avatar`, `banner`, `description`, `subscribers`, `extractedSubscribers`, `videosCount`, `extractedVideosCount`, `keywords[]`, `availableTabs[]`, `verified`, `websiteUrl`, `rssUrl`, `isFamilySafe`.

```bash
# All uploads (paginated)
hasdata youtube-channel-api --channel-id "@MrBeast" --tab videos --raw \
  | jq -c '.sections[].items[]? | {title, videoId, views, publishedDate}'
```

## youtube-transcript-api

```bash
hasdata youtube-transcript-api --v-param VIDEO_ID [--language-code en] [--type asr] --raw | jq .
```

- `--v-param` (required) — 11-char video ID
- `--language-code` — BCP-47 / YouTube code (`en`, `de`, `en-US`, `pt-BR`); must match a track the video actually has
- `--type asr` — fetch the auto-generated speech-recognition track (omit for human-authored)

Response: `transcript[]` and `availableTranscripts[]`.

Each `transcript[]` entry: `startMs`, `endMs`, `snippet`, `startTimeText`. Join `snippet`s to reconstruct the full text.

```bash
# Flatten to plain text
hasdata youtube-transcript-api --v-param "$VID" --raw \
  | jq -r '.transcript[].snippet' | tr '\n' ' '
```

---

## Non-obvious use cases

- **"What is this video actually about?"** — `youtube-transcript-api --v-param X --raw | jq -r '.transcript[].snippet'` → feed to an LLM for a real summary, not a thumbnail / title guess.
- **Cite YouTube content in a brief** — transcript + timestamps lets you quote with `(02:14)` accuracy. `--type asr` fills the gap when the creator never uploaded captions.
- **Channel growth audit** — `youtube-channel-api --channel-id @X --tab videos` paginated; `jq '.sections[].items[] | {publishedDate, views: .extractedViews}'` gives a velocity-over-time table.
- **Trend discovery** — `youtube-search-api --q "TOPIC" --sort-by views --date month` returns the highest-viewed videos in the last month — a leading indicator for cultural trends well before they hit Google News.
- **Competitor content map** — `youtube-channel-api --channel-id @competitor --tab videos --raw | jq -r '.sections[].items[].title'` enumerates every published title, useful for content-gap analysis.
- **Brand-safety scan** — `youtube-search-api --q "BRAND" --sort-by date --date week --raw | jq '.videoResults[] | select(.title | test("BRAND"; "i"))'` catches new mentions before they go viral.
- **Influencer due diligence** — combine `youtube-channel-api` for subscriber/video counts with `youtube-video-api` on their top videos (engagement rate ≈ `likes / views`).
- **"Find the moment they said X"** — `youtube-transcript-api` then `jq -r '.transcript[] | select(.snippet | test("X"; "i")) | "\(.startTimeText): \(.snippet)"'` returns timestamps of every mention.
- **Music-licensing lookup** — `youtube-video-api --v-param X --raw | jq .music` returns identified tracks (artist, title) when YouTube's Content ID matched.
- **Translate / re-localize a video** — pull the English transcript with `youtube-transcript-api`, translate via LLM, regenerate subtitles. Cheaper than re-transcribing audio.
- **Build a podcast / RSS feed for a channel** — `youtube-channel-api --raw | jq -r .channelInfo.rssUrl` returns the official RSS URL; subscribe in any podcast app.
- **Detect deleted videos** — `youtube-video-api --v-param X` returns an error for removed videos; useful for catching takedowns in archival pipelines.
- **Bulk research → transcript chain** — `youtube-search-api --q "$Q" --raw | jq -r '.videoResults[].videoId' | xargs -I{} hasdata youtube-transcript-api --v-param {} --raw` builds a research corpus from a topic search.
- **Shorts-only / long-form-only feeds** — `--video-type shorts` vs `--length plus20` to bias toward one format.
- **Live-stream discovery** — `--filters live` returns only currently live broadcasts; pair with `--sort-by date` for fresh streams.

## Pipelines

```bash
# Search → top 5 video transcripts as a corpus
hasdata youtube-search-api --q "$Q" --sort-by views --raw \
  | jq -r '.videoResults[:5][].videoId' \
  | while read -r vid; do
      echo "=== $vid ==="
      hasdata youtube-transcript-api --v-param "$vid" --raw \
        | jq -r '.transcript[].snippet'
    done > corpus.txt

# Channel → CSV of all videos
hasdata youtube-channel-api --channel-id "@$HANDLE" --tab videos --raw \
  | jq -r '.sections[].items[]? | [.videoId, .title, (.extractedViews // 0), .publishedDate] | @csv' \
  > channel_videos.csv
```

## Gotchas

- **`--v-param` is exactly 11 chars** — not the full watch URL. Strip the `v=` value or use `jq -r '.videoResults[0].videoId'` from a prior search.
- **`--language-code` must exist on the video.** Pass one of the codes listed in `availableTranscripts[]`, or omit for the default track.
- **`--type asr` is needed when no human-authored captions exist.** Without it the API returns the default human track and errors if there isn't one.
- **`@handle` is preferred over `UC…`** — easier to read, same result. Legacy `/c/` and `/user/` slugs also resolve.
- **`pagination.nextPageToken` is opaque** — pass it back via `--pagination-token` verbatim; don't try to decode.
- **`views` vs `extractedViews`** — `views` is the formatted string (`"1.2M views"`), `extractedViews` is the integer. Use the integer for math.