75 lines
4.9 KiB
Markdown
75 lines
4.9 KiB
Markdown
# Getting better over time — capture new gotchas + personalize (without corrupting the skill)
|
|
|
|
This skill is a static reference, so it does **not** evolve on its own. But every real run teaches
|
|
something — a new platform quirk, a training bug not in the catalog, or the user's own setup. This file
|
|
is the protocol for **sedimenting that knowledge in the right place, at the right bar, without silently
|
|
rewriting the skill**. Apply it whenever a run surfaces something the catalog did not already cover.
|
|
|
|
To jump: `grep -in '<keyword>' references/self-improvement.md`.
|
|
|
|
## Table of contents
|
|
1. The bar — what qualifies as a keepable gotcha
|
|
2. Route the learning — user memory vs the catalog vs an upstream PR
|
|
3. Propose, don't auto-edit
|
|
4. First-run personalization — capture the user's setup into memory
|
|
5. Freshness — platform facts rot; stamp and re-verify
|
|
|
|
## 1. The bar — do NOT enshrine a one-off
|
|
|
|
A surprising failure is a **hypothesis, not a gotcha** (principle #3; **REQUIRED:**
|
|
`verifying-dl-experiments`). Sediment a NEW gotcha ONLY when all three hold:
|
|
|
|
- **Root-caused** — the mechanism is understood, not just "it worked after I retried."
|
|
- **Reproduced or clearly mechanistic** — not a single flaky incident (a transient network blip is not a gotcha).
|
|
- **Generalizable** — another user on this platform / training setup would hit it too.
|
|
|
|
If it fails the bar it is a *project note* (→ §4 memory), not a catalog entry. Enshrining unverified
|
|
one-offs is exactly the catalog rot this bar exists to prevent.
|
|
|
|
## 2. Route the learning
|
|
|
|
| What was learned | Where it goes | Form |
|
|
|---|---|---|
|
|
| **User/personal/host-specific** — this account's quirk, the user's preference, the usual GPU plan, "on MY box X is true" | **the host's `memory/` system** (host-specific, personal, may be ephemeral) | a `reference` / `project` / `feedback` fact, one per file, deduped |
|
|
| **A project-level fact or recurring error the user keeps hitting on THEIR project** — a config quirk, "always run X first", a path that must be Y, an env that must be activated | a **project instructions file in the user's repo** — `CLAUDE.md` / `AGENTS.md` / `.cursorrules` (persists cross-session AND cross-tool AND for collaborators, unlike host memory) | a short imperative rule; **propose, don't auto-write** (§3) |
|
|
| **Generalizable platform gotcha** | `profiles/<platform>.md` §7 (platform-pinned) or `references/gotchas_universal.md` (cross-platform) | `symptom → root cause → fix` + a source URL |
|
|
| **Generalizable training-debug gotcha** | `references/training/<topic>.md` | same form |
|
|
| **A correction** (a fact here is now wrong/stale) | edit that file; re-stamp its `verified <month>` | note old → new + URL |
|
|
|
|
Because this skill is open-source, a generalizable addition/correction is also a candidate for an
|
|
**upstream PR** — offer to open one so every user benefits, not just this install.
|
|
|
|
## 3. Propose, don't auto-edit
|
|
|
|
**NEVER silently rewrite a skill file from a single run** — a wrong "fix" or a broken structure is worse
|
|
than a missing entry. Instead:
|
|
|
|
- Draft the entry (`symptom → root cause → fix`, with its source) and **show it to the user for approval.**
|
|
- For an out-of-scope or larger change, spin it off (the host's task / PR mechanism) rather than bloating
|
|
the current run.
|
|
- Apply only after the user okays it; then re-run the skill's own checks — cross-refs resolve, **no secret
|
|
value written**, structure + TOC intact.
|
|
|
|
## 4. First-run personalization → memory
|
|
|
|
The first time this skill runs for a user, capture their **actual setup** into memory so later runs are
|
|
pre-parameterized instead of re-asked:
|
|
|
|
- which platform(s) they rent, and the per-platform §8 SCRIPT OVERRIDES that worked (paths, proxy hook, cred location);
|
|
- the project repo path / training entrypoint / config layout;
|
|
- the tracker entity (wandb / trackio) and **where the key lives** (its env-var name or file path — never the value);
|
|
- the usual GPU plan + disk budget.
|
|
|
|
Store as a `project` / `reference` memory. Record only the credential's *name or path*,
|
|
never the secret itself. Next session the profile overrides + `run_one` params come pre-filled.
|
|
|
|
## 5. Freshness — this skill has a shelf life
|
|
|
|
Platform prices, billing verbs, and limits **change**. Every platform fact is annotated `verified
|
|
<month>` at authoring time. Before betting **money or data** on a teardown/billing fact — the
|
|
irreversible ones (`terminate` / `destroy` / release) — **re-verify it against the platform's current
|
|
docs in the same session.** If a fact is stale, fix it (the §2 correction row) and re-stamp the date. A
|
|
quarterly re-verification of each profile's §5 TEARDOWN/BILLING section keeps the highest-stakes facts
|
|
honest — schedulable via the host's `/schedule`. Run `scripts/check_staleness.py` to list every `verified`
|
|
stamp older than N months (a mechanical reminder of WHAT to re-check — it does not verify the fact itself).
|