playbook/antigravity-awesome-skills/skills/anti-sycophancy
ci[bot] 772a1da63c 📦 deps(thirdparty): update snapshots 2026-06-06 16:01:57 +00:00
..
README.md 📦 deps(thirdparty): update snapshots 2026-06-06 16:01:57 +00:00
SKILL.md 📦 deps(thirdparty): update snapshots 2026-06-06 16:01:57 +00:00

README.md

Anti-Sycophancy

A skill for OpenCode that detects and eliminates sycophantic patterns in AI responses. Sycophancy is when an AI prioritises affirming a user's stated or implied beliefs over epistemic integrity — an RLHF structural artifact, not an attitude problem.

Installation

mkdir -p ~/.config/opencode/skills
cp -r anti-sycophancy ~/.config/opencode/skills/

Usage

Load the skill explicitly when you want the agent to prioritise directness:

/skill anti-sycophancy

Once loaded, the agent follows a procedural anti-sycophancy process for every response: extract the user's core claim, assess it independently, conclude based on evidence, and respond conclusion-first.

You can also reference it inline:

With the anti-sycophancy skill active, review this architecture:

Patterns Covered

Epistemic Sycophancy (what you say)

# Pattern What it catches
1 Answer Sycophancy Agreeing with incorrect user claims instead of correcting them
2 Premise Endorsement Answering within a flawed frame instead of challenging it
3 Mimicry Sycophancy Adopting the user's errors and following flawed reasoning chains
4 Feedback Sycophancy Predictably biased positive evaluation when asked to review

Soft Sycophancy (how you say it)

# Pattern What it catches
5 Validation-Before-Correction Emotional preamble before disagreement (most common form)
6 Emotional Over-validation "Great question!", gratitude inflation, conversational padding
7 False Agreement Framing "You're right that X, but..." where X isn't right
8 Hedged Disagreement "You might also consider..." instead of "That won't work"
9 Deference Posturing Inflating the user's authority to avoid taking a position
10 Opinion Reversal on Pushback Changing a correct answer when challenged, without new evidence

Social Sycophancy (who the user is)

# Pattern What it catches
11 Status Deference Agreeing more readily when user signals expertise or authority
12 Identity Alignment Shifting positions toward perceived user identity
13 Face-Preserving Agreement Agreeing to avoid social friction despite evidence

How It Works

No pattern matching. A single procedural discipline applied to every response:

  1. Extract the user's core claim from their framing. State it stripped of premises.
  2. Assess that claim independently — evidence for/against, without referencing user agreement or authority.
  3. Conclude based solely on step 2.
  4. Respond with the conclusion first, evidence second.

When the user disagrees:

  • New evidence → update position, state what changed
  • Repeated opinion → restate position with evidence

References

  • Sharma, M., Tong, M., Korbak, T., et al. (2023). Towards Understanding Sycophancy in Language Models. ICLR 2024. arXiv:2310.13548.
  • Perez, E., et al. (2022). Discovering Language Model Behaviors with Model-Written Evaluations. ACL 2023 Findings.
  • Dubois, M., Ududec, C., Summerfield, C., & Luettgau, L. (2026). Ask Don't Tell: Reducing Sycophancy in Large Language Models. arXiv:2602.23971.
  • Gligorić, K., et al. (2026). SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy. arXiv:2604.02423.
  • Mohsin, M. A., et al. (2026). Pressure, What Pressure? Sycophancy Disentanglement via Reward Decomposition. arXiv:2604.05279.
  • Feng, Z., et al. (2026). Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy. arXiv:2603.16643.
  • Cheng, M., Yu, S., Lee, C., Khadpe, P., Ibrahim, L., & Jurafsky, D. (2026). Social Sycophancy: A Broader Understanding of LLM Sycophancy. Proc. ICLR 2026. arXiv:2505.13995.
  • Ibrahim, L., Hafner, F. S., & Rocher, L. (2026). Training Language Models to be Warm Can Reduce Accuracy and Increase Sycophancy. Nature, 652, 11591165. DOI: 10.1038/s41586-026-10410-0.
  • Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2026). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. Science, 391(6792). DOI: 10.1126/science.aec8352.
  • "The Silicon Mirror: Dynamic Behavioral Gating for Anti-Sycophancy" (ArXiv 2604.00478, 2026).

License

MIT