Extraction Techniques
Technical methods for extracting signals from conversation history, including regex patterns, heuristics, and context analysis.
Signal Detection Patterns
Success Signals
Explicit Praise Detection
Regex patterns:
# High confidence praise
\b(perfect|excellent|exactly|amazing|brilliant|outstanding|superb)\b!*
# Medium confidence praise
\b(great|good|nice|wonderful|fantastic)\b
# Superlatives
\b(best|ideal|optimal|precisely)\s+(what|how|where)\s+
# Enthusiastic patterns
!{2,}|🎉|👍|✅
Heuristic rules:
- Check for exclamation marks: 1 = medium confidence, 2+ = high confidence
- Count positive adjectives in message: 2+ = high enthusiasm
- Check for "exactly what I needed" or similar fulfillment language
- Verify no contradictory signals in same message (e.g., "good but...")
Context checks:
If praise detected:
- Check previous agent message: What did agent do?
- Check next user message: Did user continue or correct?
- Score based on continuation vs. correction
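The patterns and heuristic rules above can be combined into one scoring function. The following is a minimal sketch, not a definitive implementation: `scorePraise`, the `HEDGED` pattern, and the choice to downgrade hedged praise to low confidence are assumptions made for illustration. It covers only the single-message rules, not the context checks on neighboring messages.

```javascript
// Hypothetical helper: scorePraise is not defined elsewhere in this document.
const HIGH_PRAISE = /\b(perfect|excellent|exactly|amazing|brilliant|outstanding|superb)\b/i;
const MEDIUM_PRAISE = /\b(great|good|nice|wonderful|fantastic)\b/i;
// Assumed markers of a contradictory signal in the same message ("good but...")
const HEDGED = /\b(but|however|except|although)\b/i;

function scorePraise(text) {
  const exclamations = (text.match(/!/g) || []).length;
  let confidence = null;
  if (HIGH_PRAISE.test(text) || (MEDIUM_PRAISE.test(text) && exclamations >= 2)) {
    confidence = 'high';
  } else if (MEDIUM_PRAISE.test(text)) {
    confidence = 'medium';
  }
  if (confidence === null) return null;
  // Praise followed by a qualifier is downgraded rather than trusted
  if (HEDGED.test(text)) confidence = 'low';
  return { signal: 'praise', confidence };
}
```

Calling `scorePraise('good but the tests fail')` yields low confidence, while a message with no praise word yields `null`.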
Continuation Detection
Patterns:
# Explicit continuation
\bnow\s+(do|apply|use|add|implement)\s+
\bapply\s+this\s+(to|pattern|approach)\s+
\bnext[,\s]+(let's|do|add)
# Extension language
\b(also|additionally|furthermore|moreover)\s+
\bsame\s+(for|with|to)\b
Heuristic rules:
- Check if message references "this", "that", "same" without corrections
- Verify previous agent message exists (continuation requires prior context)
- Check for negation words: "now do X instead" is correction, not continuation
- Look for expansion keywords: "also", "and", "too", "as well"
Context checks:
If continuation suspected:
- Extract referenced prior work (parse "this", "that", "same")
- Check for corrections between reference and current message
- If corrections exist: Not continuation (likely correction signal)
- If no corrections: Continuation (confidence based on explicitness)
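As a rough sketch of how these rules fit together, the function below checks for prior context, rejects messages that contain correction language, and then grades the remaining matches by explicitness. `detectContinuation` and the `CORRECTION` word list are assumptions for illustration; resolving what "this" or "same" refers to is left out.

```javascript
// Hypothetical helper: detectContinuation is an illustrative name.
const CONTINUATION = [
  /\bnow\s+(do|apply|use|add|implement)\s+/i,
  /\bapply\s+this\s+(to|pattern|approach)\s+/i,
  /\bnext[,\s]+(let's|do|add)/i,
  /\bsame\s+(for|with|to)\b/i,
];
const EXTENSION = /\b(also|additionally|furthermore|moreover|too|as\s+well)\b/i;
// Assumed correction markers; "now do X instead" must not count as continuation
const CORRECTION = /\b(instead|rather|actually|wrong|incorrect)\b/i;

function detectContinuation(userText, previousAgentText) {
  // Continuation requires prior context
  if (!previousAgentText) return null;
  // Correction language overrides continuation language
  if (CORRECTION.test(userText)) return null;
  if (CONTINUATION.some((re) => re.test(userText))) {
    return { signal: 'continuation', confidence: 'high' };
  }
  if (EXTENSION.test(userText)) {
    return { signal: 'continuation', confidence: 'medium' };
  }
  return null;
}
```

Explicit phrasing ("now apply this to...") is graded higher than bare extension words such as "also", which often appear in unrelated requests.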
Adoption Detection
Patterns:
Adoption is primarily behavioral rather than linguistic, so no regex applies. Detection requires comparing the agent's suggestions with the user's subsequent actions.
Detection algorithm:
1. Extract agent's suggestions from previous messages
2. Check user's next message or code changes
3. If user implements suggestion without modification:
- Adoption signal (high confidence)
4. If user asks clarifying questions then implements:
- Adoption signal (medium confidence)
5. If user modifies before implementing:
- Partial adoption (low confidence) or correction signal
Code comparison heuristic:
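// Pseudocode for adoption detection.
// extractSuggestions, extractActions, and findMatchingAction are placeholders
// that must be implemented for the target environment.
function detectAdoption(agentMessage, userResponse) {
  const agentSuggestions = extractSuggestions(agentMessage);
  const userActions = extractActions(userResponse);
  let best = null;
  for (const suggestion of agentSuggestions) {
    const match = findMatchingAction(suggestion, userActions);
    if (match && match.similarity > 0.8) {
      // Implemented nearly verbatim: strongest evidence, stop early
      return { signal: 'adoption', confidence: 'high' };
    }
    if (match && match.similarity > 0.5) {
      // Keep scanning in case a later suggestion matches more closely
      best = { signal: 'adoption', confidence: 'medium' };
    }
  }
  return best;
}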
Frustration Signals
Correction Detection
Patterns:
# Explicit negation ("no" must be followed by a comma or whitespace)
\b(no(?=[,\s])|wrong|incorrect|not\s+what|that's\s+not)\b
# Correction language
\b(actually|instead|rather)\s+
\bI\s+meant\s+
\bdo\s+\w+\s+instead\b
# Contradiction markers
\bdon't\s+do\s+\w+
\bnot\s+\w+[,\s]+\w+
Heuristic rules:
- Check for negation words that refer to the agent's previous output
- Look for "instead" patterns: "X instead of Y" where Y was agent's choice
- Check for contradiction: "I said X" where X contradicts recent agent action
- Verify correction vs. iteration: correction references agent error, iteration builds on success
Context checks:
If correction suspected:
- Extract what user is correcting (X → Y)
- Check if agent did X in previous message
- If yes: Correction signal (confidence based on negation strength)
- If no: Possible misunderstanding or false positive
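The context check above can be sketched for the common "X instead of Y" phrasing. This is a simplified illustration under stated assumptions: `detectCorrection` is a hypothetical name, X and Y are assumed to be single words, and "the agent did Y" is approximated by Y appearing in the previous agent message.

```javascript
// Hypothetical helper: detectCorrection is an illustrative name.
function detectCorrection(userText, previousAgentText) {
  // Capture "X instead of Y", optionally preceded by "use"
  const m = userText.match(/\b(?:use\s+)?(\w+)\s+instead\s+of\s+(\w+)/i);
  if (!m) return null;
  const [, wanted, rejected] = m;
  // rejected consists of word characters only, so it is safe inside a RegExp
  const agentDidIt = new RegExp(`\\b${rejected}\\b`, 'i').test(previousAgentText);
  return {
    signal: 'correction',
    from: rejected,
    to: wanted,
    // Low confidence marks a possible misunderstanding or false positive
    confidence: agentDidIt ? 'high' : 'low',
  };
}
```

When the rejected term does not appear in the agent's previous message, the result is kept but marked low confidence so that a later stage can discard it.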
Repetition Detection
Pattern:
This requires multi-message analysis.
Detection algorithm:
1. Extract normalized intent from each user message
2. Build similarity matrix across messages
3. Find clusters of high-similarity messages (>0.7 similarity)
4. If cluster size >= 2 and spans multiple agent responses:
- Repetition signal
5. Check for escalation language ("again", "already told you"):
- High confidence
6. Otherwise: Medium confidence
Normalization steps:
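function normalizeIntent(message) {
  // Remove politeness/filler
  let normalized = message.toLowerCase();
  normalized = normalized.replace(/\b(please|thanks|thank you)\b/g, '');
  // Extract core imperatives (match() returns null when nothing matches)
  const imperatives = normalized.match(/\b(use|do|make|add|implement|fix)\s+\w+/g) || [];
  // Extract prohibitions
  const prohibitions = normalized.match(/\bdon't\s+\w+/g) || [];
  return { imperatives, prohibitions };
}
function calculateSimilarity(intent1, intent2) {
  // One possible measure (an assumption; any score in 0.0 - 1.0 works):
  // Jaccard overlap of the combined imperatives and prohibitions
  const a = new Set([...intent1.imperatives, ...intent1.prohibitions]);
  const b = new Set([...intent2.imperatives, ...intent2.prohibitions]);
  const shared = [...a].filter(x => b.has(x)).length;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}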
Escalation markers:
\b(again|once again|already|I\s+told\s+you|I\s+said|still)\b
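The detection algorithm above can be approximated end to end with a token-overlap similarity. The sketch below is one possible reading, not the reference method: `detectRepetition`, `tokens`, `jaccard`, and the `FILLER` list are hypothetical, the input is assumed to be an array of user messages that are each separated by an agent response, and the first similar pair stands in for full clustering.

```javascript
const ESCALATION = /\b(again|once again|already|I\s+told\s+you|I\s+said|still)\b/i;
// Assumed filler list: politeness and escalation words that do not carry intent
const FILLER = new Set(['please', 'thanks', 'thank', 'you', 'again', 'once',
  'already', 'still', 'i', 'told', 'said']);

function tokens(text) {
  return new Set(
    text.toLowerCase().replace(/[^\w\s']/g, ' ').split(/\s+/)
      .filter((w) => w && !FILLER.has(w))
  );
}

// Jaccard similarity: shared tokens divided by all distinct tokens
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return shared / (a.size + b.size - shared);
}

function detectRepetition(userMessages, threshold = 0.7) {
  const sets = userMessages.map(tokens);
  for (let j = 1; j < sets.length; j++) {
    for (let i = 0; i < j; i++) {
      if (jaccard(sets[i], sets[j]) > threshold) {
        return {
          signal: 'repetition',
          messages: [i, j],
          // Escalation language in the repeated message raises confidence
          confidence: ESCALATION.test(userMessages[j]) ? 'high' : 'medium',
        };
      }
    }
  }
  return null;
}
```

Removing escalation words before comparing keeps "I said use tabs" and "use tabs" from looking dissimilar, while the original text is still checked for those markers when grading confidence.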
Explicit Frustration Detection
Patterns:
# Direct frustration language
\b(frustrat(ing|ed)|annoying|annoyed|confusion|confused)\b
# Problem statements
\b(not\s+working|doesn't\s+work|broken|failing|fails)\b
# Accusatory questions
\bwhy\s+(did|would|do)\s+you\s+
# Exasperation
\bcome\s+on\b|\bseriously\b|\breally\?
Heuristic rules:
- Question marks with negative tone: medium frustration
- Multiple question marks: high frustration
- All caps words: high frustration
- Repetition of negative words: escalating frustration
Tone analysis:
function analyzeTone(message) {
const negativeWords = message.match(/\b(not|no|never|don't|can't|won't)\b/gi); // case-insensitive so "No" and "NOT" count
const negativeWordCount = negativeWords ? negativeWords.length : 0;
const questionMarks = message.match(/\?/g);
const questionCount = questionMarks ? questionMarks.length : 0;
const capsWords = message.match(/\b[A-Z]{2,}\b/g);
const capsCount = capsWords ? capsWords.length : 0;
// Frustration score
const score = (negativeWordCount * 0.3) + (questionCount * 0.2) + (capsCount * 0.5);
if (score > 1.5) return 'high';
if (score > 0.7) return 'medium';
return 'low';
}
Workflow Signals
Sequence Marker Detection
Patterns:
# Ordinal markers
\b(first|second|third|fourth|fifth)\b[,\s]
\b(1st|2nd|3rd|4th|5th)\b[,\s]
\bstep\s+\d+[:\s]
# Temporal sequence
\b(before|after|then|next|finally)\b[,\s]
\bonce\s+\w+[,\s]+(then|do|we)\b
Heuristic rules:
- Count ordinal markers: 2+ = high confidence sequence
- Check for numbered lists (1., 2., 3.)
- Look for temporal connectives in order (first...then...finally)
- Verify sequence is prescriptive (steps to take) not descriptive (events that happened)
List detection:
function detectSequence(message) {
// Check for numbered list
const numberedItems = message.match(/^\d+\.\s+.+$/gm);
if (numberedItems && numberedItems.length >= 2) {
return { signal: 'sequence', confidence: 'high', items: numberedItems };
}
// Check for ordinal markers
const ordinals = message.match(/\b(first|second|third|then|next|finally)\b/gi);
if (ordinals && ordinals.length >= 2) {
return { signal: 'sequence', confidence: 'medium', markers: ordinals };
}
return null;
}
Stage Transition Detection
Patterns:
# Completion + new direction
\b(now\s+that|with\s+that|that's\s+done)\b.+\b(let's|next|moving|time\s+to)\b
# Explicit transitions
\bmoving\s+on\s+to\b
\bnext\s+up[:\s]
\bswitching\s+to\b
Heuristic rules:
- Check for completion language: "done", "finished", "complete", "that's it"
- Check for new direction: "now", "next", "let's", "time to"
- Must have both completion and new direction for high confidence
- If only new direction: context switch, not stage transition
Context checks:
If stage transition suspected:
- Check if previous task mentioned in completion language
- Verify previous task was in progress (not already complete)
- Check if new direction is related (stage) or unrelated (context switch)
- Related = stage transition, unrelated = context switch
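The heuristic rules above reduce to two independent tests whose combination decides the label. The following sketch assumes the hypothetical name `classifyTransition` and slightly widened word lists; it does not attempt the relatedness check, which needs task context.

```javascript
// Hypothetical helper: classifyTransition is an illustrative name.
const COMPLETION = /\b(done|finished|complete|completed|that's\s+it)\b/i;
const NEW_DIRECTION = /\b(now|next|let's|time\s+to|moving\s+on)\b/i;

function classifyTransition(text) {
  const completed = COMPLETION.test(text);
  const redirected = NEW_DIRECTION.test(text);
  // Both completion and new direction are needed for high confidence
  if (completed && redirected) {
    return { signal: 'stage_transition', confidence: 'high' };
  }
  // New direction alone is treated as a context switch
  if (redirected) {
    return { signal: 'context_switch', confidence: 'medium' };
  }
  return null;
}
```

Completion language on its own returns `null`, since a statement like "the build is finished" describes state without announcing a next stage.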
Tool Chain Detection
Pattern:
Requires analyzing agent's tool usage across multiple tasks.
Detection algorithm:
1. Extract tool call sequences from agent messages
2. Group by task (task boundary = user message)
3. Find recurring sequences:
- Use n-gram analysis (n=2 to 5)
- Count frequency of each n-gram
- Filter to sequences with frequency >= 3
4. For each recurring sequence:
- Calculate confidence based on frequency and consistency
- Extract as tool chain pattern
N-gram analysis:
function extractToolChains(agentMessages) {
const sequences = [];
for (const message of agentMessages) {
const tools = extractToolCalls(message); // ['Read', 'Edit', 'Bash']
sequences.push(tools);
}
// Find recurring n-grams
const ngrams = {};
for (const seq of sequences) {
for (let n = 2; n <= Math.min(5, seq.length); n++) {
for (let i = 0; i <= seq.length - n; i++) {
const gram = seq.slice(i, i + n).join(' → ');
ngrams[gram] = (ngrams[gram] || 0) + 1;
}
}
}
// Filter to frequent patterns
const chains = Object.entries(ngrams)
.filter(([gram, count]) => count >= 3)
.map(([gram, count]) => ({
chain: gram,
frequency: count,
confidence: count >= 5 ? 'high' : 'medium' // the filter above guarantees count >= 3
}));
return chains;
}
Request Signals
Prohibition Detection
Patterns:
# Absolute prohibitions
\b(never|don't|do\s+not)\s+
\bavoid\s+(using|doing)\s+
# Explicit constraints
\bno\s+\w+\b
\bwithout\s+\w+\b
Heuristic rules:
- "Never" = high confidence prohibition
- "Don't" + imperative = high confidence
- "Avoid" = medium confidence (softer prohibition)
- "No X" = context-dependent (check if X is an action or noun)
Context checks:
If prohibition suspected:
- Extract prohibited action/item
- Check for exceptions: "don't X unless Y"
- If exception: conditional signal, not absolute prohibition
- If no exception: prohibition (confidence based on strength of negation)
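A minimal sketch of the context checks above, assuming single-sentence input and straight apostrophes. `detectProhibition` and the `PROHIBITION` pattern are hypothetical; the pattern captures the prohibited action and an optional "unless" clause in one pass.

```javascript
// Hypothetical helper: detectProhibition is an illustrative name.
// Groups: 1 = negation word, 2 = prohibited action, 3 = optional exception
const PROHIBITION =
  /\b(never|don't|do\s+not|avoid)\s+(.+?)(?:\s+unless\s+(.+?))?[.!]?$/i;

function detectProhibition(sentence) {
  const m = sentence.trim().match(PROHIBITION);
  if (!m) return null;
  const [, negation, action, exception] = m;
  // "Avoid" is a softer prohibition than "never" or "don't"
  const confidence = /^avoid$/i.test(negation) ? 'medium' : 'high';
  if (exception) {
    // "don't X unless Y" is a conditional signal, not an absolute prohibition
    return { signal: 'conditional', action, exception, confidence };
  }
  return { signal: 'prohibition', action, confidence };
}
```

For example, "Never use eval unless the input is trusted." is classified as conditional with the action `use eval` and the exception `the input is trusted`.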
Requirement Detection
Patterns:
# Modal verbs
\b(must|should|need\s+to|have\s+to|always)\s+
# Imperatives with emphasis
\bmake\s+sure\s+(to\s+)?
\bensure\s+(that\s+)?
\bremember\s+to\s+
Heuristic rules:
- "Must" / "Always" = high confidence requirement
- "Should" = medium confidence requirement
- "Make sure" = medium confidence requirement
- Bare imperative ("Run tests") = context-dependent
Strength scoring:
function classifyRequirement(message) {
if (/\b(must|always|required)\b/i.test(message)) {
return { strength: 'strong', confidence: 'high' };
}
if (/\b(should|need\s+to|make\s+sure)\b/i.test(message)) {
return { strength: 'moderate', confidence: 'medium' };
}
if (/\b(could|might\s+want\s+to|consider)\b/i.test(message)) {
return { strength: 'weak', confidence: 'low' };
}
return null;
}
Preference Detection
Patterns:
# Explicit preference
\bI\s+prefer\s+
\bI'd\s+rather\s+
\bI\s+like\s+\w+\s+(better|more)\b
# Comparative language
\b(better|cleaner|easier|simpler)\s+to\s+
\b\w+\s+over\s+\w+\b
Heuristic rules:
- "I prefer X" = high confidence preference
- "X is better" = medium confidence (could be objective claim)
- "I like X" = low confidence (weak preference)
- Check for comparison: "X over Y" or "X not Y" strengthens signal
Subjectivity detection:
function isSubjective(statement) {
// Check for first-person markers
const firstPerson = /\b(I|my|me)\b/i.test(statement);
// Check for subjective language
const subjective = /\b(prefer|like|rather|think|believe|feel)\b/i.test(statement);
// Check for evaluative language
const evaluative = /\b(better|worse|best|worst|cleaner|messier)\b/i.test(statement);
return firstPerson || subjective || evaluative;
}
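The preference heuristics listed earlier in this section can be expressed as a small grading function. This is a rough sketch under assumptions: `classifyPreference` is a hypothetical name, and "strengthens the signal" is interpreted as raising the confidence by one level.

```javascript
// Hypothetical helper: classifyPreference is an illustrative name.
function classifyPreference(text) {
  const explicit = /\bI\s+prefer\b|\bI'd\s+rather\b/i.test(text);
  const evaluative = /\b(better|cleaner|easier|simpler)\b/i.test(text);
  const liking = /\bI\s+like\b/i.test(text);
  const comparison = /\b\w+\s+over\s+\w+\b|\brather\s+than\b/i.test(text);

  let level; // 0 = low, 1 = medium, 2 = high
  if (explicit) level = 2;
  else if (evaluative) level = 1; // could also be an objective claim
  else if (liking) level = 0;
  else return null;

  // An explicit comparison raises confidence by one level (assumed reading)
  if (comparison) level = Math.min(2, level + 1);
  return { signal: 'preference', confidence: ['low', 'medium', 'high'][level] };
}
```

The comparison pattern is deliberately loose and will also match phrases like "go over the", so it is only consulted after one of the other three tests has already matched.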
Message Boundary Detection
Identify where user messages begin and end, and separate them from agent messages and tool outputs.
Actor Classification
function classifyActor(message) {
// Check for role markers
if (message.role === 'user') return 'user';
if (message.role === 'assistant') return 'agent';
// Fallback to content analysis
if (/<function_calls>/i.test(message.content)) return 'agent';
if (/<function_results>/i.test(message.content)) return 'tool';
// Default to user for ambiguous cases
return 'user';
}
Message Filtering
function filterMessages(conversation, options = {}) {
const {
actors = ['user', 'agent'],
excludeToolOutput = true,
excludeCodeBlocks = false,
minLength = 0,
} = options;
return conversation
.filter(msg => actors.includes(classifyActor(msg)))
.filter(msg => !excludeToolOutput || !msg.content.includes('<function_results>'))
.filter(msg => !excludeCodeBlocks || !msg.content.includes('```'))
.filter(msg => msg.content.length >= minLength);
}
Multi-Turn Pattern Recognition
Detect patterns that span multiple messages.
Escalation Detection
function detectEscalation(messages) {
// Group messages by topic
const topics = clusterByTopic(messages);
for (const topic of topics) {
// Check if frustration increases over time
const frustrationScores = topic.messages.map(msg => {
const signals = extractSignals(msg);
return signals.filter(s => s.type === 'frustration').length;
});
// Check for monotonic increase
let isEscalating = true;
for (let i = 1; i < frustrationScores.length; i++) {
if (frustrationScores[i] <= frustrationScores[i - 1]) {
isEscalating = false;
break;
}
}
if (isEscalating && frustrationScores.length >= 2) {
return {
pattern: 'escalation',
topic: topic.name,
messages: topic.messages,
confidence: frustrationScores.length >= 3 ? 'high' : 'medium'
};
}
}
return null;
}
Topic Clustering
function clusterByTopic(messages) {
// Simple keyword-based clustering
const clusters = [];
for (const msg of messages) {
const keywords = extractKeywords(msg);
// Find existing cluster with matching keywords
let matched = false;
for (const cluster of clusters) {
const overlap = keywords.filter(k => cluster.keywords.includes(k));
if (overlap.length / keywords.length > 0.3) {
cluster.messages.push(msg);
cluster.keywords = [...new Set([...cluster.keywords, ...keywords])];
matched = true;
break;
}
}
// Create new cluster if no match
if (!matched) {
clusters.push({
name: keywords[0] || 'unnamed',
keywords,
messages: [msg]
});
}
}
return clusters;
}
function extractKeywords(message) {
// Remove stop words and extract nouns/verbs
const stopWords = new Set(['the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for']);
const words = message.content
.toLowerCase()
.replace(/[^\w\s]/g, '')
.split(/\s+/)
.filter(w => w.length > 3 && !stopWords.has(w));
// Return top 5 most frequent words
const freq = {};
for (const word of words) {
freq[word] = (freq[word] || 0) + 1;
}
return Object.entries(freq)
.sort((a, b) => b[1] - a[1])
.slice(0, 5)
.map(([word]) => word);
}
Context Analysis Methods
Recency Weighting
More recent signals should carry more weight than older ones.
function applyRecencyWeight(signals, halfLifeDays = 7) {
const now = Date.now();
const halfLifeMs = halfLifeDays * 24 * 60 * 60 * 1000;
return signals.map(signal => {
const age = now - signal.timestamp;
const weight = Math.pow(0.5, age / halfLifeMs);
return {
...signal,
weight,
weightedConfidence: signal.confidence * weight
};
});
}
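To make the decay concrete, the sketch below evaluates the same half-life formula at a few ages. Note that `applyRecencyWeight` multiplies `signal.confidence` by the weight, so it needs a numeric confidence, while the detectors earlier in this document return labels. The `CONFIDENCE_VALUE` mapping is an assumption with illustrative values, and `recencyWeight` and `weightedConfidence` are hypothetical helper names.

```javascript
// Assumed mapping from confidence labels to numbers; the values are illustrative.
const CONFIDENCE_VALUE = { high: 0.9, medium: 0.6, low: 0.3 };
const DAY_MS = 24 * 60 * 60 * 1000;

// Same formula as applyRecencyWeight: the weight halves every halfLifeDays
function recencyWeight(ageMs, halfLifeDays = 7) {
  const halfLifeMs = halfLifeDays * DAY_MS;
  return Math.pow(0.5, ageMs / halfLifeMs);
}

function weightedConfidence(label, ageMs) {
  return CONFIDENCE_VALUE[label] * recencyWeight(ageMs);
}

const today = recencyWeight(0);                // weight 1
const lastWeek = recencyWeight(7 * DAY_MS);    // weight 0.5 (one half-life)
const twoWeeks = recencyWeight(14 * DAY_MS);   // weight 0.25 (two half-lives)
```

With these values, a medium-confidence signal from today (0.6) outweighs a high-confidence signal from a week ago (0.9 × 0.5 = 0.45).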
Contradiction Resolution
When signals conflict, resolve using recency and confidence.
function resolveContradictions(signals) {
// Group by topic
const groups = groupByTopic(signals);
for (const group of groups) {
// Sort by timestamp (newest first)
group.signals.sort((a, b) => b.timestamp - a.timestamp);
// Check for contradictions
const contradictions = findContradictions(group.signals);
for (const [newer, older] of contradictions) {
if (newer.confidence >= older.confidence) {
// Mark older signal as superseded
older.superseded = true;
older.supersededBy = newer.message_id;
}
}
}
// Filter out superseded signals
return signals.filter(s => !s.superseded);
}
function findContradictions(signals) {
const pairs = [];
for (let i = 0; i < signals.length; i++) {
for (let j = i + 1; j < signals.length; j++) {
if (areContradictory(signals[i], signals[j])) {
pairs.push([signals[i], signals[j]]);
}
}
}
return pairs;
}
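function areContradictory(signal1, signal2) {
  // Example: "Use X" vs. "Don't use X", in either order
  if (signal1.type !== 'request' || signal2.type !== 'request') return false;
  const NEGATED = /\b(?:don't|do\s+not|never|avoid)\s+(?:(?:use|do|make|add)\s+)?(\w+)/i;
  const POSITIVE = /\b(?:use|do|make|add)\s+(\w+)/i;
  // Reduce a quote to its target and polarity; negation is checked first
  // because "don't use X" also contains the positive form "use X"
  const parse = (quote) => {
    const neg = quote.match(NEGATED);
    if (neg) return { target: neg[1].toLowerCase(), negated: true };
    const pos = quote.match(POSITIVE);
    return pos ? { target: pos[1].toLowerCase(), negated: false } : null;
  };
  const a = parse(signal1.quote);
  const b = parse(signal2.quote);
  // Contradiction: same target, opposite polarity
  return Boolean(a && b && a.target === b.target && a.negated !== b.negated);
}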
Performance Optimization
Incremental Analysis
For long conversations, analyze only the messages added since the last run rather than re-analyzing the entire history.
class IncrementalAnalyzer {
constructor() {
this.lastAnalyzedIndex = 0;
this.signals = [];
this.patterns = [];
}
analyze(messages) {
// Only analyze new messages
const newMessages = messages.slice(this.lastAnalyzedIndex);
// Extract signals from new messages
const newSignals = newMessages.flatMap(msg => extractSignals(msg));
this.signals.push(...newSignals);
// Update patterns with new signals
this.patterns = detectPatterns(this.signals);
// Update index
this.lastAnalyzedIndex = messages.length;
return {
signals: this.signals,
patterns: this.patterns
};
}
reset() {
this.lastAnalyzedIndex = 0;
this.signals = [];
this.patterns = [];
}
}
Caching
Cache expensive operations like topic clustering and keyword extraction.
class AnalysisCache {
constructor(ttlMs = 5 * 60 * 1000) { // 5 minute TTL
this.cache = new Map();
this.ttl = ttlMs;
}
get(key) {
const entry = this.cache.get(key);
if (!entry) return null;
if (Date.now() - entry.timestamp > this.ttl) {
this.cache.delete(key);
return null;
}
return entry.value;
}
set(key, value) {
this.cache.set(key, {
value,
timestamp: Date.now()
});
}
clear() {
this.cache.clear();
}
}
// Usage
const cache = new AnalysisCache();
function extractSignalsWithCache(message) {
const key = `signals:${message.id}`;
const cached = cache.get(key);
if (cached) return cached;
const signals = extractSignals(message);
cache.set(key, signals);
return signals;
}
Error Handling
Graceful Degradation
If signal extraction fails for a message, continue with remaining messages.
function extractSignalsSafe(messages) {
const results = {
signals: [],
errors: []
};
for (const msg of messages) {
try {
const signals = extractSignals(msg);
results.signals.push(...signals);
} catch (error) {
results.errors.push({
message_id: msg.id,
error: error.message
});
// Continue with next message
}
}
return results;
}
Validation
Validate extracted signals before adding them to the results.
function validateSignal(signal) {
const required = ['type', 'subtype', 'message_id', 'quote', 'confidence'];
for (const field of required) {
if (!(field in signal)) {
throw new Error(`Missing required field: ${field}`);
}
}
const validTypes = ['success', 'frustration', 'workflow', 'request'];
if (!validTypes.includes(signal.type)) {
throw new Error(`Invalid signal type: ${signal.type}`);
}
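// Reject non-numeric values too: a label such as 'high' would otherwise
// pass, because comparing a non-numeric string with a number is always false
const c = signal.confidence;
if (typeof c !== 'number' || !(c >= 0 && c <= 1)) {
throw new Error(`Confidence must be a number in 0-1, got: ${c}`);
}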
return true;
}
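Validation and graceful degradation can be combined so that one malformed signal does not discard the rest. The sketch below is an assumption about how the two might be wired together: `partitionSignals` is a hypothetical helper that accepts any validator that throws on failure, such as `validateSignal`.

```javascript
// Hypothetical helper: partitionSignals is an illustrative name.
// validate is any function that throws an Error when a signal is invalid.
function partitionSignals(signals, validate) {
  const valid = [];
  const rejected = [];
  for (const signal of signals) {
    try {
      validate(signal);
      valid.push(signal);
    } catch (error) {
      // Record the failure and continue with the next signal
      rejected.push({ message_id: signal.message_id, error: error.message });
    }
  }
  return { valid, rejected };
}
```

Passing the validator as a parameter keeps the helper independent of the exact required fields, which may differ between deployments.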