_________________
V.1.0.0 // SECURE CONNECTION
[LOG: 2026-02-03]//TIME: 7 MIN READ

The Context Engineering Bottleneck: Why Your AI Coding Sessions Keep Failing

Victor Solano, System Architect
AI · Engineering · Tools · System

Every week I see the same complaint on Reddit.

"Gemini got dumber."

"Claude keeps forgetting everything I told it."

"It was working fine last month. Now it cannot even remember the file structure."

The developers blaming the model are wrong.

The problem is almost never the model.

It is the context you are feeding it.


#The Bottleneck Nobody Talks About

In January 2026, a study measured how effectively frontier LLMs actually use their context windows. The headline finding: models utilize only 1-5% of their advertised capacity effectively.

Let that sink in.

Gemini 3 Pro advertises a 1 million token context window. That is roughly 750,000 words. Entire novels fit in there. Complete codebases. Thousands of documents.

But when researchers tested real-world tasks using the full window, performance collapsed.

Reddit users figured this out empirically. Complaint after complaint traces the same pattern: "Effective recall seems to drop around 32k tokens." Despite the 1M advertised capacity, people report usable context closer to 3% of that.

The constraint shifted from context size to context organization.

Having a massive context window does not help if you are filling it with noise.


#Lost in the Middle

Researchers at Stanford and elsewhere have documented a phenomenon called the "lost in the middle" problem. It shows up as a U-shaped attention curve.

LLMs focus intensely on the beginning of context. They also focus on the end. But they lose track of information in the middle.

The primacy effect: the first things you tell the model stick.

The recency effect: the last things you tell the model stick.

The middle? That gets attention-diluted into oblivion.

This is not a bug. It is how transformer attention works. The math of self-attention distributes focus non-uniformly across the sequence. Information buried in the center of a long document receives less representational weight.

The practical consequence: if you paste 50,000 tokens of codebase context into a prompt, the model will likely forget the middle 30,000 tokens.

You are not debugging a model failure. You are experiencing position-dependent attention decay.


#Context Engineering: The New Skill

Andrej Karpathy said it best:

"LLMs are a new kind of operating system. The LLM is the CPU. The context window is the RAM. Context engineering is the delicate art and science of filling the context window with just the right information for the next step."

This reframes everything.

Prompt engineering is about crafting the question. Context engineering is about constructing the answer space.

Prompt engineering focuses on a single input-output pair. Context engineering manages the entire information environment: system prompts, retrieved documents, conversation history, tool definitions, memory systems.

One is a tactic. The other is a strategy.

And here is the issue. Most vibe coders treat prompts as the whole game. They obsess over wording, structure, examples. Then they dump 100k tokens of raw codebase into the context and wonder why the model hallucinates.

The prompt was not the problem. The context was garbage.


#The Five Layers of Context

Context is not monolithic. It has structure:

Layer 1: System Instructions
The identity layer. Who is this agent? What behaviors are required? What constraints must it follow?

Layer 2: Retrieved Documents
External knowledge. RAG results. Documentation. Real-time data the model was not trained on.

Layer 3: Conversation History
The dialogue state. Past decisions. What has already been discussed. What the user already knows.

Layer 4: Long-Term Memory
Persistent knowledge across sessions. Learned preferences. Project-specific patterns that carry forward.

Layer 5: Tool Definitions
The capabilities available. What functions can this agent invoke? What are their signatures?

Most developers only think about Layer 1, maybe Layer 2 if they have heard of RAG.

Layers 3-5 are where the real leverage lives.
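
To make that structure concrete, here is a minimal sketch of the five layers as a data type, with an assembler that joins them behind explicit delimiters. The field names and ordering are illustrative, not a prescribed API.

interface ContextLayers {
  systemInstructions: string;    // Layer 1: identity, behaviors, constraints
  retrievedDocuments: string[];  // Layer 2: RAG results, docs, real-time data
  conversationHistory: string[]; // Layer 3: dialogue state and past decisions
  longTermMemory: string[];      // Layer 4: persistent, cross-session knowledge
  toolDefinitions: string[];     // Layer 5: available functions and signatures
}

// Assemble the layers into one prompt, with explicit delimiters so the
// model can tell where each layer starts and ends.
function assembleContext(layers: ContextLayers): string {
  return [
    "## System Instructions\n" + layers.systemInstructions,
    "## Tools\n" + layers.toolDefinitions.join("\n"),
    "## Memory\n" + layers.longTermMemory.join("\n"),
    "## Retrieved Documents\n" + layers.retrievedDocuments.join("\n---\n"),
    "## Conversation\n" + layers.conversationHistory.join("\n"),
  ].join("\n\n");
}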


#A Case Study: My Agent Constitution

I run a modular architecture for my AI agents. The core is what I call the Agent Constitution: a layered system of rules, skills, and memory protocols that constructs context dynamically.

The key design decisions:

Modular Rule Loading

Instead of a monolithic system prompt, I use a master loader that imports specific modules:

@~/.gemini/global_rules/governance.md
@~/.gemini/global_rules/skills-protocol.md
@~/.gemini/global_rules/rpe-workflow.md
@~/.gemini/global_rules/context-hygiene.md

Each module is self-contained, under 12,000 characters. The agent only loads what is relevant to the current task. Context stays lean.
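
For illustration, here is a rough sketch of what a loader like this could do: resolve each @~/ import to a file and concatenate only the modules relevant to the task. This is not the tool's actual import mechanism, just the shape of the idea.

import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Resolve an @~/... import line to an absolute path and read the module.
function loadModule(importLine: string): string {
  const relative = importLine.replace(/^@~\//, "");
  return readFileSync(join(homedir(), relative), "utf8");
}

// The master loader: only the modules relevant to the current task get
// concatenated into the system prompt, so the context stays lean.
function buildSystemPrompt(relevantImports: string[]): string {
  return relevantImports.map(loadModule).join("\n\n");
}

const prompt = buildSystemPrompt([
  "@~/.gemini/global_rules/governance.md",
  "@~/.gemini/global_rules/context-hygiene.md",
]);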

Progressive Skill Disclosure

I have 92 specialized skills. Loading all of them into every conversation would be insane. Instead:

  • Level 1: Summaries only (~100 words each, always loaded)
  • Level 2: Full skill body (loaded on-demand when triggered)
  • Level 3: Deep references and scripts (loaded via tool calls as needed)

The agent scans Level 1 to decide relevance, then loads deeper levels only when necessary.
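
A minimal sketch of that escalation logic, assuming each skill carries a summary, a path to its full body, and trigger patterns (all hypothetical field names):

import { readFileSync } from "node:fs";

interface Skill {
  name: string;
  summary: string;  // Level 1: ~100 words, always in context
  bodyPath: string; // Level 2: full SKILL.md, loaded on demand
  triggers: RegExp[];
}

// Level 1: every skill contributes only its summary.
function levelOneIndex(skills: Skill[]): string {
  return skills.map((s) => `- ${s.name}: ${s.summary}`).join("\n");
}

// Level 2: load the full body only for skills the current task triggers.
function levelTwoBodies(skills: Skill[], task: string): string[] {
  return skills
    .filter((s) => s.triggers.some((t) => t.test(task)))
    .map((s) => readFileSync(s.bodyPath, "utf8"));
}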

WHY/HOW Annotations

Every skill import includes context about why it matters and how to apply it:

@~/.gemini/antigravity/skills/next-js/SKILL.md
# WHY: Core framework (Next.js 16.1 App Router)
# HOW: Use Server Components by default, leverage Turbopack

The agent knows not just what skills exist, but why they matter for this specific project.

The Result

Context stays targeted. Irrelevant information does not consume tokens. The agent has exactly what it needs and nothing more.

This is what context engineering looks like in practice.


#Spec-Driven Development

A parallel shift is happening in how we think about code.

Spec-driven development flips the traditional workflow. Instead of writing code first and documenting later, you define specifications first and let AI generate the implementation.

The specification becomes the source of truth: clear requirements, interface definitions, acceptance criteria. The AI consumes these specs and produces code, tests, documentation.

GitHub's Spec Kit formalizes this workflow: Specify → Plan → Tasks → Implement.

The relevant insight for context engineering: specifications are context artifacts.

A well-written spec is a dense, structured piece of context that directly guides implementation. A poorly written spec is noise that the model has to pattern-match around.

This is why "just dump the codebase" fails as a strategy. Code is not specification. Code is implementation. The model needs intent, constraints, requirements. Structured context, not raw data.
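
Put differently, a spec can be treated literally as a typed context artifact. A small sketch, with illustrative field names rather than Spec Kit's actual format:

interface Spec {
  objective: string;
  requirements: string[];
  constraints: string[];
  acceptanceCriteria: string[];
}

// Render a spec as dense, structured context for the model.
function specToContext(spec: Spec): string {
  return [
    `## Objective\n${spec.objective}`,
    `## Requirements\n${spec.requirements.map((r) => `- ${r}`).join("\n")}`,
    `## Constraints\n${spec.constraints.map((c) => `- ${c}`).join("\n")}`,
    `## Acceptance Criteria\n${spec.acceptanceCriteria.map((a) => `- ${a}`).join("\n")}`,
  ].join("\n\n");
}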


#The Practical Fixes

If your AI sessions keep failing, here is where to start:

1. Audit Your Context Composition

What exactly is going into the context window? Many tools add hidden system prompts. RAG systems retrieve documents of variable quality. Conversation history accumulates.

Map it. Understand every token that is being consumed.
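
A crude way to start: estimate the token share of each source. The chars-divided-by-four heuristic below is an approximation, not a real tokenizer.

// Rough audit: estimate how many tokens each context source consumes.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function auditContext(sources: Record<string, string>): void {
  for (const [name, text] of Object.entries(sources)) {
    console.log(`${name}: ~${estimateTokens(text)} tokens`);
  }
}

auditContext({
  systemPrompt: "…",        // hidden prompts your tool injects
  retrievedDocs: "…",       // whatever RAG pulled in
  conversationHistory: "…", // accumulated dialogue
  toolDefinitions: "…",     // function schemas
});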

2. Position Critical Information Strategically

Given the U-shaped attention curve, put the most important information at the start or end of your context. Do not bury critical requirements in the middle of a 50,000 word document.
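
A tiny sketch of the idea: keep the critical requirements at the front, let background sit in the middle, and restate the requirements right before the question.

// Place the pieces you cannot afford to lose at the edges of the window.
function positionContext(critical: string, background: string, question: string): string {
  return [
    critical,                                      // primacy: seen first
    background,                                    // middle: attention-diluted anyway
    `Reminder of hard requirements:\n${critical}`, // recency: seen last
    question,
  ].join("\n\n");
}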

3. Compress and Structure

Dense, structured context beats sparse, unstructured context. Use clear headers. Bullet lists. Tables. Explicit delimiters.

Instead of:

"I have a Next.js app and I want you to add authentication using Supabase but make sure it works with the existing middleware and also handles session refresh properly..."

Try:

## Objective
Add Supabase Auth to Next.js app

## Requirements
- Email/password login
- Session refresh via middleware
- Protected route wrapper

## Constraints
- Must integrate with existing middleware
- No breaking changes to current routes

Same information, better structure, more effective context.

4. Implement Lazy Loading

Do not load everything at once. Build systems that inject context on-demand based on task requirements. Skills activate when triggered. Documents retrieve when relevant. Memory surfaces when queried.
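
A sketch of the pattern, with a hypothetical fetchDoc standing in for whatever document store you use:

// Hypothetical document fetcher; wire this to your own store.
async function fetchDoc(name: string): Promise<string> {
  return `contents of ${name}`; // placeholder
}

type Loader = () => Promise<string>;

// Context providers that only run when the task actually needs them.
const providers: Array<{ match: RegExp; load: Loader }> = [
  { match: /auth|login|session/i, load: () => fetchDoc("supabase-auth.md") },
  { match: /deploy|preview|prod/i, load: () => fetchDoc("deployment.md") },
];

async function lazyContext(task: string): Promise<string[]> {
  const relevant = providers.filter((p) => p.match.test(task));
  return Promise.all(relevant.map((p) => p.load()));
}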

5. Clean Aggressively

Stale context is worse than no context. Old conversation history that contradicts current requirements. Outdated knowledge items. Deprecated patterns.

Set retention policies. Archive completed work. Delete irrelevant artifacts.
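
A minimal retention pass might look like this (the 30-day window and status labels are placeholders, not a prescription):

interface MemoryItem {
  content: string;
  updatedAt: number; // epoch ms
  status: "active" | "completed" | "deprecated";
}

const RETENTION_MS = 1000 * 60 * 60 * 24 * 30; // e.g. 30 days

// Keep only items that are both current and recent; archive the rest.
function cleanMemory(items: MemoryItem[], now = Date.now()): MemoryItem[] {
  return items.filter(
    (item) => item.status === "active" && now - item.updatedAt < RETENTION_MS
  );
}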


#The Reframe

Stop thinking about prompts.

Start thinking about systems.

Prompts are individual queries. Context is the architecture that makes those queries effective.

The developers who will excel in 2026 are not the ones writing clever prompts. They are the ones building context infrastructure: modular rule systems, skill libraries, memory protocols, structured specifications.

1 million tokens of noise is worse than 32k tokens of signal.

The constraint is not context size.

The constraint is context organization.

Build the system.


Last updated: February 3, 2026

"End of transmission."

[CLOSE_LOG]