Being the Centaur: Context Engineering for LLM Coding
AI slop pull requests overwhelm open source projects. Agency engineers suffer from reviewing AI-generated code all day. Especially the latter reminds me of The Reverse-Centaur’s Guide to Criticizing AI. But if you are a senior engineer, you can become the centaur on top of an LLM using advanced context engineering, which I learned last year and want to share with you in this post. In that situation, the LLM executes the last step in the engineering workflow for you: writing the actual code.
The Setup ¶
For the process, I use a number of commands that I execute sequentially to fulfill a certain task. Most of the commands result in writing Markdown files, for which I explain the structure at the end of this section.
It's noteworthy that all of the commands have actually been created and are maintained by Claude Code itself (with human input, of course).
You can find the full Claude template with all commands, agents, hooks, etc. on GitHub.
Main Commands ¶
The main commands are:
/create_ticket ¶
The /create_ticket command helps me file a ticket that can be worked on by the LLM. It serves two purposes: a) help me think through what I actually want to achieve, define edge cases, and clarify the scope, and b) produce a document for Claude Code to keep this information in context when it works on the ticket. The command is phrased in a way that Claude brings up questions about the codebase that need to be answered to craft a good implementation plan.
The result of this command is a ticket, which I (for simplicity reasons) store in a Markdown file on disk. I know other people use proper ticket systems via MCP (e.g. Linear). The point for me is that I involve the LLM already in the ticket creation and that the LLM can easily access the ticket during the process and at later stages.
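To make this concrete, here is a rough sketch of what such a ticket file might contain. The ticket number, headings, and content are invented for illustration and are not the template's actual output:

~~~markdown
# TICKET-042: Add type filter to the document list

## Goal
Users can filter the document list by document type from the FilterPanel.

## Scope
- Extend the FilterPanel with a type facet
- Sync the selected types to the URL

## Out of Scope
- Persisting filter preferences on the server

## Open Questions
- Which component currently owns the filter state?
- Should pending selections survive a page reload?
~~~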
/research_codebase ¶
This command is the most essential, in my opinion, to make Claude Code work on large codebases. /research_codebase receives the ticket file as input. Its task is to research the existing code and the internet to find all relevant information needed for the implementation. This involves information to clarify the open questions in the ticket, but also general information like "Which files will need to be touched during implementation?", "Which concepts exist that we can use directly or as blueprints to create a consistent codebase?", and "What are potential side-effects that we could cause?".
The result of this command is typically a large Markdown document that contains code examples taken from the codebase and clear instructions on where to modify code pieces and how. For example:
### 6. Current FilterPanel Integration
**State Management** (`frontend/pages/index.vue:166-201`):
- `appliedTypes`: URL-synced via computed getter/setter
- `pendingSelectedTypes`: transient ref for in-progress selection
- `hasFilterChanges`: computed comparison
**FilterPanel Props** (`frontend/components/FilterPanel.vue:4-9`):
~~~typescript
const props = defineProps<{
  open: boolean
  facets: FacetCounts
  selectedTypes: DocumentType[]
  hasChanges: boolean
}>()
~~~
This command makes heavy use of agents, which I show later in this post.
/create_plan ¶
Using the /create_plan command ensures that Claude has a proper, well-scoped TODO list to work on and knows exactly what to do during the actual code generation. This is also the place where I come into play once more. For any question from the ticket phase, the research phase should have brought up one or more solutions. It is now a human task to review these and pick one.
This is also the step where I pay the most attention to checking the output generated by Claude. It involves the technical structure that will be created and should cover all side-effects that need to be paid attention to. I check the summary of the plan closely, and Claude asks me if the plan is fine for me. Sometimes we re-discuss the plan entirely, but in most cases it is fine and either sticks to the patterns I want to see in the codebase or just needs minimal adjustments.
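As an illustration, a resulting plan might be structured roughly like this (the phases, checkboxes, and file names are invented for this example, not taken from the template):

~~~markdown
# Implementation Plan: TICKET-042

## Phase 1: Filter state
- [ ] Extend the state in `frontend/pages/index.vue` with the new type facet
- [ ] Add a test for the URL syncing behavior

## Phase 2: FilterPanel UI
- [ ] Pass the new props into `frontend/components/FilterPanel.vue`
- [ ] Verify the panel behavior in the browser

## Resolved Decisions
- URL sync via computed getter/setter (watcher-based alternative rejected)
~~~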
/implement_plan ¶
This is the phase where Claude actually implements the code. In /implement_plan it reads the three Markdown files it created before as input and then runs through the plan steps it created, filling the plan with code snippets to bring it alive.
This step runs mostly on its own, with me looking at the generated code. A nice
thing: due to how /create_plan is crafted, Claude will also generate tests to
check its own progress and whether the code works. It runs the tests and can
even use Chrome MCP to verify changes
directly in the browser.
When this command finishes, the ticket should be implemented and tested. This typically works well.
Side Commands ¶
Besides the four main workflow commands, I use a couple of commands to enhance the workflow when needed.
/commit ¶
While Claude neatly commits code changes per step in the plan, it finds it hard to decide when work on a Markdown document is finished. I use /commit to enforce a commit in those cases.
/discuss ¶
There are cases where I do not know how to tackle a larger issue at all, where I
want to explore bigger technical topics myself, or similar. For that, I use the
/discuss
command. It helps me structure my thoughts, researches information for me, and
results in a Markdown document that is typically used to create a series of
tickets out of it using the /create_ticket command.
/code_review ¶
Especially when processing larger tickets (epics) that have been split into sub-tickets, I want to be extra sure that we did not overlook anything. "Was really everything implemented?", "Did we violate any codebase constraints and overlook it?", or "Does our code still follow the framework best practices?" are questions that can be answered by a review.
The /code_review command allows simple ticket reviews by giving it a ticket number or accepts more detailed queries toward the codebase.
Agents ¶
Especially the /research_codebase command makes extensive use of agents to deliberately split cognitive labor across isolated contexts. The agents are built to create their own Claude context and perform a given task.
There are three agents to look into the codebase:
- `codebase-analyzer` researches how code in the codebase works and lets one understand how a mechanism is currently implemented end to end.
- `codebase-locator` finds out where certain code lives and is typically used for meta changes like refactoring.
- `codebase-pattern-finder` finds similar implementations and identifies patterns in the code to determine how an implementation should be modeled.
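Claude Code stores such agents as Markdown files with YAML frontmatter (typically under `.claude/agents/`). As an illustration, a minimal `codebase-locator` definition might look like this; the tool list and prompt wording are my assumptions, not the template's actual content:

~~~markdown
---
name: codebase-locator
description: Finds where certain code lives in the repository.
tools: Grep, Glob, Read
---

You are a code location specialist. Given a concept or symbol,
return the files and line ranges where it is defined and used.
Do not modify any files.
~~~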
Then there are two to look into existing thoughts (explained in the next section):
- `thoughts-locator` checks if there are any thoughts already related to a topic.
- `thoughts-analyzer` is rarely used; it performs deep research on a certain topic to understand how the topic evolved.
And for web research there is web-search-researcher. This agent is super
powerful. It is started to search for information on a certain topic, typically
documentation of a package, library, or system component, but also best
practices, howtos, and existing issues. The /research_codebase command fires a
lot of these when there is a new implementation to be made. The nice part: the
spawned agent will do extensive research on the given topic on its own and
summarize the result. If this still does not answer the question, the
/research_codebase command will fire more agents with different queries until
it has everything it needs.
Thoughts ¶
Thoughts are the heart of the whole concept. Every result of a context-building
command is stored in the thoughts/ directory in the repository root as plain
Markdown files.
The storage is organized as follows:
~~~
thoughts
└── ...
    ├── discussions
    ├── plans
    ├── research
    ├── reviews
    └── tickets
~~~
The ... part can be replaced either by shared for information that is
relevant to every project member or by the username of someone who produced a
thought. The subdirectories then contain:
- `tickets` contains all ticket files
- `research` holds the outcome of the `/research_codebase` commands
- `plans` holds all implementation plans
- and so on.
By putting all generated thoughts into the repository, you receive a knowledge
base that is extremely usable by Claude Code with the shown process. Even if
the amount of text would be way too much for a human to research and consume,
the /research_codebase command and its agents can find a lot of relevant
information to understand a) how previous implementations worked and b) what
was already found out and potentially ruled out.
Hooks ¶
I use hooks very specific to my project, for example to prevent tickets from running into a non-conforming state or to check that commit messages conform to my notion of conventional commits.
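As a sketch of the validation logic such a commit-message hook might run (the allowed types and the regex are my assumptions, and the wiring into Claude Code's hook mechanism is omitted):

~~~shell
# check_commit_msg: returns 0 if the first line of the given message
# follows the Conventional Commits shape "type(scope)!: subject".
# The allowed types here are an example set, not the author's actual list.
check_commit_msg() {
  first_line=$(printf '%s\n' "$1" | head -n 1)
  printf '%s' "$first_line" |
    grep -Eq '^(feat|fix|docs|refactor|test|chore)(\([a-z0-9_-]+\))?!?: .+'
}
~~~

A hook would call this with the proposed commit message and make Claude retry on a non-zero exit code.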
I left the check-ticket-status.sh and validate-ticket-status.sh hooks in the
template for illustration. What can be more valuable for any engineer are the
notification hooks in settings.json, which produce visual and audio
notifications if Claude Code is done with something or needs input.
CLAUDE.md ¶
The CLAUDE.md files are still highly relevant for carrying information that is
required in any context. My main CLAUDE.md explains the overall project
structure (I use a monorepo, so all components are explained and what belongs
where) and general rules for implementation. Then each component has its own
CLAUDE.md, and some subdirectories (e.g. in infrastructure/) have their own.
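As an illustration, a component-level CLAUDE.md might carry rules like the following; the content is invented for this example:

~~~markdown
- Filter state that must survive reloads is synced to the URL via
  computed getter/setters, not watchers.
- New components follow the existing patterns in `frontend/components/`.
- Document new workflows in this file when a new tool is introduced.
~~~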
I do not maintain these files manually, but tell Claude to memorize important rules on the go. It typically decides on its own what the right place is for some information.
Also, documenting new workflows in appropriate CLAUDE.md files is part of my ticket specification if a new tool or workflow is introduced.
How To and Why? ¶
That was a lot of technical (Markdown) infrastructure explanation. But how do I actually use that stuff? And why does it work?
Implementing a Ticket ¶
The process is really simple and straightforward:
1. `/create_ticket`: I typically run this command with an idea of what to implement or a bug description. If I used `/discuss` before, I give the command the discussion document to digest. It is helpful to specify whether one expects the ticket to be an epic (split into sub-tickets) or a simple ticket, but I also regularly ask Claude during the discussion whether it expects the ticket in progress to be an epic or a regular ticket. Then: `/clear`.
2. `/research_codebase <ticket-number>`: In 99% of cases, I call this command and it runs on its own. I only pay attention to which new domains it wants to fetch information from. There are some that I allow-list for Claude, like documentation pages for components we use or MDN. For others, I validate the source depending on how crucial the task is. Then: `/clear`.
3. `/create_plan <ticket-number>` runs after the research is complete. This is the point where Claude Code asks me a couple of questions to clarify certain implementation details. Afterwards, it leads me through the implementation plan, and I can influence how things are done. Over time, the adjustments I need to make have been reduced to a minimum. The codebase, thoughts, and CLAUDE.md files seem to contain enough information to create plans that I like. Then: `/clear`.
4. `/implement_plan <ticket-number>` again runs largely on its own. It is especially satisfying to see how Claude creates and runs the specified tests, occasionally revises the implementation based on that, and commits phase by phase in clean work packages. I usually use the time it takes for implementation to specify new tickets, conduct a discussion, or take a break.
Why It Works ¶
I'm not an AI scientist, but I suspect that context and especially its size are the key here. When you let an LLM run on a larger codebase or give it only fuzzy instructions, the system pollutes large parts of its context with researching the codebase and understanding what you want to achieve. As a result, very little context is left for the actual code generation.
With the process shown above, you deliberately craft and persist the context
for the LLM across multiple, isolated steps. The /create_ticket step collects
everything that needs to be known about the "what" to implement. The
/research_codebase step analyzes the codebase thoroughly and prepares all
documentation needed for implementation. /create_plan then prepares the steps
to be done. This plan can even survive a context compaction or partial context
loss. Altogether, this builds a context that is knowledge-wise sound but
token-wise small.
This small context leaves a large token space for actually generating the desired code. The LLM will only read the files (or parts of them) it actually needs to touch, which also keeps side-effects on unrelated code files low.
Lastly, the process allows you to steer the LLM as an engineer and put all your expertise into play to prevent slop from taking over. Keeping all the thoughts inside your repository helps improve the AI coding process with every ticket.
Why Not … ¶
… Specification-Driven Development? Spec-driven is an interesting approach, but it optimizes for standardization and does not focus on engineering excellence. It assumes that writing down what you want to achieve precisely enough is the key. We have already been there in waterfall projects in the 20th century, where architects in ivory towers tried to dictate to engineers how to implement code from UML diagrams. That aside, the process is too strict. I want a living process that is built by and for my team.
… Claude Code plan mode? The planning mode of Claude Code has improved drastically over the past weeks. I have used it in small projects for quick changes, and it did an excellent job researching the codebase and creating a plan before executing it in a clean context. However, the built-in mechanisms are not crafted for larger projects. I expect that with growing project size, the amount of context that needs to be created and maintained upfront increases. On top of that, the built-in functionality is not adapted to your individual project and workflow. You would not do Scrum purely by the book, would you?
Being the Centaur ¶
Becoming a centaur means being deliberate about where human judgment ends and machine execution begins. I keep ownership of intent, scope, constraints, and trade-offs, and let the LLM operate only inside a context I have consciously built and reviewed. When that boundary is clear, the model does not produce anonymous, low-context code. It produces changes that are tied to an explicit decision-making process. This does not eliminate AI slop or reviewer fatigue entirely, but it reduces both significantly, because the code still has an owner and a rationale behind it.
For me personally, this workflow also shifts my time back to the part of engineering I actually enjoy most: thinking through problems, shaping specifications, and crafting software designs. Once that work is done, the actual code generation becomes something closer to auto-complete on steroids. The LLM does not replace the thinking; it accelerates the translation of already-formed ideas into concrete code. That separation is what makes the process both effective and satisfying to work with.
Standing on the shoulders of giants: I did not invent this process. I just used it, enhanced it a bit, and found it useful enough to share. Kudos to my friend Marcel, maker of the awesome Bold video platform, who showed me his process to learn from. And thanks to Getting AI to Work in Complex Codebases, where I believe the basis of this process was shown first.