Blog
Notes on engineering, systems, and what I'm learning.
Deterministic AI: Let the Model Interpret, Let Code Decide
The reliable way to ship LLM features isn't a better prompt — it's shrinking the model's job until everything around it is plain, testable code.
Two AI Coding Agents, Not One: How I Actually Ship
Many engineers now run Cursor and Claude Code in parallel. Here's how I split the work between them, and why code-review discipline matters more than ever.
Evals as CI: Catching Agent Regressions Before They Ship
LLM features rot silently — a prompt tweak or model upgrade quietly breaks a case you fixed weeks ago. The fix: run evals in CI like tests.
When to Use an LLM Agent vs Plain Code
Agents add latency, non-determinism, and real cost per run — so plain code is the default. Here's the decision framework I actually use.
A Go Event Pipeline at 100k Events/Day, Sub-200ms
How we built a serverless SQS → Lambda → DynamoDB pipeline in Go that handles 100k events a day at sub-200ms end-to-end latency with 99.99% uptime — and what broke along the way.
MCP in Practice: Tools for an Agent Without the N×M Mess
MCP collapses the N×M agent-tool integration problem into one server per tool — here's what that means for how you actually design and scope tool contracts.
Cutting PostgreSQL Query Latency on a Reporting Endpoint
A slow reporting endpoint, a missing composite index, a non-sargable predicate, and what EXPLAIN ANALYZE actually told us: a debugging walkthrough.
Python or Go? How I Actually Choose for a Backend Service
A practical decision framework from shipping real services in both — concrete tiebreakers most teams underweight before they're forced to care.
Context Engineering Is the New Prompt Engineering
Prompt engineering tunes the question; context engineering controls what tokens the model even sees — and the job is keeping that set ruthlessly small.
Building a RAG Pipeline in Python You Can Actually Test
RAG feels untestable because generation is non-deterministic — the move is to decompose the pipeline into layers and test each one differently.
Routing Between LLMs Without Blowing the Budget
How I built a routing layer for a bank's GenAI chatbot that cut resolution time by ~30% while keeping model spend under control, and when not to bother.