
Blog

Notes on engineering, systems, and what I'm learning.

3 min read

Deterministic AI: Let the Model Interpret, Let Code Decide

The reliable way to ship LLM features isn't a better prompt — it's shrinking the model's job until everything around it is plain, testable code.

ai, llm, systems
5 min read

Two AI Coding Agents, Not One: How I Actually Ship

Most engineers now run Cursor and Claude Code in parallel — here's how I split the work between them, and why code-review discipline matters more than ever.

ai, tools, productivity
6 min read

Evals as CI: Catching Agent Regressions Before They Ship

LLM features rot silently — a prompt tweak or model upgrade quietly breaks a case you fixed weeks ago. The fix: run evals in CI like tests.

llm, evals, agents
5 min read

When to Use an LLM Agent vs Plain Code

Agents add latency, non-determinism, and real cost per run — so plain code is the default. Here's the decision framework I actually use.

ai, agents, llm
6 min read

A Go Event Pipeline at 100k Events/Day, Sub-200ms

How we built a serverless SQS → Lambda → DynamoDB pipeline in Go that handles 100k events a day at sub-200ms end-to-end latency with 99.99% uptime — and what broke along the way.

go, aws, systems
5 min read

MCP in Practice: Tools for an Agent Without the N×M Mess

MCP collapses the N×M agent-tool integration problem into one server per tool — here's what that means for how you actually design and scope tool contracts.

mcp, agents, llm
5 min read

Cutting PostgreSQL Query Latency on a Reporting Endpoint

A slow reporting endpoint, a missing composite index, a non-sargable predicate, and what EXPLAIN ANALYZE actually told us: a debugging walkthrough.

postgres, backend, performance
4 min read

Python or Go? How I Actually Choose for a Backend Service

A practical decision framework from shipping real services in both — concrete tiebreakers most teams underweight before they're forced to care.

python, go, backend
5 min read

Context Engineering Is the New Prompt Engineering

Prompt engineering tunes the question; context engineering controls what tokens the model even sees — and the job is keeping that set ruthlessly small.

ai, llm, context-engineering
5 min read

Building a RAG Pipeline in Python You Can Actually Test

RAG feels untestable because generation is non-deterministic — the move is to decompose the pipeline into layers and test each one differently.

rag, llm, python
5 min read

Routing Between LLMs Without Blowing the Budget

How I built a routing layer for a bank's GenAI chatbot that cut resolution time ~30% while keeping model spend controlled — and when not to bother.

llm, ai, architecture