Chunking Strategies That Actually Affect Retrieval Quality
Most RAG pipelines fail on the same defaults: chunk size 512, split by character, never revisited. Here's what actually moves the needle on retrieval quality, and why your defaults are probably wrong.
Blog
Notes on production AI, data engineering, and the messy reality of shipping systems that work.
Your software pipeline won't save your ML system. Here's what actually needs to be different — and why copying your DevOps playbook is a trap.
AI agents are proliferating, but they can't find each other. Agora is an open-source registry and discovery service that fixes that — built to complement A2A and MCP.
Technical debt in data systems doesn't sit quietly. It compounds. Every downstream model, dashboard, and decision built on dirty data pays the price.
The demo worked. Stakeholders loved it. And then nothing happened. Here's why — and how to stop the cycle.
Unit tests don't cover AI behavior. If you're shipping models without eval suites, you're shipping blind.
Why every production ML team needs model versioning, eval tracking, and promotion workflows.
No 90-day discovery phase. No 200-page strategy doc. Here's how we actually work.
If you're hiring 4 senior data engineers, you're not doing AI yet — you're building the foundation you skipped.