Blog

Ideas on lean data engineering.

Practical writing on simpler architectures, maintainable analytics systems, and choosing the right level of complexity for the problem in front of you.

Article

The Lean Table Format

Hudi, Iceberg, and Delta are the hottest jargon in modern data lake design. They bring impressive capabilities to cloud object storage such as ACID transactions, schema evolution, time…

Article

Stop Paying for Distributed Frameworks You Don't Need

Every week, I come across teams running massive PySpark clusters to process datasets that could easily fit on a single machine. The result? Bloated AWS bills and a false sense of “future…