Getting Started with Apache Kafka for the Baffled, Part 1

This post isn’t about installing Kafka, configuring your cluster, or anything like that. My introduction to Kafka was rough, and I hit a lot of gotchas along the way; I want to help others avoid that pain if I can. If you aren’t familiar with why you might want to use Kafka, plenty of great articles make the case, including “The Log: What every software engineer should know about real-time data’s unifying abstraction” and “Turning The Database Inside Out With Apache Samza”. In this introduction, I will assume that you have gone through the Kafka quickstart for version 0.8.2.

Make Your Database Disposable!

Marvin and Stanley work for different companies that do the same thing. Both businesses deliver a stream of relevant content from the web to their customers. To do this, they both begin by filtering content based on keywords. All content is stored in MySQL and matched to the appropriate customers. This is simple and effective. Marvin comes from an enterprise background, and his architecture is to shuttle all incoming data straight into the MySQL database.

Maximize the Work Not Done

I can’t recall exactly who I first heard say it, but I remember the lesson well. It was on a team I was part of years ago, and it’s a principle I have thoroughly absorbed since. I didn’t wrap my head around it right away, but whether you practice agile, Agile, or neither, it’s important. Here is the simple explanation: I traded weeks of development for an afternoon of work. That isn’t just weeks of work; it’s weeks of opportunity cost, thousands of lines of code to test and maintain, and infrastructure that didn’t need to be built and deployed.

Nested Documents in ElasticSearch

ElasticSearch is an incredibly powerful tool that goes well beyond full text search. Flat JSON documents can take you a long way, but sometimes you need more. Nested documents in ElasticSearch let you model more complex data in the index. By default, ES flattens an array of child objects, so that

{ "id": 1, "children": [ { "name": "Ben", "age": 10 }, { "name": "Jenny", "age": 12 } ] }

would become

{ "id": 1, "children.name": [ "Ben", "Jenny" ], "children.age": [ 10, 12 ] }

The association of the data has been obscured: the index no longer records that Ben is 10 and Jenny is 12.
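
To make the lost association concrete, here is a small plain-Python simulation of the default flattening. It is not an ElasticSearch client or API call; it assumes each child field is simply collected into one multi-valued field per path, and the helper names flatten_object_array, matches_flat, and matches_nested are hypothetical. A search for a child named Ben who is 12 matches the flattened form even though no such child exists, while checking each child object as a unit (roughly what a nested mapping plus a nested query gives you) correctly rejects it.

```python
from collections import defaultdict


def flatten_object_array(doc: dict, field: str) -> dict:
    """Simulate default object flattening: every child's values are merged
    into one multi-valued field per path, e.g. "children.name"."""
    flat = {k: v for k, v in doc.items() if k != field}
    merged = defaultdict(list)
    for child in doc.get(field, []):
        for key, value in child.items():
            merged[f"{field}.{key}"].append(value)
    flat.update(merged)
    return flat


def matches_flat(flat: dict, field: str, criteria: dict) -> bool:
    """Each criterion is checked independently against its multi-valued
    field, so values from different children can be combined."""
    return all(value in flat.get(f"{field}.{key}", []) for key, value in criteria.items())


def matches_nested(doc: dict, field: str, criteria: dict) -> bool:
    """All criteria must hold for the same child object."""
    return any(
        all(child.get(key) == value for key, value in criteria.items())
        for child in doc.get(field, [])
    )


doc = {"id": 1, "children": [{"name": "Ben", "age": 10}, {"name": "Jenny", "age": 12}]}
flat = flatten_object_array(doc, "children")
print(flat)
print(matches_flat(flat, "children", {"name": "Ben", "age": 12}))    # false positive
print(matches_nested(doc, "children", {"name": "Ben", "age": 12}))   # correctly rejected
```

The flattened check answers "is there some child named Ben, and some child aged 12?", which is a different question from "is there a child named Ben who is 12?".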

About

Hello there, I’m Shayne Studdard. I’m a software developer and a photographer. As a developer, I love being pragmatic and agile. I love wrangling data and playing with NLP and machine learning. Technologies I currently love include Clojure, PostgreSQL, ElasticSearch, and, well, bash. Recently I’ve added Apache Kafka to my breakfast. As a photographer, I’m an amateur trying to find my way, developing my eye and my style. Frequently I photograph my kids when I’m not wringing their necks.
