Making Sense of Clojure's Overlooked Agents

Working extensively with Clojure in the last year, I’ve been exploring the many concurrency techniques favored by the language. Certain strategies like atoms and immutability have become second nature.

The Clojure books explain atoms, refs, and agents and how they work. While I understand refs and atoms, agents were never well covered. There are examples of using agents to protect resources like files, but that’s it.

The Textbook on Agents

It’s natural to assume agents are like actors. With actors, a piece of code such as a function waits to process messages serially. Agents flip this notion upside down: an agent is a piece of data that waits for functions to be submitted and applies them to itself one at a time.

This is totally bizarre at first: we’re used to submitting data to a queue, not functions! It wasn’t clear to me how this would be helpful.
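
To make that concrete, here is a minimal sketch using only clojure.core (the name counter is just for illustration). The agent holds a number, and we send it functions to apply to that number:

```clojure
;; An agent wraps a piece of state; here, the number 0.
(def counter (agent 0))

;; send queues a function (plus any extra args) and returns immediately.
;; Queued functions are applied to the agent's state one at a time.
(send counter inc)        ; state becomes (inc 0)
(send counter + 10)       ; state becomes (+ 1 10)

;; Block until the actions sent so far from this thread have run.
(await counter)

@counter ;; => 11
```

In a standalone script, call (shutdown-agents) at the end so the agent thread pools don’t keep the JVM alive.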

The Problem

My application needs to log different information at different intervals. Every 10 seconds some detailed throughput information is logged, and every 60 seconds some rolled-up metrics are logged. The logging module inverts control: client code registers a period along with a function that produces the string to log.

The module uses a single executor (which we won’t model in this example), and for each distinct period we create a scheduled event on the executor. We must keep the handle for each schedule for shutdown purposes, as well as a list of functions to be called to produce the log strings.

When adding a new logging function for a period, we must check whether that period is already scheduled and, if not, schedule it. We must track the handles and the functions, both keyed by the period.
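
The snippets that follow call new-scheduled and destroy-scheduled without defining them. If you want to run them, here is a rough sketch of what those two functions might look like. This is an assumption on my part, not the real module: the executor var, the no-op task, and the choice of seconds as the unit are all made up for illustration.

```clojure
(import '(java.util.concurrent Future ScheduledThreadPoolExecutor TimeUnit))

;; Hypothetical stand-in for the module's single executor.
(def executor (ScheduledThreadPoolExecutor. 1))

(defn new-scheduled
  "Schedules a placeholder task every `period` seconds and returns its handle."
  [period]
  (.scheduleAtFixedRate ^ScheduledThreadPoolExecutor executor
                        ^Runnable (fn [] nil) ; the real task would run the loggers
                        (long period)
                        (long period)
                        TimeUnit/SECONDS))

(defn destroy-scheduled
  "Cancels a handle returned by new-scheduled."
  [handle]
  (.cancel ^Future handle false))
```

The important property is that new-scheduled has a side effect: every call creates a live schedule that stays around until someone cancels its handle.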

Refs

At first this sounded like a job for refs. Refs exist to coordinate transactional changes to multiple bits of data, and that’s what I needed.

    (def schedule-handles (ref {}))
    (def loggers (ref {}))

    (defn- ensure-scheduled
      [period]
      (when-not (@schedule-handles period)
        (alter schedule-handles assoc period (new-scheduled period))))

    (defn- add-logger-to-period
      [period logger-fn]
      (if (@loggers period)
        (alter loggers update-in [period] conj logger-fn)
        (alter loggers assoc period [logger-fn])))
    
    (defn add-logger
      [period logger-fn]
      (dosync
        (ensure-scheduled period)
        (add-logger-to-period period logger-fn)))

This feels wrong. Concurrency code is spread throughout the module, and every additional piece of data we track will make it more complicated. Worse, as Matt Havener pointed out, we leak handles whenever dosync retries: the transaction body runs again, new-scheduled is called again, and the handle from the abandoned attempt is never cancelled. That may be fixable, but as it stands this is a complete failure.

The refs allow us to update multiple pieces of data transactionally, but perhaps we could combine the data into a single dictionary and try…
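
The retry behavior is easy to reproduce. In this sketch (all names are made up for the demonstration), the transaction deliberately lets a competing write commit during its first attempt, which forces a retry. The attempts atom stands in for a side effect like new-scheduled and counts how many times it ran:

```clojure
(def r (ref 0))
(def attempts (atom 0))   ; stands in for a side effect such as new-scheduled

(dosync
  (swap! attempts inc)    ; side effect: runs on every attempt
  (when (= 1 @attempts)
    ;; On the first attempt only, let another thread commit a change to r
    ;; and wait for it, so that this transaction conflicts and retries.
    @(future (dosync (alter r inc))))
  (alter r inc))

@attempts ;; => 2, the body ran twice for one logical update
@r        ;; => 2, one increment from each transaction
```

If the side effect had been a call that creates a schedule, the first one would now be orphaned.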

Atoms

Atoms manage concurrent access to a single value (e.g. an integer, a dictionary, a list). They are probably the most commonly used of Clojure’s reference types, and they are easy to work with.
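
For ordinary updates, an atom needs nothing more than swap! and a pure function. A minimal sketch (the hits map is an invented example):

```clojure
(def hits (atom {}))

;; swap! applies the function to the current value and stores the result.
;; Under contention it may call the function more than once, so the
;; function must be free of side effects.
(swap! hits update :home (fnil inc 0))
(swap! hits update :home (fnil inc 0))

@hits ;; => {:home 2}
```

That “free of side effects” requirement is exactly what gets in the way here, because creating a schedule is a side effect.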

    (def scheduler-data (atom {}))

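    (defn- ensure-scheduled
      [period]
      (loop []
        ;; Read the state once per attempt so the check and the CAS
        ;; both work from the same snapshot.
        (let [data @scheduler-data]
          (when (nil? (get-in data [:handles period]))
            (let [handle (new-scheduled period)
                  new-data (assoc-in data [:handles period] handle)]
              (when-not (compare-and-set! scheduler-data data new-data)
                ;; Lost the race: discard our handle and try again.
                (destroy-scheduled handle)
                (recur)))))))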

Let’s stop right there. The initialization for a new period is complicating the code significantly. Not only do we have to bust out compare-and-swap, we also have to destroy the scheduled handle when our CAS fails. Also, once again, we have our concurrency code invading our functions.

A better abstraction is to serialize the operations on our module’s state. This simplifies initialization for a new period, since we no longer have to worry about multiple inits happening at the same time. It also means the code can focus on what it needs to accomplish and not worry about concurrency.

Agents

Here is the implementation via agents.

    (def scheduler-data (agent {}))

    (defn- ensure-scheduled
      [data period]
      (if (get-in data [:loggers period])
        data
        (assoc-in data
                  [:loggers period]
                  {:handle (new-scheduled period)
                   :log-fns []})))

    (defn- add-logger-for-period
      [data period log-fn]
      (update-in data [:loggers period :log-fns] conj log-fn))

    (defn add-logger
      [period log-fn]
      (-> scheduler-data
          (send ensure-scheduled period)
          (send add-logger-for-period period log-fn)))

This is excellent: our functions can operate ignorant of concurrency issues. The internal functions of the module only transform a dictionary and return it. Because the agent runs the functions sent to it one at a time, and never retries them, we can safely ignore concurrency problems inside them.
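
One thing to keep in mind is that send is asynchronous, so add-logger returns before the logger has actually been added. Here is a self-contained sketch of how the module might be exercised. The module functions are repeated so the snippet runs on its own; the stubbed new-scheduled and the log-lines helper are my own inventions for illustration.

```clojure
;; Hypothetical stub: the real new-scheduled would register a task on the executor.
(defn new-scheduled [period] {:period period})

(def scheduler-data (agent {}))

(defn- ensure-scheduled
  [data period]
  (if (get-in data [:loggers period])
    data
    (assoc-in data [:loggers period] {:handle (new-scheduled period)
                                      :log-fns []})))

(defn- add-logger-for-period
  [data period log-fn]
  (update-in data [:loggers period :log-fns] conj log-fn))

(defn add-logger
  [period log-fn]
  (-> scheduler-data
      (send ensure-scheduled period)
      (send add-logger-for-period period log-fn)))

(add-logger 10 (fn [] "throughput"))
(add-logger 10 (fn [] "latency"))
(add-logger 60 (fn [] "rollup"))

;; Wait for the actions queued above before reading the state.
(await scheduler-data)

(defn log-lines
  "Calls every logger registered for `period` and collects the strings."
  [period]
  (mapv #(%) (get-in @scheduler-data [:loggers period :log-fns])))

(log-lines 10) ;; => ["throughput" "latency"]
(log-lines 60) ;; => ["rollup"]
```

Reading with @scheduler-data never blocks; it returns whatever state the agent has reached so far, which is why the await matters in a test or a script.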

Conclusion

Agents aren’t like actors after all. They are a powerful concurrency abstraction that can simplify your code while making it safer. They work well for aggregating operations around a single piece of data. It’s almost like a safe alternative to shared memory!

In fact, I took the ideas I developed here and built a scheduler called Bonney that is nearly API-compatible with at-at.