Engineer Note #5: Clojure’s Concurrency and Parallelism

Blocking, Thread, Reference Cell, Mutual Exclusion, Deadlock, Futures, Delays, and Promises

Jeffry Tandiono
12 min read · Sep 5, 2020

Concurrency vs Parallelism

Concurrency refers to managing more than one task at the same time. Task just means “something that needs to get done,” and it doesn’t imply anything regarding implementation in your hardware or software. Let’s illustrate concurrency with Bob. Bob says,

I cannot sing while I dance

Bob explains that he can only manage one task (dancing), so he won’t be able to sing (another task). However, if he decides to process tasks concurrently, he can say,

I can stop dancing, sing a song, then continue dancing

Here, Bob is managing two tasks: dancing and singing. However, he is not executing both tasks at the same time. Instead, he is switching between the two, or interleaving. Note that, while interleaving, you don’t have to fully complete a task before switching: Bob could dance a bit, sing a verse, continue the dance, and then switch back to the next verse of the song.

Parallelism refers to executing more than one task at the same time. If Bob were to execute his two tasks in parallel, he would say,

I can sing while I dance

Parallelism is a subclass of concurrency: before you execute multiple tasks simultaneously, you first have to manage multiple tasks.

Clojure has many features that allow you to achieve parallelism easily. Computer systems generally achieve parallelism by simultaneously executing tasks on multiple processors.
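As a small taste of that, here is a minimal sketch using pmap, one of Clojure’s built-in parallelism helpers (the slow-inc function is made up for illustration): it maps a function over a collection using a pool of threads, so independent items can be processed simultaneously on a multi-core machine.

(defn slow-inc
  "Pretend each item takes one second of work."
  [n]
  (Thread/sleep 1000)
  (inc n))

(time (doall (map slow-inc [1 2 3 4])))  ;; ~4 seconds, one item at a time
(time (doall (pmap slow-inc [1 2 3 4]))) ;; ~1 second on a multi-core machine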

It’s important to distinguish parallelism from distribution. Distributed computing is a special version of parallel computing where the processors are in different computers and tasks are distributed to computers over a network. It’d be like Bob asking Charlie,

Charlie can sing while I dance

Blocking

One of the major use cases for concurrent programming is for blocking operations. Blocking really just means waiting for an operation to finish. You’ll most often hear it used in relation to I/O operations, like reading a file or waiting for an HTTP request to finish. Let’s examine this using the concurrent Bob example.

If Bob watches a video on how to dance the whip/nae nae, and is neither dancing nor singing in the meantime, then you would say that the video-watching operation is blocking and that these tasks are executing synchronously.
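Here’s a minimal sketch of a blocking I/O call, assuming a reachable URL such as https://example.com: the slurp call performs a synchronous HTTP request, and nothing after it runs on this thread until the whole response has arrived.

(println "Fetching the page...")
(slurp "https://example.com")  ;; the current thread blocks here until the request finishes
(println "This prints only after the download is done")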

Thread

A thread is a subprogram. A program can have many threads, and each thread executes its own set of instructions while enjoying shared access to the program’s state.

A thread is like the steps of a recipe: a list that contains a sequence of instructions. The processor executes these instructions in order.

A thread can spawn a new thread to execute tasks concurrently. In a single-processor system, the processor switches back and forth between the threads (interleaving). Here’s where potential concurrency issues get introduced. Although the processor executes the instructions on each thread in order, it makes no guarantees about when it will switch back and forth between threads.

Note that this is just one possible order of instruction execution. The processor could also have executed the instructions in a different order, plating the dish before combining it with the butter, for example. This makes the program nondeterministic. You can’t know beforehand what the result will be because you can’t know the execution order, and different execution orders can yield different results.

This example shows concurrent execution on a single processor through interleaving, whereas a multi-core system assigns a thread to each core, allowing the computer to execute more than one thread simultaneously.
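Here’s a tiny sketch of that nondeterminism in Clojure: the two futures below run on separate threads, and because there’s no guarantee about when execution switches between them, the printed order can differ from run to run.

(future (dotimes [_ 3] (println "dancing")))
(future (dotimes [_ 3] (println "singing")))
;; Possible output: dancing, singing, dancing, dancing, singing, singing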

Reference Cell

Consider two threads, A and B, that both read and write the same variable X:

┌────┬─────────────────┐
│ ID │ Instruction     │
├────┼─────────────────┤
│ A1 │ WRITE X = 0     │
│ A2 │ READ X          │
│ A3 │ WRITE X = X + 1 │
│ B1 │ READ X          │
│ B2 │ WRITE X = X + 1 │
└────┴─────────────────┘

If the processor follows the order A1, A2, A3, B1, B2, then X will end up with a value of 2. But if it follows the order A1, A2, B1, A3, B2, X’s value will be 1, because both A2 and B1 read X while its value is still 0, so both A3 and B2 write back 0 + 1.

We’ll call this the reference cell problem. The reference cell problem occurs when two threads can read and write to the same location, and the value at the location depends on the order of the reads and writes.
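Here’s a minimal sketch of the reference cell problem in Clojure, using a volatile (volatiles provide no coordination, so concurrent updates can be lost). Whether the final value is 2 or 1 depends on how the two threads’ reads and writes interleave.

(def x (volatile! 0))             ;; WRITE X = 0

(defn increment-x []
  (let [current @x]               ;; READ X
    (vreset! x (inc current))))   ;; WRITE X = X + 1

(let [a (future (increment-x))
      b (future (increment-x))]
  @a @b
  @x)
;; => 2, or 1 if both threads read 0 before either wrote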

Mutual Exclusion

Imagine two threads, each trying to write a spell to a file. Without any way to claim exclusive write access to the file, the spell will end up garbled because the write instructions will be interleaved. Consider the following two spells:

By the power invested in me
by the state of California,
I now pronounce you man and wife

and

Thunder, lightning, wind, and rain,
a delicious sandwich, I summon again

If you write these to a file with threads without mutual exclusion, you could end up with this:

By the power invested in me
by Thunder, lightning, wind, and rain,
the state of California,
I now pronounce you a delicious man sandwich, and wife
I summon again
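Here’s a minimal sketch of how that happens, assuming a local file spells.txt: two futures append lines to the same file with nothing enforcing exclusive access, so their lines can interleave as shown above.

(defn write-spell!
  "Append each line of a spell to the shared file."
  [lines]
  (doseq [line lines]
    (spit "spells.txt" (str line "\n") :append true)))

(future (write-spell! ["By the power invested in me"
                       "by the state of California,"
                       "I now pronounce you man and wife"]))
(future (write-spell! ["Thunder, lightning, wind, and rain,"
                       "a delicious sandwich, I summon again"]))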

Deadlock

Let’s take a look at the dining philosophers problem. Imagine five philosophers sitting around a round table, each with a bowl of noodles in front of them.

A philosopher can eat only while holding a pair of chopsticks. A single chopstick lies between each pair of neighbors, one to the left and one to the right of each person, so there are only five chopsticks in total.

Their ritual proceeds thusly:

1. Pick up the left chopstick, if available
2. Pick up the right chopstick, if available
3. Eat the noodles
4. Put both chopsticks back in their places
5. Repeat

Following this ritual, it’s entirely possible that all the philosophers will pick up their left chopstick and then block indefinitely while waiting for the chopstick to their right to become available, resulting in deadlock.
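Here’s a minimal sketch of the ritual in Clojure, modeling each chopstick as a lock and claiming it with the locking macro. The Thread/sleep is there only to make the bad interleaving likely: if all five philosophers grab their left chopstick before any grabs a right one, all five block forever.

(def chopsticks (vec (repeatedly 5 #(Object.))))

(defn philosopher [i]
  (let [left  (nth chopsticks i)
        right (nth chopsticks (mod (inc i) 5))]
    (locking left          ;; 1. pick up the left chopstick
      (Thread/sleep 100)   ;; pause so the neighbors have time to do the same
      (locking right       ;; 2. pick up the right chopstick
        (println "Philosopher" i "eats")))))  ;; 3. eat; 4. both locks release as the blocks exit

(doseq [i (range 5)]
  (future (philosopher i)))  ;; frequently hangs: every philosopher waits on a chopstick that never frees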

Futures, Delays, and Promises

Futures, delays, and promises are easy, lightweight tools for concurrent programming. They give you more flexibility than is possible with serial code. When you write serial code, you bind together these three events:

  • Task definition
  • Task execution
  • Requiring the task’s result

As an example, look at this hypothetical code, which defines a simple API call task:

(web-api/get :noodle-philosophers)

As soon as Clojure encounters this task definition, it executes it. It also requires the result right now, blocking until the API call finishes. Part of learning concurrent programming is learning to identify when these chronological couplings aren’t necessary. Futures, delays, and promises allow you to separate task definition, task execution, and requiring the result.

Futures

In Clojure, you can use futures to define a task and place it on another thread without requiring the result immediately. You can create a future with the future macro.

(future (Thread/sleep 5000)
        (println "I'll print after 5 seconds"))
(println "I'll print immediately")

Thread/sleep tells the current thread to just sit on its bum and do nothing for the specified number of milliseconds. Normally, if you evaluated Thread/sleep in your REPL, you wouldn’t be able to evaluate any other statements until the REPL was done sleeping; the thread executing your REPL would be blocked. However, future creates a new thread and places each expression you pass it on the new thread, including Thread/sleep, allowing the REPL’s thread to continue, unblocked.

You can use futures to run tasks on a separate thread and then forget about them, but often you’ll want to use the result of the task. The future function returns a reference value that you can use to request the result. You can use the reference value to request a future’s result, but if the future isn’t done computing the result, you’ll have to wait.

Requesting a future’s result is called dereferencing the future, and you do it with either the deref function or the @ reader macro. A future’s result value is the value of the last expression evaluated in its body. A future’s body executes only once, and its value gets cached.

(let [result (future (println "this prints once")
                     (+ 1 1))]
  (println "deref: " (deref result))
  (println "@: " @result))
=> "this prints once"
=> deref: 2
=> @: 2

Notice that the string "this prints once" indeed prints only once, even though you dereference the future twice. This shows that the future’s body ran only once and the result, 2, got cached.

Dereferencing a future will block if the future hasn’t finished running:

(let [result (future (Thread/sleep 3000)
                     (+ 1 1))]
  (println "The result is: " @result)
  (println "It will be at least 3 seconds before I print"))
=> The result is: 2
=> It will be at least 3 seconds before I print

Sometimes you want to place a time limit on how long to wait for a future. To do that, you can pass deref a number of milliseconds to wait along with the value to return if the deref times out:

(deref (future (Thread/sleep 1000) 0) 10 5) => 5

This code tells deref to return the value 5 if the future doesn’t return a value within 10 milliseconds.

Finally, you can interrogate a future using realized? to see if it’s done running:

(realized? (future (Thread/sleep 1000)))
=> false

(let [f (future)]
  @f
  (realized? f))
=> true

Futures are a dead-simple way to sprinkle some concurrency on your program.

On their own, they give you the power to chuck tasks onto other threads, which can make your program more efficient. They also let your program behave more flexibly by giving you control over when a task’s result is required.

When you dereference a future, you indicate that the result is required right now and that evaluation should stop until the result is obtained. You’ll see how this can help you deal with the mutual exclusion problem in just a bit. Alternatively, you can ignore the result. For example, you can use futures to write to a log file asynchronously, in which case you don’t need to dereference the future to get any value back.
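For instance, here’s a minimal fire-and-forget sketch, assuming a local log file app.log (the name is made up for illustration): the future does the file write on another thread, and because we never dereference it, nothing waits on the result.

(defn log-async
  "Write a log entry on another thread; the caller never blocks on it."
  [entry]
  (future (spit "app.log" (str entry "\n") :append true)))

(log-async "user signed in")  ;; returns immediately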

The flexibility that futures give you is very cool. Clojure also allows you to treat task definition and requiring the result independently with delays and promises.

Delays

Delays allow you to define a task without having to execute it or require the result immediately. You can create a delay using delay:

(def jackson-5-delay
  (delay (let [message "Just call my name and I'll be there"]
           (println "First deref:" message)
           message)))

In this example, nothing is printed, because we haven’t yet asked the let form to be evaluated. You can evaluate the delay and get its result by dereferencing it or by using force. force behaves identically to deref, but it communicates more clearly that you’re causing a task to start, as opposed to waiting for a task to finish:

(force jackson-5-delay)
=> First deref: Just call my name and I'll be there
=> "Just call my name and I'll be there"

Like futures, a delay is run only once and its result is cached. Subsequent dereferencing will return the Jackson 5 message without printing anything:

@jackson-5-delay
=> "Just call my name and I'll be there"

One way you can use a delay is to fire off a statement the first time one future out of a group of related futures finishes. For example, pretend your app uploads a set of pictures to a picture-sharing site and notifies the owner as soon as the first one is up, as in the following:

(def pictures ["memes.jpg" "catastrophic.jpg"])

(defn email-user
  [email-address]
  (println "Sending picture notification to" email-address))

(defn upload-document
  "Upload document"
  [picture]
  true)

(let [notify (delay (email-user "my_email@gmail.com"))]
  (doseq [pic pictures]
    (future (upload-document pic)
            (force notify))))

In this example, you define a vector of pictures to upload and two functions (email-user and upload-document) to pretend-perform the two operations. Then you use let to bind notify to a delay. The body of the delay, (email-user "my_email@gmail.com"), isn’t evaluated when the delay is created. Instead, it gets evaluated the first time one of the futures created by the doseq form evaluates (force notify). Even though (force notify) is evaluated once per uploaded picture, the delay body is evaluated only once. The user will be grateful to know when the first picture is available, and they’ll also appreciate not being spammed.

This technique can help protect you from the mutual exclusion problem: making sure that only one thread can access a particular resource at a time. In this example, the delay guards the email server resource. Because the body of a delay is guaranteed to fire only once, you can be sure that you will never run into a situation where two threads send the same email.

Promises

Promises allow you to express that you expect a result without having to define the task that should produce it or when that task should run. You create promises using promise and deliver a result to them using deliver.

(def my-promise (promise))
(deliver my-promise (+ 1 2))
@my-promise => 3

Here, you create a promise and then deliver a value to it. Finally, you obtain the value by dereferencing the promise. Dereferencing is how you express that you expect a result, and if you had tried to dereference my-promise without first delivering a value, the program would have blocked until a value was delivered, just like with futures and delays. You can only deliver a result to a promise once.
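A small sketch of that last point: in current Clojure, a second deliver to an already-realized promise has no effect, so the first value sticks.

(def p (promise))
(deliver p :first)
(deliver p :second)  ;; no effect; the promise already holds :first
@p
=> :first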

One use for promises is to find the first satisfactory element in a collection of data. Suppose, for example, that you’re looking for fruit with a sweetness rating of at least 95, and you have a budget of $10 for the fruit.

The following code defines the candidate fruit, creates a function to mock up an API call, and creates another function to test whether a product is satisfactory:

(def mountain-watermelon
  {:store "Mountain Watermelon"
   :price 9
   :sweetness 89})

(def watermelon-premium
  {:store "Premium Watermelon"
   :price 12
   :sweetness 99})

(def watermelon-japan
  {:store "Japan Watermelon"
   :price 9
   :sweetness 97})

(defn mock-api-call
  [result]
  (Thread/sleep 1000)
  result)

(defn satisfactory?
  "If the fruit meets our criteria, return the fruit, else return false"
  [fruit]
  (and (<= (:price fruit) 10)
       (>= (:sweetness fruit) 95)
       fruit))

The API call waits one second before returning a result to simulate the time it would take to perform an actual call. To show how long it would take to check the stores synchronously, we’ll use some to apply the mock API call and the satisfactory? check to each element of the collection and return the first truthy result, or nil if there is none. When you check each store synchronously, it takes about one second per store to obtain a result, as the following code shows:

(time (some (comp satisfactory? mock-api-call)
            [mountain-watermelon watermelon-premium watermelon-japan]))
=> "Elapsed time: 3002.231 msecs"
=> {:store "Japan Watermelon", :sweetness 97, :price 9}

Here I’ve used comp to compose functions, and I’ve used time to print the time taken to evaluate a form. You can use a promise and futures to perform each check on a separate thread. If your computer has multiple cores, this could reduce the time it takes to about one second:

(time
 (let [fruit-promise (promise)]
   (doseq [fruit [mountain-watermelon watermelon-premium watermelon-japan]]
     (future (if-let [satisfactory-fruit (satisfactory? (mock-api-call fruit))]
               (deliver fruit-promise satisfactory-fruit))))
   (println "And the winner is:" @fruit-promise)))
=> "Elapsed time: 1002.265 msecs"
=> And the winner is: {:store "Japan Watermelon", :sweetness 97, :price 9}

In this example, you first create a promise, fruit-promise, and then create three futures with access to that promise. Each future’s task is to evaluate one store’s fruit and to deliver its data to the promise if it’s satisfactory. Finally, you dereference fruit-promise, causing the program to block until the data is delivered. This takes about one second instead of three because the store evaluations happen in parallel. By decoupling the requirement for a result from how the result is actually computed, you can perform multiple computations in parallel and save some time.

Because promises can be written to only once, you prevent the kind of inconsistent state that arises from nondeterministic reads and writes.

You might be wondering what happens if none of the fruit is satisfactory. If that happens, the dereference would block forever and tie up the thread. To avoid that, you can include a timeout:

(let [p (promise)]
  (deref p 100 "timed out"))

This creates a promise, p, and tries to dereference it. The number 100 tells deref to wait 100 milliseconds, and if no value is available by then, to use the timeout value, "timed out".

That’s all for concurrency and parallelism in Clojure for Engineer Note #5.
