Immutability and Concurrency – Part II – A Review of F#’s Immutable Data Structures

The first half of this series the accent is on immutable programming, because in the first part of the series we had an introduction to immutable data, and in this second part we’re going to look in depth at immutable programming possibilities, with the idea of show that this is actually not too different to what your used. Once we’ve conquered the immutability we’ll start to dig into the concurrency. The main idea behind this post is take a look at everything that makes working with immutable data bearable, even likable.

 

Expressions and Returning Multiple Values (a.k.a. Tuples)

In F# everything is an expression. You can have statements too, but these are just expressions that return a special unit type that contains one value, the somewhat cryptic () value. This is largely because values are immutable by default: a statement is a side effect, an admission that a mutation has take place, an expression is something that returns a value, which is how we make progress in the functional world. This has a few interesting effects on our language, and ultimately our way of thing about programming. Consider the following C# fragment:

    public string FoundItem(bool found) {

      string foundString;

      if (found) {

        foundString = "Found";

      }

      else {

        foundString = "Not found";

      }

      return foundString;

    }

Here we have a method that takes an input and through a series of steps: initialise a local, perform an if statement, return the local variable, our method transforms the input into a result. Now let’s take a look at the equivalent in F#:

let foundItem found =

    if found then

        "Found"

    else

        "Not found"

In F# our “if” is not a statement, but an expression. So we’re able to throw away the tedious implementation detail of declaring a local and concentrate on the important bit: the transformation. So I think we ultimately we come to think of series of transformations of on the data that we’re using rather than a series of steps that the computer must perform.

A quick aside: you can of course rewrite the C# in a more expression oriented manor:

    public string FoundItemAlt1(bool found) {

      if (found) {

        return "Found";

      }

      else {

        return "Not found";

      }

    }

    public string FoundItemAlt2(bool found) {

      return found ? "Found" : "Not found";

    }

But neither feels particularly like idiomatic C# and lot of C# style guides council against this sort of style, probably because C# doesn’t allow you to return multiple arguments, which makes this style of programming slightly awkward, we’ll look at this issue next.

So if you don’t know F# you may be thinking that having everything as an expression would make translating the following code fragment difficult, since here we update two local variables:

    public void FoundItemMessage(bool found) {

      string foundTitle;

      string foundMessagge;

      if (found) {

        foundTitle = "Found";

        foundMessagge = "The item was found";

      }

      else {

        foundTitle = "Not found";

        foundMessagge = "The item wasn't found";

      }

      MessageBox.Show(foundMessagge, foundTitle);

    }

In fact it’s very easy to translate this sort of thing because in F# you can return two values or more values from any expression:

let foundItemMessage found =

    let foundTitle, foundMessagge =

        if found then

            "Found", "The item was found"

        else

            "Not found", "The item wasn't found"

    MessageBox.Show(foundMessagge, foundTitle)

Returning multiple items is called returning a tuple, in the above code we return a pair, a special name for a tuple of 2 items, from our if expression and straight away unpack our tuple into the identifiers foundTitle and foundMessage. As you become more familiar with tuples you start to see that in C# we often use mutation to get around the fact we can’t easily return multiple values. In C# it’s a fairly common pattern to update class members as way of returning results when we need to computer two different but related values in method:

    int cost;

    int profit;

 

    private void ComputeCostProfit(int price) {

      // obviously the calculation needs to more complex to

      // justify being a separate method

      cost = rnd.Next(price);

      profit = price - cost;

    }

 

    public void OutputProfit() {

      int price = 42;

      ComputeCostProfit(42);

      Console.WriteLine("Price: {0} Cost: {1} Profit: {2}", price, cost, profit);

    }

We can translate this into the following F#:

let computeCostProfit price =

      let cost = rnd.Next(price)

      cost, price - cost

 

let outputProfit() =

    let price = 42

    let cost, profit = computeCostProfit price

    Printf.printf "Price: %i Cost: %i Profit: %i" price cost profit

The F# implementation of computeCostProfit is not only more encapsulated because we no longer rely on updating member field, it’s also thread safe as now all state is held on the threads stack.

I think now we’re beginning to get a feel for how expressions and tuples can help us avoid the need for mutable state, now let’s take a tour of other immutable types in F#.

Records

In F# records are a way of grouping multiple values and naming the fields. In F# a record looks something like:

type Person =

    { FirstName: string;

      LastName: string; }

And we can construct instances of them like:

let helen = { FirstName = "Helen"; LastName = "Rippon" }

By default all fields in a record are immutable.  This may sound like a horrible grind, as sooner or later we’re bound to want to make a change to the record, but this is why F# provides a nice short-hand syntax for creating new records with one or more fields changed:

let marriedHelen = { helen with LastName = "Jones" }

It’s also common practices to provide member methods that create copies of record as a convenient way of altering the record:

type Person =

    { FirstName: string;

      LastName: string; }

    with

        member x.Marries lastName =

            { x with LastName = lastName }

Although here we do more work up front it generally means we have nice short hand way of using our record though out the code:

let marriedHelen = helen.Marries "Jones"

At this point you’re probably starting to see that using immutable constructs is as different as you previously may have thought, but maybe starting to worry about the performance of all that copying. This is a real concern, but probably isn’t as bad as you think. Allocation is very quick in a garbage collected environment like the CLR, and the amount of data we’re copying is generally small, in the previous example we’re only coping two pointers since the strings themselves are immutable so need to copy them. In fact, if all your records are immutable then there is only ever need to make a shallow copy, because the data structures your referencing are guaranteed not to change so there’s no need to copy them. Finally garbage collection of new objects is very efficient in the CLR, because its collections are split into generations.

List/Set/Map

List, Set and Map are F#’s immutable collections. List is a singularly linked list and Set and Map are both based on binary trees with Set being a mathematical “Set” and “Map” being a collection of name value pairs. Again using these types isn’t that different from using mutable types, you just need to keep in mind a new value is being produced every time you add a new value, but this doesn’t change the look and the feel of the code too much. To illustrate this let’s look at initializing a ResizeArray (F#’s name for System.Collections.Generic.List) and initializing a set:

    let ra =

        let temp = new ResizeArray<string>()

        temp.Add("toto")

        temp.Add("lolo")

        temp.Add("bobo")

        temp

       

    let set =

        let temp = Set.empty

        let temp = temp.Add("toto")

        let temp = temp.Add("lolo")

        let temp = temp.Add("bobo")

        temp

I think you’ll agree in both cases the code looks quite similar: it has the same look and feel, but there are a couple of key differences that are important to notice. First notice that we initialise “set” to be “empty”, this is the cute part of programming using immutable data structures there is no need of several different instances of the empty set, it is immutable, so the empty set will always be the empty set and can be shared between all the different sets that the program uses. The second thing to notice is how we need to recapture the reference to the newly created object every time we add a new item:

        let temp = temp.Add("toto")

but you’ll be getting used the by now. F#’s list structure works in pretty much the same way expect we use the [] symbol to represent the empty list and the :: symbol to represent adding an item to a list:

    let list =

        let temp = []

        let temp = "toto" :: temp       

        let temp = "lolo" :: temp       

        let temp = "bobo" :: temp

        temp

Since initialising a list like this can be pretty tedious F# also provides a short had for doing it:

let listShortHand = [ "toto"; "lolo"; "bobo" ]

Looping Using Recursion

Traditional looping constructs like “for” and “while” are statements and rely on mutation to store temporary results, so you may think translating a section of C# that’s based on while loop, like the section below which calculates whether a complex number is a member of the Mandelbrot set, would be difficult:

    Complex cMax = Complex.Create(1.0, 1.0);

    Complex cMin = Complex.Create(-1.0, -1.0);

    int iterations = 18;

 

    public bool isInMandelbrotSet(Complex c0) {

      bool res = false;

      Complex c = c0;

      int count = 0;

      while ((cMin < c) && (c < cMax) && (count < iterations)) {

        c = (c * c) + c0;

        count++;

      }

      return count == iterations;

    }

However loops can be translated into recursive functions, which do not reply on mutation. An example of how the while loop could be translated into a recursive function is shown below. Here the loop is translated into a recursive “check” function which checks the exit condition and if it’s not met recursively calls the check function:

let cMax = complex 1.0 1.0

let cMin = complex -1.0 -1.0

let iterations = 18

 

let isInMandelbrotSet c0 =

    let rec check n c =

        (n = iterations)

        || (cMin < c) && (c < cMax) && check (n + 1) ((c * c) + c0)

    check 0 c0

These techniques are very powerful when working with immutable collections. For example it’s common to use recursion and pattern matching work with lists, the below example shows how to transform a list of strings into a list of integers:

let rec mapStringToInt l =

    match l with

    | hd :: tl -> System.Int32.Parse(hd) :: mapStringToInt tl

    | [] -> []

 

let ints = mapStringToInt ["1"; "2"; "3"]

Transformations of this nature are so common that very often you don’t need worry about building a recursive function to perform the transformation as a library function “map” is already provide:

let rec map f l =

    match l with

    | hd :: tl -> f hd :: map f tl

    | [] -> []

You simply need to provide the transformation that you want to apply to each item in the list:

let ints = List.map System.Int32.Parse ["1"; "2"; "3"]

Another common pattern when working with immutable structures is to use folding to create a summary of the collection. Folding works well with immutable collection because a fold function threads an accumulator value though the computation, that is to say that each function call is passed an accumulator value that was the result of the last function. This is extremely useful when working with immutable data structures as the accumulator provides way to pass state between each iteration without have to resort to mutable data. The below code shows how to use folding to create an F# “Set” from an F# list:

let set = [1;1;2;2;3;3;3]

          |> List.fold_left (fun acc x -> Set.add x acc ) Set.empty

Here the “Set.add” function produces a new set that is passed to the next function call via the “acc” accumulator parameter.

This is such a common thing to do that there is no need to write a function to create a set of a list it is provided for you as a library function:

let altSet = Set.of_list [1;1;2;2;3;3;3]

Constructors/Union Types

The union types are a bit unusual because where as records and collections can make sense as either mutable or immutable types, union types generally only make sense as immutable type. A very quick recap for those unfamiliar with union types: a union type is composed of one or more constructors. A “constructor” in the ML (meta language – the family of language that F# comes from) sense is a named type that can contain a fixed number of augments. For instance might declare a constructor of type “vector”:

type vector = Vector2D of int*int

We can compose a constructor, that is, create a new instance:

let myVector = Vector2D (1, 2)

Then one way to recover the values it contains is to use a function to decompose it:

let x,y = (function Vector2D(x,y) -> x, y) myVector

Here we use an anonymous function to decompose the values, but we’re just using that to keep the example brief, in reality we’d probably use a named function for better reuse. Also constructors can be decomposed though pattern matching, we’ll cover this we look at union types that compose two or more constructors.

So constructors can be composed or decomposed, they don’t really have any concept of mutation, that’s why we said earlier they don’t really make sense as mutable data structures, though they can of course contain mutable data:

type thing = MyThing of Dictionary<string, int>

But that’s generally the exception rather than the rule.

So let’s look at a more realistic example. Union types are often used to represent trees; because individual constructors can contain reference to the type of the original union it’s easy to build recursive data structures that can be used to represent trees. Below is an example of a classic binary tree, the tree is composed of two constructors one a node that contains two “BinaryTree” (a right and left branch) and a “Leaf” that contains the value.

type BinaryTree<'a> =

    | Node of BinaryTree<'a> * BinaryTree<'a>

    | Leaf of 'a

The following shows you how to construct an instance of a very small tree, that contains one node and two leaves:

let tree = Node(Leaf 1, Leaf 2)

To help make sense of how we’d use this data structure, the following code snippet shows us pattern matching over a tree structure to collect the information at the leaves of the tree:

let flattenTree tree =

    let rec flattenTree tree acc =

        match tree with

        | Node (right, left) ->

            flattenTree right acc

            |> flattenTree left

        | Leaf value -> value :: acc

    flattenTree tree []

The important part is the “match ... with” expression in the middle of the function. Here we break up the union type to see whether it contains a node or a leaf. If we find a leaf we add the value to the accumulator, if we find a node we recursively treat each branch. So here we deconstruct the tree structure while construing a new list.

The Accumulator and Environment Patterns

Even with the help of tuples to allow us to return multiple results we can run into problems with having to pass too much data around, especially if our algorithm becomes very complicated. To cope with this we can use the accumulator and environment patterns. These can be seen in the post I made a while ago about an ant colony simulation.

Both patterns concern different ways we can pass data though an algorithm and are really just a name for identifiers than are thread though out the function. In an environment pattern we want pass data down though the algorithm, in an accumulator pattern we want to retrieve data from a function. Both are generally represented as represent as record types, but could equally be tuples too. It’s not uncommon to see functions with both and environment and an accumulator.

Let’s dig into an example to see if I can make that a bit clearer. Suppose we want to collect information from the nodes of a tree – a common enough task in functional programming. Now suppose we have the following tree definition:

type MadTree =

    | Foo of (DateTime * int)

    | Bar of (DateTime * string)

    | Node of ((DateTime * DateTime) * (MadTree * MadTree))

We decided that for the moment we are only interested in collecting nodes that are within the time slice specified by the parent node. Therefore we need a way of propagating the time information forward as well as passing the found instances of Foo and Bar back, for this we use an accumulator and an environment:

type Accumulator =

    { foundFoos: List<int>;

      foundBars: List<string>; }

 

type Environment =

    { startDate: DateTime;

      endDate: DateTime; }

So now we can easily write a recursive function using these types to instance of Foo and Bar that we want:

let collectFooBars tree =

    let rec collectFooBars tree env acc =

        match tree with

        | Foo (date, i) ->

            if env.startDate < date && date < env.endDate then

                { acc with foundFoos = i :: acc.foundFoos }

            else acc

        | Bar (date, s) ->

            if env.startDate < date && date < env.endDate then

                { acc with foundBars = s :: acc.foundBars }

            else acc

        | Node ((startDate, endDate), (ln, rn)) ->

            let newEnv = { startDate =  startDate; endDate = endDate }

            let acc = collectFooBars ln newEnv acc

            collectFooBars rn newEnv acc

    collectFooBars tree

        { startDate = DateTime.MaxValue; endDate = DateTime.MaxValue }

        { foundFoos = []; foundBars = []; }

Concurrency and the Haskell Problem

So now finally we get to a little concurrency. The good news if we implement algorithms using these data structures we can on as many threads as we like and we know I’ll never see any threading issues. For example if we have a large list of MadTree that we want to process there’s no problem to spin up a new thread to process each one, provide:

let threadProcessTree tree =

    let thread = new Thread(fun () -> collectFooBars tree |> ignore)

    thread.Start()

   

madTrees |> List.iter threadProcessTree

However there is also some bad news ... it’s up us the, programmer, to guarantee that functions we created are “pure”, that is they have no side effects. However F# does help us a bit here, if we know we need a section of a program to be “pure”, if we stick to F# native structures like records and union types, and its native collections then we know it’s pure. But also, in guarantying purity we run into the Haskell problem (or at least Haskell pre-monad revolution problem, as described by Simon Peyton Jones in this video), that is if we have no side effects then we cannot observe the results of a function. We can code complex and exciting algorithms but all we can observer is that they make one or more processors get warm. So in the next episode we’ll look at how we can manage concurrency in F# with immutable data structures.

Turning 30

I will be 30 on Friday, I’m to celebrate this I going to say on the small meditation island of Ile de Port Cros. In an effort to break my chronic email addiction I’m not taking a laptop, so don’t expect the second part of Immutability and Concurrency till next week!

Prochaine réunion d’alt.net de Paris - mercredi 4 juin, 20h00 - Le Café des Initiés

La prochain réunion d’alt.net de Paris aura lieu mercredi 4 juin, 20h00 à le « Le Café des Initiés », 3, Place des Deux Ecus, 75001 Paris. Moi, Julien et Symon sera là, et vous ?

----

The next meeting of alt.net Paris will take place on Wednesday 4th June at the “Café des Inities”, 3, Place des Deux Ecus, 75001 Paris. Symon, Julien and I will be there, will you?

Immutability and Concurrency – Part I – Getting to Know Immutable Data Structures

When asking the question how does functional programming help me with concurrent programming? The standard response tends to be functional programming use immutable data structures, read-only data structures can be shared between threads without issues, end of problem. Except it isn’t. Immutable data structures have a different set of problems associated with them when working on concurrent problems. This post will examine what these problems are, and then show that this is just a special case of a more general set of problems when working with immutable data structures. Finally will start taking a look at how we solve some of these problems, but in a single thread environment first of all.

First let’s frame the problem by looking at how imperative programs work with threads. In a classic imperative/OO languages programmers tend to use either instance or static member variables to send messages between threads, let’s look a fragment of C# that does something classically multi-threaded:

    object workQueueLock = new object();

    Queue<WorkItem> workQueue = new Queue<WorkItem>();

 

    // this method runs on its own thread

    private void Worker() {

      while (true) {

        // define a work item attempt to retrive work from the queue

        WorkItem item = null;

        lock (workQueueLock) {

          if (workQueue.Count > 0) {

            item = workQueue.Dequeue();

          }

        }

 

        // check if we have work to do, otherwise sleep

        if (item != null) {

          // do some work

        }

        else {

          Thread.Sleep(QueuePollInterval);

        }

      }

    }

 

    // an even that resets our flag

    private void Some_Event(object sender, WorkEventArgs ea) {

      lock (workQueueLock) {

        workQueue.Enqueue(ea.WorkItem);

      }

    }

I wouldn’t recommend you use this naive version of a work queue, but the above code is straight forward enough to understand easily and illustrate the typical way imperative programs communicate between threads. We have a member variable “workQueue” that controls stores the work to be done, the method “Worker” is designed to read from this queue and if there’s some work to do, do the work, otherwise sleep till it’s time to poll the queue again. We use “workQueue” again in “Some_event” to send a message to “Worker”, to enqueue some work for it to do. It’s easy to see that mutation of the variable “workQueue” is essential to get this to work; if we couldn’t change the content of “workQueue” then we couldn’t send the message. It’s also easy to see that we now have a huge number of implementation choices: Do we lock on the queue, or have separate lock? What’s the shortest possible time we can hold the lock for (to avoid other threads be blocked when they want to write to the queue)? How long do we sleep for before polling the queue? Too shorter time and we risk wasting too much processor time polling the queue, too longer time and risk that the queue because unreactive because the worker wastes too much time sleeping when there’s work to be done.

In pure functional programming there are no variables or mutation, so the above scenario simply isn’t possible. Sure, F# isn’t a pure function language, actually most functional languages aren’t, so you can indeed use mutable data structures to implement something similar to the C# fragment we showed earlier, but that’s not the point we want to learn how to use immutable data structures. To fully understand the limitations of immutable data structures, let’s look at another C# example do something simpler. Imagine that we want compute a key of a value then store it in a member variable, a dictionary in this case, for later use:

    Dictionary<string, string> myDict = new Dictionary<string, string>();

 

    public void ReceiveValue(string val) {

      myDict.Add(ComputerKey(val), val);

    }

Now let’s think about how we can translate this into F#. Firstly, if don’t mind being dirty and mutable we can translate this fragment verbatim:

type Store() =

    let myDict = new Dictionary<string, string>()

    member x.ReceiveValue (value:string) =

        myDict.Add(x.ComputeKey value, value)

However, if we don’t want to be mutable it’s not quite so straight forward. F# contains a type called “Map”, which is very similar to a Dictionary except that it is immutable. When you add a new item to a map you don’t change the map you create a new version of the map with the new key added. So here is how our store class would look to if we used an immutable “Map” data structure:

type ComputeKeys(myDict:Map<string, string>) =

    member x.ReceiveValue (value:string) =

        new ComputeKeys(myDict.Add(x.ComputeKey value, value))

The important thing to notice is that we now have no “let” definition where we store our dictionary; instead the dictionary is passed to the class constructor. So our constructor receives a “Map”, and when we use our “ReceiveValue” method we create a new instance of the “ComputerKeys” which contains the newly created value. I think the type signature really helps us understand what’s going on:

type ComputeKeys = class

                   end

                   with

                     member ReceiveValue : value:string -> ComputeKeys

                     new : myDict:Map<string,string> -> ComputeKeys

                   end

This is pretty much the revelation of immutable data structures, “let” definitions become merely short conveniences for values, not memory location that can be updated at a later data if we want to. These new values are all held on the threads stack, if were being pure and fully immutable that we have no memory locations that we can write them to. Okay let’s have a look at how we might use these two classes:

/// wraps a Dictionary<string, string> to provide

/// some hashing and printing functions

type Store() =

    // the dictionary that stores the values

    let myDict = new Dictionary<string, string>()

    /// receive a value, hash it store it

    member x.ReceiveValue (value:string) =

        myDict.Add(x.ComputeKey value, value)

    /// computers the hash (a bit naff for now)

    member x.ComputeKey (value:string) =

        value.GetHashCode().ToString()

    /// prints the stored values

    override x.ToString() =

        let stringWriter = new StringWriter()

        for key in myDict.Keys do

            stringWriter.WriteLine("{0}: {1}", key, myDict.[key])

        stringWriter.ToString()

 

let useStore() =

    let store = new Store()

    store.ReceiveValue("One")

    store.ReceiveValue("Two")

    store.ReceiveValue("Three")

    printfn "%s" (store.ToString())

The mutable version needs little explanation, it is classical imperative programming, we create an instance of store then add values to our store, and finally we print them out. Now compare this with the immutable version:

/// wraps a Map<string, string> to provide

/// some hashing and printing functions

type ComputeKeys(myDict:Map<string, string>) =

    /// receive a value, hash it, return the new value

    member x.ReceiveValue (value:string) =

        new ComputeKeys(myDict.Add(x.ComputeKey value, value))

    /// computers the hash (a bit naff for now)

    member x.ComputeKey (value:string) =

        value.GetHashCode().ToString()

    /// prints values in the map

    override x.ToString() =

        myDict.Fold

            (fun key value acc -> 

                Printf.sprintf "%s \r\n%s: %s" acc key value)

            ""

 

let useComputeKeys() =

    let keysEmpty = new ComputeKeys(Map.empty)

    let keysOne = keysEmpty.ReceiveValue("One")   

    let keysTwo = keysOne.ReceiveValue("Two")   

    let keysThree = keysTwo.ReceiveValue("Three")

    printfn "%s" (keysThree.ToString())

The thing to notice here is how similar using the immutable ComputeKeys class is to using the Store class. We create an instance of the class, we add values to it, and then finally we print it. The only difference being that we need to catch the value returned from RecieveValue and use this value in the next step. Here we’ve used different names for each instance – to illustrate that each let binding is to a different instances, but we don’t need to do that we can reuse the same name to save inventing new names:

let useComputeKeysAlt() =

    let keys = new ComputeKeys(Map.empty)

    let keys = keys.ReceiveValue("One")