The first half of this series the accent is on immutable programming, because in the first part of the series we had an introduction to immutable data, and in this second part we’re going to look in depth at immutable programming possibilities, with the idea of show that this is actually not too different to what your used. Once we’ve conquered the immutability we’ll start to dig into the concurrency. The main idea behind this post is take a look at everything that makes working with immutable data bearable, even likable.

 

Expressions and Returning Multiple Values (a.k.a. Tuples)

In F# everything is an expression. You can have statements too, but these are just expressions that return a special unit type that contains one value, the somewhat cryptic () value. This is largely because values are immutable by default: a statement is a side effect, an admission that a mutation has take place, an expression is something that returns a value, which is how we make progress in the functional world. This has a few interesting effects on our language, and ultimately our way of thing about programming. Consider the following C# fragment:

    public string FoundItem(bool found) {

      string foundString;

      if (found) {

        foundString = "Found";

      }

      else {

        foundString = "Not found";

      }

      return foundString;

    }

Here we have a method that takes an input and through a series of steps: initialise a local, perform an if statement, return the local variable, our method transforms the input into a result. Now let’s take a look at the equivalent in F#:

let foundItem found =

    if found then

        "Found"

    else

        "Not found"

In F# our “if” is not a statement, but an expression. So we’re able to throw away the tedious implementation detail of declaring a local and concentrate on the important bit: the transformation. So I think we ultimately we come to think of series of transformations of on the data that we’re using rather than a series of steps that the computer must perform.

A quick aside: you can of course rewrite the C# in a more expression oriented manor:

    public string FoundItemAlt1(bool found) {

      if (found) {

        return "Found";

      }

      else {

        return "Not found";

      }

    }

    public string FoundItemAlt2(bool found) {

      return found ? "Found" : "Not found";

    }

But neither feels particularly like idiomatic C# and lot of C# style guides council against this sort of style, probably because C# doesn’t allow you to return multiple arguments, which makes this style of programming slightly awkward, we’ll look at this issue next.

So if you don’t know F# you may be thinking that having everything as an expression would make translating the following code fragment difficult, since here we update two local variables:

    public void FoundItemMessage(bool found) {

      string foundTitle;

      string foundMessagge;

      if (found) {

        foundTitle = "Found";

        foundMessagge = "The item was found";

      }

      else {

        foundTitle = "Not found";

        foundMessagge = "The item wasn't found";

      }

      MessageBox.Show(foundMessagge, foundTitle);

    }

In fact it’s very easy to translate this sort of thing because in F# you can return two values or more values from any expression:

let foundItemMessage found =

    let foundTitle, foundMessagge =

        if found then

            "Found", "The item was found"

        else

            "Not found", "The item wasn't found"

    MessageBox.Show(foundMessagge, foundTitle)

Returning multiple items is called returning a tuple, in the above code we return a pair, a special name for a tuple of 2 items, from our if expression and straight away unpack our tuple into the identifiers foundTitle and foundMessage. As you become more familiar with tuples you start to see that in C# we often use mutation to get around the fact we can’t easily return multiple values. In C# it’s a fairly common pattern to update class members as way of returning results when we need to computer two different but related values in method:

    int cost;

    int profit;

 

    private void ComputeCostProfit(int price) {

      // obviously the calculation needs to more complex to

      // justify being a separate method

      cost = rnd.Next(price);

      profit = price - cost;

    }

 

    public void OutputProfit() {

      int price = 42;

      ComputeCostProfit(42);

      Console.WriteLine("Price: {0} Cost: {1} Profit: {2}", price, cost, profit);

    }

We can translate this into the following F#:

let computeCostProfit price =

      let cost = rnd.Next(price)

      cost, price - cost

 

let outputProfit() =

    let price = 42

    let cost, profit = computeCostProfit price

    Printf.printf "Price: %i Cost: %i Profit: %i" price cost profit

The F# implementation of computeCostProfit is not only more encapsulated because we no longer rely on updating member field, it’s also thread safe as now all state is held on the threads stack.

I think now we’re beginning to get a feel for how expressions and tuples can help us avoid the need for mutable state, now let’s take a tour of other immutable types in F#.

Records

In F# records are a way of grouping multiple values and naming the fields. In F# a record looks something like:

type Person =

    { FirstName: string;

      LastName: string; }

And we can construct instances of them like:

let helen = { FirstName = "Helen"; LastName = "Rippon" }

By default all fields in a record are immutable.  This may sound like a horrible grind, as sooner or later we’re bound to want to make a change to the record, but this is why F# provides a nice short-hand syntax for creating new records with one or more fields changed:

let marriedHelen = { helen with LastName = "Jones" }

It’s also common practices to provide member methods that create copies of record as a convenient way of altering the record:

type Person =

    { FirstName: string;

      LastName: string; }

    with

        member x.Marries lastName =

            { x with LastName = lastName }

Although here we do more work up front it generally means we have nice short hand way of using our record though out the code:

let marriedHelen = helen.Marries "Jones"

At this point you’re probably starting to see that using immutable constructs is as different as you previously may have thought, but maybe starting to worry about the performance of all that copying. This is a real concern, but probably isn’t as bad as you think. Allocation is very quick in a garbage collected environment like the CLR, and the amount of data we’re copying is generally small, in the previous example we’re only coping two pointers since the strings themselves are immutable so need to copy them. In fact, if all your records are immutable then there is only ever need to make a shallow copy, because the data structures your referencing are guaranteed not to change so there’s no need to copy them. Finally garbage collection of new objects is very efficient in the CLR, because its collections are split into generations.

List/Set/Map

List, Set and Map are F#’s immutable collections. List is a singularly linked list and Set and Map are both based on binary trees with Set being a mathematical “Set” and “Map” being a collection of name value pairs. Again using these types isn’t that different from using mutable types, you just need to keep in mind a new value is being produced every time you add a new value, but this doesn’t change the look and the feel of the code too much. To illustrate this let’s look at initializing a ResizeArray (F#’s name for System.Collections.Generic.List) and initializing a set:

    let ra =

        let temp = new ResizeArray<string>()

        temp.Add("toto")

        temp.Add("lolo")

        temp.Add("bobo")

        temp

       

    let set =

        let temp = Set.empty

        let temp = temp.Add("toto")

        let temp = temp.Add("lolo")

        let temp = temp.Add("bobo")

        temp

I think you’ll agree in both cases the code looks quite similar: it has the same look and feel, but there are a couple of key differences that are important to notice. First notice that we initialise “set” to be “empty”, this is the cute part of programming using immutable data structures there is no need of several different instances of the empty set, it is immutable, so the empty set will always be the empty set and can be shared between all the different sets that the program uses. The second thing to notice is how we need to recapture the reference to the newly created object every time we add a new item:

        let temp = temp.Add("toto")

but you’ll be getting used the by now. F#’s list structure works in pretty much the same way expect we use the [] symbol to represent the empty list and the :: symbol to represent adding an item to a list:

    let list =

        let temp = []

        let temp = "toto" :: temp       

        let temp = "lolo" :: temp       

        let temp = "bobo" :: temp

        temp

Since initialising a list like this can be pretty tedious F# also provides a short had for doing it:

let listShortHand = [ "toto"; "lolo"; "bobo" ]

Looping Using Recursion

Traditional looping constructs like “for” and “while” are statements and rely on mutation to store temporary results, so you may think translating a section of C# that’s based on while loop, like the section below which calculates whether a complex number is a member of the Mandelbrot set, would be difficult:

    Complex cMax = Complex.Create(1.0, 1.0);

    Complex cMin = Complex.Create(-1.0, -1.0);

    int iterations = 18;

 

    public bool isInMandelbrotSet(Complex c0) {

      bool res = false;

      Complex c = c0;

      int count = 0;

      while ((cMin < c) && (c < cMax) && (count < iterations)) {

        c = (c * c) + c0;

        count++;

      }

      return count == iterations;

    }

However loops can be translated into recursive functions, which do not reply on mutation. An example of how the while loop could be translated into a recursive function is shown below. Here the loop is translated into a recursive “check” function which checks the exit condition and if it’s not met recursively calls the check function:

let cMax = complex 1.0 1.0

let cMin = complex -1.0 -1.0

let iterations = 18

 

let isInMandelbrotSet c0 =

    let rec check n c =

        (n = iterations)

        || (cMin < c) && (c < cMax) && check (n + 1) ((c * c) + c0)

    check 0 c0

These techniques are very powerful when working with immutable collections. For example it’s common to use recursion and pattern matching work with lists, the below example shows how to transform a list of strings into a list of integers:

let rec mapStringToInt l =

    match l with

    | hd :: tl -> System.Int32.Parse(hd) :: mapStringToInt tl

    | [] -> []

 

let ints = mapStringToInt ["1"; "2"; "3"]

Transformations of this nature are so common that very often you don’t need worry about building a recursive function to perform the transformation as a library function “map” is already provide:

let rec map f l =

    match l with

    | hd :: tl -> f hd :: map f tl

    | [] -> []

You simply need to provide the transformation that you want to apply to each item in the list:

let ints = List.map System.Int32.Parse ["1"; "2"; "3"]

Another common pattern when working with immutable structures is to use folding to create a summary of the collection. Folding works well with immutable collection because a fold function threads an accumulator value though the computation, that is to say that each function call is passed an accumulator value that was the result of the last function. This is extremely useful when working with immutable data structures as the accumulator provides way to pass state between each iteration without have to resort to mutable data. The below code shows how to use folding to create an F# “Set” from an F# list:

let set = [1;1;2;2;3;3;3]

          |> List.fold_left (fun acc x -> Set.add x acc ) Set.empty

Here the “Set.add” function produces a new set that is passed to the next function call via the “acc” accumulator parameter.

This is such a common thing to do that there is no need to write a function to create a set of a list it is provided for you as a library function:

let altSet = Set.of_list [1;1;2;2;3;3;3]

Constructors/Union Types

The union types are a bit unusual because where as records and collections can make sense as either mutable or immutable types, union types generally only make sense as immutable type. A very quick recap for those unfamiliar with union types: a union type is composed of one or more constructors. A “constructor” in the ML (meta language – the family of language that F# comes from) sense is a named type that can contain a fixed number of augments. For instance might declare a constructor of type “vector”:

type vector = Vector2D of int*int

We can compose a constructor, that is, create a new instance:

let myVector = Vector2D (1, 2)

Then one way to recover the values it contains is to use a function to decompose it:

let x,y = (function Vector2D(x,y) -> x, y) myVector

Here we use an anonymous function to decompose the values, but we’re just using that to keep the example brief, in reality we’d probably use a named function for better reuse. Also constructors can be decomposed though pattern matching, we’ll cover this we look at union types that compose two or more constructors.

So constructors can be composed or decomposed, they don’t really have any concept of mutation, that’s why we said earlier they don’t really make sense as mutable data structures, though they can of course contain mutable data:

type thing = MyThing of Dictionary<string, int>

But that’s generally the exception rather than the rule.

So let’s look at a more realistic example. Union types are often used to represent trees; because individual constructors can contain reference to the type of the original union it’s easy to build recursive data structures that can be used to represent trees. Below is an example of a classic binary tree, the tree is composed of two constructors one a node that contains two “BinaryTree” (a right and left branch) and a “Leaf” that contains the value.

type BinaryTree<'a> =

    | Node of BinaryTree<'a> * BinaryTree<'a>

    | Leaf of 'a

The following shows you how to construct an instance of a very small tree, that contains one node and two leaves:

let tree = Node(Leaf 1, Leaf 2)

To help make sense of how we’d use this data structure, the following code snippet shows us pattern matching over a tree structure to collect the information at the leaves of the tree:

let flattenTree tree =

    let rec flattenTree tree acc =

        match tree with

        | Node (right, left) ->

            flattenTree right acc

            |> flattenTree left

        | Leaf value -> value :: acc

    flattenTree tree []

The important part is the “match ... with” expression in the middle of the function. Here we break up the union type to see whether it contains a node or a leaf. If we find a leaf we add the value to the accumulator, if we find a node we recursively treat each branch. So here we deconstruct the tree structure while construing a new list.

The Accumulator and Environment Patterns

Even with the help of tuples to allow us to return multiple results we can run into problems with having to pass too much data around, especially if our algorithm becomes very complicated. To cope with this we can use the accumulator and environment patterns. These can be seen in the post I made a while ago about an ant colony simulation.

Both patterns concern different ways we can pass data though an algorithm and are really just a name for identifiers than are thread though out the function. In an environment pattern we want pass data down though the algorithm, in an accumulator pattern we want to retrieve data from a function. Both are generally represented as represent as record types, but could equally be tuples too. It’s not uncommon to see functions with both and environment and an accumulator.

Let’s dig into an example to see if I can make that a bit clearer. Suppose we want to collect information from the nodes of a tree – a common enough task in functional programming. Now suppose we have the following tree definition:

type MadTree =

    | Foo of (DateTime * int)

    | Bar of (DateTime * string)

    | Node of ((DateTime * DateTime) * (MadTree * MadTree))

We decided that for the moment we are only interested in collecting nodes that are within the time slice specified by the parent node. Therefore we need a way of propagating the time information forward as well as passing the found instances of Foo and Bar back, for this we use an accumulator and an environment:

type Accumulator =

    { foundFoos: List<int>;

      foundBars: List<string>; }

 

type Environment =

    { startDate: DateTime;

      endDate: DateTime; }

So now we can easily write a recursive function using these types to instance of Foo and Bar that we want:

let collectFooBars tree =

    let rec collectFooBars tree env acc =

        match tree with

        | Foo (date, i) ->

            if env.startDate < date && date < env.endDate then

                { acc with foundFoos = i :: acc.foundFoos }

            else acc

        | Bar (date, s) ->

            if env.startDate < date && date < env.endDate then

                { acc with foundBars = s :: acc.foundBars }

            else acc

        | Node ((startDate, endDate), (ln, rn)) ->

            let newEnv = { startDate =  startDate; endDate = endDate }

            let acc = collectFooBars ln newEnv acc

            collectFooBars rn newEnv acc

    collectFooBars tree

        { startDate = DateTime.MaxValue; endDate = DateTime.MaxValue }

        { foundFoos = []; foundBars = []; }

Concurrency and the Haskell Problem

So now finally we get to a little concurrency. The good news if we implement algorithms using these data structures we can on as many threads as we like and we know I’ll never see any threading issues. For example if we have a large list of MadTree that we want to process there’s no problem to spin up a new thread to process each one, provide:

let threadProcessTree tree =

    let thread = new Thread(fun () -> collectFooBars tree |> ignore)

    thread.Start()

   

madTrees |> List.iter threadProcessTree

However there is also some bad news ... it’s up us the, programmer, to guarantee that functions we created are “pure”, that is they have no side effects. However F# does help us a bit here, if we know we need a section of a program to be “pure”, if we stick to F# native structures like records and union types, and its native collections then we know it’s pure. But also, in guarantying purity we run into the Haskell problem (or at least Haskell pre-monad revolution problem, as described by Simon Peyton Jones in this video), that is if we have no side effects then we cannot observe the results of a function. We can code complex and exciting algorithms but all we can observer is that they make one or more processors get warm. So in the next episode we’ll look at how we can manage concurrency in F# with immutable data structures.

Feedback:

Feedback was imported from my only blog engine, it’s no longer possible to post feedback here.

re: Immutability and Concurrency – Part II – A Review of F#’s Immutable Data Structures - Art

Hi Rob.
Just a note to let you know that I’m very interested in your Blog series.
In particular I’m interested in how to break the bottle neck interface between the multi/many-core CPU’s and the GPGPU’s. There are some intersting slides at the Graphics Hardware 2008 conference:
http://www.graphicshardware.org/program.html