The first post of this series covered our general aims and how we’d be structuring the project. This post covers how we get the data into RavenDB.

First we need to download the data. I obtained the data from this site; you need to click the “téléchanger la base” link under “Coordonnées géographiques des villes Françaises” (here is the direct link). It’s not the best data source in the world, but it’s the best freely available one that I’ve found. Once you’ve unzipped the archive and converted it to CSV, loading it into RavenDB is pretty straightforward.

First we need to design a type to hold the data:

type Commune =
    { mutable Id: string
      Name: string
      Postcode: string }

We’re only going to store the name of the commune and its post code, because that’s all we’re going to search on or show, so the two data fields are Name and Postcode. RavenDB is pretty robust when it comes to adding or deleting fields, so it’s fine to start with a minimal set of data and add stuff later. The Id field is the unique identifier of the record; it’s mutable because this just seems to work better with RavenDB. We could let RavenDB generate this for us, but since INSEE, the French government’s bureau for statistics and economic studies, assigns each village its own unique identifier and this is in the file, we’ll use that. In France several communes can share the same post code, so the post code would not be a good candidate for the identifier.
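To make the identifier choice concrete, here is a minimal sketch of why the post code can’t serve as the key while the INSEE code can. The communeId helper and the sample rows are made up for illustration; they are not taken from the real file.

```fsharp
// Same shape as the Commune record above, repeated so the sketch stands alone.
type Commune =
    { mutable Id: string
      Name: string
      Postcode: string }

// Hypothetical helper: builds the document id from an INSEE code.
let communeId (inseeCode: string) = sprintf "communes/%s" (inseeCode.Trim())

// Made-up sample rows (name, post code, INSEE code), for illustration only.
let sample =
    [ "Ville A", "01000", "01001"
      "Ville B", "01000", "01002" ]

let communes =
    sample
    |> List.map (fun (name, postcode, insee) ->
        { Id = communeId insee; Name = name; Postcode = postcode })

// The two communes share a post code, but their ids stay distinct.
let distinctPostcodes = communes |> List.distinctBy (fun c -> c.Postcode) |> List.length
let distinctIds = communes |> List.distinctBy (fun c -> c.Id) |> List.length
printfn "%d distinct post codes, %d distinct ids" distinctPostcodes distinctIds
```

If the post code were used as the id, the second record would collide with the first rather than being stored alongside it.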

Once we’ve designed the type to store the commune data, the code to load it from the file and store it in RavenDB is pretty straightforward:

let loadCommuneData() =
    use store = DocumentStore.OpenInitializedStore()
    // File.ReadLines streams the file lazily; the encoding matters for accented names
    let lines = File.ReadLines(Path.Combine(__SOURCE_DIRECTORY__, @"ville.csv"), System.Text.Encoding.Default)

    use session = store.OpenSession()
    // raise the per-session request limit, since we call SaveChanges() many times
    session.Advanced.MaxNumberOfRequestsPerSession <- 30000
    lines
    |> Seq.skip 1 // skip the first (header) row
    |> Seq.iteri(fun i line ->
        let line = line.Split(';')
        match line with
        | [| name; nameCaps; postcode; inseeCode; region; latitude; longitude; eloignementf |] ->
            // the INSEE code is unique per commune, so it becomes the document id
            let id = sprintf "communes/%s" (inseeCode.Trim())
            printfn "Doing %i %s (%s)" i name id
            let place: Commune =
                { Id = id
                  Name = name.Trim()
                  Postcode = postcode.Trim() }
            session.Store(place)
            // flush to the database every 1000 rows
            if i % 1000 = 0 then session.SaveChanges()
        | line -> printfn "Error in line: %A" line)
    // save whatever is left over from the final partial batch
    session.SaveChanges()

There are just a few points worth highlighting:

- We use File.ReadLines, new in .NET 4.0, to give us an IEnumerable of all the lines in the file. This gives us a convenient way to read the file line by line without loading it all into memory.

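- Notice we’re passing System.Text.Encoding.Default to File.ReadLines. French communes often have accented characters in their names, so we need to ensure we’re using the right encoding. (On the .NET Framework, Encoding.Default is the system’s current ANSI code page, so this only works if the file was saved in that code page.)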

- It’s necessary to set session.Advanced.MaxNumberOfRequestsPerSession, as this is limited to 30 by default, meaning that after 30 requests to the server the session would throw an exception. This is because sessions are meant to be short lived in typical use, so the exception serves as an early warning for developers. Since this is an atypical use of a session, it’s okay to raise the limit. However, I think sessions cache the data that they store, so you may want to clear the session after each write to RavenDB. It doesn’t seem to make much difference in this case.

- We enumerate each row in the file using Seq.iteri, which gives us the row plus the row number. We can use the row number to do a save every 1000 items (by calling .SaveChanges()). This seems to be more efficient than either saving after each row or trying to save the whole lot at once. I haven’t done much experimentation with this number; there may be a better value than 1000.

- The parsing of the file is very simple: we call .Split(';') on each row and then pattern match over the resulting array to unpack the relevant items. These are then loaded into the Commune type and stored in RavenDB using the session’s Store() method. As mentioned earlier, these aren’t flushed to the DB until you call .SaveChanges().
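
If you want to play with the points above without a RavenDB instance, the following is a rough sketch of the same steps: read with an explicit encoding, split and pattern match each row, and group the results into batches of 1000. The tryParseCommune helper, the column headers, and the sample rows are all made up for illustration, and ISO-8859-1 is only an assumption about how the file might be encoded (the loader above uses Encoding.Default), so check your own copy.

```fsharp
open System.IO
open System.Text

// Same shape as the Commune record above, repeated so the sketch stands alone.
type Commune =
    { mutable Id: string
      Name: string
      Postcode: string }

// Hypothetical helper: parse one row, returning None if it doesn't have 8 fields.
let tryParseCommune (line: string) : Commune option =
    match line.Split(';') with
    | [| name; _; postcode; inseeCode; _; _; _; _ |] ->
        Some
            { Id = sprintf "communes/%s" (inseeCode.Trim())
              Name = name.Trim()
              Postcode = postcode.Trim() }
    | _ -> None

// Assumption: the file is ISO-8859-1 (Latin-1) encoded.
let latin1 = Encoding.GetEncoding("iso-8859-1")

// Made-up rows in the same 8-column layout, written to a temp file.
let sampleRows =
    [| "nom;nom_maj;cp;insee;region;lat;lon;eloignement"
       " Ville Été ;VILLE ETE;01000;01001;R;46.0;5.0;1.0"
       "broken;row"
       "Ville B;VILLE B;01000; 01002 ;R;46.1;5.1;2.0" |]
let path = Path.GetTempFileName()
File.WriteAllLines(path, sampleRows, latin1)

// Read lazily with the matching encoding, skip the header, drop malformed rows.
let parsed =
    File.ReadLines(path, latin1)
    |> Seq.skip 1
    |> Seq.choose tryParseCommune
    |> Seq.toList

// Each batch would correspond to one session.SaveChanges() call.
let batches = parsed |> List.chunkBySize 1000

// Reading the same bytes with the wrong encoding mangles the accented name.
let misread = File.ReadLines(path, Encoding.UTF8) |> Seq.item 1

File.Delete path
```

Returning an option from the parser keeps the malformed rows out of the batches, whereas the loader above just prints them and carries on. Either approach works for a one-off import like this.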

And that about wraps it up. The data is in the DB, and you can verify this using RavenDB’s administrative console.

The full code base can be found in the GitHub repository for PicoMvc.