I watched Don’s talk at PDC with interest. This blog post is to help me put some of my initial thoughts on type providers into perspective. I’ve tried to write it so you don’t need to have seen the video of the session first, but obviously it will help if you have. Some of this is just my speculation. I have no insider information, so I can speculate freely, but don’t take everything I say too seriously. The talk gave a preview of “type providers”, an experimental feature that will appear in a future version of F#. The aim of type providers is to give tighter integration between the F# programming language and external data sources, so that external data sources can be accessed in a strongly typed way. This post explains what they are and speculates a bit about how they work.
Let’s start by framing the problem this new feature is trying to solve: accessing external data sources in a strongly typed way. Today there are roughly three approaches you can take to accessing external data sources in strongly typed languages like C# and F#:
1) Access the data in a weakly typed way and then load it into strongly typed classes. Typically you use one or more of the classes provided by the .NET framework for loading the data from the disk or network, such as StreamReader or WebRequest, and then use one of the various classes that help with parsing, such as the XDocument class for parsing XML. This technique tends to require the programmer to write quite a lot of code before they can get their hands on the strongly typed data. The code also tends to be quite brittle and, unless the programmer takes great care, it doesn’t cope well with changes to the data format.
2) Use reflection. Typically a programmer creates strongly typed classes to contain the data they are interested in, and these classes will share a similar structure to the data they are expecting. Depending on the circumstances, it may or may not be necessary to create further classes that describe how the data should be mapped onto the classes that will contain it. They then need to write a module that parses the incoming data and uses reflection to create the appropriate instances of the classes to hold the data. Good examples of this are FluentNHibernate and Entity Framework in code first mode: the programmer defines the classes they are expecting to receive from the database, then reflection is used to create them from the incoming data. This generally works better than writing the mapping code by hand, but it can still be problematic. The programmer often still needs to write quite a bit of code to create the container classes, and they still have to deal with the code getting out of sync with the definition of the data.
3) Code generation (“Microsoft loves code generation”, as Don put it). In this case some tool generates code that represents the data we are interested in, and typically it also provides some mechanism for loading data into these classes. There needs to be some kind of metadata that describes the data; typically this will be the database schema or an XSD that defines the format of the XML you are interested in. This approach probably requires the least amount of code to be written by the programmer, but it is not without its problems. Firstly, each code generation tool tends to take a slightly different approach, so the programmer has to spend time getting to know the tool. Secondly, the tool must be integrated into the build process, and for a smooth experience it must also be integrated into Visual Studio, which is expensive and time consuming. Finally, the tools can often generate poor quality objects that are difficult to work with. For instance, early versions of xsd.exe, a tool for generating code to interact with XML in a strongly typed way, generated classes with fields but no properties and arrays instead of collection objects, meaning it was up to the programmer to initialize each field by hand.
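To make the first two approaches a bit more concrete, here is a rough sketch of what they can look like in F#. Everything in it is made up for illustration: the Customer record, the shape of the XML, and the loadByHand and loadByReflection functions are my own assumptions, not anything from the talk or from a real framework. It is only meant to show how much plumbing sits between the raw data and the typed values.

```fsharp
// On older F# Interactive versions you may need: #r "System.Xml.Linq"
open System
open System.Xml.Linq
open Microsoft.FSharp.Reflection

// Hypothetical strongly typed class for the data we expect.
type Customer = { Name: string; Age: int }

// Hypothetical sample data standing in for a file or a web response.
let xml = """<customers>
  <customer Name="Alice" Age="30" />
  <customer Name="Bob" Age="25" />
</customers>"""

let doc = XDocument.Parse xml

// Approach 1: read weakly typed values by name and convert them by hand.
// Every attribute name and conversion is spelled out, so a change in the
// data format means editing this code.
let loadByHand (doc: XDocument) : Customer list =
    [ for e in doc.Root.Elements(XName.Get "customer") do
        yield { Name = e.Attribute(XName.Get "Name").Value
                Age = int (e.Attribute(XName.Get "Age").Value) } ]

// Approach 2: use reflection to map attributes onto the fields of any
// record type whose field names match the attribute names.
let loadByReflection<'T> (elementName: string) (doc: XDocument) : 'T list =
    let fields = FSharpType.GetRecordFields typeof<'T>
    [ for e in doc.Root.Elements(XName.Get elementName) do
        let values =
            fields
            |> Array.map (fun f ->
                let raw = e.Attribute(XName.Get f.Name).Value
                Convert.ChangeType(raw, f.PropertyType))
        yield FSharpValue.MakeRecord(typeof<'T>, values) :?> 'T ]

let byHand = loadByHand doc
let byReflection = loadByReflection<Customer> "customer" doc

printfn "%A" byHand
printfn "Same result from both approaches: %b" (byHand = byReflection)
```

In both cases the compiler knows nothing about the XML itself. If an attribute is renamed or missing, neither version fails at compile time; the problem only shows up when the code runs, which is exactly the “out of sync” issue described above.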
As someone who has suffered at the hands of Microsoft’s love of code generation over the years (and continues to suffer), I’m very interested in anything that could smooth out this process.
While no approach is perfect, frameworks like NHibernate and Entity Framework have pushed what you can do with the reflection or code generation approaches to their limits, and when accessing relational data the experience is generally quite good. Less effort has been put into the experience of accessing external “web” data sources that return either XML or JSON data, so here the programmer often needs to do more work to access the data in a strongly typed way. There are also other data sources, such as the Windows WMI database that contains information about the operating system, or the ubiquitous Excel spreadsheet, where virtually no effort has been made to expose them in a strongly typed way. If a programmer needs to access the data they contain, they must roll up their sleeves and do everything themselves.
So what exactly does a type provider do? And how does it aim to tackle this problem? As I said, I have no insider information here, so this does leave me free to speculate and make educated guesses, but it does mean that anything you read here should be taken with a large pinch of salt. To answer the “So what exactly does a type provider do?” question, we need to understand a bit about how the F# compiler works, so let’s look at a trivial example and examine the work the compiler needs to do to compile it.
- let firstIdentifier = 1
- let secondIdentifier = 2
- let thirdIdentifier = firstIdentifier + secondIdentifier