A Short Introductory Guide to the Microsoft code name "M" Modeling Language (Part 1 of 2)

Note: the contents of this series of articles originated from a PowerPoint presentation from Chris Sells that we used to ship with the code name "Oslo" Community Technology Preview (CTP). Since a slide deck isn't the most accessible format for publication, I converted its contents into a more textual form here.

Hello "M": Types, Extents, and Functions

"M" is the code name for Microsoft's new modeling language. It came about because the team developing Microsoft's set of modeling tools (code name "Oslo") realized a certain class of developers—mostly those who are accustomed to writing source code in text—would want to have a means of modeling data and application behavior that didn't force them to use visual tools. And, of course, that other means is itself text.

"M" was thus born as a human-friendly textual language to augment or complement graphical tools as well as machine representations. It's meant to have direct correlations to T-SQL and, as we will see, the code name "Intellipad" tool gives you the ability to view T-SQL equivalent output for "M" statements. "M" is also intended to be a modern type system that enables compile-time verification and validation while not imposing any particular data storage or access technology. For complete details, see the "M" Overview (MSDN Library), "Oslo": The Language (video from PDC'08), and the "M" Language Specification (MSDN Library). Here we'll just dive right into the basics of the language.

Everything in "M" begins with a module statement like this:

module MyModule
    {
    }

A module in "M" can then contain four different constructs:

  • Types specify constraints over sets of values
  • Extents specify storage locations
  • Functions specify parameterized queries (these were formerly known as "computed values")
  • Languages define the tokens and syntax rules for domain-specific languages

In this article we'll be covering the first three. For more information about defining domain-specific languages in "M", see the "M" Language Specification (MSDN Library), DSL Tutorials (MSDN Library), and the video Modeling in Text (in five parts).

Here's a short piece of "M" code that contains an opening comment, a module, two types (Person and Size), one function (Stones), and two extents (Sizes and People):

//hello_m.m                            //Comment
module MyModule                         //Module
    {
    type Person
        {                               //Custom entity type
        Id : Integer64 => AutoNumber(); //Identity field
        Name : Text(100);               //Field of intrinsic type with fixed length (100)
        LuckyNumbers : {Integer32}*;    //Field of collection type
        Size : Size;                    //Field of custom type
        } where identity Id;

    type Size
        {
        Id : Integer64 => AutoNumber();
        Height : Integer32 where value < 10;  //Field with a constraint
        Weight : Integer32 => 150;            //Field with default
        } where identity Id;

    Sizes : {Size*} { {Height => 6, Weight => 250} }; //Extent with values

    People : {(Person where value.Size in Sizes)*};   //Extent + constraint

    Stones(Weight : Decimal9) { Weight * 0.07 }       //Function
    }

As you can see, the fields of a type are declared with a name, a colon, then a type, with an optional * to specify a collection of zero or more items (+ specifies a collection of one or more items, and ? marks a value as optional). Default values are specified after a field with =>; constraints on the values that a field can contain are specified with a where clause. Identity fields are indicated with the where identity clause; AutoNumber() is a built-in "M" function that supplies an automatically generated default value, which is useful for identity fields such as Id.

Types allow for reuse, as shown by the Person type above declaring a field of type Size. Types can also be refined, which is to say, you can declare a new type that is based on another type but contains additional constraints. For example, SmallSize below derives from Size but restricts its values more specifically:

    type SmallSize : Size
        {
        Height : Integer32 where value < 6;
        Weight : Integer32 where value < 120 => 100;
        }

Notice that in this case SmallSize doesn't need to re-declare any fields of Size that don't need refinement.
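
As a minimal sketch of how a refined type might be used, an extent of SmallSize would accept only values that satisfy both the constraints inherited from Size and the tighter ones added by SmallSize. The SmallSizes extent name and its values are illustrative assumptions, not part of the original sample:

    //Hypothetical extent; the name and values are made up for illustration
    SmallSizes : {SmallSize*} { {Height => 5, Weight => 110} };

By contrast, a value such as {Height => 8, Weight => 110} satisfies Size (Height < 10) but would be expected to violate the SmallSize constraint on Height (value < 6).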

Types in "M" map to a table definition in SQL (as we'll see later), but of course a defined table doesn't include any rows or values. In fact, compiling "M" code with only types and loading it into a database won't actually produce anything tangible: it will not create the associated tables. That's what extents are for. Extents declare storage of a particular type, which in SQL maps to create table statements.

Extents follow the same convention as fields: a name or label, followed by a colon, then a type inside curly braces (usually with * since most tables have more than one row). Extents can be declared in "M" with or without values. Values, if they exist, occur within a pair of curly braces, as with Sizes above. Each value is then itself declared within another set of curly braces as a set of name/value pairs, each pair joined by => and separated by a comma. Values map to SQL insert statements.
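
To make the two forms concrete, here is a sketch that assumes the Size type from hello_m.m. The EmptySizes and MoreSizes names and all of the values are hypothetical, and the list of values is assumed to be comma-separated in the same way as the name/value pairs:

    //Extent declared without values: storage only
    EmptySizes : {Size*};

    //Extent declared with two values
    MoreSizes : {Size*}
        {
        {Height => 5, Weight => 120},
        {Height => 6}                   //Weight omitted, so its default (150) should apply
        };

Neither value supplies Id, on the assumption that AutoNumber() fills it in, just as in the Sizes extent of the original sample.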

An extent can also be declared as a query using the intrinsic query language of "M" (which looks much like LINQ). As with People above, the query is enclosed in parentheses and followed by a *. The type of the extent, in other words, is the resulting type of the query, and the extent then contains any number of values of that type. Such query-based extents map to additional create table statements in SQL.

While we're speaking of SQL mappings, it's worth pointing out how "M" greatly reduces the overall complexity of defining databases, especially when constraints are involved. To implement such constraints directly in SQL, you typically have to first create a function to check the constraints, then invoke that function from a constraint clause within create table. This is cumbersome to work with, and all the extra verbiage invites errors. "M" strips out that complexity by giving you a straightforward means to declare your intent rather than deal with the details of SQL.

Finally, a function is, as mentioned earlier, essentially a parameterized expression (akin to create function in SQL). A function has a name, such as Stones above, that can then be used in other expressions. The parameters are declared within parentheses using name/type pairs (separated by commas when there is more than one, which isn't shown above), and the expression is contained within curly braces.
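
Since the sample shows only a single parameter, here is a sketch of a two-parameter function that follows the same pattern and also uses Stones inside another expression. The name StonesPerUnitHeight and its Height parameter are assumptions chosen purely for illustration:

    //Hypothetical function: two name/type pairs separated by a comma
    StonesPerUnitHeight(Weight : Decimal9, Height : Decimal9) { Stones(Weight) / Height }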

So you can see, with these few simple constructs, "M" allows you to express a wide range of types, extents, and functions, mapping to tables, rows, views, and stored procedures in SQL. It gives you a much more concise and human-friendly means to express the essence of a database without the "ceremony" of SQL.