Elixir Trickery: Using Macros and Metaprogramming Without Superpowers

The axioms to keep in mind

Any further considerations are void unless we understand the following statements:

  1. Any code you write can be represented as a tree of expressions, named an Abstract Syntax Tree (AST).
  2. Elixir is a functional language, and macros are functions.
  3. Macros are compile-time functions that receive an AST (or other data) as arguments and return an AST.

We'll elaborate on each of those axioms, but the truth is that if you understand what they mean, you can say that you're an Elixir metaprogramming expert. (I owe a huge credit to Phoenix Framework mastermind Chris McCord for his awesome 'Metaprogramming in Elixir' book, which helped me realize this.)

Metaprogramming at its core: code that generates code

In classical (I like calling them legacy, but let's not be too provocative) object-oriented languages such as Java, one would often associate the notion of metaprogramming with terms like reflection, dynamic introspection, annotations and all this kind of scary stuff.

The truth is, object-oriented languages have such a level of complexity in the logic behind their class/object model, that using metaprogramming is either very inconvenient, or just very slow.

For instance, the official Java docs on reflection note that because reflection involves types that are dynamically resolved, certain Java virtual machine optimizations cannot be performed. Consequently, reflective operations have slower performance than their non-reflective counterparts and should be avoided in frequently called sections of performance-sensitive code. In fact, instantiating objects via reflection can be about ten times slower than standard instantiation.

Ruby, to give another example, is a language that's very pure in its insistence on everything being an object, and very permissive when it comes to what these objects can do or be. You're saying you've just created an object a that's an instance of class A? Fine, but then someone may assign something else to A (the identifier A is a constant, but in Ruby it's not really constant, so feel free to reassign it to whatever you want), and a.class == A is no longer true...

At any given time you can define a new method on a class of objects (and give it a dynamically assigned name), but you can also define a singleton method on a single object, and if you consider that a class is itself an object... Well, that's why no one has ever seen a code autocomplete mechanism for Ruby that actually works: when you see a carrot, you can never be sure it's not in fact a banana by now.

Elixir doesn't have this bloat, because it doesn't have objects. It relies on very simply-structured data: tuples, lists and maps, with no tight coupling between data structures and associated actions. So when you think of how to go about metaprogramming in Elixir, you won't be reasoning about it as a set of dirty hacks, but rather as a way to write code that is conveniently transformed into other code, and then compiled.

It is often said that metaprogramming is writing code that generates code - surprisingly often you hear it in the context of languages where it is actually a backdoor to fix what's broken in an existing codebase or in the language itself. In Elixir, this statement is entirely true.

AST: Tree representation of code

To understand how you can actually write code that generates code, you need to familiarize yourself with how code is internally represented.

Just as a binary is the internal representation of a string, an Abstract Syntax Tree (AST) is the internal representation of code in Elixir.

The great news is that in Elixir we're always very close to the internal representation of code. While Elixir is not a truly homoiconic language like Lisp and its derivatives, in which you write code almost as if you were directly writing an AST, you can still conveniently translate any Elixir code to an AST using the quote/2 macro.

quote do
  2 + 3 * 4
end

# Result with additional indentation for clarity:
{
  :+,
  [context: Elixir, import: Kernel],
  [
    2,
    {
      :*,
      [context: Elixir, import: Kernel],
      [
        3,
        4
      ]
    }
  ]
}

The example result has exaggerated indentation so that it's clearly visible that this is indeed a tree. When you think of mathematical expressions as functions, the + function has two arguments, and the * function likewise; and when the two are used in one expression, * takes priority - so the arguments of + are 2 and the result of 3 * 4.

So, in the AST, the "inner" expressions that need to be calculated earliest are always the ones most deeply nested, and when you go up the nesting tree, you finally get to the root that ties them all together, in this case the + function.

To keep our considerations simple, let's note that each AST node is a tuple that consists of:

  1. The name of the executed function.
  2. A keyword list denoting the execution context. Most of the time you won't manipulate it, and unless you want to inject additional context or build an AST by hand rather than using quote/2, you don't have to care how it works.
  3. The list of arguments that the executed function takes. These can be literals or AST nodes.
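To see all three elements at once, quote a simple local function call (sum here is an arbitrary, hypothetical name):

```elixir
quote do
  sum(1, 2, 3)
end

# => {:sum, [], [1, 2, 3]}
```

The function name is the atom :sum, the context metadata happens to be empty, and the argument list holds the literals 1, 2 and 3.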

The representation of an AST node is slightly different if it's a variable reference, like x.

quote do
  x
end

# Results in a tuple of: variable name, metadata (usually not to be cared about) and the context:
{:x, [], Elixir}

To cap this off, it's possible to do the reverse operation of quote/2 and evaluate an AST as code.

ast = quote do
  2 + 3 * 4
end

Code.eval_quoted(ast)
=> {14, []}

It returns a tuple in which the first element is the result of evaluating the AST root, and the second is a list of bindings - for simplicity, imagine that if the quoted code had assigned a value to an x variable, we would've seen [x: new_x_value] there.
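Here's a quick sketch of bindings in action. We use Code.eval_string/1 rather than eval_quoted/1 here, because variables inside a quoted block carry a hygiene context that would clutter the binding keys:

```elixir
# The evaluated code assigns to x, so x shows up in the bindings list.
{result, bindings} = Code.eval_string("x = 2 + 3")

result   # => 5
bindings # => [x: 5]
```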

Read Chris McCord's awesome "Metaprogramming in Elixir" book to get to know the ins and outs of ASTs and macros, and subscribe to our blog to learn more in the future!

Macros as functions manipulating ASTs

Elixir is similar to Lisp-like languages in that it also has a powerful macro mechanism to manipulate ASTs.

So what is a macro, then? As stated before...

A macro is a function that returns an AST based on the arguments passed to it - which often include an AST themselves.

That's it. You call a macro on a specific bit of code, which is automatically represented as an AST, and during compilation it gets transformed to a different AST, which is nothing more than code that will then get compiled.

Roughly speaking, the Elixir compiler first parses your source code into an initial AST, and then does another pass in which macros are expanded.

Expanding macros is nothing more than running macros on initially generated ASTs to transform them into different ASTs.

Consider a mathematical expression. Again, here's what it looks like as an AST:

quote do
  x + 4 - 6 + 10
end

{:+, [context: Elixir, import: Kernel],
 [
   {:-, [context: Elixir, import: Kernel],
    [{:+, [context: Elixir, import: Kernel], [{:x, [], Elixir}, 4]}, 6]},
   10
 ]}

Usually the first example given in macro learning tutorials is a rewrite of the unless/2 macro, which is great because it is a useful one - but let's first start with an attempt to manipulate the AST manually.
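(For reference, that classic rewrite boils down to just a few lines - a minimal sketch using the quote and unquote macros, which we'll dig into in a later section. The name my_unless is our own, chosen to avoid clashing with Kernel.unless/2:)

```elixir
defmodule ControlFlow do
  # Rewrites `my_unless condition, do: block` into the equivalent `if`
  # expression at compile time.
  defmacro my_unless(condition, do: block) do
    quote do
      if unquote(condition) do
        nil
      else
        unquote(block)
      end
    end
  end
end

# After `require ControlFlow`:
#   ControlFlow.my_unless 1 > 2, do: "it ran"
# returns "it ran", because the condition is false.
```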

Something that may come in handy is a macro that reduces boilerplate code in tests by simplifying certain commonly found patterns.

One thing I often find myself using when writing tests is the ability to ensure that a record returned by a function is persisted, i.e. it has a :__meta__ key with state: :loaded:

assert {:ok, %Blog.Post{title: "Awesome Post", __meta__: %{state: :loaded}}} = Blog.create_post(post_params)

And while I usually like Elixir's verbosity, this assertion involves nested pattern matching on something that is not the main concern of the Blog.Post Ecto struct, and it doesn't read well - the __meta__ thing distracts me, and I tend to forget whether I should match on state or status.

I would like to be able to just do:

assert {:ok, loaded_record(%Blog.Post{title: "Awesome Post"})} = Blog.create_post(post_params)

For us to be able to do that, a loaded_record/1 macro should transform an AST representing a struct pattern match declaration to an AST representing the same pattern with an added __meta__: %{state: :loaded} key.

You don't have to be fluent in manipulating AST node tuples to do this. The easiest way to figure out how to transform the AST is to just compare the assumed input with the expected output, just like this:

quote do
  %Blog.Post{title: "Awesome Post"}
end

# AST without the __meta__ key:
{:%, [],
 [
   {:__aliases__, [alias: false], [:Blog, :Post]},
   {:%{}, [], [title: "Awesome Post"]}
 ]}

quote do
  %Blog.Post{title: "Awesome Post", __meta__: %{state: :loaded}}
end

# AST with the __meta__ key:
{:%, [],
 [
   {:__aliases__, [alias: false], [:Blog, :Post]},
   {:%{}, [], [title: "Awesome Post", __meta__: {:%{}, [], [state: :loaded]}]}
 ]}

Let's dissect this. As you can see, at the root of each AST is the :% expression with two arguments: the first represents the struct name (Blog.Post), and the second is a :%{} expression representing a map.

So the latter, innermost one, builds a map, and the outermost one then creates a struct out of that map. (Remember how we explained how a struct is related to a map?)

These expressions are actually not functions, but they don't come out of nowhere either: they're two of Elixir's Kernel.SpecialForms macros.
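You can observe the :%{} special form in isolation by quoting a bare map - it's the same node that appears nested inside the quoted struct:

```elixir
quote do
  %{state: :loaded}
end

# => {:%{}, [], [state: :loaded]}
```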

When you know how the expected result AST should differ from the input, you can easily define a macro!

defmodule MyApp.DataCase do
  defmacro loaded_record({:%, ctx, [aliases_ctx, {:%{}, inner_ctx, map_as_keyword_list}]}) do
    {:%, ctx,
     [
       aliases_ctx,
       {:%{}, inner_ctx, [{:__meta__, {:%{}, [], [state: :loaded]}} | map_as_keyword_list]}
     ]}
  end
end

And there you go - you can now use the loaded_record(%Blog.Post{}) syntax in your assertions. It is important to require or import the module that declares the macro into the module you want to use it in, so that the compiler knows it must compile and load the macro's module before it can expand the macro.

defmodule MyApp.BlogTest do
  import MyApp.DataCase

  test "create_post" do
    post_params = %{title: "Awesome Post"}
    assert {:ok, loaded_record(%Blog.Post{title: "Awesome Post"})} = Blog.create_post(post_params)
  end
end

How does that work? The compiler first does a run-through on your module to build an initial AST, and then - during a macro expansion phase - it transforms the AST using the macros you used. Afterwards, the compiler proceeds to build the code into an Erlang AST and then to bytecode.

Actually, you can peek at what an Elixir AST fragment looks like after macro expansion:

Macro.expand(
  quote do
    loaded_record(%Blog.Post{title: "Awesome Post"})
  end,
  __ENV__
)

# I honestly have no idea where the :counter value comes from, so I'd appreciate
# if someone enlightened me. :-)
{:%, [],
 [
   {:__aliases__, [counter: -576460752303422619, alias: Curiosumapp.Blog.Post],
    [:Blog, :Post]},
   {:%{}, [], [__meta__: {:%{}, [], [state: :loaded]}, title: "Awesome Post"]}
 ]}

The way to understand quote and unquote

While it's good to be familiar with the approach of manipulating ASTs as such, the most common way to create macros is very much like building strings.

You're surely familiar with the common pattern of string interpolation, nowadays seen in almost any serious language, such as Elixir's or Ruby's "...#{whatever}..." or JavaScript's `...${whatever}...`.

How does that relate to Elixir metaprogramming with macros? Well, the story is simple:

Just as quotation marks "" are the delimiters of a string and #{} is the interpolation token, you can think of quote/2 as the delimiter of AST-represented code and unquote/1 as the interpolation token.

The quote do ... end block encloses a piece of Elixir code, in which you can call unquote to interpolate an AST into the quoted block.

What do we pass as the argument to unquote? Well, it's an AST node, which is either a 3-tuple we described before, or a literal (Integer, String, etc.).
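The analogy in action: a plain value is spliced into a quoted block with unquote/1, much like interpolating a variable into a string (integers are literals, so they quote to themselves and can be interpolated directly):

```elixir
number = 5

# unquote/1 splices the value of `number` into the quoted expression,
# just like #{} splices a value into a string.
ast = quote do
  unquote(number) * 2
end

# ast => {:*, [context: Elixir, import: Kernel], [5, 2]}
Code.eval_quoted(ast)
# => {10, []}
```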

Many of the most popular Elixir tools, such as Phoenix, Ecto or ExUnit rely on macros to provide a clear, DSL-like syntax. In fact, Elixir itself is largely built upon the clever usage of macros. Elixir was designed to be an extensible language through the macro mechanism, which is one of the reasons why the language is considered complete and its feature backlog is empty.

Phoenix relies on macros to expand the declarative DSL you use in router.ex into code that registers your routes, and Ecto has a DSL to translate your schemas into rich struct definitions.

Something that we might try to implement using macros is expanding controller action definitions to a self-documenting mechanism for our API endpoints.

Yes, there's already PhoenixSwagger, which has a schema DSL and itself heavily uses macros. I'm not saying you shouldn't use it - it's probably a great idea to do so, because Swagger is a well-adopted tool. It's not the only documentation standard on the market, though (RAML being another example), so maybe it would be good to try to build something that could adapt to different documentation systems.

When approaching macro and DSL design, it's best to have a clear idea of how it is supposed to be used.

I imagine that I would like to do something similar to this:

defmodule Controller do
  import DescribableController

  defaction show(conn, params), desc: "Show an item", success_code: 200, error_code: 404 do
    # process the action
    IO.puts("Processing :show action...")
  end
end

I would expect the defaction macro to define a function just as I would do with def show(conn, params), although I would also like to supply a set of metadata describing the action's documentation - outside the function body, but tied to the declaration.

Also, for the metadata to be read by an abstract documentation-generating mechanism, the Controller module should have a documentation_for/1 function with a clause matching to the function's name, i.e. Controller.documentation_for(:show) should return data that the abstract mechanism could use to build a documentation for the endpoint.

Here's how I would do that, with explanations in comments:

defmodule DescribableController do
  # Keep in mind that we want this to resemble the Kernel.def/2 macro used to
  # define functions, albeit with an additional keyword list for documentation purposes.
  defmacro defaction(call, documentation \\ [], do: expr) do
    # Exercise! Try quoting:
    #   def foo, do: :bar
    # and see how :foo is represented. We'll need the function name as an atom in a moment.
    {function_name, _, _} = call

    # Pay special attention to when this is printed in the console!
    IO.puts(
      "Defining #{function_name}, " <>
        "documenting with options: #{inspect(documentation)}, " <>
        "as AST: #{inspect(expr)}"
    )

    # This is what we return: a quoted expression, that is, an AST.
    # You might want to store the quote in a variable and IO.inspect it to
    # see what the AST looks like.
    quote do
      # Remember: "quote" is like a '"' sign, and "unquote" is like "#{...}"!
      def unquote(call) do
        IO.puts("Calling function: #{unquote(function_name)}")
        unquote(expr)
      end

      # Define a clause for the documentation_for/1 function, matching against
      # the newly defined function name as an atom.
      def documentation_for(unquote(function_name)) do
        process_documentation(unquote(documentation))
      end
    end
  end

  def process_documentation(doc) do
    # Do whatever it takes to parse a certain keyword list into a documentation item -
    # for Swagger, RAML or any other API documentation system. Up to you!
    Map.new(doc)
  end
end

Put the DescribableController module, and then the Controller module, in an *.exs file, and run it with iex file.exs.

Notice how the Defining show, ... message appears right as the module is being compiled. This is proof that macros are expanded at compile time. At this stage, outside the quote block, you can access the macro's arguments directly, without unquote. This is where we can e.g. validate the supplied arguments to prevent incorrect code from being generated (e.g. when some required metadata is missing).

Try running Controller.show(nil, nil) (the arguments don't matter for the purpose of this exercise). The first thing you'll see is the message printed by the hook that the macro adds to every function defined via defaction, which could e.g. be useful for logging purposes. The second is the action itself executing.

Calling Controller.documentation_for(:show) displays a form of the show action's metadata meant to be parsed by an abstract documentation builder. In this case, just to show it working, we simply convert the keyword list to a map.

Wrapping up: Which approach to use when creating macros?

Manipulating ASTs as trees of tuple nodes is good for writing macros that conceptually transform code into different code.

Macros that conceptually introduce a new reusable, convenient syntax to generate boilerplate-free code are usually created using the quote/unquote pattern.

And the most important thing is probably to avoid using macros at all, unless really necessary. Whenever possible, use pattern matching, pure functions, the pipe operator, and already-defined structures.

Overusing metaprogramming confuses developers (you could end up asking yourself: am I still using Elixir, or is this actually closer to a JS interpreter written with Elixir macros...?) and makes debugging hard, because - since macros are expanded during compilation - tools such as IEx.pry can't track down where exactly you are in the code when you put the pry entry point in a block of code transformed by a macro.

Surely the best thing you can do with macros is to keep things simple. If you keep in mind that people will always want to be able to write code that actually makes sense in Elixir, everyone will be fine.

Don't forget to subscribe to our newsletter for further articles on metaprogramming in Elixir and the language in general!