How database transactions work in Ecto and why Elixir makes it awesome?

How database transactions work in Ecto and why Elixir makes it awesome?

This is the first instalment of our Elixir Matters series, in which we'll do our best to convince you that (surprise!) Elixir matters, and it matters in many different ways - both at the development side of things, and at the business end, which is why it's just wrong for anyone not to invest time in getting familiar with this language.

Elixir is not only because of its functional character, concurrency capabilities or other obscure-sounding traits - it's the combination of all those qualities that makes it a very efficient tool in translating business needs to working algorithms running behind exquisite apps.

What is a database transaction?

If you're familiar with SQL databases, feel free to skip this section - otherwise, let's delve a bit deeper into the mechanisms of database services.

Consider a bank transfer, consisting of subtracting money from account A and then adding to B. It is mandatory that both operations are performed inseparably, because no money can be lost.

Since our systems are capable of handling multiple operations concurrently, we don't want any agent to read either account's balance before the transfer is completed. We require that it not be possible to transfer $100 from A when its balance is only $50 (surprise, eh?).

On top of that, we obviously want all of this to hold true even when our system crashes.

What does it mean in technical terms? It means that our database is ACID. This slightly scary term is just an acronym for: Atomicity, Consistency, Isolation & Durability.

On a basic level, a transaction is a sequence of atomic operations - that is, inseparable from one another; when any of these fails, or it is decided that we don't proceed with the remainder, the database will be rolled back to the state before transaction started.

Aside from the atomicity, transactions are also required to keep the database consistent, i.e. to keep all defined rules governing the relations between data satisfied at all points of time. This relies on atomicity in a way, because any aborting of a transaction must not leave the database in an inconsistent state.

Perhaps slightly easier to grasp is the concept of isolation - as transactions are core to how databases handle concurrency, the results of operations executed as part of interweaving transactions should not be 'visible' to other transactions until committed. (Though actually, that's a simplification. Read up if you're interested in the full story).

On top of all of these traits, we need durability - the database must be crash-resilient, which means all data committed before a crash must still be seen as committed after the system is up again.

From the application's perspective...

In object-oriented languages, at the core of popular web frameworks were object-relational mapping libraries, which dealt with managing database connection, constructing queries and mapping the results to objects. So in Java there is Hibernate, .NET has its Entity Framework, and Ruby has ActiveRecord.

Elixir if free from OOP's heavy, bloated objects, which always carried a lot of obscure internal state information, and relies on simple data structures such as maps to carry all needed information. So the equivalent of ActiveRecord's duties is more or less done by Ecto, though we'll no longer use the object-relational mapping term here - let's just refer to it as the data access layer.

It hardly matters, though. What matters for us is that these libraries talk to the database to execute operations and transactions, which are committed or rolled back dependent on the app's business logic.

So in Ruby's beloved (well, really?) ActiveRecord you had this:

transfer = 50

ActiveRecord::Base.transaction do
  acc_a, acc_b = Account.find([1, 2])

  raise ActiveRecord::Rollback, :balance_too_low if acc_a.balance < transfer

  acc_a.balance -= transfer
  acc_b.balance += transfer
  acc_a.save!
  acc_b.save!

  [acc_a, acc_b]
end

=> [<Account:...>, <Account:...>]

OK, and what about Elixir and Ecto? Here's a similar code that runs within a transaction a single long function that contains a procedure to execute. Test it out by running iex -S mix run priv/script1.exs from our repository.

import Ecto.Query
alias TransactApp.Bank.Account
alias TransactApp.Repo

transfer = 50

result =
  Repo.transaction(fn ->
    [acc_a, acc_b] = from(acc in Account, where: acc.id in [1, 2]) |> Repo.all()

    if acc_a.balance < transfer, do: Repo.rollback(:balance_too_low)

    update1 = acc_a |> Account.changeset(%{balance: acc_a.balance - 50}) |> Repo.update!()
    update2 = acc_b |> Account.changeset(%{balance: acc_b.balance + 50}) |> Repo.update!()

    {update1, update2}
  end)

=> {:ok, [%Account{...}, %Account{...}]}

At first glance, this doesn't exactly look like much of a win - obviously the language's (and Ecto's) syntax is different, but it's still very procedural.

We still have to roll the transaction back manually. The difference, though, is that when we do roll back, ActiveRecord will just return nil, and Ecto will return {:error, :balance_too_low}. This is immensely important, because in Rails we would often spend lots of precious time figuring out what the reason of the rollback was. In Ecto, you can easily pattern match on the normal {:ok, ...} scenario, or the expected rollback reasons, and let it fail on those that are not expected.

There's way more to what you can do to manage transactions in Ecto, though!

Enter Ecto.Multi

Now this is something that truly leverages the functional nature of Elixir and allows you to build and manage transactions with better control and more confidence.

Procedural code, as seen in examples above, is heavy on assigning data to variables and usually there is an assumption that the whole procedure succeeds, and when it doesn't, most languages rely on an exception handling mechanism to handle errors. Elixir can do that too, but it leads to writing solutions with coarse-grained error handling or even to ignoring errors, which is even worse.

I can't count how many times I've seen folks (myself included...) trying to update a record inside a transaction, which then failed because of validation errors, but the transaction kept executing, and there was an assumption that the update did succeed.

Functional programming promotes defining procedures as pipelines of functions, and every step (function) in that pipeline has a clear requirement on what data (arguments) it expects to receive. With Elixir's awesome and ubiquitous pattern matching it just feels natural.

Here's an example code you could write as an Elixir script - it's also available in our repository:

import Ecto.Query
alias Ecto.Multi
alias TransactApp.Bank.Account

transfer_amount = 50

retrieve_accounts = fn repo, _ ->
  case from(acc in Account, where: acc.id in [1, 2]) |> repo.all() do
    [acc_a, acc_b] -> {:ok, {acc_a, acc_b}}
    _ -> {:error, :account_not_found}
  end
end

verify_balances = fn _repo, %{retrieve_accounts_step: {acc_a, acc_b}} ->
  # we don't do anything to account B, but we could
  if acc_a.balance < transfer_amount,
    do: {:error, :balance_too_low},
    else: {:ok, {acc_a, acc_b, transfer_amount}}
end

subtract_from_a = fn repo, %{verify_balances_step: {acc_a, _, verified_amount}} ->
  # repo.update will return {:ok, %Account{...}} or {:error, #Ecto.Changeset<...>} -
  # {:ok, value} or {:error, value} is what these functions are expected to return.
  acc_a
  |> Account.changeset(%{balance: acc_a.balance - verified_amount})
  |> repo.update()
end

add_to_b = fn repo, %{verify_balances_step: {_, acc_b, verified_amount}} ->
  acc_b
  |> Account.changeset(%{balance: acc_b.balance + verified_amount})
  |> repo.update()
end

batch =
  Multi.new()
  |> Multi.run(:retrieve_accounts_step, retrieve_accounts)
  |> Multi.run(:verify_balances_step, verify_balances)
  |> Multi.run(:subtract_from_a_step, subtract_from_a)
  |> Multi.run(:add_to_b_step, add_to_b)

Did we just actually run the transaction? Not yet. We defined a number of functions that represent single steps of our pipeline.

Every function adheres to the same contract:

  • Return {:ok, value},
  • Expect as arguments: the current Repo, and a map.

Then, a particular function will also pattern match the map to verify that a certain precondition has been met. For instance, if we mistakenly try to use the subtract_from_a function without having a chance to insert the returning value of verify_balances_step into our context, pattern matching will fail.

Using Ecto.Multi, we then defined a batch of operations as a composition of these functions. Finding a way to have these batches well-organized is just up to you. You can throw them in one of your contexts, or separate it out to a different module.

Notice how we define retrieve_accounts/2 and verify_balances/1 as functions that return functions, because every time we do a bank transfer, we do it on different accounts and with different amounts, so each time we build the Ecto.Multi pipeline we need these two functions to be slightly different. Aside from that, subtract_from_a/2 and add_to_b/2 are defined as functions straight away, because they rely on what earlier steps have given them as input (that is, both accounts, and the transfer amount) - in this sense, these steps are identical whenever we run the pipeline.

defmodule TransactApp.Bank.Batches do
  alias Ecto.Multi
  alias TransactApp.Bank.Account
  import Ecto.Query, only: [from: 2]

  def transfer_money(acc1_id, acc2_id, amount) do
    Multi.new()
    |> Multi.run(:retrieve_accounts_step, retrieve_accounts(acc1_id, acc2_id))
    |> Multi.run(:verify_balances_step, verify_balances(amount))
    |> Multi.run(:subtract_from_a_step, &subtract_from_a/2)
    |> Multi.run(:add_to_b_step, &add_to_b/2)
  end

  defp retrieve_accounts(acc1_id, acc2_id) do
    fn repo, _ ->
      case from(acc in Account, where: acc.id in [^acc1_id, ^acc2_id]) |> repo.all() do
        [acc_a, acc_b] -> {:ok, {acc_a, acc_b}}
        _ -> {:error, :account_not_found}
      end
    end
  end

  defp verify_balances(transfer_amount) do
    fn _repo, %{retrieve_accounts_step: {acc_a, acc_b}} ->
      if acc_a.balance < transfer_amount,
        do: {:error, :balance_too_low},
        else: {:ok, {acc_a, acc_b, transfer_amount}}
    end
  end

  defp subtract_from_a(repo, %{verify_balances_step: {acc_a, _, verified_amount}}) do
    acc_a
    |> Account.changeset(%{balance: acc_a.balance - verified_amount})
    |> repo.update()
  end

  defp add_to_b(repo, %{verify_balances_step: {_, acc_b, verified_amount}}) do
    acc_b
    |> Account.changeset(%{balance: acc_b.balance + verified_amount})
    |> repo.update()
  end
end

Then, to actually execute the batch, use Repo.transaction/1. Each step will return {:ok, ...} or {:error, ...} - and Ecto will commit or rollback the transaction depending on that. No need to explicitly call Repo.rollback/1 or raise any errors!

TransactApp.Bank.Batches.transfer_money(1, 2, 50) |> TransactApp.Repo.transaction()

# returns tuple with map denoting results of each step:
{:ok, %{retrieve_accounts_step: ..., verify_balances_step: ..., subtract_from_a_step: ..., add_to_b_step: ...}}

# or if an error occurs, it'll return the exact point of failure _and_ all previous step results, which is awesome:
{:error, :verify_balances_step, :balance_too_low, %{retrieve_accounts_step: ...}}

A script that pattern matches on this call and the TransactApp.Bank.Batches module are also available in our GitHub repository, so you can play around with it.

Multi structures can be merged and this is so awesome and powerful, because we can now fully leverage the composability of Elixir.

There are also numerous awesome shorthands available. Oftentimes a step is just about updating, inserting or deleting a record - and for less complex scenarios we can use it like this (also available in GitHub):

Multi.new()
|> Multi.update(:update1_step, Account.changeset(account1, %{balance: 1337}))
|> Multi.update(:update2_step, Account.changeset(account2, %{balance: 7331}))
|> Multi.insert(:insert_step, Account.changeset(%Account{}, %{balance: 150}))
|> Multi.delete_all(:delete_step, Ecto.assoc(account1, :activities))
|> Multi.update(:update3_step, fn %{insert_step: account} ->
  Account.changeset(account, %{balance: 1234})
end)
|> Repo.transaction()

See that you can operate on pre-existing loaded records if pipeline steps don't rely on context from previous steps, or otherwise you can use a function as the second argument to retrieve the context.

Conclusion

Several Elixir libraries, including Ecto, benefit from Elixir's nature of a functional language, and allow for easy definition of data processing pipelines. Its transaction handling facilities are just cool and simply get out of your way.

Since handling database transactions is an aspect that no serious application can evade, it's awesome that Elixir allows developers to manage them elegantly, which leads to software that does its business well.

There's more to come in the Elixir Matters series, so stay tuned for updates at Curiosum!