Introducing Ant: a simple background job processing library for Elixir

In the Elixir community we are lucky to have many high-quality libraries, backed not only by core engineers but also by a strong community. I use Elixir daily, both at work and for my personal projects, and I never feel like I’m missing something crucial, at least in web development.

Nevertheless, recently I have decided to publish my new library called Ant for background job processing.

Why another background processing library?

Elixir comes with OTP out of the box. This is one of its competitive advantages and makes Elixir a great choice for working with many concurrent processes: OTP lets you supervise them and handle failures gracefully. Task or GenServer is the go-to solution for asynchronous code execution, and Agent is a simple way to store data and share it between processes. In many cases these tools are more than enough.
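As a quick reminder, these building blocks need nothing beyond the standard library. The snippet below is a minimal sketch with illustrative values, not code taken from Ant:

```elixir
# Run a function asynchronously and wait for its result.
task = Task.async(fn -> Enum.sum(1..100) end)
sum = Task.await(task)

# Keep shared state in an Agent; other processes can read and update it.
{:ok, counter} = Agent.start_link(fn -> 0 end)
:ok = Agent.update(counter, &(&1 + sum))
total = Agent.get(counter, & &1)

IO.puts("sum=#{sum} total=#{total}")
```

Both the task result and the agent state live only in the memory of the running VM.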

Sometimes, however, it’s necessary to have more:

Persist jobs between restarts
Store jobs to process them later or for debugging purposes
Retry failed jobs
Monitor the status of your jobs

For such advanced cases, the first thing that comes to my mind is Oban. It is a powerful, flexible, and reliable library with a lot of features. It is actively maintained, and many companies trust it in production. Oban lets you choose between PostgreSQL and SQLite as its storage backend.

Then why did I decide to create my own library? Many years ago, when I first started hearing about Elixir, one of the main narratives was that Elixir, being built on top of the Erlang VM, inherits the rich Erlang ecosystem. You can use a lot of tools out of the box without any additional dependencies. One of these tools is Mnesia, a distributed database system. Unfortunately, I have never seen it used in production in web development.
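To give a feel for Mnesia itself, here is a minimal sketch that uses it directly from Elixir, independent of Ant. The table name :demo_job and its attributes are made up for illustration. It is meant to be run as a standalone script (for example with elixir demo.exs); inside a Mix project you would also list :mnesia under extra_applications.

```elixir
# Without a schema on disk, Mnesia starts with a RAM-only schema.
:ok = :mnesia.start()

# Create a RAM-only table; the first attribute (:id) is the key.
{:atomic, :ok} =
  :mnesia.create_table(:demo_job,
    attributes: [:id, :status, :args],
    ram_copies: [node()]
  )

# Records are tuples tagged with the table name.
{:atomic, :ok} =
  :mnesia.transaction(fn ->
    :mnesia.write({:demo_job, 1, :enqueued, %{email: "jane@example.com"}})
  end)

# Reading by key returns a list of matching records.
{:atomic, [{:demo_job, 1, status, args}]} =
  :mnesia.transaction(fn -> :mnesia.read({:demo_job, 1}) end)

IO.inspect({status, args})
```

No external database server and no extra dependency are involved, which is exactly the property that made Mnesia attractive for this experiment.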

I was eager to apply it to my own needs, so I decided to create a library for myself that would be simple and easy to use, able to persist jobs and retry failed ones, without any additional dependencies. Thus, I built a proof of concept and released the first version of Ant.

How to use Ant

I prefer to explain with real-world examples, so let’s imagine that we have a file with leads and we need to send an email to each of them. This task can be broken down into several steps:

Create a file with 400_000 rows to create leads
Parse the generated file
For each row, emulate lead creation
Send email for each lead
Make the mailer raise an exception randomly to emulate possible real-world behavior

First things first: we need to create a new Elixir application:

mix new ant_sandbox

The entry point may look like this:

defmodule AntSandbox do
  alias AntSandbox.LeadReportGenerator
  alias AntSandbox.SendPromotionWorker

  def call() do
    :observer.start() # optionally you can start the observer to monitor the application

    :ok = LeadReportGenerator.call() # 1. Create a file with 400_000 rows to create leads

    File.stream!("leads.txt") # 2. Parse the generated file
    |> Stream.each(fn line ->
      line
      |> String.trim()
      |> send_promotion()
    end)
    |> Stream.run()
  end

  defp send_promotion(email) do # 3 and 4: Emulate lead creation and send email
    SendPromotionWorker.perform_async(%{email: email})
  end
end

Next, let’s create the module responsible for generating the file with leads, which will later be used as the source of leads. Each row in the file contains a randomly generated email:

defmodule AntSandbox.LeadReportGenerator do
  def call do
    File.open!("leads.txt", [:write], fn file ->
      1..400_000
      |> Stream.map(&generate_email/1)
      |> Stream.each(&IO.write(file, &1 <> "\n"))
      |> Stream.run()
    end)
  end

  defp generate_email(id) do
    first_name = Enum.random(["john", "jane", "alex", "michael", "david", "lisa"])
    last_name = Enum.random(["smith", "williams", "brown", "jones", "garcia"])
    domain = Enum.random(["gmail.com", "yahoo.com", "hotmail.com", "example.com"])

    "#{first_name}.#{last_name}#{id}@#{domain}"
  end
end

A new file leads.txt will be generated with 400,000 emails inside. Let’s pretend that we received this file from our marketing team. So far so good.
The next step is to implement the module responsible for creating leads and sending them emails. Sending emails to a large number of users might take some time, so it is a good idea to make it asynchronous. A failure while creating a lead or sending an email to a single lead should not affect the others. That’s why it is worth using a separate worker for each lead.

Add ant to our dependencies:

def deps do
  [
    {:ant, "~> 0.0.1"}
  ]
end

By default, Ant uses Mnesia with the in-memory persistence strategy (:ram_copies) and a single queue named default. That is good enough for us at the moment. For more advanced cases, you can consider changing the default persistence strategy to :disc_copies or :disc_only_copies in order to save jobs on disk. For the other available options, please check the Configuration section in the GitHub repository.

The next step is to define a worker:

defmodule AntSandbox.SendPromotionWorker do
  alias AntSandbox.Mailer
  alias AntSandbox.CreateLead

  use Ant.Worker, max_attempts: 3

  @impl Ant.Worker
  def perform(%{args: %{email: email}} = _worker) do
    with {:ok, _lead} <- CreateLead.call(email) do
      Mailer.send_promotion(email)
    end
  end
end

A worker module should include the use Ant.Worker line and implement a perform function that takes an %Ant.Worker{} struct as its argument. The struct contains an args field with the arguments passed to the perform_async function, which schedules the job. Even though Mnesia can store not only basic types such as atoms, integers, and strings, but also complex data types such as maps, lists, and tuples, it is recommended to pass only basic types as arguments. For this worker I set the max_attempts option to 3, so a failing job is attempted up to three times. Without it, a failed job stays in the failed state and is not retried.

The CreateLead module is simple:

defmodule AntSandbox.CreateLead do
  # emulate lead creation
  def call(email) do
    if String.contains?(email, "smith") and String.contains?(email, "yahoo") do
      {:error, "Invalid email"} # emulate validation error
    else
      Process.sleep(100)
      {:ok, %{email: email}}
    end
  end
end

It emulates a validation error for emails that contain "smith" and use the yahoo domain. In all other cases it sleeps for 100 milliseconds and returns an {:ok, lead} tuple.

Only the Mailer is missing. It is responsible for sending promotion emails to leads. Here is the implementation:

defmodule AntSandbox.Mailer do
  # emulate sending emails
  def send_promotion(email) do
    if :rand.uniform(10) == 1 do
      raise "Failed to send email to #{email}"
    else
      Process.sleep(100)
    end

    :ok
  end
end

Most of the time it sleeps for 100 milliseconds and then returns :ok, but roughly one call in ten raises an exception instead. The last step is to test our application.

iex(1)> AntSandbox.call()
:ok
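Before looking at the stored jobs, it helps to estimate how many of them should end up failed. The sketch below is plain arithmetic on the numbers used in this article. It assumes that emails are generated uniformly, that every attempt runs the whole perform function again, and that attempts fail independently of each other:

```elixir
total_leads = 400_000

# CreateLead rejects "smith" at yahoo: 1 of 5 last names, 1 of 4 domains.
invalid_share = 1 / 5 * (1 / 4)

# Assumption: each attempt raises with probability 1/10, independently,
# so a valid lead fails for good only if all 3 attempts raise.
mailer_exhausted_share = :math.pow(1 / 10, 3)

expected_invalid = round(total_leads * invalid_share)

expected_mailer_failures =
  round((total_leads - expected_invalid) * mailer_exhausted_share)

IO.puts("invalid: ~#{expected_invalid}, mailer: ~#{expected_mailer_failures}")
```

Under these assumptions, roughly 20,000 jobs should fail because of the emulated validation error and only about 380 because the mailer raised on every attempt. The actual numbers will vary from run to run.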

Now we can check the jobs and their statuses. To fetch all workers, use Ant.Workers.list_workers(). It returns a list of %Ant.Worker{} structs regardless of their status:

iex(2)> Ant.Workers.list_workers()
  {:ok,
   [
      %Ant.Worker{
        id: 4628067,
        worker_module: AntSandbox.SendPromotionWorker,
        queue_name: :default,
        args: %{email: "jane.jones248@yahoo.com"},
        status: :enqueued,
        attempts: 0,
        scheduled_at: ~U[2025-01-11 16:30:18.466259Z],
        updated_at: ~U[2025-01-11 16:30:18.466260Z],
        errors: [],
        opts: [max_attempts: 3]
      },
      ...
    ]
  }
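With hundreds of thousands of entries, a per-status summary is usually easier to read than the raw list. The sketch below uses plain maps as stand-ins for %Ant.Worker{} structs, so the sample data is made up:

```elixir
# Worker-shaped maps standing in for %Ant.Worker{} structs (sample data).
workers = [
  %{id: 1, status: :completed},
  %{id: 2, status: :failed},
  %{id: 3, status: :completed},
  %{id: 4, status: :enqueued}
]

# Count how many jobs are in each status.
counts = Enum.frequencies_by(workers, & &1.status)

IO.inspect(counts)
```

With the real library, the same Enum.frequencies_by/2 call should work on the list returned in {:ok, workers} by Ant.Workers.list_workers(), since structs support the .status access as well. Keep in mind that this loads every worker into memory.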

It’s possible to filter by worker attributes.

For example, by status (:enqueued, :running, :scheduled, :completed, :failed, :retrying, :cancelled):

iex(3)> Ant.Workers.list_workers(%{status: :completed})
  {:ok,
   [
      %Ant.Worker{
        id: 238946,
        worker_module: AntSandbox.SendPromotionWorker,
        queue_name: :default,
        args: %{email: "sarah.davis727@gmail.com"},
        status: :completed,
        attempts: 3,
        scheduled_at: nil,
        updated_at: ~U[2025-01-11 17:21:02.565686Z],
        errors: [
          %{
            error: "Failed to send email to sarah.davis727@gmail.com",
            attempt: 2,
            stack_trace: "(ant_sandbox 0.1.0) lib/mailer.ex:5: AntSandbox.Mailer.send_promotion/1\n...",
            attempted_at: ~U[2025-01-11 17:20:24.732696Z]
          },
          %{
            error: "Failed to send email to sarah.davis727@gmail.com",
            attempt: 1,
            stack_trace: "(ant_sandbox 0.1.0) lib/mailer.ex:5: AntSandbox.Mailer.send_promotion/1...",
            attempted_at: ~U[2025-01-11 17:20:08.150039Z]
          }
        ],
        opts: [max_attempts: 3]
      },
      ...
   ]}

Or by multiple attributes:

iex(4)> Ant.Workers.list_workers(%{
...(4)>   queue_name: :default,
...(4)>   status: :failed,
...(4)>   args: %{email: "jane.smith734@yahoo.com"}
...(4)> })
  {:ok,
   [
      %Ant.Worker{
        id: 1150403,
        worker_module: AntSandbox.SendPromotionWorker,
        queue_name: :default,
        args: %{email: "jane.smith734@yahoo.com"},
        status: :failed,
        attempts: 3,
        scheduled_at: nil,
        updated_at: ~U[2025-01-11 17:31:29.615924Z],
        errors: [
          %{
            error: "Expected :ok or {:ok, _result}, but got {:error, \"Invalid email\"}",
            attempt: 3,
            stack_trace: nil,
            attempted_at: ~U[2025-01-11 17:31:29.615792Z]
          },
          %{
            error: "Expected :ok or {:ok, _result}, but got {:error, \"Invalid email\"}",
            attempt: 2,
            stack_trace: nil,
            attempted_at: ~U[2025-01-11 17:30:56.964108Z]
          },
          %{
            error: "Expected :ok or {:ok, _result}, but got {:error, \"Invalid email\"}",
            attempt: 1,
            stack_trace: nil,
            attempted_at: ~U[2025-01-11 17:30:32.909500Z]
          }
        ],
        opts: [max_attempts: 3]
      }
    ]}

If a worker fails during execution, data about the failed attempts is stored in the errors field. Each entry contains the error message, stack trace, attempt number, and timestamp of the attempt. The outputs above show two such workers: one that completed on its third attempt and one that failed all three.
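To make the shape of the errors field more tangible, here is a simplified sketch of a retry loop that produces similar entries. RetrySketch is a hypothetical module written for this article and is not Ant's implementation; in particular, it retries immediately, while the timestamps above suggest that Ant waits between attempts.

```elixir
defmodule RetrySketch do
  # Hypothetical helper, not part of Ant: runs `fun` up to `max_attempts`
  # times and collects one error entry per failed attempt.
  def run(fun, max_attempts) when max_attempts >= 1 do
    attempt(fun, 1, max_attempts, [])
  end

  defp attempt(fun, n, max, errors) do
    case safe_call(fun) do
      :ok ->
        %{status: :completed, attempts: n, errors: errors}

      {:error, message, stack_trace} ->
        entry = %{
          error: message,
          attempt: n,
          stack_trace: stack_trace,
          attempted_at: DateTime.utc_now()
        }

        # Newest entry first, matching the order in the outputs above.
        errors = [entry | errors]

        if n < max do
          attempt(fun, n + 1, max, errors)
        else
          %{status: :failed, attempts: n, errors: errors}
        end
    end
  end

  # Normalizes the possible outcomes of a job function:
  # a success value, an unexpected return value, or a raised exception.
  defp safe_call(fun) do
    case fun.() do
      :ok -> :ok
      {:ok, _result} -> :ok
      other -> {:error, "Expected :ok or {:ok, _result}, but got #{inspect(other)}", nil}
    end
  rescue
    exception ->
      {:error, Exception.message(exception), Exception.format_stacktrace(__STACKTRACE__)}
  end
end

result = RetrySketch.run(fn -> {:error, "Invalid email"} end, 3)
IO.inspect({result.status, Enum.map(result.errors, & &1.attempt)})
```

A job that returns an error tuple has no stack trace to record, which is why stack_trace is nil for the validation failures, while a raised exception carries one.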

It is also possible to fetch a worker by its id:

iex(5)> Ant.Workers.get_worker(1150403)
  {:ok, %Ant.Worker{...}}

What’s the catch?

It’s definitely easy to use this library. It is simple and flexible, and it doesn’t require an additional database. Nevertheless, there are two important topics to consider:

Persistence in the cloud
When Mnesia is configured to store data on disk using :disc_copies (memory and disk) or :disc_only_copies (disk only), it writes to table files in a specified directory.
Modern cloud platforms like Heroku or Fly.io provide only ephemeral disk storage by default. That means that after the application is restarted, all data written to disk is lost. Luckily, you can use Volumes to create persistent storage. Configuring them properly may take some additional time and effort, especially for larger applications that need to scale. Even though it’s solvable, the setup is harder than with more popular solutions, like Oban backed by PostgreSQL.
Mnesia is not widely used in web development
It is a significantly less popular storage solution, and there are not nearly as many resources and guides available. I wish somebody experienced would write a book about it; I found the documentation difficult to comprehend.
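On the persistence point, Mnesia reads the directory for its table files from its own dir application setting. The fragment below is a sketch of pointing that setting at a mounted volume. The MNESIA_DIR variable name and the fallback path are assumptions made for this example, and Ant may need further settings described in its Configuration section:

```elixir
# config/runtime.exs
import Config

# Assumption: MNESIA_DIR points at a mounted persistent volume.
# A charlist is the conventional form for Mnesia's dir setting.
config :mnesia,
  dir: String.to_charlist(System.get_env("MNESIA_DIR", "priv/mnesia"))
```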

To guide further development, I would like to hear your feedback. What would you like to see in future versions of the library? What is crucial for you to start using it? Please share your thoughts with me by email.

You can find Ant on GitHub. Thank you for reading my article.