When you build Elixir applications, you'll likely encounter scenarios where you need to handle large datasets efficiently. Whether you're importing user data from a CSV file, updating thousands of product prices, or synchronizing data from external APIs, performing operations one record at a time can quickly become a performance bottleneck.
In this two-part tutorial, we'll start by exploring Ecto's powerful batch update and advanced insert capabilities for handling bulk data operations. Then, in the second part, we'll learn how to integrate AppSignal for Elixir to build an observability layer on top of Ecto that catches errors and performance issues.
Before we begin, you'll need a few things set up.
Prerequisites
- Elixir, the Phoenix framework, and PostgreSQL installed and running in your local development environment.
- A Phoenix application. If you don't have one ready, fork the example app we'll be using throughout this tutorial.
- Basic Elixir and Phoenix framework experience.
Introducing Ecto for Elixir
Ecto is Elixir's database toolkit that acts as a bridge between your application and relational databases like PostgreSQL. It provides type-safe query building, automated migrations, and data validation, making complex data operations simple and reliable.
Ecto's Architecture
At its core, Ecto is built around three fundamental components that work together to provide a complete database abstraction layer:
- Schemas act as the structural foundation, defining how data maps to database tables without containing any business logic. Schemas establish field types, relationships, and table mappings.
- Changesets form the business logic layer, handling validation, data transformation, and change tracking. They take raw input data and a schema struct, then produce a validated, trackable representation of what should change in the database.
- The Repo translates Ecto operations into actual SQL queries, while managing database connections, transactions, and query execution.
Next, we'll take a look at how Ecto handles single database operations, the challenges that come with it, and why batch operations matter.
How Standard Database Operations Work in Ecto for Elixir
For individual database operations like inserts, updates, or deletes, Ecto validates your data through changesets before sending SQL commands to the database. This approach works well for single-record operations but can become a performance bottleneck when handling a large number of records.
Before we learn how standard operations work through an example, let's get an overview of the app we'll be using moving forward.
Introducing Our Example Phoenix App
Our example app is a Phoenix ecommerce app managing products, suppliers, orders, and inventory for multiple stores. The app's database is represented below:

Now that we have an idea of how our app is structured (at least on the database level), in the next section, we'll learn how to do a normal data insert using Ecto.
Standard Inserts with Ecto
A database insert operation creates a new record in a table. In Ecto, this process involves several steps: first, you create a changeset that validates and transforms your data according to your schema's rules, then Ecto converts this changeset into the appropriate SQL INSERT statement and executes it against the database.

Next, let's learn about the role of each Ecto component.
The Schema’s Role
We first start with the supplier schema:
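The full schema module isn't reproduced here, but based on the points that follow, a minimal sketch might look like this (the `Shop.Suppliers.Supplier` module name is an assumption):

```elixir
defmodule Shop.Suppliers.Supplier do
  use Ecto.Schema
  import Ecto.Changeset

  # UUID-based primary and foreign keys instead of sequential integers
  @primary_key {:id, :binary_id, autogenerate: true}
  @foreign_key_type :binary_id

  schema "suppliers" do
    field :name, :string
    field :contact_email, :string
    field :contact_phone, :string
    field :is_active, :boolean, default: false

    timestamps(type: :utc_datetime)
  end

  @doc false
  def changeset(supplier, attrs) do
    supplier
    # Only these fields can be modified through the changeset
    |> cast(attrs, [:name, :contact_email, :contact_phone, :is_active])
    |> validate_required([:name])
    # A basic email format check; a real app may use a stricter pattern
    |> validate_format(:contact_email, ~r/@/)
  end
end
```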
This schema definition establishes several important aspects of our Supplier model:
- Primary key configuration: Uses :binary_id with auto-generation, creating UUID-based primary keys instead of sequential integers for better distribution across systems.
- Foreign key type: Sets :binary_id as the default foreign key type to maintain consistency with the primary key format.
- Field definitions: Defines four main fields — name (string), contact_email (string), contact_phone (string), and is_active (boolean with false default).
- Timestamps: Automatically adds inserted_at and updated_at fields using UTC datetime format.
- Validation rules: The changeset function enforces that name is required and contact_email follows a basic email format pattern.
- Data casting: Only allows specific fields to be modified through the cast/3 function, providing a controlled interface for data changes.
The Repo’s Role
Next, we have the repo:
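A typical Phoenix repo module is only a few lines; a sketch, assuming the OTP app is named `:shop`:

```elixir
defmodule Shop.Repo do
  use Ecto.Repo,
    otp_app: :shop,
    adapter: Ecto.Adapters.Postgres
end
```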
The repo serves as the database interface layer: the bridge between our Elixir code and the PostgreSQL database, handling all database operations for our schemas.
With these in place, we can now create a database record by performing an insert.
Creating a Record in the Database
With the application running, open another terminal and run iex -S mix to start an interactive Elixir shell. We'll use it to create a new supplier record with the commands below:
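A session along these lines creates the record (module names and field values are illustrative):

```elixir
alias Shop.Repo
alias Shop.Suppliers.Supplier

# Validate the attributes through the changeset, then insert
%Supplier{}
|> Supplier.changeset(%{
  name: "Acme Industrial",
  contact_email: "sales@acme.example",
  contact_phone: "+1-555-0100",
  is_active: true
})
|> Repo.insert()
```

On success, `Repo.insert/1` returns `{:ok, %Supplier{}}` with the generated `id` and timestamps filled in.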
If successful, you'll see an `{:ok, ...}` tuple containing the inserted supplier struct, including the generated id and timestamps.
While the single-record approach shown above works perfectly for individual operations, it becomes problematic when dealing with large datasets.
The Performance Problem and Why Batch Operations Matter
Each individual insert checks out a database connection and makes its own network round-trip, as we saw when creating the supplier above.
To create 10,000 suppliers, you make 10,000 separate database calls, which is very inefficient. To make matters worse, you lose atomicity — if the 500th supplier fails to insert due to a validation error, the previous 499 records will have already been committed to the database, leaving your data in an inconsistent state.
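To make the cost concrete, the naive one-record-at-a-time approach looks something like this (`suppliers_attrs` is a hypothetical list of attribute maps):

```elixir
# One INSERT statement and one round-trip per supplier: 10,000 rows
# means 10,000 separate database calls, and a failure partway through
# leaves the earlier rows already committed.
Enum.each(suppliers_attrs, fn attrs ->
  %Supplier{}
  |> Supplier.changeset(attrs)
  |> Repo.insert!()
end)
```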
This is where Ecto's batch operations become essential.
Batch Inserts in Ecto: Inserting Multiple Records Efficiently
Batch inserts in Ecto allow you to create multiple database records in a single, efficient operation using Repo.insert_all/3. Unlike individual inserts that require separate database round-trips for each record, batch inserts compile all your data into a single SQL statement that creates hundreds or thousands of records at once.
An Example: Bulk Inserting Supplier Data
In the previous example, we used Repo.insert/2 to insert a single supplier record into the database, with the changeset performing validations and other checks before insertion into the database.
For a bulk insert, we'll make use of Repo.insert_all/3 to insert a bunch of suppliers into the database.
Preparing Data for Batch Insertion
To begin with, prepare a relevant data source. In my case, I prepared a CSV with around 1,000 supplier records.
The key thing to note when preparing data for batch insertion is to make sure the data is in the right format. Remember, batch inserts do not go through a schema's changeset, so validations and data checks are entirely up to you.
Using Repo.insert_all/3 To Bulk Insert Data
Next, with the server running in one terminal, open another terminal and create a new IEx session with iex -S mix, then run the command below:
Note: You'll need to install NimbleCSV (or another CSV library) beforehand to process CSV files.
First, we alias NimbleCSV and the repo to make them available in the IEx session. Then, we run the following commands:
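A sketch of that pipeline, assuming a `suppliers.csv` with name, email, phone, and active columns in that order (the file path and column order are assumptions):

```elixir
alias NimbleCSV.RFC4180, as: CSV
alias Shop.Repo
alias Shop.Suppliers.Supplier

# utc_datetime columns require second precision
now = DateTime.utc_now() |> DateTime.truncate(:second)

suppliers =
  "priv/data/suppliers.csv"
  |> File.stream!()
  # parse_stream/1 skips the header row by default
  |> CSV.parse_stream()
  |> Enum.map(fn [name, email, phone, active] ->
    %{
      # insert_all doesn't autogenerate primary keys, so we do it here
      id: Ecto.UUID.generate(),
      name: name,
      contact_email: email,
      contact_phone: phone,
      # Convert the CSV string to a boolean
      is_active: active == "true",
      # insert_all doesn't set timestamps either
      inserted_at: now,
      updated_at: now
    }
  end)

# One statement inserts every row
Repo.insert_all(Supplier, suppliers)
```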
Here, each CSV row is converted into a database-ready map with generated UUIDs, string values are converted to appropriate data types, timestamps are added, and finally, all records are inserted into the suppliers table in a single Repo.insert_all/3 call.
The Challenges of Batch Inserts
While Repo.insert_all/3 provides excellent performance for bulk operations, it comes with several important limitations that developers need to understand.
Validation and Error Handling
Unlike individual inserts that leverage Ecto's changeset system, batch inserts bypass validation entirely, meaning data integrity checks must be handled manually before insertion.
For example, we know that database-level constraints like unique indexes can cause the entire batch to fail if even a single record violates them. A good strategy to deal with this is to use a combination of smaller batches and Ecto's transaction API, as in the example below:
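A sketch of that strategy (logging and skipping a failed batch is just one way to handle the error; you might retry or collect failures instead):

```elixir
require Logger

suppliers_attrs
# Split the full dataset into batches of 100 rows
|> Enum.chunk_every(100)
|> Enum.each(fn batch ->
  try do
    # The transaction makes each batch all-or-nothing
    Repo.transaction(fn ->
      Repo.insert_all(Supplier, batch)
    end)
  rescue
    e in Postgrex.Error ->
      # A constraint violation now fails only this batch of 100,
      # not the entire import
      Logger.error("Batch insert failed: #{Exception.message(e)}")
  end
end)
```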
We first divide the suppliers_attrs into batches of 100 records each, then process each batch sequentially, with each batch being inserted into the database using Repo.insert_all/3 wrapped in a database transaction. This rolls back the transaction should an error occur (in this case, we rescue the error and handle it in some way).
Limited Association Handling
Repo.insert_all/3 cannot directly handle associations (for example, linking suppliers to products). In such cases, you must handle the association manually by first creating the parent records, then referencing the parent record IDs when creating the child records.
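For example, a two-phase approach might look like this (`supplier_rows`, `raw_products`, and the lookup key are illustrative):

```elixir
# Phase 1: insert the parents, asking Postgres to return the generated IDs
{_count, suppliers} =
  Repo.insert_all(Supplier, supplier_rows, returning: [:id, :name])

# Build a name -> id lookup for the child rows
supplier_ids = Map.new(suppliers, fn s -> {s.name, s.id} end)

# Phase 2: build child rows that reference the parent IDs, then insert them
product_rows =
  Enum.map(raw_products, fn p ->
    %{
      id: Ecto.UUID.generate(),
      name: p.name,
      supplier_id: Map.fetch!(supplier_ids, p.supplier_name),
      inserted_at: now,
      updated_at: now
    }
  end)

Repo.insert_all(Product, product_rows)
```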
Next, let's switch gears to the subject of batch updates.
Batch Updates
Batch updates involve updating a number of records all at once. They introduce some unique challenges, including:
- Interdependencies: Updates often affect associated records. For example, changing a supplier’s status requires updating linked products at the same time.
- Partial failures: A single invalid change (for instance, one that violates a uniqueness constraint) can derail the entire batch.
- Concurrency risks: Competing updates to the same records can cause race conditions (for example, overlapping inventory adjustments).
So, what options do you have if you want to perform a stress-free batch update? One solution is to use Ecto.Multi.
Handling Batch Updates With Ecto.Multi
Ecto.Multi is a tool that lets you bundle multiple database operations (inserts, updates, and deletes) into a single atomic unit of work. Ecto.Multi solves the critical issues that come with bulk updates, namely:
- Atomicity - Ensuring all steps succeed (or none do).
- Dependency management - Making sure associations and dependencies are handled efficiently.
- Error isolation - With Ecto.Multi, each step is named, which makes it very easy to pinpoint where a failure occurs.
An Example Using Ecto.Multi to Run a Batch Update
Let's say we need to deactivate a supplier and mark all their products as discontinued. How can we do this effectively?
Using Ecto.Multi, we could run code like this:
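A sketch of such a pipeline (module names and the result handling are assumptions):

```elixir
alias Ecto.Multi
import Ecto.Query

supplier = Repo.get!(Supplier, supplier_id)
now = DateTime.utc_now() |> DateTime.truncate(:second)

Multi.new()
# Step 1: deactivate the supplier via its changeset
|> Multi.update(:deactive_supplier, Supplier.changeset(supplier, %{is_active: false}))
# Step 2: flag every product belonging to that supplier in one UPDATE
|> Multi.update_all(
  :discontinue_products,
  from(p in Product, where: p.supplier_id == ^supplier.id),
  set: [is_active: false, updated_at: now]
)
# Run both steps in one transaction; either both commit or neither does
|> Repo.transaction()
|> case do
  {:ok, %{discontinue_products: {count, _}}} ->
    IO.puts("Supplier deactivated, #{count} products discontinued")

  {:error, failed_step, reason, _changes_so_far} ->
    # The named step tells us exactly where the failure happened
    IO.puts("Step #{failed_step} failed: #{inspect(reason)}")
end
```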
Here, we build an Ecto.Multi pipeline and run it inside a database transaction (so a rollback can happen, in case of errors) to create a multi-step update. The update starts with a step called :deactive_supplier, which updates the specific supplier we want, setting it inactive. Then, the :discontinue_products step sets the is_active flag to false on any product belonging to that supplier.
And that's it for this first part!
Wrapping Up
In this tutorial, we learned how to handle bulk data operations efficiently using Ecto's powerful batch capabilities. We explored the performance limitations of single-record operations and saw how Repo.insert_all/3 can dramatically improve insertion performance for large datasets. We also covered the challenges that come with batch operations, and how we can use Ecto.Multi to ensure data consistency when doing batch updates.
In part two of this series, we'll integrate AppSignal for Elixir into our app to build an observability layer that monitors these batch operations. This will help you catch performance bottlenecks and errors before they impact your production applications.
Until then, happy coding!