Most of the time we perform CRUD operations under the pragmatic, standard transactional scope, thinking it will suffice under all circumstances. After all, what could go wrong when we always apply a standard ACID tool provided by our frameworks, right? But it feels “good enough” only until we face the corner cases of real-world production traffic: missing rows after pagination, duplicated updates, or metrics that refuse to add up.
Understanding why we sometimes need more than the traditional transactional scope is hard without real-world examples of when and how to use something different from what we are used to. In this article I revisit those issues through a tiny Spring Boot/Java code base. The example we will use runs through the following steps:
- Paginate 5 000 Sale rows whose status starts as NOT_INITIALIZED.
- Publish each id to a Kafka topic.
- Consume each message, updating its status to PROCESSING.
- Demonstrate how plain idempotency checks and pagination fail under load.
- Apply REPEATABLE READ isolation level and optimistic locking to restore consistency with much better performance.
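Throughout these steps we will refer to a Sale entity and its SaleStatus enum. Below is a minimal sketch of what they could look like; any field beyond id and status is an assumption of this sketch, and the real classes live in the linked repository:

@Entity
@Table(name = "sale")
public class Sale {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // NOT_INITIALIZED on creation, PROCESSING after consumption
    @Enumerated(EnumType.STRING)
    private SaleStatus status;

    private Instant createdAt;
    private Instant updatedAt;

    // getters and setters omitted
}

public enum SaleStatus { NOT_INITIALIZED, PROCESSING }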
Experiment setup
The article's code can be found on GitHub here. We will use a controller with three endpoints running different implementations of the proposed example:
// ExperimentController
@PostMapping("/pagination-zero")
public ResponseEntity<Void> pagination() {
    Thread.ofVirtual().start(() -> paginationZeroService.process(false));
    return ResponseEntity.accepted().build();
}

@PostMapping("/pagination-zero-fixed")
public ResponseEntity<Void> paginationFixed() {
    Thread.ofVirtual().start(() -> paginationZeroService.process(true));
    return ResponseEntity.accepted().build();
}

@PostMapping("/repeatable-read")
public ResponseEntity<Void> repeatable() {
    Thread.ofVirtual().start(() -> repeatableReadService.process());
    return ResponseEntity.accepted().build();
}
We will use /pagination-zero to demonstrate the problem when we run a default pagination implementation. After that, we show why fixing pagination at page 0 is necessary when no isolation level is applied, running the experiment on the /pagination-zero-fixed endpoint. Finally, we show our solution for the corner cases presented, running the /repeatable-read endpoint.
We can think of an isolation level as the degree of visibility that CRUD operations have when two or more transactions are running concurrently. Taking our sales example into consideration, that translates to how much a transaction running the status update from NOT_INITIALIZED to PROCESSING can affect, and be affected by, other concurrent transactions.
SQL databases support four standard isolation levels:
- READ UNCOMMITTED: No guarantees, rarely used. Transactions are not independent, and even in-flight operations that were not yet committed are seen by other transactions, which is known as a dirty read.
- READ COMMITTED: The default in most databases. Guarantees that in-flight CRUD operations are not seen by other running transactions; only changes committed before the operation starts are visible, preventing dirty reads. But the same query performed at two different points of a transaction can read two different values, which is known as a non-repeatable read (see the sketch after this list).
- REPEATABLE READ: All operations inside the transaction see a stable snapshot of the data, preventing dirty and non-repeatable reads. But it fails when a business rule needs to see whether another transaction updated related rows in order to decide if it can proceed. This anomaly is known as write skew.
- SERIALIZABLE: Makes transactions behave as if they executed one at a time, in sequence, through locks or by abort-and-retry. Prevents all the anomalies cited above.
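To make the non-repeatable read concrete, here is a minimal JDBC sketch. The connection URL and the sale table are placeholders rather than the article's code base; the point is that the same SELECT, run twice inside one READ COMMITTED transaction, can return two different values:

void demoNonRepeatableRead() throws SQLException {
    // Placeholder JDBC URL for illustration only
    String url = "jdbc:postgresql://localhost:5432/shop";

    try (Connection tx1 = DriverManager.getConnection(url);
         Connection tx2 = DriverManager.getConnection(url)) {

        tx1.setAutoCommit(false);
        tx1.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

        // First read inside transaction 1: sees NOT_INITIALIZED
        try (ResultSet first = tx1.createStatement()
                .executeQuery("SELECT status FROM sale WHERE id = 1")) {
            first.next();
            System.out.println("first read: " + first.getString(1));
        }

        // Transaction 2 (autocommit) commits an update in between
        tx2.createStatement()
                .executeUpdate("UPDATE sale SET status = 'PROCESSING' WHERE id = 1");

        // The same query in the SAME transaction now sees PROCESSING:
        // a non-repeatable read. Under REPEATABLE READ, transaction 1
        // would still see its original snapshot.
        try (ResultSet second = tx1.createStatement()
                .executeQuery("SELECT status FROM sale WHERE id = 1")) {
            second.next();
            System.out.println("second read: " + second.getString(1));
        }

        tx1.commit();
    }
}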
So why not apply SERIALIZABLE directly? Because the stronger the isolation level, the more you compromise your application's performance. Under high workload, applying such a level may be impractical depending on the circumstances. And that's where locking strategies step in.
Before we dive into the experiment itself, let’s refresh the two classic locking approaches:
- Pessimistic locking: Usually applied through FOR UPDATE for row-level locks or LOCK TABLE for table-level locks directly in a query (see the sketch after this list). It guarantees that no other transaction will alter the rows affected by the query, by making them wait for the lock to be released. It usually performs better when the application has a significantly heavier write workload than read workload, because under such circumstances retries would consume more resources.
- Optimistic locking: Applies locking by adding a new column to the desired table, indicating the row's version number or the timestamp of the last update. When a transaction is about to affect a row, it checks whether its version of the row matches the one present in the database, which can be done by simply writing something like UPDATE … WHERE id = ? AND version = ? in the SQL statement. If it does not match, a retry can be done later. It usually performs better when the application has a significantly heavier read workload than write workload, because retries are rare.
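For reference, here is a sketch of what pessimistic locking looks like in Spring Data JPA. The @Lock annotation below translates into SELECT … FOR UPDATE on most databases; this repository method is an illustration, not part of the example project:

public interface SaleRepository extends JpaRepository<Sale, Long> {

    // Rows returned by this query stay locked until the surrounding
    // transaction commits; other transactions block on the same rows
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @Query("SELECT s FROM Sale s WHERE s.id = :id")
    Optional<Sale> findByIdForUpdate(@Param("id") Long id);
}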
Let's consider our first implementation, through the /pagination-zero endpoint. Sending each row to the Kafka topic can be done as follows:
public void process() {
    Pageable pagination = PageRequest.of(0, 50);
    Page<Sale> pageSale;
    saleService.updateAllCreatedAt();
    do {
        pageSale = saleService.findAllByStatus(SaleStatus.NOT_INITIALIZED, pagination);
        pageSale.forEach(
            sale -> {
                var event = new SaleEvent(sale.getId(), EventType.PAGINATION_ZERO);
                saleEventProducer.sendMessage("my-topic", "key", event.toString());
            }
        );
        pagination = pageSale.getPageable().next();
    } while (pageSale.hasNext());
}
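The findAllByStatus call above can be backed by a plain Spring Data derived query. A plausible shape, which may differ from the actual repository in the linked project:

public interface SaleRepository extends JpaRepository<Sale, Long> {

    // Derived query: translates to SELECT ... WHERE status = ? with LIMIT/OFFSET
    Page<Sale> findAllByStatus(SaleStatus status, Pageable pageable);
}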
And as we consume those events, we have to make sure each sale gets the proper status:
public void processSaleEvent(Long id) {
    // Idempotency check: skip sales that are already in PROCESSING
    var sale = this.findById(id);
    if (sale.get().getStatus().equals(SaleStatus.PROCESSING))
        return;
    saleRepository.updateStatusById(id, SaleStatus.PROCESSING);
    var salePersisted = saleRepository.findById(id);
    var saleAudit = SaleAudit.fromSale(salePersisted.get());
    saleAuditRepository.save(saleAudit);
}
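On the consumer side, the method above would typically be triggered by a Kafka listener. A minimal sketch of that wiring is below; the class name, group id, and the extractSaleId helper that parses the id back out of the event's toString() payload are assumptions of this sketch, not the repository's actual code:

@Component
public class SaleEventConsumer {

    private final SaleService saleService;

    public SaleEventConsumer(SaleService saleService) {
        this.saleService = saleService;
    }

    @KafkaListener(topics = "my-topic", groupId = "sale-processor")
    public void listen(String message) {
        // extractSaleId is a hypothetical helper for this sketch
        Long saleId = extractSaleId(message);
        saleService.processSaleEvent(saleId);
    }
}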
Running this implementation, we can see the final result through the GET /status-count endpoint:
[
  {
    "status": "NOT_INITIALIZED",
    "count": 3949
  },
  {
    "status": "PROCESSING",
    "count": 1051
  }
]
So, what happened? Aren't all sales supposed to be in PROCESSING status?
Imagine that, after the first page is read, the consumer manages to update 8 sale rows to PROCESSING before the producer's loop requests the next page. Those 8 rows no longer match the NOT_INITIALIZED filter, so the remaining result set shifts 8 positions up. The second page then starts at offset 50 of the shifted set, which is position 58 of the original one, simply never processing rows 50 to 57.
The first approach one could try is pinning pagination at page 0 inside the while loop, like in the code below:
public void process() {
    Pageable pagination = PageRequest.of(0, 50);
    Page<Sale> pageSale;
    saleService.updateAllCreatedAt();
    do {
        pageSale = saleService.findAllByStatus(SaleStatus.NOT_INITIALIZED, pagination);
        pageSale.forEach(
            sale -> {
                var event = new SaleEvent(sale.getId(), EventType.PAGINATION_ZERO);
                saleEventProducer.sendMessage("my-topic", "key", event.toString());
            }
        );
        // Since we are running locally, sleep a bit to let the consumer process messages
        try {
            Thread.sleep(40);
        } catch (InterruptedException ex) {
            System.out.println("Error during thread sleep");
        }
        // Page number evaluation commented out so the loop cannot skip rows
        //pagination = pageSale.getPageable().next();
    } while (pageSale.hasNext());
}
The code above runs when requesting the /pagination-zero-fixed endpoint. It makes the loop repeat itself several times, producing the same message more than once, since the findAll is faster than the update on our consumer.
But this approach brings a couple of issues. The main one is that we are wasting resources by simply looping and re-sending the same records. That is far from efficient and can take significantly longer if you don't have a properly tuned cluster. For this experiment, running locally, it took almost 10 times longer than using proper pagination. We can verify it by requesting the /elapsed-time endpoint:
Total time elapsed in seconds: 151

Besides that, another issue comes with the duplication of messages, caused not only by the producer but also by Kafka's default guarantee for consumers: with at-least-once delivery configured, Kafka can deliver the same message to the consumer more than once.
With all those issues in mind, we can apply what we first proposed about isolation levels and locking strategies. The pagination issue described in the last section is exactly the non-repeatable read we mentioned earlier. The minimum isolation level that solves it is REPEATABLE READ, and achieving that in Spring is simple:
// Annotation guaranteeing REPEATABLE_READ isolation
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void process() {
    Pageable pagination = PageRequest.of(0, 50);
    Page<Sale> pageSale;
    saleService.updateAllCreatedAt();
    do {
        pageSale = saleService.findAllByStatus(SaleStatus.NOT_INITIALIZED, pagination);
        pageSale.forEach(
            sale -> {
                var event = new SaleEvent(sale.getId(), EventType.PAGINATION_ZERO);
                saleEventProducer.sendMessage("my-topic", "key", event.toString());
            }
        );
        pagination = pageSale.getPageable().next();
    } while (pageSale.hasNext());
}
And with that, the process completes correctly. Verifying again through the /status-count endpoint:
[
  {
    "status": "PROCESSING",
    "count": 5000
  }
]
And the amount of time taken by the operation, according to the /elapsed-time endpoint, was:
Total time elapsed in seconds: 24

So, problem solved, right? Not quite.
Remember that Kafka, by default, guarantees at-least-once delivery. It means the broker can, and will, deliver some messages more than once. The figure below shows what can happen when the interval between duplicated consumptions is short enough:
In the image we can see that, if a second consumption happens while the first one has not yet updated the sale to the PROCESSING status, the second one passes the idempotency check we implemented in the earlier snippet:
public void processSaleEvent(Long id) {
    var sale = this.findById(id);
    if (sale.get().getStatus().equals(SaleStatus.PROCESSING))
        return;
    // ...
}
And now we have two rows for the same sale registered in the sale_audit table. We can easily simulate this scenario with a simple test:
@Test
void simulateConcurrentUpdatesWithOCC() throws Exception {
    long saleId = 1L;
    ExecutorService executor = Executors.newFixedThreadPool(2);
    executor.submit(() -> {
        saleService.processSaleEvent(saleId, true);
        return "Job1 succeeded";
    });
    executor.submit(() -> {
        Thread.sleep(10);
        saleService.processSaleEvent(saleId, true);
        return "Job2 succeeded";
    });
    executor.shutdown();
    // Wait for both jobs to finish before asserting
    executor.awaitTermination(10, TimeUnit.SECONDS);
    assertThat(saleAuditService.findAll()).hasSize(1);
}
The test will fail because, of course, there are two rows in sale_audit, not one as there should be.
And here is where locking strategies can help us. In our case, since duplicate consumption is expected to be rare, following the reasoning defined earlier we can apply optimistic locking. Let's implement it as a version number in our Sale entity:
@Entity
@Table(name = "sale")
public class Sale {

    // The version of the row
    @Version
    private Long version;

    // ...
}
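As a side note, once @Version is in place, JPA also enforces the check automatically on entity saves: a flush with a stale version throws an optimistic-locking exception. Below is a sketch of that style, which the example project does not use since it relies on the explicit UPDATE shown next:

public void markProcessing(Long id) {
    try {
        Sale sale = saleRepository.findById(id).orElseThrow();
        sale.setStatus(SaleStatus.PROCESSING);
        // Hibernate compares the in-memory version with the database one
        // at flush time and fails if another transaction bumped it
        saleRepository.save(sale);
    } catch (ObjectOptimisticLockingFailureException e) {
        // Another transaction won the race; skip or schedule a retry
    }
}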
And we also check this version on every update, as in the following snippet:
@Modifying
@Transactional
@Query("""
        UPDATE Sale s
        SET s.status = :status,
            s.updatedAt = CURRENT_TIMESTAMP,
            s.version = s.version + 1
        WHERE s.id = :id
          AND s.version = :expectedVersion
        """)
int updateStatusById(@Param("id") Long id, @Param("status") SaleStatus status,
                     @Param("expectedVersion") Long expectedVersion);
After calling the update, we check the number of rows affected to decide whether to proceed:
public void processSaleEvent(Long id, Boolean idempotency) {
    // This check is not needed anymore
    // var sale = this.findById(id);
    // if (sale.get().getStatus().equals(SaleStatus.PROCESSING))
    //     return;
    Sale sale = saleRepository.findById(id).orElseThrow();
    var rows = saleRepository.updateStatusById(id, SaleStatus.PROCESSING, sale.getVersion());
    if (rows == 0) return;
    // ...
}
Now, returning to our diagram:
Of course we could, for example, check whether the status was already PROCESSING directly in the update (sketched below), but for didactic reasons I decided to show how optimistic locking is usually implemented.
And now we can finally say that our implementation is safe, consistent, and performant.
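For completeness, that alternative would look something like the sketch below, folding the idempotency check into the statement itself instead of relying on the version column (an illustration, not the repository's code):

@Modifying
@Query("""
        UPDATE Sale s
        SET s.status = :status,
            s.updatedAt = CURRENT_TIMESTAMP
        WHERE s.id = :id
          AND s.status <> :status
        """)
int updateStatusIfNotSet(@Param("id") Long id, @Param("status") SaleStatus status);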
Isolation levels and locking strategies are not just for DBA experts: they are useful day to day for every backend developer. By switching one annotation and adding one column, we recovered correctness and shaved more than 120 seconds off the job.
Happy consistent coding!