This post explores these questions through detailed fio(1) benchmarking, looking at random reads, random writes, and latency — all running on a recent build of OpenBSD 7.7-current.
```
OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr 9 09:38:45 MDT 2025
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
```

Test Setup #
- Storage: 1TB Crucial P3 Plus SSD M.2 2280 PCIe 4.0 x4 3D-NAND QLC (CT1000P3PSSD8)
- Tool: fio, installed via OpenBSD packages
- Test File Size: 64 GB (to bypass the RAM cache)
- Block Size: 4 KiB
- I/O Depth: 32
- Job Counts Tested: 1 to 32
- Runtime: 30 s per test, with a 10 s ramp-up
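
For reference, here is a minimal sketch of how such a run can be expressed as a plain fio invocation; the actual job files used for the workstation numbers are not shown in this post, so the job names and output handling below are assumptions:

```sh
#!/bin/sh
# Hypothetical reconstruction of the workstation test: 64 GB file, 4 KiB blocks,
# iodepth 32, 30 s measured per run after a 10 s ramp-up, varying job counts.
for J in 1 2 4 8 18 32; do
    fio --name="randread-${J}jobs" \
        --filename=benchfile \
        --rw=randread \
        --bs=4k \
        --iodepth=32 \
        --numjobs="$J" \
        --size=64G \
        --time_based \
        --runtime=30 \
        --ramp_time=10 \
        --group_reporting \
        --output="randread-${J}.txt"
done
```

The random-write runs differ only in `--rw=randwrite`.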
Results at a Glance #
Throughput vs. Job Count (Random Read)
Throughput vs. Job Count (Random Write)
Average Read Latency
Average Write Latency
Summary Tables #
Random Read Performance #
| Jobs | Throughput (MB/s) | IOPS | Avg Latency | Notes |
|---|---|---|---|---|
| 1 | 473.0 | 121,318 | 26.41 | Baseline |
| 2 | 808.3 | 207,333 | 30.80 | Strong scaling |
| 4 | 1,302.4 | 334,219 | 37.97 | Excellent parallel read gain |
| 8 | 1,712.0 | 439,728 | 58.22 | Near peak performance |
| 18 | 1,715.6 | 439,661 | 117.12 | Saturation reached |
| 32 | 1,618.6 | 415,603 | 180.56 | Slight regression, high latency |
Random Write Performance #
| Jobs | Throughput (MB/s) | IOPS | Avg Latency | Notes |
|---|---|---|---|---|
| 1 | 265.6 | 68,223 | 58.64 | Baseline |
| 2 | 476.6 | 122,246 | 63.27 | Good scaling |
| 4 | 829.9 | 212,610 | 70.84 | Steady performance increase |
| 8 | 1,259.1 | 323,439 | 95.72 | Approaching write peak |
| 18 | 1,428.1 | 366,830 | 172.17 | Plateau with rising latency |
| 32 | 1,408.2 | 361,404 | 230.42 | Regression due to contention |
Latency Overview (Read vs Write) #
| Jobs | Avg Read Latency | Avg Write Latency | Notes |
|---|---|---|---|
| 1 | 26.41 | 58.64 | Minimal latency, sequential load |
| 2 | 30.80 | 63.27 | Low contention |
| 4 | 37.97 | 70.84 | Balanced performance |
| 8 | 58.22 | 95.72 | Sweet spot for throughput vs latency |
| 18 | 117.12 | 172.17 | Steep latency increase |
| 32 | 180.56 | 230.42 | High CPU & queue contention |
Observations #
- OpenBSD scales I/O quite well up to a point — notably better than expected.
- Job count sweet spot: Between 6 and 8 jobs gave the best balance of IOPS and latency.
- Too many jobs degrade performance due to increased contention and CPU overhead.
- NVMe write performance is sensitive to concurrency on OpenBSD, more so than reads.
fio(1) Linux vs. OpenBSD #
Based on the test script below, I ran a simple benchmark comparing Linux 6.12.21-amd64 ([email protected], built with x86_64-linux-gnu-gcc-14, Debian 14.2.0-19) against OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr 9 09:38:45 MDT 2025 ([email protected]) on a ThinkPad X1 Carbon Gen 10 (14" Intel).
```sh
#!/bin/sh
# Common fio parameters
BLOCK_SIZE="4k"
IODEPTH="1"
RUNTIME="30"
SIZE="1G"
FILENAME="benchfile"

# Output directory
OUTPUT_DIR="./fio-results"
mkdir -p "$OUTPUT_DIR"

# numjobs to test
NUMJOBS_LIST="1 2 4 8 16 32"

# Test types
for RW in randread randwrite; do
    echo "Starting $RW tests..."
    for J in $NUMJOBS_LIST; do
        OUTFILE="$OUTPUT_DIR/${RW}-${J}.json"
        echo "Running $RW with numjobs=$J..."
        fio --name="test-$RW" \
            --filename="$FILENAME" \
            --rw="$RW" \
            --bs="$BLOCK_SIZE" \
            --iodepth="$IODEPTH" \
            --numjobs="$J" \
            --size="$SIZE" \
            --time_based \
            --runtime="$RUNTIME" \
            --group_reporting \
            --output-format=json \
            --output="$OUTFILE"
    done
done
```

| Jobs | OpenBSD MB/s | OpenBSD IOPS | OpenBSD Avg Latency (µs) | Linux MB/s | Linux IOPS | Linux Avg Latency (µs) |
|---|---|---|---|---|---|---|
| 2 | 5.78 | 1478 | 1343 | 13.23 | 3388 | 595 |
| 4 | 9.92 | 2538 | 1563 | 25.75 | 6592 | 605 |
| 8 | 13.32 | 3403 | 2316 | 40.56 | 10382 | 735 |
| 16 | 13.89 | 3549 | 4511 | 53.17 | 13613 | 1169 |
| 32 | 14.02 | 3579 | 8758 | 53.83 | 13780 | 2317 |
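
The rows above come from the JSON files the script writes. A hedged sketch of pulling the write-side numbers out with jq (jq itself and the field paths are assumptions about fio's current JSON schema, not something shown in this post):

```sh
#!/bin/sh
# Hypothetical post-processing step: print job count, throughput (MiB/s),
# IOPS and mean latency (µs) for each random-write result.
for F in ./fio-results/randwrite-*.json; do
    J=${F##*-}; J=${J%.json}            # job count parsed from the file name
    jq -r --arg jobs "$J" '
        .jobs[0].write as $w
        | [$jobs,
           ($w.bw / 1024 | round),      # fio reports bw in KiB/s
           ($w.iops | round),
           ($w.lat_ns.mean / 1000 | round)]   # ns -> µs
        | @tsv' "$F"
done
```

Each output line is job count, throughput, IOPS and mean latency, tab-separated, ready to paste into a table.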
Throughput vs. Job Count (Random Write)
Average Write Latency
Please mind the gap: under Linux the test was not even run with direct=1 (see below for details), so Linux still had the page cache on its side. There is a lot of untapped potential for OpenBSD.
> direct=bool
> If value is true, use non-buffered I/O. This is usually O_DIRECT. Note that OpenBSD and ZFS on Solaris don't support direct I/O. On Windows the synchronous ioengines don't support direct I/O. Default: false.

What I also noticed is that performance on the ThinkPad was dramatically worse than on the workstation.
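
For a fairer comparison, the Linux side could be rerun with non-buffered I/O. A minimal sketch, reusing the same parameters and simply adding fio's --direct flag (which OpenBSD rejects, per the documentation quoted above):

```sh
# Hypothetical single rerun on Linux with the page cache bypassed via O_DIRECT.
fio --name=randwrite-direct \
    --filename=benchfile \
    --rw=randwrite \
    --bs=4k \
    --iodepth=1 \
    --numjobs=8 \
    --size=1G \
    --time_based --runtime=30 \
    --direct=1 \
    --group_reporting
```

Comparing this against the buffered Linux numbers above would show how much of the gap is the page cache rather than the I/O path itself.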
Conclusion #
If you’re tuning I/O performance on OpenBSD — whether for databases, file servers, or personal use — don’t fall into the “more jobs = more performance” trap. Our tests clearly show:
- 6 to 8 parallel jobs is optimal for both reads and writes.
- Beyond that, latency suffers and throughput gains are negligible.
I wanted to get a quick but solid overview of how well OpenBSD handles disk I/O across increasing thread counts. The results matched my expectations — scaling up works to a point, but there are trade-offs. What these benchmarks don’t show is that once the number of threads grows too large, my KDE desktop becomes almost unusable. This is something to keep in mind for real-world multitasking scenarios.
An upcoming test I plan to run will involve RW performance on USB sticks, which could offer more insight as we stress more subsystems.