Number of reduce partitions. Default: same as concurrent()
->partitions(16) // More parallelism in reduce phase
Tip: Use 1-2x concurrency level. More partitions = better parallelism but more overhead.
Controls: Memory usage during shuffle phase sorting. Default: 10000
Determines how many records accumulate in memory before being sorted and written as a chunk file during external sorting. This only affects the shuffle phase.
->chunkSize(50000) // High-memory system
->chunkSize(2000) // Memory-constrained
Memory impact: chunkSize × average_record_size bytes per sort operation
Controls: I/O write buffering across all phases (map, shuffle, reduce). Default: 1000
Determines how many records are buffered in memory before flushing to disk. Affects the frequency of fwrite() system calls. Used by all phases, not just shuffle.
->bufferSize(5000) // Large datasets, plenty of RAM
->bufferSize(500) // Memory-constrained
Memory impact: bufferSize × average_record_size bytes per writer (multiple writers in map phase)
These are independent memory concerns. For example, in the shuffle phase, a 10,000-record chunk will be written in ~10 buffer flushes if bufferSize=1000.
workingDirectory(string $path)
Directory for temporary files. Default: system temp
->workingDirectory('/mnt/fast-ssd/tmp')
Tip: Use SSD storage for better performance on large datasets.
partitionBy(callable $partitioner)
Custom function to control which partition a key goes to. Must return 0 to partitions-1.
->partitionBy(function ($key, $numPartitions) {
returnord($key[0]) % $numPartitions; // Partition by first letter
})
(newMapReduceBuilder())
->concurrent(8) // CPU cores (or 2-3x for I/O tasks)
->partitions(16) // 1-2x concurrency
->chunkSize(50000) // Shuffle sort memory (larger = fewer merges)
->bufferSize(5000) // Write buffer across all phases (larger = fewer syscalls)
->workingDirectory('/ssd') // Use fast storage for large jobs
Low Memory System (2GB RAM):
->chunkSize(2000) // Small sort chunks
->bufferSize(500) // Small write buffers
High Performance System (SSD, 32GB RAM):
->chunkSize(100000) // Large sort chunks for fast sorting
->bufferSize(10000) // Large buffers to minimize disk I/O
->workingDirectory('/mnt/fast-ssd/tmp')
Tuning independently:
Slow disk? Increase bufferSize to batch more writes
Limited RAM during shuffle? Decrease chunkSize to reduce sort memory