R interface to Apple’s MLX machine-learning library.
Overview
Rmlx provides an R interface to Apple’s MLX framework, enabling high-performance GPU computing on Apple Silicon.
This package was vibe-coded with Claude/OpenAI Codex in a week. Use at your own risk! Much of the C++ API has been implemented, but not Python-only features such as large neural network layers.
Most C++ functions are exposed as R functions with an mlx_ prefix. In addition, the package defines mlx-specific methods for many base R matrix operations, including arithmetic, subsetting and matrix algebra.
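A minimal sketch of the two styles, using only functions that appear in the examples below:
# convert an R matrix to an mlx array
x <- as_mlx(matrix(rnorm(9), 3, 3))
# base R methods dispatch to the GPU: arithmetic, %*%, solve(), ...
y <- solve(x) %*% x + 1
# mlx_-prefixed functions operate on mlx arrays directly
mlx_eval(y)
# results come back to R with as.matrix()
as.matrix(y)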
 
Motivation
Modern Macs have a GPU, which is great for performing matrix operations. Statistics uses a lot of matrix operations. But until now, there has been no way for R on the Mac to use the GPU.
 
Requirements
- macOS on Apple Silicon, or Linux with CUDA, or macOS/Linux for a CPU-only build
 
 
Installation
brew install mlx
or the Linux equivalent. Then just install the package as normal:
# install.packages("remotes")
remotes::install_github("hughjonesd/Rmlx")
Alternatively, you can build MLX from source.
 
Features
Fast GPU Operations
library(Rmlx)
#> 
#> Attaching package: 'Rmlx'
#> The following object is masked from 'package:stats':
#> 
#>     fft
#> The following objects are masked from 'package:base':
#> 
#>     asplit, backsolve, chol2inv, col, colMeans, colSums, diag, drop,
#>     outer, row, rowMeans, rowSums, svd
A <- matrix(rnorm(1e6), 1e3, 1e3)
system.time(solve(A))
#>    user  system elapsed 
#>   0.365   0.003   0.367
system.time(solve(as_mlx(A)))
#>    user  system elapsed 
#>   0.037   0.055   0.094 
 
Lazy Evaluation
Operations are recorded but not executed until explicitly evaluated:
x <- as_mlx(matrix(1:25, 5, 5))
y <- as_mlx(matrix(101:125, 5, 5))
# Lazy - not computed yet
z <- x + y * 2
# Force evaluation
mlx_eval(z)
# Or convert to R (automatically evaluates)
as.matrix(z)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]  203  218  233  248  263
#> [2,]  206  221  236  251  266
#> [3,]  209  224  239  254  269
#> [4,]  212  227  242  257  272
#> [5,]  215  230  245  260  275 
 
Device Management
M-series chips share memory between the CPU and GPU, so switching between devices does not copy data and is essentially free.
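A hypothetical sketch of switching devices (the mlx_default_device() helper and the device argument to as_mlx() are assumptions about the API, not confirmed):
# Hypothetical: the names below are assumptions, not confirmed API
x <- as_mlx(matrix(rnorm(9), 3, 3), device = "cpu")  # assumed 'device' argument
mlx_default_device("gpu")                            # assumed device-switching helper
y <- x %*% x  # shared memory means no copy when the device changes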
 
Subsetting
Subsetting works like base R, except that drop = FALSE is the default:
x <- as_mlx(matrix(1:9, 3, 3))
x[1:2, 1:2]
#> mlx array [2 x 2]
#>   dtype: float32
#>   device: gpu
#>   values:
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
# drop = FALSE by default
x[1, ]
#> mlx array [1 x 3]
#>   dtype: float32
#>   device: gpu
#>   values:
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
logical_mask <- c(TRUE, FALSE, TRUE)
x[logical_mask, ]
#> mlx array [2 x 3]
#>   dtype: float32
#>   device: gpu
#>   values:
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    3    6    9
# subset assignment
x[, 2] <- c(0, 0, 0)
x
#> mlx array [3 x 3]
#>   dtype: float32
#>   device: gpu
#>   values:
#>      [,1] [,2] [,3]
#> [1,]    1    0    7
#> [2,]    2    0    8
#> [3,]    3    0    9 
 
Arithmetic
x <- as_mlx(matrix(1:12, 3, 4))
y <- as_mlx(matrix(13:24, 3, 4))
# Element-wise operations
sum_xy <- x + y
diff_xy <- x - y
prod_xy <- x * y
quot_xy <- x / y
pow_xy <- x ^ 2
# Comparisons
lt <- x < y
eq <- x == y
# Bring results back to R
as.matrix(sum_xy)
#>      [,1] [,2] [,3] [,4]
#> [1,]   14   20   26   32
#> [2,]   16   22   28   34
#> [3,]   18   24   30   36
as.matrix(lt)
#>      [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE TRUE 
 
Matrix Operations
Many base R matrix functions have mlx-specific methods:
a <- as_mlx(matrix(1:6, 2, 3))
b <- as_mlx(matrix(1:6, 3, 2))
# rbind, cbind, transpose
rbind(a, t(b))
#> mlx array [4 x 3]
#>   dtype: float32
#>   device: gpu
#>   values:
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
#> [3,]    1    2    3
#> [4,]    4    5    6
cbind(a, t(b))
#> mlx array [2 x 6]
#>   dtype: float32
#>   device: gpu
#>   values:
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    3    5    1    2    3
#> [2,]    2    4    6    4    5    6
# Matrix algebra
a %*% b
#> mlx array [2 x 2]
#>   dtype: float32
#>   device: gpu
#>   values:
#>      [,1] [,2]
#> [1,]   22   49
#> [2,]   28   64
# Reductions
sum(a)
#> mlx array []
#>   dtype: float32
#>   device: gpu
#>   values:
#> [1] 21
colMeans(a)
#> mlx array [3]
#>   dtype: float32
#>   device: gpu
#>   values:
#> [1] 1.5 3.5 5.5
# Cumulative operations flatten column-major
cumsum(a)
#> mlx array [6]
#>   dtype: float32
#>   device: gpu
#>   values:
#> [1]  1  3  6 10 15 21
qr_res <- qr(a)
svd_res <- svd(a)
chol_res <- chol(a[, 1:2])
fft_res <- fft(a)
crossprod_res <- crossprod(a, b[1:2, ])
solve_res <- solve(a[, 1:2], b[1:2, ]) 
 
Automatic Differentiation
Gradients of scalar-valued functions can be computed with mlx_grad(); mlx_linear(), mlx_optimizer_sgd() and mlx_train_step() provide the pieces for a simple training loop:
loss <- function(w, x, y) {
  preds <- x %*% w
  resids <- preds - y
  sum(resids * resids) / length(y)
}
x <- as_mlx(matrix(rnorm(20), 5, 4))
y <- as_mlx(matrix(rnorm(5), 5, 1))
w <- as_mlx(matrix(0, 4, 1))
grads <- mlx_grad(loss, w, x, y)
# Inspect gradient
as.matrix(grads[[1]])
#>            [,1]
#> [1,] -1.8226540
#> [2,]  0.3192708
#> [3,]  1.3037107
#> [4,]  0.1494838
# Simple SGD loop
model <- mlx_linear(4, 1, bias = FALSE)
opt <- mlx_optimizer_sgd(mlx_parameters(model), lr = 0.1)
loss_fn <- function(mod, data_x, data_y) {
  preds <- mlx_forward(mod, data_x)
  resids <- preds - data_y
  sum(resids * resids) / length(data_y)
}
for (step in 1:50) {
  mlx_train_step(model, loss_fn, opt, x, y)
}
# Check final loss
ypred <- mlx_forward(model, x)
mean((ypred - y) * (ypred - y))
#> mlx array []
#>   dtype: float32
#>   device: gpu
#>   values:
#> [1] 0.04954781