XSched: a scheduling framework for multitasking over diverse XPUs

  • [2025/08] We migrated XSched to Windows and completed validation on CUDA, LevelZero, and OpenCL.
  • [2025/07] We integrated XSched into llama.cpp and NVIDIA Triton to enable priority-based scheduling between multiple inference requests.
  • [2025/07] We presented our XSched paper at OSDI 2025 and prepared several interesting demo videos: Ascend910, GV100, Least-Laxity-First, and Active-Window-First.
  • [2025/06] We officially released XSched! Check out our blog post.

XSched eliminates video stuttering in AI video conference applications on AI PCs

For more details, please see the 2nd case study in our paper.

(Demo video: video_conference_on_intel_npu.mp4)

XSched is a preemptive scheduling framework for diverse XPUs (i.e., various accelerators such as GPUs, NPUs, ASICs, and FPGAs) across different brands, generations, and software platforms. XSched provides unified interfaces for scheduling XPU tasks through a preemptible command queue abstraction (XQueue), enabling hardware-agnostic, flexible scheduling policies for various objectives. It also introduces a multi-level hardware model that lets advanced XPUs achieve optimal scheduling performance while maintaining compatibility with emerging XPUs. The framework is designed to schedule XPU tasks efficiently while remaining transparent to existing XPU-based applications (a conceptual sketch of the XQueue idea follows the feature list below).

  • Transparency: Works with existing applications without code changes.
  • Flexibility: Supports diverse scheduling policies and accelerator types.
  • Extensibility: Accommodates new hardware features and software platforms.
  • Integration: Adapts to both OS-level and system-level multitasking scenarios.
  • Performance: Delivers high scheduling performance with minimal runtime overhead.
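
To make the XQueue abstraction concrete, below is a minimal, self-contained sketch of a preemptible command queue. It is illustrative only (the class and method names are ours, not XSched's API) and shows only the simplest behavior, preemption at command boundaries: commands are buffered, and a suspended queue holds its backlog until it is resumed.

```cpp
// Conceptual sketch of a preemptible command queue (XQueue-style).
// Illustrative only: names and signatures are NOT XSched's real API.
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>

class XQueueSketch {
public:
    using Command = std::function<void()>;  // stand-in for an XPU command

    // Buffer a command; forward it to the device only while active.
    void Submit(Command cmd) {
        std::lock_guard<std::mutex> lk(mu_);
        pending_.push_back(std::move(cmd));
        if (active_) Drain();
    }

    // Stop forwarding at the next command boundary; commands already
    // on the device are unaffected at this level.
    void Suspend() {
        std::lock_guard<std::mutex> lk(mu_);
        active_ = false;
    }

    // Continue forwarding the buffered backlog.
    void Resume() {
        std::lock_guard<std::mutex> lk(mu_);
        active_ = true;
        Drain();
    }

private:
    void Drain() {  // caller holds mu_
        while (!pending_.empty()) {
            pending_.front()();  // real system: enqueue on the XPU queue
            pending_.pop_front();
        }
    }

    std::mutex mu_;
    std::deque<Command> pending_;
    bool active_ = true;
};

int main() {
    XQueueSketch q;
    q.Submit([] { std::cout << "cmd 1 runs immediately\n"; });
    q.Suspend();
    q.Submit([] { std::cout << "cmd 2 is deferred\n"; });
    q.Resume();  // backlog drains here
}
```

Roughly speaking, this command-boundary behavior corresponds to the lowest level of the multi-level hardware model; higher levels can additionally stop or interrupt commands already dispatched to the device, which is what the Level-2 and Level-3 columns in the support matrix below refer to.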

Supported Platforms

✅ supported and implemented
❌ not supported
🔘 not yet implemented
🚧 implementation in progress

| Platform  | XPU                        | Shim | Level-1 | Level-2 | Level-3 |
|-----------|----------------------------|:----:|:-------:|:-------:|:-------:|
| CUDA      | NVIDIA Ampere GPUs (sm86)  | ✅   | ✅      | 🚧      | 🚧      |
| CUDA      | NVIDIA Volta GPUs (sm70)   | ✅   | ✅      | ✅      | ✅      |
| CUDA      | NVIDIA Kepler GPUs (sm35)  | ✅   | ✅      | ✅      | ✅      |
| CUDA      | Other NVIDIA GPUs          | ✅   | ✅      | 🔘      | 🔘      |
| HIP       | AMD GPUs                   | ✅   | ✅      | 🔘      | 🔘      |
| LevelZero | Intel GPUs                 | ✅   | ✅      | 🔘      | 🔘      |
| LevelZero | Intel Integrated NPUs      | ✅   | ✅      | ❌      | ❌      |
| OpenCL    | NVIDIA GPUs                | ✅   | ✅      | 🔘      | 🔘      |
| OpenCL    | AMD GPUs                   | ✅   | ✅      | 🔘      | 🔘      |
| OpenCL    | Intel GPUs                 | ✅   | ✅      | 🔘      | 🔘      |
| OpenCL    | Xilinx FPGAs               | ✅   | ✅      | ❌      | 🔘      |
| AscendCL  | Ascend NPUs                | ✅   | ✅      | ✅      | 🔘      |
| cuDLA     | NVIDIA DLA                 | ✅   | ✅      | ❌      | ❌      |
| VPI       | NVIDIA OFA                 | ✅   | ✅      | ❌      | ❌      |
| VPI       | NVIDIA PVA                 | ✅   | ✅      | ❌      | ❌      |
Build and Install

```bash
git clone https://github.com/XpuOS/xsched.git
cd xsched
git submodule update --init --recursive
```

```bash
# build XSched only; installed to xsched/output by default
make

# install XSched to a custom path
make INSTALL_PATH=/path/to/install

# build XSched along with platform-specific components (HAL, Shim, etc.)
make cuda   # or hip, levelzero, opencl, ascend, cudla, vpi

# or specify PLATFORM explicitly
make PLATFORM=cuda
make PLATFORM="cuda levelzero opencl"   # build multiple platforms at once
```

Transparently Schedule Applications

XSched is designed to be transparent to applications. By setting a few environment variables, you can schedule your application with XSched. See our example (Linux, Windows) for transparent scheduling.
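
For intuition about how this works under the hood, here is a minimal sketch of a preload-based launcher on Linux. The install path, library name, and policy variable below are placeholders, not the names XSched actually reads; the linked examples document the real environment variables.

```cpp
// launch_under_xsched.cpp -- sketch of launching an unmodified app under a
// preload-based scheduler. Paths and variable names are PLACEHOLDERS; see
// the XSched examples for the environment variables it actually uses.
#include <cstdlib>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) return 1;  // usage: launch_under_xsched <app> [args...]

    // Preload the interception library (hypothetical install path) so the
    // app's driver API calls are redirected into XQueues without rebuilding it.
    setenv("LD_PRELOAD", "/path/to/install/lib/libshim.so", /*overwrite=*/1);

    // Scheduling knobs would also come from the environment (placeholder name).
    setenv("EXAMPLE_XSCHED_POLICY", "highest_priority_first", 1);

    execvp(argv[1], &argv[1]);  // replace this process with the target app
    return 1;                   // reached only if exec failed
}
```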

Linking with XSched for Customized Scheduling

You can also explicitly link with XSched and use the XQueue APIs and Hint APIs in your application for more flexibility. See our examples for more details.

Check out our example list for more advanced use cases.
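
As a rough picture of what explicit integration could look like, the sketch below wraps a native queue in an XQueue and attaches a priority hint. Every identifier is a stubbed placeholder so the file compiles stand-alone; the real XQueue and Hint APIs are shown in the linked examples.

```cpp
// explicit_xqueue.cpp -- sketch of linking against XSched for customized
// scheduling. All identifiers are illustrative placeholders (stubbed so
// the sketch compiles), NOT the real XQueue/Hint API.
#include <cstdio>

typedef void *XQueueHandle;

// --- placeholder stubs standing in for the real XSched library ---
static XQueueHandle XQueueCreateFromNativeQueue(void *native_queue) {
    return native_queue;  // real API would wrap the queue in an XQueue
}
static void XHintPriority(XQueueHandle, int priority) {
    std::printf("hint: priority=%d\n", priority);
}
static void XQueueDestroy(XQueueHandle) {}
// ------------------------------------------------------------------

int main() {
    void *native_queue = nullptr;  // e.g., a CUDA stream or LevelZero queue

    // Wrap the native queue so the scheduler can suspend/resume it, then
    // hint that it carries latency-critical work.
    XQueueHandle xq = XQueueCreateFromNativeQueue(native_queue);
    XHintPriority(xq, /*priority=*/1);

    // ... submit kernels/commands on native_queue as usual ...

    XQueueDestroy(xq);
}
```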

Architecture and Workflow

(Figure: the XSched framework)

XSched consists of four key components: the XPU shim (XShim), the XPU task preemption module (XPreempt), the XPU hardware adapter layer (XAL), and the XScheduler. XShim, XPreempt, and XAL are dynamically linked libraries preloaded into each XPU application process, while the XScheduler runs as a central system service daemon.

  • XShim: named as shim in the code, intercepts XPU driver API calls and redirects commands to the XQueue ①, allowing applications to run on XSched without modifications (transparency).
  • XPreempt: named as preempt in the code, implements XQueue interfaces based on the multi-level hardware model ②. Contains an agent that watches the state of XQueue (e.g., ready or idle) and generates scheduling events to notify the XScheduler via IPC ③. Also responsible for applying the scheduling operations (e.g., suspend or resume an XQueue) received from the XScheduler ⑤.
  • XAL: named as hal in the code, implements the multi-level hardware model interfaces by calling XPU driver APIs.
  • XScheduler: named as xserver in the code, coordinates all XQueues from different processes, monitors global XQueue status through agent-reported events ③, and invokes the scheduling policy to make decisions when status changes. Decisions are enforced by sending scheduling operations to agents ④. The policy is modular and customizable to suit various workloads (see the sketch after this list).
  • XCLI: a command-line tool that can monitor XQueue status, change the policy, or give scheduling hints (e.g., priority) ⑥.
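
As a sketch of the policy side (under the same caveat: these interfaces are illustrative, not XSched's real ones), a policy consumes agent-reported XQueue status events and emits suspend/resume operations. The example below implements a simple highest-priority-first objective.

```cpp
// policy_sketch.cpp -- how a modular policy could plug into a scheduler
// like XScheduler: consume XQueue status events, emit suspend/resume
// operations. Interfaces here are illustrative, not XSched's real ones.
#include <cstdint>
#include <map>
#include <vector>

enum class Op { Suspend, Resume };
struct Decision { uint64_t xqueue_id; Op op; };

struct XQueueStatus {
    uint64_t id;
    bool ready;    // has pending commands (reported by the agent)
    int priority;  // e.g., set via a scheduling hint
};

// Highest-priority-first: keep only the highest-priority ready XQueue
// running and suspend every other ready queue.
class HighestPriorityFirst {
public:
    void OnEvent(const XQueueStatus &s) { status_[s.id] = s; }

    std::vector<Decision> Decide() const {
        const XQueueStatus *best = nullptr;
        for (auto &[id, s] : status_)
            if (s.ready && (!best || s.priority > best->priority)) best = &s;

        std::vector<Decision> out;
        for (auto &[id, s] : status_) {
            if (!s.ready) continue;
            out.push_back({id, (best && id == best->id) ? Op::Resume
                                                        : Op::Suspend});
        }
        return out;
    }

private:
    std::map<uint64_t, XQueueStatus> status_;
};

int main() {
    HighestPriorityFirst policy;
    policy.OnEvent({/*id=*/1, /*ready=*/true, /*priority=*/0});
    policy.OnEvent({/*id=*/2, /*ready=*/true, /*priority=*/5});
    for (const Decision &d : policy.Decide())
        (void)d;  // the scheduler would send these operations to the agents
}
```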

We will continue to support XSched on more OSes and platforms and to improve its performance. Please stay tuned!

  • Integrate into LLM serving systems (e.g., llama.cpp, vLLM)
  • Support Windows
  • Support macOS
  • Install as a system daemon

XSched is designed to be extensible and flexible.

We welcome contributions:

  • Support more platforms, or a higher preemption level on existing platforms. See guide
  • Implement a new scheduling policy. See guide
  • Integrate XSched into AI-powered applications.
  • Report or fix issues.

If you use XSched for your research, please cite our paper:

```bibtex
@inproceedings{Shen2025xsched,
  title     = {{XSched}: Preemptive Scheduling for Diverse {XPU}s},
  author    = {Weihang Shen and Mingcong Han and Jialong Liu and Rong Chen and Haibo Chen},
  booktitle = {19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)},
  year      = {2025},
  isbn      = {978-1-939133-47-2},
  address   = {Boston, MA},
  pages     = {671--692},
  url       = {https://www.usenix.org/conference/osdi25/presentation/shen-weihang},
  publisher = {USENIX Association},
  month     = jul
}
```

The artifacts of XSched are published on GitHub and Zenodo.

  • For technical questions and feature requests, please submit via GitHub Issues
  • For collaborations and partnerships, please reach out at [email protected]