GPTuner is a manual-reading database tuning system that suggests satisfactory knob configurations at a reduced tuning cost. The figure above presents the tuning workflow, which involves seven steps:
The following instructions have been tested on Ubuntu 20.04 and PostgreSQL v14.9:
sudo apt-get update
sudo apt-get install postgresql-14
cd ./scripts
sh install_benchbase.sh postgres
sh build_benchmark.sh postgres tpch
sudo pip install -r requirements.txt
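Before running GPTuner, it can help to confirm that the target DBMS is reachable. The sketch below is not part of GPTuner itself; it assumes psycopg2 is available (install it separately if requirements.txt does not provide it), and the connection parameters and database name are placeholders to adjust for your setup:

```python
# Optional sanity check: verify PostgreSQL answers before tuning.
# All connection parameters below are placeholders for illustration.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="tpch",
                        user="postgres", password="postgres")
with conn.cursor() as cur:
    cur.execute("SHOW server_version;")  # ask the server for its version
    print("PostgreSQL version:", cur.fetchone()[0])
conn.close()
```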
# PYTHONPATH=src python src/run_gptuner.py <dbms> <benchmark> <timeout> <seed>
PYTHONPATH=src python src/run_gptuner.py postgres tpch 180 -seed=100
where <dbms> specifies the DBMS (e.g., postgres or mysql), <benchmark> is the target workload (e.g., tpch or tpcc), <timeout> is the maximum time allowed to stress-test the benchmark, and <seed> is the random seed used by the optimizer.
The optimization result is stored in optimization_results/{dbms}/{stage}/{seed}/runhistory.json, where {dbms} is the target DBMS, {stage} is coarse or fine, and {seed} is the random seed given by the user.
"configs": {
"1": {
"effective_io_concurrency": 200,
"random_page_cost": 1.2
},
}
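To inspect a finished run programmatically, the excerpt above suggests a SMAC-style layout: a configs map from configuration IDs to knob settings, plus a data list of run records. Below is a minimal sketch for extracting the lowest-cost configuration; the record layout (config ID first in the key tuple, cost first in the value tuple) and the example path are assumptions, so adapt both to your runhistory.json:

```python
import json

# Sketch: pick the cheapest configuration out of a runhistory.json.
# Path and record layout are assumptions based on the excerpt above
# and SMAC conventions; adjust them to your file.
with open("optimization_results/postgres/coarse/100/runhistory.json") as f:
    history = json.load(f)

best_id, best_cost = None, float("inf")
for key, value in history.get("data", []):
    config_id, cost = key[0], value[0]  # assumed field positions
    if cost < best_cost:
        best_id, best_cost = config_id, cost

print("best cost:", best_cost)
print("best knobs:", history["configs"][str(best_id)])
```

The selected knobs can then be applied by hand, e.g., ALTER SYSTEM SET effective_io_concurrency = 200; followed by a configuration reload in PostgreSQL.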
PYTHONPATH=src python -m streamlit run src/demo/entrypage.py
We compare GPTuner with state-of-the-art methods, both those that use natural language knowledge as input and those that do not:
We compare GPTuner with the baselines on different DBMSs (PostgreSQL and MySQL), benchmarks (TPC-H and TPC-C), and metrics (throughput and latency). This repository presents the results on PostgreSQL. For more details, please refer to our paper or technical report.
@article{10.14778/3659437.3659449,
author = {Lao, Jiale and Wang, Yibo and Li, Yufei and Wang, Jianping and Zhang, Yunjia and Cheng, Zhiyuan and Chen, Wanghu and Tang, Mingjie and Wang, Jianguo},
title = {GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization},
year = {2024},
issue_date = {April 2024},
publisher = {VLDB Endowment},
volume = {17},
number = {8},
issn = {2150-8097},
url = {https://doi.org/10.14778/3659437.3659449},
doi = {10.14778/3659437.3659449},
abstract = {Modern database management systems (DBMS) expose hundreds of configurable knobs to control system behaviours. Determining the appropriate values for these knobs to improve DBMS performance is a long-standing problem in the database community. As there is an increasing number of knobs to tune and each knob could be in continuous or categorical values, manual tuning becomes impractical. Recently, automatic tuning systems using machine learning methods have shown great potential. However, existing approaches still incur significant tuning costs or only yield sub-optimal performance. This is because they either ignore the extensive domain knowledge available (e.g., DBMS manuals and forum discussions) and only rely on the runtime feedback of benchmark evaluations to guide the optimization, or they utilize the domain knowledge in a limited way. Hence, we propose GPTuner, a manual-reading database tuning system that leverages domain knowledge extensively and automatically to optimize search space and enhance the runtime feedback-based optimization process. Firstly, we develop a Large Language Model (LLM)-based pipeline to collect and refine heterogeneous knowledge, and propose a prompt ensemble algorithm to unify a structured view of the refined knowledge. Secondly, using the structured knowledge, we (1) design a workload-aware and training-free knob selection strategy, (2) develop a search space optimization technique considering the value range of each knob, and (3) propose a Coarse-to-Fine Bayesian Optimization Framework to explore the optimized space. Finally, we evaluate GPTuner under different benchmarks (TPC-C and TPC-H), metrics (throughput and latency) as well as DBMS (PostgreSQL and MySQL). Compared to the state-of-the-art approaches, GPTuner identifies better configurations in 16x less time on average. Moreover, GPTuner achieves up to 30\% performance improvement (higher throughput or lower latency) over the best-performing alternative.},
journal = {Proc. VLDB Endow.},
month = may,
pages = {1939–1952},
numpages = {14}
}
@article{10.1145/3733620.3733641,
author = {Lao, Jiale and Wang, Yibo and Li, Yufei and Wang, Jianping and Zhang, Yunjia and Cheng, Zhiyuan and Chen, Wanghu and Tang, Mingjie and Wang, Jianguo},
title = {GPTuner: An LLM-Based Database Tuning System},
year = {2025},
issue_date = {March 2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {54},
number = {1},
issn = {0163-5808},
url = {https://doi.org/10.1145/3733620.3733641},
doi = {10.1145/3733620.3733641},
abstract = {Selecting appropriate values for the configurable knobs of Database Management Systems (DBMS) is essential to improve performance. But because the complexity of this task has surpassed the abilities of even the best human experts, the database community turns to machine learning (ML)-based automatic tuning systems. However, these systems still incur significant tuning costs or only yield suboptimal performance, attributable to their overly high reliance on black-box optimization and the lack of integration with domain knowledge, such as DBMS manuals and forum discussions. Hence, we propose GPTuner, a manual-reading database tuning system that extensively leverages domain knowledge to automatically optimize the search space and enhance the runtime feedback-based optimization process. Firstly, we develop a Large Language Model (LLM)-based pipeline to collect and refine heterogeneous knowledge, and propose a prompt ensemble algorithm to unify a structured view of the refined knowledge. Secondly, using the structured knowledge, we (1) design a workload-aware, training-free knob selection strategy, (2) develop a search space optimization technique considering the value range of each knob, and (3) propose a Coarse-to-Fine Bayesian Optimization Framework to explore the optimized space. Finally, we evaluate GPTuner under different benchmarks (TPC-C and TPC-H), metrics (throughput and latency) and DBMS (PostgreSQL and MySQL). Compared to state-of-the-art methods, GPTuner identifies better configurations in 16x less time on average. Moreover, GPTuner achieves up to 30\% performance improvement over the best-performing alternative.},
journal = {SIGMOD Rec.},
month = apr,
pages = {101–110},
numpages = {10}
}
@inproceedings{10.1145/3626246.3654739,
author = {Lao, Jiale and Wang, Yibo and Li, Yufei and Wang, Jianping and Zhang, Yunjia and Cheng, Zhiyuan and Chen, Wanghu and Zhou, Yuanchun and Tang, Mingjie and Wang, Jianguo},
title = {A Demonstration of GPTuner: A GPT-Based Manual-Reading Database Tuning System},
year = {2024},
isbn = {9798400704222},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3626246.3654739},
doi = {10.1145/3626246.3654739},
abstract = {Selecting appropriate values for the configurable knobs of Database Management Systems (DBMS) is crucial to improve performance. But because such complexity has surpassed the abilities of even the best human experts, the database community turns to machine learning (ML)-based automatic tuning systems. However, these systems still incur significant tuning costs or only yield sub-optimal performance, attributable to their overly high reliance on black-box optimization and an oversight of domain knowledge. This paper demonstrates GPTuner, a manual-reading database tuning system that leverages Large Language Model (LLM) to bridge the gap between black-box optimization and white-box domain knowledge. This demonstration empowers (1) regular users with limited tuning experience to gain qualitative insights on the features of knobs, and optimize their DBMS performance automatically and efficiently, (2) database administrators and experts to further enhance GPTuner by simply contributing their invaluable tuning suggestions in natural language. Finally, we offer visitors the opportunity to explore a range of DBMS and optimization metrics, coupled with the flexibility to tailor their target workloads to their specific needs.},
booktitle = {Companion of the 2024 International Conference on Management of Data},
pages = {504–507},
numpages = {4},
keywords = {bayesian optimization, database tuning, large language model},
location = {Santiago, Chile},
series = {SIGMOD/PODS '24}
}