Bricks-use: MCP server for running Databricks SQL queries and comparing tables

4 months ago 2

A powerful Model Context Protocol (MCP) server for executing Databricks SQL queries and comparing table data.

⚠️ This project is purely meant for demo purposes - use at your own risk!

Features
Quick Start
Setup
Tools
VS Code Integration
Usage Examples
License

Feature Description

Execute SQL Queries	Run any SQL query on Databricks with configurable result limits
Table Information	Get detailed information about tables including schema and row counts
Table Comparison	Compare two tables by downloading their data and running CLI diff
Quick Comparison	Fast metadata-only comparison of tables

# 1. Clone and setup git clone https://github.com/aymenfurter/bricks-use.git cd bricks-use python -m venv .venv && source .venv/bin/activate # 2. Install dependencies pip install -r requirements.txt # 3. Configure environment (see setup section) cp .env.example .env # Edit with your credentials # 4. Run the server python databricks_server.py

Python	3.11 or higher
Databricks	Workspace access
Token	Personal access token

Set the following environment variables or create a .env file:

# Databricks Configuration DATABRICKS_SERVER_HOSTNAME=your-workspace.cloud.databricks.com DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/your-warehouse-id DATABRICKS_ACCESS_TOKEN=your-personal-access-token # Optional Settings DATABRICKS_CATALOG=main # Defaults to 'main' DATABRICKS_SCHEMA=default # Defaults to 'default' DATABRICKS_TEMP_DIR=/tmp/databricks_mcp # Temp directory

Create and activate a virtual environment:

python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate
Install dependencies:

pip install -r requirements.txt
Run the server:

python databricks_server.py

Tool Purpose Key Parameters

execute_query	Execute SQL queries	query, limit
get_table_info	Get table metadata	table_name, catalog, schema
compare_tables	Full data comparison	table1, table2, diff_lines
quick_compare_tables	Metadata comparison	table1, table2

Execute a SQL query on Databricks.

Parameters:

query (str): SQL query to execute
limit (int, optional): Maximum rows to return (default: 1000)

Example:

Execute this query: SELECT * FROM my_table WHERE status = 'active'

Get information about a Databricks table.

Parameters:

table_name (str): Name of the table
catalog (str, optional): Catalog name
schema (str, optional): Schema name

Example:

Get info for table 'users' in catalog 'production' and schema 'analytics'

Compare data between two tables by downloading full data and running diff.

Parameters:

table1 (str): First table name
table2 (str): Second table name
catalog1 (str, optional): Catalog for table1
schema1 (str, optional): Schema for table1
catalog2 (str, optional): Catalog for table2
schema2 (str, optional): Schema for table2
diff_lines (int, optional): Number of diff context lines (default: 10)

Example:

Compare tables 'users_old' and 'users_new' and show differences

Quick metadata-only comparison without downloading data.

Parameters:

table1 (str): First table name
table2 (str): Second table name
catalog1 (str, optional): Catalog for table1
schema1 (str, optional): Schema for table1
catalog2 (str, optional): Catalog for table2
schema2 (str, optional): Schema for table2

Example:

Quick compare 'table_a' and 'table_b' schemas and row counts

Add this configuration to your VS Code settings (mcp.json):

Click to expand VS Code configuration

{ "inputs": [ { "type": "promptString", "id": "databricks_server_hostname", "description": "Databricks Server Hostname" }, { "type": "promptString", "id": "databricks_http_path", "description": "Databricks HTTP Path" }, { "type": "promptString", "id": "databricks_access_token", "description": "Databricks Access Token", "password": true }, { "type": "promptString", "id": "databricks_catalog", "description": "Databricks Catalog (default: main)" }, { "type": "promptString", "id": "databricks_schema", "description": "Databricks Schema (default: default)" } ], "servers": { "databricks": { "command": "python", "args": [ "${workspaceFolder}/databricks_server.py" ], "env": { "PYTHONUNBUFFERED": "1", "DATABRICKS_SERVER_HOSTNAME": "${input:databricks_server_hostname}", "DATABRICKS_HTTP_PATH": "${input:databricks_http_path}", "DATABRICKS_ACCESS_TOKEN": "${input:databricks_access_token}", "DATABRICKS_CATALOG": "${input:databricks_catalog}", "DATABRICKS_SCHEMA": "${input:databricks_schema}" }, "workingDirectory": "${workspaceFolder}" } } }

Execute this SQL query: SELECT customer_id, order_date, total_amount FROM orders WHERE order_date >= '2024-01-01' LIMIT 100

Compare tables 'sales_2023' and 'sales_2024' and show me the differences

Quick compare the schemas of 'old_users' and 'new_users' tables

This project is licensed under the MIT License.

Made with ❤️ for the Databricks community

Read Entire Article