A Lightweight Solution to Speed Up Queries by Dumping Data to Files

3 months ago 3

Traditional transactional databases (TP) do not good at handling analytical computations. Professional OLAP databases are too heavy and often need clustering, resulting in high costs and more complicated system structure.

Storing static historical data as a lightweight esProc SPL file of columnar format enables access to SPL’s powerful computing capabilities, producing far better performance than traditional databases generate. esProc SPL is lightweight, and can run by being directly embedded in the application. It can speed up queries on dumped data while maintaining a relatively simple system structure.

This article provides a series of practical SPL methods for speeding up external data querying:

Practice #1: Regular Filtering and Grouping & Aggregation

Practice #2: COUNT DISTINCT

Practice #3: Foreign-key-based Dimension Table Join

Practice #4: Large Primary-Subtable Join

Practice #5: EXISTS at Large Primary-Subtable Join

Practice #6: Conditional Filtering by Enumerated Field

Examples in these articles cover operations having performance issues that have greatly troubled traditional databases. Those include COUNT DISTINCT, foreign-key-based JOIN, large primary-subtable JOIN (including EXISTS), conditional filtering by enumerated field (including IN), to name a few. These articles will help you break database query performance bottlenecks.

Some preparation work is required before we start.

【Here】 we have csv files and a create table SQL file to simulate part of the data of a company’s offline orders and e-commerce activities. The goal is to create tables in MYSQL database and import data from the csv files into them.

Below is the relationship between the preceding csv files:

The event table (events) stores users’ actions on the e-commerce website in ten million rows and the following fields:

Fields Description Type Note
event_id Event ID Integer Primary key
user_id User ID Integer About one million IDs
event_time Time when an event happens Datetime Within the year 2025
event_type Event type Integer 1 represents login, 2 represents view, …, and 7 represents confirm

Orders table (orders) stores offline orders data in ten million rows:

Fields Description Type Note
order_id Order ID Integer Primary key
customer_id Customer ID Integer
employee_id Employee ID Integer
employee_name Employee name String
order_date Date when an order is placed Date Within the year 2024
shipper_id Shipper ID Integer
shipping_fee Shipping fee Numeric
shipping_address Shipping address String

Orders detail (details) stores offline orders details in thirty million rows:

Fields Description Type Note
order_id Order ID Integer Primary key
product_id Product ID Integer Primary key
quantity Order quantity Integer
price Unit Price Numeric

Customer table (customer) stores data of offline customers. The table has a relatively small amount of data:

Fields Description Type Note
customer_id Customer ID Integer Primary key
customer_name Customer name String
city_id City ID Integer

City table (city) stores data of cities where offline customers are based. The table has a relatively small amount of data:

Fields Description Type Note
city_id City ID Integer Primary key
city_name City name String
state_id State ID Integer

State table (state) stores data of states where offline customers come from. The table has a relatively small amount of data:

Fields Description Type Note
state_id State ID Integer Primary key
state_name State name String

Shipper table (shipper) stores data of offline shippers. The table has a relatively small amount of data:

Fields Description Type Note
shipper_id Shipper ID Integer Primary key
shipper_name Shipper name String

Download esProc. The Standard Edition is enough.

After installing esProc, check whether a database can be successfully accessed from IDE. Take MYSQL as an example. Put the database’s JDBC in "[installation directory]\common\jdbc", which is one of esProc’s classpaths:

To create MYSQL data source in esProc, select Tool ->Connect to Data Source on the menu, configure the standard MySQL JDBC connection named speed:

Return to the Datasoure window and connect to the configured data source just now. The setup is successful if the data source name turns pink.

Create a new script in IDE, write SPL statements, and connect to the database to load data from orders table through an SQL statement.

SPL code example 1:

A
1 =connect("speed")
2 =A1.query@x("select * from orders limit 100")

Press Ctrl-F9 or click the Execute button and then click A2, and there are 100 records listed to the right.

The SPL code is written in the grid, where a cell name can be directly used as a temporary variable.

Look up Tutorial and Function Reference if you have any problems.

The environment where our tests are performed comprises VMware virtual machine, 8-core CPU, 8G RAM, SSD drive, Windows 11 OS, MySQL 8.0, and esProc SPL Standard Edition.

Read Entire Article