Analysing FIX Data with ClickHouse


The Financial Information eXchange (FIX) Protocol is a standard that is used extensively within the capital markets industry to exchange information between trading counterparties.

It is based around a series of known FIX messages which can be used for purposes such as placing new trades, amending open orders, exchanging market data, and for post trade processing.

For example, if a company wishes to electronically place a buy order for a stock, their systems could send a FIX message to their broker which includes details such as the symbol of the stock they wish to trade, the quantity, and the limit price. The broker would then execute the trade, perhaps involving another FIX message being routed to a stock exchange, before sending a confirmation message back to the originator to confirm the order and the price it was filled at.

A FIX message to place a new 100 lot order for Google shares (tag 55) might look like this:

8=FIX.4.2|9=178|35=D|49=CLIENT1|56=BROKER1|34=1|52=20250526-15:30:00.000|11=ORD123456|21=1|55=GOOGL|54=1|38=100|40=2|44=2795.00|59=0|10=235|

Once the order is executed, the broker could respond with an execution report showing that the full 100 lot order has been filled (tag 14, the cumulative quantity) at an average price of $2795.00 (tag 6).

8=FIX.4.2|9=196|35=8|49=BROKER1|56=CLIENT1|34=2|52=20250526-15:30:01.000|37=EXEC7890|11=ORD123456|17=EX12345|150=0|39=2|55=GOOGL|54=1|38=100|14=100|6=2795.00|31=2795.00|32=100|151=0|10=072|

As you can see, each FIX message is a string that is made up of a series of numeric tags and values, each of which has a specific meaning. For instance, tag 8 is the protocol version, 35 is the message type, 49 is the sender, and 52 is the timestamp. By agreeing on the meaning of these tags, the FIX protocol allows for the exchange of information between different counterparties in a standardised way.
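To make the tag/value structure concrete, here is a minimal Python sketch (not from any particular FIX library) that parses a pipe-delimited FIX message into a dictionary of tags and values:

```python
# Minimal sketch: parse a pipe-delimited FIX message into a
# tag -> value dictionary.

def parse_fix(message: str, delimiter: str = "|") -> dict:
    """Split a FIX message into its numeric tags and their values."""
    fields = message.strip(delimiter).split(delimiter)
    return dict(field.split("=", 1) for field in fields)

order = parse_fix(
    "8=FIX.4.2|9=178|35=D|49=CLIENT1|56=BROKER1|34=1|"
    "52=20250526-15:30:00.000|11=ORD123456|21=1|55=GOOGL|"
    "54=1|38=100|40=2|44=2795.00|59=0|10=235|"
)
print(order["35"], order["55"], order["38"])  # D GOOGL 100
```

This is essentially the same decomposition we will later perform inside ClickHouse using its string and array functions.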

How FIX Data Is Used

At a minimum, businesses have to safely retain and archive their FIX messages for audit, regulatory and compliance purposes. If there is an issue with a trade, it is important to be able to see the exact FIX messages that were sent and received in order to understand what happened some time after the fact.

The same data can also be used for valuable analytics purposes. For instance, it may be useful to understand simple metrics such as trade volumes or more complex insights relating to trading opportunities. There may also be compliance use cases such as identifying market abuse or wash trades which may be apparent in the FIX logs.

For all of these use cases, it is important that data is easy to query, search, analyse and join. If FIX data is simply written to a text file or stored in an inappropriate database then it is very hard to extract value from. I have myself spent a lot of time searching through text-based log files trying to recreate a picture of what happened with a given set of trades, and it is far from ideal!

Why ClickHouse Is A Good Fit For This Data

As ClickHouse is growing in prominence within capital markets, businesses such as banks, hedge funds, exchanges, brokers and trading technology vendors are increasingly interested in using it to store and analyse their FIX related data. Fortunately, they will find that ClickHouse is a natural fit for this use case.

FIX data is inherently high volume. A single bank could be exchanging millions of messages per day as they process huge numbers of orders with their various counterparties. Even vanilla trading use cases could generate significant data volumes, whilst in the realm of algorithmic and high frequency trading, the number of messages could be vast. Over a year, we could be in the tens to hundreds of terabytes scale from FIX messages alone. ClickHouse is designed for these types of data volumes and can compress data very well. It also has the ability to store multiple replicas of data and to migrate data to cold storage over time, making it an excellent fit for vanilla archival and retrieval.

FIX data tends to be time series and event based, meaning it can be appended to a single table without the need for significant denormalisation. This is the type of data which ClickHouse is really optimised for, as it can be stored in one big table, partitioned by time, and continually appended to. Specialised table engines optimised for this kind of log data are also available.

Electronic trading is also an inherently real time use case. Users such as traders and risk managers, and external systems such as post trade processing or compliance systems, increasingly need visibility of FIX logs in near real time. ClickHouse gives us the ability to rapidly ingest new data and then immediately query and expose it, making it an excellent fit here too.

The final thing I would flag is the fact that FIX messages are essentially strings of key value pairs. ClickHouse has a number of powerful string and array processing functions built in, making it very easy to work with FIX data prior to its normalisation without any external processing.

About This Example

We wanted to show a simple example of using ClickHouse to store and analyse FIX data to demonstrate some techniques for working with it efficiently.

In this example, we have a trading technology vendor who are capturing trade data between various banks. These banks are trading a mixture of 10 different stocks, the prices of which are changing over time.

Though most trades are successful, in some instances a counterparty can reject a trade, perhaps due to an infrastructure problem or breached trading limits. This could end up impacting the sender's profit and loss (PnL) positively or negatively as the market price changes.

Our task in this example is to understand and quantify the financial impact of these rejections by interrogating the FIX logs. We will perform some analysis such as splitting this by counterparty and investigating the cumulative loss through the trading day.

Dataset

We used the SimpleFix Python library for generating a fictitious set of FIX logs representing buy/sell transactions between 10 banks, which we saved in a ClickHouse table called fix.log. For simplicity, we considered only two types of messages:

  • New Order - Single (MsgType = ‘D’): A trade request is sent from one bank to another to buy or sell a stock at market price;

  • Confirmation (MsgType = ‘AK’): The bank that received the request either confirms or rejects the trade.

We generated both the stock prices and the FIX messages at one-second intervals. We assumed that the difference between the time when the trade request is sent, the time when the trade is executed (if confirmed), and the time when the trade confirmation is received is only a few milliseconds, and that the time lag between the trade request and the trade confirmation has no impact on the market price at which the trade is executed. A preview of the raw data is shown below:

SELECT UID, Timestamp, LEFT(Message, 50) FROM fix.log LIMIT 10

Query id: e13cec5b-1483-47c0-a3ba-af7063e1ad69

┌─UID─┬───────────Timestamp─┬─left(Message, 50)──────────────────────────────────┐
 1. │   1 │ 2024-03-08 09:38:45 │ 8=FIX.4.4|35=D|49=BNP Paribas|56=Goldman Sachs|34= │
 2. │   2 │ 2024-03-08 09:38:45 │ 8=FIX.4.4|35=AK|49=Goldman Sachs|56=BNP Paribas|34 │
 3. │   3 │ 2024-03-08 09:38:46 │ 8=FIX.4.4|35=D|49=BNP Paribas|56=Bank of America|3 │
 4. │   4 │ 2024-03-08 09:38:46 │ 8=FIX.4.4|35=AK|49=Bank of America|56=BNP Paribas| │
 5. │   5 │ 2024-03-08 09:38:47 │ 8=FIX.4.4|35=D|49=Morgan Stanley|56=Bank of Americ │
 6. │   6 │ 2024-03-08 09:38:47 │ 8=FIX.4.4|35=AK|49=Bank of America|56=Morgan Stanl │
 7. │   7 │ 2024-03-08 09:38:48 │ 8=FIX.4.4|35=D|49=Morgan Stanley|56=Nomura|34=2|52 │
 8. │   8 │ 2024-03-08 09:38:48 │ 8=FIX.4.4|35=AK|49=Nomura|56=Morgan Stanley|34=1|5 │
 9. │   9 │ 2024-03-08 09:38:49 │ 8=FIX.4.4|35=D|49=J.P. Morgan|56=BNP Paribas|34=1| │
10. │  10 │ 2024-03-08 09:38:49 │ 8=FIX.4.4|35=AK|49=BNP Paribas|56=J.P. Morgan|34=3 │
└─────┴─────────────────────┴────────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.003 sec. Processed 2.12 thousand rows, 484.37 KB (706.32 thousand rows/s., 161.15 MB/s.)
Peak memory usage: 907.05 KiB.

The transactions refer to 10 different stocks. We generated the stock prices from a random walk and saved them in a ClickHouse table called fix.prices whilst generating the test data. The price data has the following format:

SELECT * FROM fix.prices LIMIT 10

Query id: d897451f-a51c-44a5-881d-f785bf66e8b1

┌───────────Timestamp─┬─Ticker─┬─Price─┐
 1. │ 2024-03-08 09:38:45 │ GTO    │ 28.89 │
 2. │ 2024-03-08 09:38:45 │ KHU    │  8.82 │
 3. │ 2024-03-08 09:38:45 │ GZS    │ 98.57 │
 4. │ 2024-03-08 09:38:45 │ WKK    │ 77.46 │
 5. │ 2024-03-08 09:38:45 │ XUD    │ 19.92 │
 6. │ 2024-03-08 09:38:45 │ HXC    │ 56.27 │
 7. │ 2024-03-08 09:38:45 │ VUB    │ 38.77 │
 8. │ 2024-03-08 09:38:45 │ XLF    │  36.6 │
 9. │ 2024-03-08 09:38:45 │ BUA    │ 33.55 │
10. │ 2024-03-08 09:38:45 │ LZV    │  9.18 │
└─────────────────────┴────────┴───────┘

10 rows in set. Elapsed: 0.003 sec. Processed 8.20 thousand rows, 196.80 KB (2.92 million rows/s., 70.01 MB/s.)
Peak memory usage: 453.33 KiB.
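For illustration, a price series like this could be produced with a simple random walk along the following lines; the volatility and seed here are assumptions for the sketch, not the parameters we actually used:

```python
import random

# Hedged sketch of a random-walk price generator. The volatility (vol)
# and seed are illustrative choices, not the real generation parameters.

def random_walk_prices(start: float, steps: int, vol: float = 0.05, seed: int = 42) -> list:
    """Generate a random-walk price series, floored at a penny."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(steps - 1):
        # Each step adds Gaussian noise to the previous price.
        prices.append(max(0.01, round(prices[-1] + rng.gauss(0, vol), 2)))
    return prices

series = random_walk_prices(28.89, 5)
print(series)  # five prices starting at 28.89
```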

Data Preparation

We start by creating a ClickHouse materialized view called fix.messages where we extract the main fields from the raw FIX messages in the fix.log table using ClickHouse string and array processing functions. We also perform simple transformations, such as converting numeric codes into descriptive strings, to make downstream processing easier.

We implement this as a materialized view so that messages are broken down as each batch of records is inserted into the fix.log table. We chose Sender, Receiver as our ORDER BY key based on likely query patterns. As this system scales, we could make use of ClickHouse projections and further materialized views to pre-aggregate data and speed up queries. Introducing optimisations such as LowCardinality columns and different compression codecs would also be avenues for future exploration.

CREATE MATERIALIZED VIEW fix.messages
ENGINE = MergeTree()
ORDER BY (Sender, Receiver)
POPULATE AS
SELECT
    UID, Timestamp, Message, Protocol, MessageCode, MessageType,
    TradeID, TradeStatus, Sender, Receiver, Ticker, BuySell, Price, Quantity
FROM
(
    SELECT
        UID,
        Timestamp,
        Message,
        arrayMap(x -> splitByChar('=', x), splitByChar('|', Message)) AS MessageArray,
        arrayFlatten(arrayMap(x -> arraySlice(x, 1, 1), MessageArray)) AS Tag,
        arrayFlatten(arrayMap(x -> arraySlice(x, 2, 1), MessageArray)) AS Value,
        arraySlice(Value, indexOf(Tag, '35'), 1)[1] AS MessageCode,
        IF(MessageCode = 'AK', arraySlice(Value, indexOf(Tag, '665'), 1)[1], NULL) AS ConfirmStatus,
        arraySlice(Value, indexOf(Tag, '8'), 1)[1] AS Protocol,
        IF(MessageCode = 'D', 'New Order - Single',
           IF(MessageCode = 'AK', 'Confirmation', 'Other')) AS MessageType,
        arraySlice(Value, indexOf(Tag, '11'), 1)[1] AS TradeID,
        IF(ConfirmStatus = '4', 'Confirmed',
           IF(ConfirmStatus = '5', 'Rejected', NULL)) AS TradeStatus,
        arraySlice(Value, indexOf(Tag, '49'), 1)[1] AS Sender,
        arraySlice(Value, indexOf(Tag, '56'), 1)[1] AS Receiver,
        arraySlice(Value, indexOf(Tag, '55'), 1)[1] AS Ticker,
        IF(arraySlice(Value, indexOf(Tag, '54'), 1)[1] = '1', 'Buy', 'Sell') AS BuySell,
        toFloat64(IF(MessageCode = 'AK', arraySlice(Value, indexOf(Tag, '6'), 1)[1], NULL)) AS Price,
        toFloat64(IF(MessageCode = 'D',
                     arraySlice(Value, indexOf(Tag, '38'), 1)[1],
                     arraySlice(Value, indexOf(Tag, '80'), 1)[1])) AS Quantity
    FROM fix.log
);

Each trade consists of a New Order - Single message with MessageCode = D and a confirmation or rejection with MessageCode = AK. We create a view that joins the fix.messages table to itself to make it easier to see the status and outcome of different trades. Again we make use of a materialised view to calculate the results incrementally rather than perform an expensive join each time the view is interrogated.

CREATE MATERIALIZED VIEW fix.report
ENGINE = MergeTree()
ORDER BY (Sender, Receiver)
POPULATE AS
SELECT
    Timestamp, TradeID, TradeStatus, Sender, Receiver, Ticker,
    BuySell, Price, RequestedQuantity, AllocatedQuantity
FROM
(
    SELECT TradeID, Sender, Receiver, Ticker, BuySell, Quantity AS RequestedQuantity
    FROM fix.messages
    WHERE MessageCode = 'D'
) AS a
INNER JOIN
(
    SELECT Timestamp, TradeID, TradeStatus, Price, Quantity AS AllocatedQuantity
    FROM fix.messages
    WHERE MessageCode = 'AK'
) AS b ON a.TradeID = b.TradeID;

The FIX report table has the following format, combining elements of the New Order (RequestedQuantity, Ticker, BuySell) with elements of the Confirmation (TradeStatus, Price). This joined data gives us a perfect base for our subsequent analysis.

SELECT * FROM fix.report LIMIT 10

Query id: e954df22-a21b-487a-b598-ae5d2ba41388

┌───────────Timestamp─┬─TradeID─┬─TradeStatus─┬─Sender──────────┬─Receiver────────┬─Ticker─┬─BuySell─┬─Price─┬─RequestedQuantity─┬─AllocatedQuantity─┐
 1. │ 2024-03-08 09:38:45 │ 1 │ Rejected  │ BNP Paribas     │ Goldman Sachs   │ VUB │ Buy  │ 38.77 │ 108 │   0 │
 2. │ 2024-03-08 09:38:46 │ 2 │ Confirmed │ BNP Paribas     │ Bank of America │ GZS │ Buy  │ 98.81 │ 100 │ 100 │
 3. │ 2024-03-08 09:38:47 │ 3 │ Confirmed │ Morgan Stanley  │ Bank of America │ LZV │ Buy  │  7.48 │ 140 │ 140 │
 4. │ 2024-03-08 09:38:48 │ 4 │ Confirmed │ Morgan Stanley  │ Nomura          │ XLF │ Buy  │ 35.33 │  88 │  88 │
 5. │ 2024-03-08 09:38:49 │ 5 │ Confirmed │ J.P. Morgan     │ BNP Paribas     │ WKK │ Buy  │ 72.76 │ 105 │ 105 │
 6. │ 2024-03-08 09:38:50 │ 6 │ Confirmed │ Nomura          │ J.P. Morgan     │ XLF │ Buy  │ 36.08 │  65 │  65 │
 7. │ 2024-03-08 09:38:51 │ 7 │ Confirmed │ Bank of America │ Nomura          │ GZS │ Buy  │ 96.96 │ 118 │ 118 │
 8. │ 2024-03-08 09:38:52 │ 8 │ Confirmed │ Citi            │ J.P. Morgan     │ BUA │ Buy  │ 31.84 │  89 │  89 │
 9. │ 2024-03-08 09:38:53 │ 9 │ Confirmed │ Nomura          │ BNP Paribas     │ XLF │ Sell │ 37.11 │  65 │  65 │
10. │ 2024-03-08 09:38:54 │ 10 │ Confirmed │ BNP Paribas    │ J.P. Morgan     │ XLF │ Buy  │ 37.38 │ 141 │ 141 │
└─────────────────────┴─────────┴─────────────┴─────────────────┴─────────────────┴────────┴─────────┴───────┴───────────────────┴───────────────────┘

10 rows in set. Elapsed: 0.055 sec. Processed 4.25 thousand rows, 969.35 KB (77.17 thousand rows/s., 17.61 MB/s.)
Peak memory usage: 4.77 MiB.
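Conceptually, the join behind fix.report pairs each New Order ('D') with its confirmation ('AK') on TradeID. The following Python sketch illustrates the same pairing logic outside the database; the field names mirror the table's columns, and the sample data is invented:

```python
# Illustrative re-implementation of the fix.report join: pair each
# New Order ('D') with its confirmation ('AK') on TradeID.

def build_report(messages):
    # Index the order messages by their TradeID.
    orders = {m["TradeID"]: m for m in messages if m["MessageCode"] == "D"}
    report = []
    for m in messages:
        if m["MessageCode"] == "AK" and m["TradeID"] in orders:
            order = orders[m["TradeID"]]
            # Combine fields from the order and its confirmation.
            report.append({
                "TradeID": m["TradeID"],
                "TradeStatus": m["TradeStatus"],
                "Sender": order["Sender"],
                "Receiver": order["Receiver"],
                "RequestedQuantity": order["Quantity"],
                "AllocatedQuantity": m["Quantity"],
                "Price": m["Price"],
            })
    return report

msgs = [
    {"MessageCode": "D", "TradeID": "1", "Sender": "BNP Paribas",
     "Receiver": "Goldman Sachs", "Quantity": 108},
    {"MessageCode": "AK", "TradeID": "1", "TradeStatus": "Rejected",
     "Quantity": 0, "Price": 38.77},
]
report = build_report(msgs)
print(report[0]["TradeStatus"], report[0]["RequestedQuantity"])  # Rejected 108
```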

At this point we can visualize the number of trades that were confirmed or rejected for each bank to see who has been most impacted:

Data Analysis

We have observed that a large fraction of trades were rejected, which will likely have an impact on multiple banks' PnL. To quantify the impact of the rejected trades, we will aim to calculate two PnLs for each bank:

  • The expected PnL, which is calculated taking into account all trades;
  • The realized PnL, which is calculated taking into account only the confirmed trades.

We start by calculating the trade-level PnL, which we save in a ClickHouse view called fix.profit_and_loss. This view uses window functions to associate the correct buy transaction with each sell and combine them into a single row from which the PnL can be derived. For simplicity, we use as the buy price the price at which the last buy trade was executed before each sell trade.

CREATE OR REPLACE VIEW fix.profit_and_loss AS
SELECT
    Timestamp, Sender, Receiver, Ticker, BuyTradeStatus, SellTradeStatus,
    BuyQuantity, SellQuantity, BuyPrice, SellPrice,
    (SellQuantity * SellPrice - BuyQuantity * BuyPrice) AS PnL
FROM
(
    SELECT
        Timestamp, Sender, Receiver, Ticker,
        LAST_VALUE(BuyQuantity) OVER (PARTITION BY Sender, Ticker ORDER BY Timestamp
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS BuyQuantity,
        SellQuantity,
        LAST_VALUE(BuyTradeStatus) OVER (PARTITION BY Sender, Ticker ORDER BY Timestamp
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS BuyTradeStatus,
        SellTradeStatus,
        LAST_VALUE(BuyPrice) OVER (PARTITION BY Sender, Ticker ORDER BY Timestamp
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS BuyPrice,
        SellPrice
    FROM
    (
        SELECT
            Timestamp, Sender, Receiver, Ticker,
            IF(BuySell = 'Buy', TradeStatus, NULL) AS BuyTradeStatus,
            IF(BuySell = 'Sell', TradeStatus, NULL) AS SellTradeStatus,
            IF(BuySell = 'Buy', RequestedQuantity, NULL) AS BuyQuantity,
            IF(BuySell = 'Sell', RequestedQuantity, NULL) AS SellQuantity,
            IF(BuySell = 'Buy', Price, NULL) AS BuyPrice,
            IF(BuySell = 'Sell', Price, NULL) AS SellPrice
        FROM fix.report
        ORDER BY Sender, Ticker, Timestamp
    )
)
WHERE isNotNull(SellPrice);

The view has the following format, summarising the PnL for each trade alongside the BuyTradeStatus and SellTradeStatus which indicate which legs of the trade were rejected:

SELECT * FROM profit_and_loss LIMIT 10

Query id: 6770cbc9-4d37-4b21-8be7-4b0664a8ec29

┌───────────Timestamp─┬─Sender──────┬─Receiver────────┬─Ticker─┬─BuyTradeStatus─┬─SellTradeStatus─┬─BuyQuantity─┬─SellQuantity─┬─BuyPrice─┬─SellPrice─┬────────────────PnL─┐
│ 2024-03-08 09:39:37 │ BNP Paribas │ Morgan Stanley  │ BUA │ Confirmed │ Confirmed │ 105 │ 105 │ 32.08 │ 41.23 │             960.75 │
│ 2024-03-08 09:43:31 │ BNP Paribas │ Bank of America │ BUA │ Confirmed │ Rejected  │ 131 │ 131 │ 40.89 │ 49.19 │ 1087.2999999999993 │
│ 2024-03-08 09:47:16 │ BNP Paribas │ Deutsche Bank   │ BUA │ Confirmed │ Rejected  │ 106 │ 106 │  47.9 │ 57.18 │  983.6800000000003 │
│ 2024-03-08 10:15:30 │ BNP Paribas │ Citi            │ BUA │ Confirmed │ Confirmed │ 145 │ 145 │ 33.86 │ 46.65 │ 1854.5500000000002 │
│ 2024-03-08 09:42:33 │ BNP Paribas │ Goldman Sachs   │ KHU │ Confirmed │ Confirmed │  62 │  62 │ 16.29 │ 19.48 │  197.7800000000001 │
│ 2024-03-08 09:54:09 │ BNP Paribas │ Citi            │ KHU │ Confirmed │ Confirmed │  79 │  79 │ 15.64 │ 28.02 │             978.02 │
│ 2024-03-08 09:56:45 │ BNP Paribas │ Barclays        │ KHU │ Rejected  │ Confirmed │  64 │  64 │ 40.58 │ 41.36 │  49.92000000000007 │
│ 2024-03-08 09:58:27 │ BNP Paribas │ J.P. Morgan     │ KHU │ Confirmed │ Confirmed │  57 │  57 │ 27.42 │ 34.85 │             423.51 │
│ 2024-03-08 10:00:33 │ BNP Paribas │ Citi            │ KHU │ Confirmed │ Confirmed │ 135 │ 135 │ 29.29 │ 36.73 │ 1004.3999999999992 │
│ 2024-03-08 10:06:19 │ BNP Paribas │ Morgan Stanley  │ KHU │ Confirmed │ Rejected  │ 140 │ 140 │ 34.87 │ 42.18 │ 1023.4000000000005 │
└─────────────────────┴─────────────┴─────────────────┴────────┴────────────────┴─────────────────┴─────────────┴──────────────┴──────────┴───────────┴────────────────────┘

10 rows in set. Elapsed: 0.051 sec. Processed 4.25 thousand rows, 969.35 KB (83.09 thousand rows/s., 18.96 MB/s.)
Peak memory usage: 6.13 MiB.
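The core "last buy before each sell" rule of this view can be illustrated with a small Python sketch; the trades below are invented, and a fuller version would track a separate last buy per sender and ticker, as the SQL window functions do:

```python
# Illustrative sketch of the view's matching rule: each sell is paired
# with the most recent preceding buy, and
# PnL = SellQuantity * SellPrice - BuyQuantity * BuyPrice.

def trade_pnl(trades):
    """trades: (timestamp, side, quantity, price) tuples."""
    last_buy = None
    rows = []
    for ts, side, qty, price in sorted(trades):
        if side == "Buy":
            last_buy = (qty, price)  # remember the latest buy leg
        elif side == "Sell" and last_buy is not None:
            buy_qty, buy_price = last_buy
            rows.append((ts, qty * price - buy_qty * buy_price))
    return rows

print(trade_pnl([("09:38", "Buy", 100, 32.0), ("09:39", "Sell", 100, 41.0)]))
# [('09:39', 900.0)]
```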

(Note that at this point we have had to switch from materialised views to ordinary views, since materialised views do not support window functions.)

We can use this data to calculate the cumulative expected PnL and realized PnL of each bank, which we save in a ClickHouse view called fix.cumulative_profit_and_loss. This view works by calculating a time series for the realized PnL, and a time series for the PnL assuming all trades were successfully executed.

CREATE OR REPLACE VIEW fix.cumulative_profit_and_loss AS
SELECT
    Timestamp, Sender AS Bank, ExpectedPnL,
    IF(isNotNull(RealizedPnL), RealizedPnL, 0) AS RealizedPnL
FROM
(
    SELECT
        Timestamp, Sender, ExpectedPnL,
        IF(isNotNull(RealizedPnL), RealizedPnL,
           LAST_VALUE(RealizedPnL) OVER (PARTITION BY Sender ORDER BY Timestamp
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS RealizedPnL
    FROM
    (
        SELECT
            Timestamp, Sender,
            SUM(PnL) OVER (PARTITION BY Sender ORDER BY Timestamp
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ExpectedPnL
        FROM fix.profit_and_loss
    ) AS a
    LEFT JOIN
    (
        SELECT
            Timestamp, Sender,
            SUM(PnL) OVER (PARTITION BY Sender ORDER BY Timestamp
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RealizedPnL
        FROM fix.profit_and_loss
        WHERE BuyTradeStatus = 'Confirmed' AND SellTradeStatus = 'Confirmed'
    ) AS b ON a.Timestamp = b.Timestamp AND a.Sender = b.Sender
)
ORDER BY Sender, Timestamp;

This view has the following format:

SELECT * FROM fix.cumulative_profit_and_loss LIMIT 10

Query id: d57444ae-96ce-4675-92d8-cab6145566e5

┌───────────Timestamp─┬─Bank────────┬────────ExpectedPnL─┬────────RealizedPnL─┐
 1. │ 2024-03-08 09:39:04 │ BNP Paribas │  98.69999999999982 │                  0 │
 2. │ 2024-03-08 09:39:37 │ BNP Paribas │ 1059.4499999999998 │             960.75 │
 3. │ 2024-03-08 09:40:03 │ BNP Paribas │ 2323.8499999999995 │ 2225.1499999999996 │
 4. │ 2024-03-08 09:40:29 │ BNP Paribas │ 2583.0099999999993 │ 2225.1499999999996 │
 5. │ 2024-03-08 09:40:38 │ BNP Paribas │ 3350.6099999999997 │ 2225.1499999999996 │
 6. │ 2024-03-08 09:40:41 │ BNP Paribas │ 3874.5599999999986 │ 2225.1499999999996 │
 7. │ 2024-03-08 09:41:10 │ BNP Paribas │  5967.059999999999 │ 2225.1499999999996 │
 8. │ 2024-03-08 09:41:33 │ BNP Paribas │  6267.059999999999 │ 2525.1499999999996 │
 9. │ 2024-03-08 09:41:38 │ BNP Paribas │  7656.959999999999 │ 2525.1499999999996 │
10. │ 2024-03-08 09:42:08 │ BNP Paribas │            7862.49 │ 2525.1499999999996 │
└─────────────────────┴─────────────┴────────────────────┴────────────────────┘

10 rows in set. Elapsed: 0.145 sec. Processed 8.50 thousand rows, 1.94 MB (58.58 thousand rows/s., 13.37 MB/s.)
Peak memory usage: 14.85 MiB.
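The idea behind the cumulative view can be sketched in a few lines of Python: the expected series is a running sum of every trade's PnL, while the realized series only accumulates trades where both legs were confirmed (the values below are invented):

```python
from itertools import accumulate

# Sketch of the cumulative PnL calculation with invented values.

def cumulative_pnl(pnls):
    """pnls: list of (pnl, both_legs_confirmed) pairs in time order."""
    # Running sum over all trades.
    expected = accumulate(p for p, _ in pnls)
    # Running sum counting only fully confirmed trades.
    realized = accumulate(p if ok else 0.0 for p, ok in pnls)
    return list(zip(expected, realized))

print(cumulative_pnl([(100.0, False), (960.0, True), (300.0, True)]))
# [(100.0, 0.0), (1060.0, 960.0), (1360.0, 1260.0)]
```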

We can now visualize the expected PnL against the realized PnL over time to understand the impact of the rejections for each bank, in this case Goldman Sachs. We see that in just a few hours the relatively small deltas become significant.

Given ClickHouse's query performance, a similar chart could be rendered within an application, giving traders and risk managers real time intraday visibility and alerts on the impact of their trade rejections. The same data can be exposed to data analysts and data scientists after the fact to dig into the impacts as we have done here.

To finish, we can also derive the difference between the expected PnL and the realised PnL for each bank at the end of the period:

SELECT DISTINCT Bank, 'Expected' AS Type,
    LAST_VALUE(ExpectedPnL) OVER (PARTITION BY Bank ORDER BY Timestamp ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS PnL
FROM fix.cumulative_profit_and_loss
UNION ALL
SELECT DISTINCT Bank, 'Realized' AS Type,
    LAST_VALUE(RealizedPnL) OVER (PARTITION BY Bank ORDER BY Timestamp ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS PnL
FROM fix.cumulative_profit_and_loss
ORDER BY 1 ASC

Query id: 3e4b8add-22c9-4879-9bac-c5a3a41deb3f

┌─Bank────────────┬─Type─────┬────────────────PnL─┐
│ BNP Paribas     │ Expected │ 39731.450000000004 │
│ Bank of America │ Expected │ 28225.830000000005 │
│ Barclays        │ Expected │ 28722.180000000008 │
│ Citi            │ Expected │  35262.13000000001 │
│ Deutsche Bank   │ Expected │ 35576.069999999985 │
│ Goldman Sachs   │ Expected │ 29090.160000000007 │
│ J.P. Morgan     │ Expected │  39022.11000000001 │
│ Lloyds          │ Expected │ 25885.840000000007 │
│ Morgan Stanley  │ Expected │ 29346.030000000002 │
│ Nomura          │ Expected │ 26647.309999999998 │
└─────────────────┴──────────┴────────────────────┘
┌─Bank────────────┬─Type─────┬────────────────PnL─┐
│ BNP Paribas     │ Realized │ 22916.699999999993 │
│ Bank of America │ Realized │           15993.58 │
│ Barclays        │ Realized │ 16991.869999999995 │
│ Citi            │ Realized │ 14295.580000000002 │
│ Deutsche Bank   │ Realized │ 20882.629999999983 │
│ Goldman Sachs   │ Realized │ 11436.249999999998 │
│ J.P. Morgan     │ Realized │ 24969.880000000005 │
│ Lloyds          │ Realized │ 11774.170000000006 │
│ Morgan Stanley  │ Realized │           14847.54 │
│ Nomura          │ Realized │ 16172.450000000006 │
└─────────────────┴──────────┴────────────────────┘

20 rows in set. Elapsed: 0.028 sec. Processed 4.25 thousand rows, 296.35 KB (150.59 thousand rows/s., 10.51 MB/s.)
Peak memory usage: 601.77 KiB.

Visualised like so:

As expected, the large fraction of rejected trades has a substantial negative impact on the performance of each bank’s trading strategy.

Conclusion

Many capital markets businesses are finding that ClickHouse is a great fit for their use cases including working with high volume FIX data.

In this article we have demonstrated how we can use ClickHouse string and array processing functions to break down FIX messages and store them in materialised views which are incrementally updated as new FIX messages are ingested.

We demonstrated joining the resulting messages with market data and making use of window functions to tie related FIX messages together.

We used these techniques to calculate the PnL effects of trade rejections, including cumulative impact through the day and total impact for each bank at the end of the period.

We explained how this could be calculated in real time and exposed to users such as traders and risk managers through real time applications and dashboards, giving them immediate visibility of market activity which may previously have been delivered in batches or on end of day reports.
