The Kingfisher: Market Quality
Previously featured in the FTX November 2020 digest
The health and maturity of a market can’t be estimated only through price and volume. Its actors and their relationships have a significant impact on the direction an industry tends to take. We’ll consider the exchanges, major market makers, and common market participants and how their interactions are shaping the crypto-markets.
In this article, we’ll go through the physical limitations of exchanges’ matching engines, how these impact the broader crypto market, and a few propositions that could make crypto a fairer and more efficient market.
The physical limits of trading
For thousands of years, humans have worked to make their markets more resilient, more efficient and ever faster. From logistics networks to online payment processing systems, the hunt for latency reduction is as old as markets themselves. What happens when an industry faces a roadblock it can’t push through? The discovery of silicon and its applications allowed us to push the speed of information propagation close to the speed of light. We can’t make it any faster.
Having worked on the Euronext matching engine, I gained an understanding of how to get the most out of hardware’s physical limits and develop faster Matching Engines (ME).
The ME is the heart of any exchange. A highly trained marathon runner has a slow but strong heartbeat; every beat is optimized to bring nutrients to tissues and muscles as efficiently and quickly as possible. Likewise, the ME is responsible for resolving trading operations efficiently and quickly, while making sure the resulting information is relayed to the correct part of the exchange’s infrastructure in a reliable and resilient way. ME latency goes down year after year thanks to the hard work of exchanges’ teams, while some old leaders grow complacent and rest on their laurels, losing more and more customers every day.
“If you don’t go up, you go down”
Let’s go down in order to go up: CPU 101
The mid-2000s marked the end of the era, often associated with Moore’s law, when upgrading to higher-frequency hardware brought automatic performance enhancements. Instead, the industry turned to a model where performance gains come from adding more execution units (cores). But leveraging multicore architectures requires extra development and testing efforts, and a naive approach of adding more threads often falls short of scalability expectations. Why?
“One Central Processing Unit (CPU) is really good at one task, many CPUs aren’t good as a team by default”.
A CPU is composed of a processing unit and a memory unit. A CPU core executes instructions at the pace of its clock frequency. For example, on a 3 GHz CPU one clock cycle lasts about 0.333 ns (1 sec = 1 000 000 000 ns), so an instruction that completes in a single cycle takes 0.333 ns. The semiconductor technology (measured in nanometers) determines the physical boundaries.
The reality is much more complex. This estimate assumes that instructions and data are already stored in the CPU’s physical registers, which run at the same speed as the CPU clock. In reality, instructions and data are stored in a multilayer cache memory architecture (L1/L2/L3). The physically closer a cache sits to the processing unit on the chip, the faster and smaller it is, so the L1 cache is the fastest and the smallest. Figure 1 shows the multicore CPU microarchitecture and the corresponding size/speed of each cache level.
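The arithmetic behind that figure can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and it assumes the idealized case of one instruction per clock cycle.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Duration of one clock cycle, in nanoseconds."""
    return 1e9 / frequency_hz


def cycles_to_ns(cycles: int, frequency_hz: float) -> float:
    """Wall-clock time, in nanoseconds, spent on a given number of cycles."""
    return cycles * cycle_time_ns(frequency_hz)


if __name__ == "__main__":
    # One cycle on a 3 GHz core, rounded to three decimals.
    print(round(cycle_time_ns(3e9), 3))
```

The same helper converts any cost quoted in cycles into wall-clock time, which is handy when comparing hardware penalties with the latencies a trader actually observes.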
To understand what makes latency long and cycles disappear, we have to understand the fundamentals of the Universal Scalability Law applied to the multicore CPU system. The main parameters that impact scalability are:
• Core resource contention: caused by hyperthreading, for example. Several software threads want to access the same execution units simultaneously, so processing is serialized and prioritized.
• Cache memory contention: it occurs when a core contends for its own private cache memory (L1/L2), for instance after an operating system context switch, or when the problem size is too big to fit in the private cache. This phenomenon is called cache thrashing. On multicore systems, cache thrashing also occurs on the L3 cache (or LLC, last level cache) shared between the cores. The impact is several hundred nanoseconds of lost latency. The solution would be intelligent software that improves data locality and in-cache processing, which also improves memory bandwidth utilization while keeping the CPU cores busy with local computation; this is a very difficult problem, and there is no easy solution today.
• Cache coherency: a hardware mechanism that allows seamless core-to-core communication whenever there is software synchronization (mutex, barrier, etc.) or both cores access the same memory area. A single cache-coherency operation (one cache line) costs at least 600 cycles.
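To make the effect of these parameters concrete, here is a minimal sketch of the Universal Scalability Law in its usual form, C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α stands for contention and β for coherency. The coefficient values are illustrative assumptions, not measurements from any real matching engine.

```python
def usl_speedup(n_cores: int, alpha: float, beta: float) -> float:
    """Relative throughput on n_cores under the Universal Scalability Law.

    alpha models contention (work that gets serialized), beta models
    coherency (core-to-core synchronization). Both are assumed values.
    """
    return n_cores / (1 + alpha * (n_cores - 1) + beta * n_cores * (n_cores - 1))


def best_core_count(alpha: float, beta: float, max_cores: int = 256) -> int:
    """Core count that maximizes throughput for the given coefficients."""
    return max(range(1, max_cores + 1), key=lambda n: usl_speedup(n, alpha, beta))


if __name__ == "__main__":
    ALPHA, BETA = 0.05, 0.001  # hypothetical coefficients
    for n in (1, 8, 32, 64, 128):
        print(n, round(usl_speedup(n, ALPHA, BETA), 2))
    print("throughput peaks at", best_core_count(ALPHA, BETA), "cores")
```

Because the coherency term grows with N(N − 1), throughput peaks and then declines: past a certain core count, adding hardware makes the system slower rather than faster.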
Load experienced by matching engines
Taking into account CPUs’ limitations, exchanges have to manage their operations. First, exchanges have to manage their traders’ positions. That exposes them to liquidity risks when mass liquidations occur during highly volatile market episodes. Exchanges tend to manage the liquidation of those positions in ways that avoid ripple effects on other market participants when mass liquidations happen. They also need to enforce market rules and backlog trades, to ensure the resiliency and consistency of their services.
The 99th percentile is the latency below which 99% of queries complete within a given time interval for a given task; only the slowest 1% exceed it. A resilient and efficient piece of software is judged by its 99th percentile, which becomes the advertised latency threshold. Exchanges’ matching engines, regardless of how well coded they are, will always be limited by those latencies, which limits the total transaction throughput available to their users. In effect, every exchange has a capped number of transactions per second.
This adds up to a couple of thousand operations per matching instruction for the ME to manage. Exchanges also face business constraints related to serving many instruments and handling millions of messages per second.
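The 99th percentile mentioned above can be computed with the nearest-rank method, sketched below. The sample latencies are made up for illustration; a production system would more likely rely on a streaming histogram.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical latencies in microseconds: mostly fast, two slow outliers.
    latencies_us = [50] * 98 + [5000] * 2
    print("median:", percentile(latencies_us, 50))
    print("p99:", percentile(latencies_us, 99))
    print("mean:", sum(latencies_us) / len(latencies_us))
```

The median and the mean both hide the outliers, while the 99th percentile exposes them, which is why it is the figure that matters for an advertised latency threshold.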
How to achieve scalability and stability from an architecture point of view?
The ME has a highly interconnected computational workflow: it combines data from multiple sources, and the frequency of communication between modules and their inter-dependencies are high and dynamic. Thread synchronization and scheduling need to be done dynamically and quickly.
Reactive software is a design philosophy and a paradigm shift that encompasses building both large-scale reactive microservices and fine-grained reactive applications (one process). Based on an asynchronous message-passing design, a plethora of concurrent programming models allow for building reactive software from the ground up. The actor model is one such battle-tested programming model.
Actors are a very efficient concept, supporting the whole development to production lifecycle. By being directly mapped to functional concepts, actors shorten the distance between business and functional architectures; they encapsulate the logic at a level granular enough for splitting work between developers; they are directly usable concepts for testing; and they allow administrators to decide the topology dynamically, based on available hardware and application load.
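As a rough illustration of the actor idea, the sketch below gives each actor a private mailbox and a single thread that handles one message at a time, so actor state is never shared. The class names (`Actor`, `VolumeTracker`, `Journal`) are hypothetical and chosen only for this example; a real engine would use a dedicated actor runtime rather than plain Python threads.

```python
import queue
import threading


class Actor:
    """Owns private state and processes its mailbox one message at a time."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous delivery: enqueue and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Handle everything already queued, then shut down."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Journal(Actor):
    """Keeps a record of every event it is told about."""

    def __init__(self):
        self.entries = []  # state is set before the thread starts
        super().__init__()

    def receive(self, message):
        self.entries.append(message)


class VolumeTracker(Actor):
    """Accumulates traded quantity per instrument and notifies the journal."""

    def __init__(self, journal):
        self.volume = {}
        self.journal = journal
        super().__init__()

    def receive(self, message):
        symbol, quantity = message
        self.volume[symbol] = self.volume.get(symbol, 0) + quantity
        self.journal.send(("recorded", symbol, quantity))


def run_demo():
    journal = Journal()
    tracker = VolumeTracker(journal)
    for msg in [("BTC-PERP", 2), ("ETH-PERP", 5), ("BTC-PERP", 3)]:
        tracker.send(msg)
    tracker.stop()  # every trade is handled and forwarded before this returns
    journal.stop()
    return tracker.volume, journal.entries


if __name__ == "__main__":
    print(run_demo())
```

No locks appear in the business logic: the only synchronization point is the mailbox, which is what makes it possible to decide the topology at deployment time.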
The ME receiving those operations distributes the required computations between the available CPU cores. If a core is full, the data is sent either to an upper cache level, to another socket on the motherboard, or, in the worst case, back to the RAM and on to other hardware resources (this has the highest cost because it needs to reach other components).
Every single time cores have to sync to balance load, they pay the “latency fee”. The more cores you add, the more the data can be spread out, but the more latency fees you pay. This fee grows much faster than the core count. This is mostly why exchanges can’t really scale just by buying more servers, and also why they can be victims of their own success. The critical point of latency is reached when entropy kicks in: data is spread across many cores, which end up just making promises to each other and waiting for data before they can execute instructions.
The strategy used by engineers is to dedicate each core to one specific role and never let it do anything else, make sure it never gets too much data or too many instructions, and avoid as much as possible any sync and randomness from the inputs. To achieve this, exchanges have to spend money on research and/or invest in software that manages memory allocation and core control. Above all, they need rigorous engineers who do not waste a single cycle of CPU time.
The exchanges, and the market as a whole, find themselves in a dilemma: increasing the load comes at the cost of making their speed and latency limitations more apparent.
From the sell side (exchanges) point of view, you spend a lot of money developing solutions to better handle entropy. You need talented developers who understand the latency dilemma, including the entropy generated by a growing customer base (trading software, algo traders, market makers). At some point, the design hits the limits set by physics. Because any mistake stacks up and can quickly become painful, development capacity ends up being limited until innovation is killed by the fear of creating new latency costs. Exchanges eventually have to reduce the risk of ME malfunctions caused by what amounts to a Distributed Denial of Service carried out by legitimate market participants.
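One common way to apply this strategy, sketched here under assumptions of our own, is to give every instrument a single owning core so that no two cores ever need to synchronize on the same order book. The function names and the choice of CRC32 as the hash are illustrative, not a description of any particular exchange.

```python
import zlib


def core_for(symbol: str, n_cores: int) -> int:
    """Stable mapping from an instrument to the one core that owns it.

    zlib.crc32 is used instead of the built-in hash() because its result
    does not change between interpreter runs.
    """
    return zlib.crc32(symbol.encode("utf-8")) % n_cores


def partition(orders, n_cores: int):
    """Split an order stream into per-core queues, keyed by instrument.

    Arrival order is preserved inside each queue.
    """
    queues = [[] for _ in range(n_cores)]
    for order in orders:
        queues[core_for(order["symbol"], n_cores)].append(order)
    return queues


if __name__ == "__main__":
    stream = [
        {"symbol": "BTC-PERP", "qty": 1},
        {"symbol": "ETH-PERP", "qty": 4},
        {"symbol": "BTC-PERP", "qty": 2},
    ]
    for core, q in enumerate(partition(stream, 4)):
        print(core, q)
```

The trade-off is that a very busy instrument cannot be spread over several cores, so the busiest order book sets the ceiling for its core.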
To mitigate those risks, they apply rate limits and prioritize the instructions that bring them the best business opportunities. CPU and latency are a matter of physics and math. Research, rigor and notions of “fairness” are left to human discretion. If you slow everyone down, you make the market more inefficient for the majority of traders. It then gives an advantage to those with resources (low-level coding engineers, capital to deploy, research teams, volume). Another point worth mentioning: for legal reasons, each operation has to be saved, even if it is only to cancel the order a couple of milliseconds later. HFT (high-frequency trading) firms put a high load on exchanges, with mostly wasted resources, at the cost of the exchange.
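A rate limit of the kind described above is often implemented as a token bucket. The sketch below is one possible version, not any exchange’s actual implementation; the class name, the parameters, and the use of explicit timestamps (to keep the example deterministic) are all assumptions.

```python
class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # timestamp of the previous call, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits the budget."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=10, capacity=5)
    burst = [bucket.allow(now=0.0) for _ in range(8)]
    print(burst)                  # the burst is only partly accepted
    print(bucket.allow(now=0.5))  # accepted again after a refill period
```

Giving different participants different `rate_per_sec` and `capacity` values is exactly where the human notion of “fairness” enters the picture.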
On the buy side (customers’ point of view), an unlimited ME allows for better strategies, more competition, more efficient markets, better arbitrage opportunities between exchanges, and a fairer market for low volume traders that can access liquidity at the same pace as everyone else.
When the number of customers increases, the ME slows down and excludes most traders. Buying servers in the geographic vicinity of the exchanges’ servers allows certain market participants to send specific price patterns, which is of particular interest from a regulatory and exchange viewpoint. During high volatility periods, HFT firms will mostly consume competitors’ liquidity and absorb the non-HFT passive orders that make it through the order flow. Takers tend to act as informed traders, and their order flow can be considered toxic for other participants.
One example, which most traders have already experienced, is an HFT strategy that places and cancels orders at a rate of 6,000 orders per second. Each quote reportedly has a life shorter than 1.6 millionths of a second, the amount of time it takes for light to travel about 1,500 feet. Anyone further than half that distance has no chance of executing against these quotes. This is a commonly used strategy called quote stuffing, made possible because a market participant is in the same datacenter as the exchange. Today this leads to large hidden spreads, created when aggressive orders hit the exchange.
Questions arise about market rules that only the sell side can apply, whether through incentives from regulators or through self-regulation, to improve the quality of its services. Supporters of HFT will say that it improves liquidity by reducing the spread and by reducing the possibility of other malicious strategies, like spoofing.
Others will say that such speed is unfair. Non-HFTs find themselves sandwiched between rate limits, statistical arbitrage strategies, and ghost quotes, which reduces the chances for small participants to win. On top of highly skilled participants, new products like altcoin quantos have emerged in recent years and are the most concerning at the time of writing. The ICO craze, and most of the writing and education about the liquidity trap of the cryptocurrency market, have led traders to prefer leveraged products with improved liquidity over low-liquidity spot coins/tokens, as detailed by Clarens Caraccio in the June FTX digest.
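The distance figure can be checked with a short calculation. The 1.6 microsecond lifetime is taken from the text as given rather than derived from the order rate, and the calculation assumes light in a vacuum; signals in optical fibre or copper travel more slowly, so the real reachable distance is shorter.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, in a vacuum
METERS_PER_FOOT = 0.3048              # exact by definition


def light_distance_feet(seconds: float) -> float:
    """Distance light covers in a vacuum during `seconds`, in feet."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METERS_PER_FOOT


if __name__ == "__main__":
    quote_life_s = 1.6e-6  # lifetime quoted in the text
    one_way = light_distance_feet(quote_life_s)
    print(round(one_way), "feet one way")
    # A reply has to travel out and back, hence the "half that distance" limit.
    print(round(one_way / 2), "feet for a round trip")
```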
Being long “Inverse” is riskier than being “short”. As the price goes down, the position accumulates more and more delta and puts the trader at a greater risk of liquidation. A new wave of traders is turning to highly leveraged products with the same low underlying liquidity on spot, but this time exposed to negative gamma on “long” positions when using non-USD collateral.
Market makers seek neutral positions, and their rate of taker orders depends mostly on their inventory. When volatility rises, and without proper incentives, they won’t bother executing an optimal liquidation of their inventory with controlled intensity in weak limit order books. They remove liquidity when facing informed traders to avoid potential losses themselves. Leveraged participants end up getting their positions taken out by the exchange’s risk engine just because liquidity has been withdrawn from the order books of the underlying index.
The Bitcoin and altcoin markets aren’t regulated by any generic “fair rules” regarding latency, spreads, collateral requirements or fee structure. But all exchanges apply rate limits based on volume. These limits are designed mostly to protect matching engines against high load and to keep the cryptocurrency market efficient.
The rise of new tools to trade quickly within these limits and capture the available liquidity allows non-systematic participants to capture these inefficiencies too, with faster execution when trading against market makers, but this puts them in direct conflict with those market makers. Quoting low-liquidity caps is tedious work that needs incentives. Now that market makers also get hunted, they will complain about the 99th percentile degradation and use it as an argument against the matching engine’s latency.
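The asymmetry can be illustrated with the payoff commonly used for inverse contracts, where a USD notional is settled in the collateral coin. The article does not spell this formula out, so treat it as an assumption, along with the function names and the example prices.

```python
def inverse_long_pnl_coin(notional_usd: float, entry: float, exit_price: float) -> float:
    """PnL of a long inverse position, expressed in the collateral coin.

    Assumes the usual inverse payoff: notional * (1/entry - 1/exit).
    """
    return notional_usd * (1 / entry - 1 / exit_price)


def inverse_long_delta_coin(notional_usd: float, price: float) -> float:
    """Sensitivity of the coin PnL to a 1 USD price move at `price`."""
    return notional_usd / price ** 2


if __name__ == "__main__":
    NOTIONAL, ENTRY = 10_000, 10_000  # hypothetical position
    for exit_price in (11_000, 9_000, 7_000, 5_000):
        pnl = inverse_long_pnl_coin(NOTIONAL, ENTRY, exit_price)
        delta = inverse_long_delta_coin(NOTIONAL, exit_price)
        print(exit_price, round(pnl, 4), f"{delta:.2e}")
```

A 10% drop costs more coin than a 10% rise earns, and the sensitivity keeps increasing as the price falls; that concavity is the negative gamma referred to above.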
A few thoughts about how to solve these dilemmas
A point on how exchanges incentivize market makers: in legacy markets, offering outsized rebates was the industry’s first incentive program to implement a depth-of-book requirement. Rewarding executed volume in accordance with market quality rules will align the interests of market makers and exchanges, and improve the overall trading environment for investors and all other marketplace participants.
We suggest the following points as ways to further improve the market’s quality, exchanges’ execution and their MEs’ performance.
Markets will always have subtle pitfalls. Some of them created the centralised world we know today, the one we are trying to disrupt. Let’s not reproduce the errors of the past for the sake of ideology, and let’s try to build the best possible world together, with best practices in mind. The higher the quality of a market, the lower the explicit costs for investors, which ensures a better quality of execution.