The Thermodynamics of Trading

with Daniel Pontecorvo

Season 3, Episode 9   |   July 25th, 2025

BLURB

Daniel Pontecorvo runs the “physical engineering” team at Jane Street. This group blends architecture, mechanical engineering, electrical engineering, and construction management to build functional physical spaces. In this episode, Ron and Dan go deep on the challenge of heat exchange in a datacenter, especially in the face of increasingly dense power demands—and the analogous problem of keeping traders cool at their desks. Along the way they discuss the way ML is changing the physical constraints of computing; the benefits of having physical engineering expertise in-house; the importance of monitoring; and whether you really need Apollo-style CO2 scrubbers to ensure your office gets fresh air.

TRANSCRIPT

00:03

Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I’m Ron Minsky. It is my pleasure to introduce Dan Pontecorvo. Dan has worked here at Jane Street for about 13 years on our physical engineering team, and I think this is the thing our audience is not particularly conversant with. So maybe just to start off with, what is physical engineering and why does Jane Street have a physical engineering team?

00:26

Thanks for having me, Ron. I appreciate it. Yeah, I think physical engineering is a term I think we came up with here to represent a couple of different things, but really the team thinks about all of our physical spaces, be it data centers, offices, and the team’s really responsible for thinking about leasing spaces, renting spaces, designing and building them and operating them in a way that allows us to run our business.

00:49

So let’s dive into the data center space for a bit, because data centers are a place where trading firms are really quite different. We’ve talked in previous episodes of this podcast about a bunch of ways in which the networking level of things is different, right? The vast majority of the activity that most ordinary companies do, even highly technical ones, happens over the open internet. We operate in a bunch of co-location sites near data centers and do cross connects in the back of the network rather than over the trunk of the internet, or at least not for our core trading activity. But how does the classic trading network and trading data center differ at the physical level? What are the unique requirements that show up there?

01:25

I mean, I think proximity is one. That’s an important note. I think there’s trading venues that you need to be close to. Latency becomes a big concern all the way down to the length of the fiber when you’re talking about microseconds and lower. So proximity is key. I think performance is also very important. There’s different hardware that is used and from a power cooling standpoint that also poses some challenges. I think being able to scale over time and not being boxed in. So thinking about optionality and growth and what that growth means. You don’t want to build a data center that’s properly located and then run out of space or power there and then have to build another one, and then the distance between those two becomes an issue. So I think there’s a few different things we have to think about. A lot of it comes down to performance at the end.

02:05

And when you think about the physical space, a lot of those performance questions come down to cooling.

02:09

Yes, yes. Cooling is an interesting one because it’s a byproduct of consuming a lot of power, and cooling has seen a few different evolutions over the last 25 years, if you will, and people are constantly balancing efficiency with performance. Besides the IT equipment itself, cooling is the largest consumer of power in a data center. So there’s lots of effort, and there have been efforts over the years, to drive down PUEs to a place where the amount of power you’re spending cooling your space is manageable.

02:40

And what’s a PUE?

02:42

Power usage effectiveness. It’s a measure of how much total power you consume divided by the power that you’re using for your compute.

02:48

So what fraction of your power is actually powering the computer versus all the other stuff that you need to make the data center work?

02:54

That’s correct. And you’ll see ranges from, on the low end, 1.1 (people might claim lower, but let’s say 1.1) up to the worst data centers at 1.8 or 2.

03:02

So 1.1 means I’m wasting roughly 10% of the power?

03:04

Yep, that’s right. You do that by utilizing things like cooler ambient temperatures to run economizer cycles that use outside air, ways that you can use less mechanical cooling, which means running compressors and big fans that use a lot of energy.
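
(As a concrete illustration of the PUE arithmetic above, here is a minimal sketch; the 1,000 kW IT load and 100 kW of overhead are illustrative assumptions, not figures from the episode.)

```python
# Minimal sketch of the PUE arithmetic discussed above; the loads are illustrative assumptions.

def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT equipment power."""
    return total_facility_kw / it_kw

it_load_kw = 1_000.0      # power going into the servers and switches
overhead_kw = 100.0       # cooling, fans, distribution losses, lighting, etc.

print(pue(it_load_kw + overhead_kw, it_load_kw))   # 1.1
# Overhead as a share of the total power bill: about 9%, i.e. "roughly 10%" in round numbers.
print(overhead_kw / (it_load_kw + overhead_kw))    # 0.0909...
```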

03:18

So let’s do data center cooling 101 just to understand the basic thermodynamics of the situation. I want to throw a bunch of computers into a bunch of racks in a data center. What are the basic issues I need to think about? And also, other than the computers themselves, what are the physical components that go into this design?

03:33

Yeah, so you’ll ask yourself a few questions, but in the most basic data centers, you could use a medium which we call chilled water, which is water that is cooled down to, say, 50 through maybe 65 degrees Fahrenheit. You do this by utilizing refrigerant cycles, maybe chillers on the roof (we call them air-cooled chillers) that blow air over a coil. You run a vapor compression cycle, and you leave that chiller with some cool water that can now be converted back to cool air at these devices called CRAH units. So basically we’re taking that warm air that leaves the server and blowing it over a coil, and that heat’s being transferred to that chilled-water medium, and then we’re blowing that air back into the data center. So that’s the most basic design.

04:13

Just to zoom out, these are glorified air conditioners or something, except they’re not air conditioners, they’re water conditioners. You’re cooling the water, and then the water is the medium of distribution for the cold. It holds the coldness, and you can ship it in little pipes all over the building.

04:26

Yeah, it becomes very flexible.

04:28

Right. And then the CRAH unit is a thing that sits relatively close to the stuff you’re trying to cool, where it’s got a big metal radiator in the middle of it and some fans. You blow hot air over the radiator, energy moves from the air into the radiator, and that water then gets cycled back into the cooling system.

04:44

That’s correct. Yeah. It’s a closed loop and it runs continuously. The closer you can get those CRAH units or those coils to the load, the better off you are: better heat transfer, and less heat loss from air recirculating in the data center. So taking that most basic design, over the years there have been efforts to optimize by moving the cooling closer to the load and by increasing the temperatures, because the servers can withstand higher temperatures and you can save energy there. So a lot of work on optimizing and saving energy has been done over the years.

05:12

Got it. And also, you don’t always have to use this closed loop design if you’re sitting close to the water, you can literally use the water from the Hudson.

05:19

Yeah, I mean, there’s some salt in there, we’ll have to deal with that, but you could reject heat into that. There are lots of big hyperscalers that use moderately tempered outside air. You evaporate some water there, you have the latent heat of vaporization, you’re able to bring that air temperature down and cycle it to the data center. So there are many, many ways to cool these servers. With air-cooled servers, for many years it was a function of: what’s the warmest temperature that I can bring to the face of these servers and have them run well? And you try to ride that upper limit so you don’t use as much mechanical energy to get that air down nice and cold.

05:51

I’m hearing two issues here. One is we can make the overall thing more efficient by just tolerating higher temperatures for the equipment itself, and presumably that’s also tolerating higher failure rates.

06:01

Yeah, and there’s been a lot of work there. ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, is one body that’s done some work and written some white papers about allowable and recommended ranges for temperature and humidity, and done enough testing there to get comfortable, to the point where OEM server manufacturers are using those as guidelines. So we run CFD studies to look at those air-cooled scenarios and try to understand how we can design our systems to allow for both good normal operation and good operation during failure scenarios of failed mechanical equipment.

06:31

And I guess the failure scenarios come up because if you allow your equipment to run at a higher temperature, then when some bad thing happens and your AC isn’t working for a little while, you’re closer to disaster.

06:40

That’s right. And there’s a balance, right? You can add more CRAH units, you can add more chillers to a point at which it becomes too costly or too complex. So you want to look at some failure analysis and understand what are the more likely devices to fail. Those are the ones we want redundant. When there is a failure, how quickly do you respond? What are your ways to mitigate that? And then for us, how quickly do we communicate to our business that there’s a failure or a likely failure about to happen? What does that mean to our business and how do they respond to that?

07:06

Got it. So there’s a bunch of pieces here. There’s the air conditioners, the chillers that cool the water, or I guess not quite air conditioners. We’ve got the CRAH units that deliver the localized cooling. It sounds like there’s all sorts of monitoring that we’re going to need for understanding the system. And then there’s the design of the actual room with the racks and computers. What goes into that? How do you make a room full of machines more or less efficient by adjusting the design?

07:28

Yeah, I mentioned moving those cooling units closer to the load. There’s this concept of a rear door heat exchanger that bolts a cooling coil right to the back of the cabinet, so it’s within inches to a foot from the back of the server, allowing that heat transfer so you don’t have this potential recirculation of the hot air back into the inlet.

07:45

So at the thermodynamics level, why does this matter? You said that I want to bring it closer. Why do I care if it’s closer? What does it matter if the hot air has to travel a while before getting back to the CRAH unit to get cooled again?

07:55

There’s a couple of things. One is you run the risk of that air moving in a direction you don’t want it to go in and then coming back into the inlet of the server and now you have an even higher inlet temperature of the server. The other thing is having to move large volumes of air to get this parcel of hot air back to a cooling unit takes energy, lots of fan energy to move that around. And the energy consumed by fans goes with the cube of the velocity. You’ve got to move that air and the further you have to move it, the more power it’s consuming.
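
(A minimal sketch of the fan relationship Dan mentions: at a fixed system, airflow scales with fan speed and fan power scales roughly with the cube of it. The 10 kW baseline is an illustrative assumption.)

```python
# Fan affinity law, roughly: power scales with the cube of the flow (or speed) ratio.
# The 10 kW baseline is an illustrative assumption, not a measured value.

def fan_power_kw(base_power_kw: float, flow_ratio: float) -> float:
    """Approximate fan power after scaling airflow by `flow_ratio`."""
    return base_power_kw * flow_ratio ** 3

base_kw = 10.0
print(fan_power_kw(base_kw, 1.5))   # ~33.8 kW to move 50% more air
print(fan_power_kw(base_kw, 0.8))   # ~5.1 kW if you can get away with 20% less air
```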

08:21

So why is this mixing so important? Here’s a degraded model of how cooling works, which is just not physically right at all, but it’s how I thought about it 20 years ago when I started thinking about this: I have this CRAH unit whose job is to extract energy, and it can extract some number of joules per second or whatever from the system. So why do I care about the airflow? As long as the air conditioner can pull out energy at a rate that matches the rate at which the machines are running and inserting energy into the system, why am I so worried about things like airflow?

08:49

In the data center, you have various types of servers, network switches, various types of equipment, and they’re not always built to work very nicely with each other. For years, we’ve had situations where you have servers that move airflow in a standard way and network switches that might move it in the opposite way, so now you have to move that air around differently. So it’s really about understanding where these devices are pulling air from, making sure that that part of the data center is getting the cool air that you want, and that the hot air, or the cold air, is being contained in a way where you funnel it right to where you want to consume it and don’t allow this short-cycling mixing. You could imagine taking a home PC, putting it in an enclosed desk, running it, and seeing what happens over time: that heat would just build up in there, and it would keep consuming more and more hot air.

09:39

So I think you can get hotspots, and then some equipment can just get hotter than you want it to be, even if the average temperature is fine. But I think there’s another issue, which is that you also won’t successfully lower the average temperature, because that thing I said before, that the air conditioning can just remove some amount of energy per second, is just not true, right? It’s conditional on maintaining this large temperature differential.

09:56

That’s right.

09:57

Can you talk a little bit more about why that temperature differential is important and how that guides the way you build the data center?

10:02

The temperature differential is directly proportional to the amount of heat you can reject, which is also proportional to the amount of airflow. So as you have larger delta Ts, larger changes in temperature, you can reduce the amount of airflow you need. So there’s a balance between how much delta T, or change in temperature, and how much airflow you need to cool a specific amount of power, or reject a specific amount of heat. The industry does things like 20 to 30 degrees Fahrenheit on the delta T at the servers. That’s a nice sweet spot where you get a flow rate that’s manageable and also a delta T that’s manageable. There are ways you can withstand higher delta Ts and get away with less airflow; that’s more a play at reducing the amount of fan energy and energy consumption used by the mechanical systems.
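
(A minimal sketch of the heat balance behind this: Q = m_dot x c_p x delta T, so for a fixed heat load, a bigger delta T means less airflow. The 10 kW rack is an illustrative assumption; air properties are approximate.)

```python
# Heat balance for air cooling: heat carried = mass flow * specific heat * delta T.
# Loads and temperatures below are illustrative assumptions.

RHO_AIR = 1.2        # kg/m^3, near sea level
CP_AIR = 1.005       # kJ/(kg*K)
M3S_TO_CFM = 2118.88

def airflow_cfm(heat_kw: float, delta_t_f: float) -> float:
    """Airflow needed to carry `heat_kw` of heat at a server delta T of `delta_t_f` degrees F."""
    delta_t_k = delta_t_f * 5.0 / 9.0
    mass_flow_kg_s = heat_kw / (CP_AIR * delta_t_k)
    return mass_flow_kg_s / RHO_AIR * M3S_TO_CFM

print(airflow_cfm(10, 20))   # ~1,580 CFM for a 10 kW rack at a 20 F delta T
print(airflow_cfm(10, 30))   # ~1,050 CFM at a 30 F delta T: same heat, a third less air
```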

10:46

And just to think about how this shows up in different parts, this delta T matters in at least two places. One is you want this delta to be high when you are pumping air into the CRAH unit, right? Because you’re going to cool that air, and the higher the difference in temperature between the air and the water, the faster the energy is going to move; you’re going to get better heat transfer. And then the exact same thing is true within one of your machines: you have my one-U box that I stuff into a rack, and basically the difference in temperature between that hot CPU, or other hot equipment within the machine, and the air that’s blowing through it.

11:16

And you have to be very careful inside that box, inside that server, as that cold air parcel enters, right? It’s passing over different pieces of equipment, and for the last device that it passes over, be it a power supply or memory, you want to make sure that the temperature at that point is still cool enough that it can reject that last bit of heat. So if you have too little airflow and the temperature increases too rapidly in the beginning, you don’t have any cooling left towards the end of the box as the air passes over component after component.

11:42

So it really matters the physical location of the things being cooled and what’s the direction of airflow. And you have to make sure that you’re cooling the whole thing by enough.

11:50

And the server manufacturers design for that, specifically placing memory and chips and power supplies in locations where they have an expected temperature at different points in the box itself.

12:00

So there are clearly bad things that happen if the delta T is too small.

12:03

Yep.

12:04

Is there anything bad that happens if you make the delta T too large?

12:07

Yeah, I think there’s a point where that warm air, which eventually gets back to the chilled water, becomes a problem at the chillers, where they lose their heat transfer abilities above a certain temperature. They’re designed for a certain capacity at a certain delta T; above that, you’re running into areas where you’re not able to reject heat efficiently back at those chillers. So you run into issues at the chillers too.

12:29

Why is that? If the air itself is too hot, it’s not going to be able to cool it.

12:32

Yeah. So the air comes back, goes through the CRAH, and now the water warms up and goes back to the chiller, and the chiller has to be able to reject that amount of heat. It has a delta T that it’s expecting too, so if the water is coming back higher, it can still only do that delta T. If the water comes back 10 degrees higher, at the same delta T it’s going to be leaving 10 degrees higher as well.

12:51

Maybe in some sense this is partially about the delta T, but partially it’s also just about the total amount of energy that you’re capable of removing in the end. If you exceed the capacity of the system, you’re just in trouble.

13:00

You’re in trouble, and you’re going to dip into redundancy and all sorts of things. Flows will get mismatched and out of balance, so you’ll have some issues there. I mean, it’s not a place you want to be, but unfortunately sometimes you run into performance issues or failures, and you have to respond to those failures and deal with these situations.

13:15

Got it. So we want to maintain this separation of hot and cold air in order that the air going into the chiller is as hot as possible and the air going into the machines as cold as possible. What do you end up doing in the physical design of the space in order to make that happen?

13:28

What you’re looking at is flow rates, really. Like I said, you have this fixed heat; you understand what your heat rejection is going to need to be based on how much power you’re consuming, because it’s directly proportional to the amount of heat or power you’re consuming. So for the amount of heat you have in the space, you have two different ways to deal with it, whether it’s air or water: you have the ability to adjust the flow or adjust the delta T. So we’re sizing pipes, sizing ductwork, sizing fans for specific flow rates. The servers are also sized for a specific flow rate of air, let’s say in this case. And you’re trying to match those flow rates, moving that liquid or that air around such that you’re getting the expected delta T by providing the correct flow rate.
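
(The water-side version of the same balance, as a rough sketch of how pipe and pump sizing falls out of load and delta T; the 100 kW load and delta T values are illustrative assumptions.)

```python
# Chilled-water flow needed for a given load and loop delta T. Values are illustrative assumptions.

CP_WATER = 4.186     # kJ/(kg*K)
RHO_WATER = 997.0    # kg/m^3
LPS_TO_GPM = 15.85

def chilled_water_gpm(heat_kw: float, delta_t_f: float) -> float:
    """Chilled-water flow (GPM) to absorb `heat_kw` at a loop delta T of `delta_t_f` degrees F."""
    delta_t_k = delta_t_f * 5.0 / 9.0
    mass_flow_kg_s = heat_kw / (CP_WATER * delta_t_k)
    return mass_flow_kg_s / RHO_WATER * 1000 * LPS_TO_GPM

print(chilled_water_gpm(100, 10))   # ~68 GPM for a 100 kW load at a 10 F delta T
print(chilled_water_gpm(100, 20))   # ~34 GPM if the loop can run a 20 F delta T
```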

14:07

And then there’s also stuff you do in the physical arrangement of computers. You’re talking about the direction in which air flows. So there’s this very basic idea of cold row designs where you basically line up all the computers so they’re all pulling air in the same direction. So there’s one side where the cold air comes in and one side where the hot air comes out, and then you try and blow the air from the hot side

14:25

Back to the CRAH unit. Yeah, that’s exactly right: cold aisle, hot aisle. It’s one of the early concepts that came around as things started getting slightly more dense. People were like, oh, we just have these machines in a room and we’re just putting cold air everywhere. At some point you start to deal with this air recirculation issue that I described earlier. So they said, okay, well, let’s really contain it. You can think of containment like a piece of ductwork that’s just funneling the air to where you want to bring it, i.e., cold air to the inlet of the server, or hot air from the back of the server to the cooling unit to get that heat transfer back into the water.

14:59

Got it. And then one of the things that we’ve dealt with over the years is the way in which all of this moving of hot air around connects to fire suppression.

15:07

Yeah.

15:07

So can you talk a little bit about what the story is there?

15:10

Yeah. So obviously with this amount of power, and just being in a building, you have to think about fire suppression. For most fire suppression around the world (there are other ways you can do it, with foam and gaseous substances) water is still a big key component. So you use these devices called pre-action systems. Ultimately what they are is a valve setup that allows you to delay those sprinkler pipes above your racks from holding water until you can prove that there’s heat or smoke or both. We’ve had situations where maybe you have a cooling failure and the data center gets warmer than you expect, and the sprinkler heads melt at a certain temperature; they have a fluid inside a glass housing, and it melts and opens a valve. Now, thankfully, when this happened, there was no water in the pipes. It was a lesson learned for us: hey, maybe standard-temperature sprinkler heads aren’t sufficient in a data center, especially when you have a failure. So it’s something we looked at in detail, and we changed our design to have more resilient, higher-temperature-rated sprinkler heads to prevent these failure modes.

16:08

I have to say I love the brute physicality of the mechanism here. It’s not like, oh, there’s a sensor and a microchip that detects. It’s like, no, no, no. It melts. And when it melts, it opens and then water comes out.

16:19

Yeah, with fire suppression you don’t want to mess around. Keep it simple, get it done, get the water where it needs to be, and not make it too complicated.

16:26

A critical part of this is this two-phase design. One is you have the pre-action system, where nothing bad can happen until the water flows in. And the other piece is these actual sprinklers, where the top has to melt in order for water to actually come out and destroy all of your equipment. A key part of that, I imagine, is monitoring. If you build a system where there are multiple things that have to be tripped before the bad thing happens, and here the bad thing is water coming out when there isn’t really a fire. If there’s a fire, well, then…

16:51

I’m not sure which one’s worse, the water or the fire at this point. Yeah,

16:55

Right. I mean I think once there’s a fire, you probably want to put it out.

16:58

Yes.

16:58

That seems good. Having those two steps only really helps you if you have a chance to notice when one of them is breached. And so monitoring seems like a really critical part of all this.

17:08

Yeah, that’s right. And it doesn’t start or stop at the fire protection systems, right? Monitoring is key throughout the entire building: cooling systems, power systems, various different things, lighting, it could be anything. But traditionally there have been different platforms for different things. Power systems and mechanical systems would have different control or monitoring solutions, and over time it’s gotten to a place where it’s unwieldy. If you’re trying to manage a data center, you’re looking at three or four pieces of software to get a picture of what’s going on inside the data center. So at Jane Street, we’ve worked over the years to develop our own software that pulls in data from all these places and puts it in a nice format for us to be able to monitor and look at in a single pane of glass, if you will, understand exactly what these alerts mean, and put our own thresholds on the alerts for the things that we care about.

Maybe the manufacturer is a little bit more or less conservative than we’d like, and we want to be more conservative, we want to get an early alert; we’re able to set our own thresholds. And then we’re able to use our software for projections as well, on top of the real-time monitoring, to help us understand where we’re power constrained, where we’re cooling constrained, and if we were to build a new phase, how much capacity we actually have. Is there stranded capacity that we can use? We can give our data center admin folks a look as to, hey, we have some stranded capacity here, why don’t we look at racking the next servers in this location?
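
(A purely hypothetical sketch, not Jane Street’s actual software, of the kind of custom-threshold alerting being described: pull readings from several systems into one place and apply your own, more conservative limits. All names and values are made up.)

```python
# Hypothetical sketch of custom-threshold alerting (not any real system's code).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Threshold:
    warn: float
    critical: float

# Illustrative limits, deliberately tighter than a vendor default might be.
THRESHOLDS = {
    "rack_inlet_temp_f": Threshold(warn=80.0, critical=90.0),
    "chilled_water_supply_f": Threshold(warn=60.0, critical=68.0),
}

def evaluate(sensor: str, reading: float) -> Optional[str]:
    """Return an alert string if `reading` crosses our own (not the vendor's) thresholds."""
    t = THRESHOLDS.get(sensor)
    if t is None:
        return None
    if reading >= t.critical:
        return f"CRITICAL: {sensor} = {reading}"
    if reading >= t.warn:
        return f"WARN: {sensor} = {reading}"
    return None

print(evaluate("rack_inlet_temp_f", 82.5))   # WARN: early notice, before things get critical
```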

18:27

And I think it’s actually really hard for people who are creating general-purpose software to do a good job of alerting at the right moment, because there’s a delicate balance. You tune it in one direction and you don’t see things until it’s too late; you tune it in the other direction and you see way too many alerts, and that’s also not seeing anything. You need it tuned at just the right level, where it shows up just enough, and a high enough percentage of the time that when it says something, it’s a real issue.

18:50

Yeah, it’s a Goldilocks problem. It’s one of those things that I don’t know that there’s any way to get really good at without reps. And we’ve used both our building construction and our testing and commissioning to help tune our alerting. We’ve had real-time incidents which help us understand if we’re getting the right level of alerting and reporting. And when we do postmortems, we’re looking back: hey, when was the first early indication or early warning? Was there something we could have done then? Was there maybe a different alert that we could have set up that would’ve given us even earlier notice? So yeah, I think it is a bit of an art, understanding when to alert and when not to alert, especially out of hours, waking people up to respond to different things. You really want to make sure it’s an emergency or something that needs to be responded to.

19:35

Do you have any examples of real world stories of failures where you feel like the monitoring system that we have allowed us to respond in a much more useful way?

19:44

Yeah, I think there are lots of examples. I can give one. We had a data center that was using chilled water, as mentioned, as the medium. It was late in the day, and we noticed, sooner than the provider in many cases, temperatures increasing. We have temperature sensors at various points: you can have temperature sensors at the CRAH units, but you can also have temperature sensors at the servers, at the racks. When you’re measuring temperature at the rack, you’re able to see smaller changes much quicker than at the large volume, either at the chillers or at the CRAH units. So in this one case, we saw some temperature changes, and we investigated, poked around, and were able to uncover a bigger problem, which was an unfortunate draining down of a chilled water system that caused a major incident for us that we had to respond to. But had we not had the monitoring system, we probably wouldn’t have been able to communicate to the business what was happening, why it was happening, and how long it might take to recover.

20:30

So this was like a major incident. This was someone who was servicing the building, basically opened a valve somewhere and drained out all the chilled water.

20:37

Yeah, that’s right.

20:38

And during the trading day, notably, it was towards the end of the trading day.

20:41

3:58, I think, was when we first got our alerts. So it was a very scary time to see these alerts. And the first couple of moments in any incident are very much a scramble, trying to understand what these signals mean, what is happening, trying to gather as much information, but also not make any bold claims initially until you have a good, clear picture of what’s going on. But what happened here was there was a maintenance event, a switching of chillers, normal operation except that one of the chillers was out of service. And the valving that was manipulated ended up diverting chilled water to an open port and drained down thousands of gallons of chilled water that we critically needed to cool our space. A couple of things here that we learned. I think something called a method of procedure, a MOP, is extremely important. What you don’t want in a data center, or any critical facility, is for technicians to go around and do maintenance and service on a whim or off the top of their head. You want a checklist. You want something that is vetted in a room when there’s no stress, where you can create a step-by-step process to avoid opening, closing, or doing anything incorrectly. So really, the time to plan and figure out the procedure is before the activity, not during the activity.

21:50

I think this may be not obvious to people who work in other industries is you might think it’s like, okay, you messed something up and now this data center’s in a bad state, but data centers are kind of units of failure. You just fail over to the other data center or something. And we do have a lot of redundancy and there are other data centers we can use, but they’re not the same because locality is so important. So if you have a data center that’s in the right place, you can fail over to other places, but it’s not like a transparent hot swap over. The physical properties are different. The latencies are very different. And so it really is important at the business level that you understand, okay, we’re now in a bad state, temperature is starting to climb and how long can we safely run under the present circumstances? And the difference between being able to finish the trading of the day and getting to the closing and being able to trade a few more minutes after versus not could be a very material difference to just the running of the business.

22:38

And that’s a great point. I think it’s a key distinction between what you would call hyperscalers and financial enterprises, as far as the locality of their data centers, and why we tend to think a lot more about the resiliency of a site rather than, as you mentioned, being able to fail over site to site. So we do spend more time thinking about our design and the resiliency in our designs because of that fact. And there are knock-on effects. You have an issue like this, you have to refill all this water, and that takes a period of time. So to your point, being able to communicate how long this is going to take: we’re there doing back-of-the-envelope calculations of, all right, we have this flow rate on this hose here, how long will it take to fill up a 12-inch pipe that goes up however many feet? And being able to do that on the fly and report back, and also going there in person and being able to talk to the people involved. We have a team that responds in person; we don’t just rely on a third party. So we had individuals go to the site, supervise, ask questions, and be able to feed back to the rest of the team what the progress is, what the likely recovery looks like, and how we could change our business or trading based on those inputs that we’re getting from the rest of the team.
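
(A sketch of the kind of back-of-the-envelope refill math described here; the pipe length and hose flow rate are illustrative assumptions, not details from the actual incident.)

```python
# Back-of-the-envelope refill time for a drained pipe run. All inputs are illustrative assumptions.

import math

GAL_PER_FT3 = 7.48

def refill_minutes(pipe_diameter_in: float, pipe_length_ft: float, hose_gpm: float) -> float:
    """Time to refill a drained pipe run from a hose supplying `hose_gpm` gallons per minute."""
    radius_ft = (pipe_diameter_in / 12.0) / 2.0
    volume_gal = math.pi * radius_ft ** 2 * pipe_length_ft * GAL_PER_FT3
    return volume_gal / hose_gpm

# 500 feet of 12-inch pipe holds roughly 2,900 gallons; at 50 GPM that's about an hour.
print(refill_minutes(12, 500, 50))   # ~59 minutes
```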

23:46

I feel like a really important part of making that part go well is being able to have open and honest conversations with the people who are involved. And how do we try and maintain a good culture around this where people feel open to talking about mistakes that they’ve made in the challenging circumstance where some of the people are not people who work for Jane Street, but people who work for our service providers. How does that even work?

24:06

Yeah, I mean, it happens long before the incident. If you don’t have those relationships in place prior, you’re not going to be able to do it in real time when something’s going wrong. So the team that I sit on is responsible for everything about our physical spaces, from negotiating leases, to designing the spaces, to building them, to operating them. So we’re sitting with the key stakeholders of these third-party operators, many times from day one, and they see the same people on the same team, and they appreciate the inputs that we’re giving along the way. And because we’ve developed that relationship for many months, at times for years, we’re able to have those real conversations where they know we’re going to ask questions. We want to understand if they have a problem or not. We’d rather hear the bad news than be surprised later. And the only way you get there is by putting that work in early and often.

24:52

And developing a lot of trust.

24:53

Developing a lot of trust both ways, and showing them that mistakes will happen. We are building these sites knowing mistakes will happen. It’s how we respond to those as a team, across walls, if you will (we’re not the same firm), in a way that allows us to mitigate or lessen the blow, that’s going to make or break it. The mistake already happened; how do we get to the next point?

25:13

So all of the discussions we’ve had here are in some sense around the traditional trading-focused data center. And in the last few years we’ve pivoted pretty hard towards adding a lot of machine learning driven infrastructure to our world. And that has changed things in a bunch of ways, and obviously it’s changed lots of things at the software level and at the networking level. What kind of pressures has doing more machine learning work put on the physical level?

25:38

Yeah, that’s a great question. And I think this is kind of an industry-wide thing, where the densities for some of this GPU compute have just increased a lot, the power densities and the power consumed. I think that poses a couple of big questions. If I focus on the cooling and the power side of it, it’s doing a lot of the same stuff that we’re doing, but differently: tighter, closer, bigger capacities, bigger pipes, bigger wires, things like that. Some of the numbers are getting so large that the amount of power in a suite in a data center, or in a couple of rows of racks, could now be consumed by a single rack. And that’s something that is scaring people, but it’s also creating a lot of opportunity for interesting designs, different approaches. We can talk a little bit about liquid-cooled computers and GPUs. I think that’s something that has really pushed the industry to hurry up and come up with solutions, something that maybe the high performance computing world was doing for a bit longer, but now anyone that’s looking to do any AI or ML stuff will have to figure out pretty quickly.

26:33

I think the first part of this conversation in some sense can be summarized by water is terrifying.

26:38

That’s right.

26:39

And then now we’re talking about actually we want to put the water really, really close to the computers. So first of all, actually, why, again, from a physical perspective, why is using water for cooling more effective than using air?

26:50

Based on the specific heat and the density of water versus air, it’s 3,000 to 4,000 times more effective at capturing heat.

26:56

Is that literal, 3,000 to 4,000 times?

26:59

3,000 to 4,000 times.

26:59

Is that measuring the rate at which I can transfer heat, how much heat I can pack in per unit? What is the thing that’s 4,000 times faster?

27:06

Yeah, the specific heat is about four times more heat capacity per unit mass for water versus air, and then the density is obviously multiples higher for water. So you combine those two, and you’re able to hold a lot more energy in…

27:21

We were able to move a ton more mass because water is so much denser than air.

27:25

That’s right. It’s in a smaller pipe rather than this larger duct.

27:28

Got it. So water is dramatically more efficient?

27:31

More efficient, and that’s why it was being used: chilled water from the chiller to the CRAH. You’re using these smaller pipes, and then when you get to the air side, it gets very large in terms of duct size. So water’s been used in data centers for many years, but to your point, it’s scary at the rack, and something that we’ve tried for many years to keep outside of the data center, or outside of the white space, if you will.
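
(A minimal sketch of where a number in the thousands comes from: compare the volumetric heat capacity, density times specific heat, of water and air. Property values are approximate room-temperature figures.)

```python
# Volumetric heat capacity comparison of water and air; properties are approximate.

CP_WATER, RHO_WATER = 4.186, 997.0   # kJ/(kg*K), kg/m^3
CP_AIR, RHO_AIR = 1.005, 1.2         # kJ/(kg*K), kg/m^3

per_kg = CP_WATER / CP_AIR                             # ~4x more heat per unit mass
per_m3 = (CP_WATER * RHO_WATER) / (CP_AIR * RHO_AIR)   # ~3,500x more heat per unit volume

print(round(per_kg, 1), round(per_m3))   # 4.2 3461
```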

27:51

Got it. And so now, what are the options when you think, okay, how can I bring in the water closer to the machines to make it more efficient? What can I do?

27:59

Yeah, there’s a couple of things you can do. One, you could do something called immersion, where you dunk your entire server right into this dielectric fluid and are able to transfer that heat right to the liquid by touching the entire server. Because the fluid is non-conductive, it’s safe to do.

28:14

Actually, I want to interrupt. I want to go back and answer the question in the other direction. I feel like there’s levels of increasing terror and I want to start from the least terror to the most terror.

28:22

Sure.

28:22

So I feel like starting with the heat exchange…

28:24

The dunking?

28:24

We’re starting with the heat exchanger doors, then the direct liquid cooling, and then not using water at all and doing the direct immersion thing.

28:31

Yeah. So with the rear door heat exchangers, it’s getting that liquid very close to the server but not actually touching it. So you’re inches away.

28:39

And this is the moral equivalent of stapling the CRAH unit to the back of the rack?

28:42

Yeah, just pushing it over, bolting it to the back. Other people have done other things like putting a roof in there with a coil, but yes, it is getting it as close to the rack as possible.

28:51

What’s the roof thing?

28:52

I think Google for years was doing something where, at the top of that hot aisle, you put cooling units, custom CRAH units that sit at the top of this hot aisle containment, and use fans to pull that hot air in, cool it, and then send it back to the data center, and create…

29:08

So they did put an actual roof over the top?

29:10

Yep.

29:11

But then how does that interact with fire suppression?

29:13

So they have these panels that also melt.

29:16

Amazing.

29:18

Yeah. Roof panels that in a thermal event at a certain temperature will shrink and then fall out of their grid system and now allow sprinklers to be able to get to the fire below.

29:29

That’s amazing. So we could do really serious containment by physically building a containment structure around it, and then we don’t have to bring the water that close in. Or we could bring the water really close in by stapling the CRAH units to the back of the rack and moving water around. What else can we do?

29:42

So the other one, which is most prevalent now with GPUs, is something called DLC, or direct liquid cooling. This is bringing water or liquid to the chip. And when I say to the chip: you can imagine an air-cooled chip has this nice chunky heat sink on the back that you blow air over to transfer that heat out. Take that off for a second and bolt on a coil, or heat exchanger if you will. So maybe it’s a copper or brass or similar-material heat sink that sits on there and has very small channels for a liquid to pass through and absorb the heat. So now you have this heat sink on the GPU and you have to get some liquid to it. And the liquid is something that we have to be very careful about, because of these small channels on these, what we’re calling cold plates, on these GPUs.

30:26

And those are essentially just radiators?

30:27

That’s right.

30:28

Except they’re radiators where, instead of blowing air through, you’re pushing water through.

30:30

Instead of a big air-cooled heat sink, it’s a radiator or coil that’s sitting on a chip, with some thermal paste to get nice contact there and transfer as much heat as possible. You use these microchannels to spread that water out to give you the greatest surface area to transfer heat over. And then the liquid that you’re passing through is something whose quality you have to be very conscious about. You don’t want to plug these very tiny, micron-sized channels. So you’re doing things like very, very fine filtration. You’re doing things like putting propylene glycol in there to prevent bacterial growth within the pipe. All these things can lead to worse performance, lower heat transfer, perhaps chips that overheat and degrade.

31:12

Part of running water through my data center is I have to be worried about algae or something.

31:16

Sure, yeah, absolutely. The types of materials you’re using, how do they react? How do two different materials react, and how do they corrode over time? Dissimilar metals, things like that. So there’s this list of wetted materials: once you’re touching that cold plate at the server, you have to be very careful about the types of materials. So we’re using certain types of plastic piping or stainless steel piping, because we’re very concerned about the particulates coming off of the piping and any small debris.

31:43

So that’s a whole problem that hadn’t occurred to me before. But another, maybe more obvious, problem is, I dunno, pipes have leaks sometimes. Now we’re piping stuff into the actual servers. I assume if there’s a leak in the server, that server is done?

31:55

Yep. And maybe the ones below it or adjacent to it. And in fact, there’s some concerns about if it’s leaking, what do you do? Do you go to this server?

32:03

Can you even touch it?

32:04

Yeah, human health and safety. There’s 400 volts potentially at this rack. So there’s a lot of procedures and standard operating procedures, emergency operating procedures on how do you interact with this fluid or a potential leak in a data center? What are the responsibilities, both of the provider and also the data center occupier.

32:21

So is there anything you can do at the level of the design of the physical pipes to drive the probability of leaks very low?

32:27

Yeah. I think one of the things that we do is really consider where the pipe connections are and minimize them, doing offsite welding so we have nice solid joints instead of mechanical bolted joints or threaded joints. So thinking about the types of connections, thinking about the locations of the connections, and putting leak detection around those connection points.

32:45

So monitoring, again?

32:46

Monitoring, of course. And with monitoring, well, what do you do when you sense the leak? Are we going to turn things off? Are we going to wait and see? Are we going to respond in person to see how bad it is? Potentially you’re shutting down a training run that’s been going on for a month,

33:00

Although hopefully you have checkpoints more recently.

33:02

Sure, sure, sure. But it’s still impactful, even if it’s only a day or a couple of days since your last checkpoint, whatever it is. As the physical engineering folks, we don’t want to be the reason why a training job has to stop, or, furthermore, inference, where it could be much more impactful to trading.

33:18

We have all of these concerns that are driven by power. Can you give me a sense of how big the differences in power are? What do the machines that we put out there 10 years ago look like and what do they look like now?

33:29

Yeah, 10 years ago, you’re talking about 10 to 15 kW per rack as being pretty high.

33:36

kW, kilowatts. We’re talking about the amount of energy consumed per second, essentially?

33:41

Yeah, power is energy per second, being consumed at a voltage and a current. And we’ve done things over the years, like 415-volt distribution to the rack, because with a higher voltage you’re able to get more power per wire size. So being able to scale that way helped us early on when designing those power distribution systems. 10 to 15 kW was the high end; now we have designs at 170 kW per rack, so more than 10 times that. If you listen to Jensen, he’s talking about 600 kW at some point in the future, which is a mind-blowing number. A lot of the thermodynamics stays the same, but there are many, many different challenges that you’ll have to face at those numbers.
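
(A minimal sketch of the voltage-versus-wire-size point: for the same power, line current drops as distribution voltage rises. Three-phase power at unity power factor is assumed; the rack powers are the ones mentioned above, and 208 V is just a common lower-voltage comparison point.)

```python
# Line current for a three-phase load: I = P / (sqrt(3) * V * PF). Unity power factor assumed.

import math

def three_phase_current(power_kw: float, line_voltage: float, power_factor: float = 1.0) -> float:
    """Line current in amps for a three-phase load of `power_kw`."""
    return power_kw * 1000 / (math.sqrt(3) * line_voltage * power_factor)

print(round(three_phase_current(15, 208)))    # ~42 A, an older 10-15 kW rack at 208 V
print(round(three_phase_current(170, 208)))   # ~472 A, a 170 kW rack at 208 V, impractical wiring
print(round(three_phase_current(170, 415)))   # ~237 A, the same rack at 415 V
```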

34:19

One of the issues is you’re creating much more in the way of these power hotspots, right? You’re putting tons of power in the same place, and the data centers we used to build just could not tolerate power at that density at all. If you go into some of our data centers now that have been retrofitted to have GPUs in them, you might see a whole big rack with just one computer in it. That computer is, on its own, consuming the entire amount of power that we had planned for that rack.

34:43

Yeah, it looks pretty interesting. But if you’re looking to deploy as quickly as possible and use your existing infrastructure, you’re having to play with those densities and say, all right, well, this one device consumes as much as five or ten of those other devices, so just rack one and let it go. In the more bespoke and custom data centers that we’re building, though, we want to be more efficient with the space and be able to pack them in more densely. So you end up with less space for the computers and racks and more space for the infrastructure that supports them. The space problem isn’t as much of a problem because things are getting so dense.

35:14

What’s the actual physical limiting factor that stops you from taking all of the GPU machines and putting them in the same rack? Is it that you can’t deliver enough power to that rack or you could, but if you did, you couldn’t cool it? Obviously the overall site has enough power, so what stops you from just taking all that power and running some extension cords and putting it all in the same rack?

35:32

I mean, the pipes and wires just get much bigger, and as these densities are increasing, you’re having to increase both. If you’re bringing liquid to the rack, your pipe size is proportional to the amount of heat you’re rejecting, so you’re able to increase that up until a point at which it just doesn’t fit. And then the same thing with power. And power is becoming interesting because not only do you have to have the total amount of capacity, you also have to break it down and build it in components that are manageable. So we have these UPS systems, uninterruptible power supplies, and they’re fixed capacity. So if I have, say, a megawatt UPS and I need to feed a two- or three-megawatt cluster, I have to bring multiple of these together and distribute them in such a way that, if one of them fails, the load has somewhere to swing over to. So you’re thinking about all these failure scenarios. It’s not just bringing one large wire over and dropping it, so it gets very cumbersome and messy. And there are also different approaches by different OEMs in how they do their power: is it DC power? Is it AC power? At what current? Where are you putting your power distribution units within the rack? Where do they fit? So there are a lot of different constraints that we have to consider.
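
(A hypothetical sketch of the UPS-module arithmetic: a cluster bigger than one fixed-size UPS has to be split across several modules, with spare capacity so the load can swing over on a failure. The N+1 scheme and sizes here are illustrative assumptions, not an actual design.)

```python
# How many fixed-size UPS modules a cluster needs, with spare capacity for one failure (N+1).
# Module and cluster sizes are illustrative assumptions.

import math

def ups_modules_needed(cluster_mw: float, module_mw: float, redundancy: int = 1) -> int:
    """Modules required so the cluster still fits if `redundancy` modules fail."""
    return math.ceil(cluster_mw / module_mw) + redundancy

print(ups_modules_needed(3.0, 1.0))   # 4 modules: 3 to carry the load, 1 spare
print(ups_modules_needed(2.5, 1.0))   # 4 modules: 3 to carry the load, 1 spare
```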

36:37

Yeah, it’s interesting the degree to which power has now become the limiting factor in how you design these spaces and how you think about how you distribute the hardware. And then you mentioned it’s not good to waste space, and that’s one reason to put things close to each other, but it’s also miserable from a networking perspective to have things splayed across the data center. One thing that maybe most people don’t realize is that the nature of networking for modern GPUs has completely changed. The amount of data that you need to exchange between GPUs is just dramatically higher, and there are all new network designs. One thing which has really required a lot of thinking about how you physically achieve it is this thing called a rail-optimized network. The old classic design is: I have a bunch of computers, I stick them in a rack, there’s a top-of-rack switch, and then I have uplinks from the top-of-rack switch to some more central switches, and I have this tree-like structure. But now you think much more at the GPU level. You maybe have a NIC married to each individual GPU, and then you’re basically wiring the GPUs to each other directly in a fairly complicated pattern, and it just requires a lot of wiring, and it’s very complicated, and it’s a real pain if they’re going to be far from each other.
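
(A hypothetical sketch of the rail-optimized wiring pattern described above: GPU k on every node gets its own NIC and connects to rail switch k, so same-index GPUs across nodes are a single switch hop apart. The node and GPU counts are illustrative assumptions.)

```python
# Hypothetical sketch of a rail-optimized fabric (counts are illustrative assumptions).

NODES = 8
GPUS_PER_NODE = 8   # one NIC per GPU

# rail k = the set of (node, gpu_index) endpoints cabled to rail switch k
rails = {k: [(node, k) for node in range(NODES)] for k in range(GPUS_PER_NODE)}

for k in range(2):
    print(f"rail switch {k}: {rails[k]}")

# The number of cables to the rail switches grows as NODES * GPUS_PER_NODE, which is
# part of why the wiring gets painful if the nodes are physically far apart.
print("total rail links:", NODES * GPUS_PER_NODE)
```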

37:43

And being able to fit all that fiber or that InfiniBand or wiring, whatever it may be within the rack, also leaving room for airflow or leaving room for pipes. So you end up looking at some of these racks and not only do you have all these GPUs, but you have all these wires, all these network cables, all these pipes now, and you’re trying to fit everything together. So it really does become a physical challenge in the rack. And it’s one where maybe the racks get bigger over time just to give you more space since you’re not using as many as you used to. Maybe let them get bigger so you can fit all these components in more effectively.

38:16

And maybe just more customization of the actual rack to match, because you’re building, in some sense, these fairly specialized supercomputers.

38:22

Yep.

38:23

At this point.

38:23

There are some folks working on something called the Open Compute Project, where they’re thinking about what the next generation of rack looks like: DC power distribution, wider racks, taller racks, various different approaches. And I think different folks have different ways of approaching the problem. What’s clear right now is that standardization is not really set in stone, and it’s going to take a little while before folks start to agree on some standards.

38:45

And a lot of this is just driven by the vendors announcing. It’s like, we’re going to do this big thing in two years and yeah, good luck guys.

38:50

Yeah, let us know how you figure it out. Yeah.

38:53

The other thing that always strikes me about these setups is that they’re actually quite beautiful. A lot of work goes into placing the wires just so, and it turns out the highly functional design is also quite pretty to look at.

39:04

Yeah, and I think it’s extremely important for troubleshooting. You imagine you can run a fiber and that fiber gets nicked or fails and you have this messy bundle. It’s like good luck finding that, and how long is it going to take to find it and replace it. We have a great team of data center admins that take a lot of care in placing things, designing things, thinking about not just how quickly could we build it, but also how functional and how maintainable it is over time.

39:27

So we’ve spent a lot of time talking about data centers, but a lot of what our physical engineering team thinks about is the physical spaces where we work. And I think one particularly important aspect of that, at least from my perspective, is just the desks. So can you talk a little bit about how desks work at Jane Street and why they’re important and what engineering challenges come from them?

39:44

Yeah, that’s a good one. I think back to early in my career here at Jane Street. It was my first time working at a trading firm or financial firm, and it was very interesting, everyone sitting at these similar desks. But at the time these desks were fixed. If we wanted to move someone around, it was breaking down their entire setup, their monitors, their keyboard, their PCs, and moving it around. It’s just very time consuming, and it caused desk moves to happen less frequently than we wanted, just as teams grew and people wanted to sit closer to other people. So before we moved into our current building, we said, hey, there’s got to be a better way to do this. We hadn’t seen it at the time. So we said, very simply, why don’t we just put our desks on wheels and move ‘em around? And from a desk surface level,

40:24

I want to stop for a second. We’re talking about how to solve this problem, but why do we have this problem? Maybe you can for a second paint a picture of what does the trading floor look like and why do people want to do desk moves and what’s going on anyway?

40:35

Yeah, I think that for our business, we really value collaboration and everyone sits next to each other. There’s no private offices, no, hey, this group sits in its own corner. We very much have these large open trading floors. People want to be able to look down an aisle and shout and talk about something that’s happening on the desk in real time. And so we have these long rows of desks, people sitting close together, they’re four feet wide. And really it’s about having close communication and collaboration.

41:01

And I will say, there used to be more shouting than there is now, and the shouting is much more on the trading desks, especially when things are really busy. And there are more different kinds of groups now: you go to the average developer group and it’s a little bit more chill than that. But it is still the case that density and proximity are highly valued, and the ability to stand up and walk over to someone and have a conversation about the work is incredibly important. And we also have side rooms where people can go and get some quiet space to work and all of that. It is still very different from, I dunno, certainly places where offices are the dominant mode, or even the cubicle thing. It’s just way more open and connected than that.

41:34

Yeah, some of the best conversations we have in our group come from just spinning around in your chair and talking to the person behind you or across from you. And we do enough moves, if you will, throughout the year that you get to sit next to different people and have different interactions. So I think from a culture standpoint, from the way we work at Jane Street, we really value that close proximity to each other.

41:53

And how often do we have these desk moves?

41:55

Once a week, of varied sizes. There’s a dedicated MAC (moves, adds, and changes) team that executes the move. At times it’s hundreds of people, and it’s amazing. But it’s because the physical engineering team worked very closely with our IT teams to develop a system where you’re able to move these desks. Now, like I said, the surface and the physical desk, putting them on wheels, that’s fine, you could do that, right? But now you’ve got to think about the power and the networking and the cooling, all things we talked about earlier. Those were the challenges on this project: how do we create a modular wiring system that’s resilient, that works, that doesn’t get kicked and unplugged and stuff like that, that doesn’t pose any harm, but also can be undone once a week and plugged in somewhere else? How do we think about cooling? And we use this underfloor cooling distribution system where you’re able to move the cooling to support a user, or to cool their PC under the desk, by moving these diffusers around the floor, because of this raised floor system.

42:47

So yeah, let’s talk about how that works. What’s physically going on with the cooling there?

42:50

So what we do here, again, is use a chilled water medium in our offices, but we build these air handlers that discharge air below the floor. So in essence, you take that cold water, you blow the warm air over it and push it under the floor. We supply between 60 and 65 degrees Fahrenheit, maybe closer to 65, and you get this nice cooling effect where you’re sitting,

43:07

There’s a real floor, and then there’s space, a plenum, I guess?

43:10

Yeah, like 12 to 16 inches depending on our design.

43:13

And then a big grid of metal or something and tiles that we put on top of it?

43:17

Exactly, yep. Concrete tiles that sit there that have holes in them, various ones have holes for airflow and also cables pass through for our fiber to the end of the row.

43:25

And the air underneath is pressurized?

43:26

It’s pressurized, very low pressure, but it’s pressurized, and it gets to the extent of our floor. And as an individual, you’re able to lean over and adjust the amount of flow by rotating this diffuser. So you’re able to set your own comfort level where you sit, but also, pretty importantly, we’re able to cool the desks. The traders have pretty high-power PCs under the desk, and they’re enclosed, and we’re able to get some cold air to them. It was a design that was much better than a previous thing we did in London, which was running CO2 to these coils in the desk, which was kind of scary.

43:58

That’s a little bit more like that was a case where we’d done piping of,

44:01

Yeah, it was one of those knee-jerk reactions where the desks were getting hot, so let’s make sure we squash this problem. That was prior to my time, but it was something where I think a few other firms were doing liquid cooling or CO2 cooling to the desk. It’s an approach that’s died down at this point.

44:17

In some sense, the approach we have now is one where we want the desks to be modular. So you can literally physically come and pick one up and move it somewhere else, and someone’s setup just remains as it was. You don’t have to reconfigure their computer every time you do a move.

44:28

Yeah. That’s the key.

44:29

And that’s kind of incompatible if we’re going to do the cooling by having copper pipes carrying CO2 everywhere.

44:33

Yeah. You just couldn’t move it.

44:34

It’s just not going to work.

44:35

Yeah. And if you have overhead cooling, it’s also not great because it’s not landing exactly where the desk is landing. So we have a lot of flexibility here. But to your point, one of the main reasons for doing it is people set up their desk exactly how they like it: their keyboard, their mouse, their monitors. You come to Jane Street, you get a desk, and that’s a desk that stays with you and moves around with you. So when you come in the next day after a move, besides being in a different spot on the floor, you feel exactly the same as you did the day before.

44:59

I wonder if this sounds perverse to people: ah, there’s a move every week. It’s worth saying it’s not like any individual moves every week, but somebody is moving every week, and there are major moves that significantly reorganize the floors, and which teams work where, at least once, probably twice a year.

45:15

That’s right. Making room for interns.

45:18

Right. Some of it’s ordinary growth, some of it’s interns. And I guess another thing that I think is important: we in part do it because we value the proximity. And so as we grow, we at every stage want to find the optimal set of adjacencies we can build so teams can more easily collaborate. And there’s also just some value in mixing things up periodically. I think that’s true on a personal level. If you change who you sit next to, even just by a few feet, it can change the rate of collaboration by a lot. And it’s also really true between teams. At some point, the tools and compilers team used to not work very much with the research and trading tools team, and then research and trading tools grew a Python infrastructure team and suddenly there was a need for them to collaborate a lot. And we ended up putting the teams next to each other for a few months, and then six or twelve months later, when we had to do the next move, we decided, oh, that adjacency was now less critical and other things were more important, and we did it in other ways.

46:09

It lowers the bar for asking for these moves if we know we can kind of revert it. It allows us to take more chances and put teams closer together, see how the collaboration works. I think it’s done wonders for our culture, being able to have maybe tenured folks next to new joiners to allow them to learn a little bit faster. I think it’s been great for our team as well.

46:27

And even though a lot of engineering has gone to make it easy, one shouldn’t understate the fact that it’s actually a lot of work and the team that does these moves works incredibly hard to make them happen, and they happen really reliably and in a timely way. It’s very impressive. Did you have to do anything special with the actual physical desks to make this work?

46:44

Yeah, we worked closely with some of the manufacturers to come up with a Jane Street standard desk: figuring out exactly where our cable tray would land for the power and the networking, using the end-of-row switches that we have, opening perforations for airflow to flow nicely through the desk, and putting lockable wheels on the desk to allow us to wheel them around pretty carefully. And we did this globally too. We had to pick a standard to use, so we built them to a metric standard and we’ve shipped them all over the world. So we have one style of desk that we use globally at Jane Street, and we’re able to move it to different locations. We had to find a manufacturer that would meet all those needs: the shape, the size, fitting our PCs, having the monitor arms that we like, having the raise-lower feature, having a pathway for our power and data to flow. So there were a few different things that we had to factor in there, but once we got a design that we were happy with, we were able to deploy it pretty rapidly.

47:41

Actually, how do the power and data connections work? I imagine you have wires internally in the desk, but how do they connect from desk to desk? What’s the story there?

47:49

Yeah, so under this 12 to 16 inch raised floor, we have these power module boxes where you gang together a lot of circuits, and then you have these module plugs that plug in. So we’ll use an electrician to come in and plug them in underneath the floor. We’ll lift a floor tile, which is very easy to do. And then we have these predetermined whips; depending on what position the desk is in, they’re fixed lengths, or we can shorten them if we need to. You run these whips out to the end of the row, where we have something called a hub, basically a pass-through for these wires to come up from the floor and run along the back of the desk in a nice cable tray.

On the networking side, we ran into a design constraint: at some point you’re just running copper from your IDF rooms, from your network switches, out to the desk, but you end up with these giant bundles of copper. Obviously they have a distance limitation, but they’ve also gotten so large over time that they would block the airflow under the floor. So now we’re like, okay, well, here’s a new constraint. So then we started designing around bringing fiber, and this was a while ago that we decided this, bringing fiber to the end of the row and housing our network switches in custom enclosures at the end of the row that bring power and cooling. We cool our switches out there with the same underfloor cooling that we use to cool people. So now we have these very small fibers that don’t block the airflow and land at a switch, and the copper stays above the floor behind the desk.

49:06

Got it. So instead of a top of rack switch, you have an end of row switch.

49:09

That’s right. We like to joke that our offices feel a lot like data centers just stretched out a little bit with people in them.

49:15

So, other than this physical arrangement of the desks, what are other things that we do in these spaces to make them better places for people to work and talk and communicate and collaborate in?

49:25

Yeah, that’s a great question. I mean, I think one of the things that we try to do as a group is really talk to our coworkers and understand what they need and what they want. One thing we’ve done is our lighting system; we spend a lot of time thinking about the quality of lighting. We have circadian rhythm lighting, which changes color temperature throughout the day to match your circadian rhythm: you come in in the morning and it’s nice and warm, which allows you to grab a cup of coffee, warm up, get ready for the day; it peaks at a cooler temperature midday, after lunch, and then fades back at the end of the day. So that’s something that we think is pretty cool, and something we’ve been doing globally for a while now.
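
To make “changes color temperature throughout the day” concrete, here is a minimal sketch of the kind of schedule function such a system might follow. The breakpoints and Kelvin values are illustrative assumptions, not the settings Dan’s team actually uses.

```python
# Illustrative circadian lighting schedule: warm in the morning, coolest in the
# early afternoon, fading back to warm in the evening. All numbers are assumed.

BREAKPOINTS = [
    (7.0, 2700),   # early morning: warm
    (13.0, 4500),  # early afternoon: peak cool color temperature
    (19.0, 2700),  # evening: fade back to warm
]

def color_temperature_kelvin(hour: float) -> int:
    """Linearly interpolate a target color temperature for a given hour (0-24)."""
    if hour <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]
    if hour >= BREAKPOINTS[-1][0]:
        return BREAKPOINTS[-1][1]
    for (h0, k0), (h1, k1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if h0 <= hour <= h1:
            frac = (hour - h0) / (h1 - h0)
            return round(k0 + frac * (k1 - k0))
    return BREAKPOINTS[-1][1]

if __name__ == "__main__":
    for hour in (8, 10, 13, 16, 19):
        print(f"{hour:02d}:00 -> {color_temperature_kelvin(hour)} K")
```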

49:55

How do we know if that actually works? Obviously you can tell if the color temperature is changing in the way that’s expected, but how do you know if it has the effect on people that you think it does?

50:03

Yeah, that’s a good question. I mean, I think the only way is to talk to them, and the folks that we’ve asked about it feel pretty good about the effect it has. Speaking for myself, I know coming in in the morning to something like a 4,000 K color temperature is just harsh, and coming in at 2,700 to 3,000 K feels a little easier to adapt to.

50:22

Is there also outside world research that validates this stuff?

50:24

Yeah. I don’t know that any of it ties to performance, but there is logic as to why the color temperature at different periods of the day has an energizing or relaxing effect on you. But once you design the system and build it, we have complete control over it. We can do things like have it follow the circadian rhythm, or we can pick one color that we think everyone likes and say, all right, that’s going to be the color from now on. So by designing it and building it with this functionality, we’re able to, on the software side, make changes as we need to.

50:51

Okay. So color is one thing. What else do we do?

50:53

Yeah, I think we touched on the cooling, and I think the underfloor cooling is another example of where we think about thermal comfort and give people the ability to adjust temperature at their desk. But also, the fact that we’re cooling from under the floor keeps that air very close to the breathing zone. That air comes out of the floor, comes up five or six feet, and it’s as fresh as it could be right at the desk. We’re mixing in outside air, sending it out, and letting you consume it when it comes out of the floor. The other thing it allows us to do: by keeping a smaller delta T, we move a lot more volume, and by moving a lot more volume, we have more air changes, so you’re getting more fresh air. We use something called MERV 16 filters, hospital surgical-grade filtration, to clean our air, and because we’re moving roughly twice the volume you normally would, we’re filtering at twice the rate, which keeps the air very fresh at the breathing zone where people are working.
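
The delta-T-versus-volume tradeoff Dan mentions follows from the standard sensible-heat relationship: for a fixed cooling load, the required airflow scales inversely with the supply-to-room temperature difference. Here is a back-of-envelope sketch; the load, temperatures, and floor dimensions are made-up illustrative numbers, not measurements from an actual trading floor.

```python
# For a fixed sensible load, a warmer (underfloor) supply means a smaller delta T,
# which means more airflow and therefore more air changes per hour.

SENSIBLE_HEAT_FACTOR = 1.08  # BTU/hr per (CFM * degF), standard-air approximation

def required_cfm(load_btu_per_hr: float, delta_t_f: float) -> float:
    """Airflow (cubic feet per minute) needed to remove a sensible load."""
    return load_btu_per_hr / (SENSIBLE_HEAT_FACTOR * delta_t_f)

def air_changes_per_hour(cfm: float, floor_area_sqft: float, ceiling_height_ft: float) -> float:
    return cfm * 60.0 / (floor_area_sqft * ceiling_height_ft)

if __name__ == "__main__":
    load = 500_000                                   # BTU/hr, assumed load for a section of floor
    room, overhead, underfloor = 75.0, 55.0, 63.0    # degF, assumed supply temperatures

    for label, supply in [("overhead, colder supply", overhead),
                          ("underfloor, warmer supply", underfloor)]:
        cfm = required_cfm(load, room - supply)
        ach = air_changes_per_hour(cfm, floor_area_sqft=20_000, ceiling_height_ft=14)
        print(f"{label}: {cfm:,.0f} CFM, ~{ach:.1f} air changes/hour")
```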

51:44

Actually, this reminds me, there’s one topic I know we’ve talked about a bunch over the years, which is CO2.

51:49

Yeah.

51:49

What’s the story with thinking about the amount of CO2 in the air and what have we done about that?

51:54

Yeah, there have been reports of varying rigor on performance versus CO2 concentration.

51:57

Human performance?

51:58

Human performance, yes, yes. And it’s hard to tell exactly the impact, but there does seem to be enough evidence that it affects folks.

52:08

And at high levels of CO2, you get kind of dumb, roughly?

52:12

Yeah, that’s right. That’s roughly correct. Yeah.

52:14

What are those levels in parts per million? What’s totally good? Where do you start getting nervous?

52:17

I think you start getting nervous above 1,500 to 2,000 parts per million. Outside is probably around 400 parts per million. Depending on where you measure inside, you’ll see anywhere between 750 and 1,200; it just really depends. And for our trading floors, people are close together, there are lots of people, and CO2 is driven by people.

52:36

People are exhaling next to each other.

52:37

Yeah, people are breathing. So we’ve done a couple of things here. First, you kind of start with the monitoring; you’ve got to see what the problem is. So we’ve done a lot of air quality monitoring throughout our space to measure various things. We publish the readings internally, and you’re able to see what the data is. But then we’ve done other things: we’ve brought in more outside air, mixed it in to try to dilute the CO2 with fresh air while exhausting some of the stale air. And we’ve also been testing CO2 scrubbers, things that were used on spacecraft. Those are challenging at the volumes that we’re talking about. We have large double-height trading floors, hundreds of thousands of square feet. It’s very hard to extract all of that. But these are things that the team is looking at, testing, and planning.

53:18

But wait, we’ve gotten to the whole space-age CO2 scrubbers thing. Why isn’t mixing in outside air just the end of the story, so that makes you happy and you can stop there?

53:26

Yeah. If you want to get down to 500, 600, 700 parts per million, starting at 400 parts per million outside, the amount of volume that you need to bring in is a challenge. Moving that much outside air into the space becomes very difficult. One, from just a space standpoint: ductwork, getting all that air into an existing building. But also the energy it takes, whether it’s heating that air on the coldest day or cooling it on the warmest day. Typical air conditioning systems recycle inside air to allow more efficient cooling, so you’re not bringing in the warmest air on the design day and cooling it down; it just takes a tremendous amount of energy. So it’s a mix of bringing in more outside air, thinking about things like scrubbers, trying to find the balance there, and moving the air where you need it, when you need it: if you have a class, moving the air to the classroom; if it’s the trading desk, moving the air to the trading desk. Moving the air where you need it is also an approach that we look at.
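
A rough steady-state dilution estimate shows why the last few hundred ppm are so expensive in outside air. The per-person CO2 generation rate below is a commonly used office-work approximation, and the headcount is an assumed number, not a real trading-floor figure.

```python
# Steady-state mass balance: generation = outside_air * (indoor - outdoor) * 1e-6.
# The closer the target gets to the outdoor level, the more air you need.

CO2_GEN_CFM_PER_PERSON = 0.0105  # CFM of CO2 per seated/light-activity adult (approx.)
OUTDOOR_PPM = 400

def outside_air_cfm_per_person(target_ppm: float) -> float:
    """Outside air needed, per person, to hold a target indoor CO2 at steady state."""
    delta = (target_ppm - OUTDOOR_PPM) * 1e-6
    return CO2_GEN_CFM_PER_PERSON / delta

if __name__ == "__main__":
    for target in (1200, 1000, 800, 600, 500):
        per_person = outside_air_cfm_per_person(target)
        print(f"target {target} ppm: ~{per_person:,.0f} CFM outside air per person, "
              f"~{per_person * 500:,.0f} CFM for an assumed 500-person floor")
```

The required flow grows roughly hyperbolically as the target approaches the outdoor level, which is the crux of the tradeoff Dan describes.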

54:18

That sounds super hard though. Jane Street is not a place where people pre-announce all the things they’re going to do, right? There’s a lot of random, oh, let’s go to that classroom and do a thing.

54:26

Well, looking at the sensors and seeing the CO2 climb and being able to move dampers around and redirect air based on sensor input.

54:33

Is that a thing that we have that’s fully automated, or do we have people who are paying attention, notice things are happening?

54:37

I think it’s a little bit of both. I mean, we can make it fully automated, but I think it’s important to have a human looking at it to make sure we’re sending the air to the right place. If you have large congregations in different areas, you can get fooled as to where you should send the air, and you have to think about that. So it’s not something we’re doing as a fully automated thing. It’s something we’re aware of, and we’re able to make tweaks and adjustments.
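
As a sketch of the “redirect air based on sensor input” idea, here is a toy proportional adjustment over hypothetical zone sensors and dampers. The zone names, target, gain, and interfaces are placeholders; the actual building-management system and its control logic aren’t described in the episode.

```python
# Read CO2 per zone and bias damper positions toward the zones reading highest,
# with a human reviewing the result rather than running fully closed-loop.

from dataclasses import dataclass

@dataclass
class Zone:
    name: str
    co2_ppm: float             # latest sensor reading
    damper_percent: float = 50.0

TARGET_PPM = 900
MIN_OPEN, MAX_OPEN = 20.0, 100.0
GAIN = 0.1  # percent of damper travel per ppm of error; assumed tuning value

def adjust_dampers(zones: list[Zone]) -> None:
    """Simple proportional adjustment of each zone's damper toward the target."""
    for zone in zones:
        error = zone.co2_ppm - TARGET_PPM
        proposed = zone.damper_percent + GAIN * error
        zone.damper_percent = max(MIN_OPEN, min(MAX_OPEN, proposed))

if __name__ == "__main__":
    floor = [Zone("trading-north", 1150), Zone("classroom-3", 1400), Zone("dev-east", 750)]
    adjust_dampers(floor)
    for zone in floor:
        print(f"{zone.name}: {zone.co2_ppm:.0f} ppm -> damper {zone.damper_percent:.0f}% open")
```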

54:55

Back to the space-age thing. Let’s say we wanted to try and run these scrubbers. What are the practical impediments there?

55:00

So I think the chemical process of pulling the CO2 from the air: the material that’s used in these scrubbers gets saturated with CO2 over time, proportional again to the amount of CO2 in the air. And the way you release that CO2 from the material is by burning it off with heat. So now you have a situation where it captures a bunch of CO2, stores it, gets saturated, stops being effective, and now you have to discharge it. So not only do you need the power to burn that off, but you also have to be able to duct that CO2-laden air out of the space. So it’s a physical space challenge. These things get large, they’re power hungry, and you have to have a path to get the air outside.

55:42

Is it clear that the CO2 scrubbers would be net more efficient than just pulling in the outside air at the level that you’d need?

55:48

It’s not clear. I think we’re still analyzing it. If you think about the power consumption and space required, you can make arguments both ways. So I think the outside air is the more tried-and-true approach, and we’ve increased it pretty significantly over time; we’re going to keep doing that and looking at it. There are many people in the industry looking at increasing outside air as a function of indoor air quality, but for many years it’s been frowned upon because of the energy that it consumes. So you have to balance that.
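
One way to see why the answer “isn’t clear” is a back-of-envelope comparison with heavily assumed inputs (CO2 output per person, sorbent regeneration energy, design-day conditions, chiller COP). It also mixes thermal and electric energy and ignores latent load and fan power, so it only shows that the two options can land in the same ballpark, not which one wins.

```python
# Rough order-of-magnitude comparison; every number here is an assumption for
# illustration, not data from the episode.

PEOPLE = 500
CO2_KG_PER_PERSON_HR = 0.035        # seated adult, rough figure

# Option 1: thermally regenerated scrubber
REGEN_MJ_PER_KG_CO2 = 4.0           # assumed sorbent regeneration energy
scrubber_thermal_kw = PEOPLE * CO2_KG_PER_PERSON_HR * REGEN_MJ_PER_KG_CO2 * 1000 / 3600

# Option 2: extra outside air (e.g., enough to move from ~1000 ppm toward ~800 ppm)
EXTRA_CFM_PER_PERSON = 9.0          # from a dilution estimate like the one above
DESIGN_DELTA_T_F = 20.0             # outdoor-to-supply temperature difference, assumed
CHILLER_COP = 3.0
extra_cfm = PEOPLE * EXTRA_CFM_PER_PERSON
cooling_btu_hr = 1.08 * extra_cfm * DESIGN_DELTA_T_F        # sensible cooling only
outside_air_electric_kw = cooling_btu_hr / 3412 / CHILLER_COP

print(f"Scrubber regeneration: ~{scrubber_thermal_kw:.0f} kW thermal")
print(f"Extra outside air (sensible cooling only): ~{outside_air_electric_kw:.0f} kW electric")
```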

56:15

So one thing that’s striking to me about this whole conversation is just the diversity of different kinds of issues that you guys have to think about. How do you think about hiring people for this space who can deal with all of these different kinds of complicated issues and also integrate well into the kind of work that we do here?

56:31

Yeah, it’s interesting. I think, first of all, many people don’t think of Jane Street right away when talking about physical engineering: mechanical, electrical, architecture, construction, project management. So part of it is explaining to them the level of detail we think about these things in.

56:45

Right, that there’s an interesting version of the problem here.

56:47

Absolutely. And why it matters for our business is very important. For the right person, they want to be impactful to a business. For many people who work in the physical engineering world, you’re there to support a business, but you don’t always see the direct impact of your work. And here, I feel like we get to see the direct impact. I get to talk to you and hear about how desk moves help your team, or how our data center design being flexible allows us to put machines where we need them, when we need them, or hear the feedback we get from our interns and new joiners about the food and the lighting and the space and all the things that we build. Those things go a long way in helping people here on the team understand the impact that they’re having. And for people who get to work with us, it only takes a few meetings to see how much we care about these details and how deep we’re willing and able to go on these topics.

57:36

And to what degree, when we’re looking to grow the team, are we looking for people who are industry veterans who know a lot already about the physical building industry and to what degree are we getting people out of school?

57:46

We just started an internship, so that’s really exciting for us. And I think it’s a blend of the two. We really value people with experience, but we also feel very confident in our ability to teach. And if we bring someone in with the right mindset, a willingness to learn, and a wide net of knowledge, I think they’re very successful here at Jane Street, because you come in without these preconceived notions of how things are done and you’re able to challenge the status quo. You’re able to say, hey, these desks don’t work the right way, we want to move them around. Or, hey, we need to bring liquid cooling to a data center, which is something that’s very much on the cutting edge now. We want people who are excited by those problems, excited by looking at them through a different lens.

58:29

Awesome. Alright, well, maybe let’s end it there. Thanks so much for joining me. You’ll find a complete transcript of the episode, along with show notes and links, at signalsandthreads.com. Thanks for joining us. See you next time.
