Cloudflare DDoSed itself with React useEffect hook blunder


Cloudflare has confessed to a coding error involving a React useEffect hook, a feature notorious for causing problems if not handled carefully. The error caused an outage for the platform's dashboard and many of its APIs.

The outage was on September 12, lasted for over an hour, and was triggered by a bug in the dashboard, which caused "repeated, unnecessary calls to the Tenant Service API," according to VP of engineering Tom Lianza. This API is part of the API request authorization logic, so its failure affected other APIs as well.

The cause was hard to troubleshoot since the apparent issue was with the API availability, disguising the fact that it was the dashboard that was overloading it.

Lianza said the core issue was a React useEffect hook with a "problematic object in its dependency array." The useEffect hook is a function that takes two parameters: a setup function, which may optionally return a cleanup function, and an optional list of dependencies. The setup function runs after the component first renders, and again after any later render in which a dependency has changed.
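To illustrate why an object in a dependency array can be troublesome, here is a minimal sketch of the comparison React documents for dependencies, which uses Object.is on each entry. It is a simplified model rather than React's source code, and the helper name depsChanged and the sample values are hypothetical.

```typescript
// Simplified model of the dependency check; not React's actual implementation.
// `depsChanged` and `DependencyList` are hypothetical names used for illustration.
type DependencyList = ReadonlyArray<unknown>;

function depsChanged(prev: DependencyList | undefined, next: DependencyList): boolean {
  if (prev === undefined) return true; // first render: the effect always runs
  if (prev.length !== next.length) return true;
  // Object.is compares primitives by value but objects by identity.
  return next.some((value, i) => !Object.is(value, prev[i]));
}

// Primitives with equal values count as unchanged.
const primitiveSame = depsChanged(["acct-1", 10], ["acct-1", 10]);

// Two object literals with identical contents are still different objects.
const freshObject = depsChanged([{ id: "acct-1" }], [{ id: "acct-1" }]);

// The same object reference counts as unchanged.
const stable = { id: "acct-1" };
const sameReference = depsChanged([stable], [stable]);

console.log({ primitiveSame, freshObject, sameReference });
```

Under this model, only the freshly created object registers as a change, even though its contents are the same as before.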

In Cloudflare's case, the setup function made calls to the Tenant Service API, and one of the dependencies was an object that was "recreated on every state or prop change." React compares each dependency with its previous value using Object.is, which checks objects by identity rather than by contents, so a newly created object always counts as changed. The consequence was that the effect ran repeatedly during a single render of the dashboard, when it was only intended to run once. It ran so often that the API was overloaded, causing the outage.
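The pattern can be sketched with a toy simulation that counts how often an effect would fire across a series of re-renders. This is not Cloudflare's code and does not use React itself. The function countEffectRuns, the tenantId value, and the choice of 50 renders are all assumptions made for illustration, and depending on a primitive field is only one possible remedy, not necessarily the fix Cloudflare applied.

```typescript
// Toy simulation of effect re-runs; all names and values here are hypothetical.
function countEffectRuns(
  renders: number,
  makeDeps: () => ReadonlyArray<unknown>,
): number {
  let previous: ReadonlyArray<unknown> | undefined;
  let runs = 0;
  for (let i = 0; i < renders; i++) {
    const current = makeDeps(); // dependencies as computed during this render
    const changed =
      previous === undefined ||
      current.length !== previous.length ||
      current.some((dep, idx) => !Object.is(dep, previous![idx]));
    if (changed) runs++; // stands in for the API call made inside the effect
    previous = current;
  }
  return runs;
}

const tenantId = "tenant-123"; // hypothetical identifier

// Problematic pattern: a new object is built on every render.
const unstableRuns = countEffectRuns(50, () => [{ tenantId }]);

// One possible remedy: depend on the primitive value instead of a wrapper object.
const stableRuns = countEffectRuns(50, () => [tenantId]);

console.log({ unstableRuns, stableRuns });
```

In the simulation the recreated object triggers the effect on every one of the 50 renders, while the primitive dependency triggers it only on the first.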

The useEffect hook is powerful but often overused. The documentation is full of warnings about misuse and common errors, and encouragement to use other approaches where possible. Performance pitfalls with useEffect are common.

The incident triggered a discussion in the community about the pros and cons of useEffect. One developer said on Reddit there were too many complaints about useEffect, that it is an essential part of React, and "the idea that it is a bad thing to use is just silly." Another reaction, though, was "the message has not yet been received. Nearly everyone I know continues to put tons of useEffects everywhere for no reason."

Another remarked: "the real problem is the API going down by excessive API calls... in a company that had dedicated services to prevent DDoS [Distributed Denial of Service]."

Lianza said the Tenant Service had not been allocated sufficient capacity to "handle spikes in load like this" and more resources have now been allocated to it, along with improved monitoring. In addition, new information has been added to API calls from the dashboard to distinguish retries from new requests, since if the team had known that it was seeing "a large volume of new requests, it would have made it easier to identify the issue as a loop in the dashboard." ®
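Cloudflare has not published the fields it added, but the idea of tagging retries can be sketched as follows. The header names, the RequestMeta type, and the buildHeaders function are hypothetical and exist only to show how a server-side team could tell a flood of new requests from a flood of retries.

```typescript
// Hypothetical sketch; these header names are NOT Cloudflare's actual fields.
interface RequestMeta {
  requestId: string; // stays the same across retries of one logical request
  attempt: number; // 1 for a new request, greater than 1 for a retry
}

function buildHeaders(meta: RequestMeta): Record<string, string> {
  return {
    "X-Request-Id": meta.requestId,
    "X-Request-Attempt": String(meta.attempt),
    "X-Request-Kind": meta.attempt > 1 ? "retry" : "new",
  };
}

const first = buildHeaders({ requestId: "req-1", attempt: 1 });
const retried = buildHeaders({ requestId: "req-1", attempt: 3 });

console.log(first, retried);
```

With metadata of this kind in the logs, a spike made up almost entirely of requests marked "new" would point towards a client-side loop rather than a retry storm.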
