We are very excited to announce that HTTP/2 is available globally on the Edgecast CDN. Enabling HTTP/2 on a network as large as the Edgecast CDN was no small task. It required a tremendous amount of work across a number of teams. The endeavour of re-architecting our network to support HTTP/2 demanded such fundamental changes to our existing software stack that we want to share how and why we journeyed to where we are today.
We seriously considered multiple options for implementing HTTP/2 before arriving at our final implementation. In this post, we will discuss the options we considered and the decisions we made, why they were necessary, and what actually changed to implement HTTP/2. This is the first post in a series about how we implemented HTTP/2. Let’s start with a little background about the Edgecast CDN.
The Edgecast CDN runs on a web server we call Sailfish. Sailfish, originally a port of an open source web server, has gone through many iterations and changes over the 11 years we have been a CDN. The core of Sailfish was designed when HTTP/1.x was the industry standard for CDNs to support. In fact, binary and multiplexed application layer protocols such as HTTP/2 didn’t exist until a few years ago, so there was no way for us to have initially architected Sailfish to support a protocol like HTTP/2. Naturally, Sailfish was architected to extract every bit of performance HTTP/1.x had to offer. This approach worked well for many years, allowing us to grow as a performance-leading CDN. When implementing HTTP/2 became a real necessity, and because HTTP/2 is a fundamentally different type of protocol from HTTP/1.x, the fate and direction of Sailfish’s future became an unavoidable subject around the office. We knew that some major decisions would need to be made.
When HTTP/2 was becoming a finalized specification, we watched how other CDNs scrambled to quickly find ways to support HTTP/2. However, in their rush to be first, many CDNs opted to use shortcuts that actually reduced or even reversed the intended performance gains of HTTP/2. Two common shortcuts were:
Using an HTTP/2 terminator to convert HTTP/2 traffic to HTTP/1.1 at the point where a connection reaches the CDN edge servers.
Building a smaller, separate network to handle HTTP/2 traffic.
Both of these methods could “handle” HTTP/2 traffic, but would become problematic as HTTP/2 became more widely used. This type of implementation was meant to tick the box on Requests for Proposal (RFPs) during the sales cycle rather than serve as a long-term solution. We needed to decide how we wanted to handle HTTP/2 in both the short term and the long term. Did we want to follow other CDNs and create a short-term solution to get HTTP/2 out the door quickly, or focus our resources on building a long-term, native solution?
Our engineering and product teams reviewed our strategy at length. From those conversations came these four seriously considered options:
Terminate HTTP/2 connections and proxy those connections via HTTP/1.1 to our existing web servers (like other CDNs)
Build a separate, smaller network dedicated to handling HTTP/2 traffic (like other CDNs)
Replace our L7 load balancer and TLS terminator on our web servers to handle HTTP/2 and HTTP/1.1 connections
Build HTTP/2 into Sailfish natively
Terminate HTTP/2 and proxy via HTTP/1.1:
If we went down the path of implementing a short-term solution, it would have to involve terminating HTTP/2 connections and then feeding those connections to Sailfish over HTTP/1.1. With many of the major open source web servers supporting HTTP/2, leveraging an existing technology would certainly save us development time and allow us to deploy quickly. But there were downsides to doing this:
An HTTP/2 terminator would require partitioning our existing servers to install an additional web server, which cannibalizes our resources and capacity within a PoP.
Adding an additional HTTP/2 terminator infrastructure in front of our existing web server adds latency to all HTTP/2 requests.
Using an open source web server only gives us HTTP/2 between the client and the edge server. It doesn’t allow us to use HTTP/2 between PoPs or for requests back to the customer’s origin server. Eventually, we will want to support HTTP/2 for the entire request path from client to CDN to origin and back.
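To make the terminator shortcut concrete, here is a minimal sketch of what such a front-end might look like in NGINX: HTTP/2 and TLS are terminated at an extra layer, and requests are downgraded to HTTP/1.1 toward the existing web server. The hostname, ports, and certificate paths are hypothetical, chosen only for illustration:

```nginx
server {
    listen 443 ssl http2;              # HTTP/2 negotiated with clients via ALPN
    server_name cdn.example.com;       # illustrative hostname

    ssl_certificate     /etc/ssl/edge.crt;
    ssl_certificate_key /etc/ssl/edge.key;

    location / {
        proxy_http_version 1.1;        # multiplexing is lost past this point
        proxy_set_header Connection "";
        proxy_pass http://127.0.0.1:8080;  # existing HTTP/1.1 web server
    }
}
```

Note that everything behind `proxy_pass` is plain HTTP/1.1, which is exactly why this approach gives up HTTP/2’s benefits between the terminator, other PoPs, and the origin.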
Our takeaway: The reduced development time was not worth the losses in performance.
Build a separate network for HTTP/2:
One idea involved building out an entirely separate network specifically for HTTP/2 traffic, as other CDNs have done. We would have begun small, with a few quickly deployed nodes, and then slowly increased the capacity of the network over time as customer adoption grew. While this method required development work, the scope was smaller and the time to complete would be much shorter, since the traffic volumes would be smaller and we could control which customers would be allowed to serve traffic over the network. The downsides to doing this were the following:
We don’t have existing purpose-built networks; we build products into our existing platforms and networks. This would be a substantial deviation from our normal development and product release processes.
One of the main reasons companies choose to use CDNs is to manage large spikes of traffic. However, when CDNs build smaller, dedicated networks, their ability to manage these huge surges of traffic is diminished.
A separate network limited the number of customers we could support. We could slowly take on more customers as we added capacity, but adding capacity takes time, which would greatly limit our ability to scale quickly.
Our takeaway: While increased control and a quicker time to market were appealing, building a separate network was too much of a departure from our normal development and product release processes.
Replace our front-ends:
This idea is similar to the HTTP/2 termination discussed above. Sailfish uses a front-end/back-end architecture in which the two run independently of each other. So rather than adding an additional layer to terminate HTTP/2 connections, we could replace the Sailfish front-ends with an off-the-shelf web server, such as NGINX, that can terminate HTTP/2, HTTP/1.x, and TLS connections and proxy the requests via HTTP/1.1 to our existing back-end processes.
But there were downsides to doing this:
We have a significant operational investment in Sailfish that gives us excellent visibility into its operation in production. All of this would need to be rebuilt for NGINX.
All the improvements and optimizations we make to speed up our network, which pay dividends in both the front-end and back-end processes, would need to be ported from Sailfish to NGINX.
We would need to port our proprietary L7 load balancing logic from Sailfish to NGINX.
Fear. A large-scale transition, at the heart of the edge, to a technology in which we lack operational depth is scary. This in itself is not a reason to disqualify an option, but it pays to keep the risks in mind when weighing different options.
Our takeaway: Once again, the short-term gains were not worth the loss of so much of what made our network what it is today.
We were really only left with one option…
Build HTTP/2 into Sailfish natively:
Refactoring Sailfish required us to completely gut most of the code to handle the features of HTTP/2. We knew that this was a huge engineering effort and that it would take time to complete. We also knew that this type of effort was risky, since Sailfish is the heart of the CDN. But we thought that if we focused our development efforts on the most important features of HTTP/2 (i.e., everything other than server push), we could reduce some of the risk. Building HTTP/2 natively into Sailfish allowed:
Easier implementation of end-to-end HTTP/2 support in the future. This means supporting HTTP/2 between front-end and back-end servers, between PoPs, and for requests back to the customer’s origin server.
Additional multiplexed protocols (like QUIC) to be integrated into Sailfish much more easily in the future.
We knew that, eventually, we would have to build HTTP/2 into our server stack to truly support it. We were not interested in checking the box on an RFP to say we supported HTTP/2; we wanted to improve the way the web works. If we wanted the best performance and HTTP/2 support for the entire request life cycle, the most feasible option was to build HTTP/2 into Sailfish natively. With limited engineering resources, the effort to build a temporary solution that was only going to be scrapped later was not a good use of our team’s time.
Our takeaway: Developing HTTP/2 natively into Sailfish best matched our long-term goals and our historical methods of managing our network.
And that’s how we decided to move forward with a native integration of HTTP/2 and Sailfish. In our next post we will discuss how we went about refactoring the core of Sailfish to support HTTP/2. Stay tuned!