Cumulus Linux: A Switch That Paid Off
By a common definition, IaaS offerings consist of compute, storage, and network. While the first two areas often receive more attention, we are devoting two articles to our new switching infrastructure based on Cumulus Linux. In this first part, we look at why the network is so important at cloudscale.ch, which advantages led us to the solution we use today, and how the new switching fabric affects our infrastructure as a whole.
The switching infrastructure as a key element
In everyday IT life, the network often receives little attention: once the systems are wired, the focus tends to shift to computing power and storage space for years to come. At cloudscale.ch, on the other hand, the topic is constantly present. Not only does the connection of our cloud servers to the Internet, and thus the external availability of your services, depend on the network; with the trend towards microservices and cluster setups, internal networking between cloud servers is also vital for the performance and reliability of the overall system. And finally, our Ceph-based storage cluster can only fully leverage its advantages with a fast and reliable network infrastructure.
In addition to our general growth and the increasing demand for switch ports, the opening of our second cloud site in Lupfig also had an impact on our choice. While the two locations should be able to operate independently of each other and thus enable geo-redundant setups, connections between services at both locations should, at the same time, be as direct as possible. One of the things we liked about Cumulus Linux was that most of our requirements could be implemented using open standards and without the need for proprietary protocols from any particular vendor.
How the solution with Cumulus Linux stands out
The "Cumulus Linux" distribution maintained by Cumulus Networks is a novelty among network operating systems. Unlike the systems of traditional network vendors, it is based on Debian GNU/Linux and is for the most part open source. In order to ensure the stability and security required in an enterprise setting, certain versions are maintained as "ESR" (Extended Support Release) versions and provided with security updates over a long period of time – a strategy known from Ubuntu's LTS versions that is also being adopted in an increasing number of other software projects. One of the components of Cumulus Linux is FRRouting, which is maintained under the umbrella of the Linux Foundation and which we are already using successfully on our border routers.
We spent considerably more time implementing our new switching infrastructure than we had originally planned. Over the course of several releases, we gained experience with Cumulus Linux in our lab and fed our insights back into the network design in many iterations. We also benefited from the community that has formed around Cumulus Linux. There is, for example, a dedicated Slack channel where you can pick up tips and tricks from other Cumulus users; if an issue cannot be resolved this way in a timely manner, Cumulus's own engineers often join the discussion and actively offer their help. Working directly with Cumulus Networks has also proved to be open and productive. Where we actually found bugs, they were carefully analyzed and fixed – including patches that often found their way back "upstream" into the individual open source projects.
Key technical specs and tangible advantages
Cumulus Linux is a network operating system that can be used on devices from a wide variety of manufacturers. The "Cumulus Express" combination that we chose includes hardware from Edgecore. Our switches feature a Broadcom Trident 3 ASIC that supports line-rate switching on all 32 of its 100 Gbps ports, for a total of 3.2 Tbps. In addition, the "breakout" option allows any of the 100 Gbps ports to be split into 4 logical ports of 10 or 25 Gbps each, which provides even more flexibility with regard to the systems that can be connected.
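To make the port arithmetic concrete, here is a minimal Python sketch that reproduces the figures above: 32 ports at 100 Gbps and a four-way breakout at 10 or 25 Gbps. The class `SwitchSpec` and its method names are hypothetical and chosen purely for illustration; they are not part of Cumulus Linux or any vendor tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchSpec:
    """Illustrative model of one switch (names are hypothetical)."""
    ports: int = 32              # physical 100 Gbps ports, per the article
    port_speed_gbps: int = 100   # speed of one physical port
    breakout_factor: int = 4     # logical ports per broken-out physical port

    def total_capacity_tbps(self) -> float:
        """Aggregate line-rate capacity across all physical ports."""
        return self.ports * self.port_speed_gbps / 1000

    def logical_ports(self, broken_out: int) -> int:
        """Usable ports when `broken_out` physical ports are split four ways."""
        if not 0 <= broken_out <= self.ports:
            raise ValueError("broken_out must be between 0 and the port count")
        return (self.ports - broken_out) + broken_out * self.breakout_factor

    def breakout_bandwidth_gbps(self, lane_speed_gbps: int) -> int:
        """Bandwidth used on one physical port split into 10 or 25 Gbps lanes."""
        if lane_speed_gbps not in (10, 25):
            raise ValueError("the article mentions 10 or 25 Gbps lanes only")
        return self.breakout_factor * lane_speed_gbps


switch = SwitchSpec()
print(switch.total_capacity_tbps())        # aggregate capacity in Tbps
print(switch.logical_ports(broken_out=8))  # 24 unsplit ports + 32 logical ports
print(switch.breakout_bandwidth_gbps(25))  # 4 x 25 Gbps fills the 100 Gbps port
```

Note that a 4 x 10 Gbps breakout uses only 40 Gbps of a port's 100 Gbps, so breakout trades peak bandwidth for a larger number of connected systems.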
We have built our network following the leaf-spine concept, whereby each switch is configured in a redundant manner. All connections are redundant as well: a leaf pair (two "top-of-rack" switches) is connected to the two spines at a total of 800 Gbps. In addition to the multi-100-Gbps networking of our backbone, the hardware used also allows a gradual transition to a multi-25-Gbps connection for individual physical servers. This not only benefits the private networks between our customers' virtual servers, but also connectivity to the storage cluster. Finally, the dedicated connection between our two cloud locations is also well dimensioned at multi-10-Gbps via CWDM on route-redundant dark fibers.
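The effect of this redundancy on the leaf-to-spine uplinks can be sketched in a few lines of Python. The article only states the 800 Gbps total for a leaf pair; how that total is distributed is an assumption here, namely that each leaf connects to each spine with the same number of 100 Gbps links, which works out to two links per leaf-spine pair. The switch names and the function are hypothetical.

```python
from itertools import product

LEAVES = ("leaf1", "leaf2")    # one top-of-rack pair (names are hypothetical)
SPINES = ("spine1", "spine2")  # the two spines
LINK_GBPS = 100                # speed of a single uplink
LINKS_PER_PAIR = 2             # ASSUMPTION: 800 Gbps / (2 leaves * 2 spines) / 100 Gbps


def uplink_capacity_gbps(failed=frozenset()):
    """Sum the uplink bandwidth of all leaf-spine pairs that are still up.

    `failed` is a set of switch names considered down; a link counts only
    if both of its endpoints are alive.
    """
    total = 0
    for leaf, spine in product(LEAVES, SPINES):
        if leaf in failed or spine in failed:
            continue
        total += LINKS_PER_PAIR * LINK_GBPS
    return total


print(uplink_capacity_gbps())                    # all switches healthy
print(uplink_capacity_gbps({"spine1"}))          # one spine down
print(uplink_capacity_gbps({"leaf1", "spine1"})) # one leaf and one spine down
```

Under this assumption, losing any single switch halves the uplink capacity of the affected leaf pair but never cuts it off entirely, which is the point of wiring every leaf to both spines.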
At cloudscale.ch, we are aware that the network is the basis for all other features of the cloud. Accordingly, we place a strong emphasis on the performance, reliability, and support of all components used. The atypical approach of Cumulus Linux offers even more advantages, however, which we will describe in a separate article, including how we at cloudscale.ch benefit from synergies between network engineering and system engineering.
Lightning fast,
Your cloudscale.ch team