InfiniCortex project – breaking the walls of supercomputing centres
InfiniBand (IB) is a computer networking communications standard featuring very high throughput and low latency. It is often used within high performance computing (HPC) data centres and supercomputer facilities, and until recently its use was restricted to the boundaries of those facilities. However, a demonstration at the recent SC15 event showed the technology running across intercontinental distances, an unprecedented way of connecting these facilities together.
Compared with the transmission control protocol (TCP), IB can sustain higher throughput on long-distance flows and allows heavily parallel computational jobs to be distributed across several facilities, thanks to the remote direct memory access (RDMA) capabilities of the IB protocol and the message passing interface (MPI) libraries that run over it. Furthermore, IB can be encapsulated in other protocols, as in the InfiniCortex project, which uses Ethernet as the carrier protocol to transport IB. Yet despite being the world's most popular supercomputer interconnect and well known as a data centre fabric, IB is to date virtually unknown in the internet community.
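To make the long-distance limitation of TCP concrete, the short sketch below works through the standard bandwidth-delay arithmetic: a single flow can carry at most one window of unacknowledged data per round trip. The 200 ms round-trip time is an illustrative assumption for an intercontinental path, not a figure measured in the InfiniCortex project, and the function names are hypothetical. The sketch says nothing about how IB itself behaves; it only shows why window size matters so much on a long path.

```python
def bandwidth_delay_product_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of this rate full."""
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1e3)
    return bits_in_flight / 8


def window_limited_throughput_gbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on one flow's throughput: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e9


if __name__ == "__main__":
    rtt_ms = 200.0     # ASSUMPTION: illustrative intercontinental round-trip time
    link_gbps = 100.0  # link rate quoted for the Singapore-US circuit

    bdp = bandwidth_delay_product_bytes(link_gbps, rtt_ms)
    print(f"Data in flight needed to fill the link: {bdp / 1e9:.2f} GB")

    # 65535 bytes is the largest TCP window without the window scale option.
    unscaled = window_limited_throughput_gbps(65535, rtt_ms)
    print(f"Throughput bound with an unscaled window: {unscaled * 1e3:.2f} Mbit/s")
```

Under these assumptions, the gap between the two printed figures is the point: filling a long, fast path requires gigabytes of data in flight, which is far more than a default TCP window allows.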
The InfiniCortex project
Harnessing the power of this protocol requires widespread industry collaboration, and this led to the creation of the InfiniCortex project. Over the past two years, InfiniCortex has clearly demonstrated that IB can perform over transcontinental distances, exploiting this technology to create a “Galaxy of Supercomputers”: a worldwide IB network spanning sites across Asia, Europe and North America. The term was coined by Marek Michalewicz and Yuefan Deng, whose research focuses on mathematically optimal network topologies for supercomputers.
Initiated and led by A*STAR CRC (Agency for Science, Technology and Research - Computational Resource Centre in Singapore), the project achieved its first major breakthrough at SuperComputing14 (SC14), showcasing the first-ever 100G IB transcontinental connection, from Singapore to the SC14 venue in New Orleans (USA). This was made possible primarily by the support of Tata Communications, which provided the 100G transpacific link from Singapore to the US, and Obsidian Strategics, the Canadian manufacturer of the IB long-range equipment, which made a number of units available for deployment at the participating sites.
Following SC14, the project expanded to include several National and Regional Research and Education Networks (SingAREN, TEIN, GÉANT, PIONIER, RENATER, Internet2, ESnet) and end-sites (e.g. France’s University of Reims Champagne-Ardenne and Poland’s Poznan Supercomputing and Networking Centre [PSNC]). During 2015 these project partners joined forces to demonstrate IB at various exhibitions, culminating in the complete topology being showcased at SC15 in Austin. It involved around 10 sites across four continents, connected by high-speed IB links (including 100G Singapore-Austin and 30G Poznan-Austin).
This successful demo relied on:
- The newly-deployed 10G direct link between Singapore and London to connect the European sites to the A*STAR Computational Resource Centre and other sites in Singapore. This circuit, jointly funded by NSCC (National SuperComputing Centre - Singapore) and the Asia-Pacific TEIN project, has increased the total capacity between the TEIN network and GÉANT, with the latter providing hosting for A*STAR network and IB equipment in the GÉANT PoP in London Slough.
- The GÉANT 100G circuit between Paris and New York to connect PSNC to the SC15 venue in Austin, with a 30G connection.
- ‘Sub-netting’ the different world areas to build the testbed, effectively enabling IB routing.
For the first time ever, an IB fabric that circumnavigated the world was deployed, with the final important segment being the direct GÉANT-SingAREN link.
With this, the InfiniCortex project has achieved its final milestone. It was put in place by Vincenzo Capone, Business Development Officer for GÉANT, who coordinated more than a dozen partners to deliver the IB around-the-world ring demonstrated at SC15, traditionally the showcase event for the world's fastest networks.
Successful large-scale collaboration
This huge effort was made possible by the collaboration and support of several institutions and actors. Obsidian made a number of units available for deployment at several sites. Internet2 provided the inland capacity from New York to Austin for the PSNC-SC15 connection, and co-funded (together with NSCC) ACA100, the 100G transpacific link from Singapore to the USA. Before the new direct link from London was deployed, ESnet enabled the European partners to connect to A*STAR in Singapore through a temporary connection reaching from the US East coast to its PoP in Amsterdam. RENATER, PIONIER and SingAREN were instrumental in enabling their respective connecting sites to be part of this endeavour.