Universities refresh their core networks to bolster high-performance computing systems for research.
The University of Florida recently built the fastest artificial intelligence supercomputer in higher education. Now, the Gainesville, Fla.-based institution is building an ultra-high-speed network to go with it.
University researchers amass vast amounts of scientific data in fields such as medicine, astronomy and agriculture. As their data sets grow in size and number, they need a faster, higher-capacity network to keep pace, one that lets them upload and download data more quickly, move data between labs and feed it into UF’s new HiPerGator AI supercomputer for analysis.
To meet demand, UF’s IT department is building a new universitywide network that will double network speeds to 400 gigabits per second for research, says UF CIO Elias Eldayrie.
“Network speed is extremely important, especially when researchers run simulations or real-time evaluations of different data sets to draw conclusions,” he says. “Our current network meets their needs, but we expect data to double in the next five years. So, that’s what we’re planning for as we build a much faster, more reliable network.”
Many universities are upgrading their networks to support research, teaching and learning, and other campus activities. In some cases, they are upgrading from 10Gbps connections to 100Gbps or multiple 100Gbps connections.
The combination of faster networks and high-performance computing (HPC) systems enables researchers to accelerate their work as they seek new discoveries and scientific breakthroughs, from astronomers finding new insights about the universe to scientists improving crop production.
“In academia, research work requires a lot of horsepower to process and move data packets around the network,” says Will Townsend, senior analyst at Moor Insights & Strategy, a global technology analyst and advisory firm.
New Network Speeds AI Research
Today, UF operates two separate core networks: a research network with two 100-gigabit core switches providing researchers 200Gbps of throughput, and an enterprise network with two 10-gigabit core switches providing 20Gbps of throughput to the rest of campus.
The university is now consolidating both networks into one new core ring that reaches 800Gbps. Individual campus buildings will feature speeds of 1Gbps to 100Gbps, depending on users’ bandwidth needs, while network speeds for research will double to 400Gbps, Eldayrie says.
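To put those throughput figures in perspective, the short Python sketch below estimates how long an idealized bulk transfer would take at the 20Gbps, 200Gbps and 400Gbps rates mentioned above. The 100-terabyte data set size and the `transfer_seconds` helper are illustrative assumptions, not figures or tools from UF, and the calculation ignores protocol overhead, storage speed and network contention.

```python
def transfer_seconds(size_terabytes: float, link_gbps: float) -> float:
    """Ideal time in seconds to move a data set over a link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits per second)
    and a link that is fully available to the transfer.
    """
    bits = size_terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9)


# Hypothetical 100 TB data set; the size is an assumption for illustration.
DATASET_TB = 100

for gbps in (20, 200, 400):
    hours = transfer_seconds(DATASET_TB, gbps) / 3600
    print(f"{gbps:>3} Gbps: about {hours:.2f} hours")
```

Under these assumptions, transfer time falls in direct proportion to link speed, so doubling the research network's throughput halves the best-case time to move the same data set.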
UF is standardizing on Arista Networks’ networking equipment because of its scalability, built-in automation and security features, price and performance, smaller form factor, and energy efficiency, he says.
“We are moving away from a chassis-based platform to a pizza box style for space and power efficiency and ease of scale,” Eldayrie says. “As a result, we will have a more economical and efficient network, and through automation features, reduce the amount of support time.”
The faster network connectivity will enable researchers to take full advantage of the new $70 million HiPerGator AI supercomputer. The university wants to drive innovation with cutting-edge AI research and prepare the next-generation workforce with necessary AI skills, Eldayrie says.
In fact, the university is currently integrating AI across its curricula and hiring 100 additional faculty members focused on AI. “HiPerGator AI helps amplify and accelerate our work in AI,” he says.
HiPerGator AI, launched in early 2021, is a turnkey AI supercomputer made up of more than 140 Nvidia DGX servers, each with 2.5 petabytes of flash storage. The new AI supercomputer, which is paired with UF’s existing general purpose HiPerGator 3.0 supercomputer, delivers a combined 700 petaflops of performance.
However, data-intensive workloads can easily overwhelm the network and create a bottleneck that can slow performance, which is why building a new high-speed network is critical, Eldayrie says.
The faster network, which is expected to go live during the spring 2022 semester, will optimize supercomputer performance and facilitate research.
“As data grows, the network has to grow with it,” he says.
Adjusting Network Bandwidth on the Fly
The University of Michigan, in Ann Arbor, Mich., is also building a new core network that will provide up to a tenfold performance increase for research, teaching and learning, and other campus activities, including access to research archives, says Ravi Pendse, UM’s CIO and vice president for IT.
Researchers, faculty and staff all have different bandwidth needs. Users currently with 1Gbps or 10Gbps will be able to get 100Gbps, while users with 100Gbps will be able to get several hundred gigabits per second, he says.
“Our new architecture supports diverse needs and takes care of our brilliant scientists and our outstanding students and staff,” Pendse says. “Before, we were limited in how many 100-gig connections we could provide. Now, we have the flexibility to scale as needs increase.”
UM’s IT team has laid new fiber across campus and is using software-defined networking concepts via configurable fabrics to build the new core network. Through software-controllable network gear and homegrown automation software, the IT staff can provision network resources and scale up and down as researchers and other users require, he says.
For example, researchers who typically require 10Gbps speeds at their labs may need 100Gbps speeds for six months. IT administrators can remotely configure the higher speeds.
“After six months, when the researcher doesn’t need the bandwidth anymore, we can virtually deprovision that link down and give the bandwidth to somebody else who needs it,” Pendse says.
The network provisioning process now takes anywhere from a few minutes to a few hours, which is much faster than in the past, when it could take up to six months to plan and deploy new cabling and networking equipment, he says.
UM began the $10.1 million project in the fall of 2019 and plans to go live with the new network in late spring 2022. Once complete, it will have a huge impact on researchers, Pendse says.
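The idea of scaling a lab's bandwidth up and then handing it back can be illustrated with a toy model. The Python sketch below is not UM's automation software; the `BandwidthPool` class, the lab names and the 400Gbps pool size are all hypothetical and exist only to show how capacity moves between users when allocations change.

```python
class BandwidthPool:
    """Toy model of a shared pool of link capacity, measured in Gbps."""

    def __init__(self, capacity_gbps: int) -> None:
        self.capacity_gbps = capacity_gbps
        self.allocations: dict[str, int] = {}

    def available(self) -> int:
        """Capacity not currently assigned to any user."""
        return self.capacity_gbps - sum(self.allocations.values())

    def set_rate(self, user: str, gbps: int) -> None:
        """Scale a user's allocation up or down to the requested rate."""
        current = self.allocations.get(user, 0)
        if gbps - current > self.available():
            raise ValueError(f"not enough free capacity for {user}")
        if gbps == 0:
            # A rate of zero removes the user from the pool entirely.
            self.allocations.pop(user, None)
        else:
            self.allocations[user] = gbps


# Hypothetical scenario: a lab scales from 10Gbps to 100Gbps and back.
pool = BandwidthPool(capacity_gbps=400)
pool.set_rate("lab_a", 10)    # normal operation
pool.set_rate("lab_a", 100)   # temporary increase for a six-month project
pool.set_rate("lab_a", 10)    # project ends; 90Gbps returns to the pool
pool.set_rate("lab_b", 100)   # freed capacity goes to another user
print(pool.available())
```

In a real software-defined network, the equivalent of `set_rate` would push configuration to switches rather than update a dictionary, but the bookkeeping follows the same pattern.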
“As Michigan’s researchers need more bandwidth, we can quickly adapt the network and provide them the environment they need to get their research done,” he says.