
FPGAs and the New Era of Cloud-based ‘Hardware Microservices’

Jun 8th, 2017 6:00am
Feature image via Pixabay.

In his keynote at the Microsoft Build conference earlier this year, Harry Shum, the head of Microsoft’s AI and Research group, hinted that at some point the Microsoft Azure cloud service will give developers access to field-programmable gate arrays (FPGAs). Azure Chief Technology Officer Mark Russinovich also talked about Azure exposing “[FPGAs] as a service for you sometime in the future.”

What is that FPGA-powered future going to look like and how are developers going to use it?

FPGAs aren’t a new technology by any means; traditionally, they have been reserved for specialized applications where the need for custom processing hardware that can be updated as very demanding algorithms evolve outweighs the complexity of programming the hardware.

With processors, Russinovich explained to The New Stack, “the more general purpose you are, generally, the more flexible you are, the more kinds of programs and algorithms you can throw at the compute engine — but you sacrifice efficiency.”

The array of gates that make up an FPGA can be programmed to run a specific algorithm, using a combination of logic gates (usually implemented as lookup tables), arithmetic units, digital signal processors (DSPs) to do multiplication, static RAM for temporarily storing the results of those computations, and switching blocks that let you control the connections between the programmable blocks. Some FPGAs are essentially systems-on-a-chip (SoC), with CPUs, PCI Express and DMA connections and Ethernet controllers, turning the programmable array into a custom accelerator for the code running on the CPU.

The combination means that FPGAs can offer massive parallelism targeted only for a specific algorithm, and at much lower power compared to a GPU. And unlike an application-specific integrated circuit (ASIC), they can be reprogrammed when you want to change that algorithm (that’s the field-programmable part).

FPGAs have much more data parallelism than CPUs.

“FPGAs hit that spot, where they can process streams of data very quickly and in parallel,” Russinovich explained. “They’re programmable like a GPU or CPU but aimed at this parallel low-latency world for things like inference and deep neural networks; if you need to do online speech recognition or image recognition, it’s really important to have that low latency.”

The disadvantage is that the programming and reprogramming is done in complex, low-level hardware description languages like Verilog. Rob Taylor, CEO of ReconfigureIO — a startup planning to offer hardware acceleration in the cloud by letting developers program FPGAs with Go — told The New Stack that there simply aren’t many hardware engineers who are familiar with these.

Most FPGA development takes place at processor development companies. And the very different programming model, where you’re actually configuring the hardware, is challenging for developers used to higher-level languages.

“As a software engineer, you can start writing simple hardware, but writing capable hardware takes several years of learning to get right,” Taylor said. In rare cases, it’s possible to program an FPGA in a way that permanently damages it, although the toolchain that programs the hardware should provide warnings.

This is one of the reasons FPGAs have never become mainstream, Taylor suggested. “It’s the cost of doing FPGA engineering. If you can only hire a few expensive engineers, there’s only so much you can do. You end up with very vertical specific solutions and you don’t get the bubbling innovation that, say, the cloud has brought.”

Nonetheless, Taylor sees FPGAs as a good solution for a range of problems. “Anything where you have data in movement and you’re processing that and getting an answer and responding to it or sharing that answer somewhere else. You could build an in-memory database on FPGA to do statistical analysis blazingly fast without going near the CPU.” Such applications could include image and video processing, real-time data analytics, ad technologies, audio, telecoms and even software-defined networking (SDN), which he noted is “still a massive drain on resources.”

The ReconfigureIO approach uses Go channels, which Taylor said fit the model of FPGA pipes, “but we’re working on an intermediate layer, which we want to be standard and open source, that will let people use whatever random language they want.”
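That mapping is easiest to see in ordinary Go. The sketch below is plain, standard Go written for illustration (it does not use ReconfigureIO's toolchain or any FPGA-specific API), but the shape is the point: each stage reads from one channel and writes to the next, the same dataflow structure a hardware pipeline of FPGA pipes would implement.

```go
package main

import "fmt"

// Each stage reads from an input channel and writes to an output channel,
// the same dataflow shape that would map onto a pipeline of FPGA pipes.
func scale(in <-chan int, out chan<- int, factor int) {
	for v := range in {
		out <- v * factor
	}
	close(out)
}

func sum(in <-chan int, done chan<- int) {
	total := 0
	for v := range in {
		total += v
	}
	done <- total
}

func main() {
	src := make(chan int)
	scaled := make(chan int)
	done := make(chan int)

	go scale(src, scaled, 3) // stage 1: multiply each sample
	go sum(scaled, done)     // stage 2: reduce to a single result

	for i := 1; i <= 10; i++ { // feed the pipeline
		src <- i
	}
	close(src)

	fmt.Println("sum:", <-done) // 3 * (1 + 2 + ... + 10) = 165
}
```

On a CPU these stages are goroutines scheduled in time; the attraction of an FPGA is that each stage can become its own block of logic, with every stage genuinely running at once.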

The complexity of programming them is why the Amazon Web Services EC2 F1 FPGA instances, which let you program Xilinx FPGAs, are targeted at customers who already use FPGA appliances for their vertical workloads in genomics, analytics, cryptography or financial services and want to bring those workloads to the cloud. AWS provides a hardware development kit for building FPGA configurations. Some of those appliance makers, like Ryft, will be providing APIs to integrate the AWS FPGA instances with their analytics platforms the way their FPGA appliances already do.


FPGA vendors are starting to offer higher-level programming options, like C, C++ and OpenCL. AWS is relying on OpenCL FPGA programming to reach more developers in the future, although this still requires a lot of expertise and isn’t necessarily a good match for the FPGA programming model.

“It’s still a very esoteric type of development environment,” Russinovich noted, “but I think the trend is clear that things are going to get more and more accessible. I think you can imagine at some point — I’m talking a far future vision here — developers using different languages to write programs with tools that will take a look at your algorithm and determine, based on profiling or analysis, that this piece of your program is more efficient if we run it on FPGA and this one on GPU and this one on CPU, and developers just take advantage of the best capabilities the platform has to offer.”

Smart Networks

Microsoft is taking a rather different approach. On Azure, you can actually use FPGA-powered services already; you just don’t know that you’re using FPGAs — in the same way that you don’t know you’re using flash SSDs when you use Cosmos DB or GPUs when you use Microsoft Cognitive Services. In fact, the whole Azure network relies on FPGA-powered software-defined networking.

When Microsoft first started putting FPGAs into Azure, it was to bring low latency and high throughput to systems with very large amounts of data and very high traffic: the indexing servers for Bing. Initially, those FPGAs had their own network connections, but to simplify the network topology Microsoft switched to connecting them to the same NIC as the server they were in. Once the FPGAs were connected directly to those network cards, they could also accelerate the software-defined networking that Azure uses for routing and load balancing.

The impact of FPGAs on query latency for Bing; even at double the query load FPGA-accelerated ranking has lower latency than software-powered ranking at any load.

Like custom silicon designed to go on a network card, these FPGA SmartNICs are more efficient than CPUs and use less power. But as Microsoft improves that software-defined networking stack to work with the 50-gigabit and 100-gigabit network adapters that are coming soon, the FPGAs can be reprogrammed — which you couldn’t do with custom silicon.

These SmartNICs already implement the flow tables that are the basis of Azure’s software-defined networking; in the future, they might also implement Quality of Service or RDMA, and speed up storage by offloading cryptographic calculations and error checking.
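To make “flow tables” a little more concrete, here is a toy match-action sketch in Go. The types, field names and policy decision are invented for illustration (Azure's real SmartNIC tables are far richer), but the idea is the same: the first packet of a flow takes the slow path through the SDN software, which installs an entry, and every later packet hits that cached entry and has its action applied without touching the host CPU.

```go
package main

import "fmt"

// FiveTuple identifies a flow; the field names here are illustrative only.
type FiveTuple struct {
	SrcIP, DstIP     string
	SrcPort, DstPort int
	Proto            string
}

// Action stands in for whatever the NIC does to packets of a known flow
// (rewrite, encapsulate, forward); here it is just a label.
type Action string

// FlowTable caches a per-flow decision so that, after the first packet,
// traffic can be handled without involving the host CPU.
type FlowTable map[FiveTuple]Action

func (t FlowTable) Process(pkt FiveTuple) Action {
	if act, ok := t[pkt]; ok {
		return act // fast path: cached entry, applied "in hardware"
	}
	// Slow path: consult the SDN control software, then cache the result.
	act := Action("forward-to-vm") // hypothetical policy decision
	t[pkt] = act
	return act
}

func main() {
	table := FlowTable{}
	flow := FiveTuple{SrcIP: "10.0.0.4", DstIP: "10.0.0.5", SrcPort: 49152, DstPort: 443, Proto: "tcp"}

	fmt.Println(table.Process(flow)) // first packet: slow path, entry installed
	fmt.Println(table.Process(flow)) // later packets: fast path from the table
}
```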

Azure Accelerated Networking has been available on the larger Azure VM sizes since last year, for both Windows Server and Ubuntu, although the service is still in preview and has what Russinovich called “extremely rare compatibility issues,” so you have to choose to use it. It also has some limitations, like needing separate Azure subscriptions if you want to use it for both Windows Server and Linux. The bandwidth between two VMs inside Azure, even with a 40-gigabit network adapter on each VM, is only around 4Gbps; with FPGA-accelerated networking, that goes up to 25Gbps, with five to ten times less latency (depending on your application).

The impact of FPGA-accelerated SDN (credit Microsoft).

The next step is building services for developers to use those FPGAs, even if it’s indirect. “There are multiple ways to make FPGAs available to developers, including us, just using them for infrastructure enhancements that accrue to every developer that uses our cloud, like SDN,” Russinovich explained. “We want to make deep neural network [DNN] and inference models available to developers, that are easy to deploy and easy to consume, and that’s running DNN on top of FPGA so they get the best performance. They would do their training for example on GPU, and bring us the models. The developers aren’t aware it’s FPGAs underneath; they just hand the DNN model to the platform and the platform executes it in the most efficient way possible.”

Different ways developers will use FPGAs in Azure (credit Microsoft).

Russinovich demonstrated the advantage of that at Build, with what he called “tens to hundreds of tera-operations, so you can get really effective inference.” Running the same machine learning algorithm on 24 FPGAs rather than 24 CPUs, he showed a 150-200x improvement in latency and around 50 times the energy efficiency.

Developers can already take advantage of this through the Microsoft Cognitive Services APIs. “We’ve already got this in production in Bing as part of the next level of acceleration for Cognitive Services training, as well as for Bing index ranking.”

Hardware Microservices

Although each FPGA deployed in Azure sits on the same motherboard as a CPU and is connected to it as a hardware accelerator, the FPGAs are also directly connected to the Azure network, so they can connect to other FPGAs with very low latency, rather than being slowed down by piping the data through the CPUs.

That gives you much better utilization of the FPGAs, and the flexibility to still use them for acceleration as part of a distributed application that also runs on CPUs, or for experimenting with algorithms for acceleration that you’re still developing. “If you’re not sure of the optimal algorithms for say compression or encryption for the data you’re processing, or the data shape is going to be changing over time so you don’t want to take the risk of burning it to the silicon, you can experiment and be agile on FPGAs,” Russinovich told us.

A management fabric coordinates those directly connected FPGAs into applications, so different layers of a DNN for a model that’s been pre-trained with TensorFlow or Microsoft Cognitive Toolkit (CNTK) could run on different FPGAs — giving you a way of distributing a very deep network across many devices, which avoids the scaling problems of many DNN frameworks.

Distributing a DNN across the Azure FPGA hardware microservices fabric (credit Microsoft).
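As a rough mental model of what that management fabric does (Microsoft has not published its interfaces, so every name in this sketch is hypothetical), think of it as a placement problem: take the layers of a pre-trained model and spread them across a pool of network-attached FPGAs, then stream activations between the devices.

```go
package main

import "fmt"

// placeLayers is a toy stand-in for a management fabric: given the layers of a
// pre-trained model and a pool of network-attached FPGAs, assign each layer to
// a device round-robin. All layer and device names are hypothetical.
func placeLayers(layers, fpgas []string) map[string]string {
	placement := make(map[string]string)
	for i, layer := range layers {
		placement[layer] = fpgas[i%len(fpgas)]
	}
	return placement
}

func main() {
	layers := []string{"conv1", "conv2", "fc1", "fc2", "softmax"}
	fpgas := []string{"fpga-rack1-03", "fpga-rack1-04", "fpga-rack2-11"}

	for layer, device := range placeLayers(layers, fpgas) {
		fmt.Printf("%s -> %s\n", layer, device)
	}
}
```

The activations flowing between those placed layers would then travel over the direct, low-latency FPGA-to-FPGA network links described above, rather than being routed back through the host CPUs.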

“This is a vastly more general usage for FPGAs, where we think there is potential for lots of innovation, that we call hardware microservices,” Russinovich told us. “If you have a large fleet of FPGAs and they’re directly connected to the network and programmable through the network, then what kinds of apps can you build that are accelerated in ways we can’t achieve on standard kinds of hardware that we’ve got today? We’re using that infrastructure first for our DNNs, but we see that becoming a general-purpose platform.”

He talked about that fabric having web search ranking, deep neural networks, SQL accelerations and SDN offload programmed into it. “Azure Data Lake Analytics is looking at pushing machine learning also into FPGAs.”

Will developers end up writing their own applications to run on that reconfigurable hardware microservices compute layer, or will they only ever use FPGAs indirectly? Russinovich predicted a mix of the ways developers will use FPGAs.

“There will be developers that will directly take advantage of these things but I think many developers will end up indirectly taking advantage of this by leveraging libraries and frameworks that include those things for them, or using microservices models provided by ISVs or the open source community.” Further down the line, he suggested that could work much the same way containers do.

“Today in the developer space if I want a REST front end, I just pull a Node.js Docker container. I don’t have to write it myself, I’m just using it. I think you’ll see the same model, where you’ll say I want this algorithm, I want the most efficient possible deployment of it and I’ll be deploying it on FPGAs even though I’m not directly writing the code that goes onto FPGAs — and maybe I’m even getting it from a Docker repository!”

FPGAs make sense for cloud providers that have the expertise to work with them, but they might also show up wherever you’re collecting a lot of data. “I definitely think there’s a place for FPGAs on the edge, because you’re going to have a lot of inference happening on the edge. Instead of sending data up into the cloud you do processing right there and you can do incremental training on top of the FPGAs, as well as having the models evolve with the data that’s being consumed on the edge.”

That’s all some way off, Russinovich noted, but just as GPUs became a standard development tool for certain problems as Moore’s Law slowed down for CPUs, so might FPGAs — whether developers know they’re using them or not.

“We’re at the early stages of opening this up and making it accessible not just to vendors like Xilinx and Altera, but to startups who are looking at higher level programming languages for FPGAs. I think we’re at the first wave of this new generation that’s kind of a rebirth of this technology, which seems to come and go — every five to ten years it gets hot and then fades away, but I think it’s here to stay this time.”

TNS owner Insight Partners is an investor in: Docker.