IBM, which has been at the forefront of quantum computing and a number of other research fields, recently showcased what it believes could be the answer to AI processing (and its costs). And if IBM’s vision pans out, the future isn’t centered around GPUs: instead, it lies in mixed-signal, analog chips that could bring massive improvements in energy efficiency while offering competitive performance against the market’s current go-tos.
According to a research paper published in Nature Electronics last week, IBM believes the future of AI inferencing could run through a chip that combines phase-change memory (PCM) with digital circuits. The paper shows that matrix-vector multiplication (one of the main workloads in AI inferencing) can be performed directly on weights stored in the chip itself.
In this scenario, the reduced power requirements of passive, analog circuitry (which doesn’t need a continuous electrical current to maintain the values it holds) should cut the overall power required to perform matrix calculations; at the very least, the energy budget freed up by the now-analog sections of the chip could be redirected to its remaining digital circuits for added throughput. The design takes cues from research in neuromorphic computing.
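To make the idea concrete: in a crossbar, each weight is encoded as the conductance of a memory cell, the input vector is applied as voltages on the rows, and the currents summed along each column are, by Ohm’s and Kirchhoff’s laws, the dot products of the multiplication. The Python sketch below simulates that behavior numerically; the conductance range, differential weight mapping, and noise level are illustrative assumptions, not figures from IBM’s paper.

```python
import numpy as np

def analog_mvm(weights, x, g_max=25e-6, noise_sigma=0.01, rng=None):
    """Simulate a crossbar matrix-vector multiply.

    weights : (rows, cols) matrix held "in memory" as cell conductances
    x       : input vector applied as row voltages
    g_max   : assumed maximum cell conductance in siemens (illustrative)
    """
    rng = np.random.default_rng() if rng is None else rng

    # Map signed weights onto a differential pair of conductances
    # (positive part on one cell, negative part on the other).
    scale = np.abs(weights).max()
    g_pos = np.clip(weights, 0, None) / scale * g_max
    g_neg = np.clip(-weights, 0, None) / scale * g_max

    # Analog cells are noisy; add illustrative Gaussian perturbations.
    g_pos += rng.normal(0, noise_sigma * g_max, g_pos.shape)
    g_neg += rng.normal(0, noise_sigma * g_max, g_neg.shape)

    # Ohm's law per cell, Kirchhoff's current law per column:
    # each column current is the dot product of voltages and conductances.
    i_out = x @ g_pos - x @ g_neg

    # Rescale currents back into the original weight units.
    return i_out * scale / g_max

# Quick comparison against an exact digital MVM.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
x = rng.standard_normal(256)
err = np.abs(analog_mvm(W, x, rng=rng) - x @ W).max()
print(f"max absolute error vs. exact MVM: {err:.3f}")
```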
Developed as part of IBM’s Hermes project, the latest version of the chip features 64 compute tiles, which communicate with each other through a Network-on-Chip (NoC) approach that’s similar in concept to AMD’s Infinity Fabric. There’s also fixed-function hardware specialized in processing convolutional layers (which reduce the complexity of the underlying information in order to accelerate processing and increase efficiency). Being a research chip, it’s fabricated on a 14 nm process; perhaps IBM has room to further improve power efficiency if the analog cells can be miniaturized further.
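Those convolutional layers can be served by the same matrix-vector primitive. One common way of lowering a convolution to MVMs (a generic technique, not necessarily what IBM’s fixed-function hardware does) is im2col: each input patch is unrolled into a vector, and the flattened filter bank becomes the weight matrix a crossbar tile would hold. A rough sketch:

```python
import numpy as np

def conv2d_as_mvm(image, filters):
    """Lower a 2D convolution to repeated matrix-vector multiplies.

    image   : (H, W, C_in) input feature map
    filters : (K, K, C_in, C_out) convolution kernels
    Each K*K*C_in patch becomes one input vector; the flattened
    filter bank is the weight matrix a crossbar tile would store.
    """
    H, W, C_in = image.shape
    K, _, _, C_out = filters.shape
    weight_matrix = filters.reshape(K * K * C_in, C_out)  # tile contents

    out = np.zeros((H - K + 1, W - K + 1, C_out))
    for y in range(H - K + 1):
        for x in range(W - K + 1):
            patch = image[y:y + K, x:x + K, :].reshape(-1)  # im2col vector
            out[y, x] = patch @ weight_matrix                # one MVM per patch
    return out

# Example: a 3x3 kernel over 8 input channels producing 16 output channels
# needs only a 72x16 slice of a 256x256 tile.
img = np.random.rand(32, 32, 8)
kernels = np.random.rand(3, 3, 8, 16)
print(conv2d_as_mvm(img, kernels).shape)  # (30, 30, 16)
```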
The phase-change memory (PCM) cells themselves are distributed across the 64 tiles, with each tile’s cells arranged in a crossbar that can store a 256×256 weight matrix for matrix-vector multiplication. To be fair, there are certain performance constraints in such a mixed analog-digital design: signals need to be converted from analog to digital (and vice versa), which incurs penalties in both latency and energy use. But with appropriate scheduling optimizations, the final result is higher efficiency compared with a fully digital chip (such as Nvidia’s A100 or H100). According to IBM, a single ResNet-9 input was processed in 1.52 μs (microseconds) and consumed 1.51 μJ (microjoules) of energy. According to Abu Sebastian of IBM’s Rüschlikon research lab (as covered by EE Times), the current iteration of the chip achieves a peak matrix-vector-multiplication throughput of 16.1 to 63.1 TOPS (trillion operations per second) at an energy efficiency of 2.48 to 9.76 TOPS/W.
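The analog-digital boundary is also where precision gets negotiated: inputs pass through a DAC on the way in and results through an ADC on the way out, each with a finite bit width. The sketch below, using made-up 8-bit converters since the article doesn’t give the real converter specs, shows how that round trip turns an otherwise exact dot product into an approximation.

```python
import numpy as np

def quantize(values, bits, value_range):
    """Uniform quantizer standing in for a DAC or ADC stage."""
    lo, hi = value_range
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    return np.clip(np.round((values - lo) / step), 0, levels) * step + lo

def mixed_signal_mvm(W, x, dac_bits=8, adc_bits=8):
    # Digital -> analog: input activations quantized by the DAC.
    x_q = quantize(x, dac_bits, (x.min(), x.max()))
    # Analog MVM on the crossbar (modeled here as an exact product).
    y_analog = x_q @ W
    # Analog -> digital: output currents quantized by the ADC.
    return quantize(y_analog, adc_bits, (y_analog.min(), y_analog.max()))

rng = np.random.default_rng(1)
W = rng.standard_normal((256, 256))
x = rng.standard_normal(256)
exact = x @ W
approx = mixed_signal_mvm(W, x)
print(f"relative error from the DAC/ADC round trip: "
      f"{np.linalg.norm(approx - exact) / np.linalg.norm(exact):.4f}")
```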
The still-ongoing AI “revolution” has sparked seismic moves in the High Performance Computing (HPC) market. But besides driving home the marvel of GPUs (the graphics processing units responsible for accelerating most of that particular market), the gold rush for AI accelerators has showcased just how dependent the market still is on a single player (read: Nvidia), while also bringing questions of energy efficiency back to the forefront.
Analog chips that break through the power-efficiency barrier would certainly be a welcome move, but as with any new technology, analog AI inferencing chips will have to fight to survive against the technologies, software stacks, and techniques already entrenched today. Network effects and market share are real, and Nvidia’s grip on the HPC market through both its hardware and its CUDA software stack is… vice-like, to say the least.