Startup combines Digital In-Memory Compute and chiplet implementations for data center-grade inferencing.
This article was written by Cambrian AI analysts Alberto Romero and Karl Freund
D-Matrix was founded in 2019 by two AI hardware veterans, Sid Sheth and Sudeep Bhoja, who previously worked together at Inphi (Marvell) and Broadcom. The company was born at a unique moment for AI, just two years after the popular transformer architecture was invented by Google Brain scientists. By 2019, the world was beginning to realize the tremendous importance of transformer-based models, and D-Matrix saw an opportunity to design its AI hardware specifically to excel at running these large language models.
Transformers eat the world
GPT-3, MT-NLG, Gopher, DALL·E, PaLM and virtually every other major language model are based on the now ubiquitous transformer architecture. Tech companies continue to announce potentially astonishing models that remain inaccessible to the world due to one insurmountable obstacle: putting these models into production for inference in the data center is virtually infeasible with today’s AI hardware. That is the problem D-Matrix wants to solve. As a company that has grown up alongside the already world-changing wave of transformers and LLMs, it is well positioned to tackle this problem with a clean slate.
Focusing on large multimodal models (which use different types of data) is what sets the company apart from its competitors. Transformer-based models are usually trained on high-performance GPUs (where Nvidia enjoys a multi-year advantage), but inference is a story about power efficiency, not just performance at any cost. D-Matrix claims its solution achieves 10-30x the efficiency of current hardware. Once technology companies start embedding transformer-based NLP models in all kinds of applications and spreading them across different industries, this kind of ultra-efficient hardware will be an attractive way to handle the inference workloads.
The key to the next generation of AI hardware: in-memory compute
D-Matrix’s solution is currently a proof-of-concept chiplet-based architecture called Nighthawk. Together with Jayhawk, the upcoming second chiplet, which will also implement die-to-die interfaces, it forms the basis for Corsair, D-Matrix’s hardware product expected to be released in the second half of 2023. Nighthawk consists of an AI engine with four neural cores and a RISC-V CPU. Each neural core consists of two octal computation cores (OC), each of which has eight digital in-memory computation cores where weights are stored and matrix multiplication is performed.
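As a quick check on the hierarchy described above, the following minimal sketch multiplies out the counts given in the text. It assumes one AI engine per Nighthawk chiplet, as the description implies; the constant and function names are illustrative and are not D-Matrix terminology.

```python
# Counts taken from the article's description of Nighthawk.
# Assumption: a Nighthawk chiplet contains exactly one AI engine.
NEURAL_CORES_PER_ENGINE = 4
OCTAL_CORES_PER_NEURAL_CORE = 2
DIMC_CORES_PER_OCTAL_CORE = 8


def dimc_cores_per_chiplet() -> int:
    """Total digital in-memory compute cores implied by the hierarchy."""
    return (
        NEURAL_CORES_PER_ENGINE
        * OCTAL_CORES_PER_NEURAL_CORE
        * DIMC_CORES_PER_OCTAL_CORE
    )


if __name__ == "__main__":
    print(dimc_cores_per_chiplet())  # 64
```

Under that assumption, each Nighthawk chiplet would expose 64 in-memory compute cores in which weights sit next to the matrix-multiply logic.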
Nighthawk is the result of a new combination of three technological pillars. The first is digital in-memory compute (DIMC). The efficiency barrier that existing hardware encounters stems from the cost and performance limits of moving data between memory and the units that perform the calculations. D-Matrix has combined the accuracy and predictability of digital hardware with super-efficient IMC to create what it believes is the first DIMC architecture for inference in the data center. Nighthawk’s projected performance seems to support D-Matrix’s idea of bringing both data and compute into SRAM, which is currently the memory type best suited to the IMC approach. D-Matrix claims its hardware is 10x more efficient than an NVIDIA A100 for inference workloads.
The second pillar is a Lego-like modular chiplet architecture. Chiplets can be interconnected with Jayhawk, the complementary IP piece for Nighthawk, to scale the hardware up and out. Up to 8 chiplets can be placed on a single card without sacrificing efficiency. These chiplets can be “plugged in” alongside existing hardware and used specifically to handle transformer-related workloads. In the future, D-Matrix believes its hardware can store models as large as the 175 billion-parameter GPT-3 on a single card.
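To give a sense of what holding GPT-3 on one card implies, here is a rough back-of-the-envelope sketch. The weight precisions below are assumptions for illustration only (the article does not state which numerics D-Matrix uses), and the estimate counts weights alone, ignoring activations and any other runtime state.

```python
GPT3_PARAMS = 175_000_000_000  # parameter count cited in the article
CHIPLETS_PER_CARD = 8          # maximum per card cited in the article


def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Storage needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def per_chiplet_gb(num_params: int, bits_per_weight: int) -> float:
    """Even split of the weight footprint across the chiplets on one card."""
    return weight_footprint_gb(num_params, bits_per_weight) / CHIPLETS_PER_CARD


if __name__ == "__main__":
    # Hypothetical precisions; not taken from D-Matrix documentation.
    for bits in (16, 8, 4):
        total = weight_footprint_gb(GPT3_PARAMS, bits)
        share = per_chiplet_gb(GPT3_PARAMS, bits)
        print(f"{bits}-bit weights: {total:.1f} GB total, {share:.2f} GB per chiplet")
```

Even at an assumed 4 bits per weight, the model needs roughly 87.5 GB for weights, which shows why reduced-precision numerics and sparsity matter for a single-card target.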
The company also foresees dramatic growth in efficiency going forward, with more than 1000 TOPS per watt within reach by the end of this decade.
Finally, D-Matrix applies transformer-specific numerics, sparsity and other ML tools that further enhance its efficiency-oriented solution. It also offers a model zoo and off-the-shelf ML libraries, which bolster the AI-first approach to its hardware.
It won’t be an easy ride for D-Matrix and other start-ups in this space. Its competitors, some significantly more mature, have also realized the potential of the transformer architecture. Nvidia recently unveiled the Hopper H100, its next-generation GPU, capable of 10x the performance of the previous hardware on large AI models, albeit at significantly higher power consumption and cost. Another company with similar ambitions is Cerebras Systems. Its latest wafer-scale system, the Cerebras CS-2, is the largest AI server on the market, and the company claims a cluster of these could soon support a 120-trillion-parameter model for training and inference.
However, while D-Matrix is a new company entering a very competitive market, it has an edge: it showed up at just the right time, when transformers were clearly promising but still so young that most companies hadn’t had time to react. There are plenty of opportunities and strategies for companies like D-Matrix trying to capture a slice of the transformer market. The D-Matrix hardware could fill a niche that may grow significantly in the coming years. And the founders’ vast expertise and knowledge will help them turn this advantage into a reality.
Disclosures: This article expresses the views of the authors and should not be construed as advice to buy from or invest in the companies mentioned. Cambrian AI Research is fortunate to have many, if not most, semiconductor companies as our clients, including Blaize, Cerebras, D-Matrix, Esperanto, Graphcore, GML, IBM, Intel, Mythic, NVIDIA, Qualcomm Technologies, SiFive, Synopsys and Tenstorrent. We have no investment positions in any of the companies mentioned in this article and do not plan to start one in the near future. For more information, please visit our website at: https://cambrian-AI.com