AI-powered recommendation applications are opening up new avenues to enhance the customer experience. With this technology, online stores can highlight other items to add to digital shopping carts, digital music services can suggest songs based on tunes already in the rotation, and social media channels can offer up content that might fit the user’s interests. When these systems work seamlessly and deliver accurate suggestions, they can also bring more dollars to the bottom line. However, a significant amount of challenging engineering work goes on behind the scenes to produce accurate recommendations.
AI accelerators are a critical part of the technology stack for recommendation systems. Their speed and energy efficiency (the latter measured in inferences per joule) are key to prediction accuracy, because they determine how large a model can be served within a data center's power budget. In 2019, Meta (then Facebook) called on the industry to work on hardware acceleration for recommendation systems, based on its open-source deep learning recommendation model (DLRM). That call to action inspired the engineering team at Neuchips Inc. to rally around the problem of increasing recommendation model capacity in a scalable Open Compute Project (OCP) form factor. In the race to meet Meta’s request, the young company announced this summer that it had taped out its first DLRM accelerator, the Neuchips RecAccel™-N3000, in Taiwan.
Designed for data center recommendation models, the RecAccel™-N3000 has achieved one million DLRM inferences per joule of energy (equivalent to 20 million inferences per second for a 20-watt chip). The AI accelerator, developed with support and EDA tools from Synopsys and other semiconductor industry leaders, will be manufactured on TSMC’s 7nm process, with samples scheduled to be ready by the end of 2022.
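The throughput figure follows directly from the efficiency figure, because one watt is one joule per second. The minimal Python sketch below shows that conversion; the function name is illustrative, and the calculation assumes the chip sustains the full 20 W while delivering the quoted efficiency.

```python
def inferences_per_second(inferences_per_joule: float, power_watts: float) -> float:
    """Throughput implied by an energy-efficiency figure at a given power draw.

    1 W = 1 J/s, so (inferences / J) * (J / s) = inferences / s.
    Assumes the chip runs at the stated power while achieving the stated efficiency.
    """
    return inferences_per_joule * power_watts


# Figures quoted for the RecAccel-N3000: 1 million inferences per joule at 20 W.
throughput = inferences_per_second(1_000_000, 20)
print(f"{throughput:,.0f} inferences per second")  # 20,000,000 inferences per second
```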
In this blog post, we’ll provide more details about how Neuchips, with a team of about 30 engineers, was able to tape out its 400 mm² AI chip in just 18 months, a process that would typically require more than 100 engineers over three to four years. You can also learn more about Neuchips at its presentation, “Design of a High-Efficiency Accelerator for Full-Scale Deep Learning Recommendation Models (DLRM) in the Datacenter,” at the upcoming ARC® Processor Summit 2022 on Thursday, September 8, at the Santa Clara Marriott. The company’s session is scheduled from 2:35 p.m. to 3:20 p.m. PDT.
AI recommendation systems, especially DLRMs, are the dominant machine learning application in terms of cloud resource usage. Novel adaptations of DLRMs are generating more useful predictions but require more compute capacity within fixed energy and space constraints. Neuchips is pioneering a unique “direct-to-ASIC” engineering approach that accelerates software with a purpose-built, domain-specific AI accelerator plus co-designed compiler and runtime software. In the company’s asynchronous, heterogeneous dataflow architecture, each type of IP and processor is carefully tailored to optimize a specific component of the DLRM logical architecture. The configurable Synopsys ARC® processors, with their low power consumption and high performance, play an integral role in the RecAccel™-N3000’s groundbreaking performance.
Other features of the RecAccel™-N3000 include:
Striving to get to market first, Neuchips sought support, design and verification tools, and IP that could help the company accelerate its design cycle. It found what it needed through the AI Chip Design Lab, a joint effort between Synopsys and the Industrial Technology Research Institute (ITRI) in Taiwan. Many on the team were already familiar with Synopsys technologies, which made it an easy decision to collaborate with Synopsys on the ambitious project.
The AI Chip Design Lab is located at ITRI headquarters in Hsinchu, Taiwan. It receives support from the Technology Development Programs of the Department of Industrial Technology (DoIT) and the Ministry of Economic Affairs (MOEA) in Taiwan. The lab aims to help the country’s semiconductor industry advance by providing access to the latest design tools as well as design and verification services. One of the lab’s key offerings is a Synopsys system-level solution based on the ARC AI Reference Design Platform, spanning architecture design to virtual prototyping and system verification. The design platform is intended to lower the barrier to entry into AI and to shorten design cycles.
Because of their unique characteristics, DLRMs can be difficult to accelerate with general-purpose AI accelerators. Neuchips developed its RecAccel™-N3000 with tailored hardware IPs that accelerate embedding tables, matrix multiplication, and feature interaction. By working with Synopsys on early hardware/software co-development enabled by the ARC AI Reference Design Platform, Neuchips saved more than a year of chip development time. With the design platform, the team was able to develop and verify the accelerator’s PCIe 5.0 and LPDDR5 subsystems early and then integrate them into the full chip. The cloud-based Synopsys ZeBu® Server 4 emulation system was used to verify the individual subsystems as well as the entire RecAccel™-N3000.
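To make the three component types concrete, here is a toy, pure-Python forward pass modeled loosely on the structure of Meta’s open-source DLRM: sparse embedding-table lookups, dense matrix multiplication in MLP layers, and pairwise dot-product feature interaction. All names, sizes, and random weights are hypothetical, and the top MLP is collapsed to a single linear unit for brevity. It illustrates the shape of the workload only, not how the RecAccel™-N3000 hardware implements it.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear_relu(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense layer (matrix-vector multiplication) followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def embedding_bag(table: Matrix, indices: List[int]) -> Vector:
    """Sum-pool the embedding-table rows selected by one sparse feature's IDs."""
    pooled = [0.0] * len(table[0])
    for idx in indices:
        for d, value in enumerate(table[idx]):
            pooled[d] += value
    return pooled


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dlrm_forward(dense: Vector, sparse: List[List[int]], params: Dict) -> float:
    """Toy DLRM-style forward pass returning a click probability in (0, 1)."""
    # 1. Bottom MLP: project the dense features into the embedding dimension.
    z = linear_relu(dense, params["bottom_w"], params["bottom_b"])
    # 2. Embedding lookups: one pooled vector per sparse feature.
    embs = [embedding_bag(t, ids) for t, ids in zip(params["tables"], sparse)]
    # 3. Feature interaction: dot products between every pair of vectors.
    vecs = [z] + embs
    inter = [dot(vecs[i], vecs[j])
             for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    # 4. Top "MLP" (a single linear unit here) over [z, interactions], then sigmoid.
    logit = dot(params["top_w"], z + inter) + params["top_b"]
    return 1.0 / (1.0 + math.exp(-logit))


def random_params(num_dense: int, dim: int, table_sizes: List[int], seed: int = 0) -> Dict:
    """Hypothetical small random weights, just to make the sketch runnable."""
    rng = random.Random(seed)

    def r() -> float:
        return rng.uniform(-0.1, 0.1)

    num_vecs = 1 + len(table_sizes)
    num_inter = num_vecs * (num_vecs - 1) // 2
    return {
        "bottom_w": [[r() for _ in range(num_dense)] for _ in range(dim)],
        "bottom_b": [r() for _ in range(dim)],
        "tables": [[[r() for _ in range(dim)] for _ in range(n)] for n in table_sizes],
        "top_w": [r() for _ in range(dim + num_inter)],
        "top_b": r(),
    }


params = random_params(num_dense=4, dim=8, table_sizes=[100, 50, 20])
p = dlrm_forward([0.5, 1.0, -0.2, 0.3], [[3, 17], [42], [0, 5, 9]], params)
print(f"predicted click probability: {p:.4f}")
```

The embedding step reads only a handful of rows scattered across tables that, in production, can be very large, which stresses memory bandwidth, while the MLP layers are dense arithmetic. That contrast is one reason a single general-purpose engine struggles to serve both efficiently, and why dedicated blocks for each stage can help.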
The RecAccel™-N3000 leverages an array of Synopsys IP blocks, including:
Using silicon-proven Synopsys IP helped the Neuchips team reduce integration risks and contributed to a shorter design cycle. Synopsys application engineers also helped Neuchips optimize the code for its cloud-based chip design, configure the IP, and run simulation and verification on the FPGA-based ZeBu Server 4 system, which cut full ASIC RTL simulation time from two weeks to about 20 minutes.
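For a sense of scale, that reduction works out to roughly a thousandfold speedup. The quick check below uses Python’s standard `datetime` module and treats “about 20 minutes” as exactly 20 minutes, so the result is approximate.

```python
from datetime import timedelta

# Reported full-chip RTL run times before and after moving to ZeBu emulation
# (20 minutes is taken literally here; the source says "about 20 minutes").
before = timedelta(weeks=2)
after = timedelta(minutes=20)

# Dividing one timedelta by another yields a plain float ratio.
speedup = before / after
print(f"approximate speedup: {speedup:.0f}x")  # approximate speedup: 1008x
```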
Other design and verification tools that played a part in the development of the RecAccel™-N3000 include Synopsys Design Compiler RTL synthesis solution, Synopsys VCS® functional verification solution, Synopsys SpyGlass® static and formal verification platform, Synopsys Verdi® automated debug system, Synopsys Formality® equivalence checking, Synopsys PrimeTime® static timing analysis tool, Synopsys PrimePower RTL-to-signoff power analysis tool, and Synopsys IC Compiler™ II place-and-route solution.
With recommendation systems becoming both more prevalent and more insightful in our digital world, Neuchips’ RecAccel™-N3000 comes at a good time. By accelerating recommendation inference for data centers, the high-performance, energy-efficient, and scalable AI platform is poised to help a variety of industries personalize the customer experience online. Working closely with Synopsys, ITRI, and others in the Taiwan semiconductor ecosystem, Neuchips Inc. has achieved the fast time-to-market needed to get a head start in the race to deliver impactful AI solutions.
Stay up to date on the latest technologies and trends in electronic design by subscribing to the “From Silicon to Software” blog.