System/platform-level debug is extremely important not only for manufacturing debug but also for
mission-mode debug. After Systems-on-Chip (SoCs) are manufactured and assembled on system boards,
and the OS (Operating System), SW (Software) and FW (Firmware) are installed, any system-level error
or BSOD (Blue Screen of Death) seen in a laptop or system is hard to debug. Debugging such issues requires
opening up the system, which can be very time-consuming. Closed-chassis debug techniques remove the need
to open the system for debug, saving a tremendous amount of debug time and money and helping
improve TTM (Time to Market). The USB Type-C® receptacle has become the most popular
choice among OEMs/ODMs as a system debug interface for sending out debug information. This
keynote will start with an overview of SoC debug and proceed to describe the importance of closed-chassis
debug at the system/platform level using the ubiquitous USB Type-C® receptacle. The talk will
cover the debug architecture framework, challenges, innovations, and solutions for capturing hardware,
software and firmware traces from SoCs, platforms and systems, as well as the use of the interface for In-Field
Debug (IFD) and Silicon Lifecycle Management (SLM) purposes.

Silent Data Corruptions (SDCs) are extremely hard to diagnose in a production fleet
and cause significant impact at scale. This talk will cover Meta's experience
tackling this emerging challenge at our scale. We find that SDCs are not a
one-in-a-million occurrence, as the industry previously thought; they
happen far more frequently (1 in thousands). This difference of orders of
magnitude is one the industry should act on immediately. Addressing it will require
cross-functional work across many areas, including testing, verification, design,
manufacturing, fleet detection, and software approaches for resiliency. SDC work
is foundational for computational accuracy, and the talk will also be a call to
action for industry and researchers to address this critical challenge together.

Next-generation SoCs for the Zetta-Scale computing era will be developed with increased
integration of our compute, memory and communication systems in optimized and complex
packaging solutions. We will explore the challenges and opportunities in circuits and
technology to enable resilient and reliable circuits and systems.