NVIDIA holds the dominant accelerator revenue share in 2026, with AMD Instinct, Intel Gaudi, Qualcomm AI Engine, and Broadcom custom-ASIC platforms competing in the data-center and edge-inference lanes [S5].
On the Chinese side, HiSilicon, MediaTek, and Horizon Robotics have consistently led the domestic AI silicon ranking since the 2022 baselines published by China Daily, and they remain the reference names in 2026 [S4].
Global Vendor Stack and What Each One Actually Ships
NVIDIA's data-center line is built around the H100, H200, B100, B200, and GB200 NVL72 rack-scale systems, paired with CUDA 12.x, NVLink, and the Spectrum-X Ethernet fabric; on the workstation side the RTX 5090 (32 GB GDDR7) and RTX 6000 Ada are the discrete boards most specifiers compare against [S5].
AMD competes with the CDNA 3-based Instinct MI300X (192 GB HBM3) and MI325X on the ROCm 6 software stack [S5].
Intel ships Gaudi 2 and Gaudi 3 accelerators on the SynapseAI software path, and integrates AMX (Advanced Matrix Extensions) matrix acceleration into Xeon 6 server CPUs; on the edge side Intel Core Ultra (Meteor Lake / Lunar Lake) carries an NPU alongside the integrated GPU [S5].
Qualcomm's Hexagon NPU inside the Snapdragon 8 Elite and Snapdragon X Elite is the company's high-volume AI play for mobile and Copilot+ PCs, paired with the AI Engine SDK [S5].
Broadcom, Marvell, and Astera Labs do not sell branded AI boards at scale; they ship custom AI silicon (XPU) and networking retimers/PCIe switches to hyperscalers like Google (TPU) and Meta (MTIA), and they are the most cited "AI-revenue" names in the ASIC lane [S5].
Buyers comparing accelerator cards for rack deployment should treat the companion reference "AI Chip Market 2026: USD 154.93 B Growth Pool, ASIC Surge and Architecture Splits" as the macro overlay for the vendor decisions below.
Chinese Vendor Stack: HiSilicon, MediaTek, Horizon Robotics
China Daily's 2022 ranking placed HiSilicon at #1, MediaTek at #2, and Horizon Robotics at #3, with the broader Askci top-10 covering Cambricon, Will Semiconductor, Allwinner, UNISOC, Sanechips, and Rockchip [S4].
HiSilicon's portfolio in 2026 centres on the Ascend 910C/Ascend 920 family for training and inference, the Kirin 9020/9030 smartphone SoCs with DaVinci architecture NPUs, and the Kunpeng 920 server CPU feeding the Ascend 920P AI cluster boards [S4].
MediaTek continues to push the Dimensity 9400/9400+ flagship mobile SoC with the NPU 890, and the Kompanio 1300T tablet SoC; the company is also sampling the Dimensity Auto cockpit chips (CT-X1) with an AI accelerator block [S4].
Horizon Robotics is the leading ADAS silicon vendor in China with the Journey 2/3/5/6 (征程) line, shipping into Li Auto, BYD, Chery, and GAC platforms; the Journey 6 uses the Nash architecture with a BPU delivering 128 TOPS at INT8 [S4].
Cambricon's Siyuan 590 and latest Siyuan 770 accelerators are the most direct domestic alternative to NVIDIA H-class boards for training workloads in 2026 [S4].
Selection Criteria: 7 Gates Buyers Actually Lock In

Memory bandwidth and capacity drive LLM throughput: H200 delivers 4.8 TB/s on 141 GB HBM3e, while MI300X uses 192 GB HBM3 at 5.3 TB/s, and the B200 carries 192 GB HBM3e at 8 TB/s [S5].
Process node shapes board power and therefore rack density: the H100 SXM uses TSMC 4N, the B100/B200 use TSMC 4NP, and the MI300X combines TSMC N5 compute dies (XCDs) with N6 I/O dies in 2.5D packaging [S5].
Software stack maturity is the de-facto lock-in: CUDA 12.x, ROCm 6, SynapseAI, and the Qualcomm AI Engine SDK all gate the developer's first-day productivity on the card [S5].
Numerics exposed on silicon: NVIDIA H100 exposes FP8 (E4M3/E5M2) tensor cores and the B200 adds FP6/FP4, AMD MI300X exposes FP8 and INT8, and the Gaudi 3 card exposes BF16, FP8, and INT8 on the GEMM engine [S5].
Interconnect ceiling: NVLink 4 on H100 is 900 GB/s, NVLink 5 on B200 is 1,800 GB/s, and Infinity Fabric on MI300X is 896 GB/s aggregate across the OAM module's seven peer links; RoCE / Spectrum-X400 is the Ethernet path for clusters beyond one rack [S5].
Form factor and power: SXM5/6 modules run 700-1000 W per board, OAM modules target 600-750 W, and PCIe cards sit in the 300-400 W band; rack PDU sizing is a real procurement gate [S5].
Supply availability in 2026: lead time on H100/H200 is still multi-quarter for non-hyperscaler buyers, and B200 allocation is rationed against the GB200 NVL72 reserved pool [S5].
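The memory-bandwidth gate above can be turned into a rough ceiling on single-stream decode speed. The sketch below is a back-of-envelope estimate only: it assumes batch-1 decoding is bandwidth-bound and that every generated token reads all model weights once, and it ignores KV-cache traffic, kernel overhead, and multi-card sharding. The 70-billion-parameter FP8 model is a hypothetical example, not a figure from the sources.

```python
# Peak HBM bandwidth in TB/s, as quoted in the selection criteria above.
HBM_BANDWIDTH_TBPS = {"H200": 4.8, "MI300X": 5.3, "B200": 8.0}


def decode_ceiling_tokens_per_s(bandwidth_tbps: float,
                                params_billion: float,
                                bytes_per_param: float) -> float:
    """Upper bound on batch-1 decode rate if each token streams all weights once.

    Assumption: the workload is purely memory-bandwidth-bound.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tbps * 1e12 / weight_bytes


if __name__ == "__main__":
    # Hypothetical model: 70 B parameters stored at 1 byte each (FP8).
    for card, bandwidth in HBM_BANDWIDTH_TBPS.items():
        rate = decode_ceiling_tokens_per_s(bandwidth, params_billion=70,
                                           bytes_per_param=1.0)
        print(f"{card}: ceiling of roughly {rate:.0f} tokens/s")
```

Measured throughput will sit below these ceilings; the value of the estimate is in ranking cards for a fixed model size, not in predicting absolute numbers.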
Comparison Table: 5 Major Accelerator Cards on 4 Decision Criteria
Memory capacity, FP8/INT8 throughput, board TDP, and software readiness line up the 5 reference cards most specifiers quote in 2026 [S5].
| Card | HBM capacity | Peak tensor throughput (precision) | Board TDP | Software stack |
| --- | --- | --- | --- | --- |
| NVIDIA H200 SXM | 141 GB HBM3e | 1,979 TFLOPS (FP8 dense) | 700 W | CUDA 12.x + NVLink |
| NVIDIA B200 SXM | 192 GB HBM3e | 4,500 TFLOPS (FP8 dense) | 1000 W | CUDA 12.x + NVLink 5 |
| AMD MI300X | 192 GB HBM3 | 2,615 TFLOPS (FP8 dense) | 750 W | ROCm 6 + Infinity Fabric |
| Intel Gaudi 3 | 128 GB HBM2e | 1,835 TFLOPS (FP8) | 900 W | SynapseAI + Ethernet-only scaling |
| HiSilicon Ascend 910C | 128 GB HBM2e | 780 TFLOPS (FP16) | 310 W | CANN 7.x + HCCL |

Sources: [S4][S5]. The Ascend 910C figure is quoted at FP16, so it is not directly comparable with the FP8 rows.
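The capacity column translates directly into a minimum card count for holding a model's weights. The sketch below uses the table's HBM capacities; the `usable_fraction` headroom for KV cache and activations and the 405-billion-parameter FP8 model are illustrative assumptions, not sourced figures.

```python
import math

# HBM capacity per card in GB, taken from the comparison table.
HBM_CAPACITY_GB = {
    "NVIDIA H200 SXM": 141,
    "NVIDIA B200 SXM": 192,
    "AMD MI300X": 192,
    "Intel Gaudi 3": 128,
    "HiSilicon Ascend 910C": 128,
}


def min_cards_for_weights(params_billion: float,
                          bytes_per_param: float,
                          capacity_gb: float,
                          usable_fraction: float = 0.8) -> int:
    """Smallest number of cards whose usable HBM can hold the weights.

    usable_fraction is an assumed share of HBM left for weights after
    reserving room for KV cache and activations.
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    return math.ceil(weight_gb / (capacity_gb * usable_fraction))


if __name__ == "__main__":
    # Hypothetical model: 405 B parameters at 1 byte each (FP8).
    for card, capacity in HBM_CAPACITY_GB.items():
        count = min_cards_for_weights(405, 1.0, capacity)
        print(f"{card}: at least {count} cards")
```

This is a capacity floor only; interconnect topology and tensor-parallel degree usually push the practical count to a power of two.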
Use-Case Mapping: Training, Inference, Edge, ADAS

LLM training at frontier scale (1e25+ FLOPs) stays on NVIDIA B200/GB200 and the AMD MI300X for buyers willing to maintain ROCm kernels; the Gaudi 3 is positioned at the cost-optimised LLM fine-tuning tier [S5].
LLM inference at batch size 1 to 32 runs on the same three families plus the Chinese Ascend 910C/920 cluster; the Ascend path is the default for domestic Chinese buyers subject to procurement rules [S4].
Edge inference on PCs uses the Qualcomm Hexagon NPU (Snapdragon X Elite 12-core, 45 TOPS) or the Intel Core Ultra 200V (48 TOPS NPU) on Lunar Lake, and the same NPU IP is now shipping on PLC modules for factory-floor AI workloads [S5].
Mobile inference at sub-7 W is dominated by the MediaTek NPU 890 on the Dimensity 9400 and the Hexagon NPU on the Snapdragon 8 Elite; both target on-device LLM and stable-diffusion demos at sub-second latency [S4][S5].
ADAS workloads land on Horizon Robotics Journey 5/6, Mobileye EyeQ6, and NVIDIA Drive Orin/Thor, where the SoC output drives servo motor controllers for steering, throttle, and braking actuation; the Journey 6 Nash BPU at 128 INT8 TOPS is the most-cited 2026 ADAS reference on Chinese passenger vehicles [S4].
Limitations, Failure Modes and Sourcing Constraints
Supply remains the gating constraint on the H100/H200/B200 in 2026; lead times of 24-40 weeks are common for non-hyperscaler buyers, and allocation is mediated through OEM partners such as Supermicro, Dell PowerEdge, and HPE Cray [S5].
Software-porting risk is real on ROCm 6, SynapseAI, and the HiSilicon CANN stack; production workloads must be validated on the target numerics (FP8/INT8/FP4) and the target HBM generation, not on a paper spec [S4][S5].
Power and cooling ceilings at existing data centres cap rack density. Retrofit sites must be re-rated for 80 kW+ per rack, and liquid cooling (a CDU with rear-door heat exchangers, or full immersion), with loops instrumented by flow-meter sensors, is the de facto path for 2026 GB200 NVL72 deployments [S5].
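The rack-density ceiling can be sanity-checked with simple arithmetic. The sketch below estimates how many accelerator boards fit under a rack power budget; the eight-boards-per-node layout and the 3 kW per-node overhead for CPUs, NICs, and fans are assumptions for illustration, and real PDU sizing also needs redundancy and derating margins.

```python
def boards_per_rack(rack_budget_kw: float,
                    board_tdp_w: float,
                    boards_per_node: int = 8,
                    node_overhead_w: float = 3000.0) -> int:
    """Whole nodes that fit in the rack power budget, expressed in boards.

    boards_per_node and node_overhead_w are assumed values, not vendor data.
    """
    node_power_w = boards_per_node * board_tdp_w + node_overhead_w
    nodes = int(rack_budget_kw * 1000 // node_power_w)
    return nodes * boards_per_node


if __name__ == "__main__":
    # 80 kW rack, using board TDPs from the ranges quoted in this article.
    for tdp in (700, 1000):
        print(f"{tdp} W boards: {boards_per_rack(80, tdp)} boards per rack")
```

Under these assumptions an 80 kW rack holds 72 boards at 700 W but only 56 at 1000 W, which is why the power envelope has to be fixed before the card is chosen.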
Export-control friction: US Bureau of Industry and Security rules continue to gate A100/H100, A800/H800, and derivatives into China; the Ascend, Cambricon, and Hygon paths exist precisely to fill that void [S4].
Advanced-packaging capacity (CoWoS-S, CoWoS-L, SoIC), together with test-quality and talent shortages, is the upstream constraint behind accelerator supply in 2026; see "Semiconductor Industry Trends 2026: Talent, Test, and Packaging Shifts" for the upstream view.
Standards, Sourcing Signals and Trackable Next Nodes

Buyers should map each candidate card against the OCP (Open Compute Project) Accelerator Module spec, the UBB (Universal Baseboard) form factor for HGX/OAM, and the IEEE P3154 / P3455 working drafts on AI-system benchmarking; these are the specifications vendors are aligning to in 2026 [S5].
Trackable next node: NVIDIA's Rubin R100 generation is the B200 successor to watch in the second half of 2026, with HBM4 memory and a planned CX9 Superchip pair; AMD's MI355X on CDNA 4 and the Gaudi 4 follow-on are the AMD/Intel counter-moves [S5].
Trackable next node: Cambricon Siyuan 770 and HiSilicon Ascend 920P (CloudMatrix 384) are the Chinese-side reference points for sovereign LLM training capacity, and they are the names most procurement documents will quote against through Q4 2026 [S4].
Final specifier guidance: lock the HBM generation, the FP8/INT8/FP4 numerics, the software-stack version, and the rack PDU envelope before picking a vendor brand; the brand decision follows those four gates, not the other way round [S5].
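The four-gate rule can be sketched as a filter that runs before any brand preference is applied. The card attributes below are taken from this article's numerics and comparison-table entries; the `Card` type, the `shortlist` helper, and the example requirement values are hypothetical and would be replaced by a buyer's own validated constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    hbm_generation: str
    numerics: frozenset
    software_stack: str
    board_tdp_w: int


# Attributes as quoted in this article; extend with your own candidates.
CANDIDATES = [
    Card("NVIDIA B200 SXM", "HBM3e", frozenset({"FP8", "FP6", "FP4"}), "CUDA 12.x", 1000),
    Card("AMD MI300X", "HBM3", frozenset({"FP8", "INT8"}), "ROCm 6", 750),
    Card("Intel Gaudi 3", "HBM2e", frozenset({"BF16", "FP8", "INT8"}), "SynapseAI", 900),
]


def shortlist(cards, allowed_hbm, required_numerics, validated_stacks, max_board_w):
    """Return names of cards that pass all four gates, in input order."""
    return [
        card.name
        for card in cards
        if card.hbm_generation in allowed_hbm          # gate 1: HBM generation
        and required_numerics <= card.numerics         # gate 2: numerics
        and card.software_stack in validated_stacks    # gate 3: software stack
        and card.board_tdp_w <= max_board_w            # gate 4: power envelope
    ]


if __name__ == "__main__":
    # Hypothetical buyer: needs FP8, has validated CUDA and ROCm, 800 W per board.
    print(shortlist(CANDIDATES, {"HBM3", "HBM3e"}, {"FP8"},
                    {"CUDA 12.x", "ROCm 6"}, max_board_w=800))
```

With these example constraints only the MI300X survives, because the B200 exceeds the assumed 800 W envelope and the Gaudi 3 fails the HBM and stack gates; changing any one gate changes the shortlist, which is the point of fixing the gates first.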