The Sunway TaihuLight was ranked the world's fastest supercomputer on the TOP500 lists for two years, from June 2016 until June 2018, when it was surpassed by IBM's Summit.[6][5][7]
Architecture
The Sunway TaihuLight uses domestically developed semiconductors, including a total of 40,960 Chinese-designed SW26010 manycore 64-bit RISC processors based on the Sunway architecture.[5][2][8] Each processor chip contains 256 processing cores plus four auxiliary cores for system management (also RISC cores, but more fully featured), giving a total of 10,649,600 CPU cores across the entire system.[8]
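The system-wide core count follows directly from the per-chip figures, and the per-node memory and bandwidth figures quoted from the Dongarra report in the notes can be checked the same way. The short Python sketch below does this arithmetic. The constant and function names are illustrative only, and the bandwidth calculation assumes a theoretical peak of 16 bytes (128 bits) per transfer at 2,133 million transfers per second on each of the four memory controllers. That assumption reproduces the reported 136.51 GB/s but ignores real-world efficiency.

```python
# Figures as stated in the article and in the Dongarra report quoted in the notes.
# Names are illustrative, not taken from any Sunway documentation.
CHIPS = 40_960              # SW26010 processors in the whole system
COMPUTE_CORES = 256         # CPEs per chip (4 clusters x 8x8 mesh)
MGMT_CORES = 4              # MPEs per chip (one per cluster)

CLUSTERS_PER_CHIP = 4
DDR3_GB_PER_CLUSTER = 8     # 8 GB of DDR3 per CPE/MPE cluster
CONTROLLERS = 4             # DDR3 memory controllers per chip
BUS_BYTES = 128 // 8        # assumed: 128-bit interface = 16 bytes per transfer
TRANSFERS_PER_S = 2133e6    # assumed: DDR3-2133 = 2133 million transfers/s


def total_cores() -> int:
    """Cores across the whole machine: chips x (CPEs + MPEs)."""
    return CHIPS * (COMPUTE_CORES + MGMT_CORES)


def memory_per_node_gb() -> int:
    """Primary memory per node (one chip): 4 clusters x 8 GB."""
    return CLUSTERS_PER_CHIP * DDR3_GB_PER_CLUSTER


def peak_bandwidth_gb_s() -> float:
    """Theoretical peak DRAM bandwidth per chip, in GB/s (10^9 bytes/s)."""
    return CONTROLLERS * BUS_BYTES * TRANSFERS_PER_S / 1e9


print(total_cores())                    # 10649600
print(memory_per_node_gb())             # 32
print(round(peak_bandwidth_gb_s(), 2))  # 136.51
```

The 10,649,600 figure is simply 40,960 × 260, so the four management cores on each chip account for 163,840 of the system's cores.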
The system runs its own operating system, Sunway RaiseOS 2.0.5, which is based on Linux.[8] It also has a customized implementation of OpenACC 2.0 to aid the parallelization of code.[10]
Future development
According to the head of the school of computing at the National University of Defense Technology (NUDT), China's first exascale supercomputer was scheduled to enter service by 2020. Under the national plan for the next generation of high-performance computers, the country was to develop an exascale computer during the 13th Five-Year Plan period (2016–2020). The government of Tianjin Binhai New Area, NUDT and the National Supercomputing Center of Tianjin were working on the project.[11] The investment was expected to reach 3 billion yuan ($470.6 million).[12]
^ a b c d Dongarra, Jack (2016-06-20). "Report on the Sunway TaihuLight System" (PDF). netlib.org. Retrieved 2016-06-20. Each CPE Cluster is composed of a Management Processing Element (MPE) which is a 64-bit RISC core which is supporting both user and system modes, a 256-bit vector instructions, 32 KB L1 instruction cache and 32 KB L1 data cache, and a 256KB L2 cache. The Computer Processing Element (CPE) is composed of an 8×8 mesh of 64-bit RISC cores, supporting only user mode, with a 256-bit vector instructions, 16 KB L1 instruction cache and 64 KB Scratch Pad Memory (SPM). [..] Each CPE has a 64 KB local (scratchpad) memory, no cache memory. The local memory is SRAM. There is a 16KB instruction cache. Each of the 4 CPE/MPE clusters has 8 GB of DDR3 memory. So a node has 32 GB of primary memory. Each processor connects to four 128-bit DDR3-2133 memory controllers, with a memory bandwidth of 136.51 GB/s.
^ Lendino, James (2016-06-20). "Meet the new world's fastest supercomputer: China's TaihuLight". ExtremeTech. Retrieved 2016-06-21. The TOP500 report said that the chip also lacks any traditional L1-L2-L3 cache, and instead has 12KB[sic] of instruction cache and 64KB "local scratchpad" that works sort of like an L1 cache.