Dojo's goal is to efficiently process millions of terabytes of video data captured in real-world driving situations by Tesla's more than 4 million cars.[3] This goal led to a considerably different architecture than conventional supercomputer designs.[4][5]
History
Tesla operates several massively parallel computing clusters for developing its Autopilot advanced driver assistance system. Its primary unnamed cluster using 5,760 Nvidia A100 graphics processing units (GPUs) was touted by Andrej Karpathy in 2021 at the fourth International Joint Conference on Computer Vision and Pattern Recognition (CCVPR 2021) to be "roughly the number five supercomputer in the world"[6] at approximately 81.6 petaflops, based on scaling the performance of the Nvidia Selene supercomputer, which uses similar components.[7] However, the performance of the primary Tesla GPU cluster has been disputed, as it was not clear if this was measured using single-precision or double-precision floating point numbers (FP32 or FP64).[8] Tesla also operates a second 4,032 GPU cluster for training and a third 1,752 GPU cluster for automatic labeling of objects.[9][10]
The primary unnamed Tesla GPU cluster has been used for processing one million video clips, each ten seconds long, taken from Tesla Autopilot cameras operating in Tesla cars in the real world, running at 36 frames per second. Collectively, these video clips contained six billion object labels, with depth and velocity data; the total size of the data set was 1.5 petabytes. This data set was used for training a neural network intended to help Autopilot computers in Tesla cars understand roads.[6] By August 2022, Tesla had upgraded the primary GPU cluster to 7,360 GPUs.[11]
Dojo was first mentioned by Musk in April 2019 during Tesla's "Autonomy Investor Day".[12] In August 2020,[6][13] Musk stated it was "about a year away" due to power and thermal issues.[14]
The defining goal of [Dojo] is scalability. We have de-emphasized several mechanisms that you find in typical CPUs, like coherency, virtual memory, and global lookup directories just because these mechanisms do not scale very well... Instead, we have relied on a very fast and very distributed SRAM [static random-access memory] storage throughout the mesh. And this is backed by an order of magnitude higher speed of interconnect than what you find in a typical distributed system.
— Emil Talpes, Tesla hardware engineer, in a 2022 The Next Platform article[5]
Dojo was officially announced at Tesla's Artificial Intelligence (AI) Day on August 19, 2021.[15] Tesla revealed details of the D1 chip and its plans for "Project Dojo", a datacenter that would house 3,000 D1 chips;[16] the first "Training Tile" had been completed and delivered the week before.[9] In October 2021, Tesla released a "Dojo Technology" whitepaper describing the Configurable Float8 (CFloat8) and Configurable Float16 (CFloat16) floating point formats and arithmetic operations as an extension of Institute of Electrical and Electronics Engineers (IEEE) standard 754.[17]
At the follow-up AI Day in September 2022, Tesla announced it had built several System Trays and one Cabinet. During a test, the company stated that Project Dojo drew 2.3 megawatts (MW) of power before tripping a local San Jose, California power substation.[18] At the time, Tesla was assembling one Training Tile per day.[10]
In August 2023, Tesla powered on Dojo for production use as well as a new training cluster configured with 10,000 Nvidia H100 GPUs.[19]
In January 2024, Musk described Dojo as "a long shot worth taking because the payoff is potentially very high. But it's not something that is a high probability."[20]
In June 2024, Musk explained that ongoing construction work at Gigafactory Texas was for a computing cluster planned to comprise an even mix of "Tesla AI" and Nvidia/other hardware, with a total thermal design power of 130 MW initially, eventually exceeding 500 MW.[21]
Reception
Various analysts have stated Dojo "is impressive, but it won't transform supercomputing",[4] "is a game-changer because it has been developed completely in-house",[22] "will massively accelerate the development of autonomous vehicles",[23] and "could be a game changer for the future of Tesla FSD and for AI more broadly."[1]
On September 11, 2023, Morgan Stanley increased its target price for Tesla stock (TSLA) to US$400 from a prior target of $250 and called the stock its top pick in the electric vehicle sector, stating that Tesla’s Dojo supercomputer could fuel a $500 billion jump in Tesla’s market value.[24]
Technical architecture
The fundamental unit of the Dojo supercomputer is the D1 chip,[25] designed by a team at Tesla led by ex-AMD CPU designer Ganesh Venkataramanan, including Emil Talpes, Debjit Das Sarma, Douglas Williams, Bill Chang, and Rajiv Kurian.[5]
In an update at its 2022 Artificial Intelligence (AI) Day, Tesla announced that Dojo would scale by deploying multiple ExaPODs, in which there would be:[23]
10 Cabinets per ExaPOD (1,062,000 cores, 3,000 D1 chips)
2 System Trays per Cabinet (106,200 cores, 300 D1 chips)
6 Training Tiles per System Tray (53,100 cores, along with host interface hardware)
25 D1 chips per Training Tile (8,850 cores)
354 computing cores per D1 chip
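The hierarchy above can be checked arithmetically. The following sketch multiplies out the per-level counts from Tesla's 2022 AI Day figures to confirm the stated core and chip totals:

```python
# Per-level counts from the list above.
CORES_PER_D1 = 354
D1_PER_TILE = 25
TILES_PER_TRAY = 6
TRAYS_PER_CABINET = 2
CABINETS_PER_EXAPOD = 10

cores_per_tile = CORES_PER_D1 * D1_PER_TILE                 # 8,850
cores_per_tray = cores_per_tile * TILES_PER_TRAY            # 53,100
cores_per_cabinet = cores_per_tray * TRAYS_PER_CABINET      # 106,200
cores_per_exapod = cores_per_cabinet * CABINETS_PER_EXAPOD  # 1,062,000

d1_per_exapod = (D1_PER_TILE * TILES_PER_TRAY
                 * TRAYS_PER_CABINET * CABINETS_PER_EXAPOD)  # 3,000

print(f"cores per ExaPOD: {cores_per_exapod:,}")    # 1,062,000
print(f"D1 chips per ExaPOD: {d1_per_exapod:,}")    # 3,000
```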
According to Venkataramanan, Tesla's senior director of Autopilot hardware, Dojo will have more than an exaflop (a million teraflops) of computing power.[28] For comparison, according to Nvidia, in August 2021, the (pre-Dojo) Tesla AI-training center used 720 nodes, each with eight Nvidia A100 Tensor Core GPUs for 5,760 GPUs in total, providing up to 1.8 exaflops of performance.[29]
D1 chip
Each node (computing core) of the D1 processing chip is a general purpose 64-bit CPU with a superscalar core. It supports internal instruction-level parallelism and includes simultaneous multithreading (SMT). It does not support virtual memory, and it uses only limited memory protection mechanisms; Dojo software and applications manage chip resources directly.
The D1 instruction set supports both 64-bit scalar and 64-byte single instruction, multiple data (SIMD) vector instructions.[30] The integer unit mixes instructions from the RISC-V instruction set with custom instructions, supporting 8-, 16-, 32-, or 64-bit integers. The custom vector math unit is optimized for machine learning kernels and supports multiple data formats, with a mix of precisions and numerical ranges, many of which are compiler composable.[5] Up to 16 vector formats can be used simultaneously.[5]
Node
Each D1 node uses a 32-byte fetch window holding up to eight instructions. These instructions are fed to an eight-wide decoder which supports two threads per cycle, followed by a four-wide, four-way SMT scalar scheduler that has two integer units, two address units, and one register file per thread. Vector instructions are passed further down the pipeline to a dedicated vector scheduler with two-way SMT, which feeds either a 64-byte SIMD unit or four 8×8×4 matrix multiplication units.[30]
The network on-chip (NOC) router links cores into a two-dimensional mesh network. It can send one packet in and one packet out in all four directions to/from each neighbor node, along with one 64-byte read and one 64-byte write to local SRAM per clock cycle.[30]
Each core has 1.25 megabytes (MB) of SRAM as its main memory. Load and store speeds reach 400 gigabytes per second (GB/sec) and 270 GB/sec, respectively. The chip has explicit core-to-core data transfer instructions. Each SRAM has a unique list parser that feeds a pair of decoders, and a gather engine that feeds the vector register file, which together can directly transfer information across nodes.[5]
Die
Twelve nodes (cores) are grouped into a local block. Nodes are arranged in an 18×20 array on a single die, of which 354 cores are available for applications.[5] The die runs at 2 gigahertz (GHz) and totals 440 MB of SRAM.[5] It reaches 376 teraflops using 16-bit brain floating point (BF16) numbers or Tesla's proposed configurable 8-bit floating point (CFloat8) numbers,[17] and 22 teraflops using FP32.
Each die comprises 576 bi-directional serializer/deserializer (SerDes) channels along the perimeter to link to other dies, and moves 8 TB/sec across all four die edges.[5] Each D1 chip has a thermal design power of approximately 400 watts.[31]
Training Tile
The water-cooled Training Tile packages 25 D1 chips into a 5×5 array.[5] Each tile supports 36 TB/sec of aggregate off-tile bandwidth via 40 input/output (I/O) chips, half the bandwidth of the on-chip mesh network, along with 10 TB/sec of on-tile bandwidth. Each tile has 11 GB of SRAM (25 D1 chips × 360 cores per chip × 1.25 MB per core) and achieves 9 petaflops at BF16/CFloat8 precision (25 D1 chips × 376 teraflops per chip). Each tile consumes 15 kilowatts:[5] 288 amperes at 52 volts.[31]
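The quoted per-tile figures can be cross-checked against the per-chip numbers. This short sketch reproduces the rounded values stated above:

```python
# Cross-check of the Training Tile figures from the per-chip numbers.
d1_per_tile = 25

tile_bf16_tflops = d1_per_tile * 376      # 9,400 TFLOPS, quoted as "9 petaflops"
tile_sram_mb = d1_per_tile * 360 * 1.25   # 11,250 MB, quoted as "11 GB"
tile_power_w = 288 * 52                   # 14,976 W, quoted as "15 kilowatts"

print(tile_bf16_tflops, tile_sram_mb, tile_power_w)
```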
System Tray
Six tiles are aggregated into a System Tray, which is integrated with a host interface; each host interface includes 512 x86 cores, providing a Linux-based user environment.[18] The Dojo System Tray was previously known as the Training Matrix, which comprises six Training Tiles, 20 Dojo Interface Processor cards across four host servers, and Ethernet-linked adjunct servers. It has 53,100 D1 cores.
Dojo Interface Processor
Dojo Interface Processor (DIP) cards sit on the edges of the tile arrays and are hooked into the mesh network. Host systems power the DIPs and perform various system management functions. Each DIP is a memory and I/O co-processor holding 32 GB of shared high bandwidth memory (either HBM2e or HBM3), as well as Ethernet interfaces that sidestep the mesh network. Each DIP card has two I/O processors and four memory banks totaling 32 GB, with 800 GB/sec of bandwidth.
The DIP plugs into a PCI-Express 4.0 x16 slot that offers 32 GB/sec of bandwidth per card. Five cards per tile edge offer 160 GB/sec of bandwidth to the host servers and 4.5 TB/sec to the tile.
Tesla Transport Protocol
Tesla Transport Protocol (TTP) is a proprietary interconnect that runs over PCI-Express. A 50 GB/sec TTP link can also run over Ethernet, using either a single 400 Gb/sec port or a paired set of 200 Gb/sec ports. Crossing the entire two-dimensional mesh network might take 30 hops, while TTP over Ethernet takes only four hops (at lower bandwidth), reducing latency for long routes.
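The hop-count argument can be illustrated with a toy model. On a 2D mesh, a packet travels the Manhattan distance between source and destination one neighbor at a time, while an Ethernet shortcut exits the mesh, crosses the external network, and re-enters near the target in a fixed, small number of hops. The grid coordinates and the four-hop shortcut below are illustrative assumptions based only on the figures quoted above, not Tesla's actual routing:

```python
# Toy model of long-route latency on a 2D mesh vs. a TTP-over-Ethernet
# shortcut. Coordinates and hop counts are illustrative assumptions.

def mesh_hops(src, dst):
    """Hops across a 2D mesh: one neighbor per hop (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def ttp_ethernet_hops():
    """Fixed shortcut route, roughly four hops regardless of mesh
    distance (an assumption based on the figure quoted above)."""
    return 4

src, dst = (0, 0), (14, 16)   # notional far corners of a large mesh
print(mesh_hops(src, dst))    # 30 hops across the mesh
print(ttp_ethernet_hops())    # 4 hops via the Ethernet shortcut
```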
Cabinet and ExaPOD
Dojo stacks tiles vertically in a cabinet to minimize the distance and communications time between them. The Dojo ExaPOD system includes 120 tiles, totaling 1,062,000 usable cores, reaching 1 exaflops in BF16 and CFloat8 formats. It has 1.3 TB of on-tile SRAM and 13 TB of high bandwidth memory (HBM).
Software
Dojo supports the PyTorch framework: "Nothing as low level as C or C++, nothing remotely like CUDA".[5] The SRAM presents as a single address space.[5]
Because FP32 has more precision and range than needed for AI tasks, and FP16 does not have enough, Tesla has devised 8- and 16-bit configurable floating point formats (CFloat8 and CFloat16, respectively) which allow the compiler to dynamically set mantissa and exponent precision, accepting lower precision in return for faster vector processing and reduced storage requirements.[5][17]
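The idea of a configurable split between exponent and mantissa bits can be shown with a toy decoder. The exact bit layouts, bias rules, and special values of Tesla's CFloat8 are defined in its whitepaper; the sketch below assumes an IEEE-754-style normal encoding with one sign bit and a compiler-chosen exponent width, and it omits denormals, infinities, and NaN:

```python
# Toy "configurable" 8-bit float in the spirit of CFloat8: the 7
# non-sign bits are split between exponent and mantissa, trading
# numerical range against precision. Layout and bias are assumptions
# (IEEE-style), not Tesla's actual CFloat8 definition.

def decode_cfloat8(bits: int, exp_bits: int) -> float:
    """Decode an 8-bit pattern as 1 sign bit, exp_bits exponent bits,
    and (7 - exp_bits) mantissa bits, with IEEE-style bias.
    Normal numbers only: implicit leading 1 on the mantissa."""
    man_bits = 7 - exp_bits
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    return sign * (1.0 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)

# The same bit pattern decodes differently under different splits:
pattern = 0b0_0111_100  # 0x3C
print(decode_cfloat8(pattern, exp_bits=4))  # E4M3 reading: 1.5
print(decode_cfloat8(pattern, exp_bits=5))  # E5M2 reading: 1.0
```

Choosing a wider exponent extends the representable range at the cost of mantissa precision, which is the trade-off the compiler makes per tensor.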