In computing, CUDA (originally Compute Unified Device Architecture) is a proprietary[1] parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). The CUDA API is an extension of the C programming language that adds the ability to specify thread-level parallelism in C and to specify GPU device-specific operations (such as moving data between the CPU and the GPU).[2] CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.[3] In addition to drivers and runtime kernels, the CUDA platform includes compilers, libraries and developer tools to help programmers accelerate their applications.
CUDA is designed to work with programming languages such as C, C++, Fortran and Python. This accessibility makes it easier for specialists in parallel programming to use GPU resources, in contrast to prior APIs like Direct3D and OpenGL, which require advanced skills in graphics programming.[4] CUDA-powered GPUs also support programming frameworks such as OpenMP, OpenACC and OpenCL.[5][3]
CUDA was created by Nvidia in 2006.[6] When it was first introduced, the name was an acronym for Compute Unified Device Architecture,[7] but Nvidia later dropped the common use of the acronym and now rarely expands it.[8]
The graphics processing unit (GPU), as a specialized computer processor, addresses the demands of compute-intensive, real-time, high-resolution 3D graphics tasks. By 2012, GPUs had evolved into highly parallel multi-core systems allowing efficient manipulation of large blocks of data. This design is more effective than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel.
Ian Buck, while at Stanford in 2000, created an 8K gaming rig using 32 GeForce cards, then obtained a DARPA grant to perform general-purpose parallel programming on GPUs. He then joined Nvidia, where he has overseen CUDA development since 2004. In pushing for CUDA, Jensen Huang aimed for Nvidia GPUs to become general-purpose hardware for scientific computing. CUDA was released in 2006. Around 2015, the focus of CUDA changed to neural networks.[9]
Ontology
The following table offers an approximate description of the ontology of the CUDA framework.
thread (also known as "SP", "streaming processor" or "CUDA core", although these names are now deprecated): analogous to individual scalar operations within a vector operation
Programming abilities
The CUDA platform is accessible to software developers through CUDA-accelerated libraries, compiler directives such as OpenACC, and extensions to industry-standard programming languages including C, C++, Fortran and Python. C/C++ programmers can use 'CUDA C/C++', compiled to PTX with nvcc, Nvidia's LLVM-based C/C++ compiler, or by clang itself.[10] Fortran programmers can use 'CUDA Fortran', compiled with the PGI CUDA Fortran compiler from The Portland Group.[needs update] Python programmers can use the cuNumeric library to accelerate applications on Nvidia GPUs.
CUDA provides both a low-level API (CUDA Driver API, non-single-source) and a higher-level API (CUDA Runtime API, single-source). The initial CUDA SDK was made public on 15 February 2007, for Microsoft Windows and Linux. Mac OS X support was later added in version 2.0,[18] which supersedes the beta released on 14 February 2008.[19] CUDA works with all Nvidia GPUs from the G8x series onwards, including the GeForce, Quadro and Tesla lines. CUDA is compatible with most standard operating systems.
CUDA 8.0 comes with the following libraries (for compilation & runtime, in alphabetical order):
cuBLAS – CUDA Basic Linear Algebra Subroutines library
CUDART – CUDA Runtime library
cuFFT – CUDA Fast Fourier Transform library
cuRAND – CUDA Random Number Generation library
cuSOLVER – CUDA based collection of dense and sparse direct solvers
cuSPARSE – CUDA Sparse Matrix library
NPP – NVIDIA Performance Primitives library
nvGRAPH – NVIDIA Graph Analytics library
NVML – NVIDIA Management Library
NVRTC – NVIDIA Runtime Compilation library for CUDA C++
CUDA 8.0 comes with these other software components:
nView – NVIDIA nView Desktop Management Software
NVWMI – NVIDIA Enterprise Management Toolkit
GameWorks PhysX – a multi-platform game physics engine
CUDA 9.0–9.2 comes with these other components:
CUTLASS 1.0 – custom linear algebra algorithms
NVIDIA Video Decoder was deprecated in CUDA 9.2; it is now available in NVIDIA Video Codec SDK
CUDA 10 comes with these other components:
nvJPEG – Hybrid (CPU and GPU) JPEG processing
CUDA 11.0–11.8 comes with these other components:[20][21][22][23]
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs:
Scattered reads – code can read from arbitrary addresses in memory.
Unified virtual memory (CUDA 4.0 and above)
Unified memory (CUDA 6.0 and above)
Shared memory – CUDA exposes a fast shared memory region that can be shared among threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups.[24]
Faster downloads and readbacks to and from the GPU
Full support for integer and bitwise operations, including integer texture lookups
Limitations
Whether for the host computer or the GPU device, all CUDA source code is now processed according to C++ syntax rules.[25] This was not always the case. Earlier versions of CUDA were based on C syntax rules.[26] As with the more general case of compiling C code with a C++ compiler, it is therefore possible that old C-style CUDA source code will either fail to compile or will not behave as originally intended.
Interoperability with rendering languages such as OpenGL is one-way, with OpenGL having access to registered CUDA memory but CUDA not having access to OpenGL memory.
Copying between host and device memory may incur a performance hit due to system bus bandwidth and latency (this can be partly alleviated with asynchronous memory transfers, handled by the GPU's DMA engine).
Threads should run in groups of at least 32 (one warp) for best performance, with the total number of threads in the thousands. Branches in the program code do not affect performance significantly, provided that all 32 threads in a group take the same execution path; the SIMD execution model becomes a significant limitation for any inherently divergent task (e.g. traversing a space partitioning data structure during ray tracing).
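The effect of divergence within a group of 32 threads can be illustrated with a toy cost model. The following Python sketch is not CUDA code and does not describe the actual hardware scheduler; it simply assumes that each group of 32 threads needs one serialized pass per distinct execution path taken inside it. The names WARP_SIZE and warp_passes are hypothetical and introduced only for this illustration.

```python
WARP_SIZE = 32  # threads that execute in lockstep (assumed group size from the text)


def warp_passes(branch_taken):
    """Toy model: count one serialized pass per distinct path inside each warp.

    branch_taken[i] is True if thread i takes the 'if' side of a branch.
    """
    passes = 0
    for start in range(0, len(branch_taken), WARP_SIZE):
        warp = branch_taken[start:start + WARP_SIZE]
        passes += len(set(warp))  # 1 if the warp is uniform, 2 if it diverges
    return passes


# 64 threads, branch decided per warp: each warp follows a single path.
uniform = [tid // WARP_SIZE == 0 for tid in range(64)]

# 64 threads, branch decided by thread parity: both warps diverge.
divergent = [tid % 2 == 0 for tid in range(64)]

print(warp_passes(uniform), warp_passes(divergent))
```

Under this simplified model the parity-based branch costs twice as many passes as the per-warp branch, even though the same number of threads takes each side.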
No emulation or fallback functionality is available for modern revisions.
Valid C++ may sometimes be flagged and prevent compilation due to the way the compiler approaches optimization for target GPU device limitations.[citation needed]
C++ run-time type information (RTTI) and C++-style exception handling are only supported in host code, not in device code.
In single precision on first-generation CUDA compute capability 1.x devices, denormal numbers are unsupported and are instead flushed to zero, and the precision of both the division and square root operations is slightly lower than IEEE 754-compliant single-precision math. Devices that support compute capability 2.0 and above support denormal numbers, and the division and square root operations are IEEE 754 compliant by default. However, users can obtain the faster gaming-grade math of compute capability 1.x devices if desired by setting compiler flags to disable accurate divisions and accurate square roots and to enable flushing denormal numbers to zero.[27]
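What flushing denormal numbers to zero means for single-precision values can be sketched on the CPU. The Python example below is only an emulation, not GPU code: it rounds a value to IEEE 754 single precision with the standard struct module and then replaces any nonzero result smaller in magnitude than the smallest normal value (2^-126) with a zero of the same sign. The helper names to_float32 and flush_to_zero are hypothetical.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal single-precision value


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_to_zero(x):
    """Emulate flush-to-zero: denormal single-precision values become signed zero."""
    y = to_float32(x)
    if y != 0.0 and abs(y) < FLT_MIN_NORMAL:
        return math.copysign(0.0, y)
    return y


tiny = 1e-40    # representable in single precision only as a denormal
small = 1e-30   # an ordinary normal single-precision value

print(to_float32(tiny) != 0.0)   # the denormal survives IEEE 754 rounding
print(flush_to_zero(tiny))       # but is lost when flushed to zero
print(flush_to_zero(small) == to_float32(small))
```

Values in the normal range are unaffected; only the gradual-underflow region between zero and 2^-126 is lost.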
Unlike OpenCL, CUDA is proprietary, so CUDA-enabled GPUs are available only from Nvidia.[28][1] Attempts to implement CUDA on other GPUs include:
Project Coriander: Converts CUDA C++11 source to OpenCL 1.2 C. A fork of CUDA-on-CL intended to run TensorFlow.[29][30][31]
GPUOpen HIP: A thin abstraction layer on top of CUDA and ROCm intended for AMD and Nvidia GPUs. Has a conversion tool for importing CUDA C++ source. Supports CUDA 4.0 plus C++11 and float16.
ZLUDA is a drop-in replacement for CUDA on AMD GPUs and formerly Intel GPUs with near-native performance.[33] The developer, Andrzej Janik, was separately contracted by both Intel and AMD to develop the software in 2021 and 2022, respectively. However, neither company decided to release it officially due to the lack of a business use case. AMD's contract included a clause that allowed Janik to release his code for AMD independently, allowing him to release the new version that only supports AMD GPUs.[34]
chipStar can compile and run CUDA/HIP programs on advanced OpenCL 3.0 or Level Zero platforms.[35]
Example
This example code in C++ loads a texture from an image into an array on the GPU:
    // Legacy texture-reference API (deprecated; removed in CUDA 12).
    // Assumes width, height, image (host pixel data) and d_data (device
    // output buffer) are defined elsewhere.
    texture<float, 2, cudaReadModeElementType> tex;

    // Forward declaration so that foo() can launch the kernel defined below
    __global__ void kernel(float* odata, int height, int width);

    void foo()
    {
        cudaArray* cu_array;

        // Allocate array
        cudaChannelFormatDesc description = cudaCreateChannelDesc<float>();
        cudaMallocArray(&cu_array, &description, width, height);

        // Copy image data to array (offsets 0, 0 select the array origin)
        cudaMemcpyToArray(cu_array, 0, 0, image, width * height * sizeof(float), cudaMemcpyHostToDevice);

        // Set texture parameters (default)
        tex.addressMode[0] = cudaAddressModeClamp;
        tex.addressMode[1] = cudaAddressModeClamp;
        tex.filterMode = cudaFilterModePoint;
        tex.normalized = false; // do not normalize coordinates

        // Bind the array to the texture
        cudaBindTextureToArray(tex, cu_array);

        // Run kernel
        dim3 blockDim(16, 16, 1);
        dim3 gridDim((width + blockDim.x - 1) / blockDim.x, (height + blockDim.y - 1) / blockDim.y, 1);
        kernel<<<gridDim, blockDim, 0>>>(d_data, height, width);

        // Unbind the array from the texture
        cudaUnbindTexture(tex);
    } // end foo()

    __global__ void kernel(float* odata, int height, int width)
    {
        unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height) {
            float c = tex2D(tex, x, y);
            odata[y * width + x] = c;
        }
    }
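The grid-size arithmetic and the bounds check in the example above can be verified on the CPU. The following Python sketch is an illustration only: it reproduces the ceiling division used for gridDim and enumerates the (x, y) coordinates that would pass the kernel's `x < width && y < height` test. The function names grid_dim and covered_pixels, and the 100 × 50 image size, are assumptions made for this example.

```python
def grid_dim(width, height, block=(16, 16)):
    """Blocks needed per axis: ceiling division of the image extent by the block size."""
    bx, by = block
    return ((width + bx - 1) // bx, (height + by - 1) // by)


def covered_pixels(width, height, block=(16, 16)):
    """Collect every (x, y) that a launched thread would write after the bounds check."""
    bx, by = block
    gx, gy = grid_dim(width, height, block)
    pixels = set()
    for block_y in range(gy):
        for block_x in range(gx):
            for ty in range(by):
                for tx in range(bx):
                    # Same index computation as in the kernel
                    x = block_x * bx + tx
                    y = block_y * by + ty
                    if x < width and y < height:
                        pixels.add((x, y))
    return pixels


print(grid_dim(100, 50))
print(len(covered_pixels(100, 50)))
```

For a 100 × 50 image with 16 × 16 blocks, the grid is 7 × 4 blocks, so 7168 threads are launched and the bounds check leaves exactly the 5000 in-range threads to write a pixel. Rounding the grid size up and guarding with a bounds check is what lets the image dimensions be arbitrary rather than multiples of the block size.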
Below is an example given in Python that computes the product of two arrays on the GPU. The unofficial Python language bindings can be obtained from PyCUDA.[36]
    import pycuda.compiler as comp
    import pycuda.driver as drv
    import numpy
    import pycuda.autoinit

    mod = comp.SourceModule(
        """
    __global__ void multiply_them(float *dest, float *a, float *b)
    {
      const int i = threadIdx.x;
      dest[i] = a[i] * b[i];
    }
    """
    )

    multiply_them = mod.get_function("multiply_them")

    a = numpy.random.randn(400).astype(numpy.float32)
    b = numpy.random.randn(400).astype(numpy.float32)

    dest = numpy.zeros_like(a)
    multiply_them(drv.Out(dest), drv.In(a), drv.In(b), block=(400, 1, 1))

    print(dest - a * b)
Additional Python bindings to simplify matrix multiplication operations can be found in the program pycublas.[37]
Feature support (unlisted features are supported for all compute capabilities)
oneAPI is an initiative based in open standards, created to support software development for multiple hardware architectures.[123] The oneAPI libraries must implement open specifications that are discussed publicly by the Special Interest Groups, offering the possibility for any developer or organization to implement their own versions of oneAPI libraries.[124][125]
oneAPI was originally created by Intel; other hardware adopters include Fujitsu and Huawei.
Unified Acceleration Foundation (UXL)
The Unified Acceleration Foundation (UXL) is a technology consortium working on the continuation of the oneAPI initiative. Its goal is to create a new open-standard accelerator software ecosystem, with related open standards and specification projects, through Working Groups and Special Interest Groups (SIGs), and to offer open alternatives to Nvidia's CUDA. The main companies behind it are Intel, Google, ARM, Qualcomm, Samsung, Imagination, and VMware.[126]
SYCL – an open standard from Khronos Group for programming a variety of platforms, including GPUs, with single-source modern C++, similar to the higher-level, single-source CUDA Runtime API
BrookGPU – the Stanford University graphics group's compiler