Multiply–accumulate operation

In computing, especially digital signal processing, the multiply–accumulate (MAC) or multiply-add (MAD) operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier–accumulator (MAC unit); the operation itself is also often called a MAC or a MAD operation. The MAC operation modifies an accumulator a:

a ← a + (b × c)
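A minimal Python sketch of this accumulator update follows. The names `mac` and `accumulate_products` are illustrative and not taken from any library; the loop mirrors a MAC unit adding one product to its accumulator per step.

```python
def mac(a, b, c):
    """One multiply-accumulate step: return the new accumulator a + (b * c)."""
    return a + b * c


def accumulate_products(pairs, start=0):
    """Feed (b, c) pairs through repeated MAC steps, as an accumulator register would."""
    acc = start
    for b, c in pairs:
        acc = mac(acc, b, c)
    return acc
```

For example, `accumulate_products([(1, 4), (2, 5), (3, 6)])` sums the three products 4, 10 and 18.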

When done with floating-point numbers, it might be performed with two roundings (typical in many DSPs), or with a single rounding. When performed with a single rounding, it is called a fused multiply–add (FMA) or fused multiply–accumulate (FMAC).

Modern computers may contain a dedicated MAC, consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register that stores the result. The output of the register is fed back to one input of the adder, so that on each clock cycle, the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the method of shifting and adding typical of earlier computers. Percy Ludgate was the first to conceive a MAC in his Analytical Machine of 1909,[1] and the first to exploit a MAC for division (using multiplication seeded by reciprocal, via the convergent series (1 + x)⁻¹). The first modern processors to be equipped with MAC units were digital signal processors, but the technique is now also common in general-purpose processors.[2][3][4][5]

In floating-point arithmetic

When done with integers, the operation is typically exact (computed modulo some power of two). However, floating-point numbers have only finite precision, so floating-point arithmetic is generally not associative or distributive. (See Floating-point arithmetic § Accuracy problems.) Therefore, it makes a difference to the result whether the multiply–add is performed with two roundings, or in one operation with a single rounding (a fused multiply–add). IEEE 754-2008 specifies that the fused operation must be performed with one rounding, yielding a more accurate result.[6]
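The contrast can be illustrated with a short Python sketch. The 32-bit width and the names `mac_u32` and `accumulate_u32` are assumptions chosen for illustration; the floating-point values are ordinary IEEE 754 doubles as used by Python's `float`.

```python
MASK32 = 0xFFFFFFFF  # assumed word size: 32-bit unsigned arithmetic


def mac_u32(a, b, c):
    """Integer MAC that wraps modulo 2**32, as a fixed-width register would."""
    return (a + b * c) & MASK32


def accumulate_u32(pairs):
    """Accumulate the products of (b, c) pairs modulo 2**32."""
    acc = 0
    for b, c in pairs:
        acc = mac_u32(acc, b, c)
    return acc


pairs = [(70000, 70000), (3, 5), (0xFFFFFFFF, 2)]
# Modular integer accumulation gives the same result in any order.
same_in_any_order = accumulate_u32(pairs) == accumulate_u32(pairs[::-1])

# Floating-point addition is not associative: the grouping changes the result.
left_grouping = (0.1 + 0.2) + 0.3
right_grouping = 0.1 + (0.2 + 0.3)
```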

Fused multiply–add

A fused multiply–add (FMA or fmadd)[7] is a floating-point multiply–add operation performed in one step (fused operation), with a single rounding. That is, where an unfused multiply–add would compute the product b × c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply–add would compute the entire expression a + (b × c) to its full precision before rounding the final result down to N significant bits.
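The difference between the two behaviours can be sketched in Python. Since not every Python version exposes a hardware FMA, the helper `fma_emulated` (a hypothetical name) emulates one by computing a + (b × c) exactly with rational arithmetic and rounding once when converting back to a double; it assumes finite inputs. Its argument order follows the a + (b × c) convention used here, not that of C's `fma()`.

```python
from fractions import Fraction


def fma_emulated(a, b, c):
    """Emulate a fused a + (b * c): exact rational arithmetic, then one rounding."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))


def mad_unfused(a, b, c):
    """Unfused multiply-add: the product is rounded, then the sum is rounded."""
    return a + b * c


a = -1.0
b = 1.0 + 2.0 ** -30
c = 1.0 - 2.0 ** -30
# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision.
unfused = mad_unfused(a, b, c)   # product rounded first, so the small term is lost
fused = fma_emulated(a, b, c)    # keeps the -2**-60 term
```

In this case the unfused result is exactly zero, while the fused result retains the small term that the intermediate rounding discarded.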

A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products, such as dot products and polynomial evaluation with Horner's rule.

Fused multiply–add can usually be relied on to give more accurate results. However, William Kahan has pointed out that it can give problems if used unthinkingly.[8] If x² − y² is evaluated as ((x × x) − y × y) (following Kahan's suggested notation in which redundant parentheses direct the compiler to round the (x × x) term first) using fused multiply–add, then the result may be negative even when x = y due to the first multiplication discarding low significance bits. This could then lead to an error if, for instance, the square root of the result is then evaluated.
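Kahan's pitfall can be reproduced with a small Python sketch. The helper `fma_emulated` is a hypothetical stand-in for a hardware FMA: it computes a + (b × c) exactly with rational arithmetic and rounds once, and it assumes finite inputs. The value of x is chosen so that rounding x × x discards a low-order bit.

```python
import math
from fractions import Fraction


def fma_emulated(a, b, c):
    """Emulate a fused a + (b * c): exact rational arithmetic, then one rounding."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))


def diff_of_squares_fused(x, y):
    """Evaluate ((x * x) - y * y): round x * x first, then fuse the rest."""
    xx = x * x                        # rounded product
    return fma_emulated(xx, -y, y)    # xx - y*y with a single rounding


x = y = 1.0 + 2.0 ** -30
unfused = x * x - y * y               # both products rounded identically: 0.0
fused = diff_of_squares_fused(x, y)   # rounding error of x * x, here negative

try:
    math.sqrt(fused)
    sqrt_failed = False
except ValueError:                    # math domain error for a negative argument
    sqrt_failed = True
```

One way to avoid the problem in code like this is to round both products before subtracting, which is what the unfused expression does.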

When implemented inside a microprocessor, an FMA can be faster than a multiply operation followed by an add. However, standard industrial implementations based on the original IBM RS/6000 design require a 2N-bit adder to compute the sum properly.[9]

Another benefit of including this instruction is that it allows an efficient software implementation of division (see division algorithm) and square root (see methods of computing square roots) operations, thus eliminating the need for dedicated hardware for those operations.[10]
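As a rough illustration of software division built on FMA, the sketch below uses a Newton–Raphson reciprocal iteration followed by one correction step. This is a swapped-in, simpler technique, not the Goldschmidt algorithms of the cited reference, and it makes no claim of correct rounding. The helper `fma_emulated`, the linear seed, the iteration count, and the restriction to finite positive divisors are all assumptions made for this example.

```python
import math
from fractions import Fraction


def fma_emulated(a, b, c):
    """Emulate a fused a + (b * c): exact rational arithmetic, then one rounding."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))


def reciprocal(d, iterations=5):
    """Approximate 1/d for finite d > 0 using only multiply-add steps."""
    if not d > 0.0:
        raise ValueError("this sketch handles positive divisors only")
    m, e = math.frexp(d)                   # d = m * 2**e with 0.5 <= m < 1
    r = 48.0 / 17.0 - (32.0 / 17.0) * m    # linear seed for 1/m
    for _ in range(iterations):
        err = fma_emulated(1.0, -m, r)     # 1 - m*r, rounded once
        r = fma_emulated(r, r, err)        # r + r*err
    return math.ldexp(r, -e)               # undo the scaling: 1/d = (1/m) * 2**-e


def divide(n, d):
    """Approximate n/d: multiply by the reciprocal, then apply one FMA correction."""
    r = reciprocal(d)
    q = n * r
    rem = fma_emulated(n, -d, q)           # remainder n - d*q, rounded once
    return fma_emulated(q, rem, r)         # q + rem*r
```

The fused steps matter because the residuals 1 − m·r and n − d·q are tiny differences of nearly equal quantities; rounding the product first would destroy most of their significant bits.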

Dot product instruction

Some machines combine multiple fused multiply–add operations into a single step, e.g. performing a four-element dot product on two 128-bit SIMD registers, a0×b0 + a1×b1 + a2×b2 + a3×b3, with single-cycle throughput.

Support

The FMA operation is included in IEEE 754-2008.

The Digital Equipment Corporation (DEC) VAX's POLY instruction is used for evaluating polynomials with Horner's rule using a succession of multiply and add steps. Instruction descriptions do not specify whether the multiply and add are performed using a single FMA step.[11] This instruction has been a part of the VAX instruction set since its original 11/780 implementation in 1977.
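Horner's rule maps naturally onto a chain of multiply–add steps, as the following Python sketch shows. It is a generic illustration and does not model the VAX POLY instruction's operand format or rounding behaviour. The helper `fma_emulated` is a hypothetical stand-in for a hardware FMA (exact rational arithmetic with one final rounding, finite inputs assumed), and the coefficient ordering is a choice made for this example.

```python
from fractions import Fraction


def fma_emulated(a, b, c):
    """Emulate a fused a + (b * c): exact rational arithmetic, then one rounding."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))


def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs run from highest degree to constant term."""
    acc = 0.0
    for coeff in coeffs:
        acc = fma_emulated(coeff, acc, x)   # acc <- coeff + acc * x
    return acc
```

For instance, `horner([2.0, -3.0, 1.0], 4.0)` evaluates 2x² − 3x + 1 at x = 4 using one multiply–add per coefficient.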

The 1999 standard of the C programming language supports the FMA operation through the fma() standard math library function and the automatic transformation of a multiplication followed by an addition (contraction of floating-point expressions), which can be explicitly enabled or disabled with standard pragmas (#pragma STDC FP_CONTRACT). The GCC and Clang C compilers do such transformations by default for processor architectures that support FMA instructions. With GCC, which does not support the aforementioned pragma,[12] this can be globally controlled by the -ffp-contract command line option.[13]

The fused multiply–add operation was introduced as "multiply–add fused" in the IBM POWER1 (1990) processor,[14] and has since been added to numerous other processors.

References

  1. ^ "The Feasibility of Ludgate's Analytical Machine". Archived from the original on 2019-08-07. Retrieved 2020-08-30.
  2. ^ Lyakhov, Pavel; Valueva, Maria; Valuev, Georgii; Nagornov, Nikolai (January 2020). "A Method of Increasing Digital Filter Performance Based on Truncated Multiply-Accumulate Units". Applied Sciences. 10 (24): 9052. doi:10.3390/app10249052.
  3. ^ Tung Thanh Hoang; Sjalander, M.; Larsson-Edefors, P. (May 2009). "Double Throughput Multiply-Accumulate unit for FlexCore processor enhancements". 2009 IEEE International Symposium on Parallel & Distributed Processing. pp. 1–7. doi:10.1109/IPDPS.2009.5161212. ISBN 978-1-4244-3751-1. S2CID 14535090.
  4. ^ Kang, Jongsung; Kim, Taewhan (2020-03-01). "PV-MAC: Multiply-and-accumulate unit structure exploiting precision variability in on-device convolutional neural networks". Integration. 71: 76–85. doi:10.1016/j.vlsi.2019.11.003. ISSN 0167-9260. S2CID 211264132.
  5. ^ "mad - ps". 20 November 2019. Retrieved 2021-08-14.
  6. ^ Whitehead, Nathan; Fit-Florea, Alex (2011). "Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs" (PDF). nvidia. Retrieved 2013-08-31.
  7. ^ "fmadd instrs". IBM.
  8. ^ Kahan, William (1996-05-31). "IEEE Standard 754 for Binary Floating-Point Arithmetic".
  9. ^ Quinnell, Eric (May 2007). Floating-Point Fused Multiply–Add Architectures (PDF) (PhD thesis). Retrieved 2011-03-28.
  10. ^ Markstein, Peter (November 2004). Software Division and Square Root Using Goldschmidt's Algorithms (PDF). 6th Conference on Real Numbers and Computers. CiteSeerX 10.1.1.85.9648.
  11. ^ "VAX instruction of the week: POLY". Archived from the original on 2020-02-13.
  12. ^ "Bug 20785 - Pragma STDC * (C99 FP) unimplemented". gcc.gnu.org. Retrieved 2022-02-02.
  13. ^ "Optimize Options (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Retrieved 2022-02-02.
  14. ^ Montoye, R. K.; Hokenek, E.; Runyon, S. L. (January 1990). "Design of the IBM RISC System/6000 floating-point execution unit". IBM Journal of Research and Development. 34 (1): 59–70. doi:10.1147/rd.341.0059.
  15. ^ "Godson-3 Emulates x86: New MIPS-Compatible Chinese Processor Has Extensions for x86 Translation".
  16. ^ Hollingsworth, Brent (October 2012). "New "Bulldozer" and "Piledriver" Instructions". AMD Developer Central.
  17. ^ "Intel adds 22nm octo-core 'Haswell' to CPU design roadmap". The Register. Archived from the original on 2012-02-17. Retrieved 2008-08-19.
  18. ^ "STM32 Cortex-M33 MCUs programming manual" (PDF). ST. Retrieved 2024-05-06.
