Streaming SIMD Extensions

In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series of central processing units (CPUs) shortly after the appearance of Advanced Micro Devices (AMD's) 3DNow!. SSE contains 70 new instructions (65 unique mnemonics[1] using 70 encodings), most of which work on single precision floating-point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

Intel's first IA-32 SIMD effort was the MMX instruction set. MMX had two main problems: it re-used existing x87 floating-point registers making the CPUs unable to work on both floating-point and SIMD data at the same time, and it only worked on integers. SSE floating-point instructions operate on a new independent register set, the XMM registers, and adds a few integer instructions that work on MMX registers.

SSE was subsequently expanded by Intel to SSE2, SSE3, SSSE3 and SSE4. Because it supports floating-point math, it had wider applications than MMX and became more popular. The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations[when?] by using MMX in parallel with SSE operations.

SSE was originally called Katmai New Instructions (KNI), Katmai being the code name for the first Pentium III core revision. During the Katmai project Intel sought to distinguish it from its earlier product line, particularly its flagship Pentium II. It was later renamed Internet Streaming SIMD Extensions (ISSE[2]), then SSE.

AMD added a subset of SSE, 19 of them, called new MMX instructions,[3] and known as several variants and combinations of SSE and MMX, shortly after with the release of the original Athlon in August 1999, see 3DNow! extensions. AMD eventually added full support for SSE instructions, starting with its Athlon XP and Duron (Morgan core) processors.

Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The AMD64 extensions from AMD (originally called x86-64) added a further eight registers XMM8 through XMM15, and this extension is duplicated in the Intel 64 architecture. There is also a new 32-bit control/status register, MXCSR. The registers XMM8 through XMM15 are accessible only in 64-bit operating mode.

SSE used only a single data type for XMM registers:

SSE2 would later expand the usage of the XMM registers to include:

  • two 64-bit double-precision floating-point numbers or
  • two 64-bit integers or
  • four 32-bit integers or
  • eight 16-bit short integers or
  • sixteen 8-bit bytes or characters.

Because these 128-bit registers are additional machine states that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions that can save all x86 and SSE register states at once. This support was quickly added to all major IA-32 operating systems.

The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the floating-point unit (FPU).[2] While a compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating-point operations to be mixed without the performance hit from explicit MMX/floating-point mode switching.

SSE instructions

SSE introduced both scalar and packed floating-point instructions.

Floating-point instructions

  • Memory-to-register/register-to-memory/register-to-register data movement
    • Scalar – MOVSS
    • Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS, MOVMSKPS
  • Arithmetic
    • Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
    • Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
  • Compare
    • Scalar – CMPSS, COMISS, UCOMISS
    • Packed – CMPPS
  • Data shuffle and unpacking
    • Packed – SHUFPS, UNPCKHPS, UNPCKLPS
  • Data-type conversion
    • Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI
    • Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI
  • Bitwise logical operations
    • Packed – ANDPS, ORPS, XORPS, ANDNPS

Integer instructions

  • Arithmetic
    • PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW
  • Data movement
    • PEXTRW, PINSRW
  • Other
    • PMOVMSKB, PSHUFW

Other instructions

  • MXCSR management
    • LDMXCSR, STMXCSR
  • Cache and Memory management
    • MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCH0, PREFETCH1, PREFETCH2, PREFETCHNTA, SFENCE

Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. To add two single precision, four-component vectors together using x86 requires four floating-point addition instructions.

 vec_res.x = v1.x + v2.x;
 vec_res.y = v1.y + v2.y;
 vec_res.z = v1.z + v2.z;
 vec_res.w = v1.w + v2.w;

This corresponds to four x86 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128-bit 'packed-add' instruction can replace the four scalar addition instructions.

 movaps xmm0, [v1] ;xmm0 = v1.w | v1.z | v1.y | v1.x 
 addps xmm0, [v2]  ;xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
 movaps [vec_res], xmm0  ;xmm0

Later versions

  • SSE2, Willamette New Instructions (WNI), introduced with the Pentium 4, is a major enhancement to SSE. SSE2 adds two major features: double-precision (64-bit) floating-point for all SSE operations, and MMX integer operations on 128-bit XMM registers. In the original SSE instruction set, conversion to and from integers placed the integer data in the 64-bit MMX registers. SSE2 enables the programmer to perform SIMD math on any data type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to use the legacy MMX or FPU registers. It offers an orthogonal set of instructions for dealing with common data types.
  • SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions. It also allowed addition or multiplication of two numbers that are stored in the same register, which wasn't possible in SSE2 and earlier. This capability, known as horizontal in Intel terminology, was the major addition to the SSE3 instruction set. AMD's 3DNow! extension could do the latter too.
  • SSSE3, Merom New Instructions (MNI), is an upgrade to SSE3, adding 16 new instructions which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4 as this term was used during the development of the Core microarchitecture.
  • SSE4, Penryn New Instructions (PNI), is another major enhancement, adding a dot product instruction, additional integer instructions, a popcnt instruction (Population count: count number of bits set to 1, used extensively e.g. in cryptography), and more.
  • XOP, FMA4 and CVT16 are new iterations announced by AMD in August 2007[4][5] and revised in May 2009.[6]
  • Advanced Vector Extensions (AVX), Gesher New Instructions (GNI), is an advanced version of SSE announced by Intel featuring a widened data path from 128 bits to 256 bits and 3-operand instructions (up from 2). Intel released processors in early 2011 with AVX support.[7]
  • AVX2 is an expansion of the AVX instruction set.
  • AVX-512 (3.1 and 3.2) are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture.

Identifying

The following programs can be used to determine which, if any, versions of SSE are supported on a system

  • Intel Processor Identification Utility[8]
  • CPU-Z – CPU, motherboard, and memory identification utility.
  • lscpu - provided by the util-linux package in most Linux distributions.

References

  1. ^ "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture". Intel. April 2022. pp. 5-16–5-19. Archived from the original on April 25, 2022. Retrieved May 16, 2022.
  2. ^ a b Diefendorff, Keith (March 8, 1999). "Pentium III = Pentium II + SSE: Internet SSE Architecture Boosts Multimedia Performance" (PDF). Microprocessor Report. 13 (3). Archived (PDF) from the original on April 17, 2018. Retrieved September 1, 2017.
  3. ^ "AMD Extensions to the 3DNow and MMX Instruction Sets Manual" (PDF). Advanced Micro Devices, Inc. March 2000. Archived from the original (PDF) on May 17, 2008. Retrieved April 18, 2024.
  4. ^ Vance, Ashlee (August 3, 2007). "AMD plots single thread boost with x86 extensions". The Register. Archived from the original on April 27, 2011. Retrieved August 24, 2017.
  5. ^ "AMD64 Technology: 128-Bit SSE5 Instruction Set" (PDF). AMD. August 2007. Archived (PDF) from the original on August 25, 2017. Retrieved August 24, 2017.
  6. ^ "AMD64 Technology AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions" (PDF). AMD. November 2009. Archived (PDF) from the original on January 31, 2017. Retrieved August 24, 2017.
  7. ^ Girkar, Milind (October 1, 2013). "Intel® Advanced Vector Extensions (Intel® AVX)". Intel. Archived from the original on August 25, 2017. Retrieved August 24, 2017.
  8. ^ "Download the Intel® Processor Identification Utility". Intel. July 24, 2017. Archived from the original on August 25, 2017. Retrieved August 24, 2017.

Read other articles:

Tempat penyimpanan gas alam. Sumber daya alam ini menjadi asal mula munculnya istilah dutch disease. Penyakit Belanda (bahasa Inggris: Dutch disease) adalah fenomena di bidang perekonomian yang merujuk pada dampak yang biasanya ditimbulkan oleh berlimpahnya sumber daya alam di suatu negara.[1] Istilah ini dikemukakan pertama kali pada tahun 1977, yang merujuk pada menurunnya pertumbuhan di sektor perindustrian secara drastis akibat ditemukannya sumber gas alam yang berlimpah di Be...

 

 

Artikel ini tidak memiliki referensi atau sumber tepercaya sehingga isinya tidak bisa dipastikan. Tolong bantu perbaiki artikel ini dengan menambahkan referensi yang layak. Tulisan tanpa sumber dapat dipertanyakan dan dihapus sewaktu-waktu.Cari sumber: Sistem otot – berita · surat kabar · buku · cendekiawan · JSTOR Sistem otot adalah sistem organ pada hewan dan manusia yang mengizinkan makhluk tersebut bergerak. Sistem otot pada vertebrata dikontrol ol...

 

 

Politisi sosial demokrat Austria dan pendiri Schutzbund Austria yang anti-fasis. Julius Deutsch (2 Februari 1884, Lackenbach, Austria-Hungaria – 17 Januari 1968, Wina, Austria) adalah seorang politikus dari Partai Buruh Demokrat Sosial Austria dan merupakan anggota Parlemen antara tahun 1920-1933. Ia merupakan salah satu pendiri dan pemimpin milisi Sosial Demokrat Republikanischer Schutzbund (Liga Pertahanan Republik). Pemimpin Schutzbund Julius Deutsch mendirikan Schutzbund pada tahun 1923...

Prefecture of Japan Prefecture in Tōhoku, JapanFukushima Prefecture 福島県PrefectureJapanese transcription(s) • Japanese福島県 • RōmajiFukushima-ken FlagSymbolAnthem: Fukushima-ken kenmin no utaCountry JapanRegionTōhokuIslandHonshuCapitalFukushimaLargest cityIwakiSubdivisionsDistricts: 13, Municipalities: 59Government • GovernorMasao UchiboriArea • Total13,783.90 km2 (5,321.99 sq mi) • Rank3rdPopulat...

 

 

Questa voce sugli argomenti giocatori di football americano statunitensi e cestisti statunitensi è solo un abbozzo. Contribuisci a migliorarla secondo le convenzioni di Wikipedia. Segui i suggerimenti del progetto di riferimento. Questa voce sull'argomento allenatori di pallacanestro statunitensi è solo un abbozzo. Contribuisci a migliorarla secondo le convenzioni di Wikipedia. Segui i suggerimenti del progetto di riferimento. Ed Hickey Eddie Hickey nel 1961 Nazionalità  St...

 

 

† Египтопитек Реконструкция внешнего вида египтопитека Научная классификация Домен:ЭукариотыЦарство:ЖивотныеПодцарство:ЭуметазоиБез ранга:Двусторонне-симметричныеБез ранга:ВторичноротыеТип:ХордовыеПодтип:ПозвоночныеИнфратип:ЧелюстноротыеНадкласс:Четвероно...

この項目には、一部のコンピュータや閲覧ソフトで表示できない文字が含まれています(詳細)。 数字の大字(だいじ)は、漢数字の一種。通常用いる単純な字形の漢数字(小字)の代わりに同じ音の別の漢字を用いるものである。 概要 壱万円日本銀行券(「壱」が大字) 弐千円日本銀行券(「弐」が大字) 漢数字には「一」「二」「三」と続く小字と、「壱」「�...

 

 

British energy supply company Co-operative Energy LimitedCompany typePrivate subsidiary company of an Industrial and Provident SocietyIndustryPublic utilityFounded2010ProductsGas and electricity supplyParentThe Midcounties Co-operativeWebsiteenergy.yourcoop.coop Co-op Energy is a membership-owned British energy supply company based in Warwick that began trading in 2010. It sells renewable electricity (some from community-owned sources) and gas to its ethically concerned member owner/customers...

 

 

此條目需要补充更多来源。 (2021年7月4日)请协助補充多方面可靠来源以改善这篇条目,无法查证的内容可能會因為异议提出而被移除。致使用者:请搜索一下条目的标题(来源搜索:美国众议院 — 网页、新闻、书籍、学术、图像),以检查网络上是否存在该主题的更多可靠来源(判定指引)。 美國眾議院 United States House of Representatives第118届美国国会众议院徽章 众议院旗...

Une caisse en bois contenant des carabines Winchester M92 destinées à la Révolution mexicaine. La Winchester 1892 est une carabine de chasse conçue par John Browning et fabriquée par la Winchester Repeating Arms Company de 1892 à 1932, devant remplacer la Winchester 1873 qu'elle côtoya jusqu'en 1925. Historique Winchester 1892 en version Mousqueton. Depuis la commercialisation du modèle 1886 ; la Winchester Repeating Arms C° proposait une arme solide, légère et économique. La...

 

 

City in Illinois, United StatesBerwyn, IllinoisCityBerwyn City Hall FlagSealNickname: The City of HomesLocation of Berwyn in Cook County, Illinois.BerwynLocation of Berwyn in Greater Chicago AreaShow map of Greater ChicagoBerwynLocation of Berwyn in IllinoisShow map of IllinoisBerwynLocation of Berwyn in the USAShow map of the United StatesCoordinates: 41°50′33″N 87°47′24″W / 41.84250°N 87.79000°W / 41.84250; -87.79000Country United StatesState&#...

 

 

此条目序言章节没有充分总结全文内容要点。 (2019年3月21日)请考虑扩充序言,清晰概述条目所有重點。请在条目的讨论页讨论此问题。 哈萨克斯坦總統哈薩克總統旗現任Қасым-Жомарт Кемелұлы Тоқаев卡瑟姆若马尔特·托卡耶夫自2019年3月20日在任任期7年首任努尔苏丹·纳扎尔巴耶夫设立1990年4月24日(哈薩克蘇維埃社會主義共和國總統) 哈萨克斯坦 哈萨克斯坦政府...

History of status with IRS Part of a series onScientology General Scientology Dianetics Timeline History L. Ron Hubbard Publications Glossary Beliefs and practices Thetan Auditing Bridge to Total Freedom OT Xenu Ethics and justice Church of Scientology Officials and staff Sea Org David Miscavige Controversies Litigation Status by country Suppressive person Disconnection Fair game RPF The Hole Office of Special Affairs Guardian's Office War on psychiatry More The tax status of the Church of Sc...

 

 

Cộng hoà Djibouti Tên bằng ngôn ngữ chính thức République de Djibouti (tiếng Pháp)جمهورية جيبوتي (tiếng Ả Rập)Jumhuriyaa Jibuti (tiếng Ả Rập) Quốc kỳ Huy hiệu Bản đồ Vị trí của Djibouti Tiêu ngữاتحاد، مساواة، سلام(tiếng Ả Rập)Unité, Égalité, Paix(tiếng Pháp)Đoàn kết, Bình đẳng, Hòa bìnhQuốc caDjibouti Hành chínhChính phủCộng hòa tổng thống đơn đảngTổng thốngThủ tướng...

 

 

この記事は検証可能な参考文献や出典が全く示されていないか、不十分です。 出典を追加して記事の信頼性向上にご協力ください。(このテンプレートの使い方)出典検索?: 浜野清吾 – ニュース · 書籍 · スカラー · CiNii · J-STAGE · NDL · dlib.jp · ジャパンサーチ · TWL (2013年2月) 日本の政治家浜野 清吾(はまの せいご)生年月...

North American Soccer League 1976 Généralités Sport football (soccer) Organisateur(s) North American Soccer League Édition 9e Lieu(x) Canada États-Unis Date Saison régulière : 14 avril au 15 août 1976 Séries éliminatoires : 17 au 28 août 1976 Nations Canada États-Unis Participants 20 franchises Matchs joués 240 matchs + séries éliminatoires Affluence 7 642 spectateurs de moyenne par match (saison régulière) Palmarès Vainqueur Metros-Croatia de Toronto Finalis...

 

 

Pour les articles homonymes, voir Mayer. Bernadette MayerBernadette Mayer en 2018.BiographieNaissance 12 mai 1945BrooklynDécès 22 novembre 2022 (à 77 ans)New YorkNationalité américaineFormation The New SchoolActivités Poétesse, écrivaineFratrie Rosemary Mayer (en)Autres informationsSite web (en-US) www.bernadettemayer.comDistinctions Shelley Memorial Award (2014)Bourse Guggenheimmodifier - modifier le code - modifier Wikidata Bernadette Mayer, née le 12 mai 1945 à New Yor...

 

 

Voce principale: Campionati europei di atletica leggera 2014. Europei diatletica leggera diZurigo 2014 Corse piane 100 m piani   uomini   donne 200 m piani uomini donne 400 m piani uomini donne 800 m piani uomini donne 1500 m piani uomini donne 5000 m piani uomini donne 10000 m piani uomini donne Corse ad ostacoli 110 / 100 m hs uomini donne 400 m hs uomini donne 3000 m siepi uomini donne Prove su strada Maratona uomini donne Marcia 20 km uomini donne Marcia 50 km uomini Salti Salt...

Questa voce sull'argomento competizioni cestistiche è solo un abbozzo. Contribuisci a migliorarla secondo le convenzioni di Wikipedia. Prima Divisione 1942-1943Dettagli della competizioneSport Pallacanestro Federazione FIP Periodo26 ottobre 1942 —1943 Squadre29 Cronologia della competizioneed. successiva →     ← ed. precedente Elemento Wikidata assente · Manuale La Prima Divisione 1942-1943 è stata la tredicesima edizione del campionato con ques...

 

 

Native American Language Sericmiique iitomPronunciation[kw̃ĩːkɛˈiːtom]RegionSonora, MexicoEthnicitySeriNative speakers720 (2020 census)[1]Language familyHokan? SeriLanguage codesISO 639-3seiGlottologseri1257ELPSeriSeri is classified as Vulnerable by the UNESCO Atlas of the World's Languages in DangerThis article contains IPA phonetic symbols. Without proper rendering support, you may see question marks, boxes, or other symbols instead of Unicode characters. For an introduct...