Transport triggered architecture

In computer architecture, a transport triggered architecture (TTA) is a kind of processor design in which programs directly control the internal transport buses of a processor. Computation happens as a side effect of data transports: writing data into a triggering port of a functional unit triggers the functional unit to start a computation. This is similar to what happens in a systolic array. Due to its modular structure, TTA is well suited as a processor template for application-specific instruction set processors (ASIPs) with a customized datapath, but without the inflexibility and design cost of fixed-function hardware accelerators.

Typically a transport triggered processor has multiple transport buses and multiple functional units connected to the buses, which provides opportunities for instruction level parallelism. The parallelism is statically defined by the programmer. In this respect, and because of its wide instruction word, a TTA resembles a very long instruction word (VLIW) architecture. A TTA instruction word is composed of multiple slots, one slot per bus, and each slot determines the data transport that takes place on the corresponding bus. This fine-grained control allows some optimizations that are not possible in a conventional processor. For example, software can transfer data directly between functional units without using registers.

Transport triggering exposes some microarchitectural details that are normally hidden from programmers. This greatly simplifies the control logic of a processor, because many decisions normally done at run time are fixed at compile time. However, it also means that a binary compiled for one TTA processor will not run on another one without recompilation if there is even a small difference in the architecture between the two. The binary incompatibility problem, in addition to the complexity of implementing a full context switch, makes TTAs more suitable for embedded systems than for general purpose computing.

Of all the one-instruction set computer architectures, the TTA is one of the few for which processors have actually been built, and the only one with processors sold commercially.

Benefits in comparison to VLIW architectures

TTAs can be seen as "exposed datapath" VLIW architectures. While a VLIW is programmed using operations, a TTA splits the execution of each operation into multiple move operations. This low-level programming model enables several benefits in comparison to a standard VLIW. For example, a TTA can provide more parallelism with simpler register files than a VLIW. As the programmer is in control of the timing of the operand and result data transports, the complexity (the number of input and output ports) of the register file (RF) need not be scaled according to the worst-case issue/completion scenario of multiple parallel instructions.

An important software optimization unique to transport programming is called software bypassing. With software bypassing, the programmer bypasses the register file write-back by moving data directly to the next functional unit's operand ports. When this optimization is applied aggressively, the original move that transports the result to the register file can be eliminated completely, thus both reducing register file port pressure and freeing a general purpose register for other temporary variables. The reduced register pressure, in addition to reducing the required complexity of the RF hardware, can lead to significant CPU energy savings, an important benefit especially in mobile embedded systems.[1][2]
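The effect of software bypassing on register file traffic can be illustrated with a small sketch. It assumes a hypothetical single ALU with ports named `ALU.operand1`, `ALU.add.trigger` and `ALU.result`, registers named `r1` to `r5`, and moves written as (source, destination) pairs. None of these names come from a real TTA toolchain. The example computes r5 = (r1 + r2) + r4 with and without bypassing and counts register file reads and writes.

```python
def rf_traffic(moves):
    """Count register-file reads and writes in a list of (source, destination) moves.

    Assumption: register names start with "r"; everything else is a
    function unit port.
    """
    reads = sum(1 for src, _ in moves if src.startswith("r"))
    writes = sum(1 for _, dst in moves if dst.startswith("r"))
    return reads, writes


# r5 = (r1 + r2) + r4, with the intermediate sum stored in r3.
without_bypass = [
    ("r1", "ALU.operand1"),
    ("r2", "ALU.add.trigger"),
    ("ALU.result", "r3"),            # write-back of the intermediate sum
    ("r3", "ALU.operand1"),          # ...which is read straight back
    ("r4", "ALU.add.trigger"),
    ("ALU.result", "r5"),
]

# Same computation, with the intermediate sum bypassed to the operand port.
with_bypass = [
    ("r1", "ALU.operand1"),
    ("r2", "ALU.add.trigger"),
    ("ALU.result", "ALU.operand1"),  # bypass: no register file access
    ("r4", "ALU.add.trigger"),
    ("ALU.result", "r5"),
]

print(rf_traffic(without_bypass))  # (4, 2)
print(rf_traffic(with_bypass))     # (3, 1)
```

In this sketch the bypassed schedule needs one move fewer, saves one register file read and one write, and leaves r3 free for other values.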

Structure

TTA processors are built of independent function units and register files, which are connected with transport buses and sockets.

Parts of transport triggered architecture

Function unit

Each function unit implements one or more operations, which implement functionality ranging from a simple addition of integers to a complex and arbitrary user-defined application-specific computation. Operands for operations are transferred through function unit ports.

Each function unit may have an independent pipeline. If a function unit is fully pipelined, a new operation that takes multiple clock cycles to finish can be started in every clock cycle. On the other hand, a pipeline may not always accept a new operation start request while an earlier operation is still executing.

Data memory access and communication with the outside of the processor are handled by special function units. Function units that implement memory access operations and connect to a memory module are often called load/store units.

Control unit

The control unit is a special function unit that controls the execution of programs. It has access to the instruction memory in order to fetch the instructions to be executed. To allow programs to transfer execution (jump) to an arbitrary position in the program, the control unit provides control flow operations. A control unit usually has an instruction pipeline, which consists of stages for fetching, decoding and executing program instructions.

Register files

Register files contain general purpose registers, which are used to store variables in programs. Like function units, register files have input and output ports. The number of read and write ports, that is, the number of registers that can be read and written in the same clock cycle, can vary from one register file to another.

Transport buses and sockets

The interconnect architecture consists of transport buses, which are connected to function unit ports by means of sockets. Because connectivity is expensive, it is usual to reduce the number of connections between units (function units and register files). A TTA is said to be fully connected if there is a path from each unit output port to every unit's input ports.

Sockets provide the means for programming TTA processors: they make it possible to select which bus-to-port connections of the socket are enabled at any given time. Thus, the data transports taking place in a clock cycle can be programmed by defining, for each bus, the source and destination socket/port connections to be enabled.
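The definition of a fully connected TTA given above can be checked mechanically. The following is a minimal sketch under a simplifying assumption: each bus is modelled as a dictionary with a set of output ports that can drive it (`drivers`) and a set of input ports that can read it (`readers`), and a "path" means a single bus that connects both ports. The port names are hypothetical.

```python
def is_fully_connected(buses, output_ports, input_ports):
    """Return True if every output port can reach every input port over some bus."""
    for out in output_ports:
        for inp in input_ports:
            reachable = any(
                out in bus["drivers"] and inp in bus["readers"] for bus in buses
            )
            if not reachable:
                return False
    return True


outputs = {"RF.out", "ALU.result"}
inputs = {"RF.in", "ALU.operand1", "ALU.trigger"}

buses = [
    # Bus 0 is connected to every port of every unit.
    {"drivers": {"RF.out", "ALU.result"},
     "readers": {"RF.in", "ALU.operand1", "ALU.trigger"}},
    # Bus 1 has reduced connectivity: register file to ALU trigger only.
    {"drivers": {"RF.out"}, "readers": {"ALU.trigger"}},
]

print(is_fully_connected(buses, outputs, inputs))      # True
print(is_fully_connected(buses[1:], outputs, inputs))  # False
```

A design with reduced connectivity, such as one built only from buses like bus 1, trades scheduling freedom for a cheaper interconnect.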

Conditional execution

Some TTA implementations support conditional execution.

Conditional execution is implemented with the aid of guards. Each data transport can be conditionalized by a guard, which is connected to a register (often a 1-bit conditional register) and to a bus. In case the value of the guarded register evaluates to false (zero), the data transport programmed for the bus the guard is connected to is squashed, that is, not written to its destination. Unconditional data transports are not connected to any guard and are always executed.
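As a rough illustration of guard semantics, the sketch below models a single guarded data transport. The register and guard names (`r1`, `g0`) and the function `execute_move` are hypothetical; real implementations wire guards to specific buses rather than passing them as arguments.

```python
def execute_move(regs, guards, src, dst, guard=None):
    """Copy regs[src] to regs[dst] unless the move's guard register is false (zero).

    Returns True if the transport took place and False if it was squashed.
    """
    if guard is not None and not guards[guard]:
        return False          # guard is zero: destination is left untouched
    regs[dst] = regs[src]
    return True


regs = {"r1": 10, "r2": 20, "r3": 0}
guards = {"g0": 0}

squashed = execute_move(regs, guards, "r1", "r3", guard="g0")
after_squash = regs["r3"]     # still 0, because g0 is false

guards["g0"] = 1
taken = execute_move(regs, guards, "r1", "r3", guard="g0")
after_taken = regs["r3"]      # now 10

execute_move(regs, guards, "r2", "r3")  # unguarded: always executed
```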

Branches

All processors, including TTA processors, include control flow instructions that alter the program counter, which are used to implement subroutines, if-then-else, for-loop, etc. The assembly language for TTA processors typically includes control flow instructions such as unconditional branches (JUMP), conditional relative branches (BNZ), subroutine call (CALL), conditional return (RETNZ), etc. that look the same as the corresponding assembly language instructions for other processors.

Like all other operations on a TTA machine, these instructions are implemented as "move" instructions to a special function unit.

TTA implementations that support conditional execution, such as the sTTAck and the first MOVE prototype, can implement most of these control flow instructions as a conditional move to the program counter.[3][4]
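A conditional branch expressed as a guarded move to the program counter can be sketched with a toy interpreter. Everything here is an assumption made for illustration: one move per instruction, a hypothetical decrement unit with ports `dec.trigger` and `dec.result`, zero operation latency, and a general register used directly as the guard. It does not model sTTAck or the MOVE prototype.

```python
def run(program, regs, max_steps=100):
    """Run a list of (source, destination, guard) moves.

    An integer source is an immediate; a string source names a register.
    """
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        src, dst, guard = program[pc]
        pc += 1
        steps += 1
        if guard is not None and regs[guard] == 0:
            continue                        # guard false: move is squashed
        value = regs[src] if isinstance(src, str) else src
        if dst == "pc":
            pc = value                      # a move into the program counter is a jump
        elif dst == "dec.trigger":
            regs["dec.result"] = value - 1  # trigger write starts the decrement
        else:
            regs[dst] = value
    return regs


# Count r1 down to zero: the last move jumps back to address 0 while r1 != 0.
countdown = [
    ("r1", "dec.trigger", None),
    ("dec.result", "r1", None),
    (0, "pc", "r1"),
]

regs = run(countdown, {"r1": 3, "dec.result": 0})
print(regs["r1"])  # 0
```

When r1 reaches zero the guarded move to `pc` is squashed, so execution falls through and the loop ends.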

TTA implementations that only support unconditional data transports, such as the Maxim Integrated MAXQ,[5] typically have a special function unit tightly connected to the program counter that responds to a variety of destination addresses. Each such address, when used as the destination of a "move", has a different effect on the program counter: each "relative branch <condition>" instruction has a different destination address for each condition, and other destination addresses are used for CALL, RETNZ, etc.

Programming

In more traditional processor architectures, a processor is usually programmed by defining the executed operations and their operands. For example, an addition instruction in a RISC architecture could look like the following.

add r3, r1, r2

This example operation adds the values of general-purpose registers r1 and r2 and stores the result in register r3. Roughly speaking, executing the instruction means translating it into control signals that control the interconnection network and the function units. The interconnection network is used to transfer the current values of registers r1 and r2 to the function unit that is capable of executing the add operation, often called the ALU (arithmetic logic unit). Finally, a control signal selects and triggers the addition operation in the ALU, whose result is transferred back to register r3.

TTA programs do not define the operations, but only the data transports needed to write and read the operand values. The operation itself is triggered by writing data to a triggering operand of the operation. Thus, an operation is executed as a side effect of the triggering data transport. Executing an addition operation in a TTA therefore requires three data transport definitions, also called moves. A move defines the endpoints of a data transport taking place on a transport bus. For instance, a move can state that a data transport from function unit F, port 1, to register file R, register index 2, should take place on bus B1. If there are multiple buses in the target processor, each bus can be used in parallel in the same clock cycle. Thus, it is possible to exploit data transport level parallelism by scheduling several data transports in the same instruction.

An addition operation can be executed in a TTA processor as follows:

r1 -> ALU.operand1
r2 -> ALU.add.trigger
ALU.result -> r3

The second move, a write to the second operand of the function unit called ALU, triggers the addition operation. This makes the result of addition available in the output port 'result' after the execution latency of the 'add'.
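The three-move addition can be mimicked in software to show that the computation really is a side effect of the trigger write. The sketch below is a simplified model, not an implementation of any real TTA: the class `Alu` and function `run_moves` are hypothetical, moves execute one at a time, and the add is assumed to complete immediately (latency is ignored).

```python
class Alu:
    """Toy function unit: one operand port, one 'add' trigger port, one result port."""

    def __init__(self):
        self.operand1 = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand1":
            self.operand1 = value                # plain operand write: no computation
        elif port == "add.trigger":
            self.result = self.operand1 + value  # trigger write starts the add
        else:
            raise KeyError(port)


def run_moves(moves, regs, alu):
    """Execute (source, destination) moves in order and return the registers."""
    for src, dst in moves:
        value = alu.result if src == "ALU.result" else regs[src]
        if dst.startswith("ALU."):
            alu.write(dst[len("ALU."):], value)
        else:
            regs[dst] = value
    return regs


regs = {"r1": 5, "r2": 7, "r3": 0}
program = [
    ("r1", "ALU.operand1"),
    ("r2", "ALU.add.trigger"),
    ("ALU.result", "r3"),
]
run_moves(program, regs, Alu())
print(regs["r3"])  # 12
```

Note that the program never names an "add" instruction; the only reference to the operation is the destination port of the second move.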

The ports associated with the ALU may act as an accumulator, allowing creation of macro instructions that abstract away the underlying TTA:

lda r1    ; "load ALU": move value to ALU operand 1
add r2    ; add: move value to add trigger
sta r3    ; "store ALU": move value from ALU result

Programmer visible operation latency

The leading philosophy of TTAs is to move complexity from hardware to software. As a result, the programmer has to deal with several additional hazards. One of them is delay slots, that is, the programmer-visible operation latency of the function units. Timing is completely the responsibility of the programmer, who has to schedule the instructions such that a result is read neither too early nor too late. There is no hardware interlock to stall the processor if a result is read too early. Consider, for example, an architecture that has an operation add with a latency of 1 and an operation mul with a latency of 3. After triggering the add operation, the result can be read in the next instruction (the next clock cycle). After triggering mul, one has to wait for two instructions before the result can be read; the result is ready for the third instruction after the triggering instruction.

Reading a result too early returns the result of a previously triggered operation or, if no operation was previously triggered in the function unit, an undefined value. On the other hand, the result must be read early enough to make sure that the result of the next operation does not overwrite the still unread result in the output port.

The programmer-visible context of a TTA processor is large: in addition to the register file contents, it practically includes function unit pipeline registers and/or function unit input and output ports. The context saves required for external interrupt support can therefore become complex and expensive to implement. For this reason, interrupts are usually not supported by TTA processors; their task is instead delegated to external hardware (e.g., an I/O processor), or the need for them is avoided by using an alternative synchronization or communication mechanism such as polling.
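The timing rules above can be made concrete with a small cycle-level model. This is a sketch under stated assumptions: the class `PipelinedUnit` is hypothetical, an operation with latency n triggered in cycle t becomes visible on the result port in cycle t + n, and `None` stands in for an undefined value.

```python
class PipelinedUnit:
    """Function unit whose result port is updated only after the operation latency."""

    def __init__(self):
        self.cycle = 0
        self.result = None       # None models an undefined result port
        self.in_flight = []      # list of (ready_cycle, value)

    def trigger(self, value, latency):
        """Start an operation whose (precomputed) value appears after `latency` cycles."""
        self.in_flight.append((self.cycle + latency, value))

    def tick(self):
        """Advance one clock cycle and retire operations that have finished."""
        self.cycle += 1
        still_running = []
        for ready_cycle, value in self.in_flight:
            if ready_cycle <= self.cycle:
                self.result = value          # overwrites any older, unread result
            else:
                still_running.append((ready_cycle, value))
        self.in_flight = still_running


fu = PipelinedUnit()
undefined = fu.result            # nothing triggered yet

fu.trigger(2 + 3, latency=1)     # add, triggered in cycle 0
fu.tick()                        # cycle 1
add_result = fu.result           # 5: ready in the next instruction

fu.trigger(4 * 6, latency=3)     # mul, triggered in cycle 1
fu.tick()                        # cycle 2
too_early = fu.result            # still 5: the stale add result
fu.tick()                        # cycle 3
fu.tick()                        # cycle 4
on_time = fu.result              # 24: third instruction after the trigger
```

The read in cycle 2 silently returns the old add result, which is exactly the kind of error that hardware does not detect and the programmer must prevent.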

Implementations

  • MAXQ[6][5][7] from Maxim Integrated, the only commercially available microcontroller built upon transport triggered architecture, is an OISC or "one-instruction set computer". It offers a single though flexible MOVE instruction, which can then function as various virtual instructions by moving values directly to the program counter.
  • The "move project" has designed and fabricated several experimental TTA microprocessors.
  • OpenASIP is an open source application-specific instruction set toolset utilizing TTA as the processor template.
  • The architecture of the Amiga Copper has all the basic features of a transport triggered architecture.
  • The Able processor developed by New England Digital.
  • The WireWorld based computer.
  • Dr. Dobb's published One-Der, a 32-bit TTA in Verilog with a matching cross assembler and Forth compiler.[8][9]

References

  1. ^ V. Guzma, P. Jääskeläinen, P. Kellomäki, and J. Takala, “Impact of Software Bypassing on Instruction Level Parallelism and Register File Traffic”
  2. ^ Johan Janssen. "Compiler Strategies for Transport Triggered Architectures". 2001. p. 168.
  3. ^ Henk Corporaal. "Transport Triggered Architectures examined for general purpose applications". p. 6.
  4. ^ Aliaksei V. Chapyzhenka. "sTTAck: Stack Transport Triggered Architecture".
  5. ^ a b "MAXQ Family User's Guide". Maxim Integrated. Section "1.1 Instruction Set". A register-based, transport-triggered architecture allows all instructions to be coded as simple transfer operations. All instructions reduce to either writing an immediate value to a destination register or memory location or moving data between registers and/or memory locations.
  6. ^ Catsoulis, John (2005), Designing embedded hardware (2 ed.), O'Reilly Media, pp. 327–333, ISBN 978-0-596-00755-3
  7. ^ "Introduction to the MAXQ Architecture". Maxim Integrated. Centralized access to resources (includes transfer map diagram).
  8. ^ Dr. Dobb's article with 32-bit FPGA CPU in Verilog
  9. ^ Web site with more details on the Dr. Dobb's CPU Archived 2013-02-18 at archive.today
