Granularity (parallel computing)

In parallel computing, granularity (or grain size) of a task is a measure of the amount of work (or computation) which is performed by that task.[1]

Another definition of granularity takes into account the communication overhead between multiple processors or processing elements. It defines granularity as the ratio of computation time to communication time, wherein computation time is the time required to perform the computation of a task and communication time is the time required to exchange data between processors.[2]

If Tcomp is the computation time and Tcomm is the communication time, then the granularity G of a task can be calculated as:[2]

G = Tcomp / Tcomm

Granularity is usually measured as the number of instructions executed in a particular task.[1] Alternatively, granularity can be specified in terms of the execution time of a program, combining the computation time and communication time.[1]
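As a minimal sketch of the ratio-based definition, the following Python function computes G from a computation time and a communication time. The function name `granularity` and the sample timings are illustrative assumptions, not values taken from the cited sources.

```python
def granularity(t_comp: float, t_comm: float) -> float:
    """Return G = t_comp / t_comm, the ratio-based granularity of a task."""
    if t_comm <= 0:
        raise ValueError("communication time must be positive")
    return t_comp / t_comm


# Hypothetical timings, both in the same unit (e.g. milliseconds).
fine = granularity(t_comp=2.0, t_comm=4.0)      # communication dominates
coarse = granularity(t_comp=400.0, t_comm=4.0)  # computation dominates
print(fine, coarse)
```

Under this definition, a larger G corresponds to a coarser grain, since more computation is done per unit of communication.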

Types of parallelism

Depending on the amount of work which is performed by a parallel task, parallelism can be classified into three categories: fine-grained, medium-grained and coarse-grained parallelism.

Fine-grained parallelism

In fine-grained parallelism, a program is broken down into a large number of small tasks. These tasks are assigned individually to many processors. The amount of work associated with a parallel task is low and the work is evenly distributed among the processors. Hence, fine-grained parallelism facilitates load balancing.[3]

As each task processes less data, the number of processors required to perform the complete processing is high. This, in turn, increases the communication and synchronization overhead.

Fine-grained parallelism is best exploited in architectures that support fast communication. Shared-memory architecture, which has a low communication overhead, is most suitable for fine-grained parallelism.

It is difficult for programmers to detect parallelism in a program; therefore, it is usually the compiler's responsibility to detect fine-grained parallelism.[1]

An example of a fine-grained system (from outside the parallel computing domain) is the system of neurons in our brain.[4]

Connection Machine (CM-2) and J-Machine are examples of fine-grain parallel computers that have grain size in the range of 4-5 μs.[1]

Coarse-grained parallelism

In coarse-grained parallelism, a program is split into large tasks, so each processor performs a large amount of computation. This can result in load imbalance, wherein certain processors handle the bulk of the data while others sit idle. Further, coarse-grained parallelism fails to exploit much of the parallelism in the program, as most of the computation is performed sequentially on a processor. The advantage of this type of parallelism is low communication and synchronization overhead.

Message-passing architecture takes a long time to communicate data among processes, which makes it suitable for coarse-grained parallelism.[1]

Cray Y-MP is an example of a coarse-grained parallel computer, with a grain size of about 20 s.[1]

Medium-grained parallelism

Medium-grained parallelism is defined relative to fine-grained and coarse-grained parallelism. It is a compromise between the two, with task size and communication time greater than in fine-grained parallelism and lower than in coarse-grained parallelism. Most general-purpose parallel computers fall in this category.[4]

Intel iPSC is an example of a medium-grained parallel computer, with a grain size of about 10 ms.[1]
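The three categories differ only in how much work is bundled into each task. The sketch below illustrates this by splitting the same list of work items with different grain sizes. The helper `split_into_tasks` and the 12-item workload are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def split_into_tasks(items: Sequence[int], grain_size: int) -> List[List[int]]:
    """Split items into consecutive tasks of at most grain_size items each."""
    if grain_size < 1:
        raise ValueError("grain_size must be at least 1")
    return [list(items[i:i + grain_size]) for i in range(0, len(items), grain_size)]


work = list(range(12))                # 12 hypothetical work items
fine = split_into_tasks(work, 1)      # many small tasks
medium = split_into_tasks(work, 3)    # a few mid-sized tasks
coarse = split_into_tasks(work, 12)   # one large task
print(len(fine), len(medium), len(coarse))
```

A smaller grain size yields more tasks to spread across processors, at the price of more task hand-offs; a larger grain size yields fewer hand-offs but fewer opportunities to balance load.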

Example

Consider a stack of 20 images of 10x10 pixels each that need to be processed, assuming that the 100 pixels of an image can be processed independently of one another. Processing 1 pixel takes 1 clock cycle.

Fine-grained parallelism: Each pixel is processed individually by one processor. With 100 processors, one 10x10 image can be processed in a single clock cycle; with 20 processors, it would take 5 clock cycles per image. Each processor can be kept busy for 100% of its available time, but the result of every pixel computation must be communicated and aggregated once the image is done, which can cause considerable overhead (100 communications per image = 2000 total).

Medium-grained parallelism: The images are split into quarters. Each quarter will be processed individually by one processor at a time taking 25 clock cycles (for 5x5 pixels). Assuming there are 20 processors that are responsible for processing the stack of 20 images, 5 images can be processed in parallel with 4 processors working on each image. If 100 processors were available, 80 could process the stack in parallel taking 25 clock cycles while 20 processors sit idle without any work assigned to them. Once the four quarters have been processed, the results must be aggregated (4 communications per image = 80 total).

Coarse-grained parallelism: A full image is processed by a single processor taking 100 clock cycles. In this case only 20 processors can be used at a time, completing the work in 100 clock cycles without any communication.
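The figures in the three cases can be reproduced with a short calculation. The sketch below assumes that tasks run in synchronized rounds and that a task covering a whole image needs no aggregation, matching the coarse-grained case; the function `schedule` and its parameter names are hypothetical.

```python
import math

IMAGES = 20
PIXELS_PER_IMAGE = 100  # 10x10 pixels, 1 clock cycle each


def schedule(tasks_per_image: int, processors: int) -> dict:
    """Cycles and communications for the image-stack example.

    Assumes synchronized rounds of equal-sized tasks, and no aggregation
    when one task covers a whole image.
    """
    cycles_per_task = PIXELS_PER_IMAGE // tasks_per_image
    total_tasks = IMAGES * tasks_per_image
    rounds = math.ceil(total_tasks / processors)
    comms_per_image = tasks_per_image if tasks_per_image > 1 else 0
    return {
        "cycles": rounds * cycles_per_task,
        "communications": IMAGES * comms_per_image,
    }


print("fine  ", schedule(tasks_per_image=100, processors=100))
print("medium", schedule(tasks_per_image=4, processors=100))
print("coarse", schedule(tasks_per_image=1, processors=20))
```

With 20 processors, all three decompositions need 100 clock cycles of computation in this model, so the communication count becomes the deciding factor.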

The decision on which approach is best depends on the workload and the available processing units. The goal should be to maximize parallelization (split the work into enough units to distribute it evenly across most available processors) while minimizing communication overhead (the ratio of time spent on communication to time spent on computation). In this example, if the number of pictures to process is high compared to the number of workers, it does not make sense to break images down into smaller units, since each worker will receive enough load. If the number of pictures is small compared to the number of workers, some workers might sit idle and waste computation time. However, this is only a problem if processing a single image takes a long time. If the processing is very fast, then splitting the work into smaller units might make the total operation slower, since the time lost to communication exceeds the time gained through parallelization.
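To make this trade-off concrete, the sketch below adds a per-message cost to the image example and picks the cheapest decomposition. The cost model (aggregation messages handled one after another, each costing `cycles_per_message` clock cycles) is an assumption made for illustration, as are the function names.

```python
import math

IMAGES = 20
PIXELS_PER_IMAGE = 100  # 1 clock cycle per pixel


def total_cycles(tasks_per_image: int, processors: int, cycles_per_message: float) -> float:
    """Estimated total cycles: compute rounds plus serialized aggregation."""
    cycles_per_task = PIXELS_PER_IMAGE // tasks_per_image
    rounds = math.ceil(IMAGES * tasks_per_image / processors)
    # Assumption: a task covering a whole image sends no aggregation message.
    messages = IMAGES * tasks_per_image if tasks_per_image > 1 else 0
    return rounds * cycles_per_task + messages * cycles_per_message


def best_grain(processors: int, cycles_per_message: float) -> str:
    """Return the label of the cheapest of the three decompositions."""
    options = {"fine": 100, "medium": 4, "coarse": 1}
    return min(
        options,
        key=lambda name: total_cycles(options[name], processors, cycles_per_message),
    )


for cost in (0, 0.5, 2):
    print(cost, best_grain(processors=100, cycles_per_message=cost))
```

In this simplified model with 100 processors, free communication favors the fine-grained split, a moderate message cost favors the medium-grained split, and an expensive message cost favors the coarse-grained split.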

Levels of parallelism

Granularity is closely tied to the level of processing. A program can be broken down into four levels of parallelism:

  1. Instruction level
  2. Loop level
  3. Sub-routine level
  4. Program level

The highest amount of parallelism is achieved at instruction level, followed by loop-level parallelism. At instruction and loop level, fine-grained parallelism is achieved. Typical grain size at instruction-level is 20 instructions, while the grain-size at loop-level is 500 instructions.[1]

At the sub-routine (or procedure) level the grain size is typically a few thousand instructions. Medium-grained parallelism is achieved at sub-routine level.[1]

At program-level, parallel execution of programs takes place. Granularity can be in the range of tens of thousands of instructions.[1] Coarse-grained parallelism is used at this level.

The table below shows the relationship between levels of parallelism, grain size and degree of parallelism.

Level               Grain size   Parallelism
Instruction level   Fine         Highest
Loop level          Fine         Moderate
Sub-routine level   Medium       Moderate
Program level       Coarse       Least

Impact of granularity on performance

Granularity affects the performance of parallel computers. Using fine grains or small tasks results in more parallelism and hence increases the speedup. However, synchronization overhead, scheduling strategies and similar factors can negatively impact the performance of fine-grained tasks. Increasing parallelism alone cannot give the best performance.[5]

To reduce the communication overhead, granularity can be increased. Coarse-grained tasks have less communication overhead but often cause load imbalance. Hence, optimal performance is achieved between the two extremes of fine-grained and coarse-grained parallelism.[6]
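The tension between overhead and load imbalance can be illustrated with a toy scheduling model. In the sketch below, work items are grouped into chunks of a given grain, each chunk pays a fixed overhead (standing in for communication and synchronization), and each chunk is given to the currently least-loaded processor. The workload, the overhead value and the function `makespan` are assumptions chosen for illustration; they are not drawn from the cited studies.

```python
import heapq
from typing import List


def makespan(costs: List[int], grain: int, processors: int, overhead: int) -> int:
    """Finish time when items are grouped into chunks of `grain` items.

    Each chunk pays a fixed `overhead` and is assigned, in order, to the
    processor with the smallest accumulated load.
    """
    loads = [0] * processors
    heapq.heapify(loads)
    for start in range(0, len(costs), grain):
        chunk_cost = sum(costs[start:start + grain]) + overhead
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + chunk_cost)
    return max(loads)


costs = [8] + [1] * 15  # hypothetical workload: one expensive item, 15 cheap ones
for grain in (1, 2, 4, 8, 16):
    print(grain, makespan(costs, grain, processors=4, overhead=2))
```

With these assumed numbers, the finest grain is slowed by per-chunk overhead and the coarsest grains leave processors idle, so the shortest finish time occurs at an intermediate grain of 2 items per chunk.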

Various studies[5][7][8] have proposed methods for determining the best granularity for parallel processing. The best grain size depends on a number of factors and varies greatly from problem to problem.

Citations

  1. Hwang, Kai (1992). Advanced Computer Architecture: Parallelism, Scalability, Programmability (1st ed.). McGraw-Hill Higher Education. ISBN 978-0070316225.
  2. Kwiatkowski, Jan (9 September 2001). "Evaluation of Parallel Programs by Measurement of Its Granularity". Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science. Vol. 2328. pp. 145–153. doi:10.1007/3-540-48086-2_16. ISBN 9783540437925. ISBN 9783540480860.
  3. Barney, Blaise. Introduction to Parallel Computing.
  4. Miller, Russ; Stout, Quentin F. (1996). Parallel Algorithms for Regular Architectures: Meshes and Pyramids. Cambridge, Mass.: MIT Press. pp. 5–6. ISBN 9780262132336.
  5. Chen, Ding-Kai; Su, Hong-Men; Yew, Pen-Chung (1 January 1990). "The impact of synchronization and granularity on parallel systems". Proceedings of the 17th annual international symposium on Computer Architecture - ISCA '90. Vol. 18. pp. 239–248. CiteSeerX 10.1.1.51.3389. doi:10.1145/325164.325150. ISBN 0-89791-366-3. S2CID 16193537.
  6. Yeung, Donald; Dally, William J.; Agarwal, Anant. "How to Choose the Grain Size of a Parallel Computer". CiteSeerX 10.1.1.66.3298.
  7. McCreary, Carolyn; Gill, Helen (1 September 1989). "Automatic Determination of Grain Size for Efficient Parallel Processing". Commun. ACM. 32 (9): 1073–1078. doi:10.1145/66451.66454. ISSN 0001-0782. S2CID 14807217.
  8. Kruatrachue, Boontee; Lewis, Ted (1 January 1988). "Grain Size Determination for Parallel Processing". IEEE Softw. 5 (1): 23–32. doi:10.1109/52.1991. ISSN 0740-7459. S2CID 2034255.
