Data parallelism

Sequential vs. data-parallel job execution

Data parallelism is parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like arrays and matrices by working on each element in parallel. It contrasts to task parallelism as another form of parallelism.

A data parallel job on an array of n elements can be divided equally among all the processors. Let us assume we want to sum all the elements of the given array and the time for a single addition operation is Ta time units. In the case of sequential execution, the time taken by the process will be n×Ta time units as it sums up all the elements of an array. On the other hand, if we execute this job as a data parallel job on 4 processors the time taken would reduce to (n/4)×Ta + merging overhead time units. Parallel execution results in a speedup of 4 over sequential execution. One important thing to note is that the locality of data references plays an important part in evaluating the performance of a data parallel programming model. Locality of data depends on the memory accesses performed by the program as well as the size of the cache.

History

Exploitation of the concept of data parallelism started in 1960s with the development of the Solomon machine.[1] The Solomon machine, also called a vector processor, was developed to expedite the performance of mathematical operations by working on a large data array (operating on multiple data in consecutive time steps). Concurrency of data operations was also exploited by operating on multiple data at the same time using a single instruction. These processors were called 'array processors'.[2] In the 1980s, the term was introduced [3] to describe this programming style, which was widely used to program Connection Machines in data parallel languages like C*. Today, data parallelism is best exemplified in graphics processing units (GPUs), which use both the techniques of operating on multiple data in space and time using a single instruction.

Most data parallel hardware supports only a fixed number of parallel levels, often only one. This means that within a parallel operation it is not possible to launch more parallel operations recursively, and means that programmers cannot make use of nested hardware parallelism. The programming language NESL was an early effort at implementing a nested data-parallel programming model on flat parallel machines, and in particular introduced the flattening transformation that transforms nested data parallelism to flat data parallelism. This work was continued by other languages such as Data Parallel Haskell and Futhark, although arbitrary nested data parallelism is not widely available in current data-parallel programming languages.

Description

In a multiprocessor system executing a single set of instructions (SIMD), data parallelism is achieved when each processor performs the same task on different distributed data. In some situations, a single execution thread controls operations on all the data. In others, different threads control the operation, but they execute the same code.

For instance, consider matrix multiplication and addition in a sequential manner as discussed in the example.

Example

Below is the sequential pseudo-code for multiplication and addition of two matrices where the result is stored in the matrix C. The pseudo-code for multiplication calculates the dot product of two matrices A, B and stores the result into the output matrix C.

If the following programs were executed sequentially, the time taken to calculate the result would be of the (assuming row lengths and column lengths of both matrices are n) and for multiplication and addition respectively.

// Matrix multiplication
for (i = 0; i < row_length_A; i++)
{		
    for (k = 0; k < column_length_B; k++)
    {
        sum = 0;
        for (j = 0; j < column_length_A; j++)
        {
            sum += A[i][j] * B[j][k];
        }
        C[i][k] = sum;
    }
}
// Array addition
for (i = 0; i < n; i++) {
    c[i] = a[i] + b[i];
}

We can exploit data parallelism in the preceding code to execute it faster as the arithmetic is loop independent. Parallelization of the matrix multiplication code is achieved by using OpenMP. An OpenMP directive, "omp parallel for" instructs the compiler to execute the code in the for loop in parallel. For multiplication, we can divide matrix A and B into blocks along rows and columns respectively. This allows us to calculate every element in matrix C individually thereby making the task parallel. For example: A[m x n] dot B [n x k] can be finished in instead of when executed in parallel using m*k processors.

Data parallelism in matrix multiplication
// Matrix multiplication in parallel
#pragma omp parallel for schedule(dynamic,1) collapse(2)
for (i = 0; i < row_length_A; i++){
    for (k = 0; k < column_length_B; k++){
        sum = 0;
        for (j = 0; j < column_length_A; j++){
            sum += A[i][j] * B[j][k];
        }
        C[i][k] = sum;
    }
}

It can be observed from the example that a lot of processors will be required as the matrix sizes keep on increasing. Keeping the execution time low is the priority but as the matrix size increases, we are faced with other constraints like complexity of such a system and its associated costs. Therefore, constraining the number of processors in the system, we can still apply the same principle and divide the data into bigger chunks to calculate the product of two matrices.[4]

For addition of arrays in a data parallel implementation, let's assume a more modest system with two central processing units (CPU) A and B, CPU A could add all elements from the top half of the arrays, while CPU B could add all elements from the bottom half of the arrays. Since the two processors work in parallel, the job of performing array addition would take one half the time of performing the same operation in serial using one CPU alone.

The program expressed in pseudocode below—which applies some arbitrary operation, foo, on every element in the array d—illustrates data parallelism:[nb 1]

if CPU = "a" then
    lower_limit := 1
    upper_limit := round(d.length / 2)
else if CPU = "b" then
    lower_limit := round(d.length / 2) + 1
    upper_limit := d.length

for i from lower_limit to upper_limit by 1 do
    foo(d[i])

In an SPMD system executed on 2 processor system, both CPUs will execute the code.

Data parallelism emphasizes the distributed (parallel) nature of the data, as opposed to the processing (task parallelism). Most real programs fall somewhere on a continuum between task parallelism and data parallelism.

Steps to parallelization

The process of parallelizing a sequential program can be broken down into four discrete steps.[5]

Type Description
Decomposition The program is broken down into tasks, the smallest exploitable unit of concurrence.
Assignment Tasks are assigned to processes.
Orchestration Data access, communication, and synchronization of processes.
Mapping Processes are bound to processors.

Data parallelism vs. task parallelism

Data parallelism Task parallelism
Same operations are performed on different subsets of same data. Different operations are performed on the same or different data.
Synchronous computation Asynchronous computation
Speedup is more as there is only one execution thread operating on all sets of data. Speedup is less as each processor will execute a different thread or process on the same or different set of data.
Amount of parallelization is proportional to the input data size. Amount of parallelization is proportional to the number of independent tasks to be performed.
Designed for optimum load balance on multi processor system. Load balancing depends on the availability of the hardware and scheduling algorithms like static and dynamic scheduling.

Data parallelism vs. model parallelism

Data parallelism Model parallelism
Same model is used for every thread but the data given to each of them is divided and shared. Same data is used for every thread, and model is split among threads.
It is fast for small networks but very slow for large networks since large amounts of data needs to be transferred between processors all at once. It is slow for small networks and fast for large networks.
Data parallelism is ideally used in array and matrix computations and convolutional neural networks Model parallelism finds its applications in deep learning

[6]

Mixed data and task parallelism

Data and task parallelism, can be simultaneously implemented by combining them together for the same application. This is called Mixed data and task parallelism. Mixed parallelism requires sophisticated scheduling algorithms and software support. It is the best kind of parallelism when communication is slow and number of processors is large.[7]

Mixed data and task parallelism has many applications. It is particularly used in the following applications:

  1. Mixed data and task parallelism finds applications in the global climate modeling. Large data parallel computations are performed by creating grids of data representing Earth's atmosphere and oceans and task parallelism is employed for simulating the function and model of the physical processes.
  2. In timing based circuit simulation. The data is divided among different sub-circuits and parallelism is achieved with orchestration from the tasks.

Data parallel programming environments

A variety of data parallel programming environments are available today, most widely used of which are:

  1. Message Passing Interface: It is a cross-platform message passing programming interface for parallel computers. It defines the semantics of library functions to allow users to write portable message passing programs in C, C++ and Fortran.
  2. Open Multi Processing[8] (Open MP): It's an Application Programming Interface (API) which supports shared memory programming models on multiple platforms of multiprocessor systems. Since version 4.5, OpenMP is also able to target devices other than typical CPUs. It can program FPGAs, DSPs, GPUs and more. It is not confined to GPUs like OpenACC.
  3. CUDA and OpenACC: CUDA and OpenACC (respectively) are parallel computing API platforms designed to allow a software engineer to utilize GPU's computational units for general purpose processing.
  4. Threading Building Blocks and RaftLib: Both open source programming environments that enable mixed data/task parallelism in C/C++ environments across heterogeneous resources.

Applications

Data parallelism finds its applications in a variety of fields ranging from physics, chemistry, biology, material sciences to signal processing. Sciences imply data parallelism for simulating models like molecular dynamics,[9] sequence analysis of genome data [10] and other physical phenomenon. Driving forces in signal processing for data parallelism are video encoding, image and graphics processing, wireless communications [11] to name a few.

Data-intensive computing

Data-intensive computing is a class of parallel computing applications which use a data parallel approach to process large volumes of data typically terabytes or petabytes in size and typically referred to as big data. Computing applications that devote most of their execution time to computational requirements are deemed compute-intensive, whereas applications are deemed data-intensive if they require large volumes of data and devote most of their processing time to input/output and manipulation of data.[12]

See also

Notes

  1. ^ Some input data (e.g. when d.length evaluates to 1 and round rounds towards zero [this is just an example, there are no requirements on what type of rounding is used]) will lead to lower_limit being greater than upper_limit, it's assumed that the loop will exit immediately (i.e. zero iterations will occur) when this happens.

References

  1. ^ "The Solomon Computer".
  2. ^ "SIMD/Vector/GPU" (PDF). Retrieved 2016-09-07.
  3. ^ Hillis, W. Daniel and Steele, Guy L., Data Parallel Algorithms Communications of the ACMDecember 1986
  4. ^ Barney, Blaise. "Introduction to Parallel Computing". computing.llnl.gov. Archived from the original on 2013-06-10. Retrieved 2016-09-07.
  5. ^ Solihin, Yan (2016). Fundamentals of Parallel Architecture. Boca Raton, FL: CRC Press. ISBN 978-1-4822-1118-4.
  6. ^ "How to Parallelize Deep Learning on GPUs Part 2/2: Model Parallelism". Tim Dettmers. 2014-11-09. Retrieved 2016-09-13.
  7. ^ "The Netlib" (PDF).
  8. ^ "OpenMP.org". openmp.org. Archived from the original on 2016-09-05. Retrieved 2016-09-07.
  9. ^ Boyer, L. L; Pawley, G. S (1988-10-01). "Molecular dynamics of clusters of particles interacting with pairwise forces using a massively parallel computer". Journal of Computational Physics. 78 (2): 405–423. Bibcode:1988JCoPh..78..405B. doi:10.1016/0021-9991(88)90057-5.
  10. ^ Yap, T.K.; Frieder, O.; Martino, R.L. (1998). "Parallel computation in biological sequence analysis". IEEE Transactions on Parallel and Distributed Systems. 9 (3): 283–294. CiteSeerX 10.1.1.30.2819. doi:10.1109/71.674320.
  11. ^ Singh, H.; Lee, Ming-Hau; Lu, Guangming; Kurdahi, F.J.; Bagherzadeh, N.; Filho, E.M. Chaves (2000-06-01). "MorphoSys: an integrated reconfigurable system for data-parallel and computation-intensive applications". IEEE Transactions on Computers. 49 (5): 465–481. doi:10.1109/12.859540. ISSN 0018-9340.
  12. ^ Handbook of Cloud Computing, "Data-Intensive Technologies for Cloud Computing," by A.M. Middleton. Handbook of Cloud Computing. Springer, 2010.

Read other articles:

Fredric JamesonLahir14 April 1934 (umur 89)Cleveland, Ohio, Amerika SerikatAlmamaterHaverford CollegeUniversitas YaleEraFilsafat abad ke-20/21KawasanFilsafat BaratAliranMarxisme BaratHermeneutika Marxis[1]Minat utamaPascamodernisme  · modernisme  · fiksi ilmiah  · utopia  · sejarah  · naratif  · kajian budaya  · dialektika  · strukturalismeGagasan pentingPemetaan kognitif  · alego...

 

 

Abdullahi Yusuf Ahmed Presiden Somalia 8Masa jabatan14 Oktober 2004 – 29 Desember 2008Perdana MenteriMuhammad Abdi YusufAli Muhammad GhediSalim Aliyow IbrowNur Hassan Hussein Mohamoud Mohamed Gacmodhere (Tidak diakui) PendahuluAbdiqasim Salad HassanPenggantiAdan Mohamed Nuur Madobe (Pjs) Informasi pribadiLahir15 Desember 1934 (umur 89)Galkacyo, Mudug, SomaliaKebangsaan SomaliaPartai politikTFGSunting kotak info • L • B Abdullahi Yusuf Ahmed (bahasa Somal...

 

 

Once Upon a Small TownPoster promosiHangul어쩌다 전원일기 Alih Aksara yang DisempurnakanEojjeoda Jeon-won-ilgiArtiAccidental Country Diary GenreKomedi romantisBerdasarkanAccidental Country Diaryoleh Park Ha-minPengembangKakao EntertainmentDitulis olehBaek Eun-kyeongSutradaraKwon Seok-jangPemeranPark Soo-youngChoo Young-wooBaek Seong-cheolJung Suk-yongMusikMoon Seong-namNegara asalKorea SelatanBahasa asliKoreaJmlh. episode12ProduksiProduser eksekutifKim Min-jiProduserChoi Se-jeongChoi ...

The United States Hockey League began in 1961 as a semi-professional ice hockey league.[1] Starting with the 1979–80 season, the league became a strictly Amateur league, and began awarding its champion the Clark Cup Trophy.[2] All champions of the USHL are highlighted in this page. Clark Cup The Clark Cup is the current trophy awarded annually to the winner of the United States Hockey League Tier 1 Junior Hockey playoff champions. The Clark Cup was named in honor of Don Cla...

 

 

LabuanKecamatanNegara IndonesiaProvinsiBantenKabupatenPandeglangPopulasi • Total56,947 jiwa (2.017)[1] jiwaKode Kemendagri36.01.12 Kode BPS3601120 Luas15,66 km²[2]Desa/kelurahan9[3] Labuan adalah nama salah satu kecamatan di Kabupaten Pandeglang, Provinsi Banten, Indonesia. Geografi Pelabuhan Labuan (foto diambil tahun 2011) Pantai di Labuan Perbatasan Utara Kecamatan Carita Timur Kecamatan Jiput dan Cikedal Selatan Kecamatan Pagelaran Barat Samudra H...

 

 

لجنة اليونسكوب اختصارا لـ UNISCOP، أنشأت في 15 مايو من العام 1947 لدراسة المسألة الفلسطينية وطرح مقترحات لحل مشكلة فلسطين بعد انسحاب بريطانيا من فلسطين بناء على قرار الجمعية العامة للأمم المتحدة الذي حمل رقم (106) و كانت تضم يوغوسلافيا والبيرو وإيران والهند وكندا والنمسا وغواتيم�...

Air Aruba IATA ICAO Kode panggil FQ ARU ARUBA Didirikan1986Berhenti beroperasi23 Oktober 2000PenghubungBandar Udara Internasional Ratu BeatrixAliansiAserca AirlinesArmada26Tujuan27Kantor pusatOranjestad, ArubaTokoh utama Tawa Irausquin (CEO) Peter Look Hong (CEO) Henri Coffie (CEO) Situs webinterknowledge.com/air-aruba/ Air Aruba adalah maskapai penerbangan utama dari pulau Aruba. Didirikan pada tahun 1986 dan menyatakan kebangkrutan pada tahun 2000.[butuh rujukan] Berkantor pusat di ...

 

 

Eamonn McCrystalMcCrystal in Belfast, 2016Background informationBorn (1987-06-01) 1 June 1987 (age 36)Cookstown, Northern IrelandGenresPop, easy listening, vocal, traditional vocal popOccupation(s)Singer, Actor, TV host and ProducerInstrument(s)Vocals, piano, fluteYears active2000–presentLabelsHedgehog RecordsWebsiteeamonn.netMusical artist Eamonn McCrystal (born 1 June 1987)[1] is a multi-Emmy Award winning[2][3] Northern Irish pop tenor, TV host[4] an...

 

 

Japanese novel series You can help expand this article with text translated from the corresponding article in Japanese. (March 2009) Click [show] for important translation instructions. View a machine-translated version of the Japanese article. Machine translation, like DeepL or Google Translate, is a useful starting point for translations, but translators must revise errors as necessary and confirm that the translation is accurate, rather than simply copy-pasting machine-translated text...

Louis Brière de l'IsleJenderal Briere de l'Isle, dengan plakat Perwira Agung Légion d'honneur di dadanya.Lahir24 Juni 1827MartinikMeninggal19 Juni 1896(1896-06-19) (umur 68)Saint-Leu–Taverny, PrancisPengabdian FranceDinas/cabangAngkatan Darat PrancisLama dinas1847–1893PangkatGénéral de divisionKomandan1er Régiment d'Infanterie de MarineKorps Ekspedisi TonkinPerang/pertempuranPerang Prancis-PrusiaPerang Tiongkok-PrancisPenghargaanCitation in L'Ordre de L'Armee (1861) Perwira...

 

 

Branko Ilić Ilić in 2015Informasi pribadiNama lengkap Branko Ilić[1]Tanggal lahir 6 Februari 1983 (umur 41)[2]Tempat lahir Ljubljana, SFR Yugoslavia[2]Tinggi 1,85 mPosisi bermain Pemain bertahanKarier junior Grosuplje0000–2002 OlimpijaKarier senior*Tahun Tim Tampil (Gol)2001–2004 Olimpija 55 (0)2002 → Grosuplje (pinjaman) 13 (2)2005–2007 Domžale 63 (2)2007 → Betis (pinjaman) 13 (0)2007–2010 Betis 21 (0)2009 → FC Moscow (pinjaman) 6 (0)2010–201...

 

 

بوشكان  - قرية -    تقسيم إداري البلد  إيران[1] عاصمة لـ ناحية بوشكان  المحافظة بوشهر المقاطعة مقاطعة دشتستان الناحية ناحية بوشكان القسم الريفي قسم بوشكان الريفي (مقاطعة دشتستان) خصائص جغرافية إحداثيات 28°49′45″N 51°41′53″E / 28.82917°N 51.69806°E / 28.82917...

Botanical garden on the grounds of the United States Capitol in Washington, D.C., USA United States Botanic GardenU.S. Botanical Garden in 2017TypeBotanical GardenLocationWashington DCCoordinates38°53′17″N 77°00′47″W / 38.888°N 77.013°W / 38.888; -77.013Created1820Operated byUS CongressPublic transit accessFederal Center SW station The United States Botanic Garden (USBG) is a botanical garden on the grounds of the United States Capitol in Washington, D...

 

 

La lemniscate de Bernoulli. La lemniscate de Bernoulli est une courbe plane unicursale. Elle porte le nom du mathématicien et physicien suisse Jacques Bernoulli. Histoire La lemniscate de Bernoulli fait partie d'une famille de courbes décrite par Jean-Dominique Cassini en 1680, les ovales de Cassini. Jacques Bernoulli la redécouvre en 1694 au détour de travaux sur l'ellipse[1], et la baptise lemniscus (« ruban » en latin). Le problème de la longueur des arcs de la lemnisc...

 

 

Questa voce o sezione sull'argomento siti archeologici d'Italia non cita le fonti necessarie o quelle presenti sono insufficienti. Commento: mancano citazioni puntuali, il lungo testo di Blanc che viene riportato è da verificare per eventuale copyviol Puoi migliorare questa voce aggiungendo citazioni da fonti attendibili secondo le linee guida sull'uso delle fonti. Grotta GuattariL'ingresso della grotta.UtilizzoGrotta Epoca100-80.000 anni fa (livello 7)75.000 anni fa (livello 5)55.000 ...

American football player, coach, and commentator (born 1939) American football player Mike DitkaDitka in 2008No. 89, 98Position:Tight endPersonal informationBorn: (1939-10-18) October 18, 1939 (age 84)Carnegie, Pennsylvania, U.S.Height:6 ft 3 in (1.91 m)Weight:228 lb (103 kg)Career informationHigh school:Aliquippa (Aliquippa, Pennsylvania)College:Pittsburgh (1958–1960)NFL draft:1961 / Round: 1 / Pick: 5AFL draft:1961 / Round: 1...

 

 

Lake in Laguna, Philippines Lake SampalocLake SampalocLocation in the PhilippinesLocationLagunaGroupSeven Lakes of San PabloCoordinates14°04′44″N 121°19′48″E / 14.079°N 121.33°E / 14.079; 121.33TypemaarBasin countriesPhilippinesMax. width1.2 kilometres (0.75 mi)Surface area104 hectares (260 acres)Average depth10 metres (33 ft)Max. depth27 metres (89 ft)SettlementsSan Pablo City Panoramic view of Sampalok Lake Lake Sampaloc is a volcanic ...

 

 

Benjamin Nivet Informasi pribadiTanggal lahir 2 Januari 1977 (umur 47)Tempat lahir Chartres, PrancisTinggi 1,75 m (5 ft 9 in)Posisi bermain GelandangInformasi klubKlub saat ini TroyesNomor 10Karier senior*Tahun Tim Tampil (Gol)1992–1999 Auxerre 13 (0)1999–2000 → Châteauroux (pinjaman) 0 (0)2000–2002 Châteauroux 21 (4)2002–2007 Troyes 140 (28)2007–2012 Caen 167 (24)2012– Troyes 8 (2) * Penampilan dan gol di klub senior hanya dihitung dari liga domestik ...

French sociologist, anthropologist, and philosopher (1930–2002) Pierre BourdieuBourdieu in 1996Born1 August 1930Denguin, FranceDied23 January 2002(2002-01-23) (aged 71)Paris, FranceAlma materÉcole normale supérieure, University of Paris[3]SchoolStructuralism · Genetic structuralism[1] · Critical sociology[2]InstitutionsÉcole pratique des hautes études (before 1975) École des hautes études en sciences sociales (after 1975) C...

 

 

Euclidean geometry without distance and angles In affine geometry, one uses Playfair's axiom to find the line through C1 and parallel to B1B2, and to find the line through B2 and parallel to B1C1: their intersection C2 is the result of the indicated translation. GeometryProjecting a sphere to a plane OutlineHistory (Timeline) Branches Euclidean Non-Euclidean Elliptic Spherical Hyperbolic Non-Archimedean geometry Projective Affine Synthetic Analytic Algebraic Arithmetic Diophantine Differentia...