Independent and identically distributed random variables

A chart showing a uniform distribution

In probability theory and statistics, a collection of random variables is independent and identically distributed (i.i.d., iid, or IID) if each random variable has the same probability distribution as the others and all are mutually independent.[1] IID was first defined in statistics and finds application in many fields, such as data mining and signal processing.

Introduction

Statistics commonly deals with random samples. A random sample can be thought of as a set of objects that are chosen randomly. More formally, it is "a sequence of independent, identically distributed (IID) random data points."

In other words, the terms random sample and IID are synonymous. In statistics, "random sample" is the typical terminology, but in probability, it is more common to say "IID."

  • Identically distributed means that there are no overall trends — the distribution does not fluctuate and all items in the sample are taken from the same probability distribution.
  • Independent means that the sample items are all independent events. In other words, they are not connected to each other in any way;[2] knowledge of the value of one variable gives no information about the value of the other and vice versa.

Application

Independent and identically distributed random variables are often used as an assumption, which tends to simplify the underlying mathematics. In practical applications of statistical modeling, however, this assumption may or may not be realistic.[3]

The i.i.d. assumption is also used in the central limit theorem, which states that the probability distribution of the suitably normalized sum (or average) of i.i.d. variables with finite variance approaches a normal distribution as the number of variables grows.[4]
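The central limit theorem can be checked numerically. The following is a minimal sketch, assuming i.i.d. draws from an Exponential(1) distribution (an illustrative choice, as are the sample sizes and the seed); it compares the spread of the sample means with the normal approximation.

```python
import random
import statistics

def sample_means(n_terms, n_trials, seed=0):
    """Return n_trials means, each taken over n_terms i.i.d. Exponential(1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_terms))
        for _ in range(n_trials)
    ]

n_terms = 50
means = sample_means(n_terms, n_trials=20_000)

# Exponential(1) has mean 1 and variance 1, so the CLT predicts that the
# sample mean is approximately Normal(1, 1 / n_terms).
predicted_sd = (1.0 / n_terms) ** 0.5
observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

# Fraction of sample means within one predicted standard deviation of 1.
# For an exactly normal distribution this would be about 0.683.
within_one_sd = sum(abs(m - 1.0) <= predicted_sd for m in means) / len(means)

print(observed_mean, observed_sd, predicted_sd, within_one_sd)
```

Although the exponential distribution is strongly skewed, the averages of 50 draws should already be close to the normal prediction.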

The i.i.d. assumption frequently arises in the context of sequences of random variables. Then, "independent and identically distributed" implies that an element in the sequence is independent of the random variables that came before it. In this way, an i.i.d. sequence is different from a Markov sequence, where the probability distribution for the nth random variable is a function of the previous random variable in the sequence (for a first-order Markov sequence). An i.i.d. sequence does not imply the probabilities for all elements of the sample space or event space must be the same.[5] For example, repeated throws of loaded dice will produce a sequence that is i.i.d., despite the outcomes being biased.
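The contrast between an i.i.d. sequence and a first-order Markov sequence can be sketched in a short simulation. The loading of the die and the "sticky" Markov rule below are hypothetical choices made only for illustration.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]
LOADED_WEIGHTS = [1, 1, 1, 1, 1, 5]  # hypothetical loading: P(6) = 0.5

def iid_sequence(n, rng):
    """n independent throws of the same loaded die."""
    return rng.choices(FACES, weights=LOADED_WEIGHTS, k=n)

def markov_sequence(n, rng, stay=0.8):
    """First-order Markov chain: repeat the previous value with probability
    `stay`, otherwise throw the loaded die afresh."""
    seq = [rng.choices(FACES, weights=LOADED_WEIGHTS)[0]]
    for _ in range(n - 1):
        if rng.random() < stay:
            seq.append(seq[-1])
        else:
            seq.append(rng.choices(FACES, weights=LOADED_WEIGHTS)[0])
    return seq

def p_six_after(seq, previous):
    """Empirical P(next value is 6 | current value is `previous`)."""
    nexts = [b for a, b in zip(seq, seq[1:]) if a == previous]
    return sum(b == 6 for b in nexts) / len(nexts)

iid_seq = iid_sequence(100_000, random.Random(0))
markov_seq = markov_sequence(100_000, random.Random(0))

# i.i.d.: the previous value carries no information (both about 0.5).
print(p_six_after(iid_seq, 6), p_six_after(iid_seq, 1))
# Markov: the previous value matters (about 0.9 versus about 0.1).
print(p_six_after(markov_seq, 6), p_six_after(markov_seq, 1))
```

Both sequences are biased towards 6, but only the first is i.i.d.: bias concerns the common distribution, while independence concerns how successive values relate to each other.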

In signal processing and image processing, the notion of transformation to i.i.d. implies two specifications, the "i.d." part and the "i." part:

i.d. – The signal level must be balanced on the time axis.

i. – The signal spectrum must be flattened, i.e. transformed by filtering (such as deconvolution) to a white noise signal (i.e. a signal where all frequencies are equally present).

Definition

Definition for two random variables

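Suppose that the random variables $X$ and $Y$ are defined to assume values in $I \subseteq \mathbb{R}$. Let $F_X(x) = \operatorname{P}(X \leq x)$ and $F_Y(y) = \operatorname{P}(Y \leq y)$ be the cumulative distribution functions of $X$ and $Y$, respectively, and denote their joint cumulative distribution function by $F_{X,Y}(x,y) = \operatorname{P}(X \leq x \land Y \leq y)$.

Two random variables $X$ and $Y$ are independent if and only if $F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y)$ for all $x, y \in I$. (For the simpler case of events, two events $A$ and $B$ are independent if and only if $\operatorname{P}(A \land B) = \operatorname{P}(A) \cdot \operatorname{P}(B)$; see also Independence (probability theory) § Two random variables.)

Two random variables $X$ and $Y$ are identically distributed if and only if $F_X(x) = F_Y(x)$ for all $x \in I$.[6]

Two random variables $X$ and $Y$ are i.i.d. if they are independent and identically distributed, i.e. if and only if

$$
\begin{aligned}
F_X(x) &= F_Y(x) && \forall x \in I \\
F_{X,Y}(x,y) &= F_X(x) \cdot F_Y(y) && \forall x, y \in I
\end{aligned}
$$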
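For discrete variables, both conditions can be verified exactly by enumeration. The sketch below is an illustration under assumed inputs: the loaded die and the helper names (`cdf`, `joint_cdf`, `marginal`, `is_iid`) are hypothetical and not part of any library.

```python
from fractions import Fraction
from itertools import product

# Hypothetical loaded die: face 6 has probability 1/2, the others 1/10 each.
pmf = {face: Fraction(1, 10) for face in range(1, 6)}
pmf[6] = Fraction(1, 2)

# Joint pmf of two independent throws (X, Y) of the same die.
joint = {(x, y): pmf[x] * pmf[y] for x, y in product(pmf, repeat=2)}

def cdf(p, t):
    """F(t) = P(value <= t) for a pmf given as {value: probability}."""
    return sum(prob for value, prob in p.items() if value <= t)

def joint_cdf(j, s, t):
    """F_{X,Y}(s, t) = P(X <= s and Y <= t)."""
    return sum(prob for (x, y), prob in j.items() if x <= s and y <= t)

def marginal(j, axis):
    """Marginal pmf of X (axis 0) or Y (axis 1)."""
    out = {}
    for pair, prob in j.items():
        out[pair[axis]] = out.get(pair[axis], Fraction(0)) + prob
    return out

def is_iid(j, support):
    """Return (identically distributed?, independent?) for a joint pmf."""
    fx, fy = marginal(j, 0), marginal(j, 1)
    identical = all(cdf(fx, t) == cdf(fy, t) for t in support)
    independent = all(
        joint_cdf(j, s, t) == cdf(fx, s) * cdf(fy, t)
        for s, t in product(support, repeat=2)
    )
    return identical, independent

support = range(1, 7)
print(is_iid(joint, support))   # (True, True)

# Counterexample: Y is a copy of X. Same marginals, but not independent.
copied = {(x, x): pmf[x] for x in pmf}
print(is_iid(copied, support))  # (True, False)
```

Exact rational arithmetic (`Fraction`) is used so that the equalities in the definition can be tested without floating-point tolerance.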

Definition for more than two random variables

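The definition extends naturally to more than two random variables. We say that $n$ random variables $X_1, \ldots, X_n$ are i.i.d. if they are independent (see further Independence (probability theory) § More than two random variables) and identically distributed, i.e. if and only if

$$
\begin{aligned}
F_{X_1}(x) &= F_{X_k}(x) && \forall k \in \{1, \ldots, n\} \text{ and } \forall x \in I \\
F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) &= F_{X_1}(x_1) \cdot \ldots \cdot F_{X_n}(x_n) && \forall x_1, \ldots, x_n \in I
\end{aligned}
$$

where $F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \operatorname{P}(X_1 \leq x_1 \land \ldots \land X_n \leq x_n)$ denotes the joint cumulative distribution function of $X_1, \ldots, X_n$.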

Examples

Example 1

A sequence of outcomes of spins of a fair or unfair roulette wheel is i.i.d. One implication of this is that if the roulette ball lands on "red", for example, 20 times in a row, the next spin is no more or less likely to be "black" than on any other spin (see the gambler's fallacy).
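The claim about streaks can be checked by simulation. This is a rough sketch assuming a single-zero wheel with 18 red, 18 black and 1 green pocket; the mapping from pocket numbers to colours is simplified, and a streak of 5 reds is used instead of 20 so that enough streaks occur in the sample.

```python
import random

P_BLACK = 18 / 37  # assumed wheel: 18 red, 18 black, 1 green pocket

def spin(rng):
    """One spin of a simplified wheel; the pocket-to-colour mapping is illustrative."""
    pocket = rng.randrange(37)
    if pocket == 0:
        return "green"
    return "red" if pocket <= 18 else "black"

def black_rate_after_red_streak(spins, streak):
    """Empirical P(black on this spin | the previous `streak` spins were all red)."""
    hits = total = 0
    run = 0  # number of consecutive reds immediately before the current spin
    for colour in spins:
        if run >= streak:
            total += 1
            hits += colour == "black"
        run = run + 1 if colour == "red" else 0
    return hits / total

rng = random.Random(1)
spins = [spin(rng) for _ in range(500_000)]

overall_black = spins.count("black") / len(spins)
black_after_streak = black_rate_after_red_streak(spins, streak=5)

# Both frequencies should be close to P_BLACK, up to sampling noise.
print(P_BLACK, overall_black, black_after_streak)
```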

Example 2

Toss a coin 10 times and write down the results into variables $A_1, \ldots, A_{10}$.

  1. Independent: Each outcome $A_i$ will not affect the other outcome $A_j$ (for $i \neq j$ from 1 to 10), which means the variables $A_1, \ldots, A_{10}$ are independent of each other.
  2. Identically distributed: Regardless of whether the coin is fair (with a probability of 1/2 for heads) or biased, as long as the same coin is used for each flip, the probability of getting heads remains consistent across all flips.

Such a sequence of i.i.d. variables is also called a Bernoulli process.

Example 3

Roll a die 10 times and save the results into variables $A_1, \ldots, A_{10}$.

  1. Independent: Each outcome of the die roll will not affect the next one, which means the 10 variables are independent from each other.
  2. Identically distributed: Regardless of whether the die is fair or weighted, each roll will have the same probability of seeing each result as every other roll. In contrast, rolling 10 different dice, some of which are weighted and some of which are not, would not produce i.i.d. variables.

Example 4

Choose a card from a standard deck of cards containing 52 cards, then place the card back in the deck. Repeat this 52 times. Observe when a king appears.

  1. Independent: Each observation will not affect the next one, which means the 52 results are independent from each other. In contrast, if each card that is drawn is kept out of the deck, subsequent draws would be affected by it (drawing one king would make drawing a second king less likely), and the observations would not be independent.
  2. Identically distributed: After drawing one card from it (and then returning the card to the deck), each time the probability for a king is 4/52, which means the probability is identical each time.
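The effect of returning the card can be worked out exactly. The sketch below looks only at the first two draws and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

KINGS, DECK = 4, 52

# With replacement: the deck is complete again before every draw.
p_king = Fraction(KINGS, DECK)
p_king_after_king_with = Fraction(KINGS, DECK)

# Without replacement: the second draw comes from the remaining 51 cards.
p_king_after_king_without = Fraction(KINGS - 1, DECK - 1)
p_king_after_other_without = Fraction(KINGS, DECK - 1)

# Unconditional probability of a king on the second draw without replacement
# (law of total probability over the outcome of the first draw).
p_second_king_without = (
    p_king * p_king_after_king_without
    + (1 - p_king) * p_king_after_other_without
)

print(p_king)                      # 1/13
print(p_king_after_king_with)      # 1/13
print(p_king_after_king_without)   # 1/17
print(p_second_king_without)       # 1/13
```

Without replacement, the second draw still has the same unconditional probability of a king (1/13), so the draws remain identically distributed; what is lost is independence, since the conditional probability drops to 1/17 after a king has been drawn.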

Generalizations

Many results that were first proven under the assumption that the random variables are i.i.d. have been shown to be true even under a weaker distributional assumption.

Exchangeable random variables

The most general notion which shares the main properties of i.i.d. variables is that of exchangeable random variables, introduced by Bruno de Finetti.[citation needed] Exchangeability means that while variables may not be independent, future ones behave like past ones. Formally, any value of a finite sequence is as likely as any permutation of those values; that is, the joint probability distribution is invariant under the symmetric group.

This provides a useful generalization — for example, sampling without replacement is not independent, but is exchangeable.
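The statement about sampling without replacement can be verified by enumeration on a small case. The urn below (two red balls, one blue) is a hypothetical example, and `is_exchangeable` and `marginal` are illustrative helpers.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

urn = ["R", "R", "B"]  # hypothetical urn: two red balls, one blue ball

# Drawing all balls without replacement: every ordering of the (labelled)
# balls is equally likely. Accumulate the joint pmf of the colour sequence.
orders = list(permutations(range(len(urn))))
joint = Counter()
for order in orders:
    colours = tuple(urn[i] for i in order)
    joint[colours] += Fraction(1, len(orders))

def is_exchangeable(j):
    """True if every permutation of every sequence has the same probability."""
    return all(
        j[tuple(seq[i] for i in perm)] == prob
        for seq, prob in j.items()
        for perm in permutations(range(len(seq)))
    )

def marginal(j, positions):
    """Joint pmf of the draws at the given positions."""
    out = Counter()
    for seq, prob in j.items():
        out[tuple(seq[i] for i in positions)] += prob
    return out

p_first_red = marginal(joint, [0])[("R",)]
p_second_red = marginal(joint, [1])[("R",)]
p_both_red = marginal(joint, [0, 1])[("R", "R")]

print(is_exchangeable(joint))            # True
print(p_first_red, p_second_red)         # 2/3 2/3
print(p_both_red, p_first_red * p_second_red)  # 1/3 4/9
```

The draws are exchangeable and identically distributed, but the joint probability of two reds (1/3) differs from the product of the marginals (4/9), so they are not independent.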

Lévy process

In stochastic calculus, i.i.d. variables are thought of as a discrete time Lévy process: each variable gives how much one changes from one time to another. For example, a sequence of Bernoulli trials is interpreted as the Bernoulli process.

One may generalize this to include continuous time Lévy processes, and many Lévy processes can be seen as limits of i.i.d. variables—for instance, the Wiener process is the limit of the Bernoulli process.
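The limiting behaviour can be illustrated with a scaled random walk. This is only a sketch under assumptions not stated in the text: each Bernoulli trial is mapped to a step of +1 or -1 with equal probability, there are `n_steps` steps per unit of time, and the walk is scaled by 1/sqrt(n_steps). Under this scaling the value at time t should have variance close to t, as for a Wiener process.

```python
import random
import statistics

def scaled_walk_value(n_steps, t, rng):
    """Value at time t of a +1/-1 random walk with n_steps steps per unit
    of time, scaled by 1 / sqrt(n_steps)."""
    k = int(n_steps * t)
    total = sum(1 if rng.random() < 0.5 else -1 for _ in range(k))
    return total / n_steps ** 0.5

rng = random.Random(2)
n_steps, n_paths = 400, 5000

values_at_half = [scaled_walk_value(n_steps, 0.5, rng) for _ in range(n_paths)]
values_at_one = [scaled_walk_value(n_steps, 1.0, rng) for _ in range(n_paths)]

var_at_half = statistics.pvariance(values_at_half)
var_at_one = statistics.pvariance(values_at_one)

# For a Wiener process W, Var(W_t) = t, so these should be near 0.5 and 1.
print(var_at_half, var_at_one)
```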

In machine learning

Machine learning (ML) involves learning statistical relationships within data. To train ML models effectively, it is crucial to use data that is broadly generalizable. If the training data is insufficiently representative of the task, the model's performance on new, unseen data may be poor.

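The i.i.d. hypothesis allows for a significant reduction in the number of individual cases required in the training sample, simplifying optimization calculations. In optimization problems, the assumption of independent and identical distribution simplifies the calculation of the likelihood function. Due to this assumption, the likelihood function can be expressed as:

$$
l(\theta) = P(x_1, x_2, x_3, \ldots, x_n \mid \theta) = P(x_1 \mid \theta) \, P(x_2 \mid \theta) \, P(x_3 \mid \theta) \cdots P(x_n \mid \theta)
$$

To maximize the probability of the observed event, the logarithm of the likelihood is taken and maximized over the parameter $\theta$. Specifically, it computes:

$$
\underset{\theta}{\operatorname{arg\,max}} \; \log(l(\theta))
$$

where

$$
\log(l(\theta)) = \log(P(x_1 \mid \theta)) + \log(P(x_2 \mid \theta)) + \log(P(x_3 \mid \theta)) + \cdots + \log(P(x_n \mid \theta))
$$

Because the logarithm is strictly increasing, this yields the same maximizer as the likelihood itself. Working with a sum instead of a product enhances computational efficiency and numerical stability: a product of many probabilities can underflow in floating-point arithmetic, whereas a sum of logarithms does not. In the process of maximizing, the log transformation also converts many exponential functions into linear functions.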
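The practical difference between the product and the sum of logarithms can be seen in a small experiment. The following sketch assumes a Bernoulli model with hypothetical coin-flip data and uses a simple grid search; it is an illustration, not how any particular library fits models.

```python
import math

# Hypothetical data: 2000 coin flips with 1400 heads (1 = heads, 0 = tails).
data = [1] * 1400 + [0] * 600

def likelihood(theta, xs):
    """Product of the per-observation Bernoulli probabilities."""
    return math.prod(theta if x else 1.0 - theta for x in xs)

def log_likelihood(theta, xs):
    """Sum of the per-observation log-probabilities."""
    return sum(math.log(theta if x else 1.0 - theta) for x in xs)

# The raw product underflows to 0.0 in double precision for this much data,
# so it cannot be used to compare parameter values.
print(likelihood(0.7, data))      # 0.0

# The log-likelihood stays finite and can be maximized directly.
print(log_likelihood(0.7, data))

# Grid search over theta in (0, 1); the maximizer matches the sample mean.
grid = [k / 1000 for k in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, data))
print(theta_hat, sum(data) / len(data))
```

The factorization into per-observation terms is exactly where the i.i.d. assumption enters: without independence, the joint probability would not split into a product, and its logarithm would not split into a sum.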

There are two main reasons why this hypothesis is practically useful with the central limit theorem (CLT):

  1. Even if the sample originates from a complex non-Gaussian distribution, sums and averages of many observations can be well approximated by a Gaussian distribution, because of the CLT ("for a large number of observable samples, the sum of many random variables will have an approximately normal distribution").
  2. The model's accuracy depends on the simplicity and representational power of the model unit, as well as on the data quality. The simplicity of the unit makes it easy to interpret and scale, while the representational power and scalability improve model accuracy. In a deep neural network, for instance, each neuron is simple yet powerful in representation; layer by layer, the network captures more complex features to enhance model accuracy.

References

  1. Clauset, Aaron (2011). "A brief primer on probability distributions" (PDF). Santa Fe Institute. Archived from the original (PDF) on 2012-01-20. Retrieved 2011-11-29.
  2. Stephanie (2016-05-11). "IID Statistics: Independent and Identically Distributed Definition and Examples". Statistics How To. Retrieved 2021-12-09.
  3. Hampel, Frank (1998), "Is statistics too difficult?", Canadian Journal of Statistics, 26 (3): 497–513, doi:10.2307/3315772, hdl:20.500.11850/145503, JSTOR 3315772, S2CID 53117661 (§8).
  4. Blum, J. R.; Chernoff, H.; Rosenblatt, M.; Teicher, H. (1958). "Central Limit Theorems for Interchangeable Processes". Canadian Journal of Mathematics. 10: 222–229. doi:10.4153/CJM-1958-026-0. S2CID 124843240.
  5. Cover, T. M.; Thomas, J. A. (2006). Elements Of Information Theory. Wiley-Interscience. pp. 57–58. ISBN 978-0-471-24195-9.
  6. Casella & Berger 2002, Theorem 1.5.10
