
Sample mean and covariance

The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables.

The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales. The sample mean is used as an estimator for the population mean, the average value in the entire population, where the estimate is more likely to be close to the population mean if the sample is large and representative. The reliability of the sample mean is estimated using the standard error, which in turn is calculated using the variance of the sample. If the sample is random, the standard error falls with the size of the sample and the sample mean's distribution approaches the normal distribution as the sample size increases.
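As a rough illustration of the standard error mentioned above, the sketch below uses the common estimate s/√N, where s is the sample standard deviation computed with N − 1 in the denominator. The figures in `sales` are made up for illustration, and the function name `standard_error` is an assumption of this sketch rather than notation from the text.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(N)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # statistics.stdev uses the N - 1 denominator (sample standard deviation).
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical sales figures for a small sample of companies.
sales = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(sales))   # the sample mean
print(standard_error(sales))    # its estimated standard error
```

For a random sample, this quantity shrinks roughly in proportion to 1/√N as more observations are added.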

The term "sample mean" can also be used to refer to a vector of average values when the statistician is looking at the values of several variables in the sample, e.g. the sales, profits, and employees of a sample of Fortune 500 companies. In this case, there is not just a sample variance for each variable but a sample variance-covariance matrix (or simply covariance matrix) showing also the relationship between each pair of variables. This would be a 3×3 matrix when 3 variables are being considered. The sample covariance is useful in judging the reliability of the sample means as estimators and is also useful as an estimate of the population covariance matrix.

Due to their ease of calculation and other desirable characteristics, the sample mean and sample covariance are widely used in statistics to represent the location and dispersion of the distribution of values in the sample, and to estimate the values for the population.

Definition of the sample mean

The sample mean is the average of the values of a variable in a sample, which is the sum of those values divided by the number of values. Using mathematical notation, if a sample of N observations on variable X is taken from the population, the sample mean is:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i.$$

Under this definition, if the sample (1, 4, 1) is taken from the population (1,1,3,4,0,2,1,0), then the sample mean is $(1+4+1)/3 = 2$, as compared to the population mean of $(1+1+3+4+0+2+1+0)/8 = 1.5$. Even if a sample is random, it is rarely perfectly representative, and other samples would have other sample means even if the samples were all from the same population. The sample (2, 1, 0), for example, would have a sample mean of 1.
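The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `sample_mean` is chosen here for illustration.

```python
def sample_mean(values):
    """Sum of the values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("the sample must contain at least one observation")
    return sum(values) / len(values)


population = (1, 1, 3, 4, 0, 2, 1, 0)

print(sample_mean((1, 4, 1)))    # 2.0
print(sample_mean((2, 1, 0)))    # 1.0
print(sample_mean(population))   # 1.5 (the population mean)
```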

If the statistician is interested in K variables rather than one, each observation having a value for each of those K variables, the overall sample mean consists of K sample means for individual variables. Let $x_{ij}$ be the ith independently drawn observation (i=1,...,N) on the jth random variable (j=1,...,K). These observations can be arranged into N column vectors, each with K entries, with the K×1 column vector giving the i-th observations of all variables being denoted $\mathbf{x}_i$ (i=1,...,N).

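The sample mean vector $\bar{\mathbf{x}}$ is a column vector whose j-th element $\bar{x}_j$ is the average value of the N observations of the jth variable:

$$\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \quad j = 1, \ldots, K.$$

Thus, the sample mean vector contains the average of the observations for each variable, and is written

$$\bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_j \\ \vdots \\ \bar{x}_K \end{bmatrix}.$$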
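The sample mean vector can be sketched in plain Python by averaging each variable separately. The data below are hypothetical (N = 4 observations on K = 3 variables), and `sample_mean_vector` is a name chosen for this illustration.

```python
def sample_mean_vector(observations):
    """Return the K per-variable means of N observations with K entries each."""
    n = len(observations)
    if n == 0:
        raise ValueError("at least one observation is needed")
    k = len(observations[0])
    if any(len(x) != k for x in observations):
        raise ValueError("every observation must have the same number of entries")
    # Element j is the average of the N observed values of variable j.
    return [sum(x[j] for x in observations) / n for j in range(k)]


# Hypothetical data: each tuple is one observation on three variables.
data = [(1, 10, 100), (2, 20, 300), (3, 30, 200), (6, 40, 400)]
print(sample_mean_vector(data))   # [3.0, 25.0, 250.0]
```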

Definition of sample covariance

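The sample covariance matrix is a K-by-K matrix $\mathbf{Q} = [q_{jk}]$ with entries

$$q_{jk} = \frac{1}{N-1}\sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k),$$

where $q_{jk}$ is an estimate of the covariance between the jth variable and the kth variable of the population underlying the data. In terms of the observation vectors, the sample covariance is

$$\mathbf{Q} = \frac{1}{N-1}\sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^{\mathrm{T}}.$$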

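Alternatively, arranging the observation vectors as the columns of a matrix, so that

$$\mathbf{F} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_N \end{bmatrix},$$

which is a matrix of K rows and N columns, the sample covariance matrix can be computed as

$$\mathbf{Q} = \frac{1}{N-1}\left(\mathbf{F} - \bar{\mathbf{x}}\,\mathbf{1}_N^{\mathrm{T}}\right)\left(\mathbf{F} - \bar{\mathbf{x}}\,\mathbf{1}_N^{\mathrm{T}}\right)^{\mathrm{T}},$$

where $\mathbf{1}_N$ is an N by 1 vector of ones. If the observations are arranged as rows instead of columns, so $\bar{\mathbf{x}}$ is now a 1×K row vector and $\mathbf{M} = \mathbf{F}^{\mathrm{T}}$ is an N×K matrix whose column j is the vector of N observations on variable j, then applying transposes in the appropriate places yields

$$\mathbf{Q} = \frac{1}{N-1}\left(\mathbf{M} - \mathbf{1}_N\bar{\mathbf{x}}\right)^{\mathrm{T}}\left(\mathbf{M} - \mathbf{1}_N\bar{\mathbf{x}}\right).$$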
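The definition can be sketched in plain Python by accumulating the outer products of the deviation vectors and dividing by N − 1. The data are hypothetical, with observations stored as rows (the N×K arrangement), and `sample_covariance` is a name chosen for this illustration.

```python
def sample_covariance(observations):
    """K-by-K sample covariance matrix (N - 1 denominator) of N observations."""
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are needed")
    k = len(observations[0])
    mean = [sum(x[j] for x in observations) / n for j in range(k)]

    q = [[0.0] * k for _ in range(k)]
    for x in observations:
        # Deviation of this observation from the sample mean vector.
        d = [x[j] - mean[j] for j in range(k)]
        # Add the outer product d d^T to the running sum.
        for j in range(k):
            for m in range(k):
                q[j][m] += d[j] * d[m]
    return [[entry / (n - 1) for entry in row] for row in q]


# Hypothetical data: N = 3 observations on K = 3 variables.
data = [(1, 2, 0), (2, 4, 1), (3, 9, 5)]
for row in sample_covariance(data):
    print(row)
# [1.0, 3.5, 2.5]
# [3.5, 13.0, 9.5]
# [2.5, 9.5, 7.0]
```

The result is symmetric, and its diagonal holds the sample variance of each variable.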

Like covariance matrices for random vectors, sample covariance matrices are positive semi-definite. To prove it, note that for any matrix $\mathbf{A}$ the matrix $\mathbf{A}^{\mathrm{T}}\mathbf{A}$ is positive semi-definite. Furthermore, the sample covariance matrix is positive definite if and only if the rank of the vectors $\mathbf{x}_i - \bar{\mathbf{x}}$ is K.
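The argument can be written out in one line. In the sketch below, $\mathbf{x}_i$ is the i-th observation vector, $\bar{\mathbf{x}}$ the sample mean vector, $\mathbf{Q}$ the sample covariance matrix with N − 1 in the denominator, and $\mathbf{y}$ an arbitrary K×1 vector introduced here for the proof:

```latex
\mathbf{y}^{\mathrm{T}}\mathbf{Q}\,\mathbf{y}
  = \frac{1}{N-1}\sum_{i=1}^{N}
      \mathbf{y}^{\mathrm{T}}(\mathbf{x}_i-\bar{\mathbf{x}})
      (\mathbf{x}_i-\bar{\mathbf{x}})^{\mathrm{T}}\mathbf{y}
  = \frac{1}{N-1}\sum_{i=1}^{N}
      \left(\mathbf{y}^{\mathrm{T}}(\mathbf{x}_i-\bar{\mathbf{x}})\right)^{2}
  \;\ge\; 0 .
```

Equality holds for some nonzero $\mathbf{y}$ exactly when $\mathbf{y}$ is orthogonal to every deviation vector $\mathbf{x}_i - \bar{\mathbf{x}}$, which is possible only if those vectors span fewer than K dimensions.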

Unbiasedness

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector $\mathbf{X}$, a row vector whose jth element (j = 1, ..., K) is one of the random variables.[1] The sample covariance matrix has $N-1$ in the denominator rather than $N$ due to a variant of Bessel's correction: in short, the sample covariance relies on the difference between each observation and the sample mean, but the sample mean is slightly correlated with each observation since it is defined in terms of all observations. If the population mean $\operatorname{E}(\mathbf{X})$ is known, the analogous unbiased estimate

$$q_{jk} = \frac{1}{N}\sum_{i=1}^{N} \left(x_{ij} - \operatorname{E}(X_j)\right)\left(x_{ik} - \operatorname{E}(X_k)\right),$$

using the population mean, has $N$ in the denominator. This is an example of why in probability and statistics it is essential to distinguish between random variables (upper case letters) and realizations of the random variables (lower case letters).

The maximum likelihood estimate of the covariance

$$q_{jk} = \frac{1}{N}\sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$$

for the Gaussian distribution case has N in the denominator as well. The ratio of 1/N to 1/(N − 1) approaches 1 for large N, so the maximum likelihood estimate approximately equals the unbiased estimate when the sample is large.
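The effect of the denominator can be checked exactly on a small example. The sketch below takes a single variable, reuses the eight-value population from the sample-mean example, and assumes samples of size N = 3 drawn independently with replacement, so that all 8³ ordered samples are equally likely. Averaging each estimator over every possible sample gives its expected value. The helper names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

population = (1, 1, 3, 4, 0, 2, 1, 0)


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def variance(values, denominator):
    """Sum of squared deviations from the mean, divided by `denominator`."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / denominator


N = 3
# Every equally likely ordered sample of size N, drawn with replacement.
samples = list(product(population, repeat=N))

population_variance = variance(population, len(population))
expected_unbiased = mean([variance(s, N - 1) for s in samples])
expected_mle = mean([variance(s, N) for s in samples])

print(population_variance)   # 7/4
print(expected_unbiased)     # 7/4, matches the population variance
print(expected_mle)          # 7/6, too small by the factor (N - 1)/N
```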

Distribution of the sample mean

For each random variable, the sample mean is a good estimator of the population mean, where a "good" estimator is defined as being efficient and unbiased. The estimator will usually not equal the true value of the population mean, since different samples drawn from the same distribution will give different sample means and hence different estimates of the true mean. Thus the sample mean is a random variable, not a constant, and consequently has its own distribution. For a random sample of N observations on the jth random variable, the sample mean's distribution itself has mean equal to the population mean $\operatorname{E}(X_j)$ and variance equal to $\sigma_j^2/N$, where $\sigma_j^2$ is the population variance.

The arithmetic mean of a population, or population mean, is often denoted μ.[2] The sample mean $\bar{x}$ (the arithmetic mean of a sample of values drawn from the population) makes a good estimator of the population mean, as its expected value is equal to the population mean (that is, it is an unbiased estimator). The sample mean is a random variable, not a constant, since its calculated value will randomly differ depending on which members of the population are sampled, and consequently it will have its own distribution. For a random sample of n independent observations, the expected value of the sample mean is

$$\operatorname{E}(\bar{x}) = \mu$$

and the variance of the sample mean is

$$\operatorname{var}(\bar{x}) = \frac{\sigma^2}{n},$$

where $\sigma^2$ is the population variance.

If the samples are not independent, but correlated, then special care has to be taken in order to avoid the problem of pseudoreplication.

If the population is normally distributed, then the sample mean is normally distributed as follows:

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

If the population is not normally distributed, the sample mean is nonetheless approximately normally distributed if n is large and $\sigma^2/n < +\infty$. This is a consequence of the central limit theorem.
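The mean and variance of the sample mean's distribution can be verified exactly for a small case. This sketch assumes n = 3 independent draws with replacement from the eight-value population used in the sample-mean example, and enumerates every equally likely sample. Variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

population = (1, 1, 3, 4, 0, 2, 1, 0)
n = 3

# Population mean and variance (denominator = population size).
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)

# The sample mean of every equally likely ordered sample of size n.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Mean and variance of the sample mean's own distribution.
mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print(mu, mean_of_means)           # 3/2 3/2
print(sigma2 / n, var_of_means)    # 7/12 7/12
```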

Weighted samples

In a weighted sample, each vector $\mathbf{x}_i$ (each set of single observations on each of the K random variables) is assigned a weight $w_i \geq 0$. Without loss of generality, assume that the weights are normalized:

$$\sum_{i=1}^{N} w_i = 1.$$

(If they are not, divide the weights by their sum.) Then the weighted mean vector $\bar{\mathbf{x}}$ is given by

$$\bar{\mathbf{x}} = \sum_{i=1}^{N} w_i \mathbf{x}_i$$

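and the elements $q_{jk}$ of the weighted covariance matrix $\mathbf{Q}$ are [3]

$$q_{jk} = \frac{1}{1 - \sum_{i=1}^{N} w_i^2}\sum_{i=1}^{N} w_i (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k).$$

If all weights are the same, $w_i = 1/N$, the weighted mean reduces to the sample mean, and the weighted covariance reduces to the sample covariance with $N-1$ in the denominator, because $\frac{1}{1 - \sum_i w_i^2}\cdot\frac{1}{N} = \frac{1}{1 - 1/N}\cdot\frac{1}{N} = \frac{1}{N-1}$.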
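A minimal sketch of the weighted estimates follows. It assumes the covariance is scaled by 1/(1 − Σ wᵢ²) after the weights are normalized to sum to one. The function name and the data are hypothetical.

```python
def weighted_mean_and_covariance(observations, weights):
    """Weighted mean vector and weighted covariance matrix.

    Assumes the covariance normalization 1 / (1 - sum of squared weights),
    applied after the weights have been rescaled to sum to one.
    """
    if len(observations) != len(weights):
        raise ValueError("one weight per observation is required")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    w = [wi / total for wi in weights]          # normalize the weights

    k = len(observations[0])
    mean = [sum(wi * x[j] for wi, x in zip(w, observations)) for j in range(k)]

    scale = 1.0 - sum(wi * wi for wi in w)
    if scale <= 0:
        raise ValueError("at least two observations need nonzero weight")

    cov = [
        [
            sum(wi * (x[j] - mean[j]) * (x[m] - mean[m])
                for wi, x in zip(w, observations)) / scale
            for m in range(k)
        ]
        for j in range(k)
    ]
    return mean, cov


# Hypothetical data: N = 3 observations on K = 2 variables.
data = [(1, 2), (2, 4), (3, 9)]

# Equal weights: the result agrees, up to floating-point rounding, with the
# unweighted sample mean (2, 5) and the N - 1 sample covariance
# [[1, 3.5], [3.5, 13]].
print(weighted_mean_and_covariance(data, [1, 1, 1]))

# Unequal weights pull the estimates toward the heavily weighted observation.
print(weighted_mean_and_covariance(data, [1, 1, 2]))
```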

Criticism

The sample mean and sample covariance are not robust statistics, meaning that they are sensitive to outliers. As robustness is often a desired trait, particularly in real-world applications, robust alternatives may prove desirable, notably quantile-based statistics such as the sample median for location,[4] and the interquartile range (IQR) for dispersion. Other alternatives include trimming and Winsorizing, as in the trimmed mean and the Winsorized mean.
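The sensitivity to outliers is easy to see on a small example. In this sketch the data are the eight values used in the sample-mean example, and the "contaminated" copy replaces the value 4 with a hypothetical recording error of 400.

```python
import statistics

clean = [1, 1, 3, 4, 0, 2, 1, 0]
contaminated = [1, 1, 3, 400, 0, 2, 1, 0]   # 4 mistakenly recorded as 400

# A single outlier moves the mean a long way.
print(statistics.mean(clean))            # 1.5
print(statistics.mean(contaminated))     # 51

# The median is unaffected.
print(statistics.median(clean))          # 1.0
print(statistics.median(contaminated))   # 1.0
```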

References

  1. Johnson, Richard Arnold; Wichern, Dean W. (2007). Applied Multivariate Statistical Analysis. Pearson Prentice Hall. ISBN 978-0-13-187715-3.
  2. Underhill, L.G.; Bradfield, D. (1998). Introstat. Juta and Company Ltd. ISBN 0-7021-3838-X. p. 181.
  3. Galassi, Mark; Davies, Jim; Theiler, James; Gough, Brian; Jungman, Gerard; Booth, Michael; Rossi, Fabrice (2021). GNU Scientific Library Reference Manual, Version 2.6. Section "Statistics: Weighted Samples".
  4. Kosko, Bart (2006). "The Sample Mean". The World Question Center 2006.

This information is adapted from Wikipedia which is publicly available.
