Censoring (statistics)

In statistics, censoring is a condition in which the value of a measurement or observation is only partially known.

For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, it may be known that an individual's age at death is at least 75 years (but may be more). Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is currently alive at the age of 75.

Censoring also occurs when a value occurs outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 140 kg. If a 160 kg individual is weighed using the scale, the observer would only know that the individual's weight is at least 140 kg.

The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown.

Censoring should not be confused with the related idea of truncation. With censoring, observations result either in knowing the exact value that applies, or in knowing that the value lies within an interval. With truncation, observations never result in values outside a given range: values in the population outside the range are never seen, or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding.
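
The difference between censoring and truncation can be illustrated with a minimal sketch based on the bathroom-scale example. The weights below are hypothetical, and the function names censor_right and truncate_right are illustrative rather than taken from any library. A reading equal to the scale limit is treated as censored, on the assumption that the scale cannot distinguish it from a heavier weight.

```python
# Hypothetical true weights in kg; the 140 kg limit follows the scale example.
SCALE_LIMIT_KG = 140.0
true_weights = [62.0, 85.5, 160.0, 151.2]


def censor_right(values, limit):
    """Right-censor: keep every observation, but record a value at or above
    the limit as (limit, False), meaning 'at least limit'."""
    return [(min(v, limit), v < limit) for v in values]


def truncate_right(values, limit):
    """Truncate: values at or above the limit never enter the data set."""
    return [v for v in values if v < limit]


censored = censor_right(true_weights, SCALE_LIMIT_KG)
truncated = truncate_right(true_weights, SCALE_LIMIT_KG)

# Censoring keeps all four individuals (two flagged as 'at least 140 kg');
# truncation silently loses the two heaviest individuals.
print(censored)
print(truncated)
```

Under censoring the analyst still knows how many individuals exceeded the limit; under truncation that count is lost.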

Types

  • Left censoring – a data point is below a certain value but it is unknown by how much.
  • Interval censoring – a data point is somewhere on an interval between two values.
  • Right censoring – a data point is above a certain value but it is unknown by how much.
  • Type I censoring occurs if an experiment has a set number of subjects or items and stops the experiment at a predetermined time, at which point any subjects remaining are right-censored.
  • Type II censoring occurs if an experiment has a set number of subjects or items and stops the experiment when a predetermined number are observed to have failed; the remaining subjects are then right-censored.
  • Random (or non-informative) censoring is when each subject has a censoring time that is statistically independent of their failure time. The observed value is the minimum of the censoring and failure times; subjects whose failure time is greater than their censoring time are right-censored.

Interval censoring can occur when observing a value requires follow-ups or inspections. Left and right censoring are special cases of interval censoring, with the beginning of the interval at zero or the end at infinity, respectively.
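
Because left and right censoring are special cases of interval censoring, a single interval representation can cover every case. The following is a minimal sketch for non-negative quantities such as times; the Observation class and its field names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A value known only to lie between lower and upper (hypothetical type)."""
    lower: float
    upper: float

    def kind(self) -> str:
        if self.lower > self.upper:
            raise ValueError("lower bound exceeds upper bound")
        if self.lower == self.upper:
            return "exact"
        if self.upper == math.inf:
            return "right-censored"   # interval ends at infinity
        if self.lower == 0.0:
            return "left-censored"    # interval begins at zero
        return "interval-censored"


observations = [
    Observation(5.0, 5.0),         # exact value observed
    Observation(0.0, 3.0),         # known only to be below 3
    Observation(2.0, 4.0),         # event occurred between two inspections
    Observation(75.0, math.inf),   # known only to be at least 75
]
kinds = [obs.kind() for obs in observations]
print(kinds)
```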

Estimation methods for left-censored data vary, and not every method is applicable to, or the most reliable for, every data set.[1]

A common misconception with time interval data is to classify intervals whose start time is unknown as left censored. In these cases we have a lower bound on the length of the time interval, so the data are right censored (even though the missing start point lies to the left of the known interval when viewed as a timeline).

Analysis

Special techniques may be used to handle censored data. Tests with specific failure times are coded as actual failures; censored data are coded for the type of censoring and the known interval or limit. Special software programs (often reliability oriented) can conduct a maximum likelihood estimation for summary statistics, confidence intervals, etc.

Epidemiology

One of the earliest attempts to analyse a statistical problem involving censored data was Daniel Bernoulli's 1766 analysis of smallpox morbidity and mortality data to demonstrate the efficacy of inoculation.[2] An early paper to use the Kaplan–Meier estimator for estimating censored costs was Quesenberry et al. (1989).[3] Lin et al.[4] found this approach to be invalid unless all patients accumulated costs with a common deterministic rate function over time, and proposed an alternative estimation technique known as the Lin estimator.[5]

Operating life testing

Example of five replicate tests resulting in four failures and one suspended time resulting in censoring.

Reliability testing often consists of conducting a test on an item (under specified conditions) to determine the time it takes for a failure to occur.

  • Sometimes a failure is planned and expected but does not occur because of operator error, equipment malfunction, a test anomaly, etc. The test result is not the desired time-to-failure but can be (and should be) used as a time-to-termination. The use of censored data is unintentional but necessary.
  • Sometimes engineers plan a test program so that, after a certain time limit or number of failures, all other tests will be terminated. These suspended times are treated as right-censored data. The use of censored data is intentional.

An analysis of the data from replicate tests includes both the times-to-failure for the items that failed and the time-of-test-termination for those that did not fail.
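
A planned test program that stops after a time limit (Type I censoring) or after a set number of failures (Type II censoring) can be sketched as follows. The failure times are hypothetical, the function names are illustrative, and ties between failure times are not treated specially.

```python
def type_i_censor(failure_times, stop_time):
    """Stop the test at a fixed time; items still running are
    right-censored (suspended) at stop_time."""
    return [(min(t, stop_time), t <= stop_time) for t in failure_times]


def type_ii_censor(failure_times, n_failures):
    """Stop the test at the n_failures-th failure; the remaining items
    are right-censored at that failure time."""
    if not 1 <= n_failures <= len(failure_times):
        raise ValueError("n_failures must be between 1 and the number of items")
    stop_time = sorted(failure_times)[n_failures - 1]
    return [(min(t, stop_time), t <= stop_time) for t in failure_times]


# Hypothetical times to failure (hours) for five replicate tests.
failure_hours = [120.0, 340.0, 95.0, 510.0, 275.0]

# Each result is (recorded time, True if a failure was observed).
type_i = type_i_censor(failure_hours, 400.0)   # four failures, one suspension
type_ii = type_ii_censor(failure_hours, 3)     # stops at the third failure
print(type_i)
print(type_ii)
```

In both cases the recorded data set contains the times-to-failure for the items that failed and the termination time for those that did not.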

Censored regression

An early model for censored regression, the tobit model, was proposed by James Tobin in 1958.[6]
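
As a rough illustration, the log-likelihood of a tobit-type model can be written in a few lines. This sketch assumes the common textbook form with a single regressor, normally distributed errors and left censoring at a known limit; it is not necessarily Tobin's original formulation, and the function and parameter names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_log_likelihood(xs, ys, beta, sigma, limit=0.0):
    """Log-likelihood for y = max(limit, beta * x + e), e ~ N(0, sigma^2).

    Uncensored points contribute a density term; points recorded at the
    limit contribute the probability of falling at or below the limit.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        mean = beta * x
        if y > limit:
            total += math.log(normal_pdf((y - mean) / sigma) / sigma)
        else:
            total += math.log(normal_cdf((limit - mean) / sigma))
    return total


# Hypothetical data: the second response is censored at zero.
value = tobit_log_likelihood([1.0, 0.0], [1.0, 0.0], beta=1.0, sigma=1.0)
print(value)
```

In practice the parameters would be estimated by maximizing this function numerically.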

Likelihood

The likelihood is the probability or probability density of what was observed, viewed as a function of parameters in an assumed model. To incorporate censored data points in the likelihood, each censored data point is represented by the probability of the censoring event as a function of the model parameters, i.e. a function of the CDF(s) instead of the density or probability mass.

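The most general censoring case is interval censoring: $\Pr(a < x \leq b) = F(b) - F(a)$, where $F(x)$ is the CDF of the probability distribution, and the two special cases are:

  • left censoring: $\Pr(-\infty < x \leq b) = F(b) - F(-\infty) = F(b) - 0 = F(b)$
  • right censoring: $\Pr(a < x \leq \infty) = F(\infty) - F(a) = 1 - F(a)$

For continuous probability distributions: $\Pr(a < x \leq b) = \Pr(a < x < b)$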
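
The likelihood contribution of a censored observation, the probability that the value falls in the known interval, can be computed directly from a CDF. The sketch below uses an exponential distribution with a hypothetical rate of 0.5 purely as an example; the function names are illustrative.

```python
import math


def exponential_cdf(x, rate):
    """CDF of the exponential distribution, extended to -inf and +inf."""
    if x <= 0.0:
        return 0.0
    if x == math.inf:
        return 1.0
    return 1.0 - math.exp(-rate * x)


def censored_contribution(lower, upper, cdf):
    """Pr(lower < X <= upper) = F(upper) - F(lower)."""
    return cdf(upper) - cdf(lower)


rate = 0.5  # hypothetical parameter value


def F(x):
    return exponential_cdf(x, rate)


interval = censored_contribution(1.0, 3.0, F)        # F(3) - F(1)
left = censored_contribution(-math.inf, 3.0, F)      # F(3)
right = censored_contribution(1.0, math.inf, F)      # 1 - F(1)
print(interval, left, right)
```

Multiplying such contributions over the censored observations, together with density terms for the exactly observed ones, gives the full likelihood.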

Example

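Suppose we are interested in survival times, $T_1, T_2, \ldots, T_n$, but we don't observe $T_i$ for all $i$. Instead, we observe

$(U_i, \delta_i)$, with $U_i = T_i$ and $\delta_i = 1$ if $T_i$ is actually observed, and
$(U_i, \delta_i)$, with $U_i < T_i$ and $\delta_i = 0$ if all we know is that $T_i$ is longer than $U_i$.

When $T_i > U_i$, $U_i$ is called the censoring time.[7]

If the censoring times are all known constants, then the likelihood is

$$L = \prod_{i:\,\delta_i = 1} f(u_i) \prod_{i:\,\delta_i = 0} S(u_i)$$

where $f(u_i)$ = the probability density function evaluated at $u_i$,

and $S(u_i)$ = the probability that $T_i$ is greater than $u_i$, called the survival function.

This can be simplified by defining the hazard function, the instantaneous force of mortality, as

$$\lambda(u) = \frac{f(u)}{S(u)}$$

so

$$f(u) = \lambda(u)\, S(u).$$

Then

$$L = \prod_{i} \lambda(u_i)^{\delta_i}\, S(u_i).$$

For the exponential distribution, this becomes even simpler, because the hazard rate, $\lambda$, is constant, and $S(u) = \exp(-\lambda u)$. Then:

$$L(\lambda) = \lambda^{k} \exp\left(-\lambda \sum_{i} u_i\right),$$

where $k = \sum_{i} \delta_i$.

From this we easily compute $\hat{\lambda}$, the maximum likelihood estimate (MLE) of $\lambda$, as follows:

$$\ell(\lambda) = \log L(\lambda) = k \log \lambda - \lambda \sum_{i} u_i.$$

Then

$$\frac{d\ell}{d\lambda} = \frac{k}{\lambda} - \sum_{i} u_i.$$

We set this to 0 and solve for $\lambda$ to get:

$$\hat{\lambda} = \frac{k}{\sum_{i} u_i}.$$

Equivalently, the mean time to failure is:

$$\frac{1}{\hat{\lambda}} = \frac{\sum_{i} u_i}{k}.$$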

This differs from the standard MLE for the exponential distribution in that censored observations are considered only in the numerator (the total observed time), not in the count of failures in the denominator.
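
The estimate can be checked numerically with a short sketch. Each observation is a pair (u, delta), where delta is 1 for an observed failure and 0 for a right-censored time; the data values and the function name exponential_mle are hypothetical.

```python
def exponential_mle(observations):
    """MLE of the exponential rate from right-censored data.

    observations: iterable of (u, delta) pairs, where u is the observed
    time and delta is 1 for a failure, 0 for a censored time.
    Returns k / sum(u), where k is the number of observed failures.
    """
    observations = list(observations)
    k = sum(delta for _, delta in observations)
    total_time = sum(u for u, _ in observations)
    if k == 0:
        raise ValueError("the MLE is undefined without at least one failure")
    return k / total_time


# Hypothetical data: three failures and two censored times.
data = [(2.0, 1), (3.0, 1), (5.0, 0), (1.0, 1), (4.0, 0)]

rate_hat = exponential_mle(data)        # k = 3, total time = 15
mean_time_to_failure = 1.0 / rate_hat

# Discarding the censored times instead would give a much shorter mean.
failures_only = [u for u, delta in data if delta == 1]
naive_mean = sum(failures_only) / len(failures_only)
print(rate_hat, mean_time_to_failure, naive_mean)
```

The comparison with the failures-only mean shows why censored times should not simply be dropped: they carry information about how long items survived.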

References

  1. ^ Helsel, D. (2010). "Much Ado About Next to Nothing: Incorporating Nondetects in Science". Annals of Occupational Hygiene. 54 (3): 257–262. doi:10.1093/annhyg/mep092. PMID 20032004.
  2. ^ Bernoulli, D. (1766). "Essai d'une nouvelle analyse de la mortalité causée par la petite vérole". Mem. Math. Phy. Acad. Roy. Sci. Paris, reprinted in Bradley (1971) 21 and Blower (2004)
  3. ^ Quesenberry, C. P. Jr.; et al. (1989). "A survival analysis of hospitalization among patients with acquired immunodeficiency syndrome". American Journal of Public Health. 79 (12): 1643–1647. doi:10.2105/AJPH.79.12.1643. PMC 1349769. PMID 2817192.
  4. ^ Lin, D. Y.; et al. (1997). "Estimating medical costs from incomplete follow-up data". Biometrics. 53 (2): 419–434. doi:10.2307/2533947. JSTOR 2533947. PMID 9192444.
  5. ^ Wijeysundera, H. C.; et al. (2012). "Techniques for estimating health care costs with censored data: an overview for the health services researcher". ClinicoEconomics and Outcomes Research. 4: 145–155. doi:10.2147/CEOR.S31552. PMC 3377439. PMID 22719214.
  6. ^ Tobin, James (1958). "Estimation of relationships for limited dependent variables" (PDF). Econometrica. 26 (1): 24–36. doi:10.2307/1907382. JSTOR 1907382.
  7. ^ Lu Tian, Likelihood Construction, Inference for Parametric Survival Distributions (PDF), Wikidata Q98961801.

Further reading

  • "Engineering Statistics Handbook", NIST/SEMATECH.
