Binary classification

Binary classification is the task of classifying the elements of a set into one of two groups (each called class). Typical binary classification problems include:

When measuring the accuracy of a binary classifier, the simplest way is to count the errors. But in the real world often one of the two classes is more important, so that the number of both of the different types of errors is of interest. For example, in medical testing, detecting a disease when it is not present (a false positive) is considered differently from not detecting a disease when it is present (a false negative).

In this set of tested instances, the instances left of the divider have the condition being tested; the right half do not. The oval bounds those instances that a test algorithm classifies as having the condition. The green areas highlight the instances that the test algorithm correctly classified. Labels refer to:
TP=true positive; TN=true negative; FP=false positive (type I error); FN=false negative (type II error); TPR=set of instances to determine true positive rate; FPR=set of instances to determine false positive rate; PPV=positive predictive value; NPV=negative predictive value.

Four outcomes

Given a classification of a specific data set, there are four basic combinations of actual data category and assigned category: true positives TP (correct positive assignments), true negatives TN (correct negative assignments), false positives FP (incorrect positive assignments), and false negatives FN (incorrect negative assignments).

Assigned
Actual
Test outcome positive Test outcome negative
Condition positive True positive False negative
Condition negative False positive True negative

These can be arranged into a 2×2 contingency table, with rows corresponding to actual value – condition positive or condition negative – and columns corresponding to classification value – test outcome positive or test outcome negative.

Evaluation

From tallies of the four basic outcomes, there are many approaches that can be used to measure the accuracy of a classifier or predictor. Different fields have different preferences.

The eight basic ratios

A common approach to evaluation is to begin by computing two ratios of a standard pattern. There are eight basic ratios of this form that one can compute from the contingency table, which come in four complementary pairs (each pair summing to 1). These are obtained by dividing each of the four numbers by the sum of its row or column, yielding eight numbers, which can be referred to generically in the form "true positive row ratio" or "false negative column ratio".

There are thus two pairs of column ratios and two pairs of row ratios, and one can summarize these with four numbers by choosing one ratio from each pair – the other four numbers are the complements.

The row ratios are:

The column ratios are:

In diagnostic testing, the main ratios used are the true column ratios – true positive rate and true negative rate – where they are known as sensitivity and specificity. In informational retrieval, the main ratios are the true positive ratios (row and column) – positive predictive value and true positive rate – where they are known as precision and recall.

Cullerne Bown has suggested a flow chart for determining which pair of indicators should be used when.[1] Otherwise, there is no general rule for deciding. There is also no general agreement on how the pair of indicators should be used to decide on concrete questions, such as when to prefer one classifier over another.

One can take ratios of a complementary pair of ratios, yielding four likelihood ratios (two column ratio of ratios, two row ratio of ratios). This is primarily done for the column (condition) ratios, yielding likelihood ratios in diagnostic testing. Taking the ratio of one of these groups of ratios yields a final ratio, the diagnostic odds ratio (DOR). This can also be defined directly as (TP×TN)/(FP×FN) = (TP/FN)/(FP/TN); this has a useful interpretation – as an odds ratio – and is prevalence-independent.

Other metrics

There are a number of other metrics, most simply the accuracy or Fraction Correct (FC), which measures the fraction of all instances that are correctly categorized; the complement is the Fraction Incorrect (FiC). The F-score combines precision and recall into one number via a choice of weighing, most simply equal weighing, as the balanced F-score (F1 score). Some metrics come from regression coefficients: the markedness and the informedness, and their geometric mean, the Matthews correlation coefficient. Other metrics include Youden's J statistic, the uncertainty coefficient, the phi coefficient, and Cohen's kappa.

Statistical binary classification

Statistical classification is a problem studied in machine learning in which the classification is performed on the basis of a classification rule. It is a type of supervised learning, a method of machine learning where the categories are predefined, and is used to categorize new probabilistic observations into said categories. When there are only two categories the problem is known as statistical binary classification.

Some of the methods commonly used for binary classification are:

Each classifier is best in only a select domain based upon the number of observations, the dimensionality of the feature vector, the noise in the data and many other factors. For example, random forests perform better than SVM classifiers for 3D point clouds.[2][3]

Converting continuous values to binary

Binary classification may be a form of dichotomization in which a continuous function is transformed into a binary variable. Tests whose results are of continuous values, such as most blood values, can artificially be made binary by defining a cutoff value, with test results being designated as positive or negative depending on whether the resultant value is higher or lower than the cutoff.

However, such conversion causes a loss of information, as the resultant binary classification does not tell how much above or below the cutoff a value is. As a result, when converting a continuous value that is close to the cutoff to a binary one, the resultant positive or negative predictive value is generally higher than the predictive value given directly from the continuous value. In such cases, the designation of the test of being either positive or negative gives the appearance of an inappropriately high certainty, while the value is in fact in an interval of uncertainty. For example, with the urine concentration of hCG as a continuous value, a urine pregnancy test that measured 52 mIU/ml of hCG may show as "positive" with 50 mIU/ml as cutoff, but is in fact in an interval of uncertainty, which may be apparent only by knowing the original continuous value. On the other hand, a test result very far from the cutoff generally has a resultant positive or negative predictive value that is lower than the predictive value given from the continuous value. For example, a urine hCG value of 200,000 mIU/ml confers a very high probability of pregnancy, but conversion to binary values results in that it shows just as "positive" as the one of 52 mIU/ml.

See also

References

  1. ^ William Cullerne Bown (2024). "Sensitivity and Specificity versus Precision and Recall, and Related Dilemmas". Journal of Classification.
  2. ^ Zhang & Zakhor, Richard & Avideh (2014). "Automatic Identification of Window Regions on Indoor Point Clouds Using LiDAR and Cameras". VIP Lab Publications. CiteSeerX 10.1.1.649.303.
  3. ^ Y. Lu and C. Rasmussen (2012). "Simplified markov random fields for efficient semantic labeling of 3D point clouds" (PDF). IROS.

Bibliography

Read other articles:

Bronkus adalah kaliber jalan udara berupa dua percabangan utama dari trakea [1]pada sistem pernapasan yang membawa udara ke paru-paru. Tidak terdapat pertukaran udara yang terjadi pada bagian paru-paru ini. Bronkitis merupakan peradangan pada bronkus. Terdapat dua tipe utama, yakni akut dan kronik. Bronkitis akut biasanya disebabkan oleh infeksi virus atau bakteri. Galeri Referensi ^ Parker, Sybil, P (1984). McGraw-Hill Dictionary of Biology. McGraw-Hill Company.  Parameter ...

 

 

1393–1894 Korean government agency 37°34′20″N 126°58′29″E / 37.5723°N 126.9748°E / 37.5723; 126.9748 Bureau of InterpretersKorean nameHangul사역원Hanja司譯院TranscriptionsRevised RomanizationSayeok-wonMcCune–ReischauerSayŏgwŏn The Bureau of Interpreters or Sayŏgwŏn was an agency of the Joseon government of Korea from 1393 to 1894 responsible for training and supplying official interpreters. Textbooks for foreign languages produced by the burea...

 

 

Raharto Teno Prasetyo Wali Kota Pasuruan ke-17Masa jabatan21 September 2020 – 17 Februari 2021PresidenJoko WidodoGubernurKhofifah Indar ParawansaPendahuluSetiyonoPenggantiSaifullah YusufPelaksana Tugas Wali Kota PasuruanMasa jabatan8 Oktober 2018 – 21 September 2020PresidenJoko WidodoGubernurSoekarwo Khofifah Indar Parawansa Wakil Wali Kota Pasuruan ke-3Masa jabatan17 Februari 2016 – 8 Oktober 2018PresidenJoko WidodoGubernurSoekarwoPendahuluSetiyonoPen...

Device to measure hand baggage A series of baggage sizers at Alicante airport. A baggage sizer, also known as a bag sizer,[1] is a piece of furniture that is used primarily at airport check-in desks and boarding gates to assist and inform passengers and airport ground staff of baggage size limits for personal and cabin luggage or bags. It usually consists of a tubular metal or hard plastic frame with one or two holes in which bags are placed to see whether it fits the baggage size lim...

 

 

Secured forward military position Forward Operating Base Logar, Afghanistan Part of a series onWar History Prehistoric Ancient Post-classical Early modern Pike and shot napoleonic Late modern industrial fourth-gen Military Organization Command and control Defense ministry Army Navy Air force Marines Coast guard Space force Reserves Regular / Irregular Ranks Specialties: Staff Engineers Intelligence Reconnaissance Medical Military police Land units: Infantry Armor Cavalry Artillery Special for...

 

 

この項目には、一部のコンピュータや閲覧ソフトで表示できない文字が含まれています(詳細)。 数字の大字(だいじ)は、漢数字の一種。通常用いる単純な字形の漢数字(小字)の代わりに同じ音の別の漢字を用いるものである。 概要 壱万円日本銀行券(「壱」が大字) 弐千円日本銀行券(「弐」が大字) 漢数字には「一」「二」「三」と続く小字と、「壱」「�...

Хип-хоп Направление популярная музыка Истоки фанкдискоэлектронная музыкадабритм-энд-блюзреггидэнсхоллджаз[1]чтение нараспев[англ.]исполнение поэзииустная поэзияозначиваниедюжины[англ.]гриотыскэтразговорный блюз Время и место возникновения Начало 1970-х, Бронкс, Н...

 

 

此條目可能包含不适用或被曲解的引用资料,部分内容的准确性无法被证實。 (2023年1月5日)请协助校核其中的错误以改善这篇条目。详情请参见条目的讨论页。 各国相关 主題列表 索引 国内生产总值 石油储量 国防预算 武装部队(军事) 官方语言 人口統計 人口密度 生育率 出生率 死亡率 自杀率 谋杀率 失业率 储蓄率 识字率 出口额 进口额 煤产量 发电量 监禁率 死刑 国债 ...

 

 

سسترورتسك    علم شعار الاسم الرسمي (بالروسية: Сестрорецк)‏  الإحداثيات 60°06′00″N 29°58′00″E / 60.1°N 29.966666666667°E / 60.1; 29.966666666667   تاريخ التأسيس 1714  تقسيم إداري  البلد روسيا[3][1][2]  عدد السكان  عدد السكان 43055 (2020)[4]  معلومات أخرى منط...

Heavy casualties occurred when submarines sank large passenger ships converted into military transports, such as the Wilhelm Gustloff, that were overloaded with soldiers, prisoners, or refugees. While submarines were invented centuries ago, development of self-propelled torpedoes during the latter half of the 19th century dramatically increased the effectiveness of military submarines. Initial submarine scouting patrols against surface warships sank several cruisers during the first month of...

 

 

List of individuals with title of Infanta of Spain (Infanta de España) by birth or marriage since the reign of Carlos I, under whom the crowns of Castile and Aragon were united, forming the Kingdom of Spain. Individuals holding the title of Infanta are often styled Royal Highness (Alteza Real). Infantas of Spain by birth Picture Name Parent Born Died Marriage Notes María de Austria y Aviz Carlos I 1528 1603 Maximilian II, Holy Roman Emperor ​ ​(m. 1548; die...

 

 

American singer-songwriter (1917–1998) The Popcorn Song redirects here. For similarly titled songs, see Popcorn (disambiguation) § Songs. Cliffie StonePublicity photo of Cliffie Stone, circa 1952Background informationBirth nameClifford Gilpin SnyderBorn(1917-03-01)March 1, 1917Stockton, California, United StatesDiedJanuary 17, 1998(1998-01-17) (aged 80)Santa Clarita, California, United StatesGenresCountryOccupation(s)Singermusicianrecord producermusic publisherradio personalityte...

Mythological battles between the ancient Greeks and the Amazons Amazon with barbarian and Greek, Roman copy of Greek original, detail, c. 160 AD, marble; Galleria Borghese 4th century AD Amazonomachy mosaic from Daphne, a suburb of Antioch on the Orontes (modern Antakya, Turkey); Louvre, Denon Wing Relief now in Vienna In Greek mythology, an Amazonomachy (English translation: Amazon battle; plural, Amazonomachiai (Ancient Greek: Ἀμαζονομαχίαι) or Amazonomachies) is a mythologica...

 

 

Regeringen EdénSveriges regering Statschef Statschef Gustaf V Tidsperiod Tillträde 19 oktober 1917 Frånträde 10 mars 1920 Ministrar och partier Statsminister Nils Edén Regeringsparti(er) SocialdemokraternaLiberala samlingspartiet Status i parlamentet minoritetsregering (FK) majoritetsregering (AK) Oppositionsparti(er) Allmänna valmansförbundetSveriges socialdemokratiska vänsterpartiBondeförbundetJordbrukarnas Riksförbund Historik Val 19171920 Senaste valet 148 / 230 Mandatperiod(er...

 

 

Town in Zealand, DenmarkIdestrupTownIdestrup Church, FalsterIdestrupLocation on FalsterShow map of FalsterIdestrupIdestrup (Denmark Region Zealand)Show map of Denmark Region ZealandIdestrupIdestrup (Denmark)Show map of DenmarkCoordinates: 54°44′32″N 11°57′28″E / 54.74222°N 11.95778°E / 54.74222; 11.95778CountryDenmarkRegionZealand (Sjælland)MunicipalityGuldborgsundArea • Urban0.34 sq mi (0.88 km2)Population (2024) •&#...

Courcelles-EpayellescomuneCourcelles-Epayelles – Veduta LocalizzazioneStato Francia RegioneAlta Francia Dipartimento Oise ArrondissementClermont CantoneEstrées-Saint-Denis TerritorioCoordinate49°34′N 2°37′E49°34′N, 2°37′E (Courcelles-Epayelles) Altitudine84 e 109 m s.l.m. Superficie6,29 km² Abitanti192[1] (2009) Densità30,52 ab./km² Altre informazioniCod. postale60420 Fuso orarioUTC+1 Codice INSEE60168 CartografiaCourcelles-Epayelles Sito is...

 

 

Australian philosopher and writer Peter Godfrey-SmithPeter Godfrey-Smith reads from Other Minds at Adelaide Writers Week 2018Born1965 (age 58–59)NationalityAustralianEducationUniversity of SydneyUC San DiegoAwardsLakatos AwardInstitutionsUniversity of SydneyCUNY Graduate CenterHarvard UniversityAustralian National UniversityStanford UniversityThesisTeleonomy and the Philosophy of Mind (1991)Doctoral advisorPhilip KitcherLanguageEnglishMain interestsPhilosophy of biologyPhiloso...

 

 

Questa voce o sezione sull'argomento veicoli militari non cita le fonti necessarie o quelle presenti sono insufficienti. Puoi migliorare questa voce aggiungendo citazioni da fonti attendibili secondo le linee guida sull'uso delle fonti. Segui i suggerimenti del progetto di riferimento. Frot-LafflyDescrizioneTipocarro armato Equipaggio9 ProgettistaPaul Frot Costruttore Laffly Esemplari1 Dimensioni e pesoLunghezza7,0 m Larghezza2,0 m Altezza2,3 m Peso10 t Propulsione e tecnicaMotorebenzin...

For the online investment advisor, see Betterment (company). In real estate, betterment is the increased value of real property from causes other than investment made by the property owner.[1] It is, therefore, usually referred to as unearned increment or windfall gain. When, for instance, a property is rezoned for higher-value uses, or nearby public improvements raise the value of a piece of private land, a property owner is bettered due to the actions of others. Because of this, cap...

 

 

Cet article est une ébauche concernant le droit français. Vous pouvez partager vos connaissances en l’améliorant (comment ?) selon les recommandations des projets correspondants. Le tribunal supérieur d'appel est une juridiction qui a, en France, des compétences semblables à celles qu'ont les cours d'appel dans le reste du territoire, à savoir : statuer, en principe, sur tous les appels interjetés contre les décisions de justice rendues par la plupart des juridictions ju...