Confusion matrix

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as error matrix,[1] is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one; in unsupervised learning it is usually called a matching matrix.

Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature.[2] The diagonal of the matrix therefore represents all instances that are correctly predicted.[3] The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e. commonly mislabeling one as another).

It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).

Example

Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows:

Individual Number 1 2 3 4 5 6 7 8 9 10 11 12
Actual Classification 1 1 1 1 1 1 1 1 0 0 0 0

Assume that we have a classifier that distinguishes in some way between individuals with and without cancer. We can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (samples 1 and 2), and 1 person without cancer wrongly predicted to have cancer (sample 9).

Individual Number 1 2 3 4 5 6 7 8 9 10 11 12
Actual Classification 1 1 1 1 1 1 1 1 0 0 0 0
Predicted Classification 0 0 1 1 1 1 1 1 1 0 0 0

Notice that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column. First, if the actual classification is positive and the predicted classification is positive (1,1), this is called a true positive result because the positive sample was correctly identified by the classifier. Second, if the actual classification is positive and the predicted classification is negative (1,0), this is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. Third, if the actual classification is negative and the predicted classification is positive (0,1), this is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. Fourth, if the actual classification is negative and the predicted classification is negative (0,0), this is called a true negative result because the negative sample is correctly identified by the classifier.

We can then perform the comparison between actual and predicted classifications and add this information to the table, making correct results appear in green so they are more easily identifiable.

Individual Number 1 2 3 4 5 6 7 8 9 10 11 12
Actual Classification 1 1 1 1 1 1 1 1 0 0 0 0
Predicted Classification 0 0 1 1 1 1 1 1 1 0 0 0
Result FN FN TP TP TP TP TP TP FP TN TN TN
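
The labelling step above can be sketched in a few lines of Python. This is only an illustration using the 12 individuals from the example; the function name `outcome` and the variable names are assumptions introduced here, not part of any library.

```python
# Actual and predicted classes for the 12 individuals in the example
# (1 = cancer / positive, 0 = cancer-free / negative).
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def outcome(a, p):
    """Return 'TP', 'FN', 'FP' or 'TN' for one (actual, predicted) pair."""
    if a == 1:
        return "TP" if p == 1 else "FN"
    return "FP" if p == 1 else "TN"

results = [outcome(a, p) for a, p in zip(actual, predicted)]
print(" ".join(results))  # FN FN TP TP TP TP TP TP FP TN TN TN
```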

The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows:

Total population = P + N    Predicted positive (PP)    Predicted negative (PN)
Actual positive (P)         True positive (TP)         False negative (FN)
Actual negative (N)         False positive (FP)        True negative (TN)
Sources: [4][5][6][7][8][9][10]

The color convention of the three data tables above was picked to match this confusion matrix, in order to easily differentiate the data.

Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier:

Total: 8 + 4 = 12       Predicted cancer: 7    Predicted non-cancer: 5
Actual cancer: 8        6                      2
Actual non-cancer: 4    1                      3

In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located in the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as values outside the diagonal will represent them. By summing up the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. $P = TP + FN = 6 + 2 = 8$ and $N = FP + TN = 1 + 3 = 4$.
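
As a rough sketch, the tallying described above can be reproduced by counting (actual, predicted) pairs. The layout `[[TP, FN], [FP, TN]]` follows the template used in this article (rows are actual classes, columns are predicted classes); the variable names are illustrative assumptions.

```python
from collections import Counter

actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Count how often each (actual, predicted) combination occurs.
counts = Counter(zip(actual, predicted))
tp, fn = counts[(1, 1)], counts[(1, 0)]
fp, tn = counts[(0, 1)], counts[(0, 0)]

# Rows: actual cancer / non-cancer; columns: predicted cancer / non-cancer.
matrix = [[tp, fn],
          [fp, tn]]

p = tp + fn  # row sum: actual positives
n = fp + tn  # row sum: actual negatives
print(matrix, p, n)  # [[6, 2], [1, 3]] 8 4
```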

Table of confusion

In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.

For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer).
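
The figures in this example can be checked with a short calculation. The sketch below assumes the classifier labels all 100 samples as cancer, so TP = 95, FP = 5 and FN = TN = 0; the formulas are the standard count-based definitions of each metric.

```python
# Always-predict-cancer classifier on 95 cancer and 5 non-cancer samples.
tp, fn, fp, tn = 95, 0, 5, 0

accuracy     = (tp + tn) / (tp + fn + fp + tn)   # 0.95
sensitivity  = tp / (tp + fn)                    # 1.0 (cancer class)
specificity  = tn / (tn + fp)                    # 0.0 (non-cancer class)
f1           = 2 * tp / (2 * tp + fp + fn)       # 190/195, about 0.9744
informedness = sensitivity + specificity - 1     # 0.0

print(accuracy, sensitivity, specificity, round(f1, 4), informedness)
```

Accuracy and F1 both look strong here, while informedness is exactly 0, matching the point made above about guessing a single class.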

According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC).[11]

Other metrics can be derived from a confusion matrix, each with its own significance and use.

Sources: [12][13][14][15][16][17][18][19]

Total population = P + N    Predicted positive (PP)                             Predicted negative (PN)
Actual positive (P)[a]      True positive (TP), hit[b]                          False negative (FN), miss, underestimation
Actual negative (N)[d]      False positive (FP), false alarm, overestimation    True negative (TN), correct rejection[e]

Rates by actual condition:
True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power: $TPR = \frac{TP}{P} = 1 - FNR$
False negative rate (FNR), miss rate, type II error[c]: $FNR = \frac{FN}{P} = 1 - TPR$
False positive rate (FPR), probability of false alarm, fall-out, type I error[f]: $FPR = \frac{FP}{N} = 1 - TNR$
True negative rate (TNR), specificity (SPC), selectivity: $TNR = \frac{TN}{N} = 1 - FPR$

Rates by predicted condition:
Positive predictive value (PPV), precision: $PPV = \frac{TP}{PP} = 1 - FDR$
False discovery rate (FDR): $FDR = \frac{FP}{PP} = 1 - PPV$
False omission rate (FOR): $FOR = \frac{FN}{PN} = 1 - NPV$
Negative predictive value (NPV): $NPV = \frac{TN}{PN} = 1 - FOR$

Summary metrics:
Prevalence: $\frac{P}{P + N}$
Accuracy (ACC): $ACC = \frac{TP + TN}{P + N}$
Balanced accuracy (BA): $BA = \frac{TPR + TNR}{2}$
F1 score: $F_1 = \frac{2\,PPV \times TPR}{PPV + TPR} = \frac{2\,TP}{2\,TP + FP + FN}$
Informedness, bookmaker informedness (BM): $BM = TPR + TNR - 1$
Markedness (MK), deltaP (Δp): $MK = PPV + NPV - 1$
Prevalence threshold (PT): $PT = \frac{\sqrt{TPR \times FPR} - FPR}{TPR - FPR}$
Positive likelihood ratio (LR+): $LR^{+} = \frac{TPR}{FPR}$
Negative likelihood ratio (LR−): $LR^{-} = \frac{FNR}{TNR}$
Diagnostic odds ratio (DOR): $DOR = \frac{LR^{+}}{LR^{-}}$
Fowlkes–Mallows index (FM): $FM = \sqrt{PPV \times TPR}$
Matthews correlation coefficient (MCC): $MCC = \sqrt{TPR \times TNR \times PPV \times NPV} - \sqrt{FNR \times FPR \times FOR \times FDR}$
Threat score (TS), critical success index (CSI), Jaccard index: $TS = \frac{TP}{TP + FN + FP}$

  a. the number of real positive cases in the data
  b. A test result that correctly indicates the presence of a condition or characteristic
  c. Type II error: A test result which wrongly indicates that a particular condition or attribute is absent
  d. the number of real negative cases in the data
  e. A test result that correctly indicates the absence of a condition or characteristic
  f. Type I error: A test result which wrongly indicates that a particular condition or attribute is present
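
A minimal sketch of how several of these metrics could be computed from the four counts is given below, applied to the cancer example (TP = 6, FN = 2, FP = 1, TN = 3). The function name `binary_metrics` and the dictionary keys are assumptions made for illustration. MCC is computed in its rate-based form, the square root of TPR × TNR × PPV × NPV minus the square root of FNR × FPR × FOR × FDR, and compared against the widely used count-based form (TP × TN − FP × FN) divided by the square root of PP × P × N × PN, which is not listed in the table above.

```python
from math import sqrt

def binary_metrics(tp, fn, fp, tn):
    """Derive summary metrics from the four counts.

    Illustrative only: there is no guard against empty rows or columns,
    so a zero denominator raises ZeroDivisionError.
    """
    p, n = tp + fn, fp + tn        # actual positives / negatives
    pp, pn = tp + fp, fn + tn      # predicted positives / negatives
    tpr, tnr = tp / p, tn / n
    fnr, fpr = fn / p, fp / n
    ppv, npv = tp / pp, tn / pn
    fdr, fom = fp / pp, fn / pn    # fom = false omission rate ("for" is a keyword)
    return {
        "ACC": (tp + tn) / (p + n),
        "BA": (tpr + tnr) / 2,
        "F1": 2 * tp / (2 * tp + fp + fn),
        "BM": tpr + tnr - 1,
        "MK": ppv + npv - 1,
        "TS": tp / (tp + fn + fp),
        "MCC": sqrt(tpr * tnr * ppv * npv) - sqrt(fnr * fpr * fom * fdr),
    }

m = binary_metrics(tp=6, fn=2, fp=1, tn=3)

# Count-based MCC for the same matrix, used here as a cross-check.
mcc_from_counts = (6 * 3 - 1 * 2) / sqrt((6 + 1) * (6 + 2) * (3 + 1) * (3 + 2))

print(round(m["ACC"], 3), round(m["F1"], 3), round(m["MCC"], 3))  # 0.75 0.8 0.478
```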


Confusion matrices with more than two categories

The confusion matrices discussed above have only two conditions: positive and negative. A confusion matrix is not limited to binary classification, however, and can be used with multi-class classifiers as well. For example, the table below summarizes communication of a whistled language between two speakers, with zero values omitted for clarity.[20]

Vowel produced (rows) vs. perceived vowel (columns)
i e a o u
i 15 1
e 1 1
a 79 5
o 4 15 3
u 2 2


References

  1. Stehman, Stephen V. (1997). "Selecting and interpreting measures of thematic classification accuracy". Remote Sensing of Environment. 62 (1): 77–89. Bibcode:1997RSEnv..62...77S. doi:10.1016/S0034-4257(97)00083-7.
  2. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. S2CID 55767944.
  3. Opitz, Juri (2024). "A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice". Transactions of the Association for Computational Linguistics. 12: 820–836. arXiv:2404.16958. doi:10.1162/tacl_a_00675.
  4. Provost, Foster; Fawcett, Tom (2013). Data science for business: what you need to know about data mining and data-analytic thinking (1. ed., 2. release ed.). Beijing Köln: O'Reilly. ISBN 978-1-4493-6132-7.
  5. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. Bibcode:2006PaReL..27..861F. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090.
  6. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
  7. Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
  8. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
  9. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
  10. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003.
  11. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
  12. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090.
  13. Provost, Foster; Tom Fawcett (2013-08-01). "Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking". O'Reilly Media, Inc.
  14. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
  15. Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
  16. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
  17. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
  18. Chicco D, Toetsch N, Jurman G (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 13. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410.
  19. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003.
  20. Rialland, Annie (August 2005). "Phonological and phonetic aspects of whistled languages". Phonology. 22 (2): 237–271. CiteSeerX 10.1.1.484.4384. doi:10.1017/S0952675705000552. S2CID 18615779.

