Co-training

Co-training is a machine learning algorithm used when there are only small amounts of labeled data and large amounts of unlabeled data. One of its uses is in text mining for search engines. It was introduced by Avrim Blum and Tom Mitchell in 1998.

Algorithm design

Co-training is a semi-supervised learning technique that requires two views of the data. It assumes that each example is described using two different sets of features that provide complementary information about the instance. Ideally, the two views are conditionally independent (i.e., the two feature sets of each instance are conditionally independent given the class) and each view is sufficient (i.e., the class of an instance can be accurately predicted from each view alone). Co-training first learns a separate classifier for each view using any labeled examples. The most confident predictions of each classifier on the unlabeled data are then used to iteratively construct additional labeled training data.[1]
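The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure from the original paper: the names `CentroidClassifier` and `co_train` and the parameters `rounds` and `per_round` are hypothetical, a toy nearest-centroid model stands in for "any classifier", and confidence is taken to be the distance margin between the two closest class centroids. The original paper's details of how candidates are drawn and balanced are omitted.

```python
from math import dist


class CentroidClassifier:
    """Toy nearest-centroid model standing in for any per-view classifier."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        # Confidence (an assumption of this sketch) is the gap between the
        # distances to the nearest and second-nearest class centroids.
        ranked = sorted((dist(x, c), label) for label, c in self.centroids.items())
        best_distance, best_label = ranked[0]
        margin = ranked[1][0] - best_distance if len(ranked) > 1 else 0.0
        return best_label, margin


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    clfs = (CentroidClassifier(), CentroidClassifier())

    def fit_all():
        # Train one classifier per view on the current labeled set.
        for v, clf in enumerate(clfs):
            clf.fit([views[v] for views, _ in labeled], [lab for _, lab in labeled])

    for _ in range(rounds):
        fit_all()
        if not pool:
            break
        chosen = {}  # pool index -> label assigned by a confident classifier
        for v, clf in enumerate(clfs):
            scored = sorted(
                ((clf.predict_with_confidence(views[v]), i) for i, views in enumerate(pool)),
                key=lambda item: item[0][1],
                reverse=True,
            )
            for (label, _), i in scored[:per_round]:
                # If both views nominate the same example, keep the first label.
                chosen.setdefault(i, label)
        # Move the newly labeled examples from the pool to the training set.
        labeled.extend((pool[i], lab) for i, lab in chosen.items())
        pool = [views for i, views in enumerate(pool) if i not in chosen]

    fit_all()  # final retraining on everything labeled so far
    return clfs, labeled


if __name__ == "__main__":
    seed = [(((0.0,), (0.0,)), 0), (((10.0,), (10.0,)), 1)]
    pool = [((1.0,), (0.5,)), ((9.0,), (9.5,)), ((2.0,), (1.5,)), ((8.0,), (8.5,))]
    classifiers, grown = co_train(seed, pool)
    print(len(grown), "labeled examples after co-training")
```

Each example carries two feature vectors, one per view. Because every round retrains both classifiers on the shared labeled set, a confident prediction made from one view becomes training data for the other.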

The original co-training paper described experiments in which co-training was used to classify web pages as "academic course home page" or not; the classifier correctly categorized 95% of 788 web pages using only 12 labeled web pages as examples.[2] The paper has been cited over 1,000 times and received the 10-Year Best Paper Award at the 25th International Conference on Machine Learning (ICML 2008), a renowned computer science conference.[3][4]

Krogel and Scheffer showed in 2004 that co-training is only beneficial if the data sets are independent; that is, if one of the classifiers correctly labels a data point that the other classifier previously misclassified. If the classifiers agree on all unlabeled data (i.e., they are dependent), labeling the data creates no new information. In an experiment where the dependence of the classifiers was greater than 60%, results worsened.[5]
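A rough way to see whether two view classifiers are telling each other anything new is to measure how often they agree on the unlabeled data. The sketch below is only an illustrative proxy: the function name `agreement_rate` is hypothetical, and this is not necessarily the dependence measure used by Krogel and Scheffer.

```python
def agreement_rate(preds_a, preds_b):
    """Fraction of examples on which two classifiers predict the same label."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must not be empty")
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)
```

A rate of 1.0 means the classifiers agree on every example, the case in which exchanging labels adds no new information.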

Uses

Co-training has been used to classify web pages, with the text on the page as one view and the anchor text of hyperlinks on other pages that point to the page as the other view. Simply put, the text in a hyperlink on one page can give information about the page it links to.[2] Co-training can work on "unlabeled" text that has not already been classified or tagged, which is typical of the text appearing on web pages and in emails. According to Tom Mitchell, "The features that describe a page are the words on the page and the links that point to that page. The co-training models utilize both classifiers to determine the likelihood that a page will contain data relevant to the search criteria." The classifier based on page text can be used to judge the relevance of the classifier based on links, hence the term "co-training". Mitchell claims that other search algorithms are 86% accurate, whereas co-training is 96% accurate.[6]

Co-training was used on FlipDog.com, a job search site, and by the U.S. Department of Labor, for a directory of continuing and distance education.[6] It has been used in many other applications, including statistical parsing and visual detection.[7]

References

  1. ^ Blum, A.; Mitchell, T. (1998). "Combining labeled and unlabeled data with co-training". COLT: Proceedings of the Workshop on Computational Learning Theory. Morgan Kaufmann. pp. 92–100.
  2. ^ a b Committee on the Fundamentals of Computer Science: Challenges and Opportunities, National Research Council (2004). "6: Achieving Intelligence". Computer Science: Reflections on the Field, Reflections from the Field. The National Academies Press. ISBN 978-0-309-09301-9.
  3. ^ McCallum, Andrew (2008). "Best Papers Awards". ICML Awards. Retrieved 2009-05-03.
  4. ^ Shavlik, Jude (2008). "10 Year Best Paper: Combining labeled and unlabeled data with co-training". ICML Awards. Retrieved 2009-05-03.
  5. ^ Krogel, Marc-A; Tobias Scheffer (2004). "Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics" (PDF). Machine Learning. 57: 61–81. doi:10.1023/B:MACH.0000035472.73496.0c.
  6. ^ a b Aquino, Stephen (24 April 2001). "Search Engines Ready to Learn". Technology Review. Retrieved 2009-05-03.
  7. ^ Xu, Qian; Derek Hao Hu; Hong Xue; Weichuan Yu; Qiang Yang (2009). "Semi-supervised protein subcellular localization". BMC Bioinformatics. 10 (Suppl 1): S47. doi:10.1186/1471-2105-10-S1-S47. ISSN 1471-2105. PMC 2648770. PMID 19208149.