Kernel principal component analysis

In the field of multivariate statistics, kernel principal component analysis (kernel PCA)[1] is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are performed in a reproducing kernel Hilbert space.

Background: Linear PCA

Recall that conventional PCA operates on zero-centered data; that is,

,

where is one of the multivariate observations. It operates by diagonalizing the covariance matrix,

in other words, it gives an eigendecomposition of the covariance matrix:

which can be rewritten as

.[2]

(See also: Covariance matrix as a linear operator)

Introduction of the Kernel to PCA

To understand the utility of kernel PCA, particularly for clustering, observe that, while N points cannot, in general, be linearly separated in dimensions, they can almost always be linearly separated in dimensions. That is, given N points, , if we map them to an N-dimensional space with

where ,

it is easy to construct a hyperplane that divides the points into arbitrary clusters. Of course, this creates linearly independent vectors, so there is no covariance on which to perform eigendecomposition explicitly as we would in linear PCA.

Instead, in kernel PCA, a non-trivial, arbitrary function is 'chosen' that is never calculated explicitly, allowing the possibility to use very-high-dimensional 's if we never have to actually evaluate the data in that space. Since we generally try to avoid working in the -space, which we will call the 'feature space', we can create the N-by-N kernel

which represents the inner product space (see Gramian matrix) of the otherwise intractable feature space. The dual form that arises in the creation of a kernel allows us to mathematically formulate a version of PCA in which we never actually solve the eigenvectors and eigenvalues of the covariance matrix in the -space (see Kernel trick). The N-elements in each column of K represent the dot product of one point of the transformed data with respect to all the transformed points (N points). Some well-known kernels are shown in the example below.

Because we are never working directly in the feature space, the kernel-formulation of PCA is restricted in that it computes not the principal components themselves, but the projections of our data onto those components. To evaluate the projection from a point in the feature space onto the kth principal component (where superscript k means the component k, not powers of k)

We note that denotes dot product, which is simply the elements of the kernel . It seems all that's left is to calculate and normalize the , which can be done by solving the eigenvector equation

where is the number of data points in the set, and and are the eigenvalues and eigenvectors of . Then to normalize the eigenvectors , we require that

Care must be taken regarding the fact that, whether or not has zero-mean in its original space, it is not guaranteed to be centered in the feature space (which we never compute explicitly). Since centered data is required to perform an effective principal component analysis, we 'centralize' to become

where denotes a N-by-N matrix for which each element takes value . We use to perform the kernel PCA algorithm described above.

One caveat of kernel PCA should be illustrated here. In linear PCA, we can use the eigenvalues to rank the eigenvectors based on how much of the variation of the data is captured by each principal component. This is useful for data dimensionality reduction and it could also be applied to KPCA. However, in practice there are cases that all variations of the data are same. This is typically caused by a wrong choice of kernel scale.

Large datasets

In practice, a large data set leads to a large K, and storing K may become a problem. One way to deal with this is to perform clustering on the dataset, and populate the kernel with the means of those clusters. Since even this method may yield a relatively large K, it is common to compute only the top P eigenvalues and eigenvectors of the eigenvalues are calculated in this way.

Example

Input points before kernel PCA

Consider three concentric clouds of points (shown); we wish to use kernel PCA to identify these groups. The color of the points does not represent information involved in the algorithm, but only shows how the transformation relocates the data points.

First, consider the kernel

Applying this to kernel PCA yields the next image.

Output after kernel PCA with . The three groups are distinguishable using the first component only.

Now consider a Gaussian kernel:

That is, this kernel is a measure of closeness, equal to 1 when the points coincide and equal to 0 at infinity.

Output after kernel PCA, with a Gaussian kernel.

Note in particular that the first principal component is enough to distinguish the three different groups, which is impossible using only linear PCA, because linear PCA operates only in the given (in this case two-dimensional) space, in which these concentric point clouds are not linearly separable.

Applications

Kernel PCA has been demonstrated to be useful for novelty detection[3] and image de-noising.[4]

See also

References

  1. ^ Schölkopf, Bernhard; Smola, Alex; Müller, Klaus-Robert (1998). "Nonlinear Component Analysis as a Kernel Eigenvalue Problem". Neural Computation. 10 (5): 1299–1319. CiteSeerX 10.1.1.100.3636. doi:10.1162/089976698300017467. S2CID 6674407.
  2. ^ Scholkopf, Bernhard; Smola, Alexander; Müller, Klaus-Robert (December 1996). Nonlinear Component Analysis as a Kernel Eigenvalue Problem (PDF) (Technical report). Max-Planck-Institut für biologische Kybernetik. 44.
  3. ^ Hoffmann, Heiko (2007). "Kernel PCA for Novelty Detection". Pattern Recognition. 40 (3): 863–874. Bibcode:2007PatRe..40..863H. doi:10.1016/j.patcog.2006.07.009.
  4. ^ Kernel PCA and De-Noising in Feature Spaces. NIPS, 1999

Read other articles:

Alkaios dan Sappho, sosok merah Attika kalathos, skt. 470 SM, Staatliche Antikensammlungen (Inv. 2416) Alkaios dari Metilene (/ælˈsiːəs/; bahasa Yunani: Ἀλκαῖος ὁ Μυτιληναῖος; skt. 620 – abad ke-6 SM) merupakan seorang penyair lira merupakan seorang penyair lira dari pulau Lesbos di Yunani yang berjasa dengan menemukan Stanza Alkaik. Ia termasuk di dalam daftar kanonikal Penyair Sembilan Lira oleh sarjana-sarjana Periode Helenistik Iskandariyah. Ia merupa...

 

 

Famili GPCR adhesi manusia. Anggota didefinisikan oleh struktur hibrida yang tidak biasa dengan wilayah ekstraseluler yang panjang sering mengandung protein modul yang dikenal digabungkan ke daerah transmembran tujuh rentang melalui domain GPCR-Autoproteolsis INducing (GAIN) GPCR adhesi (adhesion G protein-protein-coupled receptor) adalah kelas 33 reseptor protein manusia dengan distribusi yang luas di sel embrio dan larva, sel-sel dari saluran reproduksi, neuron, leukosit, dan berbagai tumor...

 

 

Efforts to study asteroids that might impact Earth Plot of orbits of known potentially hazardous asteroids (size over 140 metres (460 ft) and passing within 7.6 million kilometres (4.7×10^6 mi) of Earth's orbit) as of early 2013 (alternative image). The term Spaceguard loosely refers to a number of efforts to discover, catalogue, and study near-Earth objects (NEO), especially those that may impact Earth (potentially hazardous objects). Asteroids are discovered by telescopes wh...

Chemical compound RomifidineClinical dataAHFS/Drugs.comInternational Drug NamesRoutes ofadministrationIVATCvet codeQN05CM93 (WHO) Legal statusLegal status Veterinary use only Identifiers IUPAC name N-(2-bromo-6-fluorophenyl)-4,5-dihydro-1H-imidazol-2-amine CAS Number65896-16-4 YPubChem CID71969ChemSpider64975 NUNII876351L05KCompTox Dashboard (EPA)DTXSID40216103 ECHA InfoCard100.158.065 Chemical and physical dataFormulaC9H9BrFN3Molar mass258.094 g·mol−13D model (JSm...

 

 

Cet article est une ébauche concernant une commune du Puy-de-Dôme. Vous pouvez partager vos connaissances en l’améliorant (comment ?). Le bandeau {{ébauche}} peut être enlevé et l’article évalué comme étant au stade « Bon début » quand il comporte assez de renseignements encyclopédiques concernant la commune. Si vous avez un doute, l’atelier de lecture du projet Communes de France est à votre disposition pour vous aider. Consultez également la page d’aid...

 

 

خبز جزائريتعديل - تعديل مصدري - تعديل ويكي بيانات تاريخ الخبز في الجزائر كانت الجزائر دولة مصدرة للقمح في قرون مضت لنتحول إلى دولة مستوردة له وبامتياز، فالجزائر في عهد الاحتلال الروماني كانت تلقب «بمطمورة روما» ونقل لنا مؤرخو ذلك الزمان أن الفلاحين كانوا يحصدون القمح مرتي�...

Pour les articles homonymes, voir Gonçalves. Maciel Gonçalves est un nom portugais ; le premier nom de famille (d'usage facultatif) est Maciel et le second est Gonçalves. José GonçalvesJosé Gonçalves lors du départ de la 1re étape des Quatre Jours de Dunkerque 2015InformationsNom dans la langue maternelle José Isidro Gonçalves MacielNom de naissance José Isidro Maciel GonçalvesNaissance 13 février 1989 (35 ans)BarcelosNationalité portugaiseSpécialité Coure...

 

 

Estonian military personnel (born 1972) Ilmar Tamm during a meeting in 2022 Ilmar Tamm (born on 4 May 1972, Tartu) is an Estonian military officer (Major General[1]).[2] In 1994, he graduated from the Finnish Military Academy.[3] 2008–2012, he was the head of Cooperative Cyber Defence Centre of Excellence. Since 2020, he was the head of Baltic Defence College.[3] In 2004, he was awarded by Order of the Cross of the Eagle, V Class.[2] References ^ Cont...

 

 

Pesta pada tahun 2010 di Uyo, Akwa Ibom Akwa Ibom merupakan sebuah negara bagian di Nigeria. Letaknya di bagian tenggara di negara itu. Ibu kotanya ialah Uyo. Negara bagian ini memiliki luas wilayah 7.081 km2. Dengan memiliki jumlah penduduk sebanyak 4.805.451 jiwa (2005). Pembagian administrasi Ikot-Ekpene Essien-Udim Abak Ukanafun Oruk-Anam Mbo Oron Uquo-Ibeno Eket Itu Uyo Etinan Ikot Abasi Ikono Ekpe-Atai Uruan Onna Nsit-Ubium Mkpat-Enin Ikono Ika Okobo Eastern-Obolo Esit-Eket Etim-Ekpo I...

Divisi Hubungan MasyarakatKepolisian Negara Republik IndonesiaLambang Divisi Humas PolriSingkatanDivhumas PolriIkhtisarDibentuk30 Oktober 1951 (1951-10-30)Struktur yurisdiksiLembaga nasionalIndonesiaWilayah hukumIndonesiaLembaga pemerintahKepolisian Republik IndonesiaPejabat eksekutifIrjen. Pol. Dr. Sandi Nugroho, S.IK, S.H, M.Hum., Kadiv HumasBrigjen. Pol. Trunoyudo Wisnu Andiko , S.I.K., KaropenmasBrigjen. Pol. Tjahyono Saputro, Karo PIDBrigjen. Pol. Gatot Repli Handoko, S.I.K., Karomu...

 

 

Ghalib ki Haveliغالب کی حویلی A bust of the Urdu poet, Mirza Ghalib, at the HaveliEstablished27 December 2000[1]LocationGali Qasim Jaan, BallimaranTypeMemorialKey holdingsHandwritten Poems of GhalibCollectionsBust of GhalibPublic transit accessChawri Bazaar Metro Station This article contains Urdu text. Without proper rendering support, you may see unjoined letters running left to right or other symbols instead of Urdu script. 28°39′16″N 77°13′32″E / ...

 

 

Housing estate in Tsuen Wan, Hong Kong Serenade Cove Serenade Cove (Chinese: 韻濤居) is a private housing estate in Tsuen Wan West, Tsuen Wan, New Territories, Hong Kong, near Belvedere Garden and Bayview Garden. It consists of 3 high-rise buildings developed by Wharf Holdings in 2001.[1][2][3] Politics Serenade Cove is located in Tsuen Wan West constituency of the Tsuen Wan District Council.[4] It was formerly represented by Angus Yick Shing-chung, who ...

莎拉·阿什頓-西里洛2023年8月,阿什頓-西里洛穿著軍服出生 (1977-07-09) 1977年7月9日(46歲) 美國佛羅里達州国籍 美國别名莎拉·阿什頓(Sarah Ashton)莎拉·西里洛(Sarah Cirillo)金髮女郎(Blonde)职业記者、活動家、政治活動家和候選人、軍醫活跃时期2020年—雇主內華達州共和黨候選人(2020年)《Political.tips》(2020年—)《LGBTQ國度》(2022年3月—2022年10月)烏克蘭媒�...

 

 

Portland Oregon TempleNumber42DedicationAugust 19, 1989, by Gordon B. HinckleySite7.3 acres (3.0 ha)Floor area80,500 sq ft (7,480 m2)Height181 ft (55 m)Official website • News & imagesChurch chronology ←Frankfurt Germany Temple Portland Oregon Temple →Las Vegas Nevada Temple Additional informationAnnouncedApril 7, 1984, by Spencer W. KimballGroundbreakingSeptember 20, 1986, by Gordon B. HinckleyOpen houseJune 15 July 15 – 8, 1989Designed byLeland...

 

 

American journalist and former editorial editor for the New York Times James BennetBennet at New America discussion in 2009BornJames Douglas Bennet (1966-03-28) March 28, 1966 (age 58)Boston, Massachusetts, USEducationYale University (BA)OccupationJournalistYears active1989–presentEmployersThe Washington Monthly (1989–1991)The New York Times (1991–2006; 2016–2020)The Atlantic (2006–2016)The Economist (2021–present)Spouse Sarah Jessup ​(m. 2001)...

Kipchak Turkic language of Central Asia Not to be confused with Fuyu Kyrgyz language. KyrgyzКыргыз тилиقىرعىز تىلى‎Kyrgyz written in Cyrillic and Perso-Arabic scriptsPronunciation[qɯɾʁɯzˈtʃɑ]Native toKyrgyzstan, Afghanistan, Tajikistan, Pakistan, ChinaRegionCentral AsiaEthnicityKyrgyzNative speakers5.15 million (2009 census)[1]Language familyTurkic Common TurkicKipchakKyrgyz–KipchakKyrgyzDialects Pamiri Kyrgyz Writing systemKyrgyz a...

 

 

History of Christianityin the British Isles General Anglican Communion Catholic Church in England and Wales Calendar of saints(Church of England) Religion in Ireland Religion in Scotland Celtic Christianity Religion in Wales Early Christianity in Roman Britain Legend of Christ in Britain Medieval Early Christian Ireland 400–800 Christianity in Medieval Scotland Anglo-Saxon Christianity Celtic Rite Hiberno-Scottish mission Early Modern Dissolution of the monasteries Welsh Bible Scottish Ref...

 

 

2005 studio album by MobyHotelStudio album by MobyReleasedMarch 14, 2005 (2005-03-14)Recorded2004StudioMoby's home studio, Electric Lady Studios, and Loho Studios in New York CityGenreAlternative rockelectronicadowntempotrip hopLength56:30 (disc one)67:51 (disc two)LabelMuteV2ProducerMobyMoby chronology 18 B Sides + DVD(2003) Hotel(2005) Go – The Very Best of Moby(2006) Moby studio albums chronology 18(2002) Hotel(2005) Last Night(2008) Alternate coverHotel: Ambient 2...

Place in Lower Carniola, SloveniaLoška GoraLoška GoraLocation in SloveniaCoordinates: 46°3′8.94″N 15°10′58.28″E / 46.0524833°N 15.1828556°E / 46.0524833; 15.1828556Country SloveniaTraditional regionLower CarniolaStatistical regionLower SavaMunicipalityRadečeArea • Total0.64 km2 (0.25 sq mi)Elevation318.3 m (1,044.3 ft)Population (2002) • Total5[1] Loška Gora (pronounced [ˈloːʃka ˈɡɔ...

 

 

オリンピックのロシア選手団 ロシアの国旗 IOCコード: RUS NOC: ロシアオリンピック委員会公式サイト 1996年アトランタオリンピック メダル国別順位: 2 位 金26 銀21 銅16 計0 夏季オリンピックロシア選手団 1900 • 1904 • 1908 • 1912 • 1920-1992 • 1996 • 2000 • 2004 • 2008 • 2012 • 2016 • 2020 冬季オリンピックロシア選手団 1994 • 199...