Schölkopf developed SVM methods that achieved what was then world-record performance on the MNIST pattern recognition benchmark.[2] With the introduction of kernel PCA, Schölkopf and coauthors argued that SVMs are a special case of a much larger class of methods, and that any algorithm that can be expressed in terms of dot products can be generalized to a nonlinear setting by means of what are known as reproducing kernels.[3][4][5] Another significant observation was that the data on which the kernel is defined need not be vectorial, as long as the kernel Gram matrix is positive definite.[3] Together, these two insights led to the foundation of the field of kernel methods, which encompasses SVMs and many other algorithms. Kernel methods are now textbook knowledge and one of the major machine learning paradigms in research and applications.
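The idea that a dot-product algorithm becomes nonlinear once the dot products are replaced by kernel evaluations can be illustrated with a minimal sketch of kernel PCA. This is an illustration under stated assumptions, not the authors' original implementation: the function names `rbf_kernel` and `kernel_pca`, the Gaussian kernel, the parameter `gamma`, and the random data are all choices made here for the example. The routine only ever touches the Gram matrix, so with a linear kernel it reduces to ordinary PCA, and with any other positive definite kernel it performs PCA in the corresponding feature space.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_pca(K, n_components):
    """Project the training points onto the leading principal components
    in feature space, using only the Gram matrix K."""
    n = K.shape[0]
    # Centre the data implicitly in feature space.
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Projection of point i onto component k is sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Hypothetical example data: 50 points in three dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Linear kernel: dot products only, equivalent to ordinary PCA.
Z_linear = kernel_pca(X @ X.T, n_components=2)

# Same algorithm, nonlinear kernel: only the Gram matrix changes.
Z_rbf = kernel_pca(rbf_kernel(X, X, gamma=0.5), n_components=2)
```

Because `kernel_pca` takes a Gram matrix rather than the data, the inputs need not be vectors at all; any positive definite similarity function on strings, graphs, or other objects could supply `K`.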
Starting in 2005, Schölkopf turned his attention to causal inference. Causal mechanisms in the world give rise to statistical dependencies as epiphenomena, but only the latter are exploited by popular machine learning algorithms. Knowledge about causal structures and mechanisms is useful because it lets us predict not only future data coming from the same source but also the effect of interventions in a system, and because it facilitates the transfer of detected regularities to new situations.[18]
Schölkopf and co-workers addressed (and in certain settings solved) the problem of causal discovery for the two-variable setting[19][20][21][22][23] and connected causality to Kolmogorov complexity.[24]
Around 2010, Schölkopf began to explore how to use causality for machine learning, exploiting assumptions of independence of mechanisms and invariance.[25] His early work on causal learning was presented to a wider machine learning audience in his Posner lecture[26] at NeurIPS 2011, as well as in a keynote talk at ICML 2017.[27]
He investigated how to exploit underlying causal structures in order to make machine learning methods more robust with respect to distribution shifts[18][28][29] and systematic errors,[30] the latter leading to the discovery of a number of new exoplanets[31] including K2-18b, which was subsequently found to have water vapour in its atmosphere, a first for an exoplanet in the habitable zone.
Education and employment
Schölkopf studied mathematics, physics, and philosophy in Tübingen and London. He was supported by the Studienstiftung and won the Lionel Cooper Memorial Prize for the best M.Sc. in Mathematics at the University of London.[32] He completed a Diplom in Physics, and then moved to Bell Labs in New Jersey, where he worked with Vladimir Vapnik, who became co-adviser of his PhD thesis at TU Berlin (with Stefan Jähnichen). His thesis, defended in 1997, won the annual award of the German Informatics Association.[33] In 2001, following positions in Berlin, Cambridge and New York, he founded the Department for Empirical Inference at the Max Planck Institute for Biological Cybernetics, which grew into a leading center for research in machine learning. In 2011, he became founding director at the Max Planck Institute for Intelligent Systems.[34][35]
With Alex Smola, Schölkopf co-founded the series of Machine Learning Summer Schools.[36] He also co-founded a Cambridge-Tübingen PhD Programme[37] and the Max Planck-ETH Center for Learning Systems.[38] In 2016, he co-founded the Cyber Valley research consortium.[39] He participated in the IEEE Global Initiative on "Ethically Aligned Design".[40]
Schölkopf is co-editor-in-chief of the Journal of Machine Learning Research, a journal he helped found as part of a mass resignation of the editorial board of the journal Machine Learning. He is among the world’s most cited computer scientists.[41] Alumni of his lab include Ulrike von Luxburg, Carl Rasmussen, Matthias Hein, Arthur Gretton, Gunnar Rätsch, Matthias Bethge, Stefanie Jegelka, Jason Weston, Olivier Bousquet, Olivier Chapelle, Joaquin Quinonero-Candela, and Sebastian Nowozin.[42]
As of late 2023, Schölkopf is also a scientific advisor to the French research group Kyutai, which is funded by Xavier Niel, Rodolphe Saadé, Eric Schmidt, and others.[43]
^B. Schölkopf, P. Simard, A. J. Smola, and V. Vapnik. Prior knowledge in support vector kernels. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems 10, pages 640–646, Cambridge, MA, USA, 1998. MIT Press
^O. Chapelle and B. Schölkopf. Incorporating invariances in nonlinear SVMs. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 609–616, Cambridge, MA, USA, 2002. MIT Press
^B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207–1245, 2000
^B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001
^A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf and A. Smola. A Kernel Method for the Two-Sample-Problem. Advances in Neural Information Processing Systems 19: 513–520, 2007
^A. J. Smola, A. Gretton, L. Song and B. Schölkopf. A Hilbert Space Embedding for Distributions. Algorithmic Learning Theory: 18th International Conference: 13–31, 2007
^B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf and G. Lanckriet. Hilbert Space Embeddings and Metrics on Probability Measures. Journal of Machine Learning Research, 11: 1517–1561, 2010
^A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf and A. J. Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13: 723–773, 2012
^S. Harmeling, M. Hirsch, and B. Schölkopf. On a link between kernel mean maps and Fraunhofer diffraction, with an application to super-resolution beyond the diffraction limit. In Computer Vision and Pattern Recognition (CVPR), pages 1083–1090. IEEE, 2013
^A. Gretton, R. Herbrich, A. J. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005
^A. Gretton, O. Bousquet, A. J. Smola and B. Schölkopf. Measuring Statistical Dependence with Hilbert-Schmidt Norms. Algorithmic Learning Theory: 16th International Conference, 2005
^A. Gretton, K. Fukumizu, C.H. Teo, L. Song, B. Schölkopf and A. J. Smola. A Kernel Statistical Test of Independence. Advances in Neural Information Processing Systems 20, 2007
^B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress
^P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 689–696, Red Hook, NY, USA, 2009. Curran
^D. Janzing, P. Hoyer, and B. Schölkopf. Telling cause from effect based on high-dimensional observations. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning, pages 479–486, Madison, WI, USA, 2010. International Machine Learning Society
^J.M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17(32):1–102, 2016
^J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014
^P. Daniusis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel, K. Zhang, and B. Schölkopf. Inferring deterministic causal relations. In P. Grünwald and P. Spirtes, editors, 26th Conference on Uncertainty in Artificial Intelligence, pages 143–150, Corvallis, OR, 2010. AUAI Press. Best student paper award
^Schölkopf, Bernhard; Janzing, Dominik; Peters, Jonas; Sgouritsa, Eleni; Zhang, Kun (27 June 2012). "On Causal and Anticausal Learning" (PDF). International Conference on Machine Learning.
^K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of JMLR Workshop and Conference Proceedings, pages 819–827, 2013
^D. Foreman-Mackey, B. T. Montet, D. W. Hogg, T. D. Morton, D. Wang, and B. Schölkopf. A systematic search for transiting planets in the K2 data. The Astrophysical Journal, 806(2), 2015