Discovery system (AI research)

A discovery system is an artificial intelligence system that attempts to discover new scientific concepts or laws. The aim of discovery systems is to automate scientific data analysis and the scientific discovery process. Ideally, an artificial intelligence system should be able to search systematically through the space of all possible hypotheses and yield the hypothesis - or set of equally likely hypotheses - that best describes the complex patterns in data.[1][2]

During the era known as the second AI summer (approximately 1978–1987), various systems akin to the era's dominant expert systems were developed to tackle the problem of extracting scientific hypotheses from data, with or without interacting with a human scientist. These systems included Autoclass,[3] Automated Mathematician,[4][5] and Eurisko,[6] which aimed at general-purpose hypothesis discovery, as well as more specialized systems such as Dalton, which uncovers molecular properties from data.

The dream of building systems that discover scientific hypotheses was pushed to the background with the second AI winter and the subsequent resurgence of subsymbolic methods such as neural networks. Subsymbolic methods emphasize prediction over explanation and yield models that work well but are difficult or impossible to explain, which has earned them the name black-box AI. A black-box model cannot be considered a scientific hypothesis, and this development has even led some researchers to suggest that the traditional aim of science - to uncover hypotheses and theories about the structure of reality - is obsolete.[7][8] Other researchers disagree and argue that subsymbolic methods are useful in many cases, just not for generating scientific theories.[9][10][11]

Discovery systems from the 1970s and 1980s

  • Autoclass was a Bayesian classification system written in 1986[3]
  • Automated Mathematician was one of the earliest successful discovery systems. It was written in 1977 and worked by generating and modifying small Lisp programs
  • Eurisko was a sequel to Automated Mathematician written in 1984
  • Dalton is a still-maintained program capable of calculating various molecular properties. It was initially launched in 1983 and has been available as open source since 2017
  • Glauber is a scientific discovery method, written in the context of computational philosophy of science, that was launched in 1983

Modern discovery systems (2009–present)

After a couple of decades with little activity in discovery systems, interest in using AI to uncover natural laws and scientific explanations was renewed by the work of Michael Schmidt, then a PhD student in Computational Biology at Cornell University. Schmidt and his advisor, Hod Lipson, invented Eureqa, which they described as a symbolic regression approach to "distilling free-form natural laws from experimental data".[12] This work effectively demonstrated that symbolic regression was a promising way forward for AI-driven scientific discovery.

Since 2009, symbolic regression has matured further, and today, various commercial and open-source systems are actively used in scientific research. Notable examples include Eureqa (now part of the DataRobot AI Cloud Platform), AI Feynman,[13] and QLattice.[14]
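
To illustrate the general idea behind symbolic regression, the following minimal Python sketch searches a small space of candidate formulas and returns the one that best fits a data set. It is a toy example under stated assumptions and does not reproduce the algorithm of Eureqa, AI Feynman, QLattice, or any other system named above: the hypothesis space (formulas of the form a*f(x) + b), the exhaustive search, and the function names candidates, mean_squared_error, and discover are all chosen here purely for illustration. Practical systems search far larger expression spaces with heuristic or evolutionary methods.

```python
import itertools
import math


def candidates():
    """Enumerate a tiny, hypothetical hypothesis space: a * f(x) + b."""
    basis = {
        "x": lambda x: x,
        "x**2": lambda x: x * x,
        "sin(x)": math.sin,
        "exp(x)": math.exp,
    }
    for (name, f), a, b in itertools.product(basis.items(), range(1, 4), range(0, 4)):
        # Bind f, a and b as defaults so each lambda keeps its own values.
        yield f"{a}*{name} + {b}", (lambda x, f=f, a=a, b=b: a * f(x) + b)


def mean_squared_error(func, data):
    """Average squared difference between predictions and observations."""
    return sum((func(x) - y) ** 2 for x, y in data) / len(data)


def discover(data):
    """Return the (formula, error) pair with the lowest error in the space."""
    scored = ((formula, mean_squared_error(func, data)) for formula, func in candidates())
    return min(scored, key=lambda pair: pair[1])


# Synthetic "experimental" data generated from the law y = 2*x**2 + 3.
data = [(x / 2, 2 * (x / 2) ** 2 + 3) for x in range(-6, 7)]

formula, error = discover(data)
print(formula, error)
```

Unlike a black-box model, the result is a readable formula (here, the generating law is recovered because it lies inside the search space), which is what allows the output to be treated as a candidate scientific hypothesis.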

References

  1. ^ Shen, Wei-Min (1990). "Functional transformations in AI discovery systems". Artificial Intelligence. 41 (3): 257–272. doi:10.1016/0004-3702(90)90045-2. S2CID 7219589.
  2. ^ Gil, Yolanda; Greaves, Mark; Hendler, James; Hirsh, Haym (2014-10-10). "Amplify scientific discovery with artificial intelligence". Science. 346 (6206): 171–172. Bibcode:2014Sci...346..171G. doi:10.1126/science.1259439. PMID 25301606. S2CID 206561353.
  3. ^ a b Cheeseman, Peter; Kelly, James; Self, Matthew; Stutz, John; Taylor, Will; Freeman, Don (1988-01-01). Laird, John (ed.). AutoClass: A Bayesian Classification System. San Francisco: Morgan Kaufmann. pp. 54–64. doi:10.1016/b978-0-934613-64-4.50011-6. ISBN 978-0-934613-64-4. Retrieved 2022-07-24.
  4. ^ Ritchie, G.D.; Hanna, F.K. (August 1984). "AM: A case study in AI methodology". Artificial Intelligence. 23 (3): 249–268. doi:10.1016/0004-3702(84)90015-8.
  5. ^ Lenat, Douglas Bruce (1976). Am: An artificial intelligence approach to discovery in mathematics as heuristic search (Thesis).
  6. ^ Henderson, Harry (2007). "The Automated Mathematician". Artificial Intelligence: Mirrors for the Mind. Milestones in Discovery and Invention. Infobase Publishing. pp. 93–94. ISBN 9781604130591.
  7. ^ Anderson, Chris. "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". Wired. Retrieved 2022-07-24.
  8. ^ Vutha, Amar (2 August 2018). "Could machine learning mean the end of understanding in science?". The Conversation. Retrieved 2022-07-24.
  9. ^ Canca, Cansu (2018-08-28). "Machine Learning as the Enemy of Science? Not Really". Bill of Health. Retrieved 2022-07-24.
  10. ^ Wilstrup, Casper Skern (2022-01-30). "Are we replacing science with an AI oracle?". Medium. Retrieved 2022-07-24.
  11. ^ Christiansen, Michael; Wilstrup, Casper; Hedley, Paula L. (2022-06-28). "Explainable "white-box" machine learning is the way forward in preeclampsia screening". American Journal of Obstetrics & Gynecology. 227 (5): 791. doi:10.1016/j.ajog.2022.06.057. PMID 35779588. S2CID 250160871.
  12. ^ Schmidt, Michael; Lipson, Hod (2009-04-03). "Distilling Free-Form Natural Laws from Experimental Data". Science. 324 (5923): 81–85. Bibcode:2009Sci...324...81S. doi:10.1126/science.1165893. PMID 19342586. S2CID 7366016.
  13. ^ Udrescu, Silviu-Marian; Tegmark, Max (2020-04-17). "AI Feynman: A physics-inspired method for symbolic regression". Science Advances. 6 (16): eaay2631. arXiv:1905.11481. Bibcode:2020SciA....6.2631U. doi:10.1126/sciadv.aay2631. PMC 7159912. PMID 32426452.
  14. ^ Broløs, Kevin René; Machado, Meera Vieira; Cave, Chris; Kasak, Jaan; Stentoft-Hansen, Valdemar; Batanero, Victor Galindo; Jelen, Tom; Wilstrup, Casper (2021-04-12). "An Approach to Symbolic Regression Using Feyn". arXiv:2104.05417 [cs.LG].