AIXI

AIXI ['ai̯k͡siː] is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000[1] and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence.[2]

AIXI is a reinforcement learning (RL) agent that maximizes the expected total reward received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how much reward that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs.
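The Occam weighting can be made concrete with a minimal Python sketch of a single decision step. It assumes a hypothetical, hand-written list of three candidate environments. Each has an assumed description length in bits and a reward it promises for each action. Real AIXI instead ranges over all programs of a universal Turing machine and sums rewards over the whole remaining lifetime, not just one step.

```python
from fractions import Fraction

# Hypothetical toy hypotheses (not from the source): each candidate environment
# is given by an assumed program length in bits and the reward it promises per action.
HYPOTHESES = [
    (3, {"left": 1, "right": 0}),
    (5, {"left": 0, "right": 1}),
    (6, {"left": 0, "right": 1}),
]

def prior(length_in_bits: int) -> Fraction:
    """Occam-style prior: each extra bit halves the weight, 2^-length."""
    return Fraction(1, 2 ** length_in_bits)

def expected_reward(action: str, hypotheses=HYPOTHESES) -> Fraction:
    """Prior-weighted average of the reward each hypothesis promises for `action`."""
    total_weight = sum(prior(length) for length, _ in hypotheses)
    weighted = sum(prior(length) * rewards[action] for length, rewards in hypotheses)
    return weighted / total_weight

def choose_action(actions, hypotheses=HYPOTHESES) -> str:
    """Pick the action with the highest weighted expected reward (one step only)."""
    return max(actions, key=lambda a: expected_reward(a, hypotheses))
```

Two of the three hypotheses favour "right". However, the shortest one (prior weight 2^-3) dominates the mixture, so the sketch picks "left" with expected reward 8/11.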

Definition

According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by $\xi$ (the Greek letter xi), or, for example, for AI "crossed" (X) with induction (I). There are other interpretations.[3]

AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment $\mu$. The interaction proceeds in time steps, from $t=1$ to $t=m$, where $m \in \mathbb{N}$ is the lifespan of the AIXI agent. At time step $t$, the agent chooses an action $a_t \in \mathcal{A}$ (e.g. a limb movement) and executes it in the environment. The environment responds with a "percept" $e_t \in \mathcal{E} = \mathcal{O} \times \mathbb{R}$, which consists of an "observation" $o_t \in \mathcal{O}$ (e.g., a camera image) and a reward $r_t \in \mathbb{R}$. The percept is distributed according to the conditional probability $\mu(o_t r_t \mid a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t)$, where $a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t$ is the "history" of actions, observations and rewards. The environment $\mu$ is thus mathematically represented as a probability distribution over percepts (observations and rewards) that depends on the full history, so there is no Markov assumption (unlike many other RL algorithms). This probability distribution is unknown to the AIXI agent. It is, however, computable: the observations and rewards received by the agent from the environment can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent.[4]

The only goal of the AIXI agent is to maximise $\sum_{t=1}^m r_t$, that is, the sum of rewards from time step 1 to $m$.

The AIXI agent is associated with a stochastic policy $\pi : (\mathcal{A} \times \mathcal{E})^* \rightarrow \mathcal{A}$, which is the function it uses to choose actions at every time step. Here $\mathcal{A}$ is the space of all possible actions that AIXI can take and $\mathcal{E}$ is the space of all possible percepts that can be produced by the environment. The environment (or probability distribution) $\mu$ can also be thought of as a stochastic policy (which is a function): $\mu : (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \rightarrow \mathcal{E}$, where $^*$ is the Kleene star operation.
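The agent-environment protocol can be sketched in Python under simplifying assumptions. Both the policy and the environment are functions of the full history, and the loop returns the quantity AIXI tries to maximise, $\sum_{t=1}^m r_t$. The environment `press_once_environment` and the type aliases are hypothetical. The sketch is also deterministic, whereas AIXI's $\mu$ may be stochastic.

```python
from typing import Callable, List, Tuple

Percept = Tuple[int, int]            # (observation o_t, reward r_t)
History = List[Tuple[str, Percept]]  # [(a_1, e_1), ..., (a_{t-1}, e_{t-1})]
Policy = Callable[[History], str]                # pi: history -> action
Environment = Callable[[History, str], Percept]  # mu: history x action -> percept

def press_once_environment(history: History, action: str) -> Percept:
    """Hypothetical deterministic environment: "press" pays reward 1 only if it
    was never pressed before, so the reward depends on the whole history."""
    pressed_before = any(a == "press" for a, _ in history)
    reward = 1 if action == "press" and not pressed_before else 0
    observation = 0  # constant: the observation alone does not reveal the state
    return observation, reward

def run(policy: Policy, environment: Environment, m: int) -> Tuple[History, int]:
    """Interact for t = 1..m and return the history and sum_{t=1}^m r_t."""
    history: History = []
    for _ in range(m):
        action = policy(history)
        percept = environment(history, action)
        history.append((action, percept))
    total_reward = sum(reward for _, (_, reward) in history)
    return history, total_reward
```

The observation is constant, so any agent that looked only at the latest percept could not tell whether "press" still pays. Only the full history reveals this.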

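In general, at time step $t$ (which ranges from 1 to $m$), AIXI has previously executed actions $a_1 \ldots a_{t-1}$ (often abbreviated in the literature as $a_{<t}$) and has observed the history of percepts $o_1 r_1 \ldots o_{t-1} r_{t-1}$ (which can be abbreviated as $e_{<t}$). It then chooses and executes in the environment the action $a_t$, defined as follows:[3]

$$a_t := \arg\max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q : U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}$$

or, using parentheses to disambiguate the precedences:

$$a_t := \arg\max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \left( \sum_{q : U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)} \right) \right) \right)$$
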
Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to $m - t$ time steps ahead (that is, from $t$ to $m$). It weighs each of them by the complexity of programs $q$ (that is, by $2^{-\textrm{length}(q)}$) that are consistent with the agent's past (that is, the previously executed actions, $a_{<t}$, and received percepts, $e_{<t}$) and that can generate that future. It then picks the action that maximises expected future rewards.[4]

The components of this definition are as follows.

$o_t r_t$ is the "percept" (which consists of the observation $o_t$ and reward $r_t$) received by the AIXI agent at time step $t$ from the environment (which is unknown and stochastic). Similarly, $o_m r_m$ is the percept received by AIXI at time step $m$ (the last time step where AIXI is active).

$r_t + \ldots + r_m$ is the sum of rewards from time step $t$ to time step $m$, so AIXI needs to look into the future to choose its action at time step $t$.

$U$ denotes a monotone universal Turing machine, and $q$ ranges over all (deterministic) programs on the universal machine $U$. The machine receives as input the program $q$ and the sequence of actions $a_1 \ldots a_m$ (that is, all actions), and produces the sequence of percepts $o_1 r_1 \ldots o_m r_m$. The universal Turing machine $U$ is thus used to "simulate" or compute the environment responses or percepts, given the program $q$ (which "models" the environment) and all actions of the AIXI agent. In this sense, the environment is "computable" (as stated above). In general, the program that "models" the actual environment in which AIXI acts is unknown, because that environment is itself unknown.

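$\textrm{length}(q)$ is the length of the program $q$ (which is encoded as a string of bits). Note that $2^{-\textrm{length}(q)} = \frac{1}{2^{\textrm{length}(q)}}$. Hence, in the definition above, $\sum_{q : U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}$ should be interpreted as a mixture (in this case, a sum) over all computable environments that are consistent with the agent's past, each weighted by its complexity $2^{-\textrm{length}(q)}$. Note that $a_1 \ldots a_m$ can also be written as $a_1 \ldots a_{t-1} a_t \ldots a_m$, and $a_1 \ldots a_{t-1} = a_{<t}$ is the sequence of actions already executed in the environment by the AIXI agent. Similarly, $o_1 r_1 \ldots o_m r_m = o_1 r_1 \ldots o_{t-1} r_{t-1} o_t r_t \ldots o_m r_m$, and $o_1 r_1 \ldots o_{t-1} r_{t-1}$ is the sequence of percepts produced by the environment so far.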
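The following Python sketch illustrates the weighting $2^{-\textrm{length}(q)}$ on hypothetical programs written as bit strings. It additionally assumes that the programs form a prefix-free set, in which case Kraft's inequality keeps the total weight at most 1. It also shows how discarding programs that are inconsistent with the history and renormalising yields posterior weights. The normalising constant depends only on the past, not on the candidate action $a_t$, so it does not change which action maximises the expression.

```python
from fractions import Fraction
from typing import Dict, Iterable

def is_prefix_free(programs: Iterable[str]) -> bool:
    """True if no program (bit string) is a proper prefix of another."""
    codes = list(programs)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def complexity_weight(program: str) -> Fraction:
    """Prior weight 2^-length(q) of a program given as a bit string."""
    return Fraction(1, 2 ** len(program))

def posterior(consistent: Dict[str, bool]) -> Dict[str, Fraction]:
    """Renormalise 2^-length(q) over programs marked consistent with the history.
    Inconsistent programs get weight 0 and are dropped; assumes at least one
    program is consistent."""
    kept = [q for q, ok in consistent.items() if ok]
    total = sum(complexity_weight(q) for q in kept)
    return {q: complexity_weight(q) / total for q in kept}

# Hypothetical prefix-free set of programs; their weights 1/2, 1/4, 1/8, 1/8 sum to 1.
PROGRAMS = ["0", "10", "110", "111"]
```

For example, if the observed history rules out the shortest program "0", the remaining weights 1/4, 1/8 and 1/8 are renormalised to 1/2, 1/4 and 1/4.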

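Putting these components together: at time step $t$, AIXI chooses the action $a_t$ at which the function $\sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q : U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}$ attains its maximum.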
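A brute-force Python sketch of this expectimax expression follows. It assumes a tiny, hypothetical class of two deterministic programs with assumed lengths of 2 and 4 bits. Real AIXI instead sums over all programs of a universal Turing machine, which is what makes it incomputable. The sketch alternates a max over actions with a sum over the percepts that consistent programs can produce, then weights $r_t + \ldots + r_m$ by the unnormalised mixture. Exact fractions keep the weights exact, and the cost grows exponentially with $m - t$.

```python
from fractions import Fraction
from typing import Callable, List, Sequence, Set, Tuple

Percept = Tuple[int, int]  # (observation o_k, reward r_k)
# Hypothetical program q: (length in bits, step function). The step function maps
# the action prefix a_1..a_k to the percept e_k, so running it over a_1..a_m
# plays the role of a monotone machine computing U(q, a_1..a_m) = e_1..e_m.
Program = Tuple[int, Callable[[Sequence[str]], Percept]]

def run_program(program: Program, actions: List[str]) -> List[Percept]:
    _, step = program
    return [step(actions[: k + 1]) for k in range(len(actions))]

def consistent(programs: List[Program], actions: List[str], percepts: List[Percept]) -> List[Program]:
    """Programs q with U(q, actions) = percepts."""
    return [q for q in programs if run_program(q, actions) == percepts]

def mixture_weight(programs, actions, percepts) -> Fraction:
    """Sum of 2^-length(q) over the consistent programs."""
    return sum((Fraction(1, 2 ** q[0]) for q in consistent(programs, actions, percepts)), Fraction(0))

def next_percepts(programs, actions, percepts, action) -> Set[Percept]:
    """Percepts some consistent program outputs next; all others have weight 0."""
    return {q[1](actions + [action]) for q in consistent(programs, actions, percepts)}

def value(programs, actions, percepts, reward_since_t, m, action_space) -> Fraction:
    """Alternate max over actions and sum over percepts up to step m, then weight
    [r_t + ... + r_m] by the (unnormalised) mixture over programs."""
    if len(actions) == m:
        return reward_since_t * mixture_weight(programs, actions, percepts)
    return max(
        sum((value(programs, actions + [a], percepts + [e], reward_since_t + e[1], m, action_space)
             for e in next_percepts(programs, actions, percepts, a)), Fraction(0))
        for a in action_space)

def aixi_action(programs, past_actions, past_percepts, m, action_space) -> str:
    """argmax over a_t of the expectimax expression, for a finite program class."""
    def score(a: str) -> Fraction:
        return sum((value(programs, past_actions + [a], past_percepts + [e], e[1], m, action_space)
                    for e in next_percepts(programs, past_actions, past_percepts, a)), Fraction(0))
    return max(action_space, key=score)

PROGRAMS: List[Program] = [
    (2, lambda acts: (0, 1 if acts[-1] == "a" else 0)),  # shorter: action "a" pays
    (4, lambda acts: (0, 1 if acts[-1] == "b" else 0)),  # longer: action "b" pays
]
```

With an empty history and lifetime m = 2, the shorter program dominates and the sketch chooses "a". Suppose instead the history records that "a" earned reward 0. Then the shorter program is inconsistent with the past and drops out of the sum, so the sketch switches to "b".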

Parameters

The parameters to AIXI are the universal Turing machine U and the agent's lifetime m, which need to be chosen. The latter parameter can be removed by the use of discounting.
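The Python sketch below shows why discounting can replace the lifetime $m$. It assumes geometric discounting with a factor gamma in (0, 1), which is one common choice; other discount sequences are possible. Under this assumption the discounted sum of an unbounded reward stream stays finite. If rewards lie in [0, r_max], everything after a horizon h contributes at most r_max·gamma^h/(1 − gamma). The function names are hypothetical.

```python
from typing import Sequence

def finite_horizon_return(rewards: Sequence[float], m: int) -> float:
    """Undiscounted sum r_1 + ... + r_m: requires choosing the lifetime m."""
    return sum(rewards[:m])

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Geometric discounting: sum_k gamma^k * r_{k+1}, assuming 0 < gamma < 1."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

def tail_bound(gamma: float, horizon: int, r_max: float = 1.0) -> float:
    """Upper bound on the discounted contribution of all rewards after
    `horizon` steps, assuming every reward lies in [0, r_max]."""
    return r_max * gamma ** horizon / (1 - gamma)
```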

Optimality

AIXI's performance is measured by the expected total reward it receives. AIXI has been proven to be optimal in the following ways.[2]

  • Pareto optimality: there is no other agent that performs at least as well as AIXI in all environments while performing strictly better in at least one environment.[citation needed]
  • Balanced Pareto optimality: like Pareto optimality, but considering a weighted sum of environments.
  • Self-optimizing: a policy p is called self-optimizing for an environment $\mu$ if the performance of p approaches the theoretical maximum for $\mu$ when the length of the agent's lifetime (not time) goes to infinity. For environment classes where self-optimizing policies exist, AIXI is self-optimizing.

It was later shown by Hutter and Jan Leike that balanced Pareto optimality is subjective and that any policy can be considered Pareto optimal, which they describe as undermining all previous optimality claims for AIXI.[5]

However, AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with. It also assumes the environment is computable.[6]

Computational aspects

Like Solomonoff induction, AIXI is incomputable. However, there are computable approximations of it. One such approximation is AIXItl, which performs at least as well as the provably best time t and space l limited agent.[2] Another approximation to AIXI with a restricted environment class is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man.[4][7]
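MC-AIXI's environment model is built on Context-Tree Weighting. The Python sketch below shows only plain binary CTW over a bit sequence with a fixed maximum context depth, using Krichevsky–Trofimov (KT) estimators at each node. It is not the factored, action-conditional FAC-CTW variant, and it omits the Monte Carlo planning half of MC-AIXI. It assumes the sequence is preceded by `depth` zero bits, which supply the initial context.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

def kt_probability(zeros: int, ones: int) -> Fraction:
    """Krichevsky-Trofimov block probability of a binary sequence with these
    counts, via the sequential rule P(next = b) = (n_b + 1/2) / (n + 1).
    The result does not depend on the order of the bits."""
    p, a, b = Fraction(1), 0, 0
    for _ in range(zeros):
        p *= Fraction(2 * a + 1, 2 * (a + b + 1))
        a += 1
    for _ in range(ones):
        p *= Fraction(2 * b + 1, 2 * (a + b + 1))
        b += 1
    return p

def context_counts(bits: str, depth: int) -> Dict[str, Tuple[int, int]]:
    """(zeros, ones) seen after each context s with len(s) <= depth. Contexts
    list the most recent bit first; the input is preceded by `depth` zeros."""
    padded = "0" * depth + bits
    counts: Dict[str, Tuple[int, int]] = {}
    for i in range(depth, len(padded)):
        history = padded[i - depth:i][::-1]  # most recent bit first
        bit = padded[i]
        for k in range(depth + 1):
            zeros, ones = counts.get(history[:k], (0, 0))
            counts[history[:k]] = (zeros + int(bit == "0"), ones + int(bit == "1"))
    return counts

def ctw_probability(bits: str, depth: int) -> Fraction:
    """Weighted probability at the root of a depth-limited context tree: leaves
    use the KT estimate; internal nodes average 'stop here' (KT) and 'split on
    one more context bit' with weight 1/2 each."""
    counts = context_counts(bits, depth)

    def weighted(s: str) -> Fraction:
        pe = kt_probability(*counts.get(s, (0, 0)))
        if len(s) == depth:
            return pe
        return (pe + weighted(s + "0") * weighted(s + "1")) / 2

    return weighted("")

def total_probability(n: int, depth: int) -> Fraction:
    """Sanity check: sum of CTW probabilities over all 2^n sequences of length n."""
    return sum((ctw_probability("".join(seq), depth) for seq in product("01", repeat=n)), Fraction(0))
```

With depth 0, CTW reduces to a single KT estimator. For any depth, the probabilities of all sequences of a given length sum to 1, as expected of a sequential predictor.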

References

  1. Hutter, Marcus (2000). "A Theory of Universal Artificial Intelligence based on Algorithmic Complexity". arXiv:cs.AI/0004001. Bibcode:2000cs........4001H.
  2. Hutter, Marcus (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Texts in Theoretical Computer Science an EATCS Series. Springer. doi:10.1007/b138233. ISBN 978-3-540-22139-5. S2CID 33352850.
  3. Hutter, Marcus. "Universal Artificial Intelligence". www.hutter1.net. Retrieved 2024-09-21.
  4. Veness, Joel; Ng, Kee Siong; Hutter, Marcus; Uther, William; Silver, David (2009). "A Monte Carlo AIXI Approximation". arXiv:0909.0801 [cs.AI].
  5. Leike, Jan; Hutter, Marcus (2015). "Bad Universal Priors and Notions of Optimality" (PDF). Proceedings of the 28th Conference on Learning Theory.
  6. Soares, Nate. "Formalizing Two Problems of Realistic World-Models" (PDF). Intelligence.org. Retrieved 2015-07-19.
  7. "Playing Pacman using AIXI Approximation". YouTube.
