Imitation learning

Imitation learning is a paradigm in reinforcement learning, where an agent learns to perform a task by supervised learning from expert demonstrations. It is also called learning from demonstration and apprenticeship learning.[1][2][3]

It has been applied to underactuated robotics,[4] self-driving cars,[5][6][7] quadcopter navigation,[8] helicopter aerobatics,[9] and locomotion.[10][11]

Approaches

Expert demonstrations are recordings of an expert performing the desired task, often collected as state-action pairs $(o_t, a_t)$.

Behavior Cloning

Behavior Cloning (BC) is the most basic form of imitation learning. Essentially, it uses supervised learning to train a policy $\pi_\theta$ such that, given an observation $o_t$, it outputs an action distribution $\pi_\theta(\cdot \mid o_t)$ that is approximately the same as the action distribution of the experts.[12]
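As a rough illustration of behavior cloning as supervised learning, the sketch below fits a linear-softmax policy to expert observation-action pairs by minimizing cross-entropy. The toy "lane offset" expert, the function names train_bc_policy and act, and the hyperparameters are assumptions made for this example and do not come from the cited sources.

```python
import numpy as np

def train_bc_policy(observations, actions, n_actions, lr=0.5, epochs=500):
    """Fit a linear-softmax policy pi(a | o) to expert (o, a) pairs
    by gradient descent on the mean cross-entropy loss."""
    n, d = observations.shape
    W = np.zeros((d, n_actions))
    b = np.zeros(n_actions)
    onehot = np.eye(n_actions)[actions]           # expert actions as one-hot targets
    for _ in range(epochs):
        logits = observations @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n               # gradient of the loss w.r.t. logits
        W -= lr * observations.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def act(W, b, obs):
    """Greedy action of the cloned policy for a single observation."""
    return int(np.argmax(obs @ W + b))

# Hypothetical expert: action 0 when the offset is positive, action 1 otherwise.
rng = np.random.default_rng(0)
obs = rng.uniform(-1.0, 1.0, size=(200, 1))
expert_actions = (obs[:, 0] < 0).astype(int)

W, b = train_bc_policy(obs, expert_actions, n_actions=2)
```

The cloned policy only sees observations drawn from the expert's own data, which is exactly the setting in which the distribution-shift problem described next can arise.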

BC is susceptible to distribution shift. Specifically, if the trained policy differs from the expert policy, it may stray from the expert trajectory into observations that would never have occurred in expert trajectories.[12]

This problem was already noted in the ALVINN project, in which a neural network was trained to drive a van from human demonstrations. The authors noticed that because a human driver never strays far from the path, the network would never be trained on what action to take if it found itself far from the path.[5]

DAgger

DAgger (Dataset Aggregation)[13] improves on behavior cloning by iteratively training on a dataset of expert demonstrations. In each iteration, the algorithm first collects data by rolling out the learned policy $\pi_\theta$. Then, it queries the expert for the optimal action on each observation encountered during the rollout. Finally, it aggregates the new data into the dataset and trains a new policy on the aggregated dataset.[12]
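The loop described above can be sketched on a toy one-dimensional problem. Everything concrete here is an assumption made for illustration: the tanh-shaped expert, the additive dynamics, the linear least-squares policy, and the function names are hypothetical and are not taken from the DAgger paper.

```python
import numpy as np

def expert_action(x):
    # Hypothetical expert: steer back toward the lane centre at x = 0.
    return -np.tanh(x)

def fit_policy(X, A):
    # Least-squares fit of a linear policy a = w * x + c.
    feats = np.column_stack([X, np.ones_like(X)])
    coef, *_ = np.linalg.lstsq(feats, A, rcond=None)
    return coef

def rollout(coef, rng, horizon=20):
    # Run the learned policy and record the states it actually visits.
    x = rng.normal(0.0, 1.0)
    visited = []
    for _ in range(horizon):
        visited.append(x)
        a = coef[0] * x + coef[1]
        x = x + a + rng.normal(0.0, 0.1)   # assumed toy dynamics
    return np.array(visited)

def dagger(n_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    # Initial demonstrations: the expert stays close to the centre.
    X = rng.normal(0.0, 0.1, size=10)
    A = expert_action(X)
    coef = fit_policy(X, A)
    for _ in range(n_iters):
        visited = rollout(coef, rng)        # 1. roll out the learned policy
        labels = expert_action(visited)     # 2. query the expert on visited states
        X = np.concatenate([X, visited])    # 3. aggregate into the dataset
        A = np.concatenate([A, labels])
        coef = fit_policy(X, A)             # 4. retrain on the aggregated dataset
    return coef, X, A

coef, X, A = dagger()
```

The point of the design is that the training set grows to include states the learner itself reaches, labeled with the expert's action, rather than only states the expert would visit.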

Decision transformer

Architecture diagram of the decision transformer.

The Decision Transformer approach models reinforcement learning as a sequence modelling problem.[14] Similar to Behavior Cloning, it trains a sequence model, such as a Transformer, that models rollout sequences $(R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_t, o_t, a_t)$, where $R_t = r_t + r_{t+1} + \dots + r_T$ is the sum of future reward in the rollout. During training time, the sequence model is trained to predict each action $a_t$, given the previous rollout as context: $(R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_t, o_t)$. During inference time, to use the sequence model as an effective controller, it is simply given a very high reward prediction $R$, and it would generalize by predicting an action that would result in the high reward. This was shown to scale predictably to a Transformer with 1 billion parameters that is superhuman on 41 Atari games.[15]
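To make the sequence format concrete, the following sketch computes the sum of future reward for each step of a recorded rollout and assembles the (return, observation, action) triples a sequence model would be trained on. The helper names and the string placeholders for observations and actions are assumptions for illustration; the sequence model itself is omitted.

```python
import numpy as np

def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T for every step t (undiscounted)."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]

def build_sequence(rewards, observations, actions):
    """Pair each step's return-to-go with its observation and action."""
    rtg = returns_to_go(rewards)
    return [(float(R), o, a) for R, o, a in zip(rtg, observations, actions)]

def next_target_return(target_return, reward):
    """At inference, the conditioning return shrinks by each reward received."""
    return target_return - reward

rewards = [1.0, 0.0, 2.0]
sequence = build_sequence(rewards, ["o1", "o2", "o3"], ["a1", "a2", "a3"])
# sequence == [(3.0, "o1", "a1"), (2.0, "o2", "a2"), (2.0, "o3", "a3")]
```

At inference time one would start from a chosen high target return in place of the recorded value and decrement it with next_target_return after each environment step.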

Other approaches

See [16][17] for more examples.

Inverse Reinforcement Learning (IRL) learns a reward function that explains the expert's behavior and then uses reinforcement learning to find a policy that maximizes this reward.[18]

Generative Adversarial Imitation Learning (GAIL) uses generative adversarial networks (GANs) to match the distribution of agent behavior to the distribution of expert demonstrations.[19] It extends a previous approach using game theory.[20][16]
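As a pointer to what "matching distributions" means in GAIL, the saddle-point objective given by Ho and Ermon[19] can be written as follows. Here $D$ is a discriminator over state-action pairs $(s, a)$, $\pi_E$ is the expert policy, $H(\pi)$ is the causal entropy of the policy, and $\lambda \ge 0$ is a regularization weight; the notation uses states $s$ where the sections above use observations $o$.

$$\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\left[\log D(s, a)\right] + \mathbb{E}_{\pi_E}\left[\log\left(1 - D(s, a)\right)\right] - \lambda H(\pi)$$

The discriminator is trained to tell agent samples from expert samples, while the policy is trained to make its samples indistinguishable from the expert's.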

Further reading

  • Hussein, Ahmed; Gaber, Mohamed Medhat; Elyan, Eyad; Jayne, Chrisina (2018-03-31). "Imitation Learning: A Survey of Learning Methods". ACM Computing Surveys. 50 (2): 1–35. doi:10.1145/3054912. hdl:10059/2298. ISSN 0360-0300.

References

  1. Russell, Stuart J.; Norvig, Peter (2021). "22.6 Apprenticeship and Inverse Reinforcement Learning". Artificial intelligence: a modern approach. Pearson series in artificial intelligence (Fourth ed.). Hoboken: Pearson. ISBN 978-0-13-461099-3.
  2. Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement learning: an introduction. Adaptive computation and machine learning series (Second ed.). Cambridge, Massachusetts: The MIT Press. p. 470. ISBN 978-0-262-03924-6.
  3. Hussein, Ahmed; Gaber, Mohamed Medhat; Elyan, Eyad; Jayne, Chrisina (2017-04-06). "Imitation Learning: A Survey of Learning Methods". ACM Comput. Surv. 50 (2): 21:1–21:35. doi:10.1145/3054912. hdl:10059/2298. ISSN 0360-0300.
  4. "Ch. 21 - Imitation Learning". underactuated.mit.edu. Retrieved 2024-08-08.
  5. Pomerleau, Dean A. (1988). "ALVINN: An Autonomous Land Vehicle in a Neural Network". Advances in Neural Information Processing Systems. 1. Morgan-Kaufmann.
  6. Bojarski, Mariusz; Del Testa, Davide; Dworakowski, Daniel; Firner, Bernhard; Flepp, Beat; Goyal, Prasoon; Jackel, Lawrence D.; Monfort, Mathew; Muller, Urs (2016-04-25). "End to End Learning for Self-Driving Cars". arXiv:1604.07316v1 [cs.CV].
  7. Kiran, B Ravi; Sobh, Ibrahim; Talpaert, Victor; Mannion, Patrick; Sallab, Ahmad A. Al; Yogamani, Senthil; Perez, Patrick (June 2022). "Deep Reinforcement Learning for Autonomous Driving: A Survey". IEEE Transactions on Intelligent Transportation Systems. 23 (6): 4909–4926. arXiv:2002.00444. doi:10.1109/TITS.2021.3054625. ISSN 1524-9050.
  8. Giusti, Alessandro; Guzzi, Jerome; Ciresan, Dan C.; He, Fang-Lin; Rodriguez, Juan P.; Fontana, Flavio; Faessler, Matthias; Forster, Christian; Schmidhuber, Jurgen; Caro, Gianni Di; Scaramuzza, Davide; Gambardella, Luca M. (July 2016). "A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots". IEEE Robotics and Automation Letters. 1 (2): 661–667. doi:10.1109/LRA.2015.2509024. ISSN 2377-3766.
  9. "Autonomous Helicopter: Stanford University AI Lab". heli.stanford.edu. Retrieved 2024-08-08.
  10. Nakanishi, Jun; Morimoto, Jun; Endo, Gen; Cheng, Gordon; Schaal, Stefan; Kawato, Mitsuo (June 2004). "Learning from demonstration and adaptation of biped locomotion". Robotics and Autonomous Systems. 47 (2–3): 79–91. doi:10.1016/j.robot.2004.03.003.
  11. Kalakrishnan, Mrinal; Buchli, Jonas; Pastor, Peter; Schaal, Stefan (October 2009). "Learning locomotion over rough terrain using terrain templates". 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE. pp. 167–172. doi:10.1109/iros.2009.5354701. ISBN 978-1-4244-3803-7.
  12. CS 285 at UC Berkeley: Deep Reinforcement Learning. Lecture 2: Supervised Learning of Behaviors.
  13. Ross, Stephane; Gordon, Geoffrey; Bagnell, Drew (2011-06-14). "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning". Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings: 627–635.
  14. Chen, Lili; Lu, Kevin; Rajeswaran, Aravind; Lee, Kimin; Grover, Aditya; Laskin, Misha; Abbeel, Pieter; Srinivas, Aravind; Mordatch, Igor (2021). "Decision Transformer: Reinforcement Learning via Sequence Modeling". Advances in Neural Information Processing Systems. 34. Curran Associates, Inc.: 15084–15097. arXiv:2106.01345.
  15. Lee, Kuang-Huei; Nachum, Ofir; Yang, Mengjiao; Lee, Lisa; Freeman, Daniel; Xu, Winnie; Guadarrama, Sergio; Fischer, Ian; Jang, Eric (2022-10-15). "Multi-Game Decision Transformers". arXiv:2205.15241. Retrieved 2024-10-22.
  16. Hester, Todd; Vecerik, Matej; Pietquin, Olivier; Lanctot, Marc; Schaul, Tom; Piot, Bilal; Horgan, Dan; Quan, John; Sendonaris, Andrew (2017-04-12). "Deep Q-learning from Demonstrations". arXiv:1704.03732v4 [cs.AI].
  17. Duan, Yan; Andrychowicz, Marcin; Stadie, Bradly; Ho, Jonathan; Schneider, Jonas; Sutskever, Ilya; Abbeel, Pieter; Zaremba, Wojciech (2017). "One-Shot Imitation Learning". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  18. Ng, Andrew Y.; Russell, Stuart (2000). "Algorithms for Inverse Reinforcement Learning". Proceedings of the 17th International Conference on Machine Learning: 663–670.
  19. Ho, Jonathan; Ermon, Stefano (2016). "Generative Adversarial Imitation Learning". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc. arXiv:1606.03476.
  20. Syed, Umar; Schapire, Robert E (2007). "A Game-Theoretic Approach to Apprenticeship Learning". Advances in Neural Information Processing Systems. 20. Curran Associates, Inc.
