Chinchilla (language model)

Chinchilla is a family of large language models (LLMs) developed by the research team at Google DeepMind, presented in March 2022.[1]

Models

It is named "chinchilla" because it is a further development over a previous model family named Gopher. Both model families were trained in order to investigate the scaling laws of large language models.[2]

It claimed to outperform GPT-3. It considerably simplifies downstream utilization because it requires much less computer power for inference and fine-tuning. Based on the training of previously employed language models, it has been determined that if one doubles the model size, one must also have twice the number of training tokens. This hypothesis has been used to train Chinchilla by DeepMind. Similar to Gopher in terms of cost, Chinchilla has 70B parameters and four times as much data.[3]

Chinchilla has an average accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, which is 7% higher than Gopher's performance. Chinchilla was still in the testing phase as of January 12, 2023.[4]

Chinchilla contributes to developing an effective training paradigm for large autoregressive language models with limited compute resources. The Chinchilla team recommends that the number of training tokens is twice for every model size doubling, meaning that using larger, higher-quality training datasets can lead to better results on downstream tasks.[5][6]

It has been used for the Flamingo vision-language model.[7]

Architecture

Both the Gopher family and Chinchilla family are families of transformer models.

In particular, they are essentially the same as GPT-2, with different sizes and minor modifications. Gopher family uses RMSNorm instead of LayerNorm; relative positional encoding rather than absolute positional encoding. The Chinchilla family is the same as the Gopher family, but trained with AdamW instead of Adam optimizer.

The Gopher family contains six models of increasing size, from 44 million parameters to 280 billion parameters. They refer to the largest one as "Gopher" by default. Similar naming conventions apply for the Chinchilla family.

Table 1 of [2] shows the entire Gopher family:

Model specifications for Gopher family
Parameter count Layers Number of heads Key/value size Internal dimension Max learning rate Batch size
44M 8 16 32 512 6 × 10−4 0.25M
117M 12 12 64 768 6 × 10−4 0.25M
417M 12 12 128 1,536 2 × 10−4 0.25M
1.4B 24 16 128 2,048 2 × 10−4 0.25M
7.1B 32 32 128 4,096 1.2 × 10−4 2M
Gopher 280B 80 128 128 16,384 4 × 10−5 3M → 6M

Table 4 of [1] compares the 70-billion-parameter Chinchilla with Gopher 280B.

Comparison between Chinchilla and Gopher
Parameter count Layers Number of heads Key/value size Internal dimension Max learning rate Batch size
Gopher 280B 80 128 128 16,384 4 × 10−5 3M → 6M
Chinchilla 70B 80 64 128 8,192 1 × 10−4 1.5M → 3M

See also

References

  1. ^ a b Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan (2022-03-29). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].
  2. ^ a b Rae, Jack W.; Borgeaud, Sebastian; Cai, Trevor; Millican, Katie; Hoffmann, Jordan; Song, Francis; Aslanides, John; Henderson, Sarah; Ring, Roman; Young, Susannah; Rutherford, Eliza; Hennigan, Tom; Menick, Jacob; Cassirer, Albin; Powell, Richard (2022-01-21). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv:2112.11446 [cs.CL].
  3. ^ Eliaçık, Eray (January 12, 2023). "Chinchilla AI is coming for the GPT-3's throne". Dataconomy. Archived from the original on March 26, 2023.
  4. ^ Hendrycks, Dan (2023-03-14), Measuring Massive Multitask Language Understanding, archived from the original on 2023-03-15, retrieved 2023-03-15
  5. ^ Chaithali, G. (April 9, 2022). "Check Out This DeepMind's New Language Model, Chinchilla (70B Parameters), Which Significantly Outperforms Gopher (280B) and GPT-3 (175B) on a Large Range of Downstream Evaluation Tasks". Archived from the original on March 27, 2023. Retrieved January 15, 2023.
  6. ^ Wali, Kartik (April 12, 2022). "DeepMind launches GPT-3 rival, Chinchilla". Analytics India Magazine. Archived from the original on March 26, 2023. Retrieved January 15, 2023.
  7. ^ Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katherine; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao (2022-12-06). "Flamingo: a Visual Language Model for Few-Shot Learning". Advances in Neural Information Processing Systems. 35: 23716–23736. arXiv:2204.14198.

Read other articles:

Laodike VMata uang logam Demetrios I (kemungkinan) dengan Laodike VPermaisuri MakedoniaBerkuasa178/177 SM – 168 SMTakhta178 atau 177 SMPendahuluPolycratiaPenerusaneksasi oleh RomawiPermaisuri Kekaisaran SeleukiaBerkuasa161 BC–150 SM(belum dikonfirmasi)PendahuluLaodike IVPenerusApama atau Cleopatra TheaInformasi pribadiKelahiranSeleukiaKematian150 SMDinastiSeleukiaAyahSeleukos IVIbuLaodike IVPasangan Perseus Demetrios I (kemungkinan) Anak Dengan Perseus: Alexander Filipus Andriscus (d...

 

L'accord parfait de fa dièse majeur se compose des notes suivantes : fa ♯, la ♯, do ♯ La tonalité de fa dièse majeur se développe en partant de la note tonique fa dièse. Elle est appelée F-sharp major en anglais et Fis-Dur dans l’Europe centrale. Acoustiquement, elle coïncide avec la tonalité de sol bémol majeur (une telle équivalence s'appelle enharmonie). L'armure coïncide avec celle de la tonalité relative ré dièse mineur. La lecture audio n'est pas prise en ...

 

Community area in Chicago Community area in Illinois, United StatesBeverlyCommunity areaCommunity Area 72 - Beverly HillsWelcome sign at the corner of 95th Street and Western AvenueLocation within ChicagoCoordinates: 41°42.6′N 87°40.8′W / 41.7100°N 87.6800°W / 41.7100; -87.6800CountryUnited StatesStateIllinoisCountyCookCityChicagoNeighborhoods List BeverlyEast BeverlyNorth BeverlyWest Beverly Area • Total3.19 sq mi (8.3 km2)Population...

Racing video game series This article is about the Formula One video game series by EA Sports. For the Formula One video game series by Psygnosis, see Formula One (video game series). For the 1993 video game developed by Lankhor, see F1 (video game). For Formula One video games, see Formula One video games. Video game seriesF1Genre(s)RacingDeveloper(s)CodemastersEA Redwood ShoresEA UKFeral InteractiveImage Space IncorporatedIntelligent GamesSumo DigitalTiertex Design StudiosVisual SciencePubl...

 

Irish recipient of the Victoria Cross Harry LysterBorn(1830-12-24)24 December 1830Blackrock, Dublin, IrelandDied1 February 1922 (aged 91)London, EnglandBuriedSt James the Less Churchyard, StubbingAllegiance United KingdomService/branchBengal Army British Indian ArmyBattles/warsIndian MutinySecond Anglo-Afghan War Battle of Ahmed Khel AwardsVictoria CrossCompanion of the Order of the BathMentioned in Despatches five timesRelationsHamilton Lyster Reed VC (nephew) Lieutenant General Ha...

 

Former ice hockey team of the World Hockey Association This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Minnesota Fighting Saints – news · newspapers · books · scholar · JSTOR (October 2021) (Learn how and when to remove this message) Minnesota Fighting SaintsCitySaint Paul, MinnesotaLeagueWorld Hockey Associ...

この項目には、一部のコンピュータや閲覧ソフトで表示できない文字が含まれています(詳細)。 数字の大字(だいじ)は、漢数字の一種。通常用いる単純な字形の漢数字(小字)の代わりに同じ音の別の漢字を用いるものである。 概要 壱万円日本銀行券(「壱」が大字) 弐千円日本銀行券(「弐」が大字) 漢数字には「一」「二」「三」と続く小字と、「壱」「�...

 

American comic book writer, editor, and publisher Joe PruettJoe Pruett in 2009NationalityAmericanArea(s)Writer, Editor, Publisher, LettererNotable worksNegative BurnKilroy Is Here Joe Pruett (January 8, 1966)[1] is an American comic book writer, editor, and publisher. Biography Pruett broke into the industry during the year of 1989 as Bob Burden's assistant on Flaming Carrot Comics, where he inked backgrounds, assisted on lettering, and transcribed scripts. He worked with Burden from ...

 

Yekaterina yang AgungЕкатерина ВеликаяVelikaya (yang Agung)Maharani dan Autokrat seluruh RusiaBerkuasa9 Juli 1762 – 17 November 1796(34 tahun, 131 hari)PendahuluPyotr IIIPenerusPavel IPermaisuri Kaisar RusiaPeriode5 Januari 1762 – 9 Juli 1762Informasi pribadiKelahiranSophie Friederike Auguste von Anhalt-Zerbst-Domburg(1729-05-02)2 Mei 1729Stettin, Pomerania, PrusiaKematian17 November 1796(1796-11-17) (umur 67)Istana Musim Dingin, Sankt-PeterburgPemakamanKate...

إقليم بوجدور   إقليم بوجدور تقسيم إداري البلد المغرب  [1] العاصمة بوجدور  التقسيم الأعلى جهة العيون الساقية الحمراء  خصائص جغرافية إحداثيات 26°08′00″N 14°30′00″W / 26.133333333333°N 14.5°W / 26.133333333333; -14.5   المساحة 43753 كيلومتر مربع  السكان التعداد السكاني 50556...

 

2003 film by Bille Woodruff HoneyTheatrical release posterDirected byBille WoodruffWritten by Alonzo Brown Kim Watson Produced by Marc Platt Andre Harrell Starring Jessica Alba Mekhi Phifer Joy Bryant Lil' Romeo CinematographyJohn R. LeonettiEdited by Mark Helfrich Emma E. Hickox Music byMervyn WarrenProductioncompanyNuAmerica EntertainmentDistributed byUniversal PicturesRelease date December 5, 2003 (2003-12-05) Running time94 minutesCountryUnited StatesLanguageEnglishBudget$1...

 

Questa voce o sezione sull'argomento storia contemporanea non cita le fonti necessarie o quelle presenti sono insufficienti. Puoi migliorare questa voce aggiungendo citazioni da fonti attendibili secondo le linee guida sull'uso delle fonti. Segui i suggerimenti del progetto di riferimento. Un operaio addetto a una macchina a vapore (foto di Lewis Hine, 1920) Con il termine questione operaia si designava nella seconda metà dell'800 nel contesto della seconda rivoluzione industriale e de...

سرينيفاسا فارادهان (بالتاميلية: சாத்தமங்கலம் ரங்க ஐயங்கார் ஸ்ரீனிவாச வரதன்)‏  معلومات شخصية الميلاد 2 يناير 1940(1940-01-02)تشيناي  الهند البريطانية مواطنة الهند الولايات المتحدة  عضو في الجمعية الملكية،  والأكاديمية النرويجية للعلوم والآداب، ...

 

Piala Interkontinental 1968 Estudiantes Manchester United 2 1 Leg pertama Estudiantes Manchester United 1 0 Tanggal25 September 1968StadionLa Bombonera, Buenos AiresWasitHugo Sosa Miranda (Paraguay)Penonton25,134Leg kedua Manchester United Estudiantes 1 1 Tanggal16 Oktober 1968StadionOld Trafford, ManchesterWasitKonstantin Zečević (Yugoslavia)Penonton63,428← 1967 1969 → Piala Interkontinental 1968 adalah pertandingan sepak bola dua leg pada September dan Oktober 1968 antara juar...

 

British police procedural TV series (1955–1976) Dixon of Dock GreenJack Warner as Constable George DixonCreated byTed WillisStarringJack WarnerCountry of originUnited KingdomNo. of series22No. of episodes432 (399 missing) (list of episodes)ProductionRunning time30 minutes & 50 minutesOriginal releaseNetworkBBC Television Service/BBC1Release9 July 1955 (1955-07-09) –1 May 1976 (1976-05-01) Dixon of Dock Green is a BBC police procedural television series about daily life ...

USA Women's 3x3 Team United StatesFIBA ranking7FIBA zoneFIBA AmericasNational federationUSA BasketballOlympic GamesAppearances1Medals Gold: (2020)World CupAppearances6Medals Gold: (2012, 2014, 2023) Bronze: (2016)Pan American GamesAppearances2Medals Gold: (2019, 2023)AmeriCupAppearances3Medals Gold: (2021, 2023) Bronze: (2022)Youth Olympic GamesAppearances3Medals Gold: (2014, 2018) Bronze: (2010) Medal record Olympic Games 2020 Tokyo Team World Cup 2012 Greece Team 2014 Russia Team 2023 Aust...

 

Fictitious name assumed for a particular purpose Aliases redirects here. For other uses, see Alias (disambiguation). Pseud. redirects here. For the columnn in Private Eye, see List of regular mini-sections in Private Eye § Pseuds Corner. Allonym redirects here. For a definition of the term allonym, see the Wiktionary entry allonym. William Sydney Porter, who went by the pen name O. Henry or Olivier Henry, in 1909 A pseudonym (/ˈsjuːdənɪm/; from Ancient Greek ψευδώνυ�...

 

Solar field, Kibbutz Elifaz, Israel Energy consumption by source, Israel Most energy in Israel comes from fossil fuels. The country's total primary energy demand is significantly higher than its total primary energy production, relying heavily on imports to meet its energy needs. Total primary energy consumption was 304 TWh (1.037 quad) in 2016, or 26.2 million tonne of oil equivalent.[1] Electricity consumption in Israel was 57,149 GWh in 2017, while production was 64...

Voce principale: Novara Calcio. Novara CalcioStagione 1987-1988Sport calcio Squadra Novara Allenatore Angelo Pereni poi Roberto Bacchin Presidente Franco Nicolazzi Serie C212º posto nel girone B. Maggiori presenzeCampionato: Testa (33) Miglior marcatoreCampionato: Mazzeo (7) 1986-1987 1988-1989 Si invita a seguire il modello di voce Questa voce raccoglie le informazioni riguardanti il Novara Calcio nelle competizioni ufficiali della stagione 1987-1988. Indice 1 Rosa 2 Risultati 2.1 Cam...

 

Flightradar24URL www.flightradar24.com 言語 英語タイプ 航空機レーダー追跡サイト運営者 Flightradar24 AB設立者  スウェーデン Svenska Resenätverket AB収益 広告、有料版アプリ、有料会員から営利性 あり登録 不要(一部の機能は有料会員登録が必要)開始 2009年 Flightradar24(フライトレーダー24)は、飛行中の民間航空機の現在位置をリアルタイム表示するウェブサイトならびにスマ�...