GPT-1

Generative Pre-trained Transformer 1 (GPT-1)
Original author(s): OpenAI
Initial release: June 2018
Successor: GPT-2
License: MIT[1]
Website: openai.com/blog/language-unsupervised/
Figure: Original GPT architecture

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017.[2] In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training",[3] in which they introduced that initial model along with the general concept of a generative pre-trained transformer.[4]

Up to that point, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use of datasets that were not well annotated and made it prohibitively expensive and time-consuming to train extremely large models.[3][5] Many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models because too little text is available for corpus-building.[5] In contrast, GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage, in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage, in which these parameters were adapted to a target task.[3]

The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT models with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".[3]

Reason for choosing BookCorpus

BookCorpus was chosen as a training dataset partly because its long passages of continuous text helped the model learn to handle long-range information.[6] It contained over 7,000 unpublished fiction books from various genres. Other datasets available at the time, while larger, lacked this long-range structure (being "shuffled" at a sentence level).[3]

The BookCorpus text was cleaned with the ftfy library to standardize punctuation and whitespace, and then tokenized with spaCy.[3]
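The clean-then-tokenize step can be illustrated with a minimal sketch. This is a standard-library stand-in, not the ftfy or spaCy API: the function names `clean` and `tokenize`, the punctuation map, and the regular expressions are all assumptions made for illustration, and the real libraries handle many more cases.

```python
import re
import unicodedata

# Hypothetical mapping of "fancy" punctuation to plain ASCII equivalents.
# The actual ftfy library repairs far more than this.
_PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def clean(text):
    """Standardize punctuation and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_PUNCT_MAP)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split into word tokens and single punctuation marks (a crude stand-in for spaCy)."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    raw = "\u201cHello\u201d  \n world\u2019s"
    print(tokenize(clean(raw)))
```

The two stages are kept as separate functions so that either could be swapped for the real library call without touching the other.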

Architecture

The GPT-1 architecture was a twelve-layer decoder-only transformer, using twelve masked self-attention heads, with 64-dimensional states each (for a total of 768). Rather than simple stochastic gradient descent, the Adam optimization algorithm was used; the learning rate was increased linearly from zero over the first 2,000 updates to a maximum of 2.5×10⁻⁴, and annealed to 0 using a cosine schedule.[3] GPT-1 has 117 million parameters.[4]
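The arithmetic behind these figures can be checked with a short sketch. The layer count, head count, head dimension, warmup length and peak learning rate come from the paragraph above. Everything else is an assumption for illustration: the total number of updates is left as a parameter, and the vocabulary size (40,478), context length (512) and feed-forward width (4 × 768), along with the choice to count biases, layer norms and a tied output embedding, are not stated in this article.

```python
import math

# Values stated in the article.
N_LAYERS = 12
N_HEADS = 12
HEAD_DIM = 64
D_MODEL = N_HEADS * HEAD_DIM   # 12 * 64 = 768
MAX_LR = 2.5e-4
WARMUP_STEPS = 2000

# Assumptions NOT stated in the article (illustrative values).
VOCAB_SIZE = 40478
CONTEXT_LEN = 512
D_FF = 4 * D_MODEL


def learning_rate(step, total_steps):
    """Linear warmup from 0 to MAX_LR, then cosine annealing down to 0."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    span = max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / span)
    return 0.5 * MAX_LR * (1.0 + math.cos(math.pi * progress))


def approx_parameter_count():
    """Rough count under the assumptions above (output layer tied to embeddings)."""
    embeddings = VOCAB_SIZE * D_MODEL + CONTEXT_LEN * D_MODEL
    attention = (D_MODEL * 3 * D_MODEL + 3 * D_MODEL) + (D_MODEL * D_MODEL + D_MODEL)
    feed_forward = (D_MODEL * D_FF + D_FF) + (D_FF * D_MODEL + D_MODEL)
    layer_norms = 2 * 2 * D_MODEL
    return embeddings + N_LAYERS * (attention + feed_forward + layer_norms)


if __name__ == "__main__":
    total = 100_000  # hypothetical total number of updates
    for step in (0, 1000, 2000, 50_000, 100_000):
        print(step, learning_rate(step, total))
    print(approx_parameter_count())
```

Under these assumptions the count lands within about one percent of the 117 million figure quoted above; a different vocabulary or context length would shift it slightly.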

While the fine-tuning was adapted to specific tasks, the pre-training was not; to perform the various tasks, only minimal changes were made to the underlying task-agnostic model architecture.[3] Despite this, GPT-1 still improved on previous benchmarks in several language processing tasks, outperforming discriminatively trained models with task-oriented architectures on several diverse tasks.[3]
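One way to keep the architecture task-agnostic is to reshape each task's input into a single token sequence instead of changing the model. The sketch below illustrates that idea, roughly following the input transformations described in the cited paper; the token strings `<s>`, `<$>` and `<e>` and the function names are hypothetical placeholders, not the identifiers used by OpenAI.

```python
# Hypothetical special tokens: start, delimiter, and extract (end) markers.
START, DELIM, EXTRACT = "<s>", "<$>", "<e>"


def format_classification(tokens):
    """Single-text tasks: wrap the text in start and extract tokens."""
    return [START, *tokens, EXTRACT]


def format_entailment(premise, hypothesis):
    """Sentence-pair tasks: join premise and hypothesis with a delimiter."""
    return [START, *premise, DELIM, *hypothesis, EXTRACT]


def format_similarity(a, b):
    """Similarity has no inherent order, so both orderings are produced."""
    return [format_entailment(a, b), format_entailment(b, a)]


def format_multiple_choice(context, answers):
    """Question answering: one sequence per candidate answer."""
    return [format_entailment(context, answer) for answer in answers]


if __name__ == "__main__":
    print(format_entailment(["it", "rains"], ["it", "is", "wet"]))
```

In this sketch every task ends up as one or more flat token sequences, so the same pre-trained transformer can process all of them and only a small task-specific output layer needs to be added during fine-tuning.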

Performance and evaluation

GPT-1 achieved improvements of 5.8% and 1.5% over previous best results[3] on natural language inference (also known as textual entailment) tasks, which evaluate the ability to interpret pairs of sentences from various datasets and classify the relationship between them as "entailment", "contradiction" or "neutral".[3] Examples of such datasets include QNLI (Wikipedia articles) and MultiNLI (transcribed speech, popular fiction, and government reports, among other sources).[7] It similarly outperformed previous models on two tasks related to question answering and commonsense reasoning: by 5.7% on RACE,[8] a dataset of written question-answer pairs from middle and high school exams, and by 8.9% on the Story Cloze Test.[9]

GPT-1 improved on previous best-performing models by 4.2% on semantic similarity (or paraphrase detection), evaluating the ability to predict whether two sentences are paraphrases of one another, using the Quora Question Pairs (QQP) dataset.[3]

GPT-1 achieved a score of 45.4, versus a previous best of 35.0,[3] in a text classification task using the Corpus of Linguistic Acceptability (CoLA). Finally, GPT-1 achieved an overall score of 72.8 (compared to a previous record of 68.9) on GLUE, a multi-task test.[10]

References

  1. ^ "gpt-2". GitHub. Archived from the original on 11 March 2023. Retrieved 13 March 2023.
  2. ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  3. ^ a b c d e f g h i j k l m Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
  4. ^ a b "GPT-1 to GPT-4: Each of OpenAI's GPT Models Explained and Compared". 11 April 2023. Archived from the original on 2023-04-15. Retrieved 2023-04-29.
  5. ^ a b Tsvetkov, Yulia (22 June 2017). "Opportunities and Challenges in Working with Low-Resource Languages" (PDF). Carnegie Mellon University. Archived (PDF) from the original on 31 March 2020. Retrieved 23 January 2021.
  6. ^ Zhu, Yukun; Kiros, Ryan; Zemel, Richard; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (22 June 2015). "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". arXiv:1506.06724 [cs.CV]. # of books: 11,038 / # of sentences: 74,004,228 / # of words: 984,846,357 / mean # of words per sentence: 13 / median # of words per sentence: 11
  7. ^ Williams, Adina; Nangia, Nikita; Bowman, Samuel (1 June 2018). "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference" (PDF). Association for Computational Linguistics. Archived (PDF) from the original on 11 February 2020. Retrieved 23 January 2021. At 433k examples, this resource is one of the largest corpora available for natural language inference (a.k.a. recognizing textual entailment), [...] offering data from ten distinct genres of written and spoken English [...] while supplying an explicit setting for evaluating cross-genre domain adaptation.
  8. ^ Lai, Guokun; Xie, Qizhe; Hanxiao, Liu; Yang, Yiming; Hovy, Eduard (15 April 2017). "RACE: Large-scale ReAding Comprehension Dataset From Examinations". arXiv:1704.04683 [cs.CL].
  9. ^ Mostafazadeh, Nasrin; Roth, Michael; Louis, Annie; Chambers, Nathanael; Allen, James F. (3 April 2017). "LSDSem 2017 Shared Task: The Story Cloze Test" (PDF). Association for Computational Linguistics. Archived (PDF) from the original on 22 November 2020. Retrieved 23 January 2021. The LSDSem'17 shared task is the Story Cloze Test, a new evaluation for story understanding and script learning. This test provides a system with a four-sentence story and two possible endings, and the system must choose the correct ending. Successful narrative understanding (getting closer to human performance of 100%) requires systems to link various levels of semantics to commonsense knowledge.
  10. ^ Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omar; Bowman, Samuel R. (20 April 2018). "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". arXiv:1804.07461 [cs.CL].
