[Image: GPT-2 completion using the Hugging Face Write With Transformer website, prompted with text from this article. All highlighted text after the initial prompt is machine-generated from the first suggested completion, without further editing.]
Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages.[2] It was partially released in February 2019, followed by full release of the 1.5-billion-parameter model on November 5, 2019.[3][4][5]
GPT-2 was created as a "direct scale-up" of GPT-1[6] with a ten-fold increase in both its parameter count and the size of its training dataset.[5] It is a general-purpose learner, and its ability to perform a variety of tasks was a consequence of its general ability to accurately predict the next item in a sequence.[2][7] This enabled it to translate texts, answer questions about a topic from a text, summarize passages from a larger text,[7] and generate text output on a level sometimes indistinguishable from that of humans, although it could become repetitive or nonsensical when generating long passages.[8] It was superseded by the GPT-3 and GPT-4 models, which are not open source.
GPT-2 has, like its predecessor GPT-1 and its successors GPT-3 and GPT-4, a generative pre-trained transformer architecture, implementing a deep neural network, specifically a transformer model,[6] which uses attention instead of older recurrence- and convolution-based architectures.[9][10] Attention mechanisms allow the model to selectively focus on segments of input text it predicts to be the most relevant.[11][12] This model allows for greatly increased parallelization, and outperforms previous benchmarks for RNN/CNN/LSTM-based models.[6]
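The core operation can be illustrated with a minimal sketch of causal scaled dot-product attention. This is an illustrative NumPy toy under assumed shapes and names, not OpenAI's GPT-2 implementation, which additionally uses multiple attention heads, learned projections, and many stacked layers.

```python
# Minimal sketch of causal scaled dot-product self-attention, the mechanism
# transformers use in place of recurrence. Names and shapes are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_k). Returns attention output."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise relevance between positions
    # Causal mask: each position may attend only to itself and earlier
    # positions, as required for next-token prediction.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # mix value vectors according to relevance

# Toy usage: 4 tokens with 8-dimensional embeddings.
x = np.random.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```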
Training
Since the transformer architecture enabled massive parallelization, GPT models could be trained on larger corpora than previous NLP (natural language processing) models. While the GPT-1 model demonstrated that the approach was viable, GPT-2 would further explore the emergent properties of networks trained on extremely large corpora. CommonCrawl, a large corpus produced by web crawling and previously used in training NLP systems,[13] was considered due to its large size, but was rejected after further review revealed large amounts of unintelligible content.[2][13] Instead, OpenAI developed a new corpus, known as WebText; rather than scraping content indiscriminately from the World Wide Web, WebText was generated by scraping only pages linked to by Reddit posts that had received at least three upvotes prior to December 2017. The corpus was subsequently cleaned; HTML documents were parsed into plain text, duplicate pages were eliminated, and Wikipedia pages were removed (since their presence in many other datasets could have induced overfitting).[2]
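As an illustration of the kind of filtering described, the hypothetical sketch below applies the stated rules (links with at least three upvotes prior to December 2017, Wikipedia pages removed, duplicates dropped, HTML parsed to plain text). The callables fetch_page and html_to_text and the post format are assumptions standing in for unspecified tooling; this is not OpenAI's actual pipeline.

```python
# Hypothetical WebText-style filtering sketch; helper callables are assumed.
from urllib.parse import urlparse

def build_webtext_like_corpus(reddit_posts, fetch_page, html_to_text):
    """reddit_posts: iterable of (url, upvotes, post_date) tuples,
    with post_date as an ISO-formatted string such as '2017-11-30'."""
    seen_texts = set()
    corpus = []
    for url, upvotes, post_date in reddit_posts:
        if upvotes < 3 or post_date >= "2017-12-01":
            continue  # keep only well-received links from before December 2017
        if "wikipedia.org" in urlparse(url).netloc:
            continue  # drop Wikipedia pages to limit overlap with other datasets
        text = html_to_text(fetch_page(url))  # parse HTML document to plain text
        if text and text not in seen_texts:   # crude duplicate elimination
            seen_texts.add(text)
            corpus.append(text)
    return corpus
```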
While the cost of training GPT-2 is known to have been $256 per hour,[14][15] the number of hours it took to complete training is unknown; therefore, the overall training cost cannot be estimated accurately.[16] However, comparable large language models using transformer architectures have had their costs documented in more detail; the training processes for BERT and XLNet consumed, respectively, $6,912 and $245,000 of resources.[15]
Release
GPT-2 was first announced on 14 February 2019. A February 2019 article in The Verge by James Vincent said that, while "[the] writing it produces is usually easily identifiable as non-human", it remained "one of the most exciting examples yet" of language generation programs:[17]
Give it a fake headline, and it’ll write the rest of the article, complete with fake quotations and statistics. Feed it the first line of a short story, and it’ll tell you what happens to your character next. It can even write fan fiction, given the right prompt.[17]
The Guardian described this output as "plausible newspaper prose";[8] Kelsey Piper of Vox said "one of the coolest AI systems I’ve ever seen may also be the one that will kick me out of my job".[18] GPT-2's flexibility was described as "impressive" by The Verge; specifically, its ability to translate text between languages, summarize long articles, and answer trivia questions was noted.[17]
A study by the University of Amsterdam employing a modified Turing test found that at least in some scenarios, participants were unable to distinguish poems generated by GPT-2 from those written by humans.[19]
Restrictions and partial release
While previous OpenAI models had been made immediately available to the public, OpenAI initially refused to make a public release of GPT-2's source code when announcing it in February, citing the risk of malicious use;[8] limited access to the model (i.e. an interface that allowed input and provided output, not the source code itself) was allowed for selected press outlets on announcement.[8] One commonly-cited justification was that, since generated text was usually completely novel, it could be used by spammers to evade automated filters; OpenAI demonstrated a version of GPT-2 fine-tuned to "generate infinite positive – or negative – reviews of products".[8]
Another justification was that GPT-2 could be used to generate text that was obscene or racist. Researchers such as Jeremy Howard warned of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter".[17] The Allen Institute for Artificial Intelligence, in response to GPT-2, announced a tool to detect "neural fake news".[20]
However, opinion was divided. A February 2019 article in The Verge argued that the threat posed by GPT-2 had been exaggerated;[21] Anima Anandkumar, a professor at Caltech and director of machine learning research at Nvidia, said that there was no evidence that GPT-2 had the capabilities to pose the threats described by OpenAI, and that what they did was the "opposite of open", characterizing their refusal to release the full model as "malicious BS".[21] The Gradient published an open letter to OpenAI requesting that they release the model publicly, comparing the threat posed by text-generation AI to the threat posed by the printing press, and giving Photoshop as an example of "a technology that has (thankfully) not destroyed modern society despite its potential for chaos":[22]
Thirty years later, society has emerged relatively unscathed despite Photoshop being simple enough for high school students to use and ubiquitous enough to commandeer its own verb. Why? Precisely because everyone knows about Photoshop.[22]
774M release
While OpenAI did not release the fully-trained model or the corpora it was trained on, description of their methods in prior publications (and the free availability of underlying technology) made it possible for GPT-2 to be replicated by others as free software; one such replication, OpenGPT-2, was released in August 2019, in conjunction with a freely licensed version of WebText called OpenWebText. The cloud compute costs for OpenGPT-2 were given as approximately $50,000.[23]
On August 20, 2019, OpenAI released a partial version of GPT-2, with 774 million parameters (roughly half the size of the full 1.5 billion parameter model).[24]
Full 1.5B release
Initial concerns that GPT-2 would lend itself to widespread misuse did not come to pass; The Verge said that "there are reasons to be skeptical about claims that AI technology will usher in some sort of ‘infopocalypse.’ For a start, we already have programs that can generate plausible text at high volume for little cost: humans."[25] By November 2019, OpenAI said that they had "seen no strong evidence of misuse so far", and the full version, with 1.5 billion parameters trained with forty gigabytes of data, "about eight thousand times larger than the collected works of Shakespeare",[26] was released on November 5, 2019.[3][4]
Small and medium releases
Two smaller releases of GPT-2 are also available: the small version with 124 million parameters and the medium version with 355 million parameters. Both can be downloaded from Hugging Face.[27][28]
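Assuming the standard Hugging Face transformers library, these checkpoints can be loaded under the model identifiers gpt2 (small) and gpt2-medium (medium), as in the sketch below; the prompt text and generation settings are arbitrary examples.

```python
# Loading the 124M-parameter GPT-2 checkpoint from the Hugging Face Hub and
# sampling a short continuation. Use "gpt2-medium" for the 355M-parameter model.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("GPT-2 was pre-trained on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```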
Limitations
While GPT-2's ability to generate plausible passages of natural language text was generally remarked on positively, its shortcomings were noted as well, especially when generating texts longer than a couple of paragraphs; Vox said "the prose is pretty rough, there’s the occasional non-sequitur, and the articles get less coherent the longer they get".[18] The Verge similarly noted that longer samples of GPT-2 writing tended to "stray off topic" and lack overall coherence;[17] The Register opined that "a human reading it should, after a short while, realize something's up", and noted that "GPT-2 doesn't answer questions as well as other systems that rely on algorithms to extract and retrieve information."[14]
GPT-2 deployment is resource-intensive; the full version of the model is larger than five gigabytes, making it difficult to embed locally into applications, and consumes large amounts of RAM. In addition, performing a single prediction "can occupy a CPU at 100% utilization for several minutes", and even with GPU processing, "a single prediction can take seconds". To alleviate these issues, the company Hugging Face created DistilGPT2, using knowledge distillation to produce a smaller model that "scores a few points lower on some quality benchmarks", but is "33% smaller and twice as fast".[citation needed]
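Knowledge distillation, in general, trains the smaller "student" model to match the softened output distribution of the larger "teacher". The sketch below shows a generic distillation loss in PyTorch under that assumption; it is not Hugging Face's actual DistilGPT2 training code. The distilled model itself can be loaded from the Hugging Face Hub under the identifier distilgpt2.

```python
# Generic knowledge-distillation objective: the student is trained to match the
# teacher's softened next-token distribution via KL divergence.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Both logits have shape (batch, seq_len, vocab_size)."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Multiply by t**2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

# Toy usage with random logits standing in for model outputs.
student = torch.randn(2, 5, 50257)
teacher = torch.randn(2, 5, 50257)
print(distillation_loss(student, teacher).item())
```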
Application and subsequent research
Even before the release of the full version, GPT-2 was used for a variety of applications and services, as well as for entertainment. In June 2019, a subreddit named r/SubSimulatorGPT2 was created in which a variety of GPT-2 instances trained on different subreddits made posts and replied to each other's comments, creating a situation where one could observe "an AI personification of r/Bitcoin argue with the machine learning-derived spirit of r/ShittyFoodPorn";[25] by July of that year, a GPT-2-based software program released to autocomplete lines of code in a variety of programming languages was described by users as a "game-changer".[29]
In 2019, AI Dungeon was launched, which used GPT-2 to generate dynamic text adventures based on user input.[30] AI Dungeon now offers access to the largest release of GPT-3 via its API as an optional paid upgrade, while the free version of the site uses the second-largest release of GPT-3.[31] Latitude, the company formed around AI Dungeon, raised $3.3 million in seed funding in 2021.[32] Several websites host interactive demonstrations of different instances of GPT-2 and other transformer models.[33][34][35]
In February 2021, a crisis center for troubled teens announced that they would begin using a GPT-2-derived chatbot to help train counselors by allowing them to have conversations with simulated teens (this use was purely for internal purposes, and did not involve having GPT-2 communicate with the teens themselves).[36]
On May 9, 2023, OpenAI released a mapped version of GPT-2. OpenAI used its successor model, GPT-4, to map each neuron of GPT-2 and determine its function.[37]
Performance and evaluation
GPT-2 became capable of performing a variety of tasks beyond simple text production due to the breadth of its dataset and technique: answering questions, summarizing, and even translating between languages in a variety of specific domains, without being instructed in anything beyond how to predict the next word in a sequence.[17][18]
One example of generalized learning is GPT-2's ability to perform machine translation between French and English, a task on which its performance was assessed using the WMT-14 translation tasks. GPT-2's training corpus included virtually no French text; non-English text was deliberately removed while cleaning the dataset prior to training, and as a consequence, only 10 MB of French text (out of the remaining 40,000 MB) was available for the model to learn from, mostly from foreign-language quotations in English posts and articles.[2]
Despite this, GPT-2 achieved 5 BLEU on the WMT-14 English-to-French test set (slightly below the score of a translation via word-for-word substitution). It was also able to outperform several contemporary (2017) unsupervised machine translation baselines on the French-to-English test set, where GPT-2 achieved 11.5 BLEU. This remained below the highest-performing contemporary unsupervised approach (2019), which had achieved 33.5 BLEU.[2] However, other models used large amounts of French text to achieve these results; GPT-2 was estimated to have used a monolingual French corpus approximately 1/500 the size of comparable approaches.[2]
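Translation was induced purely by prompting: the GPT-2 paper describes conditioning the model on example pairs of the form "english sentence = french sentence" and asking it to complete "english sentence =". The sketch below assembles such a prompt and scores a hypothesis with sentence-level BLEU via NLTK, which is an assumed stand-in for the actual WMT-14 evaluation harness; the example sentences are placeholders.

```python
# Sketch of prompted (zero-shot, no fine-tuning) translation and BLEU scoring.
from nltk.translate.bleu_score import sentence_bleu

def make_translation_prompt(example_pairs, source_sentence):
    """example_pairs: list of (english, french) demonstration pairs."""
    lines = [f"{en} = {fr}" for en, fr in example_pairs]
    lines.append(f"{source_sentence} =")  # the model completes the French side
    return "\n".join(lines)

def bleu(reference, hypothesis):
    """Sentence-level BLEU; corpus-level scores average over the full test set."""
    return sentence_bleu([reference.split()], hypothesis.split())

prompt = make_translation_prompt(
    [("The cat sat on the mat.", "Le chat était assis sur le tapis.")],
    "I like coffee.",
)
print(prompt)
print(bleu("J'aime le café.", "J'aime le café."))
```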
GPT-2 was to be followed by the 175-billion-parameter GPT-3,[40] revealed to the public in 2020[41] (whose source code has never been made available). GPT-3's architecture is essentially that of GPT-2, but with modification to allow larger scaling, and it was trained on 570 GB of plaintext, some 300 billion tokens drawn from CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2). Access to GPT-3 is provided exclusively through APIs offered by OpenAI and Microsoft.[42] GPT-3 was in turn followed by GPT-4.
References
^"gpt-2". GitHub. Archived from the original on 11 March 2023. Retrieved 13 March 2023.
^Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (1 September 2014). "Neural Machine Translation by Jointly Learning to Align and Translate". arXiv:1409.0473 [cs.CL].
^Luong, Minh-Thang; Pham, Hieu; Manning, Christopher D. (17 August 2015). "Effective Approaches to Attention-based Neural Machine Translation". arXiv:1508.04025 [cs.CL].
^ abTrinh, Trieu H.; Le, Quoc V. (7 Jun 2018). "A Simple Method for Commonsense Reasoning". arXiv:1806.02847 [cs.CL].
^Hao, Karen (September 23, 2020). "OpenAI is giving Microsoft exclusive access to its GPT-3 language model". MIT Technology Review. Archived from the original on 2021-02-05. Retrieved 2020-09-25. The companies say OpenAI will continue to offer its public-facing API, which allows chosen users to send text to GPT-3 or OpenAI's other models and receive its output. Only Microsoft, however, will have access to GPT-3's underlying code, allowing it to embed, repurpose, and modify the model as it pleases.