Large language model

A large language model (LLM) is a type of artificial intelligence that can understand and create human language. These models learn by studying huge amounts of text from books, websites, and other sources.[1]
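For illustration, most LLMs are trained on a simple task: given some text, predict the next word (or "token"). The following minimal Python sketch shows that idea with a toy word-counting model rather than a neural network; the tiny corpus and helper names are illustrative, not from any real system.

```python
from collections import Counter, defaultdict

# Toy sketch of the core idea behind LLM training: learn to
# predict the next word from examples of text. Real LLMs use
# neural networks trained on billions of documents; this
# word-pair counter only illustrates the prediction task.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the next word most often seen after `word`."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("sat"))  # -> 'on' (seen twice in the toy corpus)
```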

How they work

LLMs work by finding patterns in language. They learn grammar, facts, and how words relate to each other by looking at billions of examples. The most powerful LLMs use a design called a "transformer", which relies on a mechanism called "attention" to weigh how the words in a text relate to one another, and which can process many words at the same time instead of one after another.[2]
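For illustration, the central operation in a transformer is "scaled dot-product attention", described in the paper cited above.[2] Below is a minimal Python sketch of that one operation, using random matrices in place of the learned values a real model would compute.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each position attends to every other position at once.

    Q, K, V: (sequence_length, d) arrays of queries, keys, values.
    Returns one output vector per position, a weighted mix of V.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # how strongly each word relates to each other word
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d = 4, 8                       # e.g. a 4-token sentence, 8-dim vectors
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)                        # (4, 8): one updated vector per token
```

Because every pair of positions is compared in one matrix operation, the whole sequence can be processed in parallel, which is what lets transformers handle large amounts of text quickly.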

Limitations

While LLMs are powerful, they can make mistakes. They sometimes repeat biases found in their training data, and they can confidently state things that are false, a problem often called "hallucination". They learn statistical patterns from existing text rather than having true understanding like humans do.[3]

History

Before 2017, most language models used recurrent neural networks, which read text one word at a time; researchers had already added "attention" mechanisms to help these models focus on the most relevant words.[4] The big change came in 2017, when researchers at Google introduced the "transformer" design, which made language models much more powerful.[2]

Important developments include:

  • 2018: Google released BERT, which helped computers better understand language[5]
  • 2019: OpenAI created GPT-2 but at first held back the full version because its creators worried it could be misused[6]
  • 2022: OpenAI released ChatGPT, which quickly became very popular with the public[7]
  • 2023: OpenAI released GPT-4, which could take both text and images as input[8]

Modern developments

Today, there are many different LLMs available. Some are proprietary, like GPT-4, meaning only the company that built them can run or change them, while others, like LLaMA and Mistral, have openly released "weights" that anyone can download and run. As of 2024, GPT-4 was considered one of the most capable language models.[9]
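For illustration, openly released models can usually be run with free software libraries. The sketch below uses the Hugging Face "transformers" library, assuming it is installed and the computer has enough memory for the model; the model name is one published Mistral checkpoint, used here only as an example.

```python
# Sketch of running an openly released model locally with the
# Hugging Face "transformers" library. The first run downloads
# the model weights, which are several gigabytes for a 7B model.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1")
result = generator("A large language model is", max_new_tokens=30)
print(result[0]["generated_text"])
```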

References

  1. "Better Language Models and Their Implications". OpenAI. 2019-02-14. Archived from the original on 2020-12-19. Retrieved 2019-08-25.
  2. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. Archived (PDF) from the original on 2024-02-21. Retrieved 2024-01-21.
  3. Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus. 151 (2): 127–138. doi:10.1162/daed_a_01905. S2CID 248377870. Archived from the original on 2023-11-17. Retrieved 2023-03-09.
  4. Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (2014). "Neural Machine Translation by Jointly Learning to Align and Translate". arXiv:1409.0473 [cs.CL].
  5. Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327. doi:10.1162/tacl_a_00349. S2CID 211532403. Archived from the original on 2022-04-03. Retrieved 2024-01-21.
  6. Hern, Alex (2019-02-14). "New AI fake text generator may be too dangerous to release, say creators". The Guardian. Archived from the original on 2019-02-14. Retrieved 2024-01-20.
  7. "ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months". Euronews. 2023-11-30. Archived from the original on 2024-01-14. Retrieved 2024-01-20.
  8. Heaven, Will (2023-03-14). "GPT-4 is bigger and better than ChatGPT—but OpenAI won't say why". MIT Technology Review. Archived from the original on 2023-03-17. Retrieved 2024-01-20.
  9. "LMSYS Chatbot Arena Leaderboard". huggingface.co. Archived from the original on 2024-06-10. Retrieved 2024-06-12.