A large language model (LLM) is a type of artificial intelligence model that can process and generate human language. These models are trained on huge amounts of text from books, websites, and other sources.[1]
How they work
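LLMs work by finding statistical patterns in language. By training on billions of examples, they learn grammar, facts, and how words relate to each other. Most modern LLMs use an architecture called the "transformer," which lets a model weigh how each word in a passage relates to every other word and process all of those words in parallel. This makes it practical to train on very large amounts of text.[2]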
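The core operation inside a transformer is scaled dot-product attention, described by Vaswani et al. (2017) as softmax(QKᵀ / √d_k) · V. The sketch below is a simplified illustration of that formula, not code from any real model: it uses plain Python lists, a single attention head, and no learned weight matrices or masking. In an actual transformer, the query, key, and value vectors are produced by learned projections and many attention heads and layers are stacked. The helper names `softmax` and `attention` are hypothetical.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that are positive and sum to 1."""
    # Subtract the max before exponentiating to avoid overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Simplified single-head scaled dot-product attention (illustrative only).

    queries, keys, values: lists of equal-length float vectors, one per token.
    keys and values must have the same number of entries.
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Three toy "token" vectors; in self-attention each token attends to all of them.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in attention(tokens, tokens, tokens):
        print([round(x, 3) for x in row])
```

Each output vector blends information from every token, weighted by how similar that token's key is to the query. Because every query is handled independently, the loop could run in parallel, which is the property that lets transformers process whole passages at once.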
Limitations
While LLMs are powerful, they can make mistakes. They can reproduce biases present in their training data, and they can produce incorrect information. Because they learn patterns from existing text, they do not necessarily understand it the way humans do.[3]
History
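Before 2017, language models were much simpler and generally processed text one word at a time. The big change came in 2017, when researchers at Google introduced the transformer design in the paper "Attention Is All You Need," which made language models much more powerful.[4]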
Important developments include:
2018: Google released BERT, which improved how well computers could interpret the meaning of text[5]
2019: OpenAI announced GPT-2 but initially withheld the full model, citing concerns that it could be misused[6]
2022: OpenAI released ChatGPT, which quickly became very popular with the public[7]
2023: GPT-4 was released and could accept both text and images as input[8]
Modern developments
Today, there are many different LLMs available. Some are proprietary, like GPT-4, while others have publicly released model weights, like LLaMA and Mistral. As of 2024, GPT-4 was considered one of the most capable language models.[9]
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention Is All You Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. Archived (PDF) from the original on 2024-02-21. Retrieved 2024-01-21.