Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative artificial intelligence (AI) model.[1][2]
A prompt is natural language text describing the task that an AI should perform.[3] A prompt for a text-to-text language model can be a query such as "what is Fermat's little theorem?",[4] a command such as "write a poem in the style of Edgar Allan Poe about leaves falling",[5] or a longer statement including context, instructions,[6] and conversation history.
Prompt engineering may involve phrasing a query, specifying a style,[5] choosing words and grammar,[7] providing relevant context,[8] or assigning a role to the AI such as "act as a native French speaker".[9]
When communicating with a text-to-image or a text-to-audio model, a typical prompt is a description of a desired output such as "a high-quality photo of an astronaut riding a horse"[10] or "Lo-fi slow BPM electro chill with organic samples".[11] Prompting a text-to-image model may involve adding, removing, emphasizing, and re-ordering words to achieve a desired subject, style,[1] layout, lighting,[12] and aesthetic.
History
In 2018, researchers first proposed that all previously separate tasks in natural language processing (NLP) could be cast as a question answering problem over a context. They also trained the first single, joint, multi-task model that could answer any task-related question such as "What is the sentiment?", "Translate this sentence to German", or "Who is the president?"[13]
In 2021, researchers fine-tuned one generatively pretrained model (T0) on 12 NLP tasks (using 62 datasets, as each task can have multiple datasets). The model showed good performance on new tasks, surpassing models trained directly on a single task (without pretraining). To solve a task, T0 is given the task in a structured prompt; for example, the prompt used to make T0 solve entailment is: If {{premise}} is true, is it also true that {{hypothesis}}? ||| {{entailed}}.[14]
A repository for prompts reported that over 2,000 public prompts for around 170 datasets were available in February 2022.[15]
In 2022 the chain-of-thought prompting technique was proposed by Google researchers.[16][17]
In 2023 several text-to-text and text-to-image prompt databases were publicly available.[18][19]
Text-to-text
Chain-of-thought
Chain-of-thought (CoT) prompting is a technique that, according to Google, allows large language models (LLMs) to solve a problem as a series of intermediate steps[20] before giving a final answer. In 2022, Google also claimed that chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a train of thought.[21][16][22] According to announcements from Google and Amazon, chain-of-thought techniques may allow large language models to overcome difficulties with some reasoning tasks that require logical thinking and multiple steps to solve, such as arithmetic or commonsense reasoning questions.[23][24][25]
For example, given the question "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?", Google claims that a CoT prompt might induce the LLM to answer "A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9."[16]
As originally proposed by Google,[16] each CoT prompt included a few Q&A examples, which made it a few-shot prompting technique. However, according to researchers at Google and the University of Tokyo, simply appending the words "Let's think step-by-step"[26] has also proven effective, which makes CoT a zero-shot prompting technique. OpenAI claims that this prompt allows for better scaling, as a user no longer needs to formulate many specific CoT Q&A examples.[27]
Google claims that CoT prompting, when applied to PaLM, a 540B-parameter language model, significantly aided the model, allowing it to perform comparably with task-specific fine-tuned models on several tasks and achieving state-of-the-art results at the time on the GSM8K mathematical reasoning benchmark.[16] According to Google, it is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better interpretability.[28][29]
Chain-of-thought prompting is just one of many prompt-engineering techniques; at least 29 distinct techniques have been published.[30]
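The difference between the few-shot and zero-shot variants can be illustrated with a minimal sketch. The helper names `few_shot_cot_prompt` and `zero_shot_cot_prompt` are hypothetical, and the prompt layout is an assumption for illustration rather than the exact format used in the cited papers.

```python
# Hypothetical helpers; the "Q: ... / A: ..." layout is an illustrative assumption.
APPLES_EXAMPLE = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?",
    "The cafeteria had 23 apples originally. They used 20 to make lunch. "
    "So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. "
    "The answer is 9.",
)


def few_shot_cot_prompt(examples, question):
    """Prefix the question with worked Q&A examples that spell out their reasoning."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def zero_shot_cot_prompt(question):
    """Use no examples; append the trigger phrase to elicit step-by-step reasoning."""
    return f"Q: {question}\nA: Let's think step-by-step."
```

In the few-shot form the worked example carries the reasoning pattern; in the zero-shot form only the trigger phrase does.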
Chain-of-Symbol (CoS) Prompting
A research collaboration between Westlake University, the Chinese University of Hong Kong, and the University of Edinburgh has claimed that chain-of-symbol (CoS) prompting, used in conjunction with CoT prompting, helps LLMs with their difficulty in spatial reasoning over text. In other words, using arbitrary symbols such as ' / ' helps the LLM interpret spatial arrangements described in text. This is claimed to assist reasoning and increase the performance of the LLM.[31]
Input:
There are a set of bricks. The yellow brick C is on top of the brick E. The yellow brick D is on top of the brick A. The yellow brick E is on top of the brick D. The white brick A is on top of the brick B. For the brick B, the color is white. Now we have to get a specific brick. The bricks must now be grabbed from top to bottom, and if the lower brick is to be grabbed, the upper brick must be removed first. How to get brick D?
B/A/D/E/C
C/E
E/D
D
Output:
So we get the result as C, E, D.
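The symbolic lines in this example can be checked mechanically. The following sketch is not part of the CoS method itself; it only verifies the expected answer, and the function name `bricks_to_remove` is a hypothetical helper.

```python
def bricks_to_remove(stack, target):
    """Return the bricks to grab, top first, ending with the target.

    `stack` lists bricks from bottom to top separated by '/',
    matching the 'B/A/D/E/C' line in the example above.
    """
    bricks = stack.split("/")
    position = bricks.index(target)   # raises ValueError if the brick is absent
    return bricks[position:][::-1]    # everything from the target upward, top first


print(bricks_to_remove("B/A/D/E/C", "D"))  # ['C', 'E', 'D']
```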
Few-shot learning
A prompt may include a few examples for a model to learn from, such as asking the model to complete "maison → house, chat → cat, chien →" (the expected response being dog),[32] an approach called few-shot learning.[33]
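As a minimal sketch, such a prompt can be assembled from example pairs. The function name `few_shot_prompt` and the arrow separator are assumptions chosen to reproduce the example above.

```python
def few_shot_prompt(pairs, query, arrow=" → "):
    """Join solved examples and leave the final one open for the model to complete."""
    shots = [f"{source}{arrow}{target}" for source, target in pairs]
    shots.append(f"{query}{arrow}".rstrip())  # trailing space removed before the completion
    return ", ".join(shots)


print(few_shot_prompt([("maison", "house"), ("chat", "cat")], "chien"))
# maison → house, chat → cat, chien →
```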
Generated knowledge prompting
Generated knowledge prompting[34] first prompts the model to generate relevant facts for completing the prompt, then proceeds to complete the prompt. The completion quality is usually higher[citation needed], as the model can be conditioned on relevant facts.
Generate some knowledge about the concepts in the input.
Input: {question}
Knowledge:
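The two stages can be sketched as follows, assuming a hypothetical `llm` callable that maps a prompt string to a completion string. The layout of the second-stage prompt is an assumption; only the first-stage template comes from the example above.

```python
from typing import Callable

KNOWLEDGE_TEMPLATE = (
    "Generate some knowledge about the concepts in the input.\n"
    "Input: {question}\n"
    "Knowledge:"
)


def answer_with_generated_knowledge(llm: Callable[[str], str], question: str) -> str:
    # Stage 1: ask the model for facts relevant to the question.
    knowledge = llm(KNOWLEDGE_TEMPLATE.format(question=question)).strip()
    # Stage 2: condition the final answer on those facts (assumed prompt layout).
    answer_prompt = f"Knowledge: {knowledge}\nQuestion: {question}\nAnswer:"
    return llm(answer_prompt).strip()
```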
Least-to-most prompting
Least-to-most prompting[35] prompts a model to first list the sub-problems of a problem, then solve them in sequence, such that later sub-problems can be solved with the help of answers to previous sub-problems.
Input:
Q: {question}
A: Let's break down this problem:
1.
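A minimal sketch of the decompose-then-solve loop follows, assuming a hypothetical `llm` callable that maps a prompt string to a completion string and assuming the model returns its sub-problems as a numbered list. The "Sub-question/Answer" layout of the second stage is an illustrative choice.

```python
import re
from typing import Callable, List


def least_to_most(llm: Callable[[str], str], question: str) -> str:
    # Stage 1: ask for a numbered list of sub-problems.
    listing = llm(f"Q: {question}\nA: Let's break down this problem:\n1.")
    lines = ("1." + listing).splitlines()
    subproblems: List[str] = [
        re.sub(r"^\s*\d+\.\s*", "", line).strip()
        for line in lines
        if re.match(r"^\s*\d+\.", line)
    ]
    # Stage 2: solve in order, feeding earlier answers into later prompts.
    context = f"Q: {question}\n"
    answer = ""
    for sub in subproblems:
        answer = llm(f"{context}Sub-question: {sub}\nAnswer:").strip()
        context += f"Sub-question: {sub}\nAnswer: {answer}\n"
    return answer  # the answer to the last sub-problem
```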
Self-consistency decoding
Self-consistency decoding[36] performs several chain-of-thought rollouts, then selects the most commonly reached conclusion out of all the rollouts. If the rollouts disagree substantially, a human can be queried for the correct chain of thought.[37]
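The voting step can be sketched as a majority vote over the final answers of the rollouts. The `min_agreement` threshold used to flag strong disagreement is an assumption for illustration, not a value taken from the cited papers.

```python
from collections import Counter


def self_consistency(final_answers, min_agreement=0.5):
    """Majority vote over rollouts; the answer is None when agreement is too low."""
    counts = Counter(final_answers)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(final_answers)
    if agreement < min_agreement:
        return None, agreement  # a caller could escalate to a human here
    return answer, agreement


print(self_consistency(["9", "9", "8", "9"]))  # ('9', 0.75)
```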
Complexity-based prompting
Complexity-based prompting[38] performs several CoT rollouts, then selects the rollouts with the longest chains of thought, then selects the most commonly reached conclusion out of those.
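A minimal sketch of this selection follows, assuming each rollout is represented as a pair of reasoning steps and final answer, and measuring chain length as the number of steps. The `top_k` parameter is a hypothetical name for how many of the longest rollouts are kept.

```python
from collections import Counter


def complexity_based_vote(rollouts, top_k=3):
    """rollouts: list of (reasoning_steps, final_answer) pairs."""
    # Keep the rollouts with the most reasoning steps.
    ranked = sorted(rollouts, key=lambda rollout: len(rollout[0]), reverse=True)
    longest = ranked[:top_k]
    # Majority vote among the kept rollouts only.
    counts = Counter(answer for _, answer in longest)
    return counts.most_common(1)[0][0]
```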
Self-refine
Self-refine[39] prompts the LLM to solve the problem, then prompts the LLM to critique its solution, then prompts the LLM to solve the problem again in view of the problem, solution, and critique. This process is repeated until stopped, either by running out of tokens or time, or by the LLM outputting a "stop" token.
Example critique:
I have some code. Give one suggestion to improve readability. Don't fix the code, just give a suggestion.
Code: {code}
Suggestion:
Example refinement:
Code: {code}
Let's use this suggestion to improve the code.
Suggestion: {suggestion}
New Code:
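The two templates can be chained into a loop, sketched below under the assumption of a hypothetical `llm` callable that maps a prompt string to a completion string. Treating a literal "stop" reply as the stop signal and capping the loop with `max_rounds` are illustrative choices.

```python
from typing import Callable


def self_refine(llm: Callable[[str], str], code: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        # Critique step: ask for one suggestion without changing the code.
        suggestion = llm(
            "I have some code. Give one suggestion to improve readability. "
            "Don't fix the code, just give a suggestion.\n"
            f"Code: {code}\nSuggestion:"
        ).strip()
        if suggestion.lower() == "stop":  # assumed stop signal
            break
        # Refinement step: apply the suggestion to produce new code.
        code = llm(
            f"Code: {code}\n"
            "Let's use this suggestion to improve the code.\n"
            f"Suggestion: {suggestion}\nNew Code:"
        ).strip()
    return code
```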
Tree-of-thought
Tree-of-thought prompting[40] generalizes chain-of-thought by prompting the model to generate one or more "possible next steps", and then running the model on each of the possible next steps by breadth-first, beam, or some other method of tree search.[41]
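The search can be sketched as a breadth-first expansion with a beam, as below. In practice the `propose` and `score` callables would be backed by LLM calls; here they are hypothetical parameters, and representing a partial solution as a string is an assumption made to keep the example small.

```python
from typing import Callable, List


def tree_of_thought(
    root: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    depth: int = 3,
    beam_width: int = 2,
) -> str:
    frontier = [root]
    for _ in range(depth):
        # Expand every kept state with each of its possible next steps.
        candidates = [state + step for state in frontier for step in propose(state)]
        if not candidates:
            break
        # Keep only the most promising partial solutions.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```

With `beam_width=1` this reduces to a greedy search that follows a single chain, and a larger beam trades more model calls for broader exploration.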
Maieutic prompting
Maieutic prompting is similar to tree-of-thought. The model is prompted to answer a question with an explanation. The model is then prompted to explain parts of the explanation, and so on. Inconsistent explanation trees are pruned or discarded. This improves performance on complex commonsense reasoning.[42]
Directional-stimulus prompting
Directional-stimulus prompting[43] includes a hint or cue, such as desired keywords, to guide a language model toward the desired output.
Article: {article}
Q: Write a short summary of the article in 2-4 sentences that accurately incorporates the provided keywords.
Keywords: {keywords}
A:
Prompting to disclose uncertainty
By default, the output of language models may not contain estimates of uncertainty. The model may output text that appears confident, though the underlying token predictions have low likelihood scores. Large language models like GPT-4 can have accurately calibrated likelihood scores in their token predictions,[44] and so the model output uncertainty can be directly estimated by reading out the token prediction likelihood scores.
But if one cannot access such scores (such as when one is accessing the model through a restrictive API), uncertainty can still be estimated and incorporated into the model output. One simple method is to prompt the model to use words to estimate uncertainty.[45] Another is to prompt the model to refuse to answer in a standardized way if the input does not satisfy conditions.[citation needed]
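Where token likelihood scores are available, reading out a confidence estimate is simple arithmetic on log-probabilities. The sketch below assumes the scores arrive as a list of natural-log probabilities, one per generated token; the function name `sequence_confidence` is hypothetical.

```python
import math


def sequence_confidence(token_logprobs):
    """Return (joint probability, geometric-mean per-token probability)."""
    total = sum(token_logprobs)
    joint = math.exp(total)                            # probability of the whole output
    per_token = math.exp(total / len(token_logprobs))  # length-normalized confidence
    return joint, per_token
```

The length-normalized value is often easier to compare across outputs, since the joint probability shrinks as the output gets longer.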
Prompting to estimate model sensitivity
Research consistently demonstrates that LLMs are highly sensitive to subtle variations in prompt formatting, structure, and linguistic properties. Some studies have shown differences of up to 76 accuracy points across formatting changes in few-shot settings.[46] Linguistic features such as morphology, syntax, and lexico-semantic changes significantly influence prompt effectiveness and can meaningfully enhance task performance across a variety of tasks.[47][48] Clausal syntax, for example, improves consistency and reduces uncertainty in knowledge retrieval.[49] This sensitivity persists even with larger model sizes, additional few-shot examples, or instruction tuning.
To address sensitivity of models and make them more robust, several methods have been proposed. FormatSpread facilitates systematic analysis by evaluating a range of plausible prompt formats, offering a more comprehensive performance interval.[50] Similarly, PromptEval estimates performance distributions across diverse prompts, enabling robust metrics such as performance quantiles and accurate evaluations under constrained budgets.[51]
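The idea of reporting a performance interval across formats can be illustrated with a small sketch. This is not the FormatSpread or PromptEval algorithm; it only enumerates a few hypothetical formatting variants of a two-field prompt and summarizes accuracies that would be measured under each.

```python
from itertools import product


def prompt_formats(field_names=("Question", "Answer")):
    """Enumerate plausible formatting variants of the same two-field prompt."""
    separators = [": ", " - ", ":\n"]
    casings = [str.title, str.upper, str.lower]
    for separator, casing in product(separators, casings):
        q, a = (casing(name) for name in field_names)
        yield f"{q}{separator}{{question}}\n{a}{separator}"


def performance_interval(accuracies):
    """Summarize accuracies measured under different formats as (min, max, spread)."""
    low, high = min(accuracies), max(accuracies)
    return low, high, high - low
```

Each yielded template keeps a `{question}` placeholder, so it can be filled with `str.format` before being sent to the model under evaluation.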
Retrieval-augmented generation
Retrieval-augmented generation (RAG) is a two-phase process involving document retrieval and answer formulation by a large language model (LLM). The initial phase utilizes dense embeddings to retrieve documents. This retrieval can be based on a variety of database formats depending on the use case, such as a vector database, summary index, tree index, or keyword table index.[52]
In response to a query, a document retriever selects the most relevant documents. This relevance is typically determined by first encoding both the query and the documents into vectors, then identifying documents whose vectors are closest in Euclidean distance to the query vector. Following document retrieval, the LLM generates an output that incorporates information from both the query and the retrieved documents.[53] This method is particularly beneficial for handling proprietary or dynamic information that was not included in the initial training or fine-tuning phases of the model. RAG is also notable for its use of "few-shot" learning, where the model uses a small number of examples, often automatically retrieved from a database, to inform its outputs.
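The retrieval step can be sketched as a nearest-neighbour search, assuming the query and documents have already been encoded into vectors by some embedding model. The function names and the layout of the final prompt are hypothetical.

```python
import math


def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k documents closest to the query in Euclidean distance."""
    order = sorted(range(len(doc_vecs)), key=lambda i: math.dist(query_vec, doc_vecs[i]))
    return order[:k]


def build_rag_prompt(question, documents):
    """Place the retrieved documents before the question (assumed prompt layout)."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

A production system would typically replace the linear scan with an index from a vector database, but the ranking criterion is the same.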
Graph retrieval-augmented generation
GraphRAG[54] (coined by Microsoft Research) is a technique that extends RAG with the use of a knowledge graph (usually, LLM-generated) to allow the model to connect disparate pieces of information, synthesize insights, and holistically understand summarized semantic concepts over large data collections.
It was shown to be effective on datasets like the Violent Incident Information from News Articles (VIINA).[55] By combining LLM-generated knowledge graphs with graph machine learning, GraphRAG substantially improves the comprehensiveness and diversity of generated answers for global sensemaking questions.
Earlier work showed the effectiveness of using a knowledge graph for question answering using text-to-query generation.[56] These techniques can be combined to search across both unstructured and structured data, providing expanded context and improved ranking.
Using language models to generate prompts
Large language models (LLMs) themselves can be used to compose prompts for large language models.[57][58][59][60]
The automatic prompt engineer algorithm uses one LLM to beam search over prompts for another LLM:[61]
There are two LLMs: the target LLM and the prompting LLM.
The prompting LLM is presented with example input-output pairs and asked to generate instructions that could have caused a model following the instructions to generate the outputs, given the inputs.
Each of the generated instructions is used to prompt the target LLM, followed by each of the inputs. The log-probabilities of the outputs are computed and added. This is the score of the instruction.
The highest-scored instructions are given to the prompting LLM for further variations.
Repeat until some stopping criterion is reached, then output the highest-scored instructions.
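The steps above can be sketched as a small beam search. The three callables stand in for model calls and are hypothetical: `propose` and `vary` represent the prompting LLM, and `log_prob` represents the target LLM's log-probability of an output given an instruction and an input. A fixed number of rounds is used here as the stopping criterion.

```python
from typing import Callable, List, Sequence, Tuple


def automatic_prompt_engineer(
    propose: Callable[[Sequence[Tuple[str, str]]], List[str]],
    vary: Callable[[str], List[str]],
    log_prob: Callable[[str, str, str], float],
    pairs: Sequence[Tuple[str, str]],
    rounds: int = 2,
    beam_width: int = 2,
) -> str:
    def score(instruction: str) -> float:
        # Sum of log-probabilities of each output given the instruction and input.
        return sum(log_prob(instruction, x, y) for x, y in pairs)

    candidates = propose(pairs)
    for _ in range(rounds):
        # Keep the highest-scored instructions (duplicates removed, order preserved).
        beam = sorted(dict.fromkeys(candidates), key=score, reverse=True)[:beam_width]
        # Ask for variations of the survivors and score them in the next round.
        candidates = beam + [v for instruction in beam for v in vary(instruction)]
    return max(candidates, key=score)
```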
CoT examples can be generated by LLMs themselves. In "auto-CoT",[62] a library of questions is converted to vectors by a model such as BERT. The question vectors are clustered. Questions nearest to the centroids of each cluster are selected. An LLM does zero-shot CoT on each question. The resulting CoT examples are added to the dataset. When prompted with a new question, CoT examples for the nearest questions can be retrieved and added to the prompt.
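The selection of representative questions can be sketched as follows, assuming the question vectors and their cluster labels have already been computed by an embedding model and a clustering algorithm. Only the "nearest to the centroid" step is shown, and the function name `representatives` is hypothetical.

```python
import math
from collections import defaultdict


def representatives(vectors, labels):
    """For each cluster label, return the index of the vector nearest its centroid."""
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)
    chosen = {}
    for label, indices in members.items():
        dims = len(vectors[indices[0]])
        # Centroid = coordinate-wise mean of the cluster's vectors.
        centroid = [sum(vectors[i][d] for i in indices) / len(indices) for d in range(dims)]
        chosen[label] = min(indices, key=lambda i: math.dist(vectors[i], centroid))
    return chosen
```

The selected indices identify the questions on which zero-shot CoT would then be run to build the example library.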
In-context learning
Prompt engineering can possibly be further enabled by in-context learning, defined as a model's ability to temporarily learn from prompts. The ability for in-context learning is an emergent ability[63] of large language models. In-context learning itself is an emergent property of model scale, meaning breaks[64] in downstream scaling laws occur such that its efficacy increases at a different rate in larger models than in smaller models.[65][16]
In contrast to training and fine-tuning for each specific task, which are not temporary, what has been learnt during in-context learning is of a temporary nature. It does not carry the temporary contexts or biases, except the ones already present in the (pre)training dataset, from one conversation to the other.[66] This result of "mesa-optimization"[67][68] within transformer layers is a form of meta-learning or "learning to learn".[69]
Text-to-image
In 2022, text-to-image models like DALL-E 2, Stable Diffusion, and Midjourney were released to the public.[70] These models take text prompts as input and use them to generate AI art images. Text-to-image models typically do not understand grammar and sentence structure in the same way as large language models,[71] and require a different set of prompting techniques.
Prompt formats
A text-to-image prompt commonly includes a description of the subject of the art (such as bright orange poppies), the desired medium (such as digital painting or photography), style (such as hyperrealistic or pop-art), lighting (such as rim lighting or crepuscular rays), color and texture.[72]
The Midjourney documentation encourages short, descriptive prompts: instead of "Show me a picture of lots of blooming California poppies, make them bright, vibrant orange, and draw them in an illustrated style with colored pencils", an effective prompt might be "Bright orange California poppies drawn with colored pencils".[71]
Word order affects the output of a text-to-image prompt. Words closer to the start of a prompt may be emphasized more heavily.[1]
Artist styles
Some text-to-image models are capable of imitating the style of particular artists by name. For example, the phrase in the style of Greg Rutkowski has been used in Stable Diffusion and Midjourney prompts to generate images in the distinctive style of Polish digital artist Greg Rutkowski.[73]
Negative prompts
[Figure: Demonstration of the effect of negative prompts on images generated with Stable Diffusion. Top: no negative prompt. Centre: "green trees". Bottom: "round stones, round rocks".]
Text-to-image models do not natively understand negation. The prompt "a party with no cake" is likely to produce an image including a cake.[71] As an alternative, negative prompts allow a user to indicate, in a separate prompt, which terms should not appear in the resulting image.[74]
Non-text prompts
Some approaches augment or replace natural language text prompts with non-text input.
Textual inversion and embeddings
For text-to-image models, "Textual inversion"[75] performs an optimization process to create a new word embedding based on a set of example images. This embedding vector acts as a "pseudo-word" which can be included in a prompt to express the content or style of the examples.
Image prompting
In 2023, Meta's AI research released Segment Anything, a computer vision model that can perform image segmentation by prompting. As an alternative to text prompts, Segment Anything can accept bounding boxes, segmentation masks, and foreground/background points.[76]
Using gradient descent to search for prompts
In "prefix-tuning",[77] "prompt tuning" or "soft prompting",[78] floating-point-valued vectors are searched directly by gradient descent, to maximize the log-likelihood on outputs.
Formally, let $\mathbf{E} = \{\mathbf{e}_1, \dots, \mathbf{e}_k\}$ be a set of soft prompt tokens (tunable embeddings), and let $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_m\}$ and $\mathbf{Y} = \{\mathbf{y}_1, \dots, \mathbf{y}_n\}$ be the token embeddings of the input and output respectively. During training, the tunable embeddings, input, and output tokens are concatenated into a single sequence $\text{concat}(\mathbf{E}; \mathbf{X}; \mathbf{Y})$ and fed to the large language model (LLM). The losses are computed over the $\mathbf{Y}$ tokens; the gradients are backpropagated to prompt-specific parameters: in prefix-tuning, they are parameters associated with the prompt tokens at each layer; in prompt tuning, they are merely the soft tokens added to the vocabulary.[79]
Prompt tuning can be stated more formally. Let an LLM be written as $\mathrm{LLM}(X) = F(\mathrm{Emb}(X))$, where $X$ is a sequence of linguistic tokens, $\mathrm{Emb}$ is the token-to-vector function, and $F$ is the rest of the model. In prompt tuning, one provides a set of input-output pairs $\{(X^i, Y^i)\}_i$ and then uses gradient descent to search for $\arg\max_{\tilde{Z}} \sum_i \log \Pr[Y^i \mid \tilde{Z} \ast \mathrm{Emb}(X^i)]$, where $\ast$ denotes concatenation. In words, $\log \Pr[Y^i \mid \tilde{Z} \ast \mathrm{Emb}(X^i)]$ is the log-likelihood of outputting $Y^i$ if the model first encodes the input $X^i$ into the vector $\mathrm{Emb}(X^i)$, then prepends the "prefix vector" $\tilde{Z}$ to it, then applies $F$.
Prefix-tuning is similar, but the "prefix vector" $\tilde{Z}$ is prepended to the hidden states in every layer of the model.
An earlier result[80] uses the same idea of gradient descent search, but is designed for masked language models like BERT, and searches only over token sequences, rather than numerical vectors. Formally, it searches for $\arg\max_{\tilde{X}} \sum_i \log \Pr[Y^i \mid \tilde{X} \ast X^i]$, where $\tilde{X}$ ranges over token sequences of a specified length.
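Returning to the continuous case, the idea of optimizing only a prefix vector while the model stays frozen can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "model" is a fixed two-class linear classifier over the mean of the prefix and one input embedding rather than a transformer, and the gradient of the log-likelihood is written out by hand.

```python
import math

# Frozen parameters: neither W nor EMBED is updated during tuning.
W = [[1.0, -1.0], [-1.0, 1.0]]                    # toy "rest of the model"
EMBED = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}   # toy token embeddings


def class_probs(z, token):
    """Softmax class probabilities for the prefix z followed by one input token."""
    h = [(zi + ei) / 2 for zi, ei in zip(z, EMBED[token])]  # pool prefix and input
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    return [e / sum(exps) for e in exps]


def tune_prefix(pairs, steps=200, lr=0.5):
    """Gradient descent on the negative log-likelihood with respect to z only."""
    z = [0.0, 0.0]  # the only trainable parameters
    for _ in range(steps):
        grad = [0.0, 0.0]
        for token, label in pairs:
            p = class_probs(z, token)
            err = [p[c] - (1.0 if c == label else 0.0) for c in range(2)]
            for d in range(2):
                # d(-log p[label]) / dz_d = 0.5 * sum_c err_c * W[c][d]
                grad[d] += 0.5 * sum(err[c] * W[c][d] for c in range(2))
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z
```

With an all-zero prefix the toy model assigns "good" to class 0; tuning the prefix on pairs that label both tokens as class 1 changes that prediction without touching `W` or `EMBED`.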
Prompt injection
Prompt injection is a family of related computer security exploits carried out by getting a machine learning model (such as an LLM) which was trained to follow human-given instructions to follow instructions provided by a malicious user. This stands in contrast to the intended operation of instruction-following systems, wherein the ML model is intended only to follow trusted instructions (prompts) provided by the ML model's operator.[81][82][83]
Diab, Mohamad; Herrera, Julian; Chernow, Bob (2022-10-28). "Stable Diffusion Prompt Book" (PDF). Retrieved 2023-08-07. Prompt engineering is the process of structuring words that can be interpreted and understood by a text-to-image model. Think of it as the language you need to speak in order to tell an AI model what to draw.
Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (2019). "Language Models are Unsupervised Multitask Learners" (PDF). OpenAI. We demonstrate language models can perform down-stream tasks in a zero-shot setting – without any parameter or architecture modification
"Introducing ChatGPT". OpenAI Blog. 2022-11-30. Retrieved 2023-08-16. what is the fermat's little theorem
Robinson, Reid (August 3, 2023). "How to write an effective GPT-3 or GPT-4 prompt". Zapier. Retrieved 2023-08-14. "Basic prompt: 'Write a poem about leaves falling.' Better prompt: 'Write a poem in the style of Edgar Allan Poe about leaves falling.'"
Wahle, Jan Philip; Ruas, Terry; Xu, Yang; Gipp, Bela (2024). Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (eds.). "Paraphrase Types Elicit Prompt Engineering Capabilities". Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA: Association for Computational Linguistics: 11004–11033. arXiv:2406.19898.
Wiggers, Kyle (2023-06-12). "Meta open sources an AI-powered music generator". TechCrunch. Retrieved 2023-08-15. Next, I gave a more complicated prompt to attempt to throw MusicGen for a loop: "Lo-fi slow BPM electro chill with organic samples."
McCann, Bryan; Shirish, Nitish; Xiong, Caiming; Socher, Richard (2018). "The Natural Language Decathlon: Multitask Learning as Question Answering". arXiv:1806.08730 [cs.CL].
Sanh, Victor; et al. (2021). "Multitask Prompted Training Enables Zero-Shot Task Generalization". arXiv:2110.08207 [cs.LG].
Bach, Stephen H.; Sanh, Victor; Yong, Zheng-Xin; Webson, Albert; Raffel, Colin; Nayak, Nihal V.; Sharma, Abheesht; Kim, Taewoon; M Saiful Bari; Fevry, Thibault; Alyafeai, Zaid; Dey, Manan; Santilli, Andrea; Sun, Zhiqing; Ben-David, Srulik; Xu, Canwen; Chhablani, Gunjan; Wang, Han; Jason Alan Fries; Al-shaibani, Maged S.; Sharma, Shanya; Thakker, Urmish; Almubarak, Khalid; Tang, Xiangru; Radev, Dragomir; Mike Tian-Jian Jiang; Rush, Alexander M. (2022). "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts". arXiv:2202.01279 [cs.LG].
Sahoo, Pranab; Singh, Ayush Kumar; Saha, Sriparna; Jain, Vinija; Mondal, Samrat; Chadha, Aman (2024-02-05). "A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications". arXiv:2402.07927 [cs.AI].
Hu, Hanxu; Lu, Hongyuan; Zhang, Huajian; Song, Yun-Ze; Lam, Wai; Zhang, Yue (2023-10-03). "Chain-of-Symbol Prompting Elicits Planning in Large Language Models". arXiv:2305.10276 [cs.CL].
Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].
Brown, Tom; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared D.; Dhariwal, Prafulla; Neelakantan, Arvind (2020). "Language models are few-shot learners". Advances in Neural Information Processing Systems. 33: 1877–1901. arXiv:2005.14165.
Zhou, Denny; Schärli, Nathanael; Hou, Le; Wei, Jason; Scales, Nathan; Wang, Xuezhi; Schuurmans, Dale; Cui, Claire; Bousquet, Olivier; Le, Quoc; Chi, Ed (2022-05-01). "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models". arXiv:2205.10625 [cs.AI]. ...least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence.
Wang, Xuezhi; Wei, Jason; Schuurmans, Dale; Le, Quoc; Chi, Ed; Narang, Sharan; Chowdhery, Aakanksha; Zhou, Denny (2022-03-01). "Self-Consistency Improves Chain of Thought Reasoning in Language Models". arXiv:2203.11171 [cs.CL].
Diao, Shizhe; Wang, Pengcheng; Lin, Yong; Zhang, Tong (2023-02-01). "Active Prompting with Chain-of-Thought for Large Language Models". arXiv:2302.12246 [cs.CL].
Long, Jieyi (2023-05-15). "Large Language Model Guided Tree-of-Thought". arXiv:2305.08291 [cs.AI].
Yao, Shunyu; Yu, Dian; Zhao, Jeffrey; Shafran, Izhak; Griffiths, Thomas L.; Cao, Yuan; Narasimhan, Karthik (2023-05-17). "Tree of Thoughts: Deliberate Problem Solving with Large Language Models". arXiv:2305.10601 [cs.CL].
Li, Zekun; Peng, Baolin; He, Pengcheng; Galley, Michel; Gao, Jianfeng; Yan, Xifeng (2023). "Guiding Large Language Models via Directional Stimulus Prompting". arXiv:2302.11520 [cs.CL]. The directional stimulus serves as hints or cues for each input query to guide LLMs toward the desired output, such as keywords that the desired summary should include for summarization.
Sclar, Melanie; Choi, Yejin; Tsvetkov, Yulia; Suhr, Alane (2024-07-01). "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting". arXiv:2310.11324 [cs.CL].
Wahle, Jan Philip; Ruas, Terry; Xu, Yang; Gipp, Bela (2024). Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (eds.). "Paraphrase Types Elicit Prompt Engineering Capabilities". Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA: Association for Computational Linguistics: 11004–11033. arXiv:2406.19898.
Sclar, Melanie; Choi, Yejin; Tsvetkov, Yulia; Suhr, Alane (2024-07-01). "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting". arXiv:2310.11324 [cs.CL].
Polo, Felipe Maia; Xu, Ronald; Weber, Lucas; Silva, Mírian; Bhardwaj, Onkar; Choshen, Leshem; de Oliveira, Allysson Flavio Melo; Sun, Yuekai; Yurochkin, Mikhail (2024-10-30). "Efficient multi-prompt evaluation of LLMs". arXiv:2405.17202 [cs.CL].
Edge, Darren; Trinh, Ha; Cheng, Newman; Bradley, Joshua; Chao, Alex; Mody, Apurva; Truitt, Steven; Larson, Jonathan (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization". arXiv:2404.16130 [cs.CL].
Sequeda, Juan; Allemang, Dean; Jacob, Bryon (2023). "A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases". arXiv:2311.07509 [cs.AI].
Singh, Chandan; Morris, John; Aneja, Jyoti; Rush, Alexander; Gao, Jianfeng (October 4, 2022). "Explaining Patterns in Data with Language Models via Interpretable Autoprompting". arXiv:2210.01848 [cs.LG].
Zhou, Yongchao; Ioan Muresanu, Andrei; Han, Ziwen; Paster, Keiran; Pitis, Silviu; Chan, Harris; Ba, Jimmy (2022-11-01). "Large Language Models Are Human-Level Prompt Engineers". arXiv:2211.01910 [cs.LG].
Zhang, Zhuosheng; Zhang, Aston; Li, Mu; Smola, Alex (2022-10-01). "Automatic Chain of Thought Prompting in Large Language Models". arXiv:2210.03493 [cs.CL].
Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL]. In prompting, a pre-trained language model is given a prompt (e.g. a natural language instruction) of a task and completes the response without any further training or gradient updates to its parameters... The ability to perform a task via few-shot prompting is emergent when a model has random performance until a certain scale, after which performance increases to well-above random
Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.
Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].
Musser, George. "How AI Knows Things No One Told It". Scientific American. Retrieved 17 May 2023. By the time you type a query into ChatGPT, the network should be fixed; unlike humans, it should not continue to learn. So it came as a surprise that LLMs do, in fact, learn from their users' prompts – an ability known as in-context learning.
Johannes von Oswald; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv:2212.07677 [cs.LG]. Thus we show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass
"Mesa-Optimization". 31 May 2019. Retrieved 17 May 2023. Mesa-Optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer.
Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL]. Training a model to perform in-context learning can be viewed as an instance of the more general learning-to-learn or meta-learning paradigm
Gal, Rinon; Alaluf, Yuval; Atzmon, Yuval; Patashnik, Or; Bermano, Amit H.; Chechik, Gal; Cohen-Or, Daniel (2022). "An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion". arXiv:2208.01618 [cs.CV]. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model.
Li, Xiang Lisa; Liang, Percy (2021). "Prefix-Tuning: Optimizing Continuous Prompts for Generation". Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 4582–4597. doi:10.18653/V1/2021.ACL-LONG.353. S2CID 230433941. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning... Prefix-tuning draws inspiration from prompting
Lester, Brian; Al-Rfou, Rami; Constant, Noah (2021). "The Power of Scale for Parameter-Efficient Prompt Tuning". Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 3045–3059. arXiv:2104.08691. doi:10.18653/V1/2021.EMNLP-MAIN.243. S2CID 233296808. In this work, we explore "prompt tuning," a simple yet effective mechanism for learning "soft prompts"...Unlike the discrete text prompts used by GPT-3, soft prompts are learned through back-propagation
Sun, Simeng; Liu, Yang; Iter, Dan; Zhu, Chenguang; Iyyer, Mohit (2023). "How Does In-Context Learning Help Prompt Tuning?". arXiv:2302.11521 [cs.CL].
For the Star of Vergina or Macedonian Star, see Vergina Sun. For the archeological site and original Macedonian capital, see Aegae (Macedonia). Municipal unit in GreeceVergina ΒεργίναMunicipal unitVerginaLocation within the regional unit Coordinates: 40°29′N 22°19′E / 40.483°N 22.317°E / 40.483; 22.317CountryGreeceAdministrative regionCentral MacedoniaRegional unitImathiaMunicipalityVeroiaArea • Municipal unit69.0 km2 (26.6 sq ...
Beit El בֵּית אֵל بيت إيل Hebrew transcription(s) • ISO 259 Beit ʔel • Also spelled Bet El (official) Beit El Coordinates: 31°56′37.5531″N 35°13′21.1765″E / 31.943764750°N 35.222549028°E / 31.943764750; 35.222549028Coordinates: 31°56′37.5531″N 35°13′21.1765″E / 31.943764750°N 35.222549028°E / 31.943764750; 35.222549028 Region West Bank District Judea and Samaria Area Founded 1977 Government •...
Upper Paleolithic culture of Europe See also: Prehistoric Europe AurignacianLion drawings from the Chauvet Cave, 37,000 to 33,500 years old, and a map of Aurignacian sites.Geographical rangeEurasiaPeriodUpper PaleolithicDatesc. 43,000 – c. 28,000 BP[1][2]Type siteAurignacPreceded byAhmarian, ChâtelperronianFollowed byGravettian, Mal'ta–Buret' cultureDefined byBreuil and Cartailhac, 1906[3] The expansion of early modern humans from the Levant where the Levantine Au...
For other places with the same name, see Písek (disambiguation). Town in South Bohemian, Czech RepublicPísekTownChurch of the Nativity of the Virgin Mary, town walls and the Otava FlagCoat of armsPísekLocation in the Czech RepublicCoordinates: 49°18′32″N 14°8′51″E / 49.30889°N 14.14750°E / 49.30889; 14.14750Country Czech RepublicRegionSouth BohemianDistrictPísekFounded1254Government • MayorMichal ČapekArea • Total63.23 ...
Cet article est une ébauche concernant la politique mexicaine. Vous pouvez partager vos connaissances en l’améliorant (comment ?) selon les recommandations des projets correspondants. Vice-président de la république fédérale des États-Unis mexicains Emblème du Mexique(1893-1916) José María Pino Suárez,dernier vice-président de la République (1911-1913). Création 10 octobre 1824 Abrogation 19 février 1913 Premier titulaire Nicolás Bravo Rueda Dernier titulaire José Mar...
Searchmont Wagonette (1901) Searchmont Type VI Tourenwagen (1903) Die Searchmont Motor Company war ein US-amerikanischer Automobilhersteller in Philadelphia (Pennsylvania). Inhaltsverzeichnis 1 Beschreibung 2 Modelle 3 Literatur 4 Weblinks 5 Einzelnachweise Beschreibung Das Unternehmen entstand aus der Keystone Motor Company am gleichen Ort, die im Jahre 1900 von Theodore C. Search (dem Inhaber der Stetson Hat Company), Spencer Trask und anderen Geschäftsleuten aufgekauft wurde. Der Geschäf...
Part of the brain Brodmann area 25Brodmann area 25 is shown in orange.Medial surface of the brain with Brodmann's areas numbered.DetailsIdentifiersLatinarea subgenualisNeuroNames1029FMA68622Anatomical terms of neuroanatomy[edit on Wikidata] Brodmann area 25 (BA25) is the subgenual area, area subgenualis or subgenual cingulate area in the cerebral cortex of the brain and delineated based on its cytoarchitectonic characteristics. It is the 25th Brodmann area defined by Korbinian Brodmann ...
Country in West Asia This article is about the country. For other uses, see Jordan (disambiguation).HKJ redirects here. For other uses, see HKJ (disambiguation). Hashemite Kingdom of Jordanالمملكة الأردنية الهاشمية (Arabic)Al-Mamlaka al-Urduniyya al-Hāshimiyya Flag Coat of arms Motto: الله، الوطن، الملكAllāh, al-Waṭan, al-MalikGod, Country, King[1]Anthem: السلام الملكي الأردنيAl-Salām al-Malakī al-UrdunīTh...
Political party in Poland Third Way Trzecia DrogaLeadersSzymon HołowniaWładysław Kosiniak-KamyszFounded27 April 2023IdeologyChristian democracyPro-EuropeanismFactions:AgrarianismLiberalismLiberal conservatismPolitical positionCentre-right[1][2][3]European Parliament groupRenew (PL2050)EPP Group (PSL)Colors Yellow (Poland 2050) Green (Polish Coalition)MembersPoland 2050Polish CoalitionSejm64 / 460Senate12 / 100European Parliament3 / 53Regional as...
نادي الطرف السعودي تأسس عام 1400 هـ الملعب الأحساء السعودية البلد السعودية الدوري دوري الدرجة الثالثة السعودي 2015-2016 2015-2016 الإدارة المالك الهيئة العامة للرياضة سامي الدهام الطقم الأساسي الطقم الاحتياطي تعديل مصدري - تعديل نادي الطرف السعودي هو نادٍ رياضي ثقافي اج�...