Draft:Explore-then-commit algorithm
Where to get help
How to improve a draft
You can also browse Wikipedia:Featured articles and Wikipedia:Good articles to find examples of Wikipedia's best writing on topics similar to your proposed article. Improving your odds of a speedy review To improve your odds of a faster review, tag your draft with relevant WikiProject tags using the button below. This will let reviewers know a new draft has been submitted in their area of interest. For instance, if you wrote about a female astronomer, you would want to add the Biography, Astronomy, and Women scientists tags. Editor resources
|
Comment: Source 2 is on arXiv, which is considered unreliable as there is no editorial or peer review needed to publish there. monkeysmashingkeyboards (talk) 21:57, 20 February 2026 (UTC)
Explore Then Commit (ETC) is an algorithm for the Multi-armed bandit problem where you have to do a trade-off between exploration/exploitation.
Multi-armed bandit problem
The Multi-armed bandit problem is a sequential game where one player has to choose at each turn between actions (arms). Behind every arm there is an unknown distribution that lies in a set known by the player (for example, can be the set of Gaussian distributions or Bernoulli distributions).
At each turn the player chooses (pulls) an arm , he then gets an observation of the distribution .
Regret minimization
The goal is to minimize the regret at time that is defined as
where
- is the mean of arm
- is the highest mean
- is the number of pulls of arm up to turn
The player has to find an algorithm that chooses at each turn which arm to pull based on the previous actions and observations to minimize the regret .
This is a trade-off problem between exploration to find the best arm (the arm with the highest mean) and exploitation to play as much as possible the arm that we think is the best arm.[1]
Algorithm

The idea of the algorithm is to explore each arm times. Then for the rest of the game the algorithm exploit by playing the arm with the highest mean. If the horizon is known then the number of exploration can depends on .
Adaptations of the algorithm exist[2] and can be found in the litterature for other settings[3].
Pseudocode
The player choose M
for each arm i do:
select arm i M times
update empirical mean mu[i]
for t from MK+1 to T do:
select arm a with highestempirical mean mu[a]
Theoritical results

When all arms are -sub gaussian, by choosing to explore each arm times, the regret at time verify
We can see the first term as the cost of the exploration
And the second term as the cost of not having exploring enough, leading to a probability of not having an optimal arm as the arm with the highest empirical mean.
Increasing increase the first term while decreasing the second term. The best possible must depends on the which is unknown by the player.
For two arms with Gaussian distribution of variance it was proved that ETC can't achieve the asymptotic optimal regret of the Equation of Lai-Robbins.[4]
References
- ^ a b Lattimore, Tor; Szepesvári, Csaba (2020). Bandit Algorithms. Cambridge University Press.
- ^ Jin, Tianyuan; Xu, Pan; Xiao, Xiaokui; Gu, Quanquan (2021). "Double Explore-then-Commit: Asymptotic Optimality and Beyond". Conference on Learning Theory. PMLR. pp. 2584–2633.
- ^ Nie, Guanyu; Agarwal, Mridul; Umrawal, Abhishek Kumar; Aggarwal, Vaneet; Quinn, Christopher John (2022). "An Explore-then-Commit Algorithm for Submodular Maximization under Full-Bandit Feedback". Uncertainty in Artificial Intelligence. PMLR. pp. 1541–1551.
- ^ Garivier, Aurélien; Kaufmann, Emilie; Lattimore, Tor (2016). "On Explore-Then-Commit Strategies". Advances in Neural Information Processing Systems 29.
Content Disclaimer
Informasi ini disarikan dari Wikipedia dan disajikan kembali untuk tujuan edukasi. Konten tersedia di bawah lisensi CC BY-SA 3.0. Kami tidak bertanggung jawab atas ketidakakuratan data yang bersumber dari kontribusi publik tersebut.
- The information displayed on this website is sourced in part or in whole from Wikipedia and has been adapted for the purpose of restating it. We strive to provide accurate and relevant information, however:
- There is no guarantee of absolute accuracy. Wikipedia is an open, collaborative project that can be edited by anyone, so information is subject to change.
- It is not intended to constitute professional advice. The content displayed is for informational and educational purposes only. For important decisions (e.g., medical, legal, or financial), please consult a professional.
- Content copyright. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike License (CC BY-SA). This means that content may be reused with appropriate attribution and shared under a similar license.
- Responsible use. Any risk arising from the use of information from this website is entirely the responsibility of the user.

- provide significant coverage: discuss the subject in detail, not just brief mentions or routine announcements;
- are reliable: from reputable outlets with editorial oversight;
- are independent: not connected to the subject, such as interviews, press releases, the subject's own website, or sponsored content.
Please add references that meet all three of these criteria. If none exist, the subject is not yet suitable for Wikipedia.