Graduation Term
Spring 2025
Degree Name
Master of Science (MS)
Department
Department of Mathematics
Committee Chair
Mehdi Karimi
Committee Member
Xing Wang
Abstract
The landscape of decision-making in uncertain environments, as encapsulated by the multi-armed bandit (MAB) problem, traditionally oscillates between risk-neutral strategies that maximize expected rewards and risk-averse tactics that minimize potential losses. This thesis introduces a novel paradigm within this domain, advocating a risk-seeking approach in which decision-making is skewed towards maximizing the potential high returns found in the upper quantiles of reward distributions. This shift in focus addresses high-stakes domains such as financial investments and personalized healthcare, where the pursuit of extraordinary outcomes justifies greater risk exposure. The risk-seeking model proposed herein departs from conventional strategies by prioritizing arms that offer potentially higher but less predictable rewards. This approach is critical in explore-then-commit settings, a framework well suited to environments where exploration carries a cost and the opportunities for exploitation are limited. Unlike typical risk-averse models that rely on mean-variance or Conditional Value at Risk (CVaR) criteria to curb the downside, the risk-seeking model posits that, in certain contexts, the optimal strategy is not always to pull the arm with the highest expected reward; instead, the focus shifts to arms that could yield high rewards even if their average gains are not the highest.

This thesis presents a dual contribution to the field of MAB problems. First, it develops a class of risk-seeking algorithms, termed OTE/FTE-RS-MAB (One/Finite-Time Exploitation Risk-Seeking Multi-Armed Bandit). These algorithms are designed to select an arm based on its potential to deliver the maximum reward over a single exploitation or a given finite number of exploitations, rather than across an extended series of trials.
An analytical framework is also established to calculate the minimal number of exploratory pulls required to ensure that the exploitation phase remains within a predefined threshold of regret, thus providing robustness against the variability inherent in risk-seeking approaches. Second, the thesis delves into the practical application of these theoretical constructs through a number of examples. In scenarios where exploration incurs significant costs, the proposed c-OTE-RS-MAB algorithm optimizes the trade-off between exploration expenses and regret. This is particularly important in settings such as clinical trials or one-off financial investments, where decisions have profound implications, and the luxury of repeated trials is absent. By synthesizing insights from a comprehensive study of existing literature and integrating innovative risk-seeking strategies, this work not only challenges the traditional paradigms of the MAB framework but also enhances the decision-making toolkit available for scenarios where potential high rewards justify higher risks. The findings underscore the necessity of re-evaluating the utility of conventional risk measures in favor of strategies that embrace the full spectrum of outcomes, including those that prioritize exceptional gains over consistent returns.
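To make the risk-seeking explore-then-commit idea concrete, the sketch below implements a generic version of the strategy described above: explore every arm a fixed number of times, rank arms by an empirical upper quantile of their observed rewards rather than by their sample mean, and commit the remaining finite exploitation budget to the winner. This is an illustrative assumption-laden sketch, not the thesis's OTE/FTE-RS-MAB algorithm; the function names, the quantile level `q=0.9`, and the deterministic `make_arm` helper are all hypothetical choices made for this example.

```python
from itertools import cycle

def explore_then_commit_rs(arms, pulls_per_arm, q=0.9, exploit_pulls=1):
    """Generic explore-then-commit with a risk-seeking (upper-quantile) rule.

    Explore each arm `pulls_per_arm` times, rank arms by the empirical
    q-quantile of their observed rewards (not the sample mean), then
    commit to the winning arm for a finite number of exploitation pulls.
    """
    samples = [[arm() for _ in range(pulls_per_arm)] for arm in arms]

    def upper_quantile(xs):
        # Empirical q-quantile: the value at rank floor(q * n) of the
        # sorted sample (clamped to the last index).
        xs = sorted(xs)
        return xs[min(len(xs) - 1, int(q * len(xs)))]

    best = max(range(len(arms)), key=lambda i: upper_quantile(samples[i]))
    payoff = sum(arms[best]() for _ in range(exploit_pulls))
    return best, payoff

def make_arm(values):
    """Deterministic arm that cycles through a fixed reward sequence
    (used here so the example is reproducible)."""
    it = cycle(values)
    return lambda: next(it)

# Arm 0: steady reward of 1.0 (mean 1.0, 90th percentile 1.0).
# Arm 1: mostly 0 but occasionally 3.0 (mean 0.3, 90th percentile 3.0).
steady = make_arm([1.0])
volatile = make_arm([0.0] * 9 + [3.0])
best, payoff = explore_then_commit_rs([steady, volatile], pulls_per_arm=50, q=0.9)
# A mean-based rule would commit to arm 0; the quantile rule picks arm 1,
# the arm with the lower average but the higher upper-quantile reward.
```

The example deliberately pairs a low-mean, heavy-upper-tail arm against a steady one: a risk-neutral (mean-based) rule commits to the steady arm, while the upper-quantile criterion commits to the volatile arm, which is exactly the behavioral shift the risk-seeking framework advocates.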
Access Type
Thesis-Open Access
Recommended Citation
Vahidi, Ali, "Risk-Seeking Multi-Armed Bandits in an Explore-Then-Commit Setting" (2025). Theses and Dissertations. 2082.
https://ir.library.illinoisstate.edu/etd/2082