A1631
Title: Bayesian Bandit portfolio: Customized Thompson sampling for investor preference
Authors: Vlad Bolovaneanu - Bucharest Academy of Economic Studies (Romania) [presenting]
Daniel Traian Pele - Bucharest University of Economic Studies, Institute for Economic Forecasting, Romanian Academy (Romania)
Abstract: Investor preference contributes decisively to the structure of financial portfolios. Allocation techniques that favor a more conservative or a riskier approach are common practice. The advent of reinforcement learning (RL) for portfolio optimization has created new opportunities for maximizing an arbitrary objective in the intricate market environment. Agents learn to navigate the market by maximizing a custom metric which rewards good decisions and punishes bad ones. Starting from trial-and-error strategies alone, agents gradually improve over time. The multi-armed bandit problem, a well-known RL problem, has rarely been pursued in the portfolio optimization literature. A novel approach is proposed using Thompson sampling (TS), which deviates from current research in how sampling is performed. Whilst all works using TS so far have focused on its standard form with a binomial distribution, here samples are drawn from the posterior of the return distribution. A tractable distribution is obtained by modeling the tails as Pareto-distributed. The approach makes it possible to encode investor preference in depth in the optimization metric and is competitive with other well-known portfolios.
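
The general idea of Thompson sampling from a posterior over returns, rather than over a binary win/loss outcome, can be illustrated with a minimal sketch. It is not the authors' method: in place of the Pareto-tail model mentioned above, it assumes a Normal prior on each asset's mean return with a known noise variance, which gives a closed-form conjugate update. The function name `thompson_gaussian` and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def thompson_gaussian(returns, prior_mean=0.0, prior_var=1.0, noise_var=1.0, seed=0):
    """Pick one asset per period by Thompson sampling.

    returns: array of shape (T, K) with per-period returns of K assets.
    Assumption (not from the abstract): Normal prior on each mean return,
    Normal noise with known variance `noise_var`.
    """
    rng = np.random.default_rng(seed)
    T, K = returns.shape
    post_mean = np.full(K, prior_mean, dtype=float)
    post_var = np.full(K, prior_var, dtype=float)
    choices = np.empty(T, dtype=int)

    for t in range(T):
        # Draw one plausible mean return per asset from its current posterior.
        sampled = rng.normal(post_mean, np.sqrt(post_var))
        k = int(np.argmax(sampled))
        choices[t] = k

        # Observe only the chosen asset's return, then apply the conjugate update.
        r = returns[t, k]
        precision = 1.0 / post_var[k] + 1.0 / noise_var
        post_mean[k] = (post_mean[k] / post_var[k] + r / noise_var) / precision
        post_var[k] = 1.0 / precision

    return choices, post_mean
```

In this sketch the reward is the raw return; an investor-specific metric could replace `np.argmax(sampled)` with the argmax of any utility computed from the sampled values, which is where a preference for conservative or risky allocations would enter.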