package obandit

OCaml Multi-Armed Bandits

Sources

obandit-0.3.4.tbz
sha256=0a84abe0b800b06a14b302e632403950e9552d9fc0b5b2cf09d7262a4ddad7dd
md5=f5aa2c86eb25d4fad308d3de0dbc9288

Module Obandit.MakeUCB1

The UCB1 bandit for stochastic regret minimization.

Parameters

module P : KBanditParam

Signature

type bandit = banditEstimates

The internal data structure of the bandit algorithm.

val initialBandit : bandit

The initial state of the bandit algorithm.

val step : bandit -> float -> int * bandit

step b r advances the bandit game by one step, where b is the current state of the bandit and r is the reward for the last action. The call returns the next action, encoded as an integer in $ \{ 0, \ldots , K-1 \} $, together with the new state of the bandit. The admissible reward range depends on the bandit algorithm in use. The first reward provided to the algorithm is discarded, since no action precedes it.
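The calling pattern implied by this signature can be illustrated with a small self-contained sketch. It does not use the obandit library: `SketchUCB1` is a hypothetical stand-in that mirrors the documented names (`bandit`, `initialBandit`, `step`) and applies the textbook UCB1 index, with rewards assumed to lie in [0, 1]. The field `k` of `KBanditParam`, the internal record layout, and the helper `simulate` are assumptions made for illustration only.

```ocaml
(* Assumed shape of the parameter module: only the number of arms. *)
module type KBanditParam = sig
  val k : int
end

(* Hypothetical stand-in with the documented signature; not obandit's code. *)
module SketchUCB1 (P : KBanditParam) = struct
  type bandit = {
    t : int;             (* number of calls to step so far *)
    last : int;          (* action returned by the previous call *)
    counts : int array;  (* number of pulls per arm *)
    sums : float array;  (* cumulated reward per arm *)
  }

  let initialBandit =
    { t = 0; last = 0; counts = Array.make P.k 0; sums = Array.make P.k 0.0 }

  let step b r =
    (* Copy the arrays so that earlier states are left untouched. *)
    let counts = Array.copy b.counts and sums = Array.copy b.sums in
    (* The first reward has no preceding action, so it is discarded. *)
    if b.t > 0 then begin
      counts.(b.last) <- counts.(b.last) + 1;
      sums.(b.last) <- sums.(b.last) +. r
    end;
    let total = Array.fold_left ( + ) 0 counts in
    (* Textbook UCB1 index: empirical mean plus exploration bonus.
       Arms that were never pulled get an infinite index. *)
    let index i =
      if counts.(i) = 0 then infinity
      else
        let n = float_of_int counts.(i) in
        (sums.(i) /. n) +. sqrt (2.0 *. log (float_of_int total) /. n)
    in
    let best = ref 0 in
    for i = 1 to P.k - 1 do
      if index i > index !best then best := i
    done;
    (!best, { t = b.t + 1; last = !best; counts; sums })
end

module B = SketchUCB1 (struct let k = 3 end)

(* Toy environment (assumption): arm 1 always pays 1.0, the others 0.0.
   Returns how often each arm was chosen over n steps. *)
let simulate n =
  let pulls = Array.make 3 0 in
  let rec loop b r i =
    if i < n then begin
      let a, b' = B.step b r in
      pulls.(a) <- pulls.(a) + 1;
      let r' = if a = 1 then 1.0 else 0.0 in
      loop b' r' (i + 1)
    end
  in
  (* The reward passed on the very first call is a dummy value. *)
  loop B.initialBandit 0.0 0;
  pulls
```

The state is threaded explicitly through the loop: each call to `step` consumes the reward earned by the previously returned action and yields the next action with an updated state. With the real library, the same loop shape should apply to `Obandit.MakeUCB1`, subject to the actual definition of `KBanditParam`.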
