<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.css" integrity="sha384-AfEj0r4/OFrOo5t7NnNe46zW/tFgW6x/bCJG8FqQCEo3+Aro6EYUG4+cU+KJWu/X" crossorigin="anonymous" />
<script src="https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.js" integrity="sha384-g7c+Jr9ZivxKLnZTDUhnkOnsh30B4H0rpLUpJ4jAIKs4fnJI+sEnkvrMWph2EDg4" crossorigin="anonymous"></script>
<script src="https://fpcdn.s3.amazonaws.com/apps/polygon-tools/0.4.6/polygon-tools.min.js" type="text/javascript"></script>
<script src="gflownet.js" type="text/javascript"></script>
<link rel="stylesheet" href="main.css" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
</head>
<body>
<div class="content">
<center>
<a href="http://folinoid.com/">[Home]</a>
</center><a name="s1" id="s1"></a>
<h3>
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
</h3>
<center>
<b><a href="https://folinoid.com">Emmanuel Bengio</a>, <a href="https://mj10.github.io/">Moksh Jain</a>, <a href="https://scholar.google.com/citations?user=TpuvCSwAAAAJ&amp;hl=en">Maksym Korablyov</a>, <a href="https://www.cs.mcgill.ca/~dprecup/">Doina Precup</a>, <a href="https://yoshuabengio.org/">Yoshua Bengio</a></b>
</center><br />
<center>
<b><a href="https://arxiv.org/abs/2106.04399">arXiv preprint</a>, <a href="https://github.com/bengioe/gflownet">code</a></b><br />
also see the <b><a href="https://arxiv.org/abs/2111.09266">GFlowNet Foundations</a></b> paper<br />
and a more recent (and thorough) <a href="https://tinyurl.com/gflownet-tutorial">tutorial on the framework</a>.
</center><br />
<i>What follows is a high-level overview of this work; for more details, refer to our paper.</i> Given a reward <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> and a deterministic episodic environment where episodes end with a "generate <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
x
</mi>
</mrow>
<annotation encoding="application/x-tex">
x
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">x</span></span></span></span>" action, how do we generate diverse and high-reward <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
x
</mi>
</mrow>
<annotation encoding="application/x-tex">
x
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">x</span></span></span></span>s?<br />
We propose to use <i>Flow Networks</i> to model discrete <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
p
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
<mo>
∝
</mo>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
p(x) \propto R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mrel">∝</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> from which we can sample sequentially (like episodic RL, rather than iteratively as MCMC methods would). We show that our method, <b>GFlowNet</b>, is very useful on a combinatorial domain, drug molecule synthesis, because unlike RL methods it generates diverse <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
x
</mi>
</mrow>
<annotation encoding="application/x-tex">
x
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">x</span></span></span></span>s by design.<br />
<a name="s2" id="s2"></a>
<h3>
Flow Networks
</h3>A flow network is a directed graph with <i>sources</i> and <i>sinks</i>, and edges carrying some amount of flow between them through intermediate nodes (think of pipes of water). For our purposes, we define a flow network with a single source, the root or <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_0
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>; the sinks of the network correspond to the terminal states. We'll assign to each sink <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
x
</mi>
</mrow>
<annotation encoding="application/x-tex">
x
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">x</span></span></span></span> an "out-flow" <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span>.<br />
<center>
<div class="scontainer" style="width:450px">
<div id="can1_div" style="position: relative;">
<canvas id="can1" width="450px" height="225px"></canvas><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 25px; left: 15px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{0}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">0</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 25px; left: 75px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
1
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{1}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 85px; left: 75px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
2
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{2}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 25px; left: 135px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
3
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{3}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 70px; left: 165px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
x
</mi>
<mn>
3
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
x_{3}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 45px; left: 160px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="normal">
⊤
</mi>
</mrow>
<annotation encoding="application/x-tex">
\top
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord">⊤</span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 100px; left: 135px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
5
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{5}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">5</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 145px; left: 165px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
x
</mi>
<mn>
5
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
x_{5}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">5</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 120px; left: 160px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="normal">
⊤
</mi>
</mrow>
<annotation encoding="application/x-tex">
\top
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord">⊤</span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 25px; left: 195px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
7
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{7}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">7</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 100px; left: 195px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
8
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{8}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">8</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 145px; left: 225px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
x
</mi>
<mn>
8
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
x_{8}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">8</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 120px; left: 220px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="normal">
⊤
</mi>
</mrow>
<annotation encoding="application/x-tex">
\top
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord">⊤</span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 25px; left: 255px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
10
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{10}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">0</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 70px; left: 255px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
11
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{11}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 115px; left: 285px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
x
</mi>
<mn>
11
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
x_{11}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 90px; left: 280px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="normal">
⊤
</mi>
</mrow>
<annotation encoding="application/x-tex">
\top
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord">⊤</span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 47.5px; left: 315px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
13
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{13}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 92.5px; left: 345px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
x
</mi>
<mn>
13
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
x_{13}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 67.5px; left: 340px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="normal">
⊤
</mi>
</mrow>
<annotation encoding="application/x-tex">
\top
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord">⊤</span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 122.5px; left: 255px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
15
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{15}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">5</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 160px; left: 315px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
16
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{16}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">6</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 205px; left: 345px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
x
</mi>
<mn>
16
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
x_{16}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">6</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 180px; left: 340px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="normal">
⊤
</mi>
</mrow>
<annotation encoding="application/x-tex">
\top
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord">⊤</span></span></span></span></span>
</div>
</div>
<script>
<![CDATA[
flownetworkBigger("can1")
]]>
</script>
</center>Given the graph structure and the out-flow of the sinks, we wish to calculate a valid <i>flow</i> between nodes, i.e. how much water each pipe is carrying. In general there can be infinitely many solutions, but this is not a problem here: any valid solution will do. For example, in the figure above, there is almost no flow between <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
7
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_7
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">7</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
13
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{13}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span> that goes through <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
11
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{11}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>; almost all of it goes through <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
10
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{10}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">0</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>, but the reverse solution would also be a valid flow.<br />
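The validity condition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the diamond-shaped network, the node names, the reward value and the helper <code>is_valid_flow</code> are all hypothetical and are not taken from the figure or from the paper's code. The sketch checks that flow is non-negative, conserved at every intermediate node, and that each sink receives exactly its required out-flow.<br />
<pre>
```python
from collections import defaultdict

# Hypothetical diamond-shaped network (an assumption, not the figure above):
# s0 -> a -> c -> x and s0 -> b -> c -> x, where x is the only sink.
REWARD = {"x": 4.0}  # required out-flow R(x) of the sink


def is_valid_flow(edge_flow, reward, source="s0", tol=1e-9):
    """Return True if edge_flow is non-negative, conserved at every
    intermediate node, and delivers exactly R(x) to every sink x."""
    if not all(flow >= 0 for flow in edge_flow.values()):
        return False
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for (parent, child), flow in edge_flow.items():
        outflow[parent] += flow
        inflow[child] += flow
    for node in set(inflow) | set(outflow) | set(reward):
        if node == source:
            continue  # the source only emits flow
        # Sinks must receive R(x); other nodes must pass on what they receive.
        expected = reward[node] if node in reward else outflow[node]
        if abs(inflow[node] - expected) > tol:
            return False
    return True


# Two different assignments for the same sink out-flow; both are valid.
mostly_via_a = {("s0", "a"): 3.75, ("s0", "b"): 0.25,
                ("a", "c"): 3.75, ("b", "c"): 0.25, ("c", "x"): 4.0}
mostly_via_b = {("s0", "a"): 0.25, ("s0", "b"): 3.75,
                ("a", "c"): 0.25, ("b", "c"): 3.75, ("c", "x"): 4.0}
# Node "a" receives 4.0 but only passes on 3.0, so this one is not a flow.
leaky = {("s0", "a"): 4.0, ("a", "c"): 3.0, ("c", "x"): 4.0}

print(is_valid_flow(mostly_via_a, REWARD))
print(is_valid_flow(mostly_via_b, REWARD))
print(is_valid_flow(leaky, REWARD))
```
</pre>
The first two assignments route the same total flow through opposite branches, mirroring the remark that the reverse solution is equally valid.<br />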
Why is this useful? Such a construction corresponds to a generative model. If we follow the flow, we'll end up in a terminal state, a sink, with probability <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
p
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
<mo>
∝
</mo>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
p(x) \propto R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mrel">∝</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span>. On top of that, we'll have the property that the in-flow of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_0
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>--the flow of the unique source--is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mo>
∑
</mo>
<mi>
x
</mi>
</msub>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<mi>
Z
</mi>
</mrow>
<annotation encoding="application/x-tex">
\sum_x R(x)=Z
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;"></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.0016819999999999613em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">x</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.07153em;">Z</span></span></span></span>, the partition function. If we assign to each intermediate node a <i>state</i> and to each edge an <i>action</i>, we recover a useful MDP.<br />
Let <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<mi>
f
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
F(s,a)=f(s,s')
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> be the flow between <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
s
</mi>
</mrow>
<annotation encoding="application/x-tex">
s
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">s</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
<annotation encoding="application/x-tex">
s'
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span>, where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
T
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
<annotation encoding="application/x-tex">
T(s,a)=s'
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">T</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span>, i.e. <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
<annotation encoding="application/x-tex">
s'
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span> is the (deterministic) state transitioned to from state <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
s
</mi>
</mrow>
<annotation encoding="application/x-tex">
s
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">s</span></span></span></span> and action <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
a
</mi>
</mrow>
<annotation encoding="application/x-tex">
a
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">a</span></span></span></span>. Let <span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<mtable rowspacing="0.24999999999999992em" columnalign="right" columnspacing="">
<mtr>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<mi>
π
</mi>
<mo stretchy="false">
(
</mo>
<mi>
a
</mi>
<mi mathvariant="normal">
∣
</mi>
<mi>
s
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<mfrac>
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<mrow>
<munder>
<mo>
∑
</mo>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</munder>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
</mfrac>
</mrow>
</mstyle>
</mtd>
</mtr>
</mtable>
<annotation encoding="application/x-tex">
\begin{aligned}\pi(a|s) = \frac{F(s,a)}{\sum_{a'}F(s,a')}\end{aligned}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.6063550000000004em;"><span style="top:-3.6063549999999998em;"><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">π</span><span class="mopen">(</span><span class="mord mathnormal">a</span><span class="mord"></span><span class="mord mathnormal">s</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord"><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.427em;"><span style="top:-2.314em;"><span class="mord"><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;"></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.17826999999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" 
style="height:0.6778919999999999em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span><span style="top:-3.677em;"><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span></span> then following policy <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
π
</mi>
</mrow>
<annotation encoding="application/x-tex">
\pi
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.03588em;">π</span></span></span></span>, starting from <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_0
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>, leads to terminal state <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
x
</mi>
</mrow>
<annotation encoding="application/x-tex">
x
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">x</span></span></span></span> with probability proportional to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> (see the paper for proofs and more rigorous explanations).<br />
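To make the claim above concrete, here is a minimal sketch on a hypothetical toy graph that is not taken from the paper. The node names, rewards and hand-picked edge flows are all assumptions chosen only so that in-flow equals out-flow at every node. The code derives the policy from the edge flows and pushes probability mass from the source to the sinks; each sink should then receive probability R(x)/Z.

```python
from fractions import Fraction

# Hypothetical toy DAG (an assumption for illustration, not from the paper).
# Rewards at the three sinks:
R = {"x1": Fraction(1), "x2": Fraction(2), "x3": Fraction(3)}

# Hand-picked edge flows F[(s, s_next)], chosen so that in-flow equals
# out-flow at every intermediate node and in-flow equals R(x) at every sink.
F = {
    ("s0", "s1"): Fraction(3, 2),
    ("s0", "s2"): Fraction(9, 2),
    ("s1", "x1"): Fraction(1),
    ("s1", "x2"): Fraction(1, 2),
    ("s2", "x2"): Fraction(3, 2),
    ("s2", "x3"): Fraction(3),
}
ORDER = ["s0", "s1", "s2"]  # topological order of the non-terminal states


def policy(s):
    """pi(a|s) = F(s,a) / sum over a' of F(s,a'); actions are named by child."""
    out = {child: f for (parent, child), f in F.items() if parent == s}
    total = sum(out.values())
    return {child: f / total for child, f in out.items()}


def terminal_distribution():
    """Push probability mass from s0 through the DAG under the policy."""
    mass = {"s0": Fraction(1)}
    for s in ORDER:
        for child, prob in policy(s).items():
            mass[child] = mass.get(child, Fraction(0)) + mass[s] * prob
    return {x: mass[x] for x in R}


Z = sum(R.values())  # partition function, also the total flow leaving s0
p = terminal_distribution()
for x in R:
    print(x, "p(x) =", p[x], " R(x)/Z =", R[x] / Z)
```

Exact fractions are used instead of floats so that the equality between the sampled distribution and R(x)/Z can be checked without rounding error. Note that sink x2 is reachable through both s1 and s2, which is the case where the flow view matters.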
<a name="s3" id="s3"></a>
<h3>
Approximating Flow Networks
</h3>As you may suspect, there are only a few scenarios in which we can build the above graph explicitly. For drug-like molecules, it would have around <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mn>
1
</mn>
<msup>
<mn>
0
</mn>
<mn>
16
</mn>
</msup>
</mrow>
<annotation encoding="application/x-tex">
10^{16}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">6</span></span></span></span></span></span></span></span></span></span></span></span> nodes!<br />
Instead, we resort to function approximation, just like deep RL resorts to it when computing the (action-)value functions of MDPs.<br />
Our goal here is to approximate the flow <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
F(s,a)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span></span></span></span>. Earlier we called a flow <i>valid</i> if it correctly routes all the flow from the source to the sinks through the intermediate nodes. Let's be more precise. For some node <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
<annotation encoding="application/x-tex">
s'
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span>, let the in-flow <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
F(s')
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> be the sum of incoming flows: <span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<mtable rowspacing="0.24999999999999992em" columnalign="right" columnspacing="">
<mtr>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<munder>
<mo>
∑
</mo>
<mrow>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo>
:
</mo>
<mi>
T
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
</munder>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
</mstyle>
</mtd>
</mtr>
</mtable>
<annotation encoding="application/x-tex">
\begin{aligned}F(s') = \sum_{s,a:T(s,a)=s'} F(s,a)\end{aligned}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.683005em;"><span style="top:-3.683005em;"><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mrel">=</span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.808995em;margin-left:0em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">a</span><span class="mrel mtight">:</span><span class="mord mathnormal mtight" style="margin-right:0.13889em;">T</span><span class="mopen mtight">(</span><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">a</span><span class="mclose mtight">)</span><span class="mrel mtight">=</span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span><span 
style="top:-3.0500049999999996em;"><span><span class="mop op-symbol large-op"></span></span></span></span><span class="vlist-s"></span></span></span></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span></span> Here the set <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mo stretchy="false">
{
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo>
:
</mo>
<mi>
T
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
}
</mo>
</mrow>
<annotation encoding="application/x-tex">
\{s,a:T(s,a)=s'\}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mopen">{</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mrel">:</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">T</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">}</span></span></span></span> is the set of state-action pairs that lead to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
<annotation encoding="application/x-tex">
s'
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span>. Now, let the out-flow be the sum of outgoing flows--or the reward if <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
<annotation encoding="application/x-tex">
s'
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span> is terminal: <span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<mtable rowspacing="0.24999999999999992em" columnalign="right" columnspacing="">
<mtr>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mo>
+
</mo>
<munder>
<mo>
∑
</mo>
<mrow>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo>
∈
</mo>
<mi mathvariant="script">
A
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
</munder>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo separator="true">
,
</mo>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mi mathvariant="normal">
.
</mi>
</mrow>
</mstyle>
</mtd>
</mtr>
</mtable>
<annotation encoding="application/x-tex">
\begin{aligned}F(s') = R(s') + \sum_{a'\in\mathcal{A}(s')} F(s',a').\end{aligned}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.683005em;"><span style="top:-3.683005em;"><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mrel">=</span><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mbin">+</span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.808995em;margin-left:0em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord 
mtight"></span></span></span></span></span></span></span></span></span><span class="mrel mtight"></span><span class="mord mtight"><span class="mord mathcal mtight">A</span></span><span class="mopen mtight">(</span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.0500049999999996em;"><span><span class="mop op-symbol large-op"></span></span></span></span><span class="vlist-s"></span></span></span></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord">.</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span></span> Note that we reused <span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
F(s')
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>. This is because for a valid flow, the in-flow is equal to the out-flow, i.e. the flow through <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
<annotation encoding="application/x-tex">
s'
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
F(s')
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>. Here <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="script">
A
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
\mathcal{A}(s)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathcal">A</span></span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mclose">)</span></span></span></span> is the set of valid actions in state <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
s
</mi>
</mrow>
<annotation encoding="application/x-tex">
s
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">s</span></span></span></span>, which is the empty set when <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
s
</mi>
</mrow>
<annotation encoding="application/x-tex">
s
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">s</span></span></span></span> is a sink. <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
R(s)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mclose">)</span></span></span></span> is 0 unless <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
s
</mi>
</mrow>
<annotation encoding="application/x-tex">
s
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">s</span></span></span></span> is a sink, in which case <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo stretchy="false">
)
</mo>
<mo>
&gt;
</mo>
<mn>
0
</mn>
</mrow>
<annotation encoding="application/x-tex">
R(s)&gt;0
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mclose">)</span><span class="mrel">&gt;</span></span><span class="base"><span class="mord">0</span></span></span></span>.<br />
We can thus call the set of these equalities for all states <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo mathvariant="normal">
≠
</mo>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s'\neq s_0
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.751892em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mrel"><span class="mrel"><span class="mord vbox"><span class="thinbox"><span class="rlap"><span class="inner"><span class="mrel"></span></span></span></span></span></span><span class="mrel">=</span></span></span><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span> the <i>flow consistency equations</i>: <span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<mtable rowspacing="0.24999999999999992em" columnalign="right" columnspacing="">
<mtr>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<munder>
<mo>
∑
</mo>
<mrow>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo>
:
</mo>
<mi>
T
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
</munder>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mo>
+
</mo>
<munder>
<mo>
∑
</mo>
<mrow>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo>
∈
</mo>
<mi mathvariant="script">
A
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
</munder>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo separator="true">
,
</mo>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mi mathvariant="normal">
.
</mi>
</mrow>
</mstyle>
</mtd>
</mtr>
</mtable>
<annotation encoding="application/x-tex">
\begin{aligned}\sum_{s,a:T(s,a)=s'} F(s,a) = R(s') + \sum_{a'\in\mathcal{A}(s')} F(s',a').\end{aligned}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.683005em;"><span style="top:-3.683005em;"><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.808995em;margin-left:0em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">a</span><span class="mrel mtight">:</span><span class="mord mathnormal mtight" style="margin-right:0.13889em;">T</span><span class="mopen mtight">(</span><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">a</span><span class="mclose mtight">)</span><span class="mrel mtight">=</span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.0500049999999996em;"><span><span class="mop op-symbol large-op"></span></span></span></span><span class="vlist-s"></span></span></span></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span 
class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mbin">+</span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.808995em;margin-left:0em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mrel mtight"></span><span class="mord mtight"><span class="mord mathcal mtight">A</span></span><span class="mopen mtight">(</span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.0500049999999996em;"><span><span class="mop op-symbol large-op"></span></span></span></span><span class="vlist-s"></span></span></span></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span 
class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord">.</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span></span>
<center>
<div class="scontainer" style="width:200px">
<div id="can2_div" style="position: relative;">
<canvas id="can2" width="200px" height="135px"></canvas><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 37.5px; left: 45px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
a
</mi>
<mn>
1
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
a_1
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 60px; left: 45px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
a
</mi>
<mn>
7
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
a_7
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">7</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 82.5px; left: 45px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
a
</mi>
<mn>
3
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
a_3
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 37.5px; left: 105px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
a
</mi>
<mn>
4
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
a_4
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">4</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 60px; left: 105px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
a
</mi>
<mn>
2
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
a_2
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 82.5px; left: 105px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
a
</mi>
<mn>
8
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
a_8
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">8</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 25px; left: 15px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{0}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">0</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 70px; left: 15px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
1
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{1}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 115px; left: 15px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
2
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{2}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 70px; left: 75px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
3
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{3}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 25px; left: 135px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
4
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{4}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">4</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 70px; left: 135px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
5
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{5}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">5</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span><span style="position: absolute; transform: translate(-50%, -55%) scale(0.9, 0.9); top: 115px; left: 135px;"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
s
</mi>
<mn>
6
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
s_{6}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">6</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span>
</div>
</div>
<script>
<![CDATA[
flownetworkEq("can2")
]]>
</script>
</center>Here the set of parents <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mo stretchy="false">
{
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo>
:
</mo>
<mi>
T
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<msub>
<mi>
s
</mi>
<mn>
3
</mn>
</msub>
<mo stretchy="false">
}
</mo>
</mrow>
<annotation encoding="application/x-tex">
\{s,a:T(s,a)=s_3\}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mopen">{</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mrel">:</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">T</span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mclose">}</span></span></span></span> is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mo stretchy="false">
{
</mo>
<mo stretchy="false">
(
</mo>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
<mo separator="true">
,
</mo>
<msub>
<mi>
a
</mi>
<mn>
1
</mn>
</msub>
<mo stretchy="false">
)
</mo>
<mo separator="true">
,
</mo>
<mo stretchy="false">
(
</mo>
<msub>
<mi>
s
</mi>
<mn>
1
</mn>
</msub>
<mo separator="true">
,
</mo>
<msub>
<mi>
a
</mi>
<mn>
7
</mn>
</msub>
<mo stretchy="false">
)
</mo>
<mo separator="true">
,
</mo>
<mo stretchy="false">
(
</mo>
<msub>
<mi>
s
</mi>
<mn>
2
</mn>
</msub>
<mo separator="true">
,
</mo>
<msub>
<mi>
a
</mi>
<mn>
3
</mn>
</msub>
<mo stretchy="false">
)
</mo>
<mo stretchy="false">
}
</mo>
</mrow>
<annotation encoding="application/x-tex">
\{(s_0, a_1), (s_1, a_7), (s_2, a_3)\}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mopen">{</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mclose">)</span><span class="mpunct">,</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">7</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span 
class="mclose">)</span><span class="mpunct">,</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mclose">)</span><span class="mclose">}</span></span></span></span>, and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi mathvariant="script">
A
</mi>
<mo stretchy="false">
(
</mo>
<msub>
<mi>
s
</mi>
<mn>
3
</mn>
</msub>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<mo stretchy="false">
{
</mo>
<msub>
<mi>
a
</mi>
<mn>
2
</mn>
</msub>
<mo separator="true">
,
</mo>
<msub>
<mi>
a
</mi>
<mn>
4
</mn>
</msub>
<mo separator="true">
,
</mo>
<msub>
<mi>
a
</mi>
<mn>
8
</mn>
</msub>
<mo stretchy="false">
}
</mo>
</mrow>
<annotation encoding="application/x-tex">
\mathcal{A}(s_3)=\{a_2,a_4,a_8\}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathcal">A</span></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mopen">{</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">4</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">8</span></span></span></span><span 
class="vlist-s"></span></span></span></span></span><span class="mclose">}</span></span></span></span>.<br />
By now our RL senses should be tingling. We've defined a value function recursively, with two quantities that need to match.<br />
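As a quick sanity check, the flow consistency equation for s_3 in the figure above can be evaluated numerically. The sketch below is only illustrative: the flow values, the string labels, and the helper consistency_gap are assumptions made up for this example, not part of the original method.<br />

```python
import math

# Edge flows F(s, a) over the parents of s3 listed in the text:
# {(s0, a1), (s1, a7), (s2, a3)}. The numeric values are made up.
inflow = {("s0", "a1"): 1.0, ("s1", "a7"): 2.5, ("s2", "a3"): 0.5}

# Edge flows F(s3, a') for a' in A(s3) = {a2, a4, a8}. Values are made up.
outflow = {("s3", "a2"): 2.0, ("s3", "a4"): 1.5, ("s3", "a8"): 0.5}

# s3 has valid actions, so it is not a sink and R(s3) = 0.
reward_s3 = 0.0


def consistency_gap(inflow, outflow, reward):
    """Return in-flow minus (reward + out-flow); zero means the equation holds."""
    return sum(inflow.values()) - (reward + sum(outflow.values()))


# Interior state: the in-flow of 4.0 matches the out-flow of 4.0.
print(math.isclose(consistency_gap(inflow, outflow, reward_s3), 0.0, abs_tol=1e-12))  # True

# Hypothetical sink: no valid actions, so the in-flow should equal the reward.
# Here the in-flow is 3.0 but the reward is 1.0, so the equation is violated.
print(consistency_gap({("s", "a"): 3.0}, {}, 1.0))  # 2.0
```

A gap of zero means the in-flow matches the reward plus the out-flow. For a sink, where the set of valid actions is empty, the in-flow has to equal the reward alone.<br />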
<a name="s4" id="s4"></a>
<h4>
A TD-Like Objective
</h4>Just as one can cast the Bellman equations into TD objectives, we can cast the flow consistency equations into a training objective. We want an <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
F
</mi>
<mi>
θ
</mi>
</msub>
</mrow>
<annotation encoding="application/x-tex">
F_\theta
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.02778em;">θ</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span> that minimizes the squared difference between the two sides of the equations, but we add a few bells and whistles: <span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<mtable rowspacing="0.24999999999999992em" columnalign="right" columnspacing="">
<mtr>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<msub>
<mi mathvariant="script">
L
</mi>
<mrow>
<mi>
θ
</mi>
<mo separator="true">
,
</mo>
<mi>
ϵ
</mi>
</mrow>
</msub>
<mo stretchy="false">
(
</mo>
<mi>
τ
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<munder>
<mo>
∑
</mo>
<mpadded lspace="-0.5width" width="0px">
<mrow>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo>
∈
</mo>
<mi>
τ
</mi>
<mo mathvariant="normal">
≠
</mo>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
</mrow>
</mpadded>
</munder>
<mtext>
</mtext>
<msup>
<mrow>
<mo fence="true">
(
</mo>
<mi>
log
</mi>
<mo>
</mo>
<mtext>
</mtext>
<mrow>
<mo fence="true">
[
</mo>
<mi>
ϵ
</mi>
<mo>
+
</mo>
<munder>
<mo>
∑
</mo>
<mpadded lspace="-0.5width" width="0px">
<mrow>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo>
:
</mo>
<mi>
T
</mi>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
</mrow>
</mpadded>
</munder>
<mi>
exp
</mi>
<mo>
</mo>
<msubsup>
<mi>
F
</mi>
<mi>
θ
</mi>
<mi>
log
</mi>
<mo>
</mo>
</msubsup>
<mo stretchy="false">
(
</mo>
<mi>
s
</mi>
<mo separator="true">
,
</mo>
<mi>
a
</mi>
<mo stretchy="false">
)
</mo>
<mo fence="true">
]
</mo>
</mrow>
<mo>
−
</mo>
<mi>
log
</mi>
<mo>
</mo>
<mtext>
</mtext>
<mrow>
<mo fence="true">
[
</mo>
<mi>
ϵ
</mi>
<mo>
+
</mo>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mo>
+
</mo>
<munder>
<mo>
∑
</mo>
<mpadded lspace="-0.5width" width="0px">
<mrow>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo>
∈
</mo>
<mi mathvariant="script">
A
</mi>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
</mrow>
</mpadded>
</munder>
<mi>
exp
</mi>
<mo>
</mo>
<msubsup>
<mi>
F
</mi>
<mi>
θ
</mi>
<mi>
log
</mi>
<mo>
</mo>
</msubsup>
<mo stretchy="false">
(
</mo>
<msup>
<mi>
s
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo separator="true">
,
</mo>
<msup>
<mi>
a
</mi>
<mo mathvariant="normal" lspace="0em" rspace="0em">
′
</mo>
</msup>
<mo stretchy="false">
)
</mo>
<mo fence="true">
]
</mo>
</mrow>
<mo fence="true">
)
</mo>
</mrow>
<mn>
2
</mn>
</msup>
<mi mathvariant="normal">
.
</mi>
</mrow>
</mstyle>
</mtd>
</mtr>
</mtable>
<annotation encoding="application/x-tex">
\begin{aligned}\mathcal{L}_{\theta,\epsilon}(\tau) = \sum_{\mathclap{s'\in\tau\neq s_0}}\,\left(\log\! \left[\epsilon+{\sum_{\mathclap{s,a:T(s,a)=s'}}} \exp F^{\log}_\theta(s,a)\right]- \log\! \left[\epsilon + R(s') + \sum_{\mathclap{a'\in{\cal A}(s')}} \exp F^{\log}_\theta(s',a')\right]\right)^2.\end{aligned}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.3025189999999993em;"><span style="top:-4.302518999999999em;"><span class="mord"><span class="mord"><span class="mord"><span class="mord mathcal">L</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361079999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.02778em;">θ</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">ϵ</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.1132em;">τ</span><span class="mclose">)</span><span class="mrel">=</span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8478869999999998em;margin-left:0em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord vbox mtight"><span class="thinbox mtight"><span class="clap mtight"><span class="inner mtight"><span><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mrel mtight"></span><span class="mord mathnormal mtight" style="margin-right:0.1132em;">τ</span><span 
class="mrel mtight"><span class="mrel mtight"><span class="mord vbox mtight"><span class="thinbox mtight"><span class="rlap mtight"><span class="inner"><span class="mrel mtight"></span></span></span></span></span></span><span class="mrel mtight">=</span></span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.0500049999999996em;"><span><span class="mop op-symbol large-op"></span></span></span></span><span class="vlist-s"></span></span></span></span><span class="minner"><span class="minner"><span class="mopen"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05002em;"><span style="top:-2.2500000000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-3.2550000000000003em;"><span class="overlay" style="height:0.3em;width:0.875em;"><svg width="0.875em" height="0.3em" style="width:0.875em" viewbox="0 0 875 300" preserveaspectratio="xMinYMin">
<path d="M291 0 H417 V300 H291 z"></path></svg></span></span><span style="top:-4.05002em;"><span class="delimsizinginner delim-size4"><span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="minner"><span class="mopen"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.0510099999999998em;"><span style="top:-2.2500000000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-2.8099900000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-4.05101em;"><span class="delimsizinginner delim-size4"><span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mord mathnormal">ϵ</span><span class="mbin">+</span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.808995em;margin-left:0em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord vbox mtight"><span class="thinbox mtight"><span class="clap mtight"><span class="inner mtight"><span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">a</span><span class="mrel mtight">:</span><span class="mord mathnormal mtight" style="margin-right:0.13889em;">T</span><span class="mopen mtight">(</span><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">a</span><span class="mclose mtight">)</span><span class="mrel mtight">=</span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" 
style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.0500049999999996em;"><span><span class="mop op-symbol large-op"></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mop">exp</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:-0.13889em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.02778em;">θ</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mop mtight"><span class="mtight">l</span><span class="mtight">o</span><span class="mtight" style="margin-right:0.01389em;">g</span></span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">s</span><span class="mpunct">,</span><span class="mord mathnormal">a</span><span class="mclose">)</span><span class="mclose"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.0510099999999998em;"><span style="top:-2.2500000000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-2.8099900000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-4.05101em;"><span class="delimsizinginner delim-size4"><span></span></span></span></span><span 
class="vlist-s"></span></span></span></span></span></span><span class="mbin"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="minner"><span class="mopen"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.0510099999999998em;"><span style="top:-2.2500000000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-2.8099900000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-4.05101em;"><span class="delimsizinginner delim-size4"><span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mord mathnormal">ϵ</span><span class="mbin">+</span><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mbin">+</span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.808995em;margin-left:0em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord vbox mtight"><span class="thinbox mtight"><span class="clap mtight"><span class="inner mtight"><span><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing 
reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mrel mtight"></span><span class="mord mtight"><span class="mord mtight"><span class="mord mathcal mtight">A</span></span></span><span class="mopen mtight">(</span><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828285714285715em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span></span><span style="top:-3.0500049999999996em;"><span><span class="mop op-symbol large-op"></span></span></span></span><span class="vlist-s"></span></span></span></span><span class="mop">exp</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:-0.13889em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.02778em;">θ</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mop mtight"><span class="mtight">l</span><span class="mtight">o</span><span class="mtight" style="margin-right:0.01389em;">g</span></span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" 
style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.801892em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"></span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mclose"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.0510099999999998em;"><span style="top:-2.2500000000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-2.8099900000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-4.05101em;"><span class="delimsizinginner delim-size4"><span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span><span class="mclose"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05002em;"><span style="top:-2.2500000000000004em;"><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-3.2550000000000003em;"><span class="overlay" style="height:0.3em;width:0.875em;"><svg width="0.875em" height="0.3em" style="width:0.875em" viewbox="0 0 875 300" preserveaspectratio="xMinYMin">
<path d="M457 0 H583 V300 H457 z"></path></svg></span></span><span style="top:-4.05002em;"><span class="delimsizinginner delim-size4"><span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:2.2550179999999997em;"><span style="top:-4.50391em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mord">.</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span></span></span> First, we match the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
log
</mi>
<mo>
</mo>
</mrow>
<annotation encoding="application/x-tex">
\log
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span></span></span></span> of each side, which is important since as intermediate nodes get closer to the root, their flow will become exponentially bigger (remember that <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
F
</mi>
<mo stretchy="false">
(
</mo>
<msub>
<mi>
s
</mi>
<mn>
0
</mn>
</msub>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<mi>
Z
</mi>
<mo>
=
</mo>
<msub>
<mo>
∑
</mo>
<mi>
x
</mi>
</msub>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
F(s_0) = Z = \sum_x R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.07153em;">Z</span><span class="mrel">=</span></span><span class="base"><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;"></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.0016819999999999613em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">x</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span>), but we care equally about all nodes. Second, we predict <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msubsup>
<mi>
F
</mi>
<mi>
θ
</mi>
<mi>
log
</mi>
<mo>
</mo>
</msubsup>
<mo>
≈
</mo>
<mi>
log
</mi>
<mo>
</mo>
<mi>
F
</mi>
</mrow>
<annotation encoding="application/x-tex">
F^{\log}_\theta\approx\log F
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999999em;"><span style="top:-2.3986920000000005em;margin-left:-0.13889em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.02778em;">θ</span></span></span><span style="top:-3.1809080000000005em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mop mtight"><span class="mtight">l</span><span class="mtight">o</span><span class="mtight" style="margin-right:0.01389em;">g</span></span></span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mrel"></span></span><span class="base"><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span></span></span></span> for the same reasons. Finally, we add an <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
ϵ
</mi>
</mrow>
<annotation encoding="application/x-tex">
\epsilon
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">ϵ</span></span></span></span> value inside the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
log
</mi>
<mo>
</mo>
</mrow>
<annotation encoding="application/x-tex">
\log
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span></span></span></span>; this doesn't change the minima of the objective, but gives more gradient weight to large values and less to small values.<br />
We show in the paper that a minimizer of this objective achieves our desideratum, which is to have <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
p
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
<mo>
∝
</mo>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
p(x)\propto R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mrel">∝</span></span><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> when sampling from <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
π
</mi>
<mo stretchy="false">
(
</mo>
<mi>
a
</mi>
<mi mathvariant="normal">
∣
</mi>
<mi>
s
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
\pi(a|s)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.03588em;">π</span><span class="mopen">(</span><span class="mord mathnormal">a</span><span class="mord">∣</span><span class="mord mathnormal">s</span><span class="mclose">)</span></span></span></span> as defined above.<br />
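The objective can be sketched for a single trajectory as follows. This is a minimal pure-Python illustration, not the paper's implementation: the four-state DAG, its rewards, the table <code>log_flow</code> (standing in for the learned log-flow network) and all function names are assumptions made for the example.<br />

```python
import math

# Hypothetical toy DAG: TRANSITIONS maps (state, action) to the child state.
TRANSITIONS = {
    ("s0", "a"): "s1",
    ("s0", "b"): "s2",
    ("s1", "c"): "s3",
    ("s2", "d"): "s3",
}
# Assumed rewards: zero for interior states, positive for the terminal state.
REWARD = {"s0": 0.0, "s1": 0.0, "s2": 0.0, "s3": 2.0}


def parents(state):
    """All (s, a) pairs such that T(s, a) == state."""
    return [sa for sa, child in TRANSITIONS.items() if child == state]


def actions(state):
    """All actions available in `state`, i.e. A(state)."""
    return [a for (s, a) in TRANSITIONS if s == state]


def flow_matching_loss(log_flow, trajectory, eps=1e-3):
    """Squared log-difference between in-flow and (reward + out-flow),
    summed over every state of the trajectory except the root s0."""
    total = 0.0
    for s_next in trajectory[1:]:
        inflow = sum(math.exp(log_flow[sa]) for sa in parents(s_next))
        outflow = sum(math.exp(log_flow[(s_next, a)]) for a in actions(s_next))
        diff = math.log(eps + inflow) - math.log(eps + REWARD[s_next] + outflow)
        total += diff ** 2
    return total


# A consistent flow: one unit on every edge, so two units reach s3 (R = 2).
consistent = {sa: math.log(1.0) for sa in TRANSITIONS}
# An inconsistent flow: three units leave s1 although only one unit enters it.
inconsistent = dict(consistent)
inconsistent[("s1", "c")] = math.log(3.0)

trajectory = ["s0", "s1", "s3"]
loss_ok = flow_matching_loss(consistent, trajectory)
loss_bad = flow_matching_loss(inconsistent, trajectory)
print(loss_ok, loss_bad)
```

The consistent flow gives zero loss on this trajectory, while the inconsistent one is penalized both at <code>s1</code> and at <code>s3</code>. In practice the table would be replaced by a neural network and the loss minimized by gradient descent over sampled trajectories.<br />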
<a name="s5" id="s5"></a>
<h3>
GFlowNet as Amortized Sampling with an OOD Potential
</h3>It is interesting to compare GFlowNet with Markov chain Monte Carlo (MCMC) methods. MCMC methods can be used to sample from a distribution for which there is no analytical sampling formula but an energy function or unnormalized probability function is available. In our context, this unnormalized probability function is our reward function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
<mo>
=
</mo>
<msup>
<mi>
e
</mi>
<mrow>
<mo>
−
</mo>
<mi>
e
</mi>
<mi>
n
</mi>
<mi>
e
</mi>
<mi>
r
</mi>
<mi>
g
</mi>
<mi>
y
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
</msup>
</mrow>
<annotation encoding="application/x-tex">
R(x)=e^{-energy(x)}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mrel">=</span></span><span class="base"><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8879999999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight">n</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.02778em;">r</span><span class="mord mathnormal mtight" style="margin-right:0.03588em;">g</span><span class="mord mathnormal mtight" style="margin-right:0.03588em;">y</span><span class="mopen mtight">(</span><span class="mord mathnormal mtight">x</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span></span></span></span>.<br />
Like MCMC methods, GFlowNet can turn a given energy function into samples, but it does so in an amortized way, converting the cost of many very expensive MCMC trajectories (to obtain each sample) into the cost of training a generative model (in our case a generative policy which sequentially builds up <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
x
</mi>
</mrow>
<annotation encoding="application/x-tex">
x
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal">x</span></span></span></span>). Sampling from the generative model is then very cheap (e.g. adding one component at a time to a molecule) compared to running an MCMC chain. The most important gain, however, may not be computational but the ability to discover new modes of the reward function.<br />
MCMC methods are iterative, making many small noisy steps, which can converge in the neighborhood of a mode, and with some probability jump from one mode to a nearby one. However, if two modes are far from each other, MCMC can require <i>exponential</i> time to mix between the two. If in addition the modes occupy a tiny volume of the state space, the chance of initializing a chain near one of the unknown modes is also tiny, and the MCMC approach becomes unsatisfactory. Whereas such a situation seems hopeless with MCMC, GFlowNet has the potential to discover modes and jump there directly, if there is structure that relates the modes that it already knows, and if its inductive biases and training procedure make it possible to generalize there.<br />
GFlowNet does not need to know perfectly where the modes are: it is sufficient to make guesses which occasionally work well. As with MCMC methods, once a point in the region of a new mode is discovered, further training of GFlowNet will sculpt that mode and zoom in on its peak.<br />
Note that we can raise <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> to some power <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
β
</mi>
</mrow>
<annotation encoding="application/x-tex">
\beta
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.05278em;">β</span></span></span></span>, a coefficient which acts like an inverse temperature, giving <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<msup>
<mo stretchy="false">
)
</mo>
<mi>
β
</mi>
</msup>
<mo>
=
</mo>
<msup>
<mi>
e
</mi>
<mrow>
<mo>
−
</mo>
<mi>
β
</mi>
<mtext>
</mtext>
<mi>
e
</mi>
<mi>
n
</mi>
<mi>
e
</mi>
<mi>
r
</mi>
<mi>
g
</mi>
<mi>
y
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
</msup>
</mrow>
<annotation encoding="application/x-tex">
R(x)^\beta = e^{-\beta\; energy(x)}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05278em;">β</span></span></span></span></span></span></span></span><span class="mrel">=</span></span><span class="base"><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8879999999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mathnormal mtight" style="margin-right:0.05278em;">β</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight">n</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.02778em;">r</span><span class="mord mathnormal mtight" style="margin-right:0.03588em;">g</span><span class="mord mathnormal mtight" style="margin-right:0.03588em;">y</span><span class="mopen mtight">(</span><span class="mord mathnormal mtight">x</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span></span></span></span>, making it possible to focus more or less on the highest modes (versus spreading probability mass more uniformly).<br />
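The effect of the exponent β on the target distribution can be illustrated with a small sketch. The three reward values and the function name <code>tempered_distribution</code> are hypothetical, chosen only to show how the normalized probabilities change with β.<br />

```python
def tempered_distribution(rewards, beta):
    """Normalize R(x) ** beta into a probability distribution over candidates."""
    powered = [r ** beta for r in rewards]
    z = sum(powered)  # partition function for this value of beta
    return [p / z for p in powered]


rewards = [1.0, 2.0, 4.0]  # hypothetical rewards for three candidates
for beta in (0.5, 1.0, 4.0):
    probs = tempered_distribution(rewards, beta)
    print(beta, [round(p, 3) for p in probs])
```

With β above 1 most of the probability mass moves to the highest-reward candidate, while β below 1 spreads it more evenly across the three.<br />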
<a name="s6" id="s6"></a>
<h3>
Generating molecule graphs
</h3>The motivation for this work is to be able to generate diverse molecules from a proxy reward <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
</mrow>
<annotation encoding="application/x-tex">
R
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span></span></span></span> that is imprecise because it comes from biochemical simulations that have high uncertainty. As such, we do not care about the maximizer as RL methods would, but rather about a set of "good enough" candidates to send to a true biochemical assay.<br />
Another motivation is diversity: by fitting the distribution of rewards rather than trying to maximize the expected reward, we're likely to find more modes than a method that turns greedy once it has found a good enough mode, which we have repeatedly observed RL methods such as PPO to do.<br />
Here we generate molecule graphs via a sequence of additive edits, i.e. we progressively build the graph by adding new leaf nodes to it. We also create molecules block-by-block rather than atom-by-atom.<br />
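The additive construction can be sketched as below. This is only an illustration of building a graph by attaching leaf nodes one block at a time: the block vocabulary, the single-block starting state, the stopping probability and the uniform random policy are all assumptions, standing in for the learned policy and the real fragment library.<br />

```python
import random

BLOCKS = ["ring", "chain", "amine"]  # hypothetical building-block vocabulary


def add_leaf(graph, parent, block):
    """Return a new graph with one extra leaf node attached to `parent`."""
    nodes, edges = graph
    new_index = len(nodes)
    return (nodes + [block], edges + [(parent, new_index)])


def sample_graph(max_blocks, rng):
    """Build a graph block by block with a uniform random placeholder policy."""
    graph = ([rng.choice(BLOCKS)], [])  # assumed start: a single block
    while len(graph[0]) != max_blocks:
        if rng.random() > 0.8:  # assumed probability of the stop action
            break
        parent = rng.randrange(len(graph[0]))  # attach to any existing node
        graph = add_leaf(graph, parent, rng.choice(BLOCKS))
    return graph


rng = random.Random(0)
nodes, edges = sample_graph(5, rng)
print(nodes)
print(edges)
```

Because every edit only attaches a new leaf, the result is always a tree, and each state can typically be reached through several different orders of edits, which is why the state space forms a DAG rather than a tree.<br />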
Experimentally, we obtain molecules that are both high-reward and diverse. We compare against PPO and MARS (an MCMC-based method).<br />
Figure 3 shows that the distribution we fit is sensible. If we change the reward by raising it to a power, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msup>
<mi>
R
</mi>
<mi>
β
</mi>
</msup>
</mrow>
<annotation encoding="application/x-tex">
R^\beta
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05278em;">β</span></span></span></span></span></span></span></span></span></span></span> with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
β
</mi>
<mo>
&gt;
</mo>
<mn>
1
</mn>
</mrow>
<annotation encoding="application/x-tex">
\beta&gt;1
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.05278em;">β</span><span class="mrel">&gt;</span></span><span class="base"><span class="mord">1</span></span></span></span>, this shifts the reward distribution to the right.<br />
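One way to see this shift: if candidates are sampled with probability proportional to R(x)^β, a larger β puts more mass on high-reward candidates. The following is a minimal sketch with made-up reward values (not data from the experiments); the function names are chosen for illustration only.<br />

```python
import numpy as np

def sampling_probs(rewards, beta):
    """p(x) proportional to R(x)**beta over a finite set of candidates."""
    weights = np.asarray(rewards, dtype=float) ** beta
    return weights / weights.sum()

def expected_reward(rewards, beta):
    """Mean reward of samples drawn from p(x) proportional to R(x)**beta."""
    return float(np.dot(sampling_probs(rewards, beta), rewards))

rewards = [1.0, 2.0, 4.0]  # illustrative values only
for beta in (1.0, 2.0, 4.0):
    probs = sampling_probs(rewards, beta).round(3)
    print(beta, probs, round(expected_reward(rewards, beta), 3))
```

With these toy values the mean sampled reward grows as β increases, which is the rightward shift described above.<br />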
Figure 4 shows the top-<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
k
</mi>
</mrow>
<annotation encoding="application/x-tex">
k
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span> found as a function of the number of episodes.<br />
<center>
<img src="gfn_fig34.png" width="650px" />
</center><br />
Finally, Figure 5 shows that, when a biochemical measure of diversity is used to estimate the number of distinct modes found, GFlowNet finds far more varied candidates.<br />
<center>
<img src="gfn_fig5.png" width="650px" />
</center><br />
<a name="s7" id="s7"></a>
<h4>
Active Learning experiments
</h4>The above experiments assume access to a reward <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
</mrow>
<annotation encoding="application/x-tex">
R
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span></span></span></span> that is cheap to evaluate. In fact, they use a neural network <i>proxy</i> trained on a large dataset of molecules. This setup isn't quite what we would get when interacting with biochemical assays, where we'd have access to far less data. To emulate such a setting, we consider our oracle to be a <i>docking simulation</i> (which is relatively expensive to run, ~30 CPU seconds).<br />
In this setting, there is a limited budget for calls to the true oracle <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
O
</mi>
</mrow>
<annotation encoding="application/x-tex">
O
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.02778em;">O</span></span></span></span>. We use a proxy <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
M
</mi>
</mrow>
<annotation encoding="application/x-tex">
M
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.10903em;">M</span></span></span></span> initialized by training on a limited dataset of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo separator="true">
,
</mo>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
(x, R(x))
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mclose">)</span></span></span></span> pairs <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
D
</mi>
<mn>
0
</mn>
</msub>
</mrow>
<annotation encoding="application/x-tex">
D_0
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal" style="margin-right:0.02778em;">D</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.02778em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>, where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
<mo stretchy="false">
(
</mo>
<mi>
x
</mi>
<mo stretchy="false">
)
</mo>
</mrow>
<annotation encoding="application/x-tex">
R(x)
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> is the true reward from the oracle. The generative model (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
π
</mi>
<mi>
θ
</mi>
</msub>
</mrow>
<annotation encoding="application/x-tex">
\pi_{\theta}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">π</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.02778em;">θ</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>) is then trained to fit <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
R
</mi>
</mrow>
<annotation encoding="application/x-tex">
R
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.00773em;">R</span></span></span></span> but as predicted by the proxy <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
M
</mi>
</mrow>
<annotation encoding="application/x-tex">
M
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.10903em;">M</span></span></span></span>. We then sample a batch <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
B
</mi>
<mo>
=
</mo>
<mo stretchy="false">
{
</mo>
<msub>
<mi>
x
</mi>
<mn>
1
</mn>
</msub>
<mo separator="true">
,
</mo>
<msub>
<mi>
x
</mi>
<mn>
2
</mn>
</msub>
<mo separator="true">
,
</mo>
<mo>
…
</mo>
<msub>
<mi>
x
</mi>
<mi>
k
</mi>
</msub>
<mo stretchy="false">
}
</mo>
</mrow>
<annotation encoding="application/x-tex">
B=\{x_1, x_2, \dots x_k\}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.05017em;">B</span><span class="mrel">=</span></span><span class="base"><span class="mopen">{</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mpunct">,</span><span class="minner">…</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mclose">}</span></span></span></span> where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<msub>
<mi>
x
</mi>
<mi>
i
</mi>
</msub>
<mo>
∼
</mo>
<msub>
<mi>
π
</mi>
<mi>
θ
</mi>
</msub>
</mrow>
<annotation encoding="application/x-tex">
x_i\sim \pi_{\theta}
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s"></span></span></span></span></span><span class="mrel">∼</span></span><span class="base"><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">π</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.02778em;">θ</span></span></span></span></span><span class="vlist-s"></span></span></span></span></span></span></span></span>, which is evaluated with the oracle <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
O
</mi>
</mrow>
<annotation encoding="application/x-tex">
O
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.02778em;">O</span></span></span></span>. The proxy <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
M
</mi>
</mrow>
<annotation encoding="application/x-tex">
M
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.10903em;">M</span></span></span></span> is updated with this newly acquired and labeled batch, and the process is repeated for <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow>
<mi>
N
</mi>
</mrow>
<annotation encoding="application/x-tex">
N
</annotation>
</semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="mord mathnormal" style="margin-right:0.10903em;">N</span></span></span></span> iterations.<br />
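The loop described above can be sketched on a toy problem. Everything in this sketch is a stand-in chosen for illustration, not our actual setup: the candidates are integers rather than molecules, the oracle is a cheap closed-form function rather than a docking simulation, the proxy is a nearest-neighbour lookup rather than a neural network, and the generative model is replaced by direct sampling with probability proportional to the proxy's prediction.<br />

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = np.arange(20)  # toy stand-in for the space of molecules

def oracle(x):
    """Stand-in for the expensive oracle O; always positive."""
    return 1.0 + np.exp(-0.5 * ((x - 14) / 2.0) ** 2)

def fit_proxy(xs, ys):
    """Stand-in for the proxy M: predict the label of the nearest labeled x."""
    xs, ys = np.asarray(xs), np.asarray(ys)

    def predict(x):
        dist = np.abs(xs[None, :] - np.asarray(x)[:, None])
        return ys[dist.argmin(axis=1)]

    return predict

def sample_batch(proxy, k):
    """Stand-in for pi_theta: sample x with probability proportional to M(x)."""
    p = proxy(candidates)
    return rng.choice(candidates, size=k, p=p / p.sum())

# D_0: a small initial labeled dataset of (x, R(x)) pairs.
data_x = [0, 5, 10]
data_y = [float(oracle(x)) for x in data_x]

N, k = 5, 4  # number of rounds and oracle calls per round
for _ in range(N):
    proxy = fit_proxy(data_x, data_y)   # (re)train M on everything labeled so far
    batch = sample_batch(proxy, k)      # B = {x_1, ..., x_k}, x_i ~ pi_theta
    data_x += [int(x) for x in batch]   # label the batch with the oracle O
    data_y += [float(oracle(x)) for x in batch]

print(len(data_x), max(data_y))
```

The total number of oracle calls is fixed in advance (the size of D_0 plus N times k), which is the limited budget mentioned above.<br />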
Running this active learning procedure in the molecule setting, we again find that we generate better molecules, which showcases the importance of having diverse candidates.<br />
<center>
<img src="gfn_fig7.png" width="325px" />
</center><br />
For more figures, experiments and explanations, check out <a href="https://arxiv.org/abs/2106.04399">the paper</a>, or reach out to us!<br />
</div>
<div style="height: 10em;"></div>
</body>
</html>