A variational approach to Pontryagin's Maximum Principle for stochastic optimal control and stopping problems.

The thesis develops a new variational approach to Pontryagin’s Maximum Principle for stochastic optimal control problems, including settings where the terminal time is itself a control variable. By combining functional analysis, backward stochastic differential equations (BSDEs), and control theory, it establishes necessary optimality conditions for random dynamical systems and illustrates them on quadratic control and natural resource valuation problems.

Introduction

This thesis develops a variational framework for deriving Pontryagin’s Maximum Principle in stochastic optimal control and stopping problems. It establishes necessary conditions for optimality through functional-analytic methods and adjoint calculus via backward stochastic differential equations. This approach was first developed by Lev Pontryagin in the 1950s for deterministic systems [1] and was later extended to stochastic systems, where the theory was given a complete formalization by Yong and Zhou [2]. Another classical method for this class of problems is Dynamic Programming, introduced by Bellman, which uses recursion to obtain both necessary and sufficient optimality conditions; it requires solving the Hamilton-Jacobi-Bellman (HJB) partial differential equation to obtain the value function. This thesis focuses on the PMP to avoid dealing with viscosity solutions of the HJB equation, whose complexity escalates rapidly with the dimension of the state space.

This thesis is divided into three main sections. Section 3 introduces the theory of backward stochastic differential equations (BSDEs) for both deterministic and random final times, which are needed to represent the adjoint dynamics. Section 4 formulates the stochastic optimal control problem with deterministic final time and proves the PMP using a variational approach based on Gâteaux and Fréchet differentiability of the control-to-state map. Section 5 extends the PMP to the case where the terminal time is a stopping time and part of the control.

Novelties

The contributions of this thesis cover: (i) an alternative proof of the stochastic PMP using Gâteaux/Fréchet differentiability of the control-to-state map; (ii) an extension to random terminal times where the stopping time is part of the control; (iii) applications to quadratic control and natural resource valuation.

The thesis introduces the required functional-analytic and stochastic-calculus background and develops BSDE theory for both deterministic and random final times. It then proves the PMP with a variational approach and finally extends this framework to the case where the final time is part of the control.

BSDE

Backward Stochastic Differential Equations (BSDEs) were first introduced by Pardoux and Peng [3] and serve as the mathematical backbone for representing adjoint dynamics. Unlike standard SDEs, which evolve forward from a given initial condition, BSDEs run backward in time from a terminal condition and require adapted solutions.

Unlike the case of an ODE, one cannot simply “revert” the time variable: in the case of BSDEs we look for a solution that is adapted to a given filtration, and time reversal would destroy adaptedness.

Deterministic Final Time

Given a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge0}, \mathbb{P})$ supporting a $d$-dimensional Brownian motion $W_t$, we consider BSDEs in the form:

\[Y_t = \xi + \int_{t}^{T} f(s, Y_s, Z_s) \mathrm{d} s - \int_{t}^{T} Z_s \mathrm{d} W_s,\]

where the pair $(Y,Z)$ is the unknown adapted solution, $\xi:\Omega \to \mathbb{R}^n$ is the terminal condition, required to be $\mathcal{F}_T$-measurable, and $f$ is the generator, which we require to be measurable in its second and third arguments. The martingale representation theorem guarantees the existence of the process $Z$, which “corrects” the backward evolution of $Y$ and restores adaptedness.

Theorem. Under standard Lipschitz and integrability conditions on $\xi$ and $f$, the BSDE admits a unique square-integrable solution $(Y,Z)$.

This is proved via a fixed-point argument on a suitable weighted Banach space.
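The thesis contains no numerics, but the backward structure can be illustrated with a toy regression scheme (all parameter choices below are our own, not taken from the thesis). With zero generator and $\xi = W_T$, the exact solution is $Y_t = \mathbb{E}[W_T \mid \mathcal{F}_t] = W_t$ with $Z_t \equiv 1$, so a least-squares approximation of the conditional expectations should recover $Z \approx 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy BSDE with zero generator f = 0 and terminal condition xi = W_T.
# The exact solution is Y_t = W_t with Z_t = 1, so the regression-based
# backward scheme below should recover Z close to 1.
T, N, M = 1.0, 50, 20_000          # horizon, time steps, Monte Carlo paths
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), size=(M, N))
W = np.concatenate([np.zeros((M, 1)), dW.cumsum(axis=1)], axis=1)

Y = W[:, -1].copy()                # terminal condition Y_T = xi = W_T
Z_est = np.empty(N)
for n in range(N - 1, -1, -1):
    # Linear regression basis (constant + current state) for the
    # conditional expectations E[ . | F_{t_n}].
    X = np.column_stack([np.ones(M), W[:, n]])
    # Z_{t_n} ~ E[Y_{t_{n+1}} * dW_n | F_{t_n}] / dt
    coef_Z, *_ = np.linalg.lstsq(X, Y * dW[:, n] / dt, rcond=None)
    Z_est[n] = (X @ coef_Z).mean()
    # Backward step: Y_{t_n} = E[Y_{t_{n+1}} | F_{t_n}]  (f = 0 here)
    coef_Y, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y = X @ coef_Y
```

The linear basis is exact for this example; for a nonlinear generator one would enlarge the basis and add the $f$ term to the backward step.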

Random Final Time

Fix a time horizon $T>0$ and let the terminal time be an $(\mathcal{F}_t)$-stopping time $\tau$ such that $\tau\leq T$ a.s. The BSDE then takes the form, for all $t \ge 0$,

\[Y_t = \xi + \int_{t \wedge \tau}^{\tau} f(s, Y_s, Z_s) \mathrm{d} s - \int_{t \wedge \tau}^{\tau} Z_s \mathrm{d} W_s,\]

where $\xi$ is an $\mathcal{F}_\tau$-measurable random variable and $f$ is measurable in its second and third arguments.

Theorem. Under standard Lipschitz and integrability conditions on $\xi$ and $f$, the BSDE with random terminal time admits a unique square-integrable solution $(Y,Z)$.

Again, the proof relies on a fixed-point argument, but the presence of the stopping time requires additional technical care to ensure measurability and integrability up to $\tau$.
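As a sanity check on the stopped dynamics (a toy illustration of our own, not from the thesis): with zero generator the solution is $Y_t = \mathbb{E}[\xi \mid \mathcal{F}_t]$, constant after $\tau$. Taking $\xi = W_\tau^2$ with $\tau$ the first exit time of $(-1,1)$ capped at $T$, optional stopping applied to the martingale $W_t^2 - t$ gives $Y_0 = \mathbb{E}[W_\tau^2] = \mathbb{E}[\tau]$, which Monte Carlo confirms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Bounded stopping time tau = min(first exit of (-1, 1), T). The identity
# E[W_tau^2] = E[tau] holds exactly for the discretized random walk too,
# since W_n^2 - n*dt is a discrete martingale.
T, N, M = 1.0, 200, 20_000
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), size=(M, N))
W = np.concatenate([np.zeros((M, 1)), dW.cumsum(axis=1)], axis=1)

exited = np.abs(W) >= 1.0
# index of first exit, or N if the path never leaves (-1, 1) before T
first = np.where(exited.any(axis=1), exited.argmax(axis=1), N)
tau = first * dt
W_tau = W[np.arange(M), first]

Y0 = (W_tau ** 2).mean()           # Monte Carlo estimate of Y_0 = E[xi]
```

Note that $Y_0$ and the mean of $\tau$ agree up to Monte Carlo error, as predicted.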

PMP: deterministic final time

The following proof of the PMP is an alternative to the more common approach using the spike variation method presented in Yong and Zhou [2].

Let $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ be a filtered probability space supporting a $d$-dimensional Brownian motion and let $T>0$ be a fixed finite horizon.

To formulate the stochastic optimal control problem, we need first the controlled state equation:

\[\begin{cases} \mathrm{d} X_t = b(t, X_t, u_t) \mathrm{d} t + \sigma(t, X_t, u_t)\mathrm{d} W_t,\\ X_0 = x_0 \in \mathbb{R}^n, \end{cases}\]

where the control $u$ belongs to an admissible set $\mathcal{U}_{ad}$ of progressively measurable, $p$-integrable processes with $p>2$, taking values in a convex set $U\subseteq\mathbb{R}^k$. The coefficients $b$ and $\sigma$ satisfy classical Lipschitz continuity assumptions, so that the state equation admits a unique $p$-integrable solution. The objective is to minimize the cost functional

\[J(u, X):=\mathbb{E}\left[\int_0^T f(s,X_s, u_s) \mathrm{d} s + h(X_T)\right],\]

where $f$ and $h$ are continuous functions representing the running cost and the terminal cost, respectively. Rigorously, we seek $\overline{u} \in \mathcal{U}_{ad}$ such that

\[J(\overline{u}, \overline{X}) = \inf_{u \in \mathcal{U}_{ad}}J(u,X)\]

where $\overline{X}$ is the corresponding solution to the state equation with control $\overline{u}$.

First, we introduce the control-to-state map $S:\mathcal{U}_{ad} \to \mathcal{X}$, where $\mathcal{X}$ is the natural space in which the solution of the state equation lives (i.e. $\mathcal{X} = L^2(\Omega;\, C^0([0,T];\,\mathbb{R}^n))$) and $S$ maps each control to the corresponding unique solution of the state equation.

The Gâteaux differentiability of $S$ is established next. For any perturbation in the direction $h\in L^p(\Omega\times[0,T];\, \mathbb{R}^k)$ of the control, the Gâteaux derivative is found to be given by the solution of the linearized state equation

\[\begin{aligned} \mathrm{d} Y_t^h &= \left( D_x b(t,\theta_t)[Y_t^h] + D_u b(t,\theta_t)[h] \right) \mathrm{d} t + \left( D_x \sigma(t,\theta_t)[Y_t^h] + D_u \sigma(t,\theta_t)[h] \right) \mathrm{d} W_t \\ Y_0 &= 0, \end{aligned}\]

where $\theta_t := (X_t, u_t)$. Under standard Lipschitz assumptions, this linear system admits a unique square-integrable solution.
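A quick finite-difference check of this linearization on a scalar example of our own (with $b(t,x,u) = \sin x + u$ and constant $\sigma$, so the $D_x\sigma$ and $D_u\sigma$ terms vanish): simulating $X^{u}$ and $X^{u+\varepsilon h}$ with the same noise, the quotient $(S(u+\varepsilon h)-S(u))/\varepsilon$ should match the solution $Y^h$ of the linearized equation up to $O(\varepsilon)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar example (own choice, not from the thesis):
#   dX_t = (sin(X_t) + u_t) dt + sigma dW_t,  X_0 = 1,
# with constant sigma, so the linearized equation reduces to
#   dY_t = (cos(X_t) Y_t + h_t) dt,  Y_0 = 0.
T, N, M, sigma, eps = 1.0, 200, 1_000, 0.3, 1e-4
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), size=(M, N))
u = np.full(N, 0.5)                 # baseline (deterministic) control
h = np.sin(np.linspace(0.0, T, N))  # perturbation direction

X = np.full(M, 1.0)
Xe = np.full(M, 1.0)                # state under the perturbed control
Y = np.zeros(M)                     # linearized state
for n in range(N):
    Y = Y + (np.cos(X) * Y + h[n]) * dt          # linearized dynamics
    X = X + (np.sin(X) + u[n]) * dt + sigma * dW[:, n]
    Xe = Xe + (np.sin(Xe) + u[n] + eps * h[n]) * dt + sigma * dW[:, n]

fd = (Xe - X) / eps                 # finite-difference directional derivative
err = np.abs(fd - Y).max()          # should be O(eps), uniformly over paths
```

Using the same Brownian increments for both simulations is essential: the directional derivative is taken pathwise, for a fixed realization of the noise.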

Since Gâteaux differentiability does not guarantee the chain rule, we further prove the Fréchet differentiability of $S$ by checking that the map $u\mapsto DS(u) \in \mathcal{L}(\mathcal{U}, \mathcal{X})$ is continuous.

Next, an application of the chain rule lets us prove the Gâteaux differentiability of the cost functional $J$ seen as just a function of the control variable, i.e. $\widetilde{J}(u) := J(u, S(u))$. Its Gâteaux derivative is found to be given by:

\[D \widetilde{J}(u)[h] = \mathbb{E}\Bigg[\int_{0}^{T} \Big( D_xf(t, X^u_t, u_t)[Y^h_t] + D_uf(t, X^u_t, u_t)[h_t] \Big)\,\mathrm{d} t + D_xh(X^u_T)[Y^h_T] \Bigg].\]

The dependence on $Y^h$ is not ideal, as the dependence on the direction $h$ is not explicit, and it needs to be removed by some manipulation. To this aim, we introduce the adjoint BSDE with deterministic terminal time:

\[\begin{cases} \mathrm{d} p_t = \Big[ -p_t^T D_x b(t,\theta^u_t) - \text{Tr}(q_t^T D_x \sigma(t,\theta^u_t))-D_xf(t,\theta^u_t) \Big] \mathrm{d} t + q_t \mathrm{d} W_t, \\ p_T = D_xh(X_T^u). \end{cases}\]

where the pair $(p,q)$ is an unknown solution to be found. Under standard Lipschitz assumptions on the derivatives of the coefficients, this BSDE admits a unique, square-integrable solution. Moreover, introducing the Hamiltonian function

\[H(t,x,u,p,q):= p^T b(t,x,u) + \text{Tr}(q^T \sigma(t,x,u)) + f(t,x,u),\]

we can write the Gâteaux derivative of $\widetilde{J}$ as

\[D \widetilde{J}(u)[h] = \mathbb{E}\left[\int_{0}^{T} D_uH(t, X^u_t, u_t, p_t, q_t)[h_t] \mathrm{d} t \right].\]

At this point we are ready to state Pontryagin’s Maximum Principle.

Theorem (PMP – deterministic final time). Under the above assumptions, if $\overline{u} \in \mathcal{U}_{ad}$ is an optimal control with associated state $\overline{X}$ and $(p,q)$ is the unique solution to the adjoint BSDE, then for every other control $u \in \mathcal{U}_{ad}$ it holds that

\[\mathbb{E}\left[\int_{0}^{T} D_u H(t,\overline{X}_t, \overline{u}_t, p_t, q_t)[u_t-\overline{u}_t]\mathrm{d} t \right] \geq 0.\]

Moreover, if the Hamiltonian is convex in $u$, then the optimal control $\overline{u}$ satisfies the following pointwise condition $\mathbb{P}$-a.s., for almost every $t \in[0,T]$ and for all $u \in \mathcal{U}_{ad}$:

\[H(t,\overline{X}_t, \overline{u}_t, p_t, q_t) \leq H(t,\overline{X}_t, u_t, p_t, q_t).\]

This is the necessary condition for optimality: the Hamiltonian must be minimized almost surely at each time by the optimal control.
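The abstract mentions quadratic control applications; the following sketch (our own discretization, not the thesis's worked example) shows how the representation of $D\widetilde{J}$ via the Hamiltonian turns into a practical gradient-descent scheme. For $\mathrm{d}X_t = u_t\,\mathrm{d}t + \sigma\,\mathrm{d}W_t$ with running cost $(x^2+u^2)/2$ and terminal cost $x^2/2$, the adjoint is $\mathrm{d}p_t = -X_t\,\mathrm{d}t + q_t\,\mathrm{d}W_t$, $p_T = X_T$, and $D_uH = p + u$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy LQ problem:
#   dX_t = u_t dt + sigma dW_t,  X_0 = 1,
#   J(u) = E[ int_0^T (X_t^2 + u_t^2)/2 dt + X_T^2/2 ].
# We run gradient descent on a deterministic open-loop control using the
# discrete adjoint, and the cost should decrease.
T, N, M, sigma, x0 = 1.0, 20, 5_000, 0.3, 1.0
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), size=(M, N))

def simulate(u):
    X = np.empty((M, N + 1))
    X[:, 0] = x0
    for n in range(N):
        X[:, n + 1] = X[:, n] + u[n] * dt + sigma * dW[:, n]
    return X

def cost(u):
    X = simulate(u)
    running = 0.5 * (X[:, :-1] ** 2 + u ** 2).sum(axis=1) * dt
    return (running + 0.5 * X[:, -1] ** 2).mean()

def gradient(u):
    X = simulate(u)
    p = X[:, -1].copy()             # p_T = D_x h(X_T) = X_T
    g = np.empty(N)
    for n in range(N - 1, -1, -1):
        g[n] = u[n] + p.mean()      # E[D_u H] = u_n + E[p_{t_{n+1}}]
        p = p + X[:, n] * dt        # discrete backward adjoint step
    return g

u = np.zeros(N)
J0 = cost(u)
for _ in range(100):
    u = u - 0.3 * gradient(u)       # plain gradient step on the control
J1 = cost(u)                        # strictly smaller than J0
```

Using common random numbers (`dW` fixed across iterations) makes the cost comparison stable; the descent direction is exactly the Gâteaux derivative representation above, evaluated on the discretized problem.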

PMP: random terminal time

In this section we extend Pontryagin’s Maximum Principle to the case where the controller may also decide the horizon of the problem. Some particular cases were previously analyzed by Yang [4], [5], but there has been no generalization of such results covering the full class of random terminal times.

Formally, let the terminal time be an $(\mathcal{F}_t)$-stopping time $\tau$ such that $\tau\leq T$ a.s., which is itself part of the control. We consider the controlled state equation over the random interval $[0,\tau]$:

\[\begin{cases} \mathrm{d} X_t = b(t, X_t, u_t) \mathrm{d} t + \sigma(t, X_t, u_t)\mathrm{d} W_t,\\ X_0 = x_0 \in \mathbb{R}^n, \end{cases}\]

where the controls $u$ are again taken in $\mathcal{U}_{ad}$, and the coefficients again satisfy the assumptions needed for the existence of a unique $p$-integrable solution of the state equation. The objective is to minimize the cost functional

\[J(u, \tau, X):= \mathbb{E}\left[\int_{0}^{\tau} f(s, \tau, X_s, u_s)\mathrm{d} s + h(X_\tau) \right],\]

over $(u,\tau)$.

The control-to-state map $S$ is now defined from $\mathcal{U}_{ad} \times \mathcal{S}_t$ to $\mathcal{X}$, where $\mathcal{S}_t$ is the set of admissible stopping times. Note that $S$ does not actually depend on $\tau$: the state equation is solved up to time $T$ and then restricted to $[0,\tau]$, hence the differentiability of $S$ with respect to $u$ has already been proven in the deterministic-final-time case, with the same linearized state equation.

Next, in the same fashion as before, an extended version of the adjoint BSDE with random terminal time is introduced:

\[\begin{cases} \mathrm{d} p_t = \big[ -p_t^T D_x b(t,\theta^u_t) - \text{Tr}(q_t^T D_x \sigma(t,\theta^u_t)) -D_xf(t,\theta^u_t) \big] \mathrm{d} t + q_t \mathrm{d} W_t, \\ p_\tau = D_xh(X_\tau^u). \end{cases}\]

This BSDE admits a unique square-integrable solution under standard Lipschitz assumptions on the derivatives of the coefficients. The solution is then extended to the whole interval $[0, T]$ by setting $p_t\equiv D_xh(X_\tau^u)$ and $q_t \equiv 0$ for all $t > \tau$. The extended pair still solves the BSDE:

\[\begin{cases} \mathrm{d} p_t = \big[ -\mathbb{1}_{\{t \le \tau\}}p_t^T D_x b(t,\theta^u_t)- \text{Tr}(q_t^T D_x \sigma(t,\theta^u_t)) -\mathbb{1}_{\{t \le \tau\}}D_xf(t,\theta^u_t) \big] \mathrm{d} t + q_t \mathrm{d} W_t, \\ p_T = D_xh(X_\tau^u). \end{cases}\]

An extended version of the Hamiltonian function is also introduced as:

\[\mathcal{H}(t,\tau, x,u,p,q) := \mathbb{1}_{\{t \leq \tau\}}p^T b(t,x,u) +\text{Tr}(q^T \sigma(t,x,u)) + \mathbb{1}_{\{t \leq \tau\}}f(t,\tau,x,u).\]

Using this notation, the extended adjoint equation can be rewritten in a more compact form as

\[\begin{cases} \mathrm{d} p_t = -D_x \mathcal{H}(t,\tau, X_t^u, u_t, p_t, q_t) \mathrm{d} t + q_t \mathrm{d} W_t, \\ p_T = D_xh(X_\tau^u). \end{cases}\]

Now we can represent the gradient of the cost functional $\widetilde{J}(u,\tau) := J(u,\tau, S(u,\tau))$ as:

\[D \widetilde{J}(u,\tau)[v, \rho] = \mathbb{E}\Bigg[\int_{0}^{T} \Big( D_u \mathcal{H}(t,\tau, X_t^u, u_t, p_t, q_t)[v] + D_\tau \mathcal{H}(t,\tau, X_t^u, u_t, p_t, q_t)[\rho] \Big) \mathrm{d} t + \frac{1}{2} \text{Tr}\big(\sigma(\tau, \theta_\tau)^T D_{x x}h(X_\tau)\sigma(\tau, \theta_\tau)\big)\cdot \rho\Bigg],\]

where $v \in L^p$ and $\rho \in L^2$ are perturbations of the control and stopping time respectively.

Note that the first term of this representation is the classical variation with respect to the control $u$, the second term is the variation with respect to the stopping time, and the third is a second-order correction term coming from the terminal cost.

We are now ready to state the stochastic PMP with random terminal time.

Theorem (PMP – random terminal time). Under the above assumptions, let $(\overline{u}, \overline{\tau}) \in \mathcal{U}_{ad} \times \mathcal{S}_t$ be an optimal control-stopping time pair with associated state $\overline{X}$, and let $(p,q)$ be the unique solution to the adjoint BSDE, then:

  1. Maximum condition for the control $u$: for all $u\in \mathcal{U}_{ad}$,
\[\mathbb{E}\left[\int_{0}^{\overline{\tau}} D_u \mathcal{H}(t,\overline{\tau}, \overline{X}_t, \overline{u}_t, p_t, q_t)[u_t-\overline{u}_t] \mathrm{d} t\right] \geq 0.\]

Moreover, if the extended Hamiltonian $\mathcal{H}$ is convex in $u$, then for every $(\mathcal{F}_t)$-stopping time $\rho$ with $\rho \in [0, \overline{\tau}]$ a.s. it holds that, for all $u \in \mathcal{U}_{ad}$,

\[\mathcal{H}(\rho, \overline{\tau}, \overline{X}_\rho, \overline{u}_\rho, p_\rho, q_\rho) \leq \mathcal{H}(\rho, \overline{\tau}, \overline{X}_\rho, u_\rho, p_\rho, q_\rho).\]
  2. Maximum condition for the stopping time $\tau$: for all $\tau \in \mathcal{S}_t$,
\[\mathbb{E}\Bigg[\int_{0}^{T}2 D_\tau \mathcal{H}(t,\overline{\tau}, \overline{X}_t, \overline{u}_t, p_t, q_t)[\tau-\overline{\tau}] \mathrm{d} t+ \mathrm{Tr}(\sigma(\overline{\tau}, \overline{\theta}_{\overline{\tau}})^T D_{x x}h(\overline{X}_{\overline{\tau}})\sigma(\overline{\tau}, \overline{\theta}_{\overline{\tau}})) (\tau - \overline{\tau}) \Bigg]\geq 0.\]

As can be seen, it was not possible to obtain a pointwise condition for the stopping time, as was done for the control variable, due to the correction term introduced by the terminal cost. In the special case where $h \equiv 0$ or $D_{x x} h \equiv 0$, a pointwise condition can be obtained similarly to the control-variable case.

Future Developments

There are many possible future developments of the arguments presented in this thesis. One is a proof of the existence of an optimal control, in terms of both $u$ and $\tau$; this may require further assumptions on the compactness/closedness of the admissible control set and/or lower semicontinuity of the cost functional. Another research topic is the search for sufficient optimality conditions. In the setting of this thesis, such an extension is quite intuitive: one may require the second-order derivative of the cost functional to be positive definite around the optimal control pair.

References

  1. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The mathematical theory of optimal processes. NY: Wiley, 1962.
  2. J. Yong and X. Y. Zhou, Stochastic controls: Hamiltonian systems and HJB equations, vol. 43 of Applications of Mathematics. New York: Springer, 1999.
  3. E. Pardoux and S. G. Peng, “Adapted solution of a backward stochastic differential equation,” Systems & Control Letters, vol. 14, no. 1, pp. 55–61, Jan. 1990, doi: 10.1016/0167-6911(90)90082-6.
  4. S. Yang, “Stochastic maximum principle for optimal control problem with a stopping time cost functional,” International Journal of Control, vol. 95, no. 7, pp. 1777–1788, July 2022, doi: 10.1080/00207179.2021.1872801.
  5. S. Yang, “A varying terminal time structure for stochastic optimal control under constrained condition,” International Journal of Robust and Nonlinear Control, vol. 30, no. 13, pp. 5181–5204, Sept. 2020, doi: 10.1002/rnc.5048.