Machine Learning Surrogates for Decision-Making

Justin Dumouchelle
University of Toronto

Operations research has had an invaluable impact on how decisions are made in a wide range of real-world applications. With the constantly increasing need to solve complex decision-making problems, the integration of machine learning and classical optimization techniques has become an emerging area of research. In this article, ORMS Tomorrow editor Justin Dumouchelle outlines his research on machine-learning-augmented methods for nested optimization problems.

A wide range of real-world optimization problems in logistics, finance, and healthcare, among others, can be solved through mixed-integer linear programming (MILP) techniques. While solving these optimization problems can be challenging, the size of the problems that modern solvers can handle has increased significantly thanks to algorithmic improvements. Despite this, MILP techniques alone cannot address essential factors that decision-makers are concerned with. For example, a decision-maker may face uncertainty in problem parameters. While considering these factors is necessary, it leads to significantly more challenging optimization problems, often to the point where they are intractable for state-of-the-art algorithms.

Motivation

To better illustrate why incorporating uncertainty is useful, and in some cases necessary, we will start with a concrete example of a facility location problem. In this problem, a decision-maker aims to open a subset of n facilities to serve m clients (Figure 1).

Figure 1: (a) The initial set of candidate facility locations and clients that must be served, (b) Opened facilities, and (c) Assignment decisions.

The goal of the decision-maker is typically to minimize the sum of the costs of opening facilities and serving clients (1a), while ensuring that constraints such as not exceeding the capacities of the facilities (1b) and serving all clients (1c) are satisfied.

Mathematically, this problem can be expressed as

\begin{align}
\min_{x,\,y} \quad & \sum_{i=1}^{n} c_i x_i + \sum_{i=1}^{n} \sum_{j=1}^{m} b_{ij} y_{ij} \tag{1a}\\
\text{s.t.} \quad & \sum_{j=1}^{m} d_j y_{ij} \le C_i x_i, \quad i = 1, \ldots, n, \tag{1b}\\
& \sum_{i=1}^{n} y_{ij} = 1, \quad j = 1, \ldots, m, \tag{1c}\\
& x \in \{0,1\}^{n}, \quad y \in \{0,1\}^{n \times m}, \notag
\end{align}
 

where x and y are the decision variables. For the facility opening decisions (x), xi = 1 if facility i is open, and xi = 0 otherwise. For the assignment decisions (y), yij = 1 if client j is served by facility i, and 0 otherwise. c and b represent the costs of opening facilities and serving clients, respectively, Ci is the capacity of facility i, and dj is the demand of client j. Although the facility location problem is NP-hard, state-of-the-art MILP solvers can solve large-scale instances, often within seconds.
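To make the formulation concrete, the following is a minimal sketch of (1) in gurobipy (one of the solvers mentioned later in this article). The instance data (costs, capacities, and demands) are illustrative placeholders, not values from the article.

```python
# A minimal sketch of formulation (1) in gurobipy; the instance data below
# (costs, capacities, demands) are illustrative placeholders.
import gurobipy as gp
from gurobipy import GRB

n, m = 3, 5                                                # facilities, clients
c = [10, 12, 9]                                            # facility opening costs
b = [[2, 4, 5, 3, 6], [3, 1, 2, 4, 5], [4, 3, 1, 2, 2]]    # serving costs b[i][j]
C = [40, 35, 30]                                           # facility capacities
d = [10, 15, 12, 8, 9]                                     # client demands

model = gp.Model("facility_location")
x = model.addVars(n, vtype=GRB.BINARY, name="x")       # open facility i?
y = model.addVars(n, m, vtype=GRB.BINARY, name="y")    # client j served by facility i?

# (1a) total opening cost plus serving cost
model.setObjective(
    gp.quicksum(c[i] * x[i] for i in range(n))
    + gp.quicksum(b[i][j] * y[i, j] for i in range(n) for j in range(m)),
    GRB.MINIMIZE,
)
# (1b) do not exceed the capacity of opened facilities
model.addConstrs(
    gp.quicksum(d[j] * y[i, j] for j in range(m)) <= C[i] * x[i] for i in range(n)
)
# (1c) every client is served by exactly one facility
model.addConstrs(gp.quicksum(y[i, j] for i in range(n)) == 1 for j in range(m))

model.optimize()
```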

Two-Stage Stochastic Programming (2SP). While the previous formulation is sufficient in many cases, it may be important for decision-makers to consider uncertainty. Suppose we have the following realistic assumption: The set of clients changes each day, as illustrated in Figure 2. In 2SP, each day is referred to as a scenario, and the scenarios are denoted by ξ1, . . . , ξK. In this context, a scenario, ξk, is an m-dimensional binary vector where ξk,j = 1 if client j is available on day k, and 0 otherwise. Typically, a scenario, ξ, is assumed to be sampled from a known probability distribution.

Figure 2: A different subset of active clients for each day.

In 2SP, the objective of the decision-maker is to minimize the cost of opening facilities and serving clients in expectation, which can be formulated as

\begin{equation}
\min_{x \in \{0,1\}^{n}} \; \sum_{i=1}^{n} c_i x_i + \frac{1}{K} \sum_{k=1}^{K} Q(x, \xi_k), \tag{2}
\end{equation}
 

where Q(x, ξk) is the optimal objective value of the second-stage optimization problem associated with the serving decisions under scenario ξk, and K is the number of scenarios. For more on 2SP, see Shapiro et al. (2021).
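As a sketch of how Q can be evaluated, the snippet below solves the assignment subproblem for a fixed opening decision and a single scenario with gurobipy. The function name, the fixed decision, and the scenario in the usage comment are hypothetical, the data are assumed to come from the previous snippet, and the opened capacity is assumed to cover the active demand (relatively complete recourse).

```python
# A sketch of evaluating the second-stage value Q(x, xi) for one scenario by
# solving the assignment subproblem; x_fixed and xi in the usage comment are
# illustrative, and b, C, d are the placeholder data from the previous snippet.
import gurobipy as gp
from gurobipy import GRB

def second_stage_value(x_fixed, xi, b, C, d):
    """Q(x, xi): optimal serving cost given opened facilities and active clients."""
    n, m = len(C), len(d)
    sub = gp.Model("assignment")
    sub.Params.OutputFlag = 0
    y = sub.addVars(n, m, vtype=GRB.BINARY, name="y")
    sub.setObjective(
        gp.quicksum(b[i][j] * y[i, j] for i in range(n) for j in range(m)),
        GRB.MINIMIZE,
    )
    # capacity constraints (1b), restricted to the facilities opened in x_fixed
    sub.addConstrs(
        gp.quicksum(d[j] * y[i, j] for j in range(m)) <= C[i] * x_fixed[i]
        for i in range(n)
    )
    # serve exactly the clients that are active in this scenario (1c)
    sub.addConstrs(gp.quicksum(y[i, j] for i in range(n)) == xi[j] for j in range(m))
    sub.optimize()
    return sub.ObjVal

# Example: facilities 0 and 2 are open; clients 0, 1, and 3 are active today.
# Q_val = second_stage_value([1, 0, 1], [1, 1, 0, 1, 0], b, C, d)
```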

Other Complications. Outside of 2SP, decision-makers may be concerned with various factors complicating the problem. For example, decision-makers may also be concerned about uncertainty in the worst-case, i.e., robust optimization (Ben-Tal and Nemirovski, 2002), or about the interaction between two decision-makers, i.e., bilevel optimization (Colson et al., 2007).

In practice, any of these considerations may be crucial to decision-makers. However, they also make the resulting optimization problems significantly less tractable. For example, when uncertainty is considered, a deterministic problem that takes seconds to solve may take hours.

A Unified Learning-Based Framework

Although the considerations outlined in the previous section differ significantly in their use case and objective, they all share one key commonality: they are nested optimization problems. 

Traditional optimization approaches must be carefully designed for each problem class, type of decision variable, and set of constraints. Although these problems look quite different through a traditional optimization lens, we can take a unified view through the perspective of machine learning.

Value Function Approximation

Machine learning can be used to approximate the value function(s) of the nested problems. A value function is the optimal objective value of an optimization problem viewed as a function of some of its parameters, e.g., in (2), Q is parameterized by x and ξ.

2SP. To illustrate the idea of value function approximation, let’s consider our previous 2SP problem (2). Here, our objective is to train a machine learning model fθ (a function f parameterized by θ) to approximate the value function of the assignment problem (Q), i.e.,

\begin{equation}
f_\theta(x, \xi) \approx Q(x, \xi) \tag{3}
\end{equation}

for many x, ξ pairs. Alternatively, we can also approximate the expected value function, i.e., (1/K) ∑k Q(x, ξk). However, this requires special attention regarding the machine learning model, which will be outlined later in the article, and in detail in Dumouchelle et al. (2022).
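For intuition, here is a minimal PyTorch sketch of the regression task in (3). The network architecture, problem sizes, and the randomly generated placeholder dataset are illustrative assumptions; in practice, the targets Q(x, ξ) would be computed offline by solving the assignment subproblem for sampled (x, ξ) pairs.

```python
# A minimal PyTorch sketch of training f_theta(x, xi) ~= Q(x, xi) as in (3).
# The data below are random placeholders; real targets come from solved subproblems.
import torch
import torch.nn as nn

n, m = 10, 50                   # facilities, clients (illustrative sizes)
N = 5000                        # number of (x, xi, Q) training samples

X = torch.randint(0, 2, (N, n)).float()    # sampled first-stage decisions x
Xi = torch.randint(0, 2, (N, m)).float()   # sampled scenarios xi
Q = torch.rand(N, 1)                       # corresponding optimal values Q(x, xi)

f_theta = nn.Sequential(
    nn.Linear(n + m, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(f_theta.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(100):
    opt.zero_grad()
    pred = f_theta(torch.cat([X, Xi], dim=1))   # f_theta(x, xi)
    loss = loss_fn(pred, Q)                     # regression onto Q(x, xi)
    loss.backward()
    opt.step()
```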

Making Decisions

Ultimately, machine learning will only be useful if we can leverage it to make high-quality decisions. In most learning-based approaches for operations research, machine learning models either directly predict solutions or help guide existing algorithms, e.g., branch-and-bound for MILP.

With value function approximation, a different approach is taken. Specifically, a surrogate MILP problem is formulated by embedding the trained predictive model. For example, in 2SP, a surrogate for (2) using (3) is given by

\begin{equation}
\min_{x \in \{0,1\}^{n}} \; \sum_{i=1}^{n} c_i x_i + \frac{1}{K} \sum_{k=1}^{K} f_\theta(x, \xi_k), \tag{4}
\end{equation}

where the predictive model fθ is represented as constraints. A wide range of software libraries (Ceccon et al., 2022; Bergman et al., 2022) provide the ability to represent trained machine learning models as constraints in optimization problems, and modern MILP solvers, such as Gurobi and SCIP, even have their own implementations.
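To make “represented as constraints” concrete, the sketch below embeds a small trained ReLU network into a gurobipy model via a standard big-M formulation, which is the kind of encoding these libraries automate. The network weights, the big-M constant, and the first-stage costs are illustrative placeholders, and for simplicity the network here takes only x as input (akin to the expected value function mentioned after (3)).

```python
# A sketch of embedding a one-hidden-layer ReLU network f_theta(x) into a
# gurobipy model via a big-M formulation; weights and costs are placeholders.
import gurobipy as gp
from gurobipy import GRB
import numpy as np

n_in, n_hidden = 10, 16
W1 = np.random.randn(n_hidden, n_in)   # trained weights (random placeholders here)
b1 = np.random.randn(n_hidden)
w2 = np.random.randn(n_hidden)
b2 = 0.0
M = 1e3                                # big-M bound on hidden pre-activations

model = gp.Model("surrogate")
x = model.addVars(n_in, vtype=GRB.BINARY, name="x")      # first-stage decisions
h = model.addVars(n_hidden, lb=0.0, name="h")            # ReLU outputs
z = model.addVars(n_hidden, vtype=GRB.BINARY, name="z")  # ReLU on/off indicators
q = model.addVar(lb=-GRB.INFINITY, name="q")             # predicted value f_theta(x)

for k in range(n_hidden):
    pre = gp.quicksum(W1[k, i] * x[i] for i in range(n_in)) + b1[k]
    model.addConstr(h[k] >= pre)                    # h_k >= pre-activation
    model.addConstr(h[k] <= pre + M * (1 - z[k]))   # h_k = pre when z_k = 1
    model.addConstr(h[k] <= M * z[k])               # h_k = 0 when z_k = 0
model.addConstr(q == gp.quicksum(w2[k] * h[k] for k in range(n_hidden)) + b2)

# Surrogate objective: first-stage cost plus the predicted second-stage value,
# with c a placeholder cost vector.
c = np.random.rand(n_in)
model.setObjective(gp.quicksum(c[i] * x[i] for i in range(n_in)) + q, GRB.MINIMIZE)
model.optimize()
```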

Figure 3: Reduction in computing time compared to the integer L-shaped method (Laporte and Louveaux, 1993) for stochastic server location benchmark instances from Ahmed et al. (2015) with n = 10 facilities and m = 50 clients. Although the value function approximation approach is heuristic, optimal solutions are found for every instance of this problem.

Designing Machine Learning Models

A critical consideration when designing a machine learning model for value function approximation is that the complexity of the surrogate optimization problem increases with the model’s size. In addition, decision-makers may need to solve similar instances of the same problem routinely, making models designed to generalize across instances useful. A significant contribution of Dumouchelle et al. (2022, 2024a,b) is the design of deep learning architectures that address both of these considerations.

In the remainder of this article, we outline how to design machine learning models to address the latter consideration. These concepts apply in the context of value function approximation and, more generally, to predicting any value of interest for optimization problems. To make everything concrete, we will consider an example. Assume we are working with the deterministic facility location problem (1), and we want to learn the value function of the assignment problem as a function of the opening decisions (x):

\begin{equation}
Q(x) = \min_{y \in Y(x)} \; \sum_{i=1}^{n} \sum_{j=1}^{m} b_{ij} y_{ij}, \tag{5}
\end{equation}

where Y(x) denotes constraints (1b) and (1c).

Fixed-Size Input Models. When faced with the learning task in (5), one approach is to directly use the opening decision variables as the input to the model. This results in a fixed-size input, making standard machine learning models, such as linear regression, decision trees, and feed-forward neural networks, suitable. However, these come with a significant limitation: a model trained for one optimization problem cannot be used on another problem. For example, suppose we aim to learn the value function over opening decisions (x) and variable sets of clients (P), where P defines the locations (relative distances), demands, and number of clients. In this case, a model trained for one realization of P may be unusable on another! For some problems, determining a fixed-dimension representation of P, i.e., feature engineering, may be suitable, but as discussed in the next section, deep learning models can be carefully designed to achieve this as well.

Set-Based Models. To address the limitations of fixed-size input models, set-based deep learning architectures (Zaheer et al., 2017) can be used not only to represent the optimization problem as an input to the model, but also to bias the predictive model to be invariant to the size and ordering of the input set. If we consider the facility location example with varying clients (P), we can first identify the relevant features for the clients (Figure 4).

Figure 4: For each client, relevant features include the location (latitude and longitude) and the demand.

From the client features, we can then construct a set-based architecture that learns an aggregated representation of all the clients (Figure 5). As the vectors v1, . . . , vm are summed, the model produces the same output regardless of the order of the clients. Moreover, an arbitrary number of clients can be handled. In addition to capturing variability associated with the optimization problem, these models are trained end-to-end with gradient descent, making them a well-motivated choice from both operations research and machine learning perspectives.

Figure 5: A set-based architecture for generalizing across features related to clients. Each set of client features, u1, . . . , um, is passed through the same feed-forward network (Φ1) to compute vj = Φ1(uj). These vectors are then aggregated (summed), concatenated with the decision variables x, and passed through a second feed-forward network (Φ2) to make the final prediction fθ(x, u1, . . . , um) = Φ2(x, Φ1(u1) + . . . + Φ1(um)).
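The following PyTorch sketch mirrors the architecture in Figure 5. The layer widths, the three client features (latitude, longitude, demand), and the class name are illustrative choices rather than the exact architecture used in the referenced papers.

```python
# A sketch (in PyTorch) of the set-based architecture in Figure 5; sizes and
# the client feature dimension are illustrative assumptions.
import torch
import torch.nn as nn

class SetValuePredictor(nn.Module):
    def __init__(self, n_facilities, client_feat_dim=3, hidden=64):
        super().__init__()
        # Phi_1: shared embedding applied to each client's features u_j
        self.phi1 = nn.Sequential(
            nn.Linear(client_feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        # Phi_2: maps (x, aggregated client embedding) to the predicted value
        self.phi2 = nn.Sequential(
            nn.Linear(n_facilities + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, clients):
        # x: (batch, n_facilities) opening decisions
        # clients: (batch, n_clients, client_feat_dim), e.g., (lat, lon, demand);
        # n_clients may vary between instances thanks to the sum pooling.
        v = self.phi1(clients)      # embed each client independently
        pooled = v.sum(dim=1)       # permutation-invariant aggregation
        return self.phi2(torch.cat([x, pooled], dim=1))

# Usage: predictor = SetValuePredictor(n_facilities=10)
#        y_hat = predictor(x_batch, client_features_batch)
```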

Generalizations & Extensions. The above section depicts how we can learn across instances that vary in parameters relevant only to the clients. However, these architectures can be extended to capture variability among various aspects of interest to the decision-maker, e.g., scenario sets in 2SP or decisions. Other deep learning architectures, such as graph neural networks, may also be appropriate for capturing relevant variations in problems.

As decision-making problems grow in complexity and scale, research at the intersection of machine learning and operations research has grown rapidly to address these challenges. This article outlines Justin Dumouchelle’s research, which focuses on tackling nested optimization problems by using machine learning to approximate the value functions of optimization problems and making decisions by leveraging the trained models in a surrogate optimization problem.

References:

Ahmed, S., Garcia, R., Kong, N., Ntaimo, L., Parija, G., Qiu, F., Sen, S., 2015. SIPLIB: A stochastic integer programming test problem library. http://www2.isye.gatech.edu/sahmed/siplib.

Ben-Tal, A., Nemirovski, A., 2002. Robust optimization–methodology and applications. Mathematical Programming 92, 453–480.

Bergman, D., Huang, T., Brooks, P., Lodi, A., Raghunathan, A.U., 2022. JANOS: An integrated predictive and prescriptive modeling framework. INFORMS Journal on Computing 34, 807–816.

Ceccon, F., Jalving, J., Haddad, J., Thebelt, A., Tsay, C., Laird, C.D., Misener, R., 2022. OMLT: Optimization & machine learning toolkit. Journal of Machine Learning Research 23, 1–8.

Colson, B., Marcotte, P., Savard, G., 2007. An overview of bilevel optimization. Annals of Operations Research 153, 235–256.

Dumouchelle, J., Julien, E., Kurtz, J., Khalil, E.B., 2024a. Neur2BiLO: Neural bilevel optimization.

Dumouchelle, J., Julien, E., Kurtz, J., Khalil, E.B., 2024b. Neur2RO: Neural two-stage robust optimization, in: The Twelfth International Conference on Learning Representations.

Dumouchelle, J., Patel, R., Khalil, E.B., Bodur, M., 2022. Neur2SP: Neural two-stage stochastic programming. Advances in Neural Information Processing Systems 35.

Kaleem, W., Subramanyam, A., 2024. Neural embedded mixed-integer optimization for location-routing problems. arXiv preprint arXiv:2412.05665.

Laporte, G., Louveaux, F.V., 1993. The integer L-shaped method for stochastic integer programs with complete recourse. Operations Research Letters 13, 133–142.

Ntaimo, L., 2024. Example applications of stochastic programming, in: Computational Stochastic Programming: Models, Algorithms, and Implementation. Springer, pp. 111–152.

Ntaimo, L., Sen, S., 2005. The million-variable “march” for stochastic combinatorial optimization. Journal of Global Optimization 32, 385–400.

Shapiro, A., Dentcheva, D., Ruszczynski, A., 2021. Lectures on stochastic programming: modeling and theory. SIAM.

Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J., 2017. Deep sets. Advances in neural information processing systems 30.

Acknowledgements: We would like to thank Pavithra S. Murali and Özgür Oskay for taking the time to review this article. Figures for facility location are adapted from the stochastic server location problem (Ntaimo and Sen, 2005; Ntaimo, 2024). Photo credit goes to marcinjozwiak for the header photo.