
# Grasping Optimization

Life is full of optimization problems. All of classical physics flows from the principle of least action (strictly speaking, the action is stationary rather than minimal, but most of the time the difference doesn't matter). Chemistry under gravity is driven by the minimization of free energy, which in turn drives the natural and sexual selection of biology, which in turn drive the interaction styles and temperaments, respectively, described by Keirsey. In the same spirit, businesses strive to maximize profit, and when they don't, they are trying to maximize some social good instead.

The machine learning systems driving so much technological progress are at their core optimization systems. As more and more sophisticated systems are driven by optimization, a key imperative is to explain the decisions they make. When your car is self-driving, you want to know why it decided to turn.

The mathematical directions given to these systems are encoded in the form of an objective function, which collates all the desirable outcomes, including the tradeoffs among them. Understanding the tradeoffs being made is central to understanding the behavior of the system. Mathematical models are very good at doing what they're told to do, and very bad at doing what you wish you had told them to do.

A very concrete example might involve competing products, where showing one means not showing another. Maximizing total profit may mean giving up profit on one product to gain more profit from another.
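As a minimal sketch of this tradeoff, suppose two products compete for a single display slot; the numbers here are hypothetical, invented purely for illustration:

```python
# Hypothetical expected profit per impression for two products
# competing for a single display slot.
profits = {"product_a": 0.40, "product_b": 0.55}

# Showing the most profitable product maximizes total profit, but the
# profit forgone on the other product is the cost of that choice.
best = max(profits, key=profits.get)
forgone = sum(v for k, v in profits.items() if k != best)

print(best)     # → product_b
print(forgone)  # → 0.4
```

The objective (total profit) is maximized only by accepting that every impression given to one product is an impression denied to the other.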

Examining the tradeoffs the system is making is key to verifying that the tradeoffs it has been told are acceptable actually conform to the intentions of the business.

## A Simple Algorithm

If you're simply using an optimization system, your problem of understanding what it does is much simpler than if you're designing optimization algorithms. For the purposes of use, a very simple mental model is enough. It's not a practical algorithm, but it's still a useful way of thinking about the problem.

Imagine taking all possible parameter values and making a list of them. We examine the first set of parameter values on the list and compute the objective function for it. Since we've only examined one, this is the best we've found so far. Then, until the list is exhausted, we take the next set of parameter values on the list and compute the objective function with those values. There are three possible outcomes:

• The best-so-far value is better than the new value, in which case we discard the new parameters and continue
• The new value is better than the best-so-far value, in which case we replace the best-so-far parameters with the new ones and continue
• The two values are equal, in which case it doesn't matter whether we keep the best-so-far parameters or replace them with the new ones; either way, we continue

The last possibility merits more examination. But first, notice that once we have exhausted the list of all possibilities, whatever is best-so-far at the end is in fact best. If we modify the third possibility above so that, when the new value equals the best-so-far value, we keep all parameter sets having that same objective function value, then in the end we have every set of parameter values achieving the best possible value of the objective function.
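The procedure above can be sketched in a few lines. This is the impractical mental-model version, enumerating every candidate over a tiny grid; the objective function is a toy I'm assuming purely for illustration, chosen so that several parameter sets tie for best:

```python
from itertools import product

# Toy objective, assumed for illustration: minimized (value 0)
# by every (x, y) on the line x + y = 3.
def objective(x, y):
    return (x + y - 3) ** 2

best_value = None
best_params = []  # keep *all* parameter sets tied for best

# Enumerate every candidate parameter set (feasible only for tiny grids).
for params in product(range(4), range(4)):
    value = objective(*params)
    if best_value is None or value < best_value:
        best_value, best_params = value, [params]   # strictly better: restart the list
    elif value == best_value:
        best_params.append(params)                  # a tie: an acceptable tradeoff

print(best_value, best_params)  # → 0 [(0, 3), (1, 2), (2, 1), (3, 0)]
```

Each of the four tied parameter sets is, as far as this objective is concerned, interchangeable with the others.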

And this is where tradeoffs come in. If two different parameter sets produce the same objective function value, then the optimization system is being told that passing from one to the other is an acceptable tradeoff. The objective function is usually made up of several terms, some of which may increase while others decrease in a way that leaves the total unchanged, trading off the impact on one term against another.

Many such systems have a feature that provides a tool for gaining that insight. They contain a regularization term, to prevent parameter choices that are too extreme. For example, with parameters $w_1$, $w_2$, $\dots$, $w_n$, the regularization term might be $w_1^2 + w_2^2 + \dots + w_n^2$, which obviously increases with any increase in the magnitude of any of the $w_i$'s. The form of the regularization term in other cases is generally likewise transparent, in the sense that it is easy to reason about how changes in parameter choices will affect it.

So if we have a choice of $w_i$'s that makes the total objective function as small as possible, then any other choice of $w_i$'s must make the objective function larger. If we relax any particular $w_i$ by making it slightly smaller in magnitude, the total objective function must increase. The regularization term, however, gets smaller, so the other terms of the objective function must increase by more than enough to compensate. By examining those individual terms, we can see exactly what tradeoffs were made in arriving at the best solution.
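Here is a minimal sketch of that examination, using a one-parameter ridge-style objective with toy data I'm assuming for illustration. Splitting the objective into its fit term and its regularization term, then nudging the weight below its optimum, shows the two terms trading off:

```python
# Toy dataset, assumed for illustration: targets are exactly 2*x.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
LAM = 1.0  # regularization strength

def fit_term(w):
    # squared error of the prediction w*x against each target y
    return sum((w * x - y) ** 2 for x, y in DATA)

def reg_term(w):
    # the w_1^2 + ... + w_n^2 regularizer, here with n = 1
    return w ** 2

def total(w):
    return fit_term(w) + LAM * reg_term(w)

# Closed-form optimum of this quadratic: w* = sum(x*y) / (sum(x^2) + lam)
w_star = 28.0 / (14.0 + 1.0)

# Shrink w slightly below the optimum: regularization goes down,
# but the fit term goes up by more, so the total still increases.
w_relaxed = w_star - 0.1
print(reg_term(w_relaxed) < reg_term(w_star))  # → True
print(fit_term(w_relaxed) > fit_term(w_star))  # → True
print(total(w_relaxed) > total(w_star))        # → True
```

Reading the individual terms this way makes the tradeoff explicit: near the optimum, a unit of saved regularization costs more than a unit of added fit error.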

More importantly, we can close the modeling loop. Either we confirm that the tradeoffs are exactly what we wanted, or we see exactly how we must modify our objective function to embody the right tradeoffs.

I'm Andrew Winkler. I hope you've enjoyed exploring these ideas with me. Thanks for your time. If you've got any questions, send an email to 4af502d7148512d4fee9@cloudmailin.net.

The Hertz Foundation has been a valuable support, not only during graduate school, but ever since. If you can, please consider supporting them with a donation.

# How can I help support this work?

If you can support this work directly, please check out Patreon.

As an Amazon Associate I earn from qualifying purchases.