当前位置：首页 > news >正文

Optimization Theory

news 2026/6/13 7:53:23

Linear Programming

What is LP?

An optimization problem can be written as

\[\begin{align*} \text{minimize} \quad & f(\boldsymbol{\boldsymbol{x}}) \\ \text{subject to} \quad & g_i(\boldsymbol{x})\le 0, \quad i=1,2,\cdots,m, \\ & h_j(\boldsymbol{x})=0, \quad j=1,2,\cdots,p, \end{align*} \]

where $\boldsymbol{x}\in\mathbb{R}^n$ is the variable to be determined, and $f,g_i,h_j$ are given functions.

However, in general it's hard to solve. A special case is linear programming (LP), where $f,g_i,h_j$ are all linear functions.

A linear program (in general form) is an optimization problem of the form $$ \begin{align*} \text{minimize} \quad & \boldsymbol{c^\top x} \\ \text{subject to} \quad & \boldsymbol{a_i^\top x}\ge b_i, \quad i\in M_1, \\ & \boldsymbol{a_i^\top x}\le b_i, \quad i\in M_2, \\ & \boldsymbol{a_i^\top x}= b_i, \quad i\in M_3, \\ & x_j\ge 0, \quad j\in N_1, \\ & x_j\le 0, \quad j\in N_2, \\ \end{align*} $$ where $\boldsymbol{x}\in\mathbb{R}^n$ is the variable to be determined, $\boldsymbol{c,a_i}\in\mathbb{R}^n$ are given coefficient vectors, and $b_i\in\mathbb{R}$ are given scalars. Here $M_1,M_2,M_3$ are disjoint inde\boldsymbol{x} sets with $M_1\cup M_2\cup M_3=\{1,2,\cdots,m\}$, and $N_1,N_2$ are disjoint inde\boldsymbol{x} sets with $N_1\cup N_2=\{1,2,\cdots,n\}$.

This can be transformed to the standard form of LP.

A linear program (in standard form) is an optimization problem of the form $$ \begin{align*} \text{minimize} \quad & \boldsymbol{c^\top x} \\ \text{subject to} \quad & \boldsymbol{Ax}=\boldsymbol{b}, \\ & \boldsymbol{x}\ge \boldsymbol{0}, \end{align*} $$ where $\boldsymbol{x}\in\mathbb{R}^n$ is the variable to be determined, $\boldsymbol{c}\in\mathbb{R}^n$ is a given coefficient vector, $\boldsymbol{A}\in\mathbb{R}^{m\times n}$ is a given coefficient matrix, and $\boldsymbol{b}\in\mathbb{R}^m$ is a given vector.

Any linear program in general form can be transformed to an equivalent linear program in standard form.

Each constraint can be successfully transformed:

For the constraints in the form $\boldsymbol{a_i^\top x}\ge b_i$, we can add a slack variable $s_i$ and transform it to $\boldsymbol{a_i^\top x}-s_i=b_i$ and $s_i\ge 0$.
For the constraints in the form $\boldsymbol{a_i^\top x}\le b_i$, we can multiply both sides by $-1$ so that it reduces to the previous case.
For the constraints in the form $x_j\le 0$, we can multiply both sides by $-1$ so that it reduces to the case $x_j\ge 0$.

Now our program is in the form

\[\begin{align*} \text{minimize} \quad & \boldsymbol{c^\top x} \\ \text{subject to} \quad & \boldsymbol{Ax}=\boldsymbol{b}, \\ & x_j\ge 0, \quad j\in N, \end{align*} \]

The last step is to eliminate the unconstrained variables. For each unconstrained variable $x_j$, we can replace it with $x_j^+-x_j^-$, where $x_j^+,x_j^-\ge 0$. Then the program is in standard form.

The Geometry of LP

In this section, we will discuss some geometric properties of linear programs.

A polyhedron is a set that can be described in the form $\{\boldsymbol{x}\in\mathbb{R}^n|\boldsymbol{Ax}\ge\boldsymbol{b}\}$, where $\boldsymbol{A}$ is an $m\times n$ matrix and $\boldsymbol{b}$ is a vector in $\mathbb{R}^m$.

Clearly, the feasible set of any linear programming problem is a polyhedron.

We will see that polyhedra is convex.

A set $S \subset \mathbb{R}^{n}$ is convex if for any $\boldsymbol{x}, \boldsymbol{y} \in S$, and any $\lambda \in [0,1]$, we have $\lambda\boldsymbol{x} + (1-\lambda)\boldsymbol{y} \in S$.

Let $\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}$ be vectors in $\mathbb{R}^{n}$ and let $\lambda_{1},\dots,\lambda_{k}$ be nonnegative scalars whose sum is unity.

The vector $\sum_{i=1}^{k}\lambda_{i}\boldsymbol{x}^{i}$ is said to be a convex combination of the vectors $\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}$.
The convex hull of the vectors $\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}$ is the set of all convex combinations of these vectors.

The result that follows establishes some important facts related to convexity.

The intersection of convex sets is convex.
Every polyhedron is a convex set.
A convex combination of a finite number of elements of a convex set also belongs to that set.
The convex hull of a finite number of vectors is a convex set.

If we find all the "corners" of a polyhedron, their convex hull is the polyhedron itself.

So how to define "corner"? We have three definitions: extreme point, vertex, and basic feasible solution.

Let $P$ be a polyhedron. A vector $\boldsymbol{x} \in P$ is an extreme point of $P$ if we cannot find two vectors $\boldsymbol{y}, \boldsymbol{z} \in P$, both different from $\boldsymbol{x}$, and a scalar $\lambda \in [0, 1]$, such that $\boldsymbol{x} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}$.

Let $P$ be a polyhedron. A vector $\boldsymbol{x} \in P$ is a vertex of $P$ if there exists some $\boldsymbol{c}$ such that $\boldsymbol{c}^{\top}\boldsymbol{x} < \boldsymbol{c}^{\top}\boldsymbol{y}$ for all $\boldsymbol{y}$ satisfying $\boldsymbol{y} \in P$ and $\boldsymbol{y} \neq \boldsymbol{x}$.

Besides, we can define "corner" in a linear algebra way.

If a vector $\boldsymbol{x}^{*}$ satisfies $\boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}$ for some $i$ in $M_{1}$, $M_{2}$, or $M_{3}$ (recall the definition of general form of LP), we say that the corresponding constraint is active or binding at $\boldsymbol{x}^{*}$.

Linear algebra tells us that if we have $n$ linearly independent constraints active at a point in $\mathbb{R}^n$, then this point is uniquely determined.

Let $\boldsymbol{x}^{*}$ be an element of $\mathbb{R}^{n}$ and let $I = \{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}$ be the set of indices of constraints that are active at $\boldsymbol{x}^{*}$. Then, the following are equivalent:

There exist $n$ vectors in the set $\{\boldsymbol{a}_{i} \mid i \in I\}$, which are linearly independent.
The span of the vectors $\boldsymbol{a}_{i}, i \in I$, is all of $\mathbb{R}^{n}$, that is, every element of $\mathbb{R}^{n}$ can be expressed as a linear combination of the vectors $\boldsymbol{a}_{i}, i \in I$.
The system of equations $\boldsymbol{a}_{i}^{\top}\boldsymbol{x} = b_{i}, i \in I$, has a unique solution.

Consider a polyhedron $P$ defined by linear equality and inequality constraints, and let $\boldsymbol{x}^{*}$ be an element of $\mathbb{R}^{n}$.

The vector $\boldsymbol{x}^{*}$ is a basic solution if:
- All equality constraints are active;
- Out of the constraints that are active at $\boldsymbol{x}^{*}$, there are $n$ of them that are linearly independent.
If $\boldsymbol{x}^{*}$ is a basic solution that satisfies all of the constraints, we say that it is a basic feasible solution.

Finally, we have the following important theorem that shows the equivalence of the three definitions of "corner".

Let $P$ be a nonempty polyhedron and let $\boldsymbol{x}^{*} \in P$. Then, the following are equivalent:

$\boldsymbol{x}^{*}$ is a vertex;
$\boldsymbol{x}^{*}$ is an extreme point;
$\boldsymbol{x}^{*}$ is a basic feasible solution.

(Vertex $\Rightarrow$ Extreme point) Suppose $\boldsymbol{x}^{*}$ is a vertex but not an extreme point. Then there exist $\boldsymbol{y}, \boldsymbol{z} \in P$ such that $\boldsymbol{y} \neq \boldsymbol{x}^{*}$, $\boldsymbol{z} \neq \boldsymbol{x}^{*}$, and $\boldsymbol{x}^{*} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}$ for some $\lambda \in (0, 1)$. Thus,

\[\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \lambda\boldsymbol{c}^{\top}\boldsymbol{y} + (1 - \lambda)\boldsymbol{c}^{\top}\boldsymbol{z} > \lambda\boldsymbol{c}^{\top}\boldsymbol{x}^{*} + (1 - \lambda)\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \boldsymbol{c}^{\top}\boldsymbol{x}^{*}, \]

which is a contradiction.

(Extreme point $\Rightarrow$ Basic feasible solution) Suppose $\boldsymbol{x}^{*}$ is an extreme point but not a basic feasible solution. Since $\boldsymbol{x}^{*} \in P$, it satisfies all the constraints, and all equality constraints are active at $\boldsymbol{x}^{*}$. Let $I = \{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}$ be the set of indices of constraints that are active at $\boldsymbol{x}^{*}$. Since $\boldsymbol{x}^{*}$ is not a basic feasible solution, the vectors $\{\boldsymbol{a}_{i} \mid i \in I\}$ do not contain $n$ linearly independent vectors. Thus, by the previous proposition, there exists a nonzero vector $\boldsymbol{d}$ such that $\boldsymbol{a}_{i}^{\top}\boldsymbol{d} = 0$ for all $i \in I$. Let $\epsilon > 0$ be sufficiently small so that $\boldsymbol{y}=\boldsymbol{x}^{*} + \epsilon\boldsymbol{d}$ and $\boldsymbol{z}=\boldsymbol{x}^{*} - \epsilon\boldsymbol{d}$ satisfy all the constraints. Then, we have $\boldsymbol{x}^{*} = \frac{1}{2}(\boldsymbol{y}+\boldsymbol{z})$, which contradicts the assumption that $\boldsymbol{x}^{*}$ is an extreme point.

(Basic feasible solution $\Rightarrow$ Vertex) W.l.o.g., we assume $M_2=\emptyset$. Suppose $\boldsymbol{x}^{*}$ is a basic feasible solution. Let $I=\{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}$ and $\boldsymbol{c}=\sum_{i \in I}\boldsymbol{a}_{i}$. Then,

\[\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \sum_{i \in I}\boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = \sum_{i \in I}b_{i}. \]

For any $\boldsymbol{y} \in P$, we have

\[\boldsymbol{c}^{\top}\boldsymbol{y} = \sum_{i \in I}\boldsymbol{a}_{i}^{\top}\boldsymbol{y} \geq \sum_{i \in I}b_{i} = \boldsymbol{c}^{\top}\boldsymbol{x}^{*}. \]

Furthermore, the equality holds if and only if $\boldsymbol{a}_{i}^{\top}\boldsymbol{y} = b_{i}$ for all $i \in I$. Since $\boldsymbol{x}^{*}$ is a basic feasible solution, the system of equations has a unique solution, which is $\boldsymbol{y} = \boldsymbol{x}^{*}$. Therefore, $\boldsymbol{x}^{*}$ is a vertex.

Clearly, the number of basic solutions is bounded above by the number of ways that we
can choose all possible sets of linearly independent constraints from the given constraints.

Given a finite number of linear inequality constraints, there can only be a finite number of basic or basic feasible solutions.

The next question is, how to find all the basic feasible solutions of a polyhedron? First, let's see how to find all the basic solutions of a linear program in standard form. W.l.o.g., we always assume that the rows of $\boldsymbol{A}$ are linearly independent, otherwise we can remove the dependent rows.

Consider the constraints $\boldsymbol{Ax} = \boldsymbol{b}$ and $\boldsymbol{x} \geq \boldsymbol{0}$ and assume that the $m \times n$ matrix $\boldsymbol{A}$ has linearly independent rows. A vector $\boldsymbol{x} \in \mathbb{R}^{n}$ is a basic solution if and only if we have $\boldsymbol{Ax} = \boldsymbol{b}$, and there exist indices $B(1), \dots, B(m)$ such that:

The columns $\boldsymbol{A}_{B(1)}, \dots, \boldsymbol{A}_{B(m)}$ are linearly independent;
If $i \neq B(1), \dots, B(m)$, then $x_{i} = 0$.

($\Rightarrow$) Suppose $\boldsymbol{x}$ is a basic solution. Let $B(1),\ldots, B(k)$ be the indices of the nonzero components of $\boldsymbol{x}$. Since $\boldsymbol{x}$ is a basic solution, the system of equations formed by the active constraints $\boldsymbol{Ax}=\boldsymbol{b}$ and $x_i=0,i\neq B(1),\ldots,B(k)$ has a unique solution. Equivalently, $\sum_{i=1}^k \boldsymbol{A}_{B(i)}x_{B(i)}=\boldsymbol{b}$ has a unique solution. Thus, the columns $\boldsymbol{A}_{B(1)},\ldots,\boldsymbol{A}_{B(k)}$ are linearly independent. Immediately we have $k\le m$. So we add $m-k$ indices to $B(1),\ldots,B(k)$ such that the columns $\boldsymbol{A}_{B(1)},\ldots,\boldsymbol{A}_{B(m)}$ are linearly independent. Now both conditions are satisfied.

($\Leftarrow$) Suppose $\boldsymbol{x}$ satisfies the two conditions. Then we have

\[\sum_{i=1}^m \boldsymbol{A}_{B(i)}x_{B(i)} = \sum_{i=1}^n \boldsymbol{A}_i x_i = \boldsymbol{Ax}=\boldsymbol{b}. \]

Thus, $\boldsymbol{x}$ are unique determined. By proposition, there are $n$ linearly independent active constraints, and this implies that $\boldsymbol{x}$ is a basic solution.

Now we can find all the basic solutions by enumerating all the ways to choose $m$ linearly independent columns of $\boldsymbol{A}$, setting the corresponding variables as unknowns and the rest as zero, and solving the linear system. Finally, we can check which basic solutions are feasible.

Choose $m$ linearly independent columns $\boldsymbol{A}_{B(1)}, \ldots, \boldsymbol{A}_{B(m)}$.
Let $x_{i} = 0$ for all $i \neq B(1), \ldots, B(m)$.
Solve the system of $m$ equations $\boldsymbol{Ax} = \boldsymbol{b}$ for the unknowns $x_{B(1)}, \cdots, x_{B(m)}$.

However, some polyhedrons do not have extreme points at all. It turns out that the existance of extreme points depends on whether a polyhedron contains an infinite line or not.

A polyhedron $P\subset \mathbb{R}^n$ contains a line if there exist a vector $\boldsymbol{x}\in P$ and a nonzero vector $\boldsymbol{d}\in\mathbb{R}^n$ such that $\boldsymbol{x}+\lambda\boldsymbol{d}\in P$ for all scalars $\lambda$.

Suppose that the polyhedron $P = \{\boldsymbol{x} \in \mathbb{R}^{n} \mid \boldsymbol{a}_{i}^{\prime}\boldsymbol{x} \geq b_{i},\ i = 1, \ldots, m\}$ is nonempty. Then, the following are equivalent:

The polyhedron $P$ has at least one extreme point.
There exist $n$ vectors out of the family $\boldsymbol{a}_{1}, \ldots, \boldsymbol{a}_{m}$, which are linearly independent.
The polyhedron $P$ does not contain a line.

(1 $\Rightarrow$ 2) Suppose that $P$ has an extreme point $\boldsymbol{x}^{*}$, then $\boldsymbol{x}^{*}$ is a basic feasible solution, and there exist $n$ constraints that active at $\boldsymbol{x}^{*}$, with the corresponding vectors being linearly independent.

(2 $\Rightarrow$ 3) Suppose that there exist $n$ vectors $\boldsymbol{a}_{i_1},\boldsymbol{a}_{i_2},\cdots,\boldsymbol{a}_{i_n}$ that are linearly independent, and $P$ contains a line $\boldsymbol{x}+\lambda\boldsymbol{d}$, where $\boldsymbol{d}\neq \boldsymbol{0}$. We then have $\boldsymbol{a}_{i_j}^{\top}\boldsymbol{d}\neq 0$ for some $j\in\{1,2,\cdots,n\}$. We conclude that $\boldsymbol{a}_{i_j}^{\top}(\boldsymbol{x}+\lambda\boldsymbol{d})\ge b_{i_j}$ cannot hold for all $\lambda$. This contradicts the assumption that $P$ contains a line.

(3 $\Rightarrow$ 1) Let $\boldsymbol{x}$ be an element of $P$ and let $I=\{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x} = b_{i}\}$. If $n$ of the vectors $\boldsymbol{a}_{i}, i \in I$, are linearly independent, then $\boldsymbol{x}$ is a basic feasible solution. Otherwise, by the previous proposition, there exists a nonzero vector $\boldsymbol{d}$ such that $\boldsymbol{a}_{i}^{\top}\boldsymbol{d} = 0$ for all $i \in I$. Since $P$ does not contain a line, there exists some $\lambda^*$ such that $\boldsymbol{y}=\boldsymbol{x}+\lambda^*\boldsymbol{d}\in P$ and some $j\notin I$ such that $\boldsymbol{a}_{j}^{\top}\boldsymbol{y} = b_{j}$. By moving from $\boldsymbol{x}$ to $\boldsymbol{y}$, we have added at least one new active constraint. Repeating this process, we will eventually find a point with $n$ linearly independent active constraints, which is a basic feasible solution and thus an extreme point.

Bounded polyhedrons and polyhedrons in standard form do not contain any line.

Every nonempty bounded polyhedron and every nonempty polyhedron in standard form has at least one extreme point.

To end this section, we have the following important theorem that shows the importance of extreme points in linear programming.

If a linear program has an optimal solution, then at least one of its optimal solutions is an extreme point.

Consider the linear programming problem of minimizing $\boldsymbol{c}^{\top}\boldsymbol{x}$ over a polyhedron $P=\{\boldsymbol{x} \in \mathbb{R}^{n} \mid \boldsymbol{Ax} \ge\boldsymbol{b}\}$ and let $v$ be the optimal value. Then $Q=\{\boldsymbol{x} \in \mathbb{R}^{n} \mid \boldsymbol{Ax} \ge\boldsymbol{b}, \boldsymbol{c}^{\top}\boldsymbol{x} = v\}$ is also a polyhedron. By the above theorem, $P$ contains no line, so $Q$ contains no line either. Therefore, $Q$ has an extreme point $\boldsymbol{x}^{*}$.

Suppose $\boldsymbol{x}^{*}$ is not an extreme point of $P$. Then there exist $\boldsymbol{y}, \boldsymbol{z} \in P$ such that $\boldsymbol{y} \neq \boldsymbol{x}^{*}$, $\boldsymbol{z} \neq \boldsymbol{x}^{*}$, and $\boldsymbol{x}^{*} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}$ for some $\lambda \in (0, 1)$. It follows that $v=\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \lambda\boldsymbol{c}^{\top}\boldsymbol{y} + (1 - \lambda)\boldsymbol{c}^{\top}\boldsymbol{z}$. Since $v$ is optimal, we have $\boldsymbol{c}^{\top}\boldsymbol{y} = v$ and $\boldsymbol{c}^{\top}\boldsymbol{z} = v$. Thus, $\boldsymbol{y}, \boldsymbol{z} \in Q$, which contradicts the assumption that $\boldsymbol{x}^{*}$ is an extreme point of $Q$. Therefore, $\boldsymbol{x}^{*}$ is an extreme point of $P$, and it is an optimal solution of the linear program.

Simplex Method

Back to the linear programming problem, our goal is to find an optimal solution. By the above theorem, we only need to check the extreme points.

The simplex method is an algorithm that iteratively moves along the "edges" of the polyhedron from one extreme point to another, until it reaches an optimal extreme point.

Initialization

First, we need to find an initial basic feasible solution. We can use the two-phase method.

Assume the original LP is in standard form:

\[\begin{align*} \text{minimize} \quad & \boldsymbol{c^\top x} \\ \text{subject to} \quad & \boldsymbol{Ax}=\boldsymbol{b}, \\ & \boldsymbol{x}\ge \boldsymbol{0}. \end{align*} \]

Consider the auxiliary LP:

\[\begin{align*} \text{minimize} \quad & \sum_{i=1}^m y_i \\ \text{subject to} \quad & \boldsymbol{Ax}+\boldsymbol{y}=\boldsymbol{b}, \\ & \boldsymbol{x}\ge \boldsymbol{0}, \\ & \boldsymbol{y}\ge \boldsymbol{0}. \end{align*} \]

The original LP is feasible if and only of the auxiliary LP has the optimal objective value $0$. So we can solve the auxiliary LP first. However, the auxiliary LP has an obvious basic feasible solution: $\boldsymbol{x}=\boldsymbol{0}$ and $\boldsymbol{y}=\boldsymbol{b}$. Thus, we can use the simplex method to solve the auxiliary LP, get an initial basic feasible solution of the original LP, and then use the simplex method again to solve the original LP.