$$ \newcommand{\rmap}[3]{#1:#2\rightarrow #3} \newcommand{\lmap}[3]{#1:#2\leftarrow #3} \newcommand{\map}[3]{\rmap{#1}{#2}{#3}} \newcommand{\reals}[0]{\mathbb{R}} \newcommand{\xreals}[0]{\mathbb{R}\cup\{\infty\}} \newcommand{\ub}[1]{\rm{ub\ #1}} \newcommand{\lb}[1]{\rm{lb\ #1}} \newcommand{\glb}[1]{\rm{glb\ #1}} \newcommand{\lub}[1]{\rm{lub\ #1}} \newcommand{\ftom}[4]{\glb{ \left\{#2#1 |\, {}^*#2 = #3\ \rm{and}\ #2^* = #4\right\}}} \usepackage{mathrsfs} \newcommand{\alg}[1]{\mathscr{#1}} \newcommand{\complexes}[0]{\mathbb{C}} $$

A Simple One-pass Linear Solver for Matrix Equations

Abstract

We present a single-pass algorithm for solving linear equations of the form $Ax=b$, where $b$ is a given vector and $A$ is a linear transformation on a finite dimensional Hilbert space over the quaternions. It is harmless to think of $A$ as a matrix, which need not be square. Since the quaternions contain the complex numbers, which in turn contain the real numbers, which in turn contain the rationals, the solution applies, in particular, in each of those cases. Whenever we say number, you are free to interpret that to mean real number, rational number, complex number, or quaternion, as you prefer; if you do, you can safely skip the Preliminaries section.

Preliminaries

The quaternions can be described as quadruples of real numbers, written $a + bi + cj + dk$, where $a$, $b$, $c$, and $d$ are real numbers, and $1$, $i$, $j$, $k$ form the basis of a 4-dimensional real vector space. Addition is defined by the usual addition of vectors. Multiplication is defined by the rules that $i^2 = j^2 = k^2 = -1$, while $ij=k$, $jk =i$, $ki = j$, $ji = - ij$, $ik = - ki$, $jk = - kj$. $1$ has its usual property that $1i = i1 = i$, $1j = j1 = j$, $1k = k1 = k$. Real numbers commute with all quaternions. There is an involution defined by $(a + bi + cj + dk)^* = (a - bi - cj - dk)$. Evidently $q$ is real if and only if $q^* = q$. It is readily verified that $(q_1q_2)^* = q_2^*q_1^*$, and that $q^{**}=q$. In particular, $q^*q$ is necessarily real. In fact it is the sum of the squares $a^2 + b^2 + c^2 + d^2$, and $q^*q=0$ if and only if $q=0$, that is $a=b=c=d=0$.
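Since everything below can be carried out numerically, it may help to see this arithmetic spelled out. The following is a minimal C sketch of quaternion multiplication, conjugation, and the quantity $q^*q$ (the type and function names are purely illustrative):

typedef struct { double a, b, c, d; } quat;   /* represents a + bi + cj + dk */

/* Product of two quaternions, using i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j. */
quat qmul(quat p, quat q) {
    quat r;
    r.a = p.a*q.a - p.b*q.b - p.c*q.c - p.d*q.d;
    r.b = p.a*q.b + p.b*q.a + p.c*q.d - p.d*q.c;
    r.c = p.a*q.c - p.b*q.d + p.c*q.a + p.d*q.b;
    r.d = p.a*q.d + p.b*q.c - p.c*q.b + p.d*q.a;
    return r;
}

/* q^* = a - bi - cj - dk */
quat qconj(quat q) {
    quat r = { q.a, -q.b, -q.c, -q.d };
    return r;
}

/* q^* q, which is the real number a^2 + b^2 + c^2 + d^2 */
double qnormsq(quat q) {
    return q.a*q.a + q.b*q.b + q.c*q.c + q.d*q.d;
}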

Vector spaces are defined in the usual way; care must be taken because scalar multiplication isn't commutative. For example, given a set $X$, we can form a vector space by taking all the functions on $X$ taking values in the quaternions. We can add functions by adding their values: $(f+g)(x) = f(x) + g(x)$. We can multiply functions by quaternions by defining $(qf)(x) = qf(x)$ and $(fq)(x) = f(x)q$. Note, however, that in general $fq$ is not equal to $qf$. A structure of Hilbert space is defined by a pairing $\langle f, g\rangle$ which is additive in each argument, so that $\langle f,g+h\rangle = \langle f,g\rangle + \langle f,h\rangle$, and for which $\langle f, gq \rangle = \langle f, g \rangle q$ and $\langle g, f\rangle = \langle f, g\rangle^*$. It must also be positive definite, meaning that $\langle f, f\rangle \gt 0$ unless $f=0$. An easy example, if $X$ is a finite set, is to take the sum $\langle f, g\rangle = \sum_{x\in X}f(x)^*g(x)$.
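Continuing the sketch above (and reusing its quat type, qmul, and qconj), the inner product on functions over a finite set $X$, here represented as arrays of quaternions, might look like this:

/* <f, g> = sum over x of f(x)^* g(x), for f and g given as arrays of
   quaternions of common length len. */
quat qinner(const quat * f, const quat * g, int len) {
    quat s = { 0, 0, 0, 0 };
    for (int x = 0; x < len; x++) {
        quat t = qmul(qconj(f[x]), g[x]);
        s.a += t.a; s.b += t.b; s.c += t.c; s.d += t.d;
    }
    return s;
}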

An orthonormal basis can be constructed with only minor care taken to respect the failure of multiplication to commute. If $f_1$, $\dots$, $f_k$ are orthogonal, nonzero vectors, and if $f$ is linearly independent of them, then $f - \sum_{1 \le j \le k} f_j \langle f_j, f\rangle/\langle f_j, f_j\rangle$ is orthogonal to $f_1$, $\dots$, $f_k$, and is not zero. Thus we can inductively build a sequence of nonzero, orthogonal vectors, so long as there remains a vector which is linearly independent of those chosen so far. That this process must ultimately terminate is the definition of finite dimension. If $f$ is linearly independent, which is to say nonzero, then we can divide it by $\sqrt{\langle f, f \rangle}$, since $\langle f, f \rangle$ is guaranteed to be a positive real number. The resulting vector $u$ will then satisfy $\langle u, u\rangle = 1$. An orthogonal set can thus be converted into an orthonormal one; it is an easy check that the orthogonality is undisturbed. By this process every finite dimensional Hilbert space can be identified with a space of column vectors of quaternions, where the $j$th component of the vector $v$ is simply $\langle b_j, v \rangle$, with $b_j$ the $j$th vector of such an orthonormal sequence, and every linear map is likewise converted into a matrix.
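For the real case, the process just described can be sketched in a few lines of C (the function and variable names are only for illustration); the quaternionic version differs only in using the quaternionic inner product and keeping the scalar coefficients on the right:

#include <math.h>

/* Inner product of two real vectors of length dim. */
static double dot(const double * f, const double * g, int dim) {
    double s = 0;
    for (int i = 0; i < dim; i++) s += f[i] * g[i];
    return s;
}

/* Orthonormalize n linearly independent vectors v[0..n-1], each of length
   dim, in place: subtract from each vector its projections onto the earlier
   (already normalized) ones, then divide by sqrt(<f, f>). */
void gram_schmidt(double ** v, int n, int dim) {
    for (int k = 0; k < n; k++) {
        for (int j = 0; j < k; j++) {
            double c = dot(v[j], v[k], dim);              /* <f_j, f> */
            for (int i = 0; i < dim; i++) v[k][i] -= v[j][i] * c;
        }
        double norm = sqrt(dot(v[k], v[k], dim));
        for (int i = 0; i < dim; i++) v[k][i] /= norm;
    }
}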

Linear Equations

Given the linear equation $Ax=b$, where $A$ and $b$ are known, we seek $x$ satisfying the equation. No solution need exist. If a solution does exist, however, then $A^*Ax=A^*b$, where $A^*$ is the adjoint of $A$, satisfying $ \langle x, Ay\rangle = \langle A^*x, y\rangle$ for all $x$ and $y$.

Suppose, on the other hand, that $A^*Ax_1 = A^*b$ and $A^*Ax_2 = A^*b$. Then $A^*Ax_1 = A^*Ax_2$, so $A^*Ax_1 - A^*Ax_2 = 0$, so $A^*A(x_1 - x_2) = 0$. But then $\langle A^*A(x_1-x_2), x_1 - x_2\rangle = 0$, so $\langle A(x_1-x_2), A(x_1 - x_2)\rangle = 0$. But then $A(x_1 - x_2) = 0$, so $Ax_1 - Ax_2 = 0$, so $Ax_1 = Ax_2$. Thus, either $Ax_1 = b$ and $Ax_2 = b$, or neither $Ax_1$ nor $Ax_2$ is $b$. If any one solution of $A^*Ax = A^*b$ satisfies $Ax = b$, then all of them do; if any one of them fails to satisfy $Ax = b$, then all of them fail.

So to solve $Ax=b$ it is enough to find all the solutions of $A^*Ax=A^*b$, and then compute, for any one of them, $Ax$. If it's $b$, then all the solutions of $A^*Ax = A^*b$ are also solutions of $Ax = b$, and of course all the solutions of $Ax=b$ are also solutions of $A^*Ax = A^*b$. If it's not $b$, then no solution can exist for the equation $Ax=b$.
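In the real case, this reduction is easy to carry out with the positive definite solver developed in the Implementation section below. The following sketch (the function name and the dense double ** representation are illustrative assumptions, and $A$ is taken to have full column rank, so that $A^TA$ is positive definite) forms $A^TA$ and $A^Tb$ and hands them to that solver; per the discussion above, one should afterwards check whether the returned $x$ actually satisfies $Ax=b$:

#include <stdlib.h>

double * positive_solver(double ** A, double * b, int n);  /* see Implementation below */

/* Solve A x = b for a real m x n matrix A of full column rank, by forming
   the normal equations (A^T A) x = A^T b and solving those instead. */
double * normal_equations_solver(double ** A, double * b, int m, int n) {
    double ** AtA = (double **) malloc(n * sizeof(double *));
    double *  Atb = (double *)  malloc(n * sizeof(double));

    for (int i = 0; i < n; i++) {
        AtA[i] = (double *) malloc(n * sizeof(double));
        Atb[i] = 0;
        for (int k = 0; k < m; k++) Atb[i] += A[k][i] * b[k];
        for (int j = 0; j < n; j++) {
            AtA[i][j] = 0;
            for (int k = 0; k < m; k++) AtA[i][j] += A[k][i] * A[k][j];
        }
    }

    double * x = positive_solver(AtA, Atb, n);

    for (int i = 0; i < n; i++) free(AtA[i]);
    free(AtA);
    free(Atb);
    return x;
}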

Linear Positive Equations

It is enough, then, to consider equations of the form $A^*Ax=b$. Now notice that $(A^*A)^* = A^*A^{**} = A^*A$, and $\langle x, A^*Ax\rangle = \langle Ax, Ax\rangle \ge 0$. So we consider the linear equation $Ax=b$, where $A^* = A$ and $\langle x, Ax\rangle \ge 0$, which is to say that $A$ is self-adjoint, and positive semi-definite. We have just seen how this case enables the solution of the general case, as well.

We describe here a single-pass algorithm for solving such equations. Apart from its intrinsic mathematical interest, there are algorithms for machine learning that require, at each iteration, just such solutions. The usual factorization methods are suboptimal for this, however, because only one solution is required, and both $A$ and $b$ change with each pass. This means that the computational expense incurred in the factorization cannot be amortized over multiple choices of $b$.

The algorithm is established by induction.

The base case

The base case is in one dimension, where we have the equation $\alpha x = \beta$, with $\alpha^* = \alpha$ and $\alpha \ge 0$.

case 1

If $\alpha = 0$ and $\beta = 0$, then every number $x$ is a solution.

case 2

If $\alpha = 0$ and $\beta$ is not, no solution can exist.

case 3

If $\alpha > 0$, then $x = \beta/\alpha$. (The notation is unambiguous, since $\alpha$ is a real number, and so lies in the center, which consists of the numbers which commute with all other numbers.)

The induction

For the induction, we consider $$\begin{bmatrix} \alpha & c^* \\ a & A \end{bmatrix}$$ where $\alpha$ is a number, $a$ and $c$ are (column) vectors, and $A$ is a square matrix of one less dimension.

Then $$\begin{bmatrix} \alpha & c^* \\ a & A \end{bmatrix}^* = \begin{bmatrix}\alpha^* & a^* \\ c & A^* \end{bmatrix},$$ so self-adjointness forces $A^* = A$, $c=a$, and $\alpha^* = \alpha$.

$$\begin{bmatrix} \chi \\ x \end{bmatrix}^* \begin{bmatrix} \alpha & a^* \\ a & A \end{bmatrix} \begin{bmatrix} \chi \\ x \end{bmatrix} = \begin{bmatrix} \chi^* & x^* \end{bmatrix} \begin{bmatrix} \alpha \chi + a^* x \\ a\chi + Ax \end{bmatrix} = \chi^*\alpha\chi + \chi^*a^*x + x^*a\chi + x^*Ax \ge 0. $$

Choosing $x=0$, $\chi=1$ we see that $\alpha \ge 0$.

case 1

Consider first the case where $\alpha=0$. Choose $x=a$, so that we get $\chi^*a^*a + a^*a\chi + a^*Aa$, or $a^*a(\chi^* + \chi) + a^*Aa$. This is a linear function of the real part of $\chi$, with slope $2a^*a$. Unless $a^*a=0$, it can be made negative by taking the real part of $\chi$ sufficiently negative. Therefore, it must be the case that $a^*a = 0$. But then also $a=0$. The form then reduces to $x^*Ax$, and we arrive at just $$\begin{bmatrix}0 & 0 \\ 0 & A\end{bmatrix}$$ where $A$ is self-adjoint and positive semi-definite. To solve $$\begin{bmatrix}0 & 0 \\ 0 & A\end{bmatrix}\begin{bmatrix}\chi\\x\end{bmatrix} = \begin{bmatrix}\beta\\b\end{bmatrix},$$ we see immediately that $\beta$ must be $0$. If it is not, there can be no solution. If $\beta=0$, then $\chi$ is unconstrained, while $x$ must be any solution of $Ax=b$, which by induction we can assume to be known.

case 2

The other case is where $\alpha \gt 0$. Then $\alpha\chi^*\chi + \chi^*a^*x + x^*a\chi + x^*Ax = \alpha(\chi + a^*x/\alpha)^*(\chi + a^*x/\alpha) + x^*(A-aa^*/\alpha)x$.

The first term is always nonnegative. It's $0$ precisely when $\chi = -a^*x/\alpha$. If the second term is also nonnegative, then obviously so is the sum. On the other hand, if the second term is ever negative, for any choice of $x$, then choosing $\chi$ to make the first term $0$ results in a negative sum.

So when $\alpha$ is positive, $$\begin{bmatrix}\alpha&a^*\\a&A\end{bmatrix}$$ is positive semi-definite if and only if $A-aa^*/\alpha$ is. In this case, we consider $$\begin{bmatrix}\alpha & a^* \\ a & A\end{bmatrix}\begin{bmatrix}\chi \\ x \end{bmatrix} = \begin{bmatrix}\beta \\ b\end{bmatrix},$$ or $$\begin{bmatrix}\alpha\chi + a^*x \\ a\chi + Ax\end{bmatrix} = \begin{bmatrix}\beta \\ b\end{bmatrix},$$ from which we conclude that $\alpha\chi + a^*x = \beta$ and $a\chi + Ax = b$, or that $\chi = (\beta - a^*x)/\alpha$, and so $a(\beta - a^*x)/\alpha + Ax=b$. In particular, as soon as $x$ is known, the corresponding $\chi$ is determined. To find the possible $x$, we rearrange the last equation as $(A-aa^*/\alpha)x = b - a\beta/\alpha$. But this last problem again involves a self-adjoint positive semi-definite matrix of smaller dimension, and so we can assume we know all of its solutions. For each, we simply calculate the corresponding $\chi$, to obtain $$\begin{bmatrix}(\beta - a^*x)/\alpha \\ x \end{bmatrix}.$$
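For concreteness, here is the recursion carried out on a small example (chosen purely for illustration). Take $$\begin{bmatrix}4 & 2\\ 2 & 2\end{bmatrix}\begin{bmatrix}\chi\\ x\end{bmatrix} = \begin{bmatrix}6\\ 4\end{bmatrix},$$ so that $\alpha = 4$, $a = 2$, $A = 2$, $\beta = 6$, $b = 4$. The reduced problem is $(A-aa^*/\alpha)x = b - a\beta/\alpha$, that is, $(2 - 4/4)x = 4 - 2\cdot 6/4$, or $x = 1$. Back-substituting, $\chi = (\beta - a^*x)/\alpha = (6 - 2)/4 = 1$, and indeed $$\begin{bmatrix}4&2\\2&2\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}6\\4\end{bmatrix}.$$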

Implementation

This algorithm is readily implemented. Notice that $\chi$ depends directly only on $\alpha$, $\beta$, $a$, and $x$. It depends on $A$ and $b$ only through $x$. This means that we can recycle storage, replacing $b$ with $b-a\beta/\alpha$ and $A$ with $A-aa^*/\alpha$.

We conclude with a C language implementation of the positive definite real (floating point) case.

#include <stdlib.h>

/* Solve A x = b for a symmetric positive definite n x n matrix A, given as
   an array of n row pointers. A and b are overwritten: the trailing block of
   A is replaced by A - a a^T / alpha and the trailing part of b by
   b - a beta / alpha, exactly as described above. */
void _positive_solver(double ** A, double * b, double * x, int n) {
    if (n < 1) exit(-1);

    if (n == 1) {
        x[0] = b[0] / A[0][0];
        return;
    }

    double * bvec = b + 1;        /* beta is b[0]; bvec is the rest of b */
    double * Avec = A[0] + 1;     /* alpha is A[0][0]; Avec is a^T       */
    double * xvec = x + 1;        /* chi is x[0]; xvec is the rest of x  */

    int m = n - 1;

    /* Row pointers for the trailing m x m block of A: skip the first row,
       and shift each remaining row one column to the right. */
    double ** Asub = (double **) malloc(m * sizeof(double *));
    for (int i = 0; i < m; i++) Asub[i] = A[i + 1] + 1;

    for (int j = 0; j < m; j++) {
        bvec[j] -= Avec[j] * b[0] / A[0][0];            /* b - a beta / alpha */
        for (int i = 0; i < m; i++) {
            Asub[i][j] -= Avec[i] * Avec[j] / A[0][0];  /* A - a a^T / alpha  */
        }
    }

    _positive_solver(Asub, bvec, xvec, m);

    free(Asub);

    /* chi = (beta - a^T x) / alpha */
    double p = 0; for (int k = 0; k < m; k++) p += Avec[k] * xvec[k];
    x[0] = (b[0] - p) / A[0][0];

    return;
}

double * positive_solver(double ** A, double * b, int n) {
    double * x = (double *) malloc(n * sizeof(double));
    _positive_solver(A, b, x, n);
    return x;
}
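For completeness, here is a minimal usage sketch (assuming the two routines above are in the same file), applied to the small example worked out earlier; note that the solver overwrites $A$ and $b$ in place:

#include <stdio.h>
#include <stdlib.h>

double * positive_solver(double ** A, double * b, int n);

int main(void) {
    /* The 2 x 2 example from above: A = [[4, 2], [2, 2]], b = (6, 4). */
    int n = 2;
    double ** A = (double **) malloc(n * sizeof(double *));
    for (int i = 0; i < n; i++) A[i] = (double *) malloc(n * sizeof(double));
    A[0][0] = 4; A[0][1] = 2;
    A[1][0] = 2; A[1][1] = 2;
    double b[2] = { 6, 4 };

    double * x = positive_solver(A, b, n);
    printf("x = (%g, %g)\n", x[0], x[1]);   /* expect (1, 1) */

    free(x);
    for (int i = 0; i < n; i++) free(A[i]);
    free(A);
    return 0;
}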

I'm Andrew Winkler. I hope you've enjoyed exploring these ideas with me. Thanks for your time. If you've got any questions, send an email to 4af502d7148512d4fee9@cloudmailin.net.

The Hertz Foundation has been a valuable support, not only during graduate school, but ever since. If you can, please consider supporting them with a donation.
