Simple Optimal Algorithm for Solving Rank-r Complex Equations Ax = b Rendering Sparse Solution xS with Support Surprisingly Confined to the Indices of r Independent Columns of A
DOI:
https://doi.org/10.9734/bpi/ratmcs/v4/10806FKeywords:
Singular Value Decomposition (SVD), Compact SVD, Thin SVD, Overdetermined linear equations, Underdetermined linear equations, Compressed sensing, Linear Regression, Linear programming, Convex optimization, Moore-Penrose generalized inverse, Row reduced echelon form, Matrix canonical form, Principle of parsimonyAbstract
In this article we will study optimal sparse solutions for linear equations \(\mathbf{A x}=\mathbf{b}\), where \(\mathbf{A} \in \mathbb{C}^{m \times n}\) is known, \(\mathrm{x} \in \mathbb{C}^n\) is unknown, \(\mathbf{b} \in \mathbb{C}^m\) is the measurement vector (possibly noisy, i.e., \(\mathbf{b}=\mathbf{A x}+\mathbf{e}\) ), and \(\operatorname{rank}(\mathbf{A})=r\). The proposed algorithm computes an optimal sparse solution, say \(\mathrm{x}_S=\left(\mathrm{x}_S(1), \ldots\right.\), \(\left.\mathrm{x}_S(n)\right)^T(T\) denotes transposition), by using \(\mathbf{A}\) to precompute indices \(\alpha=\left(\alpha_1, \ldots, \alpha_r\right)\) satisfying 1 \( \leq \alpha_1<\cdots<\alpha_r \leq n\) that render \(\mathrm{x}_S(\boldsymbol{\beta})=\mathbf{0}_{n-r}\), where \(\beta\) denotes \(\alpha\) 's complementary indices. We will show that \(\mathrm{x}_S(\alpha)\) is a basic solution associated with the columns indexed by \(\alpha\) of some intermediate underdetermined linear equations whose underlying matrix is (\(\mathit{r}\) x \(\mathit{n}\))-dimensional. Hence, the cardinality \(\mathit{s}\) of \(\mathrm{s}:=\operatorname{support}\left(\mathrm{x}_s\right):=\left\{k: \mathrm{x}_S(k) \neq 0\right\}\) satisfies \(s=r\) for a non-degenerate basic solution and s < r for a degenerate basic solution. The proposed algorithm is based on the compact canonical form (C-CF): \(\mathbf{A}=\mathbf{P Q}\), where both \(\mathbf{P} \in \mathbb{C}^{m \times r}\) and \(\mathbf{Q} \in \mathbb{C}^{r \times n}\) have rank \(r\). We will focus on its special case, i.e., the compact singular value decomposition (C-SVD): \(\mathbf{A}=\mathbf{U}_r\left(\Sigma_r \mathbf{V}_r^*\right)\), where \(\mathbf{U}_r^* \mathbf{U}_r=\mathbf{I}_r, \mathbf{V}_r^* V_r=\mathbf{I}_r, \mathbf{\Sigma}_r\) is a diagonal matrix whose diagonal elements are the r positive singular values of \(\mathbf{A}\) in nonincreasing order, \(\mathbf{I}_r\) is the (\(\mathit{r}\) x \(\mathit{r}\)) dimensional unit matrix, and * denotes the Hermitian operator. A special case of C-SVD for reducing computations is the thin SVD (T-SVD), where we retain the first \(\mathit{t}\) singular values of \(\mathbf{A}\); we thus obtain an (\(\mathit{m}\) x \(\mathit{n}\))-dimensional rank- \(\mathit{t}\) matrix, say \(\mathbf{A}^{(t)}=\mathbf{U}_t\left(\Sigma_t \mathbf{V}_t^*\right)\). Notice that an optimal sparse solution, say \(\mathbf{x}_S\), associated with \(\mathbf{A}^{(t)}\) will satisfy | support (\(\mathrm{x}_S\)) | \(\leq t < r.\) We wish to minimize \(\|\mathbf{b}\) - \(\mathbf{A x}\|\) , where \(\|.\|\) denotes any norm. Using the C-CF: \(\mathbf{A = P Q}\) the proposed algorithm consists of following two stages (i) find \(\hat{\mathbf{w}}\) minimizing \(\|\mathbf{b}-\mathbf{P w}\|\), possibly with constraints on e; and, (ii) any solution \(\hat{\mathrm{x}}\)of the underdetermined linear equations \(\mathbf{Q x}=\hat{\mathbf{w}}\) will be an optimal solution for \(\min \|\mathbf{b}-\mathbf{A x}\|\). Hence, in stage (i) sparsity is not an issue, whereas in stage (ii) \(\exists \alpha=\left(\alpha_1, \ldots, \alpha_r\right)\) such that Q's submatrix whose columns are indexed by \( \alpha\), say \(\mathbf{Q}_\alpha\), is regular, also called nonsingular. The desired sparse solution can be obtained by letting (i) \(\mathrm{x}_S=\mathbf{0}_n\), (ii) computing the basic solution \(\mathrm{x}_B:=\mathrm{Q}_\alpha^{-1} \hat{\mathrm{w}}\), (iii) letting \(\mathrm{x}_S(\alpha)=\mathrm{x}_B\), and outputting \(\mathrm{x}_S\). \(\mathrm{A}\) sparse solution is guaranteed to exist when \(\mathit{r}\) < \(\mathit{n}\), i.e., for the underdetermined case \((r \leq m < n)\), the square case \((r \leq m = n)\), and the overdetermined case \((r \leq n < m)\). The proposed algorithm is most efficient when we wish to minimize the Euclidean norm \( \|\mathbf{b}-\mathbf{A x}\|_2\) without any constraints. We could thus provide a new proof of the Moore-Penrose solution, say \(\mathrm{x}_{M P}\), published independently by Moore in 1920, Bjerhammar in 1951, and Penrose in 1955. The outline of this proof is as follows: (i) compute \(\hat{\mathbf{w}}\) minimizing \(\|\mathbf{b}-\mathbf{P w}\|_2\); and, then (ii) compute \(\mathrm{x}_{M L}=\arg \min \|\mathrm{x}\|_2\) subject to \(\mathbf{Q x}=\hat{\mathrm{w}}\). We thus obtain that \(\mathrm{x}_{M P}=\mathrm{x}_{M L}\). However, as shown above we can also compute a sparse solution \(\mathrm{x}_S\) satisfying \(\mathbf{Q} \mathbf{x}_S=\hat{\mathbf{w}}\) and since also \(\mathbf{Q} \mathbf{x}_{M P}=\hat{\mathbf{w}}\) we obtain \(\left\|\mathbf{b}-\mathbf{A} \mathbf{x}_S\right\|_2=\left\|\mathbf{b}-\mathbf{A x}_{M P}\right\|_2\), i.e., \(\mathrm{x}_S\) preserves optimality. The sparse solution \(\mathrm{x}_S\) thus obtained should be preferred over \(\mathrm{x}_{M P}\) since it preserves optimality, does not require any optimization software, and by the parsimony principle gives a much simpler explanation of the observations. Surprisingly, $\alpha$ can be precomputed from A, therefore, we only have to compute \(\hat{\mathrm{w}}\) followed by computing the basic solution \(\mathrm{x}_B:=\mathrm{Q}_\alpha^{-1} \hat{\mathbf{w}}\); then, letting \(\mathrm{x}_S=\mathbf{0}_n\) followed by letting \(\mathrm{x}_S(\alpha)=\mathrm{x}_B\) renders the desired \(\mathrm{x}_S\). In retrospect it turns out that any \(\mathit{r}\) independent columns of \(\mathbf{A}\), say \(\mathbf{A}_\alpha:=\mathbf{A}((1, \ldots, m), \alpha)\), suffice to compute \(\mathrm{x}_S\) while the remaining columns of \(\mathbf{A}\) can be discarded. Hence, the optimal solution of \(\mathbf{A}_\alpha \mathbf{y}+\mathbf{e}=\mathbf{b}\) possibly with constraints on \(\mathbf{e}\), say \(\hat{\mathbf{y}}\), renders \(\mathrm{x}_S(\alpha)=\hat{\mathrm{y}}\)