f11 – Large Scale Linear Systems

This chapter provides functions for the solution of large sparse systems of simultaneous linear equations. These include **iterative** methods for real nonsymmetric and symmetric linear systems and **direct** methods for general real linear systems. Further direct methods are currently available in Chapters f01 and f04.

This section is only a brief introduction to the solution of sparse linear systems. For a more detailed discussion see for example Duff et al. (1986) and Demmel et al. (1999) for direct methods, or Barrett et al. (1994) for iterative methods.

A matrix $A$ may be described as **sparse** if the number of zero elements is sufficiently large that it is worthwhile using algorithms which avoid computations involving zero elements.

If $A$ is sparse, and the chosen algorithm requires the matrix coefficients to be stored, a significant saving in storage can often be made by storing only the nonzero elements. A number of different formats may be used to represent sparse matrices economically. These differ according to the amount of storage required, the amount of indirect addressing required for fundamental operations such as matrix-vector products, and their suitability for vector and/or parallel architectures. For a survey of some of these storage formats see Barrett et al. (1994).

This storage format represents a sparse matrix $A$, with nnz nonzero elements, in terms of three one-dimensional arrays – a double array a and two Integer arrays irow and icol. These arrays are all of dimension at least nnz. a contains the nonzero elements themselves, while irow and icol store the corresponding row and column indices respectively.

For example, the matrix

might be represented in the arrays a, irow and icol as

$$A=\left(\begin{array}{rrrrr}1& \phantom{-}2& -1& -1& -3\\ 0& -1& 0& 0& -4\\ 3& 0& 0& 0& 2\\ 2& 0& 4& 1& 1\\ -2& 0& 0& 0& 1\end{array}\right)$$ |

- ${\mathbf{a}}=\left(1,2,-1,-1,-3,-1,-4,3,2,2,4,1,1,-2,1\right)$
- ${\mathbf{irow}}=\left(1,1,1,1,1,2,2,3,3,4,4,4,4,5,5\right)$
- ${\mathbf{icol}}=\left(1,2,3,4,5,2,5,1,5,1,3,4,5,1,5\right)$.

Notes

(i) | The general format specifies no ordering of the array elements, but some functions may impose a specific ordering. For example, the nonzero elements may be required to be ordered by increasing row index and by increasing column index within each row, as in the example above. nag_sparse_nsym_sort (f11zac) is a utility function provided to order the elements appropriately (see Section 2.2). |

(ii) | With this storage format it is possible to enter duplicate elements. These may be interpreted in various ways (e.g., raising an error, ignoring all but the first entry, all but the last, or summing). |

This storage format is suitable for symmetric and Hermitian matrices, and is identical to the CS format described in Section 2.1.1, except that only the lower triangular nonzero elements are stored. Thus, for example, the matrix

might be represented in the arrays a, irow and icol as

$$A=\left(\begin{array}{rrrrrr}4& \phantom{-}1& \phantom{-}0& \phantom{-}0& -1& 2\\ 1& 5& 0& 2& 0& 0\\ 0& 0& 2& 1& 0& -1\\ 0& 2& 1& 3& 1& 0\\ -1& 0& 0& 1& 4& 0\\ 2& 0& -1& 0& 0& 3\end{array}\right)$$ |

- ${\mathbf{a}}=\left(4,1,5,2,2,1,3,-1,1,4,2,-1,3\right)$.
- ${\mathbf{irow}}=\left(1,2,2,3,4,4,4,5,5,5,6,6,6\right)$,
- ${\mathbf{icol}}=\left(1,1,2,3,2,3,4,1,4,5,1,3,6\right)$.

This storage format also uses three one-dimensional arrays – a double array a and two Integer arrays irowix and icolzp. The array a and irowix are of dimension at least $\mathit{nnz}$, while icolzp is of dimension at least ${\mathbf{n}}+1$. a contains the nonzero elements, going down the first column, then the second and so on. For example, the matrix in Section 2.1.1 above will be represented by

- ${\mathbf{a}}=\left(1,3,2,-2,2,-1,-1,4,-1,1,-3,-4,2,1,1\right)$.

- ${\mathbf{irowix}}=\left(1,3,4,5,1,2,1,4,1,4,1,2,3,4,5\right)$.

- ${\mathbf{icolzp}}=\left(1,5,7,9,11,16\right)$

The example in Section 2.1.2 above will be represented by

- ${\mathbf{a}}=\left(4,1,-1,2,1,5,2,2,1,-1,2,1,3,1,-1,1,4,2,-1,3\right)$
- ${\mathbf{irowix}}=\left(1,2,5,6,1,2,4,3,4,6,2,3,4,5,1,4,5,1,3,6\right)$
- ${\mathbf{icolzp}}=\left(1,5,8,11,15,18,21\right)$

Direct methods for the solution of the linear algebraic system

aim to determine the solution vector $x$ in a fixed number of arithmetic operations, which is determined a priori by the number of unknowns. For example, an $LU$ factorization of $A$ followed by forward and backward substitution is a direct method for (1).

$$Ax=b$$ | (1) |

If the matrix $A$ is sparse it is possible to design **direct** methods which exploit the sparsity pattern and are therefore much more computationally efficient than the algorithms in Chapter f07, which in general take no account of sparsity. However, if the matrix is very large and sparse, then **iterative** methods, with an appropriate preconditioner, (see Section 2.3) may be more efficient still.

This chapter provides a direct $LU$ factorization method for sparse real systems. This method is based on special coding for supernodes, broadly defined as groups of consecutive columns with the same nonzero structure, which enables use of dense BLAS kernels. The algorithms contained here come from the SuperLU software suite (see Demmel et al. (1999)). An important requirement of sparse $LU$ factorization is keeping the factors as sparse as possible. It is well known that certain column orderings can produce much sparser factorizations than the normal left-to-right ordering. It is well worth the effort, then, to find such column orderings since they reduce both storage requirements of the factors, the time taken to compute them and the time taken to solve the linear system. The row reorderings, demanded by partial pivoting in order to keep the factorization stable, can further complicate the choice of the column ordering, but quite good and fast algorithms have been developed to make possible a fairly reliable computation of an appropriate column ordering for any sparsity pattern. We provide one such algorithm (known in the literature as COLAMD) through one function in the suite. Similar to the case for dense matrices, functions are provided to compute the $LU$ factorization with partial row pivoting for numerical stability, solve (1) by performing the forward and backward substitutions for multiple right hand side vectors, refine the solution, minimize the backward error and estimate the forward error of the solutions, compute norms, estimate condition numbers and perform diagnostics of the factorization. For more details see Section 3.4.

In contrast to the direct methods discussed in Section 2.2, **iterative** methods for (1) approach the solution through a sequence of approximations until some user-specified termination criterion is met or until some predefined maximum number of iterations has been reached. The number of iterations required for convergence is not generally known in advance, as it depends on the accuracy required, and on the matrix $A$ – its sparsity pattern, conditioning and eigenvalue spectrum.

Faster convergence can often be achieved using a **preconditioner** (see Golub and Van Loan (1996) and Barrett et al. (1994)). A preconditioner maps the original system of equations onto a different system

which hopefully exhibits better convergence characteristics. For example, the condition number of the matrix $\stackrel{-}{A}$ may be better than that of $A$, or it may have eigenvalues of greater multiplicity.

$$\stackrel{-}{A}\stackrel{-}{x}=\stackrel{-}{b}\text{,}$$ | (2) |

An unsuitable preconditioner or no preconditioning at all may result in a very slow rate or lack of convergence. However, preconditioning involves a trade-off between the reduction in the number of iterations required for convergence and the additional computational costs per iteration. Setting up a preconditioner may also involve non-negligible overheads. The application of preconditioners to real nonsymmetric and real symmetric systems of equations is further considered in Sections 2.4 and 2.5.

Many of the most effective iterative methods for the solution of (1) lie in the class of non-stationary **Krylov subspace methods** (see Barrett et al. (1994)). For real nonsymmetric
matrices this class includes:

- the restarted generalized minimum residual (RGMRES) method (see Saad and Schultz (1986));
- the conjugate gradient squared (CGS) method (see Sonneveld (1989));
- the polynomial stabilized bi-conjugate gradient (Bi-CGSTAB$\left(\ell \right)$) method (see Van der Vorst (1989) and Sleijpen and Fokkema (1993));
- the transpose-free quasi-minimal residual method (TFQMR) (see Freund and Nachtigal (1991) and Freund (1993)).

Here we just give a brief overview of these algorithms as implemented in this chapter.

RGMRES is based on the Arnoldi method, which explicitly generates an orthogonal basis for the Krylov subspace $\mathrm{span}\left\{{A}^{k}{r}_{0}\right\}$, $k=0,1,2,\dots \text{}$, where ${r}_{0}$ is the initial residual. The solution is then expanded onto the orthogonal basis so as to minimize the residual norm. For real nonsymmetric matrices the generation of the basis requires a ‘long’ recurrence relation, resulting in prohibitive computational and storage costs. RGMRES limits these costs by restarting the Arnoldi process from the latest available residual every $m$ iterations. The value of $m$ is chosen in advance and is fixed throughout the computation. Unfortunately, an optimum value of $m$ cannot easily be predicted.

CGS is a development of the bi-conjugate gradient method where the nonsymmetric Lanczos method is applied to reduce the coefficient matrix to tridiagonal form: two bi-orthogonal sequences of vectors are generated starting from the initial residual ${r}_{0}$ and from the shadow residual ${\hat{r}}_{0}$ corresponding to the arbitrary problem ${A}^{\mathrm{H}}\hat{x}=\hat{b}$, where $\hat{b}$ is chosen so that ${r}_{0}={\hat{r}}_{0}$. In the course of the iteration, the residual and shadow residual ${r}_{i}={P}_{i}\left(A\right){r}_{0}$ and ${\hat{r}}_{i}={P}_{i}\left({A}^{\mathrm{H}}\right){\hat{r}}_{0}$ are generated, where ${P}_{i}$ is a polynomial of order $i$, and bi-orthogonality is exploited by computing the vector product
${\rho}_{i}=\left({\hat{r}}_{i},{r}_{i}\right)=\left({P}_{i}\left({A}^{\mathrm{H}}\right){\hat{r}}_{0}{P}_{i}\left(A\right){r}_{0}\right)=\left({\hat{r}}_{0},{P}_{i}^{2}\left(A\right){r}_{0}\right)$. Applying the ‘contraction’ operator ${P}_{i}\left(A\right)$ twice, the iteration coefficients can still be recovered without advancing the solution of the shadow problem, which is of no interest. The CGS method often provides fast convergence; however, there is no reason why the contraction operator should also reduce the once reduced vector ${P}_{i}\left(A\right){r}_{0}$: this can lead to a highly irregular convergence.

Bi-CGSTAB$\left(\ell \right)$ is similar to the CGS method. However, instead of generating the sequence $\left\{{P}_{i}^{2}\left(A\right){r}_{0}\right\}$, it generates the sequence $\left\{{Q}_{i}\left(A\right){P}_{i}\left(A\right){r}_{0}\right\}$ where the ${Q}_{i}\left(A\right)$ are polynomials chosen to minimize the residual after the application of the contraction operator ${P}_{i}\left(A\right)$. Two main steps can be identified for each iteration: an OR (Orthogonal Residuals) step where a basis of order $\ell $ is generated by a Bi-CG iteration and an MR (Minimum Residuals) step where the residual is minimized over the basis generated, by a method similar to GMRES. For $\ell =1$, the method corresponds to the Bi-CGSTAB method of Van der Vorst (1989). However, as $\ell $ increases, numerical instabilities may arise.

The transpose-free quasi-minimal residual method (TFQMR) (see Freund and Nachtigal (1991) and Freund (1993)) is conceptually derived from the CGS method. The residual is minimized over the space of the residual vectors generated by the CGS iterations under the simplifying assumption that residuals are almost orthogonal. In practice, this is not the case but theoretical analysis has proved the validity of the method. This has the effect of remedying the rather irregular convergence behaviour with wild oscillations in the residual norm that can degrade the numerical performance and robustness of the CGS method. In general, the TFQMR method can be expected to converge at least as fast as the CGS method, in terms of number of iterations, although each iteration involves a higher operation count. When the CGS method exhibits irregular convergence, the TFQMR method can produce much smoother, almost monotonic convergence curves. However, the close relationship between the CGS and TFQMR method implies that the overall speed of convergence is similar for both methods. In some cases, the TFQMR method may converge faster than the CGS method.

Faster convergence can usually be achieved by using a **preconditioner**. A left preconditioner ${M}^{-1}$ can be used by the RGMRES, CGS and TFQMR methods, such that $\stackrel{-}{A}={M}^{-1}A\sim {I}_{n}$ in (2), where ${I}_{n}$ is the identity matrix of order $n$; a right preconditioner ${M}^{-1}$ can be used by the Bi-CGSTAB$\left(\ell \right)$ method, such that $\stackrel{-}{A}=A{M}^{-1}\sim {I}_{n}$. These are formal definitions, used only in the design of the algorithms; in practice, only the means to compute the matrix-vector products $v=Au$ and $v={A}^{\mathrm{H}}u$ (the latter only being required when an estimate of ${\Vert A\Vert}_{1}$ or ${\Vert A\Vert}_{\infty}$ is computed internally), and to solve the preconditioning equations $Mv=u$ are required, that is, explicit information about $M$, or its inverse is not required at any stage.

Preconditioning matrices $M$ are typically based on incomplete factorizations (see Meijerink and Van der Vorst (1981)), or on the approximate inverses occurring in stationary iterative methods (see Young (1971)). A common example is the **incomplete $LU$ factorization**

where $L$ is lower triangular with unit diagonal elements, $D$ is diagonal, $U$ is upper triangular with unit diagonals, $P$ and $Q$ are permutation matrices, and $R$ is a remainder matrix. A **zero-fill** incomplete $LU$ factorization is one for which the matrix

has the same pattern of nonzero entries as $A$. This is obtained by discarding any **fill** elements (nonzero elements of $S$ arising during the factorization in locations where $A$ has zero elements). Allowing some of these fill elements to be kept rather than discarded generally increases the accuracy of the factorization at the expense of some loss of sparsity. For further details see Barrett et al. (1994).

$$M=PLDUQ=A-R$$ |

$$S=P\left(L+D+U\right)Q$$ |

Three of the best known iterative methods applicable to real symmetric linear systems are the conjugate gradient (CG) method (see Hestenes and Stiefel (1952) and Golub and Van Loan (1996)) and Lanczos type methods based on SYMMLQ and MINRES (see Paige and Saunders (1975)).

For the CG method the matrix $A$ should ideally be positive definite. The application of CG to indefinite matrices may lead to failure, or to lack of convergence. The SYMMLQ and MINRES methods are suitable for both positive definite and indefinite symmetric matrices. They are more robust than CG, but less efficient when $A$ is positive definite.

The methods start from the residual ${r}_{0}=b-A{x}_{0}$, where ${x}_{0}$ is an initial estimate for the solution (often ${x}_{0}=0$), and generate an orthogonal basis for the Krylov subspace $\mathrm{span}\left\{{A}^{k}{r}_{0}\right\}$, for $\mathit{k}=0,1,\dots $, by means of three-term recurrence relations (see Golub and Van Loan (1996)). A sequence of symmetric tridiagonal matrices $\left\{{T}_{k}\right\}$ is also generated. Here and in the following, the index $k$ denotes the iteration count. The resulting symmetric tridiagonal systems of equations are usually more easily solved than the original problem. A sequence of solution iterates $\left\{{x}_{k}\right\}$ is thus generated such that the sequence of the norms of the residuals $\left\{\Vert {r}_{k}\Vert \right\}$ converges to a required tolerance. Note that, in general, the convergence is not monotonic.

In exact arithmetic, after $n$ iterations, this process is equivalent to an orthogonal reduction of $A$ to symmetric tridiagonal form, ${T}_{n}={Q}^{\mathrm{T}}AQ$; the solution ${x}_{n}$ would thus achieve exact convergence. In finite-precision arithmetic, cancellation and round-off errors accumulate causing loss of orthogonality. These methods must therefore be viewed as genuinely iterative methods, able to converge to a solution **within a prescribed tolerance**.

The orthogonal basis is not formed explicitly in either method. The basic difference between the methods lies in the method of solution of the resulting symmetric tridiagonal systems of equations: the CG method is equivalent to carrying out an $LD{L}^{\mathrm{T}}$ (Cholesky) factorization whereas the Lanczos method (SYMMLQ) uses an $LQ$ factorization. The MINRES method on the other hand minimizes the residual into 2-norm.

A preconditioner for these methods must be **symmetric and positive definite**, i.e., representable by $M=E{E}^{\mathrm{T}}$, where $M$ is nonsingular, and such that $\stackrel{-}{A}={E}^{-1}A{E}^{-\mathrm{T}}\sim {I}_{n}$ in (2), where ${I}_{n}$ is the identity matrix of order $n$. These are formal definitions, used only in the design of the algorithms; in practice, only the means to compute the matrix-vector products $v=Au$ and to solve the preconditioning equations $Mv=u$ are required.

Preconditioning matrices $M$ are typically based on incomplete factorizations (see Meijerink and Van der Vorst (1977)), or on the approximate inverses occurring in stationary iterative methods (see Young (1971)). A common example is the **incomplete Cholesky factorization**

where $P$ is a permutation matrix, $L$ is lower triangular with unit diagonal elements, $D$ is diagonal and $R$ is a remainder matrix. A **zero-fill** incomplete Cholesky factorization is one for which the matrix

has the same pattern of nonzero entries as $A$. This is obtained by discarding any **fill** elements (nonzero elements of $S$ arising during the factorization in locations where $A$ has zero elements). Allowing some of these fill elements to be kept rather than discarded generally increases the accuracy of the factorization at the expense of some loss of sparsity. For further details see Barrett et al. (1994).

$$M=PLD{L}^{\mathrm{T}}{P}^{\mathrm{T}}=A-R$$ |

$$S=P\left(L+D+{L}^{\mathrm{T}}\right){P}^{\mathrm{T}}$$ |

The direct method functions available in this chapter largely follow the LAPACK scheme in that four different functions separately handle the tasks of factorizing, solving, refining and condition number estimating. See Section 3.4.

The iterative method functions available in this chapter divide essentially into two types: utility functions and Black Box functions.

At present there are suites of basic functions for real symmetric and nonsymmetric systems, and for complex non-Hermitian systems.

In general, it is not possible to recommend one of these methods (RGMRES, CGS, Bi-CGSTAB$\left(\ell \right)$, or TFQMR) in preference to another. RGMRES is popular, but requires the most storage, and can easily stagnate when the size $m$ of the orthogonal basis is too small, or the preconditioner is not good enough. CGS can be the fastest method, but the computed residuals can exhibit instability which may greatly affect the convergence and quality of the solution. Bi-CGSTAB$\left(\ell \right)$ seems robust and reliable, but it can be slower than the other methods. TFQMR can be viewed as a more robust variant of the CGS method: it shares the CGS method speed but avoids the CGS fluctuations in the residual, which may give, rise to instability. Some further discussion of the relative merits of these methods can be found in Barrett et al. (1994).

The utility functions provided for real nonsymmetric matrices use the coordinate storage (CS) format described in Section 2.1.1. nag_sparse_nsym_fac (f11dac) computes a preconditioning matrix based on incomplete $LU$ factorization. The amount of fill-in occurring in the incomplete factorization can be controlled by specifying either the level of fill, or the drop tolerance. Partial or complete pivoting may optionally be employed, and the factorization can be modified to preserve row-sums.

nag_sparse_nsym_sort (f11zac) orders the nonzero elements of a real sparse nonsymmetric matrix stored in general CS format.

The Black Box function nag_sparse_nsym_fac_sol (f11dcc) solves a real sparse nonsymmetric linear system, represented in CS format, using RGMRES, CGS, Bi-CGSTAB$\left(\ell \right)$, or TFQMR, with incomplete $LU$ preconditioning. nag_sparse_nsym_sol (f11dec) is similar, but has options for no preconditioning, Jacobi preconditioning or SSOR preconditioning.

The utility functions provided for real symmetric matrices use the symmetric coordinate storage (SCS) format described in Section 2.1.2. nag_sparse_sym_chol_fac (f11jac) computes a preconditioning matrix based on incomplete Cholesky factorization. The amount of fill-in occurring in the incomplete factorization can be controlled by specifying either the level of fill, or the drop tolerance. Diagonal Markowitz pivoting may optionally be employed, and the factorization can be modified to preserve row-sums.

nag_sparse_sym_sort (f11zbc) orders the nonzero elements of a real sparse symmetric matrix stored in general SCS format.

The Black Box function nag_sparse_sym_chol_sol (f11jcc) solves a real sparse symmetric linear system, represented in SCS format, using a conjugate gradient or Lanczos method, with incomplete Cholesky preconditioning. nag_sparse_sym_sol (f11jec) is similar, but has options for no preconditioning, Jacobi preconditioning or SSOR preconditioning.

The suite of functions nag_superlu_column_permutation (f11mdc), nag_superlu_lu_factorize (f11mec), nag_superlu_solve_lu (f11mfc), nag_superlu_condition_number_lu (f11mgc), nag_superlu_refine_lu (f11mhc), nag_superlu_matrix_product (f11mkc), nag_superlu_matrix_norm (f11mlc) and nag_superlu_diagnostic_lu (f11mmc) implement the COLAMD/SuperLU direct real sparse solver and associated utilities. You are expected to first call nag_superlu_column_permutation (f11mdc) to compute a suitable column permutation for the subsequent factorization by nag_superlu_lu_factorize (f11mec). nag_superlu_solve_lu (f11mfc) then solves the system of equations. A solution can be further refined by nag_superlu_refine_lu (f11mhc), which also minimizes the backward error and estimates a bound for the forward error in the solution. Diagnostics are provided by nag_superlu_condition_number_lu (f11mgc) which computes an estimate of the condition number of the matrix using the factorization output by nag_superlu_lu_factorize (f11mec), and nag_superlu_diagnostic_lu (f11mmc) which computes the reciprocal pivot growth (a numerical stability measure) of the factorization. The two utility functions, nag_superlu_matrix_product (f11mkc), which computes matrix-matrix products in the particular storage scheme demanded by the suite, and nag_superlu_matrix_norm (f11mlc) which computes quantities relating to norms of a matrix in that particular storage scheme, complete the suite.

Some other functions specifically designed for direct solution of sparse linear systems can currently be found in Chapters f01, f04 and f07. In particular, the following functions allow the direct solution of symmetric positive definite systems:

Variable band (skyline) | nag_real_cholesky_skyline (f01mcc) and nag_real_cholesky_skyline_solve (f04mcc) |

Functions for the solution of band and tridiagonal systems can be found in Chapters f04 and f07.

Do you have a real system and want to use a direct method? | _ yes |
f11mdc, f11mec and f11mfc | ||

no | |
||||

Symmetric positive definite? | _ yes |
Incomplete Cholesky preconditioner? | _ yes |
f11jac and f11jcc |

| | no | |
|||

| | f11jec | |||

no | |
||||

Incomplete $LU$ preconditioner? | _ yes |
f11dac and f11dcc | ||

no | |
||||

f11dec |

ℓ) or TFQMR solver function |

ℓ) or TFQMR solver function |

ℓ) or TFQMR solver |

LU preconditioning |

ℓ) or TFQMR solver |

LU preconditioning |

LU factorization, |

LU factorization |

LU factorization |

None.

Barrett R, Berry M, Chan T F, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C and Van der Vorst H (1994) *Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods* SIAM, Philadelphia

Demmel J W, Eisenstat S C, Gilbert J R, Li X S and Li J W H (1999) A supernodal approach to sparse partial pivoting *SIAM J. Matrix Anal. Appl. * **20** 720–755

Duff I S, Erisman A M and Reid J K (1986) *Direct Methods for Sparse Matrices* Oxford University Press, London

Freund R W (1993) A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems *SIAM J. Sci. Comput.* **14** 470–482

Freund R W and Nachtigal N (1991) QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems *Numer. Math.* **60** 315–339

Golub G H and Van Loan C F (1996) *Matrix Computations* (3rd Edition) Johns Hopkins University Press, Baltimore

Hestenes M and Stiefel E (1952) Methods of conjugate gradients for solving linear systems *J. Res. Nat. Bur. Stand.* **49** 409–436

Meijerink J and Van der Vorst H (1977) An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix *Math. Comput.* **31** 148–162

Meijerink J and Van der Vorst H (1981) Guidelines for the usage of incomplete decompositions in solving sets of linear equations as they occur in practical problems *J. Comput. Phys.* **44** 134–155

Paige C C and Saunders M A (1975) Solution of sparse indefinite systems of linear equations *SIAM J. Numer. Anal.* **12** 617–629

Saad Y and Schultz M (1986) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems *SIAM J. Sci. Statist. Comput.* **7** 856–869

Sleijpen G L G and Fokkema D R (1993) BiCGSTAB$\left(\ell \right)$ for linear equations involving matrices with complex spectrum *ETNA* **1** 11–32

Sonneveld P (1989) CGS, a fast Lanczos-type solver for nonsymmetric linear systems *SIAM J. Sci. Statist. Comput.* **10** 36–52

Van der Vorst H (1989) Bi-CGSTAB, a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems *SIAM J. Sci. Statist. Comput.* **13** 631–644

Young D (1971) *Iterative Solution of Large Linear Systems* Academic Press, New York