In this section, we derive Newton’s method and propose an inversion-free method to solve Eq. (1).
Preliminaries
In this subsection, we provide some important notations, definitions and lemmas that will be exploited in our proofs.
The notation \(\rho (\bullet )\) stands for the spectral radius; \(A^{T}\) and \(A^{*}\) denote the transpose and conjugate transpose of a matrix A, respectively; \(\Vert A\Vert _{F}=\sqrt{\mathrm {trace}(A^{T}A)}\) denotes the Frobenius norm of a matrix A induced by the inner product; for \(A=[a_{ij}]\in \mathbb {C}^{m\times n}\) and \(B\in \mathbb {C}^{p\times q},\) \(A\otimes B=[a_{ij}B]\in \mathbb {C}^{mp\times nq}\) denotes the Kronecker product of the matrices A and B; \(\mathrm {vec}(A)=[a_{1}^{T},a_{2}^{T},\cdots , a_{n}^{T}]^{T}\) stands for the vec operator on a matrix A, which stacks the columns of A into a single vector, where \(a_{i}\) is the ith column of A.
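The vec operator and the Kronecker product interact through the standard identity \(\mathrm {vec}(AXB)=(B^{T}\otimes A)\mathrm {vec}(X)\), which is used repeatedly below. A quick numerical check (random matrices here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A, X, B = (rng.standard_normal((n, n)) for _ in range(3))

# vec stacks the columns of a matrix into one long vector (column-major order).
vec = lambda M: M.reshape(-1, order="F")

# The standard identity vec(A X B) = (B^T kron A) vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))  # True
```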
Definition 1
[7, 8] Let \(f:\mathbb {C}^{n\times n}\mapsto \mathbb {C}^{n \times n}\) be a matrix function. The Fr\(\acute{\text {e}}\)chet derivative of matrix function f at A in the direction E is the unique linear operator \(L_{f}\) that maps E to \(L_ {f} (A, E)\) such that
$$\begin{aligned} f(A+E)-f(A)- L_{f}(A,E)=O(\Vert E\Vert ^{2}),~~ \text {for all}~~ A,E\in \mathbb {C}^{n\times n}. \end{aligned}$$
Definition 2
[9, 10] The Fr\(\acute{\text {e}}\)chet derivative of the matrix function \(e^{X}\) at \(X_{0}\) in the direction Z is
$$\begin{aligned} L_{f}(X_{0},Z)=\int _{0}^{1} e^{tX_{0}}Ze^{(1-t)X_{0}} dt\approx e^{X_{0}/2}Z e^{X_{0}/2}. \end{aligned}$$
(2)
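The approximation in (2) replaces the integral by its midpoint value \(e^{X_{0}/2}Ze^{X_{0}/2}\); it is exact when \(X_{0}\) and Z commute, and its error grows with \(\Vert X_{0}\Vert\). A sketch comparing it with the exact Fr\(\acute{\text {e}}\)chet derivative, computed by `scipy.linalg.expm_frechet` (the matrices below are illustrative):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(1)
n = 4
X0 = 0.1 * rng.standard_normal((n, n))  # small X0: midpoint rule is accurate
Z = rng.standard_normal((n, n))

# Exact Frechet derivative L_f(X0, Z) of the matrix exponential.
_, L_exact = expm_frechet(X0, Z)

# Midpoint approximation from (2): e^{X0/2} Z e^{X0/2}.
E_half = expm(X0 / 2)
L_approx = E_half @ Z @ E_half

# Relative error is small for small ||X0|| (zero if X0 and Z commute).
print(np.linalg.norm(L_exact - L_approx) / np.linalg.norm(L_exact))
```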
Definition 3
Let A be an \(m\times m\) square matrix. A is a Z-matrix if all its off-diagonal elements are non-positive.
Definition 4
A matrix \(A\in \mathbb {R}^{n\times n}\) is an M-matrix if \(A=sI-B\) for some nonnegative B and s with \(s>\rho (B).\)
Lemma 1
[2] For a Z-matrix A the following are equivalent:
-
(i)
A is a nonsingular M-matrix.
-
(ii)
\(A^{-1}\) is nonnegative.
-
(iii)
\(Av>0~(\ge 0)\) for some vector \(v>0~(\ge 0).\)
-
(iv)
All eigenvalues of A have positive real parts.
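The equivalences of Lemma 1 can be checked numerically on a small example; the particular matrix B and scalar s below are illustrative choices satisfying Definition 4:

```python
import numpy as np

# A Z-matrix (non-positive off-diagonal entries) that is a nonsingular
# M-matrix: A = s I - B with B >= 0 entrywise and s > rho(B) = sqrt(2).
B = np.array([[0.0, 1.0], [2.0, 0.0]])
s = 3.0
A = s * np.eye(2) - B

# (ii): the inverse is entrywise nonnegative.
print(np.all(np.linalg.inv(A) >= 0))          # True
# (iv): every eigenvalue of A has positive real part.
print(np.all(np.linalg.eigvals(A).real > 0))  # True
```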
Lemma 2
[17] For any symmetric matrix X it holds that
$$\begin{aligned} \mathrm {trace}\left[ \frac{1}{2}\left( Y+Y^{T}\right) ^{T}X\right] =\mathrm {trace}(Y^{T}X), \end{aligned}$$
(3)
where Y is an arbitrary \(n\times n\) real matrix.
Lemma 3
[18] Let \(A,B\in \mathbb {C}^{n\times n},\) then \(\left\| e^{A}-e^{B}\right\| \le \Vert A-B\Vert e^{\mathrm {max}(\Vert A\Vert ,~\Vert B\Vert )}.\)
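Lemma 3 holds for any submultiplicative norm, in particular the Frobenius norm; a numerical sanity check on random (illustrative) matrices:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# ||e^A - e^B||_F <= ||A - B||_F * exp(max(||A||_F, ||B||_F))  (Lemma 3)
lhs = np.linalg.norm(expm(A) - expm(B), "fro")
rhs = np.linalg.norm(A - B, "fro") * np.exp(
    max(np.linalg.norm(A, "fro"), np.linalg.norm(B, "fro"))
)
print(lhs <= rhs)  # True
```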
Newton’s method for Eq. (1)
In this subsection, we derive Newton’s method for Eq. (1). Let us define the map
$$\begin{aligned} F(X)=X-A^{*}e^{X}A-I=0. \end{aligned}$$
(4)
Before applying Newton’s method, we need to evaluate the Fr\(\acute{\text {e}}\)chet derivative of F(X). From (2) and (4), we have
$$\begin{aligned} \begin{aligned} F(X+Z)&=X+Z-\left[ A^{*}\left( e^{X+Z}-e^{X}\right) A+A^{*}e^{X}A\right] -I\\&=X+ A^{*}e^{X}A -I+ Z-\left[ A^{*}\left( e^{X+Z}-e^{X}\right) A\right] \\&=F(X)+ Z-A^{*}e^{X/2}Ze^{X/2}A+O(\Vert Z\Vert ^{2}). \end{aligned} \end{aligned}$$
(5)
We see that the Fr\(\acute{\text {e}}\)chet derivative is a linear operator, \(F_{X}^{'}(Z):\mathbb {C}^{n\times n}\rightarrow \mathbb {C}^{n\times n},\) defined by
$$\begin{aligned} F_{X}^{'}(Z)=Z-A^{*}e^{X/2}Ze^{X/2}A. \end{aligned}$$
(6)
Applying the vec operator in (6) we have
$$\begin{aligned} \mathrm {vec}(F_{X}^{'}(Z))=\mathcal {D}_{X}\mathrm {vec}(Z), \end{aligned}$$
(7)
where \(\mathcal {D}_{X}=I_{n^{2}}-\left( e^{X/2}A\right) ^{T}\otimes \left( A^{*}e^{X/2}\right)\) is the Kronecker Fr\(\acute{\text {e}}\)chet derivative of F(X).
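The Kronecker form (7) can be verified directly by applying \(F_{X}^{'}\) to a test direction; this sketch assumes real A (so \(A^{*}=A^{T}\)) and an arbitrary symmetric X:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
n = 3
A = 0.2 * rng.standard_normal((n, n))
X = rng.standard_normal((n, n)); X = (X + X.T) / 2  # symmetric test point
Z = rng.standard_normal((n, n))

vec = lambda M: M.reshape(-1, order="F")
Eh = expm(X / 2)

# F'_X(Z) = Z - A^* e^{X/2} Z e^{X/2} A, applied directly ...
FpZ = Z - A.T @ Eh @ Z @ Eh @ A
# ... and through the Kronecker form D_X vec(Z) of (7).
D_X = np.eye(n * n) - np.kron((Eh @ A).T, A.T @ Eh)
print(np.allclose(vec(FpZ), D_X @ vec(Z)))  # True
```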
Lemma 4
Suppose that \(0\le \left( e^{X/2}A\right) ^{T}\otimes \left( A^{*}e^{X/2}\right) <I_{n^{2}}.\) Then,
$$\begin{aligned} I_{n^{2}}-\left( e^{X/2}A\right) ^{T}\otimes \left( A^{*}e^{X/2}\right) \quad \text {is a nonsingular} ~ M\text {-matrix.} \end{aligned}$$
Proof
The proof is straightforward from Definitions 3, 4 and Lemma 1; thus it is omitted here. \(\square\)
Since \(I_{n^{2}}-\left( e^{X/2}A\right) ^{T}\otimes \left( A^{*}e^{X/2}\right)\) is invertible under the assumptions of Lemma 4, the Newton step Z is computed from the linear equation
$$\begin{aligned} Z-A^{*}e^{X/2}Ze^{X/2}A=-F(X) \end{aligned}$$
(8)
and the solution of (1) is approximated by Newton’s iteration
$$\begin{aligned} X_{i+1}=X_{i}-\left[ F'_{X_{i}}\right] ^{-1}F(X_{i})\qquad \text {for all}\quad i=0,1,2\cdots . \end{aligned}$$
(9)
This analysis leads to Algorithm 1.
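Algorithm 1 itself is stated as a float; a minimal dense-matrix sketch of the iteration (7)–(9), using the approximate Fr\(\acute{\text {e}}\)chet derivative (2) and assuming real A (the test matrix, initial guess \(X_{0}=I\), and tolerances are illustrative), might look as follows:

```python
import numpy as np
from scipy.linalg import expm

def newton_solve(A, tol=1e-12, maxit=50):
    """Sketch of Newton's iteration (9) for X - A^T e^X A = I.

    Each step solves the linear system (8) via the Kronecker form (7)."""
    n = A.shape[0]
    vec = lambda M: M.reshape(-1, order="F")
    X = np.eye(n)  # symmetric initial guess
    for _ in range(maxit):
        F = X - A.T @ expm(X) @ A - np.eye(n)
        if np.linalg.norm(F, "fro") < tol:
            break
        Eh = expm(X / 2)
        D = np.eye(n * n) - np.kron((Eh @ A).T, A.T @ Eh)  # D_X of (7)
        Z = np.linalg.solve(D, vec(-F)).reshape(n, n, order="F")
        X = X + Z
    return X

A = np.array([[0.2, 0.05], [0.05, 0.1]])  # small norm: D_X nonsingular
X = newton_solve(A)
print(np.linalg.norm(X - A.T @ expm(X) @ A - np.eye(2)))  # ~0
```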
Remark 1
Newton’s method for (1) is not applicable if the Kronecker Fr\(\acute{\text {e}}\)chet derivative \(F'_{X}\) in step 3 of Algorithm 1 is singular. Also, Algorithm 1 does not ensure the existence of a symmetric solution. Moreover, when the size of the coefficient matrix A in Eq. (1) is large, Algorithm 1 consumes considerable computer time and memory. To overcome these complications and drawbacks, we extend the idea of the conjugate gradient method to Algorithm 2, which works even if the Kronecker Fr\(\acute{\text {e}}\)chet derivative \(F'_{X}\) is singular and ensures the existence of a symmetric solution of (1).
Consider the linear algebraic system
$$\begin{aligned} { Ax=b,} \end{aligned}$$
(10)
where A is a real square matrix, b is a real vector and x is an unknown vector. For solving system (10), we have the following conjugate gradient method.
Conjugate gradient algorithm [27]
-
(i)
Choose an initial guess \(x_0\in \mathbb {R}^{n}\) and set \(r_0=b-Ax_0,\alpha _0=\Vert r_0\Vert ^2, d_0=r_0\);
-
(ii)
for \(i=0, 1, \cdots\) until convergence do:
-
(iii)
\(s_i=Ad_i\);
-
(iv)
\(t_i=\alpha _i/(d_i^T s_i);~ x_{i+1}=x_i +t_id_i;~ r_{i+1}=r_i-t_is_i;~\beta _{i+1}=\Vert r_{i+1}\Vert ^2/\Vert r_i\Vert ^2; ~d_{i+1}=r_{i+1}+\beta _{i+1}d_i\);
-
(v)
end for.
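Steps (i)–(v) above translate directly into code; for a symmetric positive definite A this iteration converges in at most n steps in exact arithmetic. A minimal sketch (the test system is illustrative):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, maxit=200):
    """Classical CG for symmetric positive definite A, following steps (i)-(v)."""
    x = x0.copy()
    r = b - A @ x                 # r_0
    alpha = r @ r                 # alpha_0 = ||r_0||^2
    d = r.copy()                  # d_0
    for _ in range(maxit):
        if np.sqrt(alpha) < tol:
            break
        s = A @ d                 # s_i = A d_i
        t = alpha / (d @ s)       # t_i
        x = x + t * d
        r = r - t * s
        beta = (r @ r) / alpha    # beta_{i+1} = ||r_{i+1}||^2 / ||r_i||^2
        alpha = r @ r
        d = r + beta * d
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(np.allclose(A @ x, b))  # True
```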
In general, the conjugate gradient method is not directly applicable to the system \(Bx=c\) when the matrix B is non-square. This motivates us to explore new iterative methods of conjugate gradient type, which can be represented as
$$\begin{aligned} x_{i+1}=x_i+t_id_i, \end{aligned}$$
(11)
where the parameter \(t_i\) and the vector \(d_i\) are to be determined. It is clear that (11) cannot be implemented directly to compute Newton’s step Z in its present form. Thus, the conjugate gradient method is refined and extended to solve for the symmetric Newton step Z. The details of the algorithm are presented as follows.
Remark 2
In Algorithm 2, the sequences of matrices \(\left\{ \mathcal {Q}_{k}\right\}\) and \(\left\{ Z_{pk}\right\}\) are symmetric for all \(k=0,1,\cdots .\)
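Algorithm 2 is displayed as a float; its update rules can, however, be read off from the proofs of Lemmas 5 and 6. The Python sketch below reconstructs the method under stated assumptions (real symmetric data, initial guess \(Z_{p0}=0\)); the names `R`, `M`, `Q` mirror \(R_{k},\mathcal {M}_{k},\mathcal {Q}_{k}\), and the scalars follow \(\alpha _{k}=\Vert R_{k}\Vert ^{2}/\Vert \mathcal {Q}_{k}\Vert ^{2}\) and \(\beta _{k}=\Vert R_{k+1}\Vert ^{2}/\Vert R_{k}\Vert ^{2}\) as used in those proofs. Consult the actual statement of Algorithm 2 for the authoritative version.

```python
import numpy as np
from scipy.linalg import expm

def cg_newton_step(A, X, tol=1e-12, maxit=500):
    """Sketch of the matrix CG (Algorithm 2) for the Newton step (8):
    find symmetric Z with Z - A^* e^{X/2} Z e^{X/2} A = -F(X)."""
    n = A.shape[0]
    Eh = expm(X / 2)
    L = lambda Z: Z - A.conj().T @ Eh @ Z @ Eh @ A           # operator in (8)
    Lt = lambda R: R - (A.conj().T @ Eh).T @ R @ (Eh @ A).T  # its transpose
    F = X - A.conj().T @ expm(X) @ A - np.eye(n)
    Z = np.zeros((n, n))                                     # symmetric Z_{p0}
    R = -F - L(Z)                                            # residual R_0
    M = Lt(R)
    Q = (M + M.T) / 2                                        # symmetrized direction
    for _ in range(maxit):
        nr2 = np.linalg.norm(R, "fro") ** 2
        if nr2 < tol ** 2:
            break
        alpha = nr2 / np.linalg.norm(Q, "fro") ** 2          # alpha_k
        Z = Z + alpha * Q
        R = R - alpha * L(Q)
        beta = np.linalg.norm(R, "fro") ** 2 / nr2           # beta_k
        M = Lt(R)
        Q = (M + M.T) / 2 + beta * Q
    return Z

A = np.array([[0.2, 0.05], [0.05, 0.1]])
X = np.eye(2)
Z = cg_newton_step(A, X)
Eh = expm(X / 2)
F = X - A.T @ expm(X) @ A - np.eye(2)
print(np.linalg.norm(Z - A.T @ Eh @ Z @ Eh @ A + F))  # ~0
```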
We have the following results from Algorithm 2.
Lemma 5
Let \(Z_{p}\) be a symmetric solution of the pth Newton’s iteration (8), and let the sequences \(\left\{ \mathcal {M}_{k}\right\} ,\) \(\left\{ R_{k}\right\} ,\) \(\left\{ Z_{pk}\right\}\) be generated by Algorithm 2. Then,
$$\begin{aligned} \mathrm {trace}\left[ \mathcal {M}_{k}^{T}\left( Z_{p}-Z_{pk}\right) \right] =\left\| R_{k}\right\| ^{2}, \quad \text {for all}\quad k=0,1,\cdots . \end{aligned}$$
Proof
From Algorithm 2, we have
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ \mathcal {M}_{k}^{T}\left( Z_{p}-Z_{pk}\right) \right]&=\mathrm {trace}\left\{ \left[ R_{k}-\left( A^{*}e^{X_{p}/2}\right) ^{T}R_{k}\left( e^{X_{p}/2}A\right) ^{T}\right] ^{T} \left( Z_{p}-Z_{pk}\right) \right\} \\&=\mathrm {trace}\left\{ R_{k}^{T}\left[ Z_{p}-Z_{pk}-\left( A^{*}e^{X_{p}/2}\right) \left( Z_{p}-Z_{pk}\right) \left( e^{X_{p}/2}A\right) \right] \right\} \\&=\mathrm {trace}\left\{ R_{k}^{T}\left[ -F(X)-\left[ Z_{pk}-\left( A^{*}e^{X_{p}/2}\right) Z_{pk}\left( e^{X_{p}/2}A\right) \right] \right] \right\} \\&=\mathrm {trace}\left\{ R_{k}^{T}R_{k}\right\} =\left\| R_{k}\right\| ^{2}. \end{aligned} \end{aligned}$$
(12)
Hence the proof is completed. \(\square\)
Lemma 6
Suppose that \(Z_{p}\) is a symmetric solution of the pth Newton’s iteration (8) and the sequences \(R_{k},~\mathcal {Q}_{k}\) are generated by Algorithm 2. Then, it holds that \(\mathrm {trace}\left[ \mathcal {Q}_{k}^{T}\left( Z_{p}-Z_{pk}\right) \right] =\left\| R_{k}\right\| ^{2}\) for all \(k=0,1,\cdots ;\) and \(\mathrm {trace}(R_{k}^{T}R_{j}) =0,~ \mathrm {trace}(\mathcal {Q}_{k}^{T}\mathcal {Q}_{j}) =0\) for \(k>j=0,1,\cdots ,l,\quad l\ge 1.\)
Proof
We prove via mathematical induction. For \(k=0,\) it follows from Algorithm 2, Lemma 2 and Lemma 5 that
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ \mathcal {Q}_{0}^{T}\left( Z_{p}-Z_{p0}\right) \right]&=\mathrm {trace}\left[ \frac{1}{2}\left( \mathcal {M}_{0}+\mathcal {M}_{0}^{T}\right) ^{T} \left( Z_{p}-Z_{p0}\right) \right] \\&=\mathrm {trace}\left[ \mathcal {M}_{0}^{T} \left( Z_{p}-Z_{p0}\right) \right] \\&=\left\| R_{0}\right\| ^{2}.\\ \end{aligned} \end{aligned}$$
(13)
Now assume that \(\mathrm {trace}\left[ \mathcal {Q}_{k}^{T}\left( Z_{p}-Z_{pk}\right) \right] =\left\| R_{k}\right\| ^{2}\) holds for \(k=h\in \mathbb {N};\) we show that it also holds for \(k=h+1.\) From Algorithm 2, Lemma 2 and Lemma 5, we have
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ \mathcal {Q}_{h+1}^{T}\left( Z_{p}-Z_{ph+1}\right) \right]&=\mathrm {trace}\left\{ \left[ \frac{1}{2}\left( \mathcal {M}_{h+1}+\mathcal {M}_{h+1}^{T}\right) ^{T} +\beta _{h}\mathcal {Q}_{h} \right] ^{T}\left( Z_{p}-Z_{ph+1}\right) \right\} \\&=\mathrm {trace}\left[ \mathcal {M}_{h+1}^{T}\left( Z_{p}-Z_{ph+1}\right) \right] +\beta _{h}\mathrm {trace}\left[ \mathcal {Q}_{h}^{T}\left( Z_{p}-Z_{ph+1}\right) \right] \\&=\left\| R_{h+1}\right\| ^{2}+\beta _{h}\mathrm {trace}\left[ \mathcal {Q}_{h}^{T}\left( Z_{p}-Z_{ph}-\alpha _{h}\mathcal {Q}_{h}\right) \right] \\&=\left\| R_{h+1}\right\| ^{2}+\beta _{h}\mathrm {trace}\left[ \mathcal {Q}_{h}^{T}\left( Z_{p}-Z_{ph}\right) \right] -\beta _{h}\alpha _{h}\left\| \mathcal {Q}_{h}\right\| ^{2}\\&=\left\| R_{h+1}\right\| ^{2}+\beta _{h}\left\| R_{h}\right\| ^{2}-\beta _{h}\left\| R_{h}\right\| ^{2}\\&=\left\| R_{h+1}\right\| ^{2}+\left\| R_{h+1}\right\| ^{2}-\left\| R_{h+1}\right\| ^{2}=\left\| R_{h+1}\right\| ^{2}.\\ \end{aligned} \end{aligned}$$
(14)
As required, the first claim of the lemma is proved.
Similarly, we prove that \(\mathrm {trace}(R_{k}^{T}R_{j}) =0~~ \text {and}~~ \mathrm {trace}(\mathcal {Q}_{k}^{T}\mathcal {Q}_{j}) =0,~~ \text {for} ~~ k>j=0,1,\cdots ,l,\quad l\ge 1\) via mathematical induction.
Step 1: For \(l=1,\) it follows that
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ R_{1}^{T}R_{0} \right]&=\mathrm {trace}\left\{ \left[ -F(X_{p})-\left[ Z_{p1}-A^{*}e^{X_{p}/2}Z_{p1}e^{X_{p}/2}A\right] \right] ^{T}R_{0}\right\} \\&=\mathrm {trace}\left\{ \left[ -F(X_{p})-\left[ Z_{p0}-A^{*}e^{X_{p}/2}Z_{p0}e^{X_{p}/2}A\right. \right. \right. \\&\quad + \left. \left. \left. \alpha _{0}(\mathcal {Q}_{0}-A^{*}e^{X_{p}/2}\mathcal {Q}_{0}e^{X_{p}/2}A)\right] \right] ^{T}R_{0}\right\} \\&=\mathrm {trace}\left\{ \left[ R_{0}- \alpha _{0}\left( \mathcal {Q}_{0}-A^{*}e^{X_{p}/2}\mathcal {Q}_{0}e^{X_{p}/2}A\right) \right] ^{T}R_{0}\right\} \\&=\left\| R_{0}\right\| ^{2}-\mathrm {trace}\left\{ \alpha _{0}\left( \mathcal {Q}_{0}^{T}\left[ R_{0}-\left( A^{*}e^{X_{p}/2}\right) ^{T}R_{0}\left( e^{X_{p}/2}A\right) ^{T}\right] \right) \right\} \\&=\left\| R_{0}\right\| ^{2}-\alpha _{0}\mathrm {trace}\left[ \mathcal {Q}_{0}^{T}\mathcal {M}_{0}\right] \\&=\left\| R_{0}\right\| ^{2}-\alpha _{0}\mathrm {trace}\left[ \mathcal {Q}_{0}^{T}\frac{1}{2} \left( \mathcal {M}_{0}+\mathcal {M}_{0}^{T}\right) \right] \\&=\left\| R_{0}\right\| ^{2}-\alpha _{0}\mathrm {trace}\left[ \mathcal {Q}_{0}^{T}\mathcal {Q}_{0}\right] =0,\\ \end{aligned} \end{aligned}$$
(15)
and
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ \mathcal {Q}_{1}^{T}\mathcal {Q}_{0} \right]&=\mathrm {trace}\left[ \left[ \frac{1}{2}\left( \mathcal {M}_{1}+\mathcal {M}_{1}^{T}\right) +\beta _{0}\mathcal {Q}_{0}\right] ^{T} \mathcal {Q}_{0}\right] \\&=\mathrm {trace}\left( \mathcal {M}_{1}^{T}\mathcal {Q}_{0}\right) +\beta _{0}\mathrm {trace}\left( \mathcal {Q}_{0}^{T}\mathcal {Q}_{0}\right) \\&=\mathrm {trace}\left[ \left[ R_{1}-\left( A^{*}e^{X_{p}/2}\right) ^{T}R_{1}\left( e^{X_{p}/2}A\right) ^{T}\right] ^{T}\mathcal {Q}_{0}\right] +\beta _{0}\left\| \mathcal {Q}_{0}\right\| ^{2}\\&=\mathrm {trace}\left[ R_{1}^{T}\left[ \mathcal {Q}_{0}-\left( A^{*}e^{X_{p}/2}\right) \mathcal {Q}_{0}\left( e^{X_{p}/2}A\right) \right] \right] +\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}\\&=\mathrm {trace}\left[ R_{1}^{T}\left[ \frac{1}{\alpha _{0}}(Z_{p1}-Z_{p0})-\frac{1}{\alpha _{0}}\left( A^{*}e^{X_{p}/2}\right) (Z_{p1}-Z_{p0})\left( e^{X_{p}/2}A\right) \right] \right] \\&\quad +\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}\\&=\frac{1}{\alpha _{0}}\mathrm {trace}\left[ R_{1}^{T}\left[ (Z_{p1}-Z_{p0})-\left( A^{*}e^{X_{p}/2}\right) (Z_{p1}-Z_{p0})\left( e^{X_{p}/2}A\right) \right] \right] \\&\quad +\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}\\&=\frac{1}{\alpha _{0}}\mathrm {trace}\left[ R_{1}^{T}(R_{0}-R_{1})\right] +\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}\\&=\frac{1}{\alpha _{0}}\left( \mathrm {trace}\left[ R_{1}^{T}R_{0} \right] -\mathrm {trace}\left[ R_{1}^{T}R_{1}\right] \right) +\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}\\&=-\frac{1}{\alpha _{0}}\mathrm {trace}\left[ R_{1}^{T}R_{1} \right] +\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}\\&=-\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}+\frac{\Vert R_{1}\Vert ^{2}}{\Vert R_{0}\Vert ^{2}}\left\| \mathcal {Q}_{0}\right\| ^{2}=0. \end{aligned} \end{aligned}$$
(16)
Now, assume that \(\mathrm {trace}(R_{k}^{T}R_{j}) =0~ \text {and}~~ \mathrm {trace}(\mathcal {Q}_{k}^{T}\mathcal {Q}_{j}) =0,~~ \text {for} ~~ k>j=0,1,\cdots ,l,\quad l\ge 1\) holds for \(l=s\in \mathbb {N}.\) We show that it holds for \(l=s+1\in \mathbb {N}.\) From Algorithm 2, we have
$$\begin{aligned}&\mathrm {trace}\left[ R_{s+1}^{T}R_{s} \right] \\&=\mathrm {trace}\left[ \left[ R_{s}-\alpha _{s}\left( \mathcal {Q}_{s}-A^{*}e^{X_{p}/2}\mathcal {Q}_{s}e^{X_{p}/2}A\right) \right] ^{T}R_{s} \right] \\&=\mathrm {trace}\left[ R_{s}^{T}R_{s} \right] -\alpha _{s}\mathrm {trace}\left[ \left( \mathcal {Q}_{s}-A^{*}e^{X_{p}/2}\mathcal {Q}_{s}e^{X_{p}/2}A\right) ^{T}R_{s} \right] \\&=\Vert R_{s}\Vert ^{2}-\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T} \left( R_{s}-(A^{*}e^{X_{p}/2})^{T}R_{s}(e^{X_{p}/2}A)^{T}\right) \right] \\&=\Vert R_{s}\Vert ^{2}-\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T} \mathcal {M}_{s} \right] \\&=\Vert R_{s}\Vert ^{2}-\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T}\frac{1}{2}( \mathcal {M}_{s}+ \mathcal {M}_{s}^{T})\right] \\&=\Vert R_{s}\Vert ^{2}-\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T}( \mathcal {Q}_{s}- \beta _{s-1}\mathcal {Q}_{s-1})\right] \\&=\Vert R_{s}\Vert ^{2}-\alpha _{s}\Vert \mathcal {Q}_{s}\Vert ^{2}+\alpha _{s}\beta _{s-1}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T}\mathcal {Q}_{s-1}\right] \\&=\Vert R_{s}\Vert ^{2}-\Vert R_{s}\Vert ^{2}+0=0.\\ \end{aligned}$$
(17)
Similarly, we have
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ \mathcal {Q}_{s+1}^{T}\mathcal {Q}_{s} \right]&=\mathrm {trace}\left[ \left[ \frac{1}{2}\left( \mathcal {M}_{s+1}+\mathcal {M}_{s+1}^{T}\right) +\beta _{s}\mathcal {Q}_{s}\right] ^{T}\mathcal {Q}_{s} \right] \\&=\mathrm {trace}\left[ \mathcal {M}_{s+1}^{T}\mathcal {Q}_{s} \right] +\beta _{s}\Vert \mathcal {Q}_{s}\Vert ^{2}\\&=\mathrm {trace}\left[ \left[ R_{s+1}-\left( A^{*}e^{X_{p}/2}\right) ^{T}R_{s+1}\left( e^{X_{p}/2}A\right) ^{T}\right] ^{T}\mathcal {Q}_{s} \right] +\beta _{s}\Vert \mathcal {Q}_{s}\Vert ^{2}\\&=\mathrm {trace}\left[ R_{s+1}^{T} \left[ \mathcal {Q}_{s}-\left( A^{*}e^{X_{p}/2}\right) \mathcal {Q}_{s}\left( e^{X_{p}/2}A\right) \right] \right] +\beta _{s}\Vert \mathcal {Q}_{s}\Vert ^{2}\\&=\mathrm {trace}\left[ R_{s+1}^{T} \frac{1}{\alpha _{s}} (R_{s}-R_{s+1})\right] +\beta _{s}\Vert \mathcal {Q}_{s}\Vert ^{2}\\&= -\frac{1}{\alpha _{s}} \Vert R_{s+1}\Vert ^{2}+\beta _{s}\Vert \mathcal {Q}_{s}\Vert ^{2}\\&= -\frac{\Vert \mathcal {Q}_{s}\Vert ^{2}}{\Vert R_{s}\Vert ^{2}} \Vert R_{s+1}\Vert ^{2}+\frac{\Vert R_{s+1}\Vert ^{2}}{\Vert R_{s}\Vert ^{2}}\Vert \mathcal {Q}_{s}\Vert ^{2}=0.\\ \end{aligned} \end{aligned}$$
(18)
Thus, we have shown that \(\mathrm {trace}\left[ R_{k}^{T}R_{k-1} \right] =0\) and \(\mathrm {trace}\left[ \mathcal {Q}_{k}^{T}\mathcal {Q}_{k-1} \right] =0\) for all \(k=1,\cdots , l.\)
Step 2: We assume that \(\mathrm {trace}\left[ R_{s}^{T}R_{j} \right] =0\) and \(\mathrm {trace}\left[ \mathcal {Q}_{s}^{T}\mathcal {Q}_{j} \right] =0\) for all \(j=0,1,\cdots , s-1.\) By Algorithm 2 and Lemma 2, together with the assumptions made, it follows that
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ R_{s+1}^{T}R_{j} \right]&=\mathrm {trace}\left[ \left[ R_{s}-\alpha _{s}\left( \mathcal {Q}_{s}-A^{*}e^{X_{p}/2}\mathcal {Q}_{s}e^{X_{p}/2}A\right) \right] ^{T}R_{j} \right] \\&=\mathrm {trace}\left[ R_{s}^{T}R_{j} \right] -\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T} \left( R_{j}-(A^{*}e^{X_{p}/2})^{T}R_{j}(e^{X_{p}/2}A)^{T}\right) \right] \\&=\mathrm {trace}\left[ R_{s}^{T}R_{j} \right] -\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T} \mathcal {M}_{j} \right] \\&=0-\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T} \frac{1}{2}(\mathcal {M}_{j}+\mathcal {M}_{j}^{T}) \right] \\&=-\alpha _{s}\mathrm {trace}\left[ \mathcal {Q}_{s}^{T} (\mathcal {Q}_{j}-\beta _{j-1}\mathcal {Q}_{j-1}) \right] =0.\\ \end{aligned} \end{aligned}$$
(19)
Finally, we prove that \(\mathrm {trace}\left[ \mathcal {Q}_{s+1}^{T}\mathcal {Q}_{j} \right] =0.\)
$$\begin{aligned} \begin{aligned} \mathrm {trace}\left[ \mathcal {Q}_{s+1}^{T}\mathcal {Q}_{j} \right]&=\mathrm {trace}\left[ \left[ \frac{1}{2}\left( \mathcal {M}_{s+1}+\mathcal {M}_{s+1}^{T}\right) +\beta _{s}\mathcal {Q}_{s}\right] ^{T}\mathcal {Q}_{j} \right] \\&=\mathrm {trace}\left[ \mathcal {M}_{s+1}^{T}\mathcal {Q}_{j} \right] \\&=\mathrm {trace}\left[ \left[ R_{s+1}-\left( A^{*}e^{X_{p}/2}\right) ^{T}R_{s+1}\left( e^{X_{p}/2}A\right) ^{T}\right] ^{T}\mathcal {Q}_{j} \right] \\&=\mathrm {trace}\left[ R_{s+1}^{T} \left[ \mathcal {Q}_{j}-\left( A^{*}e^{X_{p}/2}\right) \mathcal {Q}_{j}\left( e^{X_{p}/2}A\right) \right] \right] \\&=\mathrm {trace}\left[ R_{s+1}^{T} \frac{1}{\alpha _{j}} (R_{j}-R_{j+1})\right] \\&= \frac{1}{\alpha _{j}}\mathrm {trace}\left[ R_{s+1}^{T} R_{j}\right] - \frac{1}{\alpha _{j}}\mathrm {trace}\left[ R_{s+1}^{T} R_{j+1}\right] =0,\\ \end{aligned} \end{aligned}$$
(20)
for all \(j=0,1,\cdots ,s-1.\) The proof is completed. \(\square\)
From Lemma 6, we see that if \(k>0\) and \(R_{i}\ne 0\) for all \(i=0,1,\cdots ,k,\) then the matrices \(R_{i}\) and \(R_{j}\) generated by Algorithm 2 are orthogonal for all \(j\ne i.\) We give the following remark for later use.
Remark 3
From Lemma 6, for the Newton’s iteration (8) to have a symmetric solution, then the sequences \(\left\{ R_{k}\right\}\) and \(\left\{ \mathcal {Q}_{k}\right\}\) generated by Algorithm 2 should be nonzero.
If there exists a positive integer k such that \(R_{i} \ne 0\) for all \(i=0,1,\cdots ,k\) in Algorithm 2, then the matrices \(R_{i}\) and \(R_{j}\) are orthogonal for all \(i\ne j.\)
Theorem 4
Assume that the pth Newton’s iteration (8) has a symmetric solution. Then, for any symmetric initial guess \(Z_{p0},\) a symmetric solution can be obtained in finitely many iterative steps.
Proof
From Lemma 6, suppose that \(R_{k}\ne 0\) for \(k=0,1,\cdots ,n^{2}-1.\) Since the pth Newton’s iteration (8) has a symmetric solution, it follows from Remark 3 that there exists a positive integer k such that \(\mathcal {Q}_{k}\ne 0.\) Thus, we can compute \(Z_{pn^{2}}\) and \(R_{n^{2}}\) by Algorithm 2. Also, from Lemma 6, we know that \(\mathrm {trace}(R_{n^{2}}^{T}R_{k})=0\) for all \(k=0,1,\cdots ,n^{2}-1\) and \(\mathrm {trace}(R_{i}^{T}R_{j})=0\) for all \(i,j=0,1,\cdots ,n^{2}-1\) with \(i\ne j.\) Hence the matrices \(R_{0},R_{1},\cdots ,R_{n^{2}-1}\) form an orthogonal basis of the matrix space \(\mathbb {R}^{n\times n}.\) But \(\mathrm {trace}(R_{n^{2}}^{T}R_{k})=0\) for all such k can hold only if \(R_{n^{2}}=0,\) which implies that \(Z_{pn^{2}}\) is a solution of the pth Newton’s iteration (8). \(\square\)
Now, we prove the convergence of Algorithm 1 to a symmetric solution.
Theorem 5
Assume that (1) has a symmetric solution and that each Newton’s iteration is consistent for a symmetric initial guess \(X_{0}.\) If the sequence \(\left\{ X_{k}\right\}\) generated by Algorithm 1 with \(X_{0}\) satisfies \(\lim _{k\rightarrow \infty } X_{k}=X_{*}\) and the matrix \(X_{*}\) satisfies \(F(X_{*})=0,\) then \(X_{*}\) is a symmetric solution of (1).
Proof
Since every Newton’s iteration has a symmetric solution, from Theorem 4 and Newton’s method we obtain the sequence \(\{X_{k}\},\) which consists of symmetric matrices. Furthermore, the Newton sequence converges to \(X_{*},\) which is therefore a symmetric solution of (1). \(\square\)
Perturbation and error bound estimate for the approximate symmetric positive definite solution of Eq. (1)
In this subsection, we investigate perturbation and error estimates for the approximate symmetric positive definite solution of the nonlinear matrix Eq. (1). We will use a fixed point method to find the approximate symmetric solution.
Lemma 7
Suppose A is a nonsingular matrix with \(\rho (A)\le 1/e\) and X is the symmetric positive definite solution of (1). Then, \(\Vert A\Vert ^{2}\Vert e^{X}\Vert \le 1.\)
Proof
Define the map \(G(X)=I+A^{*}e^{X}A.\) Then G(X) has a fixed point in [I, 2I] (see [16]). Thus, from the assumptions \(\rho (A)\le 1/e\) and \(X\le 2I,\) it follows that
$$\begin{aligned} I\le I+A^{*}e^{X}A\le \left( 1+\Vert A\Vert ^{2}e^{\Vert X\Vert }\right) I\le 2I. \end{aligned}$$
\(\square\)
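The fixed-point map G underlying this proof can be iterated numerically; the matrix A below is an illustrative choice with \(\rho (A)\le 1/e\), and the limit satisfies both Eq. (1) and the bound of Lemma 7:

```python
import numpy as np
from scipy.linalg import expm

# Fixed-point iteration X_{k+1} = G(X_k) = I + A^* e^{X_k} A, from X_0 = I.
A = np.array([[0.25, 0.0], [0.1, 0.2]])  # lower triangular: rho(A) = 0.25 < 1/e
X = np.eye(2)
for _ in range(100):
    X = np.eye(2) + A.T @ expm(X) @ A

# The limit solves (1); check the residual and the bound of Lemma 7.
res = np.linalg.norm(X - A.T @ expm(X) @ A - np.eye(2))
print(res)  # ~0
print(np.linalg.norm(A, 2) ** 2 * np.linalg.norm(expm(X), 2) <= 1)  # True
```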
Theorem 6
Suppose that \(X^{\mathrm {sol.}}\) is the symmetric positive definite solution of (1) such that \(\displaystyle {\Vert A\Vert ^{2}\left\| e^{\widetilde{X^{\mathrm {sol.}}}}\right\| \le 1}\) and \(\displaystyle {\frac{1}{\left\| X^{\mathrm {sol.}}\right\| }\le 1}.\) Then,
$$\begin{aligned} \frac{\left\| \triangle X^{\mathrm {sol.}}\right\| }{\left\| X^{\mathrm {sol.}}\right\| }\le \frac{1}{\theta }\left( \frac{\Vert \triangle I\Vert }{\Vert I\Vert }+\frac{2\Vert \triangle A\Vert }{\Vert A\Vert }\right) , \end{aligned}$$
(21)
where
$$\begin{aligned} \theta = 1-\Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }>0. \end{aligned}$$
Proof
Consider the equations
$$\begin{aligned} X^{\mathrm {sol.}}-A^{*} e^{X^{\mathrm {sol.}}}A=I \end{aligned}$$
(22)
and
$$\begin{aligned} \widetilde{ X^{\mathrm {sol.}}}-\widetilde{A^{*}}\widetilde{ e^{X^{\mathrm {sol.}}}}{\widetilde{A}}={\widetilde{I}}. \end{aligned}$$
(23)
Let \(\triangle A={\widetilde{A}}-A,\) \(\triangle X^{\mathrm {sol.}}=\widetilde{ X^{\mathrm {sol.}}}- X^{\mathrm {sol.}},\) and \(\triangle I={\widetilde{I}}-I.\) Then, we have
$$\begin{aligned} \begin{aligned} \triangle I&={\widetilde{I}}-I\\&=\widetilde{ X^{\mathrm {sol.}}}-\widetilde{A^{*}}\widetilde{ e^{X^{\mathrm {sol.}}}}{\widetilde{A}}-\left( X^{\mathrm {sol.}}-A^{*} e^{X^{\mathrm {sol.}}}A\right) \\&=\triangle X^{\mathrm {sol.}}-\widetilde{A^{*}}\widetilde{ e^{X^{\mathrm {sol.}}}}{\widetilde{A}}+A^{*} e^{X^{\mathrm {sol.}}}A\\&=\triangle X^{\mathrm {sol.}}-(A+\triangle A)^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}(A+\triangle A)+A^{*} e^{X^{\mathrm {sol.}}}A\\&=\triangle X^{\mathrm {sol.}}-A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}A-A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}\triangle A-\triangle A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}} A\\&\quad -\triangle A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}\triangle A+A^{*} e^{X^{\mathrm {sol.}}}A\\&=\triangle X^{\mathrm {sol.}}-A^{*}\left( \widetilde{ e^{X^{\mathrm {sol.}}}}- e^{X^{\mathrm {sol.}}}\right) A-A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}\triangle A-\triangle A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}} A.\\ \end{aligned} \end{aligned}$$
(24)
Since both \(\triangle A^{*} \rightarrow 0\) and \(\triangle A\rightarrow 0\), the second-order term \(\triangle A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}\triangle A\) in (24) is neglected.
For convenience, let \(N=A^{*}\left( \widetilde{ e^{X^{\mathrm {sol.}}}}- e^{X^{\mathrm {sol.}}}\right) A\) and \(H=A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}\triangle A+\triangle A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}} A;\) then we have
$$\begin{aligned} \Vert \triangle I\Vert \ge \Vert \triangle X^{\mathrm {sol.}}\Vert -\Vert N\Vert -\Vert H\Vert . \end{aligned}$$
(25)
It follows that
$$\begin{aligned} \begin{aligned} \Vert N\Vert&=\left\| A^{*}\left( \widetilde{ e^{X^{\mathrm {sol.}}}}- e^{X^{\mathrm {sol.}}}\right) A\right\| \\&\le \Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }\left\| \triangle X^{\mathrm {sol.}}\right\| \\ \end{aligned} \end{aligned}$$
(26)
and
$$\begin{aligned} \begin{aligned} \Vert H\Vert&\le \Vert A^{*}\Vert \left\| \widetilde{ e^{X^{\mathrm {sol.}}}}\right\| \Vert \triangle A\Vert +\Vert \triangle A^{*}\Vert \left\| \widetilde{ e^{X^{\mathrm {sol.}}}} \right\| \Vert A\Vert \\&= \Vert A\Vert \left( \left\| \widetilde{ e^{X^{\mathrm {sol.}}}}\right\| +\left\| \widetilde{ e^{X^{\mathrm {sol.}}}} \right\| \right) \Vert \triangle A\Vert \\&= 2\Vert A\Vert \Vert \triangle A\Vert \left\| \widetilde{ e^{X^{\mathrm {sol.}}}}\right\| .\\ \end{aligned} \end{aligned}$$
(27)
Now, from (25) we have,
$$\begin{aligned}&\Vert \triangle I\Vert \ge \left\| \triangle X^{\mathrm {sol.}}\right\| -\Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }\Vert \triangle X^{\mathrm {sol.}}\Vert -2\Vert A\Vert \Vert \triangle A\Vert \left\| \widetilde{ e^{X^{\mathrm {sol.}}}}\right\| \end{aligned}$$
(28)
$$\begin{aligned}&=\Vert \triangle X^{\mathrm {sol.}}\Vert \left( 1-\Vert A\Vert ^{2}e^{\mathrm {max}\left( \Vert X^{\mathrm {sol.}}\Vert ,~\Vert \widetilde{X^{\mathrm {sol.}}}\Vert \right) }\right) -2\Vert A\Vert \Vert \triangle A\Vert \Vert \widetilde{ e^{X^{\mathrm {sol.}}}}\Vert \end{aligned}$$
(29)
$$\begin{aligned}&\Vert \triangle X^{\mathrm {sol.}}\Vert \le \frac{1}{1-\Vert A\Vert ^{2}e^{\mathrm {max}(\Vert X^{\mathrm {sol.}}\Vert ,~\Vert \widetilde{X^{\mathrm {sol.}}}\Vert )}}(\Vert \triangle I\Vert +2\Vert A\Vert \Vert \triangle A\Vert \Vert \widetilde{ e^{X^{\mathrm {sol.}}}}\Vert ) \end{aligned}$$
(30)
$$\begin{aligned}&\frac{\left\| \triangle X^{\mathrm {sol.}}\right\| }{\left\| X^{\mathrm {sol.}}\right\| }\nonumber \\&\quad \le \frac{1}{1-\Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }}\left( \frac{\Vert \triangle I\Vert }{\Vert I\Vert }\frac{\Vert I\Vert }{\Vert X^{\mathrm {sol.}}\Vert }+\frac{2\Vert \triangle A\Vert \left\| \widetilde{ e^{X^{\mathrm {sol.}}}}\right\| }{\Vert A\Vert }\frac{\Vert A\Vert ^{2}}{\left\| X^{\mathrm {sol.}}\right\| }\right) \end{aligned}$$
(31)
It follows from \(\Vert A\Vert ^{2}\le \frac{\Vert I\Vert }{\left\| \widetilde{e^{X^{\mathrm {sol.}}}}\right\| }\) and \(\frac{1}{\left\| X^{\mathrm {sol.}}\right\| }\le 1\) that
$$\begin{aligned} \frac{\Vert \triangle X^{\mathrm {sol.}}\Vert }{\Vert X^{\mathrm {sol.}}\Vert }\le \frac{1}{\theta }\left( \frac{\Vert \triangle I\Vert }{\Vert I\Vert }+\frac{2\Vert \triangle A\Vert }{\Vert A\Vert }\right) , \end{aligned}$$
(32)
where
$$\begin{aligned} \theta = 1-\Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }>0. \end{aligned}$$
This completes the proof. \(\square\)
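The bound (21) can be checked numerically. The sketch below is illustrative: the exact and perturbed equations are solved by the fixed-point iteration of Lemma 7, the identity matrix is left unperturbed (so \(\triangle I=0\)), and all norms are Frobenius norms:

```python
import numpy as np
from scipy.linalg import expm

def solve_fp(A, iters=200):
    """Fixed-point solution of X - A^* e^X A = I (assumes rho(A) <= 1/e)."""
    X = np.eye(A.shape[0])
    for _ in range(iters):
        X = np.eye(A.shape[0]) + A.T @ expm(X) @ A
    return X

nrm = lambda M: np.linalg.norm(M, "fro")

A = np.array([[0.25, 0.0], [0.1, 0.2]])
dA = 1e-6 * np.array([[1.0, -2.0], [0.5, 1.0]])  # small perturbation of A
X, Xt = solve_fp(A), solve_fp(A + dA)

theta = 1 - nrm(A) ** 2 * np.exp(max(nrm(X), nrm(Xt)))
lhs = nrm(Xt - X) / nrm(X)
rhs = (1 / theta) * (0.0 + 2 * nrm(dA) / nrm(A))  # Delta I = 0 here
print(theta > 0 and lhs <= rhs)  # True
```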
In Theorem 7, we derive the error estimate for \(\widetilde{X^{\mathrm {sol.}}}.\)
Theorem 7
Let \(\widetilde{X^{\mathrm {sol.}}}\) approximate the symmetric positive definite solution of (1), with residual \(R\left( \widetilde{X^{\mathrm {sol.}}}\right) =\widetilde{ X^{\mathrm {sol.}}}-A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}A-I.\) Then,
$$\begin{aligned} \left\| R\left( {\widetilde{X^{\mathrm {sol.}}}}\right) \right\| \le \theta _{1}\left\| \widetilde{ X^{\mathrm {sol.}}}-X^{\mathrm {sol.}}\right\| , \quad \text { where} \quad \theta _{1}= 1+\Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }. \end{aligned}$$
Proof
Suppose that \(\widetilde{X^{\mathrm {sol.}}}\) approximates the symmetric positive definite solution of (1); it follows that
$$\begin{aligned} \begin{aligned} R\left( \widetilde{X^{\mathrm {sol.}}}\right)&=\widetilde{ X^{\mathrm {sol.}}}-A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}A-I\\&=\widetilde{ X^{\mathrm {sol.}}}-X^{\mathrm {sol.}} -A^{*}\widetilde{ e^{X^{\mathrm {sol.}}}}A+A^{*} e^{X^{\mathrm {sol.}}}A\\&=\left( \widetilde{ X^{\mathrm {sol.}}}-X^{\mathrm {sol.}}\right) -A^{*}\left( \widetilde{ e^{X^{\mathrm {sol.}}}}-e^{X^{\mathrm {sol.}}}\right) A\\&=\left( \widetilde{ X^{\mathrm {sol.}}}-X^{\mathrm {sol.}}\right) -A^{*}\left( \int _{0}^{1}e^{(1-s)X^{\mathrm {sol.}}}\left( \widetilde{ X^{\mathrm {sol.}}}-X^{\mathrm {sol.}}\right) e^{s\widetilde{X^{\mathrm {sol.}}}}ds\right) A,\\ \end{aligned} \end{aligned}$$
(33)
where the last step uses the standard integral representation of \(e^{A}-e^{B}\) (cf. Lemma 3). From (33) we see that
$$\begin{aligned} \left\| R\left( {\widetilde{X^{\mathrm {sol.}}}}\right) \right\| \le \left\| \left( \widetilde{ X^{\mathrm {sol.}}}-X^{\mathrm {sol.}}\right) \right\| \left( 1+\Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }\right) . \end{aligned}$$
Then, we have \(\left\| R\left( {\widetilde{X^{\mathrm {sol.}}}}\right) \right\| \le \theta _{1}\left\| \widetilde{ X^{\mathrm {sol.}}}-X^{\mathrm {sol.}}\right\| ,\) where
$$\begin{aligned} \theta _{1}= 1+\Vert A\Vert ^{2}e^{\mathrm {max}\left( \left\| X^{\mathrm {sol.}}\right\| ,~\left\| \widetilde{X^{\mathrm {sol.}}}\right\| \right) }. \end{aligned}$$
Hence, the proof is completed. \(\square\)