This post is adapted from Linear recurrence using Cayley-Hamilton Theorem.
Problem
Given a linear recurrence expressed as \(A_n = \sum\limits_{i=0}^{n - 1} C_i A_i\), we want to find out \(A_n\) for some large \(n\).
Naive Solution
\(\begin{bmatrix} C_{n-1} & C_{n-2} & C_{n-3} & \dots & C_{0} \\ 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}\begin{bmatrix} A_{n-1} \\ A_{n-2} \\ A_{n-3} \\ \vdots \\ A_{0} \end{bmatrix}=\begin{bmatrix} \sum\limits_{i=0}^{n-1} C_{i}A_{i}=A_{n} \\ A_{n-1} \\ A_{n-2} \\ \vdots \\ A_{1} \end{bmatrix} \\\\\text{if we let } T = \begin{bmatrix} C_{n-1} & C_{n-2} & C_{n-3} & \dots & C_{0} \\ 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}\\\\\text{we have } \begin{bmatrix} A_{m} \\ A_{m-1} \\ A_{m-2} \\ \vdots \\ A_{m-(n-1)} \end{bmatrix} = T^{m - (n-1)} \begin{bmatrix} A_{n-1} \\ A_{n-2} \\ A_{n-3} \\ \vdots \\ A_{0} \end{bmatrix}\\\) Matrix multiplicaton takes \(O(n^3)\), and with fast power this can be done in \(O(n^3 lg(m))\).
Faster Solution
There is some property to be exploited in the bottom left part of \(T\) being the identity matrix. In fact, any matrix of form \(T\) has charactersitic polynomial \(x^{n} - C_{n-1} x^{n-1} - C_{n-2} x^{n-2} \dots - C_{0} x^{0}\), i.e., \(T^{n} = C_{n-1}T^{n-1}+C_{n-2}T^{n-2}\dots+C_{0}T^{0}\).
The observation is that we only need the first row of the matrix, which can be computed efficiently with binary exponentiation.
Let \(\lambda^{i}\) denote the last row of \(T^{i}\), note that the last row of \(T^{i+1}\) is always equal to the second last row of \(T^{i}\).
From \(\lambda^{m}\) we can derive \(\lambda^{m+1}\) in \(O(n)\) time, as \(T^{m+1}=T(\sum\limits_{i=0}^{n-1} (T^{m})_{0,i} T^{i})\\ = (T^{m})_{0,n-1} T^{n} + \sum\limits_{i=1}^{n-1} (T^{m})_{0,i-1} T^{i} \\ = (T^{m})_{0,n-1} \sum\limits_{i=0}^{n-1} C^{i} T^{i} + \sum\limits_{i=1}^{n-1} (T^{m})_{0,i-1} T^{i}\)
From \(\lambda^{m}\) we can derive \(\lambda^{2m}\) in \(O(n^2)\) time, as we can just do polynomial multiplication. To eliminate each term with power greater than \(n-1\) in \(O(n)\), we can precompute \(\lambda^{i} \text{for i in [0, 2n)}\), which can be done in \(O(n^2)\). The easy way to precompute this is to start from \(T^{0}=I\) and advance up \(2n\) times with the incremental method above.
Conclusion
Given that \(T\) is a matrix of size \(n\) that describes a linear recursive sequence, getting the first row of \(T^{m}\) costs \(O(n^2 lg(m))\) time.