This post is adapted from Linear recurrence using Cayley-Hamilton Theorem.

Problem

Given a linear recurrence of order \(n\), \(A_m = \sum\limits_{i=0}^{n - 1} C_i A_{m-n+i}\) for \(m \ge n\), together with the initial values \(A_0, \dots, A_{n-1}\), we want to find \(A_m\) for some large \(m\).

Naive Solution

\(\begin{bmatrix} C_{n-1} & C_{n-2} & C_{n-3} & \dots & C_{0} \\ 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}\begin{bmatrix} A_{n-1} \\ A_{n-2} \\ A_{n-3} \\ \vdots \\ A_{0} \end{bmatrix}=\begin{bmatrix} \sum\limits_{i=0}^{n-1} C_{i}A_{i} \\ A_{n-1} \\ A_{n-2} \\ \vdots \\ A_{1} \end{bmatrix}=\begin{bmatrix} A_{n} \\ A_{n-1} \\ A_{n-2} \\ \vdots \\ A_{1} \end{bmatrix}\)

If we let \(T\) denote the matrix on the left, we have

\(\begin{bmatrix} A_{m} \\ A_{m-1} \\ A_{m-2} \\ \vdots \\ A_{m-(n-1)} \end{bmatrix} = T^{m - (n-1)} \begin{bmatrix} A_{n-1} \\ A_{n-2} \\ A_{n-3} \\ \vdots \\ A_{0} \end{bmatrix}\)

Each matrix multiplication takes \(O(n^3)\), and with fast exponentiation the whole power can be computed in \(O(n^3 \log m)\).

Faster Solution

There is structure to be exploited in \(T\): its bottom-left \((n-1) \times (n-1)\) block is the identity matrix. In fact, any matrix of this form (a companion matrix) has characteristic polynomial \(x^{n} - C_{n-1} x^{n-1} - C_{n-2} x^{n-2} - \dots - C_{0} x^{0}\), so by the Cayley–Hamilton theorem \(T^{n} = C_{n-1}T^{n-1}+C_{n-2}T^{n-2}+\dots+C_{0}T^{0}\).


The observation is that we never need the full matrix \(T^{m}\): it is enough to know how \(T^{m}\) decomposes over \(T^{0}, T^{1}, \dots, T^{n-1}\), and this decomposition can be computed efficiently with binary exponentiation.


Let \(\lambda^{m}\) denote the coefficient vector expressing \(T^{m}\) in terms of lower powers, \(T^{m} = \sum\limits_{i=0}^{n-1} \lambda^{m}_{i} T^{i}\); Cayley–Hamilton guarantees this is possible, and for \(m < n\) we simply have \(\lambda^{m} = e_{m}\). This vector is all we need: applying both sides to the initial state \((A_{n-1}, \dots, A_{0})^{T}\) and reading the last entry gives \(A_{m} = \sum\limits_{i=0}^{n-1} \lambda^{m}_{i} A_{i}\). (Concretely, \(\lambda^{m}\) is the last row of \(T^{m}\) read right to left: the last row of \(T^{m+1}\) is always equal to the second-to-last row of \(T^{m}\), from which the last row of \(T^{i}\) is \(e_{n-1-i}\) for \(i < n\).)


Writing \(T^{m} = \sum\limits_{i=0}^{n-1} \lambda^{m}_{i} T^{i}\), we can derive \(\lambda^{m+1}\) from \(\lambda^{m}\) in \(O(n)\) time, as \(T^{m+1}=T\left(\sum\limits_{i=0}^{n-1} \lambda^{m}_{i} T^{i}\right) = \lambda^{m}_{n-1} T^{n} + \sum\limits_{i=1}^{n-1} \lambda^{m}_{i-1} T^{i} = \lambda^{m}_{n-1} \sum\limits_{i=0}^{n-1} C_{i} T^{i} + \sum\limits_{i=1}^{n-1} \lambda^{m}_{i-1} T^{i}\)


From \(\lambda^{m}\) we can derive \(\lambda^{2m}\) in \(O(n^2)\) time by polynomial multiplication: squaring \(\sum\limits_{i=0}^{n-1} \lambda^{m}_{i} x^{i}\) produces terms up to \(x^{2n-2}\), and each term with power at least \(n\) can be eliminated in \(O(n)\) if we precompute \(\lambda^{i}\) for \(i \in [0, 2n)\), which takes \(O(n^2)\) overall. The easy way to precompute this table is to start from \(T^{0}=I\), i.e. \(\lambda^{0}=e_{0}\), and apply the incremental step above \(2n-1\) times.

Conclusion

Given the \(n \times n\) companion matrix \(T\) of a linear recurrence of order \(n\), computing the coefficient vector of \(T^{m}\) over \(T^{0}, \dots, T^{n-1}\) (and from it \(A_{m}\)) costs \(O(n^2 \log m)\) time.

Snippet

#include <vector>
using std::vector;

// Given the coefficients t = (C_0, ..., C_{n-1}) of the recurrence
// A_m = C_{n-1} A_{m-1} + ... + C_0 A_{m-n}, returns the vector lambda
// with T^p = sum_i lambda_i T^i; then A_p = sum_i lambda_i A_i.
template<typename T> // tested on Project Euler 258
vector<T> fast_linear_recurrence(const vector<T> &t, long long p) {
  const int n = t.size();
  // Multiply a reduced coefficient vector by T and reduce again: O(n).
  auto advance = [&](const vector<T> &u) {
    vector<T> v(n);
    v[0] = u.back() * t[0];
    for (int i = 1; i < n; ++i) v[i] = u[i - 1] + u.back() * t[i];
    return v;
  };

  // kk[i] = lambda(t ** i) for i in [0, 2n), precomputed in O(n^2).
  vector<vector<T>> kk(2 * n, vector<T>(n));
  kk[0][0] = 1;  // T^0 = I
  for (int i = 1; i < 2 * n; ++i) kk[i] = advance(kk[i - 1]);
  if (p < (long long)kk.size()) return kk[p];

  // Square a reduced coefficient vector: polynomial multiplication,
  // then eliminate every power >= n using the precomputed kk: O(n^2).
  auto square = [&](const vector<T> &u) {
    vector<T> v(2 * n);
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < n; ++k)
        v[j + k] = v[j + k] + u[j] * u[k];
    for (int j = n; j < 2 * n; ++j)
      for (int k = 0; k < n; ++k)
        v[k] = v[k] + v[j] * kk[j][k];
    v.resize(n);
    return v;
  };

  // Binary exponentiation on coefficient vectors; kk[1] represents T^1
  // and accounts for the highest set bit of p.
  vector<T> m(kk[1]);
  for (int i = 62 - __builtin_clzll(p); ~i; --i) {
    m = square(m);
    if (p >> i & 1) m = advance(m);
  }

  return m;
}