Prologue

Ongoing advances in LLMs call for a mathematical model whose computability can be analyzed. To do so, it is important to start with a basic model of an LLM that captures the core properties while still allowing mathematical tractability. Let's start with a mathematical model for RNNs.
Introduction

Consider an RNN composed of N neurons (processors). A simple equation to describe the network is

x_{i}(t+1) = σ( Σ_{j=1}^{N} a_{ij} x_{j}(t) + Σ_{j=1}^{M} b_{ij} u_{j}(t) + c_{i} ),   i = 1, ..., N,

where x_{i}(t+1) is the state of the ith processor at time t+1 and {A, B, c} are the weights. The assumption is that the network has finitely many processors N, fixed weights {A, B, c}, and that only the state x(t) and the input signal u(t) are functions of time.
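To make the update rule concrete, here is a minimal NumPy sketch of one step of such a network. It assumes the saturated-linear activation used for formal nets in the paper; all sizes and weight values below are illustrative, not taken from the paper:

```python
import numpy as np

def sigma(z):
    """Saturated-linear activation: clip each coordinate to [0, 1]."""
    return np.clip(z, 0.0, 1.0)

def step(x, u, A, B, c):
    """One network update x(t+1) = sigma(A x(t) + B u(t) + c)."""
    return sigma(A @ x + B @ u + c)

# Illustrative sizes: N processors, M input lines (hypothetical values).
N, M = 4, 2
rng = np.random.default_rng(0)
A = rng.normal(size=(N, N))  # processor-to-processor weights (fixed)
B = rng.normal(size=(N, M))  # input weights (fixed)
c = rng.normal(size=N)       # biases (fixed)

x = np.zeros(N)              # state x(t); only x and u vary with time
for u in ([1.0, 0.0], [1.0, 1.0], [0.0, 0.0]):   # a short input signal u(t)
    x = step(x, np.array(u), A, B, c)

p = 1                        # only p processors interact with the environment
print("visible output:", x[:p])
```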
A typical RNN consists of real numbers (weights) that constitute its interior, while the input and the output are passed along digital lines. This is why such networks perform so-called "analog computing": 'analog' refers to their real-weight property, while 'computing' is expressed in the conventional digital, Turing-machine sense. Additionally, out of the N processors, only a subset of p processors is allowed to interact with the environment, implying that the rest, N − p, are hidden.

Fig 1: A box with water, encoded in the form of Boolean circuits or neural networks, inside of which an analog world exists. The I/O tape at the bottom of the box is where the digital readout happens after the computation is performed.
Definition

A language L ⊆ {0,1}^{+} is recognized by a formal net N in time T if every input word w is classified (accepted or rejected) within time T(|w|); NET(T(n)) is the class of languages recognized by some formal net in time T(n). A (nonuniform) family of circuits C = {c_{n}} computes the language L in size S(n) if each circuit c_{n} has size at most S(n) and decides the words of L of length n; CIRCUIT(S(n)) is the class of languages computed by such families (a toy illustration of such a family follows below). Given these definitions, what is the relation between NET(T(n)) and CIRCUIT(S(n))?
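To make the nonuniform circuit side concrete, here is a tiny Python sketch in which a separate circuit is allowed for every input length n. The example language (even parity) and the helper names are hypothetical, chosen only for illustration:

```python
# A nonuniform "family of circuits": one Boolean function per input length n.
# Hypothetical example language: words over {0,1} with an even number of 1s.
def make_parity_circuit(n):
    def circuit(bits):                 # c_n : {0,1}^n -> {0,1}
        assert len(bits) == n
        acc = 0
        for b in bits:                 # a chain of n XOR gates, size O(n)
            acc ^= b
        return 1 - acc                 # accept iff the parity is even
    return circuit

family = {n: make_parity_circuit(n) for n in range(1, 6)}
w = [1, 0, 1, 1]
print(family[len(w)](w))               # c_4 applied to w -> 0 (odd parity)
```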
Theorem

For functions F(n) ≥ n, we have two results:
(i) CIRCUIT(F(n)) ⊆ NET(n F^{2}(n))
(ii) NET(F(n)) ⊆ CIRCUIT(F^{3}(n))
Proof of (i)
The proof follows from the construction below:

Fig 2: A formal net consisting of three networks: an input network, a retrieval network, and an output network.
The formal network NET that can simulate a nonuniform family of circuits CIRCUIT consists of three networks:

(i) The input network, which welcomes the two input lines of the formal net and converts the incoming binary sequence into a real number (except that this number cannot be an irrational number, because of complications explained later), represented as a series expansion in base 9: with digits δ_{1} δ_{2} ..., the expansion reads Σ_{i≥1} δ_{i} 9^{-i}.

(ii) The retrieval network, which welcomes the input n as a binary signal, e.g. 1 1 1 ... 1 0 0 0 ..., where the run of 1's has length n. The network outputs the sequence 0 0 0 0 ... followed by en(c_{n}), the encoding of the nth circuit (see Appendix). A toy sketch of this encoding and retrieval appears after this list.

(iii) The output network, which welcomes en(c_{n}) and the input word w, and outputs two lines, x_{0} and y_{0}, where the nth symbol in y_{0} is the response of the circuit c_{n} to the input w.
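Here is a toy Python sketch of the idea behind the encoding weight and the retrieval step. It assumes, for illustration, that each en(c_{i}) is a digit string over {0,...,7} and that the digit 8 separates consecutive circuits; the paper's actual encoding of circuits is more detailed, and encode_family / retrieve are hypothetical helper names:

```python
from fractions import Fraction

def encode_family(encodings):
    """Pack the encodings en(c_1), en(c_2), ... (digit strings over 0..7)
    into one base-9 real c_hat = sum_k d_k * 9^(-k), with '8' as separator."""
    digits = "8".join(encodings) + "8"
    return sum(Fraction(int(d), 9 ** (k + 1)) for k, d in enumerate(digits))

def retrieve(c_hat, n):
    """Read base-9 digits of c_hat until the nth '8' separator, collecting
    the digits of en(c_n). This is the retrieval network's job; its running
    time grows with the length of the prefix up to circuit n.
    (Assumes 1 <= n <= number of encoded circuits.)"""
    out, seen, r = [], 0, c_hat
    while True:
        r *= 9
        d = int(r)              # next base-9 digit of the expansion
        r -= d
        if d == 8:
            seen += 1
            if seen == n:
                return "".join(out)
            out = []
        else:
            out.append(str(d))

# Hypothetical circuit encodings; real en(c_n) strings are defined in the paper.
c_hat = encode_family(["011", "2307", "45"])
print(retrieve(c_hat, 2))       # -> "2307"
```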
In the above construction, the middle (retrieval) network dominates. Its time complexity is governed by the length of the base-9 prefix that must be read before en(c_{n}) is recovered, roughly O(Σ_{i=1}^{n} |en(c_{i})|) = O(n F^{2}(n)), so that CIRCUIT(F(n)) ⊆ NET(n F^{2}(n)).
Remarks
1. The weights of the retrieval network are typically rational numbers, except one weight, ĉ_{C}, which encodes the circuit family C.
2. One might ask what the meaning of ĉ_{C} is. Simulating a neural network using a family of Boolean circuits puts the impetus on how the family C is defined. To give some context: Shor's algorithm for factoring large numbers into their prime factors consists of two steps, the quantum Fourier transform and modular exponentiation. The first step depends on the qubits used for the computation. Similarly, computation using a neural network (as a super-Turing machine) relies on ĉ_{C} (analogous to qubits). If one plans to write a prime-factoring algorithm using ĉ_{C}, then the time complexity of the algorithm would depend not on the circuit family itself but on the exact details of how ĉ_{C} is used to write the algorithm.
Conclusion
- What we have seen is a mathematical model for an RNN, simulated using a family of Boolean circuits C of size F(n).
- It would be interesting to see next what kind of retrieval network a transformer has. Is it possible to find the time complexity of a transformer? The biggest difference between a transformer and an RNN is how input sequences are processed: temporally (one step at a time) in an RNN, and simultaneously in a transformer, as the sketch below illustrates.
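A minimal sketch of that contrast, with illustrative shapes and random weights (not a claim about any particular architecture's details): an RNN must loop over time steps, while self-attention touches all positions in one matrix computation.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 5, 8                      # sequence length and width (illustrative)
X = rng.normal(size=(T, d))      # the input sequence

# RNN: states are produced one step at a time; step t depends on step t-1.
W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(T):               # inherently sequential (temporal) processing
    h = np.tanh(W @ h + U @ X[t])

# Transformer self-attention: all T positions are processed simultaneously.
Q, K, V = (rng.normal(size=(d, d)) for _ in range(3))
scores = (X @ Q) @ (X @ K).T / np.sqrt(d)        # (T, T) interactions at once
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = attn @ (X @ V)             # one parallel matrix computation

print(h.shape, out.shape)        # (d,) vs (T, d)
```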
Appendix:
I am still confused by some elements related to the proof of Lemma 3.3.
Source:
Siegelmann, Hava T., and Eduardo D. Sontag. “Analog computation via neural networks.” Theoretical Computer Science 131.2 (1994): 331-360.