Mathematical model for computability using neural networks: Part 1 (RNN)

Prologue Ongoing advances in LLMs call for a mathematical model that can describe their computability analytically. To do so, it is important to start with a basic model of an LLM that captures the core properties while remaining mathematically tractable. Let's start with a mathematical model for RNNs.

Introduction Consider an RNN \mathbb{N} composed of N neurons or processors. A simple equation describing the network is

x^{+} = \sigma (Ax + Bu + c)

where x^{+} is the state of the network at time t+1, and A, B, c are the weights (an N \times N recurrent matrix, an input matrix, and a bias vector, respectively). The assumption is that the network has a finite number of processors N and fixed weights A, B, c; only the state x and the input signal u are functions of time.
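
To make the update rule concrete, here is a minimal Python sketch of one time step. It assumes \sigma is the saturated-linear activation used by Siegelmann and Sontag; the size N = 3 and the random weights are arbitrary illustrations, not values from the paper.

    import numpy as np

    def sigma(x):
        # Saturated-linear activation: identity on [0, 1], clipped outside
        # (the piecewise-linear sigma of Siegelmann and Sontag).
        return np.clip(x, 0.0, 1.0)

    def step(x, u, A, B, c):
        # One network update: x+ = sigma(A x + B u + c).
        return sigma(A @ x + B @ u + c)

    # Illustrative sizes: N = 3 processors, one input line u.
    rng = np.random.default_rng(0)
    N = 3
    A = rng.uniform(-1, 1, (N, N))  # fixed recurrent weights
    B = rng.uniform(-1, 1, (N, 1))  # fixed input weights
    c = rng.uniform(-1, 1, N)       # fixed biases

    x = np.zeros(N)                 # state at time t
    u = np.array([1.0])             # input at time t
    x_next = step(x, u, A, B, c)    # state at time t + 1

With all of A, B, c rational, such a net is Turing-equivalent; letting the entries be real numbers is what produces the extra "analog" power discussed next.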

A typical RNN has real-valued weights in its interior, while the input and the output are passed along digital lines. This is why such networks perform so-called "analog computing": 'analog' refers to their real-valued weights, while 'computing' is understood in the conventional digital, Turing-machine sense. Additionally, out of the N processors, only a subset of p processors is allowed to interact with the environment; the remaining N - p processors are hidden.

Fig 1: The box with water, encoded in the form of Boolean circuits or neural networks, inside which an analog world exists. The I/O tape at the bottom of the box is where the digital readout happens after the computation is performed.


Definition A language L is recognised by a formal net \mathbb{N} in time T if \phi_{\mathbb{N}} = L and T_{\mathbb{N}} = T. Similarly, a nonuniform family of circuits C of size S_{C} computes the language L if \phi_{C} = L.

If \phi_{C} = \phi_{\mathbb{N}}, what is the relation between T_{\mathbb{N}} and S_{C}?

Theorem For functions F(n) \geq n, we have two results:
(i) CIRCUIT(F(n)) \subseteq NET(n F^{2}(n))
(ii) NET(F(n)) \subseteq CIRCUIT(F^{3}(n))
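
As a quick sanity check (this instantiation is mine, not the paper's; it is just the bounds applied at F(n) = n):

CIRCUIT(n) \subseteq NET(n \cdot n^{2}) = NET(n^{3})
NET(n) \subseteq CIRCUIT(n^{3})

That is, each model can simulate the other with at most a cubic overhead: time overhead for nets, size overhead for circuits.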

Proof of (i)

The proof follows from the construction below:

Fig 2: A formal net consists of three networks: the input network \mathbb{N}_{1}, the retrieval network \mathbb{N}_{R}, and the output network \mathbb{N}_{S}.

The formal net that simulates a nonuniform family of circuits C consists of three networks:

(i) \mathbb{N}_{1}, which receives two inputs u_{1} and u_{2} and converts the input sequence \alpha (\alpha cannot be an irrational number, because of complications explained later) into a series expansion in base 9, denoted \hat{en}[\alpha]; a toy version of this encoding is sketched after this list.
(ii) \mathbb{N}_{R}(c), which receives the input u_{2} in base 2, e.g. an input signal 1 1 1 … 1 0 0 0 … whose leading block of 1's has length n. The network outputs the sequence 0 0 0 0 … \hat{en}[c_{n}] 0 0 0 …, i.e. the encoding of the nth circuit of the family, padded with zeros (see Appendix).
(iii) \mathbb{N}_{S}, which receives \hat{en}[\alpha] and \hat{en}[c_{|\alpha|}] and outputs two lines, x_{0} and y_{0}, where the nth symbol of x_{0} is the response of the circuit c_{|\alpha|} to the input \alpha.
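
To make \hat{en} concrete, here is a toy base-9 encoding in Python. The digit assignments (bit 0 → digit 2, bit 1 → digit 4) are my illustrative choice, not necessarily the paper's exact scheme; the essential point is that only nonadjacent digits are used, leaving gaps that a saturated-linear net can exploit to read off one digit at a time.

    def en_hat(bits, base=9):
        # Toy base-9 encoding of a binary string as a rational in [0, 1):
        # bit 0 -> digit 2, bit 1 -> digit 4 (illustrative digit choices).
        digits = [2 if b == 0 else 4 for b in bits]
        return sum(d * base ** -(i + 1) for i, d in enumerate(digits))

    print(en_hat([1, 0, 1]))  # 4/9 + 2/81 + 4/729 ≈ 0.4746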

In the above construction, the retrieval network is the dominant one: to output \hat{en}[c_{|\alpha|}] it must scan through the encodings of the first |\alpha| circuits, so its time complexity is

T = O(\sum_{i=1}^{|\alpha|} l(c_{i}))

where l(c_{i}) is the length of the encoding of the circuit c_{i}. Since the encoding of a circuit of size S_{C}(i) has length l(c_{i}) = O(S_{C}^{2}(i)), this becomes

T = O(|\alpha| S_{C}^{2}(|\alpha|))

or, writing n = |\alpha|,

T = O(n S_{C}^{2}(n))
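
A tiny numerical sketch of this sum, under my illustrative assumptions that l(c_{i}) = S_{C}^{2}(i) exactly and that the family has linear size S_{C}(i) = i:

    def retrieval_time(n, S):
        # T = sum over i of l(c_i), taking l(c_i) = S(i)**2 exactly.
        return sum(S(i) ** 2 for i in range(1, n + 1))

    # Linear-size family S(i) = i: T = 385 for n = 10,
    # consistent with the O(n * S(n)^2) = O(10 * 100) bound.
    print(retrieval_time(10, lambda i: i))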

Remarks
1. The weights of the retrieval network are typically rational numbers, except for one weight, which encodes the circuit family C.
2. One might ask what the meaning of S(n) is. Simulating a neural network by a family of Boolean circuits C puts the emphasis on how the family C is defined. For context, Shor's algorithm for factoring large numbers into their prime factors consists of two steps, a quantum Fourier transform and modular exponentiation, and the first step depends on the qubits used for the computation. Similarly, computation using a neural network (as a super-Turing machine) relies on C (the analogue of the qubits). If one were to write a prime-factoring algorithm using C, the time complexity of the algorithm would depend not on C itself, but on the exact details of how C is used to write the algorithm.


Conclusion

1. What we have seen is a mathematical model for an RNN, simulated using a family of Boolean circuits C of size S(n).
2. It would be interesting to see next what kind of retrieval network a transformer has. Is it possible to find the time complexity of a transformer? The biggest difference between a transformer and an RNN is how input sequences are processed: sequentially in time in an RNN, and simultaneously in a transformer.


Appendix:

I am still confused by some elements related to the proof of Lemma 3.3 in the source paper.

Source:
Siegelmann, Hava T., and Eduardo D. Sontag. "Analog computation via neural networks." Theoretical Computer Science 131.2 (1994): 331-360.

Published by Saksham

Ph.D. graduate in fluid dynamics from the University of Cambridge
