Mathematical model for computability using neural networks: Part 1 (RNN)

Prologue Ongoing advances in LLMs call for a mathematical model that can describe their computability analytically. To do so, it is important to start with a basic model of an LLM that captures the core properties while remaining mathematically tractable. Let's start with a mathematical model for RNNs.

Introduction Consider an RNN \mathbb{N} composed of N neurons or processors. A simple equation describing the network is

x^{+} = \sigma (Ax + Bu + c)

where x^{+} is the state of the network at time t+1, and A, B, c are the weights (an N \times N recurrent matrix, an input matrix, and a bias vector, respectively). The assumption is that the network has a finite number of processors N and fixed weights A, B, c; only the state x and the input signal u are functions of time.
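
To make the update rule concrete, here is a minimal Python sketch of one time step. It assumes \sigma is the saturated-linear activation used by Siegelmann and Sontag; the size N = 3 and the random weights are arbitrary illustrations, not values from the paper.

    import numpy as np

    def sigma(x):
        # Saturated-linear activation: identity on [0, 1], clipped outside
        # (the piecewise-linear sigma of Siegelmann and Sontag).
        return np.clip(x, 0.0, 1.0)

    def step(x, u, A, B, c):
        # One network update: x+ = sigma(A x + B u + c).
        return sigma(A @ x + B @ u + c)

    # Illustrative sizes: N = 3 processors, one input line u.
    rng = np.random.default_rng(0)
    N = 3
    A = rng.uniform(-1, 1, (N, N))  # fixed recurrent weights
    B = rng.uniform(-1, 1, (N, 1))  # fixed input weights
    c = rng.uniform(-1, 1, N)       # fixed biases

    x = np.zeros(N)                 # state at time t
    u = np.array([1.0])             # input at time t
    x_next = step(x, u, A, B, c)    # state at time t + 1

With all of A, B, c rational, such a net is Turing-equivalent; letting the entries be real numbers is what produces the extra "analog" power discussed next.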

A typical RNN has real-valued weights in its interior, while the input and the output are passed along digital lines. This is why such networks perform so-called "analog computing": 'analog' refers to their real-valued weights, while 'computing' is understood in the conventional digital, Turing-machine sense. Additionally, out of the N processors, only a subset of p processors is allowed to interact with the environment; the remaining N - p processors are hidden.

Fig 1: The box with water, encoded in the form of Boolean circuits or neural networks, inside which an analog world exists. The I/O tape at the bottom of the box is where the digital readout happens after the computation is performed.


Definition A language L is recognised by a formal net \mathbb{N} in time T if \phi_{\mathbb{N}} = L and T_{\mathbb{N}} = T. Similarly, a nonuniform family of circuits C of size S_{C} computes the language L if \phi_{C} = L.

If \phi_{C} = \phi_{\mathbb{N}}, what is the relation between T_{\mathbb{N}} and S_{C}?

Theorem For functions F(n) \geq n, we have two results:
(i) CIRCUIT(F(n)) \subseteq NET(n F^{2}(n))
(ii) NET(F(n)) \subseteq CIRCUIT(F^{3}(n))
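
As a quick sanity check (this instantiation is mine, not the paper's; it is just the bounds applied at F(n) = n):

CIRCUIT(n) \subseteq NET(n \cdot n^{2}) = NET(n^{3})
NET(n) \subseteq CIRCUIT(n^{3})

That is, each model can simulate the other with at most a cubic overhead: time overhead for nets, size overhead for circuits.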

Proof of (i)

The proof follows from the construction below:

Fig 2: A formal net consists of three networks: the input network \mathbb{N}_{1}, the retrieval network \mathbb{N}_{R}, and the output network \mathbb{N}_{S}.

The formal net that simulates a nonuniform family of circuits C consists of three networks:

(i) \mathbb{N}_{1}, which receives two inputs u_{1} and u_{2} and converts the input sequence \alpha (\alpha cannot be an irrational number, because of complications explained later) into a series expansion in base 9, denoted \hat{en}[\alpha]; a toy version of this encoding is sketched after this list.
(ii) \mathbb{N}_{R}(c), which receives the input u_{2} in base 2, e.g. an input signal 1 1 1 … 1 0 0 0 … whose leading block of 1's has length n. The network outputs the sequence 0 0 0 0 … \hat{en}[c_{n}] 0 0 0 …, i.e. the encoding of the nth circuit of the family, padded with zeros (see Appendix).
(iii) \mathbb{N}_{S}, which receives \hat{en}[\alpha] and \hat{en}[c_{|\alpha|}] and outputs two lines, x_{0} and y_{0}, where the nth symbol of x_{0} is the response of the circuit c_{|\alpha|} to the input \alpha.
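
To make \hat{en} concrete, here is a toy base-9 encoding in Python. The digit assignments (bit 0 → digit 2, bit 1 → digit 4) are my illustrative choice, not necessarily the paper's exact scheme; the essential point is that only nonadjacent digits are used, leaving gaps that a saturated-linear net can exploit to read off one digit at a time.

    def en_hat(bits, base=9):
        # Toy base-9 encoding of a binary string as a rational in [0, 1):
        # bit 0 -> digit 2, bit 1 -> digit 4 (illustrative digit choices).
        digits = [2 if b == 0 else 4 for b in bits]
        return sum(d * base ** -(i + 1) for i, d in enumerate(digits))

    print(en_hat([1, 0, 1]))  # 4/9 + 2/81 + 4/729 ≈ 0.4746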

In the above construction, the retrieval network is the dominant one: to output \hat{en}[c_{|\alpha|}] it must scan through the encodings of the first |\alpha| circuits, so its time complexity is

T = O(\sum_{i=1}^{|\alpha|} l(c_{i}))

where l(c_{i}) is the length of the encoding of the circuit c_{i}. Since the encoding of a circuit of size S_{C}(i) has length l(c_{i}) = O(S_{C}^{2}(i)), this becomes

T = O(|\alpha| S_{C}^{2}(|\alpha|))

or, writing n = |\alpha|,

T = O(n S_{C}^{2}(n))
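
A tiny numerical sketch of this sum, under my illustrative assumptions that l(c_{i}) = S_{C}^{2}(i) exactly and that the family has linear size S_{C}(i) = i:

    def retrieval_time(n, S):
        # T = sum over i of l(c_i), taking l(c_i) = S(i)**2 exactly.
        return sum(S(i) ** 2 for i in range(1, n + 1))

    # Linear-size family S(i) = i: T = 385 for n = 10,
    # consistent with the O(n * S(n)^2) = O(10 * 100) bound.
    print(retrieval_time(10, lambda i: i))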

Remarks
1. The weights of the retrieval network are typically rational numbers, except for one weight, which encodes the circuit family C.
2. One might ask what the meaning of S(n) is. Simulating a neural network by a family of Boolean circuits C puts the emphasis on how the family C is defined. For context, Shor's algorithm for factoring large numbers into their prime factors consists of two steps, a quantum Fourier transform and modular exponentiation, and the first step depends on the qubits used for the computation. Similarly, computation using a neural network (as a super-Turing machine) relies on C (the analogue of the qubits). If one were to write a prime-factoring algorithm using C, the time complexity of the algorithm would depend not on C itself, but on the exact details of how C is used to write the algorithm.


Conclusion

1. What we have seen is a mathematical model for an RNN, simulated using a family of Boolean circuits C of size S(n).
2. It would be interesting to see next what kind of retrieval network a transformer has. Is it possible to find the time complexity of a transformer? The biggest difference between a transformer and an RNN is how input sequences are processed: sequentially in time in an RNN, and simultaneously in a transformer.


Appendix:

I am still confused by some elements related to the proof of Lemma 3.3 in the source paper.

Source:
Siegelmann, Hava T., and Eduardo D. Sontag. "Analog computation via neural networks." Theoretical Computer Science 131.2 (1994): 331-360.

Published by Saksham

Ph.D. graduate in fluid dynamics from the University of Cambridge
