Find me on Twitter @robleclerc
If there’s a God equation, it’ll most certainly contain some flavor of the fundamental equation for a Formal Neuron [1][2][3]:
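In its standard form:

$$y = \varphi\!\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

where the $x_i$ are inputs, the $w_i$ are weights, $b$ is a bias (a negative threshold), and $\varphi$ is an activation function that fires once the weighted sum crosses threshold.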
Despite vast implementation constraints spanning diverse biological systems, a clear pattern emerges: the repeated and recursive evolution of Universal Activation Networks (UANs). These networks consist of nodes (Universal Activators) that integrate weighted inputs from other units or from environmental interactions and activate at a threshold, resulting in an action or an intentional broadcast. Minimally, Universal Activator Networks include gene regulatory networks, cell networks, neural networks, cooperative social networks, and sufficiently advanced artificial neural networks (see the Appendix for a more general form of this equation).
Evolvability and generative open-endedness define Universal Activation Networks, setting them apart from other dynamic networks, complex systems, or replicators. Evolvability implies robustness and plasticity in both structure and function, differentiable performance, inheritable replication, and selective mechanisms. They evolve, they learn, they adapt, they improve, and their open-endedness lies in their capacity to form higher-order networks subject to a new level of selection.
Surprisingly, this special class of networks has yet to be unified under a general theory with common principles; instead, it is usually lumped in with a broader class of networks that lack evolvability and open-endedness. Identifying shared characteristics provides a null hypothesis for observing and understanding the world and a basis for good engineering design. Conversely, violations of that null hypothesis reveal theoretical shortcomings or hidden factors we wouldn’t have known to look for without a more general unified theory.
A Brief History of Universal Activator Networks
Prokaryotes emerged 3.5 billion years ago, their gene networks acting like rudimentary brains. These networks controlled chemical reactions and cellular processes, laying the foundation for complexity. Multicellular organisms appeared 1.5 billion years later, with cells communicating through mechanical, electrical, and chemical signals.
Cnidarians evolved neurons around 600 million years ago, creating the first rudimentary nervous systems. These specialized cells assumed executive control of the organism, enhancing environmental interaction. The same basic structure of a neuron (soma, dendrites, and axon) persists across species today. Bilaterians emerged tens of millions of years later and began developing more centralized nervous systems.
Some Bilaterians went on to evolve complex brains and cooperative behaviors, leading to sophisticated social networks. From ant colonies to human societies, neural advancements enabled increasingly complex interactions. The network at one level begets the node in the next, a recurring theme in evolutionary complexity.
Artificial neural networks, inspired by biological brains, emerged roughly 80 years ago. Initially simple in silico representations, they were co-opted into the foundation of artificial intelligence. Until recently, open-endedness seemed limited to biological systems, but today's Large Language Models (LLMs) challenge this notion.
From ancient prokaryotic gene regulation to modern LLMs, diverse systems have converged on a fundamental mechanism: Universal Activator Networks (UANs). These inherently adaptable, scalable, and open-ended systems exhibit a familial similarity. Understanding the design principles behind evolvability and open-endedness in one network can inform research in the others and help us engineer more powerful artificial neural networks and agentic AI, and vice versa.
I propose that UANs represent a fundamental phenomenon of the physical universe, bridging natural sciences (physics, chemistry, biology) and formal sciences (mathematics, logic, computer science). Their ubiquity offers greater explanatory power than established theories of complex systems. The fractal-like recurrence of UANs across diverse domains suggests a causal relationship, rather than coincidence, hinting at a unifying science, deeply hidden.
Here there be Dragons
A general theory of neural networks under UANs, while promising, requires rigorous examination. Key considerations include:
Biological Specificity: UANs must account for the complex, evolved behaviors of biological networks without oversimplification.
Empirical Validation: Comprehensive studies across diverse biological systems are necessary to substantiate UAN universality.
Evolutionary Mechanisms: The pathways and pressures shaping UAN evolution demand deeper investigation.
Emergence and Complexity: Understanding how complex behaviors arise from UAN interactions challenges reductionist approaches and requires further study.
These points highlight areas for focused research rather than limitations of the theory. The UAN framework's strength lies in its potential to unify diverse systems, from gene regulatory networks to neural systems.
The following section examines artificial gene regulatory networks as a foundation for UAN principles. These simpler precursor systems to neural networks offer insights into how UANs at one level become nodes at the next, illuminating their open-ended nature. This approach is a starting point for identifying generalizable UAN principles, testable across various systems including artificial neural networks.
Insights from Artificial Gene Regulatory Networks
Topology is all that matters
In 2000, von Dassow et al. developed an in silico model of the Drosophila melanogaster segment polarity network using known chemical binding constants and gene expression levels. Like the activation functions used in artificial neural networks, genes integrate their regulatory inputs and activate with a threshold-like response described by the Hill equation, enabling linear inputs to produce a non-linear output (see Appendix).
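For reference, the activating form of the Hill equation maps a regulator's concentration $x$ to a fractional response:

$$f(x) = \frac{x^{n}}{K^{n} + x^{n}}$$

where $K$ is the concentration at half-maximal activation and the Hill coefficient $n$ sets the steepness; as $n$ grows, the curve approaches a step function, which is how graded chemical inputs yield switch-like outputs.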
The model's predictive power was remarkable. Despite an exhaustive list of empirically derived parameters, the model initially failed; only after two then-undiscovered gene interactions were incorporated did it generate the correct pattern. The model not only reproduced known mutant phenotypes but also predicted novel ones, which were subsequently confirmed experimentally. This demonstrated that the gene regulatory network adhered to basic underlying computational principles, validating the model's biological fidelity and demonstrating its potential as a discovery tool.
When tested for robustness, the network maintained stable behavior despite parameter variations across 2-3 orders of magnitude. Even assigning random values to all 48 network parameters yielded a functional network roughly 1 in 200 times. This extraordinary insensitivity to implementation details is both astonishing and logical: astonishing for the sheer degree of robustness, and logical because gene networks must withstand environmental noise and genetic variation to be evolvable; without robustness, selected traits cannot be transmitted. The authors concluded that network topology, not implementation details, determined function.
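A minimal sketch of this style of robustness test, assuming hypothetical stand-ins `simulate` (runs the network to steady state) and `matches_target` (checks the resulting pattern), neither of which is the actual von Dassow code:

```python
import math
import random

def random_parameters(n_params=48, low=1e-3, high=1e3):
    """Sample each parameter log-uniformly across ~6 orders of magnitude."""
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** random.uniform(lo, hi) for _ in range(n_params)]

def robustness_fraction(simulate, matches_target, trials=10_000):
    """Estimate how often a randomly parameterized network still
    produces the target pattern (von Dassow et al. saw roughly 1/200)."""
    hits = sum(
        matches_target(simulate(random_parameters())) for _ in range(trials)
    )
    return hits / trials
```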
Reducing Gene Networks to Boolean Networks
In 2003, Albert and Othmer distilled von Dassow's model into a Boolean network of simple on/off switches. Despite this extra level of abstraction, network topology still accurately predicted gene expression patterns. This simplification reveals something profound: nature effectively evolved an intricate Rube Goldberg machine to implement what is fundamentally a Boolean computational circuit. As discussed in the companion article, An Introduction to Neurons, Backpropagation and Transformers, networks of Boolean gates (AND, OR, NOT) can construct any Turing-complete machine.
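To make the abstraction concrete, here is a toy three-gene Boolean network in Python; the genes and update rules are illustrative inventions, not Albert and Othmer's actual Drosophila circuit:

```python
def step(state):
    """One synchronous update: each gene's next value is a pure
    Boolean function of the current state of its regulators."""
    a, b, c = state["A"], state["B"], state["C"]
    return {
        "A": b and not c,  # A activated by B, repressed by C
        "B": a or c,       # B activated by either A or C
        "C": not a,        # C repressed by A
    }

state = {"A": True, "B": False, "C": False}
for t in range(6):
    print(t, state)
    state = step(state)  # trajectories settle into fixed points or cycles
```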
Nature's topological convergence to an equivalent Boolean circuit suggests that much biological complexity may be a byproduct of implementation constraints and, therefore, abstractable. Stripping away implementation details exposed a simpler, more tractable computational circuit. This raises an intriguing question: can all UAN networks ultimately reduce to Boolean circuits (and subnetworks) while preserving core topology?
Evolutionary Convergence to a Network’s Critical Topology
During my PhD research from 2003-2009, I investigated the evolution of artificial gene regulatory networks, building upon the work of von Dassow (2000) and Albert and Othmer (2003). My goal was to find evidence for the evolution of evolvability by evolving populations of in silico multicellular organisms. I focused on the emergence of modularity, genetic robustness, environmental robustness, and phenotypic plasticity.
In my 2009 paper, Survival of the Sparsest: Robust Gene Networks Are Parsimonious, I derived an equilibrium equation for network density based on the rates of mutation, deletion, and addition of gene interactions ($w_{ij}$). Initializing networks at their equilibrium density and then subjecting them to stabilizing selection, I observed that evolution consistently favored densities sparser than equilibrium, approaching a minimal threshold below which the network could no longer function. This demonstrated the cost of network complexity: most network interactions were, in fact, spurious, their weights bearing no information that contributed to the network's function, a dynamic that is overlooked when network topology is fixed.
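A sketch of the balance argument, in my notation here rather than necessarily the paper's: if absent interactions are added at per-site rate $r_{\text{add}}$ and existing interactions are deleted at per-interaction rate $r_{\text{del}}$, then for $E$ interactions among $N^2$ possible sites,

$$\frac{dE}{dt} = r_{\text{add}}\,(N^{2} - E) - r_{\text{del}}\,E = 0 \quad\Rightarrow\quad \rho^{*} = \frac{E^{*}}{N^{2}} = \frac{r_{\text{add}}}{r_{\text{add}} + r_{\text{del}}}$$

so the neutral equilibrium density depends only on the ratio of addition to deletion rates; the result above is that stabilizing selection drives density below this neutral expectation.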
This process parallels thermodynamics, where shedding non-essential complexity is akin to relaxing toward a minimum-energy state. Non-essential complexity reduces modularity and increases coupling, thereby hindering evolvability. When a network is allowed to evolve its connectivity, evolution minimizes the cost of complexity by retaining only necessary and sufficient interactions, enhancing evolvability while constraining plausible network configurations. Note that while an individual network may be constrained, a network of cells effectively forms a larger network, which can provide additional degrees of freedom and computing power to the overall unit of selection.
A network topology with the minimum number of interactions required to perform a function can be termed the critical topology. In contrast, a parsimonious topology is the network configuration that results from evolutionary processes under stabilizing selection. As a null hypothesis, we can assume the parsimonious and critical topologies will be one and the same; differences between them may be attributed to constraints or additional functional requirements. It's an open question whether the critical topology and its isoforms represent a single network topology (one global optimum) or whether there can be more than one critical topology for a given function (many global optima).
Evolution may permit more spurious complexity when selection pressures are relaxed, such as during ecological release in the Cambrian Explosion. These additional interactions increase the degrees of freedom in the gradient ascent to higher fitness peaks and can help avoid getting stuck on a local optimum.
My findings anticipated later developments in machine learning, such as the discovery by Hinton and colleagues that randomly silencing units during training improves generalization (so-called dropout), and subsequent work showing that pruned artificial neural networks can match or exceed their dense counterparts. My research extended this concept by demonstrating that forcing networks to their critical topology reveals their core computational circuits.
A Series of Conjectures
The research presented above provides a conceptual foundation linking gene regulatory networks and artificial neural networks, and highlights principles that I believe will apply to all Universal Activation Networks (UANs). From these insights, I propose a series of conjectures to guide our understanding of both biological and artificial networks:
Universal Activation Networks (UANs) can simulate the function of any other activator network across biology, physics, and artificial intelligence. From gene regulatory networks to advanced AI, UANs share a common computational structure. This universality implies that the same fundamental principles of computation can be used to model and understand diverse network types, providing a unified framework spanning different domains. In short, these networks are computationally homomorphic. Divergences from expected architectures indicate confounding constraints or unique opportunities inherent to the specific system, and these divergences may reveal key insights, offering deeper truths about the system in question. By pursuing computational principles shared by UANs, we can develop a cohesive understanding that bridges gaps between traditionally disparate fields.
UANs operate according to either computational principles or magic. The Law of the Excluded Middle permits no third option: a network either performs computations, or it does not. Although UANs can be complex and our current tools may fall short, these networks are not black boxes; they are fundamentally explainable. This may seem obvious, but when I first proposed it during my PhD, most biologists I spoke with rejected the computational principle and reached for the excluded middle. Similarly, there is a widespread belief that transformer networks are black boxes and therefore inherently incomprehensible. We must staunchly reject any notion of neo-dualism (for instance, claims that quantum effects are required for consciousness), affirming that all networks, biological or artificial, adhere to comprehensible computational principles until proven otherwise. This principle enforces a common language: computation.
Computation is the null hypothesis. When analyzing biological or artificial networks evolved to perform a function, we should assume they perform specific computations until proven otherwise. This principle is supported by studies like von Dassow et al., which demonstrated that gene networks require precise interactions to function, even when those interactions were not yet known. This approach shifts our focus from merely cataloging network components to uncovering the underlying computational principles, enabling us to make testable predictions and draw parallels between diverse types of networks. Understanding why different networks implement functions differently is crucial. For example, how do spiking neurons evolve from the basic logic of formal neurons, and how much of this evolution is due to constraints and evolutionary vestiges? The Albert and Othmer paper showed that the Drosophila segmentation network could be reduced to a Boolean network, yet nature’s circuits are far more complex. Is the network multiplexing, reusing the same genetics in different contexts, or is this a limitation of chemical-based information processing? A computational null hypothesis helps identify biases and uncover new areas for inquiry, demystifying complex systems and driving profound insights into both biological and artificial networks. By recognizing the computational nature of all networks, we bridge gaps across disciplines, fostering innovation and a deeper understanding of the natural and engineered world.
A UAN's critical topology, and its implied gating logic, dictates its function, not the implementation details. Intuitively, a circuit designer can recognize the function of different 2-bit adders (See Figure 3). Good engineering design will determine the optimal implementation, taking into account context such as methods, cost, and availability. This principle is universal across fields. In neural networks, the topology of connections between neurons is more crucial to function than the specific activation functions. In biology, the network of protein-protein interactions within a cell determines cellular behavior more than the individual proteins themselves. In computer science, the efficiency of sorting algorithms (like quicksort vs. mergesort) depends on their structural logic rather than the specific coding language. Assuming that the critical topology dictates function, we have a basis for what we ought to expect; when our observations don't comport with our expectations, we have a clue that one or more confounding factors are influencing the network topology.
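A small illustration of implementation-independence, sketched in Python: two structurally different 2-bit adders, one built from Boolean gates and one from ordinary arithmetic, realize the identical input-output map:

```python
from itertools import product

def adder_gate_level(a1, a0, b1, b0):
    """2-bit ripple-carry adder built from Boolean gates."""
    s0 = a0 ^ b0                        # half adder: sum bit
    c0 = a0 & b0                        # half adder: carry
    s1 = a1 ^ b1 ^ c0                   # full adder: sum bit
    c1 = (a1 & b1) | (c0 & (a1 ^ b1))   # full adder: carry out
    return c1, s1, s0

def adder_arithmetic(a1, a0, b1, b0):
    """The same function implemented with integer arithmetic."""
    total = (2 * a1 + a0) + (2 * b1 + b0)
    return (total >> 2) & 1, (total >> 1) & 1, total & 1

# Identical input-output maps despite entirely different implementations.
assert all(
    adder_gate_level(*bits) == adder_arithmetic(*bits)
    for bits in product((0, 1), repeat=4)
)
```

What identifies each as an adder is the logic of the mapping, not the substrate carrying it out.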
In a fully connected UAN, most interaction weights are spurious. An interaction weight is spurious if it does not performantly contribute to the network's robustness or functional output. Spurious interactions introduce unnecessary complexity and additional targets for perturbation, decreasing the network's overall efficiency and robustness. To contribute, an interaction weight must transmit an information-bearing signal. An interaction can be judged information-bearing when its removal decreases functional performance across a sufficiently rich distribution of contexts (inputs) in which the network operates. Interactions whose removal has no effect on function, or even improves performance, are spurious.
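Operationally, spuriousness is an ablation test. A minimal sketch, where the network methods (`edges`, `without`) and the `performance` function are hypothetical placeholders rather than a specific library:

```python
def classify_weights(network, inputs, performance, tolerance=0.0):
    """Ablate each interaction and measure the performance drop
    over a rich distribution of inputs."""
    baseline = performance(network, inputs)
    spurious, information_bearing = [], []
    for edge in list(network.edges()):
        drop = baseline - performance(network.without(edge), inputs)
        if drop > tolerance:
            information_bearing.append(edge)  # removal hurts: carries signal
        else:
            spurious.append(edge)             # no effect (or better): spurious
    return spurious, information_bearing
```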
Extreme pruning of Activator Networks reveals the necessary and sufficient circuit topology (the critical topology). Dense networks without the ability to evolve or prune their topology tend to accumulate many spurious connections, leading to inefficiency and overfitting. In artificial neural networks, this manifests as fitting noise rather than meaningful patterns, along with excessive computational resource requirements. Similarly, biological systems such as gene regulatory networks and cellular signaling pathways favor sparse, functionally critical interactions under evolutionary pressure, ensuring efficiency and robustness. This principle underscores the tradeoffs between sparsity and adaptability in maintaining the functionality and reliability of complex systems across diverse domains. By focusing on essential connections and eliminating unnecessary ones, we enhance the efficiency, generalizability, and interpretability of these networks.
For any functional Universal Activator Network, there exists a critical topology: a minimal network structure that retains full functionality and efficiency. By systematically removing spurious connections and nodes, a UAN can be reduced to its most fundamental functional form, which still performs the intended computations without losing performance across the input-output map. Extreme pruning not only reveals the critical topology but also enhances the UAN's overall efficiency and performance by eliminating redundancy, reducing noise, and improving resource usage. This critical topology represents the core architecture necessary for the network's operation, sheds light on the essential elements of network design, and helps us understand network function. The principle of extreme pruning to reveal the critical topology is universally applicable across different types of adaptive networks, including biological neural networks, artificial neural networks, and gene regulatory networks.
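One way to operationalize extreme pruning, continuing the hypothetical interface sketched above: greedily remove the interaction whose loss costs the least until any further removal breaks function; what remains approximates the critical topology:

```python
def prune_to_critical_topology(network, inputs, performance, min_perf):
    """Greedy extreme pruning: repeatedly drop the cheapest-to-lose
    edge while the network still meets min_perf."""
    while network.edges():
        candidates = [(performance(network.without(e), inputs), e)
                      for e in network.edges()]
        best_perf, best_edge = max(candidates, key=lambda pe: pe[0])
        if best_perf < min_perf:
            break  # every remaining edge is now essential
        network = network.without(best_edge)
    return network
```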
Allowed to evolve its connectivity, a UAN will prune its topology to the necessary and sufficient network (the critical topology) given the functional context and implementation constraints. This pruning enhances performance by reducing overfitting, improving resource efficiency, and ensuring robustness. However, retaining additional complexity may also aid learning and adaptation, as we see in the plasticity of biological neural networks. Whenever a network maintains a topology that exceeds its expected critical density, this implies the presence of hidden constraints or adaptive opportunities; the requirement is simply benefits > costs. These might include implementation constraints, evolutionary pressures, environmental adaptations, or untapped functional capabilities that warrant further exploration. This principle underscores the importance of optimizing network connectivity to balance efficiency and adaptability across various domains.
For example, while networks at their critical topology reduce coupling and therefore increase robustness, they can also be catastrophically sensitive to perturbations, since each interaction weight plays an essential role. Sparse networks are also less evolvable than dense ones, which helps explain how very large language models avoid getting stuck in local optima during gradient descent: with billions of degrees of freedom, there is always a way down the valley.
Final Note on Conjectures
These conjectures make specific claims about the computational nature of various biological and artificial systems. As with any scientific hypothesis, observational data is expected to sometimes contradict these claims. However, this does not necessarily mean the hypothesis is wrong; instead, it often indicates the presence of confounding factors that were not initially considered.
A historical example is the discovery of Neptune. The observed orbit of Uranus did not match predictions based on Newtonian mechanics, suggesting that the null hypothesis—that the known planets alone account for Uranus's orbit—was incorrect. The deviation led to the prediction and subsequent discovery of Neptune, an unknown planet, acting as a confounding variable.
In the context of UANs, encountering data that contradicts our conjectures can catalyze a deeper and more pointed investigation. For instance, if a gene regulatory network in a model organism exhibits unexpected behavior, this may not imply that the fundamental principles of UANs are flawed. Rather, it suggests additional layers of complexity, unknown interactions, or constraints at play. Identifying and accounting for confounding variables can lead to important scientific discoveries and a more nuanced understanding of the systems under study.
Conclusion
A General Theory of Neural Networks unifies diverse but familial systems, from gene regulatory networks to artificial neural networks. However, the implications are not yet fully understood. While I’ve identified common principles like critical topology and the importance of network structure over implementation details, we're still in the early stages of exploring how these insights can be applied across disciplines.
The next crucial step is rigorous empirical testing of the UAN theory's predictions in various domains to identify universal features. This will help determine whether UANs represent a fundamental principle of information processing in nature, or if they are a useful but limited analogy.
Appendix
Below is a mapping of various networks, showing their specific formulas, and then a normalized form for easier comparison.
Variable Descriptions:
From here we can generate a more universal equation that handles all the special cases:
Conceptually, we might try to characterize this as an integration of factors:
<NEXT STATE> = <INTRINSIC INTERACTIONS/STATE> + <NETWORK INTERACTIONS> + <ENVIRONMENTAL INTERACTIONS> + <PHYSICS>
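One possible symbolic rendering of that integration, offered as my formalization rather than a canonical form:

$$s_i(t+1) = \psi\!\left(\alpha\, s_i(t) + \sum_{j} w_{ij}\,\phi\!\left(s_j(t)\right) + e_i(t)\right)$$

where $s_i$ is the state of unit $i$, $\alpha$ weights intrinsic persistence, $w_{ij}$ are interaction weights, $e_i(t)$ is environmental input, and $\phi$ (inner) and $\psi$ (outer) are transformation functions; physics enters through the constraints on the admissible forms of each term.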
Variable Explanations
List of Transformation Functions: possible forms for the inner and outer activation functions
Whether the activation function occurs outside the sum or inside the sum has different implications (see the symbolic forms after this list):
Outside the sum (typical in artificial neural networks):
Applies the nonlinearity after aggregating all inputs
Often used in feedforward and recurrent neural networks
Allows for a linear combination of inputs before introducing nonlinearity
Inside the sum (common in biological models):
Applies the nonlinearity to each input individually before summing
Often seen in models of biological systems like gene regulatory networks
Can represent nonlinear interactions between individual components
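In symbols, using the inner and outer transformation functions $\phi$ and $\psi$:

$$\text{outside: } y_i = \psi\!\Big(\sum_j w_{ij}\,x_j\Big) \qquad\qquad \text{inside: } y_i = \sum_j \phi\!\big(w_{ij}\,x_j\big)$$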
Special Cases
Depending on the system, you might set one of the activation functions to the identity to deactivate it, adapting the universal form to specific types of networks. For example, setting the inner function to the identity recovers the standard artificial neural network form, while setting the outer function to the identity yields the summed per-input nonlinearity common in biological models.
[1] A brief note on nomenclature: bridging scientific disciplines is challenging due to variations in terminology. Terms often carry precise meanings unique to each field, leading to confusion. For clarity, I'll use 'Universal Activator' as the fundamental unit rather than 'Neuron,' which is loaded with biological interpretations and already conflated. The term 'Activator Network' will refer to a general class of networks, not a specific type. I ask readers to interpret these terms flexibly.
[2] See the companion article for details on this equation and a more in-depth introduction to major concepts in artificial neural networks.
[3] This excludes solid-state models, like the Ising model of ferromagnetism, whose weights do not change under any dynamics of learning, adaptation, or evolution. It also excludes, for now, more amorphous systems like ecological webs, which are dynamic networks but may be more analogous to a community of single-celled organisms than to a multicellular organism with open-ended evolutionary potential.