Artificial Intuition: The Improbable Deep Learning Revolution by Perez Carlos

Author:Perez, Carlos
Language: eng
Format: epub
Publisher: Intuition Machine Inc
Published: 2017-12-15T00:00:00+00:00


Figure 6.16 Dissipative Langevin equation [ZHA2]

The Bellman equation in Reinforcement Learning is a derivation of the Hamilton-Jacobi equation found in physics, which describes the evolution of a dynamical system:

Figure 6.17 Hamilton-Jacobi equation [ZHA2]
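The figures themselves do not survive extraction here. For orientation only, the standard textbook forms of the two equations being compared are, schematically (the exact notation in [ZHA2] may differ):

$$V(s) = \max_{a} \Big[\, r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \,\Big]$$

$$\frac{\partial V}{\partial t} + \min_{u} \Big[\, c(x,u) + \nabla V \cdot f(x,u) \,\Big] = 0$$

The first is the discrete-time Bellman optimality equation; the second is the Hamilton-Jacobi-Bellman equation, its continuous-time analogue, which is how the dynamical-systems connection arises.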

So it is established that there is a relationship between the equations of evolution in physical systems and those of Deep Learning networks. However, the methods of Deep Learning have their origins in optimization methods. Optimization methods arrive at convergence when a global extremum is discovered as a solution to the objective function. DL systems differ from classical optimization in that they are over-parameterized, and the goal is not optimization per se but another objective known as generalization. Generalization is itself a complicated subject; however, the ‘theory’ here is that SGD will arrive at a stable minimum and, as a consequence, generalization will be achieved. The open question, though, is: why does stochastic gradient descent (SGD) even converge?

Classic optimization will tell you that the high-dimensional spaces found in Deep Learning are problematic. Yet for Deep Learning practitioners, stochastic gradient descent works surprisingly well. This is unintuitive for many experts in the optimization field. High-dimensional problems are expected to be non-convex and therefore extremely hard to optimize. An extremely simple method like SGD is not expected to be effective in the high-complexity, high-dimensional spaces that deep learning networks inhabit.

Experimental evidence has shown that in high-dimensional spaces, a critical point encountered during training has a much higher probability of being a saddle point than a local minimum. A saddle point gives the optimization process many more opportunities to escape and move forward. This argument explains why large networks do not often appear to get stuck in a non-optimal state. I therefore propose that rather than thinking of Deep Learning from the conventional viewpoint of optimization, one should instead think of Deep Learning as a physical system residing in a non-equilibrium regime. This approach aligns much better with the experimental evidence. Furthermore, it aligns with another theme: that an approach to understanding complexity should be based on physical motivations and not abstract mathematical ones.
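The saddle-escape argument can be illustrated on a toy surface. The sketch below (my own illustrative example, not from the text) uses f(x, y) = x² − y², which has a saddle at the origin: deterministic gradient descent started exactly on the saddle's stable axis stays stuck there, while the noise in SGD-style updates pushes the iterate onto the unstable direction and lets it escape.

```python
import numpy as np

def grad(p):
    """Gradient of the toy saddle surface f(x, y) = x^2 - y^2."""
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def descend(p0, lr=0.1, steps=200, noise=0.0, seed=0):
    """Gradient descent with optional Gaussian gradient noise (SGD proxy)."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        g = grad(p) + noise * rng.standard_normal(2)
        p -= lr * g
    return p

start = [1.0, 0.0]                    # on the saddle's stable manifold
stuck = descend(start, noise=0.0)     # deterministic GD: converges to the saddle
escaped = descend(start, noise=0.01)  # noisy updates: |y| grows, the iterate escapes

print(stuck)    # remains near the saddle point (0, 0)
print(escaped)  # y-coordinate has moved far down the unstable direction
```

The point of the sketch is that the noise does no harm along the stable directions but is amplified along the unstable ones, which is exactly why saddle points are opportunities rather than traps for SGD.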

England’s phenomenon of Dissipative Adaptation is a mechanism found in dynamical systems that may explain how and why deep learning systems converge into stable attractor basins.

Dissipative Adaptation [PRN] provides an explanation of why self-replicating structures arise in physical systems. Dissipative Adaptation describes the dynamics of a system in contact with a thermal reservoir while an external energy source also acts on the system. In such a system, different configurations are not equally able to absorb energy from that external source. Absorbing energy from the external source allows a configuration to traverse activation barriers too high to jump rapidly by thermal fluctuations alone. If energy is dissipated after a jump, then that energy is no longer available for the system to reversibly jump back to where it came from. Even though any given change in the system’s configuration is random, the most likely configuration (as a consequence of


