Ai 4 All
A PREPRINT
Vincent Boucher∗
MONTRÉAL.AI
Montreal, Quebec, Canada
[email protected]
November 5, 2020
ABSTRACT
For the purpose of entrusting all sentient beings with powerful AI tools to learn, deploy and scale AI
in order to enhance their prosperity, to settle planetary-scale problems and to inspire those who, with
AI, will shape the 21st Century, MONTRÉAL.AI introduces this VIP AI 101 CheatSheet for All.
1 AI-First
TODAY’S ARTIFICIAL INTELLIGENCE IS POWERFUL AND ACCESSIBLE TO ALL. AI can transform industries
and open up a world of new possibilities. What’s important is what you do with AI and how you
embrace it. To gain the advantages of AI-First innovation, start by exploring ways of applying AI that no one has thought of before.
The Emerging Rules of the AI-First Era: Search and Learning.
"Search and learning are general purpose methods that continue to scale with increased computation, even as the
available computation becomes very great." — Richard Sutton in The Bitter Lesson
The Best Way Forward For AI2 .
"... so far as I’m concerned, system 1 certainly knows language, understands language... system 2... it does involve
certain manipulation of symbols... Gary Marcus ... Gary proposes something that seems very natural... a hybrid
architecture... I’m influenced by him... if you look introspectively at the way the mind works... you’d get to that
distinction between implicit and explicit... explicit looks like symbols." — Nobel Laureate Danny Kahneman at
AAAI-20 Fireside Chat with Daniel Kahneman https://vimeo.com/390814190
In The Next Decade in AI 3 , Gary Marcus proposes a hybrid, knowledge-driven, reasoning-based approach, centered
around cognitive models, that could provide the substrate for a richer, more robust AI than is currently possible.
∗
Founding Chairman at MONTRÉAL.AI http://www.montreal.ai and QUÉBEC.AI http://www.quebec.ai.
2
https://montrealartificialintelligence.com/aidebate/
3
https://arxiv.org/abs/2002.06177v3
A PREPRINT - NOVEMBER 5, 2020
2 Getting Started
Tinker with neural networks in the browser with TensorFlow Playground http://playground.tensorflow.org/.
• AI Paygrades https://aipaygrad.es/.
• CS231n Python Tutorial With Google Colab4 .
• Learn with Google AI https://ai.google/education/.
• Made With ML Topics https://madewithml.com/topics/.
• One Place for Everything AI https://aihub.cloud.google.com/.
• Deep Learning Drizzle https://deep-learning-drizzle.github.io.
• Google Dataset Search (Blog5 ) https://datasetsearch.research.google.com.
• AI Literacy for K-12 School Children https://aieducation.mit.edu/resources.
• Learning resources from DeepMind https://deepmind.com/learning-resources.
• Papers With Code (Learn Python 3 in Y minutes6 ) https://paperswithcode.com/state-of-the-art.
"Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets and
find links to where the data is." — Natasha Noy
JupyterLab is an interactive development environment for working with notebooks, code and data 11 .
3 Deep Learning
Learning according to Mitchell (1997):
"A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with experience E." — Tom Mitchell
After the historic AI Debate12 : "Yoshua Bengio and Gary Marcus on the Best Way Forward for AI" https://montrealartificialintelligence.com/aidebate/, there have been clarifications on the term "deep learning"13 .
"Deep learning is inspired by neural networks of the brain to build learning machines which discover rich and useful
internal representations, computed as a composition of learned features and functions." — Yoshua Bengio
"DL is constructing networks of parameterized functional modules and training them from examples using
gradient-based optimization." — Yann LeCun
"... replace symbols by vectors and logic by continuous (or differentiable) functions." — Yann LeCun
Deep learning allows computational models that are composed of multiple processing layers to learn REPRESENTATIONS
of (raw) data with multiple levels of abstraction[2]. At a high level, neural networks are either encoders,
decoders, or a combination of both14 . Introductory course http://introtodeeplearning.com. See also Table 1.
Deep learning assumes that the data was generated by the composition of factors potentially at multiple levels in a
hierarchy15 . Deep learning (distributed representations + composition) is a general-purpose learning procedure.
"When you first study a field, it seems like you have to memorize a zillion things. You don’t. What you need is to identify
the 3-5 core principles that govern the field. The million things you thought you had to memorize are various
combinations of the core principles." — J. Reed
12
https://www.zdnet.com/article/devils-in-the-details-in-bengio-marcus-ai-debate/
13
https://www.zdnet.com/article/whats-in-a-name-the-deep-learning-debate/
14
https://github.com/lexfridman/mit-deep-learning
15
https://www.deeplearningbook.org
GitHub31 .
v A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active
and Open World Learning. Mundt et al., 202032 .
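The universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite
number of neurons can approximate any continuous function on a compact subset of R^n to arbitrary accuracy, provided the hidden layer has enough neurons (and the activation function meets mild conditions). The theorem guarantees that such a network exists; it does not say how to find its weights.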
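A constructive sketch helps make "enough neurons" concrete. It assumes ReLU hidden units and uses a piecewise-linear scheme, which is one of several possible constructions and not the theorem's proof. Hidden unit k computes relu(x − t_k) for a knot t_k, and the output weights are chosen so the network linearly interpolates the target at n + 1 evenly spaced knots. The names `build_hinge_net` and `max_error` are illustrative.

```python
import math


def relu(z):
    return max(0.0, z)


def build_hinge_net(f, a, b, n):
    """Build a 1-hidden-layer ReLU network with n hidden units that
    linearly interpolates f at n + 1 evenly spaced knots on [a, b]."""
    knots = [a + (b - a) * k / n for k in range(n + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[k + 1] - values[k]) / (knots[k + 1] - knots[k]) for k in range(n)]
    # Output weight of hidden unit k = change of slope at knot k.
    out_weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    hidden_biases = [-knots[k] for k in range(n)]  # unit k computes relu(x - knots[k])
    output_bias = values[0]

    def net(x):
        return output_bias + sum(w * relu(x + c) for w, c in zip(out_weights, hidden_biases))

    return net


def max_error(f, g, a, b, samples=1000):
    """Largest |f(x) - g(x)| over a uniform grid on [a, b]."""
    grid = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in grid)


coarse = build_hinge_net(math.sin, 0.0, math.pi, 5)
fine = build_hinge_net(math.sin, 0.0, math.pi, 50)
print(max_error(math.sin, coarse, 0.0, math.pi))
print(max_error(math.sin, fine, 0.0, math.pi))
```

Going from 5 to 50 hidden units shrinks the worst-case error on [0, π] by roughly two orders of magnitude, because linear interpolation error scales with the square of the knot spacing.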
Neural Networks + Gradient Descent + GPU33 :
• Infinitely flexible function: Neural Network (multiple hidden layers: Deep Learning)34 .
• All-purpose parameter fitting: Backpropagation3536 . Backpropagation is the key algorithm that makes training
deep models computationally tractable and highly efficient37 . The backpropagation procedure is nothing more
than a practical application of the chain rule for derivatives.
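The chain-rule view can be made concrete with a minimal sketch. It assumes a hypothetical two-weight network p = sigmoid(w2 · tanh(w1 · x)) with squared-error loss. The backward pass multiplies local derivatives from the loss back to each weight, and a central finite-difference check confirms the result. The network, loss and function names are illustrative assumptions, not taken from the cited references.

```python
import math


def forward(w1, w2, x):
    """Tiny network: hidden h = tanh(w1 * x), output p = sigmoid(w2 * h)."""
    h = math.tanh(w1 * x)
    z = w2 * h
    p = 1.0 / (1.0 + math.exp(-z))
    return h, p


def loss_and_grads(w1, w2, x, y):
    """Squared-error loss and its gradients, computed by the chain rule."""
    h, p = forward(w1, w2, x)
    loss = 0.5 * (p - y) ** 2
    dloss_dp = p - y                         # d/dp of 0.5 * (p - y)^2
    dloss_dz = dloss_dp * p * (1 - p)        # sigmoid'(z) = p * (1 - p)
    dloss_dw2 = dloss_dz * h                 # z = w2 * h
    dloss_dh = dloss_dz * w2
    dloss_dw1 = dloss_dh * (1 - h ** 2) * x  # tanh'(u) = 1 - tanh(u)^2, u = w1 * x
    return loss, dloss_dw1, dloss_dw2


def numerical_grad(f, w, eps=1e-6):
    """Central finite difference, used only to check the analytic gradient."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w1, w2, x, y = 0.5, -1.2, 0.8, 1.0
loss, g1, g2 = loss_and_grads(w1, w2, x, y)
n1 = numerical_grad(lambda w: loss_and_grads(w, w2, x, y)[0], w1)
n2 = numerical_grad(lambda w: loss_and_grads(w1, w, x, y)[0], w2)
print(g1, n1)
print(g2, n2)
```

Deep learning frameworks automate exactly this bookkeeping (reverse-mode automatic differentiation) for millions of weights.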
"You have relatively simple processing elements that are very loosely models of neurons. They have connections coming
in, each connection has a weight on it, and that weight can be changed through learning." — Geoffrey Hinton
Deep learning : connect a dataset, a model, a cost function and an optimization procedure.
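The four ingredients can be wired together in a few lines. This is a minimal sketch. It assumes a synthetic dataset drawn from the hypothetical target y = 3x + 0.5 plus Gaussian noise. The model is a line, the cost function is mean squared error, and the optimization procedure is plain full-batch gradient descent. A deep network would replace only `model` and `gradient`, with backpropagation computing the latter.

```python
import random


def make_dataset(n=100, seed=0):
    """Hypothetical dataset: y = 3x + 0.5 plus a little Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 0.5 + rng.gauss(0.0, 0.05)) for x in xs]


def model(params, x):
    """Model: a line with slope w and intercept b."""
    w, b = params
    return w * x + b


def cost(params, data):
    """Cost function: mean squared error over the dataset."""
    return sum((model(params, x) - y) ** 2 for x, y in data) / len(data)


def gradient(params, data):
    """Analytic gradient of the mean squared error with respect to (w, b)."""
    n = len(data)
    residuals = [(model(params, x) - y, x) for x, y in data]
    dw = sum(2.0 * r * x for r, x in residuals) / n
    db = sum(2.0 * r for r, _ in residuals) / n
    return dw, db


def train(data, lr=0.1, steps=500):
    """Optimization procedure: full-batch gradient descent from (0, 0)."""
    params = (0.0, 0.0)
    for _ in range(steps):
        dw, db = gradient(params, data)
        params = (params[0] - lr * dw, params[1] - lr * db)
    return params


data = make_dataset()
params = train(data)
print(params, cost(params, data))
```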
"Deep learning has fully solved the curse of dimensionality. It vanished like an RNN gradient!" — Ilya Sutskever
When a choice must be made, just feed the (raw) data to a deep neural network (Universal function approximators).
"Virtually all modern observers would concede that genes and experience work together; it is “nature and nurture”,
not “nature versus nurture”. No nativist, for instance, would doubt that we are also born with specific biological
machinery that allows us to learn. Chomsky’s Language Acquisition Device should be viewed precisely as an innate
learning mechanism, and nativists such as Pinker, Peter Marler (Marler, 2004) and myself (Marcus, 2004) have
frequently argued for a view in which a significant part of a creature’s innate armamentarium consists not of specific
knowledge but of learning mechanisms, a form of innateness that enables learning." — Gary Marcus, Innateness,
AlphaZero, and Artificial Intelligence39
The deep convolutional network, inspired by Hubel and Wiesel’s seminal work on early visual cortex, uses hierarchical
layers of tiled convolutional filters to mimic the effects of receptive fields, thereby exploiting the local spatial correlations
present in images[1]. See Figure 4. Demo https://ml4a.github.io/demos/convolution/.
A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an input 3D volume to an output 3D
volume with some differentiable function that may or may not have parameters40 . Reading41 .
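Here is a minimal sketch of one such layer. It assumes a volume stored as nested lists indexed [row][column][channel] and K filters of shape F × F × C, with stride 1 by default and no padding. Like most deep learning libraries, it computes cross-correlation (the filter is not flipped). Each filter produces one depth slice of the output, so an H × W × C input becomes a ((H − F) / stride + 1) × ((W − F) / stride + 1) × K output. The function name `conv_layer` is illustrative.

```python
def conv_layer(volume, filters, biases, stride=1):
    """Map an H x W x C volume to an H' x W' x K volume.

    volume[i][j][c]      : input value at row i, column j, channel c
    filters[k][di][dj][c]: weight of filter k (each filter is F x F x C)
    biases[k]            : bias of filter k
    """
    height, width, channels = len(volume), len(volume[0]), len(volume[0][0])
    size = len(filters[0])  # F, the spatial extent of every filter
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    out = [[[0.0] * len(filters) for _ in range(out_w)] for _ in range(out_h)]
    for k, (filt, bias) in enumerate(zip(filters, biases)):
        for i in range(out_h):
            for j in range(out_w):
                total = bias
                # The same filter weights are slid (tiled) over every position.
                for di in range(size):
                    for dj in range(size):
                        for c in range(channels):
                            total += filt[di][dj][c] * volume[i * stride + di][j * stride + dj][c]
                out[i][j][k] = total
    return out


volume = [[[1.0] for _ in range(4)] for _ in range(4)]  # 4 x 4 x 1 input
ones = [[[1.0] for _ in range(3)] for _ in range(3)]     # 3 x 3 x 1 filter
zeros = [[[0.0] for _ in range(3)] for _ in range(3)]
out = conv_layer(volume, [ones, zeros], [0.0, 1.0])      # 2 x 2 x 2 output
print(len(out), len(out[0]), len(out[0][0]), out[0][0])
```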
In images, local combinations of edges form motifs, motifs assemble into parts, and parts form objects4243 .
Representation learning : the language of neural networks. The visual vocabulary of a convolutional neural network
seems to emerge from low-level features such as edges and orientations, builds up textures, patterns and composites,
and then builds up even further into complete objects. This relates to Wittgenstein’s "language-game" in Philosophical
Investigations44 , where a functional language emerges from simple tasks before a vocabulary is defined45 .
"DL is essentially a new style of programming – "differentiable programming" – and the field is trying to work out the
reusable constructs in this style. We have some: convolution, pooling, LSTM, GAN, VAE, memory units, routing units,
etc." — Thomas G. Dietterich
39
https://arxiv.org/abs/1801.05667
40
http://cs231n.github.io/convolutional-networks/
41
https://ml4a.github.io/ml4a/convnets/
42
http://yosinski.com/deepvis
43
https://distill.pub/2017/feature-visualization/
44
https://en.wikipedia.org/wiki/Philosophical_Investigations
45
https://media.neurips.cc/Conferences/NIPS2018/Slides/Deep_Unsupervised_Learning.pdf
Recurrent neural networks are networks with loops in them, allowing information to persist52 . RNNs process an input
sequence one element at a time, maintaining in their hidden units a ‘state vector’ that implicitly contains information
about the history of all the past elements of the sequence[2]. This makes them suited to sequential inputs. See Figure 5.
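(Diagram: an RNN cell A with input xt and hidden state ht, unrolled through time into a chain of copies of A that map x0, x1, x2, x3, ..., xt to h0, h1, h2, h3, ..., ht.)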
Figure 6: Google Smart Reply System is built on a pair of recurrent neural networks. Diagram by Chris Olah
"I feel like a significant percentage of Deep Learning breakthroughs ask the question “how can I reuse weights in
multiple places?” – Recurrent (LSTM) layers reuse for multiple timesteps – Convolutional layers reuse in multiple
locations. – Capsules reuse across orientation." — Andrew Trask
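Here is a minimal sketch of the recurrence behind the unrolled diagram. It assumes a vanilla (Elman-style) cell h_t = tanh(W_xh x_t + W_hh h_(t−1) + b), with hypothetical weight names and values. The same three parameters are applied at every time step, which is the weight reuse across timesteps described in the quote. The final state depends on the whole sequence, not only on the last input.

```python
import math


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rnn_forward(xs, w_xh, w_hh, b_h, h0):
    """Run a vanilla RNN over the sequence xs and return every hidden state.

    The same weights (w_xh, w_hh, b_h) are reused at every time step."""
    h = h0
    states = []
    for x in xs:
        pre = [u + r + b for u, r, b in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


w_xh = [[0.5], [-0.3]]            # hidden size 2, input size 1
w_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.1]
seq_a = [[1.0], [0.0], [0.0]]     # the two sequences differ only in their first element
seq_b = [[-1.0], [0.0], [0.0]]
states_a = rnn_forward(seq_a, w_xh, w_hh, b_h, [0.0, 0.0])
states_b = rnn_forward(seq_b, w_xh, w_hh, b_h, [0.0, 0.0])
print(states_a[-1], states_b[-1])
```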
46
https://colab.research.google.com/drive/1umJnCp8tZ7UDTYSQsuWdKRhqbHts38AC
47
https://www.youtube.com/playlist?list=PLzUTmXVwsnXod6WNdg57Yc3zFx_f-RYsq
48
https://atcold.github.io/pytorch-Deep-Learning/en/week13/13-3/
49
https://arxiv.org/abs/2001.02890
50
https://arxiv.org/abs/2004.15004
51
http://poloclub.github.io/cnn-explainer/
52
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
3.4 Transformers
Transformers are generic, simple and exciting machine learning architectures designed to process a connected set of
units (tokens in a sequence, pixels in an image, etc.) where the only interaction between units is through self-attention.
The performance limit of transformers seems to lie purely in the hardware (how big a model can be fitted in GPU memory)56 .
The fundamental operation of transformers is self-attention (a sequence-to-sequence operation, Figure 8): an attention
mechanism relating different positions of a single sequence in order to compute a representation of the same sequence57 .
Let’s call the input vectors (of dimension k) :
$$x_1, x_2, \ldots, x_t \tag{1}$$
Let’s call the corresponding output vectors (of dimension k) :
$$y_1, y_2, \ldots, y_t \tag{2}$$
The self-attention operation takes a weighted average over all the input vectors :
$$y_i = \sum_{j} w_{ij}\, x_j \tag{3}$$
The weight $w_{ij}$ is derived from a function over $x_i$ and $x_j$. The simplest option is the dot product (with softmax) :
$$w_{ij} = \frac{e^{x_i^{\top} x_j}}{\sum_{j'} e^{x_i^{\top} x_{j'}}} \tag{4}$$
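Here is a minimal pure-Python sketch of equations (3) and (4), assuming inputs are given as lists of floats. This is the basic, parameter-free self-attention described above. Full transformer layers also apply learned query, key and value projections and scale the dot products by 1/√k, which this sketch omits. Because every output is a weighted average over all inputs, permuting the inputs simply permutes the outputs.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_weights(xs):
    """w[i][j] = exp(x_i . x_j) / sum_j' exp(x_i . x_j'), equation (4)."""
    return [softmax([dot(xi, xj) for xj in xs]) for xi in xs]


def self_attention(xs):
    """y_i = sum_j w_ij x_j, equation (3). Inputs and outputs have dimension k."""
    k = len(xs[0])
    weights = attention_weights(xs)
    return [[sum(w * xj[d] for w, xj in zip(row, xs)) for d in range(k)] for row in weights]


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = self_attention(xs)
print(ys)
```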
53
https://www.youtube.com/playlist?list=PLU40WL8Ol94IJzQtileLTqGZuXtGlLMP_
54
https://www.bioinf.jku.at/publications/older/2604.pdf
55
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
56
http://www.peterbloem.nl/blog/transformers
57
https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
"I think transfer learning is the key to general intelligence. And I think the key to doing transfer learning will be the
acquisition of conceptual knowledge that is abstracted away from perceptual details of where you learned it from." —
Demis Hassabis
"Give a robot a label and you feed it for a second; teach a robot to label and you feed it for a lifetime." — Pierre
Sermanet
Unsupervised learning is a paradigm for creating AI that learns without a particular task in mind: learning for the
sake of learning73 . It captures some characteristics of the joint distribution of the observed random variables (learning the
underlying structure). Typical tasks include density estimation, dimensionality reduction, and clustering [4]74 .
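Clustering is the easiest of these tasks to sketch. The snippet below is a minimal k-means (Lloyd's algorithm), assuming points are tuples of numbers. It uses no labels, only the structure of the data. The function names and the toy data are illustrative.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, 2)
print(centroids, labels)
```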
Self-supervised learning is derived from unsupervised learning: the data itself provides the supervision. An example is
Word2vec75 , a technique for learning vector representations of words, or word embeddings. An embedding is a
mapping from discrete objects, such as words, to vectors of real numbers76 .
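Here is a minimal sketch of how the data supervises itself in a Word2vec-style skip-gram setup. It assumes whitespace tokenization and a context window of one word on each side. Every (center word, neighbouring word) pair becomes a training example with no human labelling. The embedding is then just a lookup table from each word to a vector of reals. In this sketch the table is randomly initialized, and the training step that would adjust it is omitted. The function names are illustrative.

```python
import random


def skipgram_pairs(tokens, window=1):
    """Turn raw text into (center, context) training pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def init_embeddings(vocab, dim, seed=0):
    """An embedding table: one real-valued vector per discrete word."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for word in sorted(vocab)}


tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens)
embeddings = init_embeddings(set(tokens), dim=4)
print(pairs[:3])
print(embeddings["cat"])
```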
Figure 10: A Simple Framework for Contrastive Learning of Visual Representations, Chen et al., 2020
"Self-supervised learning is a method for attacking unsupervised learning problems by using the mechanisms of
supervised learning." — Thomas G. Dietterich
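$$\min_{\theta_g} \max_{\theta_d} \left[ \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D_{\theta_d}(x) \right] + \mathbb{E}_{z \sim p_z(z)} \left[ \log \left( 1 - D_{\theta_d}(G_{\theta_g}(z)) \right) \right] \right] \tag{5}$$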
Goodfellow et al. used an analogy in which the generative model is like a team of counterfeiters, trying to produce
fake currency and use it without detection, while the discriminative model is like the police, trying to detect the
counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are
indistinguishable from the genuine articles. See Figure 11.
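Here is a minimal sketch of the quantity inside equation (5). It assumes the expectations are replaced by sample averages and the discriminator is any function returning a probability in (0, 1). The sample values and the two discriminators are hypothetical. A discriminator that cannot tell real from fake (outputting 1/2 everywhere) gives a value of −log 4. Goodfellow et al. show this is the value of the game at its global optimum, where the generator's distribution matches the data. A discriminator that separates the two scores higher, which is what pushes the counterfeiters to improve.

```python
import math


def gan_value(discriminator, real_samples, fake_samples):
    """Monte Carlo estimate of the bracketed objective in equation (5).

    real_samples: draws x ~ p_data; fake_samples: generator outputs G(z), z ~ p_z.
    The discriminator maximizes this value; the generator minimizes it."""
    real_term = sum(math.log(discriminator(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - discriminator(g)) for g in fake_samples) / len(fake_samples)
    return real_term + fake_term


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


real = [1.0, 1.2, 0.9]     # hypothetical real samples
fake = [-1.0, -1.1, -0.8]  # hypothetical generator outputs

undecided = gan_value(lambda x: 0.5, real, fake)           # D cannot tell them apart
sharp = gan_value(lambda x: sigmoid(5.0 * x), real, fake)  # D separates real from fake
print(undecided, -math.log(4.0))
print(sharp)
```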
77
https://paperswithcode.com/task/self-supervised-image-classification
78
https://www.fast.ai/2020/01/13/self_supervised/
79
https://arxiv.org/abs/2010.00578
80
https://arxiv.org/abs/1911.05722
81
https://arxiv.org/abs/1905.09272
82
https://arxiv.org/abs/2002.05709
83
https://arxiv.org/abs/2001.07685
84
https://arxiv.org/abs/2010.07432
85
https://arxiv.org/abs/1912.01991
Figure 11: GAN: Neural Networks Architecture Pioneered by Ian Goodfellow at University of Montreal (2014).
86
https://medium.com/@kcimc/how-to-recognize-fake-ai-generated-images-4d1f6f9a2842
87
https://colab.research.google.com/drive/1CQ2XTMoUB7b9i9USUh4kp8BoCag1z-en
88
https://medium.com/tensorflow/introducing-tf-gan-a-lightweight-gan-library-for-tensorflow-2-0-36d767e1abae
89
https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f
90
https://arxiv.org/abs/1905.08233
91
https://arxiv.org/abs/2003.03581
92
https://github.com/EvgenyKashin/stylegan2-distillation
93
https://arxiv.org/abs/2001.06937
3.5.3 Capsule
Stacked Capsule Autoencoders. The inductive biases in this unsupervised version of capsule networks give rise to
object-centric latent representations, which are learned in a self-supervised way—simply by reconstructing input images.
Clustering learned representations is enough to achieve unsupervised state-of-the-art classification performance on
MNIST (98.5%). Reference: blog by Adam Kosiorek.97 Code98 .
Capsules learn equivariant object representations (applying any transformation to the input of the function has the same
effect as applying that transformation to the output of the function).
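The equivariance property can be checked numerically. This minimal sketch assumes 1D signals, a circular (wrap-around) convolution as the function and cyclic shifts as the transformation. It illustrates the definition with translation equivariance of convolutions, a simpler case than the pose equivariance that capsules aim for.

```python
def circular_conv(signal, kernel):
    """1D convolution (cross-correlation) with wrap-around at the edges."""
    n = len(signal)
    return [sum(w * signal[(i + m) % n] for m, w in enumerate(kernel)) for i in range(n)]


def shift(seq, s):
    """Cyclically shift a sequence s positions to the right."""
    n = len(seq)
    return [seq[(i - s) % n] for i in range(n)]


signal = [0, 1, 3, 2, 0, 0, 5, 1]
kernel = [1, -2, 1]
lhs = circular_conv(shift(signal, 3), kernel)  # transform the input, then apply f
rhs = shift(circular_conv(signal, kernel), 3)  # apply f, then transform the output
print(lhs == rhs)  # True
```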
Figure 13: Stacked Capsule Autoencoders. Image source: Blog by Adam Kosiorek.
4 Autonomous Agents
We are at the dawn of the Age of Artificial Intelligence.
An autonomous agent is any device that perceives its environment and takes actions that maximize its chance of
success at some goal. At the bleeding edge of AI, autonomous agents can learn from experience, simulate worlds and
orchestrate meta-solutions. Here’s an informal definition99 of the universal intelligence of agent π 100 :
$$\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} V_{\mu}^{\pi} \tag{6}$$
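K is not computable, so equation (6) cannot be evaluated exactly. The sketch below only illustrates its weighting. It assumes a finite, made-up list of environments with hand-assigned complexities (in bits) and values in [0, 1]. Because each environment is weighted by 2^−K(µ), success in simple environments counts far more than success in complex ones.

```python
def universal_intelligence(scores):
    """Weighted sum from equation (6): sum over environments of 2^-K(mu) * V_mu^pi.

    scores: list of (k_mu, v_mu) pairs, where k_mu is a stand-in for the
    Kolmogorov complexity K(mu) in bits and v_mu for the agent's value V_mu^pi."""
    return sum(2.0 ** (-k_mu) * v_mu for k_mu, v_mu in scores)


# Hypothetical numbers: one simple environment (3 bits) and one complex one (10 bits).
agent_a = [(3, 1.0), (10, 0.0)]  # masters the simple environment only
agent_b = [(3, 0.0), (10, 1.0)]  # masters the complex environment only
print(universal_intelligence(agent_a), universal_intelligence(agent_b))
```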
"Intelligence measures an agent’s ability to achieve goals in a wide range of environments." — Legg and Hutter, 2007
Reinforcement learning (RL) studies how an agent can learn to achieve goals in a complex, uncertain environment
(Figure 14) [5]. Recent superhuman results in many difficult environments combine deep learning with RL (Deep
Reinforcement Learning). See Figure 15 for a taxonomy of RL algorithms.
ú Spinning Up in Deep RL - Proximal Policy Optimization (PPO), Colab Notebook101 .
v RL Tutorial, Behbahani et al.102 .
v An Opinionated Guide to ML Research103 .
v CS 188 : Introduction to Artificial Intelligence104 .
v Introduction to Reinforcement Learning by DeepMind105 .
v Isaac Gym https://developer.nvidia.com/isaac-gym.
v Discovering Reinforcement Learning Algorithms, Oh et al.106 .
v The NetHack Learning Environment, Küttler et al.107 GitHub108 .
v "My Top 10 Deep RL Papers of 2019" by Robert Tjarko Lange109 .
v Behavior Priors for Efficient Reinforcement Learning, Tirumala et al.110 .
v Deep tic-tac-toe https://zackakil.github.io/deep-tic-tac-toe/.
99
https://arxiv.org/abs/0712.3329
100
Where µ is an environment, K is the Kolmogorov complexity function, E is the space of all computable reward summable
environmental measures with respect to the reference machine U and the value function Vµπ is the agent’s “ability to achieve”.
101
https://colab.research.google.com/drive/1piaU0x7nawRpSLKOTaCEdUG0KAR2OXku
102
https://github.com/eemlcommunity/PracticalSessions2020/blob/master/rl/EEML2020_RL_Tutorial.ipynb
103
http://joschu.net/blog/opinionated-guide-ml-research.html
104
https://inst.eecs.berkeley.edu/~cs188/fa18/
105
https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
106
https://arxiv.org/abs/2007.08794
107
https://arxiv.org/abs/2006.13760
108
https://github.com/facebookresearch/nle
109
https://roberttlange.github.io/posts/2019/12/blog-post-9/
110
https://arxiv.org/pdf/2010.14274.pdf
Figure 15: A Taxonomy of RL Algorithms. Source: Spinning Up in Deep RL by Achiam et al. | OpenAI
• Q-Learning: Playing Atari with Deep Reinforcement Learning (DQN). Mnih et al, 2013[10]. See Figure 17.
v Q-Learning in enormous action spaces via amortized approximate maximization, Van de Wiele et al.126 .
v TF-Agents (DQN Tutorial) | Colab https://colab.research.google.com/github/tensorflow/agents.
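Here is a minimal tabular sketch of the Q-learning update that DQN scales up. It assumes a hypothetical five-state chain where moving right from state 3 into state 4 ends the episode with reward 1. DQN replaces the table q[s][a] with a deep network. This sketch keeps the table and uses ε-greedy exploration with random tie-breaking. All names and parameter values are illustrative.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left (state 0 is a wall), 1 = move right


def step(state, action):
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])  # random tie-breaking


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # epsilon-greedy behaviour policy
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q[s], rng)
            nxt, r, done = step(s, a)
            # Q-learning target: reward plus discounted value of the best next action
            target = r if done else r + gamma * max(q[nxt])
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
            if done:
                break
    return q


q = q_learning()
print([max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)])
```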
An RL agent can learn a stochastic policy function that maps states to actions, and act by sampling from that policy.
126
https://arxiv.org/abs/2001.08116
$$\tau = (s_0, a_0, r_0, s_1, a_1, r_1, \ldots, s_{T-1}, a_{T-1}, r_{T-1}, s_T) \tag{10}$$
Increase probability of actions that lead to high rewards and decrease probability of actions that lead to low rewards:
$$\nabla_\theta \mathbb{E}_\tau[R(\tau)] = \mathbb{E}_\tau \left[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi(a_t \mid s_t, \theta)\, R(\tau) \right] \tag{11}$$
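Here is a minimal sketch of equation (11) for the simplest case. It assumes one-step episodes (T = 1, a multi-armed bandit with hypothetical deterministic rewards) and a softmax policy with one logit per action. For a softmax, ∇θ log π(a) has components 1[k = a] − π(k), so each sampled action becomes more likely in proportion to its reward. This is the vanilla estimator with no baseline, chosen for clarity rather than low variance.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Vanilla policy gradient (REINFORCE) on one-step episodes."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters: one logit per action
    for _ in range(steps):
        probs = softmax(theta)
        # Act by sampling from the stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = rewards[action]  # R(tau) for a one-step episode
        # theta += lr * grad log pi(action) * R(tau)
        for k in range(len(theta)):
            indicator = 1.0 if k == action else 0.0
            theta[k] += lr * (indicator - probs[k]) * ret
    return softmax(theta)


final_probs = reinforce_bandit([1.0, 0.0, 0.2])
print(final_probs)
```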
(Diagram: from a state s, the network outputs action probabilities πθ(s, α1), ..., πθ(s, α5) and a value estimate Vψ(s).)
• Policy Optimization: Asynchronous Methods for Deep Reinforcement Learning (A3C). Mnih et al, 2016[8].
• Policy Optimization: Proximal Policy Optimization Algorithms (PPO). Schulman et al, 2017[9].
4.1.3 Model-Based RL
In Model-Based RL, the agent generates predictions about the next state and reward before choosing each action.
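Here is a minimal sketch of that idea, assuming the model is already available as a function that predicts (next state, reward, done) for a hypothetical five-state chain. The agent never steps in a real environment here. It imagines every action sequence up to a fixed depth and picks the first action of the best one. Exhaustive search grows exponentially with depth, which is one reason methods such as AlphaZero use Monte Carlo tree search instead.

```python
def chain_model(state, action):
    """Hypothetical known model of a 5-state chain: predicts (next_state, reward, done).

    Action 0 moves left (state 0 is a wall), action 1 moves right;
    reaching state 4 ends the episode with reward 1."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == 4
    return nxt, (1.0 if done else 0.0), done


def plan(state, model, actions, depth, gamma=0.9):
    """Exhaustive depth-limited lookahead: imagine every action sequence with
    the model and return (best discounted return, first action to take)."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in actions:
        nxt, reward, done = model(state, action)  # a prediction, not a real step
        value = reward
        if not done:
            value += gamma * plan(nxt, model, actions, depth - 1, gamma)[0]
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


print(plan(0, chain_model, (0, 1), depth=4))
```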
127
https://arxiv.org/abs/2009.04416
128
https://github.com/openai/phasic-policy-gradient
129
https://arxiv.org/abs/1805.02070
130
https://arxiv.org/abs/1909.01500
Figure 20: World Model’s Agent consists of: Vision (V), Memory (M), and Controller (C). | Ha et al, 2018[11]
• Learn the Model: Recurrent World Models Facilitate Policy Evolution (World Models131 ). The world model
agent can be trained in an unsupervised manner to learn a compressed spatial and temporal representation of
the environment. Then, a compact policy can be trained. See Figure 20. Ha et al, 2018[11].
• Learn the Model: Learning Latent Dynamics for Planning from Pixels https://planetrl.github.io/.
• Given the Model: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
(AlphaZero). Silver et al, 2017[14]. AlphaGo Zero Explained In One Diagram132 .
v Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Schrittwieser et al.133 . Pseudocode134 .
SuperDyna.135 The ambition: a general AI agent for Artificial Biological Reinforcement Learning.
1. Interact with the world: sense, update state and take an action
2. Learn from what just happened: see what happened and learn from it
3. Plan: (while there is time remaining in this time step) imagine hypothetical states and actions you might take
4. Discover : curate options and features and measure how well they’re doing
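SuperDyna builds on Sutton's earlier Dyna architecture. As a hedged illustration of steps 1 to 3 only, here is a minimal tabular Dyna-Q sketch (the classic Dyna-Q, not SuperDyna itself). Each real transition updates the value table directly and is stored in a learned model. It is then followed by several planning updates on transitions replayed from that model. Step 4 (discovering options and features) is not modeled. The chain environment and all parameter values are hypothetical.

```python
import random


def chain_step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, 0 moves left; state 4 is terminal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == 4
    return nxt, (1.0 if done else 0.0), done


def dyna_q(env_step, n_states=5, actions=(0, 1), episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # learned model: (state, action) -> (next_state, reward, done)

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def q_update(s, a, nxt, r, done):
        target = r if done else r + gamma * max(q[(nxt, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # 1. Interact with the world: sense the state and take an action.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            nxt, r, done = env_step(s, a)
            # 2. Learn from what just happened: update values and the model.
            q_update(s, a, nxt, r, done)
            model[(s, a)] = (nxt, r, done)
            # 3. Plan: replay imagined transitions drawn from the learned model.
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                q_update(ps, pa, *model[(ps, pa)])
            s = nxt
            if done:
                break
    return q


q = dyna_q(chain_step)
print([max((0, 1), key=lambda a: q[(s, a)]) for s in range(4)])
```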
The first complete and scalable general AI-agent architecture that has all the most important capabilities and desiderata:
"In practice, I work primarily in reinforcement learning as an approach to artificial intelligence. I am exploring ways to
represent a broad range of human knowledge in an empirical form–that is, in a form directly in terms of experience–and
in ways of reducing the dependence on manual encoding of world state and knowledge." — Richard S. Sutton
131
https://worldmodels.github.io
132
https://applied-data.science/static/main/res/alpha_go_zero_cheat_sheet.png
133
https://arxiv.org/abs/1911.08265
134
https://arxiv.org/src/1911.08265v2/anc/pseudocode.py
135
https://insidehpc.com/2020/02/video-toward-a-general-ai-agent-architecture/
136
https://slideslive.com/38921889/biological-and-artificial-reinforcement-learning-4
"The future of high-level APIs for AI is... a problem-specification API. Currently we only search over network weights,
thus "problem specification" involves specifying a model architecture. In the future, it will just be: "tell me what data
you have and what you are optimizing"." — François Chollet
Figure 23: A comparison of the original LSTM cell vs. two good newly generated cells. Top left: LSTM cell. [19]
In her Nobel Prize in Chemistry 2018 Lecture "Innovation by Evolution: Bringing New Chemistry to Life" (Nobel
Lecture)†144 , Prof. Frances H. Arnold said :
"Nature ... invented life that has flourished for billions of years. (...) Equally awe-inspiring is the process by which
Nature created these enzyme catalysts and in fact everything else in the biological world. The process is evolution, the
grand diversity-generating machine that created all life on earth, starting more than three billion years ago. (...)
evolution executes a simple algorithm of diversification and natural selection, an algorithm that works at all levels
of complexity from single protein molecules to whole ecosystems." — Prof. Frances H. Arnold
"Evolution is a slow learning algorithm that with the sufficient amount of compute produces a human brain." —
Wojciech Zaremba
Natural evolutionary strategy directly evolves the weights of a DNN and performs competitively with the best deep
reinforcement learning algorithms, including deep Q-networks (DQN) and policy gradient methods (A3C)[21].
Neuroevolution, which harnesses evolutionary algorithms to optimize neural networks, enables capabilities that are
typically unavailable to gradient-based approaches, including learning neural network building blocks, architectures
and even the algorithms for learning[12].
". . . evolution — whether biological or computational — is inherently creative, and should routinely be expected to
surprise, delight, and even outwit us." — The Surprising Creativity of Digital Evolution, Lehman et al.[22]
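The ES algorithm is a “guess and check” process, where we start with some random parameters and then repeatedly (1) tweak the guess a bit randomly, and (2) move the guess slightly towards whichever tweaks worked better.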
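Here is a minimal sketch of this loop. It assumes a hypothetical fitness function (the negative squared distance to a target vector) and Gaussian perturbations, in the spirit of the estimator used by evolution strategies for RL. Fitness values are standardized (z-scored) before weighting the perturbations; published implementations often use rank-based fitness shaping instead. No gradient of the fitness function is ever computed: only fitness evaluations are used.

```python
import random


def evolution_strategy(fitness, dim, iterations=300, pop_size=50, sigma=0.1, lr=0.02, seed=0):
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from random parameters
    for _ in range(iterations):
        # 1. Tweak the guess a bit randomly: sample Gaussian perturbations.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
        # Standardize the scores so the step size does not depend on the fitness scale.
        mean = sum(scores) / pop_size
        std = (sum((s - mean) ** 2 for s in scores) / pop_size) ** 0.5 or 1.0
        advantages = [(s - mean) / std for s in scores]
        # 2. Move the guess towards the tweaks that worked better.
        for d in range(dim):
            step = sum(adv * eps[d] for adv, eps in zip(advantages, noises)) / (pop_size * sigma)
            theta[d] += lr * step
    return theta


target = [3.0, -1.0, 0.5]  # hypothetical optimum


def fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best = evolution_strategy(fitness, dim=3)
print(best)
```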
Neural architecture search has advanced to the point where it can outperform human-designed models[13].
"Caterpillar brains LIQUIFY during metamorphosis, but the butterfly retains the caterpillar’s memories!" — M. Levin
"Open-ended" algorithms are algorithms that endlessly create. Brains and bodies evolve together in nature.
"We’re machines," says Hinton. "We’re just produced biologically (...)" — Katrina Onstad, Toronto Life
v Evolution Strategies147 .
v VAE+CPPN+GAN148 .
v Demo: ES on CartPole-v1149 .
v AutoML-Zero: Evolving Machine Learning Algorithms From Scratch, Real et al.150 Code151 .
v Spiders Can Fly Hundreds of Miles Riding the Earth’s Magnetic Fields152 .
v A Visual Guide to ES http://blog.otoro.net/2017/10/29/visual-evolution-strategies/.
v Xenobots: A scalable pipeline for designing reconfigurable organisms, Kriegman et al.153 . Learn154 . Evolve155 .
147
https://lilianweng.github.io/lil-log/2019/09/05/evolution-strategies.html
148
https://colab.research.google.com/drive/1_OoZ3z_C5Jl5gnxDOE9VEMCTs-Fl8pvM
149
https://colab.research.google.com/drive/1bMZWHdhm-mT9NJENWoVewUks7cGV10go
150
https://arxiv.org/abs/2003.03384
151
https://github.com/google-research/google-research/tree/master/automl_zero
152
https://www.cell.com/current-biology/fulltext/S0960-9822(18)30693-6
153
https://www.pnas.org/content/early/2020/01/07/1910837117
154
https://cdorgs.github.io
155
https://github.com/skriegman/reconfigurable_organisms
Silver et al.[15] introduced an algorithm based solely on reinforcement learning, without human data, guidance or
domain knowledge. Starting tabula rasa (and being its own teacher!), AlphaGo Zero achieved superhuman performance.
AlphaGo Zero showed that algorithms matter much more than big data and massive amounts of computation.
Self-play mirrors similar insights from coevolution. Transfer learning is the key to going from self-play to the real world156 .
"Open-ended self play produces: Theory of mind, negotiation, social skills, empathy, real language understanding." —
Ilya Sutskever, Meta Learning and Self Play
"We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it
encounters, from observations of their behaviour alone." — Machine Theory of Mind, Rabinowitz et al.[25]
Cooperative Agents. Learning to Model Other Minds, by OpenAI[24], is an algorithm which accounts for the fact that
other agents are learning too, and discovers self-interested yet collaborative strategies. Also: OpenAI Five160 .
Figure 25: Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-in-6-player-poker
"Artificial Intelligence is about recognising patterns, Artificial Life is about creating patterns." — Mizuki Oka et al.
Active Learning Without Teacher. In Intrinsic Social Motivation via Causal Influence in Multi-Agent RL, Jaques et
al. (2018) https://arxiv.org/abs/1810.08647 propose an intrinsic reward function designed for multi-agent RL
(MARL), which rewards agents for having a causal influence on other agents’ actions. Open-source implementation 161 .
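Here is a minimal sketch of an influence reward for one pair of agents at a single state. It assumes the other agent's action distribution, conditioned on each of my actions, is available. The paper discusses obtaining these from a learned model of other agents; here they are simply given as hypothetical numbers. The reward is the KL divergence between the other agent's policy given my actual action and its counterfactual marginal policy, averaged over the actions I could have taken. It is zero when my action makes no difference to the other agent.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def influence_reward(my_policy, other_given_my_action, my_action):
    """Causal-influence reward of one agent on one other agent at a single state.

    my_policy[a]               : probability that I take action a
    other_given_my_action[a][b]: probability the other agent takes b if I take a
    """
    conditional = other_given_my_action[my_action]
    # Counterfactual marginal: average the other agent's policy over my possible actions.
    marginal = [
        sum(my_policy[a] * other_given_my_action[a][b] for a in range(len(my_policy)))
        for b in range(len(conditional))
    ]
    return kl_divergence(conditional, marginal)


mimic = [[1.0, 0.0], [0.0, 1.0]]        # the other agent copies my action
indifferent = [[0.7, 0.3], [0.7, 0.3]]  # the other agent ignores my action
print(influence_reward([0.5, 0.5], mimic, 0))
print(influence_reward([0.5, 0.5], indifferent, 0))
```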
156
http://metalearning-symposium.ml
157
https://medium.com/applied-data-science/how-to-build-your-own-muzero-in-python-f77d5718061a
158
https://frpays.github.io/lc0-js/engine.html
159
https://github.com/frpays/lc0-js/
160
https://blog.openai.com/openai-five/
161
https://github.com/eugenevinitsky/sequential_social_dilemma_games
Learning to Learn[16].
"The notion of a neural "architecture" is going to disappear thanks to meta learning." — Andrew Trask
A meta-learning algorithm takes in a distribution of tasks, where each task is a learning problem, and it produces a
quick learner — a learner that can generalize from a small number of examples[17].
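MAML is the best-known example of such an algorithm. As a simpler, swapped-in illustration of the same idea, here is a minimal sketch of Reptile (Nichol et al., 2018), a first-order meta-learning algorithm. It assumes a hypothetical task distribution of 1D regressions y = a·x, with slope a drawn uniformly from [1, 3]. The inner loop adapts to one task with a few SGD steps. The outer loop nudges the shared initialization toward the adapted weight, so a few steps then suffice on a new task from the same distribution.

```python
import random


def adapt(w, slope, rng, steps=5, lr=0.1, batch=10):
    """Inner loop: a few SGD steps on the task y = slope * x (squared error)."""
    for _ in range(steps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch)]
        grad = sum(2.0 * (w * x - slope * x) * x for x in xs) / batch
        w -= lr * grad
    return w


def reptile(meta_iters=2000, meta_lr=0.1, seed=0):
    """Outer loop: move the initialization toward each task's adapted weight."""
    rng = random.Random(seed)
    w_init = 0.0
    for _ in range(meta_iters):
        slope = rng.uniform(1.0, 3.0)  # sample a task from the task distribution
        w_task = adapt(w_init, slope, rng)
        w_init += meta_lr * (w_task - w_init)
    return w_init


w_meta = reptile()
rng = random.Random(1)
new_slope = 2.5  # a new task from the same distribution
print(w_meta)
print(abs(adapt(w_meta, new_slope, rng) - new_slope), abs(adapt(0.0, new_slope, rng) - new_slope))
```

Starting from the meta-learned initialization, five SGD steps land much closer to the new task's solution than the same five steps from zero.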
4.5.2 The Grand Challenge for AI Research | AI-GAs: AI-Generating Algorithms, an Alternate Paradigm for
Producing General Artificial Intelligence
In AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence172 , Jeff
Clune describes an exciting path that ultimately may be successful at producing general AI. The idea is to create an
AI-generating algorithm (AI-GA), which automatically learns how to produce general AI.
Three Pillars are essential for the approach: (1) Meta-learning architectures, (2) Meta-learning the learning algorithms themselves, and (3) Generating effective learning environments.
• The First Pillar, meta-learning architectures, could potentially discover the building blocks : convolution, recurrent layers, gradient-friendly architectures, spatial transformers, etc.
• The Second Pillar, meta-learning learning algorithms, could potentially learn the building blocks : intelligent
exploration, auxiliary tasks, efficient continual learning, causal reasoning, active learning, etc.
• The Third Pillar, generating effective and fully expressive learning environments, could learn things like :
co-evolution / self-play, curriculum learning, communication / language, multi-agent interaction, etc.
On Earth,
"( . . . ) a remarkably simple algorithm (Darwinian evolution) began producing solutions to relatively simple
environments. The ‘solutions’ to those environments were organisms that could survive in them. Those organism often
created new niches (i.e. environments, or opportunities) that could be exploited. Ultimately, that process produced all
of the engineering marvels on the planet, such as jaguars, hawks, and the human mind." — Jeff Clune
Turing Complete (universal computer): an encoding that enables the creation of any possible learning algorithm.
Darwin Complete: an environmental encoding that enables the creation of any possible learning environment.
❖ Learning to Continually Learn. Beaulieu et al. https://arxiv.org/abs/2002.09571. Code173.
❖ Self-Organizing Intelligent Matter: A blueprint for an AI generating algorithm. Anonymous et al.174
"We propose an artificial life framework of interacting neural elements as a basis of an AI generating algorithm." —
Anonymous175
❖ Fully Differentiable Procedural Content Generation through Generative Playing Networks. Bontrager et
al.176
5 Symbolic AI
❖ Neural Module Networks for Reasoning over Text. Gupta et al.177 Code.178
❖ Inductive Logic Programming: Theory and methods. Muggleton, S.; De Raedt, L.179
❖ (Original FOIL paper) Learning Logical Definitions from Relations. J.R. Quinlan.180
❖ Neural-Symbolic Learning and Reasoning: A Survey and Interpretation. Besold et al.181
❖ On neural-symbolic computing: suggested readings on foundations of the field. Luis Lamb182.
170
https://colab.research.google.com/github/mari-linhares/tensorflow-maml/blob/master/maml.ipynb
171
https://medium.com/pytorch/torchmeta-a-meta-learning-library-for-pytorch-f76c2b07ca6d
172
https://arxiv.org/abs/1905.10985
173
https://github.com/uvm-neurobotics-lab/ANML
174
https://openreview.net/forum?id=160xFQdp7HR
175
https://openreview.net/forum?id=160xFQdp7HR
176
https://arxiv.org/abs/2002.05259
177
https://arxiv.org/abs/1912.04971
178
https://nitishgupta.github.io/nmn-drop
179
https://doi.org/10.1016/0743-1066(94)90035-3
180
https://link.springer.com/article/10.1023/A:1022699322624
181
https://arxiv.org/abs/1711.03902
182
https://twitter.com/luislamb/status/1218575842340634626
❖ Neuro-symbolic A.I. is the future of artificial intelligence. Here’s how it works. Luke Dormehl183.
❖ Dimensions of Neural-symbolic Integration - A Structured Survey. Sebastian Bader, Pascal Hitzler184.
❖ Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective. Lamb et al.185.
Figure 27: Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective. Lamb et al.
"The paper was inspired by the AIDebate, Gary Marcus writings, the AAAI2020 Firechat with Daniel Kahneman, and
surveys not only our work, but the work of many in these AI fields." — Luis Lamb
❖ DDSP: Differentiable Digital Signal Processing. Engel et al. Blog186, Colab187, Paper188 and Code189.
❖ The compositionality of neural networks: integrating symbolism and connectionism. Hupkes et al.190
❖ A computing procedure for quantification theory. M. Davis, H. Putnam. J. of ACM, Vol. 7, pp. 201-214, 1960.
❖ Discovering Symbolic Models from Deep Learning with Inductive Biases. Cranmer et al.192. Blog and code193.
❖ Symbolic Pregression: Discovering Physical Laws from Distorted Video. Silviu-Marian Udrescu, Max Tegmark194
❖ (Workshop series on neurosymbolic AI) Neural-Symbolic Integration. Hitzler et al. http://neural-symbolic.org
❖ Graph Colouring Meets Deep Learning: Effective Graph Neural Network Models for Combinatorial Problems.
Lemos et al. https://arxiv.org/abs/1903.04598.
❖ Neural-Symbolic Relational Reasoning on Graph Models: Effective Link Inference and Computation from Knowledge
Bases. Lemos et al. https://arxiv.org/abs/2005.02525.
❖ Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and
Reasoning. Garcez et al. https://arxiv.org/abs/1905.06088
❖ Differentiable Reasoning on Large Knowledge Bases and Natural Language. Minervini et al.195 An open-source
neuro-symbolic reasoning framework in TensorFlow: https://github.com/uclnlp/gntp.
❖ (Original ILP foundational work) Automatic Methods of Inductive Inference. G.D. Plotkin. PhD thesis, University of
Edinburgh, 1970. https://era.ed.ac.uk/bitstream/handle/1842/6656/Plotkin1972.pdf;sequence=1.
6 Environments
"Run a physics sim long enough and you’ll get intelligence." — Elon Musk
183
https://www.digitaltrends.com/cool-tech/neuro-symbolic-ai-the-future/
184
https://arxiv.org/abs/cs/0511042
185
https://arxiv.org/abs/2003.00330
186
http://magenta.tensorflow.org/ddsp
187
http://g.co/magenta/ddsp-demo
188
http://g.co/magenta/ddsp-paper
189
http://github.com/magenta/ddsp
190
https://arxiv.org/abs/1908.08351
191
https://arxiv.org/abs/2003.00330
192
https://arxiv.org/abs/2006.11287
193
https://astroautomata.com/paper/symbolic-neural-nets/
194
https://arxiv.org/abs/2005.11212
195
https://arxiv.org/abs/1912.10824
"Situation awareness is the perception of the elements in the environment within a volume of time and space, and the
comprehension of their meaning, and the projection of their status in the near future." — Endsley (1987)
The OpenAI Gym https://gym.openai.com/ (Blog196 | GitHub197) is a toolkit for developing and comparing
reinforcement learning algorithms. What makes Gym so useful is the common API it provides around environments.
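To illustrate that common API, here is a minimal, self-contained sketch of the classic (2020-era) Gym interface: `reset()` returns an observation, and `step(action)` returns `(observation, reward, done, info)`. The `LineWalkEnv` class and `run_episode` helper are hypothetical and do not import the gym package. Note that newer Gym and Gymnasium releases changed these return signatures.

```python
import random


class LineWalkEnv:
    """Hypothetical toy environment following the classic Gym API."""

    def __init__(self, size=5, max_steps=50):
        self.size = size            # goal position on the line
        self.max_steps = max_steps  # episode time limit
        self.n_actions = 2          # 0 = move left, 1 = move right

    def reset(self):
        self.pos = 0
        self.t = 0
        return self.pos  # initial observation

    def step(self, action):
        assert action in (0, 1)
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.size, self.pos + move))
        self.t += 1
        reached = self.pos == self.size
        reward = 1.0 if reached else 0.0
        done = reached or self.t >= self.max_steps
        return self.pos, reward, done, {"t": self.t}


def run_episode(env, policy):
    """Any agent that maps observations to actions can drive any such env."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total, info["t"]
```

Because every environment exposes the same `reset`/`step` contract, a random agent and a learned agent plug into `run_episode` unchanged, for example `run_episode(LineWalkEnv(), lambda obs: random.choice([0, 1]))`.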
"By framing the approach within the popular OpenAI Gym framework, design firms can create more realistic
environments – for instance, incorporate strength of materials, safety factors, malfunctioning of components under
stressed conditions, and plug existing algorithms into this framework to optimize also for design aspects such as energy
usage, ease-of-manufacturing, or durability." — David Ha198
Here’s another new, more difficult (for the agent!) environment for Gym (evolution strategies on foo-v3):
1. Download gym-foo-v3201
2. cd gym-foo-v3
3. pip install -e .
4. python ES-foo-v3.py
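The contents of foo-v3 and ES-foo-v3.py are not shown here, so the following is only a hedged sketch of the evolution-strategies estimator described by Salimans et al. [21]. The idea is to perturb the parameters with Gaussian noise, evaluate each perturbation, and step along the reward-weighted average of the noise. The `fitness` function (a toy quadratic with a hypothetical `TARGET`) stands in for an episode return. Standardizing rewards is a simplification of the fitness shaping used in practice.

```python
import random

TARGET = [0.5, -1.0, 2.0]  # hypothetical optimum standing in for a good policy


def fitness(params):
    # Toy stand-in for an episode return: higher is better, maximal at TARGET
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def evolution_strategies(f, dim, iters=300, pop=50, sigma=0.1, lr=0.01, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iters):
        # Sample one Gaussian perturbation per population member
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
        rewards = [f([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
        # Standardize rewards (a simple form of fitness shaping)
        mean = sum(rewards) / pop
        std = (sum((r - mean) ** 2 for r in rewards) / pop) ** 0.5 or 1.0
        advantages = [(r - mean) / std for r in rewards]
        # Gradient estimate: (1 / (pop * sigma)) * sum_j A_j * eps_j
        for i in range(dim):
            grad_i = sum(a * eps[i] for a, eps in zip(advantages, noises)) / (pop * sigma)
            theta[i] += lr * grad_i
    return theta
```

Only fitness evaluations are needed, with no backpropagation through the environment. This is why ES parallelizes easily across workers.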
❖ Spot Mini Mini OpenAI Gym Environment. Maurice Rahme, blog205 and code206.
❖ IKEA Furniture Assembly Environment https://clvrai.github.io/furniture/.
❖ Minimalistic Gridworld Environment https://github.com/maximecb/gym-minigrid.
❖ DoorGym: A Scalable Door Opening Environment and Baseline Agent, Urakami et al., 2019207.
❖ gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo, Lopez et al., 2019208.
❖ OFFWORLD GYM: Open-access physical robotics environment for real-world reinforcement learning209.
❖ Safety Gym: environments to evaluate agents with safety constraints https://github.com/openai/safety-gym.
❖ TensorTrade: An open source reinforcement learning framework for training, evaluating, and deploying robust
trading agents https://github.com/tensortrade-org/tensortrade.
Unity ML-Agents makes it possible to create environments in which intelligent agents (single-agent, cooperative and
competitive multi-agent, and ecosystem) can be trained using RL, neuroevolution, or other ML methods https://unity3d.ai.
• Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence212.
6.4 AI Habitat
AI Habitat enables the training of embodied AI agents (virtual robots) in a highly photorealistic and efficient 3D
simulator before the learned skills are transferred to reality. Developed by Facebook AI Research: https://aihabitat.org/.
Why the name Habitat? Because that’s where AI agents live!
Diversity is the premier product of evolution. Open-ended algorithms such as POET aim to endlessly generate increasingly
complex and diverse learning environments213. Open-endedness could generate learning algorithms reaching human-level
intelligence[23].
• Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges
and their Solutions. Wang et al., 2020 https://arxiv.org/abs/2003.08536. Code214.
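The following is a heavily simplified, hypothetical sketch of the POET loop [23]. Each environment is paired with an agent. New environments are mutated from existing ones and kept only if they pass a minimal criterion: neither too easy nor too hard for the parent's agent. Agents are optimized in their own environments and then transferred to whichever environment they score best in. The one-parameter "difficulty" domain, the hill-climbing optimizer (standing in for ES), the thresholds, and attempting transfers every iteration are all assumptions. Novelty ranking and Enhanced POET's extensions are omitted.

```python
import random


def reward(skill, difficulty):
    # Hypothetical toy domain: an agent scores best when its skill matches the env
    return -(skill - difficulty) ** 2


def optimize(skill, difficulty, rng, steps=20, sigma=0.1):
    # (1+1) hill climbing, a simple stand-in for the ES inner loop used by POET
    for _ in range(steps):
        candidate = skill + rng.gauss(0.0, sigma)
        if reward(candidate, difficulty) > reward(skill, difficulty):
            skill = candidate
    return skill


def poet(iterations=30, max_pairs=5, mc_low=-1.0, mc_high=-0.05, seed=0):
    rng = random.Random(seed)
    pairs = [(0.0, 0.0)]  # list of (environment difficulty, paired agent skill)
    for it in range(iterations):
        # 1. Periodically mutate a parent environment into a harder child
        if it % 5 == 0:
            parent_env, parent_agent = rng.choice(pairs)
            child_env = parent_env + abs(rng.gauss(0.0, 0.5))
            # Minimal criterion: the child must be neither too easy nor too hard
            # for the parent's current agent
            if mc_low <= reward(parent_agent, child_env) <= mc_high:
                pairs.append((child_env, parent_agent))
                if len(pairs) > max_pairs:
                    pairs.pop(0)  # retire the oldest pair
        # 2. Optimize every agent in its paired environment
        pairs = [(env, optimize(agent, env, rng)) for env, agent in pairs]
        # 3. Transfer: each environment keeps the best agent found for it
        agents = [agent for _, agent in pairs]
        pairs = [(env, max(agents, key=lambda a: reward(a, env))) for env, _ in pairs]
    return pairs
```

The minimal criterion keeps the curriculum at the edge of the agents' current ability. Transfers let a skill learned in one environment serve as a stepping stone in another.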
Figure 30: The world’s largest chip: Cerebras Wafer Scale Engine https://www.cerebras.net
7 Deep-Learning Hardware
8 Deep-Learning Software
8.1 TensorFlow
• TF-Coder https://goo.gle/3gwTbB6.
• TensorFlow Lite for Microcontrollers220.
• Intro to Keras for Researchers. Colab221.
• Introduction to Keras for Engineers. Colab222.
• TensorBoard in Jupyter Notebooks223. Colab224.
• TensorFlow 2.0 + Keras Crash Course. Colab225.
• tf.keras (TensorFlow 2.0) for Researchers: Crash Course. Colab226.
• TensorFlow Tutorials https://www.tensorflow.org/tutorials.
• Exploring helpful uses for BERT in your browser with TensorFlow.js227.
• TensorFlow 2.0: basic ops, gradients, data preprocessing and augmentation, training and saving. Colab228.
8.2 PyTorch
• PyTorch primer. Colab229.
• Get started with PyTorch, Cloud TPUs, and Colab230.
• MiniTorch https://minitorch.github.io/index.html
• Effective PyTorch https://github.com/vahidk/EffectivePyTorch
• PyTorch internals http://blog.ezyang.com/2019/05/pytorch-internals/
• PyTorch Lightning Bolts https://github.com/PyTorchLightning/pytorch-lightning-bolts
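MiniTorch and the PyTorch internals post both center on automatic differentiation. The hypothetical `Scalar` class below is a hedged illustration of the core idea, not MiniTorch's or PyTorch's actual API. It records each operation's local derivative, then applies the chain rule in reverse topological order, which is the essence of reverse-mode autodiff.

```python
import math


class Scalar:
    """Minimal scalar reverse-mode autodiff node (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents  # tuples of (parent node, local derivative)

    def __add__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        return Scalar(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        return Scalar(self.value * other.value,
                      ((self, other.value), (other, self.value)))

    def tanh(self):
        t = math.tanh(self.value)
        return Scalar(t, ((self, 1.0 - t * t),))

    def backward(self):
        # Post-order DFS puts every node after its inputs
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse order: each node's grad is complete before it is propagated
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad
```

For z = x·y + x at x = 2, y = 3, calling `z.backward()` gives z = 8, ∂z/∂x = y + 1 = 4 and ∂z/∂y = x = 2. Gradients accumulate because x is used twice.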
Figure 31: On October 25, 2018, the first AI artwork ever sold at Christie’s auction house fetched USD 432,500.
"The Artists Creating with AI Won’t Follow Trends; THEY WILL SET THEM." — The House of Montréal.AI Fine Arts
233
https://magenta.tensorflow.org/2016/11/09/tuning-recurrent-networks-with-reinforcement-learning
234
https://openai.com/blog/musenet/
235
http://people.csail.mit.edu/liangs/papers/ToG18.pdf
236
https://arxiv.org/pdf/1903.02678.pdf
237
http://proceedings.mlr.press/v80/ganin18a.html
238
https://github.com/deepmind/spiral
"(AI) will rank among our greatest technological achievements, and everyone deserves to play a role in shaping it." —
Fei-Fei Li
Figure 33: The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies. Zheng et al.
https://arxiv.org/abs/2004.13332
❖ AI Index. http://aiindex.org.
❖ The State of AI Report. https://www.stateof.ai/.
❖ Malicious AI Report. https://arxiv.org/pdf/1802.07228.pdf.
❖ Artificial Intelligence and Human Rights. https://ai-hr.cyber.harvard.edu.
❖ The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies, Zheng et al.239. Blog240.
❖ Ethically Aligned Design, First Edition241. From Principles to Practice https://ethicsinaction.ieee.org.
❖ Address Prepared by Pope Francis for the Plenary Assembly of the Pontifical Academy for Life242.
"It’s springtime for AI, and we’re anticipating a long summer." — Bill Braun
References
[1] Mnih et al. Human-Level Control Through Deep Reinforcement Learning. In Nature 518, pages 529–533. 26 February 2015. https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
[2] Yann LeCun, Yoshua Bengio and Geoffrey Hinton. Deep Learning. In Nature 521, pages 436–444. 28 May 2015. https://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf
[3] Goodfellow et al. Generative Adversarial Networks. arXiv preprint arXiv:1406.2661, 2014. https://arxiv.org/abs/1406.2661
[4] Yoshua Bengio, Andrea Lodi, Antoine Prouvost. Machine Learning for Combinatorial Optimization: a Methodological Tour d’Horizon. arXiv preprint arXiv:1811.06128, 2018. https://arxiv.org/abs/1811.06128
[5] Brockman et al. OpenAI Gym. 2016. https://gym.openai.com
[6] Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018. https://arxiv.org/abs/1810.04805
[7] Dai et al. Semi-supervised Sequence Learning. arXiv preprint arXiv:1511.01432, 2015. https://arxiv.org/abs/1511.01432
[8] Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. arXiv preprint arXiv:1602.01783, 2016. https://arxiv.org/abs/1602.01783
239
https://arxiv.org/abs/2004.13332
240
https://blog.einstein.ai/the-ai-economist/
241
https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf
242
http://w2.vatican.va/content/francesco/en/speeches/2020/february/documents/papa-francesco_20200228_accademia-perlavita.html
[9] Schulman et al. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347, 2017. https://arxiv.org/abs/1707.06347
[10] Mnih et al. Playing Atari with Deep Reinforcement Learning. DeepMind Technologies, 2013. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
[11] Ha et al. Recurrent World Models Facilitate Policy Evolution. arXiv preprint arXiv:1809.01999, 2018. https://arxiv.org/abs/1809.01999
[12] Stanley et al. Designing neural networks through neuroevolution. In Nature Machine Intelligence Vol. 1, pages 24–35. January 2019. https://www.nature.com/articles/s42256-018-0006-z.pdf
[13] So et al. The Evolved Transformer. arXiv preprint arXiv:1901.11117, 2019. https://arxiv.org/abs/1901.11117
[14] Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv preprint arXiv:1712.01815, 2017. https://arxiv.org/abs/1712.01815
[15] Silver et al. AlphaGo Zero: Learning from scratch. In DeepMind’s Blog, 2017. https://deepmind.com/blog/alphago-zero-learning-scratch/
[16] Andrychowicz et al. Learning to learn by gradient descent by gradient descent. arXiv preprint arXiv:1606.04474, 2016. https://arxiv.org/abs/1606.04474
[17] Nichol et al. Reptile: A Scalable Meta-Learning Algorithm. 2018. https://blog.openai.com/reptile/
[18] Frans et al. Meta Learning Shared Hierarchies. arXiv preprint arXiv:1710.09767, 2017. https://arxiv.org/abs/1710.09767
[19] Zoph and Le. Neural Architecture Search with Reinforcement Learning. arXiv preprint arXiv:1611.01578, 2017. https://arxiv.org/abs/1611.01578
[20] Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. arXiv preprint arXiv:1703.03400, 2017. https://arxiv.org/abs/1703.03400
[21] Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. 2017. https://blog.openai.com/evolution-strategies/
[22] Lehman et al. The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities. arXiv preprint arXiv:1803.03453, 2018. https://arxiv.org/abs/1803.03453
[23] Wang et al. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions. arXiv preprint arXiv:1901.01753, 2019. https://arxiv.org/abs/1901.01753
[24] Foerster et al. Learning to Model Other Minds. 2018. https://blog.openai.com/learning-to-model-other-minds/
[25] Rabinowitz et al. Machine Theory of Mind. arXiv preprint arXiv:1802.07740, 2018. https://arxiv.org/abs/1802.07740