This is the second in a series of blog posts where I summarize, explain and share my thoughts on some of the influential scientific papers related to learning, a subject I am very passionate about. My aim in these posts is to present the thought process, the ideas and the technicalities of these papers in a comprehensive and simple way.

In this blog post, the main subject will be neural networks and cognitive processes. The papers I have chosen to include are: Rumelhart, Hinton, Williams (1986), Richards, Lillicrap, Beaudoin, et al. (2019) and Lake, Ullman, Tenenbaum, Gershman (2017).
Rumelhart, Hinton, Williams (1986) – Learning Representations by Back-Propagating Errors
This is a more technical paper than the ones I have previously explained and discussed. Its core ideas are groundbreaking and worth understanding. If you’re interested in the technical details, I highly recommend reading the paper itself.
Before backpropagation, training neural networks with multiple layers was incredibly difficult. The main issue was that it was unclear how much each layer contributed to the final error, making it hard to adjust the weights effectively. This is where backpropagation changed everything.
While this is not the first paper that discusses neural networks and backpropagation, it is one of the most influential. Backpropagation is an algorithm that computes gradients of the error function with respect to each weight.
In simple terms, it is a neural network optimization method that works by repeatedly adjusting the weights assigned to artificial neurons to minimize the error (the difference between the predicted and correct answers) as much as possible. Its impact lies in making it possible to train neural networks with multiple hidden layers: backpropagation propagates the error backward from the output layer through each hidden layer, determining the exact contribution of each weight to the error and thereby enabling accurate adjustments.
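To make this concrete, here is a minimal sketch of a two-layer network trained with backpropagation. This is not the paper's exact setup; the tiny XOR task, the layer sizes and the learning rate are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR, the classic problem a network without a hidden layer cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
W2 = rng.normal(size=(4, 1))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for step in range(5000):
    # Forward pass: compute activations layer by layer.
    h = sigmoid(X @ W1)                     # hidden layer
    out = sigmoid(h @ W2)                   # output layer
    losses.append(np.mean((out - y) ** 2))  # squared error

    # Backward pass: the chain rule pushes the error back through each layer,
    # assigning every weight its share of the blame.
    d_out = (out - y) * out * (1 - out)     # gradient at the output units
    d_h = (d_out @ W2.T) * h * (1 - h)      # gradient at the hidden units

    # Update: adjust each weight against its contribution to the error.
    W2 -= 1.0 * h.T @ d_out
    W1 -= 1.0 * X.T @ d_h

print(f"error: {losses[0]:.3f} -> {losses[-1]:.3f}")  # the error shrinks as training proceeds
```

The two gradient lines are the whole trick: the output-layer error is reused, weighted by `W2`, to compute the hidden-layer error, so no layer's contribution stays unknown.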
If you’re curious to see backpropagation in action with illustrated explanations, I highly recommend Andrej Karpathy’s YouTube series Neural Networks: Zero to Hero¹, specifically the first video. It’s a 10-video series that will no doubt contribute a lot to anyone interested in artificial neural networks and their technicalities. It really is zero to hero, so it is fine to follow with little or no prior knowledge; he does a really good job of explaining these complex subjects. Another similar YouTube channel I can recommend is StatQuest². See the footnotes at the end of this blog post.
Short Thoughts: Multilayer neural networks and backpropagation form the foundation of modern deep learning. Therefore, the influence of this paper is undeniable.
I think it is important to point out that backpropagation also connects to representational theory, as it organizes data into hierarchies similar to the cognitive processes described in the representational theory of mind. Artificial neural networks use weights and activations to create representations of input data; similarly, the cognitive processes described in the representational theory of mind use structured mental representations to process and interpret information, organizing concepts into hierarchical frameworks that reflect patterns of thought, perception, and learning. Just as artificial neural networks adjust weights to refine their internal representations, cognitive processes dynamically update and reorganize mental structures based on experience and context.
I think the word “learning” is another aspect worth pointing out. Artificial neural networks are inspired by the structure of the human brain, and the method of backpropagation intersects with theories of mind (e.g. RTM). This raises an interesting question: is this how humans learn? The answer is no. The word “learning” means different things in artificial learning systems and in human learning. While artificial neural networks are only about minimizing the error to make accurate predictions, human learning involves more, such as generalization, adaptation, and abstract reasoning.
Understanding generalization is one of AI’s greatest challenges today. It’s a fascinating area to explore for anyone curious about it.
Richards, Lillicrap, Beaudoin, et al. (2019) – A Deep Learning Framework for Neuroscience
Given my interest in philosophy, cognitive science and ML/deep learning, a paper that bridges the gap between our understanding of the mind and the potential of artificial intelligence was a fun read.
The paper tackles a central problem: how do we make sense of the overwhelming amount of data coming from modern neuroscience? The classical approach, while successful for simple computations, struggles with the complexity of large neural circuits. The paper proposes a solution: Borrow a framework from the world of Deep Learning. By focusing on objective functions, learning rules, and architectures, we can create models that explain why neural responses are the way they are, rather than just describing what they are. It is about finding the underlying principles that govern brain function, even if the details are messy and complex.
Here are some key points from the paper to have a basic understanding of it:
Classical Neuroscience Has Limits: The traditional approach of studying individual neurons and building circuit-level theories might not cut it for complex brain regions like the neocortex or hippocampus. It’s like trying to understand a symphony by only listening to the flutes.
ANNs as Brain Mimics: Artificial Neural Networks (ANNs) offer a way to model neural computation using simplified units that learn rather than being explicitly programmed. Think of it as building a LEGO brain, where the pieces (neurons) connect and learn to perform tasks.
Deep Learning is ANN 2.0: Deep learning is essentially a more advanced version of ANNs, using multiple layers to train hierarchical networks. The “deep” part refers to those many layers, and working out how much each layer contributes to the learning goal brings us to the credit assignment problem.
Brains and Machines Unite: Deep learning isn’t just for tech; it can inform our theories about the brain. Deep ANNs can mimic how our brains process information and help us understand phenomena like grid cells or visual illusions.
The Holy Trinity of ANN Design: When designing ANNs, we focus on objective functions (what the network aims to achieve), learning rules (how the network updates its connections), and architectures (how the network is structured).
AI vs. Brain: Task Sets: The paper introduces the ‘AI Set’ (tasks animals perform effortlessly) and the ‘Brain Set’ (tasks a species evolved to perform), highlighting the overlap between AI and neuroscience.
Inductive Biases are Key: Inductive biases are the assumptions we make about the solutions to a problem. In other words, it is prior knowledge embedded into an optimization system.
Credit Assignment: The credit assignment problem refers to figuring out how much each neuron or synapse is responsible for a particular outcome.
Brains Have Objectives Too: Objective functions aren’t just for machines; they can be mathematically defined for the brain, regardless of the specific task or environment.
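As a rough sketch of how the three components fit together, here is a minimal training loop where each one is explicit. The toy linear model, the data-generating rule and the parameter names are all my own; none of this code comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Architecture: how the system is structured -- here, a single linear unit y_hat = X @ w.
w = np.zeros(2)

# Toy data generated from a known rule, w_true = [2.0, -1.0], plus a little noise.
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

def objective(w):
    """Objective function: what the system aims to achieve (low prediction error)."""
    return np.mean((X @ w - y) ** 2)

def learning_rule(w, lr=0.1):
    """Learning rule: how the connections are updated (plain gradient descent)."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

for _ in range(200):
    w = learning_rule(w)

print(np.round(w, 1))  # the learned weights approach the rule that generated the data
```

The framework's point, in this toy form: the observed "responses" (the final weights) are not designed directly, they emerge from the interplay of the three components, and each component can be swapped out and studied in isolation.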
The Problem and the Solution: Scaling Up Our Understanding
The paper attempts to provide a new lens for neuroscience, a framework. This framework offers a new way to approach the brain, inspired by the successes of deep learning. It’s not about replacing traditional methods but complementing them with a broader, more holistic perspective.
The framework suggests that instead of summarizing how a computation is performed, one should summarize what objective functions, learning rules, and architectures would enable learning of that computation. The paper posits that this optimization-based framework can drive theoretical and experimental progress in neuroscience. It is based on the idea that the objective functions, learning rules, and architectures of the brain are likely relatively simple and compact, especially compared to the computations performed by individual neurons. It suggests that a normative framework explaining why neural responses are the way they are might be best obtained by viewing neural responses as an emergent consequence of the interplay between objective functions, learning rules, and architecture.
Based on this, the authors suggest that identifying a normative explanation in terms of the three components (objective functions, learning rules, architecture) may be a fruitful way to develop better, non-compact models of the response properties of neurons in a circuit. This approach involves building working models from the three core components and then comparing the models with the brain on multiple levels:
• Solving complex tasks from the Brain Set under consideration.
• Being informed by knowledge of anatomy and plasticity.
• Reproducing the representations and changes in representation observed in brains.
This deep-learning-inspired framework allows researchers to study each of the three components in isolation.
Short Thoughts: What I find most intriguing about this paper is its potential to bridge the gap between different levels of explanation. Since the first time I read Descartes’ Meditations, the idea of finding compact, normative explanations for complex phenomena has always seemed fascinating to me. This framework offers a way to do just that, by identifying the underlying principles that govern brain function.
From a cognitive science perspective, I like how this approach aligns with the idea of the brain as a predictive machine. The emphasis on objective functions like minimizing description length or maximizing mutual information resonates with the idea that the brain is constantly trying to anticipate and make sense of the world.
And as someone working in ML/Deep Learning, I see the practical value of this framework. Because the framework is grounded in deep learning, neuroscience discoveries made using this approach may be more readily applicable to artificial intelligence: the common ground it provides should reduce the effort required to translate findings between natural and artificial systems. Finally, the sections on inductive biases and objective functions were particularly interesting for me personally because they reminded me of my first year of studying philosophy in university, learning about a priori and a posteriori knowledge. The idea that the brain comes pre-equipped with certain assumptions about the world, shaped by evolution and experience, is both intuitive and powerful. And the quest to identify the objective functions that drive brain activity, from homeostasis to empowerment, is a fascinating area of research with profound implications for our understanding of motivation, learning, and behavior.
Building Machines That Learn and Think Like People (2017) – Lake, Ullman, Tenenbaum, Gershman
In the last few years, thanks to advances in deep learning and AI research, models can now generate human-like text, produce very high quality images from simple prompts, and even master complex games with superhuman skill. Breakthroughs in LLMs, reinforcement learning, and multimodal AI continue to push the boundaries of what machines can do. With these achievements, however, one fundamental question gets more and more of the spotlight:
Are these machines truly thinking like people?
The authors argue that current AI systems, impressive as they are, still fall short of human-like intelligence. While deep learning excels at pattern recognition and statistical inference, human cognition is built on more than just data-driven learning. We possess rich world knowledge, intuitive physics, and the ability to generalize from very little experience – traits that today’s AI struggles to replicate.
To overcome this, the paper proposes integrating insights from cognitive science into AI. The core idea is equipping machines with:
- The ability to build causal models
- Innate or learned intuitive theories about how the world works
- Mechanisms for compositional representation and learning-to-learn
The authors suggest that by incorporating these ingredients, AI systems can achieve more human-like learning, characterized by rapid knowledge acquisition, flexible generalization and the ability to reason and explain.
Before we continue to talk more about the paper, here are some key takeaways or info pills as I like to call them:
- Human-like AI requires more than just pattern recognition.
- Cognitive science offers valuable insights for building more intelligent machines.
- Causal models, intuitive theories, compositionality, and learning-to-learn are crucial ingredients for human-like learning.
- Integrating these ingredients into AI systems is a challenging but promising research direction.
- The article suggests that deep learning and other computational paradigms should aim to tackle tasks using as little training data as people need, and also to evaluate models on a range of human-like generalizations beyond the one task on which the model was trained.
In the paper, two challenge problems for ML and AI are presented to showcase the importance of the core cognitive ingredients listed above: the Characters Challenge and the Frostbite Challenge. I will only explain the former.
The Characters Challenge compares the ability of humans and machines to recognize handwritten characters. As humans, when we see a handwritten character, we can immediately start writing it and even create variations of it. For machines, not so much. Anybody with even a little experience training deep learning models will recognize that machines learn from scratch and require a vast amount of data, far more than humans do. And if you want your trained model to recognize a wide range of variations of that one character, your dataset must be large and diverse enough to cover those differences. The takeaway is that humans can learn from fewer examples and generalize to a wider range of variations.
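A toy illustration of that point: a learner that matches raw pixels needs examples of each variation, while the underlying concept (here, "a vertical stroke with a foot") is what a human picks up from one example. The 5×5 "characters" below are made up for this sketch, not taken from the paper's dataset.

```python
import numpy as np

# One clean example each of an "L" and a "T".
L = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1]])

T = np.array([[1, 1, 1, 1, 1],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0]])

# A smaller "L" drawn two pixels to the right -- trivially still an L to a human.
L_variant = np.array([[0, 0, 1, 0, 0],
                      [0, 0, 1, 0, 0],
                      [0, 0, 1, 0, 0],
                      [0, 0, 1, 0, 0],
                      [0, 0, 1, 1, 1]])

def nearest(example, training):
    """1-nearest-neighbour classification by pixel (Hamming) distance."""
    return min(training, key=lambda item: np.sum(item[1] != example))[0]

# With one example per class, the shifted L is closer to T in raw pixel space.
print(nearest(L_variant, [("L", L), ("T", T)]))                   # prints T (wrong)

# Only after the variation itself appears in the data does the matcher get it right.
print(nearest(L_variant, [("L", L), ("T", T), ("L", L_variant)]))  # prints L
```

Pixel distance has no notion of "the same stroke, moved over", so every variation has to be covered by data, which is exactly the data hunger the challenge highlights.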
Why is that? Why do machines have such a hard time with something that seems so easy and natural for us? The paper suggests that humans have “developmental start-up software”. It is simply a set of core knowledge about the world that we have from early on, and it shapes how we learn. It’s like a foundation that we build everything else on top of. The paper focuses on two main parts of this: intuitive physics, which is how we understand objects and their behaviour, and intuitive psychology, which helps us make sense of other minds.
To break these down, the paper uses babies as an example. In terms of intuitive physics, they show an understanding of the physical world: they can tell the difference between solids and liquids, and they can anticipate how objects fall. Intuitive psychology is about understanding that other beings have goals and intentions. Even babies react differently to helpful and unhelpful actions; they have an understanding of the intentions behind actions. So it’s not only about understanding what someone is doing, but also why they are doing it. In other words, recognizing there is a mind behind faces and actions.
These intuitive physics and intuitive psychology abilities help us react quickly, for example catching a falling object or predicting actions in complex social situations. The Frostbite Challenge that I mentioned but did not explain earlier is used to demonstrate humans’ intuitive psychology.
To illustrate the difference between human and machine learning: these intuitive theories take years of interaction with the world to develop, unlike what machines do. Deep learning is great at pattern recognition, but human learning is more about building a theory piece by piece, and this building of mental theories is key to human learning. Instead of passively taking in data like machines, humans actively try to make sense of it… which brings us to the final subject I will mention from the paper.
Learning-To-Learn
As I am always excited to learn more about the concept of learning, the “Learning-to-learn” section was a fun read. Its first sentence is eye-opening: “When humans or machines make inferences that go far beyond the data, strong prior knowledge (or inductive biases or constraints) must be making up the difference” (Geman et al. 1992; Griffiths et al. 2010; Tenenbaum et al. 2011).
Unlike machines, humans do not approach each new task like it’s the first time. Past experiences, even ones that seem unrelated, shape how we handle new challenges. This highlights something really important about human learning: humans do not just collect data points, we build a framework that lets us learn new things more efficiently. We build on what we already know instead of starting from scratch every time, and this is the ability to learn how to learn.
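Here is a toy sketch of that idea: an initialization carried over from a related task acts as prior knowledge and speeds up learning on a new one. The tasks, numbers and the "experienced vs. novice" framing are illustrative inventions of mine, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))  # shared inputs for both tasks

def train(w, w_true, steps, lr=0.05):
    """Run a few gradient-descent steps on a linear task; return the final error."""
    y = X @ w_true
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return np.mean((X @ w - y) ** 2)

task_a = np.array([1.0, 2.0, -1.0])   # an old, already-mastered task
task_b = np.array([1.2, 1.8, -0.9])   # a new but related task

# The "experienced" learner starts from weights shaped by the old task;
# the "novice" starts from scratch at zero.
loss_with_prior = train(task_a.copy(), task_b, steps=5)
loss_from_scratch = train(np.zeros(3), task_b, steps=5)

print(loss_with_prior < loss_from_scratch)  # prior knowledge -> faster progress
```

After the same small number of steps, the learner that reuses old knowledge is already close to the new task, while the one starting from scratch is still far away, which is the "not starting from scratch every time" point in miniature.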
Short Thoughts: To get a practical understanding of the paper, I think what we mainly need to talk about is how its ideas look from today’s perspective. Some of these ideas held up well, some remain open questions, and some didn’t age well.
For example an idea from the paper that held up well is about intuitive physics and theory of mind: AI doesn’t possess a true understanding of the physical world or others’ intentions. While GPT-4 can answer questions about physics, it doesn’t understand the cause-and-effect principles that govern the real world or recognize the mental states behind human actions.
To exemplify an idea that remains open: “AI should generalize like humans, seeing one example and applying it to new situations.” GPT-4 can generalize within its training distribution but struggles with completely new domains. It can write Python for different purposes, but if we were to invent a new programming language, it would not generalize well.
Lastly, an idea that didn’t age well: “Deep learning has limits and needs fundamental changes to achieve AGI.” This hasn’t been falsified, as we haven’t achieved AGI yet, but LLMs have gone way beyond expectations without radical changes. The “just scale it” approach has worked better than expected: larger models with more data have yielded more intelligence.
📌 Sources & Further Reading
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature.
- Richards, B. A., Lillicrap, T. P., Beaudoin, P., et al. (2019). A deep learning framework for neuroscience. Nature Neuroscience.
- Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences.
- Yamins, D. L., & DiCarlo, J. J. (2016). Using Deep Learning to Understand and Optimize the Brain’s Visual Representation System. Current Opinion in Neurobiology.
📝 Footnotes
