by Jacob Buckman and Carles Gelada
In “Funes the Memorious” by Jorge Luis Borges, a fall from a horse leaves Irineo Funes paralyzed, but grants him the ability to recall everything he has ever experienced in perfect detail. But though it initially seems to be a superpower, it is revealed that his superior memory has crippled him to a far greater degree than his physical injury:
With no effort, he had learned English, French, Portuguese and Latin. I suspect, however, that he was not very capable of thought. To think is to forget differences, generalize, make abstractions. In the teeming world of Funes, there were only details, almost immediate in their presence.
This description seems strikingly prescient. At the core of modern AI research lies generalization, the idea that we can learn to understand a massive space of concepts from a relatively small amount of data. Here, Borges identifies the secret to this power: abstraction. Good abstraction hides unimportant distinctions, keeping around only the relevant details. Abstraction is what allows us to use past experience to intelligently guide future behavior. Two situations that seem different on the surface, like buying a snowcone from an ice-cream truck in New Hampshire and buying a taco from a food truck in San Diego, can both be abstracted to “making a purchase” or “eating outside”, and so experience with one can serve as a partial guide for the other. But pure memorization undermines abstraction, obscuring important similarities and rendering generalization impossible. So it was with poor Funes, whose impeccable recall even stole from him abstractions as basic as object permanence:
Not only was it difficult for him to comprehend that the generic symbol dog embraces so many unlike individuals of diverse size and form; it bothered him that the dog at 3:14 (seen from the side) should have the same name as the dog at 3:15 (seen from the front).
Borges was not writing about artificial neural networks; he was writing about humans. And yet, these paragraphs would not appear out of place in a machine learning textbook, describing a model that had overfit to the training set. Funes is like a learning algorithm that just keeps a lookup table of all the input-output pairs it has ever seen. Such an algorithm never forgets a detail of any input it has encountered, but it is completely useless as a predictive model. To abstract is to generalize, to look at a finite number of data points and see an infinite space of possibilities; neither Funes nor a lookup table has that ability.
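The lookup-table learner described above can be sketched in a few lines of Python (the class and names here are my own, purely for illustration, not from any library):

```python
# A minimal sketch of a pure-memorization "learner": it recalls every
# (input, output) pair it has ever seen, but cannot generalize at all.
class LookupTableLearner:
    def __init__(self):
        self.table = {}

    def train(self, x, y):
        self.table[x] = y  # perfect recall of everything ever seen

    def predict(self, x):
        # Flawless on memorized inputs; helpless on anything new.
        return self.table.get(x, None)

model = LookupTableLearner()
for x in range(100):
    model.train(x, 2 * x)  # the underlying rule is y = 2x

print(model.predict(7))    # 14: already seen, recalled perfectly
print(model.predict(500))  # None: never seen, no abstraction to fall back on
```

A model that had abstracted the rule "y = 2x" would answer 1000 for the second query; the memorizer, like Funes, sees only a teeming world of unrelated details.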
Fortunately, you and I do! In fact, we just used that ability in the preceding paragraphs. We abstracted away some unimportant differences between humans and machine learning models in order to better understand a crucial commonality. It doesn’t matter that one is made of carbon and the other of silicon; or that the connection patterns of the two “neural networks” are completely different; or that human learning is driven by complex biological processes, while ML models are trained via gradient descent. We can understand both as dynamical systems with a certain goal (to generalize from a small set of observations to a much larger space) and similar properties (they develop abstractions). That’s a bona fide abstraction right there.
Abstractions Drive Progress, But Also Hinder It
It should come as no surprise that many great leaps in human understanding arrived via the discovery of increasingly useful abstractions. One simple example of this is the discovery of numbers. Although numbers are likely so deeply ingrained in your thinking that you barely even notice them, we should take a moment to appreciate that they are a piece of invaluable cultural heritage, a beautiful and powerful abstraction discovered by our distant ancestors that has improved our thinking in incalculable ways.
Onions and apples taste and smell very different, and yet there is a wide variety of problems and situations in which they behave exactly the same. If you start with 8 apples (or onions) and someone takes 3, you are always going to be left with 5 apples (or onions). Same with bananas, same with cantaloupes. And it gets even crazier than that. It works for elephants and donkeys. It works for paintings and parachutes and pamphlets. The idea of counting works on things as small as atoms and as big as galaxies. It is hard to overstate how surprising it is that there is any similarity whatsoever between things that are so different in all the ways things can be different. Identifying this symmetry seems simple enough in retrospect, but empirically, it was a somewhat rare accomplishment. Plenty of pre-literate cultures, including indigenous societies in Australia, South America, and Africa, lack specific words for numbers larger than about five or six. Bigger numbers are instead referred to by generic words equivalent to ‘several’ or ‘many’.
The abstraction of the natural numbers emerged the exact moment we discovered that all the parts of the world that involve discrete objects share a deep structure: the moment we learned to ignore the differences and focus on what was similar. Finding an abstraction is a superpower. Our experience dealing with problems in one case immediately translates into the ability to solve related problems in the infinitely many situations we now understand to be the same.
Of course, anything this powerful can be harmful just as easily as helpful. Though good abstractions are force multipliers for improving understanding, bad abstractions can have the opposite effect. At best, bad abstractions bog down our thoughts, making it difficult to see connections and advance ideas. At worst, bad abstractions form impenetrable barriers, trapping those who use them into mental frameworks that make it impossible to see beyond. Of course, people wouldn’t knowingly adopt bad abstractions: most bad abstractions were, at one time, the best tool available. Their faults can often only be seen in retrospect, via juxtaposition with a better competitor.
One example of the impact of a bad abstraction can be seen in the Medieval European use of the Roman numeral system. Roman numerals are very unwieldy when it comes to computation, especially computations involving large numbers. Meanwhile, in the 6th or 7th century, Hindu and Arab mathematicians developed the much more elegant base-10 system of Arabic numerals that we still use today. During this time, commerce and mathematics flourished in the Arab world, while Europe lagged by comparison. When Fibonacci brought Arabic numerals to popular attention in Europe in the 13th century, there was a backlash: the Church called them demonic symbols of the Arabs, and some cities even banned them. The Europeans, attached to the bad abstractions they already knew, found all sorts of reasons to criticize and dismiss these new ideas. Of course, over the next few centuries, people came to realize the power of Arabic numerals and gradually switched over. This coincided with the start of the European Renaissance.
Bad abstractions have hindered progress in physics, too. Aristotle was a brilliant philosopher, and his ideas were hugely influential in the development of western thought. But some of the abstractions he described did more harm than good. One example was the classic Hellenistic idea that reality was divided into two parts, the earthly sphere and the heavenly sphere. We now know that the same physical laws describe both regimes, but because of the artificial boundary that had been placed between them, scholars and philosophers who studied Aristotelian physics were handicapped in their ability to draw connections between the earth and the heavens. It took nearly two millennia before this bad abstraction was cast aside by Isaac Newton, who unified the two spheres with his universal laws of motion and gravitation.
Another Aristotelian abstraction, perhaps less well-known, was the division between the categories of “natural” and “artificial”. Aristotle believed that these two categories needed to be studied separately: natural things moved according to their own volition, whereas artificial things needed an outside force acting upon them. In “To Explain The World”, Steven Weinberg notes that Aristotle and his students were very astute observers of nature – for example, deducing optical laws by carefully observing rainbows – but never thought to set up anything akin to the experiments with prisms conducted centuries later by Isaac Newton. Weinberg speculates that this oversight may have been due to the Aristotelian natural-artificial distinction. After all, what could be learned about the glorious rainbows of nature from an artificial tool like a glass prism? It took nearly two thousand years before Galileo overcame this erroneous framework and began the modern tradition of scientific experimentation. Since then, setting up artificial experiments has been the main way we further our understanding of the natural world.
Choosing Good Abstractions
Newton famously said, “If I have seen further, it is by standing on the shoulders of giants.” This is a sentiment that the scientific community has taken to heart. There is a culture of immense respect for the scientists of the past, who have laid the groundwork – built up the abstractions – that we use today. Scientific papers are expected to be packed with citations, filled to the brim with any reference that could be considered “prior work”. Fresh contributions are expected to be framed in terms of prior contributions: an extension of a result, a generalization of a theorem, a variant of an algorithm. Probably the most common piece of advice an advisor gives a student is “go read up on some of this literature”.
Respect for the scientists who have come before us is natural and healthy. But sometimes I feel it goes too far. It would be impossible to progress as a scientist without leveraging the good abstractions developed by those who came before us, but recognizing and replacing bad abstractions is also an important part of scientific progress. Newton stood on the shoulders of Descartes, but he had a very different relationship with Aristotle.
Currently, over the course of a scientific education, most time is spent teaching students about specific abstractions, which are time-tested and reality-approved. For example, in physics class, we learn about “atoms” and “charge” and “magnetism”, and that \(F=ma\). But much less time is spent on the meta-question: what is it about these ideas that makes them so powerful? How might one have been able to tell, without the benefit of hindsight, which ideas would end up winning?
This question is of vital importance to anybody doing scientific research. At the cutting edge, most abstractions are no older than a few decades, and it is not at all obvious which ones will pan out. As a researcher, there are two choices: either choose an existing framework and work within it to derive results, or create a new set of abstractions and demonstrate their usefulness. In both cases, it is crucial to have the ability to distinguish between good abstractions and bad ones.
How can we do this? I don’t have a perfect answer. I don’t think anyone does. But I do spend a lot of time thinking about it. Personally, the two main heuristics I use are utility and aesthetics. Did this abstraction help me understand an idea? Solve a problem? Explain a concept? Does this abstraction naturally translate into code? Into math? Does this abstraction allow me to describe things cleanly, or does almost every situation require caveats? Does this abstraction categorize things neatly, or are many things difficult to place? I suspect most scientists are primarily guided by these sorts of subjective heuristics.
How Bad Abstractions Propagate
I’ve also found it helpful to think about the propagation of bad abstractions. That is – if some concepts are not useful, why might they still continue to exist? Here are some reasons:1
- History. Many ideas are initially developed separately, and it takes some time until the connections between them can be seen. This means that for relatively new ideas, there are often artificial distinctions drawn between fundamentally equivalent concepts. Be suspicious of abstractions that involve superficial distinctions on recent ideas, e.g., “there are three types of X…”
- Pedagogy. People who learn to think a certain way get locked-in, and propagate this knowledge to the next generation of students. This is especially true when careers or legacies are on the line. Be suspicious of abstractions propagated primarily by people who have skin in the game, instead of by authentic interest from those who are not already invested.
- Incentives. Many incentives exist in academia that encourage niche (i.e. poorly abstracted) knowledge. For example, writing a paper full of impenetrable domain-specific terminology conveys an aura of competence; “inventing” a new sub-field of study helps build an academic brand and curate prestige. In these situations, failing to realize that you have a bad abstraction is incentivized, and you should never underestimate the ability of humans to ignore what is inconvenient. Be suspicious of abstractions that are introduced as a specific special case of a more well-known idea.
Of course, just because an abstraction is propagated via one of these methods doesn’t necessarily mean that it is bad. But it does make it more likely to be bad, due to “explaining away”. If many people are using some abstraction, it is either because it is useful, or for some other reason. If a particular idea clearly propagates due to one of the above reasons, then it is less likely to also be useful.
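The "explaining away" argument above can be made concrete with a toy Bayesian calculation. In this sketch the model structure and all the numbers are my own invented assumptions, chosen only to illustrate the effect: an abstraction becomes popular if it is useful or if an incentive spreads it, and the two causes are independent a priori.

```python
from itertools import product

# Toy "explaining away" model (structure and priors are invented for
# illustration): an abstraction is Popular iff it is Useful OR an
# Incentive spreads it. Useful and Incentive are independent a priori.
P_USEFUL, P_INCENTIVE = 0.3, 0.2

def posterior_useful(observe_incentive):
    """P(useful | popular), optionally also conditioning on incentive."""
    num = den = 0.0
    for useful, incentive in product([True, False], repeat=2):
        p = (P_USEFUL if useful else 1 - P_USEFUL) * \
            (P_INCENTIVE if incentive else 1 - P_INCENTIVE)
        if not (useful or incentive):
            continue  # condition on the abstraction being popular
        if observe_incentive and not incentive:
            continue  # additionally condition on incentive-driven spread
        den += p
        if useful:
            num += p
    return num / den

print(posterior_useful(False))  # ~0.68: popularity alone is evidence of usefulness
print(posterior_useful(True))   # 0.30: the incentive "explains away" the popularity
```

Once the incentive is observed, the posterior probability of usefulness falls back to its prior: the popularity no longer provides any extra evidence, which is exactly the suspicion the bullet points recommend.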
Another heuristic that can be helpful in identifying good abstractions is Lindy’s law, which states that a good estimate of the remaining lifespan of a non-perishable item is its past lifespan. In other words, an abstraction that has been around for a long time and seen a lot of use is more likely to continue to stick around than one that is relatively fresh. It’s reasonable to focus on learning more time-tested (and battle-tested) abstractions, which are more likely to contain genuinely deep insight.
One major criticism I have of college education is that it is severely lacking in the context required to make these sorts of “Lindyness” judgement calls. In a typical undergraduate lecture, the course material is simply presented as raw fact. I remember one semester going from an algebra lecture to a linguistics lecture to a software engineering lecture; these three classes had roughly the same presentation style and workload, and I was expected to internalize the abstractions in all three. There was no acknowledgement of the fact that one of these classes contained ideas that had been developed over the course of hundreds of years by thousands of people, and formed the backbone of an enormous amount of real-world technology; while another was developed within the last century and only ever used by a handful of academics; and the last was teaching ideas that had been developed only a decade prior by one team at a big tech company.
Once again, the fact that their material is much less proven does not necessarily mean that the abstractions presented in the latter two classes are less important. But with the benefit of hindsight, I can now say that in this case, they were; if I could do undergrad again, I would shift most of my time towards subjects like math, whose ideas have proven to be more durable and fundamental. Lindy’s law is a good way to predict which ideas these will be.
A final comment: when applying this law, be mindful of scale. We’ve discussed some abstractions that are ancient and immensely powerful (like counting); many others are more modern (like atoms, random variables, or evolution), but have proven themselves in many different arenas, both theoretically and empirically. Compared to these titans of tried-and-true, there is almost no difference between an idea that was proposed ten years ago and one that was proposed yesterday. So, don’t use Lindy’s law as carte-blanche justification for the status quo. Invoke it only if the abstraction in question truly deserves it.
The Social Value of Shared Abstractions
Even if we identify that a particular abstraction is bad, it doesn’t mean we should discard it immediately. Communication between individuals is only possible on a substrate of shared abstraction. Even if we have chosen to discard this abstraction as a model of reality, it’s still worth keeping around as a model of how other people think. This will allow you to continue to engage with the community, participate in discussions, and interpret prior literature.
But when ingesting new information, we can get in the habit of mapping from the bad abstraction into a better one, and work with the improved version. If we have made the correct call, we will be rewarded with clearer thinking, deeper insight, and a richer set of connections to other areas. And of course, once we have convinced ourselves of the value of this new way of thinking, we should share our insights with others. Eventually, we will bring the whole community on board.
In the coming months, I will be writing about some specific abstractions in current ML, both the good and the bad. I’m hoping this post helps to motivate the importance of that type of discussion.2