
A new way to build tiny neural networks could create powerful AI on your phone


We’ve been wasting our processing power to train neural networks that are ten times too big.

by Karen Hao May 10, 2019


Neural networks are the core software of deep learning. Yet despite being so widespread, they’re poorly understood. Researchers have observed their emergent properties without actually understanding why they work the way they do.

Now a new paper out of MIT has taken a major step toward answering this question. And in the process the researchers have made a simple but dramatic discovery: we’ve been using neural networks far bigger than we actually need. In some cases they’re 10—even 100—times bigger, so training them costs us orders of magnitude more time and computational power than necessary.


Put another way, within every neural network exists a far smaller one that can be trained to achieve the same performance as its oversize parent. This isn’t just exciting news for AI researchers. The finding has the potential to unlock new applications—some of which we can’t yet fathom—that could improve our day-to-day lives. More on that later.

But first, let’s dive into how neural networks work to understand why this is possible.



A diagram of a neural network learning to recognize a lion. JEFF CLUNE/SCREENSHOT

How neural networks work

You may have seen neural networks depicted in diagrams like the one above: they’re composed of stacked layers of simple computational nodes that are connected in order to compute patterns in data.

The connections are what’s important. Before a neural network is trained, these connections are assigned small random values that represent their intensity. (This is called the “initialization” process.) During training, as the network is fed a series of, say, animal photos, it tweaks and tunes those intensities—sort of like the way your brain strengthens or weakens different neuron connections as you accumulate experience and knowledge. After training, the final connection intensities are then used in perpetuity to recognize animals in new photos.
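As a toy illustration of this initialize-then-tune loop (the data, layer size, and learning rate here are invented for the example, not drawn from the article or the paper), a single layer of connection strengths can be trained in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Initialization": each connection gets a random starting strength.
weights = rng.normal(size=4)

# Toy training data: two example inputs and the outputs we want.
x = np.array([[0.1, 0.9, 0.2, 0.8],
              [0.7, 0.3, 0.6, 0.4]])
y = np.array([1.0, 0.0])

# Training "tweaks and tunes" the strengths, step by step.
for _ in range(200):
    error = x @ weights - y
    weights -= 0.5 * (x.T @ error) / len(y)  # strengthen/weaken connections

# The final strengths are what the network keeps for use on new data.
print(np.round(x @ weights, 3))
```

Each pass through the loop nudges every connection strength in the direction that reduces the error; a full deep network uses the same basic mechanism at vastly larger scale.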

While the mechanics of neural networks are well understood, the reason they work the way they do has remained a mystery. Through lots of experimentation, however, researchers have observed two properties of neural networks that have proved useful.


Observation #1. When a network is initialized before the training process, there’s always some likelihood that the randomly assigned connection strengths end up in an untrainable configuration. In other words, no matter how many animal photos you feed the neural network, it won’t achieve decent performance, and you just have to reinitialize it to a new configuration. The larger the network (the more layers and nodes it has), the less likely that is. Whereas a tiny neural network may be trainable in only one of every five initializations, a larger network may be trainable in four of every five. Again, why this happens has been a mystery, but it’s why researchers typically use very large networks for their deep-learning tasks: they want to increase their chances of achieving a successful model.


Observation #2. The consequence is that a neural network usually starts off bigger than it needs to be. Once it’s done training, typically only a fraction of its connections remain strong, while the others end up pretty weak—so weak that you can actually delete, or “prune,” them without affecting the network’s performance.
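A minimal sketch of that pruning step, assuming simple magnitude pruning on stand-in numbers (the 20% keep-rate is an arbitrary choice for illustration, not a figure from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a trained layer's connection strengths.
trained = rng.normal(size=(6, 6))

# Keep only the strongest 20% of connections by magnitude;
# everything below the cutoff is "pruned" to zero.
cutoff = np.quantile(np.abs(trained), 0.80)
mask = np.abs(trained) >= cutoff
pruned = trained * mask

print(f"{mask.mean():.0%} of connections survive pruning")
```

Real pruning pipelines typically also fine-tune the network after deleting connections, but the core operation is just this thresholding.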

For many years now, researchers have exploited this second observation to shrink their networks after training to lower the time and computational costs involved in running them. But no one thought it was possible to shrink their networks before training. It was assumed that you had to start with an oversize network and the training process had to run its course in order to separate the relevant connections from the irrelevant ones.

Jonathan Frankle, the MIT PhD student who coauthored the paper, questioned that assumption. “If you need way fewer connections than what you started with,” he says, “why can’t we just train the smaller network without the extra connections?” Turns out you can.



Michael Carbin (left) and Jonathan Frankle (right), the authors of the paper. JASON DORFMAN, MIT CSAIL

The lottery ticket hypothesis

The discovery hinges on the reality that the random connection strengths assigned during initialization aren’t, in fact, random in their consequences: they predispose different parts of the network to fail or succeed before training even happens. Put another way, the initial configuration influences which final configuration the network will arrive at.

By focusing on this idea, the researchers found that if you prune an oversize network after training, you can actually reuse the resultant smaller network to train on new data and preserve high performance—as long as you reset each connection within this downsized network back to its initial strength.
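Schematically, the procedure reads as prune-then-rewind. Here is a sketch with random stand-in numbers (real experiments train a full network between these steps; nothing below is the paper's actual code):

```python
import numpy as np

rng = np.random.default_rng(2)

init = rng.normal(size=(8, 8))                   # strengths at initialization (saved)
trained = init + 0.5 * rng.normal(size=(8, 8))   # stand-in for post-training strengths

# 1. Prune: keep only the strongest trained connections.
mask = np.abs(trained) >= np.quantile(np.abs(trained), 0.8)

# 2. Reset: rewind the surviving connections to their INITIAL strengths.
ticket = init * mask

# The smaller network defined by (mask, ticket) is then trained from scratch.
print(f"subnetwork keeps {mask.mean():.0%} of the connections")
```

The key move is the reset: the surviving connections keep their initial strengths, not their trained ones.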

From this finding, Frankle and his coauthor Michael Carbin, an assistant professor at MIT, propose what they call the “lottery ticket hypothesis.” When you randomly initialize a neural network’s connection strengths, it’s almost like buying a bag of lottery tickets. Within your bag, you hope, is a winning ticket—i.e., an initial configuration that will be easy to train and result in a successful model.

This also explains why observation #1 holds true. Starting with a larger network is like buying more lottery tickets. You’re not increasing the amount of power that you’re throwing at your deep-learning problem; you’re simply increasing the likelihood that you will have a winning configuration. Once you find the winning configuration, you should be able to reuse it again and again, rather than continue to replay the lottery.
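The analogy is easy to make quantitative. Treating each candidate subnetwork as an independent lottery ticket (a simplification) and borrowing the one-in-five odds from observation #1:

```python
# Probability that at least one of n "tickets" wins, if each wins with p = 0.2.
p = 0.2
for n in (1, 5, 10):
    print(n, round(1 - (1 - p) ** n, 3))
```

More tickets, i.e. a bigger network, rapidly raise the odds that at least one trainable configuration is present.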

Next steps

This raises a lot of questions. First, how do you find the winning ticket? In their paper, Frankle and Carbin took a brute-force approach of training and pruning an oversize network with one data set to extract the winning ticket for another data set. In theory, there should be much more efficient ways of finding—or even designing—a winning configuration from the start.

Second, what are the training limits of a winning configuration? Presumably, different kinds of data and different deep-learning tasks would require different configurations.

Third, what is the smallest possible neural network that you can get away with while still achieving high performance? Frankle found that through an iterative training and pruning process, he was able to consistently reduce the starting network to between 10% and 20% of its original size. But he thinks there’s a chance for it to be even smaller.
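That 10% to 20% figure is consistent with pruning a fixed fraction of the surviving connections in each round. For example, at an illustrative rate of 20% per round (not necessarily the paper's exact schedule):

```python
remaining = 1.0  # fraction of the original network still present
for round_num in range(1, 12):
    remaining *= 0.8  # prune 20% of what's left each round
    print(f"round {round_num:2d}: {remaining:.1%} of original size")
```

Eight to ten such rounds leave roughly 10% to 17% of the original network.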

Already, many research teams within the AI community have begun to conduct follow-up work. A researcher at Princeton recently teased the results of a forthcoming paper addressing the second question. A team at Uber also published a new paper on several experiments investigating the nature of the metaphorical lottery tickets. Most surprisingly, they found that once a winning configuration has been found, it already achieves significantly better performance than the original untrained oversize network before any training whatsoever. In other words, the act of pruning a network to extract a winning configuration is itself an important method of training.

Neural network nirvana

Frankle imagines a future where the research community will have an open-source database of all the different configurations they’ve found, with descriptions of the tasks they’re good for. He jokingly calls this “neural network nirvana.” He believes it would dramatically accelerate and democratize AI research by lowering the cost and time of training, and by allowing people without giant data servers to do this work directly on small laptops or even mobile phones.

It could also change the nature of AI applications. If you can train a neural network locally on a device instead of in the cloud, you can improve the speed of the training process and the security of the data. Imagine a machine-learning-based medical device, for example, that could improve itself through use without needing to send patient data to Google’s or Amazon’s servers.

“We’re constantly bumping up against the edge of what we can train,” says Jason Yosinski, a founding member of Uber AI Labs who coauthored the follow-up Uber paper, “meaning the biggest networks you can fit on a GPU or the longest we can tolerate waiting before we get a result back.” If researchers could figure out how to identify winning configurations from the get-go, it would reduce the size of neural networks by a factor of 10, even 100. The ceiling of possibility would dramatically increase, opening a new world of potential uses.

New Posts
  • An updated analysis from OpenAI shows how dramatically the need for computational resources has increased to reach each new AI breakthrough. In 2018, OpenAI found that the amount of computational power used to train the largest AI models had doubled every 3.4 months since 2012. The San Francisco-based for-profit AI research lab has now added new data to its analysis, showing how the post-2012 doubling compares to the historic doubling time since the beginning of the field. From 1959 to 2012, the amount of compute required doubled every two years, tracking Moore’s Law, which means compute demand now doubles more than seven times as often as it used to. This dramatic increase in the resources needed underscores just how costly the field’s achievements have become. The original analysis charts this growth on a log scale; on a linear scale, compute usage has increased roughly 300,000-fold in the last seven years. The analysis also notably does not include some of the most recent breakthroughs, including Google’s large-scale language model BERT, OpenAI’s large-scale language model GPT-2, or DeepMind’s StarCraft II-playing model AlphaStar. In the past year, more and more researchers have sounded the alarm on the exploding costs of deep learning. In June, an analysis from researchers at the University of Massachusetts, Amherst, showed how these increasing computational costs directly translate into carbon emissions. In their paper, they also noted how the trend exacerbates the privatization of AI research, because it undermines the ability of academic labs to compete with much more resource-rich private ones. In response to this growing concern, several industry groups have made recommendations. The Allen Institute for Artificial Intelligence, a nonprofit research firm in Seattle, has proposed that researchers always publish the financial and computational costs of training their models along with their performance results, for example. In its own blog, OpenAI suggested policymakers increase funding to academic researchers to bridge the resource gap between academic and industry labs.
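The "more than seven times" comparison in the item above follows directly from the two doubling periods:

```python
pre_2012_months = 24.0   # compute doubled roughly every two years before 2012
post_2012_months = 3.4   # compute doubled every 3.4 months after 2012

# How many times more often compute doubles now than it did before 2012.
print(round(pre_2012_months / post_2012_months, 1))
```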
  • StarckGate is happy to be working with Asimov, a company that aims to radically advance humanity's ability to design living systems. Asimov strives to enable biotechnologies with global benefit by combining synthetic biology and computer science. With their help we will be able to better grasp the following domains:
    – Synthetic biology: Nature has evolved billions of useful molecular nanotechnology devices in the form of genes, across the tree of life. Asimov catalogs, refines, and remixes these genetic components to engineer new biological systems.
    – Computational modeling: Biology is complex, and genetic engineering unlocks an unbounded design space. Computational tools are critical to design and model complex biophysical systems and move synthetic biology beyond traditional brute-force screening.
    – Cellular measurement: Genome-scale, multi-omics measurement technologies provide deep views into the cell. These techniques permit pathway analysis at the scale of a whole cell, and inspection down to single-nucleotide resolution.
    – Machine learning: Asimov is developing machine learning algorithms that bridge large-scale datasets with mechanistic models of biology. Artificial intelligence can augment human capabilities to design and understand biological complexity.
  • The use of AI (artificial intelligence) in agriculture is not new and has been around for some time, with technology spanning a wide range of abilities—from that which discriminates between crop seedlings and weeds to greenhouse automation. Indeed, it is easy to think that this is new technology given the way that our culture has distanced so many facets of food production, keeping it far away from urban spaces and our everyday reality. Yet, as our planet reaps the negative repercussions of technological and industrial growth, we must wonder if there are ways that our collective cultures might embrace AI’s use in food production as part of a social response to climate change. Similarly, we might consider whether new technology might also be used to educate future generations about the importance of responsible food production and consumption. While we know that AI can be a force for positive change where, for instance, failures in food growth can be detected and where crops can be analyzed in terms of disease, pests and soil health, we must wonder why food growth has been so divorced from our culture and social reality. In recent years, there has been great pushback within satellite communities, with the creation of many villages focussed upon holistic methods of food production. Indeed, RegenVillages is one of many examples where vertical farming, aquaponics, aeroponics and permaculture are part of the community's everyday functioning. Moreover, across the UK are many ecovillages and communities seeking to bring food production back to the core of social life.
Lammas is one such ecovillage, which I visited seven years ago in Wales; it has as its core concept the notion of a “collective of eco-smallholdings working together to create and sustain a culture of land-based self-reliance.” And there are thousands of such villages across the planet whereby communities are invested in working to reduce their carbon footprint while taking back control of their food production. Even Planet Impact’s reforestation programs are interesting, because the links between healthy forests and food production are well known, as are the benefits of forest gardening, which is widely considered a resilient agroecosystem. The company’s COO and founder, Oscar Dalvit, reports that its programs are designed to educate as much as to innovate: “With knowledge, we can fight climate change. Within the for-profit sector, we can win this battle.” Forest gardening is a concept that is not only part of the permaculture practice but is also an ancient tradition still alive and well in places like Kerala, India, and Martin Crawford’s forest garden in southwest England, where his Agroforestry Research Trust offers courses and serves as a model for such communities across the UK. But how can AI help sustainable, local farming practices thrive over and above industrial agriculture? Indeed, one must wonder if it is possible for local communities to take control of their food production. So, how can AI and other new tech interfaces bring together communities and food production methods that might provide a sustainable hybrid model of traditional methods and innovative technology? We know already that the IoT (internet of things) is fast becoming the virtual space where AI is implemented within the latest farming technology.
And where businesses invested in robotics are likewise finding that there is no ethical implementation of food technology, we must be mindful of how strategies are implemented which incorporate the best of new tech with the best of old tech. Where AI is helping smaller farms to become more profitable, all sorts of digital interfaces are transmitting knowledge, education and the expansion of local farming methods. This means, for instance, that garden maintenance can be continued by others within the community when some members are absent for reasons of vacation or illness. Together with AI, customer experience is as much a business model as it is a local community standard for communication and empowerment. The reality is that industrial farming need not take over local food production, and there are myriad ways that communities can directly respond to climate change and the encroachment of big agriculture. The health benefits of local farming practices are already well known, as are the many ways that smartphone technology can create high-yield farms within small urban spaces. It is high time that communities reclaim their space within urban centers and that urban dwellers consider their food purchasing and consumption habits while building future sustainability which allows everyone to participate in local food production. As the media has recently focussed upon AI and industrial farming, we need to encourage that such technology is used to implement local solutions that are far more sustainable and realistic, instead of pushing big agriculture.

Proudly created by Starckgate 

© 2020 by Starckgate
