Researchers develop 'MOSAIKS' machine learning tool that makes it accessible and affordable for people anywhere to use satellite imagery to address climate challenges
Artist's image of a Landsat satellite over Earth (NASA), Wikimedia Commons

More than 700 imaging satellites are orbiting Earth, and every day they beam vast oceans of information – including data that reflects climate change, health and poverty – to databases on the ground. There’s just one problem: while the geospatial data could help researchers and policymakers address critical challenges, only those with considerable wealth and expertise can access it.

Now, a multi-disciplinary team that originated at the UC Berkeley Global Policy Lab has devised a machine learning system to tap the problem-solving potential of satellite imaging, using low-cost, easy-to-use technology that could bring access and analytical power to researchers and governments worldwide. The study, "A generalizable and accessible approach to machine learning with global satellite imagery," was recently published in the journal Nature Communications.

The system that emerged from their research is called MOSAIKS, short for Multi-Task Observation using Satellite Imagery & Kitchen Sinks. It ultimately could have the power to analyze hundreds of variables drawn from satellite data – from soil and water conditions to housing, health and poverty – at a global scale.

"Millions of satellite images are taken of our planet every day, but actually using this information to solve global problems like climate change and poverty eradication is difficult," explained Tamma Carleton, Bren School Assistant Professor and co-author of the study.

"So far, the technology that translates imagery into useful information – like income in a village or water quality across the planet’s rivers – has been in the hands of a few well-funded research teams in wealthy countries. MOSAIKS turns this model on its head by making it easy and cheap for people across the globe to use imagery to monitor and evaluate conditions in their own communities with a single laptop and a few minutes of computer time," said Carleton.

“We're entering a regime in which our actions are having truly global impact,” said co-author Solomon Hsiang, director of the Global Policy Lab at UC Berkeley. Hsiang and Carleton are also both researchers with the multi-institution Climate Impact Lab. “Things are moving faster than they’ve ever moved in the past. We're changing resource allocations faster than ever. We're transforming the planet. That requires a more responsive management system that is able to see these things happen, so that we can respond in a timely, effective way.” 

The research team that developed MOSAIKS is a mix of economists, environmental scientists, statisticians, and computer scientists. They were guided by a joint desire to democratize access to the information in satellite images, making it usable even by communities and countries that lack resources and advanced technical skill.

"As an environmental economist, I see remotely sensed data as enormously valuable to studying problems ranging from climate change to poverty alleviation, but that data has to date been created and utilized by only a small group of elite researchers," Carleton explained. "In talking about this problem with our computer science colleagues, we eventually figured out that new algorithms could have the power to overturn the current research model, allowing broad access to the enormous potential of satellite imagery."

MOSAIKS: Improving lives, protecting the planet

The research paper details how MOSAIKS was able to replicate, with reasonable accuracy, reports prepared at great cost by the US Census Bureau. It also has enormous potential to address development challenges in low-income countries and to help scientists and policymakers understand big-picture environmental change.

“Climate change is diffuse and difficult to see at any one location, but when you step back and look at the broad scale, you really see what is going on around the planet,” said Hsiang. 

For example, he said, the satellite data could give researchers deep new insights into expansive areas such as the Great Plains in the US and the Sahel in Africa, or into areas such as Greenland or Antarctica that may be shedding icebergs as temperatures rise. 

“These areas are so large, and to have people sitting there and looking at pictures and counting icebergs is really inefficient,” Hsiang explained. But with MOSAIKS, he said, “you could automate that and track whether these glaciers are actually disintegrating faster, or whether this has been happening all along.” 

For a government in the developing world, the technology could help guide even routine decisions, such as where to build roads. 

“A government wants to build roads where the most people are and the most economic activity is,” Hsiang said. “You might want to know which community is underserved, or the condition of existing infrastructure in a community. But often it’s very difficult to get that information.” 

MOSAIKS can use satellite imagery to measure variables as diverse as income per household, forest cover, and road density, among many others. "But most importantly, all these measurements can be done with just a standard laptop and some basic statistics," adds Carleton.
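
To make the "basic statistics" step concrete, here is a minimal sketch that assumes it takes the form of ridge regression, a regularized linear fit from precomputed image features to ground-truth labels. The article does not name the method. The features, labels, weights and penalty below are synthetic and hypothetical, chosen only so the example runs on its own with the Python standard library.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ridge_fit(features, labels, penalty):
    """Return weights w minimizing ||Xw - y||^2 + penalty * ||w||^2."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)] for i in range(k)]
    for i in range(k):
        xtx[i][i] += penalty  # the ridge term keeps the system well conditioned
    xty = [sum(row[i] * y for row, y in zip(features, labels)) for i in range(k)]
    return solve(xtx, xty)


def predict(features, weights):
    """Apply the fitted linear model to each row of features."""
    return [sum(w * v for w, v in zip(weights, row)) for row in features]


rng = random.Random(0)
true_weights = [2.0, -1.0, 0.5]  # hypothetical relationship used to generate labels
features = [[rng.uniform(0, 1) for _ in true_weights] for _ in range(200)]
labels = [sum(w * v for w, v in zip(true_weights, row)) + rng.gauss(0, 0.05) for row in features]

# Fit on the first 150 locations, then check the error on the 50 held out.
weights = ridge_fit(features[:150], labels[:150], penalty=0.01)
errors = [p - y for p, y in zip(predict(features[150:], weights), labels[150:])]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
print("learned weights:", [round(w, 2) for w in weights])
print("held-out RMSE:", round(rmse, 3))
```

In this sketch the image features are computed once and reused. Only the small regression has to be rerun for each new variable, whether income, forest cover or road density, which is why a laptop would be enough.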

The challenge: Organizing trillions of bytes of raw satellite data

The growing fleet of imaging satellites beams data back to Earth 24/7: some 80 terabytes every day, according to the research, a number certain to grow in coming years. How much is 80 terabytes? For comparison, consider the revolutionary Hubble Space Telescope, launched in 1990, which delivers 10 terabytes of data per year. On an average day in 2016, 24 terabytes of video were uploaded to YouTube.
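
The comparison can be put on a common footing with a few lines of arithmetic. This sketch uses only the figures quoted above and assumes a 365-day year.

```python
# Figures quoted in the article, in terabytes.
SATELLITES_TB_PER_DAY = 80
HUBBLE_TB_PER_YEAR = 10
YOUTUBE_TB_PER_DAY_2016 = 24

DAYS_PER_YEAR = 365  # assumption: a non-leap year

# Convert the satellite figure to a yearly total so it can be set against Hubble.
satellites_tb_per_year = SATELLITES_TB_PER_DAY * DAYS_PER_YEAR
hubble_ratio = satellites_tb_per_year / HUBBLE_TB_PER_YEAR
youtube_ratio = SATELLITES_TB_PER_DAY / YOUTUBE_TB_PER_DAY_2016

print(f"Imaging satellites: {satellites_tb_per_year:,} TB per year")
print(f"About {hubble_ratio:,.0f} times Hubble's yearly output")
print(f"About {youtube_ratio:.1f} times YouTube's daily uploads in 2016")
```

On these numbers, the satellite fleet produces roughly 29,200 terabytes a year, nearly 3,000 times Hubble's annual output and a little over three times the 2016 daily YouTube upload volume.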

But imaging satellites are often built to capture information on narrow topics, such as supplies of fresh water or the condition of agricultural soils. And the data doesn’t arrive as neat, orderly images, like snapshots from a photo shop; it is raw data, a mass of binary information. Researchers who access the data have to know what they’re looking for.

Merely storing so many terabytes of data requires a huge investment. Distilling the layers of data embedded in the images requires additional computing power and advanced human expertise to tease out strands of information that are coherent and useful to other researchers, policymakers or funding agencies. 

“If you’re an elite professor, you can get someone to build your satellite for you,” said Hsiang. “But there’s no way that a conservation agency in Kenya is going to be able to access the technology and the experts to do this work. We wanted to find a way to empower them. We decided to come up with a Swiss Army Knife — a practical tool that everyone can access.”

Like Google for satellite imagery, sort of

Especially in low-income countries, one dimension of poverty is a poverty of data. But even communities in the United States and other developed countries usually don’t have ready access to geospatial data in a convenient, usable format for addressing local challenges. 

Machine learning opens the door to solutions. 

In a general sense, machine learning refers to computer systems that use algorithms and statistical modeling to learn on their own, without step-by-step human intervention. What the new research describes is a system that can assemble data delivered by many satellites and organize it in ways that are accessible and useful.

There are precedents for such systems: Google Earth Engine and Microsoft's Planetary Computer are both platforms for accessing and analyzing global geospatial data, with a focus on conservation. Even with these technologies, said Esther Rolf, a UC Berkeley PhD student and first author on the paper, considerable expertise is often required to convert the data into new insights.

The goal of MOSAIKS is to make satellite data widely usable for addressing global challenges. The team did this by making the algorithms radically simpler and more efficient.

MOSAIKS starts by learning to recognize minuscule patterns in the images. Hsiang compares it to a game of Scrabble, in which the algorithm learns to recognize each letter. In this case, however, the tiles are tiny pieces of satellite image, 3 pixels by 3 pixels.

But MOSAIKS doesn’t conclude “this is a tree” or “this is pavement.” Instead, "it recognizes patterns and groups them together," commented Jonathan Proctor, one of the study's co-authors and now with the Harvard Center for the Environment. "It learns to recognize similar patterns in different parts of the world." 
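
One way to picture the pattern-matching step is the simplified sketch below. It assumes the 3-by-3 tiles are drawn at random from the images themselves and used as filters, in the spirit of the "kitchen sinks" in the tool's name. It is not the authors' implementation. The image size, the number of tiles and the single-band synthetic pixel values are hypothetical, and the published pipeline may differ in details such as normalization.

```python
import random

PATCH = 3  # tile size from the article: 3 pixels by 3 pixels


def sample_patches(images, k, rng):
    """Pick k random PATCH x PATCH tiles from the images to act as filters."""
    patches = []
    for _ in range(k):
        img = rng.choice(images)
        r = rng.randrange(len(img) - PATCH + 1)
        c = rng.randrange(len(img[0]) - PATCH + 1)
        patches.append([row[c:c + PATCH] for row in img[r:r + PATCH]])
    return patches


def featurize(img, patches):
    """Return one number per filter: the average rectified match over the image."""
    rows = len(img) - PATCH + 1
    cols = len(img[0]) - PATCH + 1
    features = []
    for patch in patches:
        total = 0.0
        for r in range(rows):
            for c in range(cols):
                # Dot product between the filter and the tile at (r, c).
                score = sum(
                    patch[i][j] * img[r + i][c + j]
                    for i in range(PATCH)
                    for j in range(PATCH)
                )
                total += max(score, 0.0)  # keep only positive matches
        features.append(total / (rows * cols))
    return features


rng = random.Random(0)
# Synthetic 8x8 single-band "images" with values in [-1, 1]; real imagery is multi-band.
images = [[[rng.uniform(-1, 1) for _ in range(8)] for _ in range(8)] for _ in range(5)]
patches = sample_patches(images, k=4, rng=rng)
feature_table = [featurize(img, patches) for img in images]
print(len(feature_table), "images x", len(feature_table[0]), "features")
```

Each image is reduced to a short list of numbers saying how strongly each tile pattern shows up in it. Nothing in this step labels a pattern as a tree or as pavement. The numbers only record similarity, so the same table can be reused for many different prediction tasks.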

When thousands of terabytes from hundreds of sources are analyzed and organized, researchers can choose a village or a country or a region and draw out organized data that can touch on themes as varied as soil moisture, health conditions, human migration and home values. 

In a sense, MOSAIKS could do for satellite databases what Google in the early days did for the Internet: map the data, make it accessible and user-friendly at low cost, and perhaps make it searchable. But the Google comparison goes only so far.

Creating an atlas of global data

The researchers see the potential for MOSAIKS to evolve in powerful and elegant directions.

Hsiang imagines the data being collected into computer-based, continually evolving atlases. Turn to any given “page,” and a user could access broad, deep data about conditions in a country or a region.

Rolf envisions a system that can take the stream of data from humanity’s fleet of imaging satellites and remote sensors and transform it into a flowing, real-time portrait of Earth and its inhabitants, continually in a state of change. We could see the past and the present, and so discern emerging challenges and address them.

"I’m really excited to see where MOSAIKS goes next," said Carleton, "and I'm hoping that this tool can extend far beyond the research community to decision-makers and individuals across the globe."

Further reading:

A generalizable and accessible approach to machine learning with global satellite imagery, Nature Communications, 20 July 2021

A machine learning breakthrough uses satellite images to improve lives, UC Berkeley News, 20 July 2021

Special thanks to the communications office at Goldman School of Public Policy, UC Berkeley