Interpretable Deep Learning Model for Socioeconomic Status Inference from Satellite Images and Their Correlations with Urban Patterns

November 2, 2020

Cities have become the economic bedrock of modern nations, and this trend is likely to continue in the coming years, as an estimated three billion people will move into cities by 2030. Nevertheless, while urbanisation can bring economic dynamism and social development, it can also create enormous social challenges. The management of natural hazards and pollution, the exclusion of the poor from the city's socioeconomic fabric, and the resulting surge in social and economic inequalities have become some of the most pressing issues that modern metropolises need to address. Addressing these issues, however, requires both spatially fine-grained socioeconomic information and a detailed understanding of how wealth and the underlying urban topology are entangled.

In this paper we contribute to this challenge by training a deep learning model to predict the socioeconomic status of a given location from its aerial image, and then interpreting the model's activation maps in terms of the underlying urban topology. More precisely, we first overlay three publicly available datasets, which together provide a complete description of five French cities in terms of socioeconomic data, land use data, and aerial imagery. After merging the aerial imagery (Fig. 1a) with the corresponding socioeconomic maps, we train a Convolutional Neural Network (CNN) model to accurately predict the socioeconomic status of inhabited tiles. Next, using gradient-weighted class activation mapping (Grad-CAM) to compute attribution maps [2], we generate high-resolution, class-discriminative activation maps (see Fig. 1b and c), which are projected back onto the original image and overlaid with land use data (Fig. 1d). This lets us generate empirical statistics, expressed in terms of land use classes, on the features our model uses to predict socioeconomic status (Fig. 1e and f).
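
The attribution step can be illustrated with a minimal sketch of the plain Grad-CAM computation, written in NumPy. It assumes that the feature maps of a convolutional layer and the gradients of one class score with respect to those maps have already been extracted from a trained CNN; the function name `grad_cam` and the array shapes are illustrative assumptions, not the authors' code. It does not cover the guided backpropagation step that Guided Grad-CAM adds for pixel-level resolution.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Plain Grad-CAM map for one class (hypothetical helper).

    feature_maps: array of shape (K, H, W), activations of a conv layer.
    gradients:    array of shape (K, H, W), d(class score) / d(feature_maps).
    Returns an (H, W) map rescaled to [0, 1].
    """
    # Channel weights: average the gradients over the spatial dimensions.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the feature maps, keeping only positive evidence (ReLU).
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    # Rescale to [0, 1]; leave an all-zero map untouched.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

In practice the resulting coarse map would be upsampled to the size of the aerial tile before being overlaid with land use polygons.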

Figure 1: Interpretable deep learning model using Guided Grad-CAM (GGC). From an aerial tile (a), GGC computes activation maps for the poorest (b) and wealthiest (c) socioeconomic classes. The activation maps are then overlaid with the tile's tessellation into urban-class polygons (d) to compute the normalised ratio of activations per polygon for the poorest (e) and wealthiest (f) classes.
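
As a rough illustration of the overlay in panels (d) to (f), the sketch below computes the share of total activation that falls within each land use class. It assumes the polygons have already been rasterised to an integer class label per pixel, and it takes "normalised ratio" to mean each class's fraction of the tile's total activation; the exact normalisation used in the work is not specified here, and the function name is hypothetical.

```python
import numpy as np

def activation_share_per_class(activation, class_raster, n_classes):
    """Fraction of total activation per land use class (assumed definition).

    activation:   (H, W) float array, e.g. an activation map for one class.
    class_raster: (H, W) int array, land use class index of each pixel.
    n_classes:    number of land use classes.
    """
    # Ignore negative values so that shares are non-negative.
    act = np.clip(activation, 0.0, None).ravel()
    labels = class_raster.ravel()
    # Sum the activation of all pixels belonging to each class.
    totals = np.bincount(labels, weights=act, minlength=n_classes)
    # Normalise so that the shares sum to 1 (unless the map is all zeros).
    total = totals.sum()
    return totals / total if total > 0 else totals
```

Dividing each share by the class's area fraction in the tile would be one alternative normalisation, highlighting classes that attract more activation than their footprint alone would suggest.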

This framework enables the inference of socioeconomic status at scales rarely seen before, while also indicating precisely which features of the actual urban environment are predictive. Furthermore, it allows us to observe distinct city-to-city patterns of correlation between urban topology and the distribution of wealth, which we will report in our presentation.

Blog post by Márton Karsai