Subnational Measurement with Geospatial Data

 

Le Bao

Massive Data Institute

October 18-19, 2022

More Exciting Events at MDI

  • MDI Distinguished Lecture
  • Next MDI Data Workshop
    • “Modeling Propensity to Use or Sell Drugs at the Block Group Level”
    • Dr. J.J. Naddeo
    • Tuesday, Nov 15 and Wednesday, Nov 16: 4-5:30pm

Front Matter

  • Introduction
    • Postdoc fellow at MDI
    • Political methodology
    • Polarization, environmental politics, inequality
    • Journal Political Analysis

  • The workshop materials are available at: bit.ly/mdi22ws
    • Slides, Google Colab, Python and R
    • The materials are created using Quarto

Quarto & Multi-Language Programming

  • Python code
h = "Hello"
w = "World"
msg = h + " " + w
print(msg)
Hello World


  • R code
h <- "Hello"
w <- "World"
msg <- paste(h, w)
print(msg)
[1] "Hello World"

The Goal & Plan

  • Connecting the dots
  • Looking at a problem from different perspectives
  • Weighting, aggregation, spatial statistics, multilevel regression, Bayesian statistical methods, small area estimation, MrP, kriging, …

  • Today:
    • The problem of subnational measurement; different solutions; an introduction to multilevel regression and poststratification (MrP)
  • Tomorrow:
    • The problem from a spatial perspective; Bayesian spatial methods; Bayesian universal kriging

The Problem(s)

  • Some questions:
    • How to predict the presidential election results of a state based on national polls (\(N=1500\); \(\overline{n} \approx 30\))?
      • What about other public opinions in general?
    • How to estimate the air quality of a city based on sporadically located monitors?
      • What about census tracts?
    • Average housing price? Predictions of COVID-19 cases?
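
To see why the first question is hard, here is a rough back-of-the-envelope sketch of the sampling error involved. It assumes simple random sampling, a true proportion of 0.5, and 50 equally sized states; these are illustrative assumptions, not features of any particular poll.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion
    under simple random sampling (an assumption of this sketch)."""
    return z * math.sqrt(p * (1 - p) / n)

national_n = 1500
n_states = 50                         # assumption: 50 states of equal sample size
avg_state_n = national_n / n_states   # the n-bar of about 30 from the slide

moe_national = margin_of_error(0.5, national_n)
moe_state = margin_of_error(0.5, avg_state_n)

print(f"national (n={national_n}): +/- {moe_national:.3f}")
print(f"average state (n={avg_state_n:.0f}): +/- {moe_state:.3f}")
```

Under these assumptions the national margin is about 2.5 percentage points, while a state with 30 respondents has a margin of about 18 points, which is too wide to call most races.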

The Problem(s)

  • Point-to-block realignment (Banerjee, Carlin and Gelfand, 2014)
    • To use point-level data to produce aggregate estimates at the area/block level (e.g. means, counts, proportions)
    • E.g. from survey respondents to state-level presidential approval

  • Small Area Estimation (Ghosh and Rao, 1994)
    • To produce reliable estimates of variables of interest when only small samples, or no samples, are available.
    • “Small area” can be a small geographical area or a “small domain” (i.e. subgroup).
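
A minimal sketch of the simplest point-to-block approach, the direct (disaggregation) estimator, may help make both ideas concrete. The data frame `points`, the `approve` variable, and the area labels are hypothetical toy values; area "D" is included to show what happens when an area has no sample at all.

```python
import pandas as pd

# Hypothetical point-level data: one row per survey respondent.
points = pd.DataFrame({
    "area": ["A", "A", "A", "B", "B", "C"],
    "approve": [1, 0, 1, 1, 1, 0],
})
all_areas = ["A", "B", "C", "D"]  # "D" has no respondents

# Direct estimator: aggregate the points that fall in each area.
direct = (
    points.groupby("area")["approve"]
    .agg(n="count", approve_share="mean")
    .reindex(all_areas)           # areas without data become NaN
)
direct["n"] = direct["n"].fillna(0).astype(int)

print(direct)
```

Area "C" gets an estimate from a single respondent and area "D" gets none, which is the small area problem in miniature: the direct estimator is only as good as the sample inside each block.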

ANES Data

import pandas as pd
import plotly.graph_objects as go
import plotly.express as px


anes = pd.read_csv('data/anes2020.csv')
anes
           state state_abb  ...                                  educ pres_vote
0       Oklahoma        OK  ...                     Bachelor's degree     Trump
1       Virginia        VA  ...                High school credential     Biden
2     California        CA  ...  Some post-high school, no bachelor's     Biden
3       Colorado        CO  ...                       Graduate degree     Trump
4          Texas        TX  ...  Some post-high school, no bachelor's     Biden
...          ...       ...  ...                                   ...       ...
5739        Iowa        IA  ...                     Bachelor's degree     Trump
5740     Florida        FL  ...  Some post-high school, no bachelor's     Trump
5741       Idaho        ID  ...                     Bachelor's degree     Trump
5742  California        CA  ...                High school credential     Biden
5743    Virginia        VA  ...                       Graduate degree     Biden

[5744 rows x 10 columns]

ANES Data

## Vote percentage
#anes.pres_vote.value_counts()
anes.pres_vote.value_counts(normalize=True)
Biden    0.554492
Trump    0.445508
Name: pres_vote, dtype: float64


## Number of respondents by state
state_counts = anes.groupby('state_abb').size().reset_index(name='obs')
state_counts.head(10)
  state_abb  obs
0        AK    8
1        AL   91
2        AR   42
3        AZ  105
4        CA  544
5        CO  119
6        CT   63
7        DE   17
8        FL  363
9        GA  167
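
As a rough illustration of what these counts imply, the sketch below attaches an approximate 95% interval half-width to each of the ten states printed above. It rebuilds those ten rows by hand so it runs on its own, plugs in the national Biden share (0.554) as every state's proportion, and treats each state's respondents as a simple random sample; both are simplifying assumptions, and the columns `se` and `ci_half_width` are names introduced here.

```python
import numpy as np
import pandas as pd

# The first ten rows of state_counts, copied from the printed output.
state_counts = pd.DataFrame({
    "state_abb": ["AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DE", "FL", "GA"],
    "obs": [8, 91, 42, 105, 544, 119, 63, 17, 363, 167],
})

# Assumption: use the national share as a plug-in value for every state
# and ignore the survey design (weights, clustering).
p = 0.554
state_counts["se"] = np.sqrt(p * (1 - p) / state_counts["obs"])
state_counts["ci_half_width"] = 1.96 * state_counts["se"]

print(state_counts.sort_values("obs").round(3))
```

Under these assumptions the half-width is about 34 percentage points for AK (8 respondents) and about 4 points for CA (544 respondents), so a naive state-by-state breakdown is informative only for the largest states.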

The Problem(s)


p1 = go.Figure(data=go.Choropleth(
    locations=state_counts['state_abb'],
    z=state_counts['obs'].astype(float),
    locationmode='USA-states',
    colorscale='bluyl',
    autocolorscale=False,
    marker_line_color='black',
    colorbar_title="Obs"
))

# Confine the map to US states and update the layout
p1.update_layout(
    title_text='Number of respondents by state',
    geo=dict(
        scope='usa',
        projection=go.layout.geo.Projection(type='albers usa'),
        showlakes=True,
    ),
)

#p1.show()