Hello World
Le Bao
Massive Data Institute
October 18-19, 2022
state state_abb ... educ pres_vote
0 Oklahoma OK ... Bachelor's degree Trump
1 Virginia VA ... High school credential Biden
2 California CA ... Some post-high school, no bachelor's Biden
3 Colorado CO ... Graduate degree Trump
4 Texas TX ... Some post-high school, no bachelor's Biden
... ... ... ... ... ...
5739 Iowa IA ... Bachelor's degree Trump
5740 Florida FL ... Some post-high school, no bachelor's Trump
5741 Idaho ID ... Bachelor's degree Trump
5742 California CA ... High school credential Biden
5743 Virginia VA ... Graduate degree Biden
[5744 rows x 10 columns]
Biden 0.554492
Trump 0.445508
Name: pres_vote, dtype: float64
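The slide shows only the output, not the call that produced the vote shares. A minimal sketch of one way to get such proportions with pandas, assuming a `pres_vote` column like the one in `anes`; the `toy` frame and its values are hypothetical stand-ins, not the survey data:

```python
import pandas as pd

# Hypothetical stand-in for the `anes` survey data
toy = pd.DataFrame({"pres_vote": ["Biden", "Trump", "Biden", "Biden", "Trump"]})

# normalize=True returns proportions instead of raw counts
vote_share = toy["pres_vote"].value_counts(normalize=True)
print(vote_share)
```

On the real data the same call would be `anes["pres_vote"].value_counts(normalize=True)`.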
import plotly.graph_objects as go

# Choropleth of respondent counts per state
p1 = go.Figure(data=go.Choropleth(
    locations=state_counts['state_abb'],     # two-letter state codes
    z=state_counts['obs'].astype(float),     # number of respondents
    locationmode='USA-states',
    colorscale='bluyl',
    autocolorscale=False,
    marker_line_color='black',               # borders between states
    colorbar_title="Obs"
))
# Confine map to US states and update layout
p1.update_layout(
    title_text='Number of respondents by state',
    geo=dict(
        scope='usa',
        projection=go.layout.geo.Projection(type='albers usa'),
        showlakes=True),
)
#p1.show()
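The map above relies on a `state_counts` table with `state_abb` and `obs` columns, which the slides do not define. One plausible way to build it, sketched here on a hypothetical `toy` frame rather than the actual `anes` data:

```python
import pandas as pd

# Hypothetical stand-in for the respondent-level data
toy = pd.DataFrame({"state_abb": ["CA", "CA", "TX", "VA", "TX", "CA"]})

# Count respondents per state and name the count column `obs`
state_counts = (
    toy.groupby("state_abb")
       .size()
       .reset_index(name="obs")
)
print(state_counts)
```

Replacing `toy` with `anes` would give one row per state, ready to pass to `go.Choropleth`.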
import plotly.express as px

# Histogram of Biden voters by education, split by sex, with a marginal box plot
biden = anes.loc[anes['pres_vote'] == "Biden"]
p2 = px.histogram(
    biden, x="educ", color="sex", marginal="box",
    category_orders=dict(educ=['Less than high school credential',
                               'High school credential',
                               "Some post-high school, no bachelor's",
                               "Bachelor's degree",
                               "Graduate degree"]),
    hover_data=biden.columns)
# Horizontal legend above the plot; bars grouped side by side
p2 = p2.update_layout(
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
    barmode='group',
    legend_title_text="Sex",
    title_text='Support for Biden by Education Group and Sex')
p2.show()
### Recoding support for Biden
anes['pres_vote_b'] = anes['pres_vote'].map({'Trump': 0, 'Biden': 1})
anes_votes = (anes.groupby('state_abb')['pres_vote_b'].mean()*100).reset_index().rename(columns={'pres_vote_b': 'biden_share'})
anes_votes
state_abb biden_share
0 AK 12.500000
1 AL 35.164835
2 AR 30.952381
3 AZ 46.666667
4 CA 69.852941
5 CO 60.504202
6 CT 61.904762
7 DE 64.705882
8 FL 50.688705
9 GA 49.101796
10 HI 72.222222
11 IA 50.980392
12 ID 50.000000
13 IL 61.603376
14 IN 51.700680
15 KS 48.192771
16 KY 42.708333
17 LA 38.028169
18 MA 78.343949
19 MD 74.825175
20 ME 61.538462
21 MI 58.415842
22 MN 56.692913
23 MO 48.461538
24 MS 43.103448
25 MT 42.105263
26 NC 44.019139
27 ND 18.750000
28 NE 47.368421
29 NH 61.764706
30 NJ 66.165414
31 NM 57.142857
32 NV 57.142857
33 NY 65.371025
34 OH 44.588745
35 OK 33.783784
36 OR 78.750000
37 PA 54.000000
38 RI 50.000000
39 SC 52.577320
40 SD 23.076923
41 TN 38.931298
42 TX 47.890819
43 UT 48.529412
44 VA 60.869565
45 VT 75.000000
46 WA 71.621622
47 WI 54.385965
48 WV 31.250000
49 WY 75.000000
# Choropleth of the raw (unadjusted) Biden share per state
p3 = go.Figure(data=go.Choropleth(
    locations=anes_votes['state_abb'],
    z=anes_votes['biden_share'].astype(float),
    locationmode='USA-states',
    colorscale='blues',
    autocolorscale=False,
    marker_line_color='white',  # line markers between states
    colorbar_title="Biden's Two-Party Share"))
p3.update_layout(
    title_text="Raw Estimates of Support for Biden",
    geo=dict(
        scope='usa',
        projection=go.layout.geo.Projection(type='albers usa'),
        showlakes=True,  # lakes
        lakecolor='rgb(255, 255, 255)'),
)
\[\text{Pr}(y_i = 1) = \text{logit}^{-1} (\beta_0 + \alpha^{race, sex}_{j[i]} + \alpha^{age}_{k[i]} + \alpha^{edu}_{l[i]} + \alpha^{age, edu}_{k[i],l[i]} + \alpha^{state}_{s[i]})\]
\[\alpha_s^{state} \sim N(\alpha_{m[s]}^{region} + \beta^{prev} \cdot X^{prev}, \sigma^2_{state})\]
\[y^{MRP}_{\text{state } s} = \frac{\sum_{c\in s} N_c \theta_c}{\sum_{c\in s} N_c}\]
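The last formula is a population-weighted average of cell-level estimates within each state. A small sketch of that poststratification step, assuming the cell estimates `theta` have already been produced by the multilevel model; the cell counts `N` and the `theta` values below are hypothetical:

```python
import pandas as pd

# Hypothetical poststratification cells: N is the population count in each
# cell, theta is the model-based estimate of Biden support for that cell.
cells = pd.DataFrame({
    "state_abb": ["CA", "CA", "TX", "TX"],
    "N":         [300, 100, 200, 200],
    "theta":     [0.70, 0.50, 0.40, 0.60],
})

# Numerator: sum of N_c * theta_c; denominator: sum of N_c, both within state
cells["weighted"] = cells["N"] * cells["theta"]
totals = cells.groupby("state_abb")[["weighted", "N"]].sum()
mrp = totals["weighted"] / totals["N"]
print(mrp)
```

Cells with larger populations pull the state estimate toward their own `theta`, which is what corrects for a sample whose composition differs from the state's population.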
# Collapse respondents into demographic-by-state cells:
# pres_vote_b = number of Biden voters in the cell, n = cell size
cell_vars = ['state_abb', 'race_ethnicity', 'sex', 'educ', 'age_grp', 'region',
             'state_clinton']
anes_sub = (anes.groupby(cell_vars)['pres_vote_b']
                .agg(pres_vote_b='sum', n='count')
                .reset_index())
anes_sub[["state_abb", "race_ethnicity", "sex", "educ", "age_grp", "pres_vote_b", "n"]]
state_abb race_ethnicity sex ... age_grp pres_vote_b n
0 AK White Female ... 50-59 1 1
1 AK White Female ... 70 and older 0 1
2 AK White Female ... 70 and older 0 1
3 AK White Male ... 70 and older 0 1
4 AK White Male ... 18-29 0 1
... ... ... ... ... ... ... ..
2626 WV White Male ... 60-69 0 1
2627 WV White Male ... 70 and older 0 1
2628 WY White Female ... 60-69 2 2
2629 WY White Female ... 40-49 0 1
2630 WY White Male ... 70 and older 1 1
[2631 rows x 7 columns]
\[ \mathbf{y}(\mathbf{s}) = \mathbf{X}(\mathbf{s})\boldsymbol{\beta} + \boldsymbol{\omega}(\mathbf{s}) + \boldsymbol{\epsilon}(\mathbf{s}) \]
\[\boldsymbol\omega(\mathbf{s}) \sim \mathcal{N}(\mathbf{0},\sigma^2\mathbf{H}(\phi))\]
\[\boldsymbol\epsilon(\mathbf{s}) \sim \mathcal{N}(\mathbf{0}, \tau^2\mathbf{I})\]
Kriging models, named after statistician and mining engineer Daniel G. Krige, originated in mining and geostatistics, fields that work with spatially correlated data.
Create a smooth surface over two non-overlapping contiguous geographic regions.
Also known as Gaussian Random Field (GRF) or Gaussian Process (GP)
Other distributional frameworks or special cases of the Gaussian process can also be used, such as exponential, spherical, wave, or Matérn processes.
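To make the kriging idea concrete, here is a minimal NumPy sketch of prediction under an exponential covariance. Everything in it is an assumption for illustration: the coordinates and values are made up, the parameters `sigma2`, `phi`, and `tau2` are fixed rather than estimated, the covariance is parameterized as `sigma2 * exp(-d / phi)`, and the sample mean stands in for the regression term.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance sigma2 * exp(-d / phi) between point sets a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

# Hypothetical observed locations (2-D coordinates) and values
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.2, 0.5, 0.4, 0.9])

# Assumed (not estimated) parameters; tau2 = 0 means no measurement noise
sigma2, phi, tau2 = 1.0, 1.0, 0.0

# Covariance of the observations: spatial part plus noise on the diagonal
K = exp_cov(coords, coords, sigma2, phi) + tau2 * np.eye(len(y))
alpha = np.linalg.solve(K, y - y.mean())

def krige(new_coords):
    """Predict at new locations as mean + cross-covariance @ K^{-1} (y - mean)."""
    k_star = exp_cov(new_coords, coords, sigma2, phi)
    return y.mean() + k_star @ alpha

pred_at_obs = krige(coords)                     # predictions at the data points
pred_center = krige(np.array([[0.5, 0.5]]))     # prediction at the grid center
print(pred_at_obs, pred_center)
```

With `tau2 = 0` the predictor passes through the observed values; a positive `tau2` would smooth them instead. Swapping `exp_cov` for a spherical or Matérn function changes how quickly correlation decays with distance but leaves the prediction step unchanged.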
Advantages:
Le Bao · MDI Data Workshops · https://baole.io