name | longitude | latitude | aqi |
---|---|---|---|
River Terrace | -76.95807 | 38.89557 | 42 |
Rockville | -77.10688 | 39.11431 | 30 |
Springfield | -77.18347 | 38.76835 | 34 |
Mapping Spatial Dependence:
An Introduction to Spatial Data Analysis
Massive Data Institute, Georgetown University
March 19, 2024
R
(vs python
).name | longitude | latitude | aqi |
---|---|---|---|
River Terrace | -76.95807 | 38.89557 | 42 |
Rockville | -77.10688 | 39.11431 | 30 |
Springfield | -77.18347 | 38.76835 | 34 |
plot(dmv_air$longitude, dmv_air$latitude, pch=20,
col=heat.colors(3)[rank(dmv_air$aqi)],
xlim=c(-77.25, -76.75), main='Air Quality Index',
ylab = "Latitude", xlab = "Longitude", bg="white")
text(dmv_air$longitude, dmv_air$latitude, labels=dmv_air$name,
pos=4, cex=0.7, offset = 0.5)
legend("topright", legend=unique(aqi), fill=heat.colors(3)[order(unique(aqi))],
title="AQI", bg="white")
library(raster)
dc_extent <- extent(-77.12, -76.91, 38.79, 39.05)
ras <- raster(dc_extent, nrow=25, ncol=25)
values(ras) <- runif(ncell(ras))
ras
class : RasterLayer
dimensions : 25, 25, 625 (nrow, ncol, ncell)
resolution : 0.0084, 0.0104 (x, y)
extent : -77.12, -76.91, 38.79, 39.05 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : layer
values : 0.0003433817, 0.9985121 (min, max)
library(sp)
line_coords <- data.frame(
longitude = c(-77.3, -77.2086, -77.2367, -77.2555, -77.2865, -76.8),
latitude = c(38.92, 38.9041, 38.8954, 38.8816, 38.8688, 38.85)
)
line <- SpatialLines(list(Lines(Line(line_coords), ID="segment")))
line
class : SpatialLines
features : 1
extent : -77.3, -76.8, 38.85, 38.92 (xmin, xmax, ymin, ymax)
crs : NA
polygon_coords <- rbind(
c(-77.11976, 38.93434),
c(-77.04102, 38.99555),
c(-77.00255, 38.96553),
c(-76.90939, 38.89285),
c(-76.97950, 38.83781),
c(-77.03901, 38.79164),
c(-77.03907, 38.84127),
c(-77.03910, 38.86811),
c(-77.09020, 38.90421),
c(-77.11976, 38.93434)
)
dc_polygons <- SpatialPolygons(list(Polygons(list(Polygon(polygon_coords)),
ID="DC")))
plot(dc_polygons, main="Washington D.C.", col="lightblue", border="blue",
xlim = c(-77.3865, -76.8094), ylim = c(38.76835, 39.11431))
points(dmv_air$longitude, dmv_air$latitude, pch=20, col=heat.colors(3)[rank(dmv_air$aqi)])
text(dmv_air$longitude, dmv_air$latitude, labels=dmv_air$name, pos=4, cex=0.7, offset = 0.5)
lines(line_coords, col="blue", lwd=2)
Reading layer `Wards_from_2022' from data source
`/Users/baole/Desktop/spatial-workshop/data/Wards_from_2022/Wards_from_2022.shp'
using driver `ESRI Shapefile'
Simple feature collection with 8 features and 325 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -77.1198 ymin: 38.79164 xmax: -76.90915 ymax: 38.99597
Geodetic CRS: WGS 84
Reading layer `neighbourhoods' from data source
`/Users/baole/Desktop/spatial-workshop/data/neighbourhoods.geojson'
using driver `GeoJSON'
Simple feature collection with 39 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -77.11934 ymin: 38.81801 xmax: -76.90915 ymax: 38.99597
Geodetic CRS: WGS 84
Simple feature collection with 6 features and 325 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -77.08172 ymin: 38.79164 xmax: -76.90915 ymax: 38.9573
Geodetic CRS: WGS 84
WARD NAME REP_NAME
1 8 Ward 8 Trayon White, Sr.
2 6 Ward 6 Charles Allen
3 7 Ward 7 Vincent Gray
4 2 Ward 2 Brooke Pinto
5 1 Ward 1 Brianne Nadeau
6 5 Ward 5 Zachary Parker
WEB_URL
1 https://www.dccouncil.us/council/councilmember-trayon-white-sr
2 https://www.dccouncil.us/council/councilmember-allen
3 https://dccouncil.us/council/vincent-gray
4 https://www.dccouncil.us/council/ward-2-councilmember-brooke-pinto
5 https://dccouncil.us/council/brianne-nadeau
6 https://dccouncil.us/council/kenyan-mcduffie
REP_PHONE REP_EMAIL
1 (202) 724-8045 twhite@dccouncil.us
2 (202) 724-8072 callen@dccouncil.us
3 (202) 724-8068 vgray@dccouncil.us
4 (202) 724-8058 bpinto@dccouncil.us
5 (202) 724-8181 bnadeau@dccouncil.us
6 (202) 724-8028 zparker@dccouncil.gov
REP_OFFICE WARD_ID LABEL STUSAB SUMLEV
1 1350 Pennsylvania Ave, Suite 400, NW 20004 8 Ward 8 DC 610
2 1350 Pennsylvania Ave, Suite 110, NW 20004 6 Ward 6 DC 610
3 1350 Pennsylvania Ave, Suite 406, NW 20004 7 Ward 7 DC 610
4 1350 Pennsylvania Ave, Suite 106, NW 20004 2 Ward 2 DC 610
5 1350 Pennsylvania Ave, Suite 108, NW 20004 1 Ward 1 DC 610
6 1350 Pennsylvania Ave, Suite 102, NW 20004 5 Ward 5 DC 610
GEOID GEOCODE STATE POP100 HU100 P0010001 P0010002 P0010003 P0010004
1 610U600US11008 11008 11 85246 39164 85246 81405 8096 70959
2 610U600US11006 11006 11 84266 52768 84266 77187 50624 19562
3 610U600US11007 11007 11 85685 38968 85685 81893 7038 71628
4 610U600US11002 11002 11 89485 53217 89485 81571 57597 10196
5 610U600US11001 11001 11 85285 45694 85285 76084 42307 18741
6 610U600US11005 11005 11 89617 41794 89617 82833 22248 51307
P0010005 P0010006 P0010007 P0010008 P0010009 P0010010 P0010011 P0010012
1 305 751 34 1260 3841 3341 988 148
2 216 5028 54 1703 7079 6449 851 388
3 343 591 40 2253 3792 3299 935 140
4 317 9853 64 3544 7914 7407 726 359
5 696 5273 88 8979 9201 8497 956 330
6 495 2731 87 5965 6784 6057 1189 241
P0010013 P0010014 P0010015 P0010016 P0010017 P0010018 P0010019 P0010020
1 290 22 608 513 156 26 483 4
2 1528 47 2847 228 157 11 274 13
3 219 17 654 591 150 13 486 5
4 1666 34 4056 120 95 20 193 10
5 1294 38 4558 130 197 3 638 10
6 838 22 2146 396 260 12 736 7
P0010021 P0010022 P0010023 P0010024 P0010025 P0010026 P0010027 P0010028
1 0 46 30 24 3 427 217 32
2 0 31 31 38 5 518 125 73
3 2 32 29 19 7 444 233 42
4 2 61 21 42 2 431 97 51
5 8 239 21 61 14 609 155 50
6 2 133 23 38 14 630 242 40
P0010029 P0010030 P0010031 P0010032 P0010033 P0010034 P0010035 P0010036
1 5 66 7 1 33 18 20 1
2 4 106 11 0 106 20 33 0
3 1 53 5 0 24 10 21 9
4 2 81 9 0 96 23 34 4
5 5 114 24 0 145 16 49 0
6 1 123 8 1 81 30 42 1
P0010037 P0010038 P0010039 P0010040 P0010041 P0010042 P0010043 P0010044
1 9 3 5 3 3 0 0 0
2 8 3 17 5 5 1 0 0
3 10 3 15 6 7 2 1 0
4 5 3 9 3 8 4 0 0
5 7 3 19 2 9 6 0 0
6 8 4 25 10 3 9 1 0
P0010045 P0010046 P0010047 P0010048 P0010049 P0010050 P0010051 P0010052
1 2 2 66 15 3 34 2 6
2 0 1 101 18 0 57 7 14
3 1 1 43 12 0 23 0 3
4 0 2 74 25 0 39 1 2
5 1 4 87 10 1 52 2 10
6 0 1 92 17 0 55 1 7
P0010053 P0010054 P0010055 P0010056 P0010057 P0010058 P0010059 P0010060
1 0 0 4 0 2 0 0 0
2 0 0 1 0 2 2 0 0
3 0 1 2 0 1 0 0 0
4 0 0 2 0 4 1 0 0
5 0 0 5 0 4 2 1 0
6 0 0 3 3 0 4 1 0
P0010061 P0010062 P0010063 P0010064 P0010065 P0010066 P0010067 P0010068
1 0 0 7 4 1 0 1 0
2 0 0 10 4 5 0 0 0
3 1 0 6 0 4 0 1 1
4 0 0 1 1 0 0 0 0
5 0 0 8 1 3 0 1 2
6 1 0 5 4 0 0 0 0
P0010069 P0010070 P0010071 P0020001 P0020002 P0020003 P0020004 P0020005
1 1 0 0 85246 3081 82165 79340 7670
2 1 1 1 84266 6122 78144 73945 49107
3 0 0 0 85685 4078 81607 78876 6624
4 0 1 1 89485 9744 79741 75843 55391
5 1 0 0 85285 17269 68016 64204 40017
6 1 0 0 89617 10419 79198 75285 21217
P0020006 P0020007 P0020008 P0020009 P0020010 P0020011 P0020012 P0020013
1 70357 213 731 29 340 2825 2475 935
2 19243 121 4977 50 447 4199 3885 800
3 70969 239 568 32 444 2731 2384 907
4 10000 120 9803 54 475 3898 3669 695
5 18330 163 5202 70 422 3812 3506 862
6 50554 218 2687 76 533 3913 3516 1122
P0020014 P0020015 P0020016 P0020017 P0020018 P0020019 P0020020 P0020021
1 132 272 19 104 482 152 23 297
2 338 1492 40 630 206 153 11 156
3 93 205 11 131 571 143 13 262
4 297 1633 27 662 99 87 20 85
5 238 1279 30 548 118 189 3 172
6 187 802 14 338 381 251 11 349
P0020022 P0020023 P0020024 P0020025 P0020026 P0020027 P0020028 P0020029
1 4 0 12 29 14 0 315 194
2 12 0 2 22 23 0 268 97
3 3 2 6 29 8 0 326 202
4 9 2 2 15 36 0 200 72
5 8 7 2 19 29 2 275 103
6 7 1 9 22 21 1 368 197
P0020030 P0020031 P0020032 P0020033 P0020034 P0020035 P0020036 P0020037
1 30 5 37 3 0 7 15 6
2 62 3 38 8 0 5 18 11
3 36 1 35 5 0 4 10 4
4 45 2 23 6 0 3 19 7
5 48 3 48 19 0 4 15 2
6 33 1 53 7 0 1 26 13
P0020038 P0020039 P0020040 P0020041 P0020042 P0020043 P0020044 P0020045
1 0 8 2 2 3 2 0 0
2 0 7 3 5 5 5 1 0
3 0 8 2 7 6 5 0 1
4 3 4 2 6 3 4 1 0
5 0 6 3 11 2 8 1 0
6 1 6 4 8 10 0 7 1
P0020046 P0020047 P0020048 P0020049 P0020050 P0020051 P0020052 P0020053
1 0 0 1 32 13 3 10 1
2 0 0 0 40 16 0 9 6
3 0 0 0 19 10 0 7 0
4 0 0 0 28 20 0 2 1
5 0 0 2 27 5 0 10 2
6 0 0 0 27 13 0 8 0
P0020054 P0020055 P0020056 P0020057 P0020058 P0020059 P0020060 P0020061
1 5 0 0 0 0 0 0 0
2 8 0 0 0 0 0 1 0
3 1 0 1 0 0 0 0 0
4 2 0 0 0 0 2 1 0
5 7 0 0 0 0 1 2 0
6 5 0 0 0 0 0 0 1
P0020062 P0020063 P0020064 P0020065 P0020066 P0020067 P0020068 P0020069
1 0 0 0 3 2 0 0 1
2 0 0 0 6 4 1 0 0
3 0 0 0 2 0 0 0 1
4 0 0 0 1 1 0 0 0
5 0 0 0 4 1 0 0 1
6 0 0 0 2 2 0 0 0
P0020070 P0020071 P0020072 P0020073 P0030001 P0030002 P0030003 P0030004
1 0 0 0 0 62507 59754 6972 50933
2 0 1 0 0 74262 68775 45847 16404
3 1 0 0 0 66683 63965 5876 55656
4 0 0 0 0 83628 76770 54915 9062
5 2 0 0 0 75254 67609 39386 15702
6 0 0 0 0 74096 69224 19777 41961
P0030005 P0030006 P0030007 P0030008 P0030009 P0030010 P0030011 P0030012
1 237 646 25 941 2753 2413 675 111
2 177 4765 50 1532 5487 4988 610 332
3 265 504 33 1631 2718 2375 667 90
4 272 9442 53 3026 6858 6476 630 322
5 543 4954 82 6942 7645 7102 757 291
6 414 2490 75 4507 4872 4360 798 193
P0030013 P0030014 P0030015 P0030016 P0030017 P0030018 P0030019 P0030020
1 177 19 464 378 104 19 375 4
2 966 42 2422 181 116 7 221 12
3 114 12 498 469 92 11 352 3
4 1372 31 3677 97 61 16 159 10
5 1001 31 3905 94 167 3 554 9
6 480 14 1605 327 181 10 582 6
P0030021 P0030022 P0030023 P0030024 P0030025 P0030026 P0030027 P0030028
1 0 41 25 19 2 293 161 14
2 0 24 24 30 1 404 101 48
3 1 21 24 14 7 306 159 33
4 1 48 20 30 2 317 73 32
5 6 203 19 49 13 469 129 35
6 2 100 22 28 12 449 181 23
P0030029 P0030030 P0030031 P0030032 P0030033 P0030034 P0030035 P0030036
1 5 45 4 1 24 10 9 0
2 4 91 4 0 96 11 18 0
3 1 36 2 0 15 6 14 8
4 1 58 7 0 79 16 20 4
5 4 87 15 0 134 8 29 0
6 0 95 3 1 59 23 25 1
P0030037 P0030038 P0030039 P0030040 P0030041 P0030042 P0030043 P0030044
1 8 3 3 2 2 0 0 0
2 5 3 12 4 5 1 0 0
3 7 2 12 2 7 2 0 0
4 5 3 7 2 5 3 0 0
5 4 2 8 1 4 6 0 0
6 5 2 20 7 0 2 1 0
P0030045 P0030046 P0030047 P0030048 P0030049 P0030050 P0030051 P0030052
1 0 2 43 10 3 24 1 2
2 0 1 88 15 0 52 5 12
3 0 0 32 7 0 20 0 3
4 0 2 64 24 0 33 1 0
5 0 3 68 4 1 44 2 8
6 0 1 60 11 0 40 0 3
P0030053 P0030054 P0030055 P0030056 P0030057 P0030058 P0030059 P0030060
1 0 0 1 0 2 0 0 0
2 0 0 1 0 2 1 0 0
3 0 0 1 0 1 0 0 0
4 0 0 2 0 3 1 0 0
5 0 0 3 0 4 2 0 0
6 0 0 2 3 0 1 0 0
P0030061 P0030062 P0030063 P0030064 P0030065 P0030066 P0030067 P0030068
1 0 0 4 3 1 0 0 0
2 0 0 7 3 4 0 0 0
3 0 0 5 0 4 0 0 1
4 0 0 1 1 0 0 0 0
5 0 0 6 1 3 0 0 2
6 0 0 3 3 0 0 0 0
P0030069 P0030070 P0030071 P0040001 P0040002 P0040003 P0040004 P0040005
1 0 0 0 62507 2195 60312 58274 6649
2 0 0 0 74262 5257 69005 65905 44499
3 0 0 0 66683 2938 63745 61796 5529
4 0 0 0 83628 8618 75010 71714 52844
5 0 0 0 75254 13901 61353 58271 37464
6 0 0 0 74096 7776 66320 63538 18976
P0040006 P0040007 P0040008 P0040009 P0040010 P0040011 P0040012 P0040013
1 50546 178 628 23 250 2038 1779 651
2 16156 102 4722 48 378 3100 2854 581
3 55244 197 484 25 317 1949 1718 648
4 8908 111 9397 46 408 3296 3121 605
5 15389 129 4890 65 334 3082 2860 690
6 41437 180 2453 69 423 2782 2492 763
P0040014 P0040015 P0040016 P0040017 P0040018 P0040019 P0040020 P0040021
1 99 168 17 83 358 100 16 241
2 303 947 36 514 169 114 7 131
3 65 106 10 98 454 86 11 201
4 271 1344 26 585 83 59 16 80
5 218 993 24 473 86 163 3 150
6 151 462 7 261 317 175 9 292
P0040022 P0040023 P0040024 P0040025 P0040026 P0040027 P0040028 P0040029
1 4 0 7 25 10 0 233 152
2 11 0 2 18 21 0 207 82
3 1 1 5 24 8 0 218 136
4 9 1 1 15 26 0 150 56
5 7 5 2 18 26 2 203 88
6 6 1 9 21 17 1 273 157
P0040030 P0040031 P0040032 P0040033 P0040034 P0040035 P0040036 P0040037
1 14 5 26 3 0 6 7 4
2 40 3 34 4 0 5 10 9
3 27 1 21 2 0 2 6 2
4 26 1 17 5 0 3 15 7
5 34 2 37 12 0 3 8 2
6 17 0 45 3 0 1 20 9
P0040038 P0040039 P0040040 P0040041 P0040042 P0040043 P0040044 P0040045
1 0 7 2 2 2 2 0 0
2 0 4 3 3 4 5 1 0
3 0 7 1 6 2 5 0 0
4 3 4 2 5 2 4 0 0
5 0 3 2 4 1 4 1 0
6 1 3 2 7 7 0 0 1
P0040046 P0040047 P0040048 P0040049 P0040050 P0040051 P0040052 P0040053
1 0 0 1 25 9 3 10 1
2 0 0 0 35 13 0 8 5
3 0 0 0 12 6 0 5 0
4 0 0 0 24 20 0 1 1
5 0 0 2 16 1 0 5 2
6 0 0 0 16 7 0 6 0
P0040054 P0040055 P0040056 P0040057 P0040058 P0040059 P0040060 P0040061
1 2 0 0 0 0 0 0 0
2 8 0 0 0 0 0 1 0
3 1 0 0 0 0 0 0 0
4 0 0 0 0 0 1 1 0
5 5 0 0 0 0 1 2 0
6 3 0 0 0 0 0 0 0
P0040062 P0040063 P0040064 P0040065 P0040066 P0040067 P0040068 P0040069
1 0 0 0 1 1 0 0 0
2 0 0 0 4 3 1 0 0
3 0 0 0 1 0 0 0 0
4 0 0 0 1 1 0 0 0
5 0 0 0 3 1 0 0 0
6 0 0 0 1 1 0 0 0
P0040070 P0040071 P0040072 P0040073 H0010001 H0010002 H0010003 P0050001
1 0 0 0 0 39164 34608 4556 3362
2 0 0 0 0 52768 45285 7483 1991
3 1 0 0 0 38968 34958 4010 4528
4 0 0 0 0 53217 46543 6674 13218
5 2 0 0 0 45694 41035 4659 5066
6 0 0 0 0 41794 37211 4583 6759
P0050002 P0050003 P0050004 P0050005 P0050006 P0050007 P0050008 P0050009
1 1054 23 32 750 249 2308 0 563
2 194 31 29 134 0 1797 655 255
3 2557 2197 34 289 37 1971 0 0
4 205 27 0 178 0 13013 11470 0
5 305 0 173 132 0 4761 4009 0
6 1007 0 40 967 0 5752 3053 0
P0050010 OBJECTID GLOBALID CREATED_US
1 1745 1 {E31550AE-6FAE-4B74-909F-52B283BFAF68} <NA>
2 887 2 {765C4F49-9292-4BDB-AA24-39F4EE43359F} <NA>
3 1971 3 {73F07042-7D7F-452B-9BB3-0F87B0EC5418} <NA>
4 1543 4 {7F8C2A51-427C-45FC-91EB-9693656AED9C} <NA>
5 752 5 {C3C6E2E7-E68D-49B2-970C-D60675EA7B4B} <NA>
6 2699 6 {6C10DD95-DE70-4F26-94BA-79F378FA74E0} <NA>
CREATED_DA LAST_EDITE LAST_EDI_1 SHAPEAREA SHAPELEN
1 <NA> <NA> <NA> 0 0
2 <NA> JLAY 2023-12-07 0 0
3 <NA> <NA> <NA> 0 0
4 <NA> <NA> <NA> 0 0
5 <NA> JLAY 2023-12-07 0 0
6 <NA> JLAY 2023-12-07 0 0
geometry
1 POLYGON ((-76.99392 38.8777...
2 POLYGON ((-77.00908 38.8716...
3 POLYGON ((-76.94186 38.9185...
4 POLYGON ((-77.0327 38.88262...
5 POLYGON ((-77.03523 38.9374...
6 POLYGON ((-76.99144 38.9573...
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Tobler’s first law of geography
“Everything is related to everything else, but near things are more related than distant things.”
- Tobler (1970)
Examples of spatial dependence: crime patterns, disease spread, housing market, political orientations, transportation, environmental equity, etc.
We oftentimes even use spatial dependence without realizing it:
A vast number of spatial analysis methods are based on a spatial matrix \mathbf{W}_{ij} of size n \times n with i an observation and j the “neighbours” of that observation. \mathbf{W}_{ij} represents the degree of spatial relationship between i and j.
Classically one can define :
id <- 25 # select an area
columbus$neighbors <- "other"
columbus$neighbors[id] <- "area"
columbus$neighbors[nb[[id]]] <- "neighbors"
cols <- c("gray30", "gray", "white")
plot(st_geometry(columbus), col=cols[as.factor(columbus$neighbors)], main="Spatial Neighborhoods")
legend("topright", legend=c("Selected Area", "Neighbors", "Other"), fill=cols, bty="n")
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000000 0.5987183 0.5796856 0.7480069 1.0878333 1.2779137
[2,] 0.5987183 0.0000000 0.7118774 0.3397537 0.9983315 1.5134864
[3,] 0.5796856 0.7118774 0.0000000 0.5609564 0.5233702 0.8053411
[4,] 0.7480069 0.3397537 0.5609564 0.0000000 0.6901508 1.3018772
[5,] 1.0878333 0.9983315 0.5233702 0.6901508 0.0000000 0.7533395
[6,] 1.2779137 1.5134864 0.8053411 1.3018772 0.7533395 0.0000000
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000000 1.6702347 1.725073 1.3368861 0.9192585 0.7825254
[2,] 1.6702347 0.0000000 1.404736 2.9433086 1.0016712 0.6607261
[3,] 1.7250730 1.4047363 0.000000 1.7826697 1.9106935 1.2417099
[4,] 1.3368861 2.9433086 1.782670 0.0000000 1.4489588 0.7681216
[5,] 0.9192585 1.0016712 1.910694 1.4489588 0.0000000 1.3274227
[6,] 0.7825254 0.6607261 1.241710 0.7681216 1.3274227 0.0000000
dc_wards$neighbors <- "other"
dc_wards$neighbors[dc_wards$WARD == 2] <- "selected area"
dc_wards$neighbors[knb[dc_wards$WARD == 2][[1]]] <- "neighbors"
cols <- c("gray30", "gray", "white")
plot(st_geometry(dc_wards), col=cols[as.factor(dc_wards$neighbors)], main="Spatial Neighborhoods")
legend("topright", legend=c("Selected Area", "Neighbors", "Other"), fill=cols, bty="n")
Spatial autocorrelation is used to describe the extent to which a variable is correlated with itself through space.
Global vs. local spatial autocorrelation
Statistical tests
I = \frac{n \sum\limits_i \sum\limits_j w_{ij}(Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum\limits_{i \neq j} w_{ij}\right) \sum\limits_i (Y_i - \bar{Y})^2}
Expected value of Moran’s I: E(I) = \frac{-1}{n-1}
Test statistic: z = \frac{I - E(I)}{\sqrt{Var(I)}}
I_i = \frac{n(Y_i - \bar{Y})}{\sum_j (Y_j - \bar{Y})^2} \sum_j w_{ij}(Y_j - \bar{Y})
I = \frac{1}{\sum\limits_{i \neq j} w_{ij}} \sum\limits_{i} I_i
boston <- st_read(system.file("shapes/boston_tracts.shp",
package = "spData"), quiet = TRUE)
head(boston[,1:7])
Simple feature collection with 6 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -71.1753 ymin: 42.33 xmax: -71.1238 ymax: 42.3737
Geodetic CRS: NAD27
poltract TOWN TOWNNO TRACT LON LAT MEDV
1 0001 Boston Allston-Brighton 74 1 -71.0830 42.2172 17.8
2 0002 Boston Allston-Brighton 74 2 -71.0950 42.2120 21.7
3 0003 Boston Allston-Brighton 74 3 -71.1007 42.2100 22.7
4 0004 Boston Allston-Brighton 74 4 -71.0930 42.2070 22.6
5 0005 Boston Allston-Brighton 74 5 -71.0905 42.2033 25.0
6 0006 Boston Allston-Brighton 74 6 -71.0865 42.2100 19.9
geometry
1 POLYGON ((-71.1238 42.3689,...
2 POLYGON ((-71.1546 42.3573,...
3 POLYGON ((-71.1685 42.3601,...
4 POLYGON ((-71.15391 42.3461...
5 POLYGON ((-71.1479 42.337, ...
6 POLYGON ((-71.1382 42.3535,...
nb <- poly2nb(boston)
nbw <- nb2listw(nb, style = "W")
# Global Moran's I
gmoran <- moran.test(boston$MEDV, nbw,
alternative = "greater")
gmoran
Moran I test under randomisation
data: boston$MEDV
weights: nbw
Moran I statistic standard deviate = 23.35, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.6266753872 -0.0019801980 0.0007248686
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 -0.3457508492 -5.254157e-04 3.275376e-02 -1.90753363 5.645152e-02
2 0.0175875407 -1.626873e-05 2.045711e-03 0.38921049 6.971204e-01
3 0.0123379633 -6.557001e-07 4.089699e-05 1.92939381 5.368199e-02
4 -0.0001654033 -1.059064e-07 1.331742e-05 -0.04529559 9.638717e-01
5 0.3591628595 -1.427815e-04 7.898947e-03 4.04277384 5.282257e-05
6 0.0545610965 -1.625936e-04 1.357382e-02 0.46970410 6.385664e-01
id name
1 3686 Home in Washington · ★4.64 · 1 bedroom · 1 bed · 1 private bath
2 3943 Townhouse in Washington · ★4.83 · 1 bedroom · 1 bed · 1 private bath
3 4197 Home in Washington · ★4.85 · 1 bedroom · 1 bed · 1.5 shared baths
4 4529 Home in Washington · ★4.66 · 1 bedroom · 1 bed · 1 shared bath
5 4967 Home in Washington · ★4.74 · 1 bedroom · 1 bed · 3 baths
6 5589 Rental unit in Washington · ★4.50 · 1 bedroom · 1 bed · 1 bath
host_id host_name neighbourhood_group
1 4645 Vita NA
2 5059 Vasa NA
3 5061 Sandra NA
4 5803 Bertina NA
5 7086 Edward NA
6 6527 Ami NA
neighbourhood latitude longitude
1 Historic Anacostia 38.86339 -76.98889
2 Edgewood, Bloomingdale, Truxton Circle, Eckington 38.91195 -77.00456
3 Capitol Hill, Lincoln Park 38.88719 -76.99472
4 Eastland Gardens, Kenilworth 38.90585 -76.94469
5 Ivy City, Arboretum, Trinidad, Carver Langston 38.91217 -76.99249
6 Kalorama Heights, Adams Morgan, Lanier Heights 38.91887 -77.04008
room_type price minimum_nights number_of_reviews last_review
1 Private room 67 31 84 2023-08-30
2 Private room 66 1 495 2023-11-22
3 Private room 135 7 58 2023-11-11
4 Private room 56 30 102 2019-07-05
5 Private room 2500 1125 30 2016-09-22
6 Entire home/apt 60 31 96 2023-08-17
reviews_per_month calculated_host_listings_count availability_365
1 0.53 1 365
2 2.78 5 252
3 0.33 2 321
4 0.58 2 179
5 0.19 3 365
6 0.55 1 121
number_of_reviews_ltm license
1 3
2 46 Hosted License: 5007242201001033
3 6 Hosted License: 5007242201000749
4 0 Exempt
5 0
6 1
Simple feature collection with 6 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -77.03515 ymin: 38.81801 xmax: -76.93494 ymax: 38.90736
Geodetic CRS: WGS 84
neighbourhood
1 Congress Heights, Bellevue, Washington Highlands
2 Douglas, Shipley Terrace
3 Woodland/Fort Stanton, Garfield Heights, Knox Hill
4 Near Southeast, Navy Yard
5 River Terrace, Benning, Greenway, Dupont Park
6 Downtown, Chinatown, Penn Quarters, Mount Vernon Square, North Capitol Street
neighbourhood_group geometry
1 <NA> MULTIPOLYGON (((-77.01389 3...
2 <NA> MULTIPOLYGON (((-76.99291 3...
3 <NA> MULTIPOLYGON (((-76.97714 3...
4 <NA> MULTIPOLYGON (((-76.97936 3...
5 <NA> MULTIPOLYGON (((-76.9376 38...
6 <NA> MULTIPOLYGON (((-77.0149 38...
# Select only one bedroom listings
listings_1b <- listings[grep("1 bedroom", listings$name), ]
listings_avgs <- aggregate(price ~ neighbourhood, data = listings_1b, FUN = mean)
dc_nb_avgs <- merge(dc_neighborhood, listings_avgs, by.x = "neighbourhood", by.y = "neighbourhood", all.x = TRUE)
head(dc_nb_avgs)
Simple feature collection with 6 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -77.08773 ymin: 38.87285 xmax: -76.91322 ymax: 38.96315
Geodetic CRS: WGS 84
neighbourhood
1 Brightwood Park, Crestwood, Petworth
2 Brookland, Brentwood, Langdon
3 Capitol Hill, Lincoln Park
4 Capitol View, Marshall Heights, Benning Heights
5 Cathedral Heights, McLean Gardens, Glover Park
6 Cleveland Park, Woodley Park, Massachusetts Avenue Heights, Woodland-Normanstone Terrace
neighbourhood_group price geometry
1 <NA> 82.11215 MULTIPOLYGON (((-77.01904 3...
2 <NA> 95.69841 MULTIPOLYGON (((-76.99647 3...
3 <NA> 117.31471 MULTIPOLYGON (((-77.00909 3...
4 <NA> 69.32000 MULTIPOLYGON (((-76.93291 3...
5 <NA> 148.59524 MULTIPOLYGON (((-77.07651 3...
6 <NA> 106.66667 MULTIPOLYGON (((-77.05395 3...
dc_nb_ctr <- st_centroid(dc_nb_avgs)
nb <- dnearneigh(dc_nb_ctr, 0, 5)
nbw <- nb2listw(nb, style="W")
gmoran <- moran.test(dc_nb_avgs$price, nbw,
alternative = "greater")
gmoran
Moran I test under randomisation
data: dc_nb_avgs$price
weights: nbw
Moran I statistic standard deviate = 5.1457, p-value = 1.332e-07
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.241062069 -0.026315789 0.002699959
“Closest observation” (naive estimates)
Thiessen Polygons
\hat{y}(s_0) = \frac{\sum_{i=1}^{n} y(s_i) \times (1/d_i^{\beta})}{\sum_{i=1}^{n} (1/d_i^{\beta})} = \sum_{i=1}^{n} y(s_i) \times w_i
\hat{y}(s_0) = \frac{\sum_{i=1}^{k} y(s_i)}{k}
library(ggplot2)
us_county <- st_read("./data/cb_2018_us_county_500k/cb_2018_us_county_500k.shp", quiet=T)
dmv_county <- us_county[us_county$STATEFP %in% c("11", "24", "51"),]
dmv_county
Simple feature collection with 158 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -83.67539 ymin: 36.54074 xmax: -75.04894 ymax: 39.72304
Geodetic CRS: NAD83
First 10 features:
STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME LSAD
58 11 001 01702382 0500000US11001 11001 District of Columbia 00
145 24 003 01710958 0500000US24003 24003 Anne Arundel 06
296 51 015 01480098 0500000US51015 51015 Augusta 06
297 51 037 01492442 0500000US51037 51037 Charlotte 06
298 51 051 01497376 0500000US51051 51051 Dickenson 06
299 51 053 01690739 0500000US51053 51053 Dinwiddie 06
300 51 069 01480124 0500000US51069 51069 Frederick 06
301 51 073 01480126 0500000US51073 51073 Gloucester 06
302 51 085 01480132 0500000US51085 51085 Hanover 06
303 51 099 01480137 0500000US51099 51099 King George 06
ALAND AWATER geometry
58 158340391 18687198 MULTIPOLYGON (((-77.11976 3...
145 1074287145 448099595 MULTIPOLYGON (((-76.84036 3...
296 2504708478 9537636 MULTIPOLYGON (((-79.53328 3...
297 1231010119 5809378 MULTIPOLYGON (((-78.90459 3...
298 855860629 8082181 MULTIPOLYGON (((-82.55383 3...
299 1305091846 8954562 MULTIPOLYGON (((-77.90034 3...
300 1070101640 5935153 MULTIPOLYGON (((-78.54022 3...
301 564117808 181754414 MULTIPOLYGON (((-76.7063 37...
302 1211818312 12652671 MULTIPOLYGON (((-77.78648 3...
303 465122891 21198264 MULTIPOLYGON (((-77.34744 3...
Simple feature collection with 35 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -82.1641 ymin: 36.608 xmax: -75.79732 ymax: 39.70595
Geodetic CRS: WGS 84
First 10 features:
Stat_Nm Site_Nm Lcl_S_N
1 District Of Columbia 41 RIVER TERRACE
2 District Of Columbia 43 MCMILLAN NCore-PAMS
3 District Of Columbia 51 Near Road
4 District Of Columbia 53 King Greenleaf Rec Center
5 Maryland 2 Millington
6 Maryland 2 Piney Run
7 Maryland 3 Fair Hill Natural Resource Management Area
8 Maryland 4 Horn Point
9 Maryland 6 Howard County Near Road
10 Maryland 9 Hagerstown
annl_mn geometry
1 9.661741 POINT (-76.95807 38.89557)
2 8.073839 POINT (-77.01318 38.92185)
3 8.695211 POINT (-76.95343 38.89477)
4 8.934325 POINT (-77.01282 38.87516)
5 6.922022 POINT (-75.79732 39.30502)
6 6.403240 POINT (-79.012 39.70595)
7 7.459565 POINT (-75.86477 39.70298)
8 6.918579 POINT (-76.14101 38.58752)
9 8.705889 POINT (-76.84611 39.14313)
10 7.068597 POINT (-77.72024 39.56418)
Simple feature collection with 35 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -82.1641 ymin: 36.608 xmax: -75.79732 ymax: 39.70595
Geodetic CRS: NAD83
First 10 features:
Stat_Nm Site_Nm Lcl_S_N
1 District Of Columbia 41 RIVER TERRACE
2 District Of Columbia 43 MCMILLAN NCore-PAMS
3 District Of Columbia 51 Near Road
4 District Of Columbia 53 King Greenleaf Rec Center
5 Maryland 2 Millington
6 Maryland 2 Piney Run
7 Maryland 3 Fair Hill Natural Resource Management Area
8 Maryland 4 Horn Point
9 Maryland 6 Howard County Near Road
10 Maryland 9 Hagerstown
annl_mn geometry
1 9.661741 POINT (-76.95807 38.89557)
2 8.073839 POINT (-77.01318 38.92185)
3 8.695211 POINT (-76.95343 38.89477)
4 8.934325 POINT (-77.01282 38.87516)
5 6.922022 POINT (-75.79732 39.30502)
6 6.403240 POINT (-79.012 39.70595)
7 7.459565 POINT (-75.86477 39.70298)
8 6.918579 POINT (-76.14101 38.58752)
9 8.705889 POINT (-76.84611 39.14313)
10 7.068597 POINT (-77.72024 39.56418)
(georgetown$pm25_idw <- sum(dmv_air$annl_mn / st_distance(georgetown, dmv_air)^2) /
sum(1 / st_distance(georgetown, dmv_air)^2))
8.845472 [1]
library(gstat)
gstat_idw <- gstat(id = "pm25_idw", formula = annl_mn ~ 1, locations = dmv_air)
predict(gstat_idw, newdata = georgetown)
[inverse distance weighted interpolation]
Simple feature collection with 1 feature and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -77.0654 ymin: 38.9097 xmax: -77.0654 ymax: 38.9097
Geodetic CRS: NAD83
pm25_idw.pred pm25_idw.var geometry
1 8.845469 NA POINT (-77.0654 38.9097)
gstat_knn <- gstat(id = "pm25_knn", formula = annl_mn ~ 1, locations = dmv_air, nmax = 3,
set = list(idp = 0)) # idp: inverse distance power; idp=0 means all three neighbors are equally weighted.
(georgetown$pm25_knn <- predict(gstat_knn, newdata = georgetown)$pm25_knn.pred)
[inverse distance weighted interpolation]
[1] 8.886756
dmv_bbox <- st_bbox(dmv_county)
x <- seq(dmv_bbox$xmin, dmv_bbox$xmax, length.out = 50)
y <- seq(dmv_bbox$ymin, dmv_bbox$ymax, length.out = 25)
dc_locs <- expand.grid(x = x, y = y)
dc_pred0 <- st_as_sf(dc_locs, coords = c("x", "y"), crs = crs(dmv_county))
dc_pred <- st_intersection(dmv_county, dc_pred0)
[inverse distance weighted interpolation]
ggplot() +
geom_sf(data = dmv_county, color = "grey25", fill = "grey75", alpha = 0.55) +
geom_sf(data = dc_pred, aes(color = pm25_idw), size = 1.5) +
geom_sf(data = dmv_air, aes(color = annl_mn), size = 3) +
scale_color_gradient(low = "green", high = "red") +
labs(color = "PM2.5", x = "", y = "") +
theme_bw()
[inverse distance weighted interpolation]
ggplot() +
geom_sf(data = dmv_county, color = "grey25", fill = "grey75", alpha = 0.55) +
geom_sf(data = dmv_raster_sf, aes(fill = pm25_idw), size = 1) +
geom_sf_label(data = dmv_air, aes(label = round(annl_mn, 1), fill = annl_mn), size = 5, alpha=0.75) +
scale_color_gradient(low = "green", high = "red") +
scale_fill_gradient(low = "green", high = "red") +
labs(fill = "PM2.5", x = "", y = "") +
guides(color = guide_legend(order = 1)) +
theme_bw() +
theme(legend.title = element_text(size = 12),
legend.text = element_text(size = 10))
Suppose we want to predict y(s_0) based observed data y(s_1), y(s_2), ..., y(s_n). The ordinary kriging estimator is defined as the linear unbiased estimator: \hat{y}(s_0) = \sum_{i=1}^n \lambda_i y(\mathbf{s}_i)
that minizes the mean squared prediction error defined as E[(\hat{y}(s_0) - y(s_0))^2]
y(\mathbf{s}) = \mu(\mathbf{s}) + \omega(\mathbf{s}) + \epsilon(\mathbf{s})
\mu(\mathbf{s}) = \mathbf{X}(\mathbf{s})\beta
\omega(\mathbf{s}) \sim N(0, \sigma^2\mathbf{H}(\theta))
Partial sill: \sigma^2 is the variance driven by distance between two observations
Decay: \theta is the level of corelation among observations based on distance.
\sigma(\mathbf{s}) \sim N(0, \tau^2\mathbf{I})
pred_loc <- st_as_sf(data.frame(long = -71.1456, lat = 42.3915),
coords = c("long", "lat"), crs = crs(boston))
predict(kmod, pred_loc)
[using ordinary kriging]
Simple feature collection with 1 feature and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -71.1456 ymin: 42.3915 xmax: -71.1456 ymax: 42.3915
Geodetic CRS: NAD27
var1.pred var1.var geometry
1 21.4299 18.91049 POINT (-71.1456 42.3915)
Call:
lm(formula = log(MEDV) ~ RM + CRIM, data = boston)
Residuals:
Min 1Q Median 3Q Max
-1.03618 -0.11069 0.01723 0.13889 1.31727
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.121864 0.112901 9.937 <2e-16 ***
RM 0.315514 0.017661 17.865 <2e-16 ***
CRIM -0.019438 0.001443 -13.474 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2721 on 503 degrees of freedom
Multiple R-squared: 0.5587, Adjusted R-squared: 0.557
F-statistic: 318.4 on 2 and 503 DF, p-value: < 2.2e-16
nb <- poly2nb(boston)
nbw <- nb2listw(nb, style = "W")
moran_res <- moran.mc(boston$residuals, nbw, 1000)
moran_res
Monte-Carlo simulation of Moran I
data: boston$residuals
weights: nbw
number of simulations + 1: 1001
statistic = 0.54595, observed rank = 1001, p-value = 0.000999
alternative hypothesis: greater
Also known as the spatial autoregressive model (SAR)
Incorporating spatial dependence in the outcome y = \rho \mathbf{W} y + \mathbf{X} \beta + \epsilon
library(spatialreg)
lagsar_fit <- lagsarlm(log(MEDV) ~ RM + CRIM, data=boston, nbw)
summary(lagsar_fit)
Call:lagsarlm(formula = log(MEDV) ~ RM + CRIM, data = boston, listw = nbw)
Residuals:
Min 1Q Median 3Q Max
-0.6189521 -0.1004647 -0.0049353 0.0898597 1.0161974
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.2810648 0.0838293 -3.3528 0.0007999
RM 0.1977215 0.0132364 14.9377 < 2.2e-16
CRIM -0.0070405 0.0010320 -6.8219 8.986e-12
Rho: 0.68944, LR test value: 384.41, p-value: < 2.22e-16
Asymptotic standard error: 0.027224
z-value: 25.325, p-value: < 2.22e-16
Wald statistic: 641.35, p-value: < 2.22e-16
Log likelihood: 134.3822 for lag model
ML residual variance (sigma squared): 0.030777, (sigma: 0.17543)
Number of observations: 506
Number of parameters estimated: 5
AIC: -258.76, (AIC for lm: 123.65)
LM test for residual autocorrelation
test value: 6.7847, p-value: 0.0091942
Also called spatial moving average models
Incorporating spatial dependence in the error term y = \mathbf{X} \beta + u u = \lambda \mathbf{W} u + \epsilon
ersar_fit <- errorsarlm(log(MEDV) ~ RM + CRIM, data=boston, nbw, tol.solve=1.0e-30)
summary(ersar_fit)
Call:errorsarlm(formula = log(MEDV) ~ RM + CRIM, data = boston, listw = nbw,
tol.solve = 1e-30)
Residuals:
Min 1Q Median 3Q Max
-0.9022086 -0.0739408 -0.0003695 0.0841594 0.9155054
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.4309386 0.1011931 14.1407 < 2.2e-16
RM 0.2544088 0.0141162 18.0224 < 2.2e-16
CRIM -0.0071259 0.0012310 -5.7886 7.099e-09
Lambda: 0.85125, LR test value: 390.82, p-value: < 2.22e-16
Asymptotic standard error: 0.02544
z-value: 33.462, p-value: < 2.22e-16
Wald statistic: 1119.7, p-value: < 2.22e-16
Log likelihood: 137.5884 for error model
ML residual variance (sigma squared): 0.027826, (sigma: 0.16681)
Number of observations: 506
Number of parameters estimated: 5
AIC: -265.18, (AIC for lm: 123.65)
room_type
, number_of_reviews
, reviews_per_month
, calculated_host_listings_count
, availability_365
.Allowing the regression coefficients to vary spatially y_i = \beta_0(u_i, v_i) + \sum_{k=1}^p \beta_k(u_i, v_i) x_{ik} + \epsilon_i
Estimating local regression coefficients at each location
Capturing spatial non-stationarity and local variations in relationships
Le Bao · MDI Workshop · https://baole.io/