Project Description

This is a research project I developed during my Master’s Degree in Environmental Psychology. It includes a hierarchical model that explores the cross-level dynamics within communities in the Netherlands.

I am still motivated about executing this project again in a different setting because it addresses contemporary challenges in ethnically heterogeneous neighborhoods and provides an unique approach to understanding social cohesion. It assesses the individual-level, cognitive processes in neighborhood behavior through spatial measurement, which can offer alternative evidence for policymakers’ views toward contextual level action (e.g., developing economically vital cities, demolition, and housing renovation) (Van Kempen & Bolt, 2009). By bridging the gap between individual-level processes and neighborhood dynamics, this research has the potential to inform policy interventions and community development initiatives.

  • Outcome Variable

    • boundary: combined value of perceived neighborhood scale and perceived trust in neighbors
  • Demographics:

    • age: Normally distributed continuous variable.

    • education: Dichotomous variable

    • income: Ordinal variable from 1 to 5.

    • ethnicity: Categorical variable sampled from four categories “A”, “B”, “C”, and “D”.

    • length_res: Normally distributed continuous variable for length of residency.

  • Physical Disorder:

    • physical_disorder: Ordinal variable from 1 to 5.
  • Perceived Collective Efficacy:

    • perceived_collective_efficacy: Normally distributed continuous variable.
  • Intergroup Leader:

    • leadership: Dichotomous variable.
  • Personalized Features:

    • names, art, playgrounds: Dichotomous variables

1. Load Packages and Data

# Load libraries
options(repos = c(CRAN = "https://cloud.r-project.org"))
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(geojsonio)
Registered S3 method overwritten by 'geojsonsf':
  method        from   
  print.geojson geojson

Attaching package: 'geojsonio'
The following object is masked from 'package:base':

    pretty
library(sp)
library(lme4)
Loading required package: Matrix
library(car)
Loading required package: carData

Attaching package: 'car'
The following object is masked from 'package:dplyr':

    recode
library(ggplot2)

# Read the GeoJSON mapping file
map_data <- geojson_read("map_data.geojson", what = "sp")

map_sf <- st_as_sf(map_data)

# Calculate the area of each drawn boundary
map_sf <- map_sf %>%
  mutate(area = st_area(geometry))
map_sf$area <- as.numeric(map_sf$area)
# Load the survey dataset
data <- read.csv("multilevel_model_data.csv")

# Merge datasets
data$id <- 1:nrow(data)
data <- data %>%
  left_join(map_sf %>% select(id, area), by = "id")

2. Exploratory Data Analysis

Before I compare the models, let’s take a look at the full model and understand our data a bit better.

# Fit the multilevel model
multilevel_model <- lmer(boundary ~ age + education + income + ethnicity + length_res + physical_disorder + perceived_collective_efficacy + facilities + names + art + playgrounds + leadership + (1 | neighborhood_id), data = data)
summary(multilevel_model)
Linear mixed model fit by REML ['lmerMod']
Formula: boundary ~ age + education + income + ethnicity + length_res +  
    physical_disorder + perceived_collective_efficacy + facilities +  
    names + art + playgrounds + leadership + (1 | neighborhood_id)
   Data: data

REML criterion at convergence: 12877.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9966 -0.7063  0.0071  0.6710  3.2348 

Random effects:
 Groups          Name        Variance Std.Dev.
 neighborhood_id (Intercept)  0.5319  0.7293  
 Residual                    91.6096  9.5713  
Number of obs: 1750, groups:  neighborhood_id, 25

Fixed effects:
                               Estimate Std. Error t value
(Intercept)                   49.019346   1.575249  31.118
age                           -0.009148   0.015198  -0.602
education                     -0.211335   0.469636  -0.450
income                        -0.425076   0.164823  -2.579
ethnicityB                     0.541790   0.645876   0.839
ethnicityMe                    0.439613   0.651204   0.675
ethnicityW                     0.359890   0.650369   0.553
length_res                    -0.052653   0.076229  -0.691
physical_disorder              0.063064   0.162170   0.389
perceived_collective_efficacy  0.366557   0.232243   1.578
facilities                     0.163935   0.155372   1.055
names                         -0.091896   0.600375  -0.153
art                            0.821966   0.597757   1.375
playgrounds                   -0.077756   0.566788  -0.137
leadership                     0.746840   0.585630   1.275

Correlation matrix not shown by default, as p = 15 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it
# Normality of residuals
plot(multilevel_model)

# Homoscedasticity
plot(residuals(multilevel_model) ~ fitted(multilevel_model))

# Fixed effects estimates
fixed_effects <- fixef(multilevel_model)
print(fixed_effects)
                  (Intercept)                           age 
                 49.019346235                  -0.009148289 
                    education                        income 
                 -0.211335096                  -0.425075682 
                   ethnicityB                   ethnicityMe 
                  0.541790280                   0.439613297 
                   ethnicityW                    length_res 
                  0.359890186                  -0.052653388 
            physical_disorder perceived_collective_efficacy 
                  0.063063774                   0.366557166 
                   facilities                         names 
                  0.163934559                  -0.091895788 
                          art                   playgrounds 
                  0.821965502                  -0.077756138 
                   leadership 
                  0.746839797 

3. Model Selection

# Model without Low-estimate predictors:
alt_model1 <- lmer(boundary ~ education + income + ethnicity + perceived_collective_efficacy + facilities + art + leadership + (1 | neighborhood_id), data = data)

# Individual-focused Model:
alt_model2 <- lmer(boundary ~ age + education + income + perceived_collective_efficacy + (1 | neighborhood_id), data = data)

# Interaction Effects:
alt_model3 <- lmer(boundary ~ age * perceived_collective_efficacy + education + income + length_res + physical_disorder + facilities + names + art + playgrounds + leadership + (1 | neighborhood_id), data = data)

# Random Slopes Model:
alt_model4 <- lmer(boundary ~ age + education + income + perceived_collective_efficacy + (age + education + income + perceived_collective_efficacy | neighborhood_id), data = data)
boundary (singular) fit: see help('isSingular')
# Cross-Level Interactions:
alt_model5 <- lmer(boundary ~ age + education + income + perceived_collective_efficacy + facilities + names + art + playgrounds + leadership + (age + education + income + perceived_collective_efficacy | neighborhood_id), data = data)
boundary (singular) fit: see help('isSingular')
# Non-linear Effects:
data$age_squared <- data$age^2
alt_model6 <- lmer(boundary ~ age + age_squared + education + income + perceived_collective_efficacy + (1 | neighborhood_id), data = data)
Warning: Some predictor variables are on very different scales: consider
rescaling
# Nested Models:
alt_model7 <- lmer(boundary ~ age + education + income + perceived_collective_efficacy + (1 | neighborhood_id) + (1 + age | neighborhood_id), data = data)
boundary (singular) fit: see help('isSingular')
# Compare models using AIC
AIC_values <- AIC(multilevel_model, alt_model1, alt_model2, alt_model3, alt_model4, alt_model5, alt_model6, alt_model7)
BIC_values <- BIC(multilevel_model, alt_model1, alt_model2, alt_model3, alt_model4, alt_model5, alt_model6, alt_model7)
print(AIC_values)
                 df      AIC
multilevel_model 17 12911.40
alt_model1       12 12892.09
alt_model2        7 12895.86
alt_model3       15 12916.94
alt_model4       21 12919.90
alt_model5       26 12924.87
alt_model6        8 12907.87
alt_model7       10 12901.39
print(BIC_values)
                 df      BIC
multilevel_model 17 13004.35
alt_model1       12 12957.70
alt_model2        7 12934.13
alt_model3       15 12998.95
alt_model4       21 13034.71
alt_model5       26 13067.02
alt_model6        8 12951.61
alt_model7       10 12956.07

After removing variables (age, length_res, names, playgrounds, physical_disorder) , my first adjusted model has one of the lower AIC/BIC, while also aligning with my theoretical framework and hypotheses.

4. Results

# Plot Cooks D estimates with influence plot
influencePlot(alt_model1, id.n = 5)
Warning in plot.window(...): "id.n" is not a graphical parameter
Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter
Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not a
graphical parameter
Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not a
graphical parameter
Warning in box(...): "id.n" is not a graphical parameter
Warning in title(...): "id.n" is not a graphical parameter
Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
graphical parameter

        StudRes         Hat        CookD
325   3.2599741 0.008857092 0.0094969282
832  -2.7199424 0.010389742 0.0077671191
961   3.0225287 0.006943756 0.0063879495
1411 -0.8990109 0.016199316 0.0013308205
1606 -0.6344288 0.014523900 0.0005932024
# Calculate variance inflation factors
vif(alt_model1)
                                  GVIF Df GVIF^(1/(2*Df))
education                     1.003845  1        1.001921
income                        1.002528  1        1.001263
ethnicity                     1.007411  3        1.001231
perceived_collective_efficacy 1.004720  1        1.002357
facilities                    1.075506  1        1.037066
art                           1.105281  1        1.051324
leadership                    1.045332  1        1.022415
# Extract coefficients
model_coefs <- coef(summary(alt_model1))
model_coefs
                                Estimate Std. Error    t value
(Intercept)                   48.5442001  1.2528175 38.7480213
education                     -0.1962912  0.4685969 -0.4188913
income                        -0.4273233  0.1645159 -2.5974585
ethnicityB                     0.5186504  0.6449240  0.8042039
ethnicityMe                    0.4096027  0.6499668  0.6301902
ethnicityW                     0.3533965  0.6496055  0.5440171
perceived_collective_efficacy  0.3572640  0.2314167  1.5438123
facilities                     0.1658361  0.1436879  1.1541409
art                            0.8218169  0.5516168  1.4898330
leadership                     0.7168449  0.5260312  1.3627422