What are the best rail-trails in Michigan?

2017/07/24

Background

I was curious about which rail-trails in Michigan were the best, so I turned to the TrailLink website, sponsored by the Rails-to-Trails Conservancy. I had just purchased a copy of their book Rail-Trails Michigan and Wisconsin and wanted to see whether I could learn more from the website.

To start, I checked whether the site offered an API for accessing its reviews. It didn’t, so I checked the robots.txt file at http://traillink.com/robots.txt. It didn’t disallow access to the review pages for each state, so I was able to download all of the reviews for the 259 Michigan trails that had reviews.
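As a rough illustration of that robots.txt check, here is a minimal base R sketch. The file contents and paths below are made up for the example (they are not the actual TrailLink file), `is_allowed()` is a hypothetical helper, and the check is deliberately simplified: it only looks at Disallow prefixes and ignores Allow rules, wildcards, and per-agent sections.

```r
# Hypothetical robots.txt content, made up for illustration;
# this is NOT the actual TrailLink file.
robots_txt <- c(
  "User-agent: *",
  "Disallow: /admin/",
  "Disallow: /account/"
)

# Simplified check: collect the Disallow paths and test whether the
# path we want to fetch starts with any of them.
is_allowed <- function(path, robots_lines) {
  rules <- grep("^Disallow:", robots_lines, value = TRUE)
  disallowed <- trimws(sub("^Disallow:", "", rules))
  disallowed <- disallowed[nzchar(disallowed)]  # an empty Disallow blocks nothing
  blocked <- vapply(disallowed, function(p) startsWith(path, p), logical(1))
  !any(blocked)
}

reviews_ok <- is_allowed("/trail/some-trail/", robots_txt)  # not under a Disallow prefix
admin_ok   <- is_allowed("/admin/settings", robots_txt)     # under /admin/
```

For real scraping work, a dedicated robots.txt parser is likely a safer choice than a prefix check like this one.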

library(tidyverse)
library(hrbrthemes)
library(viridis)
library(forcats)
library(stringr)
library(lme4)
library(broom)

f <- here::here("static", "data", "mi.rds")
df <- read_rds(f) # this is a file with the rail-trail data - you can get it from here: https://github.com/jrosen48/railtrail

df <- df %>% 
    unnest(raw_reviews) %>% 
    filter(!is.na(raw_reviews)) %>% 
    rename(raw_review = raw_reviews,
           trail_name = name) %>% 
    mutate(trail_name = str_sub(trail_name, end = -7L),
           distance = str_sub(distance, end = -6L),
           distance = as.numeric(distance),
           n_reviews = str_sub(n_reviews, end = -9L),
           n_reviews = as.numeric(n_reviews))
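
The negative `end` values passed to `str_sub()` above count from the end of each string, which is a compact way to strip a fixed-length suffix. A small sketch of the idea, using made-up input strings (the real scraped values may look different):

```r
library(stringr)

# Hypothetical raw values, made up to show the idea.
raw_n_reviews <- c("78 Reviews", "1 Reviews")

# end = -9L keeps everything up to the 9th character from the end,
# i.e. it drops the last 8 characters (here, " Reviews").
cleaned <- str_sub(raw_n_reviews, end = -9L)
n_reviews <- as.numeric(cleaned)
```

This only works when every value ends in a suffix of the same length, which is presumably why it suits text scraped from a consistent page template.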

What are the characteristics of the best trails?

On the site, there are “surfaces” (e.g., asphalt and gravel) and “categories” (e.g., rail-trail and paved pathway), so I grouped them into a few broader categories.

df <- df %>% 
    mutate(category = as.factor(category),
           category = forcats::fct_recode(category, "Greenway/Non-RT" = "Canal"),
           mean_review = ifelse(mean_review == 0, NA, mean_review))

df <- mutate(df,
             surface_rc = case_when(
                 surface == "Asphalt" ~ "Paved",
                 surface == "Asphalt, Concrete" ~ "Paved",
                 surface == "Concrete" ~ "Paved",
                 surface == "Asphalt, Boardwalk" ~ "Paved",
                 str_detect(surface, "Stone") ~ "Crushed Stone",
                 str_detect(surface, "Ballast") ~ "Crushed Stone",
                 str_detect(surface, "Gravel") ~ "Crushed Stone",
                 TRUE ~ "Other"
             )
)
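
One thing worth knowing about `case_when()` is that the first matching condition wins, and anything that matches nothing (including missing values) falls through to the final `TRUE` branch. A minimal sketch with made-up surface values and a shortened set of rules:

```r
library(dplyr)
library(stringr)

# Made-up surface values for illustration, not taken from the trail data.
surfaces <- tibble(
  surface = c("Asphalt", "Crushed Stone", "Asphalt, Gravel", "Dirt", NA)
)

recoded <- surfaces %>%
  mutate(surface_rc = case_when(
    surface == "Asphalt"          ~ "Paved",          # exact match only
    str_detect(surface, "Stone")  ~ "Crushed Stone",
    str_detect(surface, "Gravel") ~ "Crushed Stone",  # catches "Asphalt, Gravel"
    TRUE                          ~ "Other"           # everything else, including NA
  ))
```

Under these rules a mixed surface such as "Asphalt, Gravel" ends up as "Crushed Stone" because it is not an exact match for any of the paved values.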

Then, I checked out their mean reviews, from one to five stars.

Some trails had a ton of reviews:

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews) %>% 
    distinct() %>% 
    arrange(desc(n_reviews)) %>% 
    head(5) %>% 
    knitr::kable()

| trail_name | surface_rc | category | distance | n_reviews |
|:---|:---|:---|---:|---:|
| Lakelands Trail State Park | Crushed Stone | Rail-Trail | 26.0 | 78 |
| Pere Marquette Rail-Trail | Paved | Rail-Trail | 30.0 | 75 |
| Fred Meijer White Pine Trail State Park | Crushed Stone | Rail-Trail | 92.6 | 66 |
| William Field Memorial Hart-Montague Trail State Park | Paved | Rail-Trail | 22.7 | 48 |
| Kal-Haven Trail Sesquicentennial State Park | Crushed Stone | Rail-Trail | 34.0 | 47 |

And some had very few reviews: 60 of the trails had only one review!

Some of these single-review trails were rated highly (five stars):

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews, mean_review) %>% 
    distinct() %>% 
    filter(n_reviews == 1) %>% 
    arrange(desc(mean_review)) %>% 
    head(5) %>% 
    knitr::kable()

| trail_name | surface_rc | category | distance | n_reviews | mean_review |
|:---|:---|:---|---:|---:|---:|
| Big Rapids Riverwalk | Crushed Stone | Greenway/Non-RT | 3.8 | 1 | 5 |
| Boardman Lake Trail | Crushed Stone | Rail-Trail | 2.0 | 1 | 5 |
| Cannon Township Trail | Paved | Greenway/Non-RT | 4.0 | 1 | 5 |
| Chippewa Trail | Paved | Greenway/Non-RT | 4.1 | 1 | 5 |
| Grass River Natural Area Rail Trail | Crushed Stone | Rail-Trail | 2.2 | 1 | 5 |

Other trails with a single review were rated very low:

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews, mean_review) %>% 
    distinct() %>% 
    filter(n_reviews == 1) %>% 
    arrange(mean_review) %>% 
    head(5) %>% 
    knitr::kable()
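
| trail_name | surface_rc | category | distance | n_reviews | mean_review |
|:---|:---|:---|---:|---:|---:|
| Alpena to Hillman Trail | Crushed Stone | Rail-Trail | 22.0 | 1 | 1 |
| Felch Grade Trail | Crushed Stone | Rail-Trail | 38.0 | 1 | 1 |
| Interurban Trail (Kent County) | Paved | Rail-Trail | 2.0 | 1 | 2 |
| Linear Trail Park | Paved | Greenway/Non-RT | 16.9 | 1 | 2 |
| Albion River Trail | Paved | Rail-Trail | 1.6 | 1 | 3 |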

Building a model

To figure out which trails had many good reviews, I used an approach that does not simply average all of the reviews for a trail. Instead, it produces a rating that takes into account the individual reviews for a trail, how much they differ from each other, and how much they differ from the “average” review across every trail.

What if, instead, we just looked at the top-reviewed trails and then sorted them by how many reviews they had? Because many trails’ average review was five, this does not help much.

These ratings (model_based_rating below, which is the overall average plus each trail’s estimated deviation, estimated_mean) are from the mixed effects model specified here:

m1 <- lmer(raw_review ~ 1 + (1|trail_name), data = df)
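
To see why a model like this treats a single five-star review differently from thirty near-perfect ones, here is a minimal base R sketch of the partial-pooling idea behind a random-intercept model. The function name `shrunken_mean()` and all of the numbers are made up for illustration; in practice lmer() estimates the overall mean and both variances from the data.

```r
# Partial pooling for a random-intercept model, given the variance components:
#   tau2   = variance of the true trail means around the overall mean
#   sigma2 = variance of individual reviews around their trail's mean
# The weight on a trail's own raw mean grows toward 1 as n increases.
shrunken_mean <- function(raw_mean, n, grand_mean, tau2, sigma2) {
  weight <- tau2 / (tau2 + sigma2 / n)
  grand_mean + weight * (raw_mean - grand_mean)
}

# Made-up values, not estimates from the trail data.
grand_mean <- 4.2
tau2 <- 0.3
sigma2 <- 1.0

one_review   <- shrunken_mean(5.0, n = 1,  grand_mean, tau2, sigma2)  # about 4.38
many_reviews <- shrunken_mean(4.9, n = 30, grand_mean, tau2, sigma2)  # about 4.83
```

With these assumed values, the trail with 30 reviews averaging 4.9 ends up rated above the trail with one five-star review, because the single review is pulled much more strongly toward the overall mean.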

The estimates have to be merged back into the data frame with the other characteristics of each trail:

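# Overall (fixed-effect) intercept: the estimated average review across all trails.
# fixef() from lme4 is used because recent broom releases appear to have moved
# the tidy() methods for lmer models to the broom.mixed package.
overall_mean <- fixef(m1)[["(Intercept)"]]

# Each trail's estimated deviation from the overall mean, plus the rating itself
estimated_trail_means <- ranef(m1)$trail_name %>% 
    rownames_to_column() %>% 
    as_tibble() %>% 
    rename(trail_name = rowname, estimated_mean = `(Intercept)`) %>% 
    mutate(model_based_rating = estimated_mean + overall_mean)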
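
The rating built above is the overall intercept plus each trail's deviation. As a quick sanity check on that logic, the sketch below uses lme4's bundled sleepstudy data (not the trail data) to compare the sum computed by hand with what coef() returns for the same kind of intercept-only model.

```r
library(lme4)

# Same model structure as above, fit to lme4's example data set.
m <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

overall    <- fixef(m)[["(Intercept)"]]          # fixed-effect intercept
deviations <- ranef(m)$Subject[["(Intercept)"]]  # per-group deviations

by_hand   <- overall + deviations
from_coef <- coef(m)$Subject[["(Intercept)"]]    # fixed + random, per group

same <- isTRUE(all.equal(by_hand, from_coef))
```

If the two agree, coef() could serve as a shortcut for the per-trail rating, though building it by hand keeps the deviation available as its own column.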

df_ss <- df %>% 
    group_by(trail_name) %>% 
    summarize(raw_mean = mean(raw_review))

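# Join on the trail name explicitly rather than relying on the guessed key
df_out <- left_join(df_ss, estimated_trail_means, by = "trail_name")
df_out <- left_join(df_out, df, by = "trail_name")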

So, where are we riding next?

Here are the top 10 trails of any length:

df_out %>% 
    select(trail_name, surface_rc, distance, category, estimated_mean, raw_mean, n_reviews) %>% 
    distinct() %>% 
    arrange(desc(estimated_mean)) %>% 
    mutate_if(is.numeric, function(x) round(x, 3)) %>% 
    head(10) %>% 
    knitr::kable()

| trail_name | surface_rc | distance | category | estimated_mean | raw_mean | n_reviews |
|:---|:---|---:|:---|---:|---:|---:|
| Saginaw Valley Rail Trail | Paved | 11.0 | Rail-Trail | 0.886 | 4.941 | 36 |
| Clinton River Park Trail | Paved | 4.5 | Greenway/Non-RT | 0.875 | 4.933 | 17 |
| Leelanau Trail | Paved | 16.6 | Rail-Trail | 0.829 | 4.900 | 20 |
| Wayne County Metroparks Trail | Paved | 16.3 | Greenway/Non-RT | 0.815 | 4.889 | 9 |
| Southern Links Trailway | Other | 10.2 | Rail-Trail | 0.811 | 4.853 | 39 |
| Mackinac Island Loop (State Highway 185) | Paved | 8.3 | Greenway/Non-RT | 0.796 | 4.875 | 11 |
| Detroit RiverWalk | Paved | 3.5 | Greenway/Non-RT | 0.779 | 5.000 | 3 |
| Fred Meijer Pioneer Trail | Paved | 5.4 | Rail-Trail | 0.779 | 5.000 | 3 |
| Grand Haven Waterfront Trail | Paved | 2.5 | Rail-Trail | 0.779 | 5.000 | 4 |
| Granger Meadows Park Trail | Paved | 1.9 | Greenway/Non-RT | 0.779 | 5.000 | 2 |

What if we wanted to take a shorter trip, one of less than 10 miles?

df_out %>% 
    select(trail_name, surface_rc, distance, category, estimated_mean, raw_mean, n_reviews) %>% 
    distinct() %>% 
    filter(distance < 10) %>% 
    arrange(desc(estimated_mean), desc(n_reviews)) %>% 
    head(10) %>% 
    knitr::kable()

| trail_name | surface_rc | distance | category | estimated_mean | raw_mean | n_reviews |
|:---|:---|---:|:---|---:|---:|---:|
| Clinton River Park Trail | Paved | 4.5 | Greenway/Non-RT | 0.8747665 | 4.933333 | 17 |
| Mackinac Island Loop (State Highway 185) | Paved | 8.3 | Greenway/Non-RT | 0.7962137 | 4.875000 | 11 |
| Grand Haven Waterfront Trail | Paved | 2.5 | Rail-Trail | 0.7789488 | 5.000000 | 4 |
| Stony Creek Metropark Trail | Paved | 6.2 | Greenway/Non-RT | 0.7789488 | 5.000000 | 4 |
| Detroit RiverWalk | Paved | 3.5 | Greenway/Non-RT | 0.7789488 | 5.000000 | 3 |
| Fred Meijer Pioneer Trail | Paved | 5.4 | Rail-Trail | 0.7789488 | 5.000000 | 3 |
| Granger Meadows Park Trail | Paved | 1.9 | Greenway/Non-RT | 0.7789488 | 5.000000 | 2 |
| Western Gateway Trail | Paved | 6.0 | Rail-Trail | 0.7789488 | 5.000000 | 2 |
| Paint Creek Trail (MI) | Crushed Stone | 8.9 | Rail-Trail | 0.7301712 | 4.785714 | 26 |
| Dequindre Cut Greenway | Paved | 1.8 | Rail-Trail | 0.7091632 | 4.777778 | 12 |

Conclusion

This model-based approach is powerful because it lets us figure out which trails are rated higher (or lower) once we take into account how many reviews we have for each trail. The same approach is useful in research, too: grades for students in classrooms, for example, can be analyzed in the same way if we want to learn which students are consistently performing differently (for better or worse!).

The code to download the reviews is here. The code in this post can be used to do a similar analysis.