#### Background

Research in classrooms and schools can be complex because of all of the factors that matter. A question that often comes up when we say that we observed some pattern in data is, *but did you control for X*?

While working on an approach to find out how impactful an omitted variable would need to be to invalidate an inference, we had to modify a function that carried out a single sensitivity analysis so that it could carry out many at once, and so that it would be easier to use.

#### The initial version of the function

In the context of programming (and mathematics!), many functions take inputs and then transform them into output.
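As a minimal illustration (the name `add_one()` is made up for this example), here is a function that takes one input and transforms it into one output:

```r
# a function takes input (x) and transforms it into output (x + 1)
add_one <- function(x) {
  x + 1
}

add_one(2)
```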

Imagine we have a function that takes two values used to calculate a *t*-test, the *t* statistic (the coefficient divided by its standard error) and the degrees of freedom of the *t* distribution. It then returns output about how sensitive an inference based on the *t*-test is to bias or to an omitted confounding variable.

Here is a function that outputs values about the sensitivity of a *t*-test as a single row of a data frame (actually an R `data.frame` modified to be a bit easier to work with: a tibble).

It’s a bit of a whopper:

```
core_sensitivity_mkonfound <- function(t, df, alpha = .05, tails = 2) {
  critical_t <- stats::qt(1 - (alpha / tails), df)
  critical_r <- critical_t / sqrt((critical_t ^ 2) + df)
  obs_r <- abs(t / sqrt(df + (t ^ 2)))

  # for the replacement-of-cases framework
  if (abs(obs_r) > abs(critical_r)) {
    action <- "to_invalidate"
    inference <- "reject_null"
    pct_bias <- 100 * (1 - (critical_r / obs_r))
  } else if (abs(obs_r) < abs(critical_r)) {
    action <- "to_sustain"
    inference <- "fail_to_reject_null"
    pct_bias <- 100 * (1 - (obs_r / critical_r))
  } else if (obs_r == critical_r) {
    action <- NA
    inference <- NA
    pct_bias <- NA
  }

  if ((abs(obs_r) > abs(critical_r)) & ((obs_r * critical_r) > 0)) {
    mp <- -1
  } else {
    mp <- 1
  }

  # for the correlation-based framework
  itcv <- (obs_r - critical_r) / (1 + mp * abs(critical_r))
  r_con <- round(sqrt(abs(itcv)), 3)

  out <- dplyr::data_frame(t, df, action, inference, pct_bias, itcv, r_con)
  names(out) <- c("t", "df", "action", "inference",
                  "pct_bias_to_change_inference", "itcv", "r_con")
  out$pct_bias_to_change_inference <- round(out$pct_bias_to_change_inference, 3)
  out$itcv <- round(out$itcv, 3)
  out$action <- as.character(out$action)
  out$inference <- as.character(out$inference)
  return(out)
}
```

Let’s test it out. Imagine we have a *t* statistic of `3` for a hypothesis test associated with `100` degrees of freedom.

`core_sensitivity_mkonfound(3, 100)`

```
## # A tibble: 1 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3 100 to_invalidate reject_null 32.276 0.115
## # ... with 1 more variables: r_con <dbl>
```

Works.

It looks like in order to invalidate the inference, around `32`% of the effect would need to be due to bias; or, an omitted variable would need to be correlated with both the predictor of interest and the dependent variable at `.339` in order to invalidate the inference. You can read more about sensitivity analysis and an in-development R package on the approach to sensitivity analysis with Ran Xu and Ken Frank here.
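As a quick check of how the `.339` value relates to the output above: in the function, `r_con` is the square root of the absolute value of `itcv`, which the output reports (rounded) as `0.115`:

```r
# r_con is the square root of the absolute value of itcv (see the function above)
itcv <- 0.115
r_con <- round(sqrt(abs(itcv)), 3)
r_con
```

This matches the `.339` correlation mentioned above, up to rounding.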

#### The second version of the function

How could we write a function to provide output not only for one *t* and its associated *df*, but rather many values?

We can write a simple function to iterate through multiple values and to bind the results together. The key is the `map()` function (from the `tidyverse` package `purrr`; if you are familiar with `R`, it is similar to many of the `apply()` functions). Specifically, because we:

- Have two variables that we are iterating through
- Want the output in `data.frame` form

We use `map2_dfr()`. Check out this helpful chapter of R for Data Science for more on iteration using approaches such as for and while loops, as well as the useful `apply()` / `map()` families of functions.
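To see what `map2_dfr()` does on its own, here is a minimal, illustrative example (the function `f()` is just a stand-in): it calls `f()` once per pair of elements of `.x` and `.y`, and row-binds the one-row results into a single tibble:

```r
library(purrr)
library(dplyr)

# a stand-in function that returns a one-row tibble for each pair of inputs
f <- function(x, y) {
  tibble(x = x, y = y, sum = x + y)
}

# one row per pair: (1, 10), (2, 20), (3, 30)
map2_dfr(.x = c(1, 2, 3), .y = c(10, 20, 30), .f = f)
```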

Here is what a function could look like:

```
library(purrr)

mkonfound <- function(t, df, alpha = .05, tails = 2) {
  map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound,
           alpha = alpha, tails = tails)
}
```

Simple! But does it work? :)

Instead of passing a single *t* and *df*, as we did above with the `core_sensitivity_mkonfound()` function, we can pass vectors of *t* and *df* values:

```
mkonfound(t = c(3, 2, 2.5),
          df = c(100, 200, 150))
```

```
## # A tibble: 3 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3.0 100 to_invalidate reject_null 32.276 0.115
## 2 2.0 200 to_invalidate reject_null 1.378 0.002
## 3 2.5 150 to_invalidate reject_null 20.364 0.048
## # ... with 1 more variables: r_con <dbl>
```

We could also do something like binding *t* and *df* together into a small `data.frame`:

```
d <- data.frame(t = c(3, 2, 2.5),
                df = c(100, 200, 150))
d
```

```
## t df
## 1 3.0 100
## 2 2.0 200
## 3 2.5 150
```

And then `mkonfound()` could work like this:

`mkonfound(d$t, d$df)`

```
## # A tibble: 3 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3.0 100 to_invalidate reject_null 32.276 0.115
## 2 2.0 200 to_invalidate reject_null 1.378 0.002
## 3 2.5 150 to_invalidate reject_null 20.364 0.048
## # ... with 1 more variables: r_con <dbl>
```

#### The third version of the function

Still seems to work fine. Those of you familiar with the tidyverse may sense another possible improvement. Namely, the function could be rewritten to both take and return a `data.frame`, and to be a bit more intuitive to use via non-standard evaluation.

The goal is to add an additional argument for the `data.frame` (`d`), and then use non-standard evaluation to capture, *and then later evaluate in the context of the data.frame*, the names of the *t* and *df* columns.
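To illustrate that pattern in isolation, here is a minimal, hypothetical helper (`get_col()` is not part of any package; it is made up for this example) that uses `enquo()` to capture an unquoted column name and `!!` to evaluate it inside `select()`:

```r
library(dplyr)
library(rlang)

# hypothetical helper: return a column of d given its unquoted name
get_col <- function(d, col) {
  col_enquo <- enquo(col)        # capture the unquoted column name
  pull(select(d, !!col_enquo))   # evaluate it in the context of d
}

get_col(mtcars, mpg)  # equivalent to mtcars$mpg
```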

```
library(rlang)
library(dplyr)

mkonfound <- function(d, t, df, alpha = .05, tails = 2) {
  t_enquo <- enquo(t)
  df_enquo <- enquo(df)
  t <- pull(select(d, !!t_enquo))
  df <- pull(select(d, !!df_enquo))
  map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound,
           alpha = alpha, tails = tails)
}
```

But does it work? Now, the first argument is the name of the `data.frame`, the second is the unquoted name of the column with the *t* statistics, and the third is the same as the second, but for the *df* associated with the *t*’s.

`mkonfound(d, t, df)`

```
## # A tibble: 3 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3.0 100 to_invalidate reject_null 32.276 0.115
## 2 2.0 200 to_invalidate reject_null 1.378 0.002
## 3 2.5 150 to_invalidate reject_null 20.364 0.048
## # ... with 1 more variables: r_con <dbl>
```

If we have an entire spreadsheet, read into R as a `data.frame` using the `read.csv()` function (or `read_csv()` from the very useful `readr` package), then we can easily compute output for all of the statistics in the spreadsheet. Here is a spreadsheet from a website (from Ken’s):

```
spreadsheet_of_vals <- read.csv("https://msu.edu/~kenfrank/example%20dataset%20for%20mkonfound.csv")
head(spreadsheet_of_vals)
```

```
## t df
## 1 7.076763 178
## 2 4.127893 193
## 3 1.893137 47
## 4 -4.166395 138
## 5 -1.187599 97
## 6 3.585478 87
```

We would use it the same way as above, but with `d` replaced with what we named the `data.frame` we read from the website, `spreadsheet_of_vals`:

`mkonfound(spreadsheet_of_vals, t, df)`

```
## # A tibble: 30 x 7
## t df action inference
## <dbl> <int> <chr> <chr>
## 1 7.076763 178 to_invalidate reject_null
## 2 4.127893 193 to_invalidate reject_null
## 3 1.893137 47 to_sustain fail_to_reject_null
## 4 -4.166395 138 to_invalidate reject_null
## 5 -1.187599 97 to_sustain fail_to_reject_null
## 6 3.585478 87 to_invalidate reject_null
## 7 0.281938 117 to_sustain fail_to_reject_null
## 8 2.549647 75 to_invalidate reject_null
## 9 -4.436048 137 to_invalidate reject_null
## 10 -2.045373 195 to_invalidate reject_null
## # ... with 20 more rows, and 3 more variables:
## # pct_bias_to_change_inference <dbl>, itcv <dbl>, r_con <dbl>
```

Since the output is in a `data.frame`, we can, for example, easily plot the output:

`results_df <- mkonfound(spreadsheet_of_vals, t, df)`

```
library(ggplot2)

results_df$action <- dplyr::case_when(
  results_df$action == "to_invalidate" ~ "To Invalidate",
  results_df$action == "to_sustain" ~ "To Sustain"
)

ggplot(results_df, aes(x = pct_bias_to_change_inference, fill = action)) +
  geom_histogram() +
  scale_fill_manual("", values = c("#1F78B4", "#A6CEE3")) +
  theme_bw() +
  ggtitle("Histogram of Percent Bias") +
  facet_grid(~ action) +
  theme(legend.position = "none") +
  ylab("Count") +
  xlab("Percent Bias")
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

Like many functions in R, this could be written many different ways, and this post shows just one approach to writing a function.

In some cases, non-standard evaluation makes the function a bit harder to use, particularly when we are interested in the output from only a single study.

In that case, we would want to go back to the function we initially wrote (`core_sensitivity_mkonfound()`), or we would have to write something like:

```
single_study <- data.frame(t = 3, df = 100)
mkonfound(single_study, t, df)
```

```
## # A tibble: 1 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3 100 to_invalidate reject_null 32.276 0.115
## # ... with 1 more variables: r_con <dbl>
```

So, this is one approach among many, but one that is useful here: the in-development package for sensitivity analysis includes a number of functions, among them a version of this `mkonfound()` function for meta-analyses that make use of the approach.

Oh, and if you are interested in sensitivity analysis, please check out the `konfound` package that this function is a part of here.