Joshua M. Rosenberg
/index.xml
Recent content on Joshua M. Rosenberg
Hugo -- gohugo.io
en-us
Copyright © 2017 Joshua M. Rosenberg
Sun, 15 Apr 2018 00:00:00 +0000

AERA 2018 presentation: Patterns of engagement in a flipped undergraduate class (slides, paper, & code)
/blog/aerapresentationpatternsofengagementinaflippedundergraduateclassantecedentsandoutcomesinformationslidespapercode/
Sun, 15 Apr 2018 00:00:00 +0000
/blog/aerapresentationpatternsofengagementinaflippedundergraduateclassantecedentsandoutcomesinformationslidespapercode/
<p>I’m presenting a second paper at AERA on patterns of (outside of class) engagement in a flipped undergraduate class with my colleagues Youkyung Lee, Kristy Robinson, John Ranellucci, Cary Roseth, and Lisa Linnenbrink-Garcia. The presentation is at 8:15 am (Sunday, 4/15) in the Millennium Broadway Hotel (Room 7.04), in a session on emotions, engagement, and technology (with some very interesting papers presented by co-presenters in the session!).</p>
<ul>
<li><a href="/_media/slides/Rosenberg,%20Lee,%20Robinson,%20Ranellucci,%20Roseth,%20&%20LinnenbrinkGarcia,%202018%20%20AERA.pptx">Slides here</a></li>
<li><a href="/_media/publications/Rosenberg,%20Beymer,%20&%20Schmidt,%202018%20%20AERA.pdf">Paper here</a></li>
<li><a href="https://github.com/jrosen48/flippedengagement/blob/master/script.r">Code</a></li>
</ul>

AERA 2018 presentation: How engagement during out-of-school STEM programs promotes the development of interest (slides, paper, & code)
/blog/aerapresentationhowengagementduringoutofschoolstemprogramspromotesthedevelopmentofinterest/
Sat, 14 Apr 2018 00:00:00 +0000
/blog/aerapresentationhowengagementduringoutofschoolstemprogramspromotesthedevelopmentofinterest/
<p>I’m excited to present a paper on how engagement during out-of-school STEM programs promotes youths’ development of interest with my co-authors Patrick Beymer and Jennifer Schmidt.</p>
<p>The paper is in the session “Data-Intensive Approaches to Studying Engagement in Education: Exploring Their Current Potential”. My co-presenters in the session include Eric Weibe and James Creager; Sophia Hooper, Erica Patall, Ariana Vasquez, Keenan Pituch, and Rebecca Steingut; and Eric Poitas, Tenzin Doleck, Lingyun Huang, Shan Li, and Susanne Lajoie. Ryan Baker is serving as the discussant.</p>
<p>If you are at AERA, the session is at 2:15 pm at the Millennium Broadway Hotel (Room 5.08).</p>
<ul>
<li><a href="/_media/slides/Rosenberg,%20Beymer,%20&%20Schmidt%20%202018,%20AERA.pptx">Slides here</a></li>
<li><a href="/_media/publications/Rosenberg,%20Beymer,%20&%20Schmidt,%202018%20%20AERA.pdf">Paper here</a></li>
<li><a href="https://github.com/jrosen48/mcmcglmm/blob/master/mcmgglmmexample3.Rmd">Code here</a></li>
</ul>

An R package for sensitivity analysis (konfound)
/blog/anrpackageforsensitivityanalysiskonfound/
Thu, 12 Apr 2018 00:00:00 +0000
/blog/anrpackageforsensitivityanalysiskonfound/
<p>With <a href="https://sites.google.com/site/ranxupersonalweb/">Ran Xu</a> and <a href="https://msu.edu/~kenfrank/research.htm">Ken Frank</a>, I have worked on <a href="http://konfoundit.com/">a Shiny interactive web application for sensitivity analysis</a> as well as an R package for carrying out sensitivity analysis using R.</p>
<p>That R package is now available on CRAN! A link to the CRAN page for it is <a href="https://cran.r-project.org/web/packages/konfound/">here</a> and the website for the package is <a href="https://jrosen48.github.io/konfound/">here</a>.</p>
<p>Here is the description:</p>
<blockquote>
<p>Statistical methods that quantify the conditions necessary to alter inferences, also known as sensitivity analysis, are becoming increasingly important to a variety of quantitative sciences. A series of recent works, including Frank (2000) and Frank et al. (2013) extend previous sensitivity analyses by considering the characteristics of omitted variables or unobserved cases that would change an inference if such variables or cases were observed. These analyses generate statements such as “an omitted variable would have to be correlated at xx with the predictor of interest (e.g., treatment) and outcome to invalidate an inference of a treatment effect”. Or “one would have to replace pp percent of the observed data with null hypothesis cases to invalidate the inference”. We implement these recent developments of sensitivity analysis and provide modules to calculate these two robustness indices and generate such statements in R. In particular, the functions konfound(), pkonfound() and mkonfound() allow users to calculate the robustness of inferences for a user’s own model, a single published study and multiple studies respectively.</p>
</blockquote>
<p>As a super short introduction, imagine that we carried out a regression for the relationship between a car’s weight and its fuel efficiency (miles per gallon):</p>
<pre class="r"><code>library(konfound)
#> Sensitivity analysis as described in Frank, Maroulis, Duong, and Kelcey (2013) and in Frank (2000).
#> For more information visit https://jmichaelrosenberg.shinyapps.io/shinykonfound/.
m1 <- lm(mpg ~ wt + drat, data = mtcars)
summary(m1)
#>
#> Call:
#> lm(formula = mpg ~ wt + drat, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.4159 -2.0452 -0.0136 1.7704 6.7466
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 30.290 7.318 4.139 0.000274 ***
#> wt -4.783 0.797 -6.001 1.59e-06 ***
#> drat 1.442 1.459 0.989 0.330854
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.047 on 29 degrees of freedom
#> Multiple R-squared: 0.7609, Adjusted R-squared: 0.7444
#> F-statistic: 46.14 on 2 and 29 DF, p-value: 9.761e-10</code></pre>
<p>We can carry out sensitivity analysis for the effect of weight, for example, using the <code>konfound()</code> function on the model output:</p>
<pre class="r"><code>konfound(m1, wt)
#> Note that this output is calculated based on the correlation-based approach used in mkonfound()
#> Replacement of Cases Approach:
#> To invalidate an inference, 65.969% of the estimate would have to be due to bias. This is based on a threshold of -1.628 for statistical significance (alpha = 0.05).
#> To invalidate an inference, 21 observations would have to be replaced with cases for which the effect is 0.
#>
#> Correlation-based Approach:
#> An omitted variable would have to be correlated at 0.781 with the outcome and at 0.781 with the predictor of interest (conditioning on observed covariates) to invalidate an inference based on a threshold of 0.36 for statistical significance (alpha = 0.05).
#> Correspondingly the impact of an omitted variable (as defined in Frank 2000) must be 0.781 X 0.781 = 0.61 to invalidate an inference.
#> NULL</code></pre>
<p>This (very preliminary, and just as an illustration) suggests that nearly two-thirds of the effect of the weight of a car on its miles per gallon would need to be due to bias (in the model or measures, for example) for the effect to be invalidated.</p>
<p>Alternatively, the results of this sensitivity analysis can be interpreted in terms of how correlated an omitted, confounding variable (i.e., a covariate) would need to be with both the variable of interest (weight) and the outcome. This approach suggests that such a confounding variable would need to be correlated at about .80 with both weight and miles per gallon for the effect of weight to be invalidated.</p>
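<p>As a rough check on where the 65.969% figure comes from, here is a back-of-the-envelope version of the replacement-of-cases calculation in base R. This is my own sketch of the logic, not konfound’s exact code, so the decimals may differ slightly (konfound may compute the significance threshold a bit differently):</p>

```r
# Sketch of the "% bias to invalidate" logic (an approximation, not konfound's code)
m1  <- lm(mpg ~ wt + drat, data = mtcars)
est <- coef(summary(m1))["wt", "Estimate"]    # about -4.783
se  <- coef(summary(m1))["wt", "Std. Error"]  # about 0.797

# the smallest estimate (in absolute value) that would still be significant
threshold <- se * qt(0.975, df = df.residual(m1))

# proportion of the estimate that could be due to bias before significance is lost
pct_bias <- 100 * (1 - threshold / abs(est))
pct_bias  # close to the 65.969% reported by konfound()
```

The idea is simply that the estimate can shrink toward the significance threshold before the inference changes; the percentage is how much shrinking it can absorb.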
<p>The <code>konfound()</code> function works on output from <code>lm()</code> as well as from <code>glm()</code> (for generalized linear models) and <code>lmer()</code> (from the <strong>lme4</strong> package) for mixed effects models. There are also a number of ways (besides text) to present the output. Much more on the <code>konfound()</code> function (and package) can be found <a href="https://jrosen48.github.io/konfound/reference/konfound.html">here</a>.</p>

Explorations in Markov Chain Monte Carlo: comparing results from MCMCglmm and lme4
/blog/explorationsinmarkovchainmontecarlomcmc/
Mon, 26 Mar 2018 00:00:00 +0000
/blog/explorationsinmarkovchainmontecarlomcmc/
<div id="introduction" class="section level1">
<h1>Introduction</h1>
<p>I’ve been interested in Markov Chain Monte Carlo (MCMC) for a little while, in part because of <a href="https://doi.org/10.1093/beheco/arx023">a paper</a> by Tom Houslay and Alastair Wilson (2017) that shows how using output from models the way I have been can lead to results that overstate the impact of effects.</p>
<p>In particular, I’m working on a project with colleagues in which we try to figure out how students’ engagement in summer STEM programs relates to changes in their interest (in STEM), controlling for their initial levels of interest. In this project, we use the student-specific predictions from mixed effects (or multilevel) models in <em>other</em> models predicting changes in their interests. Tom and Alastair show that doing this ignores the uncertainty in these predictions, leading to results that appear stronger than they would be were this uncertainty included in the modeling. In short, including the uncertainty is a more conservative way of doing what we’re trying to do. It is not easy to do this with the tool for mixed effects models we are using (the <strong>lme4</strong> R package), but with a tool that uses MCMC methods, it is possible.</p>
<p>This post explores MCMC methods by comparing the results of MCMC methods and those used by <strong>lme4</strong> (which uses maximum likelihood (ML) estimation) in a case in which we would expect the results to be the same, namely, when MCMC methods are used with particular settings for relatively simple models.</p>
</div>
<div id="mypriorbeliefsaboutpriors" class="section level1">
<h1>My prior beliefs about “priors”</h1>
<p><strong>MCMCglmm</strong> methods, unlike <strong>lmer</strong>, require priors. From my admittedly limited understanding, there are two related ways to look at these priors. One is that they constrain the possible values that parameters may take in order to set the modeling up for success (this is how Malsburg describes them in <a href="https://github.com/tmalsburg/MCMCglmmintro">this tutorial</a>). Another way to look at priors is to consider them as part of a Bayesian approach, in which they represent the degree of belief in different parameter values.</p>
<p>There are also cases in which the prior values can be estimated from the data in the sample. <a href="https://www.amazon.com/AnalysisRegressionMultilevelHierarchicalModels/dp/052168689X/ref=sr_1_1?ie=UTF8&qid=1522094327&sr=81&keywords=gelman+hill">Gelman and Hill (2007)</a> describe multilevel models in these terms: for the “random” effects (usually “grouping” variables, like the classroom students are in), the prior for the classroom-specific effects is estimated on the basis of the mean and variance of the dependent variable from the whole sample / data set collected. In these cases (in which the prior for the “random” effects can be estimated from the data), the priors for the <em>other</em> variables can be set to be neutral, which is much closer to the “constrain the possible values that parameters may take” view than to the Bayesian view. In these cases, for models that can be estimated with both MCMC and ML, the estimates should be very close to one another.</p>
<p>This post tries to see just how close they are, using the <strong>lme4</strong> and <strong>MCMCglmm</strong> packages.</p>
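<p>To build some intuition for why MCMC with a neutral prior should agree with ML in simple settings, here is a toy example of my own (separate from the models below): a minimal random-walk Metropolis sampler in base R for the mean of some simulated “reviews”, with a flat prior. Its posterior mean essentially reproduces the sample mean, which is the ML estimate:</p>

```r
set.seed(1)
y <- rnorm(200, mean = 4, sd = 1)  # simulated "reviews"

# log-posterior for the mean with a flat (neutral) prior and known sd = 1
log_post <- function(mu) sum(dnorm(y, mu, 1, log = TRUE))

draws <- numeric(5000)
mu <- 0  # deliberately poor starting value
for (i in seq_along(draws)) {
  prop <- mu + rnorm(1, 0, 0.2)  # symmetric random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(mu)) mu <- prop
  draws[i] <- mu
}

mean(draws[-(1:1000)])  # after burn-in: very close to mean(y), the ML estimate
```

This is, of course, a caricature of what <strong>MCMCglmm</strong> does, but it shows the mechanism: with a flat prior, the posterior is centered on the likelihood’s maximum.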
</div>
<div id="analysisloadingsettingup" class="section level1">
<h1>Analysis: Loading, setting up</h1>
<p>I load the two packages (for the modeling), the <strong>tidyverse</strong> package for some basic data processing, and the <strong>railtrails</strong> package for some example data. This data consists of reviews of rail-trails (trails for biking and running!). I filter the data to just the data for Michigan (to make sure things run quickly) and create a data set without any missing <em>y</em>-values (where the <em>y</em> values represent the trail review).</p>
<p>A <em>very</em> simple model is estimated: a random intercept model, or a model in which each trail’s intercept (or mean) is estimated, accounting for each trail’s number of reviews and their mean and variance in light of the reviews across all trails and their mean and variance.</p>
<pre class="r"><code>library(lme4)
library(MCMCglmm)
library(tidyverse)
library(railtrails)</code></pre>
<pre class="r"><code>d <- railtrails::railtrails
d <- filter(d, state == "MI")
d <- unnest(d, raw_reviews)
d_ss <- filter(d, !is.na(raw_reviews)) # this is because lme4 does not work with missing y-variable values</code></pre>
</div>
<div id="resultsfromlme4" class="section level1">
<h1>Results from lme4</h1>
<p>Here are the results of the model estimated using <strong>lme4</strong>:</p>
<pre class="r"><code>m1 <- lmer(raw_reviews ~ 1 + (1 | name), data = d_ss)
summary(m1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: raw_reviews ~ 1 + (1 | name)
#> Data: d_ss
#>
#> REML criterion at convergence: 2610
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.9423 -0.4646 0.2403 0.6066 1.8297
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> name (Intercept) 0.3285 0.5731
#> Residual 0.9254 0.9620
#> Number of obs: 899, groups: name, 116
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 4.06245 0.06909 58.8</code></pre>
<p>The key thing to note is the <code>Estimate</code> for the intercept (<code>(Intercept)</code>) in the “Fixed effects” section, and the <code>Variance</code> for the trail name (<code>name</code>) in the “Random effects” section. The intercept’s estimate, which represents the mean review across all of the trails, is 4.062, and the variance is 0.328 (a standard deviation of 0.573), suggesting that trails’ estimated reviews typically fall within about 0.57 of 4.062. So, most trails are reviewed pretty highly, around 4 (on the 1-5 scale), with some higher and some lower.</p>
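<p>A quick way to summarize these two variance components (a standard follow-up, though not part of the output above) is the intraclass correlation: the share of the variance in reviews that lies between trails rather than within them. Using the printed values:</p>

```r
# variance components copied from the lme4 summary above
trail_var    <- 0.3285  # between-trail (random intercept) variance
residual_var <- 0.9254  # within-trail (residual) variance

# intraclass correlation: proportion of variance attributable to trails
icc <- trail_var / (trail_var + residual_var)
round(icc, 2)  # roughly a quarter of the variance is between trails
```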
</div>
<div id="resultsfrommcmcglmm" class="section level1">
<h1>Results from MCMCglmm</h1>
<p>Here are the results of the model estimated using <strong>MCMCglmm</strong>. To set up the prior, I followed the advice in <a href="https://github.com/tmalsburg/MCMCglmmintro">this tutorial</a> (also linked above), which is similar to the advice given in Tom and Alastair’s tutorials and in the <a href="https://cran.r-project.org/web/packages/MCMCglmm/vignettes/CourseNotes.pdf">MCMCglmm resources</a>.</p>
<pre class="r"><code>prior <- list(
  R = list(V = 1, n = 1, fix = 1),
  G = list(G1 = list(V = diag(1),
                     n = 1,
                     alpha.mu = rep(0, 1),
                     alpha.V = diag(1) * 25^2)))
m2 <- MCMCglmm(fixed = raw_reviews ~ 1,
               random = ~ us(1):name,
               family = "gaussian",
               data = as.data.frame(d),
               prior = prior,
               verbose = TRUE)
#>
#> MCMC iteration = 0
#>
#> MCMC iteration = 1000
#>
#> MCMC iteration = 2000
#>
#> MCMC iteration = 3000
#>
#> MCMC iteration = 4000
#>
#> MCMC iteration = 5000
#>
#> MCMC iteration = 6000
#>
#> MCMC iteration = 7000
#>
#> MCMC iteration = 8000
#>
#> MCMC iteration = 9000
#>
#> MCMC iteration = 10000
#>
#> MCMC iteration = 11000
#>
#> MCMC iteration = 12000
#>
#> MCMC iteration = 13000
summary(m2)
#>
#> Iterations = 3001:12991
#> Thinning interval = 10
#> Sample size = 1000
#>
#> DIC: 2560.048
#>
#> Gstructure: ~us(1):name
#>
#> post.mean l95% CI u95% CI eff.samp
#> (Intercept):(Intercept).name 0.3286 0.2003 0.4756 1000
#>
#> Rstructure: ~units
#>
#> post.mean l95% CI u95% CI eff.samp
#> units 1 1 1 0
#>
#> Location effects: raw_reviews ~ 1
#>
#> post.mean l95% CI u95% CI eff.samp pMCMC
#> (Intercept) 4.063 3.928 4.197 1000 <0.001 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</code></pre>
<p>The key values are the <code>post.mean</code> values for the intercept (<code>(Intercept)</code>) in the “Location effects” section, and the variance of the intercept (<code>(Intercept):(Intercept).name</code>) in the “Gstructure” section. It looks like the intercept’s estimate, which represents the mean review across all of the trails, is 4.063, and the variance is 0.328, just about equal. Because of the nature of MCMC, there will be slightly different results each time it is run. The longer that the estimation is run, the more stable the estimates will be.</p>
<p>Here is a summary of the two parameters’ values for the two methods:</p>
<table>
<thead>
<tr class="header">
<th align="left">method</th>
<th align="right">fixef_intercept</th>
<th align="right">trail_variance</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">lme4</td>
<td align="right">4.062</td>
<td align="right">0.328</td>
</tr>
<tr class="even">
<td align="left">MCMCglmm</td>
<td align="right">4.063</td>
<td align="right">0.328</td>
</tr>
</tbody>
</table>
<p>One key point that is skipped for now is the importance of examining diagnostic plots (for the <strong>MCMCglmm</strong> results, in particular, but also for those from <strong>lme4</strong>) and other measures of how well the estimates fit the data. There is also a <em>lot</em> more to MCMC than this (and that I don’t know about), and the use of MCMC becomes harder (for me), but also more useful, with more complex models and data.</p>
</div>

Finding the top rail-trails in each state using mixed effects models
/blog/findthetoprailtrailsineachstate/
Thu, 22 Feb 2018 00:00:00 +0000
/blog/findthetoprailtrailsineachstate/
<p>Outside of education, one of my interests is cycling, and one of my favorite ways to cycle is on rail-trails, pathways and greenways that are converted from former railroad tracks.</p>
<p>In a side-project (and because the data source can be used for teaching and learning about complex, nested data), I collected information from the <a href="https://www.traillink.com/">TrailLink website</a>. I’ve blogged about this data <a href="https://jrosen48.github.io/blog/michiganrailtrailsandpathwaysthroughdata/">here</a> and <a href="https://jrosen48.github.io/blog/characteristicsofrailtrails/">here</a> to find out what the best rail-trails in Michigan are and to find out what the characteristics of the best rail-trails are, respectively.</p>
<p>Using this data, I created a simple Shiny web app (<a href="https://jmichaelrosenberg.shinyapps.io/railtrails/">here</a>) to find the top rail-trails (using the reviews from TrailLink) in each state. One neat thing about the app is that it uses predictions from a mixed effects (or multilevel) model.</p>
<p><a href="https://jmichaelrosenberg.shinyapps.io/railtrails/"><img src="/_media/images/railtrails.png"></a></p>
<p>The reason I chose to do this is that using the raw reviews to find the top rail-trails is not as helpful as I first thought, as trails with very few (but very high) reviews, such as one with two “5” (out of 5) reviews, may end up ranked as the top in the state. At the same time, a trail with many (primarily high) reviews, such as one with 30 reviews that average out to almost but not quite “5”, may be ranked lower.</p>
<p>In <code>lme4</code>, the model is a random intercept (for the trail and state) model and would look like this (all of the code is <a href="https://github.com/jrosen48/railtrails/blob/master/app/app.R">here</a>):</p>
<pre class="r"><code>m1 <- lmer(raw_reviews ~ 1 + (1 | name) + (1 | state), data = d)</code></pre>
<p>The model, which accounts for the multiple (repeated) reviews for each trail and the nesting of trails in each state, looks something like this:</p>
<p><span class="math display">\[
\begin{aligned}
\widehat{y}_{trail,\ state} = {} & \beta_0 \ (\text{overall mean review}) \\
& + \alpha_{1,\ trail} \ (\text{trail effect}) \\
& + \alpha_{2,\ state} \ (\text{state effect}) \\
& + \varepsilon_{trail,\ state}
\end{aligned}
\]</span></p>
<p>So, the mixed effects model helps to account for both the number of and the variability in the reviews, giving a bit more weight to trails with a whole lot of high reviews relative to trails with fewer reviews to go on, to (hopefully) better predict rankings. In any case, you can check out the app at <a href="https://jmichaelrosenberg.shinyapps.io/railtrails/" class="uri">https://jmichaelrosenberg.shinyapps.io/railtrails/</a>.</p>
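<p>The intuition can be sketched with the classic partial pooling formula, in which each trail’s estimate is a weighted average of its own mean review and the overall mean, with the weight growing with the number of reviews. The numbers below are made up for illustration, not taken from the app’s data:</p>

```r
# Made-up values for illustration (not estimates from the app's data)
grand_mean <- 4.0   # overall mean review across all trails
tau2       <- 0.33  # between-trail variance
sigma2     <- 0.93  # within-trail (review-level) variance

# partial pooling: weight on a trail's own mean grows with its number of reviews
shrink <- function(trail_mean, n_reviews) {
  w <- n_reviews / (n_reviews + sigma2 / tau2)
  w * trail_mean + (1 - w) * grand_mean
}

shrink(5.0, 2)   # two perfect reviews: pulled well back toward the overall mean
shrink(4.9, 30)  # thirty high reviews: stays close to 4.9
```

With these made-up numbers, the trail with two “5” reviews ends up ranked below the trail with thirty reviews averaging 4.9, matching the behavior described above.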

Introducing tidyLPA (an R package for carrying out Latent Profile Analysis)
/blog/introducingtidylpaanrpackageforcarryingoutlatentprofileanalysis/
Wed, 14 Feb 2018 00:00:00 +0000
/blog/introducingtidylpaanrpackageforcarryingoutlatentprofileanalysis/
<p>I’m excited to introduce tidyLPA, an R package for carrying out Latent Profile Analysis (LPA). This is the result of a collaborative project with Jennifer Schmidt, Patrick Beymer, and Rebecca Steingut, and of a long period of learning about <em>cluster analysis</em> (see <a href="https://jrosen48.github.io/blog/prcranrpackageforpersoncenteredanalysis/">here</a>) and, recently, <strong>model-based cluster analysis</strong>. Here, I introduce and describe LPA as a particular type of model-based cluster analysis.</p>
<div id="background" class="section level2">
<h2>Background</h2>
<p>Latent Profile Analysis (LPA) is a statistical modeling approach for estimating distinct profiles, or groups, of variables. In the social sciences and in educational research, these profiles could represent, for example, how different youth experience dimensions of being engaged (i.e., cognitively, behaviorally, and affectively) at the same time.</p>
<p>tidyLPA provides the functionality to carry out LPA in R. In particular, tidyLPA provides functionality to specify different models that determine whether and how different parameters (i.e., means, variances, and covariances) are estimated and to specify (and compare solutions for) the number of profiles to estimate parameters for.</p>
</div>
<div id="installation" class="section level2">
<h2>Installation</h2>
<p>You can install tidyLPA from CRAN with:</p>
<pre class="r"><code>install.packages("tidyLPA")</code></pre>
<p>You can also install the in-development version of tidyLPA from GitHub with:</p>
<pre class="r"><code>install.packages("devtools")
devtools::install_github("jrosen48/tidyLPA")</code></pre>
</div>
<div id="example" class="section level2">
<h2>Example</h2>
<p>Here is a brief example using the builtin <code>pisaUSA15</code> dataset and variables for broad interest, enjoyment, and selfefficacy. Note that we first type the name of the data frame, followed by the unquoted names of the variables used to create the profiles. We also specify the number of profiles and the model. See <code>?estimate_profiles</code> for more details.</p>
<pre class="r"><code>library(tidyLPA)</code></pre>
<pre class="r"><code>d <- pisaUSA15[1:100, ]
estimate_profiles(d,
broad_interest, enjoyment, self_efficacy,
n_profiles = 3,
model = 2)
#> Fit varying means, equal variances and covariances (Model 2) model with 3 profiles.
#> LogLik is -279.692
#> BIC is 636.62
#> Entropy is 0.798
#> # A tibble: 94 x 5
#> broad_interest enjoyment self_efficacy profile posterior_prob
#> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 3.80 4.00 1.00 1 0.976
#> 2 3.00 3.00 2.75 2 0.847
#> 3 1.80 2.80 3.38 2 0.982
#> 4 1.40 1.00 2.75 3 0.963
#> 5 1.80 2.20 2.00 3 0.824
#> 6 1.60 1.60 1.88 3 0.960
#> 7 3.00 3.80 2.25 1 0.847
#> 8 2.60 2.20 2.00 3 0.704
#> 9 1.00 2.80 2.62 3 0.584
#> 10 2.20 2.00 1.75 3 0.861
#> # ... with 84 more rows</code></pre>
<p>Note that the output is simply a data frame with the profile (and its posterior probability) and the variables used to create the profiles (this is the “tidy” part, in that the function takes and returns a data frame).</p>
<p>In addition to the number of profiles (specified with the <code>n_profiles</code> argument), the model is important. The <code>model</code> argument allows for four models to be specified:</p>
<ul>
<li>Varying means, equal variances, and covariances fixed to 0 (model 1)</li>
<li>Varying means, equal variances, and equal covariances (model 2)</li>
<li>Varying means, varying variances, and covariances fixed to 0 (model 3)</li>
<li>Varying means, varying variances, and varying covariances (model 6)</li>
</ul>
<p>Two additional models can be fit using functions that provide an interface to the MPlus software. More information on the models can be found in the <a href="https://jrosen48.github.io/tidyLPA/articles/Introduction_to_tidyLPA.html">vignette</a>.</p>
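<p>For readers familiar with <strong>mclust</strong> (which tidyLPA builds on under the hood), my understanding is that these model numbers correspond to mclust’s covariance-structure names roughly as follows; treat this mapping as my reading of the documentation rather than an official table:</p>

```r
# Rough mapping from tidyLPA model numbers to mclust model names
# (my understanding from the tidyLPA documentation; treat as an assumption)
model_names <- c("1" = "EEI",  # equal variances, covariances fixed to 0
                 "2" = "EEE",  # equal variances and covariances
                 "3" = "VVI",  # varying variances, covariances fixed to 0
                 "6" = "VVV")  # varying variances and covariances
model_names[["2"]]  # the model used in the example above
```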
<p>We can plot the profiles by <em>piping</em> (using the <code>%>%</code> operator, loaded from the <code>dplyr</code> package) the output to <code>plot_profiles()</code>.</p>
<pre class="r"><code>library(dplyr, warn.conflicts = FALSE)
estimate_profiles(d,
broad_interest, enjoyment, self_efficacy,
n_profiles = 3,
model = 2) %>%
plot_profiles(to_center = TRUE)
#> Fit varying means, equal variances and covariances (Model 2) model with 3 profiles.
#> LogLik is -279.692
#> BIC is 636.62
#> Entropy is 0.798</code></pre>
<p><img src="/blog/20180214introducingtidylpaanrpackageforcarryingoutlatentprofileanalysis_files/figurehtml/unnamedchunk51.png" width="672" /></p>
</div>
<div id="moreinformation" class="section level2">
<h2>More information</h2>
<p>To learn more:</p>
<ul>
<li><p>Browse the tidyLPA <a href="https://jrosen48.github.io/tidyLPA/">website</a> (especially check out the Reference page to see more about other functions)</p></li>
<li><p><em>Read the Introduction to tidyLPA</em> <a href="https://jrosen48.github.io/tidyLPA/articles/Introduction_to_tidyLPA.html">vignette</a>, which has much more information on the models that can be specified with tidyLPA and on additional functionality</p></li>
</ul>
</div>
<div id="contact" class="section level2">
<h2>Contact</h2>
<p>As tidyLPA is at an early stage of its development, issues should be expected. If you have any questions or feedback, please do not hesitate to get in touch:</p>
<ul>
<li>By <a href="mailto:jrosen@msu.edu">email (jrosen@msu.edu)</a></li>
<li>By <a href="http://twitter.com/jrosenberg6432">Twitter</a></li>
<li>Through filing an issue on GitHub <a href="https://github.com/jrosen48/tidyLPA">here</a></li>
</ul>
<p>Please note that this project is released with a <a href="CONDUCT.md">Contributor Code of Conduct</a>. By participating in this project you agree to abide by its terms.</p>
</div>
<div id="someacknowledgments" class="section level2">
<h2>Some acknowledgments!</h2>
<p>As I mentioned earlier, this package is the result of a lot of learning that started a while ago. Thank you to Christina Krist and Stephanie Wormington for introducing me to cluster analysis and LPA, respectively. Also, thank you to Kristy Robinson and Youkyung Lee for their invaluable help in learning about LPA.</p>
</div>

Upcoming presentations at AERA, SITE, and NARST
/blog/upcomingpresentationsataeraandnarst/
Thu, 25 Jan 2018 00:00:00 +0000
/blog/upcomingpresentationsataeraandnarst/
<p>I am excited to present (or contribute to presentations) associated with a number of ongoing projects at the <a href="http://aera.net/EventsMeetings/AnnualMeeting/2018AnnualMeetingCallforPaperandSessionSubmissions">AERA</a>, <a href="https://conf.aace.org/site/">SITE</a>, and <a href="https://narst.org/annualconference/2018conference.cfm">NARST</a> conferences later this semester. If you are at one of the conferences (or presentations) and would like to <a href="https://jrosen48.github.io/about/">get in touch</a>, please do! Here are the titles and some other information for the presentations:</p>
<p>Beymer, P. N., Rosenberg, J. M., & Schmidt, J. A. (2018, April). <em>Investigating the effects of interest and choice: An experience sampling approach</em>. Paper to be presented at the Annual Meeting of the American Educational Research Association, New York, NY.</p>
<p>Greenhalgh, S. P., Staudt Willet, B., Rosenberg, J. M., Akcaoglu, M., & Koehler, M. J. (2018, April). <em>Timing is everything: Comparing synchronous and asynchronous modes of Twitter for teacher professional learning</em>. Paper to be presented at the Annual Meeting of the American Educational Research Association, New York, NY.</p>
<p>Rosenberg, J. M., Beymer, P. N., & Schmidt, J. A. (2018, April). <em>How engagement during out-of-school time STEM programs predicts changes in motivation in STEM</em>. In J. M. Rosenberg (Chair), Data-intensive approaches to studying engagement in education: Exploring their current potential. Paper to be presented at the Annual Meeting of the American Educational Research Association, New York, NY.</p>
<p>Rosenberg, J. M., Lee, Y., Robinson, K. A., Ranellucci, J., Roseth, C. J., & Linnenbrink-Garcia, L. (2018, April). <em>Patterns of engagement in a flipped undergraduate class: Antecedents and outcomes</em>. In L. Daniels & A. Frenzel (Chairs), New empirical insights on what energizes learners – A session on emotions and engagement. Paper to be presented at the Annual Meeting of the American Educational Research Association, New York, NY.</p>
<p>Schmidt, J. A., Rosenberg, J. M., & Beymer, P. N. (2018, April). <em>Experiences, activities, and personal characteristics as predictors of interest and engagement in STEM-focused summer programs</em>. Paper to be presented at the Annual Meeting of the American Educational Research Association, New York, NY.</p>
<p>Shwartz, Y., Bayer, I., Bielik, T., Kolonich, A., Eidelman, R., Shwartz, G., … Rosenberg, J. M. (2018, March). <em>Graduate student international collaboration for investigating science teachers’ professional learning</em>. Paper to be presented at the meeting of the National Association for Research in Science Teaching, Atlanta, GA.</p>
<p>Koehler, M. J., & Rosenberg, J. M. (2018, March). <em>What factors matter for engaging others in an educational conversation on Twitter?</em> Paper to be presented at the Society for Information Technology and Teacher Education International Conference 2018, Washington, DC.</p>

A Shiny interactive web application to quantify how robust inferences are to potential sources of bias (sensitivity analysis)
/blog/ashinyinteractivewebapplicationtoquantifyhowrobustinferencesaretopotentialsourcesofbiassensitivityanalysis/
Wed, 17 Jan 2018 00:00:00 +0000
/blog/ashinyinteractivewebapplicationtoquantifyhowrobustinferencesaretopotentialsourcesofbiassensitivityanalysis/
<p>As part of a revise and resubmit decision for a paper (that was just accepted to the <a href="https://doi.org/10.1007/s10964-018-0814-9"><em>Journal of Youth and Adolescence</em></a> (preprint <a href="https://jrosen48.github.io/_media/preprints/BeymerRosenbergSchmidt2018JYA.pdf">here</a>)), we (Patrick Beymer, Jennifer Schmidt, and I) were asked by the editor to carry out a sensitivity analysis for our findings. Our understanding was that, in this context, sensitivity analysis means one of two things, or both:</p>
<ul>
<li>How results hold up under different specifications of an analysis</li>
<li>How much bias would have to be present to invalidate an inference</li>
</ul>
<p>Over the past year or so, I had learned about sensitivity analysis from a class and then work with <a href="https://msu.edu/~kenfrank/">Ken Frank</a>. At the time we received this decision, I had been working with Ken and Ran Xu to develop an R package and web application to make it easier to carry out Ken’s approach to sensitivity analysis, focused on the second of the two ways (above) that it’s carried out: finding out how much bias would have to be present to invalidate an inference.</p>
<p>I want to share what we included in our revised manuscript and then a message that Ken, Ran, and I crafted to distribute the web application we just completed. Without further ado, the paragraph we included in our paper is below, followed by our announcement of the web application (and still in-development R package) we used to write the paragraph and to make this revision to the paper.</p>
<h1 id="paragraphweincludedinourrevisedmanuscript">Paragraph we included in our revised manuscript</h1>
<p>Here is the paragraph we included, first introducing sensitivity analysis and then describing results from carrying it out for the main findings for our paper:</p>
<blockquote>
<p>Particularly for studies that do not use experimental designs, it can be important to determine how robust an inference is to alternative explanations. One approach to addressing this is sensitivity analysis, which involves quantifying the amount of bias that would be needed to invalidate an inference (hypothetically, this bias might be due to omitted or confounding variables, measurement, missing data, etc.). Using the approach described in Frank, Maroulis, Duong, and Kelcey (2013), we carried out sensitivity analysis for inferences we made relative to our key findings. The result is a numeric value for each effect that indicates the proportion of the estimate that would have to be biased in order to invalidate the inference: higher values indicate more robust estimates in that the inferences would still hold even if there were substantial bias in the estimate. For the effect of affect upon engagement, we determined that 84.94% of the estimate in Model 1 and 73.22% of the estimate in Model 2 would have to be due to bias to invalidate the inferences about these relationships. For the sensitivity of the effect of choice in Models 1 and 2, we found that 41.95% and 42.13% of the estimate would have to be due to bias to invalidate the inference, respectively. For the effect of location 54.97% of the estimate in Model 1 and 55.30% of the estimate in Model 2 would have to be due to bias to invalidate the inferences. These large values across all the sensitivity analyses conducted are considered high relative to prior studies using this method (see Frank et al., 2013 for many examples), and suggest that these findings are likely robust in light of possible confounding variables (such as covariates that were not included in the analyses in this study) and other sources of potential bias. Further, we can consider the impact of data that is not missing at random. 
A small number of missing responses associated with null effects could invalidate inferences about key findings assuming the percent bias needed to invalidate the inferences is small. Considering the large proportions of estimates that would have to be biased to invalidate the inferences made, we can conclude that these findings are robust in light of the data that is presently missing.</p>
</blockquote>
<h1 id="thekonfounditwebapplication">The KonFoundIt web application</h1>
<p>We are happy to announce the release of an interactive web application, <a href="http://konfoundit.com">KonFoundIt</a>, to make it easy to quantify the conditions necessary to change an inference. For example, <a href="http://konfoundit.com">KonFoundIt</a> generates statements such as “XX% of the estimate would have to be due to bias to invalidate the inference” or “an omitted variable would have to be correlated with the outcome and predictor of interest (e.g., treatment) at ZZ to invalidate the inference.” Thus, KonFoundIt provides a precise language for debating causal inferences.</p>
<p>Who would use such an approach in the course of their work?</p>
<ul>
<li>Researchers receiving a revise and resubmit decision for a manuscript who are asked to carry out sensitivity analyses or alternative analyses as part of the revision of their manuscript</li>
<li>Practitioners (including policymakers) seeking to understand whether evidence in a research report is strong enough to support action in their contexts</li>
<li>Data analysts and scholars debating whether a research community should make an inference in light of an estimated effect</li>
<li>Those analyzing observational data seeking to understand how trustworthy an inference is relative to potential omitted variables</li>
<li>Those analyzing data from a randomized controlled trial (RCT) trying to understand how generalizable the results of the study are to other contexts</li>
<li>Those needing to characterize the strengths of their research for the media or a wider audience, including audiences of teachers and administrators</li>
</ul>
<p><a href="http://konfoundit.com">KonFoundIt</a> takes four values: the estimated effect (such as an unstandardized regression coefficient), its standard error, the number of observations, and the number of covariates. <a href="http://konfoundit.com">KonFoundIt</a> returns output in the form of publishable statements as well as figures to support the interpretation of the output. As you can see in the figure below, the <a href="http://konfoundit.com">KonFoundIt</a> interface is easy to use (and can be accessed from both a computer browser and mobile devices) and provides links to additional resources and a Stata procedure and R package for further use.</p>
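<p>To give a rough sense of what happens with those four values, here is a sketch in R of how they might be turned into a percent-bias statement. This is a simplified illustration, not KonFoundIt’s exact implementation; in particular, the degrees-of-freedom convention below (subtracting the covariates plus the focal predictor and intercept) is an assumption, so check the app’s documentation for its exact conventions:</p>
<pre class="r"><code># hypothetical inputs
est_eff <- 2             # estimated effect (e.g., an unstandardized coefficient)
std_err <- 0.4           # its standard error
n_obs <- 100             # number of observations
n_cov <- 3               # number of covariates
t <- est_eff / std_err
df <- n_obs - n_cov - 2  # assumed convention: covariates + focal predictor + intercept
critical_t <- qt(1 - .05 / 2, df)
critical_r <- critical_t / sqrt(critical_t ^ 2 + df)
obs_r <- abs(t) / sqrt(df + t ^ 2)
100 * (1 - critical_r / obs_r)  # percent bias needed to invalidate the inference</code></pre>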
<p><a href="http://konfoundit.com"><img src="/_media/images/konfounditss.png" alt="Screenshot" /></a>
To learn more, check out <a href="http://konfoundit.com">KonFoundIt</a> here: <a href="http://konfoundit.com">http://konfoundit.com</a>. If you have any questions or feedback, please do not hesitate to contact Ken Frank (and Joshua Rosenberg and Ran Xu) at kenfrank@msu.edu.</p>

Outcomes from a self-generated utility value intervention in science (in IJER)
/blog/newarticleonautilityvalueinterventioninijer/
Sat, 30 Dec 2017 00:00:00 +0000
/blog/newarticleonautilityvalueinterventioninijer/
<p>The <a href="http://www.sesp.northwestern.edu/learningsciences/researchprojects/">Scientific Practices project</a> was focused on engaging middle school students in scientific and engineering practices (such as developing and using models, constructing explanations of phenomena, and analyzing and interpreting data). As part of this longitudinal project, we carried out field experiments to understand the impact of specific features of the curriculum.</p>
<p>In this paper, published in the <a href="https://www.journals.elsevier.com/internationaljournalofeducationalresearch">International Journal of Educational Research</a>, Mete Akcaoglu, John Ranellucci, Christina Schwarz, and I examined the effects of asking students to generate ideas about how what they were learning could be useful in the future.</p>
<p>Students who were asked to write about why what they were learning was relevant did generate future uses for it. These students also demonstrated higher levels of cognitive processing in their responses, which we identified using a computational natural language processing tool (<a href="http://liwc.wpengine.com/">LIWC</a>). Compared to students in a control group who were asked to summarize what they were learning, students who wrote about how what they were learning could be useful reported increases in their utility value for science, but not higher levels of interest in science.</p>
<p>Here’s the abstract:</p>
<blockquote>
<p>The purpose of this field experiment was to understand whether fifth- and sixth-grade students were able to write about the usefulness and relevance of what they were learning in their science class through self-generated reflections and to examine the impacts of this activity on students’ value, utility value, and interest for science. Analysis of students’ essays revealed in the self-generated reflection condition students connected what they were learning to their lives significantly more than the control condition. Linguistically, student essays did not differ between the two conditions, except for cognitive processing. Self-reflecting increased students’ utility value but not value nor interest. Self-efficacy did not moderate these relations. Implications for extending self-generated utility value and broader social-psychological interventions for early adolescent students are discussed.</p>
</blockquote>
<p>The paper is available from <a href="https://www.sciencedirect.com/science/article/pii/S0883035517308492">here</a>.</p>

Modifying an R function to iterate (using purrr) and use nonstandard evaluation (using rlang)
/blog/modifyinganrfunctiontousenonstandardevaluation/
Sun, 17 Dec 2017 00:00:00 +0000
/blog/modifyinganrfunctiontousenonstandardevaluation/
<div id="background" class="section level4">
<h4>Background</h4>
<p>Research in classrooms and schools can be complex because of all of the factors that matter. A question that often comes up when we say that we observed some pattern in data is, <em>but did you control for X</em>?</p>
<p>In the context of working on an approach to find out how impactful an omitted variable would need to be to invalidate an inference, we had to modify a function that worked for a single sensitivity analysis to work for many and to be easier to use.</p>
</div>
<div id="theinitialversionofthefunction" class="section level4">
<h4>The initial version of the function</h4>
<p>In the context of programming (and mathematics!), many functions take inputs and then transform them into output.</p>
<p>Imagine we have a function that takes two values used to calculate a <em>t</em>-test, the <em>t</em> statistic (the coefficient divided by its standard error) and the degrees of freedom of the <em>t</em> distribution. It then returns a bunch of output about how sensitive an inference based on the <em>t</em>-test is to bias or to an omitted confounding variable.</p>
<p>Here is a function that outputs values about the sensitivity of a <em>t</em>-test as a row of a data frame (actually an R <code>data.frame</code> modified to be a bit easier to work with, a tibble).</p>
<p>It’s a bit of a whopper:</p>
<pre class="r"><code>core_sensitivity_mkonfound <- function(t, df, alpha = .05, tails = 2) {
  critical_t <- stats::qt(1 - (alpha / tails), df)
  critical_r <- critical_t / sqrt((critical_t ^ 2) + df)
  obs_r <- abs(t / sqrt(df + (t ^ 2)))
  # for replacement-of-cases framework
  if (abs(obs_r) > abs(critical_r)) {
    action <- "to_invalidate"
    inference <- "reject_null"
    pct_bias <- 100 * (1 - (critical_r / obs_r))
  } else if (abs(obs_r) < abs(critical_r)) {
    action <- "to_sustain"
    inference <- "fail_to_reject_null"
    pct_bias <- 100 * (1 - (obs_r / critical_r))
  } else if (obs_r == critical_r) {
    action <- NA
    inference <- NA
    pct_bias <- NA
  }
  if ((abs(obs_r) > abs(critical_r)) & ((obs_r * critical_r) > 0)) {
    mp <- -1
  } else {
    mp <- 1
  }
  # for correlation-based framework
  itcv <- (obs_r - critical_r) / (1 + mp * abs(critical_r))
  r_con <- round(sqrt(abs(itcv)), 3)
  out <- dplyr::data_frame(t, df, action, inference, pct_bias, itcv, r_con)
  names(out) <- c("t", "df", "action", "inference", "pct_bias_to_change_inference", "itcv", "r_con")
  out$pct_bias_to_change_inference <- round(out$pct_bias_to_change_inference, 3)
  out$itcv <- round(out$itcv, 3)
  out$action <- as.character(out$action)
  out$inference <- as.character(out$inference)
  return(out)
}</code></pre>
<p>Let’s test it out. Imagine we have a <em>t</em> statistic of <code>3</code> for a hypothesis test associated with <code>100</code> degrees of freedom.</p>
<pre class="r"><code>core_sensitivity_mkonfound(3, 100)</code></pre>
<pre><code>## # A tibble: 1 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3 100 to_invalidate reject_null 32.276 0.115
## # ... with 1 more variables: r_con <dbl></code></pre>
<p>Works.</p>
<p>It looks like in order to invalidate the inference, around <code>32</code>% of the effect would need to be due to bias; or, an omitted variable would need to be correlated with both the predictor of interest and the dependent variable at <code>.339</code> in order to invalidate the inference. You can read more about sensitivity analysis and an in-development R package on the approach to sensitivity analysis with Ran Xu and Ken Frank <a href="https://jrosen48.github.io/konfound/articles/Introduction_to_konfound.html">here</a>.</p>
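<p>As an aside, those two numbers are directly related: <code>r_con</code> is simply the square root of the absolute value of the impact threshold (<code>itcv</code>), as in the last few lines of the function above. We can recover the <code>.339</code> from the <code>itcv</code> of <code>0.115</code> in the printed output:</p>
<pre class="r"><code>itcv <- 0.115
round(sqrt(abs(itcv)), 3)</code></pre>
<pre><code>## 0.339</code></pre>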
</div>
<div id="thesecondversionofthefunction" class="section level4">
<h4>The second version of the function</h4>
<p>How could we write a function to provide output not only for one <em>t</em> and its associated <em>df</em>, but rather many values?</p>
<p>We can write a simple function to iterate through multiple values and to bind them together. The key is the <code>map()</code> function (from the <code>tidyverse</code> package <code>purrr</code>; if you are familiar with <code>R</code>, it is similar to many of the <code>apply()</code> functions). Specifically, because we:</p>
<ul>
<li>Have two variables that we are iterating through</li>
<li>Want the output in <code>data.frame</code> form</li>
</ul>
<p>We use <code>map2_dfr()</code>. Check out <a href="http://r4ds.had.co.nz/iteration.html">this helpful chapter of R for Data Science</a> for more on iteration using approaches such as for and while loops as well as the useful <code>apply()</code>/ <code>map()</code> families of functions.</p>
<p>Here is what a function could look like:</p>
<pre class="r"><code>library(purrr)

mkonfound <- function(t, df, alpha = .05, tails = 2) {
  map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound)
}</code></pre>
<p>Simple! But does it work? :)</p>
<p>Instead of passing a single <em>t</em> and <em>df</em>, as we did above with the <code>core_sensitivity_mkonfound()</code> function, we can pass vectors of <em>t</em> and <em>df</em> values:</p>
<pre class="r"><code>mkonfound(t = c(3, 2, 2.5),
          df = c(100, 200, 150))</code></pre>
<pre><code>## # A tibble: 3 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3.0 100 to_invalidate reject_null 32.276 0.115
## 2 2.0 200 to_invalidate reject_null 1.378 0.002
## 3 2.5 150 to_invalidate reject_null 20.364 0.048
## # ... with 1 more variables: r_con <dbl></code></pre>
<p>We could also do something like binding <em>t</em> and <em>df</em> together into a small <code>data.frame</code>:</p>
<pre class="r"><code>d <- data.frame(t = c(3, 2, 2.5),
                df = c(100, 200, 150))
d</code></pre>
<pre><code>## t df
## 1 3.0 100
## 2 2.0 200
## 3 2.5 150</code></pre>
<p>And then <code>mkonfound()</code> could work like this:</p>
<pre class="r"><code>mkonfound(d$t, d$df)</code></pre>
<pre><code>## # A tibble: 3 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3.0 100 to_invalidate reject_null 32.276 0.115
## 2 2.0 200 to_invalidate reject_null 1.378 0.002
## 3 2.5 150 to_invalidate reject_null 20.364 0.048
## # ... with 1 more variables: r_con <dbl></code></pre>
</div>
<div id="thethirdversionofthefunction" class="section level4">
<h4>The third version of the function</h4>
<p>Still seems to work fine. Those of you familiar with the <a href="https://www.tidyverse.org/">tidyverse</a> may sense another possible improvement. Namely, the function could be written to both input and output a <code>data.frame</code>, and be a bit more intuitive to use via nonstandard evaluation.</p>
<p>The goal is to add an additional argument for the <code>data.frame</code> (<code>d</code>), and then use nonstandard evaluation to capture <em>and then later evaluate in the context of the data.frame</em> the names of the <em>t</em> and <em>df</em> columns.</p>
<pre class="r"><code>library(rlang)</code></pre>
<pre class="r"><code>library(dplyr)

mkonfound <- function(d, t, df, alpha = .05, tails = 2) {
  t_enquo <- enquo(t)
  df_enquo <- enquo(df)
  t <- pull(select(d, !!t_enquo))
  df <- pull(select(d, !!df_enquo))
  map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound)
}</code></pre>
<p>But does it work? Now, the first argument is the name of the <code>data.frame</code>, the second is the unquoted name of the column with the <em>t</em> statistics, and the third is the same as the second, but for the <em>df</em> associated with the <em>t</em>’s.</p>
<pre class="r"><code>mkonfound(d, t, df)</code></pre>
<pre><code>## # A tibble: 3 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3.0 100 to_invalidate reject_null 32.276 0.115
## 2 2.0 200 to_invalidate reject_null 1.378 0.002
## 3 2.5 150 to_invalidate reject_null 20.364 0.048
## # ... with 1 more variables: r_con <dbl></code></pre>
<p>If we have an entire spreadsheet, read into R as a <code>data.frame</code> using the <code>read.csv()</code> function (or <code>read_csv()</code> from the very useful <code>readr</code> package), then we can easily compute output for all of the statistics in the spreadsheet. Here is a spreadsheet from Ken’s website:</p>
<pre class="r"><code>spreadsheet_of_vals <- read.csv("https://msu.edu/~kenfrank/example%20dataset%20for%20mkonfound.csv")
head(spreadsheet_of_vals)</code></pre>
<pre><code>## t df
## 1 7.076763 178
## 2 4.127893 193
## 3 1.893137 47
## 4 4.166395 138
## 5 1.187599 97
## 6 3.585478 87</code></pre>
<p>We would use it the same way as above but with <code>d</code> replaced with what we named the <code>data.frame</code> we read from the website, <code>spreadsheet_of_vals</code>:</p>
<pre class="r"><code>mkonfound(spreadsheet_of_vals, t, df)</code></pre>
<pre><code>## # A tibble: 30 x 7
## t df action inference
## <dbl> <int> <chr> <chr>
## 1 7.076763 178 to_invalidate reject_null
## 2 4.127893 193 to_invalidate reject_null
## 3 1.893137 47 to_sustain fail_to_reject_null
## 4 4.166395 138 to_invalidate reject_null
## 5 1.187599 97 to_sustain fail_to_reject_null
## 6 3.585478 87 to_invalidate reject_null
## 7 0.281938 117 to_sustain fail_to_reject_null
## 8 2.549647 75 to_invalidate reject_null
## 9 4.436048 137 to_invalidate reject_null
## 10 2.045373 195 to_invalidate reject_null
## # ... with 20 more rows, and 3 more variables:
## # pct_bias_to_change_inference <dbl>, itcv <dbl>, r_con <dbl></code></pre>
<p>Since the output is in a <code>data.frame</code>, we can, for example, easily plot output:</p>
<pre class="r"><code>results_df <- mkonfound(spreadsheet_of_vals, t, df)</code></pre>
<pre class="r"><code>library(ggplot2)

results_df$action <- dplyr::case_when(
  results_df$action == "to_invalidate" ~ "To Invalidate",
  results_df$action == "to_sustain" ~ "To Sustain"
)

ggplot(results_df, aes(x = pct_bias_to_change_inference, fill = action)) +
  geom_histogram() +
  scale_fill_manual("", values = c("#1F78B4", "#A6CEE3")) +
  theme_bw() +
  ggtitle("Histogram of Percent Bias") +
  facet_grid(~ action) +
  theme(legend.position = "none") +
  ylab("Count") +
  xlab("Percent Bias")</code></pre>
<pre><code>## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.</code></pre>
<p><img src="/blog/20171217modifyinganrfunctiontousenonstandardevaluation_files/figurehtml/unnamedchunk121.png" width="672" /></p>
<p>Like many functions in R, this could be written many different ways, and this post shows just one approach to writing a function.</p>
<p>In some cases, nonstandard evaluation makes the function a bit harder to use, particularly in cases in which we are interested in the output from only a single study.</p>
<p>In that case, we would want to go back to the function we initially wrote (<code>core_sensitivity_mkonfound()</code>) or would have to write something a bit like:</p>
<pre class="r"><code>single_study <- data.frame(t = 3, df = 100)
mkonfound(single_study, t, df)</code></pre>
<pre><code>## # A tibble: 1 x 7
## t df action inference pct_bias_to_change_inference itcv
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 3 100 to_invalidate reject_null 32.276 0.115
## # ... with 1 more variables: r_con <dbl></code></pre>
<p>So, this is one approach that proved useful in one case: the in-development package for sensitivity analysis, which includes <a href="https://jrosen48.github.io/konfound/reference/index.html">a number of functions</a>, with a version of this <code>mkonfound()</code> function for meta-analyses that make use of the approach.</p>
<p>Oh, and if you are interested in sensitivity analysis, please check out the <code>konfound</code> package this is part of <a href="https://jrosen48.github.io/konfound/">here</a>.</p>
</div>

New post on engaging students in “data practices” in online science classes (in the MVLRI Research, Policy, Innovation & Networks blog)
/blog/mvlriblogpostondatapractices/
Tue, 21 Nov 2017 00:00:00 +0000
/blog/mvlriblogpostondatapractices/
<p>I have worked to design activities for students to work with authentic data sources in their science classes through a collaboration with <a href="https://michiganvirtual.org">Michigan Virtual School</a> and the <a href="https://mvlri.org/">Michigan Virtual Learning Research Institute</a>.</p>
<p>As a part of this project, which was supported by MVLRI’s dissertation research fellowship, I hosted a <a href="https://jrosen48.github.io/blog/mvlriwebinar/">webinar</a>. I also wrote a post for the Michigan Virtual Learning Research Institute’s Research, Policy, Innovation &amp; Networks <a href="https://mvlri.org/blog">blog</a>.</p>
<p>Here is an excerpt of the post:</p>
<blockquote>
<p>Data are powerful, both in science and science education as well as in our everyday lives. By preparing students to think about data, students can question the claims of scientists, news media, and experts in marketing by questioning what data were collected – and how. Moreover, by preparing students to think with data, students can use data to answer questions that are relevant and interesting to them. Being able to think of and with data is powerful not only in science (and other STEM areas of study) but also in occupations that did not traditionally involve a focus on data, such as journalism.</p>
</blockquote>
<blockquote>
<p>This post explores the topic of work with data, particularly a set of activities, or what the Next Generation Science Standards [1] (NGSS and the similar Michigan Science Standards) and the Common Core State Standards (CCSS) refer to as “practice.” In short, these are activities akin to what experts in STEM—scientists, mathematicians, engineers, and even data scientists—do. To refer to practices focused on work with data, we use the term “data practices.” Data practices draw not only from the practices of developing and using models and analyzing and interpreting data, but also obtaining, evaluating, and communicating information and, in many cases, using mathematics and computational thinking.</p>
</blockquote>
<p>The post was published yesterday, and I hope you check it out (along with the excellent posts by other scholars and administrators on the blog) <a href="https://mvlri.org/blog/opportunitiesengagingstudentsdatapracticesonlinescienceclasses/">here</a>.</p>

Review of 'What’s Worth Teaching: Rethinking Curriculum in the Age of Technology' in Teachers College Record
/blog/reviewofwhatsworthteachingrethinkingcurriculumintheageoftechnology/
Tue, 07 Nov 2017 00:00:00 +0000
/blog/reviewofwhatsworthteachingrethinkingcurriculumintheageoftechnology/
<p>I was recently asked to submit a review, with Charles Logan, of the book <em>What’s Worth Teaching: Rethinking Curriculum in the Age of Technology</em> by the incredible scholar Allan Collins. A link to the review and book is on the Teachers College Record website <a href="http://www.tcrecord.org/Content.asp?ContentID=22173">here</a>. A preprint of our review is also available <a href="https://github.com/jrosen48/homepagesource/raw/master/static/_media/publications/RosenbergLoganTCRReview2017.pdf">here</a>.</p>

Two data packages: Rail-trails and an assessment of student achievement
/blog/twoopendatasetsrailtrailsandanassessmentofstudentachievement/
Wed, 25 Oct 2017 00:00:00 +0000
/blog/twoopendatasetsrailtrailsandanassessmentofstudentachievement/
<p>Because of interest and the need for better examples (for teaching and for use in tools under development, such as <a href="https://github.com/jrosen48/prcr">prcr</a> and <a href="https://github.com/jrosen48/tidyLPA">tidyLPA</a>), I worked to create two data packages, i.e., data made easily available through an R package.</p>
<p>A benefit of the data being in an <code>R</code> package is that it is even easier to access than other formats (in <code>R</code>): just load the package and type the name of the data frame, or, if the data is included as built-in data in another package (one that is loaded), just type the name of the data frame. This can be helpful because data packages make it easy to use an interesting dataset right away (loading data can sometimes be surprisingly hard). Another benefit is that the data are documented and easily joined (in the case of the student questionnaire and achievement data in the PISA data) with related information, such as students’ schools. The data are also saved in an efficient format.</p>
<ul>
<li><p><a href="https://github.com/jrosen48/pisaUSA15">pisaUSA15</a>: Data package that provides student questionnaire data from the 2015 PISA for students in the United States of America. I’ve used these for examples in the <code>prcr</code> and <code>tidyLPA</code> packages. More information is available at <a href="http://www.oecd.org/pisa/data/" class="uri">http://www.oecd.org/pisa/data/</a>.</p></li>
<li><p><a href="https://github.com/jrosen48/railtrails">railtrails</a>: Data package with trail data from the Rails-to-Trails Conservancy, including the trail name, state, distance, and surface. I’ve used this data to illustrate ideas behind mixed-effects (or multilevel) models <a href="https://jrosen48.github.io/blog/characteristicsofrailtrails/">here</a>, <a href="https://jrosen48.github.io/blog/comparingmixedeffectsandlinearmodels/">here</a>, and <a href="https://jrosen48.github.io/blog/michiganrailtrailsandpathwaysthroughdata/">here</a>. More information is available at <a href="https://www.traillink.com/" class="uri">https://www.traillink.com/</a>. UPDATE: This package is now available on CRAN (see <a href="https://cran.r-project.org/web/packages/railtrails/index.html">here</a>).</p></li>
</ul>
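<p>As a quick illustration of that ease of access, once a data package is installed and loaded, its data frame is available by name. (This sketch assumes the package exports a data frame named the same as the package, as data packages conventionally do.)</p>
<pre class="r"><code>install.packages("railtrails")  # once, from CRAN
library(railtrails)
head(railtrails)  # the trail data, available by name once the package is loaded</code></pre>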

MVLRI webinar on engaging students in authentic 'data practices' in online science classes
/blog/mvlriwebinar/
Wed, 18 Oct 2017 00:00:00 +0000
/blog/mvlriwebinar/
<p>I will be hosting a webinar on <em>Engaging students in authentic “data practices” in online science classes</em> with <a href="https://mvlri.org/">Michigan Virtual Learning Research Institute</a>, the research organization associated with <a href="https://michiganvirtual.org/students/">Michigan’s statewide online school</a>. The webinar is today (10/18) at 2pm EDT and is available via <a href="https://connect.mivu.org/mvlri" class="uri">https://connect.mivu.org/mvlri</a></p>
<div class="figure">
<img src="https://raw.githubusercontent.com/jrosen48/homepagesource/master/static/_media/images/DMWACQ4WAAAHNbL.jpg" title="Webinar announcement text" />
</div>
<p>Here is a bit more information on the webinar:</p>
<blockquote>
<p>Not only in science and science education but also in our everyday lives, data are powerful, but opportunities for students to think of and with data as part of their learning are limited. This webinar discusses activities focused on work with data (or data practices) in the context of the Next Generation Science Standards and Michigan Science Standards, as well as recommendations for engaging students in data practices based on findings from a design-based research study carried out at MVS.</p>
</blockquote>

Getting started with 'open science' through blogging
/blog/gettingstartedwithopensciencethroughblogging/
Sun, 01 Oct 2017 00:00:00 +0000
/blog/gettingstartedwithopensciencethroughblogging/
<p>Through a few different projects and people (such as <a href="https://www.improvingpsych.org/SIPS2017/">SIPS</a> and <a href="https://ropensci.org/">rOpenSci</a> and conversations with friends / colleagues both online and offline), I have been exposed to the idea of <em>open science</em>.</p>
<p>I’m actually going to punt for the moment. Here’s a <a href="https://en.wikipedia.org/wiki/Open_science">definition</a> that sounds about right to me:</p>
<blockquote>
<p>Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open notebook science, and generally making it easier to publish and communicate scientific knowledge.</p>
</blockquote>
<p>One way I have found valuable for thinking about how to get started with open science is through blogging. I have blogged for more than 10 years (using WordPress up until this year) but have never had as many people mention that they saw or liked something as I have since I started using <a href="https://github.com/rstudio/blogdown">Blogdown</a>. Blogdown is a package for the statistical software <a href="https://www.r-project.org/">R</a>. Its main benefit, apart from allowing you to create websites using the platform <a href="https://gohugo.io/">Hugo</a> (which serves a similar role as a platform such as WordPress, Weebly, or Wix), is that it allows you to write text inline with R code.</p>
<p>For example, here is a trivial example using a builtin data set (about diamonds) and the <code>ggplot2</code> R package for creating figures:</p>
<pre class="r"><code>library(ggplot2)
ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
geom_point()</code></pre>
<p><img src="/blog/20171001gettingstartedwithopensciencethroughblogging_files/figurehtml/unnamedchunk11.png" width="672" /></p>
<p>I have spent some time over the past few months blogging this way. Here are two examples:</p>
<ul>
<li><a href="https://jrosen48.github.io/blog/comparingmplusandmclustoutput/">Comparing MPlus and MCLUST output</a> (to benchmark an approach to carrying out Latent Profile Analysis in R with the same analysis in MPlus)</li>
<li><a href="https://jrosen48.github.io/blog/characteristicsofrailtrails/">Using characteristics of railtrails to predict how they are rated</a> (to illustrate some key ideas behind mixedeffects models)</li>
</ul>
<p>Writing both text and code in these ways (and as demonstrated in this post) provides a platform, especially for emerging scholars, to share work in progress and engage with some of the ideas behind open science: namely, writing for an audience of both professional scientists and others interested in the content, and practicing notebook science, making it easier for others to use the code to carry out a new analysis or replicate other analyses.</p>
<p>I have a few specific thoughts related to why some folks have come across my blog who otherwise might not have, particularly by coming across the post I wrote comparing MPlus and MCLUST output. As I wrote at the start of <a href="https://jrosen48.github.io/blog/comparingmplusandmclustoutput/">that post</a>, while MPlus is a widely used tool to carry out Latent Profile Analysis, there does not seem to be a widely accepted or used way to carry out Latent Profile Analysis in R.</p>
<p>What the post does that I think is especially useful is compare and contrast the output from the two approaches. There is a lot of interest (I’ll speak for educational researchers) in using open-source tools: The two most common data analysis tools (apart from Microsoft Excel) are <a href="https://www.ibm.com/analytics/us/en/technology/spss/">SPSS</a> and <a href="https://www.statmodel.com/">MPlus</a>, followed by <a href="https://www.sas.com/en_us/companyinformation.html">SAS</a> and <a href="https://www.stata.com/">Stata</a>. Each has its features (and its challenging aspects), but I think R (and Python, too, in slightly different quarters) matches up to, and in some cases improves on, what each of them offers. Unlike them, it is cross-platform and freely available, which means beginning researchers, scholars doing cutting-edge analyses, and students can use it for a wide range of purposes. I think comparing two approaches, one using open-source and one using proprietary tools, helps to build confidence (for me and, I think, for others considering the approach) that the approach both does what we think it does and compares well to a common approach that uses other software. This is important because at the moment, I think there is not yet as wide acceptance of open-source approaches (including R packages, Python libraries, and standalone software) as of those in proprietary software. This makes sense, but it puts the responsibility on those developing open-source approaches to show how the analysis compares to other approaches.</p>
<p>To circle back to the point of this post, blogging provides a place for making this type of work more open: open to inquiring minds, including others trying to carry out a similar approach, those learning the approach for whom the tutorial-like writing is helpful, and experts from whom you can ask for and receive feedback. In short, blogging, particularly using Blogdown or something else that allows you to write text inline with (R, Python, or maybe even SPSS and MPlus) code, is, at least based on one definition of it, a candidate way to get started with open science. And in that spirit, I welcome any questions, particularly if you are looking to get started with <a href="https://github.com/rstudio/blogdown">Blogdown</a>. There are probably a lot of opportunities for my peers in educational research to continue to hash out what these ideas mean to us working in a field in which <a href="http://www.colorado.edu/registrar/students/records/ferpa">the privacy of the data we collect is of paramount importance</a>. Feel free to check out other posts or pages on this site and send me a message at <a href="mailto:jrosen@msu.edu">jrosen@msu.edu</a> or reach out and connect on <a href="https://twitter.com/jrosenberg6432">Twitter</a>.</p>