When we announced our research results last week, Audrey Watters was one of the first to cover them. Shortly thereafter, Justin Reich wrote a very thoughtful review of our research, and a response to Audrey’s blog post, at his EdTechResearcher blog. Others, through blog comments, posts of their own, emails, and conversations, have asserted that we (Auburn School Department) have made claims that our data don’t warrant.
I’d like to take a moment and respond to various aspects of that idea.
But first, although it may appear that I am taking on Justin’s post, that isn’t quite true (or fair to Justin). Justin’s is the most public comment, so it is the easiest to point to. But I actually believe that Justin’s is a quite thoughtful (and largely fair) critique from a researcher’s perspective. Although I will directly address a couple of things Justin wrote, I hope he will forgive me for seeming to hold up his post as I address larger questions about the appropriateness of the claims from our study.
Our Research Study vs. Published Research –
Our results are initial results. There are a lot of people interested in our results (even the initial ones – there are not a lot of randomized control trials being done on iPads in education), so we decided to share what we had so far in the form of a research summary and a press release. But neither of these would be considered “published research” by a researcher (and we don’t consider them that either – we’re just sharing what we have so far). Published research is peer reviewed and has to meet standards for the kinds of information included. We actually have more data to collect and analyze (including more analyses on the data we already have) before we’re ready to publish.
For example, Justin was right to point out that we shared no information about the scales for the ten items we measured. As a result, some of the measured differences may seem much smaller than they are when compared proportionally to their scale (because some of the scales are small), and we were not clear that it is inappropriate to try to make comparisons between the various measures as represented on our graph (because the scales differ). In hindsight, knowing we have mostly a lay audience for our current work, perhaps we should have been more explicit about the ten scales and perhaps created a scaled chart…
Mostly, I want my readers to know that even if I’m questioning some folks’ assertions that we’re overstating our conclusions, we are aware that there are real limitations to what we have shared to date.
Multiple Contexts for Interpreting Research Results –
I have this debate with my researcher friends frequently. They say the only appropriate way to interpret research is from a researcher’s perspective. But I believe that it can and should also be interpreted from a practitioner’s perspective, and that such an interpretation is not the same as a researcher’s. There is (and should be) a higher standard of review by researchers for what any results may mean. But practical implementation decisions can be made without such a high bar (and this is what makes my researcher friends mad, because they want everyone to be just like them!). This is just like how lawyers often ask you to stand much further back from the legal line than you need to. Or like a similar debate mathematicians have: if I stand some distance from my wife, then move half way to her, then move half way to her again, and on and on, mathematicians would say (mathematically) I will never reach her (which is true). On the other hand, we all know, I would very quickly get close enough for practical purposes! 😉
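The halving-the-distance aside can be made concrete with a few lines of code. This is just an illustrative sketch; the 10-metre starting distance and the 1 cm “close enough” tolerance are numbers I am assuming for the example, not anything from the post:

```python
# Mathematically, halving the remaining distance never reaches zero,
# but it gets within any practical tolerance very quickly.
distance = 10.0          # metres to go (assumed starting distance)
steps = 0
while distance > 0.01:   # "close enough for practical purposes": 1 cm
    distance /= 2        # move halfway each time
    steps += 1
print(steps, distance)   # only 10 halvings to get within a centimetre
```

The mathematician is right that `distance` never hits zero; the practitioner is right that after a handful of steps the difference no longer matters.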
Justin is very correct in his analysis of our research from a researcher’s perspective. But I believe that researchers and practitioners can, very appropriately, draw different conclusions from the findings. I also believe that both practitioners and researchers can overstate conclusions from examining the results.
I would wish (respectfully) that Justin might occasionally say in his writing, “from a researcher’s perspective…” If he lives in a researcher world, perhaps he doesn’t even notice this, or thinks it implied or redundant. But his blog is admittedly not for an audience of researchers, but rather for an audience of educators who need help making sense of research.
Reacting to a Lay Blog as a Researcher –
I think Justin has a good researcher head on him and is providing a service to educators by analyzing education research and offering his critique. I’m a little concerned that some of his critique was directed at Audrey’s post rather than directly at our research summary. Audrey is not a researcher. She’s an excellent education technology journalist. I think her coverage was pretty on target. But it was based on interviews with the researchers, Damian Bebell (one of the leading researchers on 1to1 learning with technology), Sue Dorris, and me, not on a researcher’s review of our published findings. At one point, Justin suggests that Audrey is responding to a graph in our research summary (as if she were a researcher). I would suggest she is responding to conversations with Damian, Sue, and me (as if she were a journalist). It is a major fallacy to think everyone should be a researcher, or think and analyze like one (just as it is a fallacy that we all should think or act from any one perspective, including as teachers, or parents, etc.). And it is important to consider individuals’ contexts in how we respond to them. Different contexts warrant different kinds of responses and reactions.
Was It The iPads or Was It Our Initiative –
Folks, including Audrey, asked how we knew what portion of our results came from the iPads and what portion from the professional development, etc. Our response is that it is all these things together. The lesson we learned from MLTI, the Maine Learning Technology Initiative, Maine’s statewide learning-with-laptops initiative that has been successfully implemented for more than a decade, is that these initiatives are not about a device, but about a systemic learning initiative with many moving parts. We have been using the Lead4Change model to help ensure we are taking a systemic approach and attending to the various parts and components.
That said, Justin is correct to point out that, from a research (and statistical) perspective, our study examined the impact that solely the iPad had on our students (one group of students had iPads, the other did not).
But for practitioners, especially those who might want to duplicate our initiative and/or our study, it is important to note that, operationally, our study examined the impact of the iPad as we implemented it, which is to say, systemically, including professional development and other components (Lead4Change being one way to approach an initiative systemically).
It is not unreasonable to expect that a district that simply handed out iPads would have a hard time duplicating our results. So although, statistically, it is just the iPads, in practice, it is the iPads as we implemented them as a systemic initiative.
Statistical Significance and the Issue of “No Difference” in 9 of the 10 Tests –
The concept of “proof” is almost nonexistent in the research world. The only way you could prove something is if you could test every possible person that might be impacted or every situation. Instead, researchers have rules for selecting some subset of the entire population, rules for collecting data, and rules for running statistical analyses on those data. Part of why these rules are in place is because, when you are only really examining a small subset of your population, you want to try to control for the possibility that pure chance got you your results.
That’s where “statistical significance” comes in. This is the point at which researchers say, “We are now confident that these results can be explained by the intervention alone and we are not worried by the impact of chance.” Therefore, researchers have little confidence in results that do not show statistical significance.
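To make the “ruling out chance” idea concrete, here is a minimal sketch of the kind of test researchers run when comparing a control group and an experimental group. The scores are made-up illustrative numbers, not Auburn’s data, and this uses Welch’s t statistic as a stand-in for whatever analyses the study actually used:

```python
# Sketch: is the gap between two groups' average scores bigger than
# what chance alone would plausibly produce? (Made-up data.)
from statistics import mean, stdev
from math import sqrt

control      = [52, 48, 50, 47, 53, 49, 51, 46]
experimental = [55, 51, 54, 50, 57, 52, 56, 49]

# Welch's t statistic: difference in means, scaled by its standard error.
n1, n2 = len(control), len(experimental)
se = sqrt(stdev(control) ** 2 / n1 + stdev(experimental) ** 2 / n2)
t = (mean(experimental) - mean(control)) / se
print(round(t, 2))
```

Here t comes out around 2.6, comfortably past the conventional two-sided 5% cutoff (roughly 2.1 for samples this size), so a researcher would call this difference statistically significant. With noisier data or smaller samples, the same-sized gap in averages could fall below the cutoff, and the researcher would say chance cannot be ruled out, which is the situation with nine of our ten measures.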
Justin is right to say, from a researcher’s perspective, that a researcher should treat the 9 measures that were not statistically significant as if there were no difference in the results.
But that is slightly overstating the case to the rest of the world who are not researchers. For the rest of us, the one thing that is accurate to say about those 9 measures is that the results could be explained by either the intervention or by chance. It is not accurate for someone (and this is not what Justin wrote) to conclude there is no positive impact from our program or that there is no evidence that the program works. It is accurate to say we are unsure of the role chance played in those results.
This comes back to the idea about how researchers and practitioners can and should view data analyses differently. When noticing that the nine measures trended positive, the researcher should warn, “inconclusive!”
It is not on a practitioner, however, to make all decisions based solely on whether data are conclusive. If that were true, there would be no innovation (because there is never conclusive evidence a new idea works before someone tries it). A practitioner should look at this from the perspective of making informed decisions, not seeking conclusive proof. “Inconclusive” is very different from “you shouldn’t do it.” For a practitioner, the fact that all measures trended positive is itself information to consider, side by side with whether those trends are conclusive.
“This research does not show sufficient impact of the initiative,” is as overstated from a statistical perspective, as “We have proof this works,” is from a decision-maker’s perspective.
We don’t pretend to have proof our program works. What is not overstated, however, and what Auburn has stated since we shared our findings, are the following appropriate conclusions from our study: Researchers should conclude we need more research. But the community should conclude that we have shown modest positive evidence of iPads extending our teachers’ impact on students’ literacy development, and should take this as suggesting we are good to continue our program, including into 1st grade.
We also think these results suggest that other districts should consider implementing their own thoughtfully designed iPads-for-learning initiatives.
“Justin is very correct in his analysis of our research from a researcher’s perspective. But I believe that researchers and practitioners can, very appropriately, draw different conclusions from the findings.”
What does that mean?
“For the rest of us, the one thing that is accurate to say about those 9 measures is that these results could be explained by either the intervention or by chance.” So, the chance part didn’t work for you, so you’ll just conveniently ignore that?
What do you think Dr. Bebell would say about this?
I’m not ignoring the chance part. I’m saying that the lack of statistical significance doesn’t automatically mean the intervention was ineffective. It was Dr. Bebell’s conclusion that it would be appropriate for the community to decide to move forward based on these results.
Of course “the lack of statistical significance doesn’t automatically mean the intervention was ineffective.” But, the lack of statistical significance means you don’t have the right to promote the intervention as particularly effective. “…wasn’t ineffective” is a far cry from “effective.” Now, because of your puffery, the results are being touted throughout various media in ways that are totally misleading.
And, I would have said that it’s worth moving forward, too. But, only because you’re doing something that’s well-considered pedagogically and because there’s no reason to expect real, meaningful results after 1/2 year. What did Dr. Bebell say about the press release and what would he say about the way the media has picked up on it?
Jon, we have said we have modest positive results showing the iPad extends the impact of our teachers. I agree with you that “effective” is too strong a word and I’m struggling to see where I used it. Damian Bebell coauthored the press release with Sue Dorris and me.
Maybe not in the press release, but what about: “Now we have pretty good evidence it works!” and “…these results should be more than enough evidence to address the community’s question, ‘How do we know this works?’” and “So. We have what we were looking for: Confirmation that our vision works.”
Yes, in a blog post where I was excited that our modest results suggested the district should continue, a conclusion I confirmed with Damian on several occasions, including when he Skyped in to our School Committee meeting when we presented our initial results. I did not say that the program was effective, which to me has a stronger connotation.
We have different views of puffery, of the words I have used (I have tried to be careful with my words throughout the past week), and of how much control I have over the press, who have clearly upset you.
RE: “They say the only appropriate way to interpret research is from a researcher’s perspective. But I believe that it can and should also be interpreted as well from a practitioner’s perspective, and that such interpretation is not the same as a researcher’s. There is (and should be) a higher standard of review by researchers and what any results may mean. But practical implementation decisions can be made without such a high bar (and this is what makes my researcher friends mad, because they want everyone to be just like them!).”
Can you explain why implementation decisions should be made with lower standards? If research is to actually mean anything, it seems like we should start taking it more seriously.
Thanks for giving me a chance to clarify. I think my choice of words like higher standard and lower standard was confusing and didn’t exactly get at what I was trying to say.
I think it would have been more accurate if I had said that I thought researchers and practitioners are asking two different sets of questions.
Researchers (in situations like our study) are asking, “Can we rule out that chance caused the difference between the average performance of the control group and the average performance of the experimental group?” In our case, we can say yes for one measure and no for the other nine.
Practitioners are asking, “Do we have enough evidence to continue?” As a piece of answering that question, they use research to help INFORM their decision, not to MAKE their decision. They acknowledge that statisticians can’t rule out chance in 9 of the 10 areas, but looking at the research, and teacher observations, and parent observations, and other indicators, they can reasonably conclude that Auburn should continue the program.
My experience from this announcement is that folks from a lot of different walks (researchers, reporters, parents, educators, etc.) can easily read a lot more than we said into our words. I’ve tried to be careful with my choice of words, but I guess haven’t always succeeded. That’s complicated by the fact that folks bring their own context to our writing.
What a lovely set of articles and feedback with comments in the blog. Lovely evidence of professionals thrashing out an issue. As a teacher who might have to act upon the conclusions of such research, I’d like to offer my perspective.
Daily I see pronouncements in the media about “observations” and the consequences of interventions – “boys underperformance compared to girls”, “the link between poverty and attainment” and the herein mentioned “iPads and attainment debate”. Nowhere in the media are actual statistics used – at best the means of data sets are compared in a very superficial manner.
Whilst the finer details of p-values and significance can be argued over (as the comments here will testify to) – teachers can handle real statistics. Please can we have more info in published articles so we can make our own minds up.
If the stats get removed, we are left “taking your word for it” and as this article demonstrates, interpretation of facts is open to manipulation – however well intentioned.
I’m a teacher, but I can think like a researcher and an implementer.
Glen, this summer, when we complete our study, we’ll work to publish the study in a research journal and that will certainly have much more detail.
I look forward to seeing how this pans out. Sadly, teachers normally can’t access peer-reviewed journals without a personal subscription – hence I need journalists to not dumb things down too much by stripping out the meaty stats.
However, thanks for taking the time to actively respond to all the comments that the original article has spawned.
Pingback: Education & politics in Maine: is it the iPad? | Technology with Intention
Pingback: EdTechResearcher » What Should We Do with the Auburn Kindergarten iPad Findings?
I’ve been busy this week, but I wanted to offer one further blog response with two main points. First, I agree with you about how other districts should interpret your findings (assuming they stand up to scrutiny); I think your phrasing at the end of this post is quite fair. Second, I agree with Glen’s point above and disagree with your assertion that non-researchers should analyze quantitative data differently than researchers.
Here’s the full post:
Thanks for the conversation. I hope we have a chance to meet in person soon. If you are ever in Boston to meet at BC, let me know. Best,