Chapter 2: Historicizing Writing Assessment
O’Neill, Moore, and Huot trace the history of writing assessment along a few motivating lines. Early on they advocate that “writing assessment should always be about improving writing instruction—a strongly supported tenet of current validity theory” (14); however, this has not always been the case. In fact, in the earliest moments of writing testing, “teacher judgments about student preparation were found suspect. A test was assumed to be better at helping university admissions personnel make important, consequential decisions about students than judgments of secondary teachers” (17). This disconnection between writing instruction and writing assessment is problematic since, as O’Neill writes, writing assessment practices and composition, as a field of study, are intimately tethered: it is often the writing assessment procedures that define what we value in writing and what writing looks like.
Like most histories of writing assessment, the authors begin with reliability as the central organizing principle in the early implementation of writing assessment, spearheaded primarily by the educational measurement community. Reliability refers to how consistently a test measures what it measures. More specifically, instrument reliability refers to the consistency of scores (whether a test taker receives the same score across multiple administrations), and interrater reliability refers to the consistency among raters reading the same writing sample. In this early part of the field’s history, reliability was often conflated with validity, or at least seen as a necessary component of validity. It is in this context that scoring rubrics emerged as a tool for achieving interrater reliability.
Validity, on the other hand, received little attention early in the history of writing assessment as a field. In fact, the theorizing of validity within the educational measurement community has only recently begun to manifest in writing assessment scholarship. As the authors note, this disconnection from modern validity theory is evidenced by several scholars’ reliance on a reductive definition of validity (the degree to which a test measures what it purports to measure) or on “face validity”: the sense that a test merely looks like it measures the desired ability or trait.
Modern validity theories, however, are more involved. The authors allude to the traditional trinity of validity concepts: content, criterion, and construct. Specifically, a unified theory of validity centers on construct validity alone, with the other concepts subsumed within it. Such a unified theory “demands that test consequences and implications of the local educational environment be considered” (27). In other words, construct validity asks whether “a test is a worthy construct of the ability or trait being measured,” and thus has implications for how teachers prepare students for such tests (26).
Chapter 3: Considering Theory
At the outset, the authors make clear that theory and practice are intimately intertwined. While they recognize that theory and practice each focus on particular concerns, they also believe that a practice always operates within some kind of theoretical stance. Almost synecdochically, practice is the operational dimension of theory (Shuster). Accordingly, they call on instructors and administrators to become conscious “of the theories informing practice as part of the assessment process” (37).
Writing assessment theory is, and should be, grounded in a theory and understanding of language, literacy, and composition: “without a sense of how language is learned and how literacy functions, an assessment may not yield information that is accurate, useful, or valid” (38). Accordingly, an understanding of literacy and language must contend with the contextual nature of meaning, and that context involves knowledge beyond the linguistic code of grammar, words, and letters. In other words, language, literacy, and discourse are social: “all communication demands reader/listeners use extra-linguistic knowledge to determine meaning. Students need to have opportunities to engage in authentic language use if they are going to develop into sophisticated language users” (41). Here the authors lean heavily on modern theories of validity, which attend to the valid use of results in making decisions. In this formulation, the way an assessment operationalizes the construct of writing, and the ways we use the assessment’s results, may shape how writing is taught and understood.
In this way, the authors reiterate the commonly held premise that writing assessment should be local; likewise, validation inquiry should be locally situated and conducted in order to investigate an assessment’s contextual use within a particular instance. Validation, however, should be ongoing and recursive in order to account for changing evidence and changing contexts. The authors also address reliability and how it is framed in terms of measurement error: “Measurement errors are considered random and unpredictable…because measurement errors are random and unpredictable, ‘they cannot be removed from observed scores’; however, they can be summarized and reported in various ways” (Standards 26, qtd. in O’Neill et al. 49).
To conclude, the authors posit that there is little agreement across writing assessment scholarship, much of which is due to the bifurcation of assessment scholarship along the disciplinary lines of composition studies and educational measurement. They close with Huot’s six-pronged assessment schema.
Chapter 4: Attending to Context
“Determining how a program defines itself (or has been defined by others) is an important first step toward developing context-sensitive assessment” (61). A program should consider these questions:
- What defines the writing program?
- Where do program values and philosophies come from?
- Who are the students?
- Who are the faculty?
- How are program values supported—or complicated—by course goals, curricula, and instruction?
- What does all of this mean for writing assessment?