Murphy, Sandra and Kathleen Yancey: “Construct and Consequence: Validity in Writing Assessment” (in Bazerman’s Handbook of Research on Writing) (22 pages)
Murphy and Yancey review research on validity across various writing assessment formats and procedures. They begin by outlining a set of terms:
Reliability: “refers to the reproducibility of a test’s results. …’a test is highly reliable if a student taking it on two different occasions will get two very similar if not identical scores. The key issue of reliability…is to establish that something is being measured with a certain degree of consistency’” (Heubert and Hauser 71 qtd in Murphy and Yancey 448).
Validity: “the key issue of validity is to determine…whether the test measures what it purports to measure and what meaning can be drawn from the results” (449).
Construct Validity: “it examines the extent to which an assessment tool conforms to a theory of writing” (Williamson 13 qtd in Murphy and Yancey 449). Messick identifies two major threats to construct validity:
- Construct underrepresentation: when a ‘test is too narrow and fails to include important dimensions or facets for the construct’
- Construct-irrelevant variance: occurs when a ‘test contains excess reliable variance, making items or tasks easier or harder for some respondents in a manner irrelevant to the interpreted construct.’ In other words: which external variables influence the writer’s performance on the assessment?
Contextual Validity: “involves the purpose of the assessment… ‘A measure that is highly valid for one use or inference, may be quite invalid for another…the criteria for judging the assessment must correspond to the purpose, regardless of the nature or form of the assessment’” (Linn, Baker, and Dunbar 20 qtd in M&Y 449).
Authenticity: “the degree of correspondence of the characteristics of a given language test task to the features of a target language use…task (Bachman & Palmer 23). Briefly, writing tasks should represent the type of writing that examinees will be expected to employ in the context for which the assessment is designed” (449).
Consequence as a facet of validity: “appraisal of both potential and actual social consequences of an assessment should be undertaken in any effort to determine the validity of an assessment. For writing assessment, an effective argument for validity means, ideally, a positive influence on the teaching and learning of writing. Not least, the validation process demands ‘that we take responsibility to research our own assessment’ (Huot 15)” (449).
Indirect assessments of writing fail on consequential validity: “actual writing begins to disappear from the curriculum and the curriculum begins to take the form of the test” (451). They likewise fail to satisfy validity concerns for construct representation (the construct itself is not being tested at all) and construct relevance.
Timed, impromptu writing samples, or direct writing assessments, still raise validity issues. The authors point to construct-irrelevant variance as a key concern, arising from:
- variation in writers’ knowledge of the subject, linguistic and cultural background, and interpretation of the task;
- variation in raters’ disciplinary background, language and cultural background, and experience in the teaching and evaluation of writing; and
- variation in contextual effects, including the writer’s topic choice, the scoring system in which raters are trained (e.g., holistic scoring), and the time allotted for writing.
Portfolio assessments, by contrast, meet important benchmarks for consequential validity. For example, “participation in scoring sessions for curriculum embedded assessments such as portfolios may contribute to teachers’ knowledge and expertise and to curricular reform…teachers learned about the qualities of student work from the conversations that occurred during scoring sessions” (459). However, the authors also note that this is not always the case: we have to be conscious of how we implement these assessments.
Portfolios also offer opportunities for self-assessment: students are asked to reflect on their own work. As the authors write, “learning and the processes of learning are best assessed in the social setting in which they occur, in the case of schools, in the classroom. From this perspective, students should have a role in negotiating the content and outcomes of the assessment. Promoting that role, it is believed, may encourage students to monitor and reflect on their own performances” (460).
To conclude, they outline the new work of validity: as many advocate, assessments should be, and increasingly are, personal, contextual, and descriptive. As Moss proposes, raters should be understood from a hermeneutic perspective: “the most credible judges are those who are most knowledgeable about the context in which a performance occurred, and about the nature of the performance itself. Their judgments are grounded in the richest contextual and textual evidence available, including the observations of others in their interpretive community” (462). Thus, the challenge in future conversations about validity will be to “balance the demands of accountability and the development and validation of assessments that enhance learning and the educational environment as a whole” (463).