Elliot, Perelman, White, Writing Assessment in the 21st Century (Parts II & IV) (~229 pages)
PART II: STRATEGIES IN CONTEMPORARY WRITING ASSESSMENT: BRIDGING THE TWO CULTURES
Setting Sail with Ed White: The Possibilities of Assessment and Instruction within College Writing Assessment, Diane Kelly-Riley
In the “modern era of writing assessment” as defined by White, there emerged new opportunities to combine assessment and instruction “that allowed the responsibility of writing assessment to reside in the hands of faculty and not primarily administrators or test developers” (157). Using Washington State University’s implementation of a campus-wide portfolio assessment program, Kelly-Riley discusses the process of assessing these portfolios through faculty standard-setting and assessment sessions. The WSU model, much like that described by Elbow and Belanoff, relies heavily on the input of teaching faculty to make placement decisions: “faculty bring their expertise and experience from the classroom into the rating sessions, and they decide whether students’ writing abilities are ready for the specific writing challenges at the particular points in the curriculum” (161). But this collaboration is also a means for faculty to articulate, negotiate, and maintain communal standards of writing values. The impact of such sessions becomes a benefit for the teaching of writing across the campus; defining and setting standards of writing values, particularly placing such standard setting in the hands of teachers of writing, allows teachers both to (1) negotiate community standards through their own disciplinary lens; and (2) “return to their own disciplinary contexts with revised knowledge of the community standards” (163).
Assessment and Curriculum in Dialogue, Irvin Peckham
Peckham offers some platitudes when reflecting upon his program’s transition away from the holistically scored, timed essays used for program-wide assessment of student progress in a writing course. As he writes, “assessment design should be based on a clear definition of what one wants to know and an instrument that is aimed like a rifle at getting that information” (169). And later, “if you want to predict how students will respond to a specific kind of writing situation in the future, replicate that writing situation in the assessment” (173). His assessment design accomplishes a couple of things: primarily, it assesses students’ writing ability, but it can also assess teachers’ effectiveness. In doing so, as Peckham writes, the assessment allowed the development of a common course: “be able to describe what specific writing skills we were focusing on, and be able to look at student writing at the end of the semester to measure our success in teaching them” (176). In this way, assessment and curriculum feed into one another; assessment coalesced the curriculum: “the purpose of the course and all writing in the course was to help students achieve high scores on the Semester Assessment. Consequently, assignments in the course had to be aimed in that direction” (176).
What Good is it?: The Effects of Teacher Response on Students’ Development, Chris Anson
Anson takes note of a gap in research on teachers’ response to student writing: there has been a strong focus on what teachers do, but not on whether such response is incorporated into student revisions. In other words, what does response do for students? He attributes such a gap to two causes, both centered on the process movement. First, much of the research on teacher response is in reaction to current-traditionalist models: “that if teachers change their means of intervention, student writing will improve” (189). Certainly, there was a turn of the tide in how teachers responded to student writing that marked a paradigmatic shift away from current-traditionalist models; however, students are often not involved in reflecting upon how such shifts in response helped them as writers. Second, with the emergence of the process movement, there was a simultaneous emphasis on offering significant support for student writing—this also brought about a renewed focus on teacher workload.
Once we actually look to student improvement, there appears scant evidence that students actually use teacher feedback in revision or that students improve from one assignment to the next (see: Hillocks 1986; Knoblauch & Brannon 2006). Many teachers and writing tutors believe in the myth of improvement: “the belief that teacher feedback is not only important, but is the single most crucial variable in student growth as writers” (193). Anson attributes the dissonance—or fault lines, as he says—to the methodology of the research: “serious fault lines in the research on response—which has been unable to show us with much certainty whether or how comments on students’ writing really work—end up stabilized with strong convictions from teachers whose daily observations convince them that response is not only valuable but crucial to student development as writers” (194). The difference, then, is between research and teacher lore (see: North).
Anson then suggests that we should change how we understand the way response is taken up by students. For example, referencing the work of Prior, Anson writes that the nature of response is dynamic and multifaceted, “its layered and sometimes hidden sources, shifts the focus away from teachers as the sole or most important source of response and toward other aspects of classroom interaction. For example, even in courses that do not systematically engage students in providing response to each other, there is little question that the broader peer dynamics blend fluidly with what teachers do or facilitate, especially in the student ‘underlife’” (196). He continues, “Teacher response, in other words, gets filtered, interpreted, remixed, and repurposed among students, influencing their decisions and their own response to tasks and evaluations” (196-7). It is in this way that he can also allude to everyday writing situations as contributing to student dispositions toward response.
Fostering Best Practices in Writing Assessment and Instruction with E-Rater, Jill Burstein
This piece of shit chapter attempts to explore the usefulness, helpfulness, and dependability of the automated essay scoring software e-rater. The author, obviously unaware of how writing works, operates with an understanding of writing that is in line with what the test constructs: namely, that writing is mechanical or mechanics-driven, stylistically consistent, and self-referential. Of course, under this construction, her monstrosity will obviously be helpful for assessing writing. She is also careful to note that many educators have bought into e-rater and offered feedback to help in its design (despite the fact that no right-minded educator of writing would ever aid in creating an e-rater—so exactly who this hooded gang of educators is remains a mystery). However, I’ll also argue that in the process of trying to make automated essay scoring software more closely ‘human’ in the way it assesses knowledge, we are in fact ignoring the ways that such alignment may also be working to automate human scoring/scorers. In other words, some teachers of writing may still be operating in paradigms of writing (such as the current-traditionalist) that would easily translate into AES technologies; other paradigms, however, would simply not see such options as feasible for the kinds of functions we see assessment accomplishing for student learning.
Writing to a Machine is Not Writing at All, Anne Herrington & Charles Moran
In a response to Burstein, Herrington and Moran outline a criticism of ETS’ Criterion based on a trial run of the product. To guide their critique, they use a question that originates with Ed White to frame their analysis: “will the assessment and feedback given by the program to student writers be helpful to them as writers? Will it support our work with them as their teachers? Or will it become something we have to work around, or work against—an impediment to our teaching and our students’ learning?” (222). Their analysis points to the latter.
To begin, they note that the essay they test-ran received a score of 5 out of 6 possible points; however, the nuance involved in distinguishing a 5 from a 6 (clear vs. insightful connections; relevant vs. well-chosen evidence) is a distinction Criterion would not be able to make. Instead, the machine based its judgment on mechanical errors, many of which it did not accurately or reliably identify, or flagged as “errors” without consideration of context. In this way, using Criterion would require a teacher to counter the confused feedback a student may receive from it; without a teacher’s intervention, a student is left to make revisions that satisfy the “gamification” of the scoring rather than improve the quality of the writing. In other words, Criterion—and the test developers—do not consider the impact on the quality of writing or writing instruction.
Moreover, in its focus on error, Criterion privileges Standard American English, marking the dynamic fluidity of language as error against that standard rather than valuing difference: “the program conveys the wrong priorities; focusing attention on surface features of grammar and form instead of substantive development and rhetorical effectiveness, and failing to be able to recognize effective use of language and syntax” (228). The authors also note that the cost-effectiveness of such a program actually creates a two-tiered assessment model “divided by wealth and connection, where poor and unconnected students write to machines and wealthy and connected students write to human readers” (230). And finally, “writing to a machine distorts the very nature of writing itself. Writing to a machine is writing to nobody. Writing to nobody is not writing at all” (230).
The Future of Portfolio-Based Writing Assessment: A Cautionary Tale, William Condon
Condon opens with an anecdote offered by Thomas Carlyle about the way that idealist principles and practices become inflated and lose their purchase once institutions are built around them: “Our beliefs turn into good intentions and, in turn, lead us to construct institutions designed to put those beliefs into action. Once in existence, however, those institutions outlast the validity of our beliefs, becoming agents of stasis rather than continuing as agents for the change we intended. Instead, the institution acts as an impediment for further positive change. Such is the history of writing assessment—or, for that matter, all kinds of assessments of competency or achievement” (233). Much like the consequences of “scaling up” direct writing assessment in response to indirect writing assessment, Condon warns against the same mishap occurring to portfolio assessment. As he writes, in the act of “correcting the problems of the previous models,” we are simultaneously “hampered by the previous methods’ flaws” by framing the new method in the paradigms of the old to demonstrate its currency. As he continues, “scale interferes, and the need for efficiency results in a continual reduction of the new model, until what is left is hardly different from—let alone better than—its predecessor” (236). In other words, “the attempt to compete with the past results in yielding to that past” (239). This problem occurs, in part, because developers are too willing to sacrifice validity “in favor of efficiency and consistency” (238).
We have already witnessed some ways in which portfolios have been scaled up from classroom practices to writing program assessment (Elbow & Belanoff); however, we must prevent such practice from becoming “an agent that preserves the past rather than one that builds a more successful future” (240). He offers three suggestions moving forward:
- Avoid the urge to establish portfolios entirely according to past criteria: as he writes, “we have to reject the previous paradigm…solely intended to identify those lower on the vertical ranking in order to help them catch up” (241)
- Consider the ways portfolios can be generative rather than reduce them in the name of reliability and cost-effectiveness. In other words, ask what this new assessment format allows us to see of what our students are capable. And rather than reducing the format in the name of cost savings, “we need to argue for cost-effectiveness in the sense that the extra cost of scoring portfolios…yields far more in useful and usable information—that, in effect, portfolios are worth paying for” (241-2).
- “we need to make an argument based on the future of higher education, and of our culture in general, rather than on their past…there will be a great need for robust assessment, but not really for traditional rankings” (242).
Complicating the Fail-or-succeed Dichotomy in Writing Assessment Outcomes, Jon A. Leydens and Barbara M. Olds
The authors implemented a program-wide pre- and post-test assessment that gauged students’ writing ability in a new class offered at their institution. They found that the average and median scores of students’ writing on two prompts showed no statistically significant change from pre- to post-assessment. As they write, they want to make a distinction between a poor assessment and disappointing results; they then discuss what they believe contributed to the disappointing results by critiquing how they implemented the assessment. Many of the critiques are hypotheses or assumptions, and they specifically note that they do not want to consider that the results may be due to teachers’ pedagogy. While they offer some unconvincing critiques involving student motivation, they do posit that they may be observing a “latency period” “that may occur before they can demonstrate that skill effectively” (253). However, it’s also worth noting that they scored the essays on a scale of 1-5, which may have contributed to why they consistently saw average scores of 3.
Mapping a Dialectic with Edward M. White (in Four Scenes), Bob Broad
As the title suggests, Broad characterizes his relationship with White as a dialectic. Despite Broad’s criticism of White, he notes that they were fighting different battles: White’s “was to legitimate direct writing assessment by making it acceptably ‘scientific’ within the dominant paradigm at the time” (261); Broad’s “was, and is, to shift communal writing assessment theories and practices to better fit our theories and values of language, reading, writing, research, and pluralist democracy” (261). He offers a brief summary of Dynamic Criteria Mapping (DCM), positioning it at the crux of grounded theory (emphasizing inductive value judgments from examples) and fourth-generation evaluation (privileging the integrity and validity of multiple and conflicting evaluations). While White critiques this method as too complicated, Broad is quick to point out that they do, in fact, agree that teachers and administrators should articulate, document, and make visible their values of writing.
The Given-New Contract: Toward a Metalanguage for Assessing Composition, Lee Odell
Odell advocates for a metalanguage to assess (and implicitly for students to invent) multimodal compositions that can also work as analogy for print compositions. This metalanguage must fulfill three criteria:
- Economical: “consisting of a relatively small number of terms that teachers can easily keep in mind while working with a wide variety of print and multimodal compositions” (273).
- Descriptive Power: “enabling students and teachers to recognize successful practice” (273).
- Generative Power: “helping writers and readers formulate and articulate their ideas, feelings, reactions, and insights” (273).
He offers the idea of a given-new contract whereby students establish claims early on that lay the groundwork for identification with an audience (given). Such claims are meant to establish common ground (see: Burke). And with this established common ground, the author builds/scaffolds new insights. In a way, this reflects the tools described by Graff & Birkenstein in “They Say / I Say”.
PART IV: TOWARD A VALID FUTURE: THE USES AND MISUSES OF WRITING ASSESSMENT
Fighting Number with Number, Richard H. Haswell
Haswell pushes against composition’s perceived number-phobia and advocates that program administrators use numbers rhetorically, namely because numbers have currency in educational institutions. Per “White’s law”: “if local composition programs do not create their own large-scale assessment, outsiders, usually accompanied by a national commercial testing firm, will do it for them” (414). As he summarizes, writing assessment battles often pit one set of numbers against another; to combat this, composition programs can prepare and provide numbers in anticipation of potential number battles. He calls this use of numbers procatalepsis: “A writer counter-argues by writing into a claim an argument countering the reader’s anticipated argument against the claim” (416). Such arguments, in classical rhetoric, often function like so: the claim, the anticipated objection, and the counterargument. Thus, in order to anticipate the objections from administration—often in the form of numbers—we can prepare counterarguments in the form of numbers.
Mass-Market Writing Assessments as Bullshit, Les Perelman
Perelman offers a critique of timed impromptu essays through the frame of bullshit: “it is not just that these statements are false. Truth or falsity does not determine bullshit, but rather the bullshitter’s intention to be unconcerned with truth or falsehood” (426). In other words, bullshit refers to a speech act in which a rhetor uses information without regard for its truth in order to game the system. As he writes, testing organizations encourage and invite students to practice bullshit in timed impromptu essays: “the topics invite students to write about subjects for which they have little if any real information and about which they may never have given much consideration” (430). The guide to the SAT writing essay states, “Writers may make errors in facts or information that do not affect the quality of their essays” (427). In fact, the SAT Writing Test “rewards putative facts, regardless if they are true or not” (427). In this way, students are rewarded for gaming the system—including misinformation that can support their theses without regard to factual accuracy.
Perelman also notes that the scoring itself, by scorers, is a form of bullshit. For example, testing organizations that employ machine scoring encourage their human scorers to evaluate by counting errors and word choice and to ignore errors of fact: “the result is that the testing companies can show a close correlation between these graders and Automated Essay Scoring (AES)” (428). Moreover, graders often describe the scoring environment as a “cyber sweatshop” where the timing constraints prevent scorers from offering a thoughtful evaluation of the student writing; rather, such scoring involves providing a quick score that does not appear to deviate from the party line (i.e., this constitutes bullshit).
How Does Writing Assessment Frame College Writing Programs? Peggy O’Neill
“Although there are surely positive effects of locally designed writing assessments that are aligned with local contexts and assessment theory, there may also be unintended consequences that are less positive, even detrimental, or more likely, not clearly positive or negative. The very act of assessment changes the nature of the phenomenon being assessed, whether it is the individual students’ performances, the teacher’s instruction, or the program itself. This change, of course, may be part of the assessment program’s goals. However, some of the changes or consequences may not be intended” (441).
Changing the Language of Assessment: Lessons from Feminism, Cindy Moore
Of course, language matters, and it is no less true in assessment contexts. In the history of writing assessment, many educational measurement and writing assessment scholars have illustrated how “traditional assessment terms (most notably validity and reliability) can be used both to promote meaningful assessments and to highlight the usefulness of results” (458). In this way, the argument for using such terms is to note that the problem is not the words themselves, but rather “the unspecified, oversimplified, or old-fashioned ways they are sometimes used” (458). In fact, as new models and forms of assessment emerged, the language of the old paradigms came to be used in more nuanced and concerted ways.
However, there are others who call for “a whole new assessment language—a language that is more obviously aligned with the discipline’s humanistic history and evolving theoretical beliefs, a language that is, in a word, ‘ours’” (459). In either case, there is a debate about whether the language of the old paradigm can evolve to include new and developing debates in assessment or whether we should move to a new set of terms altogether. To illuminate this debate, Moore finds an analogous debate in feminist theory. Feminist theorists have long wrestled with the male discourse of language, particularly the way such language alters reality (or, as Moore writes, “perceptions”). Summarizing: “if they are to succeed in changing perceptions (the unequivocal desired result), attempts at language change must not only be considered locally (i.e., as important to the agenda of a specific group or movement), but more globally (i.e., as important to the larger community of language users). Even a seemingly insignificant linguistic move like dropping a suffix must be evaluated within the broader social context to determine the likelihood of a broad spectrum of speakers recognizing the need for change and then actually adopting the proposed alternative” (462). Put otherwise, simply substituting one word for another in a local context may not necessarily translate to (a) widespread use or (b) change in the larger political arena that the alternatives are meant to effect. “New terms adopted by people who are ambivalent about or do not fully appreciate the need for change will not have the desired impact on perception” (468).
To conclude, Moore sees greater effectiveness in amending existing forms rather than replacing the old language wholesale: a “process that involves ‘proposing amendments to existing forms, rules, and uses of language so that they serve our purposes better’” (471).
The Rhetorical Situation of Writing Assessment: Exigence, Location, and the Making of Knowledge, Kathleen Blake Yancey
Yancey identifies two factors that contribute to changes in writing assessment (from wave to wave). First, local exigencies, which (in the Bitzerian sense) refer to an imperfection marked by urgency in response. These are located in particular institutions in response to the particular contexts at those institutions; they provide motives for raising new questions and developing new approaches. Local needs, then, were “identified as prototypical, and information about ways to address them was widely disseminated…such dissemination, of course—through workshops, materials, and this book, among others—helped determine the efficacy of the local ‘solution’ and its potential for adaptation to other situations” (477). The second factor contributes to change in the fourth wave of assessment (characterized by federal involvement in standards). She refers to it as self-created exigence: “rather than the new practice being created in response to a single local exigence, this second means of developing innovative assessment practices relied on a self-created exigence, independent of any specific local need” (477).
Gallagher, in his review of the book, relates this to a concern with scaling laterally: local contexts were used as prototypes to apply to other local contexts. Such dissemination constitutes a network of shared practices focused on student learning. In such self-created exigence, local needs are addressed, “but also, more importantly, [addresses] the concerns and questions that we in an emerging field shared” (483). Self-designed or self-created exigence is thus a more collaborative model.