September/October 2005
Using Performance-Based Assessment in the ELD Classroom

By Dr. Natalie A. Kuhlman, ELL Outlook™ Contributing Writer

In my last article I focused on issues about formal language proficiency assessments required by the No Child Left Behind Act and state mandates, and what teachers could learn from them. I suggested that only performance-based assessments would tell teachers what students know and can do. In this article, I will focus on performance-based assessment: what it is and isn't, some examples of performance activities and how to observe and record them, and how performance-based assessment can help educators and publishers to better understand and educate our English Language Learners (ELLs).

What is Performance-Based Assessment?

O'Malley and Valdez Pierce (1996, p. 4) say that performance-based assessment "consists of any form of assessment in which the student constructs a response orally or in writing." They go on to explain that it can be formal or informal, an observation, or an assigned task. Herman, Aschbacher, and Winters (1992) go further when they state that performance-based assessment requires students to "accomplish complex and significant tasks, while bringing to bear prior knowledge, recent learning, and relevant skills to solve realistic or authentic problems" (in O'Malley & Valdez Pierce, 1996, pp. 4-5). O'Malley and Valdez Pierce (p. 5) list six characteristics of performance-based assessment that they adapted from Herman et al.:

Even essay exams, which have been added to the Graduate Record Exam and to the SAT for college entrance, may not be performance based, even though the students are expected to construct responses to a prompt, usually using higher-order thinking skills. These essays may be on structured topics of little interest to the students, or not based on their prior knowledge. Hence the essay becomes a test of knowledge of the prompt and its expectations, rather than of what students know and can do.

Criterion-referenced assessments are more likely to be performance based or authentic, but they are not always so. In criterion-referenced tests, "scores are compared with a criterion of achievement" (Chase, 1999, p. 12). In other words, all students can get a perfect score (an "A") if they meet the criterion. The score may be based on "mastery" as demonstrated on multiple-choice tests, in which case the assessment would not be considered performance based, or it may be based on assignments such as turning in 10 book reports a semester, in which case the assessment is based on performance (process and product; authenticity).

Implementing Performance-Based Assessment

Since performance assessment requires students to be doing something, this type of assessment is based on projects and other activities rather than on tests. Virtually anything that the students are doing in the classroom can be used, such as country maps, science projects, math problems, interviewing other students, preparing a research report, or anything else that includes a language component. This allows for depth rather than breadth of knowledge, as well as authentic use of language for real things that students need to know and be able to do. When assessing language proficiency activities, rubrics need to focus not just on the content and requirements of the project or activity, but also on how language is used.

Stiggins (1994) identifies four types of reporting procedures for performance-based assessment: checklists, rating scales, anecdotal records, and mental records. Checklists are quick and easy but don't necessarily provide much information. Rating scales, such as rubrics, might be as simple as a checklist, or as detailed as the examples given below. Anecdotal records are more time consuming, but provide the context for both the observed activity and for details that rating scales and checklists don't allow for.
Mental records are the least reliable, since they depend on remembering all of the things that are included in written form in the first three types.

Rubrics Used To Assess Performance

Rubrics essentially are descriptive or annotated rating scales and are the most common way to assess performance. When criterion-referenced assessments are used, some sort of rubric is usually applied to determine if the criterion is met, rather than simply counting the number of "right" answers. Rubrics can take a variety of forms, but are usually spread over a 3- to 5-point range. For example, a rubric that looks at the quality of a product might use "excellent; meets requirements; poor" on a 3-point scale. However, each descriptor ("excellent," etc.) must be carefully defined so that the teacher can easily place students' work on the scale/rubric. An "excellent" essay might have "well-developed and thought-out topic or theme, carefully constructed paragraphs with good detail, and nearly error-free mechanics," while a "poor" essay might have "no apparent theme or organization, disjointed paragraphs with little or no detail, and multiple mechanical errors in grammar and spelling."

Rubrics can be specific or general, but the more specific they are, the easier it is to apply them. A specific rubric for an ELL might focus only on the sentence structure of an essay. In this case, an "excellent" might reflect just one or two grammatical errors, "adequate" might reflect more errors, and "poor" would reflect many errors, with each point on the scale being specified with descriptors. Another rubric might focus only on the organization of the essay, assessing how well it shows a beginning, middle, and end and has logically organized paragraphs. This type of rubric could be particularly important for ELLs who are literate in their first language when that language has a different organizational structure for essays than that used in U.S. classrooms.
Points would be awarded for how well the U.S. model is followed, with carefully written descriptors of what earns a "1," "2," or "3," for example. Another specific rubric might only look at the content of the essay, such as how well an argument is developed, with descriptors for very well developed, adequately developed, and poorly developed. Keep in mind, however, that these rubrics are subjective, and raters need practice until consistency is obtained.

Letter grades (A, B, C, etc.) are essentially a rubric when what is expected for each grade is carefully defined and both the student and the teacher know exactly what the difference is between the grades. Unfortunately, that is not always the case, and students may have to guess what the teacher expects in order to receive an A grade. Also, the teacher may make strictly subjective decisions, not based on any carefully constructed criteria at all, about what grade to give a student. Rubrics make the system fair to both the student and the teacher.

Another kind of rubric is used to show standards-based language proficiency. In this case, the rubric can provide baseline data when a student enters an ELD program and then show the progress the student makes during the school year. The rubric should be constructed in such a way that it can be adapted for use with multiple activities, rather than just one specific activity. Overall assessment thus becomes consistent through the use of the same rubric.

Case Study: Language Observation Task System (LOTS)

This case study looks at the Language Observation Task System (LOTS) that I developed for the San Diego County Office of Education (Kuhlman, 2002). LOTS, based on the California English Language Development Standards (California Department of Education, 1999), consists of a series of tasks that can be applied to whatever curriculum is being used for ELD.
It provides tasks in seven categories of language (listening/speaking; word analysis; systematic vocabulary; reading comprehension; literary response and analysis; writing strategies; and writing conventions), which correspond to the categories in the California ELD Standards. These are spread across the five proficiency levels indicated in the examples below and across four grade spans of K-2, 3-5, 6-8, and 9-12. The first example is for a student in the K-2 grade span and shows growth from the beginning level to the advanced level for a social language task in the listening and speaking category:
While it appears to be an easy task to place students using this rubric, think about what each level means. "Common social greetings" implies things like "Good morning" and "How are you?", but how many of these greetings should be evident before the teacher decides the student is moving towards the early intermediate level ("orally communicates basic needs")? And what if the student demonstrates speech patterns at both the beginning and early intermediate levels?

Another example from LOTS, this time for reading comprehension for the 6-8 grade span, uses the factual information task and raises other issues:
First of all, no matter how carefully constructed the rubric is, it is still a teacher's subjective opinion as to where the student falls on any one task. Let's look at the intermediate example: "Answers factual comprehension questions using some details." How many are "some" details? How detailed is detailed? Is adding one adjective sufficient, or does the student need to know the day, time, and year? This requires teacher judgment. When using such rubrics, it is best to have several teachers sit down and discuss what is appropriate for each proficiency level. How many times does the behavior need to be observed in one time frame? Is twice enough? Should it be observed three or four times? Teachers need to make common decisions about this.

These rubrics, however, can be used to determine how much a student is growing overall in language. The proficiency levels can be translated into the numbers 1-5 (shown in parentheses above), and averages can be taken to show growth. Scores for the same task, such as giving factual information, can also be averaged across observations of different activities. For example, a student in grade 7 who one day is able to answer factual questions about a story with only a few words would be identified as a beginner, or 1. But perhaps the student didn't know much about the content of the story or wasn't engaged with it. Later the same day, the teacher sees that the student can use simple sentences to answer factual questions in a different activity, and so would place him or her as a 2, or early intermediate. By averaging the two observations, the teacher can see that the student is a 1.5, or between beginning and early intermediate in answering factual questions on that day. The teacher can also note that the performance may have been affected by the student's engagement with the task rather than by language proficiency (LOTS scoring allows for anecdotal notes).
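
For teachers who keep observation records electronically, the averaging described above can be sketched in a few lines of Python. This is only an illustration and not part of LOTS itself: the function name `average_by_task`, the record layout (task, level, anecdotal note), and the sample notes are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def average_by_task(observations):
    """Average the 1-5 proficiency scores recorded for each task.

    Each observation is a (task, level, note) tuple. This layout is a
    hypothetical one chosen for the sketch, not a LOTS format.
    """
    scores = defaultdict(list)
    for task, level, _note in observations:
        if not 1 <= level <= 5:
            raise ValueError(f"level must be between 1 and 5, got {level}")
        scores[task].append(level)
    return {task: mean(levels) for task, levels in scores.items()}


# Hypothetical records mirroring the grade 7 example in the text.
same_day = [
    ("factual information", 1, "answered with a few words; seemed unengaged"),
    ("factual information", 2, "used simple sentences in a different activity"),
]

print(average_by_task(same_day))  # {'factual information': 1.5}
```

Keeping the anecdotal note next to each score means the teacher can later see why a particular observation was low, for example because of engagement rather than language proficiency.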
It is important, as noted earlier, not to make decisions based on only one observation.

The benefit of such rubrics is that teachers can see growth. After a few months, the teacher might see that the student in the example above has moved between 2 (early intermediate) and 3 (intermediate) in his or her ability to respond to factual questions in several activities, and so might have an average of 2.5. In this way, the teacher has a systematic and documented way to show the growth of the student, which can be shared with the student, parents, and other school personnel and can be used to help inform instruction.

But how well prepared are teachers to observe students? What does it mean to be an observer of student action? How do you learn to do it? It means paying close attention to what students do, and that takes practice and feedback. Many teachers will say that they don't have the time. And how much time does it take to observe every student several times? In secondary schools, where class sizes are often quite large, it can be a real problem. But as with anything that is new, you start small. You pick out just a few students, maybe two or three that you are particularly concerned about, and you observe them carefully, using a checklist or rubric. At the same time, you are also learning the system. Once you become comfortable with it, you start using it with more students. If you plan to observe a class of 30, then observe just 5 students a week, and in six weeks (the quarter grading period) you will have observed everybody for one week.

Conclusion

In this article I have reviewed what performance-based assessment includes and what it doesn't include. I've focused on introducing rubrics, the most common way to document performance, and have given some examples of how they can be used to show the growth and development in language of English language learners at one point in time and across time.
I've also reviewed a few other ways of documenting performance and provided some guidance in what it takes to be a good observer. This type of knowledge will help teachers better meet the needs of their students and identify critical areas for instruction. Performance-based assessment is only useful if the teacher (and student) pay attention to what students know and can do. It is easier to make guesses about that, or to rely on norm-referenced, decontextualized measures; but ultimately, if we want our ELLs to succeed, we as teachers need to know how they are doing. We might have the best materials available, but we still won't be able to document what students know and can do unless we observe just that: what they know and can do.

References
California Department of Education (CDE) (1999). California English language development standards.
http://www.cde.ca.gov/re/pn/fd/englangart-stnd-pdf.asp
Chase, C. (1999). Contemporary assessment for educators. New York: Longman.
Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment.
Alexandria, VA: Association for Supervision and Curriculum Development.
Kuhlman, N. (2002). Language observation task system (LOTS). San Diego, CA: San Diego
County Office of Education.
O'Malley, J. M., & Valdez Pierce, L. (1996). Authentic assessment for English language learners: Practical
approaches for teachers. Reading, MA: Addison-Wesley.
Stiggins, R. (1994). Student-centered classroom assessment. Upper Saddle River, NJ: Merrill.
If you have any comments about this article or questions for the author, please send them to: alex@coursecrafters.com.
Copyright © 2005 Course Crafters, Inc.® All rights reserved.