Writing tests

Easier said than done. A test writer has very few friends and often finds that it is impossible to do the job to everyone’s satisfaction.

Nevertheless, it is imperative for test writers to be clear in their own minds about what they want to test and what the purpose of the test is. Too often, teachers decide that they will write a test for their learners, so they sit down and begin writing it, stopping when they feel the test is long enough. This process may well result in a test that does not reflect what should be tested. The starting point when writing test specifications is, therefore, for teachers to know what they are seeking to test. Next, the test writer needs to decide how the skills are to be tested and what weighting each skill should be given, and then match these to the time available for the test. The blueprint produced at this stage may be quite a lengthy document.

A test specification is the official [and authoritative] statement about what the test tests and how it tests it [Alderson et al, 1995: 9]. If public examination bodies did not issue test specifications, it would be difficult for teachers to prepare learners for such tests. But it is also important for teachers to know how to draw up their own test specifications. There is no single correct way to write test specifications; Hughes [1989: 52 to 54] outlines one approach. The example below is based on Alderson et al [1995: 11 to 14]. It presents a series of questions that test writers need to answer when drawing up their specifications.

  1. What is the purpose of the test? Is it a progress test, an achievement test or some other kind of test?
  2. What sort of learner will be taking the test? There are some important variables that must be taken into account, including age, gender, nationality, level of proficiency in the target language, attitude towards the test, cultural and educational background, and likely level of background [world] knowledge.
  3. How long will the test be, and will it be divided into sections or even into separate papers?
  4. What language skills [such as reading] should be tested? Are the micro-skills or enabling sub-skills [such as scanning or skimming] specified?
  5. What text types should be chosen? Should these be authentic? How difficult or long should they be? What language functions should be included in the text [such as definition, persuasion, or summarizing]?
  6. What language elements should be tested? Is there a list of grammatical structures to be included? Are language functions or notions specified?
  7. How many items are required for each section? How are these weighted [with equal marks for each item or more marks for more demanding items]?
  8. What test methods and item types are to be used [for example, multiple-choice, gap-filling, matching, true-false, transformation, essay writing, role-play, and so on]?
  9. What rubrics will the testees be given?
  10. What is the marking scheme? What criteria will be used for marking? How important are accuracy, appropriacy, spelling, and length of utterance or script?

Sample Test Specification

This is an end-of-unit achievement test. The target testees are fourteen-year-old male learners of English [EFL] in agricultural schools in Egypt. The testees are accustomed to taking regular classroom tests. Their level of English is elementary to lower intermediate. There is a significant proportion of false beginners in the test population. The test will be twenty minutes long and divided into three sections, as follows.

Section One

Vocabulary Items [Labeling five parts of machinery taught in Unit 3]

One mark will be awarded for each correct answer, and half a mark for answers that are recognizable but spelt incorrectly [5 marks].

Section Two

Grammar

  • Past tense of irregular verbs in the affirmative.
  • Dialog completion with past tense of verbs.
  • The base form of the verbs will be supplied.
  • There will be five gaps.

One mark will be awarded for each correct answer. Incorrectly spelt answers will receive no marks [5 marks].

Section Three

Reading for detailed information

  • A ten-line text giving instructions on feeding cattle will be provided.
  • The text will be at a similar level of difficulty to the text on feeding chickens in Unit 3.

The text will be followed by five True/False questions [5 marks].

All rubrics will be given in L1.
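To make the arithmetic of the sample marking scheme explicit, here is a minimal sketch in Python that totals one testee's marks. It assumes that the teacher has already judged each answer as "correct", "misspelt" (recognizable but spelt incorrectly) or "wrong"; the section keys and the function names mark_item and total_mark are hypothetical and are not part of the specification. With three sections of five one-mark items, the maximum is 15 marks.

```python
# A minimal sketch of the sample specification's marking scheme.
# Assumption: each answer has already been judged "correct", "misspelt"
# (recognizable but spelt incorrectly) or "wrong".
# Section keys and function names are hypothetical illustrations.

ITEMS_PER_SECTION = 5  # each section has five items worth one mark each


def mark_item(section: str, judgement: str) -> float:
    """Return the mark for a single answer in the given section."""
    if judgement == "correct":
        return 1.0
    if judgement == "misspelt" and section == "vocabulary":
        # Section One: recognizable misspellings earn half a mark.
        return 0.5
    # Section Two gives no marks for misspelt answers; wrong answers score zero.
    return 0.0


def total_mark(responses: dict[str, list[str]]) -> float:
    """Sum the marks across all sections of one testee's script."""
    for section, judgements in responses.items():
        if len(judgements) != ITEMS_PER_SECTION:
            raise ValueError(f"{section}: expected {ITEMS_PER_SECTION} answers")
    return sum(
        mark_item(section, j)
        for section, judgements in responses.items()
        for j in judgements
    )


script = {
    "vocabulary": ["correct", "correct", "misspelt", "wrong", "correct"],  # Section One
    "grammar": ["correct", "misspelt", "correct", "correct", "wrong"],     # Section Two
    "reading": ["correct", "correct", "wrong", "correct", "correct"],      # Section Three (True/False)
}
print(total_mark(script), "out of", 3 * ITEMS_PER_SECTION)  # prints: 10.5 out of 15
```

Writing the rules down this precisely also shows where a marking scheme must be explicit: here the half-mark rule applies only to Section One, so a misspelt past-tense form in Section Two scores nothing.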

It is important in writing test specifications to ensure that the test reflects all the areas you want to include, in suitable proportions, and that there is no bias towards items that are easier to write or towards material that happens to be available [Harrison 1983: 11]. For more comprehensive guidance on writing test specifications, see Alderson et al [1995: 11 to 14].

Write a specification for a test for a teaching group or level with which you are familiar. Describe the target group in terms of age, level, language and educational experience, and any other factors that help to justify your specification.

Test Moderation

After writing the test, the next stage is to examine whether the test meets the needs for which it was specified. Weir [1993] lists the following features of a test that should be considered at the moderation stage.

  1. Level of Difficulty
    1. Is the test at an appropriate level of difficulty?
    2. Are the easiest items placed first?
    3. Does the level of difficulty rise progressively?
  2. Discrimination
    1. Will the test discriminate adequately between testees with different levels of achievement? [There are certain tests where this is not necessarily applicable]
  3. Appropriateness of Sample
    1. Does the test sample and assess the full range of appropriate skills and abilities as defined by the objectives of the syllabus or course text units? The problem here is that when writing a progress or achievement test it is up to the teacher to select what is criterial: this often entails very subjective judgments.
  4. Overlap
    1. Is there excessive overlap in the structures or skills tested?
      • Are we testing the same thing in too many different ways?
      • Any overlap in areas of testing should be intentional.
  5. Layout
    1. Is the layout clear and unambiguous?
    2. Are rubrics explicit and user-friendly?
  6. Marking
    1. Will an answer guide be needed in order to standardize marking? Weir [1993: 26] has a list of questions to be resolved at this stage:
      • Does the marking scheme anticipate responses of a kind that candidates are likely to make? For example, what variations in spelling might be accepted in responses to listening and reading tests? Does the marking scheme allow for possible alternative answers?
      • Does the marking scheme specify performance criteria to reduce as far as possible the element of subjective judgment that the tester must exercise in evaluating responses of testees, especially in production tasks?
      • Are the marks allocated to each task commensurate with the demands that the task makes on the testees? In listening tasks, should all parts be weighted equally? In writing tasks, how many marks should be given for copying or information transfer tasks? How many to gap filling tasks? And so on. Does the marking scheme indicate clearly the marks to be awarded for different parts of a question or the relative weighting of criteria that might be applicable?
      • Has the marking scheme minimized the tester’s need to compute the final mark for a testee?
      • Are the abilities being rewarded those that the tasks are designed to assess? For example, if testees have to write down their answers in a listening test and we subsequently deduct marks for written errors, has writing then become an element of the task?
      • Can the marking schemes be easily interpreted by a number of different testers in a way that ensures they all score scripts to the same standard? Are the criteria for marking essays sufficiently explicit to avoid differences in interpretation? Will there be multiple marking of scripts?

The stages in the test design process [Weir, 1993: 26] can be summarized:

  1. Identify Skill[s]
  2. Draw up Test Specification [Blueprint of Test]
  3. Moderation of Test
  4. Produce Final Version of Test

None of this covers the trialling and pre-testing of items that many testers would include in the moderation process.

Some Final Thoughts

On Self-Assessment

Ultimately, the learners are in charge of their own language learning [they are responsible and accountable for it, through testing and evaluation of performance, and nobody else can do the learning for them]. Consequently, it makes sense to involve them in the assessment procedure, particularly if there is any associated curricular momentum towards learner self-reliance [or autonomy, independence, cooperative and collaborative learning, and so on]. The skills and underlying philosophy of self-assessment have to be taught; otherwise, as Carroll [1985: 135] warns, the outcomes may be farcical. Nevertheless, if properly set up, self-assessment schemes can be useful both as instruments of measurement and as sources of motivation.

Group Assessment

Since communication involves more than one person, some examinations now assess communication among groups of learners. Groups of learners interact while an evaluator acts as an external observer rather than as a participant or interlocutor. For example, the UCLES FCE could be conducted in this way. An element of group assessment clearly has a role [and adds to test validity] in any system of instruction predicated on cooperative learning and communicative language learning approaches. This kind of testing can be seen as valid in that it tests learners in a direct performance situation as close as possible to real-life communication. Personality variables can cause problems, however: less confident or more retiring testees may participate only peripherally and score below their true potential. But there are ways to counter this.

In conclusion, consider the view that testing has been allowed to lag some way behind contemporary ELT methodology for too long. There is little point in encouraging learners to use the target language actively and meaningfully in the classroom if our placement and achievement tests are pitched at a different level and take a different approach to language. We owe it to our learners to test as we teach.
