Two questions to ponder:
1. The effect of a test on teaching or learning is known as washback [or backwash in testing in the USA]. [Alderson and Wall, 1993: 17]
The backwash hypothesis seems to assume that teachers and learners do things they would not necessarily otherwise do because of the test.
In their article "Does Washback Exist?", Alderson and Wall question whether testing has as powerful an effect on teaching as has previously been assumed.
What is your view on this?
2. Baker has a section in his book entitled "The Pass Mark Problem: Is Norm-Referencing Wicked?"
When you set a class a test, how do you decide what the pass mark is going to be?
Summative Versus Formative Testing
Traditionally, there has always been some kind of test at the end of a course, often little more than a ritual. This type of testing, in which a final performance is evaluated, is termed summative testing. Decisions that affect further study or educational progress may be made on the basis of its results. The focus is backward-looking: the test looks back over items previously learnt [or taught].
Formative assessment feeds into the teaching or learning program, providing information about the learning process, enabling teachers to modify programs and learners to modify strategies. Results are perceived as the starting point for action rather than an end product, the opening of a new stage [or cycle] in the learning process rather than the closure of a previous stage. Formative tests may be used to establish what has been learnt already and what needs to be learnt, and how best to bridge the gap between the two.
Norm Referenced Versus Criterion Referenced Testing
Traditionally, language has been tested using norm-referenced techniques. Learners are placed on some sort of scale and compared with one another, with the expectation that most of the scores will bunch in the middle, with relatively few at the top or bottom of the range [and increasingly fewer towards the extremes of the range].
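To make the idea of placing learners on a scale concrete, here is a minimal Python sketch of norm-referenced grading, assuming grades are handed out by percentile rank within the group. The quota table `QUOTAS` and the functions `percentile_rank` and `norm_grade` are hypothetical illustrations, not part of any published scheme.

```python
from typing import List, Tuple

# Hypothetical grade quotas: (minimum percentile rank, grade), highest first.
# These cut-offs are illustrative only; each norm-referenced scheme sets its own.
QUOTAS: List[Tuple[float, str]] = [
    (90.0, "A"), (70.0, "B"), (30.0, "C"), (10.0, "D"), (0.0, "E"),
]


def percentile_rank(score: float, group: List[float]) -> float:
    """Percentage of the group scoring below `score`, counting ties as half."""
    below = sum(1 for s in group if s < score)
    tied = sum(1 for s in group if s == score)
    return 100.0 * (below + 0.5 * tied) / len(group)


def norm_grade(score: float, group: List[float]) -> str:
    """Grade a score only by its position relative to the rest of the group."""
    rank = percentile_rank(score, group)
    for cutoff, grade in QUOTAS:
        if rank >= cutoff:
            return grade
    return QUOTAS[-1][1]  # unreachable while the last cut-off is 0.0


# The same raw score of 60 earns different grades in two different groups.
strong_group = [55, 60, 70, 75, 80, 85, 88, 90, 92, 95]
weak_group = [20, 30, 35, 40, 42, 45, 50, 52, 55, 60]
print(norm_grade(60, strong_group))  # D (percentile rank 15)
print(norm_grade(60, weak_group))    # A (percentile rank 95)
```

Because the grade depends on who else sat the test, the same performance can be rated very differently from one group to the next.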
With a criterion-referenced test, the distribution of scores should be very different: scores should peak very steeply at and just after the criterion score for a satisfactory performance [which, if it is not 100%, is a relatively high percentage]. A driving test is an example of a 100% criterion-referenced performance test. Either you can drive a car safely and you pass, or you cannot and you fail. It is not very meaningful to say, for example, that a candidate scored 69 per cent on his driving test.
With a more communicative approach to language testing, a criterion-referenced approach seems to make more sense. Language syllabuses are increasingly situationalized, with the language specified in functional terms. For example, here is an extract from a NEAB GCSE language syllabus:
Candidates should be able to:
- Attract the waiter’s attention
- Say how many are in the group
- Order a meal
- Ask for a particular fixed price menu
- Ask for a table [for a certain number]
The emphasis here is moving from what the candidate knows [or doesn't know] to what the candidate can do [or cannot do].
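A criterion-referenced decision looks only at whether the candidate meets fixed criteria, not at how other candidates performed. The Python sketch below checks one candidate against a "can do" list worded after the syllabus extract; the task strings, the `criterion` parameter and the name `criterion_result` are illustrative assumptions, not part of the NEAB syllabus.

```python
from typing import Dict, List

# Hypothetical "can do" checklist modelled on the GCSE syllabus extract.
RESTAURANT_TASKS: List[str] = [
    "attract the waiter's attention",
    "say how many are in the group",
    "order a meal",
    "ask for a particular fixed price menu",
    "ask for a table",
]


def criterion_result(observed: Dict[str, bool], tasks: List[str],
                     criterion: float = 1.0) -> Dict[str, object]:
    """Judge one candidate against fixed criteria, ignoring everyone else.

    `observed` maps each task to whether the examiner saw the candidate do it;
    unrecorded tasks count as not done. The candidate passes when the
    proportion of tasks achieved reaches `criterion` (1.0 = every task, as in
    a driving test).
    """
    cannot_do = [t for t in tasks if not observed.get(t, False)]
    proportion = (len(tasks) - len(cannot_do)) / len(tasks)
    return {"pass": proportion >= criterion,
            "proportion": proportion,
            "cannot_do": cannot_do}


candidate = {task: True for task in RESTAURANT_TASKS}
candidate["order a meal"] = False
print(criterion_result(candidate, RESTAURANT_TASKS)["pass"])       # False
print(criterion_result(candidate, RESTAURANT_TASKS, 0.8)["pass"])  # True
```

Besides the pass/fail decision, the result lists what the candidate cannot yet do, which is the kind of information a "can do" description is meant to provide.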
You could also look at the Council of Europe ALTE Waystage Level, Threshold Level and Vantage Level for "can do" statements in benchmarked [criterion-referenced] descriptions of language.
An example of criterion referencing is UCLES IELTS, a proficiency and placement test of English for non-native speakers who wish to pursue their studies through the medium of English. Rather than looking at what candidates know, IELTS gives information about how well they can cope with studying their specialist subject in English. Scores are assigned for reading, writing, listening and speaking to provide an overall profile or description of the candidate using nine bands [or levels], each with a descriptive statement.
| Band | User level | Description |
|---|---|---|
| 9 | Expert User | Has fully operational command of the language: appropriate, accurate and fluent with complete understanding. |
| 8 | Very Good User | Has fully operational command of the language with only occasional unsystematic inaccuracies and inappropriacies. Misunderstandings may occur in unfamiliar situations. Handles complex detailed argumentation well. |
| 7 | Good User | Has operational command of the language despite some inaccuracies, inappropriacies and misunderstandings in some situations. Generally handles complex language well and understands detailed reasoning. |
| 6 | Competent User | Has generally effective command of the language despite some inaccuracies, inappropriacies and misunderstandings. Can use and understand fairly complex language, particularly in familiar situations. |
| 5 | Modest User | Has partial command of the language, coping with overall meaning in most situations, though is likely to make many mistakes. Should be able to handle basic communication in own field. |
| 4 | Limited User | Basic competence is limited to familiar situations. Has frequent problems in understanding and expression. Is not able to use complex language. |
| 3 | Extremely Limited User | Conveys and understands only general meaning in very familiar situations. Frequent breakdowns in communication occur. |
| 2 | Intermittent User | No real communication is possible except for the most basic information using isolated words or short formulae in familiar situations and to meet immediate needs. Has great difficulty understanding spoken and written English. |
| 1 | Non-User | Essentially has no ability to use the language beyond possibly a few isolated words. |
| 0 | Did not attempt the test | No assessable information. |
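The reporting idea behind the band scale, describing a candidate by what each band says rather than by a raw mark, can be sketched in Python as a lookup from each skill's band to its descriptor. This is an illustrative sketch only: it assumes whole-number bands as listed in the table, the names `BAND_LABELS` and `band_profile` are hypothetical, and it does not attempt to compute an overall band, since the method for combining the four skill scores is not described here.

```python
from typing import Dict

# Short labels for the bands in the IELTS table; whole bands only (assumption).
BAND_LABELS: Dict[int, str] = {
    9: "Expert User", 8: "Very Good User", 7: "Good User",
    6: "Competent User", 5: "Modest User", 4: "Limited User",
    3: "Extremely Limited User", 2: "Intermittent User",
    1: "Non-User", 0: "Did not attempt the test",
}


def band_profile(skill_bands: Dict[str, int]) -> Dict[str, str]:
    """Describe each skill by its band descriptor rather than a raw mark."""
    profile: Dict[str, str] = {}
    for skill, band in skill_bands.items():
        if band not in BAND_LABELS:
            raise ValueError(f"{skill}: expected a whole band 0-9, got {band!r}")
        profile[skill] = f"Band {band}: {BAND_LABELS[band]}"
    return profile


# Hypothetical candidate, reported on the four skills the test covers.
for skill, description in band_profile(
        {"reading": 7, "writing": 6, "listening": 7, "speaking": 8}).items():
    print(f"{skill:<10} {description}")
```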