A short history of testing

It is useful to view testing from a historical perspective for two reasons:

  • It enables us to relate current developments in the field to what has preceded them.
  • Methods of testing rarely die out, but are more often than not adapted or continue to exist in modified forms in different parts of the world. Although various stages or periods in the development of testing can be identified, many of the testing techniques associated with these periods remain in use today.

    Multiple Choice

    Multiple-choice is probably the best known and even now the most widely used discrete point testing technique. Its objectivity and its ease and speed of scoring remain highly valued within many educational and training systems. In placement testing, the fact that the focus is only on one element at a time means that areas of difficulty can be easily identified.

    Consider the following:

    The advantages of the multiple choice technique were so highly regarded at one time that it almost seemed that it was the only way to test. While many laymen have always been skeptical of what could be achieved through multiple-choice testing, it is only recently that the technique’s limitations have been more generally recognized.

     

    [Hughes, 1989: 60]

    1. What do you feel might be the limitations of the multiple-choice technique?
    2. What guidelines might you offer to prospective writers of multiple-choice tests?

    The Psychometric Structuralist Approach

    This approach is associated with Lado, who, heavily influenced by the prevalent mood of the audiolingualists of the time, proposed that language should be broken up into discrete units for the purposes of testing. This has clear implications for what is to be tested and how the test is to be carried out. Discrete items are constructed to sample a specific component of the target language within a particular skill. Since it is so difficult to break up language in this way, at its worst, this approach produces test items that are extremely trivial. For example, consider the following test item, which asks testees to identify the picture that matches the sentence [or, if preferred, to match a definition to the italicised word].

    She is boiling water

    A - accompanying picture prompt

    B - accompanying picture prompt

    C - accompanying picture prompt

    The same item with the alternatives in the target language might be as follows:

    She is boiling water

    A - making hot
    B - putting in
    C - drinking

    One clear advantage to this approach to testing is, however, that results are easily quantifiable. The problem is that the approach is predicated on the assumption that it is indeed possible to parcel up language neatly, usefully and meaningfully in this way. It is questionable whether the fact that the learner knows discrete elements of the target language actually tells us anything at all about their ability to use it correctly. What is missing in this atomistic approach to language testing is the capacity to synthesize [and automatize] the elements and to use the language in some purposeful fashion. To return to the analogy of learning to drive a vehicle: if a person can steer a car, indicate, and change gear in isolation, it does not mean that they can drive the car [Morrow, 1981].

    Below is an example of a classic cloze test. First complete the blanks. Then reflect upon the following questions:

    • How easy is it to complete the blanks?
    • Which words are the most difficult to complete?
    • How useful is this as a testing technique?

    Problems with the Psychometric Structuralist Approach

    There are obvious advantages to the testing of discrete linguistic points. The data are easily quantifiable and it is possible to cover a wide range of items. However, this approach to testing assumes that it is indeed possible to parcel up language in a very atomistic way. In this sense, many discrete ………… tests can be said to suffer ………… a lack of construct validity.

    Nevertheless, ………… can be argued that discrete point ………… may be useful to investigate aspects ………… a candidate’s linguistic competence and as ………… may form only one part of ………… larger test battery. In response to ………… feeling that discrete point tests were ………… the testing pendulum swung in favor ………… global tests during the 1970s. This ………… what Spolsky [1976] termed the psycholinguistic-sociolinguistic ………… . Oller [1979] claimed that global integrative tests ………… as cloze could measure the ability ………… integrate discrete items of language in ………… way that approximated real language use.

    ” ………… concept of an integrative test was ………… in contrast with the definition of ………… discrete point test. If discrete items ………… language skill apart, integrative tests put ………… back together. Whereas discrete items attempt ………… test knowledge of language one bit ………… a time, integrative tests attempt to ………… a learner’s capacity to use many ………… all at the same time.”

    Oller [1979] quoted in Weir [1990].

    Oller’s ………… to testing is based on his ………… that General Language Proficiency underlies all ………… skills. This is sometimes called the ………… Competence Hypothesis [UCH].

    In practical terms, ………… classic cloze is a test where ………… gaps are regular i.e. every nth ………… and thus fall indiscriminately of all ………… of words [articles, prepositions etc]. A ………… cloze is also possible. This is ………… a certain class of word [e.g. past ............] is deleted.

    There are different possibilities ………… the scoring of cloze tests [exact ............ or any suitable word]. There are ………… significant differences in the reliability between ………… two methods.
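The mechanics of a classic cloze, and the two scoring options mentioned in the passage, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `make_cloze` and `score_cloze`, the deletion interval `n`, the optional untouched `lead_in` and the `acceptable` mapping of alternative answers are all assumptions made for the example, not part of any published procedure.

```python
import re

GAP = "…………"  # gap marker, chosen to match the one used in the passage above


def make_cloze(text, n=7, lead_in=0):
    """Blank every n-th word, counting from the end of an untouched lead-in.

    Returns the gapped text and the list of deleted words (the answer key).
    """
    tokens = text.split()
    key = []
    for i in range(lead_in + n - 1, len(tokens), n):
        # Split off surrounding punctuation so that only the word is blanked.
        lead, core, trail = re.match(r"^(\W*)(.*?)(\W*)$", tokens[i]).groups()
        if not core:
            continue  # token was pure punctuation; leave it alone
        key.append(core)
        tokens[i] = lead + GAP + trail
    return " ".join(tokens), key


def score_cloze(responses, key, acceptable=None):
    """Count correct responses.

    With acceptable=None only the exact deleted word earns credit.
    Otherwise `acceptable` maps a gap position to extra words that
    the marker has decided to accept for that gap.
    """
    acceptable = acceptable or {}
    score = 0
    for i, (given, expected) in enumerate(zip(responses, key)):
        allowed = {expected.lower()} | {w.lower() for w in acceptable.get(i, ())}
        if given.strip().lower() in allowed:
            score += 1
    return score


if __name__ == "__main__":
    sample = "The cat sat on the mat and looked at the dog in the garden"
    gapped, key = make_cloze(sample, n=4)
    print(gapped)
    print(score_cloze(["on", "stared", "in"], key))                      # exact word
    print(score_cloze(["on", "stared", "in"], key, {1: {"stared"}}))     # acceptable word
```

Because the deletion is purely positional, the sketch makes no distinction between word classes. A version that targets one class of word would need some way of tagging words first, which is not attempted here.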

    The Psycholinguistic Sociolinguistic Approach

    As a reaction to Lado’s views on the atomistic nature of language, integrative tests of a more global nature were proposed. Oller [1979] argued that activities such as cloze, essay writing and dictation sample a fuller range of skills in naturally occurring contexts and additionally sample basic language processing mechanisms [analysis by synthesis]. The next two activities focus on two other testing techniques widely used during this period.

    Dictation

    Agree, disagree, or modify the following statements.

    1. Dictated material should only be used that incorporates oral messages typical of those that learners might encounter in the target situation.
    2. Dictation is a very reliable testing technique.
    3. The use of a semantic scoring technique [accepting an alternative word that means the same as the dictated word] as against an exact word system increases the validity of dictation as a testing technique.
    4. As a testing device, dictation measures too many different language features to be effective as a means of assessing any one particular skill.
    5. Dictation is a very good test of overall listening comprehension because it samples a broad range of integrative skills.
    6. Dictation draws on the ability of learners to use all the systems of the language, in conjunction with context and knowledge of the world, to predict what will be said [synthesis of the message] and after the message has been uttered to scrutinize this through the short-term memory to see if it matches what had been predicted.
    7. Dictation samples not only the ability of learners to discriminate phonological and phonemic units but also their ability to make decisions about word boundaries.
    8. Marking of dictations is problematic.
    9. Many dictations are unrealistic if the text used has been previously created to be read silently rather than heard.
    10. The best way to score dictations is to adopt a communicatively oriented marking system in which a mark is given if the testee has understood the substance of the message.

    The C-Test

    Below is an example of a more recent testing technique based on the cloze principle. Try to complete the gaps.

    The most well-kno…… integrative test…… techniques ar…… cloze, dicta…… and fr…… writing. Oth…… tests o…… this ty…… are th…… cloze eli…… or th…… intrusive clo…… test, an…… also liste…… recall.

    I…… the 1980s howe……, serious quest…… were rai…… about th…… validity o…… integrative meth…… as test…… devices. Oll…… assumption o…… an under…… competency go…… against substa…… evidence i…… favor o…… at lea…… two compet……, production an…… reception.

    Nevert……, integrative tes…… of th…… type fav…… by Oll…… do ha…… a pa…… to pl…… in ma…… types o…… test batt…… . A ve…… recent innov…… test descri…… by Cyr…… Weir i…… his bo…… on Communi…… Language Test…… is th…… C-Test. Th…… was origi…… developed i…… Germany an…… is bas…… on th…… same theor…… rationale a…… cloze wi…… regard t…… testing th…… ability t…… cope wi…… reduced redund…… and pred…… from cont…… .

    In th…… C-Test eve…… second wo…… in a te…… is part…… deleted. A…… exact wo…… scoring i…… adopted. Wher…… in a clo…… test th…… performance o…… native spea…… is hig…… variable, i…… is com…… for nat…… speakers t…… be ab…… to sco…… 100% o…… C-Tests. Howe….., given th…… relatively rec…… appearance o…… the techn…… there i…… little empir…… evidence o…… its val…… .
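The deletion pattern used in the passage above can also be produced mechanically. The following Python sketch is a rough illustration under stated assumptions: the names `mutilate` and `make_c_test` are hypothetical, the choice to keep the first half of each word rounded up is an assumption that merely appears consistent with the sample passage, and one-letter words are simply left intact.

```python
import re

GAP = "……"  # gap marker, chosen to match the one used in the passage above


def mutilate(word):
    """Keep the first half of a word (rounded up) and replace the rest with a gap."""
    keep = (len(word) + 1) // 2
    return word[:keep] + GAP


def make_c_test(text, start=1):
    """Partly delete every second word, beginning at token index `start`.

    Returns the mutilated text and a dict mapping token index to the
    original word (the answer key for exact-word scoring).
    """
    tokens = text.split()
    key = {}
    for i in range(start, len(tokens), 2):
        # Split off surrounding punctuation so that only the word is cut.
        lead, core, trail = re.match(r"^(\W*)(.*?)(\W*)$", tokens[i]).groups()
        if len(core) < 2:
            continue  # nothing sensible to delete from a one-letter word
        key[i] = core
        tokens[i] = lead + mutilate(core) + trail
    return " ".join(tokens), key


if __name__ == "__main__":
    gapped, key = make_c_test("The second half of every second word is deleted")
    print(gapped)
    print(key)
```

Published C-Tests normally leave some introductory text intact before the deletions begin. The `start` parameter only shifts the first affected word and is a simplification of that practice.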

    Communicative Approaches

    All the techniques considered so far have been concerned with linguistic competence. Although integrative approaches do sample language in context in longer stretches of discourse, they still fail to give any indication of how the learners might use the target language in actual or authentic [real-life] situations. In other words, there has been no attempt to ascertain whether the learners are able to translate competence into performance. It may be argued that, without evidence of performance, language testing is essentially a waste of time and energy because it tells the users of the test results nothing useful.

    One of the main features of the communicative approach to language testing has been to emphasize the importance of language in use. Whereas during the psychometric structuralist era a great deal of attention was paid to the testing techniques themselves, the concern is now more with describing testable skills involved in communication [with the what rather than the how]. If we are seeking to measure communicative competence in tests, then we need to refer to the constructs that we are trying to measure. However, agreement on what components should be included in a model of communicative competence is by no means unanimous. An obvious difficulty is that competence, by its very nature, can only be assessed through its manifestation in performance [probably direct rather than indirect].

    Commissioned by the RSA, Morrow [1977] produced Techniques of Evaluation for a Notional Syllabus. This examined the characteristics of communicative interaction that were found lacking in traditional tests. Morrow listed the following areas:

    • Interaction: In the vast majority of instances, language is based on interaction, but this is not sampled in most tests.
    • Unpredictability: The processing of unpredictable data in real time is a vital aspect of using language.
    • Context: Language forms vary in accordance with context [such as physical environment, role, status, attitude, register and formality].
    • Purpose: Learners must recognize why something has been said and respond appropriately.
    • Authenticity: What matters is how a learner copes with authentic language [so, for example, finding out how learners cope with simplified texts tells us little about their communicative ability].
    • Behaviour-based: The participants judge the success or failure of an interaction on the basis of behavioral outcomes, and, strictly speaking, no other criteria are relevant [Morrow, 1977: 53].

    Take a test with which you are familiar and evaluate it against Morrow’s criteria.

    Morrow argued that traditional tests, by concentrating on linguistic competence, fail to assess the communicative competence of learners. His emphasis is on use rather than usage. Morrow also highlighted another difficulty inherent in all testing: tests only sample, so when we test we are only testing performance in one particular situation.

    The very essence of a communicative approach is to establish particular situations with particular features of context [etc], in order to test the candidate’s ability to use language appropriate in terms of a particular specification. While it is hoped that the procedures discussed will indeed be revealing in those terms, they cannot strictly speaking reveal anything of a candidate’s ability to produce language that is appropriate to a situation different in even one respect from that established.

    [Morrow, 1977: 53]

    A perennial issue in communicative testing is how to reconcile this conflict. Testing needs to focus on a representative sample of language in use, but it also needs to be sufficiently focused on one or more skills for effective and reliable assessments to be made.

    Building on Canale and Swain’s [1980] model of communicative competence, Bachman’s [1990] model of communicative language ability is probably the best developed in that it accounts for both competence [or knowledge about language] and the ability to implement and manifest that competence in language use [in performance].

    Trait Factors: Language Competencies

    Organizational Competence

    • Grammatical [Lexis, Morphology, Syntax]
    • Textual [Written & Oral Cohesion, Rhetorical Organization]

    Pragmatic Competence

    • Illocutionary [Language Functions]
    • Sociolinguistic [Register, Dialect & Lectal Variations, Figurative Language, Cultural Allusion, Naturalness]

    Strategic Competence

    • Assessment
    • Planning
    • Execution

    Skill Factors

    • Psychophysiological Mechanisms
    • Mode [Receptive or Productive]
    • Channel [Oral, Aural or Visual]

    Method Factors

    • Language Use
    • Situation
    • Amount of Context
    • Distribution of Information
    • Response Mode

    Bachman’s model, whilst theoretically helpful, may not be immediately realizable for many testing practitioners. Weir’s features of a communicative test are much more workable in practice. These provide a checklist from which a test developer may work.

    A communicative test should have the following characteristics.

    • It should be interactive.
    • It should be direct in nature with tasks reflecting realistic discourse processing activities.
    • Texts and tasks should be relevant to the intended situation.
    • Ability should be sampled within meaningful and developing contexts.
    • The test should be based on an a priori specification, so what is to be tested and how it is to be tested should be laid down at the test design stage.

    [Adapted from Weir, 1993]

    Central to any discussion of communicative language testing is the idea that communication should not be for its own sake, but should be linked closely [or as closely as possible] to the communicative situations in which the target learners find themselves. Designing tests based on needs analysis is therefore an ideal to work towards. ESP [and EAP] tests have been particularly influential in this respect. The relative narrowness [or specificity] of these fields removes much of the unpredictability of the content, the media, and the style of the target language to be taught and tested. It is perhaps especially obvious [and easier to implement] in an area such as Business English. The OIBEC [Oxford International Business English Certificate] and the UCLES BEC [Business English Certificate] are essentially task-based examinations aimed at business people who need English in a specific range of situations [and are by and large fairly direct tests of performance, or at least focused indirect tests].
