Evaluation is an integral part of the learning process. Whenever learning takes place, the result is a change in behaviour. Evaluation is concerned with defining, observing and measuring the new behaviour. Once instruction has begun, some sort of evaluation is essential to determine both what and how well the student is learning, as well as how effective the course of instruction has been [45]. Evaluation for these purposes may be formative, that is, used during a course of instruction, or summative, when it is used at the completion of a course of instruction [46].
Your evaluation may consist simply of observations of the student's performance, or it may be accomplished by more comprehensive, systematic and objective means, such as oral questioning, written tests or performance testing [47].
Flight instructors have a moral obligation to provide guidance and restraint with respect to the operations of their students. This applies to an instructor's observations of unsafe or inept operations by pilots who are not aware they are being observed, as well as by pilots who have requested an instructor's evaluation or guidance. In the case of an observed unsatisfactory performance, it is your responsibility to try to correct it by the most reasonable and effective means. If you are unable to correct the situation by personal contact and good advice, you should report the matter to their supervisor.
Subjective written records of observed student performances are known as anecdotal records. Generally, a system is used to record student behaviour that cannot be evaluated by other means, for example, respect for laws, reaction to authority, persistence or physical skill. The main advantage of these records is that they depict behaviour in natural situations. For example, a student may show good knowledge of VFR minima but violate them in everyday situations. These records often form the basis of a written debrief, and they can be of considerable help to the instructor who is to fly with a previously unknown student.
You should record sufficient information about the situations to make the behaviour understood, for example, "entered cloud while concentrating on instruments in the turn". Just enough detail should be included to make the description meaningful and accurate. The description should be as objective as possible and it should record positive as well as constructive occurrences.
A more structured form of anecdotal record is the rating scale. Rating scales provide a systematic procedure for reporting your observations. Their value depends on careful preparation and appropriate use. For example, a rating scale should measure the desired learning outcome, and it should be used only when sufficient opportunity exists to make the necessary observations.
A rating scale, as used in the CAA Flight Test Standard Guides (FTSG), in the measurement of Managing Critical Incidents is given below as an example.
|Aircraft performance and operating requirements
|Rating up to 75
|Not yet competent
|(1) Uses inappropriate performance charts, tables or data|(1) Uses appropriate performance charts, tables and data|(1) Uses all appropriate performance charts, tables and data|
|(2) Uses inappropriate conditions for the calculation of take-off or landing distance, such that safety would be compromised|(2) Uses the appropriate conditions to calculate the take-off and landing distance for a private operation|(2) Uses the appropriate conditions to accurately and quickly calculate the take-off and landing distance for a private operation|
|(3) Cannot complete the calculations required in (1) and (2) within one hour|(3) Completes the calculations required in (1) and (2) within one hour|(3) Completes the calculations required in (1) and (2) within 30 minutes|
|(4) Fails to ensure sufficient runway length is available for take-off or landing|(4) Ensures sufficient runway length is available for take-off and landing through local knowledge|(4) Ensures sufficient runway length is available for take-off and landing by correctly comparing distance required to distance available|
|(5) Is unable to explain or apply the group rating system|(5) Explains the use of the group rating system|(5) Explains the use of the group rating system and applies its principles (as applicable) in flight|
|(6) Demonstrates inadequate knowledge of factors affecting aircraft performance in winter (ice) or summer (density altitude)|(6) Demonstrates a satisfactory knowledge of seasonal factors affecting aircraft performance|(6) Demonstrates a thorough knowledge of all seasonal factors affecting aircraft performance|
Oral questioning has a wide range of uses in flight instruction. Questions that require the recall from memory of a fact usually start with who, what, when or where. Questions that require the student to combine knowledge of facts with the ability to analyse a situation, solve problems or arrive at conclusions usually start with why or how. Your instructional techniques course will provide guidance on asking questions at each level of Bloom's taxonomy, as appropriate to the lesson.
Your use of oral questioning can have a number of desirable results:
Effective oral questioning requires preparation. You, therefore, should write pertinent questions in advance. The recommended method is to place them in the lesson plan. These prepared questions serve as a framework and, as the lesson progresses, should be supplemented by any impromptu questions you consider appropriate. To be effective, these questions must be adapted to the past experience and present ability level of the student.
Effective questions centre on only one idea. One idea – one question. A single question should be limited to using who, what, when, where, how or why – not a combination.
An effective question should be brief and concise. Enough concrete words must be used to establish the conditions or situation exactly, so that instructor and student have similar mental pictures. The student's response should be determined by their knowledge of the subject – not by their ability to understand the question.
To be effective, questions must apply to the subject of instruction [48]. Unless the question pertains strictly to the particular training being conducted, it serves only to confuse the student and divert their thoughts to an unrelated subject. Any part of a question that the student could disregard and still respond correctly should probably be removed.
Usually an effective question has only one correct answer, although in a problem solving question it may be expressed in a variety of ways.
Effective questions present a challenge to the student. Questions of suitable difficulty serve to stimulate learning. The difficulty of the question should be appropriate to the student's level of training.
Asking "Do you understand?" or "Have you any questions?" has no place in effective questioning. Assurance by the student that they do understand, or that they have no questions, provides no evidence of their comprehension.
Catch-em-out questions should be avoided, as the student will soon develop the feeling that they are engaged in a battle of wits with you. Other types of questions to avoid are:
"What is the first action you should take if a conventional gear aircraft with a weak right brake is swerving left in a right crosswind during a full-flap power-on wheel-landing?"
"What do you do before starting the engine?"
"In an emergency, should the crew activate the escape slide or control the passengers?"
"In reading the altimeter – you know you set a sensitive altimeter for the nearest station pressure – if you take temperature into account, as when flying from a cold air mass through a warm front, what precaution should you take when in a mountainous area?"
The teaching process is an orderly procedure of building one block of learning on another, and the introduction of unrelated facts and thoughts will only obscure this process and retard the student's progress.
The answering of a student's questions must conform to certain considerations if it is to be an effective teaching method.
The question must be clearly understood by you before an answer is attempted. You should display interest in the student's question and frame an answer as direct and accurate as possible. For example, if the student asks "What is drag?" an appropriate answer would be, "Drag is the resistance experienced by a body in motion through a fluid".
After completing a response, you should determine whether or not the student is completely satisfied with the answer. In the example given, this may lead to a discussion on the factors that affect drag. Organising the answers in this way conforms with the recommended teaching method for the development of a subject, in this case from simple to complex.
Sometimes it may be unwise to introduce the more complicated or advanced considerations necessary to completely answer a student's question, for example, the drag formula. In this case, you should carefully explain to the student that the question was good and pertinent but that the answer would, at this time, unnecessarily complicate the learning task at hand. This is particularly true of the pre-flight brief where time does not permit irrelevant or in-depth discussions. If it will not be answered later in the normal course of instruction, you should advise the student to ask the question again later.
On rare occasions, a student asks a question that you cannot answer. You should freely admit not knowing the answer, but you should then find it. If practicable, you could help the student look it up in available references.
Instructors should avoid using the one-word answers "Yes" or "No" if the greatest instructional benefit is to be gained from the student's question.
As evaluation devices, written tests are only as good as the knowledge and proficiency of the test writer. The following are some of the basic concepts of written test design.
Many publications are available on test administration, test scoring and test analysis, so these topics are not covered in this chapter [47].
If a test is to be effective, it must have certain characteristics; the most important of these are validity, reliability and useability [47].
Validity is the most important feature of any written test; it is the ability of a test to measure what it is supposed to measure. The results of a written test are said to be valid only when they are interpreted in relation to what the test was supposed to measure. For example, if instruction has centred on the term stalling angle, and the test question refers to the critical angle, the test result would be invalid in relation to the stalling angle, but it may have validity if interpreted in relation to a broader knowledge of stalling.
Reliability refers to the consistency of results obtained from a test or any other measuring device. A metal rule that expands and contracts with temperature changes will not give reliable results. By the same token, using a device that is highly reliable does not necessarily mean the results will be valid. For example, an incorrectly calibrated altimeter will consistently measure altitude above the wrong datum; the result is reliable, but wrong (not valid).
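The distinction between a reliable result and a valid one can be sketched numerically. The following is a minimal illustration only: the altitude figures, the variable names and the use of spread and bias as stand-ins for reliability and validity are assumptions made for the example, not part of any CAA procedure.

```python
from statistics import mean, pstdev

def bias(readings, true_value):
    """Average error: how far the readings sit from the true value (validity)."""
    return mean(readings) - true_value

def spread(readings):
    """Scatter of the readings about their own average (reliability)."""
    return pstdev(readings)

TRUE_ALTITUDE_FT = 3000                      # hypothetical true altitude
miscalibrated = [3200, 3205, 3195, 3200]     # hypothetical: consistent, wrong datum
erratic = [2700, 3300, 2900, 3100]           # hypothetical: right on average, scattered

for name, readings in (("miscalibrated", miscalibrated), ("erratic", erratic)):
    print(name, "bias:", bias(readings, TRUE_ALTITUDE_FT),
          "spread:", round(spread(readings), 1))
```

In this sketch the miscalibrated altimeter shows a small spread but a large bias (reliable, not valid), while the erratic one shows no bias on average but a large spread (not reliable).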
Useability is a measure of the test's practicality irrespective of other qualities. Tests should be easily administered and scored, produce results that can be accurately interpreted, and be economical in time and cost.
In flight instruction the essay-type question is rarely used and will not be discussed here. Those most commonly used are:
The short-answer question requires the student to supply their own answer. The shortest possible answer is the response to a true/false question, and the longest may extend to perhaps half a page. Other than the true/false type, these questions can be difficult to mark. For example, in the simplest one-word answer type, "The aircraft stalls at the _____ angle", the answer could be stalling, critical or same, and you are sure to get someone who answers with 15-degree.
The correctness of the answer is subjective (decreasing reliability as well as validity depending on how the answer is interpreted). Therefore, the same test graded by different instructors may result in different scores. The more latitude the student has in the answer the more difficult it becomes to assess their answer. While the true/false question eliminates this problem, it also provides the highest probability of guessing the answer. For these reasons the multiple-choice or matching type question is generally favoured.
When properly devised and constructed, the multiple-choice type offers several unique advantages that make it more widely used and versatile than either the matching or true/false question.
Multiple-choice questions are highly objective; that is, the results of such a test would be graded the same regardless of the student taking the test or the person marking it (reliability). This makes it possible to directly compare the performance of students within the same class or in different classes, students under one instructor with those under another, and student accomplishment at one stage of instruction with that at later stages (validity). This type of test question permits easy marking and allows you to examine more areas of knowledge, over the same period, than could be done by requiring the student to supply written responses (useability).
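The objectivity described above comes from marking against a fixed answer key, so that the score does not depend on who does the marking. A minimal sketch follows; the question labels, the key and the student's responses are all hypothetical.

```python
def mark(responses: dict[str, str], answer_key: dict[str, str]) -> int:
    """Count the responses that match the key; unanswered questions score zero."""
    return sum(1 for question, correct in answer_key.items()
               if responses.get(question) == correct)

ANSWER_KEY = {"Q1": "b", "Q2": "d", "Q3": "a"}   # hypothetical answer key
student = {"Q1": "b", "Q2": "c", "Q3": "a"}      # hypothetical student responses

score = mark(student, ANSWER_KEY)
print(f"{score} of {len(ANSWER_KEY)} correct")
```

Because the comparison is mechanical, any two markers using the same key will arrive at the same score.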
Three major difficulties are encountered in the construction of multiple-choice test questions:
The stem may take several forms:
The student may be asked to select the one choice that is the correct answer, the one choice that is the incorrect answer, or the one choice that is the most correct answer.
These three methods of answering, combined with the three question forms, give you flexibility in preparing multiple-choice questions. However, experience has shown that the direct question form is the most successful for instructors inexperienced in the writing of multiple-choice questions.
This form is generally better than the incomplete stem in that it is simpler and more natural.
Which gas forms the largest part of the atmosphere?
When using this form, care must be taken to avoid ambiguity, giving clues and using unnecessarily complex or unrelated alternatives.
The atmosphere is a mixture of gases, the largest part being:
Useful for measuring ability to read instruments or identify objects.
Name and label the four forces acting on the aircraft in straight-and-level flight.
These are very poor alternatives and should not be used. This is why no example is given here.
These should be avoided as the negative raises the difficulty of the question. If they must be used, the negative should be emphasised.
Which of the following is NOT used to control an aeroplane in flight?
This type is useful if a limited number of associations are to be made. Matching questions serve better if a large number of related associations are to be made.
Which manoeuvre does NOT belong with the others?
These are useful for determining knowledge of basic rules or facts.
The difference between magnetic north and true north is known as:
When multiple-choice questions are used, four or five alternatives are generally provided. It is usually difficult to construct more than five plausible responses. If there are fewer than four alternatives, the probability of guessing the correct response is considerably increased. Recent studies suggest that three carefully constructed responses are sufficient to determine knowledge.
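The effect of the number of alternatives on guessing can be worked through with simple arithmetic. The sketch below assumes a student with no knowledge at all who picks at random, with every alternative equally likely; the 20-question test length is a hypothetical figure chosen for illustration.

```python
from fractions import Fraction

def guess_probability(alternatives: int) -> Fraction:
    """Chance of picking the single correct response purely at random."""
    if alternatives < 2:
        raise ValueError("a question needs at least two alternatives")
    return Fraction(1, alternatives)

def expected_guess_score(questions: int, alternatives: int) -> Fraction:
    """Average number of questions answered correctly by pure guessing."""
    return questions * guess_probability(alternatives)

# Two alternatives corresponds to a true/false question.
for n in (2, 3, 4, 5):
    print(n, "alternatives:", guess_probability(n),
          "expected score on 20 questions:", float(expected_guess_score(20, n)))
```

Under these assumptions a true/false question can be guessed half the time, whereas a five-alternative question can be guessed only one time in five. Real students usually eliminate implausible alternatives first, so the figures are a lower bound on guessing success rather than a prediction.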
Make each question independent of every other question in the test. The wording of a question in the test should not provide the correct answer to any other question. For example, avoid pairs of questions like this:
Q1. If an aircraft has a rate of climb of 500 feet per minute, what amount of altitude will be gained in one minute?
Q2. Define Rate of Climb.
Another bad practice is to make the answer to any question dependent on knowing the correct answer to any other question. For example, this is bad:
Q1. If an aircraft weighs 1600 lb, how much lift will be required for straight-and-level flight?
Q2. If the lift/drag ratio is 10:1, how much drag is produced in Q1?
Design questions that call for essential knowledge rather than abstract background knowledge or unimportant facts.
State the question in the working language of the student. A common criticism of written tests is the emphasis on the reading ability of the student. If language comprehension is not the objective of the test, failing to use appropriate language will decrease validity.
Include sketches, diagrams or pictures when they can present a situation more vividly than words. They add interest and avoid reading difficulties with technical language.
Avoid the negative word or phrase. A student who is pressed for time may identify the wrong response simply because the negative form was overlooked.
Double negatives should be avoided because they invariably cause confusion. If a word such as "not" or "false" appears in the stem, avoid using another negative in the alternatives.
Catch questions, unimportant details and leading questions should be avoided as they do not contribute to effective evaluation. Moreover, they tend to antagonise the student.
Research the question stems and appropriate verbs to use to frame questions at the varying levels of Bloom’s Taxonomy.
The stem should clearly present the problem or idea. The function of the stem is to set the stage for the alternatives that follow.
The stem should be worded in such a way that it does not give away the correct response.
Put everything that pertains to all alternatives in the stem. This helps to avoid repetitious alternatives.
Generally avoid using "a" or "an" at the end of the stem. These may give away the correct choice. Every alternative should fit grammatically with the stem.
Incorrectness should not be the only criterion for the distracting alternatives. A common misconception or a statement that is itself true, but does not satisfy the requirements of the problem, may also be used.
Keep all alternatives of approximately equal length.
When alternatives consist of numbers they should be listed in ascending order.
The matching type question is particularly good for measuring the student's ability to recognise relationships. As this question type is a collection of multiple-choice questions it samples more student abilities in a given period of time. Samples of two different forms of this type follow.
Equal columns: When using this form, always provide for some items in the response column to be used more than once, or not at all, to preclude guessing by elimination. For example:
|a. Never exceed speed
|b. Best angle of climb speed
|c. Red radial line on the airspeed indicator
|d. Design manoeuvring speed
|e. Best rate of climb speed
Unequal columns: Generally these are preferable to equal columns.
|a. Never exceed speed
|b. Best angle of climb speed
|c. Red radial line on the airspeed indicator
|d. Design manoeuvring speed
|e. Best rate of climb speed
Unlike the examples above, give specific and complete instructions. Do not make the student guess what is required.
Also unlike the questions above, test only essential information.
Use closely related material throughout the question.
Where possible, make all responses plausible.
Use the working language of the student.
Arrange the alternatives in a sensible, easily read order.
If alternatives are not to be used more than once, provide extra alternatives to avoid guessing by elimination.
Question writing is one of your most difficult tasks. Besides requiring considerable time and effort, the task demands a mastery of the subject, an ability to write clearly, and an ability to visualise realistic situations for developing relevant questions. Because of the time and effort required in the writing of effective questions, it is desirable to establish a question bank or pool.
As long as precautions are taken to safeguard the questions in a pool, the burden of continually preparing new questions will be lightened (but not eliminated). The most convenient and secure method is to record questions on a computer. These can be added to or amended as required, and using the cut and paste feature, different examination papers can quickly be compiled and printed.
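One way a computer-based question pool might be organised is sketched below. This is an illustration under stated assumptions, not a description of any particular system: the `Question` structure, the `compile_paper` helper, the topic names and the distracting alternatives are all hypothetical, and only the two stems are taken from the examples in this chapter.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Question:
    topic: str
    stem: str
    alternatives: tuple[str, ...]
    correct: int  # index of the correct alternative

def compile_paper(pool, per_topic, seed=None):
    """Draw the requested number of questions per topic, without repeats."""
    rng = random.Random(seed)  # a fixed seed reproduces the same paper
    paper = []
    for topic, count in per_topic.items():
        candidates = [q for q in pool if q.topic == topic]
        if len(candidates) < count:
            raise ValueError(f"pool has too few questions on {topic!r}")
        paper.extend(rng.sample(candidates, count))
    return paper

POOL = [  # hypothetical pool; alternatives are illustrative only
    Question("meteorology", "Which gas forms the largest part of the atmosphere?",
             ("Oxygen", "Nitrogen", "Carbon dioxide", "Argon"), 1),
    Question("navigation",
             "The difference between magnetic north and true north is known as:",
             ("Deviation", "Variation", "Dip", "Compass error"), 1),
]

paper = compile_paper(POOL, {"meteorology": 1, "navigation": 1}, seed=1)
for number, q in enumerate(paper, start=1):
    print(f"Q{number}. {q.stem}")
```

Drawing by topic keeps each paper balanced across the syllabus, and the pool still needs the same safeguarding as any other examination material.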
Regardless of the question type or form, the following principles should be followed in writing or reviewing questions [49].
Each question should test a concept or idea that it is important for the student to know, understand or be able to apply.
The question must be stated so that everyone who is competent in the subject would agree on the correct response.
The question should be stated in the student's working language.
The wording should be simple, direct and free of ambiguity.
Sketches, diagrams or pictures should be included if they add realism or aid the student in visualising the problem.
The question should present a problem that demands knowledge of the subject. A question that can be responded to on the basis of general knowledge does not test achievement.
If a student demonstrates the ability to perform selected parts of a skill for which they are being trained, it is assumed that they will be able to perform the entire skill. Performance testing is a sampling process. It should be a carefully selected part of an action process typical of the skill for which training is being given. For example, successful completion of a cross-country flight test is taken to indicate that the student is able to fly anywhere in New Zealand.
This method of evaluation is particularly suited to the measurement of student abilities in either mental or physical tasks. Performance testing is desirable for evaluating training that involves an operation, a procedure or a process, and it is used extensively in flight instruction.
Evaluation of demonstrated ability during flight instruction must be based upon established standards of performance (see the appropriate FTSG), suitably modified to apply to the student's experience, stage of development as a pilot and the conditions under which the demonstration was performed. For the evaluation to be meaningful to you, the student's mastery of the elements involved in the manoeuvre must be considered, rather than merely the overall performance.
In evaluating student demonstrations of piloting ability, as in questioning and other instructional processes, it is important to keep the student informed of progress. This may be done as each procedure or manoeuvre is completed or during the debriefing.