CATS - Cognitive Abilities Tests
The Cognitive Abilities Test (CAT) is an assessment of a range of reasoning skills. The test looks at reasoning with three types of symbols: words, numbers and shapes or figures, i.e. verbal, quantitative and non-verbal reasoning.
The verbal reasoning element assesses reasoning processes using the medium of words. Such processes include: identifying relationships between things (e.g. 'big' is the opposite of 'small'); creating correlates of such relationships (e.g. 'big' is to 'small' as 'thick' is to 'thin'); identifying classes ('hat', 'gloves', ____?: pyjamas, slippers, scarf); and reasoning deductively ('A' is taller than 'B' and 'B' is taller than 'C'; therefore 'A' is taller than 'C'). It is therefore an assessment of reasoning with words, not of wider language skills such as speaking, listening or writing.
The quantitative tests look at the same processes but use numbers as the symbols. Examples include determining rules by analogy and applying these to new cases (2->3, 9->10, 6->_? (7)), determining patterns and relationships in series (1, 4, 7, _? (10)), or combining elements to form number sentences (e.g. combining the elements 2, 3, 4, + and - to make one of the answers 0, 2, 4, 5 or 7).
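The number-sentence example can be checked mechanically. The following is a minimal sketch only: it assumes that every element (2, 3, 4, + and -) must be used exactly once and that the sentence is evaluated left to right, which is our reading of the example rather than a rule stated in the test manual. The function name is hypothetical.

```python
from itertools import permutations


def number_sentence_results(numbers, operators):
    """Return every value obtainable by using each number and each
    operator exactly once, evaluated left to right (an assumption)."""
    results = set()
    for nums in permutations(numbers):
        for ops in permutations(operators):
            total = nums[0]
            for op, n in zip(ops, nums[1:]):
                total = total + n if op == "+" else total - n
            results.add(total)
    return results


# Elements and answer options taken from the example in the text.
reachable = number_sentence_results([2, 3, 4], ["+", "-"])
options = [0, 2, 4, 5, 7]
answer = [option for option in options if option in reachable]
print(answer)  # [5], e.g. 3 + 4 - 2
```

Under these assumptions only one of the offered answers can be made, which is what makes the item a reasoning question rather than an arithmetic drill.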
The non-verbal tests again look at reasoning processes but use shapes and figures. Because these questions require no knowledge of the English language or the number system, they are particularly useful when assessing children with poor English language skills, or disaffected pupils who may have failed to achieve in academic work for motivational reasons.
2. Is CAT a measure of innate ability? Are CAT results in any way affected by teaching?
There is no such thing as a measure of ‘innate ability’. The quality of prior teaching, opportunities to learn, parental support and pupils’ educational experience will undoubtedly affect pupils’ performance on all educational tests. However, tests of the taught curriculum - reading, mathematics, spelling etc. - are likely to be influenced to a greater degree than reasoning tests. Attainment tests, such as National Curriculum tests, are designed to measure outcomes of specific learning and instruction, and the content is drawn directly from the taught curriculum. In contrast, reasoning tests tap a general set of prior experience by assessing the perception and manipulation of relationships and content that is not generally part of the taught curriculum. Non-Verbal Reasoning (NVR) tests, with their relatively low language demands, are least likely to be influenced by the quality of teaching. This issue is dealt with in detail in Chapter 8 of “Getting the Best from CAT” (Strand, 2003).
3. Can CAT be used to identify ‘underachieving’ pupils?
As identified in FAQ2, CAT scores are less likely to be affected by school experience than attainment tests. Comparisons between a pupil’s CAT scores and their attainment in school subjects such as English and mathematics can therefore be helpful. This can identify pupils whose reasoning ability is average or above but whose attainment in curriculum-related subjects is low. Such pupils may be characterised as underachieving, and may benefit from targeted intervention.
4. Can I use CAT every year to monitor pupils' progress?
Reasoning test scores tend to be more stable over time than attainment test scores. Annual testing is therefore more typical for reading, spelling or mathematics tests, which assess how well children are performing in relation to the curriculum that has been delivered. Examples of such tests would be the nferNelson Progress in English 5-14 or Mathematics 5-14 test series. These tests include 'progress scores' which specifically allow users to compare the amount of progress made by pupils over the course of a year with the progress made by a large and nationally representative sample of pupils.
However, reasoning scores can and do change over time. For a minority of pupils, these changes may be quite substantial. The mean scores for a group of pupils or even a whole school can also change substantially, for example where there has been an intervention such as the National Literacy or Numeracy Strategies (NLS/NNS), Cognitive Acceleration through Science Education (CASE) or Philosophy in the Classroom thinking skills approaches.
For these reasons, it is advisable to re-test pupils when key educational decisions are to be made. For example, where pupils are initially tested in Year 7, it is advisable to re-test them during Y9 when important decisions such as GCSE option choices and examination targets are being considered.
5. What change in scores over time represents a significant improvement or decline in CAT scores?
For individual pupils you should remember that any test is based on performance on one day and may be affected by a wide range of motivational or other influences (e.g., the pupil may have been distressed or upset by an incident at home earlier that day). It is important that the score is placed within a ‘confidence interval’ so you do not over-interpret small changes in standard scores. As a rule of thumb with the CAT batteries, there will need to be a change of 10 or more standard score points before you would say a pupil had a ‘significant’ change in their CAT score.
What about change in the mean scores for groups of pupils? We should remember that when using standardised age scores, a consistent score over time indicates the expected amount of progress. For example, if a group achieved a mean standard score of 102.5 in Y7, and the same group achieved a mean score of 102.5 two years later in Y9, then the group would have made the appropriate amount of progress for their age. You would also need to place ‘confidence intervals’ around the group mean scores, which would depend upon the number of pupils in the group. Again as a rule of thumb, a change would need to be at least 2 standard score points to be significant for a group of 100 or more pupils. For a smaller group the change would need to be larger to be significant.
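The two rules of thumb in this answer can be written down as a short check. This is a sketch of the rules of thumb only, not of the confidence-interval calculation itself; the function names are hypothetical, and because the text gives no threshold for groups of fewer than 100 pupils the sketch returns None in that case rather than guessing one.

```python
def individual_change_is_significant(score_before, score_after, threshold=10):
    """Rule of thumb for one pupil: a change of 10 or more
    standard score points counts as 'significant'."""
    return abs(score_after - score_before) >= threshold


def group_change_is_significant(mean_before, mean_after, group_size,
                                threshold=2, min_group_size=100):
    """Rule of thumb for a group of 100 or more pupils: a change of at
    least 2 points in the mean counts as 'significant'. Smaller groups
    need a larger (unspecified) change, so None is returned for them."""
    if group_size < min_group_size:
        return None
    return abs(mean_after - mean_before) >= threshold


print(individual_change_is_significant(98, 105))        # False (7 points)
print(individual_change_is_significant(98, 109))        # True (11 points)
print(group_change_is_significant(102.5, 102.5, 120))   # False (expected progress)
print(group_change_is_significant(102.5, 105.0, 120))   # True (2.5 points)
print(group_change_is_significant(102.5, 105.0, 40))    # None (group too small)
```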
6. When is the difference in a pupil’s standard scores on the three batteries considered ‘significant’?
In most cases the three standard age scores (verbal, quantitative and non-verbal) will be broadly in line with each other. Scores will rarely be exactly equal, and there has to be a difference of 10 or more standard age score (SAS) points between a pupil’s scores on any two tests before the difference would be considered statistically significant. The implications of any score differences will depend on the particular batteries where the differences exist, and whether they indicate relative strengths or relative weaknesses. Chapter 3 of “Getting the Best from CAT” describes a detailed system for analysing CAT pupil profiles, their implications for teaching and learning, and practical guidance on strategies.
It is rarely advisable to give advice based on test scores in isolation. Test scores are only a small part of the picture and you need to know the whole pupil in order to interpret the results in an appropriate context. Test scores should feed into a broader assessment, bringing to bear knowledge of the pupil's achievements in school subjects, their personal background and their attitudes, motivation and behaviour. For this reason a pupil’s teacher will be best placed to interpret the implications, if any, of the CAT scores of any individual pupil.
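The 10-point rule for comparing a pupil’s scores across the three batteries can be sketched as follows. The function name and the example profile are hypothetical, and the output is only a first screening step that flags which pairs of batteries differ; interpreting the profile remains a matter for the teacher, as described above.

```python
from itertools import combinations


def significant_battery_differences(scores, threshold=10):
    """Given a dict of battery name -> standard age score, return a list of
    (higher_battery, lower_battery, difference) for every pair of batteries
    whose scores differ by at least `threshold` SAS points."""
    flagged = []
    for first, second in combinations(scores, 2):
        difference = scores[first] - scores[second]
        if abs(difference) >= threshold:
            higher, lower = (first, second) if difference > 0 else (second, first)
            flagged.append((higher, lower, abs(difference)))
    return flagged


# Hypothetical pupil profile, for illustration only.
profile = {"verbal": 88, "quantitative": 101, "non_verbal": 104}
flagged = significant_battery_differences(profile)
print(flagged)
# [('quantitative', 'verbal', 13), ('non_verbal', 'verbal', 16)]
```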
7. What conclusions can be drawn from patterns of test results for year groups (e.g., pupils tending on average to score better in one battery than in another)?
First, you would need to determine whether any score differences are significant (see FAQ5). If so, there may be general implications. For example, where the mean VR score for a year group is lower than the mean NVR score, this may indicate a need for specific interventions to address low verbal skills. “Getting the Best from CAT” (Strand, 2003), pp. 72-79, includes an Excel spreadsheet to allow you to evaluate the significance of score differences for groups of pupils. You should remember, though, that any difference in the mean scores for the group will be a generalisation and will not necessarily apply to all individual pupils. You will need to look at each individual pupil’s scores in order to identify those who might benefit most from any intervention.
8. What is the correlation between CAT and IQ scores such as WISC used by educational psychologists?
In general, we would expect a high correlation between the CAT Verbal and Non-Verbal batteries and the WISC Verbal and Performance IQ respectively. We are currently working with the University of Sheffield to determine the correlation between CAT3 and WISC-III scores. The project is due for completion in summer 2004.
9. What explanation could be given for differences in a pupil’s scores on the WISC and CAT tests?
The CAT is a group-administered test, while the Wechsler is individually administered on a one-to-one basis. This can affect the performance of a small number of pupils; for example, some pupils with attention problems may lose focus during a group test but remain ‘on-task’ with one-to-one testing. The WISC verbal tests involve the examiner reading the material to the pupil, while the CAT verbal tests require the pupil to read the materials, so this may be another factor. For both tests, scores can sometimes vary because of extrinsic factors such as tiredness, distraction, lack of motivation, incorrect administration etc.
Where there are score differences between two tests you need to know something about the individual who is the focus of the assessment. One-to-one work with observation of a student over the course of a school day can sometimes be far more informative than a whole battery of test scores. Perhaps the important message is to use all the available evidence, from every source, when making any educational decisions.
10. To what extent is the reliability of CAT results affected by children with dyslexia, dyscalculia or specific learning difficulties in regard to following multiple instructions?
The great benefit of the CAT is in the diagnostic use that can be made of the pupil’s profile of performance across the three batteries. For example, a specific language difficulty (maybe “dyslexia”) might be manifested in a low score on the VR battery in contrast to the QR and NVR batteries. A specific arithmetic difficulty may show as an uneven profile with a low QR score relative to VR and NVR. In either case, it is probably appropriate to follow up the CAT results for such pupils with further one-to-one assessment. CAT is likely to be the starting point for hypotheses and questions which will require further detailed investigation.
11. How reliant is CAT on the quality of the adult explaining and supervising the test?
It is important that the administrator follows the instructions as given in the test manual. Each sub-test starts with demonstration and practice questions. The administrator must use these to ensure that pupils are familiar with the test structure and question formats before they start the test. Because CAT is a timed test, it is vital that a stopwatch or clock is used to ensure the correct amount of time is given for each sub-test, e.g., 10 minutes for Number Series means exactly 10 minutes, not 9.5 minutes or 10.5 minutes. Provided the test manual is followed accurately and professionally by the teacher, the influence of the specific adult who administers the test should be minimal.
12. My son has a lower score on the Non-Verbal test than the others. Are there any learning strategies that could be employed that would help him?
Helpful suggestions for activities that parents can undertake at home to support their children are given in the book “Getting the Best from CAT” by Dr Steve Strand (September 2003). The book is available, price £35.00, from nferNelson. Call 0845 602 1937 for further details.
13. If I read out the CAT Verbal Battery to pupils, will this invalidate it?
You can and should read out all the instructions, and the demonstration and practice questions, explaining these in detail and using community languages if appropriate. However, if you read out the test questions in the Verbal Battery then you cannot use the standard age scores, because this was not how the test was administered to the standardisation sample.
My advice, if you have concerns about a pupil who may find the Verbal Battery too challenging, would be to assess the pupil in two stages.
First, administer all three CAT batteries in the standard fashion. If a pupil has specific language difficulties, then this should be revealed in the profile of their scores across the three batteries. We might expect the Verbal score to be significantly lower than either the Quantitative or Non-verbal score. Significant in this context means a difference of 10 or more standard age score points. If this is the case, then you have independent evidence to confirm the pupil’s difficulties in reasoning with words, but not with numbers or spatial concepts. The book "Getting the Best from CAT" (see references) considers detailed action with the pupil that might follow in terms of teaching and learning.
Second, to isolate a specific problem with reading, you could then read some of the verbal questions to the pupil. If their response to these questions is markedly better than on the first occasion, this might isolate reading as the key issue. This would suggest that the pupil's 'true' reasoning score is obscured by their reading difficulties. If this is the case, then you should base any target setting or similar activity on the mean of the Quantitative and Non-verbal scores alone, ignoring the Verbal score. Of course, it is absolutely vital that the resources are found to address the pupil's reading difficulty, otherwise the pupil's potential is unlikely to be fulfilled.
Questions about the Indicators
14. What indicators are available?
Indicated outcomes are currently available for the following:
End of Key Stage 2 national tests in English, mathematics and science;
End of Key Stage 3 national tests in English, mathematics and science;
GCSE public examinations in 30 subjects;
Scottish standard grade examinations in 24 subjects.
In addition AS level indicators will be produced in September 2004 and AS/A2 indicators in September 2005.
15. Can I use CAT for target-setting with individual pupils?
There is a strong correlation between pupils’ scores on CAT and their subsequent performance in national tests at the end of Key Stage 3 (KS3) and in public examinations at age 16. This does not imply a deterministic relationship between CAT scores and KS3/GCSE results for individual students. Students with similar CAT scores can achieve a wide range of GCSE outcomes. Clearly a whole range of factors, such as the pupil’s motivation, behaviour and effort, the extent of parental support, the quality of teaching and learning in the school etc., impact on pupils’ level of success in subsequent examinations. However, the CAT scores are helpful to teachers in providing a forecast of potential KS3 or GCSE outcomes. The teacher can then use the CAT scores as one piece of evidence, alongside everything else they know about the student, when considering targets for future attainment.
16. Do the indicators mean that a pupil will achieve the predicted grades?
The indicators are not precise; they indicate the outcomes expected for students with a particular CAT score making average progress in the typical secondary school. They come with a margin of error, which reflects the differences in progress that may be made by different pupils in different schools or circumstances. The subject indicators come with a margin of error of at least plus or minus one grade, as illustrated through the progress graphs and tables. For example, a student may have an indicated outcome of 'D' in a particular GCSE subject, but this may reflect a 5% chance of an A, 10% chance of a B, 20% chance of a C, 30% chance of a D, 20% chance of an E, 10% chance of an F and 5% chance of a G or below. As the ‘progress tables’ and ‘chances graphs’ show, the indicators are a good starting point for considering targets but should only be interpreted in the context of their confidence intervals. Further details on interpreting the indicators are given in “Getting the Best from CAT” (see further information).
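The 'D' example can be unpacked with a few lines of arithmetic. The sketch below uses only the percentages quoted in the example; the variable names are ours, and real chances for any pupil should be read from the published progress tables or chances graphs.

```python
# Chances (in percent) from the worked example for an indicated grade of 'D'.
chances = {"A": 5, "B": 10, "C": 20, "D": 30, "E": 20, "F": 10, "G or below": 5}

# The chances cover every outcome, so they must add up to 100%.
total = sum(chances.values())

# The indicated grade is simply the single most likely outcome.
indicated_grade = max(chances, key=chances.get)

# Chance of grade C or better, and of landing within one grade of the indicator.
chance_c_or_above = chances["A"] + chances["B"] + chances["C"]
chance_within_one_grade = chances["C"] + chances["D"] + chances["E"]

print(total)                    # 100
print(indicated_grade)          # D
print(chance_c_or_above)        # 35
print(chance_within_one_grade)  # 70
```

In this example the indicated grade itself is achieved by fewer than one student in three, and about one in three achieves a C or better, which is why the indicator is a starting point for a target rather than a prediction.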
17. How is the 5+ GCSE A*-C grades indicator calculated?
The GCSE indicators are derived from an analysis of the progress made by large and nationally representative samples of pupils between CAT tests at age 11 and GCSE examinations at age 16. An example of the actual data is shown below. Figure 1 shows the proportion of pupils achieving 5+A*-C grades for each mean CAT score. We can see that the higher the mean CAT score the greater the proportion of pupils achieving 5+A*-C grades. For example only just over 10% of pupils with a mean CAT score of 85 achieve 5+A*-C grades. In contrast around 95% of pupils with a mean CAT score of 115 achieve 5+A*-C grades.
We can use a statistical technique termed ‘ordered logistic regression’ to smooth and more accurately chart the relationship between mean CAT scores and 5+A*-C grades. The result of this analysis is shown in Figure 2. This shows the probability that a pupil will achieve 5+A*-C grades, given their particular CAT score. For each pupil we can therefore indicate the probability that they will achieve 5+A*-C grades, ranging from less than 1% up to greater than 99%.
The school level indicator is derived by taking the average of these pupil level indicators. For example, consider a (hypothetical) school where the year group had only three pupils. If the mean CAT scores for these three pupils were 85, 100 and 115 respectively, then the school level indicator for 5+A*-C would be: (10% + 60% + 95%) / 3 = 55%.
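The three-pupil calculation can be reproduced directly. This is a minimal sketch of the averaging step only; the function name is hypothetical, and the pupil-level probabilities are the ones quoted in the example rather than values computed from the regression.

```python
def school_level_indicator(pupil_probabilities):
    """Average the pupil-level probabilities (in percent) of achieving
    5+ A*-C grades to give the school-level indicator."""
    return sum(pupil_probabilities) / len(pupil_probabilities)


# Probabilities for pupils with mean CAT scores of 85, 100 and 115.
pupil_probabilities = [10, 60, 95]
indicator = school_level_indicator(pupil_probabilities)
print(indicator)  # 55.0
```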
18. At what CAT score should I expect pupils to achieve 5+ GCSE A*-C grades?
From the 2003 GCSE Indicators, we can see that pupils with a mean CAT3 score of 99 or above have a greater than 50% probability of achieving 5+ GCSE A*-C grades, i.e. they are more likely than not to achieve 5+ A*-C grades. Looked at in this way, the 5+ A*-C threshold is a mean CAT score of 99. However, you should recognise that this is a rough rule of thumb only. Some pupils with a mean CAT3 score above 99 will not actually achieve 5+A*-C grades, while some pupils with a score below 99 will achieve this level of success.
19. Why is the 5+ A*-C indicator for my school higher/lower than I expected?
The GCSE indicators are derived from an analysis of the progress of all pupils in the national sample, without regard to school membership or any other factors. The GCSE indicators are therefore the outcomes expected for a ‘typical’ school. However, there is considerable variation in GCSE outcomes across schools. In some schools pupils obtain markedly better results than pupils with the same CAT score in other schools.
As an example, we can consider the school variation in the mean CAT score above which pupils are more likely than not to achieve 5+A*-C passes (i.e. where the probability of achieving 5+A*-C grades becomes greater than 50%). Less successful schools have high thresholds; in a school at the lower quartile, pupils need to have a mean CAT score of 102 before they are likely to achieve 5+A*-C grades. More successful schools have lower thresholds; in a school at the upper quartile, pupils with a mean CAT3 score of 97 are likely to achieve 5+A*-C passes.
Thus if the CAT indicator looks quite challenging, you may have to consider whether your school is adding as much value as it might. On the other hand, if the CAT indicator does not look challenging, you may be adding a lot of value already, and need to consider aiming for the Upper Quartile or a more challenging target.
You should remember that the whole school 5+A*-C indicator needs to be interpreted with a high degree of caution. There is substantial aggregation involved in calculating the 5+A*-C figure for a school, involving (i) summarising the eight GCSE grades to a simple pass/fail at the C/D border for each subject; (ii) summarising across all the subjects taken by a pupil, again to a simple pass/fail, so that four Cs represent a fail but five represent a pass; and then (iii) averaging across all the pupils in the year group. For this reason there will be a wide error margin associated with the indicator. It is therefore advisable to consider indicators and targets at the individual pupil level.
20. My KS3 indicators show that 10% of pupils are indicated to obtain level 3 or below in English. However, no individual pupil has an indicated level 3 for English. How is this explained?
In the KS3 English test, level 3 is a very infrequent outcome. In 2002, 3% of pupils nationally were graded level 3, 4% were graded N (failed to register a level) and 3% were graded B (teacher assessment only). Taken together, these outcomes represent only 10% of all 14 year olds; nationally 90% of pupils aged 14 years achieve level 4 or above. Therefore, even for pupils with very low CAT scores, the most likely outcome is usually level 4. Only pupils with VR scores of 73 or below have an indicated Level 3 for English.
However, at the group level, we are dealing with a different question. The focus here is not on indicating the most likely outcome for an individual pupil. Rather it is to ask what the distribution of levels may be across a large group of pupils. We can illustrate this with an extreme example. Suppose a school has 10 pupils, all of whom achieved a CAT score of 80. All the individual pupils in the group would have a KS3 English indicated level 4B, because Level 4 is the most frequent outcome achieved by pupils with this VR score. However, across the group as a whole, we might expect three pupils (approximately 30% of the group) to achieve level 3 or below, five pupils (approximately 50% of the group) to achieve level 4, and two pupils (approximately 20% of the group) to achieve level 5 or above.
The above is an extreme example that is unlikely to happen in practice. The important activity for the school is to translate the indicators or forecasts derived from CAT into actual targets for students’ attainment. The distinction between an indicator/forecast and a target is important. Targets should equal forecast plus an element of challenge. In this context it may be inappropriate to set level 3 in English as a target for any mainstream 14-year-old student. However, at the same time, it is important to recognise that a certain (small) proportion of students do achieve level 3 as an English outcome. It is the role of the teacher, using the wider knowledge of the pupil that s/he has available, to decide which level represents an appropriately challenging target for any individual student.
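The difference between the individual and the group view in the ten-pupil example can be shown in a few lines. The sketch uses the approximate percentages quoted in that example; the variable names are ours and the figures are illustrative, not taken from the published indicator tables.

```python
# Approximate chances (in percent) for a pupil with a score of 80,
# as quoted in the ten-pupil example.
distribution = {"level 3 or below": 30, "level 4": 50, "level 5 or above": 20}
group_size = 10

# Individual view: every pupil gets the single most likely outcome.
individual_indicator = max(distribution, key=distribution.get)

# Group view: apply the same chances to the whole group of pupils.
expected_counts = {
    outcome: group_size * percent / 100
    for outcome, percent in distribution.items()
}

print(individual_indicator)  # level 4
print(expected_counts)
# {'level 3 or below': 3.0, 'level 4': 5.0, 'level 5 or above': 2.0}
```

No individual pupil is indicated at level 3, yet three of the ten are expected to finish at level 3 or below, which is exactly the apparent contradiction raised in the question.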
21. Why is there no longer a ‘Number of A*-C grades’ indicator, like there used to be with CAT2?
The old ‘number of A*-C grades’ indicator was calculated as a separate regression equation when indicators were only available for six individual GCSE subjects. Now that we have indicators for 30 GCSE subjects, it is more robust to look directly at the indicators for the subjects an individual pupil will take, and to estimate the number of A*-C grades directly from those.
22. Why are the GCSE indicators for ‘Art and Design’ higher than other subjects?
There are really two underlying reasons for the comparatively high indicated grades in GCSE Art and Design.
First, grades achieved in Art & Design do tend to be higher than in other GCSE subjects. Thus in summer 2000, 65% of Art & Design entries were graded A*-C, beaten only by drama (69%) and music (69%). Some comparable figures in other subjects are mathematics (50%), modern foreign languages (50%), science (51%), D&T (51%), Business Studies (54%) and Geography (58%) etc.
Second, the correlation between CAT scores and Art & Design grades is lower than for other more academic subjects. For example, the correlations between mean CAT scores and English, maths and science GCSE grades are .67, .75 and .69 respectively. In contrast, the correlation with Art & Design is only .45. This means that some students with relatively low CAT scores can and do achieve quite high Art & Design grades. Thus for pupils with the lowest CAT scores the most typical outcome in English or maths may be an F or even a G, but in Art & Design it is a GCSE grade E. This is basically saying that while general cognitive ability does play a part in success in Art & Design, other specific factors are also important, e.g. aptitude for Art. (Obviously motivation, effort and the quality of teaching are also incredibly important influences.) We see the same kind of pattern for the GCSE indicators for Physical Education (PE) and Creative Arts.
You may ask why we include Art & Design indicators if the correlation with CAT is relatively low. We do this because although the correlation is not as strong as for other subjects, it is still highly statistically significant. We also produce them because Art, PE and drama teachers ask for the indicators to give them a starting point to think about targets. They realise they may have to use a lot of their own assessment and judgement to determine a realistic target, but at least they have a starting point.