Does the Z-score method need improvement?
August 12, 2012, 7:33 pm
The Island
The Z-score method has failed twice in ranking 2011 G.C.E. Advanced
Level (A/L) performances. Is this not an eye-opener showing that the
Z-score method has some grave drawbacks?
Z-score Method
Since the year 2000, the Z-score has been used to rank G.C.E. (A/L) performances
for university admissions. It is considered a better scaling method
than the previous use of the aggregated marks for comparing student
performance in different subject combinations. However, the Z-score
method has come under an avalanche of public criticism since its
inception. This criticism stems from a lack of understanding of the
method and a lack of transparency about it. To an A/L student, the
Z-score looks like a magic black box; this should not be the case.
A student who sits the G.C.E. (A/L) examination receives two sets of
results: a grade for each subject and the average Z-score over the
three subjects sat. Unfortunately, there need not be a strong positive
correlation between the grades and the average Z-scores across
different subject combinations. For instance, a student with 3 Bs in
one subject combination might get a better Z-score than a student with
3 As in another subject combination. Thus, students are understandably
confused by two sets of seemingly unrelated results. This was not the
case when aggregate marks were used as the tool for ranking, as there
was a linear relationship and a strong positive correlation between
the grades and the aggregate marks. As long as raw marks are not used
for ranking students, grades based on raw marks make little sense and
only cause unnecessary confusion.
The Department of Examinations could consider one of the following suggestions as the means to allay unnecessary confusion:
1. It would be better to release the Z-scores for each subject rather than grades.
2. Otherwise, the grades of the subjects should be based on Z-scores rather than raw marks. For instance, for a particular subject, the Grade ‘A’ can be given to a student who gets a Z-score of 1.0 or above in the subject.
There is no perfect scaling method
available and Z-score is a widely accepted one. However, there are some
drawbacks in this method. Therefore, further research is needed to
find a better scaling method. Let us examine this in detail.
For the calculation of the Z-score, we do not need to assume any
particular probability distribution of the raw marks of a particular
subject. The formula for the Z-score is Z-score = (raw marks – measure
of location)/measure of dispersion, where the mean and the standard
deviation are used as the measures of location and dispersion
respectively. The mean is a good measure of location and the standard
deviation is a good measure of dispersion for unimodal symmetric
distributions. However, for non-symmetric distributions the mean is no
longer a good measure of location, and the standard deviation is not a
good measure of dispersion either. Therefore, we have to be careful in
using the Z-score for scaling when the raw marks follow a
non-symmetric distribution.
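The formula above can be sketched in a few lines of Python. This is only an illustration: the marks are invented, the function name z_scores is hypothetical, and the use of the population standard deviation is an assumption, since the article does not say which form the Department of Examinations uses.

```python
from statistics import mean, pstdev


def z_scores(marks):
    """Return (mark - mean) / standard deviation for every mark."""
    mu = mean(marks)
    sigma = pstdev(marks)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all marks are identical; the Z-score is undefined")
    return [(m - mu) / sigma for m in marks]


# Hypothetical marks, invented purely for illustration.
symmetric = [30, 40, 50, 60, 70]
skewed = [30, 32, 35, 38, 40, 95]  # one extreme mark pulls the mean upwards

print(z_scores(symmetric))
print(z_scores(skewed))
```

With these invented marks, the mean of the skewed list is 45, which is higher than five of the six marks, so five of the six candidates receive a negative Z-score.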
In order to have a unimodal symmetric distribution for a particular
subject’s marks, the entire country has to be considered a homogeneous
population. Otherwise, there is a possibility of having a multimodal
non-symmetric distribution. Yet we still have district quotas for
university entrance, which implies a belief that not all districts are
of the same standard. If so, how can we treat the countrywide
examination marks of a subject as coming from a homogeneous population?
2011 A/L and Z-score
In
the year 2011, two different G.C.E. (A/L) examinations were conducted
for old and new syllabuses. While the repeat candidates sat the old
syllabus examination, fresh candidates sat the new one. Consequently,
for a particular subject, the Department of Examinations (DoE) had two
different sets of marks for the old and new syllabuses. Thus, when the
need arose to calculate Z-scores in order to rank the candidates of
both groups and find a common cut-off point for university admissions,
the DoE found itself in a dilemma.
Initially, the means and variances of the two sets of examination
marks were pooled for the calculation of the Z-score of a particular
subject. However, Prof. R.O. Thattil, the person who introduced the
Z-score as a tool for ranking A/L students in Sri Lanka, strongly
opposed this pooling method. Later, the Supreme Court’s verdict also
confirmed that pooling is not an appropriate method.
Therefore, it is clear that if the DoE wants to use the Z-score as a
scaling method, it should not pool the means and variances of the
different examinations. If the DoE feels it appropriate to pool the
means and variances of the different examinations, it should use some
other scaling method (not the Z-score) for ranking purposes.
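The difference between separate and pooled calculations can be illustrated with a small Python sketch. The article does not give the pooling formula that was actually used, so the sketch assumes one simple reading: a single mean and standard deviation computed from the combined marks of both examinations. All marks and names here are hypothetical.

```python
from statistics import mean, pstdev


def standardise(marks, mu, sigma):
    """Scale marks with a given measure of location and dispersion."""
    return [(m - mu) / sigma for m in marks]


# Hypothetical marks for one subject, invented for illustration.
old_syllabus = [40, 50, 60]
new_syllabus = [60, 70, 80]

# Separate calculation: each examination uses its own mean and deviation.
sep_old = standardise(old_syllabus, mean(old_syllabus), pstdev(old_syllabus))
sep_new = standardise(new_syllabus, mean(new_syllabus), pstdev(new_syllabus))

# Pooled calculation (assumed reading): one mean and one deviation
# computed from the combined marks of both examinations.
combined = old_syllabus + new_syllabus
pooled_mu, pooled_sigma = mean(combined), pstdev(combined)
pool_old = standardise(old_syllabus, pooled_mu, pooled_sigma)
pool_new = standardise(new_syllabus, pooled_mu, pooled_sigma)

print("separate:", sep_old, sep_new)
print("pooled:  ", pool_old, pool_new)
```

In this invented example, the top candidate of each examination receives the same Z-score under the separate calculation, whereas under the pooled reading the top old-syllabus candidate sits exactly at the pooled mean and receives a Z-score of zero. The two approaches can therefore rank the same candidates very differently.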
However, it is interesting to note that the Z-scores remained
controversial even when they were calculated separately following the
court’s verdict. There seems to be clear evidence that the repeat
students were affected by this method. In the recent past, on average
58% of the medical seats were filled by repeat candidates. But as per
the 2011 separate Z-score results, only 26% (less than half of the
past average) of the medical seats are being filled by repeat
candidates. Repeat students have been affected in the engineering and
management streams as well. This shows that the separate Z-score is
also not a good scaling method. However, note that pooling is not a
solution to this problem.
Why are the repeat bio science students so heavily affected in the new
(separate) Z-score results? Since the historical data show that the
majority of the medical seats were filled by repeat candidates, there
could be two groups among bio science repeat students. One group wants
to receive a medical education while the other merely wants to gain
the A/L qualification. Thus, there is a high possibility that the marks
of the bio science repeaters follow a bimodal distribution. In that
case the distribution would not be unimodal and symmetric, and the
Z-score method fails as a ranking method.
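A toy Python example shows what a two-group population does to the mean. The marks below are invented and the group sizes are an assumption made only for illustration; they are not taken from any examination data.

```python
from statistics import mean, median

# Hypothetical marks of repeat candidates, invented for illustration.
qualification_group = [20, 25, 25, 30, 30, 35]  # aiming only to pass
medical_group = [70, 75, 80]                    # aiming for medicine

marks = qualification_group + medical_group

print("mean:  ", mean(marks))
print("median:", median(marks))
```

Here the mean falls in the empty gap between the two groups, at a mark that no candidate obtained or came close to, while the median stays inside the larger group. A measure of location that describes nobody in the population is a weak basis for scaling.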
Median Centered Score
For non-symmetric distributions, the median (which is the 50th
percentile) is a better measure of location than the mean, and the
Inter Quartile Deviation (IQD) is a better measure of dispersion than
the standard deviation. The Inter Quartile Deviation is half of the
difference between the 75th and 25th percentiles.
We could define a new scaling method, the Median Centered Score (MCS),
by MCS = (raw marks – median marks)/IQD of the marks. The MCS is less
sensitive to extreme values than the Z-score, as the median and the
IQD are less sensitive to extreme values than the mean and the
standard deviation respectively. However, the MCS is yet to be
validated using real-world data. Moreover, further research is needed
in developing a scaling method for non-symmetric distributions.
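A minimal Python sketch of the MCS follows. It is not a validated implementation: the function name is hypothetical, the marks are invented, and the choice of the "inclusive" quartile rule in statistics.quantiles is an assumption, since there are several accepted ways of computing the 25th and 75th percentiles.

```python
from statistics import median, quantiles


def median_centered_scores(marks):
    """Return (mark - median) / IQD, where IQD = (Q3 - Q1) / 2."""
    # Quartile rule is an assumption; other rules give slightly different values.
    q1, _, q3 = quantiles(marks, n=4, method="inclusive")
    iqd = (q3 - q1) / 2
    if iqd == 0:
        raise ValueError("IQD is zero; the MCS is undefined for these marks")
    med = median(marks)
    return [(m - med) / iqd for m in marks]


# Hypothetical marks, invented for illustration; only the top mark differs.
marks_a = [30, 32, 35, 38, 40, 95]
marks_b = [30, 32, 35, 38, 40, 950]

print(median_centered_scores(marks_a))
print(median_centered_scores(marks_b))
```

In this example, making the top mark ten times larger leaves the scores of the other five candidates unchanged, because neither the median nor the quartiles move. Under the Z-score, the same change would shift the mean and inflate the standard deviation, altering every candidate's score.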
Dr. S. Arivalzahan
Department of Mathematics and Statistics, University of Jaffna.