Does the Z-score method need improvement?
August 12, 2012, 7:33 pm
The Island
 

The Z-score method has failed twice in ranking 2011 G.C.E. Advanced Level (A/L) performances. Is this not an eye-opener to the fact that the Z-score method has some grave drawbacks?
 
       Z-score Method
 
Since the year 2000, the Z-score has been used to rank G.C.E. (A/L) performances for university admissions. It is considered a better scaling method than the previously used aggregate marks for comparing student performance across different subject combinations. However, the Z-score method has come under an avalanche of public criticism since its inception. This criticism stems from a lack of understanding of the method and a lack of transparency about it. To an A/L student, the Z-score looks like a magic black box; that should not be the case.
 
A student who sits the G.C.E. (A/L) receives, as the results of the examination, a grade for each subject and the average Z-score for the three subjects sat. Unfortunately, there need not be a strong positive correlation between the grades and the average Z-scores across different subject combinations. For instance, a student with three Bs in one subject combination might get a better Z-score than a student with three As in another. Thus, innocent students are confused by two sets of seemingly unrelated results. This was not the case when aggregate marks were used as the tool for ranking, as there was a linear relationship and a strong positive correlation between the grades and the aggregate marks. As long as raw marks are not used for ranking students, grades based on raw marks make no sense. Moreover, these grades cause unnecessary confusion.
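
To see how this can happen, consider a minimal sketch in Python. All the numbers are hypothetical, not actual A/L data: the subject means and standard deviations are invented, and the grade bands (A for 75 and above, B for 65 to 74) are assumed only for illustration.

```python
# Hypothetical (mean, standard deviation) of raw marks for each of three subjects.
# These figures are invented for illustration; they are not real A/L statistics.
hard_combination = [(40.0, 12.0)] * 3   # subjects where most candidates score low
easy_combination = [(70.0, 10.0)] * 3   # subjects where most candidates score high


def average_z(marks, subject_stats):
    """Average of the per-subject Z-scores, (mark - mean) / standard deviation."""
    zs = [(m - mu) / sd for m, (mu, sd) in zip(marks, subject_stats)]
    return sum(zs) / len(zs)


# Assumed grade bands: 65-74 is a B, 75 and above is an A.
student_b = average_z([68, 68, 68], hard_combination)  # three Bs
student_a = average_z([78, 78, 78], easy_combination)  # three As

print("three Bs:", round(student_b, 2))
print("three As:", round(student_a, 2))
```

Under these assumed figures the student with three Bs averages about 2.33, well above the 0.8 of the student with three As, because the Z-score measures distance from the subject mean and not the raw mark itself.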
 
 The Department of Examinations could consider one of the  following suggestions as the means to allay unnecessary confusion:
 
 1. It would be better to release the Z-scores for each subject  rather than grades.
 
 2. Otherwise, the grades for the subjects should be based on Z-scores rather than raw marks. For instance, for a particular subject, the grade ‘A’ could be given to a student who gets a Z-score of 1.0 or above in that subject.
 
 There is no perfect scaling method 
available and Z-score is a  widely accepted one. However, there are some
 drawbacks in this method.  Therefore, further research is needed to 
find a better scaling method. Let us  examine this in detail.
 
To calculate the Z-score, we do not need to assume any particular probability distribution for the raw marks of a subject. The formula is Z-score = (raw marks – measure of location)/measure of dispersion, where the mean and the standard deviation are used as the measures of location and dispersion respectively. The mean is a good measure of location and the standard deviation is a good measure of dispersion for unimodal symmetric distributions. However, for non-symmetric distributions the mean is no longer a good measure of location, and the standard deviation is not a good measure of dispersion either. Therefore, we have to be careful in using the Z-score for scaling when the raw marks follow a non-symmetric distribution.
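
The formula can be sketched in a few lines of Python. The marks below are invented for illustration, and the use of the population standard deviation (`pstdev`) is an assumption, as the exact convention is not stated here.

```python
from statistics import mean, median, pstdev


def z_scores(raw_marks):
    """Z-score = (raw mark - mean) / standard deviation, for every candidate."""
    mu = mean(raw_marks)
    sigma = pstdev(raw_marks)  # population standard deviation: an assumed convention
    return [(x - mu) / sigma for x in raw_marks]


symmetric = [30, 40, 50, 60, 70]        # hypothetical symmetric marks
skewed = [20, 22, 25, 28, 30, 35, 95]   # hypothetical marks with one very high score

for name, marks in (("symmetric", symmetric), ("skewed", skewed)):
    negative = sum(z < 0 for z in z_scores(marks))
    print(name, "| mean:", round(mean(marks), 1), "| median:", median(marks),
          "| candidates with negative Z-score:", negative)
```

In the skewed set a single high mark pulls the mean (about 36.4) well above the median (28), so six of the seven candidates receive a negative Z-score. This is the sense in which the mean stops being a good measure of location.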
 
For a particular subject’s marks to have a unimodal symmetric distribution, the entire country has to be considered a homogeneous population. Otherwise, the distribution may well be multimodal and non-symmetric. Yet we still have district quotas for university entrance, which means we believe that the districts are not all of the same standard. If so, how can we treat the countrywide examination marks of a subject as coming from a homogeneous population?
 
   2011 A/L and Z-score
 
In the year 2011, two different G.C.E. (A/L) examinations were conducted, one for the old syllabus and one for the new. While repeat candidates sat the old syllabus examination, fresh candidates sat the new one. Consequently, for a particular subject, the Department of Examinations (DoE) had two different sets of marks. Thus, when it had to calculate Z-scores in order to rank candidates of both groups together and find a common cut-off point for university admissions, the DoE found itself in a dilemma.
 
At first, the means and variances of the marks from the two examinations were pooled to calculate the Z-score for a particular subject. However, Prof. R.O. Thattil, the person who introduced the Z-score as a tool for ranking A/L students in Sri Lanka, strongly opposed this pooling method. Later, the Supreme Court’s verdict also confirmed that pooling is not an appropriate method.
 
Therefore, it is clear that if the DoE wants to use the Z-score as a scaling method, it should not pool the means and variances of different examinations. If the DoE feels it appropriate to pool the means and variances of different examinations, it should use some other scaling method (not the Z-score) for ranking purposes.
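
The difference between the two approaches can be illustrated with a rough sketch in Python. The marks are invented, and the pooling rule used here (a size-weighted mean and a size-weighted average of the two variances) is only one plausible reading of "pooling the means and variances"; it is an assumption, not the formula actually used by the DoE.

```python
from statistics import mean, pvariance

# Hypothetical raw marks for one subject; not real examination data.
old_syllabus = [35, 45, 55, 65]
new_syllabus = [50, 60, 70, 80, 90]


def separate_z(mark, group):
    """Z-score computed only against the candidate's own examination."""
    return (mark - mean(group)) / pvariance(group) ** 0.5


def pooled_z(mark, group_a, group_b):
    """Z-score against a pooled mean and variance (assumed pooling rule)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_mean = (n_a * mean(group_a) + n_b * mean(group_b)) / (n_a + n_b)
    pooled_var = (n_a * pvariance(group_a) + n_b * pvariance(group_b)) / (n_a + n_b)
    return (mark - pooled_mean) / pooled_var ** 0.5


# One old-syllabus candidate with 65 marks, one new-syllabus candidate with 80.
print("separate:", round(separate_z(65, old_syllabus), 2),
      round(separate_z(80, new_syllabus), 2))
print("pooled:  ", round(pooled_z(65, old_syllabus, new_syllabus), 2),
      round(pooled_z(80, old_syllabus, new_syllabus), 2))
```

With these figures the old-syllabus candidate ranks higher when the Z-scores are calculated separately, but lower when they are pooled. The choice between the two methods can therefore reverse the order of candidates from different examinations.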
 
However, it is interesting to note that the Z-score calculations remained controversial even when they were done separately following the court’s verdict. There seems to be clear evidence that repeat students were adversely affected by this method. In the recent past, on average 58% of medical seats were filled by repeat candidates. But as per the 2011 separate Z-score results, only 26% (less than half of the past average) of medical seats are being filled by repeat candidates. Repeat students have been affected in the engineering and management streams as well. This shows that the separate Z-score is not a good scaling method either. Note, however, that pooling is not a solution to this problem.
 
Why are the repeat bio science students so heavily affected in the new (separate) Z-score results? Since the historical data show that the majority of medical seats were filled by repeat candidates, there could be two groups among bio science repeat students: one group wants to enter medical education, while the other merely wants to gain the A/L qualification. Thus, there is a high possibility that the marks of the bio science repeaters follow a bimodal distribution. Such a distribution would not be symmetric, and the Z-score method fails as a ranking method.
 
   Median Centered Score
 
For non-symmetric distributions, the median (the 50th percentile) is a better measure of location than the mean, and the Inter Quartile Deviation (IQD) is a better measure of dispersion than the standard deviation. The Inter Quartile Deviation is half of the difference between the 75th and 25th percentiles.
 
We could define a new scaling method, the Median Centered Score (MCS), by MCS = (raw marks – median marks)/IQD of the marks. The MCS is not sensitive to extreme values, as the median and the IQD are less sensitive to extreme values than the mean and the standard deviation respectively. However, the MCS is yet to be validated using real-world data. Moreover, further research is needed to develop a scaling method for non-symmetric distributions.
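
As a minimal sketch of how the MCS might be computed, the following Python code applies the definition to invented marks. The quartile convention (the default "exclusive" method of `statistics.quantiles`, available from Python 3.8) is an assumption; other conventions would give slightly different quartiles.

```python
from statistics import median, quantiles


def median_centered_scores(raw_marks):
    """MCS = (raw mark - median) / IQD, where IQD = (75th - 25th percentile) / 2."""
    q1, _, q3 = quantiles(raw_marks, n=4)  # quartile convention is an assumption
    iqd = (q3 - q1) / 2
    med = median(raw_marks)
    return [(x - med) / iqd for x in raw_marks]


# Hypothetical marks that differ only in the top candidate's score.
marks_a = [20, 22, 25, 28, 30, 35, 95]
marks_b = [20, 22, 25, 28, 30, 35, 60]

scores_a = median_centered_scores(marks_a)
scores_b = median_centered_scores(marks_b)

# The scores of the other six candidates do not move when the extreme mark changes.
print(scores_a[:6] == scores_b[:6])
```

In this example, changing the highest mark from 95 to 60 leaves the median and both quartiles untouched, so the scores of the remaining candidates are identical in the two sets. A Z-score based on the mean and standard deviation would shift for every candidate.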
 
   Dr. S. Arivalzahan
 
  Department of Mathematics and Statistics, University of Jaffna.