Monday, August 13, 2012

Does the Z-score method need improvement?

The Island

The Z-score method has failed twice in ranking the 2011 G.C.E. Advanced Level (A/L) performances. Is this not an eye-opener, showing that the Z-score method has some grave drawbacks?

Z-score Method

Since the year 2000, the Z-score has been used to rank G.C.E. (A/L) performances for university admissions. It is considered a better scaling method than the aggregate marks used previously for comparing student performance across different subject combinations. However, the Z-score method has come under an avalanche of public criticism since its inception. The reasons for this criticism stem from a lack of understanding of the method and a lack of transparency about it. To an A/L student, the Z-score looks like a magic black box; this should not be the case.

A student who sits the G.C.E. (A/L) receives, as the results of the examination, a grade for each subject and the average Z-score for the three subjects sat. Unfortunately, there need not be a strong positive correlation between the grades and the average Z-scores across different subject combinations. For instance, a student with 3 ‘B’s in one subject combination might get a better Z-score than a student with 3 ‘A’s in another subject combination. Thus, innocent students are confused by two sets of seemingly unrelated results. This was not the case when aggregate marks were used as the tool for ranking, as there was a linear relationship and a strong positive correlation between the grades and the aggregate marks. As long as the raw marks are not used for ranking students, grades based on raw marks make little sense. Moreover, these grades cause unnecessary confusion.
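
The mismatch between raw marks and Z-scores described above can be illustrated with a minimal sketch. The two cohorts and their marks are invented for illustration, and the population standard deviation is assumed as the measure of dispersion.

```python
from statistics import mean, pstdev

def z_score(mark, cohort_marks):
    """Z-score of one mark relative to everyone who sat the same subject."""
    return (mark - mean(cohort_marks)) / pstdev(cohort_marks)

# Hypothetical cohorts: subject X yields high marks, subject Y low marks.
subject_x = [60, 70, 80, 90, 100]  # mean 80
subject_y = [20, 30, 40, 50, 60]   # mean 40

z_x = z_score(80, subject_x)  # high raw mark, but only average within its cohort
z_y = z_score(60, subject_y)  # lower raw mark, but the top of its cohort

print(z_x, z_y)
```

Here the student with the lower raw mark (and therefore possibly the lower grade) ends up with the higher Z-score, because the Z-score measures position within a cohort rather than the mark itself.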

The Department of Examinations could consider one of the following suggestions as the means to allay unnecessary confusion:

1. It would be better to release the Z-scores for each subject rather than grades.

2. Otherwise, the grades for the subjects should be based on Z-scores rather than raw marks. For instance, for a particular subject, the Grade ‘A’ could be given to a student who gets a Z-score of 1.0 or above in that subject.
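
The second suggestion could be sketched roughly as follows. Only the ‘A’ cutoff (a Z-score of 1.0 or above) comes from the text; the remaining cutoffs, the grade letters below ‘A’, and the function name are hypothetical placeholders.

```python
# Only the 'A' cutoff (Z >= 1.0) is taken from the article.
# Every other band below is a made-up placeholder for illustration.
HYPOTHETICAL_BANDS = [(1.0, "A"), (0.5, "B"), (0.0, "C"), (-0.5, "S")]

def grade_from_z(z, bands=HYPOTHETICAL_BANDS, lowest_grade="F"):
    """Return the first grade whose cutoff the subject Z-score reaches."""
    for cutoff, grade in bands:  # bands must be sorted from highest cutoff down
        if z >= cutoff:
            return grade
    return lowest_grade

print(grade_from_z(1.2), grade_from_z(0.7), grade_from_z(-1.3))
```

With grades assigned this way, a higher grade in a subject always corresponds to a higher Z-score in that subject, which removes the apparent contradiction between the two sets of results.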

There is no perfect scaling method available and Z-score is a widely accepted one. However, there are some drawbacks in this method. Therefore, further research is needed to find a better scaling method. Let us examine this in detail.

To calculate the Z-score, we do not need to assume any particular probability distribution for the raw marks of a subject. The formula is Z-score = (raw marks – measure of location)/(measure of dispersion), where the mean and the standard deviation are used as the measures of location and dispersion respectively. The mean is a good measure of location and the standard deviation is a good measure of dispersion for unimodal symmetric distributions. However, for non-symmetric distributions the mean is no longer a good measure of location, and the standard deviation is not a good measure of dispersion either. Therefore, we have to be careful in using the Z-score for scaling when the raw marks follow a non-symmetric distribution.
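
As a minimal sketch of the formula above, the following computes Z-scores for a list of marks. The marks are invented, and the population standard deviation is an assumption, since the text does not say whether the population or the sample version is used.

```python
from statistics import mean, median, pstdev

def z_scores(raw_marks):
    """Return (mark - mean) / standard deviation for every mark."""
    mu = mean(raw_marks)
    sigma = pstdev(raw_marks)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all marks are identical; the Z-score is undefined")
    return [(m - mu) / sigma for m in raw_marks]

# Hypothetical symmetric marks: the mean sits in the middle of the data.
symmetric = [35, 50, 50, 65]
print(z_scores(symmetric))

# Hypothetical skewed marks: one extreme value pulls the mean away
# from where most candidates actually are.
skewed = [10, 12, 14, 16, 98]
print(mean(skewed), median(skewed))
```

In the skewed sample the mean is well above four of the five marks, so a Z-score centred on it says little about where a typical candidate stands.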

For the marks of a particular subject to have a unimodal symmetric distribution, the entire country has to be considered a homogeneous population. Otherwise, a multimodal, non-symmetric distribution is possible. Yet we still have district quotas for university entrance, which means we believe that the districts are not all of the same standard. If so, how can we treat the countrywide examination marks of a subject as coming from a homogeneous population?

2011 A/L and Z-score

In the year 2011, two different G.C.E. (A/L) examinations were conducted, one for the old syllabus and one for the new. While the repeat candidates sat the old syllabus examination, fresh candidates sat the new one. Consequently, for a particular subject, the Department of Examinations (DoE) had two different sets of marks, one for each syllabus. Thus, when it needed to calculate Z-scores to rank the candidates of both groups on a single list and find a common cut-off point for university admissions, the DoE found itself in a dilemma.

Initially, the means and variances of the marks of the two examinations were pooled to calculate the Z-score for each subject. However, Prof. R.O. Thattil, the person who introduced the Z-score as a tool for ranking A/L students in Sri Lanka, strongly opposed this pooling method. Later, the Supreme Court’s verdict also established that pooling is not an appropriate method.

Therefore, it is clear that if the DoE wants to use the Z-score as a scaling method, it should not pool the means and variances of different examinations. If the DoE feels it appropriate to pool the means and variances of different examinations, it should use some other scaling method (not the Z-score) for ranking purposes.
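
The difference between pooled and separate calculations can be sketched as below. The article does not give the DoE's pooling formula, so the combined mean and variance of the two groups are assumed here as one common way of pooling; the marks and function names are hypothetical.

```python
from statistics import mean, pvariance

def pooled_mean_variance(group_a, group_b):
    """Mean and variance of the two groups treated as one combined population."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    v_a, v_b = pvariance(group_a), pvariance(group_b)
    n = n_a + n_b
    m = (n_a * m_a + n_b * m_b) / n
    # Within-group variance plus the squared shift of each group mean.
    v = (n_a * (v_a + (m_a - m) ** 2) + n_b * (v_b + (m_b - m) ** 2)) / n
    return m, v

def z(mark, m, v):
    return (mark - m) / v ** 0.5

# Hypothetical marks for the same subject under the two syllabuses.
old_syllabus = [40, 50, 60]
new_syllabus = [60, 70, 80]

mark = 60
z_separate_old = z(mark, mean(old_syllabus), pvariance(old_syllabus))
z_separate_new = z(mark, mean(new_syllabus), pvariance(new_syllabus))
z_pooled = z(mark, *pooled_mean_variance(old_syllabus, new_syllabus))

print(z_separate_old, z_separate_new, z_pooled)
```

The same raw mark of 60 is well above average in one group, well below average in the other, and exactly average under pooling, which shows how strongly the choice between the two approaches can move a candidate's rank.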

However, it is interesting to note that the Z-scores remained controversial even when they were calculated separately following the court’s verdict. There seems to be clear evidence that the repeat students were adversely affected by this method. In the recent past, on average 58% of the medical seats were filled by repeat candidates. But as per the 2011 separate Z-score results, only 26% (less than half of the past average) of the medical seats are being filled by repeat candidates. Repeat students have been affected in the engineering and management streams as well. This shows that the separate Z-score is also not a good scaling method. Note, however, that pooling is not a solution to this problem.

Why are the repeat Bio Science students so heavily affected in the new (separate) Z-score results? Since the historical data show that the majority of the medical seats were filled by repeat candidates, there could be two groups among the Bio Science repeat students. One group wants to receive a medical education, while the other merely wants to gain the A/L qualification. Thus, there is a high possibility that the marks of the Bio Science repeaters follow a bimodal distribution. Such a distribution would not be symmetric, and the Z-score method would then fail as a ranking method.

Median Centered Score

For non-symmetric distributions, the median (which is the 50th percentile) is a better measure of location than the mean, and the Inter Quartile Deviation (IQD) is a better measure of dispersion than the standard deviation. The Inter Quartile Deviation is half of the difference between the 75th and 25th percentiles.

We could define a new scaling method, the Median Centered Score (MCS), by MCS = (raw marks – median marks)/(IQD of the marks). The MCS is not sensitive to extreme values, as the median and the IQD are less sensitive to extreme values than the mean and the standard deviation respectively. However, the MCS is yet to be validated using a real-world data set. Moreover, further research is needed to develop a scaling method for non-symmetric distributions.
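
A minimal sketch of the MCS as defined above follows. The marks are invented, and the quartiles are computed with the "inclusive" method of Python's statistics module, which is an assumption: the text does not specify how the percentiles should be estimated.

```python
from statistics import median, quantiles

def median_centered_scores(raw_marks):
    """Return (mark - median) / IQD, where IQD = (Q3 - Q1) / 2."""
    q1, _, q3 = quantiles(raw_marks, n=4, method="inclusive")
    iqd = (q3 - q1) / 2
    if iqd == 0:
        raise ValueError("IQD is zero; the MCS is undefined")
    med = median(raw_marks)
    return [(m - med) / iqd for m in raw_marks]

# Hypothetical marks, then the same marks with one extreme value.
marks = [10, 20, 30, 40, 50]
with_outlier = [10, 20, 30, 40, 100]

print(median_centered_scores(marks))
print(median_centered_scores(with_outlier))
```

Replacing the top mark with an extreme value leaves the median and the quartiles unchanged in this example, so the scores of the other four candidates do not move; under the Z-score they would all shift.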

Dr. S. Arivalzahan

Department of Mathematics and Statistics, University of Jaffna.

1 comment:

  1. Nice article. There have been doubts about the Z-score system since its inception. There have been several heated discussions about the Z-score method adopted in our faculty. However, for some reason, the persons responsible try to implement the Z-score system without analysing it or listening to its opponents. It is high time academics came to an open discussion and adopted a new method suitable for our country.
