Z Scores, Standard Scores, and Composite Test Scores Explained

Indian J Psychol Med. 2021 Nov; 43(6): 555–557.

Published online 2021 Oct 10. doi:10.1177/02537176211046525

PMCID: PMC8826187

PMID: 35210687

Abstract

Patients may be assessed using a battery of tests where different tests yield scores in different units, where different tests have different minimum and maximum scores, and where higher or lower scores mean different things in different tests. Therefore, a composite test score cannot be obtained by simple addition or averaging of scores in the individual tests. However, if performances in individual tests are converted to Z scores, the Z scores can be added or averaged to yield a composite score that can be interpreted or processed using conventional statistical methods. This article explains in simple ways how Z scores are calculated, what the properties of Z scores are, how Z scores can be interpreted, and how Z scores can be converted into other standard scores.

Keywords: Statistics, Z score, standard score, composite score, T score, stanine score, sten score

In a hypothetical study, I randomize schizophrenia patients to computer-based cognitive remediation (CR) or television viewing (TV) thrice weekly for three months. I administer five cognitive tasks at the study baseline and, again, at the study endpoint. I wish to determine whether CR improves cognitive task scores more than TV does. One way of doing this is to use statistical tests to compare CR and TV groups, task by task; however, there are several problems associated with this approach. For example, performing five separate statistical tests, one for each cognitive task, increases the risk of a Type 1 (false positive) error [1]. Or, patients in one group may perform better in some tasks and worse in other tasks relative to patients in the other group; so, what should the overall conclusion be? Or, patients in one group may perform better than patients in the other group in all tasks without the results reaching statistical significance for any task; again, what should the overall conclusion be?
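
To give a rough sense of how multiple testing inflates the Type 1 error risk, the following worked calculation assumes five independent tests, each at a significance level of 0.05. Both the independence and the 0.05 level are assumptions made only for illustration; scores on cognitive tasks from the same patients are likely to be correlated, in which case the true figure would differ.

```latex
P(\text{at least one false positive})
  = 1 - (1 - \alpha)^{k}
  = 1 - (1 - 0.05)^{5}
  = 1 - 0.7738
  \approx 0.23
```

Here, \alpha is the per-test significance level and k is the number of tests.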

Need for Composite Scores

One way to get an overall perspective is to create a composite score. This is easily done in some circumstances; for example, one may reasonably add or average language, science, math, geography, and history marks to get a single composite score in school examinations. Simple addition or averaging of marks is possible because all subjects are treated equally, all subjects are marked from 0 to 100, and all subjects have higher marks indicating better performance. Simple addition or averaging is not possible for cognitive tasks because different tasks have different everyday importance, because different tasks have different minimum and maximum scores, because in some tasks lower scores indicate better performance, and in other tasks, higher scores indicate better performance, and because some tasks yield scores measured in units of time, others yield scores measured as the number of correct responses, and so on. As examples, verbal memory may be more important than visual memory because of its application to everyday life; tests of processing speed are measured in units of time and lower scores indicate better performance; and tests of memory are measured in the number of units correctly recalled and higher scores indicate better performance. So, simple addition or averaging of scores is not possible.

Computing Z Scores

One solution is to first convert the original (raw) scores for each cognitive task into a new score that is described in the same unit for all tasks. The new scores can then be added or averaged to form a composite score. Conversion of the raw scores into Z scores is one such approach. To do this (for example) for the verbal memory test in the cognitive battery, I would need to perform the following actions with the verbal memory raw scores.

  1. Calculate the mean (M) and standard deviation (SD) of the verbal memory scores for the CR and TV groups combined into a single group; that is, for the pooled sample.

  2. Calculate the Z score for each patient; the formula is Z = (x − M)/SD, where x is the patient’s verbal memory raw score and M and SD are the estimates from the previous step. Positive Z values indicate scores that are greater than the mean of the pooled sample, and negative values indicate scores that are less than the pooled mean [2].

Z scores are similarly calculated for each patient for each of the remaining four cognitive tasks. This is done separately for the baseline and endpoint data.
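
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the raw scores are invented, the function name `pooled_z_scores` is hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, because the formula above does not specify which SD estimate to use.

```python
from statistics import mean, stdev

def pooled_z_scores(cr_scores, tv_scores):
    """Standardize each group's raw scores against the pooled sample."""
    pooled = list(cr_scores) + list(tv_scores)
    m = mean(pooled)    # step 1: mean of the pooled sample
    sd = stdev(pooled)  # step 1: SD of the pooled sample (n - 1 denominator, an assumption)
    # Step 2: Z = (x - M) / SD for every patient
    z_cr = [(x - m) / sd for x in cr_scores]
    z_tv = [(x - m) / sd for x in tv_scores]
    return z_cr, z_tv

# Invented verbal memory raw scores, for illustration only
cr = [20, 22, 17, 25, 19]
tv = [14, 18, 12, 16, 17]

z_cr, z_tv = pooled_z_scores(cr, tv)
for label, zs in (("CR", z_cr), ("TV", z_tv)):
    print(label, [round(z, 2) for z in zs])
```

The same function would be applied to each of the five tasks, once for the baseline data and once for the endpoint data.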

Understanding Z Scores

If we look at the formula for the Z score, we will immediately realize that the Z score tells us how far above or below the mean an individual’s score is, expressed in units of SD. So, if the M(SD) is 18(4) for the verbal memory scores in the pooled sample, a patient with a verbal memory score of 20 has a Z score of (20−18)/4, or 0.5. That is, the patient’s verbal memory score is half an SD above the mean of the sample. Another patient whose raw score is 12 would have a Z score of (12−18)/4, or −1.5; that is, one and a half SDs below the sample mean.

Interpreting and Using the Z Scores

The raw scores were in different units in the different cognitive tasks. Z scores are all in the same unit, that is, SD. The Z score distribution has a mean of 0 and an SD of 1. Z scores are useful because they allow data to be interpreted or used in many ways, as the following examples show:

  1. The Z score tells us at a glance how the patient has performed relative to the rest of the sample, something that is not evident from the inspection of a raw score.

  2. Because we understand the relationship between M and SD in the normal distribution, and because the Z score is an SD unit, we know that Z scores of 2 and above (either positive or negative) are quite far from the mean and that Z scores of 3 and above (either positive or negative) are so far from the mean as to represent outliers. So, an inspection of Z scores can identify outliers in the sample.

  3. Using published tables, such as a table of the area under the normal curve, we can read off the probability of obtaining a Z value at least as extreme as any given value, provided that the scores are approximately normally distributed.

  4. If a patient has a Z score of, for example, 1 for verbal memory and a Z score of −0.3 for processing speed, because the unit of Z is the same for both tasks, we can conclude that this patient performed better in the verbal memory task than in the processing speed task. This conclusion would not have been possible from an inspection of the raw scores.

  5. Z scores can be added to create a composite score. In the context of the study described at the start of this article, the Z scores for the five cognitive tasks can be added for each patient; this creates a composite cognitive (total) score for the patient. There are two noteworthy points.

    1. For tests where lower scores indicate better performance, Z scores should be multiplied by −1 so that when the Z scores are added, the composite score will correctly indicate the direction of change.

    2. Whereas the individual Z scores are in units of SD, the composite score, created by adding the Z scores for the five tests, is no longer in units of SD. However, if the composite score is divided by the number of tests, we get a composite (average) score that is again in units of SD.
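
Several of the uses listed above can be illustrated with a short Python sketch: flagging outliers, reading a tail probability from the normal curve, and building total and average composite scores after reversing the sign for tasks in which lower scores are better. The task names, the Z values, and the choice of which tasks count as "lower is better" are all invented for illustration; the tail probability is computed with the standard library function `math.erfc` rather than read from a printed table, and it assumes a normal distribution.

```python
import math

# Hypothetical set of tasks in which a lower raw score means better performance
LOWER_IS_BETTER = {"processing_speed", "reaction_time"}

def align_direction(z_by_task):
    """Multiply Z by -1 where lower is better, so that higher always means better."""
    return {task: (-z if task in LOWER_IS_BETTER else z)
            for task, z in z_by_task.items()}

def composite_scores(z_by_task):
    """Return the (total, average) composite of direction-aligned Z scores."""
    aligned = align_direction(z_by_task)
    total = sum(aligned.values())
    return total, total / len(aligned)

def flag_outliers(z_by_task, threshold=3.0):
    """List the tasks whose Z score is at least `threshold` SDs from the mean."""
    return [task for task, z in z_by_task.items() if abs(z) >= threshold]

def upper_tail_probability(z):
    """P(Z >= z) under the standard normal curve."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Invented Z scores for one patient on five tasks
patient = {
    "verbal_memory": 1.0,
    "visual_memory": 0.5,
    "attention": -0.5,
    "processing_speed": -1.0,  # faster than average, so better
    "reaction_time": 3.5,      # much slower than average, so worse
}

total, average = composite_scores(patient)
print("total:", total, "average:", average)
print("outliers:", flag_outliers(patient))
print("P(Z >= 2):", round(upper_tail_probability(2.0), 4))
```

The sign reversal is applied before summing, so a strongly positive Z score on a timed task lowers the composite rather than raising it.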

Z Scores and Composite Scores

When Z scores are added or averaged as described above, each cognitive task receives equal weightage. It is possible to create composite scores in which some tasks are given higher weightage than others, based on preset values for weights. For example, it can a priori be decided that, because verbal tasks are more relevant in everyday life than visual tasks, the verbal memory task should receive twice the weight that the visual memory task receives when computing the composite score. Weights can also be determined and assigned through statistical methods [3, 4].
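
A weighted composite of this kind might be computed as follows. This is a sketch under stated assumptions: the function name `weighted_composite`, the task names, the Z values, and the weights are hypothetical, the Z scores are assumed to have already been sign-reversed where lower scores are better, and dividing by the sum of the weights (to give a weighted average) is one reasonable choice rather than the only one.

```python
def weighted_composite(z_by_task, weights):
    """Weighted average of Z scores; each task's weight is set in advance."""
    weighted_sum = sum(weights[task] * z for task, z in z_by_task.items())
    total_weight = sum(weights[task] for task in z_by_task)
    return weighted_sum / total_weight

# Invented, direction-aligned Z scores for one patient
patient_z = {"verbal_memory": 1.0, "visual_memory": -0.5}

# Preset weights: verbal memory counts twice as much as visual memory
weights = {"verbal_memory": 2.0, "visual_memory": 1.0}

print(weighted_composite(patient_z, weights))
```

With equal weights, the function reduces to the simple average of the Z scores.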

Once the composite score has been calculated for each patient, the M(SD) composite score can be calculated for CR and TV groups separately at the study baseline and at the study endpoint and then processed using usual statistical methods; this can be done whether the composite score is a total or an average of the Z scores of the individual cognitive tasks.

Standard Scores

Some people find it hard to understand Z scores, especially when values are negative (readers are reminded that Z scores have a mean of 0, and that Z values that are negative indicate scores that are below the mean of the sample). This difficulty can be resolved by converting Z scores into other standard scores. The Z score is one example of a standard score; using simple formulae, Z scores can be converted to other standard scores that have only positive values and other specific properties. An example is the T score, which has M=50 and SD=10. Stanine and sten scores are based on the same principle. Stanine (standard nine) scores range from 1 to 9, with a mean of 5 and an SD of 2; sten (standard ten) scores range from 1 to 10, with a mean of 5.5 and an SD of 2. Stanine and sten scores are used in some psychological tests. IQ scores are also standardized; they have a mean of 100 and an SD of 15. Readers may note that Z transformation and other methods of standardization do not change the ranking of the original data.
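
The conversions follow one pattern: multiply Z by the SD of the target scale and add its mean. The sketch below applies this to the scales mentioned above. The function names are hypothetical. For stanine and sten scores, rounding the linear value to the nearest whole number and clamping it to the scale's range is a common approximation that I assume here; test publishers may instead assign these scores from percentile bands.

```python
import math

def z_to_t(z):
    """T score: mean 50, SD 10."""
    return 50 + 10 * z

def z_to_iq(z):
    """IQ-style score: mean 100, SD 15."""
    return 100 + 15 * z

def _round_and_clamp(value, low, high):
    """Round to the nearest integer (ties upward), then keep within [low, high]."""
    rounded = math.floor(value + 0.5)
    return max(low, min(high, rounded))

def z_to_stanine(z):
    """Stanine: mean 5, SD 2, whole numbers from 1 to 9 (linear approximation)."""
    return _round_and_clamp(5 + 2 * z, 1, 9)

def z_to_sten(z):
    """Sten: mean 5.5, SD 2, whole numbers from 1 to 10 (linear approximation)."""
    return _round_and_clamp(5.5 + 2 * z, 1, 10)

for z in (-1.5, 0.0, 0.5):
    print(z, z_to_t(z), z_to_iq(z), z_to_stanine(z), z_to_sten(z))
```

Because each conversion is an increasing function of Z, the ranking of patients is the same on every scale, although rounding to stanine or sten scores can produce ties.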

Computing Z Scores: Reprise

The Z score for an individual measurement can be calculated using the mean and standard deviation of a sample, or of a pooled sample, or of the population, depending on the context in which the Z score needs to be derived and used. If the sample comprises a single group, such as a class of students, the Z scores are based on the M(SD) of that group. If there are two groups, such as in the study described in this article, Z scores should be calculated based on the M(SD) of the pooled sample. However, when Z scores are interpreted for a single individual on a test for which population norms are available, the population mean and standard deviation are used rather than the sample M(SD).

As an aside, for the study described in this article, why are the Z scores computed for the pooled sample; why cannot the Z scores be computed for each group separately, and then M(SD) Z scores compared between groups? The answer ought to be obvious. Z scores may create new values but do not change the ranks (order) of the raw scores within a group; and these new values have an M(SD) of 0(1) because, as stated earlier, this is a property of Z scores. So, if Z scores are computed separately for CR and TV groups, the M(SD) of the Z scores for each group will be 0(1), making comparisons between groups illogical. However, if the CR and TV groups are pooled, the raw scores are ordered across both groups rather than within each group; whereas the M(SD) of the Z scores for the pooled group will be 0(1), the Z scores for the CR and TV group patients will depend on their position in the pooled distribution, and the M(SD) of the Z scores for the CR and TV groups will no longer each be 0(1). So, the M(SD) Z scores thus created can now be validly compared between the CR and TV groups. Pooling of groups is done in certain nonparametric tests, as well. For example, in the Mann–Whitney and Kruskal–Wallis tests, the groups are pooled, individual values are ranked, and then the ranks are compared across groups.
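
The point can be checked numerically. In the sketch below, which uses invented raw scores and a hypothetical helper named `z_scores`, standardizing each group against itself gives both groups a mean Z of 0, so the group difference disappears; standardizing against the pooled sample preserves it. The sample SD (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev

def z_scores(values, reference):
    """Standardize `values` against the mean and SD of `reference`."""
    m, sd = mean(reference), stdev(reference)
    return [(x - m) / sd for x in values]

# Invented endpoint raw scores; the CR group scores higher on average
cr = [20, 22, 17, 25, 19]
tv = [14, 18, 12, 16, 17]
pooled = cr + tv

# Each group standardized against itself: both means are 0, both SDs are 1
z_cr_own = z_scores(cr, cr)
z_tv_own = z_scores(tv, tv)

# Each group standardized against the pooled sample: the group means differ
z_cr_pooled = z_scores(cr, pooled)
z_tv_pooled = z_scores(tv, pooled)

print("separate:", round(mean(z_cr_own), 3), round(mean(z_tv_own), 3))
print("pooled:  ", round(mean(z_cr_pooled), 3), round(mean(z_tv_pooled), 3))
```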

Parting Notes

Knowledgeable readers may recognize that the standardized mean difference that is a measure of pooled effect size in meta-analysis and the (standardized) beta coefficient in regression analysis are both based on principles similar to those discussed in this article. A discussion on these, however, is out of the scope of the present article.

Footnotes

Declaration of Conflicting Interests: The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author received no financial support for the research, authorship, and/or publication of this article.

References

1. Andrade C. Multiple testing and protection against a Type 1 (false positive) error using the Bonferroni and Hochberg corrections. Indian J Psychol Med, 2019; 41(1): 99–100.

2. Norman GR and Streiner DL. Biostatistics: The bare essentials. 4th ed. People’s Medical Publishing House, 2014.

3. Song MK, Lin FC, Ward SE, et al. Composite variables: When and how. Nurs Res, 2013; 62(1): 45–49.

4. Andrade C. Mean difference, standardized mean difference (SMD), and their use in meta-analysis: As simple as it gets. J Clin Psychiatry, 2020; 81(5): 20f13681.

Articles from Indian Journal of Psychological Medicine are provided here courtesy of Indian Psychiatric Society South Zonal Branch
