Psychological tests are written, visual, or verbal evaluations that are commonly administered to assess cognitive and emotional function in children and adults. Many different tests have been designed to assess a variety of attributes, including achievement, ability, personality, and neurological function. Achievement and ability testing has several practical applications, including identifying learning disabilities, identifying giftedness, and tracking intellectual development. Most achievement and ability tests are standardized, i.e., normative values were established on a large representative sample. Personality tests are also administered for a variety of reasons, which may include diagnosis of psychopathology, screening of job candidates, and determination of strengths and weaknesses in educational and vocational settings. These tests evaluate the thoughts, emotions, attitudes, and behavioral traits that make up one's personality.

The purpose of this paper is to compare and contrast two commonly used psychological tests: the Myers-Briggs Type Indicator (MBTI), a personality test, and the Wide Range Achievement Test (WRAT3), an achievement and ability test. Each test is described with regard to its purpose, measurement scale, reliability, validity, normative construct, and appropriate uses and limitations. Modalities that measure similar traits are also described.


The MBTI was developed by Isabel Briggs Myers and Katharine Cook Briggs in 1962. The test was constructed to build on the work of C.G. Jung, a Swiss psychiatrist whose main research interest was understanding differences among people. The premise of his work was to discover how human beings process information and make decisions. The MBTI is a self-report questionnaire designed to make Jung's theory of psychological types useful.

Psychological type theory states that people have preferred modes of perception (sensing [S]/intuition [N]) and judgment (thinking [T]/feeling [F]), as well as attitudes that reflect their orientation of energy (extraversion [E]/introversion [I]) and their orientation toward the outer world (judging [J]/perceiving [P]). Jung asserted that people develop a dominant function and an auxiliary function for balance. To determine whether the judging or the perceiving function is dominant, Myers added the JP scale. She reasoned that because Es focus outwardly, their JP preference directly indicates their dominant function. Is, however, prefer to use their dominant function in dealing with their inner world. Because the JP scale reflects their extraverted function, their dominant function is the one on the opposite side of their JP preference. These four sets of preferences (S/N, T/F, E/I, J/P) combine to form 16 distinct personality types.
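The combination of preferences and the JP rule described above can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for illustration; the sketch assumes the conventional letter order (E/I, S/N, T/F, J/P) for a type code and is not part of the MBTI scoring procedure itself.

```python
from itertools import product

# The four preference pairs named in the text, in conventional code order.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def all_types():
    """Return every four-letter code formed by picking one letter per pair."""
    return ["".join(letters) for letters in product(*DICHOTOMIES)]


def dominant_function(type_code):
    """Apply the JP rule to find the letter of the dominant function."""
    attitude, perceiving, judging, jp = type_code
    # J points at the judging function (T or F) as the one shown to the
    # outer world; P points at the perceiving function (S or N).
    extraverted = judging if jp == "J" else perceiving
    introverted = perceiving if jp == "J" else judging
    # Extraverts lead with the outward-facing function; introverts lead
    # with the other one.
    return extraverted if attitude == "E" else introverted


types = all_types()
print(len(types))                 # number of distinct type codes
print(dominant_function("ENTJ"))  # extravert with J: judging function
print(dominant_function("INTJ"))  # introvert with J: perceiving function
```

Under these assumptions, two types that differ only in E/I share the same four functions but differ in which one is dominant.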

Joseph F. Jastak developed the first Wide Range Achievement Test (WRAT) in 1936. This test of basic academic codes was developed to supplement the multifactor measure of intelligence developed by David Wechsler. The first edition and a revision in 1946 had only one scale of achievement, which ranged from kindergarten to college for each of the three subtests. The 1965 version retained these three subtests, but each was represented by separate scales at two levels. Level I was developed for children between the ages of 5 years and 11 years 11 months. Level II was intended for persons from 12 years to adulthood. In 1984, the WRAT-Revised (WRAT-R) was developed. The currently used version, the WRAT3, is a very brief screening measure of academic achievement that includes reading (recognizing and naming letters, pronouncing words out of context), spelling (writing name, writing letters and words to dictation), and arithmetic (counting, reading number symbols, solving oral problems, performing written computations).


The scales used to construct the MBTI have been met with criticism. One hundred twenty dichotomous scale questions were used to establish the MBTI. The implication is that the test-taker fits into certain categorical types. The instrument results consist of a four-letter code that indicates personality type. Four dichotomous implications are made about the test-taker: Introversion/Extraversion, Sensing/Intuition, Thinking/Feeling, and Judging/Perceiving. Introverts are thought to function effectively in solitary pursuits and tend to be quiet and reserved. Extraverts tend to be sensation-seeking and spontaneous and are energized by people. "Sensors" are described as practical and believe intuition is not trustworthy. "Intuitives" typically prefer metaphor, analogy, and logic, and they reason from principles rather than emotion. "Thinkers" use impersonal means of reasoning such as logic and experience. "Feelers" make judgments based on personal reasoning, values, and emotion. "Judgers" prefer to make quick decisions and move on, preferring not to revisit those decisions in the future. Finally, "Perceivers" leave their options open to entertain new possibilities and processes. Based on these designations, the system presumes that there are 16 basic personality categories, each with a distinct profile of characteristic behavior patterns.

Three standard scores are averaged to compute an Overall Achievement score. Standard scores on the WRAT have a mean of 100 and a standard deviation of 15. Thus, approximately 68% of test-takers would be expected to score between 85 and 115, that is, within 1 standard deviation of the mean. About 95% of test-takers would score between 70 and 130, and about 99.7% would score within 3 standard deviations of the mean, or between 55 and 145. Based on these data, percentile scores and grade levels may be derived.
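The percentages above follow from treating standard scores as normally distributed. The short Python sketch below checks them and converts a standard score to a percentile; it assumes an exactly normal distribution with mean 100 and standard deviation 15, and the helper names are illustrative rather than taken from the WRAT manual.

```python
from statistics import NormalDist

# Assumed score distribution: normal, mean 100, standard deviation 15.
MEAN, SD = 100, 15
scores = NormalDist(mu=MEAN, sigma=SD)


def share_within(k):
    """Proportion of test-takers expected within k SDs of the mean."""
    low, high = MEAN - k * SD, MEAN + k * SD
    return scores.cdf(high) - scores.cdf(low)


def percentile(standard_score):
    """Percentage of test-takers expected to score at or below a score."""
    return 100 * scores.cdf(standard_score)


for k in (1, 2, 3):
    low, high = MEAN - k * SD, MEAN + k * SD
    print(f"{low}-{high}: {share_within(k):.1%}")

print(f"Standard score 115 is near percentile {percentile(115):.0f}")
```

Real norm tables are built from the observed normative sample, so published percentiles may differ slightly from these theoretical values.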


Reliability, with regard to psychological testing, is the extent to which the test is repeatable and yields consistent scores. All measurement procedures have the potential for error. Thus, one aim in the development of a psychological test is to maximize reliability by minimizing measurement error. Many types of reliability may be assessed, including test-retest reliability, alternate-forms reliability, split-half reliability, inter-rater reliability, and internal consistency. Regardless of the type of reliability assessed, .80 is generally considered an acceptable benchmark for the development of psychological tests.

The reliability of all four MBTI scales typically exceeds .70. Other studies have reported the reliability of the SN and TF scales to range from .67 to .85 (Harvey, 1996).

Reliability ranges from .85 to .95 across the different WRAT test forms. Correlations among the raw scores of the three subtests are all very high (.98). Median alternate-forms reliability is above .89, and test-retest reliability ranges from .91 to .98. The standard error of measurement is 5 for Reading and Spelling and 6 for Arithmetic.
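The link between reliability and the standard error of measurement (SEM) can be sketched with the classical test theory formula SEM = SD × √(1 − r). This formula is not stated in the source; it is a standard result used here only for illustration, with the scale's standard deviation of 15 and reliability values picked from the range reported above as assumptions.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sem, z=1.96):
    """Approximate 95% band around an observed score (z is an assumption)."""
    return observed - z * sem, observed + z * sem


# Illustrative inputs only: SD of 15 and two reliabilities from the
# reported range, not values taken from the test manual.
sem_high = standard_error_of_measurement(15, 0.89)
sem_low = standard_error_of_measurement(15, 0.85)
print(round(sem_high, 2), round(sem_low, 2))
print(score_band(100, sem_high))
```

With these inputs the SEM comes out close to the rounded values of 5 and 6 reported above, which shows how a small drop in reliability widens the uncertainty around an individual score.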


Validity is the extent to which a test measures what it is supposed to measure. In order to ensure validity, a test must be reliable. However, reliability does not ensure validity. Several types of validity may be measured, including face validity, construct validity, criterion validity, convergent validity, and discriminant validity. Validity coefficients greater than .8 are generally considered acceptable.

Since the MBTI is based on personality theory, validity must be evaluated according to its ability to demonstrate relationships and predict outcomes based on that theory. Carlyn (Sipps, Alexander, & Friedt, 1985)

Serious statistical drawbacks exist with the use of dichotomous scores. Assuming a bivariate normal distribution, artificially dichotomizing one continuous variable at the mean reduces the explained variance to about .64 of its original value. If both continuous variables are dichotomized at the mean, the explained variance declines to about .40 of its original value. Thus, artificial dichotomization dramatically reduces the statistical power of these analyses.
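The two proportions quoted above can be reproduced from standard results for the bivariate normal distribution. The sketch below is an illustration under that assumption, not the derivation used by the cited authors: splitting one variable at the mean multiplies the correlation by √(2/π), and splitting both gives a correlation of (2/π)·arcsin(ρ). The function names are hypothetical.

```python
import math


def r2_retained_one_split():
    """Share of explained variance kept when one variable is split at the mean."""
    # The correlation shrinks by sqrt(2/pi), so r squared shrinks by 2/pi.
    return 2 / math.pi


def r2_retained_two_splits(rho):
    """Share of explained variance kept when both variables are split at the mean."""
    if not 0 < abs(rho) <= 1:
        raise ValueError("rho must be nonzero and between -1 and 1")
    # Correlation between the two dichotomized variables.
    phi = (2 / math.pi) * math.asin(rho)
    return (phi / rho) ** 2


print(round(r2_retained_one_split(), 2))       # about .64
print(round(r2_retained_two_splits(0.01), 2))  # about .40 for weak correlations
print(round(r2_retained_two_splits(0.30), 2))  # slightly higher for stronger ones
```

The .40 figure is the limiting value for weak underlying correlations; under these formulas the retained share rises somewhat as the true correlation grows.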

Extensive validity evidence is supplied by type distribution tables that reveal differing type proportions across occupations that are consistent with type theory. Numerous correlations between the MBTI scales and various interest, personality and academic measures provide further support (Carlson, 1985). However, because the scales were often simply correlated with other self-report measures collected at the same time, some portion of these correlations may stem from common method variance.

The WRAT3 subtests are positively correlated with one another. The correlations range from .81 to .91 for reading-spelling, .54 to .78 for reading-arithmetic, and .58 to .82 for spelling-arithmetic. Reading and Spelling correlate .65 to .72, and Vocabulary .64, with WISC III Verbal IQs. Arithmetic correlates .65 to .74 with WISC III FSIQ, VIQ, and PIQ. Correlation with Arithmetic is only .66 with WISC III and lower with WAIS III. Correlations with other achievement tests are in the .50s to .70s (California Achievement Test and Stanford Achievement Test) and the .60s to .80s (California Test of Basic Skills).


Normative data for the MBTI were gathered from 3200 adults aged 18 years and older from across the United States. The distributions of age, gender, and ethnicity were matched to the distributions of these variables in the 1990 United States Census.

Restandardization for the third edition of the test began in the early 1990s as a result of a Rasch analysis of item difficulties, which found many duplicated items in both forms of the WRAT-R. The WRAT3 was given to a normative sample of 4834 subjects aged 5 to 75 years in the United States, selected using a stratified sample design controlled for age, regional residence, gender, ethnicity, and socio-economic level, which attempted to match the 1990 United States Census data.


Over 2 million people a year complete the MBTI (Sipps et al., 1985). Many have argued that the MBTI is based not on science but on belief. Some believe that the profiles may seem to fit any person because of confirmation bias and the ambiguity of the basic terms.

