Testing and Measurements for Research



Tests of reliability and validity: An overview of common methods

Internal consistency

Application: When used and when inappropriate

Internal consistency as a test of reliability assesses the consistency of results across the different items or judgments that make up a measure. These might include the responses recorded by two researchers conducting the same test on similar populations (Trochim, 2006, Types of reliability). It would be inappropriate when the two or more populations under comparison were 'supposed' to be different, as in the case of a test where one group was the experimental group and the other the control group, or where there was a great deal of variation within the population (Trochim, 2006, Types of reliability).


Internal consistency is a useful assessment tool in situations such as when two researchers are comparing the same data sets collected in different locations on similar subjects, or when there may be observational bias on the part of the recorders, as when interviewing subjects. It is useful when assessing observational bias in subjective data that is being quantified, as is occasionally the case in the social sciences or in marketing research. Finally, any very unusual discrepancy in consistency, unless explainable or expected, should be cause for further review of the data.


Internal consistency is of limited value in studies with many different population groups, as the tests may contain many different factors that can create real differences that are valid rather than invalid to the core subject of the study. Some reliable tests 'should' produce internal inconsistencies because of the study's design, and assessing internal consistency in those cases is not appropriate.
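As an illustration of how internal consistency is quantified in practice, the sketch below computes Cronbach's alpha, a standard inter-item consistency statistic (a common choice, though not one named in Trochim's discussion above). The item scores and function name are invented for the example.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one score list per item, each with one entry per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]          # each respondent's total
    item_var = sum(pvariance(scores) for scores in items)     # sum of per-item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Three items answered by five respondents; scores move together, so alpha is high.
items = [
    [4, 5, 3, 5, 4],
    [4, 4, 3, 5, 4],
    [5, 5, 2, 5, 3],
]
print(round(cronbach_alpha(items), 3))  # 0.877
```

Values near 1 indicate that the items behave consistently; an unusually low alpha would be the kind of discrepancy the paragraph above says warrants further review.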


Split-half reliability

Application: When used and when inappropriate


Split-half reliability is applied by making a randomized yet sufficiently balanced division of all of the items measured in the study. The two halves are then scored, and the total scores are assessed for their correlation. The correlation is usually calculated between the total scores of the two randomly divided halves, not for each individual subset of items (Trochim, 2006, Types of reliability).


Split-half reliability is most useful for very large selections of data, so that a truly similar yet randomized grouping can be created within the study sample. The test is reasonably straightforward and easy to perform, provided a truly comparable split can be created from the individuals or phenomena being assessed.


It is not appropriate for small, diverse, or anomalous studies. For studies with many different factors affecting the results, it may also be inaccurate to look only at the overall correlation: a random split may not yield truly similar halves in a large but diverse population. Creating two representative groups can be time-consuming in some instances and does not add to the usefulness of the study's findings.
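The procedure described above can be sketched as follows: randomly divide the items, correlate the two half totals, and then apply the Spearman-Brown correction commonly used to step the half-length correlation up to a full-length estimate. The data and function names are illustrative, not from any particular library.

```python
import random
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half(items, seed=0):
    """items: one score list per item, each with one entry per respondent."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)          # randomized division of the items
    half = len(idx) // 2
    n = len(items[0])
    first = [sum(items[i][p] for i in idx[:half]) for p in range(n)]   # half-test totals
    second = [sum(items[i][p] for i in idx[half:]) for p in range(n)]
    r = pearson(first, second)                # correlate the two half totals
    return 2 * r / (1 + r)                    # Spearman-Brown correction to full length

# Four items, four respondents; perfectly consistent items give a reliability of 1.0.
print(split_half([[1, 2, 3, 4]] * 4))
```

The correction step reflects the point above that the halves exist only to estimate reliability: the statistic reported is for the full instrument, not for either half on its own.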


Test-retest reliability

Application: When used and when inappropriate

This method of assessment, often used in the natural sciences, assesses the consistency of a measure from one time to another when several batteries of tests are performed. The test-retest method by definition "assumes that there is no substantial change in the construct being measured between the two occasions" being assessed (Trochim, 2006, Types of reliability).


"The amount of time allowed between measures is critical" (Trochim, 2006, Types of reliability). If the same thing is measured twice, the accuracy of a correlation between the two observations depends on how much time has elapsed between the two measurement occasions; to make this method valid, very little time should elapse between the two tests. However, many laboratories may already have such retesting built into their methodology.


Its strengths are also its weaknesses: in terms of the test and retest, "the shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation," so the tests must be performable in rapid succession, which may be logistically difficult in some instances (Trochim, 2006, Types of reliability). There must also be an allowance for the fact that the closer in time the test and retest are, the more similar the factors that contribute to error under the specific conditions of the experiment (Trochim, 2006, Types of reliability).

Parallel and alternate forms

Application: When used and when inappropriate

The parallel forms approach is very similar to the split-half reliability approach; however, in parallel forms reliability the researcher first has to create two parallel forms of data to compare, such as by creating a large set of questions that address the same construct, randomly dividing the questions into two sets, and giving both sets to the same sample of people. "The correlation between the two parallel forms is the estimate of reliability" and is built into the study design (Trochim, 2006, Types of reliability).


The major difference between split-half and parallel forms reliability tests is that in the parallel method the two forms can be used independently of each other and are considered equivalent measures. "With split-half reliability we have an instrument that we wish to use as a single measurement instrument and only develop randomly split halves for purposes of estimating reliability"; the split halves have no use other than as a test, while the parallel forms method can add additional depth to the study, or may already be built into it, making it an easy test of reliability (Trochim, 2006, Types of reliability).


As with split-half reliability, the major problem with the parallel approach is that the researcher has to have a large sampling. He or she must "be able to generate lots of items that reflect the same construct. This is often no easy feat. Furthermore, this approach makes the assumption that the randomly divided halves are parallel or equivalent. Even by chance this will sometimes not be the case" (Trochim, 2006, Types of reliability).
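The construction Trochim describes, randomly dividing a pool of questions into two forms given to the same sample, can be sketched as below. The pool and function name are hypothetical, and, as the paragraph above notes, a random division does not guarantee the two forms are truly equivalent.

```python
import random

def make_parallel_forms(question_pool, seed=0):
    """Randomly divide a pool of items into two same-sized forms (assumed construction)."""
    pool = list(question_pool)
    random.Random(seed).shuffle(pool)   # randomize item order before dividing
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Twenty question IDs split into two ten-item forms given to the same sample.
form_a, form_b = make_parallel_forms(range(20))
print(len(form_a), len(form_b))  # 10 10
```

The reliability estimate would then be the correlation between each respondent's total score on form A and on form B, exactly as in the split-half computation, but without the step-up correction, since each form is used as a full instrument.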

Face validity

Application: When used and when inappropriate

Face validity asks whether a test seems valid on the surface, in terms of the appearance of the test construct. It is an almost 'if it looks like a duck and quacks like a duck, it is a duck' sort of test. In short, a test is said to have face validity if it appears, on its surface, to measure what it is supposed to measure (Trochim, 2006, Measurement validity types).


The primary value of this methodology is its ease. For example, a researcher "might look at a measure of math ability, read through the questions," and decide on the basis of this surface, subjective assessment that the test is a good measure of math ability (Trochim, 2006, Measurement validity types). On some level, one could argue that all instruments must pass this cursory assessment.


However, "face validity is considered the weakest way to try to demonstrate construct validity because of its subjective nature" (Trochim, 2006, Measurement validity types). To an untrained eye, a test with low or high face validity might not actually function as such (for example, law schools consider the "games" problems of the LSAT a good measure of legal reasoning ability, although they appear to be tests of spatial reasoning).

Content validity

Application: When used and when inappropriate

Assessing an instrument's content validity means assessing the operation of the test against the relevant content domain for the construct. Purpose-driven studies with a strong sense of focus are essential when using this method of validity. For example, the criteria for studying the effectiveness of different types of teenage pregnancy prevention programs would have a strict definition of what constituted such a program. "Only programs that meet the criteria can legitimately be defined as teenage pregnancy prevention programs" (Trochim, 2006, Measurement validity types).


The approach assumes that the researcher has a clear definition and description of the study's content domain, which is not always the case. On one hand, using content validity forces the researchers to come to a strict definition, which may improve the validity of the study by forcing them to be clear about what is being studied; on the other hand, it can also limit them to a conventional but possibly inaccurate or limited definition of what constitutes, for example, self-esteem or intelligence.


While content validity "sounds fairly straightforward, and for many operationalizations it will be ... for other constructs (e.g., self-esteem, intelligence), it will not be easy to decide on the criteria that constitute the content domain," and it may increase the biases inherent in a study because of the limits or vagueness of the definition (Trochim, 2006, Measurement validity types).

Criterion validity

Application: When used and when inappropriate

In criterion-related validity, the researcher examines "whether the operationalization behaves the way it should given your theory of the construct. This is a more relational approach to construct validity" because it assumes the "operationalization should function in predictable ways" in relation to other operationalizations done previously, on which the study is based (Trochim, 2006, Measurement validity types).


