Confidence interval (95% CI) – This defines a range of values within which we are 95% confident that the true population effect lies.
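As an illustration only, a minimal Python sketch of a 95% CI for a mean, using the t distribution; the scores are made-up and numpy/scipy are assumed to be available:

    import numpy as np
    from scipy import stats

    scores = np.array([12, 15, 11, 14, 13, 16, 10, 15])  # hypothetical scale scores
    mean = scores.mean()
    sem = stats.sem(scores)  # standard error of the mean
    # 95% CI from the t distribution (appropriate for a small sample)
    low, high = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
    print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")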
Construct validity – See Validity.
Content validity – See Validity.
Correlation coefficient – See Pearson correlation coefficient.
Cronbach's alpha (α) – This evaluates internal consistency. It is calculated as the average of all possible split-half reliabilities, so it guards against any unlucky random choice that might occur if a single split-half reliability is calculated. In one sense it would seem that the higher the Cronbach α, the better (i.e. the more internally consistent the items). However, if α is too high, one inference is that the items are too homogeneous and there may well be some redundancy (and hence unnecessary burdening of future respondents with more scale items than needed). The recommended range for an acceptable Cronbach α is therefore 0.7 to 0.9.
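A minimal sketch of the usual variance-based formula for α, namely α = k/(k−1) × (1 − sum of item variances / variance of total scores), which is equivalent to the average split-half characterisation above; the response matrix is made-up and numpy is assumed:

    import numpy as np

    def cronbach_alpha(items):
        # items: respondents x items matrix of scores
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of the total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # hypothetical responses: 5 respondents x 4 items
    data = np.array([[3, 4, 3, 4],
                     [2, 2, 3, 2],
                     [4, 4, 4, 5],
                     [1, 2, 1, 2],
                     [3, 3, 4, 3]])
    print(round(cronbach_alpha(data), 2))  # acceptable range roughly 0.7-0.9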
Face validity – See Validity.
Homogeneity of items – The tendency for items to be scored in the same way: all fairly high, all middling, or all fairly low.
Internal reliability or consistency – See Reliability.
Item-total correlation (ri(t-1)) – Used to quantify the homogeneity of responses to items in a scale that purports to measure a specific construct (e.g. anxiety). The standard Pearson correlation coefficient is calculated, in turn, for each item against the total score of the remaining items. The rule of thumb is that if any calculated r is <0.20 then that item would appear to be measuring something different to the rest of the scale, and consideration should be given to deleting it. However, concern for content validity might influence retention of an item despite a low item-total correlation.
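A minimal sketch of this calculation, again using a made-up respondents x items matrix (numpy and scipy assumed):

    import numpy as np
    from scipy import stats

    def item_total_correlations(items):
        # Pearson r of each item against the total of the remaining items
        rs = []
        for i in range(items.shape[1]):
            rest_total = np.delete(items, i, axis=1).sum(axis=1)
            r, _ = stats.pearsonr(items[:, i], rest_total)
            rs.append(r)
        return rs

    data = np.array([[3, 4, 3, 4],
                     [2, 2, 3, 2],
                     [4, 4, 4, 5],
                     [1, 2, 1, 2],
                     [3, 3, 4, 3]])
    # flag any item with r < 0.20 as a candidate for deletion
    print([round(r, 2) for r in item_total_correlations(data)])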
Null hypothesis (NH) – A statement, made prior to testing, of no effect.
Pearson correlation coefficient (r) – The standard parametric linear correlation coefficient (or product-moment correlation) quantifying the strength of the relationship between two numeric variables (where r = 1 is perfect positive correlation, r = −1 perfect negative correlation, and r = 0 no correlation).
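For concreteness, a minimal sketch of the product-moment formula itself, r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)² Σ(y−ȳ)²], with made-up data (numpy assumed):

    import numpy as np

    def pearson_r(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        dx, dy = x - x.mean(), y - y.mean()
        return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

    print(round(pearson_r([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]), 3))  # close to 1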
Power – This term is used here, loosely, as the probability of rejecting the stated null hypothesis on the basis of the study data, when that is in fact the correct decision.
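One way to make this concrete is by simulation; the sketch below estimates the power of a two-group t-test under an assumed true effect of 0.5 standard deviations with 50 subjects per group (all values illustrative; numpy and scipy assumed):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sim, n, effect = 2000, 50, 0.5  # assumed effect size in SD units
    rejections = 0
    for _ in range(n_sim):
        a = rng.normal(0.0, 1.0, n)     # control group
        b = rng.normal(effect, 1.0, n)  # group with the true effect
        _, p = stats.ttest_ind(a, b)
        rejections += p < 0.05          # correct rejection of the NH
    print(f"estimated power = {rejections / n_sim:.2f}")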
Precision – The accuracy of estimation possible from the study data (the narrowness of the confidence interval).
Reliability – Reliability encompasses a number of desirable features of a measurement scale. Most fundamental of these is minimal ‘error’ in responses. ‘Error’ occurs when an impression is given that differs from the truth: for example, if a respondent really does know something, or holds an opinion, but misunderstands the item and erroneously responds negatively, or if the respondent accidentally ticks the wrong response. Any process of measurement runs a risk of some ‘error’, but scale design aims to minimise its probability (e.g. a layout that minimises inadvertently ticking the wrong response, wording that is as widely understood as possible, and so on). Measurement error can be random (i.e. ‘pure’), such as accidentally and randomly mis-ticking a response, or systematic (occurring differentially, that is, more in some respondents/circumstances than in others), such as when less-educated respondents tend not to recognise a medical term for something they actually know. Both types of ‘error’ are of concern regarding measurement/diagnostic precision and research power, but systematic error is a particular concern because it causes findings to deviate systematically from the truth (bias). Specific types of reliability are defined, including:
Internal reliability or consistency – This is a measure of the homogeneity of items within a scale, the extent to which they are measuring a unitary construct. See also Item-total correlation.
Split-half reliability – The items in a scale are randomly split into two halves and the Pearson correlation is calculated between the total scores for each half (see the sketch following this list).
Test-retest reliability – This is a measure of the extent to which the same subject, if reassessed, will give similar responses on the two separate occasions, assuming the aspect being measured has not changed. The challenge is to choose the time interval between test and retest: long enough to avoid the respondent simply recalling and repeating what they answered before, but short enough to ensure that what is being measured has not changed in the meantime (e.g. in the interim the respondent might have watched a TV programme about ovarian cancer signs and symptoms).
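A minimal sketch of a single random split-half calculation, using a made-up respondents x items matrix (numpy and scipy assumed):

    import numpy as np
    from scipy import stats

    def split_half_reliability(items, seed=0):
        rng = np.random.default_rng(seed)
        k = items.shape[1]
        idx = rng.permutation(k)  # random split of the items into two halves
        half_a = items[:, idx[:k // 2]].sum(axis=1)
        half_b = items[:, idx[k // 2:]].sum(axis=1)
        r, _ = stats.pearsonr(half_a, half_b)
        return r

    data = np.array([[3, 4, 3, 4],
                     [2, 2, 3, 2],
                     [4, 4, 4, 5],
                     [1, 2, 1, 2],
                     [3, 3, 4, 3]])
    print(round(split_half_reliability(data), 2))

A different seed gives a different split (and possibly a different r), which is precisely the unlucky random choice that Cronbach α, as the average over all possible splits, guards against.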
Responsiveness to change – A formal process of evaluating whether the scale is responsive to an intervention that would be expected to alter the scores on the scale. Generally this is achieved by random allocation to two groups, with one group being ‘primed’ in some way (or ‘treated’) prior to completion of the scale – in the Simon et al.1 study, by being given a leaflet to read – and the scores of the two groups are then compared statistically against a null hypothesis of no detectable effect.
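The two-group comparison described above might, for example, be an independent-samples t-test; a minimal sketch with made-up scale totals (scipy assumed):

    from scipy import stats

    # hypothetical scale totals: ‘primed’ (leaflet) group vs control group
    leaflet = [18, 21, 19, 22, 20, 23]
    control = [15, 17, 16, 18, 14, 17]
    t, p = stats.ttest_ind(leaflet, control)
    print(f"t = {t:.2f}, p = {p:.3f}")  # a small p rejects the NH of no effect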
Sensitivity to change – See Responsiveness to change.
Split-half reliability (rs-h) – See Reliability.
Test-retest reliability (rt-r) – See Reliability.
Validate – To show, by purpose-designed research, that in a specified context a scale/questionnaire that has been developed provides meaningful data about the health aspect(s) being ‘measured’, and that these ‘measurements’ are dependable/reliable. See also Validity and Reliability.
Validity – The characteristic that a scale does measure what it purports to measure. Specific types of validity are defined, some of the many variants being:
Construct validity – The ability of the scale to measure an abstract concept (e.g. ‘family complete’) for which no absolute gold standard measure exists that could be used as a reference standard. In such cases validity has to be evaluated indirectly, via a construct (e.g. if ‘family complete’ is true then irreversible contraception will be acceptable). For example, it might involve follow-up to ascertain the percentages going on to choose irreversible methods of contraception, with construct validity indicated by greater percentages among those scoring high on the ‘family complete’ scale.
Content validity – Expert judgement that the scale includes all aspects it should, given its aim, and does not include items addressing other (distinct) aspects. See also Face validity.
Face validity – A special form of content validity, as assessed by experts.