In the social sciences, scaling is the process of measuring or ordering entities with respect to quantitative attributes or traits. For example, a scaling technique might involve estimating individuals' levels of extraversion, or the perceived quality of products. Certain methods of scaling permit estimation of magnitudes on a continuum, while other methods provide only for relative ordering of the entities.

See level of measurement for an account of qualitatively different kinds of measurement scales.

## Comparative and noncomparative scaling

With comparative scaling, the items are directly compared with each other (example: Do you prefer Pepsi or Coke?). In noncomparative scaling, each item is scaled independently of the others (example: How do you feel about Coke?).

## Composite measures

Composite measures of variables are created by combining two or more separate empirical indicators into a single measure. Composite measures capture complex concepts more adequately than single indicators, extend the range of scores available, and are more efficient at handling multiple items.

In addition to scales, there are two other types of composite measures. Indexes are similar to scales except multiple indicators of a variable are combined into a single measure. The index of consumer confidence, for example, is a combination of several measures of consumer attitudes. A typology is similar to an index except the variable is measured at the nominal level.

Indexes are constructed by accumulating scores assigned to individual attributes, while scales are constructed through the assignment of scores to patterns of attributes.
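The distinction between accumulating scores and scoring patterns can be sketched in Python. This is only an illustration: the indicator names and the pattern lookup table are invented for the example and do not come from any published instrument.

```python
# Hypothetical yes/no indicators (1 = yes, 0 = no); the names are invented.
responses = {"voted": 1, "attended_rally": 0, "donated": 1}


def index_score(answers):
    """Index: accumulate the scores assigned to the individual attributes."""
    return sum(answers.values())


# Scale: a score is assigned to each whole response pattern.
# This lookup table is an assumption made up for the example.
PATTERN_SCORES = {
    (0, 0, 0): 0,
    (1, 0, 0): 1,
    (1, 1, 0): 2,
    (1, 1, 1): 3,
}


def scale_score(answers, order=("voted", "attended_rally", "donated")):
    """Return the score for the response pattern, or None if it is not listed."""
    pattern = tuple(answers[name] for name in order)
    return PATTERN_SCORES.get(pattern)


print(index_score(responses))  # 2
print(scale_score(responses))  # None: the pattern (1, 0, 1) is not in the table
```

The same respondent receives an index score of 2 but no scale score, because the scale looks at the pattern of answers rather than only their sum.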

While indexes and scales provide measures of a single dimension, typologies are often employed to examine the intersection of two or more dimensions. Typologies are very useful analytical tools and can be easily used as independent variables, although since they are not unidimensional it is difficult to use them as a dependent variable.

## Data types

The type of information collected can influence scale construction. Different types of information are measured in different ways. See in particular level of measurement.

1. Some data are measured at the nominal level. That is, any numbers used are mere labels: they express no mathematical properties. Examples are SKU inventory codes and UPC bar codes. (See Nominal scale)
1. Some data are measured at the ordinal level. Numbers indicate the relative position of items, but not the magnitude of difference. An example is a preference ranking. (See Ordinal scale)
1. Some data are measured at the interval level. Numbers indicate the magnitude of difference between items, but there is no absolute zero point. Examples are attitude scales and opinion scales. (See Interval scale)
1. Some data are measured at the ratio level. Numbers indicate magnitude of difference and there is a fixed zero point. Ratios can be calculated. Examples include age, income, price, costs, sales revenue, sales volume, and market share. (See Ratio scale)

## Scale construction decisions

• What level of data is involved (nominal, ordinal, interval, or ratio)?
• What will the results be used for?
• Should you use a scale, index, or typology?
• What types of statistical analysis would be useful?
• Should you use a comparative scale or a noncomparative scale?
• How many scale divisions or categories should be used (1 to 10; 1 to 7; -3 to +3)?
• Should there be an odd or even number of divisions? (Odd gives neutral center value; even forces respondents to take a non-neutral position.)
• What should the nature and descriptiveness of the scale labels be?
• What should the physical form or layout of the scale be? (graphic, simple linear, vertical, horizontal)
• Should a response be forced or be left optional?

## Comparative scaling techniques

• Pairwise comparison scale - a respondent is presented with two items at a time and asked to select one (example: Do you prefer Pepsi or Coke?). This is an ordinal level technique when a measurement model is not applied. Krus and Kennedy (1977) elaborated the paired comparison scaling within their domain-referenced model. The Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959) can be applied in order to derive measurements, provided the data derived from paired comparisons possess an appropriate structure. Thurstone's law of comparative judgment can also be applied in such contexts.
• Rasch model scaling - respondents interact with items and comparisons are inferred between items from the responses to obtain scale values. Respondents are subsequently also scaled based on their responses to items given the item scale values. The Rasch model has a close relation to the BTL model.
• Rank-order scale - a respondent is presented with several items simultaneously and asked to rank them (example: Rank the following advertisements from 1 to 10.). This is an ordinal level technique.
• Constant sum scale - a respondent is given a constant sum of money, scrip, credits, or points and asked to allocate these to various items (example: If you had 100 yen to spend on food products, how much would you spend on product A, on product B, on product C, etc.). This is an ordinal level technique.
• Bogardus social distance scale - measures the degree to which a person is willing to associate with a class or type of people. It asks how willing the respondent is to make various associations. The results are reduced to a single score on a scale. There are also non-comparative versions of this scale.
• Q-Sort scale - Up to 140 items are sorted into groups based on a rank-order procedure.
• Guttman scale - This is a procedure to determine whether a set of items can be rank-ordered on a unidimensional scale. It utilizes the intensity structure among several indicators of a given variable. Statements are listed in order of importance. The rating is scaled by summing all responses until the first negative response in the list. The Guttman scale is related to Rasch measurement; specifically, Rasch models bring the Guttman approach within a probabilistic framework.
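As an illustration of how the BTL model mentioned in the list above can turn paired-comparison counts into scale values, the following Python sketch estimates one strength per item so that the probability of preferring item i over item j is modeled as p_i / (p_i + p_j). The fitting routine is a simple iterative (minorization-maximization style) update chosen for brevity, not necessarily the procedure used in the cited works; the function name `fit_btl` and the win counts are invented for the example, and the sketch assumes every item wins at least one comparison.

```python
def fit_btl(wins, iterations=200):
    """Estimate BTL strengths from a matrix of paired-comparison counts.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


# Hypothetical counts for three items A, B, C (10 comparisons per pair).
wins = [
    [0, 7, 9],  # A preferred over B 7 times, over C 9 times
    [3, 0, 6],  # B preferred over A 3 times, over C 6 times
    [1, 4, 0],  # C preferred over A once, over B 4 times
]
strengths = fit_btl(wins)
print([round(s, 3) for s in strengths])

# Modeled probability that A is preferred over C.
print(round(strengths[0] / (strengths[0] + strengths[2]), 3))
```

Unlike the raw choices, which only order the items, the fitted strengths place them on a common continuum, provided the model fits the data adequately.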

## Non-comparative scaling techniques

• Borg scale
• Continuous rating scale (also called the graphic rating scale) - respondents rate items by placing a mark on a line. The line is usually labeled at each end. There is sometimes a series of numbers, called scale points (say, from zero to 100), under the line. Scoring and codification are difficult.
• Likert scale - Respondents are asked to indicate the amount of agreement or disagreement (from strongly agree to strongly disagree) on a five- or seven-point scale. The same format is used for multiple questions.
• Phrase completion scales - Respondents are asked to complete a phrase on an 11-point response scale in which 0 represents the absence of the theoretical construct and 10 represents the theorized maximum amount of the construct being measured. The same basic format is used for multiple questions.
• Semantic differential scale - Respondents are asked to rate an item on various attributes using a seven-point scale. Each attribute requires a scale with bipolar terminal labels.
• Stapel scale - This is a unipolar ten-point rating scale. It ranges from +5 to -5 and has no neutral zero point.
• Thurstone scale - This is a scaling technique that incorporates the intensity structure among indicators.
• Mathematically derived scale - Researchers infer respondents’ evaluations mathematically. Two examples are multidimensional scaling and conjoint analysis.
• Visual analogue scale
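As an illustration of the Likert format listed above, the following Python sketch codes five-point agreement responses as numbers and combines them into a single score. The 1 to 5 coding, the wording of the middle category, and the use of a simple sum are common conventions assumed here for the example; they are not prescribed by the description above.

```python
# Assumed numeric coding for a five-point agreement scale.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def likert_total(answers):
    """Sum the coded responses to several items that share the same format."""
    return sum(LIKERT_5[answer.lower()] for answer in answers)


# Hypothetical answers from one respondent to four statements.
answers = ["agree", "strongly agree", "neither agree nor disagree", "disagree"]
print(likert_total(answers))  # 4 + 5 + 3 + 2 = 14
```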

## Scale evaluation

Scales should be tested for reliability, generalizability, and validity. Generalizability is the ability to make inferences from a sample to the population, given the scale you have selected. Reliability is the extent to which a scale will produce consistent results. Test-retest reliability checks how similar the results are if the research is repeated under similar circumstances. Alternative forms reliability checks how similar the results are if the research is repeated using different forms of the scale. Internal consistency reliability checks how well the individual measures included in the scale are converted into a composite measure.
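Two of these reliability checks can be sketched in Python. Test-retest reliability is illustrated with a Pearson correlation between two administrations, and internal consistency with Cronbach's alpha. Both statistics are common choices assumed here rather than ones named above, and all scores are invented for the example.

```python
from math import sqrt
from statistics import variance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical scale totals for five respondents measured twice.
first_round = [13, 7, 14, 9, 5]
second_round = [12, 8, 14, 10, 6]
print(round(pearson(first_round, second_round), 3))

# Hypothetical item-level scores (three items) for the same respondents.
item_scores = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]
print(round(cronbach_alpha(item_scores), 3))  # 0.963
```

Values close to 1 on either statistic suggest consistent results, although what counts as acceptable depends on the application.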

Scales and indexes have to be validated. Internal validation checks the relation between the individual measures included in the scale, and the composite scale itself. External validation checks the relation between the composite scale and other indicators of the variable, indicators not included in the scale. Content validation (also called face validity) checks how well the scale measures what it is supposed to measure. Criterion validation checks how meaningful the scale criteria are relative to other possible criteria. Construct validation checks what underlying construct is being measured. There are three variants of construct validity. They are convergent validity, discriminant validity, and nomological validity (Campbell and Fiske, 1959; Krus and Ney, 1978). The coefficient of reproducibility indicates how well the data from the individual measures included in the scale can be reconstructed from the composite scale.
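A coefficient of reproducibility for a Guttman-type scale can be sketched as follows. The sketch assumes one common convention: each respondent's observed yes/no pattern is compared with the ideal cumulative pattern implied by that respondent's total score, and the coefficient is 1 minus the share of mismatched responses. Other error-counting rules exist, and the data are invented.

```python
def coefficient_of_reproducibility(rows):
    """rows holds one list of 0/1 item responses per respondent."""
    n_items = len(rows[0])
    # Order items from most to least frequently endorsed.
    endorsements = [sum(row[j] for row in rows) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -endorsements[j])

    errors = 0
    for row in rows:
        score = sum(row)
        observed = [row[j] for j in order]
        # Ideal pattern: the `score` most endorsed items are answered yes.
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(o != i for o, i in zip(observed, ideal))
    return 1 - errors / (len(rows) * n_items)


# Hypothetical responses: the last respondent deviates from the ideal pattern.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
]
print(round(coefficient_of_reproducibility(data), 3))  # 0.867
```

Here two of the fifteen responses cannot be reconstructed from the total scores, giving 1 - 2/15.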