# Observational Studies

## 8. Formal Representation

For any given observational unit i, let

(1) x_{i} = T_{i} + β_{i} + ε_{i},

where x_{i} is the measurement, T_{i} is the "true" value, β_{i} is the bias, and ε_{i} is the chance error. The ε_{i} is assumed to be drawn at random from a distribution with a mean of zero, so that on average the noise cancels out. Under this simple model it is also unrelated to T_{i} and β_{i}: larger values of ε_{i} are not more likely when T_{i} or β_{i} is larger. In short, if β_{i} is not zero, and especially if β_{i} is large, the measurement can be substantially biased. There are also problems if ε_{i} is large, because reliability will then tend to be low. Ideally, both β_{i} and ε_{i} should be small.
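Equation (1) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate_measurements`, the constant bias of 5, the normally distributed chance error with standard deviation 2, and the uniformly distributed "true" values are all hypothetical choices, not part of the model itself.

```python
import random
import statistics


def simulate_measurements(true_values, bias, noise_sd, seed=0):
    """Return x_i = T_i + bias + e_i for each T_i.

    Assumptions (hypothetical): the bias is the same for every unit, and
    the chance error e_i is drawn from Normal(0, noise_sd) independently
    of T_i and of the bias.
    """
    rng = random.Random(seed)
    return [t + bias + rng.gauss(0.0, noise_sd) for t in true_values]


# Hypothetical "true" values for 10,000 observational units.
rng = random.Random(42)
true_values = [rng.uniform(50.0, 150.0) for _ in range(10_000)]

measured = simulate_measurements(true_values, bias=5.0, noise_sd=2.0)

# In a simulation T_i is known, so the total error x_i - T_i can be inspected.
errors = [x - t for x, t in zip(measured, true_values)]
mean_error = statistics.fmean(errors)   # close to the bias
error_sd = statistics.stdev(errors)     # close to the spread of the chance error

print(f"mean error: {mean_error:.2f}")
print(f"sd of error: {error_sd:.2f}")
```

Because the chance error averages out, the mean of x_{i} - T_{i} settles near the bias, while the spread of the errors reflects the size of ε_{i}. In real data T_{i} is not observed, so this check is available only in simulation.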

There are additional problems if ε_{i} is related to β_{i} or T_{i}, or if β_{i} is related to T_{i}. For example, if the size of the bias is related to the size of the "true" value, it can be difficult to obtain a good fix on the bias. Thus, in areas where there is a lot of crime, people may be less likely to report it. They may believe there is no point or that there could be retaliation. One result is that the underreporting of crime can be higher in high-crime neighborhoods. The bias is therefore not constant: the size of β_{i} depends on the size of T_{i}. This is a major complication not captured by the simple equation above.
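The crime-reporting example can be sketched in the same way. Here the bias is assumed, purely for illustration, to be a fixed fraction of the "true" value, β_{i} = -0.3 T_{i}, so that the shortfall grows with T_{i}. The function name `measure_with_dependent_bias`, the 30% shortfall rate, and the noise level are hypothetical.

```python
import random
import statistics


def measure_with_dependent_bias(true_values, shortfall_rate, noise_sd, seed=0):
    """Return x_i = T_i + b_i + e_i, where b_i = -shortfall_rate * T_i.

    Assumption (hypothetical): a fixed fraction of the true value goes
    unreported, so the bias is larger in absolute size when T_i is larger.
    """
    rng = random.Random(seed)
    return [t - shortfall_rate * t + rng.gauss(0.0, noise_sd) for t in true_values]


# Hypothetical "true" crime counts for 10,000 neighborhoods.
rng = random.Random(7)
true_crime = [rng.uniform(10.0, 200.0) for _ in range(10_000)]
reported = measure_with_dependent_bias(true_crime, shortfall_rate=0.3, noise_sd=5.0)

# Split neighborhoods into low-crime and high-crime halves by true value.
pairs = sorted(zip(true_crime, reported))
half = len(pairs) // 2
low_error = statistics.fmean(x - t for t, x in pairs[:half])
high_error = statistics.fmean(x - t for t, x in pairs[half:])

print(f"mean error, low-crime half:  {low_error:.1f}")
print(f"mean error, high-crime half: {high_error:.1f}")
```

The average error is negative in both halves but much larger in absolute size in the high-crime half, so no single correction for bias would fit every unit.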

For an observational study, the following information should be reported for each measure:

- The definition of what is being measured
- A formal representation of how the measurement process is assumed to function
- Justification for that representation
- Any information on validity
- Any information on reliability