Observational Studies

8. Formal Representation

For any given observational unit i, let

$$x_i = T_i + \beta_i + \varepsilon_i, \tag{1}$$

where $x_i$ is the measurement, $T_i$ is the "true" value, $\beta_i$ is the bias, and $\varepsilon_i$ is the chance error. The $\varepsilon_i$ is assumed to have been generated at random from a distribution with a mean of zero, so that, on average, the noise cancels out. Under this simple model, $\varepsilon_i$ is also unrelated to $T_i$ or $\beta_i$. For example, larger values of $\varepsilon_i$ are not more likely when $T_i$ or $\beta_i$ is larger. Because the noise averages out, any systematic error comes from $\beta_i$: if $\beta_i$ is not zero, and especially if it is large, the measurements can be substantially biased. There are also problems if $\varepsilon_i$ tends to be large, because reliability will then tend to be low. Ideally, both $\beta_i$ and $\varepsilon_i$ should be small.
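A small simulation can make the roles of the terms concrete. The sketch below is an illustration only, under stated assumptions: a single unit with a fixed true value, a constant bias, and normally distributed noise that is independent of both. The function name `simulate_measurements` and all numeric values are hypothetical. Averaged over many draws, the error x_i - T_i should settle near the bias, while the spread of the measurements should track the noise level.

```python
import random
import statistics


def simulate_measurements(true_value, bias, noise_sd, n, seed=0):
    """Hypothetical sketch of x_i = T_i + beta_i + eps_i.

    T_i and beta_i are held constant, and eps_i is drawn from a normal
    distribution with mean zero, independently of T_i and beta_i.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


true_value, bias = 100.0, 5.0  # illustrative values, not taken from the text
for noise_sd in (1.0, 20.0):
    xs = simulate_measurements(true_value, bias, noise_sd, n=10_000)
    mean_error = statistics.fmean(x - true_value for x in xs)  # close to the bias
    spread = statistics.stdev(xs)  # close to noise_sd
    print(f"noise sd {noise_sd:>4}: mean error {mean_error:.2f}, spread {spread:.2f}")
```

Raising `noise_sd` widens the spread of the measurements (lower reliability) but leaves the mean error near the bias; only changing `bias` moves it.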

There are additional problems if the terms on the right-hand side are related to one another, for instance if $\varepsilon_i$ is related to $\beta_i$ or $T_i$, or if $\beta_i$ is related to $T_i$. If the size of the bias depends on the size of the "true" value, it can be difficult to obtain a good fix on the bias. For example, in areas where there is a lot of crime, people may be less likely to report it, because they believe there is no point or fear retaliation. As a result, the underreporting of crime can be higher in high-crime neighborhoods. The bias is therefore not constant: the size of $\beta_i$ depends on the size of $T_i$. This is a major complication not captured by the simple equation above.
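The crime example can be sketched the same way. The code below is a hypothetical illustration, not a model from the text: it assumes a reporting rate that falls linearly as true crime rises (the functions `reporting_rate`, `expected_bias`, and `observed_count` and their coefficients are invented for this sketch), so that the bias $\beta_i$ becomes a function of $T_i$ while $\varepsilon_i$ remains independent noise. The estimated bias is small in the low-crime area and large in the high-crime area, so no single adjustment removes it.

```python
import random
import statistics


def reporting_rate(true_count):
    """Hypothetical share of crimes that get reported.

    Falls as true crime rises, with a floor of 30 percent; this functional
    form is an assumption made for illustration only.
    """
    return max(0.3, 0.9 - 0.005 * true_count)


def expected_bias(true_count):
    """beta_i = reported - true, so the bias depends on T_i."""
    return true_count * reporting_rate(true_count) - true_count


def observed_count(true_count, noise_sd, rng):
    """x_i = T_i + beta_i(T_i) + eps_i, with eps_i still independent noise."""
    return true_count + expected_bias(true_count) + rng.gauss(0.0, noise_sd)


rng = random.Random(1)
estimated_bias = {}
for true_count in (10, 50, 100):  # low-, medium-, and high-crime areas
    xs = [observed_count(true_count, 2.0, rng) for _ in range(5_000)]
    estimated_bias[true_count] = statistics.fmean(xs) - true_count
    print(f"T = {true_count:>3}: estimated bias {estimated_bias[true_count]:.1f}")

# Adjusting every area by the same average bias overstates crime in the
# low-crime area and understates it in the high-crime area.
constant_correction = statistics.fmean(estimated_bias.values())
print(f"single average correction: {constant_correction:.1f}")
```

Because the bias grows with the true value, any fix has to model how underreporting changes with $T_i$ rather than subtract one constant.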

For an observational study, the following information should be reported for each measure:

  1. The definition of what is being measured
  2. A formal representation of how the measurement process is assumed to function
  3. Justification for that representation
  4. Any information on validity
  5. Any information on reliability