We can break up any statistical problem into three steps:
- Data collection and sampling.
- Data analysis.
- Decision making.
It is well understood that step 1 typically requires some thought about steps 2 and 3: only when you have a sense of what you will do with your data can you make decisions about where, when, and how accurately to take your measurements.
However, the relevance of step 3 to step 2 is perhaps less well understood. In many statistics textbooks, the steps of data analysis and decision making are kept separate: we first discuss how to analyze the data, with the general goal being the production of some inferences that can be fed into any decision analysis. But your decision plans may very well influence your analysis.
Here are two ways this can happen:
- Precision. If you know ahead of time you only need to estimate a parameter to within an uncertainty of 0.1 (on some scale), say, and you have a simple analysis method that will give you this precision, you can just go simple and stop. This sort of thing occurs all the time.
- Relevance. If you know that a particular variable is relevant to your decision making, you should not sweep it aside, even if it is not statistically significant (or, to put it Bayesianly, even if you cannot express much certainty in the sign of its coefficient).
Conversely, a variable that is not relevant to decisions could be excluded from the analysis (possibly for reasons of cost, convenience, or stability), in which case you’d interpret inferences as implicitly averaging over some distribution of that variable.
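To make the precision point concrete, here is a minimal sketch in Python. It assumes that "uncertainty" means the standard error of a sample mean and that the measurements are independent; the function names, the 0.1 target, and the data are all made up for illustration.

```python
import math
import statistics


def simple_estimate(values):
    """Return the sample mean and its standard error (assumes independent draws)."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se


def precise_enough(se, target=0.1):
    """True if the standard error is within the precision the decision needs."""
    return se <= target


# Hypothetical measurements, for illustration only.
measurements = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9]
mean, se = simple_estimate(measurements)

if precise_enough(se):
    print(f"Simple estimate suffices: {mean:.2f} +/- {se:.2f}")
else:
    print("Not precise enough: consider more data or a more careful model")
```

The check itself is trivial; the point is that the target comes from the decision problem, not from the analysis. A different decision would set a different threshold, and with it a different stopping point for the modeling effort.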