Monday, February 02, 2009

Large and small data set modelling

The problem of modelling with large and small data sets is basically the distinction between population and sample. Most of statistics rests on the idea that we only have a sample of the population. However, with new data collection methods we basically get the whole lot. This has led to a loss of sophistication in models, since simple hypotheses can be tested just by aggregating the data in a table (say). Still, I think it's important that sample-based models stay with us, because they provide insightful statistics (in the sense of a statistic, a quantity estimated from the data) which tell us something about the data.
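
To make the contrast concrete, here is a rough sketch in Python. The simulated "population", the sample size of 200, and the helper name `sample_summary` are all my own assumptions for illustration, not real data. With the whole population the mean is just an aggregate; with a sample we also get a standard error and an approximate interval, which is the kind of statistic I mean.

```python
import math
import random
import statistics


def sample_summary(values):
    """Return (mean, standard error of the mean) for a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    # Sample standard deviation uses the n - 1 denominator.
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical "population": pretend every record was collected.
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# With the whole lot, the mean is a plain aggregate; no inference involved.
population_mean = statistics.fmean(population)

# With only a sample, the estimate comes with a measure of uncertainty.
sample = rng.sample(population, 200)
sample_mean, sample_se = sample_summary(sample)

# Approximate 95% interval, assuming the sample mean is roughly normal.
interval = (sample_mean - 1.96 * sample_se, sample_mean + 1.96 * sample_se)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean: {sample_mean:.2f} (standard error {sample_se:.2f})")
print(f"approximate 95% interval: {interval[0]:.2f} to {interval[1]:.2f}")
```

The aggregate answers "what is the mean?", while the sample version also says how far to trust the answer, and that second part is what gets lost when everything is reduced to a table.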

