International Conference on Computational Social Science, Helsinki, 11 Jun 2015, #iccss2015

Challenges with (big) data

More data means more complexity.

Data points have dependencies and hierarchies.

Data is noisy and partly missing.

Conclusions based on raw data are often misleading.

raw data

Probabilistic modeling

Helps in handling missing data, uncertainty and dependencies.
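
As a rough illustration of the idea (not the model from the talk), the sketch below uses a conjugate normal model to show how a prior stabilises estimates for groups with little or no data. The function name `posterior_mean_sd`, the group labels and all numbers are hypothetical, and the noise standard deviation is assumed known for simplicity.

```python
from statistics import fmean

def posterior_mean_sd(obs, prior_mean, prior_sd, noise_sd):
    """Posterior mean and sd of a group mean (normal prior, normal noise, known noise_sd)."""
    n = len(obs)
    prior_prec = 1.0 / prior_sd ** 2      # precision (1 / variance) of the prior
    data_prec = n / noise_sd ** 2         # precision contributed by the n observations
    post_prec = prior_prec + data_prec
    data_mean = fmean(obs) if n else 0.0  # unused when n == 0 because data_prec is 0
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec ** -0.5

# Hypothetical toy groups: well observed, one observation, no observations at all.
groups = {"A": [10.2, 9.8, 10.5, 10.1, 9.9, 10.3], "B": [14.0], "C": []}
for name, obs in groups.items():
    mean, sd = posterior_mean_sd(obs, prior_mean=10.0, prior_sd=2.0, noise_sd=1.0)
    print(f"group {name}: n={len(obs)}  posterior mean={mean:.2f}  sd={sd:.2f}")
```

The group with no data still gets an estimate (the prior, with wide uncertainty), and the single outlying observation in group B is pulled towards the prior mean instead of being taken at face value.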

Example: Model of regional apartment prices in Finland

Probabilistic modeling (2)

Makes interesting and reliable findings possible.

Example: Clear urbanisation trend visible

Probabilistic programming

Automated inference for probabilistic models

  • problem → model → inference → results
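  • Stan for full Bayesian statistical inference:
      model {
          y ~ normal(x, sigma);     // likelihood: y is normally distributed around x
          x ~ normal(0, 2);         // prior on the location x
          sigma ~ uniform(0, 10);   // prior on the noise scale sigma
      }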
  • rapid iterative model development
  • towards big data applications with efficient approximate solutions
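
To make the "inference" step concrete, here is a minimal sketch of what a probabilistic programming system automates for the model above. It uses a plain random-walk Metropolis sampler, which is not the algorithm Stan itself uses (Stan relies on Hamiltonian Monte Carlo); the observations `ys`, the step size and the function names are hypothetical choices made for illustration.

```python
import math
import random

def log_posterior(x, sigma, ys):
    """Unnormalised log posterior of the model on the slide."""
    if not 0.0 < sigma < 10.0:             # sigma ~ uniform(0, 10)
        return -math.inf
    log_prior_x = -0.5 * (x / 2.0) ** 2    # x ~ normal(0, 2)
    log_lik = sum(-math.log(sigma) - 0.5 * ((y - x) / sigma) ** 2
                  for y in ys)             # y ~ normal(x, sigma)
    return log_prior_x + log_lik

def metropolis(ys, n_iter=20000, step=0.3, seed=1):
    """Random-walk Metropolis over (x, sigma); returns the second half of the chain."""
    rng = random.Random(seed)
    x, sigma = 0.0, 1.0
    lp = log_posterior(x, sigma, ys)
    draws = []
    for _ in range(n_iter):
        x_new = x + rng.gauss(0.0, step)
        sigma_new = sigma + rng.gauss(0.0, step)
        lp_new = log_posterior(x_new, sigma_new, ys)
        # Accept with probability min(1, posterior ratio); otherwise keep the current state.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, sigma, lp = x_new, sigma_new, lp_new
        draws.append((x, sigma))
    return draws[n_iter // 2:]             # discard the first half as warm-up

ys = [1.2, 0.7, 1.9, 1.1, 0.4, 1.5]        # hypothetical observations
draws = metropolis(ys)
print("posterior mean of x:    ", sum(d[0] for d in draws) / len(draws))
print("posterior mean of sigma:", sum(d[1] for d in draws) / len(draws))
```

With a tool such as Stan, only the model block has to be written; the sampler, its tuning and convergence diagnostics come from the system, which is what makes rapid iteration on the model practical.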

Conclusions