Programmatic Content Management course @TampereUniTech, 28 Apr 2015, #ohsiha
"Data Science is statistics on a Mac" -Big Data Borat on Twitter
"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…" -Dan Ariely on Facebook
"Data science is the process of formulating a quantitative question that can be answered with data, collecting and cleaning the data, analyzing the data, and communicating the answer to the question to a relevant audience." -simply stats
There are multiple definitions from various points of view, each at least a partially relevant piece of the whole.
Important questions:
It is important to understand at least the basics of data science, statistics & algorithms, because
Improved personalised services
Nate Silver predicting the Presidential election results in 2012
Using Satellite Images to Understand Poverty
Using big data to prevent homelessness in New York
Factor analysis of election machine results
Studies at Aalto (formerly Helsinki University of Technology)
Data scientist at Reaktor
Open tools for open data: Louhos & rOpenGov
Interests: probabilistic (Bayesian) modeling, information visualization, open source/data/science
I like: solving hard problems in various fields, learning and sharing
Read more at Kaupunkifillari: Pyöräily on arkista touhua and check the code at GitHub.
Done by Johan Himberg, read more on the Louhos blog!
Data-driven solutions for any business problem
Focus on statistical modeling
Tough, non-standard problems
Open source tools
Consulting!
Individual or team?!
According to interviews and expert estimates, 50-80 % of data scientists' time is spent on handcrafted work (data "wrangling/munging"). -New York Times 18.8.2014
Exploratory data analysis
Probabilistic modeling
Hans Rosling communicating facts about the world
data science tools & products
< statistical methods
< data access & munging
< business case
< getting to production
But! None of these matter unless you have a data-driven mindset…
"Culture eats strategy for breakfast"
Big data is (OCCAM)
Conclusion: What matters is not the amount of data, but how it was collected!
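The point about data collection can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population, the sample sizes and the selection rule (inclusion probability growing with the measured value) are all hypothetical and chosen just to make the effect visible. It compares the error of a small random sample with that of a much larger self-selected one.

```python
import random
import statistics


def simulate(seed=2015):
    """Compare a small random sample with a large self-selected one."""
    rng = random.Random(seed)

    # Hypothetical population of 100,000 values (say, minutes online per day).
    population = [rng.gauss(50, 15) for _ in range(100_000)]
    true_mean = statistics.fmean(population)

    # Small but properly randomised sample.
    random_sample = rng.sample(population, 500)

    # Large but biased sample: the chance of being included grows with the
    # value itself, as when heavy users are more likely to end up in the logs.
    biased_sample = [
        x for x in population
        if rng.random() < min(1.0, max(0.0, x / 100))
    ]

    return {
        "true_mean": true_mean,
        "random_n": len(random_sample),
        "random_error": statistics.fmean(random_sample) - true_mean,
        "biased_n": len(biased_sample),
        "biased_error": statistics.fmean(biased_sample) - true_mean,
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions the biased sample is roughly a hundred times larger, yet its estimate of the mean is further from the truth, because more rows do not remove a systematic selection effect.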
Recap: Understand statistics, because
Also
Make your analyses transparent and reproducible! A PDF report is not that!
Reproducible R scripts
Open source tools are replacing commercial ones in many DS tasks
Open data sets offer an excellent playground for learning new stuff!
Also check out IPython notebooks!
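As one possible reading of "transparent and reproducible", here is a minimal sketch in Python (the slides mention R scripts; the same idea carries over). The function name run_analysis, the example numbers and the choice of a bootstrap interval are illustrative assumptions, not something taken from the lecture. The idea is that the script fixes its random seed and records the seed, the interpreter version and a fingerprint of the input next to the result, so that anyone can rerun it and get the same numbers.

```python
import hashlib
import json
import platform
import random
import statistics


def run_analysis(data, seed=42, n_boot=1000):
    """Bootstrap a 95 % interval for the mean and return it with provenance."""
    rng = random.Random(seed)  # fixed seed: reruns give identical resamples

    # Resample the data with replacement and collect the resampled means.
    boot_means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_boot)
    )

    return {
        # Provenance: enough to check that a rerun used the same inputs.
        "seed": seed,
        "python": platform.python_version(),
        "data_sha256": hashlib.sha256(json.dumps(data).encode()).hexdigest(),
        # Results.
        "mean": statistics.fmean(data),
        "ci95": [
            boot_means[int(0.025 * n_boot)],
            boot_means[int(0.975 * n_boot) - 1],
        ],
    }


if __name__ == "__main__":
    measurements = [3.1, 4.7, 5.0, 2.8, 6.2, 4.4, 5.9, 3.6]  # made-up values
    print(json.dumps(run_analysis(measurements), indent=2))
```

Keeping such a script under version control together with the data (or a link to it) is what lets a reader check the numbers, which a finished report alone does not.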
Helsinki
Tampere?
Other