Out in the open

Introducing the blog

Now that I’ve managed to kick the blog off with two events posts, I’d also like to introduce the ideas and hopes I have for it. As the name indicates, the main themes will be open knowledge and data science (defining these is a topic for another post). Naturally these stem from my interests and the things I have done, which you can read more about on my homepage.

Why did I start a blog? First of all, I will use the blog to organize my thoughts on open knowledge and data science. I have been following and thinking about these topics a lot during my PhD studies, and also writing quite a lot of notes to myself. However, I never had the guts to share many of my thoughts. I regret that now, but it’s never too late to begin. I believe sharing ideas will only lead to their improvement, and there appear to be many other benefits in blogging, as nicely summarised by Drew Conway.

So blogging is a natural step in my pursuit of openness, which started with learning the open source statistical programming language R. It is the de facto language in bioinformatics, which was the main application field of my master’s and PhD theses. R benefits from a huge community of people who contribute their code in the form of packages and blog about what they have done in the R-bloggers blog feed. When asked what a newcomer to R should do, R guru David Smith recommends blogging. I’m no longer a newcomer, but I’m still eager to follow the advice and to learn more by writing and sharing.

One goal I wish to achieve is to make more data scientists aware of the opportunities in the emerging masses of open data. The datasets being opened often concern society, but unfortunately most social scientists do not possess the skills to put the data to use. Hence there is a huge need and opportunity for data scientists to start tackling some of the biggest societal issues we are facing with the help of open data.

This is where our rOpenGov project helps to bridge the gap by making open data easily available for further analysis. In this blog, I hope to do my own part to advance open data science, by which I mean both data science performed openly and data science with open data.