Title: New data sources for statistics: Experiences at Statistics Netherlands
Workshop: Workshop 2011
Year: 2011

Traditionally, National Statistical Institutes (NSIs) have used sample surveys to collect data on persons, businesses, and all kinds of social and economic phenomena. Over the last 30 years, however, more and more NSIs have gradually been replacing these primary sources of information with data from secondary sources such as administrative registers. The increased use of registers is mainly driven by the wish to reduce the response burden on data providers as much as possible and by the desire to produce statistics of sufficient quality cost-efficiently.

Apart from administrative registers, however, there are other types of secondary data sources in the world around us that could potentially provide data of interest to producers of statistics. Nowadays, more and more information is processed and stored by the ubiquitous electronic devices surrounding us, such as route planners and mobile phones. In addition, the ever-increasing use of the internet causes more and more persons (and companies) to leave a digital footprint on the web. All of these sources of information are of potential interest to NSIs. They could assist in the production of statistics in the same way as registers, and could even provide information describing new social and economic phenomena!

To get a grip on the practical and methodological implications, a number of ‘new’ secondary data sources were studied at Statistics Netherlands. Some of these sources could be used immediately in the traditional statistical process; for others, this is clearly a topic for the future. The data sources and their potential use are introduced and described below.

Product prices on the internet:

The internet, especially the World Wide Web, is a very interesting source of pricing information. At the moment, the Consumer Price Index department of Statistics Netherlands collects data from the web on articles such as airline fares, books, CDs, and DVDs. These data are collected by statisticians who visit websites, look for the relevant data, and store them in a local database. However, this information could also be collected automatically, and at more frequent intervals, by programs popularly referred to as web crawlers or internet bots. The results of studies into such automated collection are discussed.
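A minimal sketch of the extraction step such a bot performs, written in Python with only the standard library, might look as follows. It assumes, hypothetically, that each price on a page sits in an element with the class `price` (with no nested tags inside it) and is written in the Dutch format, for example `€ 1.299,95`; real websites differ, and each would need its own extraction rule. Fetching pages (for instance with `urllib.request.urlopen`) and scheduling repeated visits are left out, so the sketch runs offline on an HTML string. The names `PriceParser`, `parse_dutch_price`, and `extract_prices` are illustrative only and do not describe the system used at Statistics Netherlands.

```python
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of elements whose class attribute contains 'price'.

    The class name 'price' is a hypothetical convention; it assumes the
    price element contains plain text only (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self._in_price = False  # True while inside a price element
        self.raw_prices = []    # raw text of each price element found

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True
            self.raw_prices.append("")

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.raw_prices[-1] += data


def parse_dutch_price(text):
    """Convert a Dutch-formatted price such as '€ 1.299,95' to a Decimal.

    Dutch notation uses '.' for thousands and ',' for decimals.
    """
    digits = "".join(ch for ch in text if ch.isdigit() or ch in ",.")
    return Decimal(digits.replace(".", "").replace(",", "."))


def extract_prices(html):
    """Return all prices found in an HTML page, in document order."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return [parse_dutch_price(raw) for raw in parser.raw_prices]


if __name__ == "__main__":
    page = """
    <ul>
      <li><span class="title">Boek A</span> <span class="price">€ 12,95</span></li>
      <li><span class="title">Boek B</span> <span class="price">€ 1.299,00</span></li>
    </ul>
    """
    print(extract_prices(page))  # [Decimal('12.95'), Decimal('1299.00')]
```

Prices are returned as `Decimal` values rather than floats so that amounts such as 12,95 are stored exactly before they enter a price database.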

Mobile phone location data:

Mobile phones are now ubiquitous. In the Netherlands, around 92 percent of people use a mobile phone on a regular basis. Many of them carry their phone with them and use it many times during the day. Every time someone communicates via their mobile phone, the signal is sent to a nearby mobile phone mast. These connections provide a source of location information. Much of the activity associated with handling phone traffic is logged by the mobile phone company for billing and network optimization. However, these logs also potentially provide information on the location and movement behaviour of people during the day. We investigated these possibilities.
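To illustrate how connection records can be turned into location information, the following Python sketch assumes a hypothetical, simplified record layout of (timestamp, anonymised phone identifier, mast identifier) together with a table of mast coordinates. It orders each phone's records in time and sums the great-circle (haversine) distances between consecutively used masts as a rough indication of movement. The record layout, the names `build_trajectories` and `path_length_km`, and the use of mast coordinates as a stand-in for the caller's position are all assumptions; actual operator data are structured differently and locate a phone only to within the area served by a mast.

```python
from collections import defaultdict
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def build_trajectories(records, masts):
    """Group hypothetical (timestamp, phone_id, mast_id) records per phone.

    Each mast is replaced by its coordinates from `masts`; records for
    unknown masts are skipped. Returns {phone_id: [(datetime, lat, lon), ...]}
    sorted by time.
    """
    tracks = defaultdict(list)
    for timestamp, phone_id, mast_id in records:
        if mast_id not in masts:
            continue
        lat, lon = masts[mast_id]
        tracks[phone_id].append((datetime.fromisoformat(timestamp), lat, lon))
    for points in tracks.values():
        points.sort()
    return dict(tracks)


def path_length_km(points):
    """Sum of distances between consecutive mast positions of one phone."""
    return sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(points, points[1:])
    )
```

Because a phone can switch masts while standing still, and produces no records while it is not in use, such distances can both overstate and understate actual movement; this is one of the methodological issues these data raise.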

Twitter text messages:

More and more people are using social media to discuss topics of interest, exchange information, and contact friends and family. Examples of social media are Facebook, LinkedIn, and Twitter. Twitter in particular could be a very interesting source of information, because many of its messages are publicly available. To study the content and usability of the information exchanged by Dutch inhabitants, Twitter messages were collected and analyzed. Our main interest was classifying the topics discussed in the Netherlands.
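A very simple baseline for assigning topics to messages is keyword matching. The Python sketch below, which is not the classification method used in the study, counts how many words of a message occur in a hand-made keyword list per topic and assigns the topic with the most hits, or `other` when nothing matches. The topics and Dutch keywords in `TOPIC_KEYWORDS`, as well as the function names, are hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical topic vocabularies (Dutch); a real study would derive
# these from the collected messages rather than write them by hand.
TOPIC_KEYWORDS = {
    "weather": {"regen", "zon", "sneeuw", "weer", "koud"},
    "traffic": {"file", "trein", "vertraging", "snelweg"},
    "sports": {"voetbal", "ajax", "psv", "wedstrijd"},
}

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware word tokens


def classify_tweet(text, topics=TOPIC_KEYWORDS):
    """Return the topic whose keywords occur most often in `text`, else 'other'."""
    tokens = TOKEN_RE.findall(text.lower())
    scores = Counter()
    for token in tokens:
        for topic, words in topics.items():
            if token in words:
                scores[topic] += 1
    if not scores:
        return "other"
    return scores.most_common(1)[0][0]


def topic_distribution(tweets):
    """Count how many messages fall into each topic."""
    return Counter(classify_tweet(t) for t in tweets)
```

Keyword lists are easy to inspect but miss synonyms, spelling variants, and irony, so they mainly serve as a reference point for more elaborate classifiers.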

Traffic loop and route planner information:

Statistics Netherlands publishes quantified traffic data, among other things to support policymakers. Currently these data are predominantly collected from transport companies by questionnaires with a relatively high response burden. For vehicles equipped with a route planner, global positioning data could provide very detailed information on their whereabouts. If these data were combined with traffic loop detector information, traffic flows could also be modelled. This could be a good alternative to requesting route information from transport companies. We looked into these possibilities.
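For the loop-detector side, the basic quantities can be sketched as follows. The Python example assumes a hypothetical reading format (vehicle count and mean speed per counting interval), converts the count to an hourly flow, and derives density from the fundamental relation of traffic flow, flow = density × speed. The `LoopReading` type and the function names are illustrative, and combining such readings with route-planner positions, as suggested above, is not shown.

```python
from dataclasses import dataclass


@dataclass
class LoopReading:
    """One counting interval from a single loop detector (hypothetical format)."""
    interval_s: int        # length of the counting interval in seconds
    vehicle_count: int     # vehicles that passed the loop in that interval
    mean_speed_kmh: float  # mean speed of those vehicles in km/h


def flow_per_hour(reading):
    """Flow q in vehicles per hour, scaled up from the interval count."""
    return reading.vehicle_count * 3600 / reading.interval_s


def density_per_km(reading):
    """Density k in vehicles per km, from q = k * v, i.e. k = q / v.

    The relation holds for the space-mean speed; using the time-mean speed
    measured at a single loop, as done here, is an approximation.
    """
    if reading.mean_speed_kmh <= 0:
        raise ValueError("mean speed must be positive")
    return flow_per_hour(reading) / reading.mean_speed_kmh
```

For example, 30 vehicles counted in one minute at a mean speed of 90 km/h correspond to a flow of 1800 vehicles per hour and a density of about 20 vehicles per kilometre.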

For each data source described above, an overview is given of the research into the usability of the information it provides, the results obtained so far, and the practical and methodological challenges that lie ahead. The latter, in particular, were found to be remarkably similar across sources.