Big data has been a buzzword for several years now, and some will say the concept has failed to deliver the business revolution it promised. For some, the concept has been empty, abstract and difficult to grasp. For others it is just as empty because they have already been supporting business decisions with models built on large data sets for years – regardless of whether the data is technically considered “big” or not.
Surveys vs. organic data
The buzz about big data has been somewhat painful for the market research business because, in addition to being an abstract buzzword, it is thought to pose a threat to the “traditional” market research business as a whole. Why should we conduct scientifically designed surveys when there is so much free, accessible data about people’s behavior out there?
This question pits two types of data against each other: “designed” data and “organic” data. Because society has created systems that automatically track transactions of all sorts, data is created “organically” and has become an abundant, accessible and cheap commodity. For example, internet search engines build data sets with every query, scanners record purchases, and websites capture and store mouse clicks.
In contrast, the data created by surveys is “designed”. We design questions with a specific purpose in mind and to be representative of a specific target group. Since designed data is created with a pre-specified purpose, its ratio of information to data is very high compared with organic data. However, response rates have been decreasing steadily since the birth of market research in the 1930s. Back then, survey data collection consisted of face-to-face or postal surveys, and the problem was not response rates, which were close to 100%, but accessibility: reaching a sufficient number of people for a sample. With the introduction of telephone-based surveys, and later internet panels, it has become easier to reach people, but also easier for them to decline to participate, and non-response has become a growing concern. This puts pressure on the classic sample design. The natural response is to question how and why we should carry on trying to collect responses from people who seem reluctant to give them, when so much data is readily available and, for the most part, collected in a much less intrusive manner.
| Designed data | Organic data |
| --- | --- |
| Representative without information gaps but selective | Representative with information gaps but non-selective |
| Intrusive | Non-intrusive |
| Costly | Cheap |
| High information to data ratio | Low information to data ratio |
| Information on opinions, aspirations, preferences, actions planned and past actions | Information on transactions, actions, behavior, sentiment |
Will organic data replace designed data?
It is tempting to think that organic data can and will replace designed data. If organic data can be used, for example, to attribute sales effects to specific elements in a marketing mix, or to predict the best targeted message in a newsletter to a specific customer, why should you ever have to ask people about anything and rely on their memory, perception and honesty?
The problem is that organic “big” data does not capture all behaviors in society, only some. There will be information gaps to fill. Also, when data is a byproduct of behavior that serves the purpose of a particular action (e.g. a credit card payment, the reception of a phone call, or the expression of a ‘like’) rather than a particular research question, the data is bound to be incomplete or a poor match for the concept you want to measure. Very often relevant data exists but is protected in so-called walled gardens: closed platforms that monetize the data generated on them and thus have an interest in restricting access to it. There are many examples of this, Apple and Facebook being among the most prominent. Paradoxically, although each of us generates enormous amounts of digital data, this data is not necessarily owned by or accessible to us.
Well-designed surveys fill information gaps, and combining data sources will be key
There will still be a need for designed data to fill the information gaps, and we then need to piece it together with the masses of organic data. The future belongs to those who are able to combine data sources, organic as well as designed, to produce insights that no single data source can yield on its own. Norstat has developed technology and services that facilitate this. One example is audience tracking, where passive tracking of exposure to advertising is enriched with survey responses on preferences, perceptions and habits, in addition to demographic information. This is relevant for both publishers and media agencies who want to make sure their online advertising actually has the impact they are hoping for. In Norway we cooperate closely with Schibsted on validating target groups with this technology. We also help clients combine their existing data sources with survey data so that they can build models based on demographics and behavior as well as aspirations and preferences. As an example, we work closely with Bisnode in Sweden on enriching their data to build personas, enabling them to provide even more insight to their clients.
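To make the idea of enrichment concrete, here is a minimal sketch of how passively tracked ad exposures might be joined with survey answers. It is an illustration only, not a description of Norstat's actual technology: the panelist IDs, field names and values are all hypothetical, and a shared panelist identifier across both sources is assumed.

```python
from collections import Counter

# Hypothetical organic data: one row per passively tracked ad exposure.
exposure_log = [
    {"panelist_id": "p1", "campaign": "spring_sale"},
    {"panelist_id": "p1", "campaign": "spring_sale"},
    {"panelist_id": "p2", "campaign": "spring_sale"},
    {"panelist_id": "p3", "campaign": "spring_sale"},
]

# Hypothetical designed data: survey answers keyed by the same panelist ID.
# Panelist p3 has not answered the survey, which leaves an information gap.
survey_responses = {
    "p1": {"age_group": "18-29", "brand_preference": 4},
    "p2": {"age_group": "30-49", "brand_preference": 2},
}


def enrich_exposures(log, responses):
    """Count exposures per panelist and attach survey answers where they exist."""
    counts = Counter(row["panelist_id"] for row in log)
    enriched = []
    for panelist_id, exposures in sorted(counts.items()):
        answers = responses.get(panelist_id)
        record = {
            "panelist_id": panelist_id,
            "exposures": exposures,
            "has_survey": answers is not None,
        }
        if answers is not None:
            record.update(answers)
        enriched.append(record)
    return enriched


enriched = enrich_exposures(exposure_log, survey_responses)
for record in enriched:
    print(record)
```

Records flagged with `has_survey` set to `False` show where the organic data alone says nothing about preferences or demographics, which is exactly the gap a designed survey would need to fill.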
High-quality survey data remains relevant and necessary. We need to keep investing in the processes and systems that can yield representative and valid survey data, and not let the abundance of data devalue all types of data, including designed data. The continued legitimacy of designed data depends on keeping online panels as representative as possible and honoring the craftsmanship of well-designed surveys to keep the results valid and the information to data ratio high.