Little Data vs. Big Data: Nine Types of Data and How They Should Be Used
How today's overwhelming volume of data is making it difficult for marketers and advertisers to know which information is significant and which is pure noise
How different types of data should be used and for what purposes
How small amounts of data, if correctly obtained and properly analyzed, can provide better marketing insight more cost-efficiently
You are not feeling well, so you visit your friendly family doctor. He puts you in a new electronic scanner and generates 28 trillion measurements of your temperature all over the surface of your body. He then saves all of these measurements and, using advanced statistical algorithms and supercomputers, announces that your temperature is 98.6 degrees Fahrenheit. What a relief! Big Data to the rescue.
As the Big Data bandwagon picks up momentum, consultants, professors, conference organizers, authors, magazines, blogs, software firms, pundits, crooks, private equity firms, and computer hardware manufacturers are clamoring to get aboard. Rarely has a bandwagon attracted so much attention or so many passengers.
The basic premises of Big Data appear to be the following:
More data are always better than less data.
Volume, variety, and velocity of data create new sources of potential knowledge and prescience.
With Big Data, all questions can be answered: The "why" will finally be revealed to the human race, and the future can be accurately predicted.
Is Big Data an accurate picture of the future, or is it simply a mirage shimmering in the distant desert heat? Is it the pathway to ultimate truth, or is it only a bandwagon of exaggerated promises and illusory dreams?
The truth is that the solution to marketing and business problems—and the identification of strategic opportunities—often lies in the realm of Little Data, not Big Data. You don't have to boil the ocean to determine its salt content. You don't have to eat the whole steer to know it's tough.
The Limits of Data
The preponderance of business data—indeed, all data—in the world is historical data, or "tracking" data, such as financial data, sales data, customer behavioral data, weather data, and inventory data. Virtually all data tend to be backward-looking, analogous to looking in the rearview mirror to steer a car forward.
No matter how current or instantaneous the data are (their velocity), and no matter how large the volume, the backward-looking bias is an omnipresent limitation. We might see trends in the data that give us an inkling of the near-term future, and we might be able to find out what has driven a firm's success in the past, but most historical data are of limited value in predicting the future.
Data You Can Trust
Often, without thinking, we tend to see all data as equal, but rarely is that true. The corporate world is awash in data. It streams in from all directions 24 hours a day, and the data deluge continues to worsen.
In fact, the growing flood of data is part of the problem. More data often means more confusion. Which data are correct? What data can be trusted?
Here's a point of view on the trustworthiness of various types of data, ranked from most trustworthy to least.
1. Experimental Data
Carefully designed and carefully controlled experiments, conducted by objective third parties who are experts in such experiments, yield the most trustworthy data. Before-after and side-by-side controls are employed, along with sophisticated statistical analyses, to separate the noise from the signal.
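As a rough illustration of how a side-by-side control separates signal from noise, the sketch below compares a test cell against a control cell with a standard two-proportion z-test. The function names and the purchase counts are hypothetical, invented for this example; a real experiment would also need random assignment and a sample size fixed in advance.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_z_test(successes_test, n_test, successes_control, n_control):
    """Compare a test cell with a control cell.

    Returns (lift, z, p_value), where lift is the difference in
    proportions and p_value is two-sided.
    """
    p_test = successes_test / n_test
    p_control = successes_control / n_control
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_test + successes_control) / (n_test + n_control)
    std_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_test + 1.0 / n_control))
    if std_error == 0.0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_test - p_control) / std_error
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return p_test - p_control, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 120 of 1,000 buyers in the test cell,
    # 90 of 1,000 buyers in the control cell.
    lift, z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
    print(f"lift={lift:.3f}  z={z:.2f}  p={p_value:.3f}")
```

A small p-value suggests the difference between the cells is unlikely to be noise alone; it says nothing about whether the experiment itself was well designed.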
2. Survey Research Data
Scientific research studies, conducted by experienced professionals who are objective third parties, yield trustworthy data. Often the data are experimental in nature. Research design, normative data, mathematical modeling, stimulus controls, statistical controls, historical experience, quality-assurance standards, etc., tend to make the data very precise. Noise tends to be minimal.
3. Marketing-Mix Modeling Data
The creation of an analytical database, the cleansing and normalizing of those data, and the use of multivariate statistics and modeling to isolate and neutralize some of the noise tend to make marketing-mix modeling data more trustworthy than raw sales data.
The signal in marketing-mix modeling data is more stable, more reliable, and more measurable. This type of data can be valuable in helping companies understand what variables are driving their businesses (is it media advertising, or the number of salesmen, or pricing differentials?), but it generally takes multiple years of data to get maximum value out of marketing-mix modeling.
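To make the idea of isolating drivers concrete, here is a minimal sketch that fits an ordinary least-squares regression to simulated weekly sales. Everything in it is an assumption for illustration: the three drivers, the "true" coefficients, the noise level, and the helper names. Real marketing-mix models are considerably richer (lagged advertising effects, seasonality, interactions), so treat this as a toy version of the technique rather than a working model.

```python
import random

# Hypothetical "true" effects used only to simulate data.
TRUE_INTERCEPT = 500.0
TRUE_COEFS = [3.0, -80.0, 10.0]  # ad spend, price, number of salespeople


def simulate_weeks(n_weeks, seed=0):
    """Generate synthetic weekly drivers and sales with random noise."""
    rng = random.Random(seed)
    rows, sales = [], []
    for _ in range(n_weeks):
        row = [rng.uniform(0, 100), rng.uniform(2, 4), float(rng.randint(5, 15))]
        signal = TRUE_INTERCEPT + sum(c * x for c, x in zip(TRUE_COEFS, row))
        rows.append(row)
        sales.append(signal + rng.gauss(0, 20))  # noise around the signal
    return rows, sales


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit via the normal equations.

    Returns [intercept, coef_1, coef_2, ...].
    """
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    rows, sales = simulate_weeks(156, seed=42)  # roughly three years of weeks
    for name, value in zip(["intercept", "ad_spend", "price", "salespeople"],
                           fit_ols(rows, sales)):
        print(f"{name:12s} {value:9.2f}")
```

With about three years of simulated weeks, the fitted coefficients should land close to the values used to generate the data; with only a handful of weeks they wander, which is one way to see why multiple years of history matter.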
4. Media-Mix Modeling Data
This is the same concept as marketing-mix modeling, just applied to a different set of variables. The same general rules apply. An analytic database, data cleansing, modeling, and statistics allow the noise in the data to be minimized, so that the effects of various media can be isolated. As with marketing-mix modeling, combining the models with controlled experiments makes the data and analyses much more explanatory.
5. Sales Data
Sales data are good, though imperfect, measures of actual sales. They are not, however, reliable and valid measures of advertising effectiveness, optimal media spending, product quality, service productivity, competitive activities, etc.
Sales data can only be trusted so far. The economy, competitive activity, the weather, inflation, the vacation cycle, news events, political events, aberrations in inventories and distribution, pricing disturbances, etc., create false echoes and distorted illusions. Sales data are not good measures of cause and effect. Sales are reasonably good measures of what happened, but not why it happened or what forces caused it to happen.
6. Eye-Tracking Data
With steady improvements in measurement equipment and software, the direction the human eye is pointing can be determined with a high degree of accuracy—less than one degree of error in a controlled environment with high-quality equipment. Accordingly, you can generate useful diagnostic information to help understand why a package, website, or advertisement is failing to attract attention or failing to register certain messages or images.
7. Biometric or Physiological Measurements
Galvanic skin response, eye pupil dilation, heart rate, EEG (brainwave) measurements, facial emotion recognition, etc., are very interesting and exciting, and they may one day open portals into the human soul, but for the present these measures are largely speculative and unproven. Some of these measures are reasonably good at tracking arousal, but there's no precise way to know if the arousal is positive or negative without bringing in survey or qualitative research.
8. Communities or Advisory Panel Data
Many large companies have bought into systems that allow them to talk to and survey the same small group of target consumers over and over again. Surveys among this group are conducted by various folks in the corporation on a daily or weekly basis. The cost per survey or measurement is relatively low, provided the quality of outcomes is not taken into account. Such communities are not truly representative, not randomly chosen, and seldom validated. Over time, the risks of conditioning and learning undermine the representativeness of the community, assuming it existed at the outset.
9. Social Media Data
Social media data are very popular in corporate America. The data are comparatively inexpensive, often massive, and real-time (day by day, hour by hour). Many new software tools and systems make analyses of the data relatively easy.
Social media data are, perhaps, most valuable as an early-warning system—of something going wrong, of a competitive initiative, or of an unexpected aberration. Social media data, however, must always be viewed with suspicion and skepticism, for several reasons:
Many product categories and brands are scarcely ever mentioned in social media, making sample sizes too small for data reliability.
Social media comments are influenced by the news cycle, special events, media advertising, promotions, publicity, movies, competitive activity, and television shows (i.e., there is a lot of noise in the data).
Social media data are subject to manipulation. You may think you are following an important trend in the data, only to learn later that it was a clever ruse by a competitor, meant to confuse. Increasingly, corporations and other organizations are striving to create social media content and manage social media comments, so the research value of the data is rapidly diminishing.
As social media comments are identified and collected via Web scraping, we almost never know the exact source, the context, the stimulus, or the history that underlie a comment. These unknowns make interpretation risky indeed. That's why social media data must be viewed with trepid spirit and jaundiced eye.
Corporate decision-makers often would be better served if they relied on tried-and-true tools and systems from the world of Little Data, rather than illusions from Big Data. Sampling theory teaches that if the sample is random, one can measure the behavior or mood of the whole by talking to very few people.
A sample of 1,500 is sufficient to predict who will win a presidential election. A sample of 200-300 respondents is generally sufficient to predict how much the whole population will like a new product or service. A sample of 200 users can test a new peanut butter in-home for a week, and from that it can be precisely determined whether the product is optimal and what its market share will be once introduced.
Those are examples of Little Data. Survey research is relatively inexpensive, yet very accurate, because professional researchers know the source, stimulus, context, and history—and have tried-and-true measuring instruments, normative data, quality assurance, and controls.
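The sample sizes quoted above can be checked against the textbook margin-of-error formula for a proportion. The sketch below assumes simple random sampling from a large population, a 95 percent confidence level (z = 1.96), and the worst-case proportion of 0.5; the function names are made up for this example. It covers sampling error only and says nothing about question wording, nonresponse, or other sources of survey error.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of the confidence interval for a sample proportion."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


def required_sample_size(margin, proportion=0.5, z=1.96):
    """Smallest simple random sample that achieves the given margin."""
    return math.ceil(z ** 2 * proportion * (1.0 - proportion) / margin ** 2)


if __name__ == "__main__":
    for n in (200, 300, 1500):
        print(f"n={n:5d}  margin = +/-{margin_of_error(n) * 100:.1f} points")
    print("n needed for +/-3 points:", required_sample_size(0.03))
```

Under these assumptions a sample of 1,500 carries a margin of roughly 2.5 percentage points, and a sample of 200 roughly 7 points. Because the margin shrinks with the square root of the sample size, quadrupling the sample only halves the error, which is the arithmetic behind the case for Little Data.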
Marketing research can be designed to be forward-looking and predictive, rather than backward-looking. Experienced researchers can create alternative futures and measure the relative appeal of the differing visions of the future. Those researchers can predict the sales volume of new products within narrow tolerances, based on survey research. They can optimize the formulation of a new product via product testing. They can accurately predict the effectiveness of new commercials long before they air. They can measure the size and composition of an industry or category with amazing precision, based solely on scientific sampling and surveys.
All of that research is based on Little Data. The data are derived from random sampling, carefully controlled experiments, and/or scientific surveys. The sample and sampling error are known; the stimulus is known; the questions are known; the context is understood; and the meaning of the answers is known.
Despite the marketing hoopla and the gurus touting Big Data, Little Data often provides a more accurate basis for sound corporate decision-making.