Good models + Bad data = Bad analysis

One of the most important aspects of data analytics is the relationship between models and data. Think of data as the input to a model, which generates outputs (predictions, trends, etc.). Most articles in the data science community revolve around models, or the algorithms that implement them (random forests, deep learning, etc.). However, there are countless examples of good models applied to bad data, resulting in bad inference.

The 1936 Election – A Polling Catastrophe

By the United States presidential election of 1936, the Great Depression was seven years old. The incumbent President, Franklin D. Roosevelt, had taken bold steps, including his “New Deal.” The “New Deal” included many programs designed to assist Americans struggling under the Depression, arguably at the expense of those who were doing better financially.

The Literary Digest, an influential weekly magazine of the time, had begun political polling. It polled a sample of over 2 million people drawn from telephone and car registration lists. The results predicted that Roosevelt's Republican challenger, Alf Landon, would win in a landslide with over 57% of the popular vote.

However, there was a problem with the sampling frame. During the Depression, not everyone could afford a car or a telephone. Those who could were usually wealthier, and therefore less likely to be directly helped by “New Deal” programs. As a result, this group was more likely to disapprove of Roosevelt than the general population was.

Discrepancy in Inference

The Prediction: Landon in a Landslide

Landon 57.1%, Roosevelt 42.9%

Instead, the actual results gave a very different picture.

Actual Result: Roosevelt Runs Away With It

Roosevelt 60.8%, Landon 36.5%

An incorrect sampling frame can destroy a study, regardless of the sample size. The Literary Digest surveyed over 2 million people (a typical political survey today asks between 500 and 1,000 respondents), yet its poll overstated Landon's share of the vote by more than 20 percentage points and picked the wrong winner.
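
The mechanism can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the share of car or telephone owners (25%), and the support rates within each group (40% and 68%) are hypothetical numbers chosen to make the effect visible, not historical estimates. It compares a very large sample drawn only from owners with a small simple random sample drawn from everyone.

```python
import random


def simulate(seed=0, population_size=200_000):
    """Compare a large biased sample with a small random sample.

    All rates below are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)

    # Each voter is a pair: (owns a car or telephone, supports Roosevelt).
    population = []
    for _ in range(population_size):
        owner = rng.random() < 0.25          # assumed share of owners
        support_rate = 0.40 if owner else 0.68  # assumed support by group
        population.append((owner, rng.random() < support_rate))

    true_share = sum(vote for _, vote in population) / population_size

    # Biased frame: only owners can be reached, but the sample is huge.
    frame = [vote for owner, vote in population if owner]
    big_biased = rng.sample(frame, 40_000)
    biased_estimate = sum(big_biased) / len(big_biased)

    # Unbiased frame: everyone can be reached, but the sample is small.
    small_random = rng.sample(population, 1_000)
    random_estimate = sum(vote for _, vote in small_random) / len(small_random)

    return true_share, biased_estimate, random_estimate


if __name__ == "__main__":
    true_share, biased_estimate, random_estimate = simulate()
    print(f"True Roosevelt share:           {true_share:.1%}")
    print(f"Biased sample of 40,000 owners: {biased_estimate:.1%}")
    print(f"Random sample of 1,000 voters:  {random_estimate:.1%}")
```

Under these assumptions, the small random sample should land within a few points of the true share, while the much larger owner-only sample stays far off. Adding more respondents from the same frame only makes the wrong answer more precise.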


Written by Varun Kumar

Varun works at Microsoft as a Cloud Consultant. He has more than 10 years of experience in consulting, solution architecture, and delivery management roles. As a consultant at Microsoft, he designs, develops, and deploys enterprise-level solutions on Azure to help organizations achieve more.
