<p><a href="/post/outliers"><img src="/images/blog/2015-07-31/2015-07-31-fish.png" style="width:100%; padding-left: 0px;" alt="logo" /></a></p>
<p>A key task in any survey is identifying outliers that can mar an otherwise great analysis. Outliers can arise for many reasons: honest mistakes, careless entries, or outright bogus answers. Protobi makes outliers stand out, so identifying them is as easy as shooting fish in a barrel.</p>
<!--more-->
<h1>Extreme and missing values</h1>
<p>Protobi shows a histogram for each element, which makes extreme and missing or blank values stand out.</p>
<p>For instance, in the example below we can see immediately that most respondents answered between 1 and 100 patients per week, but 3.1% of respondents answered between 980 and 1000. Click to drill into this value and you can see that these are all values of "999". So we might mark these respondents with a "Yellow flag" (see below).</p>
<p>Further, there are many [NA] responses, which may indicate faulty skip logic or that the survey didn't require an answer to this question. So we might mark these respondents with a "Yellow flag" too.</p>
<div class="protobi"><div class="protobi element" data-key="PTVOL" >Loading...</div></div>
<h1 id="overly-frequent-values">Overly frequent values</h1>
<p>In this example, an unusual number of respondents share one IP address. In practice this can happen if respondents work at the same organization or have a common broadband provider. It can also indicate multiple responses from the same respondent. So we might mark these respondents with a "Yellow flag" as well.</p>
<p>In this case, the IP address is from a known survey spam bot based in China, even though this is a survey of US doctors. If you click into this value, you can see that none of these respondents answered the patient volume question above.
So for these respondents we should definitely set a "Red flag".</p>
<div class="protobi"><div class="protobi element" data-key="IP" style="margin: 0 auto;">Loading...</div></div>
<h1 id="suspicious-response-patterns">Suspicious response patterns</h1>
<p>Another common pattern is respondents "flatlining", or giving the same response to a whole battery of questions, as in the example below. These are easy to spot in Protobi simply by clicking to drill in, as we've done here. More sophisticated metrics are also possible in Protobi; see the <a href="https://protobi.com/post/find-and-identify-straightline-respondents">Find and identify straightline respondents</a> tutorial.</p>
<p><img src="/images/blog/2015-07-31/2015-07-31-outlier-scale.png" style="width: 100%"/></p>
<h1 id="flag-outliers-in-protobi">Flag outliers in Protobi</h1>
<p>You can set Yellow and Red flags on respondents in Protobi. Drill into one or more respondents whose answers appear suspect, and click the "Flag" button in the toolbar:</p>
<p><img src="/images/blog/2015-07-31/2015-07-31-outlier-flags.png" style="width: 200px; margin: 0 auto;display: block;"/></p>
<p>You can layer multiple flags:</p>
<ul>
<li>Setting a Red flag replaces a Yellow flag.</li>
<li>Setting a Yellow flag will not override a Red flag.</li>
</ul>
<p>To turn on the Flag feature, set an ID field for the project under the Project Settings button in the toolbar.
This field should contain values that uniquely identify each respondent.</p>
<p><img src="/images/blog/2015-07-31/2015-07-31-outlier-id-menu.png" style="width: 200px; margin-left: 40px;"/></p>
<script type="text/javascript" src="/javascripts/lodash.js"></script>
<script type="text/javascript" src="/javascripts/backbone.min.js"></script>
<script data-main="/javascripts/protobi" src="/javascripts/require.js"></script>
<link rel='stylesheet' href='/stylesheets/pure.protobi.css'/>
<link rel='stylesheet' href='/stylesheets/protobi.css'/>
<style> .protobi.element { display: block; width: 400px; margin: 0 auto; } </style>
<script type="text/javascript">
  // Demo data for the two charts above: random patient volumes and IP addresses.
  function randomInt(range) {
    return Math.floor(Math.random() * range);
  }

  function generate_ip() {
    return randomInt(256) + "." + randomInt(256) + "." + randomInt(256) + "." + randomInt(256);
  }

  require(['protobi'], function (protobi) {
    window.tabular = protobi.getTabularInstance();
    var rows = [];

    // Typical respondents: volumes between 0 and 100, skewed toward low values.
    // 500 rows keep the 17 "999" rows at about 3.1% of all 557 respondents, as quoted in the text.
    for (var i = 0; i < 500; i++) {
      var rnd = 0;
      for (var j = 0; j < 2; j++) {
        rnd += 50 - Math.sqrt(Math.random() * 2500);
      }
      rnd = Math.min(rnd, 100);
      rnd = Math.max(rnd, 0);
      rnd = Math.floor(rnd);
      rows[i] = { PTVOL: rnd, IP: generate_ip() };
    }

    // Extreme values: 17 respondents who entered 999.
    for (var i = 0; i < 17; i++) rows.push({ PTVOL: 999, IP: generate_ip() });

    // Missing values: 40 blank answers that all share one IP address (highlighted in yellow).
    for (var i = 0; i < 40; i++) rows.push({ PTVOL: null, IP: "<div style='background-color:yellow'>146.0.74.205</div>" });

    tabular.setData(rows);
    tabular.getDimension('PTVOL').set({ 'type': 'number', roundby: 20, title: "How many patients did you see in the last week?" });
    tabular.getDimension('IP').set({ 'type': 'string', title: "IP Address" });
    protobi.bootstrap(tabular);
  });
</script>
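<p>As a rough illustration of the screening logic described in this post, the sketch below applies the same checks in plain JavaScript, outside Protobi. It is not Protobi's API: <code>layerFlag</code>, <code>screenRespondents</code>, the <code>sentinel</code> value of 999 and the <code>maxPerIp</code> threshold are hypothetical names and assumptions chosen for this example, and it assumes that a Red flag outranks a Yellow one.</p>

```javascript
// Hypothetical helpers for illustration only; none of this is Protobi's API.
const RANK = { none: 0, yellow: 1, red: 2 };

// Keep the more severe of two flags, so a Yellow never downgrades a Red.
function layerFlag(current, next) {
  return RANK[next] > RANK[current] ? next : current;
}

// Screen rows shaped like { PTVOL, IP } and return one flag per row.
// sentinel: assumed placeholder answer; maxPerIp: assumed limit on responses per IP.
function screenRespondents(rows, { sentinel = 999, maxPerIp = 5 } = {}) {
  // Count how many responses arrived from each IP address.
  const ipCounts = new Map();
  for (const row of rows) {
    ipCounts.set(row.IP, (ipCounts.get(row.IP) || 0) + 1);
  }

  return rows.map((row) => {
    let flag = "none";
    const missing = row.PTVOL === null || row.PTVOL === undefined;
    const sharedIp = ipCounts.get(row.IP) > maxPerIp;

    // Extreme value, blank answer, or overly frequent IP: each is a Yellow flag.
    if (row.PTVOL === sentinel) flag = layerFlag(flag, "yellow");
    if (missing) flag = layerFlag(flag, "yellow");
    if (sharedIp) flag = layerFlag(flag, "yellow");

    // A shared IP combined with a blank answer is treated as a Red flag here.
    if (sharedIp && missing) flag = layerFlag(flag, "red");
    return flag;
  });
}
```

<p>Calling <code>screenRespondents(rows)</code> on rows shaped like the demo data returns "none", "yellow" or "red" for each respondent. The sentinel value and the per-IP threshold are placeholders that would need tuning for a real survey.</p>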