Protobi uses standard, commonly accepted statistical methods to identify statistically significant differences:

  • Z-test for proportions between independent samples
  • T-tests for means between independent samples
  • Chi-square test for crosstabs
  • McNemar’s test for proportions between paired samples
  • Paired T-test for means between paired samples

However, you can override these methods with the algorithm your client prefers for a single project, or the algorithm your organization prefers across all projects.

You might think that stats tests in simple crosstabs would be a solved problem by now, with exactly one right answer. But the topic is surprisingly nuanced, with many perspectives, including not just what algorithm to use and when, but even whether “p-values” are answering the right questions.

Common variations include whether (and when) to use Fisher’s Exact Test for proportions, whether to use approximations of Fisher’s Exact Test for larger samples, and so on.
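To make the Fisher’s Exact Test option concrete, here is a minimal sketch of a two-sided test on a 2x2 table. It is not Protobi code: the function names (`logFactorial`, `logHypergeom`, `fisherExactTwoSided`) are hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention among several.

```javascript
// Sketch only: hypothetical helpers, not part of Protobi.

// log(n!) computed by summation; adequate for survey-sized counts.
function logFactorial(n) {
  var sum = 0;
  for (var i = 2; i <= n; i++) sum += Math.log(i);
  return sum;
}

// Log-probability of the 2x2 table [[a, b], [c, d]] with all margins fixed.
function logHypergeom(a, b, c, d) {
  return logFactorial(a + b) + logFactorial(c + d) +
         logFactorial(a + c) + logFactorial(b + d) -
         logFactorial(a) - logFactorial(b) -
         logFactorial(c) - logFactorial(d) -
         logFactorial(a + b + c + d);
}

// Two-sided p-value: total probability of every table with the same
// margins that is no more likely than the observed table.
function fisherExactTwoSided(a, b, c, d) {
  var row1 = a + b, col1 = a + c, n = a + b + c + d;
  var lo = Math.max(0, row1 + col1 - n);
  var hi = Math.min(row1, col1);
  var observed = logHypergeom(a, b, c, d);
  var p = 0;
  for (var x = lo; x <= hi; x++) {
    var lp = logHypergeom(x, row1 - x, col1 - x, n - row1 - col1 + x);
    // Small tolerance so equally likely tables are not lost to rounding.
    if (lp <= observed + 1e-7) p += Math.exp(lp);
  }
  return Math.min(1, p);
}
```

For example, `fisherExactTwoSided(3, 1, 1, 3)` evaluates the table with rows (3, 1) and (1, 3). The loop cost grows with the smaller margin, which is one reason approximations are often preferred for larger samples.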

The actual algorithms implemented in Protobi are presented below.

You can customize these algorithms by pasting the code below into either a single project or the global reference project for your organization (see the Organization Admin tutorial for setting enterprise-level preferences).

Open “Project settings…” for your project, choose the “Pre-Calculate” tab, paste the following code, and customize as needed. The key thing is to return an object with attributes indicating the test type, the p-value, and either the z-value or the t-value.
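As an illustration of that return value only, here is a minimal sketch of a helper that packages a statistic into such an object. This is not Protobi code: `makeTestResult` is a hypothetical name, and the attribute names (`type`, `value`, `p`, plus `z` or `t`) are inferred from the object returned by `getTTest` in the listing further down, so treat them as an assumption if your override needs anything more.

```javascript
// Sketch only: hypothetical helper, not part of Protobi.
// Builds a result object in the shape { type, value, p, z | t }.
function makeTestResult(type, statistic, p) {
  if (type !== 'z' && type !== 't') {
    throw new Error("type must be 'z' or 't'");
  }
  var result = { type: type, value: statistic, p: p };
  result[type] = statistic; // also expose the statistic as .z or .t
  return result;
}
```

Calling `makeTestResult('t', 2.1, 0.04)` yields an object with `type`, `value`, `t` and `p` attributes, matching the shape a custom test would return.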


Protobi T-test for means of independent samples

/**
 * Returns the p-value for a two-sided t-test of independent sample means
 * @param rowKey               e.g. the raw value of a categorical distribution, or exact bound of a numeric range
 * @param dim2                 Key of other element against which to compare
 * @param isDim2SubsetOfThis   Boolean true/false if second dimension's sample is wholly included in this dimension's sample
 * @returns {*}
 */
protobi.tabular.Distribution.prototype.getTTest = function (rowKey, dim2, isDim2SubsetOfThis) {
  var m0, m1, m2, v0, v1, v2, n0, n1, n2, t, that = dim2;
  var MINSAMPLE = options.MINSAMPLE;

  m1 = this.getMean();
  m2 = that.getMean();
  v1 = this.getVariance();
  v2 = that.getVariance();
  n1 = this.getBasis(false);
  n2 = that.getBasis(false);

  if (n1 < MINSAMPLE || n2 < MINSAMPLE) return NaN;

  // If dim2 is a subset of this sample, compare dim2 against the remainder
  // (this sample minus dim2) by backing out the remainder's size, mean and variance.
  // http://stats.stackexchange.com/questions/43159/how-to-calculate-pooled-variance-of-two-groups-given-known-group-variances-mean?rq=1
  if (isDim2SubsetOfThis) {
    n0 = n1;
    m0 = m1;
    v0 = v1;
    n1 = n0 - n2;
    m1 = (m0 * n0 - m2 * n2) / n1;
    v1 = ((v0 + m0 * m0) * n0 - (v2 + m2 * m2) * n2) / n1 - m1 * m1;
  }

  t = (m1 - m2) / Math.sqrt(v1 / n1 + v2 / n2);
  var df = n1 + n2 - 2;
  var p = (1 - stat.ttable(Math.abs(t), df)) * 2;

  return {
    type: 't',
    value: t,
    t: t,
    p: p
  };
};

Protobi Z-test for proportions of independent samples

/**
 * Returns the z-value
 * @param rowKey               e.g. the raw value of a categorical distribution, or exact bound of a numeric range
 * @param dim2                 Key of other element against which to compare
 * @param isDim2SubsetOfThis   Boolean true/false if second dimension's sample is wholly included in this dimension's sample
 * @param showMissing
 * @returns {*}
 */
protobi.tabular.Distribution.prototype.getZTest = function (rowKey, dim2, isDim2SubsetOfThis, showMissing) {
  var f1, f2, n1, n2, z;
  var MINSAMPLE = options.MINSAMPLE;

  f1 = this.getFreq(rowKey);
  f2 = dim2.getFreq(rowKey);

  n1 = this.getBasis(showMissing);
  n2 = dim2.getBasis(showMissing);


  if (isDim2SubsetOfThis) {
    f1 -= f2;
    n1 -= n2;
  }
  if (n1 < MINSAMPLE || n2 < MINSAMPLE) {
    return "!";
  }

  return stat.ZTest(f1, f2, n1 - f1, n2 - f2);
};
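The body of `stat.ZTest` is not shown above. As a rough guide to what a function with that argument order (successes in each group, then non-successes in each group) might compute, here is a minimal sketch of a pooled two-proportion z-test. It is an assumption, not Protobi's implementation: `normalCdf` and `pooledZTest` are hypothetical names, the normal CDF uses the Abramowitz and Stegun 7.1.26 approximation to erf, and the returned shape simply mirrors the object returned by `getTTest`.

```javascript
// Sketch only: hypothetical helpers, not Protobi's stat.ZTest.

// Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation
// (absolute error on the order of 1e-7).
function normalCdf(x) {
  var sign = x < 0 ? -1 : 1;
  var a = Math.abs(x) / Math.SQRT2;
  var t = 1 / (1 + 0.3275911 * a);
  var poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
              0.284496736) * t + 0.254829592) * t;
  var erf = 1 - poly * Math.exp(-a * a);
  return 0.5 * (1 + sign * erf);
}

// Pooled two-proportion z-test.
// s1, s2: counts with the attribute in groups 1 and 2
// f1, f2: counts without the attribute in groups 1 and 2
function pooledZTest(s1, s2, f1, f2) {
  var n1 = s1 + f1;
  var n2 = s2 + f2;
  var pooled = (s1 + s2) / (n1 + n2);
  var se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  var z = (s1 / n1 - s2 / n2) / se; // NaN or infinite if se is 0
  var p = 2 * (1 - normalCdf(Math.abs(z)));
  return { type: 'z', value: z, z: z, p: p };
}

// Subset case, mirroring the isDim2SubsetOfThis branch in getZTest:
// a total sample of 200 with 100 positives contains a subset of 100
// with 40 positives, so the subset is compared against the remaining
// 100 respondents, of whom 60 are positive.
var result = pooledZTest(100 - 40, 40, (200 - 100) - 60, 100 - 40);
```

Subtracting the subset from the total before testing keeps the two groups independent, which the z-test for independent samples requires.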