Correlation and Rank-Based Inverse Normal Transformation

This post is based on Bishara and Hittner's (2012) article, introduced to me by Dr. Takashi Yamauchi. Correlation is a simple yet important tool for establishing the relationship between two variables. However, if Pearson's r is used and the data are not normally distributed, the rate of Type I errors (false positives, i.e. detecting a correlation where none truly exists) may increase dramatically. To offset this, the authors recommend a data normalization technique called the rank-based inverse normal (RIN) transformation.
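To make the idea of a Type I error rate concrete, here is a minimal simulation sketch. It repeatedly draws two independent samples (so the true correlation is zero) and counts how often Pearson's test comes out significant at the 0.05 level. The log-normal distribution, the sample size of 20, the number of simulations and the helper name false_positive_rate are all my own assumptions for illustration, not the conditions used in the article. How far the estimate departs from 0.05 depends on those choices.

```python
import numpy as np
from scipy.stats import pearsonr

def false_positive_rate(n=20, n_sims=2000, alpha=0.05, seed=0):
    """Estimate how often Pearson's test rejects when x and y are independent."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Independent, skewed samples: any "significant" r is a false positive.
        x = rng.lognormal(size=n)
        y = rng.lognormal(size=n)
        _, p_value = pearsonr(x, y)
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims

rate = false_positive_rate()
print(f"Estimated Type I error rate: {rate:.3f} (nominal level: 0.05)")
```

Swapping rng.lognormal for another generator (for example rng.standard_t with few degrees of freedom) is an easy way to explore other non-normal shapes.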

This transformation is performed according to the following formula:

\[ f(x) = \Phi^{-1} \Big(\frac{x_{r} - \frac{1}{2}} {n} \Big), \]

where "$x_{r}$ is ascending rank of $x$, such that $x_{r} = 1 $ for the lowest value of $x$" (p. 401), $\Phi^{-1}$ is the inverse normal cumulative distribution function, and $n$ is the number of observations (sample size). Let's now write a Python function to calculate it. Note that ppf (percent point function) is SciPy's name for the quantile function, i.e. the inverse cumulative distribution function:

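from scipy.stats import norm
import pandas as pd

def rinfunc(ds):
    """Rank-based inverse normal transformation of a pandas Series."""
    ds_rank = ds.rank()            # ascending ranks; the lowest value gets rank 1
    numerator = ds_rank - 0.5
    par = numerator / len(ds)      # map ranks into the open interval (0, 1)
    result = norm.ppf(par)         # inverse normal CDF (quantile function)
    return result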
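One detail worth knowing: Series.rank() by default gives tied values the average of their ranks, so tied observations end up with the same transformed value. The sketch below illustrates this on a small made-up Series. rinfunc is repeated in compact form so the snippet runs on its own, and the input values are my own illustration.

```python
import pandas as pd
from scipy.stats import norm

def rinfunc(ds):
    # Same transformation as above, repeated so this snippet is self-contained.
    return norm.ppf((ds.rank() - 0.5) / len(ds))

tied = pd.Series([1, 2, 2, 3])    # made-up data with one tie
ranks = tied.rank()               # default method='average': 1.0, 2.5, 2.5, 4.0
transformed = rinfunc(tied)

print(ranks.tolist())
print(transformed)
```

If a different tie rule is preferred, rank() accepts a method argument (for example method='min' or method='first'), but the formula quoted above does not say which rule to use, so that choice is left to the reader.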

Let's test it.

d = {'one': pd.Series([10, 25, 3, 11, 24, 6]),
     'two': pd.Series([10, 20, 30, 40, 80, 70]),
     'index': ['p', 'r', 'o', 'g', 'r', 'a']}
df = pd.DataFrame(d)
df.set_index('index', inplace=True)
df['one_transformed'] = rinfunc(df['one'])
df['two_transformed'] = rinfunc(df['two'])

df
       one  two  one_transformed  two_transformed
index
p       10   10        -0.210428        -1.382994
r       25   20         1.382994        -0.674490
o        3   30        -1.382994        -0.210428
g       11   40         0.210428         0.210428
r       24   80         0.674490         1.382994
a        6   70        -0.674490         0.674490
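With the transformed columns in hand, the natural next step is to compute Pearson's r on them instead of on the raw values. The sketch below does that for the toy data from this post, using scipy.stats.pearsonr. The variable names are mine, and six observations are far too few for the p-values to mean much, so treat it purely as a usage illustration.

```python
import pandas as pd
from scipy.stats import norm, pearsonr

def rinfunc(ds):
    # Same transformation as defined earlier in the post.
    return norm.ppf((ds.rank() - 0.5) / len(ds))

one = pd.Series([10, 25, 3, 11, 24, 6])
two = pd.Series([10, 20, 30, 40, 80, 70])

# Pearson's r on the raw values versus on the RIN-transformed values.
r_raw, p_raw = pearsonr(one, two)
r_rin, p_rin = pearsonr(rinfunc(one), rinfunc(two))

print(f"raw: r = {r_raw:.3f}, p = {p_raw:.3f}")
print(f"RIN: r = {r_rin:.3f}, p = {p_rin:.3f}")
```

Because the transformation uses only the ranks, the RIN-based coefficient depends on the ordering of the observations rather than on their raw magnitudes.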

References

  1. Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399.
Anton Leontyev
Assistant Professor of Psychology & Data Scientist

I am a scientist interested in applying machine learning, statistics and data visualization techniques to answer political, psychological and economic questions.