Correlation and Rank-Based Inverse Normal Transformation

Last updated on Feb 12, 2020

This post is based on Bishara and Hittner's (2012) article, which Dr. Takashi Yamauchi introduced to me. Correlation is simple, yet it is one of the most important tools for establishing the relationship between two variables. However, if Pearson's r is used on data that are not normally distributed, the Type I error rate (false positives, i.e., detecting a correlation where there truly is none) may increase dramatically. To offset this, the authors recommend a normalization technique called the Rank-Based Inverse Normal (RIN) transformation.
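The Type I error claim can be illustrated with a small Monte Carlo check. This is a minimal sketch, not a reproduction of Bishara and Hittner's simulation design: it draws x and y independently (so the true correlation is zero), assuming strongly skewed lognormal data with sigma = 2 and n = 10, and counts how often SciPy's `pearsonr` reports p < .05 on the raw scores and on the RIN scores computed with the formula given below. The names `rin` and `type_one_error_rate` are hypothetical, and the size of any inflation depends on the distribution and the sample size, so treat the printed rates as illustrative only.

```python
import numpy as np
from scipy.stats import norm, pearsonr, rankdata

def rin(x):
    """Hypothetical helper: RIN transform of a 1-D array (average ranks for ties)."""
    ranks = rankdata(x)  # ascending ranks, 1 = smallest value
    return norm.ppf((ranks - 0.5) / len(x))

def type_one_error_rate(n=10, n_sims=2000, alpha=0.05, seed=0):
    """Share of simulated data sets in which Pearson's test rejects H0: rho = 0,
    for raw and for RIN-transformed scores. x and y are drawn independently,
    so every rejection is a false positive."""
    rng = np.random.default_rng(seed)
    raw_hits = 0
    rin_hits = 0
    for _ in range(n_sims):
        # Assumption: skewed lognormal data (sigma = 2), chosen only for illustration
        x = rng.lognormal(mean=0.0, sigma=2.0, size=n)
        y = rng.lognormal(mean=0.0, sigma=2.0, size=n)
        _, p_raw = pearsonr(x, y)
        _, p_rin = pearsonr(rin(x), rin(y))
        raw_hits += int(p_raw < alpha)
        rin_hits += int(p_rin < alpha)
    return raw_hits / n_sims, rin_hits / n_sims

raw_rate, rin_rate = type_one_error_rate()
print(f"Type I error, Pearson on raw data: {raw_rate:.3f}")
print(f"Type I error, Pearson on RIN data: {rin_rate:.3f}")
```

Changing `n` or swapping the lognormal draws for `rng.standard_normal` is a quick way to see how strongly the gap between the two rates depends on those assumptions.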

This transformation is performed according to the following formula:

$f(x) = \Phi^{-1}\left(\frac{x_r - \frac{1}{2}}{n}\right),$

where "$x_r$ is ascending rank of $x$, such that $x_r = 1$ for the lowest value of $x$" (p. 401), $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function (CDF), and $n$ is the number of observations (sample size). Let's now write a function that calculates it in Python. Note that SciPy's `ppf` (percent point function) is an alternative name for the quantile function, i.e., the inverse CDF:

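```python
from scipy.stats import norm
import pandas as pd

def rinfunc(ds):
    """Rank-based inverse normal (RIN) transformation of a pandas Series."""
    # Ascending ranks (1 = lowest value); ties get their average rank
    ds_rank = ds.rank()
    # x_r - 1/2
    numerator = ds_rank - 0.5
    # Divide by n, the number of non-missing observations (missing values stay NaN)
    par = numerator / ds.count()
    # Phi^{-1}: norm.ppf is the inverse of the standard normal CDF
    result = norm.ppf(par)
    # Return a Series with the input's index so it aligns with the original rows
    return pd.Series(result, index=ds.index, name=ds.name)
```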
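Ties are handled by pandas rather than by the formula itself: `Series.rank()` uses `method="average"` by default, so tied values receive the mean of the ranks they occupy and therefore identical transformed scores. The sketch below redefines a compact copy of `rinfunc` so it runs on its own and applies it to a small hypothetical series with one tie.

```python
import pandas as pd
from scipy.stats import norm

def rinfunc(ds):
    # Compact copy of the RIN function: average ranks, minus 1/2, divided by n, then Phi^{-1}
    ranks = ds.rank()  # default method="average"
    return pd.Series(norm.ppf((ranks - 0.5) / ds.count()), index=ds.index, name=ds.name)

tied = pd.Series([1, 2, 2, 3])  # hypothetical data with one tie
print(tied.rank().tolist())     # [1.0, 2.5, 2.5, 4.0]
# The two tied values share p = 0.5, so both map to the same score, 0.0
print(rinfunc(tied).round(4).tolist())
```

Other tie rules are available through `rank(method=...)` (for example `"min"` or `"first"`); the sketch simply keeps the pandas default.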

Let's test it.

```python
import pandas as pd

d = {'one': pd.Series([10, 25, 3, 11, 24, 6]),
     'two': pd.Series([10, 20, 30, 40, 80, 70]),
     'index': ['p', 'r', 'o', 'g', 'r', 'a']}
df = pd.DataFrame(d)
df.set_index('index', inplace=True)

df['one_transformed'] = rinfunc(df['one'])
df['two_transformed'] = rinfunc(df['two'])

df
```

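| index | one | two | one_transformed | two_transformed |
|-------|----:|----:|----------------:|----------------:|
| p | 10 | 10 | -0.210428 | -1.382994 |
| r | 25 | 20 | 1.382994 | -0.674490 |
| o | 3 | 30 | -1.382994 | -0.210428 |
| g | 11 | 40 | 0.210428 | 0.210428 |
| r | 24 | 80 | 0.674490 | 1.382994 |
| a | 6 | 70 | -0.674490 | 0.674490 |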
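With n = 6, ranks 1 to 6 map to the probabilities 1/12, 3/12, 5/12, 7/12, 9/12 and 11/12, whose standard normal quantiles are ±1.383, ±0.674 and ±0.210; that is why both transformed columns hold the same six values in a different order. The transformation is only the first step: the correlation is then tested on the transformed scores. The sketch below assumes SciPy's `pearsonr` and `spearmanr` and a compact copy of `rinfunc`, and compares Pearson's r on the raw and on the RIN scores with Spearman's rho for the same two columns.

```python
import pandas as pd
from scipy.stats import norm, pearsonr, spearmanr

def rinfunc(ds):
    # Compact copy of the RIN function: average ranks, minus 1/2, divided by n, then Phi^{-1}
    return pd.Series(norm.ppf((ds.rank() - 0.5) / ds.count()), index=ds.index)

one = pd.Series([10, 25, 3, 11, 24, 6])
two = pd.Series([10, 20, 30, 40, 80, 70])

r_raw, p_raw = pearsonr(one, two)                    # Pearson on the raw scores
r_rin, p_rin = pearsonr(rinfunc(one), rinfunc(two))  # Pearson on the RIN scores
rho, p_rho = spearmanr(one, two)                     # Spearman on the raw scores
rho_rin, _ = spearmanr(rinfunc(one), rinfunc(two))   # equals rho: RIN preserves the ranks

print(f"Pearson, raw:  r = {r_raw:.3f} (p = {p_raw:.3f})")
print(f"Pearson, RIN:  r = {r_rin:.3f} (p = {p_rin:.3f})")
print(f"Spearman, raw: rho = {rho:.3f} (p = {p_rho:.3f})")
```

Because RIN is a monotone transformation, Spearman's rho is unchanged by it; the RIN approach differs from Spearman in that Pearson's test is applied to normal scores rather than to the raw ranks.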

References

Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399.

Anton Leontyev

Assistant Professor of Psychology & Data Scientist

I am a scientist interested in applying machine learning, statistics and data visualization techniques to answer political, psychological and economic questions.