Alpha normalize: [0, 1] linear transformation of density ratio to [alpha, 1] range. #20
Synopsis
The pull request introduces a new function:
`helpers.alpha_normalize(values: ndarray, alpha: float) -> ndarray`
that changes the lower `0` bound (infimum) of the input `values` argument in the range of `[0, 1]` to `[alpha, 1]` by applying a nearly$^1$ linear transformation.
Rationale
There are many scenarios where the estimated density ratio is further processed on the logarithmic scale, such as improving numerical stability by replacing a product of conditionally independent variates with the sum of their logarithms (or a quotient with a difference), or plotting the respective probability density functions more clearly. The alpha-relative density ratio estimator yields results in the `[0, alpha^-1]` range, and since `0` is not in the domain of the logarithm, it must be handled before the logarithmic transformation is applied. The `alpha_normalize` function does exactly that by applying the following linear transformation to input values in the range of `[0, 1]`:
`values * (1 - alpha) + alpha`
where `alpha` is the normalization term, a small real number (technically rational, because it is implemented as a floating point number).
The `[0, 1]` range where the function is applied has been selected to modify the estimated density ratio as little as possible, and in particular so that the upper `alpha^-1` supremum is not changed, as changing it would contravene the properties of the alpha-relative density ratio estimator. Additionally, `log(1) = 0` on the logarithmic scale might carry a specific meaning in some models (Probabilistic Record Linkage, for instance), hence input `values` are not modified at (and beyond) this point.
Implementation specifics
To preserve vital estimator properties, there are 2 invariants always met by the function results:
- The order of the values is preserved (in terms of `numpy.argsort`).
- [...] `numpy.nextafter` is employed in the direction of `0`.
By virtue of this implementation approach, in extreme cases there is a possibility of output values that:
- are lower than `alpha`, or
- [...]
Should it happen, a relevant warning is issued.
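The behaviour described in this section can be illustrated with a minimal sketch. It is not the code from this pull request: the name `alpha_normalize_sketch`, the warning text, the restriction to 1-D input, and the formula `x * (1 - alpha) + alpha` (the affine map that sends `0` to `alpha` and leaves `1` fixed) are assumptions made for illustration. It shows one way to keep the `numpy.argsort` order intact and to step toward `0` with `numpy.nextafter` when rounding makes two distinct inputs collide.

```python
import warnings

import numpy as np


def alpha_normalize_sketch(values, alpha):
    """Illustrative only: lift 1-D inputs below 1 onto [alpha, 1), keep inputs >= 1."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    prev_in = prev_out = None
    # Walk from the largest input to the smallest one.
    for i in np.argsort(values)[::-1]:
        x = values[i]
        if x >= 1.0:
            y = x  # values at and beyond 1 are not modified
        elif prev_in is not None and x == prev_in:
            y = prev_out  # equal inputs map to equal outputs
        else:
            y = x * (1.0 - alpha) + alpha  # assumed linear map
            if prev_out is not None and y >= prev_out:
                # Rounding collapsed two distinct inputs: step just below
                # the previously assigned output, in the direction of 0.
                y = np.nextafter(prev_out, 0.0)
        out[i] = y
        prev_in, prev_out = x, y
    if np.any(out < alpha):
        # Hypothetical warning text; the real message may differ.
        warnings.warn("some normalized values fell below alpha", RuntimeWarning)
    return out


ratios = np.array([0.0, 0.5, 1.0, 2.0])
normalized = alpha_normalize_sketch(ratios, alpha=0.1)
# normalized is approximately [0.1, 0.55, 1.0, 2.0]
```

In this sketch the step toward `0` is only taken when two distinct inputs would otherwise produce outputs in the wrong (or equal) order, which is why values slightly below `alpha` can appear only in extreme cases such as many subnormal inputs.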
`DensityRatio.alpha_normalize` function
The function simply calls `helpers.alpha_normalize`, passing its `alpha` field value as the second argument (normalization term). This leads to the `[alpha, alpha^-1]` range of the estimated density ratio values, which transforms to `[log(alpha), -log(alpha)]` on the logarithmic scale, with `log(1) = 0` exactly in the middle. Such a range can make some probability density graphs (e.g. the Fellegi-Sunter log likelihood) lucid and comprehensible.
Closing remarks
Both `alpha_normalize` functions are supplementary and optional; users can do without them and stick to the raw `compute_density_ratio` outcome. The only place where the `alpha_normalize` function has been introduced into the original code flow is the `alpha_KL_divergence` function. Because the divergence numerator makes use of `numpy.log`, should `0` occur in the estimated density ratio, the invalid `-inf` Kullback–Leibler divergence value would otherwise be returned.
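The `numpy.log` issue mentioned above can be reproduced in a few lines. The snippet below is a sketch under stated assumptions: the sample values are made up, and the lifting step uses the plain affine map `x * (1 - alpha) + alpha` rather than calling the functions from this pull request. It shows that a `0` ratio produces `-inf` under `numpy.log`, while the lifted values stay finite and span `[log(alpha), -log(alpha)]`.

```python
import numpy as np

alpha = 0.1
# Hypothetical ratio values spanning the [0, alpha^-1] range.
raw = np.array([0.0, 1.0, 1.0 / alpha])

# numpy.log(0) evaluates to -inf (and emits a divide-by-zero RuntimeWarning,
# silenced here so the example runs quietly).
with np.errstate(divide="ignore"):
    log_raw = np.log(raw)

# Assumed affine lift of values below 1 onto [alpha, 1); values >= 1 stay as they are.
lifted = np.where(raw < 1.0, raw * (1.0 - alpha) + alpha, raw)
log_lifted = np.log(lifted)  # finite, roughly symmetric around log(1) = 0

print(log_raw)
print(log_lifted)
```

Any sum or mean that includes the `-inf` entry of `log_raw` is itself `-inf`, which is the failure mode the normalization is meant to avoid.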