
Difference between Java implementation and Python #47

@idanmoradarthas

I have a data set, my_set.zip.

When I use this Java code:

```java
import com.tdunning.math.stats.TDigest;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

import java.io.*;
import java.util.stream.StreamSupport;

public class TDigestTry {
    public static void main(String[] args) throws IOException {
        // Load my_set.csv from the classpath resources
        ClassLoader classLoader = TDigestTry.class.getClassLoader();
        File file = new File(classLoader.getResource("my_set.csv").getFile());
        // try-with-resources closes the reader when parsing is done
        try (Reader in = new FileReader(file)) {
            Iterable<CSVRecord> records = CSVFormat.EXCEL.withHeader().parse(in);
            // AVL-tree digest with compression 20
            TDigest digest = TDigest.createAvlTreeDigest(20);
            StreamSupport.stream(records.spliterator(), false)
                    .mapToDouble(record -> Double.parseDouble(record.get("change rate")))
                    .forEach(digest::add);
            // quantile() takes q in [0, 1]
            System.out.println(digest.quantile(0.05));
            System.out.println(digest.quantile(0.95));
        }
    }
}
```

I get these results:

```
3.0
5.0
```

But when I run this Python code:

```python
from pathlib import Path

import pandas
from tdigest import TDigest

if __name__ == '__main__':
    # Load my_set.csv from the resources folder next to this script
    frame = pandas.read_csv(Path(__file__).parent / "resources" / "my_set.csv")
    digest = TDigest()  # library default compression parameters
    digest.batch_update(frame["change rate"].values)
    # percentile() takes p in [0, 100]
    print(f"Quantile 0.05 = {digest.percentile(5)};\t\tQuantile 0.95 = {digest.percentile(95)}")
```

I get:

```
Quantile 0.05 = 2.6495903059149586;		Quantile 0.95 = 3689686.790917569
```

Why is there such a large difference between the 0.95 quantiles (5.0 in Java versus about 3.69 million in Python)?

P.S. I get the same results when I add the values one at a time:

```python
for value in frame["change rate"].values:
    digest.update(value)
```
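One way to tell which digest is closer to the truth is to compute the exact empirical quantiles of the same column and compare them with both outputs. The sketch below is an illustrative addition, not part of the original report. It assumes linear interpolation between order statistics (the same convention as the default method of `numpy.quantile`), drops NaN values (which `pandas.read_csv` produces for empty cells), and uses a hypothetical helper name, `exact_quantile`.

```python
import math
from typing import Iterable


def exact_quantile(values: Iterable[float], q: float) -> float:
    """Exact empirical quantile with linear interpolation (hypothetical helper).

    q is in [0, 1], matching the Java quantile() convention. NaN values are
    dropped first, since pandas.read_csv turns empty cells into NaN.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        raise ValueError("no non-NaN values")
    # Fractional position between the smallest (index 0) and largest (n - 1) element
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Interpolate linearly between the two neighbouring order statistics
    return data[lo] + (data[hi] - data[lo]) * frac


if __name__ == "__main__":
    # Hypothetical usage against the same column as in the report:
    # import pandas
    # frame = pandas.read_csv("resources/my_set.csv")
    # col = frame["change rate"].astype(float)
    # print(exact_quantile(col, 0.05), exact_quantile(col, 0.95))
    sample = [1.0, 2.0, 3.0, 4.0, 5.0, float("nan")]
    print(exact_quantile(sample, 0.5))
```

Comparing the exact 0.05 and 0.95 values with the numbers printed by the Java and Python digests should show which implementation is approximating this data set more faithfully.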
