I have the data set my_set.zip.
When I run this Java code:
import com.tdunning.math.stats.TDigest;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import java.io.*;
import java.util.stream.StreamSupport;
public class TDigestTry {
    public static void main(String[] args) throws IOException {
        ClassLoader classLoader = TDigestTry.class.getClassLoader();
        File file = new File(classLoader.getResource("my_set.csv").getFile());
        Reader in = new FileReader(file);
        Iterable<CSVRecord> records = CSVFormat.EXCEL.withHeader().parse(in);
        TDigest digest = TDigest.createAvlTreeDigest(20);
        StreamSupport.stream(records.spliterator(), false)
                .map(record -> new Double(record.get("change rate")))
                .forEach(digest::add);
        System.out.println(digest.quantile(0.05));
        System.out.println(digest.quantile(0.95));
    }
}
I'm getting the results:
3.0
5.0
But when I run this Python code:
from pathlib import Path
import pandas
from tdigest import TDigest
if __name__ == '__main__':
    frame = pandas.read_csv(Path(__file__).parents[0].joinpath("resources").joinpath("my_set.csv"))
    digest = TDigest()
    digest.batch_update(frame["change rate"].values)
    print(f"Quantile 0.05 = {digest.percentile(5)};\t\tQuantile 0.95 = {digest.percentile(95)}")
I'm getting the results:
Quantile 0.05 = 2.6495903059149586; Quantile 0.95 = 3689686.790917569
How come there's such a large difference between the two 0.95 quantiles?
P.S. I get the same results when I use:
for value in frame["change rate"].values:
    digest.update(value)
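To tell which of the two estimates is off, it may help to compare both against an exact quantile computed directly from the sorted values. Below is a minimal sketch, assuming the column fits in memory. `exact_quantile` is a hypothetical helper (not part of either t-digest library), the sample list is placeholder data rather than the contents of my_set.csv, and linear interpolation between order statistics is only one of several common quantile definitions.

```python
import math


def exact_quantile(values, q):
    """Exact quantile of `values` for q in [0, 1].

    Hypothetical baseline helper: sorts the data and interpolates
    linearly between the two nearest order statistics.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    # Fractional position of the quantile within the sorted data.
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac


if __name__ == "__main__":
    # Placeholder data; in the real check this would be
    # frame["change rate"].values from the snippet above.
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(exact_quantile(sample, 0.05), exact_quantile(sample, 0.95))
```

Running this on the real "change rate" column gives a reference value for the 0.05 and 0.95 quantiles, which both the Java and the Python digest output can then be checked against.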