Hello, is this the expected behavior?
I'm running the code below, which computes the same result in two ways: a composition of groupBy, select and inflate, and a single pivot call. The groupBy chain returns in 0.235 ms while the pivot call takes 146.8 ms, roughly 625 times slower (about 62,000%). The first call to toArray then takes 51.27 ms on the groupBy result and 34.456 ms on the pivot result, so at that step pivot is about 48% faster.
The dataset is a 1.5 MB file containing 27k rows.
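As a quick check of the percentages above, here is the arithmetic on the measured timings (the same numbers appear in the output listed further down). The object and function names are only illustrative.

```javascript
// Timings in milliseconds, copied from the benchmark output in this issue.
const timings = {
  groupByBuild: 0.235,     // groupBy, select, inflate
  pivotBuild: 146.789,     // pivot
  groupByToArray: 51.270,  // first toArray on the groupBy result
  pivotToArray: 34.456,    // first toArray on the pivot result
};

// How many times longer `a` takes than `b`.
const ratio = (a, b) => a / b;

const buildRatio = ratio(timings.pivotBuild, timings.groupByBuild);
const toArrayRatio = ratio(timings.groupByToArray, timings.pivotToArray);

console.log(`pivot call vs groupBy chain: ${buildRatio.toFixed(1)}x`);          // 624.6x
console.log(`groupBy toArray vs pivot toArray: ${toArrayRatio.toFixed(2)}x`);   // 1.49x
```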
const dataForge = require('data-forge');
require('data-forge-fs'); // adds readFileSync to dataForge

let start = process.hrtime();

// Print the time elapsed since the previous call, then reset the timer.
const elapsed_time = function (note) {
    const precision = 3; // 3 decimal places
    const [seconds, nanoseconds] = process.hrtime(start); // read the clock only once
    const elapsed = nanoseconds / 1000000; // nanoseconds to milliseconds
    console.log(seconds + " s, " + elapsed.toFixed(precision) + " ms - " + note);
    start = process.hrtime(); // reset the timer
};

const df = dataForge
    .readFileSync('./data.csv')
    .parseCSV({ dynamicTyping: true })
    .withIndex((row) => `${row.meeting_id}_${row.item_id}_${row.user_id}_${row.source_id}`);
elapsed_time('parsecsv');

// Version 1: groupBy + select + inflate
const sintetico = df
    .groupBy((row) => `${row.meeting_id}_${row.item_id}_${row.vote}`)
    .select((group) => ({
        meeting_id: group.first().meeting_id,
        item_id: group.first().item_id,
        vote: group.first().vote,
        stock: group.deflate((row) => row.stock).sum(),
    }))
    .inflate();
elapsed_time('groupBy, select, inflate');

// Version 2: pivot
const sinteticoPivot = df.pivot(['meeting_id', 'item_id', 'vote'], {
    stock: dataForge.Series.sum,
});
elapsed_time('pivot');

const data = sintetico.head(5).toArray();
elapsed_time('groupBy, select, inflate => toArray');
const data2 = sintetico.head(5).toArray();
elapsed_time('groupBy, select, inflate => toArray again');
const data3 = sinteticoPivot.head(5).toArray();
elapsed_time('pivot => toArray');
const data4 = sinteticoPivot.head(5).toArray();
elapsed_time('pivot => toArray again');
These are the outputs:
0 s, 183.236 ms - parsecsv
0 s, 0.235 ms - groupBy, select, inflate
0 s, 146.789 ms - pivot
0 s, 51.270 ms - groupBy, select, inflate => toArray
0 s, 1.200 ms - groupBy, select, inflate => toArray again
0 s, 34.456 ms - pivot => toArray
0 s, 13.261 ms - pivot => toArray again
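In case part of the difference comes from when each version actually does its work (an assumption I have not verified against the library source), a fairer comparison might time building the result and the first toArray together. Below is a minimal sketch of such a timer. `timeIt` is a hypothetical helper, not part of data-forge, and the workload is a stand-in loop so the snippet runs without data-forge or data.csv.

```javascript
// Hypothetical helper: runs `fn`, prints the elapsed wall-clock time,
// and returns both fn's result and the elapsed milliseconds.
function timeIt(label, fn) {
  const start = process.hrtime.bigint();          // nanoseconds as a BigInt
  const result = fn();
  const elapsedNs = process.hrtime.bigint() - start;
  const elapsedMs = Number(elapsedNs) / 1e6;      // nanoseconds to milliseconds
  console.log(`${elapsedMs.toFixed(3)} ms - ${label}`);
  return { result, elapsedMs };
}

// Stand-in workload. In the benchmark, the callback would build the result
// AND call toArray() on it, so any deferred work is charged to the version
// that caused it.
const { result: total, elapsedMs } = timeIt('sum of 1..1000000', () => {
  let sum = 0;
  for (let i = 1; i <= 1000000; i++) sum += i;
  return sum;
});
console.log(total); // 500000500000
```

With the real data, the two callbacks would be something like `() => df.groupBy(...).select(...).inflate().toArray()` and `() => df.pivot(...).toArray()`, using the same arguments as in the code above.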
Is this intended? Should I dig deeper to fix it and make a pull request?
Thanks,