Open
Labels: bug (Something isn't working)
Description
I noticed today that one of the dashboards reports the total size of data in IDC as 184 TB. Based on my earlier queries, I expected ~90 TB.
Running the query, I confirmed 184TB is what we see in BQ:
select round(sum(instance_size)/pow(1000,4),3) from `bigquery-public-data.idc_current.dicom_all`
It appears that the portal is reporting those numbers too!
I then did a query per-collection:
SELECT collection_id, sum(instance_size)/pow(10,12) as size_TB
FROM `bigquery-public-data.idc_v21.dicom_all`
group by collection_id
order by size_TB desc
And inexplicably, some of the collections that were not supposed to change from the previous release increased in size dramatically! See top-20 collections (see spreadsheet here).
v21:
v22:
This looks like a very serious regression. Could it be that earlier versions of those collections were ingested?
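To narrow this down, the two per-collection totals could be put side by side in a single query. The following is only a sketch: it assumes a `bigquery-public-data.idc_v22.dicom_all` table exists with the same `collection_id` and `instance_size` columns as the v21 table queried above, and the `v21_TB`, `v22_TB`, and `delta_TB` column names are made up for illustration.

```sql
-- Sketch: per-collection size in v21 vs v22, largest growth first.
-- Assumption: idc_v22.dicom_all exists and has the same columns as idc_v21.dicom_all.
WITH v21 AS (
  SELECT collection_id, SUM(instance_size) / POW(10, 12) AS size_TB
  FROM `bigquery-public-data.idc_v21.dicom_all`
  GROUP BY collection_id
),
v22 AS (
  SELECT collection_id, SUM(instance_size) / POW(10, 12) AS size_TB
  FROM `bigquery-public-data.idc_v22.dicom_all`
  GROUP BY collection_id
)
SELECT
  collection_id,
  v21.size_TB AS v21_TB,  -- NULL if the collection is new in v22
  v22.size_TB AS v22_TB,  -- NULL if the collection was dropped in v22
  IFNULL(v22.size_TB, 0) - IFNULL(v21.size_TB, 0) AS delta_TB
FROM v21
FULL OUTER JOIN v22 USING (collection_id)
ORDER BY delta_TB DESC
LIMIT 20
```

A large `delta_TB` for a collection that was not supposed to change in this release would point at the collections to inspect first.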