Something went wrong with ingestion of some collections in v22 #124

@fedorov

Today I noticed that one of the dashboards reports the total size of data in IDC as 184 TB. Based on my earlier queries, I expected ~90 TB.

Running the following query, I confirmed that BigQuery reports 184 TB as well:

select round(sum(instance_size)/pow(1000,4),3) from `bigquery-public-data.idc_current.dicom_all`

It appears that the portal is reporting the same number too!

Image

I then ran a per-collection query against each release (shown here for v21):

SELECT collection_id, SUM(instance_size)/POW(10,12) AS size_TB
FROM `bigquery-public-data.idc_v21.dicom_all`
GROUP BY collection_id
ORDER BY size_TB DESC

Inexplicably, some of the collections that were not supposed to change since the previous release have increased dramatically in size! The top 20 collections for each release are shown below (see spreadsheet here).

v21:

Image

v22:

Image
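
To make the v21 vs. v22 comparison explicit rather than reading it off two screenshots, the two per-collection results can be joined in a single query. This is only a sketch: it assumes the v22 data is available as `bigquery-public-data.idc_v22.dicom_all` with the same `collection_id` and `instance_size` columns as v21; the `growth_TB` alias and the `LIMIT 20` cutoff are illustrative choices.

```sql
-- Per-collection size in each release, in TB (10^12 bytes).
WITH v21 AS (
  SELECT collection_id, SUM(instance_size) / POW(10, 12) AS size_TB
  FROM `bigquery-public-data.idc_v21.dicom_all`
  GROUP BY collection_id
),
v22 AS (
  -- Assumption: the v22 dataset follows the same naming and schema as v21.
  SELECT collection_id, SUM(instance_size) / POW(10, 12) AS size_TB
  FROM `bigquery-public-data.idc_v22.dicom_all`
  GROUP BY collection_id
)
SELECT
  collection_id,
  v21.size_TB AS v21_size_TB,
  v22.size_TB AS v22_size_TB,
  -- Collections that are new in v22 have no v21 row, so treat their v21 size as 0.
  ROUND(v22.size_TB - IFNULL(v21.size_TB, 0), 3) AS growth_TB
FROM v22
LEFT JOIN v21 USING (collection_id)
ORDER BY growth_TB DESC
LIMIT 20
```

Collections that were not supposed to change should show a `growth_TB` close to zero; any that appear at the top of this list are candidates for the regression.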

This looks like a very serious regression. Could it be that earlier versions of those collections were ingested?
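
One way to probe that hypothesis is to check whether the same instance appears more than once within a collection in v22. The sketch below assumes that `dicom_all` exposes a `SOPInstanceUID` column and that the v22 dataset is named `idc_v22`; the `n_extra_rows` alias is illustrative. Note that it would only catch earlier versions that reuse the same `SOPInstanceUID`, not versions in which instances were assigned new UIDs.

```sql
-- Collections in v22 where some SOPInstanceUID occurs in more than one row.
SELECT
  collection_id,
  COUNT(*) AS n_rows,
  COUNT(DISTINCT SOPInstanceUID) AS n_distinct_instances,
  COUNT(*) - COUNT(DISTINCT SOPInstanceUID) AS n_extra_rows
FROM `bigquery-public-data.idc_v22.dicom_all`
GROUP BY collection_id
HAVING n_extra_rows > 0
ORDER BY n_extra_rows DESC
```

An empty result would suggest the size increase is not caused by repeated rows for the same instance, and that the extra data comes from somewhere else.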

Labels: bug (Something isn't working)
