Ability to filter on modality and dataset size #47

shamikbose · 2022-07-13T03:13:15Z

@davanstrien @cakiki
As someone working on this in their spare time, I think it would be helpful to label each dataset with its size and modalities. People with limited compute might be able to tackle smaller text datasets, whereas others with more compute and time might be able to handle large multimodal datasets. I'd be happy to add the labels to the existing datasets if this is something others would like to see as well.

In terms of labels, I was thinking of something along these lines:

| Label  | Size          |
|--------|---------------|
| Small  | < 500 MB      |
| Medium | 500 MB - 2 GB |
| Large  | > 2 GB        |
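
For illustration, here is a minimal sketch of how these thresholds could be turned into a label automatically. It is not code from the project: the function name `size_label` is hypothetical, MB and GB are assumed to be decimal units (10^6 and 10^9 bytes), and a dataset of exactly 2 GB is assumed to count as Medium, since the proposal leaves that boundary open.

```python
# Hypothetical helper, not part of any existing library or of the project.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
MB = 10**6
GB = 10**9


def size_label(num_bytes: int) -> str:
    """Map a dataset size in bytes to the proposed Small/Medium/Large label."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    if num_bytes < 500 * MB:
        return "Small"
    # Assumption: exactly 2 GB is treated as Medium.
    if num_bytes <= 2 * GB:
        return "Medium"
    return "Large"


if __name__ == "__main__":
    for n in (120 * MB, 750 * MB, 5 * GB):
        print(n, size_label(n))
```

If binary units (MiB/GiB) are preferred, only the two constants would need to change.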

davanstrien (Collaborator) · 2022-07-13T15:38:57Z

That's a great idea. We had thought of having a difficulty flag for datasets, which could also be partly based on dataset size.

@cakiki @albertvillanova @clancyoftheoverflow WDYT -- the difficulty rating is probably a little harder to determine objectively. We may want to include the size info (where the data is in a repository or otherwise easily found) so people have a sense of which datasets are easier to work with. We could then encourage people to use the issue itself for more detailed comments about potential challenges, e.g. complex data or APIs to work with.

7 replies

shamikbose Jul 13, 2022
Author

I was thinking in terms of building the dataloaders, so participants could filter in the project, but ^ this would be very helpful too!
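
As a rough sketch of what that kind of filtering could look like once each dataset carries a modality and a size label: everything below is hypothetical, including the `DatasetInfo` record, the `filter_datasets` function, and the placeholder dataset names. None of it reflects how the project actually stores its metadata.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Ordering of the proposed size labels, smallest first.
SIZE_ORDER = ["Small", "Medium", "Large"]


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical metadata record for one dataset."""
    name: str
    modalities: FrozenSet[str]
    size_label: str


def filter_datasets(
    datasets: Iterable[DatasetInfo],
    modality: Optional[str] = None,
    max_size: Optional[str] = None,
) -> List[DatasetInfo]:
    """Keep datasets that include `modality` and are no larger than `max_size`."""
    limit = SIZE_ORDER.index(max_size) if max_size is not None else None
    kept = []
    for d in datasets:
        if modality is not None and modality not in d.modalities:
            continue
        if limit is not None and SIZE_ORDER.index(d.size_label) > limit:
            continue
        kept.append(d)
    return kept


if __name__ == "__main__":
    # Placeholder entries, made up for the example.
    catalog = [
        DatasetInfo("example_text_small", frozenset({"text"}), "Small"),
        DatasetInfo("example_text_large", frozenset({"text"}), "Large"),
        DatasetInfo("example_multimodal", frozenset({"text", "image"}), "Large"),
    ]
    for d in filter_datasets(catalog, modality="text", max_size="Medium"):
        print(d.name)
```

Someone on a laptop could then ask for `modality="text", max_size="Small"` and skip everything else.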

davanstrien Jul 14, 2022
Collaborator

I think this definitely makes a lot of sense (I'm regularly doing the dance of trying to fit datasets onto my laptop which have no business being there!).

cakiki Jul 14, 2022
Collaborator

Ah! Apologies for misunderstanding :)

Yes your suggestion makes perfect sense.

shamikbose Jul 28, 2022
Author

Pinging @cakiki @davanstrien to see if this can be implemented, given that we're adding a lot of new datasets.

davanstrien Jul 28, 2022
Collaborator

Sorry, I haven't gotten to this yet. I'll try to get this added to the datasets issue form tomorrow.
