-
Notifications
You must be signed in to change notification settings - Fork 668
AO3-6067 Uncategorized fandoms in collections not showing #5514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The new fandom method actually would return any fandom, even possible copy of fandom already in fandom_ids
|
I'd like to add a test to make sure that non-canonical fandoms don't get double counted. How exactly do I set up a non-canonical fandom under a canonical fandom in cucumber? After this is done, I shall remove put this as ready for review |
simplified check to know if a fandom is not categorized Co-authored-by: ömer faruk <b585j@transformativeworks.org>
|
Hmm, I seem to not be able to locally replicate the issue with the latest automated testing |
|
That failed after a rerun too, and we suspect it might be because of caching. Can you disable caching for that scenario (like it is done for the first scenario on line 4) |
| And I should see "Papyrus" | ||
| And I should see "Secret Sans" | ||
|
|
||
| @disable_caching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try the the collection counts have expired step before interacting with the collections sidebar/the fandom counts instead of disabling caching like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test still fails. It doesn't on my end which makes troubleshooting difficult. The fact it used to work and broke on an unrelated change confuses me.
|
We've discussed this and decided that we can just go with a unit test instead of an integration test. Can you add one for fandoms_controller.rb on fandoms_controller_spec.rb? |
Alrighty. The fandom counting part is not part of the fandom_controller. Would it be more appropriate to add a test to the collection_controller? |
|
Since this line on fandoms_controller counts the fandoms (and the collections_controller doesn't) I thought it would be fine but I don't have a strong preference really. |
|
I decided to slightly change the definition of the search_count helper. Since we're already going through the tags to find the unwrangled one I figured that handling the rest of the fandoms that way should be faster. The test pass now which is good. |
omerfaruk-pseud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done!
| @collections = @tag.collections | ||
| # if from a collection, only show works from that collection | ||
| if @collection.present? | ||
| @works &= @collection.works |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not include works in the subcollections, but the counts on the collection page do include them, so this is a mismatch - I can click on "new fandom (1)" and get to a page with no works. For canonical tags the works from the subcollections are included, so we should do the same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, is there a quick way of having a list of collection from a parent collection? (since I assume child of child are also included)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately, child collections can't have children -- subcollections only go one level deep.
| Then I should see "Fandoms (1)" | ||
| And I should see "Television" | ||
| When I follow "Fandoms (1)" | ||
| Then I should not see "Television" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also check TV (2) here, so we're sure that the syn is counted under the fandom?
| When I follow "Fandoms (1)" | ||
| Then I should not see "Television" | ||
|
|
||
| @disable_caching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still required or can it be replaced with a well placed the collection counts have expired?
| query.search_results.each do |work| | ||
| work.fandoms.each do |fandom| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to load all works in the collection and all fandoms of those works. This is not great, especially in the context in which this was implemented to be done via elasticsearch: https://otwarchive.atlassian.net/browse/AO3-6116. Since I'm writing this during maintenance and you can't look it up right now: Yuletide has 42 806 works. So we'd be loading all that at once, and then also all the fandom tags of those works.
During the migration of collections to elatiscsearch we tried to optimise a similar query and despite that it was still quite slow: #5234 (comment). This here isn't as bad because it doesn't include bookmarks or the crossover check, but we will also run this code here regularly, so we have higher performance requirements than a one-time task.
Well, the above isn't quite correct: As is, query.search_results will return one page worth of works. If we were to fix that (so that we don't get wrong information), then we'd load all of it.
So unfortunately I think we can't go with this approach. If we are going to get this information, we should get it directly from elasticsearch, maybe with some computation in Ruby, not with additional large database queries.
I believe this issue was marked as blocked by collections in elasticsearch because the original plan there was to include fandom information on the collection, so we could have gotten information via that. However, that plan was changed because it was looking too performance intensive. So now we have to solve this issue here differently.
With the existing information we have on works in elasticsearch, the uncategorized fandoms are included in filter_ids, mixed with a bunch of other tag types. So if we were to use them here, we'd still have to get the tags from the db to get their type. That's slightly better because we don't have to get the works, but still not great.
This may work on the fandoms page, since we already have to get the fandoms there to display them, so if we're clever about it we wouldn't introduce another query. But for the count in the sidebar, that doesn't apply.
In the big picture, we should put the data into elasticsearch in a format we can use well (so doing the computation when indexing), not do lots of computation afterwards to use it. So I'm leaning towards putting this information into elasticsearch over trying to use filter_ids. But that means reindexing etc. So I don't know where to go from here. Do you have an idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While imperfect, an idea could be to combine both methods? That is, if a collection has a lot of work, we could go with the previous approach, but if it has less then we could have a more exact count using the tags?
Another would be to put the uncategorized fandom_ids in the fandom_id field of the query when they're added so we can go with the previous method
Could we add a property to query so that we can fetch only items with an uncategorized tag of some sort? That way we could do two queries (not ideal) then we do one query with fandom_ids for all, then one query that's more involved with the filter_ids when a work has an uncategorized tag then fetch the ones that are uncategorized fandoms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a collection has a lot of work, we could go with the previous approach, but if it has less then we could have a more exact count using the tags?
That's not viable because yuletide is one of the collections where it's especially important that uncategorized fandoms are shown, since it has many of those each year.
Another would be to put the uncategorized fandom_ids in the fandom_id field of the query
We can't modify what is in the field like that, it's used by more code than the counts here, like the facets (the filter sidebar) that expect only canonical tags
Could we add a property to query ...
Something like that is what I mean with "putting the right information into elasticsearch", yes. If we're adding a new property we should add it in a way that is directly useful here (e.g. all fandom tag ids or something), not something that needs another workaround in the queries. But that has the downsides I noted - we'd need to reindex completely, since the mapping changes, plus put a bunch of brainpower into designing that property. Which brings to the end of my last comment: I don't know what we should be doing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see some possibilities for that new propriety:
We could add a new property that is all fandom tag ids. This would have the downside of having to parse through the tags to check canonicality, which might be not as useful. Some fandoms have a lot of synonyms so I'm not sure if that's the correct way forward. Plus, this feels like something which would have limited usefulness.
I'm unsure about how true that would be, but could having multiple uncategorized properties be useful for tag wrangling? For instance, this could allow someone to get all of the uncategorized tags or fandoms from all work with a specific fandom. I'm unsure if we can have one property that's like a list of list. So like freeform tags, relationship tags, character tags, which I guess could be more generically a list of list with types as keys. This is what feels the more generic and possibly more justifying a huge reindex.
I'm not full certain what is allowed as return type of elastic search, but this feels like a potential idea?
Finally, I apologize for not having understood what you meant.
| Then I should see "Unrelated Fandom" | ||
| And I should see "Undertale" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also check the counts here
Issue
https://otwarchive.atlassian.net/browse/AO3-6067
Purpose
This makes uncategorized fandoms be listed on collection fandoms page. As a bonus, this also makes the unwrangled tags collection page show only the work in the collection. This modifies the method used to fetch the fandom (being a combination of how it used to work with the fandom_ids field, and how view works, by going through the fandom tags on the work).
Credit
Danaël / Rever ( they / he )
Danaël Villeneuve on jira