ML Data-Pipeline Ingestion - Optimisation for scaling #32
Replies: 3 comments 9 replies
-
Druid is never supposed to be used as a source of truth. There are two approaches here:
In addition, don't generalize an exhaust job on Cassandra. Cassandra query patterns differ from table to table and need to be fine-tuned for the specific tables involved, as in ProgressExhaust or ResponseExhaust. You can just create a ProjectExhaust data product similar to ProgressExhaust.
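To illustrate the point about table-specific tuning, here is a minimal sketch of a query builder that is tied to one table and always restricts the read to a single partition. Everything named here is an assumption for illustration: the table `ml.project_status`, its partition key `program_id`, the column set, the names `ExhaustRequest` and `build_project_exhaust_query`, and the `%s` placeholder style. It is not the actual ProgressExhaust code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical table and columns, for illustration only.
TABLE = "ml.project_status"
ALLOWED_COLUMNS = frozenset({"project_id", "program_id", "status", "updated_at"})


@dataclass(frozen=True)
class ExhaustRequest:
    """Hypothetical on-demand request: export the projects of one program."""
    program_id: str
    columns: Tuple[str, ...] = ("project_id", "status", "updated_at")


def build_project_exhaust_query(request: ExhaustRequest) -> Tuple[str, List[str]]:
    """Return a parameterised CQL string and its bind values.

    The query always filters on the assumed partition key (program_id),
    so it reads one partition instead of scanning the whole table.
    """
    if not request.program_id:
        raise ValueError("program_id is required; unbounded scans are rejected")
    unknown = [c for c in request.columns if c not in ALLOWED_COLUMNS]
    if not request.columns or unknown:
        raise ValueError(f"invalid column selection: {unknown or 'empty'}")
    column_list = ", ".join(request.columns)
    cql = f"SELECT {column_list} FROM {TABLE} WHERE program_id = %s"
    return cql, [request.program_id]
```

A generic job would have no way of knowing which key to filter on, which is why each exhaust is written against the schema of its own table.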
-
@Shakthieshwari which block do Projects and Observations sit under currently? Is it SB Ed?
-
@alok Gupta, sure, I will schedule a call in the next couple of days to discuss the SL capabilities and their alignment with the Sunbird BBs.
Regards
Vijayashree
On Tue, Sep 27, 2022 at 10:34 AM Alok Gupta wrote:
A couple of points:
1. There is no such thing as ML services in Sunbird. ML (Manage Learn) is a construct in the context of use cases that an adopter might enable using Sunbird BBs. @Shakthieshwari, can you please add Vijayshree to this thread? I am not able to find her username.
2. @Shakthieshwari, please request Vijayshree and Khushboo to initiate a call to discuss and finalize which "new components" SL has been contributing and which BB these components should be in.
-
Hello Team,
We are planning a few ML data optimisations for scaling.
Jira ticket link :- https://project-sunbird.atlassian.net/browse/OB-57
Problem statement :- Avoid deleting the Projects Druid datasource, which the Program Dashboard CSV uses.
Reason for deleting the datasource :- The status of a project changes over time and Druid doesn't support updating a record, so every day we delete all the data from Druid and re-ingest it in full to get the updated status of each submission.
Concern :- Handling huge volumes of data.
Approach (solution) :- Please check this Confluence doc, where we have detailed the design: https://project-sunbird.atlassian.net/l/cp/P7nq918u
Similar to OnDemandDruidExhaustJob, we need to create an OnDemandCassandraExhaustJob data product.
@SanthoshVasabhaktula @sowmya-dixit @anandp504, please share your approval and suggestions on whether we can go ahead with this.
Cc- @aishwaryashikshalokam @Ashwiniev95 @Prateek-slokam @aks30 @kiranharidas187 @vijiurs @snehangsude
Please respond at the earliest, as this is the highest priority for the program launch.
Thanks