- Generate a synthetic workload.
- Initial implementation:
  - How to index the cache? Serialize the list of tuples `[(column, predicate), …]` (see the key sketch after this list).
  - How to store the cache items? Use the `cache()` provided by Spark SQL.
  - When executing a query, the cache planner asks the logical planner for all the tables, columns, and the predicates applied to them, then passes a list of key-value pairs to the cache manager. The cache manager is responsible for inserting callbacks into Spark so that the intermediate results are materialized (see the plan-walking sketch below).
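A minimal sketch of the first two points, assuming the key is the serialized `(column, predicate)` list and storage goes through Spark SQL's `cache()`. `CacheKey` and `SimpleCacheManager` are hypothetical names for illustration, not existing Spark classes:

```scala
import org.apache.spark.sql.DataFrame

// Cache key: the serialized list of (column, predicate) tuples a query applies.
// Sorting makes the key canonical, so "a > 1 AND b < 2" and "b < 2 AND a > 1"
// index the same cache entry.
case class CacheKey(entries: Seq[(String, String)]) {
  def serialized: String =
    entries.sorted.map { case (col, pred) => s"$col:$pred" }.mkString("|")
}

// Cache items are stored through the cache() that Spark SQL already provides:
// the DataFrame is marked for persistence and materialized on its first action.
class SimpleCacheManager {
  private val store = scala.collection.mutable.Map.empty[String, DataFrame]

  def getOrMaterialize(key: CacheKey)(compute: => DataFrame): DataFrame =
    store.getOrElseUpdate(key.serialized, {
      val df = compute
      df.cache()
      df
    })
}
```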
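For the planner-to-manager handoff, one way to pull the tables, columns, and predicates out of a query is to walk its analyzed logical plan and collect the `Filter` nodes from Catalyst. `collectPredicates` below is a hypothetical helper, not part of the existing codebase:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.Filter

// Walk the analyzed logical plan of a query and return, for every Filter
// node, a (column, predicate) pair for each column the predicate references.
// This is the list of key-value pairs the cache planner would hand to the
// cache manager.
def collectPredicates(spark: SparkSession, query: String): Seq[(String, String)] = {
  val plan = spark.sql(query).queryExecution.analyzed  // planned, not executed
  plan.collect { case f: Filter =>
    f.condition.references.toSeq.map(attr => (attr.name, f.condition.sql))
  }.flatten
}
```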
 
- Next implementation:
  - Cache the joined tables (a sketch follows).
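A sketch of the join case, assuming the same `cache()` mechanism: the joined result is materialized once and registered under a name so later queries can be rewritten against it. The table and column names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("join-cache-sketch").getOrCreate()

// Placeholder tables; any two tables sharing a join key would do.
val orders    = spark.table("orders")
val customers = spark.table("customers")

// Cache the joined result and register it under a name, so the cache
// planner can rewrite later queries over orders JOIN customers to hit it.
val joined = orders.join(customers, Seq("customer_id"))
joined.cache()
joined.createOrReplaceTempView("orders_customers_cached")
```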
 
- Workload analysis:
  - Use the cache planner to analyze the whole workload up front.
  - The cache planner then starts the Spark SQL plans sequentially (a two-phase sketch follows).
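A sketch of that two-phase workflow, reusing the hypothetical `collectPredicates` helper from above. The frequency-based caching rule is only an illustration, not a decided design:

```scala
import org.apache.spark.sql.SparkSession

// Phase 1: analyze every query in the workload up front (plans only);
// phase 2: run the Spark SQL plans one by one, caching the results whose
// predicates recur across the workload.
def runWorkload(spark: SparkSession, workload: Seq[String]): Unit = {
  val frequency = workload
    .flatMap(q => collectPredicates(spark, q))   // sketched earlier
    .groupBy(identity)
    .map { case (pair, hits) => pair -> hits.size }

  // Predicates seen more than once are worth materializing (illustrative rule).
  val candidates = frequency.filter { case (_, n) => n > 1 }.keySet

  workload.foreach { q =>
    val df = spark.sql(q)
    if (collectPredicates(spark, q).exists(candidates.contains)) df.cache()
    df.count()  // stand-in action that forces execution
  }
}
```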
 
- RDD collector:
  - The cache manager needs to collect data, possibly from Spark itself (one option is sketched below).
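One possible way for the cache manager to collect data from Spark is to poll the driver's storage status via `SparkContext.getRDDStorageInfo`; a minimal sketch:

```scala
import org.apache.spark.sql.SparkSession

// Poll the driver for the storage status of every cached RDD; this gives
// the cache manager sizes and partition counts for each cached result.
def reportCachedRdds(spark: SparkSession): Unit =
  spark.sparkContext.getRDDStorageInfo.foreach { info =>
    println(s"rdd=${info.name} id=${info.id} " +
      s"cached=${info.numCachedPartitions}/${info.numPartitions} " +
      s"mem=${info.memSize}B disk=${info.diskSize}B")
  }
```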
 
 