-
Notifications
You must be signed in to change notification settings - Fork 619
Schema - Layout decoupling #1327
Conversation
| } | ||
| return new_layout; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wondered what is the difference between function CreateLayout() and CreateDefaultLayout() cause they both have the same parameters passes in and same returned value. Is CreateLayout() only a helper function that is being called in test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And from my understanding, only HYBRID layout type will be persist in pg_layout catalog table, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current transformation infrastructure, the layouts are changed in two contexts:
- Transform a specific
TileGroup. In this case we want to change the layout of thatTileGroupbut not thedefault_layout_of the table (used to allocate newTileGroups). - The
LayoutTunerchanges thedefault_layout_of the entire table based on its tuning mechanism.
I have provided the APIs to support both operations, however, we might later want to change this because currently theInserteronly supports inserts forLayoutType::Row. I believe this can be decided once the tuner is integrated into the system.
Yes, we only need to persist HYBRID layouts because they contain real mapping information. For ROW and COLUMN layouts, you can reconstruct it as long as you know the number of columns.
src/include/storage/data_table.h
Outdated
| } | ||
|
|
||
| column_map_type GetDefaultLayout() const; | ||
| inline void ResetDefaultLayout(LayoutType type = LayoutType::ROW) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more comments to clarify under what circumstances the Default LayoutType would be set to COLUMN(Seem like only appears in test cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, currently it is only done in test cases. However, once the tuner is in place, the brain may decide that a particular table is always used for OLAP workloads and hence it might make sense to transform it to COLUMN layout.
src/catalog/table_catalog.cpp
Outdated
| return false; | ||
| } | ||
|
|
||
| valid_layout_objects_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why set to be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made this function similar to TableCatalog::InsertColumnObject. My understanding is that this updates the validity of the TableCatalogObject cache and makes sure that subsequent requests are serviced by the cache directly. Was this assumption incorrect? Please let me know if i should change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From our logic within TableCatalogObject, we only the parameter valid_table_objects_ when ALL the table objects are inserted into cache. For your case, set valid_layout_objects_ only when all the layout objects from this table is inserted into txn cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| tile->pool->Free(varlen_ptr); | ||
| } | ||
| } | ||
| void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything else has been change inside gc_manager.cpp except formatting and change the name of pass in parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, There is no tile_schemas object in the TileGroup. So it changes the way we check whether or not we are freeing a VARCHAR attribute.
| std::shared_ptr<ColumnCatalogObject> GetColumnObject( | ||
| const std::string &column_name, bool cached_only = false); | ||
|
|
||
| // Evict all layouts from the cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may remember this wrong, but last time we talked you were saying add an extra column in pg_table to record the largest layout_id for each data table.
| txn_manager.CommitTransaction(txn); | ||
|
|
||
| txn = txn_manager.BeginTransaction(); | ||
| auto table = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to use table catalog object instead of data table to obtain information like database oid and table oid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
@poojanilangekar And thank you for fix the typo in DEFAULT_SCHEMA_NAME! |
| // Drop the default layout. | ||
| auto default_layout_oid = default_layout->GetOid(); | ||
| txn = txn_manager.BeginTransaction(); | ||
| EXPECT_EQ(ResultType::SUCCESS, catalog->DropLayout(database_oid, table_oid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You said Persist the HYBRID layouts to the pg_catalog table. The two pre-defined layouts, ROW and COLUMN stores are not persisted.
I may understand this wrong, the word ** Persist** means store this information in catalog table, right? If that is true, why DropLayout(default_layout_oid) returns SUCCESS? Since default_layout is ROW stored, it should not be inserted into pg_layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct.
I was returning SUCCESS to indicate that the layout has been removed from pg_layout.
The initial default_layout_ of each table is always ROW store. However, it can be changed via CreateDefaultLayout function, wherein it creates a layout, persists it in pg_layout and updates the DataTable. In the case of this test case, I transformed the table to contain a HYBRID layout as the default_layout_ of the table.
Does this test case make sense? Or should I change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it makes sense now. Thanks for your explanation
mengranwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing I found confusing.
pervazea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments added. Will read further. Mostly looks good.
src/catalog/catalog.cpp
Outdated
| CATALOG_SCHEMA_OID, CATALOG_SCHEMA_NAME, pool_.get(), txn); | ||
| system_catalogs->GetSchemaCatalog()->InsertSchema( | ||
| DEFUALT_SCHEMA_OID, DEFUALT_SCHEMA_NAME, pool_.get(), txn); | ||
| DEFUALT_SCHEMA_OID, DEFAULT_SCHEMA_NAME, pool_.get(), txn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well fix:
DEFUALT_SCHEMA_OID to be DEFAULT_....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
| } | ||
|
|
||
| /* | ||
| * @brief create a new layout for a table and make it the deafult if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deafult -> default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
| return ResultType::SUCCESS; | ||
| } | ||
|
|
||
| /* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The headers documenting the functions should be in the .h file (not the .cpp file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
| * @param table_oid the table to which the layout belongs | ||
| * @param layout_oid the layout to be dropped | ||
| * @param txn TransactionContext | ||
| * @return TransactionContext ResultType(SUCCESS or FAILURE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete word TransactionContext? Just a ResultType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/table_catalog.cpp
Outdated
| bool TableCatalogObject::InsertLayout( | ||
| std::shared_ptr<const storage::Layout> layout) { | ||
| // Invalid object | ||
| if (layout == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to be nice and permit nullptr or should this be an assertion that it is non-null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the existing TableCatalog convention. I have updated it to check for INVALID_OID as well.
@mengranwo Do you think this is fine or should we change this and all other instances?
src/storage/data_table.cpp
Outdated
|
|
||
| // Check that both tile groups have the same schema | ||
| // Currently done by checking that the number of columns are equal | ||
| // TODO Pooja: Handle schena equality for multiple schema versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schena -> schema
Haven't looked at where / how equality is used yet, but ...
Is equality via same number of columns sufficient? Should equality also check equality of column types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Currently it is only called in tests. I agree it is not sufficient. The reason this code existed earlier is because we didn't support schema changes. So a table always had the same schema and so the old code worked fine. I plan to add additional checks once we have the schema changes in place.
| if (catalog->CreateDefaultLayout(database_oid, table_oid, column_map, txn) == | ||
| nullptr) { | ||
| txn_manager.AbortTransaction(txn); | ||
| LOG_DEBUG("Layout Update to failed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the correct thing to do here?
void UpdateDefaultPartition, so no success/fail returnable at the moment, but the function, I assume, expects success. Therefore throwing an assertion here might be more correct. Failing with just a debug message is probably not ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is going to be a part of the brain code to Tune the table layout. A transaction may fail because it conflicts with other concurrent updates. In an ideal case it must fail because tuning is a background task while the other transaction would be user specified. Since this is a background process, in case of failure, it would come around in the next iteration and attempt the change again.
I am not sure if it would be appropriate to throw an exception when the failure was due to an internal event rather than a user action. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think it is fine not to throw an exception.
This is not your issue, but I'm wondering why the tuner doesn't apply a bit more intelligence. Just re-trying an operation without keeping any stats or ensuring you don't end up re-trying it forever, seems optimistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change this function to return a bool to indicate success. Later, @malin1993ml can determine how to handle failures in the Tuner as a part of thebrain. Can you please review the rest of the changes? This needs to be merged in order to support schema changes.
Also the query_logger_test failure is the same as #1307. So I am assuming we can ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pervazea I have changed that part. Please let me know if there are any other changes I need to make.
test/catalog/catalog_test.cpp
Outdated
| // NOTE: everytime we create a database, there will be 9 catalog tables inside | ||
| EXPECT_EQ( | ||
| 11, | ||
| CATALOG_TABLES_COUNT + 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The +3 hardcoded number... improvement that now using CATALOG_TABLES_COUNT. Can we fix it all the way by using define constant instead of the 3 or updating CATALOG_TABLES_COUNT if that is the right solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/tuning/layout_tuner_test.cpp
Outdated
| // Identification: test/tuning/layout_tuner_test.cpp | ||
| // | ||
| // Copyright (c) 2015-16, Carnegie Mellon University Database Group | ||
| // Copyright (c) 2015-16, Carnegie Mellon University Databa e Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental deletion of s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/include/catalog/layout_catalog.h
Outdated
|
|
||
| class LayoutCatalog : public AbstractCatalog { | ||
| public: | ||
| // Global Singleton, only the first call requires passing parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we trying to eliminate our use of singletons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was accidental, removed it.
poojanilangekar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mengranwo @pervazea Thanks for reviewing the PR. I have addressed all your comments inline. Please take a look at it again and let me know if I need to make any other changes.
| tile->pool->Free(varlen_ptr); | ||
| } | ||
| } | ||
| void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, There is no tile_schemas object in the TileGroup. So it changes the way we check whether or not we are freeing a VARCHAR attribute.
src/catalog/catalog.cpp
Outdated
| CATALOG_SCHEMA_OID, CATALOG_SCHEMA_NAME, pool_.get(), txn); | ||
| system_catalogs->GetSchemaCatalog()->InsertSchema( | ||
| DEFUALT_SCHEMA_OID, DEFUALT_SCHEMA_NAME, pool_.get(), txn); | ||
| DEFUALT_SCHEMA_OID, DEFAULT_SCHEMA_NAME, pool_.get(), txn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
| } | ||
|
|
||
| /* | ||
| * @brief create a new layout for a table and make it the deafult if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| txn_manager.CommitTransaction(txn); | ||
|
|
||
| txn = txn_manager.BeginTransaction(); | ||
| auto table = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
| * @param table_oid the table to which the layout belongs | ||
| * @param layout_oid the layout to be dropped | ||
| * @param txn TransactionContext | ||
| * @return TransactionContext ResultType(SUCCESS or FAILURE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/tuning/layout_tuner_test.cpp
Outdated
| // Identification: test/tuning/layout_tuner_test.cpp | ||
| // | ||
| // Copyright (c) 2015-16, Carnegie Mellon University Database Group | ||
| // Copyright (c) 2015-16, Carnegie Mellon University Databa e Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/include/catalog/layout_catalog.h
Outdated
|
|
||
| class LayoutCatalog : public AbstractCatalog { | ||
| public: | ||
| // Global Singleton, only the first call requires passing parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was accidental, removed it.
| // Drop the default layout. | ||
| auto default_layout_oid = default_layout->GetOid(); | ||
| txn = txn_manager.BeginTransaction(); | ||
| EXPECT_EQ(ResultType::SUCCESS, catalog->DropLayout(database_oid, table_oid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct.
I was returning SUCCESS to indicate that the layout has been removed from pg_layout.
The initial default_layout_ of each table is always ROW store. However, it can be changed via CreateDefaultLayout function, wherein it creates a layout, persists it in pg_layout and updates the DataTable. In the case of this test case, I transformed the table to contain a HYBRID layout as the default_layout_ of the table.
Does this test case make sense? Or should I change it?
src/storage/data_table.cpp
Outdated
|
|
||
| // Check that both tile groups have the same schema | ||
| // Currently done by checking that the number of columns are equal | ||
| // TODO Pooja: Handle schena equality for multiple schema versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Currently it is only called in tests. I agree it is not sufficient. The reason this code existed earlier is because we didn't support schema changes. So a table always had the same schema and so the old code worked fine. I plan to add additional checks once we have the schema changes in place.
| if (catalog->CreateDefaultLayout(database_oid, table_oid, column_map, txn) == | ||
| nullptr) { | ||
| txn_manager.AbortTransaction(txn); | ||
| LOG_DEBUG("Layout Update to failed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is going to be a part of the brain code to Tune the table layout. A transaction may fail because it conflicts with other concurrent updates. In an ideal case it must fail because tuning is a background task while the other transaction would be user specified. Since this is a background process, in case of failure, it would come around in the next iteration and attempt the change again.
I am not sure if it would be appropriate to throw an exception when the failure was due to an internal event rather than a user action. What do you think?
c16f90a to
cf631f4
Compare
pmenon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick aesthetic/style review. Will take a closer look at design soon.
| // Get the tile offset assuming that it is still a row store | ||
| // Hence the Tile offset is 0. | ||
| auto layout = tile_group->GetLayout(); | ||
| PELOTON_ASSERT(layout.IsRowStore()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You probably want to make this a
constreference to avoid the copy. - You'll need to add
UNUSED_ATTRIBUTEto pass Release builds.
src/include/catalog/table_catalog.h
Outdated
|
|
||
| namespace storage { | ||
| class Layout; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close namespace.
|
|
||
| namespace catalog { | ||
| class Schema; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close namespace.
src/include/common/internal_types.h
Outdated
| /* tile_map_type used to store the mapping between a tile and its columns | ||
| * <tile index> to vector{<original column index, tile column offset>} | ||
| */ | ||
| typedef std::map<oid_t, std::vector<std::pair<oid_t, oid_t>>> tile_map_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sold that these are common enough types for the whole system that they should be here ... but I guess it's okay.
src/include/storage/data_table.h
Outdated
| void ClearLayoutSamples(); | ||
|
|
||
| void SetDefaultLayout(const column_map_type &layout); | ||
| inline void SetDefaultLayout(std::shared_ptr<const Layout> new_layout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the inline keyword.
src/include/storage/layout.h
Outdated
|
|
||
| namespace catalog { | ||
| class Schema; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close namespace.
src/include/storage/layout.h
Outdated
| // | ||
| // Identification: src/include/storage/layout.h | ||
| // | ||
| // Copyright (c) 2015-18, Carnegie Mellon University Database Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use the formatter.py script, it'll add the properly formatted copyright.
src/include/storage/layout.h
Outdated
| Layout(const column_map_type &column_map, oid_t layout_oid); | ||
|
|
||
| /** @brief Check whether this layout is a row store. */ | ||
| inline bool IsRowStore() const { return (layout_type_ == LayoutType::ROW); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove inline.
src/include/storage/layout.h
Outdated
| inline bool IsRowStore() const { return (layout_type_ == LayoutType::ROW); } | ||
|
|
||
| /** @brief Check whether this layout is a column store. */ | ||
| inline bool IsColumnStore() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove inline.
| CreateAndLoadAllColsTable(); | ||
| } | ||
|
|
||
| void ExecuteTileGroupTest(peloton::LayoutType layout_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you modify the base test class functions to accept a LayoutType? Then you should reuse the create/loading functions rather than creating a one-off function. This also allows other tests to use custom table layouts for free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed all your other comments except this one.
From what I understood, the PelotonCodeGenTest does not use the LLVM inserter. The LLVM inserter would only insert into LayoutType::ROW.
Can I make that assumption or do we plan to change that later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the tests do not use the inserter, but instead directly insert into the table here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmenon I have modified the CreateTestTables function to get the layout as a parameter. However, since its called from the constructor, so the entire test suit would get called with the given layout. I have however moved the CreateAndLoad function to the base class. Please let me know if you want me to change anything else.
pervazea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I did not read the logic as completely as I would have liked, as there is a lot of code here.
Comments added are all style issues. The main one is that oid_t is used in many places where uint32_t or other flavor of int should be used.
src/gc/gc_manager.cpp
Outdated
| void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group, | ||
| oid_t tuple_id) { | ||
| oid_t tile_count = tile_group->tile_count_; | ||
| oid_t tile_col_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tile_count and tile_col_count should be uint32_t not oid_t.
While use of oid_t will compile and run correctly, it is confusing and incorrect style.
src/include/catalog/catalog.h
Outdated
| const std::string &table_name, std::unique_ptr<catalog::Schema>, | ||
| concurrency::TransactionContext *txn, bool is_catalog = false, | ||
| oid_t tuples_per_tilegroup = DEFAULT_TUPLES_PER_TILEGROUP); | ||
| oid_t tuples_per_tilegroup = DEFAULT_TUPLES_PER_TILEGROUP, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oid_t -> uint32_t
src/include/storage/tile_group.h
Outdated
| AbstractTable *table; // this design is fantastic!!! | ||
|
|
||
| // number of tuple slots allocated | ||
| oid_t num_tuple_slots; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
src/include/storage/tile_group.h
Outdated
|
|
||
| // number of tiles | ||
| oid_t tile_count; | ||
| oid_t tile_count_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
|
|
||
| storage::Tile *tile = tile_group->GetTile(tile_itr); | ||
| const catalog::Schema &schema = *(tile->GetSchema()); | ||
| oid_t tile_column_count = schema.GetColumnCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
| double GetLayoutDifference(const storage::Layout &other) const; | ||
|
|
||
| /** @brief Returns the tile_id of the given column_id. */ | ||
| oid_t GetTileIdFromColumnId(oid_t column_id) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also be oid_t because the function returns a tile_id.
| oid_t GetTileIdFromColumnId(oid_t column_id) const; | ||
|
|
||
| /** @brief Returns the column offset of the column_id. */ | ||
| oid_t GetTileColumnOffset(oid_t column_id) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be oid_t because it's the column_oid within the Tile.
src/include/storage/layout.h
Outdated
| oid_t GetTileColumnOffset(oid_t column_id) const; | ||
|
|
||
| /** @brief Returns the number of columns in the layout. */ | ||
| oid_t GetColumnCount() const { return num_columns_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
src/include/storage/layout.h
Outdated
| /** @brief Sets the tile id and column id w.r.t. of the tile corresponding | ||
| * to the specified tile group column id. | ||
| */ | ||
| void LocateTileAndColumn(oid_t column_offset, oid_t &tile_offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misnamed? More correctly something more like SetTileAndColumn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry! This function was previously a part of TileGroup, I copied the comments as is.
I have changed it now. Essentially, we are using references here because we need to return both the tile_id and tile_column_id.
src/storage/layout.cpp
Outdated
| PELOTON_ASSERT(num_columns_ > column_offset); | ||
|
|
||
| // For row store layout, tile id is always 0 and the tile | ||
| // column_id and tile column_id is the same. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment text is unclear...
and the tile column_id and tile column_id is the same
pmenon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The design is nice, code is pretty well documented. I just had a few questions below.
src/catalog/catalog.cpp
Outdated
| bool result = | ||
| pg_layout->InsertLayout(table_oid, new_layout, pool_.get(), txn); | ||
| if (!result) { | ||
| LOG_DEBUG("Failed to create a new layout for table %u", table_oid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be LOG_ERROR?
| new const storage::Layout(column_map, layout_oid)); | ||
|
|
||
| // Add the layout the pg_layout table | ||
| auto pg_layout = catalog_map_[database_oid]->GetLayoutCatalog(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but this map seems to be accessed concurrently without a lock in Catalog::BootstrapSystemCatalogs()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/include/storage/layout.h
Outdated
| /** @brief used to store the mapping between a tile and its columns | ||
| * <tile index> to vector{<original column index, tile column offset>} | ||
| */ | ||
| typedef std::map<oid_t, std::vector<std::pair<oid_t, oid_t>>> tile_map_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to rename this to something more descriptive? Maybe TileToColumnMap?
| TileGroup(BackendType backend_type, TileGroupHeader *tile_group_header, | ||
| AbstractTable *table, const std::vector<catalog::Schema> &schemas, | ||
| const column_map_type &column_map, int tuple_count); | ||
| std::shared_ptr<const Layout> layout, int tuple_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're passing in a shared pointer, does that mean this can be null? If not, should we make it a reference?
Also, in terms of ownership, does the TileGroup own the layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The layout is actually shared among multiple TileGroups which are initialized with the same layout. We would want the ptr to be alive as long as we have atleast one TileGroup with the same.
To make sure a TileGroup always get a non-null layout, I am now throwing an exception in the TileGroupFactory if it passed a nullptr. Let me know if you want me to handle this differently.
d2ce8d2 to
d4e151f
Compare
* Added layout.h. Modified Runtime functions to handle row stores. * Modify TileGroup api * Changed Updater & Inserter to use Layout object * Remove calls from optimizer and executor * Removed all calls to schemas vector from TileGroup * Removed Calls to LocateTileAndColumn for TileGroup * Moved definition of column_map_type. Removed column_map from tile_group * Added GetInfo function * make check pases. Need to change a few more function calls * Convert PL_ASSERT -> PELOTON_ASSERT * Modified DataTable and LayoutTunerTest * Added Layout Test for codegen * Get the tests to actually use different layouts * Modified storage layer, builds successfully. Need to fix tests * Moved GetColumnMapStats -> GetColumnLayoutStats * Minor change in TableScanTranslator * Modify TransformTileGroup * Change layout.cpp to better handle HYBRID layouts * use std::make_shared * Modified tests to change calls to GetTileGroup() * make check passes. Need to add test with hybrid layout. * Modify print functions * Add TODOs for catalog * Added LayoutCatalog. Yet to modify TableCatalog * Added catalog functions * Added DeleteLayouts to delete all layouts of a table_oid * LayoutTunerTest fixed * Fix build failures * Access LayoutCatalog via the global Catalog object * Added Multi layout scan test * Fixed tests after catalog refactor * Failing LayoutCatalog test because of deserialization * Fix all tests * Added documentation * Style Fix * Address review comments + minor clean-up in the layout.h API * Change valid_layout_objects_ * Revert unused changes * Addressed Prashanth's initial comments * Modify CreateTable for tests + modify LayoutTuner * Address Pervaze's review comments. * Changed after Prashanth's review
This PR decouples the logical schema from the physical layout of the data. Previously, each TileGroup contained a vector of schemas for which violates the notion of the storage layer being a physical store. We also had cases where the
LayoutTunermodifies thecolumn_mapwhich ended up modifying theschemasvector via theDataTable. There were multiple other instances where theSchemawas used to get the layout of aTileGroupwhile all we needed was the physical mapping between column_ids and a <tile_id, column_offset> pair. This PR now makes it possible to make schema changes like changing column name, adding & dropping columns without making any calls to the storage layer (i.e., It can now be done entirely in the catalog).Changes:
LayoutandLayoutCatalogclasses to store the layout in a physical object and thepg_layout.default_layotut_object to each table (replacingdefault_partition) andtile_group_layout_to eachTileGroup.HYBRIDlayouts to thepg_catalogtable. The two pre-defined layouts,ROWandCOLUMNstores are not persisted.Layouton a perTileGroupwhile invoking theTableScanTranslator.TileGrouphas aLayoutType::ROWlayout.LayoutTunerto tune theLayoutrather than thecolumn_map. Additionally, the tuning decisions are now transactional.pg_catalogto theSystemsCatalogto ensure that the each layout is stored in a catalog with respect its database.Testing:
LayoutTunertest to ensure it works as expected after the refactor.TileGroupLayoutTestandSeqScanTestwhich ensures that the old engine works as expected.TableScanTranslatorTeststo replicate the tests inTileGroupLayoutTestandSeqScanTest. The tests check that the LLVM engine can perform scans on tables different layouts and on tables where eachTileGrouphas a differentLayout.CatalogTeststo ensure that thepg_layoutstores and retrieves data as expected and its state is transactional.Disclaimer:
Tilestill contains aSchemawhich will be used by the old engine, theGetValueandSetValuefunctions and (currently) the LLVM engines to determine the stride in case of aHYBRIDlayout. I believe we have to continue supporting theGetValueandSetValuefunctions even after the old engine is deprecated because we will be needing these interfaces in theCatalogand possibly theOptimizer.LogicalTilegenerated by the executor engine, I had to reconstruct theSchemas. This isn't very efficient but the idea is that it is currently only used in tests and will soon be deprecated.ColumnCatalog,StatsCatalog), theLayoutCatalogdirectly returns aLayoutrather than aLayoutCatalogObjectsince the API are designed to be read only. In essence, the layout is immutable because if you were changing the layout, you need to change the physical representation. So it felt redundant to create a read only object to be returned by the catalog.@mengranwo Can you please review the catalog changes?
@pmenon Can you please review the changes to the LLVM engine? Mainly,
RuntimeFunctions,Inserter,Updater,TestingCodegenUtilandTableScanTranslatorTests.@pervazea Can you please review the overall changes? I have modified a few APIs in the old engine. The current code doesn't break the existing tests and it seems to function as expected. But please let me know if you think there is anything I need to change.
@jarulraj I know you may not have the bandwidth for this, but if possible could you please review the changes to
LayoutTunerandLayoutTunerTest. Nobody else has modified that code and it would be great if you could take a look.