sim4rec/response/nn_response.py
Outdated
| print("Warning: the historical data is empty") | ||
| hist_data = spark.createDataFrame([], schema=SIM_LOG_SCHEMA) | ||
| # filter users whom we don't need | ||
| hist_data = hist_data.join(new_recs, on="user_idx", how="inner").select( |
What is going on here? Why do you join all new_recs columns? You need to take at least new_recs.select("user_idx").distinct(), and then the select(hist_data["*"]) afterwards is not needed.
Wow, this is really a huge bug. I think it is an artifact of one of the intermediate versions, where I tried to work with tables in which user_idx is unique and each row represents a whole iteration. I'll fix it.
It seems the fix was wrong; see the new suggestion #8 (comment).
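To make the concern in this thread concrete, here is a minimal pure-Python sketch (not Spark, and not the PR's code) of why an inner join against a table with repeated user_idx values multiplies the history rows, while joining against distinct keys, or using a semi join, does not. The sample rows and the helper names `inner_join_keep_left` and `semi_join` are made up for illustration.

```python
# Pure-Python stand-ins for the joins discussed in this thread.
# All rows and values below are invented for illustration only.
hist_data = [
    {"user_idx": 1, "item_idx": 10},
    {"user_idx": 1, "item_idx": 11},
    {"user_idx": 2, "item_idx": 12},
    {"user_idx": 3, "item_idx": 13},
]
# new_recs holds several recommended items per user, so user_idx repeats.
new_recs = [
    {"user_idx": 1, "item_idx": 20},
    {"user_idx": 1, "item_idx": 21},
    {"user_idx": 2, "item_idx": 22},
]


def inner_join_keep_left(left, right_keys):
    """Inner join on user_idx that keeps only the left-hand columns.

    A left row is emitted once per matching key on the right, which is
    what an inner join followed by selecting the left columns produces.
    """
    return [row for row in left for key in right_keys if row["user_idx"] == key]


def semi_join(left, right_keys):
    """Left semi join: a left row is emitted at most once."""
    wanted = set(right_keys)
    return [row for row in left if row["user_idx"] in wanted]


all_keys = [rec["user_idx"] for rec in new_recs]  # [1, 1, 2]
distinct_keys = sorted(set(all_keys))             # [1, 2]

buggy = inner_join_keep_left(hist_data, all_keys)       # history rows duplicated
fixed = inner_join_keep_left(hist_data, distinct_keys)  # one copy per history row
semi = semi_join(hist_data, all_keys)                   # same rows as `fixed`

print(len(buggy), len(fixed), len(semi))  # prints: 5 3 3
```

In this toy data, user 1 has two history rows and two recommendations, so the naive join returns four rows for that user instead of two.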
sim4rec/response/nn_response.py
Outdated
| print("Warning: the simulator log is empty") | ||
| simlog = spark.createDataFrame([], schema=SIM_LOG_SCHEMA) | ||
| # filter users whom we don't need | ||
| simlog = simlog.join(new_recs, on="user_idx", how="inner").select(simlog["*"]) |
sim4rec/response/nn_response.py
Outdated
| ) | ||
| ) | ||
|
|
||
| # not very optimal way, it makes one worker to |
We need to discuss this. Your batch ID should not influence the partitioning: one partition != one batch, and the users are grouped into batches within one partition. I do not know how to implement it for now.
BatchID won't influence the partition, because each batch must consist of the whole interaction history of a specific group of users. I
OK, I see. Just remove the comment.
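For readers following this thread, one possible reading of the batching constraint is sketched below in pure Python: within a single partition, users are grouped into batches, and every batch contains the complete interaction history of its users. This is only an illustration under that assumption, not the PR's implementation; the function name `batches_of_user_histories` and the parameter `users_per_batch` are hypothetical.

```python
from collections import defaultdict


def batches_of_user_histories(rows, users_per_batch):
    """Yield batches of rows from one partition.

    Each batch holds the full history of up to `users_per_batch` users,
    so a user's rows are never split across two batches.
    """
    history = defaultdict(list)
    for row in rows:
        history[row["user_idx"]].append(row)

    users = sorted(history)
    for start in range(0, len(users), users_per_batch):
        batch = []
        for user in users[start:start + users_per_batch]:
            batch.extend(history[user])
        yield batch


# Invented rows standing in for the contents of a single partition.
partition = [
    {"user_idx": 3, "item_idx": 7},
    {"user_idx": 1, "item_idx": 5},
    {"user_idx": 3, "item_idx": 8},
    {"user_idx": 2, "item_idx": 6},
    {"user_idx": 1, "item_idx": 9},
]

batches = list(batches_of_user_histories(partition, users_per_batch=2))
for index, batch in enumerate(batches):
    print(index, sorted({row["user_idx"] for row in batch}), len(batch))
```

Here the partition holds three users and yields two batches, so the number of batches is decoupled from the number of partitions.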
| self.backbone_response_model = None | ||
|
|
||
| def _fit(self, train_data): | ||
| """ |
Please describe the dataframe format here and for transform: what must be included to properly convert the dataframe to RecommendationData? Please add the corresponding docstrings.
It is exactly the same as the simulator log format. Please advise where I can find its description.
|
Thank you for your contribution! Please have a look at the comments and add time measurements to the notebook to show the speed of the main stages of the simulation pipeline. |
sim4rec/response/nn_response.py
Outdated
| """ | ||
| Predict responses for given dataframe with recommendations. | ||
|
|
||
| :param dataframe: new recommendations. |
The param name is not correct; it should be new_recs.
| print("Warning: the historical data is empty") | ||
| hist_data = spark.createDataFrame([], schema=SIM_LOG_SCHEMA) | ||
| # filter users whom we don't need | ||
| hist_data = hist_data.join(new_recs, on="user_idx", how="semi") |
If you really want to keep only the history of the distinct users from new_recs in hist_data, consider this:
| hist_data = hist_data.join(new_recs, on="user_idx", how="semi") | |
| hist_data = hist_data.join(sf.broadcast(new_recs.select("user_idx").distinct()), on="user_idx", how="inner") |
| print("Warning: the simulator log is empty") | ||
| simlog = spark.createDataFrame([], schema=SIM_LOG_SCHEMA) | ||
| # filter users whom we don't need | ||
| simlog = simlog.join(new_recs, on="user_idx", how="semi") |
Same as for hist_data.
| simlog = simlog.join(new_recs, on="user_idx", how="semi") | |
| simlog = simlog.join(sf.broadcast(new_recs.select("user_idx").distinct()), on="user_idx", how="inner") |