A doubt about data augmentation #3

Thanks for your nice work, but the data augmentation step may have a leakage problem. More precisely, the pseudo-prior items may be generated from test information before inference.
```Python

def data_augment(model, dataset, args, sess, gen_num):

    [train, valid, test, original_train, usernum, itemnum] = copy.deepcopy(dataset)
    all_users = list(train.keys())

    cumulative_preds = defaultdict(list)
    for num_ind in range(gen_num):
        batch_seq = []
        batch_u = []
        batch_item_idx = []

        for u_ind, u in enumerate(all_users):
            # Note: test.get(u, []) is part of the input used to generate the pseudo-prior items.
            u_data = train.get(u, []) + valid.get(u, []) + test.get(u, []) + cumulative_preds.get(u, [])

            if len(u_data) == 0 or len(u_data) >= args.M: continue

            seq = np.zeros([args.maxlen], dtype=np.int32)
            idx = args.maxlen - 1
            for i in reversed(u_data):
                if idx == -1: break
                seq[idx] = i
                idx -= 1
            rated = set(u_data)
            item_idx = list(set([i for i in range(itemnum)]) - rated) 

            batch_seq.append(seq)
            batch_item_idx.append(item_idx)
            batch_u.append(u)
```
The user data (i.e. `u_data = train.get(u, []) + valid.get(u, []) + test.get(u, []) + cumulative_preds.get(u, [])`) includes the test data and is used to generate the prior data. The augmented data (i.e. prior data + train data + valid data) is then used to train the left-to-right model in the fine-tuning stage, and that model produces the recommendation results. So both the augmented data and the left-to-right model see the test data (a leakage of the test data) before inference.
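
One possible way to avoid this, as a minimal sketch and not the authors' implementation: build the generation input without the test items. The helper name `build_generation_history` and the `use_valid` flag are hypothetical, and the toy data is made up for illustration. Whether validation items should be included is an assumption left to the authors, so they are excluded by default here.

```python
from collections import defaultdict


def build_generation_history(u, train, valid, cumulative_preds, use_valid=False):
    """Assemble the items used to generate pseudo-prior items for user u.

    Hypothetical helper: test items are never passed in, so they cannot
    influence the generated items. Validation items are excluded by default.
    """
    history = list(train.get(u, []))  # copy so the caller's list is not mutated
    if use_valid:
        history += valid.get(u, [])
    history += cumulative_preds.get(u, [])
    return history


# Toy data (made up): one user with 3 train items, 1 valid item, 1 test item.
train = {1: [10, 11, 12]}
valid = {1: [13]}
test = {1: [14]}
cumulative_preds = defaultdict(list)

u_data = build_generation_history(1, train, valid, cumulative_preds)
leaked = set(u_data) & set(test[1])  # items shared with the held-out test set
```

With this change, `u_data` in `data_augment` would no longer contain anything from `test`, so the generated prior items and the fine-tuned model would depend only on data that is available before inference.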
