Open
Description
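Combining an RDD of labels with an RDD of split raw text using zip() in the MLlib adaptations fails with:
error: Cannot deserialize RDD with different number of items in pair: (89, 86)
Both RDDs have 2 partitions and 379 items. However, zip() also expects the same number of elements in each partition, so matching totals alone are not sufficient. The current workaround below pairs the two RDDs by element index instead, but is less efficient.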
# Workaround for zip(): key both RDDs by element index, then join on that key
temp_labels = labels.zipWithIndex().map(lambda x: (x[1], x[0]))
temp_tfidf = tfidf.zipWithIndex().map(lambda x: (x[1], x[0]))
training = temp_labels.leftOuterJoin(temp_tfidf)
# A join does not preserve order, so sort by index before dropping the keys
raw_label_and_values = training.sortByKey().values()
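To illustrate why matching totals are not enough, here is a minimal pure-Python sketch that models each RDD as a list of partitions. It is only an assumption that the failure here comes from an uneven split of elements across partitions (or serialized batches); the partition sizes 190/189 and 200/179, the sample data, and the helper names `zip_partitions` and `zip_by_index` are all hypothetical and are not part of Spark.

```python
from typing import List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def zip_partitions(left: List[List[A]], right: List[List[B]]) -> List[Tuple[A, B]]:
    """Pair elements partition by partition (hypothetical model of a strict zip)."""
    if len(left) != len(right):
        raise ValueError("different number of partitions")
    pairs: List[Tuple[A, B]] = []
    for lpart, rpart in zip(left, right):
        # Each pair of partitions must hold the same number of items.
        if len(lpart) != len(rpart):
            raise ValueError(
                "different number of items in pair: (%d, %d)" % (len(lpart), len(rpart))
            )
        pairs.extend(zip(lpart, rpart))
    return pairs


def zip_by_index(left: List[List[A]], right: List[List[B]]) -> List[Tuple[A, B]]:
    """Pair elements by global position, ignoring partition boundaries."""
    # Flatten each side and key every element by its global index.
    indexed_left = dict(enumerate(x for part in left for x in part))
    indexed_right = dict(enumerate(x for part in right for x in part))
    # Keep indices present on both sides, in ascending order.
    return [
        (indexed_left[i], indexed_right[i])
        for i in sorted(indexed_left)
        if i in indexed_right
    ]


# Hypothetical data: 379 items and 2 partitions on each side, split unevenly.
labels = list(range(379))
features = ["doc%d" % i for i in range(379)]
label_parts = [labels[:190], labels[190:]]
feature_parts = [features[:200], features[200:]]

try:
    zip_partitions(label_parts, feature_parts)
    zip_failed = False
except ValueError:
    zip_failed = True

pairs = zip_by_index(label_parts, feature_parts)
```

The index-based pairing succeeds because it never compares partition sizes, which is also why it costs more: every element has to be keyed and matched rather than paired in place.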