Commit 7a42081

Limit embedding requests to text samples under 10k (google#155)
The `embeddings-gecko-001` model doesn't seem to accept text longer than 10k characters, so filter those samples out in the tutorial.
1 parent 67c8fd3 commit 7a42081

1 file changed: +2 -0 lines changed

site/en/examples/clustering_with_embeddings.ipynb

Lines changed: 2 additions & 0 deletions
@@ -418,6 +418,8 @@
 "df_train['Label'] = newsgroups_train.target\n",
 "# Match label to target name index\n",
 "df_train['Class Name'] = df_train['Label'].map(newsgroups_train.target_names.__getitem__)\n",
+"# Retain text samples that can be used in the gecko model.\n",
+"df_train = df_train[df_train['Text'].str.len() < 10000]\n",
 "\n",
 "df_train"
 ]
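
As a rough illustration of the filter this commit adds, the sketch below applies the same `str.len() < 10000` test to a small, made-up DataFrame. The helper name `keep_short_texts`, the constant `MAX_CHARS`, and the sample data are assumptions for illustration and do not appear in the notebook. The threshold counts characters, since that is what `str.len()` measures.

```python
import pandas as pd

# Threshold taken from the commit; it counts characters, not tokens or bytes.
MAX_CHARS = 10_000


def keep_short_texts(df: pd.DataFrame, column: str = "Text",
                     max_chars: int = MAX_CHARS) -> pd.DataFrame:
    """Return only the rows whose text is strictly shorter than max_chars."""
    mask = df[column].str.len() < max_chars
    return df[mask]


# Hypothetical sample data standing in for the newsgroups training frame.
df_demo = pd.DataFrame({
    "Text": ["short post", "x" * 9_999, "y" * 10_000, "z" * 25_000],
    "Label": [0, 1, 2, 3],
})

df_kept = keep_short_texts(df_demo)
print(len(df_demo), "->", len(df_kept))  # 4 -> 2
```

The comparison is strict, so a sample of exactly 10,000 characters is dropped. The filtered frame also keeps its original index labels, so calling `reset_index(drop=True)` afterwards may help if later cells rely on positional indexing.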
