Commit 572e03f

fix: fast tokenizer conversion should happen offline (#106)
#### Motivation

The server is launched with `HF_HUB_OFFLINE=1` and is meant to treat model files as read-only; however, the fast tokenizer conversion happening in the `launcher` does not follow this (if a `revision` is not passed). This can cause problems if a model in HF Hub is updated and the tokenizer conversion downloads the tokenizer files for the new commit of the model, but the server then doesn't download the new model files: the server fails to load because it can't find the model files.

#### Modifications

- Set `local_files_only=True` with and without the revision arg when doing the fast tokenizer conversion
- Set `HF_HUB_OFFLINE=1` in the env as well for good measure
- Little refactoring to have the command building be shared

#### Result

Fast tokenizer conversion in the launcher should never download new files.

#### Related Issues

- Fast tokenizer conversion added in IBM#48
- Setting `local_files_only` if `revision` is passed: IBM#63

Signed-off-by: Travis Johnson <[email protected]>
1 parent 5b5938e commit 572e03f

1 file changed

+16
-11
lines changed

launcher/src/main.rs

Lines changed: 16 additions & 11 deletions
@@ -870,19 +870,24 @@ fn save_fast_tokenizer(
     info!("Saving fast tokenizer for `{model_name}` to `{save_path}`");
     let model_name = model_name.escape_default();
     let revision = revision.map(|v| v.escape_default());
-    let code = if let Some(revision) = revision {
-        format!(
-            "from transformers import AutoTokenizer; \
-            AutoTokenizer.from_pretrained(\"{model_name}\", \
-            revision=\"{revision}\", local_files_only=True).save_pretrained(\"{save_path}\")"
-        )
+    let revision_arg = if let Some(revision) = revision {
+        format!("revision=\"{revision}\", ")
     } else {
-        format!(
-            "from transformers import AutoTokenizer; \
-            AutoTokenizer.from_pretrained(\"{model_name}\").save_pretrained(\"{save_path}\")"
-        )
+        "".to_string()
     };
-    match Command::new("python").args(["-c", &code]).status() {
+    let code = format!(
+        "from transformers import AutoTokenizer; \
+        AutoTokenizer.from_pretrained( \
+            \"{model_name}\", \
+            {revision_arg} \
+            local_files_only=True \
+        ).save_pretrained(\"{save_path}\")"
+    );
+    match Command::new("python")
+        .args(["-c", &code])
+        .env("HF_HUB_OFFLINE", "1")
+        .status()
+    {
         Ok(status) => {
             if status.success() {
                 Ok(())
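
For readers who want to try the shared command building in isolation, here is a minimal standalone sketch of the approach in the diff above. It is not the launcher's actual code: the function names `build_tokenizer_code` and `run_offline_conversion` and their signatures are assumptions made for illustration. It shows the optional `revision` argument being spliced into a single Python one-liner that always passes `local_files_only=True`, and the child process being started with `HF_HUB_OFFLINE=1`.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Hypothetical helper (not in the launcher): builds the Python one-liner
/// used for the fast tokenizer conversion. `local_files_only=True` is always
/// present, whether or not a revision is given.
fn build_tokenizer_code(model_name: &str, revision: Option<&str>, save_path: &str) -> String {
    // Escape the values that end up inside Python string literals,
    // mirroring the `escape_default()` calls in the diff.
    let model_name = model_name.escape_default();
    let revision_arg = match revision {
        Some(rev) => format!("revision=\"{}\", ", rev.escape_default()),
        None => String::new(),
    };
    format!(
        "from transformers import AutoTokenizer; \
         AutoTokenizer.from_pretrained(\"{model_name}\", {revision_arg}local_files_only=True)\
         .save_pretrained(\"{save_path}\")"
    )
}

/// Hypothetical helper (not in the launcher): runs the generated code in a
/// Python subprocess with `HF_HUB_OFFLINE=1` set in the child's environment
/// only, leaving the parent's environment untouched.
fn run_offline_conversion(code: &str) -> io::Result<ExitStatus> {
    Command::new("python")
        .args(["-c", code])
        .env("HF_HUB_OFFLINE", "1")
        .status()
}
```

Keeping the string construction in a pure function makes it possible to check the generated Python without spawning a process; `run_offline_conversion` would then be called with its result, and requires `python` with `transformers` installed on the `PATH`.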
