Fix Gemma3N notebook loading by forcing eager attention #196
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -456,6 +456,7 @@ | |||||
| " dtype = None, # None for auto detection\n", | ||||||
| " max_seq_length = 1024, # Choose any for long context!\n", | ||||||
| " load_in_4bit = True, # 4 bit quantization to reduce memory\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
| " full_finetuning = False, # [NEW!] We have full finetuning now!\n", | ||||||
| " # token = \"YOUR_HF_TOKEN\", # HF Token for gated models\n", | ||||||
| ")" | ||||||
|
|
@@ -1920,6 +1921,7 @@ | |||||
| " model_name = \"gemma_3n_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", | ||||||
| " max_seq_length = 2048,\n", | ||||||
| " load_in_4bit = True,\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
|
Contributor
The comment mentions a "vision tower", which might be confusing in a conversational notebook. For clarity, consider a more general comment about the incompatibility with flex_attention.
Suggested change
|
||||||
| " )\n", | ||||||
| "\n", | ||||||
| "messages = [{\n", | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -456,6 +456,7 @@ | |||||
| " dtype = None, # None for auto detection\n", | ||||||
| " max_seq_length = 1024, # Choose any for long context!\n", | ||||||
| " load_in_4bit = True, # 4 bit quantization to reduce memory\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
|
Contributor
The comment mentions a "vision tower", which might be confusing in a conversational notebook. For clarity, consider a more general comment about the incompatibility with flex_attention.
Suggested change
|
||||||
| " full_finetuning = False, # [NEW!] We have full finetuning now!\n", | ||||||
| " # token = \"YOUR_HF_TOKEN\", # HF Token for gated models\n", | ||||||
| ")" | ||||||
|
|
@@ -1920,6 +1921,7 @@ | |||||
| " model_name = \"gemma_3n_lora\", # YOUR MODEL YOU USED FOR TRAINING\n", | ||||||
| " max_seq_length = 2048,\n", | ||||||
| " load_in_4bit = True,\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
|
Contributor
The comment mentions a "vision tower", which might be confusing in a conversational notebook. For clarity, consider a more general comment about the incompatibility with flex_attention.
Suggested change
|
||||||
| " )\n", | ||||||
| "\n", | ||||||
| "messages = [{\n", | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -432,6 +432,7 @@ | |||||
| " dtype = None, # None for auto detection\n", | ||||||
| " max_seq_length = 1024, # Choose any for long context!\n", | ||||||
| " load_in_4bit = True, # 4 bit quantization to reduce memory\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
|
Contributor
The comment mentions a "vision tower", which might be confusing in a conversational notebook. For clarity, consider a more general comment about the incompatibility with flex_attention.
Suggested change
|
||||||
| " full_finetuning = False, # [NEW!] We have full finetuning now!\n", | ||||||
| " # token = \"hf_...\", # use one if using gated models\n", | ||||||
| ")" | ||||||
|
|
@@ -1896,6 +1897,7 @@ | |||||
| " model_name = \"gemma-3n\", # YOUR MODEL YOU USED FOR TRAINING\n", | ||||||
| " max_seq_length = 2048,\n", | ||||||
| " load_in_4bit = True,\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
|
Contributor
The comment mentions a "vision tower", which might be confusing in a conversational notebook. For clarity, consider a more general comment about the incompatibility with flex_attention.
Suggested change
|
||||||
| " )\n", | ||||||
| "\n", | ||||||
| "messages = [{\n", | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -404,6 +404,7 @@ | |||||
| "model, processor = FastVisionModel.from_pretrained(\n", | ||||||
| " \"unsloth/gemma-3n-E4B\",\n", | ||||||
| " load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
| " use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for long context\n", | ||||||
| ")" | ||||||
| ] | ||||||
|
|
@@ -1400,6 +1401,7 @@ | |||||
| " model, processor = FastVisionModel.from_pretrained(\n", | ||||||
| " model_name=\"lora_model\", # YOUR MODEL YOU USED FOR TRAINING\n", | ||||||
| " load_in_4bit=True, # Set to False for 16bit LoRA\n", | ||||||
| " attn_implementation = \"eager\", # Gemma 3N vision tower is incompatible with flex_attention\n", | ||||||
|
Contributor
For consistency with the surrounding code (e.g.,
Suggested change
|
||||||
| " )\n", | ||||||
| " FastVisionModel.for_inference(model) # Enable for inference!\n", | ||||||
| "\n", | ||||||
|
|
||||||
The comment mentions a "vision tower", which might be confusing in a conversational notebook. For clarity, consider a more general comment about the incompatibility with flex_attention.
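The same one-line addition is repeated in every `from_pretrained` call touched by this PR, so it could in principle be applied mechanically. The following is a minimal sketch of that idea, not the script used for this PR: `add_eager_attention` and `patch_notebook_file` are hypothetical helpers, and the sketch assumes each call puts `load_in_4bit` on its own line, as in the hunks above. The inserted comment text is copied from the diff and can be swapped for a more general wording.

```python
import json

# Argument line added by the PR (comment text copied from the diff).
EAGER_ARG = 'attn_implementation = "eager", # Gemma 3N vision tower is incompatible with flex_attention\n'


def add_eager_attention(notebook: dict) -> int:
    """Insert EAGER_ARG after each `load_in_4bit` line in code cells.

    Hypothetical helper. Returns the number of lines inserted. A
    `load_in_4bit` line that is already followed by an
    `attn_implementation` line is skipped, so re-running is a no-op.
    """
    inserted = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # Notebook JSON stores cell source as a string or a list of lines.
        lines = source.splitlines(keepends=True) if isinstance(source, str) else list(source)
        patched = []
        for i, line in enumerate(lines):
            patched.append(line)
            stripped = line.lstrip()
            # Only handle complete argument lines that end with a newline.
            if not stripped.startswith("load_in_4bit") or not line.endswith("\n"):
                continue
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            if "attn_implementation" in next_line:
                continue
            indent = line[: len(line) - len(stripped)]  # reuse the argument's indentation
            patched.append(indent + EAGER_ARG)
            inserted += 1
        cell["source"] = patched
    return inserted


def patch_notebook_file(path: str) -> int:
    """Load a .ipynb file, patch it in place, and return the insert count."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    count = add_eager_attention(notebook)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 is an assumption about the repository's notebook formatting.
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return count


# Small in-memory demo modelled on the vision notebook hunk above.
demo = {"cells": [{"cell_type": "code", "source": [
    "model, processor = FastVisionModel.from_pretrained(\n",
    '    "unsloth/gemma-3n-E4B",\n',
    "    load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.\n",
    ")",
]}]}
first_pass = add_eager_attention(demo)
second_pass = add_eager_attention(demo)  # already patched, nothing to add
print(first_pass, second_pass)
print("".join(demo["cells"][0]["source"]))
```

A line-based approach like this stays close to the shape of the diff, but it would miss calls that pass `load_in_4bit` on the same line as other arguments, so the result still needs review by hand.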