Description
Hi! I really like the work you did! I was trying to run your code but encountered some issues, and I would appreciate it if you could help me resolve them.
- To extract harmfulness vectors for Llama3, I run:
python src/extract_hidden.py --model llama3 \
--harmful_pth data/advbench.json \
--harmless_pth data/alpaca_data_instruction.json \
--output_pth output/llama3_harmful_vector.pt
I also changed the path to the Llama3 model in the code from a local path to:
model_path = "unsloth/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
model_path,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
model_path,
cache_dir='../cache',
)
Is that correct? I used this Llama-3 version since it is the one used in your inference script.
But when I run src/extract_hidden.py with the above change, I get an error:
File "<home_dir>/LLMs_Encode_Harmfulness_Refusal_Separately/src/../src/extract_hidden.py", line 116, in hook_fn
context = activation[:, -len(positions)-step:-len(positions), :]
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: too many indices for tensor of dimension 2
So I fixed it by adding .unsqueeze(0) to line 109 (activation = output[0].half().unsqueeze(0)). Is that correct?
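For context, here is a self-contained toy that reproduces the shape issue I think I was hitting and the fix I applied. This is only my own sketch, not your code; the positions/step values and the tiny module are made up just to exercise the slice:
import torch
import torch.nn as nn

hidden = {}

def hook_fn(module, inputs, output):
    # In my run the hooked module returned a plain (batch, seq_len, hidden) tensor,
    # so output[0] dropped the batch dimension and left a 2-D (seq_len, hidden) tensor.
    activation = output[0].half()         # 2-D here -> the later slice raises IndexError
    activation = activation.unsqueeze(0)  # my fix: restore the batch dimension
    positions, step = [0], 1              # hypothetical values, just to exercise the slice
    hidden["context"] = activation[:, -len(positions)-step:-len(positions), :]

layer = nn.Linear(16, 16)
layer.register_forward_hook(hook_fn)
layer(torch.randn(1, 8, 16))              # (batch=1, seq_len=8, hidden=16)
print(hidden["context"].shape)            # torch.Size([1, 1, 16])
If in your setup output is a tuple whose first element is already (batch, seq_len, hidden), then my unsqueeze(0) would instead add a spurious dimension, so please let me know which it should be.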
- I then tried to run intervention.py using the computed harmfulness vector as follows:
python -u ../src/intervention.py \
--test_data_pth "../data/advbench.json" \
--output_pth "output/infer/llama3-advbench-2.json" \
--intervention_vector "output/llama3_harmful_vector.pt" \
--reverse_intervention 1 \
--intervene_context_only 0 \
--arg_key_prompt 'instruction' \
--model "llama3" \
--left 0 \
--right 50 \
--layer_s 12 \
--layer_e 13 \
--coeff_select 2 \
--max_token_generate 100 \
--max_decode_step_while_intervene 1 \
--model_size "7b" \
--use_inversion 1 \
--inversion_prompt_idx 1
I made sure to use the same unsloth Llama3 model in intervention.py. However, the model still responds with something like "This is a harmful instruction, I cannot provide...". If I set --coeff_select 5, the model sometimes starts its output with "Certainly", but then still says that it cannot provide instructions.
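To make sure I am not misreading the flags, this is roughly what I expect --reverse_intervention 1 with --coeff_select 2 to do at layers 12-13: shift the hidden states along the harmfulness direction with a flipped sign. This is only my own sketch for discussion (the function name, the normalization, and the sign are my assumptions), not your implementation:
import torch

def reverse_intervene(hidden_states, harmful_vec, coeff):
    # hidden_states: (batch, seq_len, hidden); harmful_vec: (hidden,)
    direction = harmful_vec / harmful_vec.norm()   # assumed: unit-norm harmfulness direction
    return hidden_states - coeff * direction       # assumed: push activations away from "harmful"

h = torch.randn(1, 10, 4096)   # dummy activations
v = torch.randn(4096)          # dummy harmfulness vector
print(reverse_intervene(h, v, coeff=2.0).shape)    # torch.Size([1, 10, 4096])
If the actual intervention does something different (e.g., projects the harmfulness component out rather than applying a constant shift, or uses the opposite sign convention), that might explain why the refusals barely change for me.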
Could you suggest what I'm doing wrong?