From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

This repository contains the official code implementation of the paper named above.

Setup

First, create a virtual environment using Anaconda:

conda create -n symmark python=3.10
conda activate symmark

Second, install the necessary dependencies:

pip install -r requirements.txt

You can download datasets such as C4 and OpenGen from here, and place them under the ./data directory.
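
Before launching a run, it can help to confirm that the datasets actually landed in ./data. The snippet below is a minimal sketch of such a check, not part of the repository: the helper name list_dataset_files is hypothetical, and it makes no assumption about dataset file names or formats. It only lists whatever regular files sit directly under the directory.

```python
from pathlib import Path


def list_dataset_files(data_dir="./data"):
    """Return the sorted names of regular files directly under data_dir.

    Hypothetical helper for illustration; the repository may organize
    or load its datasets differently.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {root}")
    # Subdirectories are skipped; only top-level files are reported.
    return sorted(p.name for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    try:
        for name in list_dataset_files():
            print(name)
    except FileNotFoundError as err:
        print(err)
```

If the directory is missing, the script prints the error instead of failing, so it is safe to run right after cloning.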

You can run our watermark embedding and extraction process by following these steps:

1. Set the mode in run.sh to specify whether to embed a watermark (train) or extract it (test).
2. Customize the watermark by adjusting the hyperparameters in run.sh:
   - watermark_type: Specifies the type of semantic watermark to use (S, H, I, or P).
   - dataset_name and dataset_size: Determine which dataset is used and its size.
   - target_model_name: Specifies which target model to use for watermarked text generation.
   - attack_method: Specifies the attack used to evaluate the robustness of the watermarking method.
   - text_source: Determines whether the source text is generated by humans or machines.
3. Run bash run.sh to embed or verify the watermark.
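
The hyperparameters above can be sanity-checked before a long run. The following is a minimal sketch under stated assumptions: the RunConfig class and its to_args method are hypothetical and not part of this repository, the "--name value" flag spelling is assumed rather than taken from main.py, and the example values for target_model_name, attack_method, and text_source are placeholders. Only the allowed values for mode (train, test) and watermark_type (S, H, I, P) come from the description above.

```python
from dataclasses import asdict, dataclass

# Allowed values taken from the hyperparameter description above.
VALID_MODES = {"train", "test"}
VALID_WATERMARK_TYPES = {"S", "H", "I", "P"}


@dataclass
class RunConfig:
    """Hypothetical container mirroring the hyperparameters in run.sh."""

    mode: str
    watermark_type: str
    dataset_name: str
    dataset_size: int
    target_model_name: str
    attack_method: str
    text_source: str

    def validate(self):
        """Raise ValueError when a field is outside the documented options."""
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
        if self.watermark_type not in VALID_WATERMARK_TYPES:
            raise ValueError(
                f"watermark_type must be one of {sorted(VALID_WATERMARK_TYPES)}"
            )
        if self.dataset_size <= 0:
            raise ValueError("dataset_size must be a positive integer")

    def to_args(self):
        """Build a flat argument list; the flag spelling is an assumption."""
        self.validate()
        args = []
        for name, value in asdict(self).items():
            args += [f"--{name}", str(value)]
        return args


if __name__ == "__main__":
    # Placeholder values; substitute the ones you set in run.sh.
    config = RunConfig(
        mode="train",
        watermark_type="S",
        dataset_name="c4",
        dataset_size=200,
        target_model_name="placeholder-model",
        attack_method="placeholder-attack",
        text_source="placeholder-source",
    )
    print(" ".join(config.to_args()))
```

Validating first means a typo such as mode="embed" fails immediately rather than after the model has been loaded.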

Additionally, for the watermark stealing attack, please refer to https://github.com/eth-sri/watermark-stealing.

Our SymMark framework is based on MarkLLM. We thank the team for their open-source implementation.
