This repository contains the official code implementation of our paper:
First, create a virtual environment using Anaconda:

    conda create -n symmark python=3.10
    conda activate symmark

Second, you need to install the necessary dependencies:

    pip install -r requirements.txt

You can download datasets such as C4 and OpenGen from here, and place them under the ./data directory.
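Before running anything, it can help to confirm that the data directory is actually populated. The helper below is a minimal sketch and is not part of this repository: the name data_ready is hypothetical, and it only checks that the directory exists and is non-empty, since the expected file names inside ./data are not specified here.

```shell
# Hypothetical helper, not shipped with this repository.
# Succeeds only if the given directory (default: ./data) exists
# and contains at least one entry.
data_ready() {
  dir=${1:-./data}
  # Fail early if the directory is missing.
  [ -d "$dir" ] || return 1
  # ls -A lists every entry except . and .. ; empty output means an empty directory.
  [ -n "$(ls -A "$dir")" ]
}
```

For example, `data_ready ./data || echo "download the datasets first"` prints the reminder when ./data is missing or empty.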
To run our watermark embedding and extraction process, follow these steps:
- Modify the mode in run.sh to specify whether to embed a watermark (train) or extract it (test).
- Customize the watermark by adjusting the hyperparameters in the run.sh file:
  - watermark_type: Specifies the type of semantic watermark to use (S, H, I, or P).
  - dataset_name and dataset_size: Determine the type and size of the dataset.
  - target_model_name: Specifies which target model to use for watermark text generation.
  - attack_method: Used to evaluate the robustness of the watermarking method.
  - text_source: Determines whether the source text is generated by humans or machines.
- Run the command bash run.sh to embed or verify the watermark.
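The two settings with a fixed set of documented values (mode and watermark_type) can be sanity-checked before launching a run. The function below is a sketch under assumptions: check_run_config is a hypothetical name, it is not part of this repository, and it does not read run.sh, so you pass it the same values you set there. The allowed values are the ones listed in the steps above.

```shell
# Hypothetical pre-flight check, not part of this repository.
# Usage: check_run_config <mode> <watermark_type>
check_run_config() {
  mode=${1:-}
  watermark_type=${2:-}

  # mode selects embedding (train) or extraction (test).
  case "$mode" in
    train|test) ;;
    *) echo "mode must be train or test, got: $mode" >&2; return 1 ;;
  esac

  # watermark_type must be one of the four documented types.
  case "$watermark_type" in
    S|H|I|P) ;;
    *) echo "watermark_type must be S, H, I, or P, got: $watermark_type" >&2; return 1 ;;
  esac
}
```

One possible use is `check_run_config train S && bash run.sh`, which starts the run only if both values are valid. The other hyperparameters are not checked because their allowed values are not listed here.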
Additionally, for the watermark stealing attack, please refer to https://github.com/eth-sri/watermark-stealing.
Our SymMark framework is based on MarkLLM. We thank the team for their open-source implementation.

