ASCII-LAB/SymMark

From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

This repository contains the official code implementation of our paper (arXiv).

SymMark

Setup

First, create a virtual environment using Anaconda:

conda create -n symmark python=3.10
conda activate symmark

Second, install the required dependencies:

pip install -r requirements.txt

Datasets

Download datasets such as C4 and OpenGen from here and place them in the ./data directory.
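Before launching a run, it can help to confirm that the data is where the scripts expect it. The check below is only a sketch: the README does not document the layout under ./data, so the sub-directory names `c4` and `opengen` and the helper `missing_datasets` are hypothetical and should be adjusted to match the downloaded files.

```python
from pathlib import Path

# HYPOTHETICAL entry names under ./data; adjust to match the downloaded datasets.
EXPECTED_DATASETS = ("c4", "opengen")


def missing_datasets(data_dir="./data", expected=EXPECTED_DATASETS):
    """Return the expected dataset names that have no file or folder under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All expected datasets found.")
```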

Usage

To run our watermark embedding and extraction process, follow these steps:

  1. Set the mode in run.sh: train to embed a watermark, or test to extract it.

  2. Customize the watermark by adjusting the following hyperparameters in run.sh:

    • watermark_type: Specifies the type of semantic watermark to use (S, H, I, or P).
    • dataset_name and dataset_size: Specify which dataset to use and its size.
    • target_model_name: Specifies the target model used to generate watermarked text.
    • attack_method: Specifies the attack used to evaluate the robustness of the watermarking method.
    • text_source: Specifies whether the source text is human-written or machine-generated.
  3. Run bash run.sh to embed or extract the watermark.
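If you switch between embedding and extraction often, you may prefer to set these hyperparameters programmatically instead of editing run.sh by hand. The sketch below rests on an unverified assumption: that run.sh sets each hyperparameter as a plain shell assignment at the start of a line (e.g. `mode=train`). The helpers `set_run_sh_vars` and `run` are hypothetical. Only the `mode` values (train/test) and the `watermark_type` values (S, H, I, P) come from this README, and every other parameter is substituted without validation.

```python
import re
import shlex
import subprocess
from pathlib import Path

# Values documented in the README; other parameters are substituted unchecked.
ALLOWED = {
    "mode": {"train", "test"},            # train = embed, test = extract
    "watermark_type": {"S", "H", "I", "P"},
}


def set_run_sh_vars(script_text: str, **params) -> str:
    """Return script_text with each `name=value` assignment replaced.

    ASSUMPTION: run.sh defines every parameter as a plain shell assignment
    at the start of a line, e.g. `watermark_type=S` or `mode="train"`.
    """
    for name, value in params.items():
        value = str(value)
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")
        # Match `name=...` up to end of line, keeping any leading indentation.
        pattern = re.compile(rf"^([ \t]*{re.escape(name)}=).*$", re.MULTILINE)
        quoted = shlex.quote(value)  # adds quotes only if the value needs them
        script_text, count = pattern.subn(lambda m: m.group(1) + quoted, script_text)
        if count == 0:
            raise KeyError(f"no assignment for {name!r} found in run.sh")
    return script_text


def run(path: str = "run.sh", **params) -> None:
    """Rewrite run.sh in place with the given settings, then execute it with bash."""
    script = Path(path)
    script.write_text(set_run_sh_vars(script.read_text(), **params))
    subprocess.run(["bash", str(script)], check=True)
```

For example, `run(mode="train", watermark_type="H")` would prepare an embedding run, and `run(mode="test", watermark_type="H")` the matching extraction run. Because `run` overwrites run.sh in place, keep a copy if you want to restore the original settings.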

For the watermark stealing attack, please refer to https://github.com/eth-sri/watermark-stealing.

Acknowledgements

Our SymMark framework is built on MarkLLM. We thank the MarkLLM team for their open-source implementation.

About

Accepted to ACL'25 (main)
