Interactive watermark detector

What is LLM watermarking?

LLM watermarking is a technique that slightly modifies how a language model chooses tokens during generation, so that its output can later be identified as coming from that model, without noticeably degrading text quality.

How to use this demo

Enter a prompt in the top text area to generate watermarked text
The generated text will appear in the second text box
The text will be automatically analyzed to show which tokens (parts of text) were influenced by the watermark
The statistics at the bottom show the detection results
You can also paste any text in the second box to test if it contains a watermark

Detection Methods

Maryland: A token-level detection algorithm that checks, for each token, whether it falls in the "greenlist" determined by the preceding tokens, based on the paper "A Watermark for Large Language Models" by Kirchenbauer et al.

OpenAI: A similar watermarking method inspired by initial reports from OpenAI.

Maryland Z-score: A variant of the Maryland detector that uses z-scores for statistical interpretation.

OpenAI Z-score: A variant of the OpenAI detector that uses z-scores for statistical interpretation.
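To make the difference between a z-score and an exact test concrete, here is a minimal sketch for a greenlist-style detector. It assumes that a fraction GAMMA of the vocabulary is on the greenlist and that, for unwatermarked text, each scored token is green independently with probability GAMMA. The function names and the value of GAMMA are illustrative assumptions, not taken from the demo's code.

```python
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the greenlist


def z_score_pvalue(green: int, scored: int, gamma: float = GAMMA) -> tuple[float, float]:
    """Normal approximation: z-score of the green count and its one-sided p-value."""
    mean = gamma * scored
    std = math.sqrt(scored * gamma * (1.0 - gamma))
    z = (green - mean) / std
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for a standard normal
    return z, p


def exact_pvalue(green: int, scored: int, gamma: float = GAMMA) -> float:
    """Exact binomial tail: P(X >= green) with X ~ Binomial(scored, gamma)."""
    return sum(
        math.comb(scored, k) * gamma**k * (1.0 - gamma) ** (scored - k)
        for k in range(green, scored + 1)
    )


# Toy example: 40 green tokens out of 50 scored tokens.
z, p_normal = z_score_pvalue(green=40, scored=50)
p_exact = exact_pvalue(green=40, scored=50)
print(f"z = {z:.2f}, normal p = {p_normal:.2e}, exact p = {p_exact:.2e}")
```

The two p-values are close but not identical. The gap comes from the normal approximation and tends to matter most for short texts and very small p-values.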

Parameters Explained

Detector Type: The algorithm used to detect watermarks in the text. Different detectors perform better in different scenarios.
Seed: The random seed used for watermarking. The detector must use the same seed that was used when generating the text. In a real-world scenario, this would be kept private by the model provider.
N-gram Size: The number of previous tokens considered when choosing "greenlist" tokens. Larger values make the watermark less robust against edits but may improve text quality.
Delta: The bias added to "greenlist" tokens during generation. Higher values make the watermark stronger but might affect text quality. Typical values range from 1.0 to 5.0.
Temperature: Controls randomness in text generation. Higher values (e.g., 1.0) produce more diverse outputs; lower values (e.g., 0.2) make outputs more focused and deterministic.
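The interaction between Seed, N-gram Size, Delta, and Temperature can be sketched for a single generation step of a greenlist-style watermark. This is a toy illustration under stated assumptions: the hashing scheme, the greenlist fraction gamma, the order in which the bias and the temperature are applied, and the names greenlist and watermarked_probs are all hypothetical and may differ from what the demo does.

```python
import hashlib
import math
import random


def greenlist(context: tuple[int, ...], seed: int, vocab_size: int, gamma: float = 0.5) -> set[int]:
    """Pseudo-randomly pick a fraction gamma of the vocabulary from (seed, context)."""
    digest = hashlib.sha256(f"{seed}:{context}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def watermarked_probs(logits, previous_tokens, seed=0, ngram=2, delta=2.0, temperature=1.0):
    """Add delta to greenlist logits, then apply a temperature-scaled softmax."""
    context = tuple(previous_tokens[-ngram:])  # the last `ngram` tokens select the greenlist
    green = greenlist(context, seed, len(logits))
    biased = [logit + delta if i in green else logit for i, logit in enumerate(logits)]
    scaled = [b / temperature for b in biased]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps], green


logits = [0.0] * 10  # uniform toy distribution over a 10-token vocabulary
probs, green = watermarked_probs(logits, previous_tokens=[3, 7], delta=2.0)
green_mass = sum(probs[i] for i in green)
print(f"probability mass on greenlist: {green_mass:.3f}")  # prints 0.881
```

With delta set to 0 the greenlist mass in this toy case would stay at 0.5, which is why an unwatermarked text looks like chance to the detector. Changing the seed or the preceding tokens yields a different greenlist, which is why the detector needs the same seed and n-gram size as the generator.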

Understanding Results

Tokens: The total number of tokens in the analyzed text. Tokens are units of text that may represent words, parts of words, or punctuation.
Scored Tokens: The number of tokens actually evaluated by the detector. The first few tokens are excluded because they do not have enough preceding context.
Final Score: A measure of how likely the text contains a watermark. Higher scores indicate stronger evidence of watermarking.
P-value: The statistical significance of the detection. Lower values (especially p < 1e-6) indicate strong evidence that the text was watermarked. Values close to 0.5 suggest no watermark is present.
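As a rough illustration of how these four statistics relate, the sketch below scores a list of token ids with a toy greenlist detector. It is a simplification under stated assumptions: the score is taken to be the number of green tokens, the p-value is an exact binomial tail, and the helpers is_green and detect, the hashing scheme, and the default parameters are hypothetical rather than the demo's actual implementation.

```python
import hashlib
import math
import random


def is_green(context, token, seed, vocab_size, gamma=0.5):
    """Recompute the toy greenlist for this context and test membership."""
    digest = hashlib.sha256(f"{seed}:{tuple(context)}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return token in set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def detect(tokens, seed=0, ngram=2, vocab_size=1000, gamma=0.5):
    """Return tokens, scored tokens, score, and p-value for a list of token ids."""
    flags = [
        is_green(tokens[i - ngram:i], tokens[i], seed, vocab_size, gamma)
        for i in range(ngram, len(tokens))  # the first `ngram` tokens lack context
    ]
    scored, score = len(flags), sum(flags)
    # Chance of seeing at least `score` green tokens if the text is not watermarked.
    p_value = sum(
        math.comb(scored, k) * gamma**k * (1 - gamma) ** (scored - k)
        for k in range(score, scored + 1)
    )
    return {"tokens": len(tokens), "scored_tokens": scored,
            "final_score": score, "p_value": p_value}


# Unwatermarked toy input: random token ids.
random_tokens = random.Random(123).choices(range(1000), k=40)
print(detect(random_tokens))
```

For random input like this, roughly half of the scored tokens are expected to be green, so the p-value is usually far from the very small values that indicate a watermark.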
