What is LLM watermarking?
LLM watermarking is a technique that slightly modifies how a language model generates text, making it possible to detect whether text came from a specific AI model without noticeably degrading text quality.
How to use this demo
- Enter a prompt in the top text area to generate watermarked text
- The generated text will appear in the second text box
- The text will be automatically analyzed to show which tokens (parts of text) were influenced by the watermark
- The statistics at the bottom show the detection results
- You can also paste any text in the second box to test if it contains a watermark
Detection Methods
Maryland: A token-level detection algorithm that analyzes how unexpected each token is, based on the paper "A Watermark for Large Language Models" by Kirchenbauer et al.
OpenAI: A similar watermarking method inspired by initial reports from OpenAI.
Maryland Z-score: A variant of the Maryland detector that reports z-scores for statistical interpretation.
OpenAI Z-score: A variant of the OpenAI detector that reports z-scores for statistical interpretation.
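To make the Maryland-style detection idea concrete, here is a minimal sketch of greenlist counting. It is not the demo's implementation: the SHA-256 hashing scheme, the function names `is_green` and `count_green`, and the greenlist fraction `gamma` are assumptions chosen for illustration. The detector re-derives, for every token, whether it would have been on the greenlist given the seed and the preceding tokens, then counts the hits.

```python
import hashlib


def is_green(context, token, seed, gamma=0.5):
    """Hypothetical greenlist test: hash (seed, context, token) to [0, 1)."""
    payload = f"{seed}|{','.join(map(str, context))}|{token}".encode()
    digest = hashlib.sha256(payload).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    # A fraction gamma of all tokens is green for any given context.
    return u < gamma


def count_green(tokens, seed, ngram=2, gamma=0.5):
    """Return (green, scored) for a list of integer token ids."""
    green = 0
    scored = 0
    # The first `ngram` tokens lack enough context and are not scored.
    for i in range(ngram, len(tokens)):
        context = tuple(tokens[i - ngram:i])
        scored += 1
        green += is_green(context, tokens[i], seed, gamma)
    return green, scored


tokens = [5, 17, 3, 42, 8, 17, 3, 99]
green, scored = count_green(tokens, seed=42, ngram=2)
print(f"{green} green tokens out of {scored} scored")
```

In unwatermarked text roughly a fraction `gamma` of scored tokens should come out green, so a count well above that is evidence of a watermark. A detector run with the wrong seed sees only chance-level counts.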
Parameters Explained
- Detector Type: The algorithm used to detect watermarks in the text. Different detectors perform better in different scenarios.
- Seed: The random seed used for watermarking. The detector must use the same seed that was used when generating the text. In a real-world scenario, this would be kept private by the model provider.
- N-gram Size: The number of previous tokens considered when choosing "greenlist" tokens. Larger values make the watermark less robust against edits but may improve text quality.
- Delta: The bias added to "greenlist" tokens during generation. Higher values make the watermark stronger but might affect text quality. Typical values range from 1.0 to 5.0.
- Temperature: Controls randomness in text generation. Higher values (e.g., 1.0) produce more diverse outputs; lower values (e.g., 0.2) make outputs more focused and deterministic.
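The parameters above interact at generation time. The following is a rough sketch, under stated assumptions, of how seed, n-gram size, delta, and temperature could combine in a greenlist-style watermark. The hashing scheme, the greenlist fraction `gamma`, the function names, and the choice to apply temperature after the delta bias are all illustrative assumptions, not the demo's actual code.

```python
import hashlib
import math


def greenlist(context, vocab_size, seed, gamma=0.5):
    """Hypothetical greenlist: tokens whose keyed hash falls below gamma."""
    green = set()
    for token in range(vocab_size):
        payload = f"{seed}|{','.join(map(str, context))}|{token}".encode()
        u = int.from_bytes(hashlib.sha256(payload).digest()[:8], "big") / 2**64
        if u < gamma:
            green.add(token)
    return green


def watermarked_probs(logits, prev_tokens, seed, ngram=2, delta=2.0,
                      temperature=1.0, gamma=0.5):
    """Return (probabilities, greenlist) for the next token."""
    # Only the last `ngram` tokens determine the greenlist (ngram >= 1).
    context = tuple(prev_tokens[-ngram:])
    green = greenlist(context, len(logits), seed, gamma)
    # Add the bias delta to every greenlist logit.
    biased = [x + delta if t in green else x for t, x in enumerate(logits)]
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in biased]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights], green


logits = [0.0] * 50  # toy vocabulary with a flat distribution
for delta in (0.0, 2.0, 5.0):
    probs, green = watermarked_probs(logits, [11, 12, 13], seed=42, delta=delta)
    print(delta, round(sum(probs[t] for t in green), 3))
```

The printed numbers are the total probability mass on greenlist tokens, which grows with delta. Because the greenlist depends on the last `ngram` tokens, editing any one of them changes the greenlist for the following token, which is why larger n-gram sizes are less robust to edits.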
Understanding Results
- Tokens: The total number of tokens in the analyzed text. Tokens are units of text that may represent words, parts of words, or punctuation.
- Scored Tokens: The number of tokens that were actually evaluated by the detector (excludes the first few tokens, which don't have enough context).
- Final Score: A measure of how likely the text is to contain a watermark. Higher scores indicate stronger evidence of watermarking.
- P-value: The statistical significance of the detection, i.e. the probability of seeing a score at least this high in text that was not watermarked. Lower values (especially p < 1e-6) indicate strong evidence that the text was watermarked. Values close to 0.5 suggest no watermark is present.
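As an illustration of how a score can be turned into a p-value, the sketch below assumes a greenlist-style detector in which each scored token is green with probability `gamma` when no watermark is present. The function names and `gamma` are assumptions for this example. It shows both an exact binomial tail and the z-score route, which relies on a normal approximation.

```python
import math


def binomial_p_value(green, scored, gamma=0.5):
    """Exact P(X >= green) for X ~ Binomial(scored, gamma)."""
    return sum(
        math.comb(scored, k) * gamma**k * (1 - gamma) ** (scored - k)
        for k in range(green, scored + 1)
    )


def z_score(green, scored, gamma=0.5):
    """Standardized excess of green tokens over the chance level."""
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def z_p_value(z):
    """One-sided p-value from a z-score under a normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for green in (50, 60, 90):
    z = z_score(green, 100)
    print(green, binomial_p_value(green, 100), z, z_p_value(z))
```

With 100 scored tokens, a chance-level count of 50 green tokens gives a z-score of 0 and a z-based p-value of 0.5, while 90 green tokens pushes both p-values far below 1e-6. The two p-values differ most in the extreme tail, where the normal approximation is least accurate.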
Related Papers