\section{Background}

\subsection{Retrieval Augmented Generation (RAG)}
RAG~\cite{ragpaper} is a technique that grounds the generation of an LLM in a related textual corpus from a knowledge database to provide domain context, minimize hallucinations, and ensure data freshness, without requiring expensive fine-tuning or re-training operations. A RAG system typically includes three main components: a knowledge database, a RAG retriever, and an LLM generator.

The knowledge database $\mathcal{D}$ is typically composed of a set of texts collected from various sources, which can include general knowledge (from Wikipedia, news articles, social media, etc.) and domain knowledge for specialized RAG systems.

When given a user query $q$, the retriever computes an encoding of the query $f_Q(q)$ with the query encoder $f_Q$ and encodings of all texts in the knowledge database $f_T(t_i), \forall t_i \in \mathcal{D}$ with the text encoder $f_T$. The query encoder $f_Q$ and text encoder $f_T$ are typically trained jointly. The retriever then calculates a similarity score between the query and each text, $S(q, t_i) = Sim(f_Q(q), f_T(t_i))$, to identify the top-$k$ most related documents in the knowledge database. The $Sim$ function measures the similarity between two embedding vectors and is usually the cosine similarity or the dot product of the two embeddings.
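
To make the retrieval step concrete, below is a minimal sketch of top-$k$ retrieval with cosine similarity. The encoder functions (standing in for $f_Q$ and $f_T$) and all names are illustrative assumptions, not part of any specific RAG implementation.
\begin{verbatim}
import numpy as np

def retrieve(query, corpus, encode_query, encode_text, k=5):
    """Return the top-k texts in `corpus` most similar to `query`.

    encode_query and encode_text stand in for the jointly trained
    encoders f_Q and f_T; both are assumed to return 1-D numpy arrays.
    """
    q = encode_query(query)
    scores = []
    for t in corpus:
        v = encode_text(t)
        # Cosine similarity between query and text embeddings.
        scores.append(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    top_idx = np.argsort(scores)[::-1][:k]   # highest scores first
    return [corpus[i] for i in top_idx]
\end{verbatim}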

The top-$k$ similar text documents $Retrieve(q, \mathcal{D})$ retrieved by the retriever are then aggregated into a text prompt and passed to the LLM generator together with the user's query. With the most similar contexts, the LLM then generates text customized for the specific domain, $LLM(Prompt, q, Retrieve(q, \mathcal{D}))$.
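
As an illustration of how the retrieved contexts reach the generator, the sketch below assembles them into a prompt and calls a generic \texttt{generate} function standing in for the LLM; the prompt template and function names are assumptions for illustration only.
\begin{verbatim}
def rag_answer(query, corpus, encode_query, encode_text, generate, k=5):
    """Aggregate the top-k retrieved contexts into a prompt and
    pass it to the LLM generator (here an abstract `generate` call)."""
    contexts = retrieve(query, corpus, encode_query, encode_text, k)
    prompt = "Answer the question using the context below.\n\n"
    prompt += "\n\n".join("Context: " + c for c in contexts)
    prompt += "\n\nQuestion: " + query + "\nAnswer:"
    return generate(prompt)
\end{verbatim}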

Overall, RAG systems augment the LLM with textual or domain knowledge grounding and avoid possibly expensive operations such as pretraining and fine-tuning. Other RAG systems are optimized for different tasks, such as GraphRAG~\cite{graphrag}, which aims to resolve global sensemaking questions. In this work, we primarily focus on general RAG systems.

% Privacy
\subsection{Privacy and Copyright Protection with Unlearnable Data}

Empirical studies have shown that large language models like GPT may memorize entire chunks of text seen during training~\cite{llmcopyrightviolation}. This raises concerns over the unauthorized exploitation of private and copyrighted data for training commercial models, and over threats such as data extraction attacks~\cite{llmdataextraction}.

To resolve such concerns, \cite{unlearnableexample} proposes means to make data unlearnable by deep learning models. Specifically, in the image classification task, a small error-minimizing noise $\delta$ that prevents the model from being penalized by the objective loss function $\mathcal{L}$ during training is added to the original private image $x$. The noise is derived by solving the bi-level optimization problem

$$\arg\min_{\theta} \mathbf{E}_{(x, y) \in D} \Big[\min_{\delta} \mathcal{L}\big(f_\theta (x+\delta), y\big)\Big]$$

iteratively, with gradient descent for the outer level and projected gradient descent (PGD)~\cite{projectedgradientdescent} for the inner level. The resulting noise is imperceptible to human eyes. \cite{textunlearnable} further extends this idea to NLP, replacing PGD with a word-substitution search to accommodate the non-differentiable nature of text tokens and possible changes in text semantics.
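
For concreteness, the following PyTorch-style sketch shows the alternating optimization described above: the outer loop trains a surrogate model on the perturbed images with gradient descent, and the inner loop updates the noise $\delta$ with PGD to minimize the training loss within an $L_\infty$ ball. It is a simplified illustration under assumed hyperparameters, not the reference implementation of \cite{unlearnableexample}.
\begin{verbatim}
import torch
import torch.nn.functional as F

def error_minimizing_noise(model, x, y, epsilon=8/255, alpha=2/255,
                           pgd_steps=20, outer_steps=10, lr=0.1):
    """Alternating optimization sketch for error-minimizing noise.

    Outer loop: gradient descent on the surrogate model's parameters
    using the perturbed data x + delta.  Inner loop: PGD on delta to
    *minimize* the training loss, projected onto the L_inf ball of
    radius epsilon.  Hyperparameter values are illustrative.
    """
    delta = torch.zeros_like(x)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(outer_steps):
        # Outer step: update model parameters on the perturbed batch.
        opt.zero_grad()
        F.cross_entropy(model(x + delta), y).backward()
        opt.step()
        # Inner steps: PGD on delta (descend the loss, then project).
        for _ in range(pgd_steps):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            with torch.no_grad():
                delta = (delta - alpha * grad.sign()).clamp(-epsilon, epsilon)
    return delta.detach()
\end{verbatim}
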
% Also manually generating simple patterns; and NLP QA scenario (BERT), not current decoder-only LLMs

The aforementioned methods can effectively prevent private texts from being memorized or learned by large language models during the pretraining and fine-tuning stages; however, they do not help when these texts are collected into a knowledge database, because the retriever selects texts related to the query by similarity score and passes them to the LLM as context, regardless of their value under the training objective.

% Attacks: Poisoning, Text Authorship Attack / Prevent Authorship Leakage
\subsection{Attack Methods against RAG Systems}

To prevent RAG systems from retrieving and generating from private data, adversarial attack methods~\cite{llmattack} can be considered. Studies have shown that LLMs are vulnerable to data poisoning: \cite{llmpoisoning} shows that by poisoning web-scale datasets it is possible to intentionally introduce malicious examples that degrade a model's performance.

Attacks on RAG systems are more relevant to our setting. Although data poisoning attacks on LLMs have been extensively studied, those specific to RAG systems are recent. Zhong et al.~\cite{poisonragcorpus} introduce corpus poisoning attacks for RAG systems, in which a malicious user generates a small number of adversarial passages that maximize similarity with a provided set of training queries. Zou et al. propose PoisonedRAG~\cite{poisonedrag}, which formulates knowledge corruption attacks as optimization problems; by injecting five malicious texts for each target question, the RAG system answers with a target answer selected by the malicious user at a $90\%$ success rate. Chaudhari et al. propose Phantom~\cite{phantom}, which attacks the generation of the LLM only when a specific trigger is included in the user's query by ensuring that the retriever's top-$k$ results include the poisoned document. Chen et al.~\cite{ragopinionmanipulation} propose a black-box opinion manipulation attack on RAG systems that trains a surrogate model on obtained retrieval ranking data to approximate the retriever's relevance preferences and then generates adversarial opinion manipulation samples.
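
To illustrate the flavor of corpus poisoning, the sketch below greedily edits an adversarial passage to maximize its average cosine similarity with a set of target query embeddings. It is a simplified black-box variant under assumed interfaces (an \texttt{encode\_text} function standing in for the retriever's text encoder and a candidate token vocabulary), not the gradient-based procedure of \cite{poisonragcorpus}.
\begin{verbatim}
import numpy as np

def poison_passage(query_embs, encode_text, vocab,
                   passage_len=20, iters=200, seed=0):
    """Greedy corpus poisoning sketch: craft a passage whose embedding
    has high mean cosine similarity with the target query embeddings.

    query_embs: (n, d) array of unit-norm query embeddings (f_Q outputs).
    encode_text: stands in for the retriever's text encoder f_T.
    vocab: list of candidate tokens to substitute into the passage.
    """
    rng = np.random.default_rng(seed)

    def score(tokens):
        v = encode_text(" ".join(tokens))
        v = v / np.linalg.norm(v)
        return float(np.mean(query_embs @ v))   # mean cosine similarity

    tokens = list(rng.choice(vocab, size=passage_len))
    best = score(tokens)
    for _ in range(iters):
        pos = int(rng.integers(passage_len))
        cand = list(tokens)
        cand[pos] = rng.choice(vocab)
        s = score(cand)
        if s > best:            # keep substitutions that raise similarity
            tokens, best = cand, s
    return " ".join(tokens), best
\end{verbatim}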

In our setting, where private data may be crawled and placed into the knowledge database, data poisoning attacks can be a useful technique to prevent the RAG system from retrieving and generating from the private data. However, our setting is fundamentally different in that (1) we want to preserve the semantics of the original texts shared by users, and (2) we do not aim to change the responses of the RAG system to queries that are unrelated to the private document.

% Text Fingerprinting
\subsection{Text Watermarking}

Watermarking is a technique for tracing the source or ownership of content without notably changing the original data. Usually, a pair of encoder and decoder is jointly trained to encode and decode hidden information in data of a certain medium. By adding imperceptible signals to the original data, image watermarking methods such as \cite{stegastamp} have been widely used to resolve the source of not only raw images but also images generated by models including GANs and diffusion models~\cite{ganimagewatermark}\cite{scalablefingerprint}. Text watermarking can likewise help claim the ownership of text content or identify malicious users who distribute misleading content such as machine-generated fake news. Abdelnabi et al.~\cite{textwatermarktransformer} introduce a text watermarking scheme based on a Transformer; Yang et al.~\cite{contextlexicalsubstitutionwatermark} introduce text watermarking via context-aware lexical substitution; Yang et al.~\cite{blackboxllmgenerationwatermark} further propose a pipeline that enables any black-box LLM to generate watermarked texts.
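
As a toy illustration of the encoder side of lexical-substitution watermarking, the snippet below hides bits by choosing between synonym variants at substitutable positions; the synonym table and function are hypothetical, and real systems such as context-aware lexical substitution rely on a language model to pick fluent, meaning-preserving substitutions.
\begin{verbatim}
def embed_bits(tokens, bits, synonym_pairs):
    """Hide watermark bits by choosing between synonym variants.

    synonym_pairs maps a word to a (variant_0, variant_1) pair; at each
    substitutable position, the chosen variant encodes one bit.
    """
    out, i = [], 0
    for tok in tokens:
        if tok in synonym_pairs and i < len(bits):
            out.append(synonym_pairs[tok][bits[i]])  # variant index = bit
            i += 1
        else:
            out.append(tok)
    return out

# Example: encode the bits [1, 0] into a sentence.
pairs = {"big": ("big", "large"), "quick": ("quick", "fast")}
print(embed_bits("the quick brown fox has a big tail".split(), [1, 0], pairs))
# -> ['the', 'fast', 'brown', 'fox', 'has', 'a', 'big', 'tail']
\end{verbatim}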

Since one of our goals is to minimize changes to the original private data and preserve its original meaning, text watermarking techniques can be useful.