report/main.bib
35 lines changed: 35 additions & 0 deletions
@@ -87,3 +87,38 @@ @article{thakur2021beir
   journal={arXiv preprint arXiv:2104.08663},
   year={2021}
 }
+
+@article{vaidyanathan2015query,
+  title={Query expansion strategy based on pseudo relevance feedback and term weight scheme for monolingual retrieval},
+  author={Vaidyanathan, Rekha and Das, Sujoy and Srivastava, Namita},
+  journal={arXiv preprint arXiv:1502.05168},
+  year={2015}
+}
+
+@article{abdul2004umass,
+  title={UMass at TREC 2004: Novelty and HARD},
+  author={Abdul-Jaleel, Nasreen and Allan, James and Croft, W Bruce and Diaz, Fernando and Larkey, Leah and Li, Xiaoyan and Smucker, Mark D and Wade, Courtney},
+  journal={Computer Science Department Faculty Publication Series},
+  pages={189},
+  year={2004}
+}
+
+@inproceedings{lavrenko2017relevance,
+  title={Relevance-based language models},
+  author={Lavrenko, Victor and Croft, W Bruce},
+  booktitle={ACM SIGIR Forum},
+  volume={51},
+  number={2},
+  pages={260--267},
+  year={2017},
+  organization={ACM New York, NY, USA}
+}
+
+@article{chen2022pseudo,
+  title={Pseudo-Relevance Feedback Method Based on the Topic Relevance Model},
+  author={Chen, Zhaofeng and Guo, Naixuan and Sun, Jiu and Wang, Yuanyuan and Zhou, Feng and Xu, Sen and Wang, Rugang},
In this section, we review pertinent research on conversational search engines and the broader area of information retrieval. While some of the highlighted studies do not directly address conversational search engines or information retrieval, their techniques remain valuable at various stages of the conversational retrieval process.
\subsection*{Pseudo-Relevance Feedback by Query Expansion}\label{sec:prf}
Oftentimes, queries entered by users are too short and contain too little information for the search engine to retrieve all relevant documents \cite{vaidyanathan2015query}. To counter this, the user-generated query must be reformulated to better reflect the user's information need. One approach to this is query expansion using relevance feedback: assume we have a set of documents relevant to the user-generated query. One can then reformulate the query such that the retrieval system retrieves documents similar to this set, i.e., documents that are probably also relevant to the user's query. The expansion is accomplished by selecting keywords from the set of relevant documents and appending them to the query; different selection strategies exist \cite{vaidyanathan2015query}.
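As a minimal sketch of this idea (assuming plain term-frequency scoring rather than any particular term-weighting scheme from the cited work, and with an illustrative function name), expanding a query from a set of known-relevant documents could look as follows:

\begin{verbatim}
from collections import Counter
import re

def expand_query(query: str, relevant_docs: list[str],
                 num_terms: int = 10) -> str:
    """Append the most frequent non-query terms from the relevant
    documents to the original query (simple TF-based selection)."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in relevant_docs:
        for term in re.findall(r"[a-z]+", doc.lower()):
            if term not in query_terms:
                counts[term] += 1
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return query + " " + " ".join(expansion)
\end{verbatim}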
However, in a real setting, we do not have a set of relevant documents. Instead, we can generate a set of documents that are hopefully relevant to the user's query and then follow the same process. Since the relevance of this generated set is unknown, the approach is called pseudo-relevance feedback. To obtain the set, we use the original user-entered query to retrieve only the top $p$ documents. If $p$ is small, it is likely that all of these documents are relevant to the query, and expanding the query with them will then likely lead to a better retrieval of all relevant documents \cite{vaidyanathan2015query}.
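Under the same assumptions, and given some retrieval function \texttt{search(query, k)} that returns the top $k$ documents (a hypothetical interface, not a specific library), the pseudo-relevance feedback loop reduces to two retrieval passes around the \texttt{expand\_query} sketch above:

\begin{verbatim}
def pseudo_relevance_feedback(query: str, search, p: int = 5,
                              num_terms: int = 10, k: int = 100):
    """Two-pass retrieval: treat the top-p results of an initial
    search as pseudo-relevant, expand the query with their terms,
    and retrieve again."""
    feedback_docs = search(query, p)   # first pass: top-p documents
    expanded = expand_query(query, feedback_docs, num_terms=num_terms)
    return search(expanded, k)         # second pass with expanded query
\end{verbatim}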
One particular technique for this is \texttt{RM3} \cite{abdul2004umass}. \texttt{RM3} uses relevance-based language models \cite{lavrenko2017relevance} to estimate the relevance of a word from its probability under the language models of the (pseudo-)relevant documents. This relevance model is then interpolated with the original query language model to avoid the expanded query drifting too far away from the original one, and the resulting language model is used to generate a new query. Overall, \texttt{RM3} is parameterized by the number of documents used for pseudo-relevance feedback, the number of terms added to the query, and the interpolation weight \cite{abdul2004umass,lavrenko2017relevance,chen2022pseudo}.
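Stated compactly (with notation introduced here rather than taken from the report), the interpolation step can be written as
\begin{equation*}
  P(w \mid \theta_{Q'}) = (1-\lambda)\, P(w \mid \theta_Q) + \lambda\, P(w \mid \theta_R),
  \qquad
  P(w \mid \theta_R) \propto \sum_{d \in D_p} P(w \mid \theta_d)\, P(Q \mid \theta_d),
\end{equation*}
where $\theta_Q$ is the original query language model, $D_p$ is the set of top-$p$ feedback documents with document language models $\theta_d$, and $\lambda$ is the interpolation parameter; the highest-probability terms under $\theta_{Q'}$ form the expanded query.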
\subsection*{Text-to-Text Transfer Transformer}\label{sec:t5}
The field of natural language processing (NLP) revolves around the understanding of natural language, whether presented as text or speech. NLP aims to equip computers with the ability to understand human language and to use this understanding for a range of tasks, such as text summarization, machine translation, and question answering. Because these tasks differ widely in their inputs, outputs, and underlying challenges, developing a unified model proficient across the entire spectrum is difficult.