# Research Citations and Implementation Mapping This document maps research papers to their implementations in the codebase. ## HNSW (Hierarchical Navigable Small World) **Implementation:** `llmds/hnsw.py` **Primary Citation:** - Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. IEEE transactions on pattern analysis and machine intelligence, 42(4), 824-836. **Related Papers:** - Efficient Vector Search on Disaggregated Memory with d-HNSW (for memory-efficient variations) **Techniques Implemented:** - Hierarchical multi-layer graph structure (`_layers`) - Greedy search algorithm (`_search_layer`) - Level assignment with exponential distribution (`_random_level`) - Entry point selection and navigation - Dynamic connection management (M parameter) - efConstruction and efSearch parameters for quality/speed trade-offs **Code References:** - `HNSW` class: Main implementation - `_random_level()`: Level assignment following exponential distribution - `_search_layer()`: Greedy search in a single layer - `add()`: Vector insertion with connection management - `search()`: Multi-layer search from top to bottom ## KV Cache with Prefix Sharing **Implementation:** `llmds/kv_cache.py`, `llmds/paged_allocator.py` **Primary Citation:** - Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation (specific paper on KV cache optimization for RAG) **Techniques Implemented:** - Paged allocation with fixed-size pages (`PagedAllocator`) - Prefix/prompt sharing with copy-on-write semantics (`KVCache._copy_if_shared`) - Hash-based deduplication (`_hash_prefix`) - Reference counting for shared pages (`_page_refs`) - Defensive copying to prevent corruption (`get()` returns deep copies) **Code References:** - `KVCache` class: Main KV cache implementation - `PagedAllocator` class: Page-based memory management - `_copy_if_shared()`: Copy-on-write implementation - `_hash_prefix()`: SHA256-based prefix hashing - `attach()` / `detach()`: Sequence management with reference counting ## Count-Min Sketch **Implementation:** `llmds/cmsketch.py` **Primary Citations:** - Cormode, G., & Muthukrishnan, S. (2005). An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1), 58-75. - Fair-Count-Min: Frequency Estimation under Equal Group-wise Approximation Factor **Techniques Implemented:** - Count-Min Sketch with multiple hash functions (`depth` parameter) - Conservative update strategy to reduce overestimation bias - Error bound calculation (`get_error_bound()`) - Hot item detection (`is_hot()`) **Code References:** - `CountMinSketch` class: Main sketch implementation - `add()`: Conservative update algorithm - `estimate()`: Minimum across all hash rows - `get_error_bound()`: Theoretical error bound calculation - Uses MurmurHash3 for hash functions ## BM25 Inverted Index **Implementation:** `llmds/inverted_index.py` **Primary Citation:** - Robertson, S., & Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4), 333-389. **Techniques Implemented:** - BM25 scoring formula with k1 and b parameters - Inverted index structure with compressed postings - Varint encoding for integer compression (`_encode_varint`) - Zigzag encoding for signed integers (`_zigzag_encode`) - Term frequency and document frequency tracking **Code References:** - `InvertedIndex` class: Main inverted index implementation - `_bm25_score()`: BM25 scoring function - `add_document()`: Index construction - `search()`: BM25 retrieval ## Hybrid Retrieval (Dense + Sparse) **Implementation:** `llmds/retrieval_pipeline.py` **Primary Citation:** - Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data **Techniques Implemented:** - Hybrid dense (HNSW) + sparse (BM25) retrieval - Score fusion with configurable weights (`fusion_weight` parameter) - Top-K maintenance using indexed heap - Hot query caching using Count-Min Sketch **Code References:** - `RetrievalPipeline` class: End-to-end hybrid retrieval - `search()`: Combines HNSW and BM25 with score fusion - Uses `IndexedHeap` for efficient top-K maintenance ## Indexed Heap **Implementation:** `llmds/indexed_heap.py` **Techniques Implemented:** - Indexed binary heap for O(log n) priority updates - Support for both min-heap and max-heap - O(1) key lookup via position map (`_pos` dictionary) - Decrease/increase key operations with correct bubble directions **Code References:** - `IndexedHeap` class: Indexed heap implementation - `decrease_key()` / `increase_key()`: Key update operations - `_bubble_up()` / `_bubble_down()`: Heap property maintenance ## Scheduler and Batching **Implementation:** `llmds/scheduler.py`, `llmds/admissions.py` **Techniques Implemented:** - Dynamic micro-batching with configurable wait time - Priority queue using indexed heap - Admission control with QPS and token rate limiting - Moving window average for rate tracking **Code References:** - `Scheduler` class: Batching scheduler - `AdmissionController` class: Rate limiting and admission control - Uses `IndexedHeap` for priority queue ## Token-Aware LRU **Implementation:** `llmds/token_lru.py` **Techniques Implemented:** - LRU eviction with token-aware budgeting - Cumulative token tracking across cache entries - Eviction based on token count rather than entry count **Code References:** - `TokenLRU` class: Token-aware LRU cache - `total_tokens()`: Cumulative token tracking - `put()`: Token-aware insertion with eviction --- ## How to Cite ### Citing This Software If you use this codebase in your research, please cite: ```bibtex @software{llm_rag_ds_optimizer, title = {LLM RAG Data Structures Optimizer}, author = {Gutierrez, Carlos}, email = {cgutierrez44833@ucumberlands.edu}, year = {2025}, url = {https://github.com/CarGDev/llm-rag-ds-optimizer} } ``` ### Citing Related Papers When using this codebase in research, please also cite the relevant papers: 1. **HNSW**: Cite Malkov & Yashunin (2018) for HNSW algorithm 2. **KV Cache**: Cite Cache-Craft paper for prefix sharing techniques 3. **Count-Min Sketch**: Cite Cormode & Muthukrishnan (2005) for Count-Min Sketch 4. **BM25**: Cite Robertson & Zaragoza (2009) for BM25 scoring 5. **Hybrid Retrieval**: Cite survey paper for hybrid dense+sparse approaches ## Additional References - Papers in `papers/` directory contain full citations and implementation details - See `README.md` for usage examples and performance benchmarks