Skip to content

Latest commit

 

History

History
430 lines (308 loc) · 32.4 KB

File metadata and controls

430 lines (308 loc) · 32.4 KB

YPIR — Engineering Notes

1. Lineage
2. Core Idea
3. Variants
4. Cryptographic Foundation
5. Ring Architecture / Modulus Chain
6. Key Data Structures
7. Database Encoding
8. Protocol Phases
9. Correctness Analysis
10. Complexity
11. Optimization Catalog

12. Performance Benchmarks
13. Application Scenarios
14. Portable Optimizations
15. Implementation Notes
16. Comparison with Prior Work
17. Deployment Considerations
18. Key Tradeoffs & Limitations
19. Open Problems
20. Related Papers in Collection
21. Uncertainties

Field Value
Paper YPIR: High-Throughput Single-Server PIR with Silent Preprocessing (2024)
Archetype Construction + Engineering optimization
PIR Category Group 1b — Stateless Client, Stateless Server
Security model Semi-honest single-server
Additional assumptions RLWE + circular security (key-dependent pseudorandomness for RLWE encodings, Definition A.1)
Correctness model Probabilistic (correctness error delta <= 2^{-40})
Rounds (online) 1 (non-interactive: single query, single response)
Record-size regime Small (1-bit or 1-byte baseline); large-record variant YPIR+SP supports 32--64 KB records

Lineage

Field Value
Builds on SimplePIR/DoublePIR [Group 2a] for the matrix-LWE PIR structure; CDKS packing [CDKS21] for LWE-to-RLWE conversion; Spiral [Group 1a] for key-switching-based response compression
What changed Replaces the bootstrapping-based LWE-to-RLWE conversion used in Tiptoe and HintlessPIR with the CDKS ring packing transformation, which uses key-switching rather than homomorphic decryption. This allows a smaller RLWE modulus, yielding 7.5x smaller packed encodings and 2.7x faster packing than HintlessPIR.
Superseded by N/A
Concurrent work Tiptoe [HDCZ23a], HintlessPIR [LMRS24a] (both use bootstrapping-based approaches to eliminate the hint)

Core Idea

YPIR eliminates the offline hint download required by SimplePIR/DoublePIR by compressing the DoublePIR response into a small number of RLWE encodings using the Chen-Dai-Kim-Song (CDKS) packing transformation. 1 Unlike Tiptoe and HintlessPIR which use bootstrapping to homomorphically evaluate the hint-decryption function (requiring the RLWE plaintext modulus to be as large as the LWE encoding modulus), YPIR uses key-switching-based packing that keeps the LWE and RLWE encodings under the same modulus, yielding much smaller RLWE ciphertexts and near-SimplePIR throughput (97% at 32 GB) with zero offline communication. 2

Variants

Variant Key Difference Offline Comm Online Comm (32 GB) Best For
YPIR (base) Packs DoublePIR response via CDKS None 2.5 MB total Small records (1-bit/1-byte), private SCT auditing
YPIR+SP Packs SimplePIR response instead of DoublePIR None 2.6 MB total (64 KB records) Large records (32--64 KB), password breach checking

YPIR+SP applies the CDKS packing to the SimplePIR output (an entire column of the database) rather than the DoublePIR output, naturally encoding large records. 3 For large records (32--64 KB), YPIR+SP achieves 7--14x smaller responses than HintlessPIR with similar query size and only 5% less throughput. 4

Cryptographic Foundation

Layer Detail
Hardness assumption RLWE_{d1,m1,q1,chi1} for SimplePIR step; RLWE_{d2,m2',q2,chi2} for packing step (where m2' = m2 + (floor(log_z q2) + 1) * log d2)
Encryption/encoding scheme(s) Plain LWE encoding (SimplePIR step, dimension d1 = 1024); RLWE encoding via CDKS packing (dimension d2 = 2048); Regev-style encodings for both steps
Ring / Field R_{d1} = Z[x]/(x^{d1}+1) with d1 = 2^{10} = 1024 for SimplePIR; R_{d2} = Z[x]/(x^{d2}+1) with d2 = 2^{11} = 2048 for DoublePIR + packing
Key structure Two independent secret keys: s1 from chi1 (discrete Gaussian, sigma = 11sqrt(2pi)) for SimplePIR; s2 from chi2 (discrete Gaussian, sigma = 6.4sqrt(2pi)) for DoublePIR/packing. Client also generates CDKS packing key pk from s2.
Correctness condition delta <= 2d2rhoexp(-pitau_double^2/sigma_double^2) + 2exp(-pitau_simple^2/sigma_simple^2) <= 2^{-40} (Theorem 3.4)

Ring Architecture / Modulus Chain

Ring Dimension Modulus (bits) Role / Phase
Z_{q1} (integers) d1 = 1024 (LWE dimension) 32-bit (q1 = 2^{32}) SimplePIR step: matrix-vector product over integers
R_{d2,q2} d2 = 2048 56-bit (q2 = (2^{28} - 2^{16} + 1)(2^{28} - 2^{24} - 2^{21} + 1)) DoublePIR step + LWE-to-RLWE packing
R_{d2,q-hat_{2,1}} d2 = 2048 28-bit Modulus-switched response component 1
R_{d2,q-hat_{2,2}} d2 = 2048 20-bit Modulus-switched response component 2

Key Data Structures

  • Database: Matrix D in Z_N^{l1 x l2} where l1 = m1d1, l2 = m2d2 for ring dimensions d1, d2 and integers m1, m2. Records are indexed by row-column pair (i1, i2) in [l1] x [l2]. Database elements are Z_N values (N = 2^8 for 1-bit records, each record is 8 bits). 5
  • Hint matrices: H1 = G^{-1}{d1,p}(A1D) in Z^{kappad1 x l2} where kappa = ceil(log q-hat1 / log p); H2 = A2 * H1^T in Z{q2}^{d2 x kappa*d1}. Both are precomputed once per database update. 6
  • Packing key: pk = (W1, ..., W_ell) consisting of ell = log d2 key-switching matrices, each W_i in R_{q2}^{2 x t}. Total size: 462 KB (fixed, independent of database size). 7
  • Negacyclic matrices: A_j = NCyclicMat(a_j^T) in Z_{q_j}^{d_j x d_j} -- structured matrices enabling NTT-based preprocessing.

Database Encoding

  • Representation: 2D matrix D in Z_N^{l1 x l2}, where l1 and l2 are multiples of d1 and d2 respectively.
  • Record addressing: Row-column pair (i1, i2), decomposed as i_j = alpha_j * d_j + beta_j where alpha_j in [m_j] and beta_j in [d_j].
  • Preprocessing required: Negacyclic matrix A1 = NCyclicMat(a1^T) enables NTT-based hint computation. Vectors a1, a2 in public parameters are derived from a PRG seed (ChaCha20 in counter mode). 8
  • Dimension selection: For a database of t one-bit records, set t' = ceil(t / log N), l1 = 2^{ceil(log t'/2)}, l2 = 2^{floor(log t'/2)}. The database is arranged as a square (or near-square) matrix. 9

Protocol Phases

Phase Actor Operation Communication When / Frequency
DB Setup Server Compute H1, H2 from D using structured matrices A1, A2. Precompute random components of CDKS.Pack output. -- Once per DB update
Query Gen Client Sample s1, s2, compute packing key pk via CDKS.Setup(1^λ, s2, z). Construct LWE encodings c1 (row indicator), c2 (column indicator). q = (pk, c1, c2) uploaded: 846 KB -- 2.5 MB Per query
Answer (SimplePIR) Server Compute T = g_p^{-1}(c1^T * D _{q1,q-hat1}) -- linear scan over database --
Answer (DoublePIR) Server Compute C = delta * [H2, A2T^T; c2^TH1^T, c2^T*T^T] -- Per query
Answer (Packing) Server Apply CDKS.Pack(pk, C_i) for each block C_i; only O(kappa + log d2) NTTs needed online (rest precomputed) -- Per query
Answer (Modulus switch) Server Apply ModReduce to packed RLWE encodings resp = rho pairs (c_{i,1}, c_{i,2}): 12 KB Per query
Extract Client Decrypt RLWE encodings using s2, recover coefficient vector, apply gadget expansion with s1, round to recover mu in Z_N -- Per query

Correctness Analysis

Option A: FHE Noise Analysis

YPIR uses the independence heuristic to bound noise as sub-Gaussian variance rather than worst-case magnitude. 10

Phase Noise parameter Growth type Notes
SimplePIR step sigma_simple^2 <= d1*sigma1^2/4 + (q-hat1/q1)^2 * l1 * N^2 * sigma1^2 / 4 Additive (inner product) Dominated by database-dependent term
DoublePIR + packing sigma_double^2 <= (q-hat_{2,2}/q-hat_{2,1})^2 * d2 * sigma2^2/4 + (q-hat_{2,2}/q2)^2 * (sigma2^2/4)(l2p^2 + (d2^2-1)(td2*z^2)/3) Additive + packing noise Packing adds (d2^2-1)(td2z^2*sigma_chi^2/4) term from CDKS
  • Correctness condition: delta <= 2d2rhoexp(-pitau_double^2/sigma_double^2) + 2exp(-pitau_simple^2/sigma_simple^2) <= 2^{-40}
  • Independence heuristic used? Yes -- models intermediate error terms as independent sub-Gaussian random variables. Standard in lattice-based PIR. 11
  • Dominant noise source: SimplePIR step dominates for large databases (96% of server time at 32 GB). The packing noise is fixed (independent of database size).

Complexity

Core metrics

Metric Asymptotic Concrete (32 GB, 1-bit records) Phase
Query size O(sqrt(N) * log q / log p + d * log d * log q) 2.5 MB (upload) Online
Response size O(d^2 * kappa * log q-hat / d2) 12 KB (download) Online
Server computation O(N * d) 2.64 s Online
Client computation -- Negligible (decryption + rounding) Online
Throughput O(M) where M = memory bandwidth 12.1 GB/s/core Online
Response overhead -- 12 KB / 32 GB ~ 0 (response independent of DB size) --

Preprocessing metrics

Metric Asymptotic Concrete (32 GB) Phase
Server preprocessing O(l1 * l2 * d1 * log d1) 11 CPU-minutes (48 MB/s) Offline (per DB update)
Client hint download 0 0 --
Client offline upload 0 0 --
Server per-client storage 0 0 (stateless) --

FHE-specific metrics

Metric Asymptotic Concrete Phase
Expansion factor (LWE) (n+1)*log q / log p ~1000x (LWE) reduced to ~2x (RLWE) --
Expansion factor (RLWE) 2*log q / log p Response uses RLWE: (d+1)/2 compression --
Communication rate -- Response size independent of DB size (only depends on lattice parameters) --

If-reported metrics

Metric Value Notes
Memory bandwidth utilization 83% of machine memory bandwidth at 32 GB Core metric: YPIR is memory-bandwidth-limited, not compute-limited
Financial server cost (SCT auditing) $228/week per 1M clients 8x cheaper than DoublePIR ($1822), 84x cheaper than Tiptoe
Financial server cost (with batching) $183/week per 1M clients With cross-client batching, batch size k=4

Optimization Catalog

Optimization Known/Novel Source Improvement Applicable to
CDKS packing (LWE-to-RLWE) Known (adapted) [CDKS21] 1000x ciphertext expansion reduction; eliminates hint download Any LWE-based PIR needing response compression
Negacyclic matrix encoding for preprocessing Novel This paper 10--15x faster preprocessing (d/log d asymptotic) Any SimplePIR-family protocol
Preprocessing of CDKS.Pack random components Novel This paper 9x reduction in online packing cost (from O(kappa*d1 + log d2) to O(kappa + log d2) NTTs) Any scheme using CDKS packing
Cross-client batching Known (adapted) [LG15, CMS16] 1.5--1.7x effective throughput improvement Any memory-bandwidth-limited PIR
PRG seed compression for public parameters Known Standard (ChaCha20) Compresses a1, a2 and pk random components Any LWE/RLWE PIR with random public matrices
Split modulus switching Known [MW22] Reduces response size via two-target modulus reduction Any RLWE-based PIR with multi-prime modulus
NTT representation for key-switching / automorphisms Known (reorganized) NTT standard practice Eliminates per-query NTT for packing; permutation on NTT representation replaces automorphism Any CDKS-based packing
CRT decomposition with native 64-bit arithmetic Known (adapted) This paper's implementation Avoids modular reduction after every operation; reduces Eq 3.2 cost by 60x vs. naive Any NTT-friendly multi-prime scheme

Performance Benchmarks

Hardware: Amazon EC2 r6i.16xlarge -- Intel Xeon Platinum 8375C @ 2.9 GHz, 64 vCPUs, 512 GB RAM. Single-threaded. GCC 11, AVX2 + AVX-512 enabled. Memory bandwidth measured via STREAM: approximately 14.6 GB/s per core. 12

Implementation: 3000 lines of Rust + 1000 line C++ kernel for 32-bit matrix multiplication. Open source: https://github.com/menonsamir/ypir.&#8201;[^24]

Table 2 (reproduced): Single-bit retrieval from databases of varying sizes

Database Metric SimplePIR DoublePIR Tiptoe HintlessPIR YPIR
1 GB Prep. Throughput 3.7 MB/s 3.4 MB/s 1.6 MB/s 4.8 MB/s 39 MB/s
Off. Download 121 MB 16 MB -- -- --
Upload 120 KB 312 KB 33 MB 488 KB 846 KB
Download 120 KB 32 KB 2.1 MB 1.7 MB 12 KB
Server Time 74 ms 94 ms 2.47 s 743 ms 129 ms
Throughput 13.6 GB/s 10.6 GB/s 415 MB/s 1.3 GB/s 7.8 GB/s
8 GB Prep. Throughput 3.1 MB/s 2.9 MB/s 1.6 MB/s 5.2 MB/s 46 MB/s
Off. Download 362 MB 16 MB -- -- --
Upload 362 KB 724 KB 33 MB 1.4 MB 1.5 MB
Download 362 KB 32 KB 1.7 MB 1.7 MB 12 KB
Server Time 708 ms 845 ms 9.75 s 1.62 s 687 ms
Throughput 11.3 GB/s 9.5 GB/s 840 MB/s 4.9 GB/s 11.6 GB/s
32 GB Prep. Throughput 3.3 MB/s 3.3 MB/s 1.4 MB/s 5.7 MB/s 48 MB/s
Off. Download 724 MB 16 MB -- -- --
Upload 724 KB 1.4 MB 34 MB 2.4 MB 2.5 MB
Download 724 KB 32 MB 17 MB 3.2 MB 12 KB
Server Time 3.08 s 3.22 s 21.00 s 5.00 s 2.64 s
Throughput 10.4 GB/s 9.9 GB/s 1.5 GB/s 6.4 GB/s 12.1 GB/s

Table 3 (reproduced): Server computation breakdown

DB Size SimplePIR step DoublePIR step Packing step Total
1 GB 0.07 s (59%) 14 ms (11%) 39 ms (30%) 0.13 s
4 GB 0.30 s (82%) 27 ms (7%) 38 ms (10%) 0.37 s
16 GB 1.21 s (93%) 57 ms (4%) 39 ms (3%) 1.31 s
32 GB 2.56 s (96%) 58 ms (2%) 39 ms (1%) 2.66 s

Table 4 (reproduced): Query size breakdown

| DB Size | |c1| | |c2| | |pk| | Total | |---------|------|------|------|-------| | 1 GB | 128 KB (15%) | 256 KB (30%) | 462 KB (55%) | 846 KB | | 4 GB | 256 KB (21%) | 512 KB (42%) | 462 KB (38%) | 1.2 MB | | 16 GB | 512 KB (26%) | 1.0 MB (51%) | 462 KB (23%) | 2.0 MB |

Table 5: LWE-to-RLWE packing microbenchmarks (packing 2^{12} LWE encodings)

Metric Tiptoe HintlessPIR CDKS (no preprocess) CDKS (with preprocess)
log(n, q, p) (10, 32, 8) (10, 32, 8) (11, 56, 15) (11, 56, 15)
Parameter Size 32 MB 360 KB 528 KB 528 KB
Output Size 514 KB 180 KB 24 KB 24 KB
Output Rate 0.01 0.02 0.31 0.31
Offline Compute -- 2012 ms -- 1029 ms
Online Compute 594 ms 141 ms 340 ms 52 ms

Application Scenarios

Private SCT Auditing (Section 4.5)

  • Database: 2^{36} bits (8 GB) representing 5 billion active SCTs via Bloom filters.
  • Workload: 20 audits per client per week (10^4 TLS connections, 1/1000 audit fraction, 2 audits each).
  • Key advantage: YPIR supports daily database updates at no extra client communication cost (no hint to re-download), whereas DoublePIR requires weekly hint re-download compromising real-time auditing. 13
  • Cost comparison (Table 6): YPIR: $228/week per 1M clients. DoublePIR weekly: $1,822. DoublePIR daily: $10,863. Tiptoe: $19,139. HintlessPIR: $3,708. 14

Password Breach Checking (Section 4.6)

  • Database: 1 billion SHA-256 hashes (2^{19} records x 64 KB each, 32 GB total).
  • Approach: Bucket retrieval -- hash password, retrieve bucket by prefix, search locally.
  • Result: Single breach check requires 2.6 MB total communication and 5.2 seconds computation. 2.2x less total communication than HintlessPIR, but 5% more computation. 15

Portable Optimizations

  1. Negacyclic matrix encoding for NTT-based preprocessing: Replace the random matrix A in SimplePIR/DoublePIR/FrodoPIR preprocessing with a structured negacyclic matrix A = NCyclicMat(a^T). This allows computing AD using NTTs in O(l1l2dlog d) time instead of O(l1l2d^2), yielding d/log d ~ 100x asymptotic speedup. Applicable to any SimplePIR-family protocol with zero impact on online costs. Security shifts from plain LWE to RLWE. 16

  2. CDKS packing with precomputed random components: Precompute the random (query-independent) components of CDKS.Pack output during Setup, reducing online packing from O(kappa*d1) to O(kappa + log d2) NTTs. Applicable to any scheme using CDKS packing. 17

  3. Cross-client batching: Process k concurrent queries from independent, non-coordinating clients via a single database scan (matrix-matrix product QD instead of k separate q_i^TD). Effective throughput = k*l/T. With k=4, exceeds memory bandwidth barrier (16+ GB/s effective throughput). Applicable to any memory-bandwidth-limited PIR scheme. Transparent to clients. 18

Implementation Notes

  • Language / Library: Rust (3000 LOC) + C++ (1000 LOC) for fast 32-bit matrix multiplication. Uses Intel HEXL [BKS+21] for AVX2/AVX-512 acceleration where relevant.
  • Polynomial arithmetic: NTT-based. Modulus q2 is a product of two 28-bit NTT-friendly primes, enabling native 64-bit integer arithmetic without frequent modular reduction. 19
  • CRT decomposition: q2 = (2^{28} - 2^{16} + 1) * (2^{28} - 2^{24} - 2^{21} + 1). Arithmetic performed modulo each 28-bit prime using 64-bit integers; CRT reconstruction only when needed.
  • SIMD / vectorization: AVX2 and AVX-512 enabled via compiler flags and Intel HEXL.
  • Parallelism: Single-threaded benchmarks (the primary computation -- matrix-vector product -- is highly parallelizable).
  • PRG: ChaCha20 in counter mode for generating pseudorandom public parameters a1, a2 and packing key random components.
  • Open source: https://github.com/menonsamir/ypir

Comparison with Prior Work

Single-bit retrieval, 32 GB database

Metric YPIR SimplePIR DoublePIR HintlessPIR Tiptoe
Upload 2.5 MB 724 KB 1.4 MB 2.4 MB 34 MB
Download 12 KB 724 KB 32 MB 3.2 MB 17 MB
Total online comm 2.5 MB 1.4 MB 33.4 MB 5.6 MB 51 MB
Offline download 0 724 MB 16 MB 0 0
Server time 2.64 s 3.08 s 3.22 s 5.00 s 21.00 s
Throughput 12.1 GB/s 10.4 GB/s 9.9 GB/s 6.4 GB/s 1.5 GB/s
Prep. throughput 48 MB/s 3.3 MB/s 3.3 MB/s 5.7 MB/s 1.4 MB/s

Key takeaway: YPIR is the preferred scheme when offline communication must be eliminated (dynamic databases, cold-start clients, multi-database access) while maintaining near-maximal throughput. It achieves 97% of SimplePIR's throughput and 83% of the machine's memory bandwidth at 32 GB, with zero offline communication. The main tradeoff is 1.8--3.6x larger total online communication compared to SimplePIR/DoublePIR (which additionally require offline downloads of 16--724 MB). 20

Deployment Considerations

  • Database updates: Server must re-preprocess (recompute H1, H2, and CDKS.Pack random components) on each database update. With NTT-based preprocessing, this takes 11 CPU-minutes for 32 GB (vs. 144 CPU-minutes for SimplePIR). 21
  • Sharding: Explicitly discussed for communication-throughput tradeoff. Running k instances on k sub-databases of size N/k reduces query size while increasing response size. Can achieve 95% of DoublePIR throughput at 1.5 MB total communication. 22
  • Anonymous query support: Yes -- fully stateless. No client-specific state on server, no offline phase. Each query is independent.
  • Session model: Ephemeral client. No persistent state needed between queries.
  • Cold start suitability: Excellent -- no offline communication, no hint download. First query works immediately.
  • Communication floor: Cannot achieve <1 MB total communication for a 32 GB database (minimum query size ~1.1 MB due to packing key). For settings with tight communication budgets, DoublePIR with its 16 MB offline hint may be preferable. 23

Key Tradeoffs & Limitations

  • Larger queries: YPIR queries are 1.8--3x larger than DoublePIR and 3--7x larger than SimplePIR due to the packing key (462 KB fixed overhead) and larger c2 encoding (needs log q2 / log q1 ~ 2x more bits). 24
  • Small database penalty: At 1 GB, YPIR throughput (7.8 GB/s) is 43% lower than SimplePIR (13.6 GB/s) because the fixed packing cost (39 ms) represents 30% of server time. The packing cost becomes negligible (1%) at 32 GB. 25
  • RLWE assumption required: The NTT-based preprocessing optimization requires RLWE hardness (not plain LWE). The online computation remains over integers Z_q, not the polynomial ring.
  • Circular security assumption: Requires key-dependent pseudorandomness for RLWE encodings (Definition A.1), standard in lattice-based schemes but non-trivial. 26
  • Memory bandwidth bound: Like all SimplePIR-family schemes, YPIR cannot exceed the memory bandwidth of the machine. At 32 GB, YPIR achieves 83% of the theoretical memory bandwidth ceiling (~14.6 GB/s). 27

Open Problems

  • Can we design PIR schemes with silent preprocessing that achieve significantly smaller communication than YPIR while retaining comparable throughput? 28
  • Can faster matrix multiplication algorithms (e.g., Strassen) improve the concrete efficiency of cross-client batching in SimplePIR-based protocols? 29

Related Papers in Collection

  • SimplePIR/DoublePIR [Group 2a]: Direct predecessors. YPIR builds on their matrix-LWE structure and database encoding. YPIR eliminates their offline hint download.
  • Spiral [Group 1a]: Shares the key-switching-based response compression philosophy. YPIR's CDKS packing is conceptually similar to Spiral's response packing.
  • HintlessPIR [Group 1b]: Concurrent hintless scheme using bootstrapping-based LWE-to-RLWE conversion. YPIR achieves 2--6x higher throughput and 7.5x smaller packed encodings.
  • VeriSimplePIR [Group 2a]: Adds verifiability to SimplePIR; YPIR's NTT preprocessing optimization is directly applicable.
  • IncrementalPIR [Group 2a]: Addresses preprocessing cost for SimplePIR family; YPIR's NTT-based preprocessing is complementary.

Uncertainties

  • The paper uses N for both the record-count modulus (Z_N, where N = 2^8 for byte-valued records) and implicitly the total number of records in asymptotic expressions. Context disambiguates: N = l1 * l2 for total records, N (in Z_N) for the record value space.
  • Memory bandwidth measurements use STREAM [McC95] but the precise per-core bandwidth ceiling (~14.6 GB/s referenced on p.1, dashed line in Figure 2 at approximately 15 GB/s) is approximate, read from the chart.
  • The "conjectured max throughput" line in Figure 4 (approximately 22 GB/s) assumes minimum 2 arithmetic operations per database byte; this is an engineering estimate, not a proven bound.

Footnotes

  1. Section 1.1 (p.2): "Instead of using bootstrapping, we take a packing approach... and 'pack' the DoublePIR response into a more compact representation using polynomial rings."

  2. Abstract (p.1): "YPIR achieves 12.1 GB/s/core server throughput and requires 2.5 MB of total communication... On the same setup, the state-of-the-art SimplePIR protocol achieves 12.5 GB/s/core."

  3. Section 4.6 (p.28): "We consider a variant of YPIR where we apply the LWE-to-RLWE packing procedure to the SimplePIR output rather than the DoublePIR output."

  4. Table 7 (p.28): YPIR+SP achieves 444 KB download vs. 3.2 MB for HintlessPIR at 32 GB x 64 KB records (SimplePIR is 724 KB).

  5. Section 3 (p.11): "We model the database as a matrix D in Z_N^{l1 x l2}."

  6. Equation 3.1 (p.11): "H1 = G^{-1}{d1,p}(A1*D|{q1,q-hat1}) and H2 = A2 * H1^T."

  7. Table 4 (p.22): Packing key |pk| = 462 KB across all database sizes.

  8. Remark 3.2 (p.12): "The vectors a1 and a2 are uniformly random, and could be derived from a random oracle."

  9. Section 4 (p.15): "When the database consists of t one-bit records, we let t' = ceil(t/log N), and set l1 = 2^{ceil(log t'/2)} and l2 = 2^{floor(log t'/2)}."

  10. Section 2 (p.7): "Like many lattice-based PIR constructions, we use the independence heuristic that models the error terms arising in intermediate homomorphic computations to be independent."

  11. Theorem 3.4 (p.13-14): Full correctness bound with tau_double and tau_simple margin expressions.

  12. Section 4.4 (p.19): "We use an Amazon EC2 r6i.16xlarge instance running Ubuntu 22.04, with 64 vCPUs (Intel Xeon Platinum 8375C CPU @ 2.9 GHz) and 512 GB of RAM."

  13. Section 1 (p.1): "Since there is no offline communication in YPIR, our approach allows clients to always audit the most recent Certificate Transparency logs."

  14. Table 6 (p.26): Full cost breakdown. YPIR total cost is 8x lower than DoublePIR-weekly and 48x lower than DoublePIR-daily.

  15. Section 4.6 (p.29): "Using YPIR+SP, performing a single password breach check against 1 billion compromised passwords requires just 2.6 MB of total communication and 5.2 seconds of computation."

  16. Section 4.1 (p.15) + Appendix B (p.36-37): Full derivation showing A1d computed via NCyclicMat multiplication in O(d1log d1) per column.

  17. Section 4.2 (p.16-17): Preprocessing reduces online NTTs from O(kappa*d1 + log d2) to O(kappa + log d2).

  18. Section 4.3 (p.17-18): "Cross-client batching can increase the effective server throughput of many PIR schemes by 1.5--1.7x."

  19. Footnote 8 (p.14): "Since we use 64-bit integer arithmetic to implement arithmetic operations with respect to a 28-bit modulus, we do not need to perform a modulus reduction after every arithmetic operation."

  20. Section 1.1 (p.3): "YPIR achieves 97% of the throughput of one of the fastest single-server PIR schemes while fully eliminating all offline communication and only incurring a modest increase in query size."

  21. Section 4.4 (p.21): "The offline precomputation of YPIR would take about 11 CPU-minutes, whereas for SimplePIR/DoublePIR, it would take roughly 144 CPU-minutes."

  22. Section 4.4 (p.22-23): Communication-computation tradeoff via sharding/rebalancing discussed with Figure 3.

  23. Section 1.1 (p.4): "The minimum YPIR query size is 1.1 MB... YPIR may not be appropriate" for small fixed communication budgets.

  24. Section 1.1 (p.4): "A YPIR query is 1.8-3x larger than a DoublePIR query... and 3-7x larger than a SimplePIR query."

  25. Table 3 (p.22): Packing is 30% of total at 1 GB, 1% at 32 GB.

  26. Section 2.1 (p.10): "Pseudorandomness thus relies on a 'circular security' assumption."

  27. Remark 4.1 (p.15-16): A ring-based SimplePIR incurs a 3.6x throughput reduction (11.5 GB/s to 3.2 GB/s) due to log q/log N representation blowup.

  28. Section 4.4 (p.23): "An important open question is to design PIR schemes that require significantly smaller communication while retaining comparable server throughput (and silent preprocessing)."

  29. Section 4.3 (p.18): "It is interesting to see whether faster matrix multiplication algorithms can be used to further improve concrete efficiency in SimplePIR-based protocols."