PIR-Eng-Notes/research/Group.1a.Stateless.Client.Stateful.Server/xpir_2016/xpir_2016_notes.md at main · 0xalizk/PIR-Eng-Notes

XPIR — Engineering Notes

_{1. Lineage}
_{2. Core Idea}
_{3. System Context}
_{4. Cryptographic Foundation}
_{5. Key Data Structures}
_{6. Database Encoding}
_{7. Protocol Phases}
_{8. Auto-Optimization Algorithm}
_{9. Encryption Parameter Sets}
_{10. NTTTools Library}
_{11. PIR Pre-processing and Reply Generation Throughput}
_{12. Encryption and Decryption Performance}

_{13. Complexity}
_{14. Performance Benchmarks}
_{15. Application Scenarios}
_{16. Comparison with Prior Work}
_{17. Portable Optimizations}
_{18. Implementation Notes}
_{19. Key Tradeoffs & Limitations}
_{20. Deployment Considerations}
_{21. Open Problems (as stated or implied by the authors)}
_{22. Uncertainties}
_{23. Footnotes}

Field	Value
Paper	XPIR: Private Information Retrieval for Everyone (2016)
Archetype	Construction + System
PIR Category	Group 1a — Stateless Client, Stateful Server
Security model	Semi-honest single-server
Additional assumptions	Ring-LWE (standard lattice assumption); IND-CPA from [28]
Correctness model	Deterministic (given correct parameter selection by the auto-optimizer)
Rounds (online)	1 (non-interactive: client sends query, server returns reply)
Record-size regime	Large (>=10 Mbit elements in throughput benchmarks); parameterized

Lineage ⤴

Field	Value
Builds on	Brakerski-Vaikuntanathan Ring-LWE HE scheme [28]; Stern's cPIR formulation [33]; Lipmaa's recursive PIR [34]; Aguilar et al. broken cPIR scheme [7] (reuses the protocol structure, replaces the cryptosystem)
What changed	Replaced number-theoretic (Paillier/RSA-based) cPIR cryptosystems with a Ring-LWE-based additively homomorphic encryption scheme, achieving multi-gigabit/s processing throughput on a commodity CPU. Added an auto-optimizer that selects cryptographic and PIR parameters jointly.
Superseded by	SealPIR [Group 1a] introduced query compression via oblivious expansion, which XPIR lacks. XPIR's NTTTools library and optimization ideas influenced later work.
Concurrent work	Doroz, Sunar, Hammouri "Bandwidth Efficient PIR from NTRU" (ePrint 2014/232, WAHC'14) — NTRU-based, different authors and techniques. Community calls that paper "XPIR-2014."

Core Idea ⤴

Prior cPIR schemes based on number-theoretic assumptions (Paillier, RSA) process the database at ~1 Mbit/s, making trivial full-database download faster in almost all settings. ¹ XPIR replaces the underlying cryptosystem with a Ring-LWE-based additively homomorphic encryption scheme that achieves 5–20 Gbit/s server-side processing throughput on a commodity laptop, making cPIR faster than trivial download for databases with more than ~10 elements over a 100 Mbit/s connection. ² The system includes an auto-optimizer that jointly selects encryption parameters (N, q), PIR parameters (recursion depth d, aggregation alpha), and factors in upload/download bandwidth to minimize round-trip time or other user-defined cost functions. ³

System Context ⤴

Application: General-purpose cPIR library with four evaluated use-cases: Netflix-like private video streaming (static DB), IPTV (dynamic DB), Match.com-like private keyword search (static DB), NYSE stock market private data retrieval (dynamic DB). ⁴
Key constraint driving PIR design: Total round-trip time (query generation + upload + server processing + download + decryption) must beat trivial full-database download for the given network setting. ⁵
System architecture: Single server (MSI GT60 laptop, Core i7-3630QM, 8 GB DDR3 RAM) with client on the same or remote machine. Network simulated at ADSL (1 Mbps up / 20 Mbps down) and FTTH (100 Mbps symmetric). ⁶
Where PIR fits: The XPIR library is the complete system — it includes the Ring-LWE cryptosystem, the PIR protocol, and the auto-optimizer as a single GPLv3 package. ⁷

Cryptographic Foundation ⤴

Layer	Detail
Hardness assumption	Ring-LWE over R_q = Z_q[X] / <X^N + 1>, based on the analysis of [28] (Brakerski-Vaikuntanathan). Indistinguishability reduces to the standard Ring-LWE lattice problem. ⁸
Encryption/encoding scheme(s)	Symmetric Ring-LWE encryption scheme (SKE) with derived public-key variant. Additively homomorphic: supports Sum (ciphertext addition) and Absorb (plaintext-ciphertext multiplication). No ciphertext-ciphertext multiplication — strictly additive HE, not FHE. ⁹
Ring / Field	R_q = Z_q[X] / <X^N + 1> with N in {1024, 2048, 4096} and q a product of 60-bit or 30-bit primes congruent to 1 mod 2N (for NTT compatibility). Polynomials stored in NTT-CRT representation. ¹⁰
Key structure	Secret key s sampled from chi (error distribution) over R_q. Public key pk = (pk1, pk2) derived via SKE.Encrypt(s, 0). Randomness from Salsa20/20 CSPRNG (up to 256-bit security). ¹¹
Correctness condition	Decryption recovers m = e mod t where e = b - a*s. Correct if accumulated noise stays below the plaintext modulus t. Maximum number of additions h_a is a parameter to ParamGen. ¹²

Key Data Structures ⤴

Database: n elements of l bits each, optionally reshaped into a d-dimensional hypercube of side length ceil(n^{1/d}) via recursion. ¹³
Query: d vectors of ceil(n^{1/d}) ciphertexts each (one per recursion dimension). Each ciphertext encrypts 0 except at the target index position, which encrypts 1. ¹⁴
Reply: A vector of ceil(l / l_0) ciphertexts, where l_0 is the absorbable plaintext bits per ciphertext. ¹⁵
NTT-CRT representation: All polynomials (keys, query elements, database chunks) stored in Number-Theoretic Transform + Chinese Remainder Theorem form for O(N log N) polynomial multiplication instead of O(N^2). ¹⁶

Database Encoding ⤴

Representation: Flat array of n elements, each split into chunks of l_0 bits (plaintext capacity per ciphertext). With recursion (d > 1), reshaped into a d-dimensional array of side ceil(n^{1/d}). ¹⁷
Record addressing: Multi-dimensional index decomposition (i_1, ..., i_d) in base ceil(n^{1/d}). ¹⁸
Preprocessing required: For static databases, elements are pre-imported into NTT-CRT form at 5 Gbit/s (laptop) to 10 Gbit/s (high-end server). This is the main bottleneck for the preprocessing phase. ¹⁹
Aggregation: Groups of alpha elements can be aggregated into a single "super-element" of size l * alpha, reducing n to ceil(n/alpha) at the cost of increased reply size. ²⁰

Protocol Phases ⤴

Phase	Actor	Operation	Communication	When / Frequency
Performance Cache	Client + Server	Both sides benchmark encryption/decryption throughput and reply generation throughput for all candidate parameter sets	--	Once at startup
Optimization	Client	Auto-optimizer selects best (N, q, d, alpha, EncParams) given database shape (n, l), bandwidth (U, D), and target function f_target	--	Once per database shape / network change
DB Import (static)	Server	Convert database elements to NTT-CRT form	--	Once (pre-processing); 5 Gbit/s on laptop ¹⁹
Choice	Client + Server	Server sends catalog; client picks index i	Catalog metadata	Per query
Query Gen	Client	Generate d selection vectors of ceil(n^{1/d}) ciphertexts each; encrypt 1 at target, 0 elsewhere	d * ceil(n^{1/d}) * ciphertext_size (up)	Per query
Reply Gen	Server	For each recursion dimension j in [1..d]: absorb database chunks with query ciphertexts, sum results. Produces a single reply vector.	-- (server-side)	Per query
Reply Send	Server	Send reply vector R = (R_1, ..., R_{ceil(l/l_0)})	ceil(l/l_0) * ciphertext_size (down)	Per query
Reply Extract	Client	Decrypt d encryption layers; recover element at index i	--	Per query

Auto-Optimization Algorithm ⤴

The auto-optimizer is XPIR's signature engineering contribution. It runs on the client after receiving the database shape (n, l) and the server's performance cache from the server; the client determines upload/download bandwidth (U, D) via its own bandwidth test. ³

Inputs:

Recursion range (d_1, d_2): range of dimension values to try
Aggregation range (alpha_1, alpha_2): range of aggregation values
Encryption parameter list EncParams: all candidate (N, q) pairs
Upload/download bandwidth (U, D)
Target optimization function f_target (default: minimize RTT)

Search procedure:

Server sends database shape (n, l) and its performance cache to client.
Client runs a bandwidth test if U or D are unknown.
For every encryption scheme parameters in EncParams:
- For every dimension d between d_1 and d_2:
  - For every aggregation value alpha between alpha_1 and alpha_2:
    - Estimate queryGenerationTime from client performance cache
    - Estimate querySendingTime from upload bandwidth U
    - Estimate replyGenerationTime from server performance cache
    - Estimate replySendingTime from download bandwidth D
    - Estimate replyDecryptionTime from client performance cache
    - Compute performance measure via f_target on these five values
Output the parameter set with the best performance measure. ²¹

Default target function: f_target = MAX(queryGenTime, querySendTime) + MAX(replyGenTime, replySendTime, replyDecryptTime). This models the pipelining: query generation and sending overlap; reply generation, sending, and decryption overlap. ²²

Other target functions: minimum total resources, cloud cost (dollar value per CPU-ms and per bit transmitted). ²²

Convexity optimization: When alpha_1 = 1 and alpha_2 = n (large range), the optimizer uses a dichotomy (binary search) on alpha, exploiting the empirical convexity of the target function: as alpha grows, query size shrinks but reply size grows. ²³

Runtime: The optimizer runs in a few milliseconds for any database configuration. ²³

Encryption Parameter Sets ⤴

Parameters (N, q bits)	Max Sec (bits)	Plaintext capacity	Ciphertext size	Expansion factor F
(1024, 60)	97	<= 20 Kbits	128 Kbits	>= 6.4
(2048, 120)	91	<= 100 Kbits	512 Kbits	>= 5.12
(4096, 120)	335 (capped at 256 by Salsa20/20)	<= 192 Kbits	1 Mbit	>= 5.3

²⁴

Security notes: For constant modulus q, security increases exponentially with polynomial degree N while computational cost increases only (almost) linearly. Upgrading from (2048, 120) to (4096, 120) roughly doubles cost but increases theoretical security to 335 bits (capped at 256 by the PRNG). ²⁵

NTTTools Library ⤴

XPIR's custom polynomial arithmetic library (NTTTools) is a key engineering contribution, built without external dependencies (no NTL, no GMP except for decryption lifting coefficients). ²⁶

NTT-CRT Representation

NTT (Number-Theoretic Transform): FFT analog over finite fields for polynomial multiplication in O(N log N). Uses Harvey's NTT algorithm, restricted to power-of-two polynomial degrees. ²⁷
CRT (Chinese Remainder Theorem): Represents large modulus q as a product of small primes, enabling native 64-bit arithmetic. Multiplication cost is linear in log(q) (number of CRT components). ²⁸
Combined NTT-CRT: All polynomials stored in evaluation-domain, coordinate-wise representation. Encryption requires only 3 NTT-CRT transforms plus basic operations. Decryption requires inverse NTT plus CRT lifting. ²⁹

Pre-computed Newton Coefficients

A novel optimization for modular multiplication when one operand (the query element) is reused many times. ³⁰

Problem: Multiplying two 60-bit integers modulo a 60-bit prime requires 64x64->128-bit multiplication followed by costly integer division.
Solution: Pre-compute y' = floor(y * 2^64 / p) for the reused operand y. Then: q = xy' / 2^64; r = xy - q*p mod 2^64; if r > p: r = r - p.
Cost: Two integer multiplications + one conditional subtraction (replacing the costly division).
Impact: In the PIR reply generation, the query elements are the reused multipliers. Pre-computation is amortized across all database element absorptions. ³⁰

Performance Comparison: NTTTools vs Double-CRT (HElib)

Pre-processing (NTT transforms): ³¹

Parameters	NTTTools	Double-CRT
(1024, 60\|\|44)	16 us	178 us
(2048, 120\|\|132)	78 us	1100 us

Processing (multiply and add): ³¹

Parameters	NTTTools	Double-CRT
(1024, 60\|\|44)	2.3 us	5 us
(2048, 120\|\|132)	9.6 us	27 us

Multiplication (single polynomial mult): ³²

Parameters	NTTTools	Double-CRT
(1024, 60\|\|44)	1.8 us	3.4 us
(2048, 120\|\|132)	7.3 us	20.2 us

NTTTools is 2-3x faster for processing and ~10x faster for pre-processing, mainly due to Harvey's NTT and the restriction to power-of-two degrees. ³¹

Memory: NTTTools uses 8 KB per polynomial (degree 1024, 60-bit coefficients) by default, plus 2x with pre-computed quotients. Double-CRT uses ~40 KB per object for large amounts. ³³

PIR Pre-processing and Reply Generation Throughput ⤴

Parameters	(1024, 60)	(2048, 120)	(4096, 120)
Input size (per poly)	20 Kbits	100 Kbits	192 Kbits
Pre-processing (per poly)	4.2 us	19 us	38 us
Pre-processing (PIR throughput)	4.8 Gbps	5.2 Gbps	5 Gbps
Processing (per poly)	0.57 us	2.3 us	4.8 us
Processing (PIR throughput)	18 Gbps	22 Gbps	20 Gbps

³⁴ Tests on MSI GT60 laptop, Core i7-3630QM 2.67 GHz, using all cores with OpenMP.

Implication: After NTT-CRT import, the database is processed at roughly 20 Gbit/s during reply generation. The bottleneck is importing data into NTT-CRT form (~5 Gbit/s). For dynamic databases where import cannot be done ahead of time, processing is limited by the import phase to ~5 Gbit/s. ³⁵

Encryption and Decryption Performance ⤴

For polynomial degree 4096 and varying modulus size (Figure 5): ³⁶

Modulus bitsize	Encryption (us, approx)	Decryption (us, approx)
60	~230	~100
120	~500	~500
180	~800	~1500
240	~1100	~2100

Encryption cost scales linearly in modulus size (and ciphertext size).
Decryption cost jumps above 60 bits due to CRT lifting requiring GMP multi-precision arithmetic.
At 60-bit modulus: query generation at 700 Mbit/s, decryption at 5 Gbit/s.
At 120-bit modulus: query generation at 850 Mbit/s, decryption at 710 Mbit/s (CRT lifting bottleneck). ³⁶

Complexity ⤴

Core metrics

Metric	Asymptotic	Concrete (benchmark params)	Phase
Query size	O(d * n^{1/d} * F * l_0) where F = expansion factor	d=1: n * 128 Kbit (1024,60) or n * 512 Kbit (2048,120); d=2: 2sqrt(n) ciphertext_size	Online
Response size	O(F^d * l)	F^d * l where F ~= 5-6	Online
Server computation	O(n * l) — linear scan of entire database	18-22 Gbit/s throughput (pre-processed static); ~5 Gbit/s (dynamic) ³⁴	Online
Client computation (query gen)	O(d * n^{1/d}) encryptions	700–850 Mbit/s ³⁶	Online
Client computation (decryption)	O(d * l / l_0) decryptions	710 Mbit/s – 5 Gbit/s depending on modulus ³⁶	Online
Throughput (user-perceived)	Depends on n, d, bandwidth	~15/n Gbit/s for d=1 (approximate, from Figure 6) ³⁷	Online

FHE-specific metrics

Metric	Asymptotic	Concrete (benchmark params)	Phase
Expansion factor (F)	>= q/t	5.12 – 6.4 depending on parameters ²⁴	--
Multiplicative depth	0 (additive-only HE; no ciphertext-ciphertext multiplication)	0	--
Recursion depth d	1 or 2 (chosen by optimizer)	d=1 best for throughput; d=2 best for latency with n >= 1000 ³⁷	--

Performance Benchmarks ⤴

Hardware

Server: MSI GT60 laptop, Core i7-3630QM 2.67 GHz (mobile), 8 GB DDR3 RAM. Single-core for NTTTools microbenchmarks (Figure 3, 4). All cores (4C/8T) with OpenMP for PIR throughput (Figure 2). ³⁴ ³¹
Storage: OCZ Vertex 460 SSD (4 Gbit/s contiguous read) for databases exceeding RAM. ³⁸
Alternative server: 10-core Xeon E7-4870 roughly doubles throughput and caps at RAM bandwidth. ³⁹

Static Database Throughput (Figure 6, log-scale, approximate)

User-perceived throughput of streaming data on FTTH (100 Mbit/s), database files >= 10 Mbit: ³⁷

n (files in DB)	d=1, 91-bit sec (Gbit/s, approx)	d=2, 91-bit sec (Gbit/s, approx)	d=1, 256-bit sec (Gbit/s, approx)	Trivial FTTH PIR (Gbit/s)
10	~1.5	~0.3	~1.3	0.1
100	~0.15	~0.08	~0.13	0.1
1,000	~0.015	~0.015	~0.013	0.1
10,000	~0.0015	~0.005	~0.0013	0.1
100,000	~0.00015	~0.002	~0.00013	0.1

Key observations:

Trivial PIR (full download at 100 Mbit/s FTTH) is 10-200x slower than cPIR for small n values (n <= ~100); for larger n (n >= ~1000 in d=1), trivial PIR is actually faster. ³⁷
d=1 throughput follows ~15/n Gbit/s (straight line on log-log plot). ³⁷
d=2 with recursion: query size proportional to 2*sqrt(n), significant overhead for small n but superior for n >= ~1000. ⁴⁰
256-bit security (4096,120) is only ~10% worse than 91-bit security (2048,120) due to doubled ciphertext capacity. ²⁵

Static Database Latency (Figure 7, log-scale, approximate)

Initial latency before user starts receiving data, on FTTH: ⁴⁰

n (files in DB)	d=1 FTTH (s, approx)	d=2 FTTH (s, approx)
10	~0.03	~0.1
100	~0.3	~0.3
1,000	~3	~1
10,000	~30	~5
100,000	~300	~15

Latency grows linearly in n for d=1 and as sqrt(n) for d=2. Upload bandwidth is the main bottleneck. ⁴⁰

Dynamic Database Throughput (Figure 8, log-scale, approximate)

Without pre-processing (IPTV-like setting), FTTH 100 Mbit/s: ⁴¹

Throughput is roughly 6x lower than static (pre-processed) case.
d=1 at n=100: ~25 Mbit/s (approximate). Sufficient for one 720p-30fps stream per 50 simultaneous clients. ⁴¹
d=2 provides less benefit for dynamic data since the import bottleneck dominates. ⁴¹

Round-Trip Time (Figure 10, static data, log-scale, approximate)

RTT on FTTH, using (1024, 60) parameters, F ~= 6: ⁴²

n*l (DB size)	n=10,000 elements	Trivial FTTH PIR	Request Processing (RP)
1 Mbit	~0.05 s	~0.00001 s	~0.0001 s
10 Mbit	~0.05 s	~0.0001 s	~0.001 s
100 Mbit	~0.06 s	~0.001 s	~0.01 s
1 Gbit	~0.1 s	~0.01 s	~0.1 s
10 Gbit	~1 s	~0.1 s	~1 s
100 Gbit	~10 s	~1 s	~10 s

Trivial PIR is faster only for databases with fewer than ~10 elements (where cPIR expansion factor dominates). ⁴²

Concrete Worked Example (from text, exact)

n = 10,000 elements, l = 1 Mbit each, parameters (1024, 60), F ~= 6, FTTH, no recursion/aggregation: ⁴³

Query generation: 0.05 s (at 2.2 Gbit/s)
Query sending: 12.8 s (10,000 * 128 Kbit over 100 Mbit/s upload)
Reply processing: 0.1 s (at 10 Gbit/s)
Reply sending: 0.06 s (6 * 1 Mbit over 100 Mbit/s download)
Reply decryption: ~1 ms

With d=2 recursion: Query sending time divided by factor ~50 (from 10,000 ciphertexts to 200), with little impact on other times. ⁴³

Netflix Use-Case (exact from text)

Database: 100,000 movies (static files, pre-processable with H.265/HEVC compression): ⁴⁴

720p at 30fps (400 Kbit/s needed): User can hide choice among 35,000 movies.
1024p at 60fps (2 Mbit/s needed): Hide choice among ~8,000 movies.

Sniffer Use-Case (exact from text)

Private packet sniffer filtering traffic by source IP: ⁴⁵

Parameters (1024, 60) for Class B network (65,535 addresses): query size 1 GByte.
Parameters (2048, 120): query size 4 GBytes (stored on disk, not re-sent per packet).
Bimodal traffic distribution (40% small, 40% MTU, 20% mid): sniffer processes link at 600 Mbit/s (approximate, from Figure 9). ⁴⁶
Buffered packets filling a plaintext: sniffer processes link at roughly 3 Gbit/s for parameters (2048, 120). ⁴⁷

Application Scenarios ⤴

Private video streaming (Netflix-like): Static pre-processed database. User hides choice among 8K-35K movies depending on quality/framerate tradeoff. Multiple concurrent users are scalable: disk access is synchronous across users, so costs do not increase. ⁴⁸
IPTV: Dynamic database (cannot pre-process). Single processor handles 100 720p-30fps streams for 50 simultaneous clients, or 5000 streams of distant IP web cameras. ⁴¹
Private keyword search (Match.com): Private criteria search over dating profiles. With public pre-filtering by city, latency drops from 10 minutes to 6-60 seconds for 5-keyword queries. ⁴⁹
Private stock market data (NYSE): Dynamic stream at 5-10 Gbit/s. Latency-oriented: client retrieves company information in ~100 ms, which is acceptable since the data is already 100 ms old. ⁵⁰

Comparison with Prior Work ⤴

Metric	XPIR (Ring-LWE)	Paillier cPIR	Trivial PIR (full download)
DB processing throughput	5–20 Gbit/s (laptop) ³⁴	~1 Mbit/s ¹	N/A (bandwidth-limited)
Expansion factor F	5.1–6.4 ²⁴	Lower (~2-3) but much slower	1.0
Security basis	Ring-LWE (post-quantum)	Factoring (not post-quantum)	None needed
Best regime	n > ~10 elements, moderate bandwidth	Extremely low bandwidth only ⁵¹	Very high bandwidth (>20 Gbit/s) or n <= 4 ⁵²
Query size scaling	O(d * n^{1/d}) ciphertexts	O(n) (no recursion standard)	0 (just request index)

Key takeaway: XPIR is the first cPIR system where server processing throughput (5-20 Gbit/s) exceeds typical network bandwidth, making cPIR faster than trivial download for nearly all practical database sizes. The auto-optimizer selects Ring-LWE cPIR in almost all settings; Paillier is chosen only for extremely low bandwidths; trivial download only when bandwidth exceeds ~20 Gbit/s. ⁵¹

Portable Optimizations ⤴

Pre-computed Newton quotients for modular multiplication: Replace costly integer division with pre-computed scaled approximation when one multiplicand is reused. Applicable to any NTT-based polynomial arithmetic where the same polynomial is multiplied against many others (universal for PIR reply generation). ³⁰
NTT-CRT mixed representation: Combine NTT (for polynomial multiplication) with CRT (for handling large moduli in native 64-bit words). Applicable to any Ring-LWE-based system. ²⁸
Harvey's NTT algorithm: Very fast NTT restricted to power-of-two polynomial degrees; applicable to any RLWE scheme with matching degree constraints. ²⁷
Auto-optimization framework: The joint parameter search over (encryption params, recursion depth, aggregation, bandwidth) is applicable to any PIR system with configurable parameters. ³

Implementation Notes ⤴

Language / Library: C++, custom NTTTools library. No external crypto dependencies (no NTL, no GMP except for CRT lifting in decryption). GPLv3 license. ⁷
Polynomial arithmetic: NTT-based using Harvey's algorithm, restricted to power-of-two polynomial degrees. CRT for large moduli decomposed into 60-bit or 30-bit primes. ²⁷ ²⁸
CRT decomposition: q is a product of 60-bit primes (or 30-bit for CRT lifting). Each prime is congruent to 1 mod 2N for NTT compatibility. ¹⁰
SIMD / vectorization: Not explicitly mentioned. OpenMP for multi-threading across database chunks. ³⁴
Parallelism: Multi-threaded via OpenMP for PIR throughput benchmarks. Single-core for microbenchmarks. ³⁴ ³¹
Network: TCP sockets with simulated bandwidth constraints for ADSL/FTTH. Authors note TCP buffering/windowing overhead for very small latencies and suggest UDP for more linear behavior. ⁵³
Randomness: Salsa20/20 CSPRNG providing up to 256 bits of security. ¹¹
Open source: https://github.com/XPIR-team/XPIR ⁷

Key Tradeoffs & Limitations ⤴

No query compression: XPIR sends one ciphertext per database element per recursion dimension. Without recursion (d=1), query size is O(n). Recursion (d=2) reduces this to O(sqrt(n)) but increases reply expansion to F^2. SealPIR's later oblivious expansion technique dramatically improves this. ¹⁴
Additive-only HE: The Ring-LWE scheme supports only Sum and Absorb (plaintext * ciphertext), not ciphertext * ciphertext multiplication. This prevents the multiplicative homomorphic operations used in later schemes (GSW external products, BFV SIMD). The advantage is zero multiplicative depth, keeping parameters small. ⁹
Expansion factor: F >= 5.1 means at minimum 5x communication overhead for the reply. This is inherent to the Ring-LWE encryption scheme's ciphertext-to-plaintext ratio. ²⁴
Memory for large databases: Databases up to 10 Gbit fit in RAM. For larger databases (Netflix 100K movies), data is chunked through RAM with SSD backing, adding disk transfer time as a bottleneck. ³⁸
Decryption bottleneck at large moduli: For moduli > 60 bits, CRT lifting requires GMP multi-precision arithmetic, reducing decryption throughput from 5 Gbit/s to 710 Mbit/s at 120-bit modulus. ³⁶
Dynamic database penalty: Without pre-processing, throughput drops ~6x because NTT-CRT import must be done on-the-fly during reply generation. ⁴¹
Trivial PIR still wins for n <= ~4 elements or when available bandwidth exceeds database processing throughput (~20 Gbit/s static, ~5 Gbit/s dynamic). ⁵²

Deployment Considerations ⤴

Database updates: Static databases can be fully pre-processed. Dynamic databases must import data on-the-fly at ~5 Gbit/s. No incremental update mechanism — the entire database (or changed chunks) must be re-imported. ³⁵
Sharding: Not explicitly addressed, but the authors note that disk access is synchronous for concurrent users, so adding users does not increase per-query I/O cost. ⁴⁸
Key rotation / query limits: Not discussed. The encryption scheme generates fresh randomness per query via Salsa20/20.
Anonymous query support: Yes — the scheme is stateless. No client-dependent preprocessing or persistent per-client state on the server.
Session model: Ephemeral client. Each query is independent.
Cold start suitability: Yes (no offline communication beyond catalog exchange), though the performance cache benchmark adds startup latency.
Multiple users: Synchronous disk access means concurrent users share I/O cost. 12 users hiding choice among 8K movies at 720p-30fps use 8% of one processor. ⁴⁸

Open Problems (as stated or implied by the authors) ⤴

Extending the auto-optimizer to multi-core and distributed server settings. ⁵⁴
Combining cPIR with replicated-database PIR for better security/performance tradeoffs (the Popcorn approach cited in [46]). ⁵⁵
Adapting XPIR for very small databases (n <= 10) where cPIR expansion factor dominates and trivial download is faster. ⁵²
Supporting ciphertext-ciphertext multiplication for richer homomorphic operations (addressed by later schemes using BFV/GSW). ⁹

Uncertainties ⤴

"n" notation collision: The paper uses uppercase N for polynomial degree and lowercase n for number of database elements. This is explicitly noted as "unusual notation" on p. 4. ⁵⁶ Throughout these notes, N = polynomial degree, n = number of database elements.
Chart-derived values: All values extracted from Figures 6, 7, 8, 9, 10, 11 are approximate (log-scale charts). Values marked "approximate" throughout.
Security estimation methodology: The paper uses the Lindner-Peikert approach [32] for parameter generation, which predates the modern Lattice Estimator. The 91-bit and 97-bit security levels for (2048,120) and (1024,60) may differ under modern estimation. ⁵⁷
Paillier comparison details: The Paillier-based cPIR comparison ("~1 Mbit/s") is attributed to Sion and Carbunar [1] (NDSS'07) and refers to number-theoretic cPIR with 1024-bit RSA modulus. Modern Paillier implementations may be faster, but the paper's main claim is about the orders-of-magnitude gap with Ring-LWE. ¹
Multi-threading scope: Figure 2 throughput values use all cores (OpenMP), but Figures 3 and 4 microbenchmarks are single-core. The throughput figures in Section 4 use all cores. The distinction is not always explicit.

Footnotes ⤴

p. 1, Abstract and Section 1 -- "Sion and Carbunar showed that cPIR schemes were not practical"; p. 2, Section 1.1 -- "a multiplication over a large modulus... restricts both the database size and the throughput." ↩ ↩² ↩³
p. 1, Abstract -- "the paradigm shift introduced by lattice-based cryptography... cPIR is of practical value"; p. 13, Section 4 -- multi-gigabit processing throughput demonstrated. ↩
p. 7, Section 3.1 -- Protocol overview with optimization steps 1-5; pp. 23-24, Appendix A -- Full optimization algorithm pseudocode. ↩ ↩² ↩³
pp. 12-19, Section 4 -- Four use-cases: Netflix (p. 15), IPTV (p. 15-16), Match.com (pp. 18-19), NYSE (p. 19). ↩
p. 2, Section 1.1, "Focused issue" -- "The main performance metric we use is the time needed for a client to retrieve an element privately." ↩
pp. 12-13, Section 4, "Experimental setting" -- MSI GT60 laptop, Core i7-3630QM 2.67 GHz, 8 GB DDR3. FTTH and ADSL simulated via timers. ↩
p. 1, NOTE box -- "XPIR is free (GPLv3) software and available at https://github.com/XPIR-team/XPIR." ↩ ↩² ↩³
p. 5, Section 2.1 -- "Based on the analysis of [28], this scheme ensures indistinguishability if the standard lattice problem Ring-LWE is hard." ↩
pp. 4-5, Section 2.1 -- SKE scheme with Add (ciphertext addition) and Absorb (plaintext * ciphertext multiplication). No ciphertext-ciphertext multiply defined. ↩ ↩² ↩³
p. 5, Section 2.1 -- R_q = Z_q[X] / <X^N + 1>; ParamGen forces N in {1024, 2048, 4096} and q to be a multiple of 60-bit or 30-bit primes congruent to 1 mod 2N. ↩ ↩²
p. 13, Section 4, "Security" -- "To generate randomness for our scheme, we use Salsa20/20 (Salsa20/20 is able to provide up to 256 bits of security)." ↩ ↩²
pp. 4-5, Section 2.1 -- SKE.ParamGen takes security parameter k and maximum additions h_a. ↩
p. 6, Section 2.2 -- Recursion: "an integer d called dimension... the client only needs to send d x n^{1/d} query elements." ↩
p. 6, Section 2.2, "Query Generation" -- "generate the i-th query element q_i as: A random encryption of zero if i != i_0; A random encryption of one if i = i_0." ↩ ↩²
p. 6, Section 2.2, "Reply Generation" -- "For j from 1 to ceil(l/l_0): Compute R_j := Sum_{i=1}^n Absorb(m_{i,j}, q_i)." ↩
pp. 8-9, Section 3.2.1 -- "In XPIR we use a mixed NTT-CRT representation to reduce computational costs." ↩
p. 6, Section 2.2 -- "aggregate them by groups of size alpha and obtain a database with ceil(n/alpha) elements of size l x alpha." ↩
p. 24, Appendix A, "Query generation" -- "Define (i_1, ..., i_d) the decomposition in base ceil(n^{1/d}) of i." ↩
p. 9, Figure 2 -- Pre-processing throughput: 4.8 Gbit/s (1024,60), 5.2 Gbit/s (2048,120), 5 Gbit/s (4096,120). ↩ ↩²
p. 6, Section 2.2 -- "To reduce query size it is possible to aggregate them by groups of size alpha." ↩
pp. 23-24, Appendix A -- "Optimization (Client and Server)" algorithm pseudocode. ↩
p. 8, Section 3.1 -- "the round-trip time of the retrieval... is given by the function MAX(queryGenerationTime, querySendingTime) + MAX(replyGenerationTime, replySendingTime, replyDecryptionTime)." ↩ ↩²
p. 24, Appendix A, "Remark (convexity)" -- "the optimizer always returned very reasonable results and was able to run in a few milliseconds for any database." ↩ ↩²
p. 5, Figure 1 -- Parameter sets table with N, q, max sums, plaintext, ciphertext, and F values. ↩ ↩² ↩³ ↩⁴
p. 13, Section 4, "Security" -- "security (in attacker operations) increases exponentially... computational costs increase only (almost) linearly"; (4096,120) theoretical security 335 bits, capped at 256. ↩ ↩²
p. 9, Section 3.2.1, "Comparison with [40]" -- "we have built our library without using any external library (Double-CRT is built over NTL which in turn is built over GMP) which results in a big performance improvement." ↩
p. 9, Section 3.2.1 -- "we use Harvey's NTT algorithm [42] which is very fast but only works for some polynomial degrees (powers of two)." ↩ ↩² ↩³
p. 8, Section 3.2.1 -- "The CRT representation ensures that the multiplication cost is also linear in log p, instead of quadratic for a trivial algorithm." ↩ ↩² ↩³
p. 11, Section 3.2.3 -- "encryption requires only the computation of three NTT-CRT transformations and some basic operations." ↩
p. 10, Section 3.2.2 -- Pre-computing Newton coefficients algorithm: "q = xy'/2^64; r = xy - qp mod 2^64; if r > p: r = r - p." "This algorithm requires just two integer multiplications a shift and a conditional subtraction." ↩ ↩² ↩³
p. 10, Figure 3 -- NTTTools vs Double-CRT pre-processing and processing times. "Tests are on a single-core." ↩ ↩² ↩³ ↩⁴ ↩⁵
p. 11, Figure 4 -- Multiplication times for NTTTools vs Double-CRT. ↩
p. 10, Section 3.2.1 -- "the memory footprint in NTTTools is of 8 Kbytes by default and twice that with pre-computed quotients... Double-CRT objects memory usage increases linearly at 40Kbytes per object." ↩
p. 9, Figure 2 -- PIR pre-processing and processing throughput table. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
p. 9, Section 3.2.1, "Implications on cPIR performance" -- "After importation, the database is processed during the reply generation phase at roughly 20Gbits/s"; import at 5 Gbit/s. ↩ ↩²
p. 11 (60-bit numbers, prose) and p. 12 (Figure 5) -- "We are able to generate a query at 700Mbits/s and decrypt an incoming reply at 5Gbits/s" (60-bit); p. 12 (120-bit numbers) -- "encryption scales well... it is possible to generate a query at 850Mbits/s, but decryption suffers... 710Mbits/s" (120-bit). ↩ ↩² ↩³ ↩⁴ ↩⁵
p. 14, Figure 6 -- "User-perceived throughput of XPIR streaming static data." Log-log plot. "this line is pretty close to the straight line defined by 15/n Gbps." ↩ ↩² ↩³ ↩⁴ ↩⁵
p. 13, Section 4, "Experimental setting" -- "OCZ Vertex 460 SSD (4Gbit/s access)"; p. 15, "Medium Access Issues" -- databases up to 10 Gbits in RAM, larger chunked via SSD. ↩ ↩²
p. 14, Figure 6 caption -- "Performance on a server with a better processor (e.g. ten-core Xeon E7-4870) roughly doubles and caps at that level as RAM bandwidth is saturated." ↩
p. 14, Figure 7 -- "Initial latency before the user starts to receive streaming data." "latency grows linearly in n in dimension 1 and in sqrt(n) in dimension 2, and that the main bottleneck is the available upload bandwidth." ↩ ↩² ↩³
p. 16, Figure 8 and text -- "user-perceived throughput is roughly divided by six" vs static. "For an IPTV like application, a single processor can handle one hundred 720p-30fps streams for 50 simultaneous clients." ↩ ↩² ↩³ ↩⁴ ↩⁵
p. 18, Figure 10 -- "Round-trip (RTT) and request processing (RP) times... Trivial PIR (from top to bottom the second filled line) is faster than cPIR for databases with less than ten elements." ↩ ↩²
p. 18, text -- Worked example: n=10,000, l=1Mb, (1024,60), F~=6. "sending the query over the FTTH link takes 12.8 seconds... Using recursion divides query sending time by a factor 50." ↩ ↩²
p. 15, Section 4.1, "The Netflix Use-case" -- "If the user is willing to receive a 720p-30fps video stream he can hide his choice among 35K movies." ↩
pp. 16-17, Section 4.2, "The Private Sniffer Use-Case" -- Query sizes for Class B network range. ↩
p. 17, Section 4.2 and Figure 9 -- "the sniffer is able process a link at 600Mbps (purple line)." ↩
p. 17, Section 4.2 -- "we can process a link at roughly 3Gbps (blue line), for parameters (2048,120)." ↩
p. 15, Section 4.1, "Multiple Users" -- "if data is accessed synchronously for concurrent users, disk access costs do not increase, so scalability is not an issue." ↩ ↩² ↩³
pp. 18-19, Section 4.3, "Match.com Use-Case" -- "Using the public keyword pre-filtering... we can hope to divide the size of the database by a factor 10 to 100... which would lower the waiting time to 6-60 seconds." ↩
p. 19, Section 4.3, "NYSE Use-Case" -- "the user should get the information in roughly 100ms, which is a reasonable waiting time for information that is already 100ms old." ↩
p. 19, Section 4.4 -- "the Paillier based cPIR will be chosen for extremely small bandwidths"; "Ring-LWE based cPIR is chosen by the optimizer... in almost all situations." ↩ ↩²
p. 19-20, Section 4.4 -- "trivial PIR will be the natural choice when available bandwidth is higher than our database processing throughput"; "for database with two to four elements." ↩ ↩² ↩³
p. 14-15, text -- "The strange behaviour of the FTTH lines for a small number of elements comes from the fact that we use TCP sockets." ↩
p. 20, Section 5 -- "using a high end server in a multi-core setting can only increase this difference further." ↩
p. 15, Section 4.1 -- "Replacing it with XPIR would bring security and a x100 performance boost on the cPIR subroutine." ↩
p. 4, Section 2.1, "Notations" -- "we use uppercase N for the polynomial degree (which is an unusual notation) to distinguish the polynomial degree from the number of elements." ↩
p. 13, Section 4, "Security" -- Parameters generated "following the approach of [32]" (Lindner-Peikert, 2011). ↩

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XPIR — Engineering Notes

Lineage ⤴

Core Idea ⤴

System Context ⤴

Cryptographic Foundation ⤴

Key Data Structures ⤴

Database Encoding ⤴

Protocol Phases ⤴

Auto-Optimization Algorithm ⤴

Encryption Parameter Sets ⤴

NTTTools Library ⤴

NTT-CRT Representation

Pre-computed Newton Coefficients

Performance Comparison: NTTTools vs Double-CRT (HElib)

PIR Pre-processing and Reply Generation Throughput ⤴

Encryption and Decryption Performance ⤴

Complexity ⤴

Core metrics

FHE-specific metrics

Performance Benchmarks ⤴

Hardware

Static Database Throughput (Figure 6, log-scale, approximate)

Static Database Latency (Figure 7, log-scale, approximate)

Dynamic Database Throughput (Figure 8, log-scale, approximate)

Round-Trip Time (Figure 10, static data, log-scale, approximate)

Concrete Worked Example (from text, exact)

Netflix Use-Case (exact from text)

Sniffer Use-Case (exact from text)

Application Scenarios ⤴

Comparison with Prior Work ⤴

Portable Optimizations ⤴

Implementation Notes ⤴

Key Tradeoffs & Limitations ⤴

Deployment Considerations ⤴

Open Problems (as stated or implied by the authors) ⤴

Uncertainties ⤴

Footnotes ⤴

FilesExpand file tree

xpir_2016_notes.md

Latest commit

History

xpir_2016_notes.md

File metadata and controls

XPIR — Engineering Notes

Lineage ⤴

Core Idea ⤴

System Context ⤴

Cryptographic Foundation ⤴

Key Data Structures ⤴

Database Encoding ⤴

Protocol Phases ⤴

Auto-Optimization Algorithm ⤴

Encryption Parameter Sets ⤴

NTTTools Library ⤴

NTT-CRT Representation

Pre-computed Newton Coefficients

Performance Comparison: NTTTools vs Double-CRT (HElib)

PIR Pre-processing and Reply Generation Throughput ⤴

Encryption and Decryption Performance ⤴

Complexity ⤴

Core metrics

FHE-specific metrics

Performance Benchmarks ⤴

Hardware

Static Database Throughput (Figure 6, log-scale, approximate)

Static Database Latency (Figure 7, log-scale, approximate)

Dynamic Database Throughput (Figure 8, log-scale, approximate)

Round-Trip Time (Figure 10, static data, log-scale, approximate)

Concrete Worked Example (from text, exact)

Netflix Use-Case (exact from text)

Sniffer Use-Case (exact from text)

Application Scenarios ⤴

Comparison with Prior Work ⤴

Portable Optimizations ⤴

Implementation Notes ⤴

Key Tradeoffs & Limitations ⤴

Deployment Considerations ⤴

Open Problems (as stated or implied by the authors) ⤴

Uncertainties ⤴

Footnotes ⤴

Footnotes