ResNet-20 Optimized Implementation

This example demonstrates high-performance ResNet-20 inference using FHEON. It highlights performance-focused techniques: optimized encodings, multi-channel striding, minimal rotations, strategic bootstrapping, and dynamic slot switching for throughput and memory efficiency.

Overview

  1. High-Performance Context & Keys

    • Initialize FHEController with tuned parameters (ring degree, slots, circuit depth, etc.).

    • Generate compact rotation/bootstrapping key sets per stage to minimize key size and runtime overhead.
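The context-tuning step above comes down to matching the CKKS ring degree and level budget to the network's depth. The helpers below sketch that arithmetic in plain C++; the function names are illustrative and not part of the FHEON API.

```cpp
#include <cassert>
#include <cstdint>

// CKKS packs N/2 SIMD slots into a ring of degree N, so the ring
// degree directly bounds per-ciphertext parallelism.
constexpr uint32_t slots_for_ring_degree(uint32_t ring_degree) {
    return ring_degree / 2;
}

// A circuit-depth budget: multiplicative levels consumed per ResNet
// stage (conv + polynomial ReLU) plus a reserve for bootstrapping.
constexpr uint32_t levels_needed(uint32_t stages, uint32_t levels_per_stage,
                                 uint32_t bootstrap_reserve) {
    return stages * levels_per_stage + bootstrap_reserve;
}
```

In practice these numbers feed the FHEController initialization; the exact parameter names there differ.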

  2. Performance-Focused Data Prep

    • Encode kernels and biases with optimizations that reduce ciphertext rotations and memory.

    • Precompute multi-channel rotation positions and deduplicate rotation requests.
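The deduplication step above can be sketched in a few lines: collect every rotation offset requested across layers, drop duplicates, and generate each rotation key exactly once. This is a minimal illustration of the idea, not the FHEON routine itself.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Merge the rotation offsets requested by several layers into one
// sorted, duplicate-free list, so each distinct rotation key is
// generated a single time.
std::vector<int32_t> dedup_rotations(
        const std::vector<std::vector<int32_t>>& per_layer) {
    std::set<int32_t> unique;                // ordered set: sorted + unique
    for (const auto& layer : per_layer)
        for (int32_t r : layer)
            if (r != 0) unique.insert(r);    // rotation by 0 needs no key
    return {unique.begin(), unique.end()};
}
```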

  3. Optimized Multi-Channel Striding

    • Use secure_double_optimized_convolution_multi_channels() for strided conv + shortcut in one pass, avoiding extra rotations.

    • Downsample shortcuts and main branch with aligned encodings to enable efficient addition/merge.
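The "aligned encodings" point above rests on simple layout math: after a stride-2 convolution halves a W×W feature map, the surviving values sit at even rows and columns of the row-major packing. Computing those slot indices up front lets the main branch and the downsampled shortcut share one layout, so merging them is a single homomorphic addition. The sketch below shows only the index arithmetic, not the actual FHEON convolution kernels.

```cpp
#include <cstdint>
#include <vector>

// Slot indices that survive a strided downsample of a width x width
// feature map packed row-major into ciphertext slots.
std::vector<uint32_t> strided_slot_indices(uint32_t width, uint32_t stride) {
    std::vector<uint32_t> idx;
    for (uint32_t r = 0; r < width; r += stride)
        for (uint32_t c = 0; c < width; c += stride)
            idx.push_back(r * width + c);    // row-major slot position
    return idx;
}
```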

  4. Dynamic Slot Management

    • Switch slot counts between major stages (clearing old key sets and loading per-stage ones) to match computation needs and maximize parallelism.

    • Reduce memory and evaluation cost by using smaller slot sizes where appropriate and expanding only when needed.
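The right-sizing above follows from the activation volumes of ResNet-20 on CIFAR-10, which shrink stage by stage: 32×32×16, then 16×16×32, then 8×8×64. Since CKKS slot counts are powers of two, each stage's volume rounds up to the smallest power of two that holds it. Illustrative arithmetic only; the real slot-switching logic lives in the FHEON controller.

```cpp
#include <cstdint>

// Number of activation values a stage must hold in ciphertext slots.
constexpr uint32_t stage_values(uint32_t h, uint32_t w, uint32_t channels) {
    return h * w * channels;
}

// Smallest power of two >= n, since CKKS slot counts are powers of two.
constexpr uint32_t next_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```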

  5. Careful Bootstrapping & ReLU Placement

    • Apply bootstrapping only when necessary (precision/level recovery), not after every layer, to reduce expensive operations.

    • Run secure ReLU evaluations at optimized points to preserve accuracy while minimizing cost.

  6. Inference Loop (High Throughput)

    • Encrypt CIFAR-10 inputs, process with multi-channel optimized blocks, perform global average pooling, and apply final FC classification.

    • Measure and aggregate wall-clock times for key stages to guide further tuning.
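The timing step above needs only standard C++: wrap each primitive in a wall-clock measurement and accumulate totals per stage. A minimal sketch; the stage names are illustrative.

```cpp
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock seconds per named stage (e.g. "conv",
// "relu", "bootstrap", "fc") so bottlenecks are easy to spot.
class StageTimer {
    std::map<std::string, double> totals_;
public:
    template <typename F>
    void time(const std::string& stage, F&& body) {
        auto t0 = std::chrono::steady_clock::now();
        body();  // run the measured operation
        std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - t0;
        totals_[stage] += dt.count();
    }
    double total(const std::string& stage) const {
        auto it = totals_.find(stage);
        return it == totals_.end() ? 0.0 : it->second;
    }
};
```

Usage: `timer.time("conv", [&]{ /* homomorphic convolution */ });` after each stage, then print the totals at the end of the inference loop.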

Key Functions (performance notes)

  • convolution_block() — uses optimized kernel encodings and packs biases for fewer rotations.

  • double_shortcut_convolution_block() — executes strided convolution + shortcut together to cut rotation/memory overhead.

  • resnet_block() — minimizes bootstraps and schedules secure ReLU for speed.

  • fc_layer_block() — uses fully-connected encodings tuned for final-slot layout.

  • serialize_rotation_keys() / rotation generation — deduplicates and orders rotations for minimal keygen and eval cost.

Performance Strategies (summary)

  • Minimize rotations: optimized encodings + grouped rotation key generation.

  • Reduce bootstraps: apply only when required for level/precision.

  • Slot right-sizing: pick slot counts per stage to balance parallelism vs. key/bootstrapping cost.

  • Encode once, reuse often: pre-encode kernels and biases and reuse across images.

  • Measure & iterate: collect timings for conv / ReLU / bootstrap / FC linear stages to find bottlenecks.

Full Example Source

ResNet20Optimized.cpp