ResNet-20 Optimized Implementation
This example demonstrates high-performance ResNet-20 inference using FHEON. It highlights performance-focused techniques: optimized encodings, multi-channel striding, minimal rotations, strategic bootstrapping, and dynamic slot switching for throughput and memory efficiency.
Overview
High-Performance Context & Keys
Initialize FHEController with tuned parameters (ring degree, slots, circuit depth, etc.).
Generate compact rotation/bootstrapping key sets per stage to minimize key size and runtime overhead.
Performance-Focused Data Prep
Encode kernels and biases with optimizations that reduce ciphertext rotations and memory.
Precompute multi-channel rotation positions and deduplicate rotation requests.
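The deduplication step can be illustrated in plaintext. The sketch below assumes a row-major packing of each feature map and a 3x3 kernel; the helper names `conv_rotations` and `dedup_rotations` are hypothetical and are not part of FHEON.

```python
def conv_rotations(width, kernel_size=3):
    """Slot offsets that align each kernel tap with its output pixel,
    assuming a row-major packing of an image with the given width."""
    half = kernel_size // 2
    return [dy * width + dx
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)]


def dedup_rotations(*groups):
    """Merge rotation lists, dropping duplicates and the no-op rotation 0."""
    unique = set()
    for group in groups:
        unique.update(group)
    unique.discard(0)  # rotating by 0 needs no key
    return sorted(unique)


# Two stages with different feature-map widths share the offsets -1 and 1.
stage1 = conv_rotations(32)  # 9 offsets
stage2 = conv_rotations(16)  # 9 offsets
merged = dedup_rotations(stage1, stage2)  # 14 unique non-zero offsets
```

Each offset removed from the merged set is one rotation key that no longer needs to be generated, stored, or loaded.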
Optimized Multi-Channel Striding
Use secure_double_optimized_convolution_multi_channels() for strided conv + shortcut in one pass, avoiding extra rotations.
Downsample shortcuts and main branch with aligned encodings to enable efficient addition/merge.
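As a plaintext illustration of why aligned encodings matter, the sketch below downsamples a main branch and a shortcut branch with the same stride so that the surviving values land at matching positions and can be added element-wise. It assumes a row-major packing; `strided_positions` and `downsample` are hypothetical helpers and do not reflect how FHEON implements the fused operation.

```python
def strided_positions(height, width, stride=2):
    """Row-major slot indices that survive a downsample by `stride`."""
    return [y * width + x
            for y in range(0, height, stride)
            for x in range(0, width, stride)]


def downsample(packed, height, width, stride=2):
    """Keep only the strided positions of a row-major packed feature map."""
    return [packed[i] for i in strided_positions(height, width, stride)]


# A 4x4 main branch and a shortcut branch packed the same way.
main = list(range(16))
shortcut = [10 * v for v in main]

# Both branches use the same positions, so the merge is a plain addition.
merged = [a + b for a, b in zip(downsample(main, 4, 4),
                                downsample(shortcut, 4, 4))]
# merged == [0, 22, 88, 110]
```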
Dynamic Slot Management
Switch slot counts between major stages (clear / load per-layer key sets) to match computation needs and maximize parallelism.
Reduce memory and evaluation cost by using smaller slot sizes where appropriate and expanding only when needed.
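A rough way to reason about slot right-sizing is to compute the smallest power-of-two slot count that holds each stage's feature map. The sketch below uses the standard ResNet-20 CIFAR-10 stage shapes and assumes that a whole feature map is packed into a single ciphertext; the helper names are hypothetical, and the actual slot counts used by the example may differ.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def slots_per_stage(stages):
    """Slot count needed per stage, given (channels, height, width) shapes."""
    return [next_power_of_two(c * h * w) for c, h, w in stages]


# Standard ResNet-20 feature-map shapes on 32x32 CIFAR-10 inputs.
RESNET20_STAGES = [(16, 32, 32), (32, 16, 16), (64, 8, 8)]

slots = slots_per_stage(RESNET20_STAGES)
# slots == [16384, 8192, 4096]
```

Under this assumption the required slot count halves at each stage, which is what makes switching to smaller slot sizes in later stages worthwhile.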
Careful Bootstrapping & ReLU Placement
Apply bootstrapping only when necessary (precision/level recovery), not after every layer, to reduce expensive operations.
Run secure ReLU evaluations at optimized points to preserve accuracy while minimizing cost.
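The "bootstrap only when necessary" policy can be sketched as a greedy level-budget check: bootstrap just before a layer whose multiplicative depth exceeds the levels that remain. The depth values below are illustrative assumptions, not measured FHEON figures, and `schedule_bootstraps` is a hypothetical helper.

```python
def schedule_bootstraps(layer_depths, levels_after_bootstrap):
    """Return the indices of layers that must be preceded by a bootstrap.

    layer_depths: multiplicative depth consumed by each layer, in order.
    levels_after_bootstrap: levels available at the start and after each bootstrap.
    """
    remaining = levels_after_bootstrap
    bootstraps = []
    for i, depth in enumerate(layer_depths):
        if depth > levels_after_bootstrap:
            raise ValueError(f"layer {i} needs more levels than a bootstrap provides")
        if remaining < depth:
            bootstraps.append(i)  # refresh just before this layer
            remaining = levels_after_bootstrap
        remaining -= depth
    return bootstraps


# Assumed depths: 2 per convolution, 6 per polynomial ReLU approximation.
depths = [2, 6, 2, 6, 2, 6]
plan = schedule_bootstraps(depths, levels_after_bootstrap=10)
# plan == [3, 5]: two bootstraps instead of one after every layer
```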
Inference Loop (High Throughput)
Encrypt CIFAR-10 inputs, process with multi-channel optimized blocks, perform global average pooling, and apply final FC classification.
Measure and aggregate wall-clock times for key stages to guide further tuning.
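Per-stage timing can be collected with a small accumulator like the one sketched below. `StageTimer` is a hypothetical helper written for illustration; the stage labels are placeholders for whatever the inference loop actually calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time and call counts per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the timed block raises.
            self.totals[stage] += time.perf_counter() - start
            self.counts[stage] += 1

    def report(self):
        """Stages sorted by total time, slowest first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


timer = StageTimer()
for _ in range(2):
    with timer.measure("conv"):
        sum(range(1000))  # stand-in for an encrypted convolution
with timer.measure("bootstrap"):
    sum(range(1000))      # stand-in for a bootstrapping call
```

Sorting the report by total time rather than by mean time per call highlights the stages that dominate end-to-end latency.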
Key Functions (performance notes)
convolution_block(): uses optimized kernel encodings and packs biases for fewer rotations.
double_shortcut_convolution_block(): executes strided convolution + shortcut together to cut rotation/memory overhead.
resnet_block(): minimizes bootstraps and schedules secure ReLU for speed.
fc_layer_block(): uses fully-connected encodings tuned for final-slot layout.
serialize_rotation_keys() / rotation generation: deduplicates and orders rotations for minimal keygen and eval cost.
Performance Strategies (summary)
Minimize rotations: optimized encodings + grouped rotation key generation.
Reduce bootstraps: apply only when required for level/precision.
Slot right-sizing: pick slot counts per stage to balance parallelism vs. key/bootstrapping cost.
Encode once, reuse often: pre-encode kernels and biases and reuse across images.
Measure & iterate: collect timings for conv / relu / bootstrap / flinear to find bottlenecks.