ResNet-20 Optimized Implementation

This example demonstrates high-performance ResNet-20 inference using FHEON. It highlights performance-focused techniques: optimized encodings, multi-channel striding, minimal rotations, strategic bootstrapping, and dynamic slot switching for throughput and memory efficiency.

Overview

  1. High-Performance Context & Keys

    • Initialize FHEController with tuned parameters (ring degree, slots, circuit depth, etc.).

    • Generate compact rotation/bootstrapping key sets per stage to minimize key size and runtime overhead.
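The context-tuning step above comes down to matching the CKKS ring degree and level budget to the network's depth. The helpers below sketch that arithmetic in plain C++; the function names are illustrative and not part of the FHEON API.

```cpp
#include <cassert>
#include <cstdint>

// CKKS packs N/2 SIMD slots into a ring of degree N, so the ring
// degree directly bounds per-ciphertext parallelism.
constexpr uint32_t slots_for_ring_degree(uint32_t ring_degree) {
    return ring_degree / 2;
}

// A circuit-depth budget: multiplicative levels consumed per ResNet
// stage (conv + polynomial ReLU) plus a reserve for bootstrapping.
constexpr uint32_t levels_needed(uint32_t stages, uint32_t levels_per_stage,
                                 uint32_t bootstrap_reserve) {
    return stages * levels_per_stage + bootstrap_reserve;
}
```

In practice these numbers feed the FHEController initialization; the exact parameter names there differ.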

  2. Performance-Focused Data Prep

    • Encode kernels and biases with optimizations that reduce ciphertext rotations and memory.

    • Precompute multi-channel rotation positions and deduplicate rotation requests.
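The deduplication step above can be sketched in a few lines: collect every rotation offset requested across layers, drop duplicates, and generate each rotation key exactly once. This is a minimal illustration of the idea, not the FHEON routine itself.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Merge the rotation offsets requested by several layers into one
// sorted, duplicate-free list, so each distinct rotation key is
// generated a single time.
std::vector<int32_t> dedup_rotations(
        const std::vector<std::vector<int32_t>>& per_layer) {
    std::set<int32_t> unique;                // ordered set: sorted + unique
    for (const auto& layer : per_layer)
        for (int32_t r : layer)
            if (r != 0) unique.insert(r);    // rotation by 0 needs no key
    return {unique.begin(), unique.end()};
}
```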

  3. Optimized Multi-Channel Striding

    • Use secure_double_optimized_convolution_multi_channels() for strided conv + shortcut in one pass, avoiding extra rotations.

    • Downsample shortcuts and main branch with aligned encodings to enable efficient addition/merge.
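The "aligned encodings" point above rests on simple layout math: after a stride-2 convolution halves a W×W feature map, the surviving values sit at even rows and columns of the row-major packing. Computing those slot indices up front lets the main branch and the downsampled shortcut share one layout, so merging them is a single homomorphic addition. The sketch below shows only the index arithmetic, not the actual FHEON convolution kernels.

```cpp
#include <cstdint>
#include <vector>

// Slot indices that survive a strided downsample of a width x width
// feature map packed row-major into ciphertext slots.
std::vector<uint32_t> strided_slot_indices(uint32_t width, uint32_t stride) {
    std::vector<uint32_t> idx;
    for (uint32_t r = 0; r < width; r += stride)
        for (uint32_t c = 0; c < width; c += stride)
            idx.push_back(r * width + c);    // row-major slot position
    return idx;
}
```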

  4. Dynamic Slot Management

    • Switch slot counts between major stages (clearing old key sets and loading per-stage ones) to match computation needs and maximize parallelism.

    • Reduce memory and evaluation cost by using smaller slot sizes where appropriate and expanding only when needed.
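The right-sizing above follows from the activation volumes of ResNet-20 on CIFAR-10, which shrink stage by stage: 32×32×16, then 16×16×32, then 8×8×64. Since CKKS slot counts are powers of two, each stage's volume rounds up to the smallest power of two that holds it. Illustrative arithmetic only; the real slot-switching logic lives in the FHEON controller.

```cpp
#include <cstdint>

// Number of activation values a stage must hold in ciphertext slots.
constexpr uint32_t stage_values(uint32_t h, uint32_t w, uint32_t channels) {
    return h * w * channels;
}

// Smallest power of two >= n, since CKKS slot counts are powers of two.
constexpr uint32_t next_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```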

  5. Careful Bootstrapping & ReLU Placement

    • Apply bootstrapping only when necessary (precision/level recovery), not after every layer, to reduce expensive operations.

    • Run secure ReLU evaluations at optimized points to preserve accuracy while minimizing cost.

  6. Inference Loop (High Throughput)

    • Encrypt CIFAR-10 inputs, process with multi-channel optimized blocks, perform global average pooling, and apply final FC classification.

    • Measure and aggregate wall-clock times for key stages to guide further tuning.
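The timing step above needs only standard C++: wrap each primitive in a wall-clock measurement and accumulate totals per stage. A minimal sketch; the stage names are illustrative.

```cpp
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock seconds per named stage (e.g. "conv",
// "relu", "bootstrap", "fc") so bottlenecks are easy to spot.
class StageTimer {
    std::map<std::string, double> totals_;
public:
    template <typename F>
    void time(const std::string& stage, F&& body) {
        auto t0 = std::chrono::steady_clock::now();
        body();  // run the measured operation
        std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - t0;
        totals_[stage] += dt.count();
    }
    double total(const std::string& stage) const {
        auto it = totals_.find(stage);
        return it == totals_.end() ? 0.0 : it->second;
    }
};
```

Usage: `timer.time("conv", [&]{ /* homomorphic convolution */ });` after each stage, then print the totals at the end of the inference loop.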

Key Functions (performance notes)

  • convolution_block() — uses optimized kernel encodings and packs biases for fewer rotations.

  • double_shortcut_convolution_block() — executes strided convolution + shortcut together to cut rotation/memory overhead.

  • resnet_block() — minimizes bootstraps and schedules secure ReLU for speed.

  • fc_layer_block() — uses fully-connected encodings tuned for final-slot layout.

  • serialize_rotation_keys() / rotation generation — deduplicates and orders rotations for minimal keygen and eval cost.

Performance Strategies (summary)

  • Minimize rotations: optimized encodings + grouped rotation key generation.

  • Reduce bootstraps: apply only when required for level/precision.

  • Slot right-sizing: pick slot counts per stage to balance parallelism vs. key/bootstrapping cost.

  • Encode once, reuse often: pre-encode kernels and biases and reuse across images.

  • Measure & iterate: collect timings for conv / ReLU / bootstrap / FC linear stages to find bottlenecks.

Full Example Source

ResNet20Optimized.cpp