Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training
The pioneering work BinaryConnect uses the Straight Through Estimator (STE) to mimic the gradients of the sign function, but this also causes a crucial inconsistency problem.
GitHub Link
The GitHub link is https://github.com/dravenalg/reste
Introduction
This GitHub repository presents the implementation of the Rectified Straight Through Estimator (ReSTE) for training Binary Neural Networks (BNNs). ReSTE addresses the inconsistency problem in BNN training by balancing the estimating error against the gradient stability. It introduces indicators to quantify this equilibrium and proposes a power-function-based estimator, ReSTE, which balances these factors better than other estimators. The method is evaluated on the CIFAR-10 and ImageNet datasets, demonstrating superior performance without requiring additional modules or losses. The repository provides implementation details and instructions for running the method on these datasets. The paper is accepted at ICCV 2023 and provides insights into this novel approach to training BNNs.
Content
Official implementation of ReSTE. | Paper | Personal Homepage.

Xiao-Ming Wu, Dian Zheng, Zu-Hao Liu, Wei-Shi Zheng*. If you have any questions, feel free to contact me at [email protected].

Binary Neural Networks (BNNs) have attracted great research enthusiasm in recent years due to their strong performance in neural network compression. The pioneering work BinaryConnect proposes using the Straight Through Estimator (STE) to mimic the gradients of the sign function in BNN training, but this also causes a crucial inconsistency problem due to the difference between the forward and backward processes. Most previous methods design different estimators in place of STE to mitigate the inconsistency problem. However, they ignore the fact that reducing the estimating error concomitantly decreases gradient stability, which makes the gradients highly divergent, harms model training, and increases the risk of gradient vanishing and gradient exploding. To take gradient stability fully into consideration, we present a new perspective on BNN training, regarding it as an equilibrium between the estimating error and the gradient stability. From this viewpoint, we first design two indicators to quantitatively demonstrate the equilibrium phenomenon. In addition, to balance the estimating error and the gradient stability well, we revise the original straight through estimator and propose a power-function-based estimator, the Rectified Straight Through Estimator (ReSTE for short). Compared to other estimators, ReSTE is rational and capable of flexibly balancing the estimating error against the gradient stability. Extensive experiments on the CIFAR-10 and ImageNet datasets show that ReSTE achieves excellent performance and surpasses state-of-the-art methods without any auxiliary modules or losses.
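The forward/backward inconsistency mentioned above can be illustrated with a minimal, dependency-free sketch of the STE baseline. The function names and the clipping threshold of 1 are illustrative choices in the spirit of BinaryConnect, not code taken from the repository:

```python
def sign(x):
    """Forward binarization used in BNNs: maps a real weight to +1 or -1."""
    return 1.0 if x >= 0 else -1.0

def ste_grad(x, clip=1.0):
    """STE backward pass: pretends sign() is the identity, passing the
    upstream gradient through unchanged inside the clipping range and
    zeroing it outside (the common BinaryConnect-style clipping)."""
    return 1.0 if abs(x) <= clip else 0.0

# The inconsistency: the forward pass uses sign(x), whose true derivative
# is 0 almost everywhere, yet the backward pass substitutes a gradient of 1.
x = 0.3
print(sign(x))      # forward output: 1.0
print(ste_grad(x))  # surrogate gradient: 1.0, although the true derivative is 0
```

Estimators that hug the sign function more closely shrink this mismatch, but, as the paper argues, they do so at the cost of less stable gradients.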
The core idea of our method is a new perspective on BNN training, regarding it as an equilibrium between the estimating error and the gradient stability. We design two indicators to quantify these two sides: the estimating error is the difference between the sign function and the estimator, and the gradient stability is the divergence of the gradients of all parameters within one iteration's update (see the paper for the exact formulas). From this viewpoint, we propose the Rectified Straight Through Estimator (ReSTE for short), which is rational and capable of flexibly balancing the estimating error against the gradient stability. The equivalent forward and backward processes of ReSTE are visualized in the paper.

NOTE: The package versions are not strictly required and can be flexibly adjusted based on your CUDA setup. If you use our code or models in your research, please cite our paper.
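To make the power-function idea concrete, here is a small sketch of what such a rectified estimator could look like. The specific form sign(x)·|x|^(1/o) with surrogate gradient (1/o)·|x|^(1/o−1), the parameter name `o`, and the clipping threshold `t` are assumptions based on the description above, not code copied from the repository:

```python
def reste_forward(x):
    """The forward pass is still the hard sign function."""
    return 1.0 if x >= 0 else -1.0

def reste_grad(x, o=3.0, t=1.5):
    """Sketch of a power-function backward rule: the surrogate gradient
    (1/o) * |x|^(1/o - 1), clipped at a threshold t so that values of x
    near zero do not blow the gradient up.  With o = 1 this reduces to
    the plain STE gradient of 1; larger o tracks the sign function more
    closely (smaller estimating error) but yields more divergent, less
    stable gradients -- the equilibrium the paper studies."""
    g = (1.0 / o) * abs(x) ** (1.0 / o - 1.0)
    return min(g, t)

print(reste_grad(1.0, o=1.0))  # o = 1 recovers the STE gradient: 1.0
```

Sweeping `o` in a sketch like this is one way to see the trade-off: the surrogate gradient grows ever steeper near zero as `o` increases, which is exactly where the clipping threshold becomes necessary.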