Baseline Implementation. As stated in the proposal, no code for BNNs was readily available, so we implemented a baseline method based on the paper. Training a bitwise neural network takes two steps. First, we train a real-valued network in TensorFlow that takes either bitwise inputs or real-valued inputs ranging between -1 and 1. Second, we binarize the network and use noisy backpropagation to update the weights.
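The second step above can be sketched as follows. This is a minimal NumPy sketch rather than our TensorFlow code; the single-layer update, squared-error gradient, learning rate, and straight-through gating are illustrative assumptions, not the exact formulation from the paper.

```python
import numpy as np

def binarize(w):
    # Map real values to {-1, +1}; zero is treated as +1.
    return np.where(w >= 0, 1.0, -1.0)

def forward(x, weights):
    # Bitwise forward pass: every activation is constrained to {-1, +1}.
    for W in weights:
        x = binarize(x @ W)
    return x

def noisy_backprop_step(Wr, x, y, lr=0.01):
    """One sketch of a noisy-backpropagation update: gradients are taken
    through the binarized weights and activations, but applied to the
    real-valued shadow weights Wr (single layer, for brevity)."""
    Wb = binarize(Wr)
    z = x @ Wb                       # pre-activation with binary weights
    out = binarize(z)                # bitwise output
    err = out - y                    # simple squared-error gradient
    # Straight-through estimator: pass the gradient through sign()
    # only where |z| <= 1, so saturated units are left alone.
    grad_z = err * (np.abs(z) <= 1.0)
    grad_W = x.T @ grad_z
    Wr -= lr * grad_W                # update the real-valued weights
    np.clip(Wr, -1.0, 1.0, out=Wr)  # keep shadow weights in [-1, 1]
    return Wr
```

After training, only the binarized weights `binarize(Wr)` need to be shipped to the hardware; the real-valued shadow weights exist solely to accumulate updates.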
FPGA implementation. With a reasonably well pretrained BNN, we began the FPGA implementation. Given the number of weights in the network, it is impossible to map them all directly into logic gates. Therefore, we store all the binary weights in on-chip RAM and compute the results block by block. Specifically, we arrange the weights into blocks of 1024 bits. In each step, we read out 1024 weights, compute the partial products, and accumulate the intermediate results in counters. Then we shift the input data by 1 bit, read the new weights, and repeat the previous steps 1024 times. Finally, if the counter has counted more 1s than 0s, we output 1; otherwise, we output 0. This module is shown in Figure 1, and it can be reused across layers to reduce resource consumption.
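The datapath above can be modeled in software to validate the hardware against the trained network. The following is a sketch under our assumptions (0/1 encoding of the ±1 values, XNOR in place of multiplication, a popcount counter, and a majority-vote output); the function name and block size parameter are ours, not from the paper.

```python
import numpy as np

BLOCK_BITS = 1024  # weights read from on-chip RAM per step, as in our design

def bitwise_neuron(inputs, weights, block_bits=BLOCK_BITS):
    """Software model of the FPGA module for one output neuron.
    inputs/weights are 0/1 integer arrays of equal length. XNOR
    replaces multiplication, a counter accumulates the popcount
    block by block, and the majority decides the output bit."""
    assert len(inputs) == len(weights)
    ones = 0
    for start in range(0, len(inputs), block_bits):
        xi = inputs[start:start + block_bits]
        wi = weights[start:start + block_bits]
        ones += int(np.sum(~(xi ^ wi) & 1))  # XNOR, then popcount
    # Output 1 iff the counter saw more 1s than 0s.
    return 1 if ones > len(inputs) - ones else 0
```

A quick sanity check: when the input bits equal the weight bits, every XNOR yields 1 and the neuron outputs 1; when they are complementary, it outputs 0.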
Deep neural networks (DNNs) have substantially pushed the state of the art in a wide range of tasks, including speech recognition and computer vision. However, DNNs also require a wealth of compute resources, and thus can benefit greatly from parallelization.
Moreover, power consumption has recently gained massive attention due to the emergence of mobile devices. As is well known, running real-time text detection and recognition on a standalone device, such as a pair of glasses, will quickly drain the battery. Therefore, we may need to exploit hardware heterogeneity to boost both performance and efficiency.
Currently, we are stuck on choosing a suitable FPGA with enough on-chip storage (nearly 6 MB) for our implementation.
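For a back-of-the-envelope check of a figure in the 6 MB range: with 1 bit per weight, the storage is just the total number of connections divided by 8. The layer sizes below are hypothetical placeholders chosen to land near 6 MB, not our actual network.

```python
# Hypothetical fully connected layer widths (placeholders, not our network).
layers = [4096, 4096, 4096, 4096]

# One bit per weight: total bits = sum of fan_in * fan_out over layer pairs.
bits = sum(a * b for a, b in zip(layers, layers[1:]))
megabytes = bits / 8 / 1024 / 1024
print(f"{bits} binary weights -> {megabytes:.2f} MB of on-chip RAM")
```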
(Completed) Get familiar with training bitwise neural networks and implement a correct baseline algorithm.
(Completed) Tune parameters and implement the retraining process to approach the results reported in the paper.
(Completed) Implement the classification phase in VHDL.
(Completed) Prepare for the final exam and analyze the feasibility of porting the code to hardware.
Port the code to hardware and analyze the results [3].
Write final report and prepare for competition.