This is an experimental code for reproducing [1]'s result using chainer. No optimization is used for binary operations. I just binalize weight and activation at computation, and use a straight through estimator for gradient computation. [1] "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", Matthieu Courbariaux, Yoshua Bengio http://arxiv.org/abs/1602.