tags: Chainer Hack

Assignment Analysis to Semi-Automatically Find Chainer Bugs

Preface

As a system software researcher working for an (you know, one of many) “artificial intelligence research center”, I use Chainer to explore what kinds of system characteristics and support real AI applications need. Chainer is really good for this purpose because the framework itself is really simple, so it is easy to hack as you wish.

Although the framework is intensively maintained, I sometimes happen to find bugs, especially when I use it in a slightly different way than usual. This post explains a tiny tiny idea I came up with to (kind of) semi-automatically find a certain type of bug in Chainer.

The Idea

So the idea is: “the forward and the backward code for the same function are supposed to do similar things, especially for preparation”. For example, both forward and backward of the linear function convert the first input into a matrix and assign the result to x (x = _as_mat(inputs[0])), and assign the second input to W (W = inputs[1]).
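
For illustration, here is a heavily abridged sketch of that preparation (only the two assignments mentioned above are kept; the real implementation does much more):

# Abridged sketch of the preparation in the linear function.
def forward(self, inputs):
    x = _as_mat(inputs[0])   # first input reshaped into a matrix
    W = inputs[1]
    ...

def backward(self, inputs, grad_outputs):
    x = _as_mat(inputs[0])   # the same preparation as in forward
    W = inputs[1]
    ...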

Given this idea, I extracted all the assignments to each variable and compared the extracted assignments between the forward and backward code. If a variable with the same name appears in both forward and backward but with different assignments, it might be a potential bug. In the linear example, x has the same assignments in forward and backward, which implies the preparation has no bug (the same applies to W).

Bugs It Found

Let’s see how it works. I wrote a small script, chainer_dataflow.py, that extracts the assignments and compares them. You have to set the names of the forward and backward functions by hand (l13 and l15), depending on whether they are vanilla forward/backward, forward_cpu/backward_cpu, or forward_gpu/backward_gpu.
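
The actual script is not reproduced in this post, but a minimal sketch of the same kind of analysis, built on Python's ast module, could look like the following. This is not the real chainer_dataflow.py: the FORWARD/BACKWARD constants and the helper names are illustrative, and they play the role of the hand-set function names mentioned above.

import ast
import sys

# The function names to compare; set these by hand, e.g. 'forward'/'backward',
# 'forward_cpu'/'backward_cpu', or 'forward_gpu'/'backward_gpu'.
FORWARD = 'forward_gpu'
BACKWARD = 'backward_gpu'


def find_function(tree, name):
    """Return the FunctionDef node with the given name (None if absent)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return node
    return None


def collect_assignments(func, lines):
    """Map each assigned variable name to the (lineno, text) pairs that assign it."""
    assigns = {}
    for node in ast.walk(func):
        if not isinstance(node, ast.Assign):
            continue
        text = lines[node.lineno - 1].strip()
        for target in node.targets:
            for name in ast.walk(target):
                if isinstance(name, ast.Name):
                    assigns.setdefault(name.id, []).append((node.lineno, text))
    return assigns


def main(path):
    source = open(path).read()
    lines = source.splitlines()
    tree = ast.parse(source)
    fwd = collect_assignments(find_function(tree, FORWARD), lines)
    bwd = collect_assignments(find_function(tree, BACKWARD), lines)
    for var in sorted(set(fwd) & set(bwd)):
        # Compare only the assignment text; the line numbers differ by construction.
        if {t for _, t in fwd[var]} != {t for _, t in bwd[var]}:
            print('different data flow! (', var, ')')
            print('forward:')
            for lineno, text in sorted(fwd[var]):
                print(lineno, text)
            print('backward:')
            for lineno, text in sorted(bwd[var]):
                print(lineno, text)
            print('-' * 50)


if __name__ == '__main__':
    main(sys.argv[1])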

Clone the Chainer repository and check it out at a commit from before the bug I found with this method was fixed. After that, apply my script to chainer/chainer/functions/connection/deconvolution_2d.py.

$ git clone https://github.com/chainer/chainer.git && cd chainer
$ git checkout e6a7ec62773f0df0e3e0
$ ~/chainer_dataflow.py chainer/functions/connection/deconvolution_2d.py
different data flow! ( b )
forward:
111 b = inputs[2] if len(inputs) == 3 else None
137 b = cuda.cupy.ascontiguousarray(b)
backward:
228 b = inputs[2] if len(inputs) == 3 else None
--------------------------------------------------
different data flow! ( kh )
forward:
123 kh, kw = W.shape[2:]
backward:
242 _, out_channels, kh, kw = W.shape
--------------------------------------------------
different data flow! ( kw )
forward:
123 kh, kw = W.shape[2:]
backward:
242 _, out_channels, kh, kw = W.shape
--------------------------------------------------
different data flow! ( c )
forward:
125 c = W.shape[1]  # out_c
backward:
243 c, h, w = gy.shape[1:]
--------------------------------------------------
different data flow! ( algo )
forward:
160 algo = libcudnn.getConvolutionBackwardDataAlgorithm(
165 algo = cuda.cupy.cuda.cudnn.CUDNN_CONVOLUTION_BWD_DATA_ALGO_1  # NOQA
backward:
258 algo = libcudnn.getConvolutionForwardAlgorithm(
283 algo = libcudnn.getConvolutionBackwardFilterAlgorithm(
288 algo = cuda.cupy.cuda.cudnn.CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1  # NOQA
--------------------------------------------------

There are many outputs, but (unfortunately) only the first one (b) is relevant here. The output shows that, in forward, b is assigned from inputs[2] on line 111 and made C-contiguous on line 137. In backward, however, b is assigned on line 228 and that’s it; there is no conversion to C-contiguous, which is a bug (#2666).
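
In other words, the fix presumably amounts to adding the same conversion to backward, roughly along these lines (illustrative only, not the actual patch in #2666):

# In backward (illustrative): make b C-contiguous, as forward already does.
b = inputs[2] if len(inputs) == 3 else None
if b is not None:
    b = cuda.cupy.ascontiguousarray(b)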

In the same way, it can also find a similar bug such as #2582. Before trying it, do not forget to set l13 and l15 of chainer_dataflow.py to forward and backward, instead of forward_gpu and backward_gpu. That fix is actually the one that motivated me to try this idea.

Here’s another example:

$ git checkout e6a7ec62773f0df0 # same commit as the above
$ ~/chainer_dataflow.py chainer/functions/connection/dilated_convolution_2d.py
...
(bunch of irrelevant things ...)
...
--------------------------------------------------
different data flow! ( x_desc )
forward:
133 x_desc = cudnn.create_tensor_descriptor(xji)
backward:
247 x_desc = cudnn.create_tensor_descriptor(x)
--------------------------------------------------

In this case, x_desc is assigned tensor descriptors created from different tensors, which turned out to be not a critical bug but a naming inconsistency (#2665).

Limitations and a Potential Extension

Because both the idea and the script are very simple, of course there are many limitations. One obvious limitation is that it yields a loooot of false positives. It might be useful to define a threshold of “relevant difference level”.
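
For example, instead of reporting every textual difference, the script could score how different the two assignment sets are and only report variables above some cutoff. A crude, purely illustrative way to do that with the standard library (difference_level and THRESHOLD are made-up names, not part of any existing script):

import difflib


def difference_level(forward_texts, backward_texts):
    """Return a score in [0, 1]: 0 means identical assignments, 1 means totally different."""
    fwd = '\n'.join(sorted(forward_texts))
    bwd = '\n'.join(sorted(backward_texts))
    return 1.0 - difflib.SequenceMatcher(None, fwd, bwd).ratio()


THRESHOLD = 0.3  # arbitrary cutoff; would need tuning against known bugs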

However, the aim of this post is not to be like “a research paper that claims super novelty”, but to share the idea with other people in the hope that they may come up with a more clever idea based on mine, which would benefit the whole community.

A possible way to extend the idea that I have in mind is to compare the code between forward_cpu and forward_gpu, in addition to between forward and backward. This is based on the thought that some preparation code must be shared by the CPU code and the GPU code. For example, #2589 fixed missing assertions in the GPU code that already existed in the CPU code.
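
With the sketch from earlier, this mostly amounts to comparing additional pairs of function names, and also collecting assertions, since bugs like #2589 come from missing assertions rather than missing assignments. Again, this is just an illustrative sketch (PAIRS_TO_COMPARE and collect_assertions are my own names):

import ast

# Pairs of functions whose preparation code is expected to look similar.
PAIRS_TO_COMPARE = [
    ('forward_cpu', 'backward_cpu'),
    ('forward_gpu', 'backward_gpu'),
    ('forward_cpu', 'forward_gpu'),      # shared preparation/assertions expected
    ('backward_cpu', 'backward_gpu'),
]


def collect_assertions(func, lines):
    """Collect the source text of every assert statement inside a function."""
    return {
        lines[node.lineno - 1].strip()
        for node in ast.walk(func)
        if isinstance(node, ast.Assert)
    }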