跳转至

AI算子优化

Problem

You've entered an AI laboratory in a university.

There are many commonly used operators in the field of deep learning, and what you need to do today is to optimize three of them.

The code of your project is included in problem attachments.

matmul

Calculate the result of multiplying two 2D matrices.

You can modify the macro MM_KERNEL_SIMPLE in /src/math/matmul_simple.c to optimize the operator.

If you want to test the operator yourself, you can run this

cd build
cmake ..
make
cd bin
./matmul_simple_test ../../test/test_data/matmul/2_case_32_1024_32_float/input1.dat \
    ../../test/test_data/matmul/2_case_32_1024_32_float/input2.dat \
    ../../test/test_data/matmul/2_case_32_1024_32_float/output.dat

conv2d

Calculate two-dimensional convolution based on 4D input tensor(N, C_in, H, W) and 4D filter(C_in, C_out, K, K).

You can modify the macro CONV2D_KERNEL_SIMPLE in /src/nn/conv2d_simple.c to optimize the operator.

If you want to test the operator yourself, you can run this

cd build
cmake ..
make
cd bin
./conv2d_simple_test ../../test/test_data/conv2d/1_case_2_2_16_16_float/input.dat \
    ../../test/test_data/conv2d/1_case_2_2_16_16_float/filter.dat \
    ../../test/test_data/conv2d/1_case_2_2_16_16_float/output.dat

resize2d_bilinear

Resize 2D images with bilinear interpolation.

You can modify the macro resize2d_bilinear_kernel in /src/nn/resize2d_bilinear.c to optimize the operator.

If you want to test the operator yourself, you can run this

cd build
cmake ..
make
cd bin
./resize2d_bilinear_test \
    ../../test/test_data/resize2d_bilinear/1_case_1024_1024_512_512_float/input.dat \
    ../../test/test_data/resize2d_bilinear/1_case_1024_1024_512_512_float/output.dat \
    ../../test/test_data/resize2d_bilinear/1_case_1024_1024_512_512_float/shape.dat  

How to submit

Put the three macros in one header file and submit it.

A sample submission: (submit this directly and you will get ~20pts)

#ifndef _ANSWER_H_
#define _ANSWER_H_
#pragma GCC optimize("O3")

// kernel of resize2d_bilinear
#define resize2d_bilinear_kernel(typename)                                       \
    typename *in_data = aitisa_tensor_data(input);                               \
    typename *out_data = aitisa_tensor_data(*output);                            \
    for (int64_t i = 0; i < target_h; i++)                                       \
    {                                                                            \
        for (int64_t j = 0; j < target_w; j++)                                   \
        {                                                                        \
            double raw_u = i * (double)h / (double)target_h;                     \
            double raw_v = j * (double)w / (double)target_w;                     \
            int64_t u = (int64_t)raw_u;                                          \
            int64_t v = (int64_t)raw_v;                                          \
            if (u + 1 == h || v + 1 == w)                                        \
            {                                                                    \
                out_data[i * target_w + j] = in_data[u * w + v];                 \
                continue;                                                        \
            }                                                                    \
            typename f00 = in_data[u * w + v];                                   \
            typename f01 = in_data[u * w + v + 1];                               \
            typename f10 = in_data[(u + 1) * w + v];                             \
            typename f11 = in_data[(u + 1) * w + v + 1];                         \
            double x = raw_u - u;                                                \
            double y = raw_v - v;                                                \
            out_data[i * target_w + j] = f00 * (1 - x) * (1 - y) +               \
                                         f01 * (1 - x) * y + f10 * x * (1 - y) + \
                                         f11 * x * y;                            \
        }                                                                        \
    }

// kernel of conv2d_simple
#define CONV2D_KERNEL_SIMPLE(typename, A, B, C, N, C_in, H, W, C_out, K)                                                          \
    int H_out = H - K + 1;                                                                                                        \
    int W_out = W - K + 1;                                                                                                        \
    for (int n = 0; n < N; n++)                                                                                                   \
    {                                                                                                                             \
        for (int c = 0; c < C_out; c++)                                                                                           \
        {                                                                                                                         \
            for (int i = 0; i < H_out; i++)                                                                                       \
            {                                                                                                                     \
                for (int j = 0; j < W_out; j++)                                                                                   \
                {                                                                                                                 \
                    int offset_output = n * C_out * H_out * W_out + c * H_out * W_out + i * W_out + j;                            \
                    for (int kc = 0; kc < C_in; kc++)                                                                             \
                    {                                                                                                             \
                        for (int ki = 0; ki < K; ki++)                                                                            \
                        {                                                                                                         \
                            for (int kj = 0; kj < K; kj++)                                                                        \
                            {                                                                                                     \
                                int offset_input = n * C_in * H * W + kc * H * W + (i + ki) * W + (j + kj);                       \
                                int offset_filter = c * C_in * K * K + kc * K * K + ki * K + kj;                                  \
                                ((typename *)C)[offset_output] += ((typename *)A)[offset_input] * ((typename *)B)[offset_filter]; \
                            }                                                                                                     \
                        }                                                                                                         \
                    }                                                                                                             \
                }                                                                                                                 \
            }                                                                                                                     \
        }                                                                                                                         \
    }

// which means:
// C[n, c, i, j] += A[n, kc, i + ki, j + kj] * B[c, kc, ki, kj]

// kernel of matrix-matrix multiply
#define MM_KERNEL_SIMPLE(typename, A, B, C, M, K, N)                         \
    for (int i = 0; i < M; ++i)                                              \
    {                                                                        \
        for (int j = 0; j < N; ++j)                                          \
        {                                                                    \
            for (int q = 0; q < K; ++q)                                      \
            {                                                                \
                ((typename *)C)[i * N + j] +=                                \
                    ((typename *)A)[i * K + q] * ((typename *)B)[q * N + j]; \
            }                                                                \
        }                                                                    \
    }

#endif

How do we judge

For each operator, we have 5 test cases.

conv2d

Zero Score Time (secs) Full Score Time (secs) Full Score
0.06 0.02 3
0.81 0.24 10
0.06 0.02 3
0.78 0.25 10
0.75 0.24 10

matmul

Zero Score Time (secs) Full Score Time (secs) Full Score
0.75 0.24 10
0.06 0.02 3
0.24 0.08 10
3.85 1.24 14
1.93 0.62 12

resize2d_bilinear

We will exclude io time.

Zero Score Time (secs) Full Score Time (secs) Full Score
0.0095 0.0035 10
0.0165 0.0057 10
0.004 0.0015 10
0.065 0.025 15
0.024 0.0085 10

We use the function

S_{score}(t)=\frac{\ln \frac{t_{zero\_score}}{t}}{\ln \frac{t_{zero\_score}}{t_{full\_score}}}S_{full\_score}

to calculate your score.

附件

The project code

下载project_code.zip