Welcome to JiKe DevOps Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - What is the proper way to benchmark part of a TensorFlow graph?

I want to benchmark part of a graph. For simplicity, I use conv_block here, which is just a 3x3 convolution.

  1. Is it OK to reuse the same x_np in every loop iteration, or do I need to regenerate it each time?
  2. Do I need a 'warm-up' run before the actual benchmark (this seems to be needed when benchmarking on a GPU)? How do I do it properly? Is sess.run(tf.global_variables_initializer()) enough?
  3. What is the proper, i.e. more precise, way of measuring time in Python?
  4. Do I need to reset some system cache on Linux before running the script (or is disabling np.random.seed sufficient)?

Example code:

import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here

    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c

    with tf.variable_scope('var_scope'):
        w_0 = tf.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.contrib.layers.xavier_initializer())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')

    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels

    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)

    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)

    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())

        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))

        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))


if __name__ == '__main__':
    run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)

1 Answer


To answer your primary question, 'What is the proper way to benchmark part of tensorflow graph?':

TensorFlow includes an abstract class that provides helpers for TensorFlow benchmarks: tf.test.Benchmark.

A Benchmark object can therefore be created and used to benchmark part of a TensorFlow graph. In the code below, a Benchmark object is instantiated and its run_op_benchmark method is called. run_op_benchmark is passed the session, the tensor to evaluate (here z_tf, the output of conv_block), a feed_dict, the number of burn iterations, the minimum number of timed iterations, a boolean flag that keeps the benchmark from also recording memory usage, and a convenient name. The method returns a dictionary containing the benchmark results:

benchmark = tf.test.Benchmark()
results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf, 
                                     feed_dict={x_tf: x_np}, burn_iters=2, 
                                     min_iters=n_iter, 
                                     store_memory_usage=False, name='example')
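
For intuition, the pattern behind this call (a few untimed burn iterations, then individually timed runs summarized by the median) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not TensorFlow's implementation: time_callable and its parameter names are hypothetical, and time.perf_counter is a substitution made here for illustration.

```python
import statistics
import time


def time_callable(fn, burn_iters=2, min_iters=10):
    """Return the median wall time (seconds) of fn() over min_iters runs.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    # Untimed warm-up runs absorb one-off costs (lazy allocation, caches).
    for _ in range(burn_iters):
        fn()

    # Time each run separately so the median can damp outliers.
    deltas = []
    for _ in range(min_iters):
        start = time.perf_counter()
        fn()
        deltas.append(time.perf_counter() - start)

    return statistics.median(deltas)


if __name__ == '__main__':
    # Stand-in workload; in the example above this would be a call such as
    # lambda: sess.run(z_tf, feed_dict={x_tf: x_np}).
    print('median seconds:', time_callable(lambda: sum(range(100000))))
```

Timing each iteration on its own, rather than timing the whole loop once, is what makes a median (or any other per-run statistic) available afterwards.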

This block can be inserted into your code as follows to compare the two benchmarks:

import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here

    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c

    with tf.compat.v1.variable_scope('var_scope'):
        # Use the compat.v1 API for get_variable as well, to match variable_scope
        w_0 = tf.compat.v1.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.keras.initializers.glorot_normal())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')

    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels

    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)

    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)

    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())

        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))

        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))

        # USING TENSORFLOW BENCHMARK
        benchmark = tf.test.Benchmark()
        results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf, 
                                             feed_dict={x_tf: x_np}, burn_iters=2, min_iters=n_iter,
                                             store_memory_usage=False, name='example')

        return results


if __name__ == '__main__':
    results = run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)

The fact that the TensorFlow library ships its own benchmarking class also hints at the answers to your other questions.

In regards to question 1), 'Is it ok that x_np used in the loop is the same or I need to regenerate it each time?': the TensorFlow implementation does not require a new feed_dict for each benchmark iteration, so it appears to be OK to reuse the same x_np on every loop iteration.

In regards to question 2), some 'warm-up' does appear to be necessary. The default number of burn iterations in the TensorFlow implementation is 2.

In regards to question 3), timeit is an excellent tool for measuring the execution time of small code snippets. However, the TensorFlow library itself uses time.time() in much the same way as you have done: see run_op_benchmark (source). Interestingly, the TensorFlow benchmark implementation reports the median rather than the mean of the per-iteration wall times (presumably to make the benchmark more robust to outliers).
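
As a rough illustration of the timeit suggestion, the sketch below takes one wall-time sample per repetition with timeit.repeat and reports both the mean and the median. The workload function, the summarize helper and the repetition counts are assumptions chosen for illustration; in the real benchmark the timed callable would wrap the sess.run call.

```python
import statistics
import timeit


def workload():
    # Stand-in for sess.run(...); any callable can be timed the same way.
    return sum(i * i for i in range(50000))


def summarize(fn, warmup=2, repeat=20):
    """Time fn once per repetition and return (mean, median) in seconds."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls
    # number=1 gives one wall-time sample per repetition;
    # timeit uses time.perf_counter as its default timer.
    samples = timeit.repeat(fn, repeat=repeat, number=1)
    return statistics.mean(samples), statistics.median(samples)


if __name__ == '__main__':
    mean_s, median_s = summarize(workload)
    print('mean   %.6f s' % mean_s)
    print('median %.6f s' % median_s)
```

If the two numbers differ noticeably, a few slow iterations are probably pulling the mean up, which is the situation in which the median is the more stable figure to report.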

