Feeding TFRecords to a Custom Training Loop (CTL) with MirroredStrategy
2020-03-09 17:39:56

Yesterday I tried to apply MirroredStrategy to my TF 2.1-based training script, and I ran into tons of unexpected errors.

First I followed the tutorial at https://www.tensorflow.org/tutorials/distribute/custom_training. The example itself worked perfectly. However, the same approach did not carry over to my own code as I expected. I thought the issue was caused by TFRecord. That was partially true, but the real problem was that I had not carefully read the "Alternate ways of iterating over a dataset" section of that tutorial.
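For reference, here is a minimal sketch of one of the iteration styles from that section: distribute a TFRecord dataset and loop over it with a plain Python for loop. This is not my actual training script. The toy file, the feature name "x", and step_fn are made up for illustration, and step_fn only sums the inputs where a real forward/backward pass would go. It uses strategy.run, which is the TF 2.2+ name; in TF 2.1 the same call was strategy.experimental_run_v2.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical toy data: 16 records, each holding a 4-float feature named "x".
path = os.path.join(tempfile.mkdtemp(), "toy.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for i in range(16):
        feature = {
            "x": tf.train.Feature(
                float_list=tf.train.FloatList(value=[float(i)] * 4)
            )
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())


def parse(serialized):
    # Decode one serialized tf.train.Example back into a float vector.
    spec = {"x": tf.io.FixedLenFeature([4], tf.float32)}
    return tf.io.parse_single_example(serialized, spec)["x"]


strategy = tf.distribute.MirroredStrategy()
# Batch with the GLOBAL batch size; the strategy splits it across replicas.
global_batch_size = 4 * strategy.num_replicas_in_sync

dataset = tf.data.TFRecordDataset(path).map(parse).batch(global_batch_size)
dist_dataset = strategy.experimental_distribute_dataset(dataset)


def step_fn(x):
    # Stand-in for the real per-replica training step.
    return tf.reduce_sum(x)


@tf.function
def distributed_step(x):
    per_replica = strategy.run(step_fn, args=(x,))
    # Combine the per-replica results into a single scalar.
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)


total = 0.0
for batch in dist_dataset:  # iterate the distributed dataset directly
    total += float(distributed_step(batch))
```

The point is that the loop consumes dist_dataset, not the original dataset, and each step goes through the strategy.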

Now my script still prints the warning below, but it works fine.

WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices

I use two RTX 2080 Ti cards. Actually, two GPUs are not as powerful as I expected.

When training a speech Transformer on WSJ, it was only about 1.5x faster than training with a single GPU.

With one GPU card: 770 secs for the first epoch and 625 secs for the remaining epochs.

With two GPU cards: 540 secs for the first epoch and 389 secs for the remaining epochs.
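To put a number on "about 1.5x", here is a small sketch that computes the speedup and the scaling efficiency (speedup divided by the number of GPUs) from the timings above. The function names are my own, just for illustration.

```python
def speedup(single_gpu_secs, multi_gpu_secs):
    """How many times faster the multi-GPU run is."""
    return single_gpu_secs / multi_gpu_secs


def scaling_efficiency(single_gpu_secs, multi_gpu_secs, num_gpus):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup(single_gpu_secs, multi_gpu_secs) / num_gpus


# (one GPU, two GPUs) timings in seconds, taken from the numbers above.
timings = {"first epoch": (770, 540), "remaining epochs": (625, 389)}

for name, (one_gpu, two_gpus) in timings.items():
    s = speedup(one_gpu, two_gpus)
    e = scaling_efficiency(one_gpu, two_gpus, num_gpus=2)
    print(f"{name}: {s:.2f}x speedup, {e:.0%} scaling efficiency")

# first epoch: 1.43x speedup, 71% scaling efficiency
# remaining epochs: 1.61x speedup, 80% scaling efficiency
```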

Anyway, I spent the whole Sunday doing this. I'm happy at last but for what..?

about multiple gpus
2020-03-07 21:39:48

When your multi-GPU code is not working, try setting this environment variable.

From https://github.com/tensorflow/tensorflow/issues/36510:

TF_FORCE_GPU_ALLOW_GROWTH=true

Of course, you can also enable this option in your source code.

import tensorflow as tf

# Allocate GPU memory on demand; call this before any GPU is initialized.
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)
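Another option is to set the environment variable from the GitHub issue inside the script itself. A minimal sketch, assuming the variable is set before TensorFlow touches the GPUs; doing it before the import is the safe ordering, so the import is left as a comment here.

```python
import os

# Same effect as exporting TF_FORCE_GPU_ALLOW_GROWTH=true in the shell.
# Set it before TensorFlow initializes the GPUs.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable is set
```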

To check the status of NVLink:

nvidia-smi nvlink --status

From the convex optimization book
2020-02-14 20:11:54

Once the skill of recognizing or formulating convex optimization problems is developed, you will find that surprisingly many problems can be solved via convex optimization.

- convex optimization: https://web.stanford.edu/~boyd/cvxbook/

I believe convex optimization will solve my practical problems too.

can you?
2020-02-11 10:38:47

8 hours of sleep

decaffeination
