tensorflow operations (matmul vs einsum, maximum vs maxout) - Computer

# The results from einsum and matmul (or multiply) differ slightly, but the difference is negligible (float32 rounding level).

# The Maxout layer from tensorflow_addons returns exactly the same result as tf.maximum.

import tensorflow as tf
import tensorflow_addons as tfa

batch = 50
seq_len = 100
in_h = 60
out_h = 30
in_d = 20
out_d = 10

emb = tf.random.normal([batch, seq_len, in_h, in_d], stddev=0.1)
wgt = tf.random.normal([1, 1, in_h, out_h, out_d, in_d], stddev=0.1)
v = tf.random.normal([batch, 1, out_h, out_d, 1], stddev=0.1)

# Naive path: expand and tile emb/wgt so that plain matmul contracts over in_d.
caps1_ex = tf.expand_dims(tf.expand_dims(emb, -1), 3)
caps1_ex_tiled = tf.tile(caps1_ex, [1, 1, 1, out_h, 1, 1])
u_hats = tf.matmul(tf.tile(wgt, [batch, seq_len, 1, 1, 1, 1]), caps1_ex_tiled)
naive_u_hats = tf.reshape(u_hats, [batch, seq_len, in_h, out_h, out_d, 1])

# One routing-style step on a single timestep (index 1): agreement scores, softmax over out_h, weighting.
u_hat = naive_u_hats[:, 1, :, :, :, :]
naive_b = tf.matmul(u_hat, tf.tile(v, [1, in_h, 1, 1, 1]), transpose_a=True)
c = tf.nn.softmax(naive_b, axis=2)
naive_s = tf.multiply(c, u_hat)

naive_u_hats = tf.squeeze(naive_u_hats)
naive_b = tf.squeeze(naive_b)
naive_s = tf.squeeze(naive_s)

# naive_u_hats, naive_b, naive_s
wgt = tf.squeeze(wgt)
v = tf.squeeze(v)

# einsum path: contract wgt [in_h, out_h, out_d, in_d] with emb over in_d in a single call.
u_hats = tf.einsum("ijkl,bsil->bsijk", wgt, emb)
einsum_u_hats = tf.reshape(u_hats, [batch, seq_len, in_h, out_h, out_d])
u_hat = einsum_u_hats[:, 1, :, :, :]
einsum_b = tf.einsum("biod,bod->bio", u_hat, v)
c = tf.nn.softmax(einsum_b, axis=2)
einsum_s = tf.einsum("bmi,bmij->bmij", c, u_hat)

tf.print(tf.reduce_sum(tf.abs(einsum_u_hats) - tf.abs(naive_u_hats))) # -4.99396774e-05
tf.print(tf.reduce_sum(tf.abs(einsum_b) - tf.abs(naive_b))) # -1.51097353e-07
tf.print(tf.reduce_sum(tf.abs(einsum_s) - tf.abs(naive_s))) # -2.60405955e-07
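
# A minimal standalone sketch (my toy example, not part of the experiment above):
# matmul and einsum compute the same contraction, but they may dispatch to
# different kernels and reduce in a different order, so float32 results can
# disagree in the last bits.
a = tf.random.normal([256, 512])
b = tf.random.normal([512, 128])
tf.print(tf.reduce_max(tf.abs(tf.matmul(a, b) - tf.einsum("ij,jk->ik", a, b)))) # tiny, but often nonzero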

dense = tf.keras.layers.Dense(units=out_h)
ecs = tf.keras.layers.Conv2D(filters=50, kernel_size=(3, 3),
                             activation='linear', padding='same', strides=1)
mo = tfa.layers.Maxout(num_units=25, axis=-1)

emb = tf.expand_dims(dense(tf.reshape(emb, [batch, seq_len, in_h * in_d])), -1)
emb = ecs(emb)
y_maxout = mo(emb)
y_maximum = tf.maximum(emb[:, :, :, :25], emb[:, :, :, 25:])

tf.print(tf.reduce_sum(tf.abs(y_maxout) - tf.abs(y_maximum))) # 0
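
# For reference, a sketch of why they match (my reading of the tfa implementation,
# worth double-checking against the version you use): Maxout(num_units=25, axis=-1)
# reshapes the 50 channels to [..., 2, 25] and takes the max over the group axis,
# which pairs channel i with channel i + 25 -- the same pairing as the slicing above.
x = tf.random.normal([4, 8, 8, 50])
by_reshape = tf.reduce_max(tf.reshape(x, [4, 8, 8, 2, 25]), axis=3)
by_slicing = tf.maximum(x[:, :, :, :25], x[:, :, :, 25:])
tf.print(tf.reduce_sum(tf.abs(by_reshape - by_slicing))) # 0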

written time : 2020-12-03 07:37:20.0

tf.keras.layers.RNN calls the cell on the first timestep of the time series twice - Computer

url: https://github.com/tensorflow/tensorflow/issues/30227
https://github.com/tensorflow/tensorflow/issues/30227#issuecomment-506783788

Digging deeper, I found that the first timestep is also used to determine the cell's output shape and dtype.
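
A minimal sketch that makes the extra call visible (CountingCell is a throwaway cell I wrote just for this, assuming TF 2.x eager execution):

import tensorflow as tf

class CountingCell(tf.keras.layers.Layer):
    # Throwaway cell: identity output, counts how often call() runs.
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units
        self.calls = 0

    def call(self, inputs, states):
        self.calls += 1
        return inputs, states

cell = CountingCell(units=4)
layer = tf.keras.layers.RNN(cell)
layer(tf.random.normal([2, 5, 4])) # batch=2, 5 timesteps, features=4
print(cell.calls) # expect 6: 5 loop steps + 1 extra call on timestep 0 for shape/dtype inference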

written time : 2020-11-30 20:50:17.0

nbody RTX 3090 - Computer

(py3-tf2-gpu) sephiroce@bike:/usr/local/cuda/samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=2560000 -device=0
Run "nbody -benchmark [-numbodies=]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies= (number of bodies (>= 1) to run in simulation)
-device= (where d=0,1,2.... for the CUDA device to use)
-numdevices= (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy= (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Ampere
> Compute 8.6 CUDA device: [GeForce RTX 3090]
number of bodies = 2560000
2560000 bodies, total time for 10 iterations: 69005.547 ms
= 949.721 billion interactions per second
= 18994.415 single-precision GFLOP/s at 20 flops per interaction
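
Sanity-checking the reported numbers (my arithmetic, not part of the sample's output): an all-pairs n-body step does numbodies^2 interactions per iteration, and the sample counts 20 flops per interaction.

n, iters, ms = 2560000, 10, 69005.547
ips = n * n * iters / (ms / 1000.0)  # ~9.497e11 interactions/s = 949.7 billion
print(ips, ips * 20 / 1e9)           # ~18994 GFLOP/s, matching the printout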

written time : 2020-09-27 02:34:49.0