1 min readSep 22, 2018
Akshat Pathak , Well, mostly auto encoder like architectures use sigmoid as activation functions. But this being the decoding architecture, I believe the researchers took the liberty and used the non-linearity as it is. Mostly the advantage it offers is the presence of non zero gradients everywhere and hence helps in back-propagation without the risk of “dying “ case!