Researching for my master thesis I tried to understand the paper by Goodfellow et al. on the Maxout Units. I found it very hard understanding the details and thought a clear explanation in combination with a nice figure would be really helpful. So this is my shot at doing so.
Although I knew how to implement an “normal” flat Multilayer Perceptron (MLP), I got very confused when trying to implement a CNN. In the end I wanted to fully understand the math to understand how to get a working implementation. Isn’t this just backpropagation as usual? Well, basically yes, but a bit more complex and with some pitfalls, so read on.