I'm trying to implement the backpropagation algorithm; I've derived the algorithm myself instead of just copying something from a website and plugging and chugging it.

pbuk said:
I think you should have made it clearer that you are not attempting to implement the backpropagation algorithm on the Wikipedia page; you are trying to implement an algorithm you have invented yourself. This algorithm will never work, as it is based on a fundamental misunderstanding:
It is not. It is the difference between the current values of the coefficients in layer ##l## and their unknown optimum values.
Here is my recommendation:
- study the subject so that you understand how it works; I would never recommend Wikipedia for this, and in this case neuralnetworksanddeeplearning.com is a great free resource (which looks to have been quoted wholesale in Wikipedia, but why rely on the accuracy of that quote when you can go to the original?)
- study how scikit-learn does it: https://github.com/scikit-learn/sci...earn/neural_network/_multilayer_perceptron.py
- implement the standard algorithm (or some other proven algorithm) using Python structures designed for the purpose, with the layers represented by a list of numpy arrays (see the sketch just below this quote)

Once you have done this you can start designing and implementing your own 'improved' supervised learning algorithm.
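To make that last point concrete, here is a minimal sketch of the list-of-numpy-arrays representation; the layer sizes, the sigmoid activation, and the column-wise batch layout are just assumptions for illustration, not taken from pbuk's post or from scikit-learn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs, one hidden layer of 5 neurons, 3 outputs.
sizes = [4, 5, 3]

# The network as lists of numpy arrays:
# weights[l] has shape (sizes[l+1], sizes[l]); biases[l] has shape (sizes[l+1], 1).
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a):
    # a holds one example per column: shape (sizes[0], batch_size).
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)  # b broadcasts across the columns of the batch
    return a

x = rng.standard_normal((4, 10))  # a batch of 10 column-vector inputs
print(feedforward(x).shape)       # -> (3, 10)
```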
When I take the derivative of the cost function with respect to one of the weights in the output layer, I get a gradient ##\nabla_{a}C## that is a matrix after applying the general chain rule to the cost function. For some reason, the algorithm found at http://neuralnetworksanddeeplearning.com and on Wikipedia treats the same gradient ##\nabla_{a}C## as a vector. This doesn't make any sense to me, unless they are applying the cost function to one example at a time. But why would they do that? The cost function is a function of all the examples in the output matrix.
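To illustrate what I mean, here is a minimal sketch assuming the quadratic cost and examples stored column-wise (the array shapes are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical batch: 3 output neurons, n = 5 examples stored column-wise.
# A holds the output activations, Y the target outputs.
A = rng.random((3, 5))
Y = rng.random((3, 5))
n = A.shape[1]

# Batch quadratic cost: C = 1/(2n) * sum over examples x of ||a(x) - y(x)||^2.
C = np.sum((A - Y) ** 2) / (2 * n)

# Differentiating C with respect to every output activation gives a matrix...
grad_matrix = (A - Y) / n

# ...whose x-th column is the per-example vector a(x) - y(x), scaled by 1/n.
for x in range(n):
    assert np.allclose(grad_matrix[:, x], (A[:, x] - Y[:, x]) / n)
```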
It will take some extra effort on my part, but I can show you how I derive ##\nabla_{a}C## as a matrix, simply by applying the chain rule when taking the derivative of the cost ##C## with respect to one of the weights in the output layer. ##\nabla_{a}C## is a vector only if you treat the cost function ##C## as a function of a single output vector. I could make a separate thread about this, because I believe I have now identified it as the issue in my program. However, I don't understand why it's an issue, since ##C## is a function of the whole output matrix, not just one output vector.
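If the cost is the usual average over the ##n## training examples, linearity of the derivative ties the two pictures together:

$$C=\frac{1}{n}\sum_{x} C_x,\qquad C_x=\tfrac12\big\lVert a^{L}(x)-y(x)\big\rVert^2 \;\Longrightarrow\; \frac{\partial C}{\partial w^{L}_{jk}}=\frac{1}{n}\sum_{x}\frac{\partial C_x}{\partial w^{L}_{jk}},$$

so each column of the matrix form of ##\nabla_{a}C## is just ##\tfrac{1}{n}\nabla_{a}C_x## for one example ##x##, which is the per-example vector the references work with.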