I have had some difficulty finding published training times for MNIST, so I am going to post some of mine and also discuss what my network is doing.

So far in my work on MNIST, I have built a convergent network. Using CPU only, on a Ryzen 7 1700, I can train a convergent network in under five minutes.

I have made a video of a small convergent network which was worked on for a few hours, and here it is:

You may need to go to YouTube to view it full screen, which I recommend.

This network consists of 10 layers, two of which are not shown: the input and output layers.

The input layer is the size of an MNIST image, 784 nodes, and would have a pixel dimension of 7840 in the video, so this layer is excluded.

The output layer has one node per classification. It has 11: one for each digit, plus one for null. This layer is also excluded.

The remaining eight hidden layers are all ten nodes in length and therefore 100 pixels in dimension.

The total number of nodes in the network is 875 (784 + 8 × 10 + 11), with the vast majority of these nodes in the input layer alone.
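The node count can be checked with a few lines of Python. This is only a sketch of the arithmetic, not my actual network code. The weight count assumes plain fully connected layers with one weight per connection and ignores biases, and the function names are made up for illustration.

```python
# Layer sizes as described above: 784 inputs, eight hidden layers of ten
# nodes each, and 11 outputs (ten digits plus null).
layer_sizes = [784] + [10] * 8 + [11]


def count_nodes(sizes):
    """Total number of nodes across all layers."""
    return sum(sizes)


def count_weights(sizes):
    """Connections between adjacent fully connected layers (biases ignored)."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


print(len(layer_sizes))            # 10 layers
print(count_nodes(layer_sizes))    # 875 nodes
print(count_weights(layer_sizes))  # 8650 weights
```

Under that assumption, 7840 of the 8650 weights sit between the input layer and the first hidden layer.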

The network does no convolution; all layers are fully connected. I have been studying convolution for some time and am unsure how necessary it is for image recognition. I have read things on the internet which suggest that there is no relationship between nodes, and that relationships must therefore be created through convolution. This is not true: there are indeed spatial relationships, and this is evident in my videos. I have not yet implemented convolution in C++ or Python, so I cannot compare image recognition with and without convolution. Furthermore, I do not have much experience with Python and am only starting to study things like TensorFlow.
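To make "fully connected" concrete, here is a minimal forward pass through a network of the same shape, written in plain Python. It is a sketch under assumptions, not my C++ implementation: the sigmoid activation, the random weight range, the zero biases, and the random stand-in image are all chosen just for illustration, and the weights are untrained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """One fully connected layer: a weight row per output node, plus biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, inputs):
    """Every node sees every activation of the previous layer."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
sizes = [784] + [10] * 8 + [11]
network = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

image = [rng.random() for _ in range(784)]  # stand-in for a flattened 28x28 image
outputs = forward(network, image)
predicted = max(range(len(outputs)), key=outputs.__getitem__)
print(len(outputs), predicted)
```

Nothing in this code knows that the 784 inputs form a 28 by 28 grid. Any spatial relationships have to show up in the learned weights.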

My initial thought is that convolution revisits many pixels, and I wonder whether this results in more processing speed or recognition power than non-convolution. I do not know yet.
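The revisiting can at least be counted. The sketch below tallies how many times each pixel of a 28 by 28 image is read by a single convolution filter and compares the multiply count with a fully connected layer of ten nodes. The 3 by 3 kernel, stride of 1, lack of padding, and single filter are my assumptions for the sake of the example; it says nothing about recognition power.

```python
def conv_reads_per_pixel(height, width, kernel, stride=1):
    """Count how many kernel windows cover each pixel (no padding)."""
    counts = [[0] * width for _ in range(height)]
    for top in range(0, height - kernel + 1, stride):
        for left in range(0, width - kernel + 1, stride):
            for dy in range(kernel):
                for dx in range(kernel):
                    counts[top + dy][left + dx] += 1
    return counts


counts = conv_reads_per_pixel(28, 28, kernel=3)

# One multiply per (window, pixel) pair for a single filter.
conv_multiplies = sum(map(sum, counts))

# A fully connected layer of ten nodes reads every pixel once per node.
dense_multiplies = 28 * 28 * 10

print(counts[0][0])      # 1: a corner pixel is covered by one window
print(counts[14][14])    # 9: an interior pixel is covered by nine windows
print(conv_multiplies)   # 6084
print(dense_multiplies)  # 7840
```

So under these assumptions one filter does a similar amount of arithmetic to my first layer, but it reuses the same nine weights at every position instead of learning 7840 separate ones. Real convolutional networks use many filters, so the totals would scale accordingly.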