Tuesday, 29 August 2017

NN cannot determine the correct output bias

Training a feedforward neural net with one hidden layer always leads to an output value that is shifted by a certain amount. Increasing the number of iterations or the size of the test dataset does not improve the result.

These are some of the parameters I used to build/train the NN.

title Bias
class class vsoc.training.BatchSizeTraining$
learningRate 1.0E-04
trainingData playerpos_x A 500000
batchSizeTrainingDataRelative 0.10
testData playerpos_x A 1000, playerpos_x B 1000, playerpos_x C 1000, playerpos_x D 1000, playerpos_x E 1000
iterations 200
optAlgo STOCHASTIC_GRADIENT_DESCENT
numHiddenNodes 100, 300, 500
regularisation None
seed -751785241836862251
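
The optAlgo values match DeepLearning4j's OptimizationAlgorithm enum, so the nets are presumably built with DL4J. Below is a minimal sketch of how a one-hidden-layer net with the parameters above could be configured, assuming the DL4J 0.x API; nIn and the activation functions are assumptions, as they are not part of the parameter list.

import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.lossfunctions.LossFunctions

// One hidden layer, linear output for regression.
def buildNet(nIn: Int, numHiddenNodes: Int, seed: Long): MultiLayerNetwork = {
  val conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(200)      // iterations, as listed above (0.x API)
    .learningRate(1.0E-4) // learningRate, as listed above
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .list()
    .layer(0, new DenseLayer.Builder()
      .nIn(nIn).nOut(numHiddenNodes)
      .activation(Activation.TANH) // assumption
      .build())
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
      .nIn(numHiddenNodes).nOut(1)
      .activation(Activation.IDENTITY)
      .build())
    .build()
  val net = new MultiLayerNetwork(conf)
  net.init()
  net
}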
 
Testing the trained NNs against different test datasets leads to the following results.



The output value is always shifted by a certain amount, no matter which test dataset is used.
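
That shift can be quantified as the mean difference between prediction and label over a test set. A minimal sketch (outputShift is a hypothetical helper, not part of the training code):

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.api.ndarray.INDArray

// Mean (prediction - label) over a test set: a positive value means the
// output is shifted up by roughly that amount.
def outputShift(net: MultiLayerNetwork, features: INDArray, labels: INDArray): Double =
  net.output(features).sub(labels).meanNumber().doubleValue()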

Open Question

Is there a way to determine the correct output bias while training the NN?

Friday, 25 August 2017

Manual Bias


Parameters

title Manual Bias
class class vsoc.training.BatchSizeTraining$
learningRate 1.0E-04
trainingData playerpos_x A 500000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 1000
iterations 200
optAlgo STOCHASTIC_GRADIENT_DESCENT
numHiddenNodes 100, 200, 300, 500
regularisation None
seed -8876029723576768028, -3375934931721753594, 1696412033065026326, 6283369899114212938, -3587145959788761006, -49633614034392359, 839473069744556995, 2684180910583920968, -7386426029039824287, 3837550597042215702, -4997332816031131463, 3815956045927450641, -3403545420327942156, 3661589999326108919, 2455182172019959569, 3162067559174869087, -1669667726484573068, -397364268861304359, -7913912163456363611, 7909811908685690708, 2311254609887171408, 8435778613472697536, 504319445395541907, -433047265908778508, -5423116490601089927, 8381378201148666576, -6904400535536458560, -4437293041802599002, 5946303504748415202, 7150743982293367765, 2242790324642662994, -3814375723625398089

Manually adjusting the output bias leads to the following result.



Manually added bias values

A_100 -1.1
A_200 -1.35
A_300 -1.5
A_500 1.5
B_100 -0.8
B_200 1.7
B_300 1.6
B_500 1.6
C_100 -1.0
C_200 -1.5
C_300 1.4
C_500 1.6
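
One way to apply such a correction is to shift the bias parameter of the output layer after training. A sketch, assuming a DL4J MultiLayerNetwork where "1_b" is the bias of layer 1, the output layer:

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork

// Add a manually determined delta (e.g. -1.1 for A_100 from the table above)
// to the output layer's bias.
def adjustOutputBias(net: MultiLayerNetwork, delta: Double): Unit = {
  val b = net.getParam("1_b")
  net.setParam("1_b", b.add(delta))
}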


Conclusion

Manually adjusting the output bias leads to better results.

Open Question

Why does the learning algorithm not minimize the output bias?

Bias Regularisation



title Bias Regularisation
class class vsoc.training.RegularisationTraining$
learningRate 1.0E-04
trainingData playerpos_x A 500000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 1000
iterations 200
optAlgo STOCHASTIC_GRADIENT_DESCENT
numHiddenNodes 50, 100, 200
regularisation Some(Regularisation(0.0,0.0,2.0,2.0)), Some(Regularisation(0.0,0.0,1.0,1.0)), Some(Regularisation(0.0,0.0,0.001,0.001)), Some(Regularisation(0.0,0.0,1.0E-5,1.0E-5)), Some(Regularisation(0.0,0.0,0.0,0.0))
seed 3915220284829678
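
The four values of Regularisation are presumably (l1, l2, l1Bias, l2Bias): only the last two vary in this bias experiment, while the first two stay at 0.0. A sketch of how they could map onto the DL4J 0.x builder; the field order is an assumption:

import org.deeplearning4j.nn.conf.NeuralNetConfiguration

case class Regularisation(l1: Double, l2: Double, l1Bias: Double, l2Bias: Double)

// Assumed mapping of the Regularisation values onto the builder.
def withRegularisation(b: NeuralNetConfiguration.Builder, r: Regularisation): NeuralNetConfiguration.Builder =
  b.regularization(true)
    .l1(r.l1).l2(r.l2)
    .l1Bias(r.l1Bias) // regularises only the bias parameters
    .l2Bias(r.l2Bias)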

Conclusion

The expectation was that bias regularisation could fix the wrong bias for nets with more than 100 hidden nodes, but there was no effect.
So far, 100 hidden nodes leads to the best results.

Further Work

Find a way to learn the correct bias.

Tuesday, 22 August 2017

Number of hidden nodes



title Number of hidden nodes
class class vsoc.training.NumHiddenNodesTraining$
learningRate 1.0E-04
trainingData playerpos_x A 500000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 1000
iterations 200
optAlgo STOCHASTIC_GRADIENT_DESCENT
numHiddenNodes 50, 100, 200, 500, 700
seed 4278888856359104810, 4039877965451516678, -4611576926702079432, -1188411120859739005


Conclusion

The variance gets smaller up to 500 hidden nodes.
There is quite a big bias in the error that gets worse the more hidden nodes are used.

Work to do

Try out bias regularisation.

Sunday, 20 August 2017

Regularisation l1 l2



title Regularisation l1 l2
class class vsoc.training.SizeIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 500000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 1000
iterations 200
optAlgo STOCHASTIC_GRADIENT_DESCENT
seed 4228810264686955519

Conclusion

l1/l2 regularisation seems to have no effect.

multiple iterations for different sized datasets




Same data with different scaling of the y-axis.




title multiple iterations for different sized datasets
class class vsoc.training.SizeIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000, playerpos_x A 5000000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 5000
iterations 50, 100, 200, 500
optAlgo STOCHASTIC_GRADIENT_DESCENT
seed 6779276617664099510

Conclusion

The difference in error variation between a dataset of 100k and one of 5M is not very big.
A dataset of 100k with 500 iterations seems to be sufficient for reasonable results.

Test Optimization Algorithm | iterations: 300




CG CONJUGATE_GRADIENT
LB LBFGS
LG LINE_GRADIENT_DESCENT
SG STOCHASTIC_GRADIENT_DESCENT
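
The abbreviations stand for DL4J's OptimizationAlgorithm enum values, which are passed to the configuration builder via optimizationAlgo:

import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.api.OptimizationAlgorithm._

// The four algorithms compared here.
val algos: Seq[OptimizationAlgorithm] =
  Seq(CONJUGATE_GRADIENT, LBFGS, LINE_GRADIENT_DESCENT, STOCHASTIC_GRADIENT_DESCENT)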


title Test Optimization Algorithm | iterations: 300
class class vsoc.training.LearningRateIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 50000, playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 1000
iterations 300
optAlgo CONJUGATE_GRADIENT, LBFGS, LINE_GRADIENT_DESCENT, STOCHASTIC_GRADIENT_DESCENT
seed 8726567394090864187

Conclusion

Stochastic gradient descent seems to be the best optimisation algorithm. Possibly the other algorithms perform better with other meta parameters.

Friday, 18 August 2017

multiple iterations for different sized datasets





title multiple iterations for different sized datasets
class class vsoc.training.SizeIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000, playerpos_x A 5000000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 5000
iterations 1, 50, 100, 200
seed 687977487191482088

Conclusion

A dataset with 1000000 (1M) lines and 200 iterations is enough to train a net with one hidden layer.

test learning rate | iterations: 500





title test learning rate | iterations: 500
class class vsoc.training.LearningRateIterationsTraining$
learningRate 1.0E-02, 1.0E-03, 1.0E-04, 1.0E-05
trainingData playerpos_x A 50000, playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000
batchSizeTrainingDataRelative 0.50
testData playerpos_x B 1000
iterations 500
seed 1730784479442308435

Conclusion

A learning rate of 0.0001 (1.0E-4) seems to be the best for all dataset sizes.

batch size relative with different iterations. dataset size: 1000000


title batch size relative with different iterations. dataset size:1000000
class class vsoc.training.BatchSizeTraining$
learningRate 1.0E-03
trainingData playerpos_x A 1000000
batchSizeTrainingDataRelative 0.10, 0.20, 0.50, 0.80
testData playerpos_x B 1000
iterations 10, 50, 100, 200
seed 2771765897592056378

Conclusion

For a dataset of size 1000000, a small relative batch size like 0.1 seems to be optimal.
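
batchSizeTrainingDataRelative appears to be the mini-batch size as a fraction of the training dataset; under that reading (an assumption), the absolute batch size would be:

// Absolute mini-batch size for a relative batch size parameter (assumed meaning).
def absoluteBatchSize(datasetSize: Int, relative: Double): Int =
  math.max(1, (datasetSize * relative).toInt)

// e.g. absoluteBatchSize(1000000, 0.1) == 100000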

Further Test

- Check if a batch size smaller than 0.1 makes sense
- Check if a small batch size makes sense for smaller datasets, e.g. 100000.

Wednesday, 16 August 2017

test learning rate | iterations: 500


title test learning rate | iterations: 500
class class vsoc.training.LearningRateIterationsTraining$
learningRate 5.0E-04, 1.0E-04, 5.0E-05, 1.0E-05
trainingData playerpos_x A 50000, playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000
batchSizeTrainingDataRelative 0.50
testData playerpos_x B 1000
iterations 500
seed 6741344407080914846, 2973931688859548271, 7003379254294840799, -6932794719115052821, 6080736103119206684, -48874423639357220, 8781826812285936313, 7670089977991256700, 8651878094978452995, -9079235514486697167, -2799581918634965629, 1929973805206056191, -2098837483005116009, 2967323285262997822, 7199009957595655229, -2924705563695555956

Conclusion 

The tested learning rates make no difference to the error.

Further Work

Run the same test with another range of learning rates, e.g. 0.01, 0.001, 0.0001, 0.00001.