
Sunday, August 20, 2017

Regularisation l1 l2



title Regularisation l1 l2
class vsoc.training.SizeIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 500000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 1000
iterations 200
optAlgo STOCHASTIC_GRADIENT_DESCENT
seed 4228810264686955519

Conclusion

L1/L2 regularisation seems to have no effect on the test error.
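
The optAlgo values in these posts match DL4J's OptimizationAlgorithm enum, so the runs are presumably DL4J-based. Below is a minimal sketch of how L1/L2 regularisation would be switched on in a DL4J 0.x configuration; the layer sizes, loss function and coefficients are assumptions, not the actual vsoc.training setup.

    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration
    import org.deeplearning4j.nn.conf.layers.OutputLayer
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.lossfunctions.LossFunctions

    // Regularisation must be enabled globally before the l1/l2
    // coefficients have any effect on the layers.
    val conf = new NeuralNetConfiguration.Builder()
      .seed(4228810264686955519L)
      .learningRate(1e-4)
      .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
      .iterations(200)
      .regularization(true)
      .l1(1e-4) // assumed coefficient
      .l2(1e-4) // assumed coefficient
      .list()
      .layer(0, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
        .nIn(42).nOut(1) // assumed sizes for the playerpos_x regression
        .activation(Activation.IDENTITY)
        .build())
      .build()

If neither coefficient shifts the test error, the net is probably not overfitting in the first place, which would explain the result.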

multiple iterations for different sized datasets




Same data with different scaling of the y-axis.




title multiple iterations for different sized datasets
class vsoc.training.SizeIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000, playerpos_x A 5000000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 5000
iterations 50, 100, 200, 500
optAlgo STOCHASTIC_GRADIENT_DESCENT
seed 6779276617664099510

Conclusion
The error variation differs only slightly between a dataset of 100k and one of 5M.
A dataset of 100k with 500 iterations seems to be sufficient for reasonable results.
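
The grid behind this experiment is just the cross product of dataset sizes and iteration counts. A sketch in Scala, with runExperiment as a hypothetical stand-in for the vsoc.training run:

    // Hypothetical helper: trains on `size` examples for `iterations`
    // optimisation steps and returns the error on the test set.
    def runExperiment(size: Int, iterations: Int): Double = ???

    val sizes = Seq(100000, 500000, 1000000, 5000000)
    val iters = Seq(50, 100, 200, 500)

    // One training run per (dataset size, iteration count) combination.
    for (s <- sizes; i <- iters)
      println(s"size=$s iterations=$i error=${runExperiment(s, i)}")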

Test Optimization Algorithm | iterations: 300




Abbreviations used in the plot:

CG = CONJUGATE_GRADIENT
LB = LBFGS
LG = LINE_GRADIENT_DESCENT
SG = STOCHASTIC_GRADIENT_DESCENT


title Test Optimization Algorithm | iterations: 300
class vsoc.training.LearningRateIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 50000, playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 1000
iterations 300
optAlgo CONJUGATE_GRADIENT, LBFGS, LINE_GRADIENT_DESCENT, STOCHASTIC_GRADIENT_DESCENT
seed 8726567394090864187

Conclusion

Stochastic gradient descent seems to be the best optimisation algorithm. Possibly the other algorithms would perform better with different meta parameters.
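
The four abbreviations map one-to-one onto DL4J's OptimizationAlgorithm enum. A sketch of a comparison loop that varies only the optimiser (not the actual vsoc.training code):

    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration

    val algos = Seq(
      OptimizationAlgorithm.CONJUGATE_GRADIENT,           // CG
      OptimizationAlgorithm.LBFGS,                        // LB
      OptimizationAlgorithm.LINE_GRADIENT_DESCENT,        // LG
      OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)  // SG

    for (algo <- algos) {
      // Everything else stays fixed; only the optimiser changes per run.
      val builder = new NeuralNetConfiguration.Builder()
        .learningRate(1e-4)
        .iterations(300)
        .optimizationAlgo(algo)
      // ... add layers and train as in the other experiments
    }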

Friday, August 18, 2017

multiple iterations for different sized datasets





title multiple iterations for different sized datasets
class vsoc.training.SizeIterationsTraining$
learningRate 1.0E-04
trainingData playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000, playerpos_x A 5000000
batchSizeTrainingDataRelative 0.10
testData playerpos_x B 5000
iterations 1, 50, 100, 200
seed 687977487191482088

Conclusion

A dataset with 1000000 (1M) lines and 200 iterations is enough to train a one-layer net.

test learning rate | iterations: 500





title test learning rate | iterations: 500
class vsoc.training.LearningRateIterationsTraining$
learningRate 1.0E-02, 1.0E-03, 1.0E-04, 1.0E-05
trainingData playerpos_x A 50000, playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000
batchSizeTrainingDataRelative 0.50
testData playerpos_x B 1000
iterations 500
seed 1730784479442308435

Conclusion

A learning rate of 0.0001 (1.0E-4) seems to be the best for all dataset sizes.
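
As a reminder of what this parameter controls: each SGD iteration moves the weights against the gradient, scaled by the learning rate. A self-contained sketch of that update:

    // One stochastic-gradient-descent step: w - lr * gradient,
    // with lr = 1.0E-4 as found best above.
    def sgdStep(w: Array[Double], grad: Array[Double], lr: Double = 1e-4): Array[Double] =
      w.zip(grad).map { case (wi, gi) => wi - lr * gi }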

batch size relative with different iterations. dataset size:1000000


title batch size relative with different iterations. dataset size:1000000
class vsoc.training.BatchSizeTraining$
learningRate 1.0E-03
trainingData playerpos_x A 1000000
batchSizeTrainingDataRelative 0.10, 0.20, 0.50, 0.80
testData playerpos_x B 1000
iterations 10, 50, 100, 200
seed 2771765897592056378

Conclusion

For a dataset of size 1000000 (1M), a small relative batch size like 0.1 seems to be optimal.
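
batchSizeTrainingDataRelative presumably gives the minibatch size as a fraction of the training set; under that reading the conversion is:

    // Relative batch size 0.1 on a 1M-line dataset:
    val trainingDataSize = 1000000
    val batchSizeRelative = 0.10
    val batchSize = (trainingDataSize * batchSizeRelative).toInt // 100000 examples per batch

So "small" here still means 100000 examples per parameter update.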

Further Tests

- Check whether a relative batch size smaller than 0.1 makes sense
- Check whether a small batch size makes sense for smaller datasets, e.g. 100000.

Wednesday, August 16, 2017

test learning rate | iterations: 500


title test learning rate | iterations: 500
class vsoc.training.LearningRateIterationsTraining$
learningRate 5.0E-04, 1.0E-04, 5.0E-05, 1.0E-05
trainingData playerpos_x A 50000, playerpos_x A 100000, playerpos_x A 500000, playerpos_x A 1000000
batchSizeTrainingDataRelative 0.50
testData playerpos_x B 1000
iterations 500
seed 6741344407080914846, 2973931688859548271, 7003379254294840799, -6932794719115052821, 6080736103119206684, -48874423639357220, 8781826812285936313, 7670089977991256700, 8651878094978452995, -9079235514486697167, -2799581918634965629, 1929973805206056191, -2098837483005116009, 2967323285262997822, 7199009957595655229, -2924705563695555956

Conclusion 

The learning rates used make no difference in the error.
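
With 16 seeds per configuration, "make no difference" is a statement about error distributions rather than single runs. A sketch of the aggregation (listing only the first three of the 16 seeds for brevity), with runExperiment as a hypothetical stand-in for the training code:

    // Hypothetical helper: one training run with the given learning
    // rate and seed, returning the test-set error.
    def runExperiment(lr: Double, seed: Long): Double = ???

    val seeds = Seq(6741344407080914846L, 2973931688859548271L, 7003379254294840799L)

    for (lr <- Seq(5e-4, 1e-4, 5e-5, 1e-5)) {
      val errors = seeds.map(runExperiment(lr, _))
      val mean = errors.sum / errors.size
      val sd = math.sqrt(errors.map(e => (e - mean) * (e - mean)).sum / errors.size)
      println(f"lr=$lr mean=$mean%.4f sd=$sd%.4f")
    }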

Further work

Run the same test with a different range of learning rates, e.g. 0.01, 0.001, 0.0001, 0.00001.