Learning Rate Grafting: Transferability of Optimizer Tuning (Machine Learning Research Paper Review)
#grafting #adam #sgd
Recent years in deep learning research have produced a plethora of optimization algorithms, such as SGD, AdaGrad, Adam, LARS, and LAMB, each claiming its own peculiarities and advantages. In general, every algorithm modifies two major things: the (implicit) learning rate schedule, and the gradient direction, to which it applies a correction. This paper introduces grafting, which transfers the induced learning rate schedule of one optimizer to another. With it, the paper shows that many of the benefits of adaptive methods (e.g. Adam) are actually due to this schedule, and not necessarily to the gradient direction correction. Grafting allows for more fundamental research into the differences and commonalities between optimizers, and a derived version of it makes it possible to compute static learning rate corrections for SGD, which potentially allows for large savings of GPU memory.
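To make the idea concrete, here is a minimal sketch of a single grafted update, assuming the usual reading of grafting: one optimizer (called M here) supplies only the size of the step, and the other (called D) supplies only its direction. The names `l2_norm`, `graft_step`, `magnitude_step`, `direction_step` and `eps` are made up for this illustration and do not come from the paper's code. Applying this separately to each layer's parameters is one option, also an assumption here.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def graft_step(magnitude_step, direction_step, eps=1e-16):
    """Return a step with the norm of `magnitude_step` and the direction of `direction_step`.

    magnitude_step: update proposed by optimizer M (only its norm is used)
    direction_step: update proposed by optimizer D (only its direction is used)
    eps: small constant (an assumption) to avoid dividing by zero
    """
    scale = l2_norm(magnitude_step) / (l2_norm(direction_step) + eps)
    return [scale * d for d in direction_step]


# Toy example: M proposes a step of norm 5, D proposes a step of norm 2.
step_m = [3.0, 4.0]
step_d = [0.0, -2.0]
grafted = graft_step(step_m, step_d)

# The grafted step points where D points, but is as long as M's step.
print(grafted, l2_norm(grafted))
```

In a training loop, both optimizers would compute their proposed update from the same gradient at every iteration, and only the grafted step would actually be applied to the weights.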
OUTLINE
0:00 - Rant about Reviewer #2
6:25 - Intro & Overview
12:25 - Adaptive Optimization Methods
20:15 - Grafting Algorithm
26:45 - Experimental Results
31:35 - Static Transfer of Learning Rate Ratios
35:25 - Conclusion & Discussion
Paper (OpenReview): https://openreview.net/forum?id=FpKgG...
Old Paper (Arxiv): https://arxiv.org/abs/2002.11803
Our Discord: https://discord.gg/4H8xxDF
Abstract:
In the empirical science of training large neural networks, the learning rate schedule is a notoriously challenging-to-tune hyperparameter, which can depend on all other properties (architecture, optimizer, batch size, dataset, regularization, ...) of the problem. In this work, we probe the entanglements between the optimizer and the learning rate schedule. We propose the technique of optimizer grafting, which allows for the transfer of the overall implicit step size schedule from a tuned optimizer to a new optimizer, preserving empirical performance. This provides a robust plug-and-play baseline for optimizer comparisons, leading to reductions to the computational cost of optimizer hyperparameter search. Using grafting, we discover a non-adaptive learning rate correction to SGD which allows it to train a BERT model to state-of-the-art performance. Besides providing a resource-saving tool for practitioners, the invariances discovered via grafting shed light on the successes and failure modes of optimizers in deep learning.
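The non-adaptive correction mentioned in the abstract can be pictured as a fixed per-layer multiplier on the SGD learning rate. The sketch below is only an illustration under stated assumptions: it assumes per-layer step norms were logged for both optimizers during a grafted run, and it averages their ratio over time. The averaging, the layer names, and the helpers `static_layer_scales` and `scaled_sgd_step` are all hypothetical; the abstract does not say how the ratios are aggregated.

```python
def static_layer_scales(norms_m, norms_d, eps=1e-16):
    """Average, per layer, the ratio of M's step norm to D's step norm.

    norms_m, norms_d: dicts mapping a layer name to a list of logged step norms
    (one entry per training step). Averaging is an assumption of this sketch.
    """
    scales = {}
    for layer in norms_m:
        ratios = [m / (d + eps) for m, d in zip(norms_m[layer], norms_d[layer])]
        scales[layer] = sum(ratios) / len(ratios)
    return scales


def scaled_sgd_step(grads, scales, lr):
    """Plain SGD step with a fixed multiplier per layer and no optimizer state."""
    return {
        layer: [-lr * scales[layer] * g for g in grad]
        for layer, grad in grads.items()
    }


# Hypothetical logged norms for two layers over two steps.
norms_m = {"embed": [2.0, 4.0], "head": [1.0, 1.0]}
norms_d = {"embed": [1.0, 2.0], "head": [2.0, 2.0]}
scales = static_layer_scales(norms_m, norms_d)

grads = {"embed": [1.0], "head": [1.0]}
update = scaled_sgd_step(grads, scales, lr=0.1)
print(scales, update)
```

Because the multipliers are constants, the optimizer does not need to keep per-parameter statistics around during training, which is where the potential memory savings would come from.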
Authors: Anonymous (Under Review)
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n