Learning Rate Grafting: Transferability of Optimizer Tuning (Machine Learning Research Paper Review)
#grafting #adam #sgd
Recent years of deep learning research have produced a plethora of optimization algorithms, such as SGD, AdaGrad, Adam, LARS, and LAMB, each of which claims its own peculiarities and advantages. In general, all of these algorithms modify two major things: the (implicit) learning rate schedule and a correction to the gradient direction. This paper introduces grafting, which makes it possible to transfer the induced learning rate schedule of one optimizer to another. In doing so, the paper shows that many of the benefits of adaptive methods (e.g. Adam) are actually due to this schedule, and not necessarily to the gradient direction correction. Grafting allows for more fundamental research into the differences and commonalities between optimizers, and a derived version of it makes it possible to compute static learning rate corrections for SGD, which potentially allows for large savings of GPU memory.
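To make the idea concrete, here is a minimal sketch of a grafted update, assuming grafting means taking the step size (norm) from one optimizer and the step direction from another. The function names (`graft_step`, `sgd_step`, `adagrad_step`), the hyperparameter values, and the single-parameter-group setup are illustrative assumptions, not the paper's code.

```python
import numpy as np

def graft_step(magnitude_step, direction_step, eps=1e-16):
    """Rescale direction_step so that its norm matches magnitude_step's norm."""
    m_norm = np.linalg.norm(magnitude_step)
    d_norm = np.linalg.norm(direction_step)
    return (m_norm / (d_norm + eps)) * direction_step

def sgd_step(grad, lr=0.1):
    """Plain SGD update (hypothetical helper for this sketch)."""
    return -lr * grad

def adagrad_step(grad, accum, lr=0.1, eps=1e-8):
    """AdaGrad update; accum holds the running sum of squared gradients."""
    accum += grad ** 2
    return -lr * grad / (np.sqrt(accum) + eps)

grad = np.array([3.0, 4.0])
accum = np.zeros_like(grad)

m_step = adagrad_step(grad, accum)    # supplies the step size
d_step = sgd_step(grad)               # supplies the direction
grafted = graft_step(m_step, d_step)  # SGD direction, AdaGrad step size
```

In this sketch the grafted step points exactly where the SGD step points, but it is as long as the AdaGrad step. In a real network one would presumably apply this separately to each layer or parameter group; that granularity is also an assumption here.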
OUTLINE
0:00 - Rant about Reviewer #2
6:25 - Intro & Overview
12:25 - Adaptive Optimization Methods
20:15 - Grafting Algorithm
26:45 - Experimental Results
31:35 - Static Transfer of Learning Rate Ratios
35:25 - Conclusion & Discussion
Paper (OpenReview): https://openreview.net/forum?id=FpKgG...
Old Paper (Arxiv): https://arxiv.org/abs/2002.11803
Our Discord: https://discord.gg/4H8xxDF
Abstract:
In the empirical science of training large neural networks, the learning rate schedule is a notoriously challenging-to-tune hyperparameter, which can depend on all other properties (architecture, optimizer, batch size, dataset, regularization, ...) of the problem. In this work, we probe the entanglements between the optimizer and the learning rate schedule. We propose the technique of optimizer grafting, which allows for the transfer of the overall implicit step size schedule from a tuned optimizer to a new optimizer, preserving empirical performance. This provides a robust plug-and-play baseline for optimizer comparisons, leading to reductions to the computational cost of optimizer hyperparameter search. Using grafting, we discover a non-adaptive learning rate correction to SGD which allows it to train a BERT model to state-of-the-art performance. Besides providing a resource-saving tool for practitioners, the invariances discovered via grafting shed light on the successes and failure modes of optimizers in deep learning.
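The non-adaptive correction mentioned in the abstract can be pictured roughly as follows. This is a sketch under stated assumptions: per-layer step-norm ratios are recorded during a grafted run, reduced to one fixed number per layer (the median is my choice, not necessarily the paper's), and then used as constant learning rate multipliers for plain SGD. The layer names and ratio values are made-up toy numbers, not results.

```python
import numpy as np

def static_corrections(ratio_history):
    """Collapse each layer's recorded step-norm ratios into one fixed multiplier.

    ratio_history maps a layer name to a list of ratios
    (norm of the adaptive optimizer's step / norm of the SGD step).
    The median is an assumed choice of summary statistic.
    """
    return {name: float(np.median(ratios)) for name, ratios in ratio_history.items()}

def corrected_sgd_step(grads, corrections, lr=0.1):
    """Plain SGD step with a constant per-layer learning rate multiplier."""
    return {name: -lr * corrections[name] * g for name, g in grads.items()}

# Toy ratios for two hypothetical layers, three recorded steps each.
history = {
    "embedding": [0.5, 0.4, 0.6],
    "output": [2.0, 2.2, 1.8],
}
corrections = static_corrections(history)

grads = {
    "embedding": np.array([1.0, -1.0]),
    "output": np.array([0.5, 0.5]),
}
steps = corrected_sgd_step(grads, corrections)
```

Once the multipliers are fixed, the update needs no per-parameter optimizer state beyond the gradient itself, which is where the potential GPU memory savings would come from.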
Authors: Anonymous (Under Review)
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n