Grokking, the sudden generalization of AI models to unseen data that can occur long after they have overfit their training set, is a surprising phenomenon that has challenged our understanding of deep learning and AI in general.
A lot of progress has been made in understanding grokking, and now we finally get some of the answers we have been waiting for over the past 7 months.
GROKKING - Finally understood!
Part II is available here https://youtu.be/H3OofROzlA0
All rights w/ authors:
GROKKING AT THE EDGE OF NUMERICAL STABILITY
Lucas Prieto, Melih Barsbey, Pedro A.M. Mediano, Tolga Birdal
Department of Computing, Imperial College London
#airesearch
#science
#newdiscovery