1. DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Machine Learning Paper Explained)
2. Sparse is Enough in Scaling Transformers (aka Terraformer) | ML Research Paper Explained
3. Pretrained Transformers as Universal Computation Engines (Machine Learning Research Paper Explained)
4. Fastformer: Additive Attention Can Be All You Need (Machine Learning Research Paper Explained)
5. HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning (w/ Author)
6. Decision Transformer: Reinforcement Learning via Sequence Modeling (Research Paper Explained)
7. ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation
8. 🐐 OpenAI Tutorial - Learn Text Completion with OpenAI, ChatGPT, Next.Js, React & TailwindCSS
9. There Is No Such Thing As The COSMO Algorithm! | SSP #606
10. Yolopark AMK Mini G1 Transformers - Model kits | Build and Review