RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, ...
“Wasting all that time and compute to get a 7B LLM to the same few shot classification performance as a 200M parameter Bert, but with terrible performance for actual generative ... advocates for ...
This project is a comprehensive collection of common machine learning algorithms implemented from scratch in plain Python, as well as with the PyTorch and TensorFlow frameworks.