Open Source GPT-Neo & GPT-J

Victor Roman
4 min read · Jun 28, 2021
Picture from Unsplash

GPT-3 is the deep learning model, presented by OpenAI in 2020, that amazed the world with its ability to generate text, translate, perform arithmetic, and, most importantly, perform tasks for which it was not specifically trained.

Now EleutherAI, a research collective organized through Discord, has trained several open-source GPT-3-style models. Two of them are particularly interesting, and we will explore them here:

GPT-Neo

As stated on their website:

GPT⁠-⁠Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to train an equivalent model to the full-sized GPT⁠-⁠3 and make it available to the public under an open licence.

GPT⁠-⁠Neo is an implementation of model & data-parallel GPT⁠-⁠2 and GPT⁠-⁠3-like models, utilizing Mesh Tensorflow for distributed support. This codebase is designed for TPUs. It should also work on GPUs, though we do not recommend this hardware configuration.

GPT-Neo is currently available in two versions, one with 1.3 billion parameters and one with 2.7 billion.
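Both checkpoints are published on the Hugging Face Hub, so the easiest way to try them is through the transformers library rather than the Mesh TensorFlow codebase. Below is a minimal sketch, assuming the model ids "EleutherAI/gpt-neo-1.3B" and "EleutherAI/gpt-neo-2.7B" and a transformers version recent enough to include GPT-Neo support.

```python
# Minimal sketch: text generation with GPT-Neo via Hugging Face transformers.
# Assumes the checkpoint id "EleutherAI/gpt-neo-1.3B"; swap in
# "EleutherAI/gpt-neo-2.7B" if you have enough memory for the larger model.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

prompt = "GPT-Neo is an open-source language model that"
outputs = generator(prompt, max_length=60, do_sample=True, temperature=0.9)
print(outputs[0]["generated_text"])
```

The pipeline downloads the weights on first use (a few gigabytes for the 1.3B model), so the initial call can take a while.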

GPT-J-6B

In parallel, EleutherAI has developed GPT-J-6B, a 6-billion-parameter model whose performance is comparable to a similarly sized version of GPT-3. This would result in the closest…
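GPT-J-6B can be loaded in much the same way. The sketch below is an assumption-laden example, not an official recipe: it presumes the Hugging Face model id "EleutherAI/gpt-j-6B", a transformers release recent enough to include GPT-J support, and enough memory for the weights (roughly 24 GB in full precision, about half that in fp16 on a GPU).

```python
# Minimal sketch: generating text with GPT-J-6B via transformers.
# Model id and memory figures are assumptions; check the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed Hugging Face checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Use half precision on a GPU if one is available to cut memory use roughly in half.
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    model = model.half()
model = model.to(device)

inputs = tokenizer("EleutherAI released GPT-J-6B because", return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_length=60, do_sample=True, temperature=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```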
