Open Source GPT-Neo & GPT-J

Victor Roman
4 min read · Jun 28, 2021
Picture from Unsplash

GPT-3 is the deep learning model, presented by OpenAI in 2020, that amazed the world with its ability to generate text, translate, perform arithmetic, and, most importantly, perform tasks for which it was not specifically trained.

Now EleutherAI, a research collective organized through Discord, has trained several open-source GPT-3-style models. Two of them are particularly interesting, and we will explore them here:

GPT-Neo

As stated on their website:

GPT⁠-⁠Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to train an equivalent model to the full-sized GPT⁠-⁠3 and make it available to the public under an open licence.

GPT⁠-⁠Neo is an implementation of model & data-parallel GPT⁠-⁠2 and GPT⁠-⁠3-like models, utilizing Mesh Tensorflow for distributed support. This codebase is designed for TPUs. It should also work on GPUs, though we do not recommend this hardware configuration.

GPT-Neo is currently available in two versions, one with 1.3 billion parameters and one with 2.7 billion.
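Both checkpoints are published on the Hugging Face Hub, so the easiest way to try them is through the transformers library rather than the Mesh TensorFlow codebase. Below is a minimal sketch, assuming the model ids "EleutherAI/gpt-neo-1.3B" and "EleutherAI/gpt-neo-2.7B" and a transformers version recent enough to include GPT-Neo support.

```python
# Minimal sketch: text generation with GPT-Neo via Hugging Face transformers.
# Assumes the checkpoint id "EleutherAI/gpt-neo-1.3B"; swap in
# "EleutherAI/gpt-neo-2.7B" if you have enough memory for the larger model.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

prompt = "GPT-Neo is an open-source language model that"
outputs = generator(prompt, max_length=60, do_sample=True, temperature=0.9)
print(outputs[0]["generated_text"])
```

The pipeline downloads the weights on first use (a few gigabytes for the 1.3B model), so the initial call can take a while.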

GPT-J-6B

In parallel, EleutherAI has developed GPT-J-6B, a 6-billion-parameter model whose performance is comparable to a similarly sized version of GPT-3. This would result in the closest…
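GPT-J-6B can be loaded in much the same way. The sketch below is an assumption-laden example, not an official recipe: it presumes the Hugging Face model id "EleutherAI/gpt-j-6B", a transformers release recent enough to include GPT-J support, and enough memory for the weights (roughly 24 GB in full precision, about half that in fp16 on a GPU).

```python
# Minimal sketch: generating text with GPT-J-6B via transformers.
# Model id and memory figures are assumptions; check the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed Hugging Face checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Use half precision on a GPU if one is available to cut memory use roughly in half.
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    model = model.half()
model = model.to(device)

inputs = tokenizer("EleutherAI released GPT-J-6B because", return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_length=60, do_sample=True, temperature=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```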
