GPT-3 isn't a single model; it's a model architecture, and GPT-Neo follows it very closely. The 2.7B model is the exact same size as one of the models OpenAI sells under the label "GPT-3".
My line of thinking was that the average HN reader has probably read 'GPT-3' some 500 times by now, every instance of which referred to OpenAI's famous 175B model, so seeing this release under the same label could be confusing when it isn't (yet) comparable in parameters or performance. But as you and another commenter noted, it is still the GPT-3 architecture (or hopefully isomorphic to it), so I appreciate your correction as well.
That's fair. I also later learned that the title didn't explicitly mention model size at first, and I would have probably raised similar complaints had I seen that.
Not hugely, but yes. I tend to think of GPT as a style of architecture with consistent themes and major features, but varying minor features and implementation details. Off the top of my head, I believe the most important difference is that GPT-3 alternates global and local attention while GPT-2 is all global attention.
The two published GPT-Neo models follow GPT-3's lead, but the repo lets the user pick whether each attention layer is global or local.
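To make the alternation concrete, here's a minimal sketch of expanding a per-layer attention spec into a flat layer list. The spec format is loosely modeled on GPT-Neo's config style, but `expand_attention_types` is a hypothetical helper written for illustration, not code from the repo:

```python
def expand_attention_types(spec):
    """Expand a spec like [[["global", "local"], 12]] into a
    flat per-layer list of attention types (hypothetical helper)."""
    layers = []
    for pattern, repeats in spec:
        layers.extend(pattern * repeats)  # repeat the pattern block
    return layers

# GPT-3-style alternation for a 24-layer model:
gpt3_like = expand_attention_types([[["global", "local"], 12]])

# GPT-2-style: global attention in every layer:
gpt2_like = expand_attention_types([[["global"], 24]])
```

The spec-plus-repeat-count shape keeps the config compact for deep models while still allowing arbitrary per-layer patterns.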