GPT-3 isn't a single model; it's a model architecture, and GPT-Neo follows it very closely. The 2.7B model is the exact same size as one of the models OpenAI sells under the label "GPT-3".
My line of thinking was that the average HN reader has probably read 'GPT-3' some 500 times by now, every instance of which referred to OpenAI's famous 175B model, so seeing this release under the same label could be confusing when it isn't (yet) comparable in parameters or performance. But as you and another commenter noted, it is still the GPT-3 architecture (or hopefully isomorphic to it), so I appreciate your correction as well.
That's fair. I also later learned that the title didn't explicitly mention model size at first, and I would have probably raised similar complaints had I seen that.
Not hugely, but yes. I tend to think of GPT as a style of architecture with consistent themes and major features, but varying minor features and implementation details. Off the top of my head, I believe the most important difference is that GPT-3 alternates global and local attention while GPT-2 is all global attention.
The two published GPT-Neo models follow GPT-3's lead, but the repo lets the user pick whether each attention layer is global or local.
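To make the alternation concrete, here's a minimal sketch of expanding a per-layer attention spec into a flat layer list. The spec format is loosely modeled on GPT-Neo's config style, but `expand_attention_types` is a hypothetical helper written for illustration, not code from the repo:

```python
def expand_attention_types(spec):
    """Expand a spec like [[["global", "local"], 12]] into a
    flat per-layer list of attention types (hypothetical helper)."""
    layers = []
    for pattern, repeats in spec:
        layers.extend(pattern * repeats)  # repeat the pattern block
    return layers

# GPT-3-style alternation for a 24-layer model:
gpt3_like = expand_attention_types([[["global", "local"], 12]])

# GPT-2-style: global attention in every layer:
gpt2_like = expand_attention_types([[["global"], 24]])
```

The spec-plus-repeat-count shape keeps the config compact for deep models while still allowing arbitrary per-layer patterns.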