Please test their models before you take this release at face value.
Eleuther has a history of claiming to replicate projects when they haven't. For example, they shipped a broken DALL-E repo a few days after OpenAI announced it (https://twitter.com/theshawwn/status/1348017515897659392), and they've walked back their GPT-3 replication claims to replicating 1.5B because their architecture doesn't scale.
As far as I can tell, they're generating a large amount of hype with grandiose claims that they can't deliver on.
All I care about is whether you like their models and actually use them in practice. If you do, please let me know and I'll pipe down. But so far, I haven't heard of anyone who uses anything they've produced, and that worries me. Has anyone?
"DALL-E is quite straight forward and already coded. We just need data to train it."
No, DALL-E is neither straightforward nor was it successfully coded, especially back on January 7th.
Anyway, carry on. I really don't like speaking badly of AI projects, and I hope that they succeed. The model release today is a good step forward, assuming it works. But it might be better to have the expectation of "the models don't work" until proven otherwise.
I'd also like to point out that there are some capable people doing work at Eleuther. Sid in particular is one of the best TPU hackers in the scene. I just wish they would scale down their claims, release more models, and not claim that they've done X until actually doing X. For example, the readme says they have "the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library," which they don't.
I don't think any single thing you've claimed is factually wrong, and I don't speak for Eleuther nor am I attempting to justify their claims.
But.
As I understand it (mostly from lurking on their Discord and reading publicly available materials), this is a group of volunteer academic types trying to replicate something great and awesome, with the only goal of giving it to the world. You could cut them some slack.
I can't speak for you, but as a "for free, weekend project" what they've done certainly makes me feel I need to up my game.
I am sorry that I was misinformed about the state of our DALL-E replication when I made that tweet. It was not malicious - I was reporting what I had been told by someone else.
Yes, I was wrong. That said, I had hoped that maybe after two and a half months Shawn would stop holding it over my head.
> Claiming something that is not true is in itself wrong.
I 100% agree with this.
I also think that one catches more flies with honey than vinegar, and the criticism in the parent comment, while possibly valid, could be phrased more encouragingly and less combatively. It's easy to criticize, it's hard to create, and it's even harder to release.
> Claiming something that is not true is in itself wrong.
Yup. In any project, and especially one done for the community, where the only rewards you get are satisfaction and fame, success is tightly tied to communication. Good, honest communication is what builds trust.
Not just that. To even get access to their API, you need to apply. That, I'm afraid, is the future of AI without projects like this: elites controlling AI and deciding who is "worthy" to use it.
I'm sure they have the best of intentions but "worthiness" is subjective.
Depends on who you mean by "they". If you mean the researchers, then sure, they probably actually believe whatever's written in their ethics statement.
Now, the actual owners? I don't believe it for a second.
Considering the social consequences of easy access to APIs and data over the last decade, I am quite happy that these initiatives are cautious about opening up software that can have a huge impact on society.
Unfortunately, the cat is out of the bag. Their methods are documented and the results are exciting, so for a bad actor (especially a state-sponsored one) it's completely justified to spend millions attempting to replicate their results from what is publicly available.
Good perspective but I would like to hear the response from the developers before concluding too much.
This is not meant as a goad to you, but more as extra info for everyone: my understanding is that it is an open-source community of like-minded people (as opposed to a bigco) that actively solicits contributions (by which I mean code and data), so anyone seeing room for improvement is welcome to step in, from what I can tell.
I did find your comment helpful and informative; just adding another angle here.
I think it would help your PR efforts to let people know that more often. People hear "we are <org_name>" and assume you're, you know, an organization. That comes with some amount of expected bug fixes, documentation, verifying results _before_ you release, etc.
I'm not really sure how much you gain by attracting tons of people to the discord if you finally release and everyone has unreasonable expectations due to the way you advertise the group as a whole.
This'll probably sound silly, but we weren't really expecting much of a response. We've been sitting on these models (they were trained for the Pile paper) for months for no particular reason besides being focused on other things. We figured we'd put it out there in case anyone was interested and then Aran's tweet blew up hard.
Fair enough! I can personally attest to the results being quite impressive now. Fantastic work and apologies for the criticism. It must be a strange situation to get dragged in to all this due to a viral tweet.
People are always claiming to release replicated models by replicating the architecture (or the main parts of it) but not testing whether it produces the same level of results. It's maddening, especially when the level of results is so directly measurable (just measure what the paper measured; not that it's easy, but it is concrete).
Is there anything on its few-shot learning performance? I took few-shot learning to be the main point of GPT-3. Sorry if I just overlooked it, but I don't see anything on few-shot learning in the readme.
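For readers unfamiliar with the term: few-shot means packing a handful of solved examples into the prompt and letting the model complete the last one, with no gradient updates. A minimal sketch (the task and examples here are invented for illustration):

    # A few-shot prompt in the style of the GPT-3 paper: a few solved
    # examples, then an unsolved one for the model to complete.
    prompt = (
        "English: cheese\nFrench: fromage\n\n"
        "English: house\nFrench: maison\n\n"
        "English: cat\nFrench:"
    )
    # The model's completion of `prompt` is the few-shot "answer";
    # the model's weights are never updated.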
I would guess that an average FAANG ML engineer could code up and successfully execute a forward/backward pass on a GPT-1 or GPT-2 model with a day of effort or less. (GPT-3 a little harder, but not significantly). But is that model actually going to perform well? Most likely no. Model performance varies significantly due to subtle details in data processing implementations, seemingly insignificant details in code, and even from different numerical methods of calculating the same semantics.
If you don't believe me, consider that many ML researchers track their commits (or exact code versions) extremely carefully, because oftentimes they will make some change (or changes) they think are inconsequential and later find that actually, their model broke. If they made too many changes, whoops, guess you have to binary search over the diff to see what happened since your last "good run".
If the people who spent months (if not years) tuning a model can't tell whether it will work from the code, how could anyone else? Most ML researchers will not bother with most code that doesn't give proof of results (in terms of a model that can actually be evaluated) because it is just so unlikely that it will actually work well. Now, it might "work" in the sense that it converges and does something when you prompt it with examples. But will this GPT-3 reimplementation actually outperform, say, the 10x smaller T5 checkpoint that was released by Google, or the other smaller language models others have released? If it doesn't, it's hard to argue that it's very useful at all.
I think that's the spirit of why the original commenter said what they did, but I still do applaud the efforts of this team (and hope that their implementation is, in fact, highly performant!)
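For concreteness, here is a minimal sketch of the kind of forward/backward pass described above, leaning on Hugging Face's GPT-2 implementation rather than hand-written blocks. Getting this to run is the easy part; getting GPT-3-level results out of a reimplementation is the hard part.

    # Minimal forward/backward pass on GPT-2 via the transformers library.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    inputs = tokenizer("The unicorns spoke perfect English.", return_tensors="pt")
    # With labels equal to the input ids, the model returns the LM loss.
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()  # backward pass; an optimizer step would follow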
GPT-3 is the name for the architecture, but there are a few different versions/sizes. The OpenAI version that impressed us all was ~175B parameters; this is far smaller.
To go from 2.7B to 175B parameters will need more than just a few config tweaks. There's a whole bunch of hacks and tricks needed to coax a model to train at that scale; the Eleuther version is almost guaranteed to fail out of the box.
It's worth noting that the GPT-3 paper did train models with more sane sizes (e.g. 1.5B) as a point of comparison. I am surprised/annoyed they never released them though.
Ada is 2.7B, Babbage is 6.7B, Curie is 13.0B, and DaVinci is 175B. The new one they announced last month is in the 20-50B range I think, not totally sure though.
What I find interesting about their marketing(?) is that they identified a market niche that they want to position themselves in.
Enterprise customers that have no idea about the technical details will just hear about OpenAI's success in this fancy new model and assume that Eleuther can deliver.
I mean, most use cases for "big data" projects that are tiny in comparison with Alphabet's datasets will probably work just fine with GPT-2.

And for enterprise customers who hear those claims and see some code, maybe a demo, that's enough to start the consultancy process.
In my opinion, that's a policy problem that OpenAI introduced by not requiring absolute reproducibility of both the code and the model, and of both the training procedure and the dataset, when releasing their models.

Stakes are pretty high in the AI industry, and OpenAI actively influences it. In the beginning, my dream was that they would be a source of verification, audits, and "proof" that models are legit... yet lately I have the feeling that they just buzz around like everyone else.
To this day, I haven't seen anyone replicate any of the DNC results, for example.
To date, EleutherAI as an "organization" (read: basically a Discord server) has not really attempted any kind of marketing. It has no PR dept, just individuals tweeting about the work that Eleuther does.
This is a nice release, but the title is a bit misleading, as the released sizes (1.3B and 2.7B parameters) do not yet compare to the size of GPT-3 (175B) but rather to GPT-2 (1.5B) (although future releases may be significantly larger!).
With training improvements such as DeepSpeed, the GPU costs will likely be substantially lower than they were when OpenAI trained GPT-3. Still not free, though.
The hard part with GPT-3 is it's big enough to make it difficult to actually deploy.
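On the DeepSpeed point above: much of the savings comes from things like fp16 training and ZeRO's partitioning of optimizer state across GPUs, both switched on in the config file passed to the deepspeed launcher. A minimal sketch (the values are illustrative, not a tuned setup):

    # Minimal DeepSpeed config sketch; batch sizes are placeholders.
    import json

    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "gradient_accumulation_steps": 8,
        "fp16": {"enabled": True},          # half-precision training
        "zero_optimization": {"stage": 2},  # shard optimizer state + gradients
    }
    with open("ds_config.json", "w") as f:
        json.dump(ds_config, f, indent=2)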
GPT-3 isn't a single model. It's a model architecture that GPT-Neo follows very closely. The 2.7B model is the exact same size as something OpenAI sells under the label "GPT-3".
My line of thinking was that for the average HN reader, who has probably read 'GPT-3' perhaps 500 times by now (every instance of which was referencing OpenAI's infamous 175B model), it may be confusing for them to see this with the same label, when the release is not comparable as far as parameters/performance (yet). But as yourself and another commenter noted, it is still the GPT-3 architecture (or hopefully isomorphic to it), so I appreciate your correction as well.
That's fair. I also later learned that the title didn't explicitly mention model size at first, and I would have probably raised similar complaints had I seen that.
Not hugely, but yes. I tend to think of GPT as a style of architecture with consistent themes and major features, but varying minor features and implementation details. Off the top of my head, I believe the most important difference is that GPT-3 alternates global and local attention while GPT-2 is all global attention.
The two published GPT-Neo models follow GPT-3's lead but the repo lets the user pick whether to use global or local attention layers.
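To make the global/local distinction concrete, a toy sketch (the names are illustrative, not GPT-Neo's actual config keys):

    # A 12-layer model alternating global and local self-attention, as the
    # GPT-3 paper describes; GPT-2 would be ["global"] * 12 instead.
    attention_pattern = ["global" if i % 2 == 0 else "local" for i in range(12)]
    # In a "local" layer each token only attends over a fixed window of
    # recent tokens, which cuts attention cost on long contexts.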
Many people (correctly, in my view) criticized OpenAI for the name, saying that openness should be evaluated on a case by case basis. Glad they listened to critics instead of trying to maintain consistency for its own sake.
Is there anything a non-AI researcher can do to help support this project? Is there a way to donate money? Or could a software engineer help with testing, tooling, or other kinds of infrastructure?
I was really excited about OpenAI's original plan and still believe that an open source solution is the best way to prevent the potential negative impacts AI might have on society. I can sort of appreciate why OpenAI went the route of going private and trying to monetize their work instead; it might prevent people from using their work nefariously and will probably provide them with far more capital to continue their efforts. But I trust humanity as a collective more than any particular group of people in the long run. I'm sure there are many others like me who would be eager to help out if they could.
Edit: EleutherAI has a whole page on their site about how others can contribute: https://www.eleuther.ai/get-involved/. I didn't see anything about accepting donations though, if anyone involved with the project was interested in setting up a crowdfunding account somewhere I'd be eager to donate.
You can indirectly support the project by supporting the host that serves their data, https://the-eye.eu
Right on the front page they write:
> Hey there fans! We are currently looking for help funding large storage upgrades,
> if you want to help us serve more data see our donation options (crypto, etc)
> Thanks for reading, happy downloading!
The Eye has been a phenomenal partner and enables a lot of what we do. In addition to providing terabytes of storage for free, they also help us out with CPU from time to time.
Indirectly, they say you can donate money in the form of rented computation:
“As an independent organization, we are dependent upon donations for our computing costs. We are always interested in speaking with people who can donate compute times.”
"Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters"
README:
"1T or bust my dudes [...] An implementation of model & data parallel GPT2 & GPT3 -like models, with the ability to scale up to full GPT3 sizes (and possibly more!)"
It seems the largest model they released is 2.7 billion parameters, or about 1.5% the size of GPT-3. The most interesting part about GPT-3 was its size, and it seems this is only "GPT-3-like" in architecture.
I also have a translation library with ~100 million (0.001 GPT-3) parameters:
GPT-3 is a model architecture, not a model. While the largest GPT-3 model is 175B, that very paper has a table that includes "GPT-3 XL" (1.3B) and "GPT-3 2.7B" as models in the GPT-3 architecture. The 2.7B model is the same size as Ada, a model that OpenAI currently sells API access to under the moniker "GPT-3"
None of the other models are even close to the big one, and the paper itself also suggests calling the big one "GPT-3", which people do very often in practice. So the term is often ambiguous, but saying it only means the architecture isn't right either.
What does he mean when he says "1T or bust"? Is he referring to 1 trillion parameters? Are you saying that GPT-3 has 2.7 trillion parameters? Does it mean that to get to GPT-3's level it needs 100x more data?
GPT-3 has 175 billion parameters, so they need to scale up by about 64x. They already have a comparable amount of data to what OpenAI used, so it's mostly about scaling the number of GPUs.
Accuracy and parameter count don't scale linearly together, and the relationship varies widely depending on exactly what you are measuring accuracy on.
But a very approximate rule of thumb would be to say that accuracy scales with the log of the parameter count (for the same architecture).
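To put rough numbers on that rule of thumb (back-of-the-envelope arithmetic only, not a measured result):

    import math

    neo, gpt3 = 2.7e9, 175e9
    print(gpt3 / neo)              # ~64.8x parameter gap
    print(math.log10(gpt3 / neo))  # ~1.8 orders of magnitude
    # Under a log-scaling rule of thumb, going from 2.7B to 175B buys
    # just under two "decades" of parameter scale worth of improvement.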
Hi! Thanks for trying it out. There was a bug that should now be fixed. When I run the example unicorn prompt I get the following. Don't hesitate to open an issue if you're still having trouble.
"In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
Bebek Uranzoglu, another member of the research team from the University of British Columbia, was working on a project the Latino-Canadian rodeo competition equipos to document a rare and remarkable ecosystem in the Andes Mountains.
His curiosity was piqued when he spotted an adolescent herd of about 10 unicorns foraging in a forest near the valley of the Jumbo Flu Group. The unicorns — whose numbers once swelled to 46,000 — were perched on the forest floor and watched the researchers work.
Urizoglu grew excited when he spotted another group that seemed to be thriving in an area below the herd. The team hoped the apparent population growth would indicate a human presence.
But when a team of researchers set up a camera trap, they were surprised to find the unicorns in the first place, and in a forest near a lake — in fact the forest was almost entirely made up of the animals. Despite their own magical presence, the team could not see the herd was populated by humans.
“The whole place almost smelled like animals,” says Bebek. “We were never able to find human footprints at any of the points we stood at. The trees were so large, you wouldn’t have been able to walk 40 meters through them. We assumed that the truth of the matter was, ‘Well the deer didn’t like this forest at all.’”
Same here. I managed to make it "work" in the sense that it wouldn't crash during inference, but then it generated gibberish. Has anyone managed to make it work reliably?
Whilst obviously BERT is not the same as GPT-3 in architecture, Amazon's recent paper discussing architecture optimizations for BERT seems pretty relevant here (https://arxiv.org/pdf/2010.10499.pdf), given the chance to improve upon GPT-3's architecture (because it surely isn't the best we can get). Has the Eleuther.ai team been exploring this?
Could the title of this post be changed to emphasize that the model sizes released were 1.3B and 2.7B? Something like "EleutherAI releases 1.3B and 2.7B parameter GPT-like language models". The current title implies that a full sized GPT-3 model is currently available, which is not the case.
edit: the title has been changed, seems good enough
My experience is that replicating papers is actually nontrivial. For example, someone announced they had replicated GPT-2 some time back, but when evals were run it turned out to be the equivalent of a much smaller model.
I think we need more funding outside of large tech companies and OpenAI for these kinds of things. I wonder if there is a way to crowdsource donations to rent the hardware to train big versions of these things in an open manner.
If I wanted to build a support Q&A system using texts from support logs, training docs, transcribed videos, etc. (basically as much text about my product as I can get), would this model be a good start?
Also, for a quick and simple Q&A system, Haystack (https://github.com/deepset-ai/haystack), essentially dense vector similarity on Elasticsearch, looks pretty promising and supports the whole pipeline.
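For anyone curious what "dense vector similarity" means here, a minimal sketch of the underlying idea using the sentence-transformers library rather than Haystack's own API (model name and documents are placeholders):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")
    docs = ["Reset your password from the account settings page.",
            "Invoices are emailed on the 1st of each month."]
    doc_emb = model.encode(docs, convert_to_tensor=True)

    # Embed the question and return the closest passage by cosine similarity.
    query_emb = model.encode("How do I change my password?", convert_to_tensor=True)
    scores = util.cos_sim(query_emb, doc_emb)[0]
    print(docs[int(scores.argmax())])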
    def forward(self, x):
        # standard block: convolution, then normalization, then activation
        x = self.convolution(x)
        x = self.normalization(x)
        x = self.activation(x)
        return x
Is there something like chattingtransformer (https://pypi.org/project/chattingtransformer/) for GPT-Neo? I.e., a trivial way to get text completion on a sample with sane defaults from the command line.
edit: Oh, I see the "generating text" section. Any way to run it on CPU, even if it takes an hour?
Stella mentioned elsewhere in this thread that HuggingFace is adding support for the Eleuther model, so text generation should become trivial once this work is complete.
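Assuming that HuggingFace support does land under a model id like "EleutherAI/gpt-neo-1.3B" (an assumption; check the model hub for the real name), CPU-only generation should be roughly as simple as:

    from transformers import pipeline

    # device=-1 forces CPU; expect generation to be slow for large models.
    generator = pipeline("text-generation",
                         model="EleutherAI/gpt-neo-1.3B", device=-1)
    out = generator("In a shocking finding, scientists discovered",
                    max_length=50, do_sample=True)
    print(out[0]["generated_text"])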
tl;dr we are doing pretty much exactly as well as we expected on LAMBADA and WikiText. Results on more sophisticated tasks will take some time, but HuggingFace is currently working on implementing our model in the transformers library and when they do so we can easily run a lot of analyses very quickly.
We actually built an evaluation suite that integrates with HF, but interfacing with the MTF code that GPT-Neo was written in was too much of a pain in the ass because Mesh TensorFlow is the worst. https://github.com/EleutherAI/lm-evaluation-harness
Does anyone know if there's a hosted version of this kind of GPT model somewhere? All I want to do is just call a GPT-2 API and get a response back, I'm not interested in setting up the entire infrastructure by myself.
I think this is an important problem. With logistic regression or deep learning, at least one can compare (out of sample) calibration curves or discrimination measures. With a language model, what can we do?
This is a good start, but given the breadth of applications this would hardly give us enough to compare, as the goal of these models isn't to simply recite Wikipedia articles. What about language translation? Content summarization? Code generation? Turing test performance?
Both models were trained on Wikipedia, so that's a particularly bad choice. But yes, in practice this is what people tend to do. Take the results with a very large grain of salt though, as the domain of the prompts you feed it makes a huge difference.
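One crude but concrete way to compare two causal language models on text from a chosen domain is held-out perplexity; a minimal sketch (model name and text are placeholders):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2")

    text = "Held-out text that the model did not see during training."
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss   # mean cross-entropy per token
    print(torch.exp(loss).item())         # perplexity; lower is better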
So I would want to include a big corpus like GPT-3 or this newfangled "Neo" thing but still have it trained to respond to our own customers based on 200k email passages.
200k emails is not enough to train a model from scratch. If you check out the Google Colab notebook in the GPT-Neo repository, it explains how to fine-tune the model on your own data, which is what you want to do.
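The Colab notebook covers the Mesh TensorFlow route; once the HuggingFace port mentioned elsewhere in the thread is merged, fine-tuning could look roughly like this (model id, file path, and hyperparameters are all placeholders):

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tok = AutoTokenizer.from_pretrained("gpt2")   # stand-in model id
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # One email passage per line in a plain-text file.
    ds = load_dataset("text", data_files={"train": "support_emails.txt"})
    ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1),
        train_dataset=ds["train"],
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()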
I wouldn't trust any model to generate text for customers yet. Not even the largest GPT-3. There are no guarantees on what they will output, and it could be damaging to your business.
You're better off either:
1- Defining common "intents" that a lot of customer queries are categorized into, and having a model map the incoming message to the appropriate canned response. Look at Rasa for an example of this.
2- If you insist on generating the text, have it be a recommendation to a human agent who either chooses to send it or writes their own response.
Larger models aren't really more complicated than smaller ones, though. GPT-2 is already supported, and I believe the only difference with GPT-3 is sparse attention.
Can somebody explain to this beginner how to use this? Where can I load this code and start running it? How can I train it on a dataset and what do I need to prepare?
There's a lot of language here I don't understand. For example, what is he referring to when he says 1.5B or 1T weights?
What resources/videos can I watch in order to start tinkering with this?
Colab is free to use -- you can click Runtime → Run All to run the cells in the notebook free-of-charge. (You may need to be logged in to a Google Account to run it.)
Very cool! Side question, but is there a complete guide for learning PyTorch with Colab?
I tried to learn ML a few years ago but gave up because I couldn't install CUDA on my machine for some reason. The landscape seems to change dramatically.
I am interested in transformers, in particular completing incomplete images like what https://openai.com/blog/image-gpt/ does. Is there a project that implements that and would let me start training?

I'm excited, but I just get overwhelmed as to where I need to focus my attention.
My goal is to utilize something like image-gpt but for a more narrow domain (ex. only dealing with cats), how can I build my knowledge and skills towards that goal?
Many thanks for your answers; I'm really looking forward to learning this stuff.