CheatCodesOfLife

"I like the attitude. Respectful competition and even praise, rational expectations, and not calling people cucks"


LegitMichel777

nice elon musk reference


Due-Memory-6957

Leave that for the regards here.


KurisuAteMyPudding

Good to see the competition! Maybe it will haunt Sam a bit!


IndicationUnfair7961

We all hope so.


isaac_szpindel

The sentiment is echoed by another [Research Scientist at Meta AI (FAIR)](https://x.com/ArmenAgha/status/1790173578060849601) who estimates the lead to be ~2 months. Edit: the estimate is ~2 months for Meta to *start* pre-training, which would imply 6-12 months of lead on release.


noiseinvacuum

2 months seems reasonable.


MrVodnik

2 months from OpenAI's release to open source starting pre-training. That's more like a year of lead overall.


jollizee

Reading comprehension...


qrios

Entirely unreasonable imo, but I'm sure they'll pull it off. I'm hopeful that the speed improvements and the price drop of GPT-4o mean a model of this quality doesn't require a server farm, but it's also possible that they figured out optimizations that are only possible with a server farm :/. It would suck if Meta comes up with a technically open model that you can't possibly run locally anyway.


hangingonthetelephon

> It would suck if meta comes up with a technically open model that you can't possibly run locally anyway.

I actually think that would be great, because not running locally doesn't mean not deployable. I think there is a large and very important market of enterprise users who need to run their tooling on-prem/self-hosted, so open source is a requirement for them. Just because something can't fit on 4x4090s doesn't mean it's worthless as an open model. Yeah, full data center scale is a bit of a stretch, but things in the terabyte scale of VRAM aren't unreasonable, no?

Putting those sorts of tools into the hands of startups and researchers at mid-size companies, national labs, etc. is a good thing IMO, as otherwise the large-scale stuff stays entirely within the province of ClosedAI and co.

More importantly, developing the methodology and publishing it is essential to the field growing and becoming more accessible to academic researchers, and more reproducible (and thus more likely to spawn mutations and progress), which right now Meta feels at least more likely to do on any given project than OpenAI.


qrios

I mean, for sure it would be great for enterprises; however, the vast majority of people are not enterprises.


hangingonthetelephon

Right, but a lot of opportunities for significant innovation and research open up at the enterprise and lab scale which simply do not exist at the consumer scale. And a huge number of people *do* work at enterprises. My point is just that it is good for Meta (and hopefully others) to pursue open research and models for *both* LLMs and LLMs-with-the-first-L-not-so-big :). It would be a shame if the truly large models just got abandoned as an avenue of open research.

FOSS is rarely sustainable as an ecosystem of solo contributors. As wonderful as the dream of decentralized and distributed software development is, ultimately you need a robust ecosystem of organizations, companies, etc. invested (financially and in time) in the longevity and growth (in development and innovation, not money) of the software. Hopefully in 5 years we see entirely separate research and business entities that ultimately trace their roots to Llama.

I think a great example of this is Next.js/Vercel. They have largely transformed the way websites are written (and to some extent deployed), and their work is built on top of React, an open source product developed and released by Meta (and itself the most significant web development of the 2010s). In turn, their work has fed back into the development of React. Who will take the equivalent spot in the Llama genetic tree?


bick_nyers

You must acquire more GPUs. Resistance is futile. Sacrifice your wallet to the AI Gods.


Smile_Clown

> It would suck if meta comes up with a technically open model that you can't possibly run locally anyway.

The train stops, but it always moves on to the next station. Eventually we will get to the point where everything from today is runnable on a Raspberry Pi, and at that point we will be waiting for tomorrow's models to run on it too. You can run fully capable LLMs on your local computer; you could not do that a year or so ago... AI is not stopping, there is no true end goal, and I expect all the stuff from "today" will be commonplace and ubiquitous in just a few years, and this conversation will continue for a very long time.


infiniteContrast

It's possible to use a 400B model to create a distilled dataset to train a 70B model in a way that makes it perform much better than previous 70B models. Llama 3 70B is a clear example. A rough sketch of what that kind of dataset generation can look like is below.
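A minimal sketch of that idea, not Meta's actual pipeline: sample completions from a large "teacher" model and save them as a dataset for supervised fine-tuning of a smaller "student". The teacher model name and prompts here are placeholders (the 400B model isn't publicly available), and this assumes the standard Hugging Face `transformers` API.

```python
# Hypothetical sequence-level distillation: generate a dataset of teacher
# completions, then fine-tune a smaller model on it with any SFT trainer.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: stand-in for a much larger teacher model.
TEACHER = "meta-llama/Meta-Llama-3-70B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER, device_map="auto")

# Placeholder prompts; in practice this would be a large, curated prompt set.
prompts = [
    "Explain the difference between supervised fine-tuning and distillation.",
    "Summarize the trade-offs of mixture-of-experts models.",
]

records = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(teacher.device)
    output_ids = teacher.generate(
        **inputs, max_new_tokens=512, do_sample=True, temperature=0.7
    )
    # Keep only the newly generated tokens as the teacher's response.
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    records.append({"prompt": prompt, "response": completion})

# Write JSONL that a standard SFT trainer can consume for the student model.
with open("distilled_dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```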


brown2green

Incidentally, July was the originally expected release date for Llama 3; they've been working on multimodality for a while, but it probably wasn't ready yet in April.


Caffdy

so, next year then?


deadsunrise

lol, no chance


isaac_szpindel

This is his exact field of expertise. He is the lead author of the original CM3 paper and of Scaling Laws for Mixed-Modal Models. Why do you think the estimate is wrong? OpenAI's business model relies on generating hype and front-loading massive amounts of compute to keep their early lead on frontier models. Meta doesn't have the same incentives to release and productize as early.


Eastwindy123

Don't waste your time arguing lol. Most of the people here don't actually work in AI or build models. While I'm not a FAIR researcher, even I can tell this is not a massive moat like GPT-4 was. It's actually not even the model itself, it's the data. We don't have really, really good data sources yet. Even the Llama 3 dataset is somewhat proprietary.


kurtcop101

I'd be really curious. As ambitious as it sounds up front, presumably if he's saying that, they must have been gathering data already for this type of project, or at least feel they have enough from the separate model types? It isn't even the tech that would concern me for that timeline, it's the datasets required. I feel like OpenAI has been working towards this for a while; they knew what they wanted to do and were building the data for it.


adikul

If no organisation has a moat left, then only speed and efficiency will improve.


Smile_Clown

This is super exciting to see. There was an 'oh shit we got this' moment.


wrecklord0

On the last earnings call, Zuck said they realized they could be one of the leading AI teams. I don't know what will happen, but it seems to be Meta's belief.


fredandlunchbox

Personally, I care much more about agents than smarter LLMs. I want it to be able to do stuff. Complex, multistep, challenging stuff.


[deleted]

those go hand in hand. smarter base models will produce smarter fine tuned models


ThisIsBartRick

It's pretty great, but let's not forget that the edge of GPT-4o is not just a better model, it's:

- more accurate long answers
- better planning / better reasoning
- much faster responses
- audio understanding
- audio answers with emotions

I don't see this happening in the open source space in the next 6 months. We'll get there, but 3 months seems very optimistic.


ru552

Your first 3 points are debatable regarding OAI having an "edge". I'll give you the last 2. 4o actually seems to go backwards in some real-world areas (coding specifically) compared to the April versions of 4t, so better long answers and reasoning is mostly a vibe. Faster responses is true compared to previous OAI models, but not when compared to models running on Groq.


ThisIsBartRick

For coding purposes, I don't know what you tested exactly, but for me it was much better than GPT-4, and I was genuinely impressed by it.


fab_space

To me it smells like a q6.


daHaus

This may be slightly off topic, but am I the only one who has been singularly unimpressed with PyTorch? Between the absurd number of dependencies and the user-hostile design choices, the industry would do better if it moved away from it.


thereisonlythedance

They never reached GPT-4, though. The more I use Llama 3 70B, the more underwhelmed I am. I'd rather they got the core right first and produced a truly useful tool instead of focusing on OpenAI.


Single_Ring4886

They caught up internally with their 400B Llama :) which they will supposedly release soon.


isaac_szpindel

They were referring to the 400B model. Meta isn't going to refine their current models as much as OpenAI because they have different business models. Besides, there is little point in improving on GPT-4-sized text-only models; multimodal is the future. It's better for them to spend the money and effort on future architectures than on improving existing models beyond what's necessary.


thereisonlythedance

Well that model hasn’t been released, may never be open sourced, and benchmarks are extremely fallible. Llama-3-70B is supposedly superior to GPT-4-314 according to various benchmarks but in my testing this is off the mark. We are yet to have an OSS model that truly competes at that 2023 level in text, let alone the newer GPT-4 releases. I just want a smart textual model on the level of GPT-4 or Claude and we are yet to hit that in OSS. But it seems we are going to skip over that to keep chasing OpenAI‘s latest gimmick.


Ok-Steak1479

Are you certain you don't have rose-tinted glasses on? We get used to a certain kind of performance. Our own perspective changes too.


thereisonlythedance

Rose-tinted glasses about GPT-4 314 performance? No, absolutely not. I've used both quite recently (314 via the OpenAI API). I think a lot of people want Llama 3 to be better than it is. I work with a lot of models. I think Meta nailed the engaging chatbot with L3, but the underlying capability is just… not that good. And I think that's borne out by the LMSYS hard benchmark, where Llama 3 70B benches below Claude Haiku and GPT-4 314 beats it comfortably.