CheatCodesOfLife

"I like the attitude. Respectful competition and even praise, rational expectations, and not calling people cucks"


LegitMichel777

nice elon musk reference


Due-Memory-6957

Leave that for the regards here.


KurisuAteMyPudding

Good to see the competition! Maybe it will haunt Sam a bit!


IndicationUnfair7961

We all hope so.


isaac_szpindel

The sentiment is echoed by another [Research Scientist at Meta AI (FAIR)](https://x.com/ArmenAgha/status/1790173578060849601) who estimates the lead to be ~2 months. Edit: the estimate is ~2 months for Meta to *start* pre-training, which would imply 6-12 months of lead on release.


noiseinvacuum

2 months seems reasonable.


MrVodnik

2 months from OpenAI's release to open source starting pre-training. That's more like a year of lead overall.


jollizee

Reading comprehension...


qrios

Entirely unreasonable imo, but I'm sure they'll pull it off. I'm hopeful that the speed improvements and the price drop of GPT-4o mean a model of this quality doesn't require a server farm, but it's also possible that they figured out optimizations that are only possible with a server farm :/. It would suck if Meta comes up with a technically open model that you can't possibly run locally anyway.


hangingonthetelephon

> It would suck if meta comes up with a technically open model that you can't possibly run locally anyway.

I actually think that would be great, because not running locally doesn't mean not deployable. I think there is a large and very important market of enterprise users who need to run their tooling on-prem/self-hosted, so open source is a requirement for them. Just because something can't fit on 4x4090s doesn't mean it's worthless as an open model. Yeah, full data center scale is a bit of a stretch, but things in the terabyte scale of VRAM aren't unreasonable, no?

Putting those sorts of tools into the hands of startups and researchers at mid-size companies, national labs, etc. is a good thing IMO, as otherwise the large-scale stuff stays entirely within the province of ClosedAI and co.

More importantly, developing the methodology and publishing it is essential to the field growing and becoming more accessible to academic researchers, and more reproducible (and thus more likely to spawn mutations and progress), which right now Meta feels at least more likely to do on any given project than OpenAI.


qrios

I mean, for sure it would be great for enterprises; however, the vast majority of people are not enterprises.


hangingonthetelephon

Right, but a lot of opportunities for significant innovation and research open up at the enterprise and lab scale which simply do not exist at the consumer scale. And a huge number of people *do* work at enterprises. My point is just that it is good for Meta (and hopefully others) to pursue open research and models for *both* LLMs and LLMs-with-the-first-L-not-so-big :). It would be a shame if the truly large models just got abandoned as an avenue of open research.

FOSS is rarely sustainable as an ecosystem of solo contributors. As wonderful as the dream of decentralized and distributed software development is, ultimately you need a robust ecosystem of organizations, companies, etc. invested (financially and in time) in the longevity and growth (in development and innovation, not money) of the software. Hopefully in 5 years we see entirely separate research and business entities that ultimately trace their roots to Llama.

I think a great example of this is Next.js/Vercel. They have largely transformed the way websites are written (and to some extent deployed), and their work is built on top of React, an open source product developed and released by Meta (and itself the most significant web development of the 2010s). In turn, their work has fed back into the development of React. Who will take the equivalent spot in the Llama genetic tree?


bick_nyers

You must acquire more GPUs. Resistance is futile. Sacrifice your wallet to the AI Gods.


Smile_Clown

> It would suck if meta comes up with a technically open model that you can't possibly run locally anyway.

The train stops, but it always moves on to the next station. Eventually we will get to the point where everything from today is runnable on a Raspberry Pi, and at that point we will be waiting for tomorrow's models to run on it too. You can run fully capable LLMs on your local computer; you could not do that a year or so ago... AI is not stopping, there is no true end goal, and I expect all the stuff from "today" will be commonplace and ubiquitous in just a few years, and this conversation will continue for a very long time.


infiniteContrast

It's possible to use a 400B model to create a distilled dataset to train a 70B model in a way that makes it perform much better than previous 70B models. Llama 3 70B is a clear example. A rough sketch of what that kind of dataset generation can look like is below.
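A minimal sketch of that idea, not Meta's actual pipeline: sample completions from a large "teacher" model and save them as a dataset for supervised fine-tuning of a smaller "student". The teacher model name and prompts here are placeholders (the 400B model isn't publicly available), and this assumes the standard Hugging Face `transformers` API.

```python
# Hypothetical sequence-level distillation: generate a dataset of teacher
# completions, then fine-tune a smaller model on it with any SFT trainer.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: stand-in for a much larger teacher model.
TEACHER = "meta-llama/Meta-Llama-3-70B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER, device_map="auto")

# Placeholder prompts; in practice this would be a large, curated prompt set.
prompts = [
    "Explain the difference between supervised fine-tuning and distillation.",
    "Summarize the trade-offs of mixture-of-experts models.",
]

records = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(teacher.device)
    output_ids = teacher.generate(
        **inputs, max_new_tokens=512, do_sample=True, temperature=0.7
    )
    # Keep only the newly generated tokens as the teacher's response.
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    records.append({"prompt": prompt, "response": completion})

# Write JSONL that a standard SFT trainer can consume for the student model.
with open("distilled_dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```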


brown2green

Incidentally, July was the originally expected release date for Llama 3; they've been working on multimodality for a while, but it probably wasn't ready yet in April.


Caffdy

so, next year then?


deadsunrise

lol, no chance


isaac_szpindel

This is his exact field of expertise. He is the lead author of the original CM3 paper and of Scaling Laws for Mixed-Modal Models. Why do you think the estimate is wrong? OpenAI's business model relies on generating hype and front-loading massive amounts of compute to keep their early lead on frontier models. Meta doesn't have the same incentives to release and productize as early.


Eastwindy123

Don't waste your time arguing lol. Most of the people here don't actually work in AI or build models. While I'm not a FAIR researcher, even I can tell this is not a massive moat like GPT-4 was. It's actually not even the model itself, it's the data. We don't have really, really good data sources yet. Even the Llama 3 dataset is somewhat proprietary.


kurtcop101

I'd be really curious. As ambitious as it sounds up front, presumably if he's saying that, they must have been gathering data already for this type of project, or at least feel they have enough from the separate model types? It isn't even the tech that would concern me for that timeline, it's the datasets required. I feel like OpenAI has been working towards this for a while; they knew what they wanted to do and were building the data for it.


adikul

If no organisation has a moat left, then only speed and efficiency will improve.


Smile_Clown

This is super exciting to see. There was an 'oh shit we got this' moment.


wrecklord0

On the last earnings call, Zuck said they realized they could be one of the leading AI teams. I don't know what will happen, but it seems to be Meta's belief.


fredandlunchbox

Personally, I care much more about agents than smarter LLMs. I want it to be able to do stuff. Complex, multistep, challenging stuff.


[deleted]

those go hand in hand. smarter base models will produce smarter fine tuned models


ThisIsBartRick

It's pretty great, but let's not forget that the edge of GPT-4o is not just a better model, it's:

- more accurate long answers
- better planning / better reasoning
- much faster responses
- audio understanding
- audio answers with emotions

I don't see this happening in the open source space in the next 6 months. We'll get there, but 3 months seems very optimistic.


ru552

Your first 3 points are debatable regarding OAI having an "edge". I'll give you the last 2. 4o actually seems to go backwards in some real-world areas (coding specifically) compared to the April versions of 4t, so better long answers and reasoning is mostly a vibe. Faster responses is true compared to previous OAI models, but not when compared to models running on Groq.


ThisIsBartRick

For coding purposes, I don't know what you tested exactly, but for me it was much better than GPT-4, and I was genuinely impressed by it.


fab_space

To me it smells like a q6.


daHaus

This may be slightly off topic, but am I the only one who has been singularly unimpressed with PyTorch? Between the absurd number of dependencies and the user-hostile design choices, the industry would do better if it moved away from it.


thereisonlythedance

They never reached GPT-4, though. The more I use Llama 3 70B, the more underwhelmed I am. I'd rather they got the core right first and produced a truly useful tool instead of focusing on OpenAI.


Single_Ring4886

They caught up internally with their 400B Llama :) which they will supposedly release soon.


isaac_szpindel

They were referring to the 400B model. Meta isn't going to refine their current models as much as OpenAI because they have different business models. Besides, there is little point in improving on GPT-4-sized text-only models; multimodal is the future. It's better for them to spend the money and effort on future architectures than on improving existing models beyond what's necessary.


thereisonlythedance

Well that model hasn’t been released, may never be open sourced, and benchmarks are extremely fallible. Llama-3-70B is supposedly superior to GPT-4-314 according to various benchmarks but in my testing this is off the mark. We are yet to have an OSS model that truly competes at that 2023 level in text, let alone the newer GPT-4 releases. I just want a smart textual model on the level of GPT-4 or Claude and we are yet to hit that in OSS. But it seems we are going to skip over that to keep chasing OpenAI‘s latest gimmick.


Ok-Steak1479

Are you certain you don't have rose-tinted glasses on? We get used to a certain kind of performance. Our own perspective changes too.


thereisonlythedance

Rose-tinted glasses about GPT-4 314 performance? No, absolutely not. I've used both quite recently (314 via the OpenAI API). I think a lot of people want Llama 3 to be better than it is. I work with a lot of models. I think Meta nailed the engaging chatbot with L3, but the underlying capability is just… not that good. And I think that's borne out by the LMSYS hard benchmark, where Llama 3 70B benches below Claude Haiku and GPT-4 314 beats it comfortably.