I think this #FutureLaw 2023 panel on GPT-4 offers a balanced view of the risks and opportunities of using tools like GPT-4. What surprises me is that no one is talking about how soon GPT-4 will be obsolete, replaced by something that improves on it just as significantly. #LegalTech #LawFedi
OpenAI’s CEO confirms the company isn’t training GPT-5 and ‘won’t for some time’

OpenAI’s CEO Sam Altman has confirmed that the company is not currently training GPT-5 — the successor to its language model GPT-4, released this March. Altman was discussing fears about AI safety.

The Verge
@ltmccarty that article just says you can't assume that new versions are better than earlier versions to any stable degree. Granted. But no one is assuming; it is objectively getting significantly better. That they haven't started "training" GPT-5 yet doesn't mean anything. I don't see any evidence of a plateau, and I see lots of evidence to the contrary.

@lexpedite @ltmccarty — Two ways this could play out:

1. OpenAI feels threatened by the many free + open-source competitors (e.g., Eleuther, Dolly-2) and wonders whether the time and expense of training a foundation model from scratch is worth it when they're competing with "free."

2. OpenAI, with Microsoft money, takes a run at improving the existing GPT-4 model incrementally, as they did with the davinci releases of GPT-3, GPT-3.5, etc.

Seems like they're choosing Option 2. Long Microsoft runway

OpenAI’s CEO Says the Age of Giant AI Models Is Already Over

Sam Altman says the research strategy that birthed ChatGPT is played out and future strides in artificial intelligence will require new ideas.

WIRED

@ltmccarty @lexpedite Yes, that's really helpful. Thanks, Thorne.

I wonder if this comes from a lack of high-quality data sources. There are only so many human-created words, and Reddit will only get you so far.

Last bastion of high-quality data: Law? Judicial, statutory, and regulatory text seems like an evergreen source.

@damienriehl @lexpedite

Damien: Here's a question for Fastcase/vLex.

Suppose you trained a model the size of GPT-4, from scratch, on just the Fastcase/vLex database. No Wikipedia, no Reddit, just legal texts, but including Docket Alarm, etc. Everything you have.

How many tokens would that be? How would it compare to the datasets used to train other models? Note that I am talking about pre-training, not fine-tuning.

I am sure you have done the calculations, or soon will.
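
For scale, here is a back-of-envelope sketch of the arithmetic. The 5 TB corpus size and the sample file are placeholder assumptions, not actual Fastcase/vLex figures; only the tokenizer (tiktoken's cl100k_base, used by GPT-4) is real:

```python
# Rough token estimate for pre-training on a legal corpus.
# All corpus numbers below are placeholder assumptions, not vLex figures.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4's tokenizer

# Estimate tokens-per-byte from one representative opinion
# ("sample_opinion.txt" is a hypothetical file).
text = open("sample_opinion.txt", encoding="utf-8").read()
tokens_per_byte = len(enc.encode(text)) / len(text.encode("utf-8"))
# English prose typically lands around 0.22-0.25 tokens per byte.

corpus_bytes = 5e12  # assume ~5 TB of plain text across all sources
print(f"~{corpus_bytes * tokens_per_byte / 1e12:.1f} trillion tokens")

# Published reference points: GPT-3 pre-trained on roughly 0.3T tokens,
# LLaMA on ~1.4T; the Chinchilla result suggests ~20 tokens per
# parameter, so a GPT-4-scale model wants trillions of tokens.
```

If the plain-text database is closer to 1 TB, that works out to a few hundred billion tokens: GPT-3 territory, but well short of what a GPT-4-scale model would want for pre-training.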

@ltmccarty @lexpedite

You should talk to John Nay about what he's planning to build. 😏

@damienriehl @lexpedite

Sorry, I don't know him. Where is he?

@ltmccarty @lexpedite

LLM researcher affiliated with NYU and Stanford:
https://law.stanford.edu/directory/john-nay/

https://arxiv.org/a/nay_j_1.html

He's cooking something that should be good.

John Nay | Stanford Law School

John Nay is an A.I. researcher and the co-founder and CEO of an A.I. technology company, Brooklyn Artificial Intelligence Research (Skopos Labs, Inc.)

Stanford Law School