My initial thoughts after watching the #GPT-4 event and browsing through the "technical report" (technical, really? mostly marketing...):
With a 32k context window, the matrices in the attention layers will be huge. What GPU and memory resources are needed... and how much power/carbon footprint?
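A quick back-of-envelope sketch of what "huge" means here. Note that OpenAI does not disclose GPT-4's architecture, so the head and layer counts below are hypothetical (GPT-3-sized) assumptions, and real implementations use tricks like chunked/flash attention to avoid materializing the full matrix:

```python
# Back-of-envelope: memory for one raw n x n attention score matrix.
# GPT-4's architecture is undisclosed; heads/layers below are assumed,
# GPT-3-like numbers, not real GPT-4 figures.

def attention_score_bytes(seq_len: int, bytes_per_elem: int = 2) -> int:
    """Size in bytes of one n x n attention matrix (fp16 = 2 bytes/elem)."""
    return seq_len * seq_len * bytes_per_elem

n = 32_768  # the advertised 32k context window
one_head = attention_score_bytes(n)
print(f"one head, one layer: {one_head / 2**30:.1f} GiB")  # 2.0 GiB

heads, layers = 96, 96  # assumption (GPT-3-sized), not disclosed for GPT-4
total = one_head * heads * layers
print(f"{heads} heads x {layers} layers: {total / 2**40:.1f} TiB")
```

Even at fp16, a single 32k x 32k score matrix is 2 GiB per head per layer, which is why the naive quadratic attention cost is the concern here.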

I think this model will be outdated before it is even released.

But I am not sure anyone else is training a model on this amount of data, so I guess we are stuck with it for a while...

#ml #openai

I really liked the new concept of a "system message". I think we can adopt this term + in-context learning... and get rid of the term "prompt engineering" (and all its variants).
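To make the distinction concrete, here is a minimal sketch using the role/content message convention from OpenAI's chat API. No API call is made; the content strings are made-up examples, and the point is just how a system message and in-context examples replace one engineered prompt string:

```python
# A "system message" sets behavior up front, as its own role,
# separate from the user's actual question.
messages = [
    {"role": "system",
     "content": "You are a terse assistant that answers in one sentence."},
    {"role": "user",
     "content": "What does a 32k context window mean?"},
]

# In-context learning: prepend worked examples as prior turns
# instead of cramming everything into one clever prompt.
few_shot = [
    {"role": "user", "content": "2 + 2"},
    {"role": "assistant", "content": "4"},
]
messages = messages[:1] + few_shot + messages[1:]

print([m["role"] for m in messages])
# ['system', 'user', 'assistant', 'user']
```

Together, "system message" + "in-context learning" describe the same work "prompt engineering" does, with clearer names for each part.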

Anyone agree?

Or is it just me who dislikes the awkward user experience called prompting...