#llm This is a follow-up to my last transcription experiment post. I did a complete writeup for those who want all the details (and to see all the code and output, some of which is hilariously bad).
I really believed that for a “simple” task like this, especially since the open models score so well on eval benchmarks lately, if gpt-3.5 could do it, one of the open models could as well, but more testing has disabused me of that notion. Only one of the largest 70B models came close, and even it hallucinated output for the actual transcript.
In the end, custom chunking code driving gpt-3.5-turbo-16k probably produced the best output. I also applied similar code to the Claude 2 API, which gave good output as well. (Claude 2 has a 100K context, but it turns out that with the developer access I have, it kills calls at exactly 300s, and I’d need about twice that to finish the task.)
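For anyone curious what the chunking looks like, here’s a minimal sketch of the general approach (function name and parameters are my illustration here, not the actual code from the writeup): split the transcript into overlapping pieces small enough to leave room in the context window for the prompt and the response, preferring to cut at sentence boundaries.

```python
def chunk_text(text: str, max_chars: int = 40_000, overlap: int = 500) -> list[str]:
    """Split text into overlapping chunks that fit a model's context window.

    Character count is a rough proxy for tokens (~4 chars/token for
    English), so max_chars=40_000 targets roughly 10K tokens, leaving
    headroom in a 16K-token window for the prompt and the completion.
    The overlap gives the model some shared context across chunk seams.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to break at a sentence boundary near the end of the chunk.
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Back up by `overlap` chars, but always make forward progress.
        start = max(end - overlap, start + 1)
    return chunks
```

Each chunk then gets sent to the model in its own API call, and the per-chunk outputs are stitched back together afterward.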
