This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
How is the time sharing handled? I assume if I submit a unit of work it will load to VRAM and then run (sharing time? how many work units can run in parallel?)
How large is a full context window in MiB and how long does it take to load the buffer? I.e. how many seconds should I expect my worst case wait time to take until I get my first token?