Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint
https://modal.com/blog/truly-serverless-gpus
#HackerNews #CuttingInference #ColdStarts #LP #FUSE #CUDACheckpoint #ServerlessGPUs