We run [JarvisLabs](https://jarvislabs.ai/), a developer-first GPU cloud platform. When we brought up our new Noida region, GPU instance launch time was around 8 seconds. We spent 3 days profiling the launch path and got it down to about **~1.8s**. At that speed, launches feel near-instant and are among the fastest in the industry, and we're working to push it below **1 second**.

*Posting this because the biggest wins were much simpler than we expected.*

This matters more now because people are starting to run lots of short-lived GPU jobs through agents. Karpathy's [autoresearch](https://github.com/karpathy/autoresearch) is a simple example: an agent can run many experiments in a loop, and instance startup time adds up quickly.

Once we added millisecond-level timing, the rough breakdown looked like this:

* networking setup (including ARP): ~3500ms
* storage creation: ~1500ms
* SSH overhead: ~930ms
* container launch: ~670ms
* database work: ~500ms
* proxy + other: ~400ms

The biggest fixes were pretty straightforward once we could actually see the timings. Rough sketches of each change are at the end of the post.

**1. Networking**

A big chunk of the delay was a blocking `ping -c 2 -W 2` during network setup. We only needed to trigger the ARP update, not wait for the reply. Changing that alone took total launch time from about 8s to about 5s.

**2. SSH**

A single launch was making 12 to 15 separate SSH calls to the GPU server for storage, networking, and container operations. We first batched those calls, then replaced the whole path with a worker service running on each GPU server, so those operations now go through internal API calls instead of SSH. SSH is now gone from the create, pause, and destroy paths.

**3. Container startup**

Container launch was spending time probing GPU devices on every start. Our GPU servers are stable, so that work was unnecessary. We switched to a faster runtime path and used static device specs instead of probing every time.

**4. Storage**

Creating and formatting a fresh volume during launch was expensive. We now keep a pool of pre-created, pre-formatted volumes ready, so launch just picks one and mounts it.

**5. Smaller cleanup**

We also reduced DB commits from 4-5 down to 1, moved billing off the critical path, and removed a few blocking log and database steps.

End result:

* launch time: ~8s -> ~1.8s
* SSH calls per launch: 12-15 -> 0
* DB commits per launch: 4-5 -> 1

These changes are live in our Noida region now, and we're still working to get launch under 1 second. We're also building a CLI tool and an agent skill so agents can interact with instances more cleanly: spin them up, run experiments, and tear them down seamlessly.

Full writeup with more details: [https://docs.jarvislabs.ai/blog/gpu-instance-launch-4x-faster](https://docs.jarvislabs.ai/blog/gpu-instance-launch-4x-faster)

Happy to answer questions if anyone wants details!
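For anyone who wants a concrete picture, here are rough sketches of the five changes. None of this is our production code; names, ports, and paths are illustrative.

**Sketch 1: Networking.** The blocking `ping -c 2 -W 2` existed only to refresh the ARP entry, and the outgoing packet alone is enough for that; you don't need to wait for a reply. A minimal fire-and-forget version:

```python
import subprocess

def trigger_arp(instance_ip: str) -> None:
    """Fire one packet to refresh the ARP/neighbor entry without blocking.

    The old path ran `ping -c 2 -W 2 <ip>` and waited up to several
    seconds for replies; sending the packet is all that's needed.
    """
    subprocess.Popen(  # deliberately not wait()-ed; launch moves on
        ["ping", "-c", "1", "-W", "1", instance_ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```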
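**Sketch 2: SSH.** The shape of the worker-service change, from the orchestrator's side, assuming a hypothetical on-host worker that exposes an HTTP endpoint (the port and route here are made up):

```python
import requests

WORKER_PORT = 8700  # illustrative; whatever port the on-host worker listens on

def launch_instance(host: str, spec: dict) -> dict:
    # One internal API call replaces 12-15 sequential `ssh host '...'`
    # round-trips for storage, networking, and container setup.
    resp = requests.post(
        f"http://{host}:{WORKER_PORT}/v1/instances",
        json=spec,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```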
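**Sketch 3: Container startup.** The general idea behind "static device specs instead of probing" is to enumerate GPUs once and reuse the result, since the inventory on a host doesn't change between launches. (On the NVIDIA stack, CDI spec files generated once with `nvidia-ctk cdi generate` are a standard way to get static device specs.) An illustrative sketch of the caching idea, not our actual runtime path:

```python
import functools
import subprocess

@functools.lru_cache(maxsize=1)
def gpu_devices() -> list[str]:
    # Probe once per orchestrator lifetime instead of on every container
    # start; subsequent launches reuse the cached inventory.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```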
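**Sketch 4: Storage.** The pool pattern: a background job creates and formats volumes ahead of time, and the launch path just claims one. A sketch using loopback image files as a stand-in for real volumes (the pool location is illustrative):

```python
import os
import subprocess
from pathlib import Path

POOL = Path("/var/lib/volume-pool")  # illustrative location

def replenish_pool(size_gb: int = 100) -> None:
    # Background job, off the critical path: pre-create and pre-format.
    vol = POOL / f"vol-{os.urandom(4).hex()}.img"
    subprocess.run(["truncate", "-s", f"{size_gb}G", str(vol)], check=True)
    subprocess.run(["mkfs.ext4", "-q", str(vol)], check=True)

def claim_volume() -> Path:
    # Launch path: rename is atomic, so two concurrent launches can't
    # claim the same volume.
    for vol in POOL.glob("vol-*.img"):
        claimed = vol.with_suffix(".claimed")
        try:
            vol.rename(claimed)
            return claimed
        except FileNotFoundError:
            continue  # another launch claimed it first; try the next one
    raise RuntimeError("volume pool empty; fall back to the slow path")
```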
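**Sketch 5: DB commits.** Going from 4-5 commits to 1 is mostly a matter of wrapping the launch writes in a single transaction. A sketch with `sqlite3` as a stand-in for the real database (table names are made up):

```python
import sqlite3

def record_launch(conn: sqlite3.Connection, instance_id: str, host: str) -> None:
    # `with conn:` runs one transaction: commits once on success,
    # rolls back on exception. One commit instead of 4-5.
    with conn:
        conn.execute(
            "INSERT INTO instances (id, host, state) VALUES (?, ?, 'running')",
            (instance_id, host),
        )
        conn.execute(
            "UPDATE hosts SET running = running + 1 WHERE name = ?",
            (host,),
        )
        conn.execute(
            "INSERT INTO audit_log (instance_id, event) VALUES (?, 'launched')",
            (instance_id,),
        )
```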