Users need a way to run inference without data traversing the public internet, especially when the data resides in a VPC on another cloud. Private connectivity would improve security and remove the manual authentication and networking configuration that callers currently have to build themselves, which today is a real security concern.
I've been trying out a few AI inference platforms, and Baseten was one of them. I think it's genuinely good: Truss is clean, the playground is solid, and I was able to spin up models in a few minutes. If your job is "serve this model fast," you won't be unhappy.

Then I tried to use it for an actual workflow, something like "read from storage, trigger on an event, run a pipeline, store the result," to mimic real production AI. A few things stood out.

First, models are exposed as public HTTPS endpoints. That means if your data lives in a VPC on another cloud, every inference call goes out over the public internet, which is a genuine security concern. To work around it you have to configure auth and networking manually, which is yet another layer of complexity you're adding.

Async inference was interesting too. Baseten supports it, but doesn't store the output. They say that's by design, but it comes with a tradeoff, because webhooks fail all the time. How does not storing the results help them? I'd genuinely like to understand that part.

I think Baseten is great at what it's built for: serving models with great ergonomics. But if I'm hosting a model on one platform and need other services to talk to it, I have to expose it over an HTTPS endpoint and hand-write the auth and networking, perhaps with my VPC on one platform and my database on another. At that point we're making AI infra more complex than it needs to be. Platforms where you don't need to reinvent these middle layers will probably win, because you avoid the infra complexity, the billing-forecast surprises, and the bugs.

What do you think? What does your AI stack look like?
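P.S. To make the "manual auth and networking" point concrete, here's a minimal sketch of the glue a caller ends up writing when the model is only reachable as a public HTTPS endpoint. The URL and the `Api-Key` header format here are placeholders I made up for illustration, not any platform's actual API; the point is that the secret and the public-internet hop now live with every caller.

```python
import urllib.request

def build_inference_request(endpoint: str, api_key: str, body: bytes) -> urllib.request.Request:
    """Hand-rolled auth glue: every service that wants inference has to
    carry the API key and call out over the public internet, even if it
    sits in a private VPC right next to the data."""
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Api-Key {api_key}",  # secret distributed to every caller
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical endpoint; in a private-link setup this hop wouldn't exist.
req = build_inference_request(
    "https://model-xyz.example.com/predict",
    "SECRET_KEY",
    b'{"prompt": "hello"}',
)
```

With private connectivity between the VPCs, none of this request-signing plumbing would need to be reinvented per caller.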
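And on the async point: if the platform doesn't store the output, the webhook receiver becomes the only place the result exists, so it has to persist the payload durably *before* acknowledging, and be idempotent because webhook deliveries get retried. A rough sketch of that, assuming a hypothetical payload shape with a `request_id` field (not Baseten's actual schema):

```python
import json
import sqlite3

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    # Stand-in for whatever durable store you'd actually use.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "request_id TEXT PRIMARY KEY, payload TEXT)"
    )
    return db

def handle_webhook(db: sqlite3.Connection, payload: dict) -> int:
    """Persist first, then ack. INSERT OR IGNORE on the primary key makes
    retried deliveries a no-op instead of a duplicate row."""
    db.execute(
        "INSERT OR IGNORE INTO results (request_id, payload) VALUES (?, ?)",
        (payload["request_id"], json.dumps(payload)),
    )
    db.commit()
    return 200  # only ack once the result is durably stored

db = init_db()
status = handle_webhook(db, {"request_id": "req-123", "output": "hi"})
# a retried delivery of the same webhook doesn't create a second row
handle_webhook(db, {"request_id": "req-123", "output": "hi"})
```

That persistence-plus-idempotency layer is exactly the kind of middle layer I'd rather not have to build myself.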