Batched inference is supported by Triton; we just need to add support for it to the proxy. I have written code to do this independently here: https://moyix.net/~moyix/batch_codegen_full.py ; I just need to integrate that into [the proxy code](https://github.com/moyix/fauxpilot/blob/main/copilot_proxy/utils/codegen.py).
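For reference, the core of batching is collecting several tokenized prompts into one padded tensor plus a per-row length vector, so the proxy can send a single batched request instead of one request per prompt. The sketch below is illustrative only, not the actual `codegen.py` API; the pad token id and function names are assumptions.

```python
import numpy as np

# Illustrative pad token id (GPT-style EOS); the real value depends on the tokenizer.
PAD_ID = 50256

def batch_prompts(token_lists):
    """Right-pad variable-length token sequences into one batch.

    Returns (input_ids, input_lengths): input_ids has shape
    (batch, max_len); input_lengths records each prompt's true
    length so the model can ignore the padding.
    """
    max_len = max(len(toks) for toks in token_lists)
    input_ids = np.full((len(token_lists), max_len), PAD_ID, dtype=np.int64)
    input_lengths = np.zeros((len(token_lists), 1), dtype=np.int64)
    for i, toks in enumerate(token_lists):
        input_ids[i, :len(toks)] = toks
        input_lengths[i, 0] = len(toks)
    return input_ids, input_lengths

ids, lens = batch_prompts([[1, 2, 3], [4, 5]])
print(ids.shape)  # (2, 3)
```

In the real proxy, these two arrays would become the batched inputs of a single Triton inference request, with per-prompt results sliced back out of the batched output.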