Implement the ability to configure a remote GPU endpoint for running embedding models, which would be more efficient than using local CPU resources.
Running embedding models on the local CPU is inefficient. Being able to configure a remote GPU endpoint (we could support both `v1/chat/completions` and `v1/embeddings`) would be very convenient. Where can I add the options? @tobi

```
node@3f218953d795:/app$ qmd status
QMD Status
  Index: /home/node/.cache/qmd/index.sqlite
  Size: 3.7 MB

Documents
  Total: 68 files indexed
  Vectors: 118 embedded
  Updated: 15m ago

Collections
  local (qmd://local/)
    Pattern: **/*.md
    Files: 48 (updated 15m ago)
  openclaw-engram (qmd://openclaw-engram/)
    Pattern: **/*.md
    Files: 0 (updated never)

Examples
  # List files in a collection
  qmd ls local

  # Get a document
  qmd get qmd://local/path/to/file.md

  # Search within a collection
  qmd search "query" -c local

Models
  Embedding: https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF
  Reranking: https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF
  Generation: https://huggingface.co/tobil/qmd-query-expa
```
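For reference, here is a minimal sketch of what calling a remote OpenAI-compatible `v1/embeddings` endpoint could look like. This is illustrative only, not qmd's actual API: the function names, the config shape, and the assumption of an OpenAI-style response (a `data` array of `{index, embedding}` objects) are all mine.

```python
import json
import urllib.request


def parse_embeddings(body):
    """Extract one vector per input text from an OpenAI-style
    embeddings response, restoring the original input order."""
    items = sorted(body["data"], key=lambda d: d["index"])
    return [item["embedding"] for item in items]


def embed_remote(texts, base_url, model, api_key=None):
    """POST a batch of texts to a remote (hypothetical) GPU endpoint
    exposing the OpenAI-compatible /v1/embeddings route."""
    payload = json.dumps({"model": model, "input": texts}).encode()
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/embeddings",
        data=payload,
        headers=headers,
    )
    with urllib.request.urlopen(req) as resp:
        return parse_embeddings(json.load(resp))
```

A config option could then be as simple as an endpoint URL plus model name, falling back to the local GGUF models listed under "Models" above when no remote endpoint is set.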