Enable users to easily deploy multiple vLLM instances on a single machine with multiple GPUs, sharding the work across replicated instances for offline use cases.
### 🚀 The feature, motivation and pitch

It is common to have a scenario where folks want to deploy multiple vLLM instances on a single machine because the machine has several GPUs (commonly 8). The work can then be sharded across replicated instances. This issue describes the intended UX for such a feature. Notably, we might not want to tackle large distributed settings (100s of parallel vLLM instances), which are better handled by higher layers.

* Offline use case: for the `LLM` class, add a new argument `data_parallel_size` and support dispatching requests to one engine per GPU (or per tensor-parallel group).

```python
from vllm import LLM

llm = LLM(model="...", data_parallel_size=X)
# spawn X engine processes and shard the work among them

llm = LLM(model="...", data_parallel_size=X, tensor_parallel_size=Y)
# this is supported if X * Y <= total number of GPUs
```

For the server, the same argument would route requests to different engine processes; we can start with s
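To make the proposed semantics concrete, here is a minimal sketch of how requests could be sharded across replicas and how each replica could be pinned to its own group of GPUs. The helper names (`shard_requests`, `gpu_ids_for_replica`) are hypothetical illustrations, not part of the vLLM API; the assumption is simple round-robin dispatch and contiguous GPU assignment per replica.

```python
from typing import Dict, List


def shard_requests(prompts: List[str], data_parallel_size: int) -> Dict[int, List[str]]:
    """Round-robin prompts across data-parallel engine replicas (hypothetical helper)."""
    shards: Dict[int, List[str]] = {rank: [] for rank in range(data_parallel_size)}
    for i, prompt in enumerate(prompts):
        shards[i % data_parallel_size].append(prompt)
    return shards


def gpu_ids_for_replica(rank: int, tensor_parallel_size: int) -> str:
    """CUDA_VISIBLE_DEVICES value for one replica, assuming each replica
    owns a contiguous group of tensor_parallel_size GPUs."""
    start = rank * tensor_parallel_size
    return ",".join(str(g) for g in range(start, start + tensor_parallel_size))


if __name__ == "__main__":
    # e.g. data_parallel_size=4, tensor_parallel_size=2 on an 8-GPU machine
    prompts = [f"prompt-{i}" for i in range(10)]
    for rank, shard in shard_requests(prompts, data_parallel_size=4).items():
        print(rank, gpu_ids_for_replica(rank, tensor_parallel_size=2), shard)
```

In a real implementation each rank would set `CUDA_VISIBLE_DEVICES` before spawning its engine process, so every replica sees only its own GPUs and the `X * Y <= total GPUs` constraint falls out naturally.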