A user expressed interest in using multiple models on their local server for coding projects, suggesting that having multiple models could be more effective than relying on one large model. They seek advice on this feature.
I decided to build my own local server to host cause I do a lot of coding on my spare time and for my job. For those who have similar systems or experienced, I wanted to ask with a 96GB vram + 96Gb ram on a am5 platform and i have the 4 gpus running at gen 4 x4 speeds and each pair of rtx 3090 are nvlinked, what kind of LLMs can I use to for claude code replacement. Im fine to provide the model with tools and skills as well. Also was wondering if mulitple models on the system would be better than 1 huge model? Be happy to hear your thoughts thanks. Just to cover those who fret about the power issues on this, Im from an Asian country so my home can manage the power requirement for the system.