Implement SPMD (Single Program, Multiple Data) style distributed message passing to enable GNN training on large-scale graphs that exceed the memory capacity of a single GPU.
### 🚀 The feature, motivation and pitch

Hi everyone,

We're currently extending our [GFM-RAG](https://github.com/RManLuo/gfm-rag) model to support reasoning over large-scale graphs and would appreciate your insights.

## Motivation

The existing message-passing framework used in GNNs runs only on a single local GPU, so it cannot generalize to large-scale graphs due to GPU memory constraints. Existing distributed (multi-GPU) GNN training frameworks (e.g., [PyG](https://pytorch-geometric.readthedocs.io/en/latest/tutorial/distributed.html), [DGL](https://www.dgl.ai/dgl_docs/stochastic_training/multigpu_node_classification.html)) focus on node-based subgraph partitioning strategies and on learning unconditioned node embeddings. This may not work well for some advanced GNN-based reasoners (e.g., [NBFNet](https://github.com/DeepGraphLearning/NBFNet), [ULTRA](https://github.com/DeepGraphLearning/ULTRA/), and [GFM-RAG](https://github.com/RManLuo/gfm-rag)), as they require pro
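To make the SPMD idea concrete, here is a minimal, hypothetical sketch (pure Python, no real `torch.distributed` calls; all function names are illustrative, not an existing PyG/DGL API). Each simulated rank owns a partition of the edges, computes a partial sum-aggregation for every destination node, and an all-reduce sums the partials, so the result matches single-GPU message passing regardless of how the edges are split:

```python
def partition_edges(edges, world_size):
    """Split the edge list across ranks (edge partitioning, round-robin)."""
    return [edges[r::world_size] for r in range(world_size)]

def local_aggregate(num_nodes, node_feat, local_edges):
    """Each rank sums incoming messages only over the edges it owns."""
    out = [0.0] * num_nodes
    for src, dst in local_edges:
        out[dst] += node_feat[src]  # message = source feature, aggregation = sum
    return out

def all_reduce_sum(partials):
    """Stand-in for dist.all_reduce(op=SUM): element-wise sum across ranks."""
    return [sum(vals) for vals in zip(*partials)]

def spmd_message_passing(num_nodes, node_feat, edges, world_size):
    """One SPMD message-passing step: partition, local aggregate, all-reduce."""
    parts = partition_edges(edges, world_size)
    partials = [local_aggregate(num_nodes, node_feat, p) for p in parts]
    return all_reduce_sum(partials)

edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
feat = [1.0, 2.0, 3.0]
# Sum aggregation is associative, so the result is independent of world_size.
print(spmd_message_passing(3, feat, edges, world_size=2))  # [3.0, 1.0, 3.0]
```

Because sum aggregation is associative and commutative, the partial results from each rank can be combined in any order, which is what makes this partitioning scheme compatible with an all-reduce collective.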