Request for detailed information about the training schemes, datasets, preprocessing steps, model architectures, and fine-tuning techniques used for ChatLaw2 models.
I am currently exploring the ChatLaw models and have a few questions regarding their training schemes and roles within the ensemble model.

1. Could you please provide detailed information about the training schemes used for ChatLaw2_plain and ChatLaw2E_plain? Specifically, I am interested in the datasets, preprocessing steps, model architectures, and any fine-tuning techniques applied.
2. Additionally, I would like to understand the roles that ChatLaw2_plain and ChatLaw2E_plain play within the ChatLaw2_MOE (Mixture of Experts) model. How do these models interact and contribute to the overall performance of ChatLaw2_MOE?

Thank you in advance for your assistance. I look forward to your response.