How do I define a SalesGPT agent that uses a locally pulled Hugging Face model? I am trying to use Llama-2-7b-chat-hf as the base model, but I can't load the model directly because of my current GPU resources. I followed Llama's docs to shrink it with 4-bit quantization, as shown below:

```python
import transformers
from torch import cuda, bfloat16

model_id = 'meta-llama/Llama-2-7b-chat-hf'
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# 4-bit NF4 quantization so the 7B model fits in limited GPU memory
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

hf_auth = '<YOUR_HF_TOKEN>'  # Hugging Face access token (redacted)

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
```

Then I wrapped
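One way to connect the quantized model to SalesGPT is to expose it as a `transformers` text-generation pipeline, wrap that in LangChain's `HuggingFacePipeline` LLM, and pass the result to `SalesGPT.from_llm`, which accepts a LangChain-compatible LLM. The sketch below assumes the `salesgpt` package (filip-michalsky/SalesGPT) and LangChain are installed, reuses `model`, `model_id`, and `hf_auth` from the snippet above, and the import paths and generation parameters are illustrative; exact module locations vary across versions, so this is a sketch, not a verified recipe:

```python
import transformers
from langchain_community.llms import HuggingFacePipeline  # path may differ by LangChain version
from salesgpt.agents import SalesGPT  # assumed import path from the SalesGPT repo

# Tokenizer for the same checkpoint (uses the same auth token as the model)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth,
)

# Text-generation pipeline over the already-quantized, already-placed model;
# generation parameters here are illustrative assumptions
generate = transformers.pipeline(
    task='text-generation',
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    max_new_tokens=512,
    repetition_penalty=1.1,
)

# LangChain LLM wrapper around the local pipeline
llm = HuggingFacePipeline(pipeline=generate)

# Hand the local LLM to SalesGPT instead of an OpenAI model
sales_agent = SalesGPT.from_llm(llm, verbose=False)
sales_agent.seed_agent()   # initialize the conversation state
sales_agent.step()         # generate the agent's first utterance
```

Because everything downstream of the `HuggingFacePipeline` wrapper only sees a generic LangChain LLM, no SalesGPT-side changes should be needed; the quantized model stays on the GPU via `device_map='auto'`.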