Technical requirements
Before diving into BLIP-2 and LLaVA, let’s use Stable Diffusion to generate an image for testing.
First, load the deliberate-v2 model without sending it to CUDA:
import torch
from diffusers import StableDiffusionPipeline

text2img_pipe = StableDiffusionPipeline.from_pretrained(
    "stablediffusionapi/deliberate-v2",
    torch_dtype = torch.float16
)
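At this point, the weights live in CPU RAM. If you want to confirm that nothing has been moved to the GPU yet, a quick check such as the following works (assuming the standard diffusers layout, where the pipeline exposes its UNet as text2img_pipe.unet):

# Both should report CPU placement and float16, since we have
# not called .to("cuda") yet
print(text2img_pipe.unet.device)    # expected: cpu
print(text2img_pipe.unet.dtype)     # expected: torch.float16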
Next, in the following code, we first send the model to CUDA and generate an image; we then offload the model to CPU RAM and clear it from CUDA memory:
text2img_pipe.to("cuda:0")
prompt ="high resolution, a photograph of an astronaut riding a horse"
input_image = text2img_pipe(
    prompt = prompt,
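    # a fixed seed makes the generated test image reproducible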
    generator = torch.Generator("cuda:0").manual_seed(100),
    height = 512,
    width = 768
).images[0]
text2img_pipe.to("cpu...