Merging images with the API

I am trying to merge 2 images in some sort of way.
I have looked at all of _J’s posts, and nothing has worked for me.
The 3 things I need to have working:

  • Both images are local PNG files.
  • I would prefer chatgpt-4o or o3 to be the model I call first, for better text understanding.
  • I would like the image model to be gpt-image-1, not DALL-E, again because of quality difference.

Now to the problems:
I get a tool error (function not defined!) even though the image-generation tool should work!

```python
from openai import OpenAI

client = OpenAI()

# encode_image, img1, img2 and prompt are defined elsewhere in my script
base64_image1 = encode_image(img1)
base64_image2 = encode_image(img2)
response = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image1}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image2}"},
                },
            ],
        }
    ],
    modalities=["image"],
    # This is the part that fails: Chat Completions rejects this tool
    tools=[{"type": "image_generation", "quality": "high"}],
)
```

> I would like the image model to be gpt-image-1

Your image model is not “gpt-image-1”; you’re showing “chatgpt-4o-latest”. I don’t think image_generation is a tool…

Look here: https://platform.openai.com/docs/api-reference/images/createEdit

Yes! I just had to update my OpenAI package; after that, images.edit accepted multiple images again!
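A minimal sketch of the Images API route that worked here, assuming a recent `openai` Python package (where `images.edit` accepts a list of files) and an `OPENAI_API_KEY` environment variable. The helper names `merge_with_images_api` and `save_b64_png`, the output file name, and the default `size`/`quality` values are illustrative choices, not anything from the thread. gpt-image-1 returns base64 data (`b64_json`) rather than a URL, so the result is decoded and written to disk.

```python
import base64
from contextlib import ExitStack
from pathlib import Path


def save_b64_png(b64_data, out_path):
    """Decode base64 image data and write it to out_path; return the Path."""
    path = Path(out_path)
    path.write_bytes(base64.b64decode(b64_data))
    return path


def merge_with_images_api(paths, prompt, out_path="merged.png",
                          size="1024x1024", quality="high"):
    """Send several local PNGs to the Images API edit endpoint (hypothetical helper)."""
    # Imported here so save_b64_png above works even without the SDK installed
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with ExitStack() as stack:
        # Open every input image; ExitStack closes them all, even on error
        files = [stack.enter_context(open(p, "rb")) for p in paths]
        result = client.images.edit(
            model="gpt-image-1",
            image=files,      # a list of images needs a recent openai package
            prompt=prompt,
            size=size,        # parameters are set directly, no chat model in between
            quality=quality,
        )
    # gpt-image-1 returns base64-encoded image data, not a URL
    return save_b64_png(result.data[0].b64_json, out_path)
```

Usage would look like `merge_with_images_api(["img1.png", "img2.png"], "Combine these two images into one scene")`, which writes `merged.png`. Every byte sent to the model is visible in the call, which is the main appeal of this endpoint for a single-shot merge.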

The Chat Completions API doesn’t support the image generation tool.

For that, you will need either to use the Responses API or the Images API.

If you need multi-turn editing, the Responses API is recommended. If it is just a single request, the Images API should be enough.
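For the multi-turn route, here is a sketch of a Responses API call with the built-in `image_generation` tool, which (as far as I know) uses gpt-image-1 behind the scenes. Assumptions: `gpt-4.1` is only a placeholder for a model that supports the tool (check whether chatgpt-4o or o3 is accepted for your account), and the helper names are hypothetical. Note that the content part types differ from Chat Completions: `input_text` and `input_image` instead of `text` and `image_url`.

```python
import base64
from pathlib import Path


def png_data_url(path):
    """Encode a local PNG file as a base64 data URL."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:image/png;base64,{b64}"


def build_input(prompt, image_paths):
    """Build one user message: the text prompt followed by each image."""
    content = [{"type": "input_text", "text": prompt}]
    for p in image_paths:
        content.append({"type": "input_image", "image_url": png_data_url(p)})
    return [{"role": "user", "content": content}]


def merge_with_responses_api(prompt, image_paths, model="gpt-4.1",
                             previous_response_id=None):
    """Ask a chat model to call the image_generation tool (hypothetical helper)."""
    # Imported here so the payload helpers above need only the standard library
    from openai import OpenAI

    client = OpenAI()
    kwargs = {}
    if previous_response_id:
        # Continue an earlier turn instead of starting a new conversation
        kwargs["previous_response_id"] = previous_response_id
    response = client.responses.create(
        model=model,  # assumption: any model that supports the tool
        input=build_input(prompt, image_paths),
        tools=[{"type": "image_generation", "quality": "high"}],
        **kwargs,
    )
    # Each generated image arrives as an "image_generation_call" output item
    # whose .result holds base64-encoded image data
    images = [item.result for item in response.output
              if item.type == "image_generation_call"]
    return images, response.id
```

The returned `response.id` can be passed back as `previous_response_id` to ask for a follow-up edit, and each entry in `images` can be decoded with `base64.b64decode`.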


The fault, using internal tools on the wrong endpoint, lies in the documentation: you must select either “Chat Completions” or “Responses” in a drop-down at the upper right, and even then, the sections of documentation that are copied and modified for each endpoint, and that switch on you, may not be ideal.

You do not need better understanding from an interloper AI in front of a tool to make images. With gpt-image-1 as a tool on the Responses API endpoint, that AI has little impact anyway: the image model behind the tool is given significant chat context, including many images from the chat (which you also pay vision costs for), and past images beyond your target also become inputs to the tool, adding cost. It doesn’t run solely on AI-written prompting.

  • Do you want to chat and occasionally ask for an image? or
  • Do you want to develop an image tool that specializes in delivering the best results?

The latter, using the edits endpoint, is more focused: you can observe exactly what is sent as input to get your result, and you can specify parameters directly. Chat only amplifies your cost per image.

Using the gpt-image-1 model in any capacity requires you to complete ID verification, with scans of a government ID and videos of yourself, submitted to an unreliable third party.