That's really strange. I've implemented this a few times, referring back to this episode, and didn't have any issues with its implementation. I did use the Cloudflare Proxy on those apps, so I wonder if that sets something in the headers that could make a difference. It's pretty annoying to test those kinds of things out since the browser does some level of caching. It's usually a dance of having to clear both my local DNS cache and the browser's.
I forgot to mention this, but I've used this methodology with a few different custom models for an expense tracker app that I use for personal finances. I use it to upload receipts, which go through an OCR process that then feeds the receipt text to the LLM. It makes a few calls to determine the date, vendor, category, and total price. So, it cuts down the amount of data entry that I have to do, and all newly uploaded receipts go into a "pending" state until they can be reviewed and "processed".
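For anyone curious, here is a rough sketch of that receipt flow. It is not the actual app code: Python is used only for illustration, and the prompts, the `Receipt` class, and the `ask_llm` callable (standing in for whatever sends a prompt to the model and returns its text reply) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical prompts, one LLM call per field; the real prompts are not shown in the post.
FIELD_PROMPTS = {
    "date": "What is the purchase date (YYYY-MM-DD)?",
    "vendor": "Who is the vendor?",
    "category": "Which expense category fits best?",
    "total": "What is the total price as a plain number?",
}


@dataclass
class Receipt:
    text: str                       # raw OCR output
    date: Optional[str] = None
    vendor: Optional[str] = None
    category: Optional[str] = None
    total: Optional[float] = None
    status: str = "pending"         # new uploads wait here for review


def extract_receipt(ocr_text: str, ask_llm: Callable[[str], str]) -> Receipt:
    """Ask the model one question per field and return a pending receipt."""
    receipt = Receipt(text=ocr_text)
    for field, question in FIELD_PROMPTS.items():
        answer = ask_llm(f"{question}\n\nReceipt text:\n{ocr_text}").strip()
        if field == "total":
            try:
                receipt.total = float(answer.replace("$", "").replace(",", ""))
            except ValueError:
                receipt.total = None  # leave blank for the human reviewer
        else:
            setattr(receipt, field, answer or None)
    return receipt


def mark_processed(receipt: Receipt) -> None:
    """Called after a person has reviewed the extracted values."""
    receipt.status = "processed"
```

Keeping the model behind a plain callable means the extraction step can be exercised with a stub, and anything the model gets wrong is still caught at the manual review step before the receipt leaves "pending".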