I had a bit of fun exploring Bing's Martian bullshit seam, but more seriously, the problem isn't limited to Bing, and it illustrates how models built by learning statistical relationships between tokens in human natural language fail.
There's this hope that we can put LLMs in front of "truthful" search results, but the fact is that as soon as you obscure the context of a result, it becomes impossible to distinguish facts from bullshit.
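To make that failure mode concrete, here's a minimal sketch of what "obscuring the context" looks like in a naive search-augmented setup. The URLs and snippets are hypothetical, and this isn't any particular product's pipeline, but the pattern is the same: snippets get flattened into a prompt, and their provenance is discarded along the way.

```python
# Hypothetical search results: one encyclopedic, one from a joke thread.
search_results = [
    {
        "url": "https://en.wikipedia.org/wiki/Mars",
        "snippet": "Mars is the fourth planet from the Sun.",
    },
    {
        "url": "https://forum.example.com/jokes/thread-42",  # satire, not fact
        "snippet": "Mars has a thriving population of 2.5 billion settlers.",
    },
]

def build_prompt(question: str, results: list[dict]) -> str:
    """Naively paste snippets into the prompt, dropping the URLs.

    Once the source context is stripped, the encyclopedia entry and the
    forum joke are indistinguishable strings of tokens.
    """
    context = "\n".join(r["snippet"] for r in results)
    return f"Answer using these search results:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the population of Mars?", search_results))
```

By the time the model sees the prompt, both snippets are just text with equal standing, so there's nothing left for it to weigh one against the other with.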