@marginalia Several years ago, I ran an experiment in my crawler: I tried using HEAD requests for external links, in the (unfounded) hope that sites that usually block crawlers wouldn't block HEAD requests, since the responses have no body, i.e. no content to protect. After two weeks of running the experiment, I found that those sites block HEAD requests just the same, and worse, that many sites don't support HEAD requests at all. Do you have recent data on the state of HEAD support across the web? Thanks!
@zefu I don't have any statistics, but I agree with the description that HEAD is pretty poorly supported, and you have to be prepared to try again with GET.
I think the most common use case for HEAD is grabbing Last-Modified and ETag, and for dynamically generated pages, that may still entail doing most of the work of rendering the page, e.g. multiple database queries. So in that sense, I do think the spotty HEAD support makes sense, as network bandwidth is very rarely a bottleneck.
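
For what it's worth, the "try HEAD, be prepared to try again with GET" approach could be sketched roughly like this in Python, using only the standard library. The function name `probe` and the set of status codes that trigger the retry are assumptions made for illustration, not something taken from either crawler.

```python
import urllib.error
import urllib.request

# Statuses treated as "HEAD is not usable here, retry with GET".
# This particular set is an assumption; tune it to what you observe.
FALLBACK_STATUSES = {400, 403, 404, 405, 501}


def probe(url, timeout=10.0):
    """Return (method_used, status, headers), trying HEAD before GET."""
    head = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(head, timeout=timeout) as resp:
            return "HEAD", resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, its parent class.
        if err.code not in FALLBACK_STATUSES:
            raise
    except (urllib.error.URLError, ConnectionError, TimeoutError):
        # Some servers drop the connection or hang on HEAD; retry as GET.
        pass

    get = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(get, timeout=timeout) as resp:
        # The body is deliberately not read; only the headers are wanted.
        return "GET", resp.status, resp.headers
```

Something like `method, status, headers = probe(url)` followed by `headers.get("ETag")` and `headers.get("Last-Modified")` would then give the validators regardless of which method succeeded. Note that the fallback costs a second request for every site without working HEAD support.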