How to check if a file exists over HTTP
Here’s a piece of C# code that determines the existence of a file over HTTP, given its URL. (Note that URLs should be encoded.)
try
{
    WebRequest request = WebRequest.Create("http://www.microsoft.com/NonExistantFile.aspx");
    request.Method = "HEAD"; // Just get the document headers, not the data.
    request.Credentials = System.Net.CredentialCache.DefaultCredentials;
    // This may throw a WebException:
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        if (response.StatusCode == HttpStatusCode.OK)
        {
            // If no exception was thrown until now, the file exists and we
            // are allowed to read it.
            MessageBox.Show("The file exists!");
        }
        else
        {
            // Some other HTTP response - probably not good.
            // Check its StatusCode and handle it.
        }
    }
}
catch (WebException ex)
{
    // ex.Response is null when no HTTP response arrived at all (DNS failure,
    // timeout, refused connection), so check for null before reading StatusCode.
    HttpWebResponse webResponse = ex.Response as HttpWebResponse;
    // Determine the cause of the exception, was it 404?
    if (webResponse != null && webResponse.StatusCode == HttpStatusCode.NotFound)
    {
        MessageBox.Show("The file does not exist!");
    }
    else
    {
        // Handle differently...
        MessageBox.Show(ex.Message);
    }
}
As you can see, it’s fairly simple – we use the HttpWebRequest class to perform an HTTP request using the verb HEAD.
What is an HTTP HEAD request?
HEAD is similar to GET, except that instead of the file contents we get just the headers. Caching proxies can do this too: when a proxy server gets a GET request for a URL it has in its local cache, it can send a HEAD request to the real server to determine the requested file’s date/time stamp. If the file on the server is newer than the cached copy, the proxy downloads it, caches it again, and serves the newer version to the client.
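To make the date/time stamp check concrete, here is a minimal sketch of reading the Last-Modified header through a HEAD request and comparing it with a cached timestamp. The class and method names (HeadInfo, GetLastModifiedUtc, IsStale) are made up for illustration, and this is not how any particular proxy is implemented.

```csharp
using System;
using System.Net;

static class HeadInfo
{
    // Hypothetical helper: sends a HEAD request and returns the server's
    // Last-Modified timestamp in UTC. May throw a WebException.
    // Caveat: if the server sends no Last-Modified header, the LastModified
    // property is documented to fall back to the current date and time.
    public static DateTime GetLastModifiedUtc(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD"; // Headers only, no body.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            return response.LastModified.ToUniversalTime();
        }
    }

    // The cached copy is stale when the server's copy is strictly newer.
    public static bool IsStale(DateTime cachedUtc, DateTime serverUtc)
    {
        return serverUtc > cachedUtc;
    }
}
```

A caller would keep the timestamp from the last download and pass it to IsStale together with the value returned by GetLastModifiedUtc.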
So how does this code work?
If the request hits a nonexistent file, the server replies with a 404 status and GetResponse() throws a WebException, so we just need to catch the WebException and deal with it.
However, consider these two important notes:
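1. Don’t assume any WebException indicates a 404 error. Cast the exception’s Response property to HttpWebResponse and check its StatusCode. You might get a 500 Internal Server Error or a 401 Unauthorized response when doing web requests, so it’s important to check the error code. Also note that Response is null when no HTTP response was received at all, for example on a DNS failure or a timeout.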
2. By default, HttpWebRequest follows redirects automatically (its AllowAutoRedirect property is true), and you will not be notified of that. Some sites redirect to a custom 404 page instead of returning a 404 status, and end up giving out an HTTP 200 OK code, since they served the custom 404 page successfully. So check how that website behaves, for example by comparing response.ResponseUri with the URL you requested, or by setting AllowAutoRedirect to false.
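Both notes can be folded into one reusable check. The following is only a sketch under a few assumptions: the FileCheck enum and the HttpFileCheck class are names invented here, and a 200 OK that arrives from a different URL than the one requested is treated as "redirected" rather than "exists", which may or may not suit a given site.

```csharp
using System;
using System.Net;

// Hypothetical result type for this sketch.
enum FileCheck { Exists, NotFound, Redirected, OtherError }

static class HttpFileCheck
{
    // Pure decision logic. status is null when no HTTP response arrived
    // at all (DNS failure, timeout, refused connection).
    public static FileCheck Classify(HttpStatusCode? status, Uri requested, Uri responded)
    {
        if (status == null) return FileCheck.OtherError;
        if (status == HttpStatusCode.NotFound) return FileCheck.NotFound;
        if (status != HttpStatusCode.OK) return FileCheck.OtherError; // 401, 500, ...
        // 200 OK, but served from another URL: possibly a custom 404 page.
        return requested.Equals(responded) ? FileCheck.Exists : FileCheck.Redirected;
    }

    public static FileCheck Check(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD"; // Headers only, no body.
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                // RequestUri is the URL we asked for; ResponseUri is the URL
                // that actually answered after any automatic redirects.
                return Classify(response.StatusCode, request.RequestUri, response.ResponseUri);
            }
        }
        catch (WebException ex)
        {
            // ex.Response may be null, so do not dereference it blindly.
            using (HttpWebResponse error = ex.Response as HttpWebResponse)
            {
                HttpStatusCode? status = error != null ? error.StatusCode : (HttpStatusCode?)null;
                return Classify(status, request.RequestUri, request.RequestUri);
            }
        }
    }
}
```

Calling HttpFileCheck.Check with a URL returns one of the four outcomes instead of showing message boxes, so the caller decides how to report each case. If you set AllowAutoRedirect to false instead, a redirect comes back as a 3xx status, which this sketch would report as OtherError.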
You’re welcome to contribute your own ideas on how to improve this code. This is such a trivial task, yet I found almost no code samples for it online.
Dor Rotman.