This is an upgrade of the code previously posted in
Catching Unwanted Spiders And Content Scraping Bots In ASP.NET.
To use it, add the following line to the code-behind file of the page you've set up to catch unwanted scrapers...
GrokkingCode.ClientTrap.BadClients.Instance.AddClient();
... and add the next line of code to the code-behind of each page you don't want scraped...
GrokkingCode.ClientTrap.BadClients.Instance.TestClient(false);
... if you don't want to compare the UserAgent, or ...
GrokkingCode.ClientTrap.BadClients.Instance.TestClient(true);
... if you do.
Remember to disallow the bot-catching page in your robots.txt file. Legitimate bots honor robots.txt and will stay away from it, so only badly behaved bots will end up there.
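The difference between TestClient(true) and TestClient(false) comes down to a single comparison. The sketch below reproduces that decision without System.Web so it runs as a plain console program. It is an illustration under stated assumptions, not the class itself: TrapSketch, Record and ShouldBlock are hypothetical names, and a plain Dictionary<string, string> stands in for the cached IP-to-UserAgent list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, System.Web-free sketch of the BadClients decision logic.
// A plain Dictionary stands in for the cached IP -> UserAgent list.
public static class TrapSketch
{
    // Mirrors AddClient() on the trap page: keep the first UserAgent seen per IP.
    public static void Record(Dictionary<string, string> list, string ip, string userAgent)
    {
        if (!list.ContainsKey(ip))
        {
            list.Add(ip, userAgent);
        }
    }

    // Mirrors the decision made by TestClient(matchAgent) on a protected page.
    public static bool ShouldBlock(Dictionary<string, string> list, string ip, string userAgent, bool matchAgent)
    {
        string recordedAgent;
        if (!list.TryGetValue(ip, out recordedAgent))
        {
            return false; // this IP never visited the trap page
        }
        // matchAgent == false: the IP alone is enough to block.
        // Static string.Equals tolerates a null UserAgent on either side.
        return !matchAgent || string.Equals(recordedAgent, userAgent);
    }

    public static void Main()
    {
        var list = new Dictionary<string, string>();
        Record(list, "203.0.113.7", "ScraperBot/1.0");

        // The same bot comes back with a different UserAgent.
        Console.WriteLine(ShouldBlock(list, "203.0.113.7", "Mozilla/5.0", true));  // False: slips through
        Console.WriteLine(ShouldBlock(list, "203.0.113.7", "Mozilla/5.0", false)); // True: blocked by IP alone
        Console.WriteLine(ShouldBlock(list, "198.51.100.1", "Mozilla/5.0", false)); // False: never trapped
    }
}
```

With true, a bot that rotates its UserAgent between requests is not blocked. Passing false closes that gap, at the likely cost of also blocking other visitors who share the trapped IP address, for example users behind the same proxy.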
The following code is saved as App_Code/BadClients.cs
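using System;
using System.Web;
using System.Web.Caching;
using System.Collections.Generic;

namespace GrokkingCode.ClientTrap {
    /// <summary>
    /// Handle clients forbidden from accessing site
    /// </summary>
    public sealed class BadClients {
        // The cached Dictionary is shared by all concurrent requests and
        // Dictionary<,> is not thread-safe, so every access goes through this lock.
        private static readonly object syncRoot = new object();

        private HttpContext http = null;
        private Dictionary<string, string> dictBadClients = null;
        private string sUserIP = "";

        // One BadClients object per request, kept in HttpContext.Items.
        public static BadClients Instance {
            get {
                try { return (HttpContext.Current.Items["oBadClients"] ?? (HttpContext.Current.Items["oBadClients"] = new BadClients())) as BadClients; }
                catch (Exception ex) { throw new Exception("Failed to instantiate BadClients.", ex); }
            }
        }

        private BadClients() {
            http = HttpContext.Current;
            sUserIP = http.Request.UserHostAddress;
            lock (syncRoot) {
                // "as" yields null when the entry is missing (never added, expired or evicted).
                dictBadClients = http.Cache["badclients"] as Dictionary<string, string>;
                if (dictBadClients == null) {
                    dictBadClients = new Dictionary<string, string>();
                    // 60-minute sliding expiration: the list is dropped after 60 minutes
                    // in which no request reads it (or earlier under memory pressure).
                    http.Cache.Insert("badclients", dictBadClients, null, Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(60), CacheItemPriority.Normal, null);
                }
            }
        }

        // Call from the trap page: remember this IP and its UserAgent.
        public void AddClient() {
            lock (syncRoot) {
                if (!dictBadClients.ContainsKey(sUserIP)) {
                    dictBadClients.Add(sUserIP, http.Request.UserAgent);
                }
            }
        }

        [Obsolete("You should specify a MatchAgent value. The default of true is being used.")]
        public void TestClient() {
            TestClient(true);
        }

        // Call from protected pages: end the response if this client is on the list.
        public void TestClient(bool MatchAgent) {
            bool block = false;
            lock (syncRoot) {
                string sAgent;
                if (dictBadClients.TryGetValue(sUserIP, out sAgent)) {
                    // Static string.Equals does not throw when a UserAgent is null.
                    block = !MatchAgent || string.Equals(sAgent, http.Request.UserAgent);
                }
            }
            if (block) {
                http.Trace.Write("Attempting to block client from " + sUserIP);
                try { http.Response.Clear(); }
                catch { http.Trace.Write("Could not clear response buffer. I suggest you move the TestClient call to earlier in your code."); }
                // End() stops the page by throwing ThreadAbortException; it is called outside the lock.
                http.Response.End();
            }
        }
    }
}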
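The Instance property hands out one BadClients object per request by storing it in HttpContext.Current.Items the first time it is read. The Items[key] ?? (Items[key] = new ...) idiom can be tried outside ASP.NET. In this sketch a Hashtable stands in for the per-request Items collection (like Items, its indexer returns null for a missing key), and Widget and PerRequest are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for BadClients; counts how many instances get built.
public sealed class Widget
{
    public static int Created;
    public Widget() { Created++; }
}

public static class PerRequest
{
    // Same shape as BadClients.Instance: read the slot, create and store on first use.
    public static Widget Get(Hashtable items)
    {
        // An assignment expression evaluates to the assigned value,
        // so the right-hand side of ?? both stores and returns the new object.
        return (items["oWidget"] ?? (items["oWidget"] = new Widget())) as Widget;
    }

    public static void Main()
    {
        var request1 = new Hashtable(); // stands in for one request's Items
        var request2 = new Hashtable(); // a second, separate request

        Widget a = Get(request1);
        Widget b = Get(request1);
        Widget c = Get(request2);

        Console.WriteLine(ReferenceEquals(a, b)); // True: same request, same object
        Console.WriteLine(ReferenceEquals(a, c)); // False: a new request gets its own
        Console.WriteLine(Widget.Created);        // 2
    }
}
```

Because Items lives only as long as the request, the BadClients object is thrown away afterwards; only the list of bad clients survives between requests, in the cache.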
For ASP.NET 2.0 users, the class is a drop-in replacement for the earlier ASP.NET 1.1-compatible version. I made the following changes...
- TestClient() now has an overload taking a boolean MatchAgent parameter; passing false ignores the UserAgent sent by the visitor's browser. The parameterless TestClient() is kept and passes true, matching the previous version's behavior. In the previous version, a bot could evade the block by sending a different UserAgent with each request. Passing false lets you block by IP address alone.
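- I've extended the cache lifespan to a 60-minute sliding expiration. The whole list is a single cache entry, so it expires only after 60 minutes in which no page reads it; individual IP addresses do not expire separately.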
- I have removed the unused using statements.
- The tests for a web environment have been removed as unnecessary overhead. Even a beginning programmer will quickly realize the class is of no use outside a web application: there HttpContext.Current is null, and Instance throws.
- Reinserting the storage object over itself in the cache has been removed. According to this article on ASP.Net Cache and Session State Storage, the cache stores a reference to the object rather than a copy, so the call to Dictionary.Add() updates the one object that both the cache entry and the dictBadClients variable refer to. The second call to Cache.Insert() was redundant.
- The constructor no longer checks whether the cache entry exists and then calls Cache.Add() if it doesn't. Reading the entry once with the C# "as" operator returns null when the entry isn't there, which reduces the chance of a race condition between site visitors because there is no longer a gap between the check and the read in which the entry could expire. Cache.Insert() works whether or not the object already exists in the cache, replacing any existing entry.
- An [Obsolete] attribute has been added to the parameterless TestClient() overload, so anyone dropping this class in place of the prior version gets a compiler warning to specify whether or not they want to match UserAgent strings.
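The reasoning behind dropping the second Cache.Insert() is ordinary reference semantics: the cache holds a reference to the same Dictionary, not a copy. The sketch below checks that outside ASP.NET, assuming a Hashtable as a stand-in for HttpRuntime.Cache (both indexers return null for a missing key). GetOrCreate is a hypothetical helper that mirrors the constructor's single "as" read followed by an insert only when the entry is missing.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CacheSketch
{
    // Mirrors the BadClients constructor: one read with "as", insert only if missing.
    // The Hashtable is a hypothetical substitute for HttpRuntime.Cache.
    public static Dictionary<string, string> GetOrCreate(Hashtable cache)
    {
        var list = cache["badclients"] as Dictionary<string, string>; // null if absent or wrong type
        if (list == null)
        {
            list = new Dictionary<string, string>();
            cache["badclients"] = list; // add-or-overwrite, like Cache.Insert()
        }
        return list;
    }

    public static void Main()
    {
        var cache = new Hashtable();

        // First request: creates the list, then adds through the local variable only.
        Dictionary<string, string> first = GetOrCreate(cache);
        first.Add("203.0.113.7", "ScraperBot/1.0");

        // Later request: no re-insert happened, yet the cached object has the entry.
        Dictionary<string, string> later = GetOrCreate(cache);
        Console.WriteLine(ReferenceEquals(first, later));     // True
        Console.WriteLine(later.ContainsKey("203.0.113.7")); // True

        // "as" also yields null, rather than throwing, for an entry of the wrong type.
        cache["badclients"] = "not a dictionary";
        Console.WriteLine((cache["badclients"] as Dictionary<string, string>) == null); // True
    }
}
```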