MSNbot: stupid or just plain evil

by Maria on March 19, 2010

Many spiders index the web every day and cause no problems. In fact, we don’t even notice them do their job. But there’s an exception to every rule and among crawlers, MSNBot is the ugly duckling.

MSNBot: Stupid or Evil?

Judge for yourself:

1. Microsoft claims that MSNBot will only request pages once every 10 seconds. In reality, MSNBot could be making 5-10 requests per second. In fact, MSNBot gets so bad at times that it could amount to a DoS attack all by itself! If you don’t believe me, check out what CPAN testers have to say about MSNBot.

2. There have been multiple reports about MSNBot ignoring (or not recognizing?) nofollow and noindex rules. Apparently, it is illiterate in its own language, because it often fails (intentionally or not, I am not sure) to read and follow robots.txt. If you don’t believe me on this one, see what GitHub has to say about their experience with MSNBot.

There has even been a story about MSNBot using a different website’s robots.txt!

3. MSNBot could be crawling your website but not including it in the index. I can’t tell you how many times I have been wondering why. And I am not alone: SEO Chat forum has a thread on this.

4. MSNBot could be crawling your website and removing pages from the index, replacing them with outdated content that should be deleted. Whether it does this because it’s stupid, or as part of the global evil plan, this sucks. Search Engine Roundtable talked about MSNBot crawling fake file names some time ago.

5. MSNBot is not shy and is likely to try to access a private server. Knock-knock, damnit.

6. There have been instances when MSNBot was not recognized as a crawler altogether! Apparently, WebTrends does not like stupid.

7. Slow indexing is another well-known fault, but I’m sure we’d be forgiving of this if there was a solid intention to change for the better. Really, I am not the only person who has tried and failed to report MSNBot problems to Microsoft (see Toolserver Journal’s experience when they tried to make MSNBot better). Indeed, you could try to visit the Bing Webmaster Center troubleshoot link, but it’s unlikely you’ll get anything out of it. You could also email msnbot@microsoft.com, only to get a "user unknown" bounce.

In my experience with MSNBot I have gone from thinking that it’s stupid to believing that it is intentionally evil. And the lack of desire from Microsoft to fix this is really something they should be ashamed of. These issues are causing good, reputable websites to elect to block MSNBot, block it for good. Really, Microsoft, do something before we all give up on it.
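
If you want to check point 1 against your own server, here is a minimal Python sketch that tallies how many requests a crawler makes in each one-second window of an access log. It assumes an Apache/nginx "combined" log format, where the timestamp sits in square brackets and the user agent is the last quoted field; the function names and the "msnbot" substring match are illustrative choices, not anything Microsoft documents.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e. a [timestamp] field and the
# user agent as the final double-quoted field on the line.
LINE_RE = re.compile(r'\[([^\]]+)\].*"([^"]*)"\s*$')


def requests_per_second(lines, agent_substring="msnbot"):
    """Count requests per one-second timestamp for matching user agents."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not look like log entries
        timestamp, user_agent = match.groups()
        if agent_substring.lower() in user_agent.lower():
            counts[timestamp] += 1
    return counts


def peak_rate(lines, agent_substring="msnbot"):
    """Return the highest number of matching requests seen in one second."""
    counts = requests_per_second(lines, agent_substring)
    return max(counts.values(), default=0)
```

Calling peak_rate(open("access.log")) on your own log (the file name is hypothetical) gives the worst one-second burst. A crawler that really waited 10 seconds between requests would never score above 1.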

{ 1 trackback }

MSN / Bing crawler spider madness. « Computer Solutions Blog
May 18, 2012 at 7:21 am

{ 13 comments }

GirlyGirl May 25, 2010 at 12:19 pm

I’ve been sending links to this site to MS. I wish they would fix MSNbot, but they seem to spend their time making Bing pretty instead of working better. We need to get on their case.

u64 November 17, 2010 at 7:08 am

LOL, that evil monkey-bot may be a result of Ballmer’s
developers-developers-developers-developers jump-around
rant.

Just had a visit from it on my home webserver
msnbot 207.46.204.223
is now on my PeerBlock blacklist.

David April 5, 2011 at 4:14 am

Got here after I was looking for info on MSNbot after I found out it sucked 7.8GB (!!!) by ONE of their bots (msnbot-157-55-116-79.search.msn.com)!
I’m adding their IPs to the .htaccess file, but isn’t there any other elegant way?
And no – they are not obeying my robots.txt, which is set to disallow all bots.

Jack May 20, 2011 at 4:05 pm

msnbot triggered a DoS attack alert on my server: it had over 100 TCP connections open over a range of IP addresses, so I blocked their range with APF. Sorry MSN, you are not worth the headache.

Deepak June 2, 2011 at 4:37 pm

We are having the same issue as mentioned by @David and @Jack. In the last week msnbots have brought our servers down twice. We have blocked all the IP ranges of msnbot now!!!

Nick June 16, 2011 at 11:46 am

Urgh, and the headache with msn-bot starts again.

207.46.204.240 has brought the server to a halt twice in 2 days.

Ray June 26, 2011 at 4:50 pm

MSNbot has been hammering our site for the past few days, hitting our max connections limit the entire weekend and killing performance on the server. It doesn’t make any sense to me why they would attack the server. We set up some additional error logs, and after further digging we found that the bot was stuck constantly hitting a URL that was maxing out. Hopefully this is an error on our side, because traffic is traffic and these days you want as much as you can get.

Nick June 29, 2011 at 10:12 am

Yep, MSN brought server to standstill today. Not a happy bunny.

Kenny G. Adams July 2, 2011 at 2:35 pm

I just got 2,347 hits from msnbot-207-46-xxx-xxx.search.msn.com within 7 minutes! Because of the high number of requests I’m having to block them. Not many servers can handle that many page loads and I’m sure they know that.

Lawrence May 18, 2012 at 8:00 am

Seems they’re still doing this 2 years later, as I had to block them for exactly the issues noted above:

- multiple simultaneous connections
- tons of bandwidth being used
- ignoring robots.txt

In other words – being typical Microsoft..

My notes / logs on that here –
http://www.computersolutions.cn/blog/2012/05/msn-bing-crawler-spider-madness/

Trevor December 10, 2012 at 3:04 am

Hi Lawrence,

Can you please explain how you blocked them if the msnbot ignored your robots.txt file.
Cheers.

Trevor

WebDev June 13, 2013 at 12:11 am

Yesterday msnbot-media flooded our servers. 20 competitive connections whole day!!!
How stupid is this???

Az October 5, 2013 at 5:23 am

“Yesterday msnbot-media flooded our servers. 20 competitive connections whole day”

Check this out:
# netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -n 10 | grep -v 127.0.0.1
953 65.52.104.109
225 157.56.93.60
202 157.55.34.94
182 157.55.32.86
167 157.55.34.32
122 65.52.104.141
100 157.55.35.117

ROFLMFAO
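
For anyone who wants to reproduce the tally above in a script, the following is a rough Python counterpart to that netstat pipeline. It is only a sketch: it assumes Linux-style `netstat -ntu` output, where connection rows start with tcp or udp and the fifth column is the remote address, and the function name top_remote_ips is made up for illustration.

```python
from collections import Counter


def top_remote_ips(netstat_output, limit=10, skip=("127.0.0.1",)):
    """Return (ip, connection_count) pairs, busiest remote address first."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with the protocol name; skip header lines.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        # Fifth column is "address:port"; drop the port.
        remote_ip = fields[4].rsplit(":", 1)[0]
        if remote_ip in skip:
            continue
        counts[remote_ip] += 1
    return counts.most_common(limit)
```

Unlike the shell one-liner, this drops 127.0.0.1 before taking the top entries rather than after, so the list is not shortened when localhost happens to rank in the top ten.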
