This is a guest post by Michael Castello, CEO and President of Castello Cities Internet Network, Inc. (CCIN), who is solely responsible for its contents.
CCIN owns, manages and develops some of the most recognized Geo and generic domain name brands in the world, including PalmSprings.com, Nashville.com, Manicure.com and Traveler.com. He is also the owner of Daycare.com, which he and his wife Sheri founded in 1997.
In 2014 Michael sold Whisky.com for $3.1 million, the 24th largest domain name sale in history.
Michael has spoken internationally at many conferences, including TRAFFIC, the Borrell Advertising Conference, the GEO Domain Expo and the Internet Marketing & Domaining Conference in Punta del Este, Uruguay, among others.
He was inducted into the Targeted TRAFFIC Hall of Fame in 2009 and the Geo Domain Hall of Fame in 2010.
Michael served on the Board of Directors of both Associated Cities and Geo Publishers and has been an active member of ICANN’s Business Users Constituency since 2009.
Here is the Guest Post:
As a child, I did not like spiders. I even have recordings that my father made when I was 2 years old where I said, “I don’t like SPIDERS!” Now, I think I know why.
Block the Bullies is an effort to level the playing field between domain owners and search engines.
In my opinion, search engines are hogging the bandwidth that website owners pay for.
I pay a lot of money to have a large “pipe” for my website servers, so more people can use Nashville.com or Manicure.com without waiting a long time for pages to appear.
However, search engine spiders swarm through this “pipe” constantly, restricting the bandwidth that I want my visitors to use. This is a kind of Denial of Service (DoS) attack. I may be losing customers if a visitor waits too long, leaves, and goes to a competitor. That competitor may be a large corporation which can spend millions of dollars more on a much larger pipe.
What are our options?
Well, we want search engines to find our content because many visitors find us via search. That’s fair enough – but I only want my virtual house to be invaded when I open the door and let visitors in.
Shouldn’t I have the right to manage my front door?
Block the Bullies is an effort that empowers domain owners to start using their server’s robots.txt file.
It should be guaranteed by a virtual Bill of Rights.
The file can tell search engines to look only at what we want them to look at, or go away entirely.
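For instance, a minimal robots.txt might look something like this (the paths here are only placeholders; list whatever parts of your site you want crawlers to skip):
User-agent: *
Disallow: /admin/
Disallow: /private/
That tells every compliant spider to stay out of those two directories while leaving the rest of the site crawlable.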
Google is the 800 lb gorilla in the room. Google built most of its power and money by putting your content on its pages for people to search. Then it started making billions by letting competitors pay top dollar to sit at the top of the search results. I think Google owes us more respect.
Try putting /robots.txt after any website address, e.g. http://www.cnn.com/robots.txt
This is a line I’d like to see more of:
User-agent: Googlebot
Disallow: /
It tells Google to go away.
Some might think that is commercial suicide, and while I agree, there is no reason why we can’t do this symbolically once a year to make a statement that WE are still in control.
I would take it further: search spiders should read the robots.txt file to tell them how often and at what time they can visit our websites.
I want them to come at 2am daily, when business is slow.
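Today robots.txt has no standard way to say “come at 2am,” but some crawlers (Bing and Yandex, for example; Google ignores it and relies on its Webmaster Tools setting) honor a non-standard Crawl-delay line that at least spaces out their requests, along these lines:
User-agent: bingbot
Crawl-delay: 10
which asks that bot to wait roughly ten seconds between requests.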
Let’s use the tools available to us to gain more leverage.
The more of us who engage in this movement, the more profitable we can make ourselves in a future that looks less and less friendly to the individual entrepreneur who assumes the risks of running a business.
I invite everyone who owns a domain name to join us at TRAFFIC Vegas this month to talk about our future, and how we can make ourselves more self-sufficient and successful in the virtual world.
Thanks for your time,
Michael Castello
You can read about Michael’s Call To Action program here
You can read about the sale of Whisky.com here
John McCormac says
The real problem for large content sites isn’t from the Google/Microsoft etc. spiders but rather from content scrapers. A lot of the issues mentioned can be dealt with in the robots.txt file, and it is possible to change the rate at which Google spiders a site in its Webmaster Tools interface.
Domenclature.com says
Google does NOT obey the robots.txt file.
I’ve been trying to ban them on all my sites now for months. My hosts say that it is voluntary on the search engine’s part. WHAT???? Voluntary? I’m PAYING for this GODDAMN server!
Look, I support Castello 100% on this. This is important. But I must caution you: all the SPIDERS do on websites isn’t just soak up bandwidth, they are there to do serious damage, period. And they have been. I want serious, comprehensive action on this; I want Washington, D.C. alerted and involved in the investigation.
Domenclature.com says
Let me clarify that my 100% support is just for this issue of Google and other search engines intrusively masquerading on my websites without a commensurate gain. In fact, I don’t want any search engines at all on my websites. I will acquire clients and customers from TV ads. It should not be voluntary for search engines to get access to my sites. Hosts are apparently stupid lap-dogs for Google, else they would effectuate technology to ban their asses when their client, who pays their bills, insists on kicking Google out. The whole internet backbone has got to grow a pair! Why let one entity dominate everything and make billions, while the rest just go along with whatever it wants because it’s big?
I don’t support Castello’s general proposal as it is now at Ricksblog.
citcitsac says
Thanks for the article.
DNPric.es says
Had to disable AhrefsBot in the past as they were sending thousands of requests.
See the poetry here: http://dnpric.es/robots.txt
Then our coders implemented a smart firewall that kicks the DDoSers back to their localhost for a long time, to make them think twice.
John McCormac says
Some of the other content scrapers won’t use robots.txt. A more effective way on Linux servers is to add the offending IP address to iptables and just drop incoming packets.
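For illustration only (the addresses below are documentation placeholders, not real scrapers):
# drop all packets from one offending address
iptables -A INPUT -s 192.0.2.50 -j DROP
# or drop an entire range
iptables -A INPUT -s 192.0.2.0/24 -j DROP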
Domenclature.com says
@John McCormac
It’s impossible to ban all the IPs Google uses to visit a site; they use Vietnam, Eastern European countries, Iran, Syria, China and many other unfriendly countries’ IPs to visit a site when they are out to do nefarious things on it.
John McCormac says
It is often not Google that causes the problems. Most Google spiders come from the 66.249.64.0 – 66.249.95.255 IP range, but the user agent (Googlebot) is one of the most spoofed user agents around, and many of these nefarious content scrapers will spoof it so that the activity appears to come from Google. Because many webservers don’t have reverse DNS switched on, all that is recorded in the webserver log is the IP, the user agent, the page requested and the referring page. It is also possible to block most activity from an entire country by IP range, either in the webserver configuration (httpd.conf on Apache) or at an IP level at the firewall.
A lot of the content scraper activity comes from data centres and the Amazon cloud. It is possible to block these ranges, and one of the best resources is the Search Engine Spider and User Agent Identification forum on Webmaster World ( http://www.webmasterworld.com/search_engine_spiders/ ).
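As a rough sketch using the older Apache 2.2-style directives (the range below is just a placeholder; Apache 2.4 uses Require directives instead), a block in httpd.conf or .htaccess could look like:
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24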
Francois Carrillo says
— WARNING—
Just a note to warn everyone not to add this snippet of code to their robots.txt, as (maybe too quickly) suggested by Michael:
User-agent: Googlebot
Disallow: /
I am not an SEO expert, but from what I recall (I stopped playing with SEO in 2006), a single day with this posted is enough to have all of your site’s indexed pages removed from the SERPs, and afterwards it will take months or years to be reindexed!
So I am not sure Google would be the one penalized there; quite the inverse:
You can simply run out of business if your site largely depends on Google organic traffic (like most do).
Maybe an SEO expert can confirm?
So if you are not happy with the Google gorilla, I recommend not following this suggestion.
Michael, who is very active in trying to defend domainers’ rights, will probably find a smarter way to have domainers’ voice heard by Google.
Thanks, Michael, for giving your time and money to try to have the domaining industry respected.
Louise says
@ Michael Castello
Thanks, people, for your suggestions, and @ Francois, for your feedback! My host agreed Googlebot doesn’t cause issues. But Bingbot does!
Support at my host kindly gave his permission to share his suggestion. The conversation went as follows.
Here is the sample .htaccess code:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} .*bingbot/.* [NC]
RewriteRule ^.* robots.txt [L]
Of course, you can block several search engine bots this way:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} .*Exabot/.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*AhrefsBot/.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*bingbot/.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*YandexBot/.*
RewriteRule ^.* robots.txt [L]
All of the above rules will redirect the listed search engine bots to your robots.txt file, saving server resources and traffic.
Louise says
Pretty cool, eh? That is from support at my host, Fluidhosting.com. The latter code and comments are his words from the support ticket, quoted.