Archive for search engine spiders
You are browsing the archives of search engine spiders.
UPDATE: Editors’ Note: At the request of Google, we’ve removed the photo of Google engineer Jayant Madhavan, co-author (with Alon Halevy) of the Google Webmaster Central blog post, Crawling through HTML forms, posted by Maile Ohye, Senior Support Engineer at Google. The photo was deleted at Google’s request to respect the privacy of Google’s corporate data and the personal privacy of Jayant Madhavan.
– Kevin Heisler, Executive Editor, Search Engine Watch

A few hours ago, Google announced to the world that the company has been crawling forms on “high-quality” Web sites to index “Invisible Web” content in the Google.com search engine.
Google’s intention, as always, is to improve the quality of search results for users of its search engine.
Crawling Web site forms, though, constitutes a sea change in terms of data privacy; specifically, the privacy of corporate data.
“In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn’t find and index for users who search on Google,” according to Jayant Madhavan and Alon Halevy, from the Crawling and Indexing Team on an official Google blog.
Here’s how Googlebot does it, according to Google engineers:
“We might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.”
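The process the engineers describe can be sketched roughly as follows. This is a minimal illustration only, not Google’s code: the function name candidate_form_urls, the example.com address, the field names and the max_queries cap are all assumptions made for the example, and it covers only forms submitted with GET.

```python
from itertools import islice, product
from urllib.parse import urlencode, urljoin


def candidate_form_urls(page_url, action, fields, site_words, max_queries=5):
    """Build a small number of GET URLs that a form submission could produce.

    `fields` maps each input name to a list of its allowed values (select
    menus, check boxes, radio buttons) or to None for a free-text box.
    All names here are hypothetical, chosen for illustration.
    """
    names = list(fields)
    # Text boxes draw on words found on the site; other inputs use their own values.
    choices = [site_words if fields[name] is None else fields[name] for name in names]
    base = urljoin(page_url, action)
    urls = []
    # Only try a handful of value combinations, not every possible one.
    for combo in islice(product(*choices), max_queries):
        urls.append(base + "?" + urlencode(list(zip(names, combo))))
    return urls


urls = candidate_form_urls(
    "http://example.com/catalog/",   # hypothetical page that holds the form
    "search",                        # hypothetical form action
    {"q": None, "category": ["books", "music"]},
    site_words=["spiders", "forms"],
    max_queries=3,
)
for url in urls:
    print(url)
```

Each URL produced this way looks like an ordinary link to a crawler, which is why a results page behind a search form can end up in an index even though no page links to it.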
Last year, as the search marketing analyst for JupiterResearch, I said that the biggest issue in 2007 would be the threat to the privacy of corporate data.
I was wrong. 2008 is the year corporate IT departments worldwide will be forced to spend time, money and resources to ensure that search engine spiders do not inadvertently index data a company would prefer to keep private.
The same holds true for non-profit organizations and other institutions.
From a personal standpoint, I have confidence in Google’s data security systems, despite the recent departure of Google CIO, Doug Merrill.
I have full confidence that Google practices “good Internet citizenship.”
I’m confident Google has paved the road to relevance with good intentions.
This is not simply a “pioneering move” by Google.
That the robotic filling-in of forms has already been practiced by AOL’s Quigo, according to SearchEngineLand, does not reassure me.
I’m sorry, Sergey, Larry, Eric. I can’t in good conscience defend Google’s decision to our readers. The costs to CEOs, CIOs and CTOs at corporations far outweigh the benefits to consumers.
Please, reconsider.
Do not make the robotic querying of Web site forms the default spidering practice for Google. As a search engine, Google has become the gateway to the Internet and with great power comes great responsibility.
End this experiment now.
Stop this experiment before the backlash against Google develops. It’s not a question you want to answer when Wall St. analysts quiz you on the company’s performance on April 17th during the First Quarter earnings conference call.
Want a snapshot of the day’s search marketing news? Here we’ve collected today’s top news stories posted to the Search Engine Watch Blog, along with search-related headlines from around the Web:
From the SEW Blog:
FTC approves Google’s acquisition of DoubleClick
The FTC’s investigation focused on antitrust issues, and in its clearance opinion released today, explicitly rejected any [...]
Huge databases that generate Web site content on the fly can be the bane of search engine spiders’ existence. They can’t find pages; they can’t see URLs. So they can’t index pages. In a two-part SearchDay series, “Search Engine Visibility and Site Crawlability, Part 1,” and “Search Engine Visibility and Site Crawlability, Part 2,” Eric [...]
The Flash Player has been installed on millions of PCs worldwide, making it an attractive way for web developers to present content to their site visitors.
As much as site publishers want to present the best possible presence to their visitors, they equally want to stay in the good graces of search engine spiders, and Google’s [...]
If you have a website you really need to have a robots.txt file. It gives search engine spiders…
the easy guide to making a robotstxt file
Search engine spiders love new content, so they visit press release sites, article submission services and blogs frequently. Placing a link to a website in the signature block of press releases, blogs and articles will get the link crawled by search engine spiders more quickly than submitting…
write your way to more [...]
Smart search engine optimization starts with looking at web pages with one eye closely watching the search engine spiders. Once the search engine optimizer is capable of doing this, he or she is …
how search engines rank web pages
In the early days of the internet and search engines, Meta Tags had real power to rank a website at the top of the search engine results. Meta Tags refer to the keyword-rich text inserted into an HTML page. It is visible to search engine spiders yet remains hidden from anyone viewing the page in a browser. [...]
It’s true, backlinks almost control the internet. To prove it, put up a site and don’t promote it, and you will soon find that your site gets almost zero visitors and, worst of all, no one knows your site even exists. Search engine marketers know this, and as a result many services have [...]
A description of how text in an H1-size font can be part of a good SEO strategy, as search engine spiders consider keywords in that font to be more important. If you want to search engine optimize your web page keywords using fonts then you might want to consider using the web design software [...]