Limiting the power of the Googlebots

A coalition of online publishers is joining forces in an effort to increase their control over when and how much of their content appears on search engines, Time reports.

Currently, websites can exert some control over which of their pages search engines may access by placing a text file known as robots.txt at the root of the site. In effect, these files contain a set of instructions for the web crawlers that search engines use to map and index the web, allowing a site to block indexing of individual pages, specific directories or the entire site.
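
For illustration, here is a minimal sketch of how a crawler reads those instructions, using Python's standard urllib.robotparser module and an invented example.com site; the sample robots.txt blocks one directory and one page for Google's crawler and shuts out every other crawler entirely.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: block Googlebot from one directory and one page,
# and block every other crawler from the whole site.
robots_txt = """\
User-agent: Googlebot
Disallow: /archive/
Disallow: /drafts/unpublished.html

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler consults the parsed rules before fetching each URL.
print(parser.can_fetch("Googlebot", "https://example.com/news/today.html"))     # True
print(parser.can_fetch("Googlebot", "https://example.com/archive/2006.html"))   # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/news/today.html"))  # False
```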

The coalition wants to extend the set of commands publishers can put into these robots.txt files, expanding the control they have over their content by, for instance, limiting how long search engines may retain copies of pages in their indexes or telling crawlers not to follow any of the links that appear on a page.
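
The exact syntax of these extensions had not been settled, so the sketch below is purely hypothetical: the directive names Cache-retention-days and Follow-links are invented here to illustrate the kind of extended instructions the publishers have in mind, together with how a cooperating crawler might read them.

```python
# Hypothetical extension directives (not part of standard robots.txt):
# "Cache-retention-days" and "Follow-links" are invented names used only
# to illustrate the kind of control being proposed.
robots_txt = """\
User-agent: Googlebot
Disallow: /subscribers/
Cache-retention-days: 30
Follow-links: no
"""

def parse_extensions(text):
    """Collect the hypothetical extension directives into a dict."""
    rules = {}
    for line in text.splitlines():
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "cache-retention-days":
            rules["retention_days"] = int(value)   # drop cached copy after N days
        elif key == "follow-links":
            rules["follow_links"] = value.lower() != "no"  # skip outbound links
    return rules

print(parse_extensions(robots_txt))  # {'retention_days': 30, 'follow_links': False}
```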

The publishers say this would better enable them to express terms and conditions on the access and use of their content. In particular, they are concerned about their information remaining in search engine indexes long after they have locked it away on their own sites, and about excerpts and headlines being used without their permission.

There could be an obvious downside for users' access to information, however. A Google spokesperson said the company supports efforts to bring websites and search engines together, but that it needs to evaluate the proposed "automated content access protocol" to ensure it can meet the needs of millions of websites, not just those of a single community. "Before you go and take something entirely on board, you need to make sure it works for everyone," the spokesperson said.
