The Censorware Project

Blacklisted by Cyber Patrol

From Ada to Yoyo

A report from The Censorware Project


Problems

We have already seen that improper and overbroad blocking is a serious problem.

But must it be this way? Might Cyber Patrol 5.0, or 6.0, be a spectacularly better product that will eliminate all our concerns?

To answer this, we must consider how filtering software's databases are constructed. We need not get too detailed. The exact means by which Microsystems Software creates its blocklist are not much of a concern to us; let's consider how it might best be done, theoretically.

The web contains somewhat over 200 million pages, and rating each of these pages "by hand," using a web browser, might take about two minutes. That comes to roughly 3,600 person-years of full-time work. That figure is our starting point: three and a half person-millennia.
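The arithmetic behind that figure can be checked with a short sketch. The page count and the two minutes per page come from the paragraph above; the 1,850 working hours per full-time year is our assumption, chosen because it reproduces the 3,600 figure (a 2,000-hour year would give about 3,300 instead).

```python
# Rough workload estimate for rating the whole web by hand.
PAGES = 200_000_000          # "somewhat over 200 million pages"
MINUTES_PER_PAGE = 2         # "about two minutes" per page
HOURS_PER_WORK_YEAR = 1850   # ASSUMPTION: not stated in the report


def person_years(pages, minutes_per_page, hours_per_work_year):
    """Convert a page count and per-page review time into person-years."""
    total_hours = pages * minutes_per_page / 60
    return total_hours / hours_per_work_year


estimate = person_years(PAGES, MINUTES_PER_PAGE, HOURS_PER_WORK_YEAR)
print(f"{estimate:,.0f} person-years")
```

Changing the hours-per-year assumption moves the result by a few hundred person-years in either direction, which does not alter the order of magnitude.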

Censorware companies do not use just a web browser to examine web pages, however. They have other tools at their disposal which can reduce the workload.

The results will not be as accurate, however. A compressed JPEG file takes less time to load, but the more data is squeezed out of it, the blockier the image gets.

And the more a censorware vendor uses automation to replace human eyes and brains, the "blockier" their blocklists will become.

They can use web spiders -- theirs or existing search engines' -- to try to find keywords. They can write custom software to present questionable information rapidly, so the employees can make judgements while having to read less and less data. They might even try to use years of failed AI research to have the computer guess the difference between a breast and a chicken breast.

How much advantage will this give them over surfing with a web browser? Will it eliminate 50% of the work? 90%?

Even if ninety percent of the work can be eliminated, rating the web would take 360 full-time employees a year. But the web is not a static medium, like a library, where books once censored remain censored. The average webpage lasts only two months before changing or disappearing.

And new websites are being created at a fantastic rate: over a thousand domain names are registered each day, each hosting an unknown number of sites. As that year came to an end, the work would already be hopelessly out of date -- and the next year's web would be twice as large and take twice as long.
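A rough sketch of how these numbers combine is given here. The 3,600 person-year baseline, the ninety percent saving, the doubling of the web and the two-month page lifetime are taken from the text; the function names are ours, and the idea that every page would need re-rating each time it changes is a deliberately pessimistic assumption used only to show the scale.

```python
BASE_PERSON_YEARS = 3600  # hand-rating estimate from the text


def remaining_person_years(base_person_years, percent_automated):
    """Work left for humans after automation removes a percentage of it."""
    return base_person_years * (100 - percent_automated) / 100


def projected_workload(person_years_now, years_ahead, growth_factor=2):
    """Workload after the web grows by growth_factor each year."""
    return person_years_now * growth_factor ** years_ahead


def passes_per_year(average_page_lifetime_months):
    """ASSUMPTION: one full re-rating pass per average page lifetime."""
    return 12 / average_page_lifetime_months


this_year = remaining_person_years(BASE_PERSON_YEARS, 90)
next_year = projected_workload(this_year, years_ahead=1)
keeping_current = this_year * passes_per_year(2)

print(f"one pass this year:  {this_year:,.0f} person-years")
print(f"one pass next year:  {next_year:,.0f} person-years")
print(f"staying up to date:  {keeping_current:,.0f} person-years per year")
```

Even with generous automation, a single pass costs hundreds of person-years, and the cost of staying current is several times higher under the re-rating assumption.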

Common sense tells us that, to stay ahead of the web, censorware vendors must drastically reduce their human workload by making blanket judgements about thousands of pages at once, and by providing their employees so little data that they must make snap decisions based on almost nothing. And this is exactly what we have seen in practice: the blocking of twenty thousand websites because they are gay-oriented, and the blocking of sites which have no explicit content whatsoever.

Now, let's take a look at what we know of how Cyber Patrol operates. Susan Getgood, a public relations employee of the company, wrote on April 22nd, 1997:

Cyber Spyder visits the sites and creates a report including 25 characters before and 25 characters after each occurence of the keywords used in a particular search. The researchers start by reviewing this report. If necessary, the sites are visited and viewed by a human being before being added to the CyberNOT list. If not necessary, the sites are not viewed or added. For example, if the context of the word "breast" was the proper way to prepare chicken, that is a good indication that the site doesn't meet the CyberNOT criteria.

Twenty-five characters of text is not very much. Here's an example on the keyword "breast" -- this could be a recipe or a sex site:

want you to squeeze that breast to be sure it's firm, and

Now, that's rather contrived, but the point is that context can make a great deal of difference. Here's a less contrived example on the keyword "porn":

This page contains HOT PORN. If you are under 21 or

That page is called "XXX Hot Sex Site," and it's just a joke. It leads you to Oliver Clark's Totally Useful and totally non-pornographic homepage. Thanks to his little joke, Mr. Clark's entire site is blocked by Cyber Patrol as FullNude SexActs -- including, and we're not making this up, his recipes for cooking a chicken breast.
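To make concrete how little a reviewer sees, here is a minimal sketch of a keyword-in-context report of the kind Ms. Getgood describes: 25 characters on either side of each keyword hit. This is not Cyber Spyder's code, which we have not seen; the function name, the case-insensitive matching and the sample page text are all our assumptions.

```python
import re


def keyword_contexts(text, keyword, width=25):
    """Return each occurrence of keyword with `width` characters on each side."""
    snippets = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets


# Hypothetical page text, built around the example above.
page = "I want you to squeeze that breast to be sure it's firm, and then season it."

for snippet in keyword_contexts(page, "breast"):
    print(repr(snippet))
```

Nothing in the snippet says whether the page is a recipe or something else entirely; that information lies outside the 25-character window.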

Ms. Getgood then added:

A site that is added to the CyberNOT list is viewed by a person before being added.

This would imply that any errors made would be failures to block a site: false negatives rather than false positives. This is not the case. We can pretty well guess that nobody clicked through to look at what Oliver Clark had to offer. And the members.tripod.com site may have been visited by a person, but relatively few of its 1.4 million pages ever met a human being's eyes before the decision was made to block it.

And how could they? The human beings who work at Microsystems Software, doing their part of the blocking process, are not paid to uphold a Constitutional right to free speech, or to adjudicate each individual webpage with the utmost of care. Their job is to find as much pornography as they can, as quickly as possible, and block it all. The web is huge, and Cyber Patrol's staffers cannot take the time to carefully consider each page or even each site at members.tripod.com. So the entire site, all 1.4 million pages, is either blocked or not.
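The all-or-nothing effect of a host-level entry can be sketched as follows. We do not know how Cyber Patrol stores or matches its list; the set of blocked hosts, the function name and the sample URLs below are hypothetical, and the only point is that one entry decides the fate of every page on the host.

```python
from urllib.parse import urlsplit

# HYPOTHETICAL blocklist: a single host-level entry.
BLOCKED_HOSTS = {"members.tripod.com"}


def is_blocked(url, blocked_hosts=BLOCKED_HOSTS):
    """Block a URL if its host is listed, regardless of the page requested."""
    host = urlsplit(url).hostname or ""
    return host in blocked_hosts


# Made-up page paths: the path never enters the decision.
print(is_blocked("http://members.tripod.com/~example/chicken-recipes.html"))
print(is_blocked("http://members.tripod.com/~another/family-photos.html"))
print(is_blocked("http://www.example.org/index.html"))
```

A per-page list would avoid this, but only at the cost of someone reviewing each of those pages individually.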

We simply cannot expect otherwise from software that costs $29.95 -- and this is not to fault the manufacturer; it is simply a fact. If I hire someone to mow my lawn and they return after thirty seconds, asking for the dollar I promised, I don't have to take a close look to know they didn't do a very good job.

Whether the task is cutting my lawn or bowdlerizing my library, more powerful tools mean you can cut a wider swath. But a certain minimum amount of human guidance is always required, no matter how powerful the tool.

In the case of the internet, that minimum amount is impossibly large.

There are two things that Cyber Patrol 5.0 or 6.0 could do better, either of which could dramatically reduce the product's rate of misblocks.

The first is to notify blocked sites. Oliver Clark, and hundreds of others like him, would very much have appreciated a short note letting them know they were blocked. They should not have to rely on muckraking activists to drop them a line about their own site.

And the second is to make the entire list of blocked sites public.

We do not hold out much hope that these steps will be taken. Making the blocklist public invites Cyber Patrol's competitors to steal the work that's been invested. And while there may be ways to fight this with copyright law, the company almost certainly does not want to take that risk.

Notifying blocked sites, meanwhile, is a tedious process. Not every site has a contact address of <webmaster@sitename.com>. It takes time to locate the proper email address on a site, time that must be multiplied by the hundreds of thousands of sites which are blocked. More importantly, it takes time and personnel to handle the responses to the email, and to properly staff a committee to review the blocks which are protested. We must pessimistically assume that market pressures will put out of business any censorware company which spends its revenue on such an extravagance.

Still, while the software would remain far from perfect if these remedies were implemented, it would at least bear re-examination. Removing the silence and the secrecy of the blacklist is the most important first step.

But until that first step is taken, nothing much will change. Cyber Patrol can fix up each and every improper block we've pointed out in this report. But the improper blocks are just the symptoms. The problems will remain.

The image of the sea turtle is taken from Explore Underwater, an online magazine which happens to be blocked by Cyber Patrol. The image degradation illustrates the "blockiness" that results as less and less information is available.

It bears repeating: we are not comparing Cyber Patrol against other products of its type. Almost all of its competitors, if not all of them, suffer from the same problems we have identified here. And we take no position on its value relative to competing products.
