Censorship and privacy on the Net

"...there is nothing new under the sun" (Ecclesiastes 1:9)
Censorship is an ancient phenomenon indeed. It existed in Egypt in the times of the pharaohs, and we can find it in the Bible (e.g. the persecution of prophets). Throughout history, there have been those with enough power to slap the mouths of dissidents (or to amputate the mouth along with the head).

The Index librorum prohibitorum of the Roman Catholic Church is a venerable document which was abandoned only in 1966 (and even then de facto rather than de jure). Throughout its long history, it listed a lot of well-known authors (including a large number of Christian ones) but did not contain some names the Church generally opposes, e.g. Karl Marx or Adolf Hitler.

In turn, the USSR banned the Bible, the Qur'an and all religious literature (there are lots of colourful stories about Bible smuggling). Censorship in the former "second world" (still alive in China, North Korea and Cuba) was one of the most thorough in history, "sanitizing" nearly all aspects of society.

A pretty weird beast was the Federal Anti-Obscenity Act (commonly known as the Comstock Act) in the US. The bad guys included Voltaire, Aristophanes, Chaucer, Boccaccio and others. Interestingly, books banned in the US have included "Uncle Tom's Cabin" (accused of racism for its use of the word "nigger" - actually it's a VERY anti-racist book), Hugh Lofting's Doctor Dolittle stories (ditto - while they do contain some views or words that are "politically incorrect" for today's reader, they are deeply rooted in humanity), and Mark Twain's "Tom Sawyer" and "Huckleberry Finn" (possibly for the "romantic tramps" and some jabs at the church...). A fairly detailed account can be found here. Of today's books, Harry Potter has frequently been banned (mostly for describing witchcraft) - interestingly enough, the "father of fantasy literature" J.R.R. Tolkien, with his deep mythology, has mostly managed to escape censorship...

Samizdat and free culture
Samizdat (roughly "self-publishing" in Russian) was a long-standing practice of illegally copying and distributing materials which the authorities of the former Eastern Bloc (most of all in the USSR) considered "harmful". This mostly meant literature, but also other media (movies, etc.). Subjects ranged from direct political opposition to the regime to "un-Soviet" literary works, spiritual/religious texts and martial arts.

The authorities fought samizdat with harsh punishments: possessing an "anti-Soviet" book was enough to send one to prison for years. The well-known dissident writer Vladimir Bukovsky said of samizdat: "I myself create it, edit it, censor it, publish it, distribute it, and get imprisoned for it." They also severely limited potential copying mechanisms: typewriters had to be registered so that the sources of samizdat could be identified, and western photocopying equipment only became available near the end of the Communist system.

Photography was widely used to duplicate books (note: the supervisor of this work personally owns a carefully photographed stack of black-and-white 7x13 cm photos of an American book on Aikido, from the times when martial arts were outlawed in the USSR). A lot of material was also copied by hand, despite the obvious risk of being identified by one's handwriting. Several copies could be made at once by using carbon paper, both for typing and for handwriting. By convention, people who received a piece of samizdat material were obliged to distribute it further.

(A side remark: considering all of the above, the USSR was fortunate to dissolve before the wide emergence of personal computing, the Internet, free and open-source software, and free culture in general. It would have been extremely difficult to maintain the closed society - further closure would have had a very adverse effect on the already seriously inefficient economy.

Perhaps the only way the Communist system could have survived would have been to embrace the new open models and at the same time launch a very effective propaganda system (this has been achieved to some extent in China, and also in largely Communist-nostalgic Russia). On the other hand, this would have demanded a radical increase in economic efficiency, which would have been rather improbable.)

Today's free culture (free and open-source software, Web 2.0 and social software, community-driven creation and publication of content) shares a lot of similar features with samizdat:


 * community-based creation and distribution (Wikipedia, YouTube, Flickr...)
 * obligation to distribute further (compare to free licenses!)
 * nearly impossible to block (just ask the RIAA, MPAA, BSA and others!)
 * a wide variety of views

While free culture does not currently face persecution comparable to that of the samizdat authors in the USSR, some obstacles are similar. Examples include the recent rise of censorship worldwide as well as quasi-governmental organisations which serve the interests of large corporations.

Somewhere in Time
In its early days, the Net was largely free of censorship - early hackers were probably the antithesis of the circles promoting censorship. The storm broke out with the publication of the July 1995 cover story of Time magazine, "Cyberporn" by Philip Elmer-DeWitt. Although the story is widely regarded as deeply flawed (and by some critics even as fraudulent; see the articles here), it started the large-scale debate about online content.

As with other emerging media, the Net has sparked differing opinions about the information that passes through it. There are people who support absolute freedom - the Internet as the flagship of free speech, any limitation of which is seen as the emergence of dictatorship. There are others who want a "controlled" Internet - they want competent bodies that would generally allow the free spread of information but remove radical ideas and obscenities. And there are those who would like to close the thing down altogether.

Why is this so important? Today, access to information is increasingly a condition of being a full-fledged, free citizen. Information has become a force which attracts power and money. Also, in a world where many countries restrict freedom of speech, helping to preserve it is becoming ever more important.

The way with words
Different parties use different words to denote censorship. In the case of Internet filtering and related software, the most common are:


 * filtering - we filter air, coffee and other things; in the sense of "take the garbage out, keep the clean thing"
 * blocking - already a harsher term denoting "refusing access"
 * censoring - the strongest word, meaning "using one's power to deny access"

Censorware - the software used to censor the Net - is defined by the EFF and Censorware.org as follows: "software which is designed to prevent another person from sending or receiving information (usually on the web)."

In general, censorware means software mechanisms designed to categorise network traffic and to grant different levels of access to different categories. At the same time, the choice of words is a powerful tool for steering the general public either to accept or to protest censorship.
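To make the "categorise, then grant access per category" idea a little more concrete, here is a minimal Python sketch. The host names, category labels and the two user profiles are purely hypothetical; real products keep their category databases secret. The sketch denies uncategorised sites by default, which is only one possible design choice, but it illustrates the point that whoever fills in the table effectively decides what the user may read.

```python
from urllib.parse import urlparse

# Hypothetical category database: host name -> category label.
# Real censorware vendors treat such databases as trade secrets.
CATEGORY_DB = {
    "casino.example.com": "gambling",
    "news.example.org": "news",
    "clinic.example.org": "health",
}

# Hypothetical access policies: which categories each user profile may reach.
POLICIES = {
    "child": {"news", "health"},
    "adult": {"news", "health", "gambling"},
}


def categorise(url: str) -> str:
    """Map a URL to a category via its host name; unknown hosts get 'uncategorised'."""
    host = urlparse(url).hostname or ""
    return CATEGORY_DB.get(host, "uncategorised")


def is_allowed(profile: str, url: str) -> bool:
    """Allow the URL only if its category is in the profile's allowed set.

    Unknown sites are denied, because no profile lists 'uncategorised'.
    """
    return categorise(url) in POLICIES[profile]


if __name__ == "__main__":
    for url in (
        "http://news.example.org/today",
        "http://casino.example.com/",
        "http://new-site.example.net/",
    ):
        print(url, "child:", is_allowed("child", url), "adult:", is_allowed("adult", url))
```

Note how a brand-new site that nobody has categorised yet is blocked for everyone in this sketch; a vendor could just as well choose to allow it, and the user would normally never know which choice was made.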

Technology
Below is a list of some of the most popular blocking software packages:


 * Bess (N2H2)
 * CyberSitter (Solid Oak Software)
 * CyberPatrol (SurfControl - the market leader in the US)
 * Net Nanny (Net Nanny Software)
 * NetRated (PC DataPower)
 * Smartfilter (Secure Computing Corporation)
 * Surfwatch (SurfControl)
 * I-Gear (Symantec)
 * Websense (Websense)
 * X-Stop (f8e6 Technologies)

So there are plenty of them available. A couple of interesting points:


 * All are typical proprietary products - apparently free and open-source software is fundamentally different in mindset
 * All are produced in the USA - the whole industry is more or less unique to that country

Still, the market must be quite large to accommodate such a large number of players (compare it to the sectors of office software, graphics, or others which only have a couple of major vendors).

The methods used by these packages include


 * based on address - this simple and rough method is often used against spammers: everything coming from a certain address is discarded automatically. Most "child locks" (e.g. Net Nanny) belong to this category. A major problem here (besides the ever-present threat of a vendor's hidden agenda) is that the Net changes much faster than the vendors are able to react. While this kind of "carpet bombing" is perhaps acceptable on one's private PC, it amounts to major censorship when used on public machines.
 * based on keywords - while a slightly more intelligent method, it has a long history of anecdotal cases with disastrous results (e.g. the Beaver College case, blocking Essex and Sussex for containing "sex", or blocking the word "breast"). All languages contain a lot of slang where the "dirtiness" of a word is determined by context (e.g. English words for young Richard, a donkey or a little cat; even the name of a major apostle in the Bible...).
 * based on context analysis - while in principle the best method, all current censorware has a long way to go before reaching any acceptable level of content control. Also, these applications tend to be quite expensive.
 * based on whitelisting - the "peephole" approach where only material on a certain list can be accessed and everything else is forbidden by default. While it may be acceptable in some very specific cases (and would work as expected there), this approach is at direct odds with the very essence of the Net.
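The three simplest methods above can be sketched in a few lines of Python. This is an illustrative toy, not how any of the listed products actually works: the host lists are hypothetical, the keywords "sex" and "breast" are taken from the examples above, and context analysis is deliberately left out because it cannot be reduced to a short example.

```python
import re
from urllib.parse import urlparse

# Hypothetical lists for illustration only.
BLOCKED_HOSTS = {"bad.example.com"}          # address-based blacklist
BLOCKED_KEYWORDS = ["sex", "breast"]         # keyword filter
ALLOWED_HOSTS = {"school.example.org"}       # whitelist ("peephole")


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL ('' if there is none)."""
    return (urlparse(url).hostname or "").lower()


def blocked_by_address(url: str) -> bool:
    """Address-based: block everything coming from a listed host."""
    return host_of(url) in BLOCKED_HOSTS


def blocked_by_substring(text: str) -> bool:
    """Naive keyword filter: block if a keyword appears anywhere, even inside a word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_KEYWORDS)


def blocked_by_whole_word(text: str) -> bool:
    """Slightly smarter keyword filter: match whole words only, ignoring case."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in BLOCKED_KEYWORDS
    )


def allowed_by_whitelist(url: str) -> bool:
    """Whitelist: only listed hosts are reachable, everything else is denied."""
    return host_of(url) in ALLOWED_HOSTS


if __name__ == "__main__":
    print(blocked_by_address("http://bad.example.com/page"))      # True
    print(blocked_by_address("http://bad-mirror.example.net/"))   # False: a new address slips through
    print(blocked_by_substring("University of Essex"))            # True: false positive
    print(blocked_by_whole_word("University of Essex"))           # False
    print(blocked_by_whole_word("Breast cancer screening"))       # True: health information blocked
    print(allowed_by_whitelist("https://en.wikipedia.org/"))      # False: almost the whole Net is out
```

Matching whole words fixes the Essex/Sussex case but still blocks medical information about breast cancer, because the filter has no notion of context. The address list, on the other hand, is trivially evaded by a new mirror, which is exactly the "the Net changes faster than the vendors" problem.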

Problems with network censorship
As already mentioned above, a major problem is the proprietary model used by the vendors - all the algorithms and the filtering database are considered trade secrets. This in turn makes it easy to trample on democracy - the company can ban whatever it wants, without informing its customers and often without giving them any chance to influence its decisions.

Other problems:


 * Who will decide? Even hardcore pornography is not directly illegal in many countries (though some more sinister forms of pornography are)
 * How to reliably categorise the material?
 * How to deal with ever-changing content (e.g. Wikipedia)?
 * How to deal with extremist parents blocking necessary information from their children?
 * What about personal privacy?

Technical problems: they


 * do not block all things that they are intended to
 * block A LOT of things that should not be blocked (Beaver...)
 * can be bypassed
 * cannot keep pace with changes
 * slow down systems by consuming extra resources

What gets typically blocked:


 * some (but not all) pornography, hate speech, scams etc
 * a lot of "controversial issues" - civil rights activists, "greens", sexual minorities, pro-choice organisations, disability organisations. In general, US Democrats get blocked more often than Republicans - but a Slashdot news story covers US Second Amendment advocates being blocked by Symantec Internet Security
 * a great host of irrelevant material

Finally, an interesting note: many "adult" sites make an effort to GET blocked. Whether this is a sincere move to keep children away from their material or rather an attempt to generate publicity and free promotion (forbidden = interesting!) remains a subject of debate.

Additional reading

 * Philip Elmer-DeWitt interview @ HotWired (via EFF)
 * Censorware.net
 * Banned Books Online