The Not-So-Great Firewall of China

by Tokachu

When most people think of Internet censorship, they think of China first.

While many other countries have some sort of state-controlled Internet policy, most people would refer to China because of the sheer size of the population and government.

Ironically enough, the country with one of the largest Internet populations seemed to go for the lowest bidder when it came to Internet censorship devices, replacing quality control with frantic developers pressed for time.

No matter how strange that may be, it still does not justify a government that wants to keep full control over all media.  That is why I'll tell you, and hopefully a Chinese friend, how the "Great" firewall works and how to keep it from ruining your Internet.

How It Works

Unlike most other countries that simply block all TCP traffic or utilize a filtering HTTP proxy, China relies almost solely on special routers designed to censor based on raw TCP data instead of HTTP requests.

The government of China relies on two main methods of censorship: flooding clients with fake DNS replies and forging TCP connection resets.

DNS Poisoning

Very few domain names are actually "blocked" using this method.  For DNS poisoning to kick in, someone has to request a very, very, very naughty web site (like minghui.org).  This keeps anyone from figuring out how to connect to, let alone download content from, a forbidden host.

Here's what an uncensored DNS request would look like in China:

0.000000 192.168.1.2 -> 220.194.59.17  # DNS Standard query A baidu.com 
0.289817 220.194.59.17 -> 192.168.1.2  # DNS Standard query response A 202.108.22.33 A 220.181.18.114

And here's how it would look if a domain were censored:

0.000000 192.168.1.2 -> 220.194.59.17  # DNS Standard query A minghui.org 
0.288963 220.194.59.17 -> 192.168.1.2  # DNS Standard query response A 203.105.1.21 
0.289482 220.194.59.17 -> 192.168.1.2  # DNS Standard query response A 203.105.1.21 
0.289838 220.194.59.17 -> 192.168.1.2  # DNS Standard query response A 203.105.1.21 
0.290374 220.194.59.17 -> 192.168.1.2  # DNS Standard query response A 203.105.1.21 
0.290732 220.194.59.17 -> 192.168.1.2  # DNS Standard query response A 203.105.1.21 
0.290757 192.168.1.2 -> 220.194.59.17  # ICMP Destination unreachable (Port unreachable) 
0.291311 220.194.59.17 -> 192.168.1.2  # DNS Standard query response A 169.132.13.103 
0.291337 192.168.1.2 -> 220.194.59.17  # ICMP Destination unreachable (Port unreachable)

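The real reply never gets through because the router forges half a dozen fake DNS replies to whoever requests a "forbidden" web site.  (The ICMP "port unreachable" messages in the trace come from the client itself: its resolver took the first forged reply and closed the socket, so the later forgeries have nowhere to land.)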

This filter only works on UDP port 53 (DNS), which would theoretically make uncensored DNS requests possible if a sufficient number of DNS servers running on ports other than 53 existed.

You can tell if your packets are going through a Chinese router by one simple test.

Try sending an ordinary DNS query to a remote machine in China.  If it goes unanswered (because the machine isn't a DNS server), send a DNS query for minghui.org to that same machine.

If you get seemingly random responses, you're routing through China.  If you want to determine which router is responsible for the censorship, run a traceroute and perform DNS requests on each hop, starting at the closest.  When you get the fake DNS replies, you've found the offending router.
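The test above can be sketched in Perl with nothing but core modules.  This is only a rough illustration, not a finished tool: the sub names, the query ID, and the idea of counting replies over a short window are my own assumptions, and the address in the usage line is a documentation placeholder.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Build a minimal DNS query: one question, type A, class IN.
sub build_dns_query {
    my ($id, $name) = @_;
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    my $header = pack('n6', $id, 0x0100, 1, 0, 0, 0);
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    my $qname = join('', map { pack('C/a*', $_) } split(/\./, $name)) . "\0";
    return $header . $qname . pack('n2', 1, 1);   # QTYPE=A, QCLASS=IN
}

# Send one query to $host and count every UDP reply that arrives within $wait seconds.
sub count_replies {
    my ($host, $name, $wait) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host, PeerPort => 53, Proto => 'udp',
    ) or die "socket: $!\n";
    $sock->send(build_dns_query(0x2600, $name)) or die "send: $!\n";

    my $select   = IO::Select->new($sock);
    my $deadline = time + $wait;
    my $count    = 0;
    while ((my $left = $deadline - time) > 0) {
        last unless $select->can_read($left);
        my $buf;
        defined $sock->recv($buf, 512) or last;   # stop on ICMP errors and the like
        $count++;
    }
    return $count;
}
```

Something like `print count_replies('203.0.113.5', 'minghui.org', 3), "\n";` would then print how many replies came back for a single query.  One query drawing several replies, especially from a host that ignored your first query, is the pattern shown in the trace above.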

Forging TCP Resets

If a TCP connection is made from or to a computer in China, the packet data is checked for any "forbidden" words.

If the data contains any of those words, the router forges a TCP RST (reset connection) packet.  This also triggers a temporary block on TCP connections between those two specific computers.  This makes it appear that the server has gone down temporarily.

The list of words not permitted to be used is encoded in GB2312 format, which ensures that businesses with web sites in China will not be able to send any illegal content to computers in China (since GB2312 is a character set required to be supported by all applications in China).

The filter works thusly:

  • If the word can be written in pure ASCII, look for the word in any mixture of lowercase and uppercase ASCII letters.
  • If the word must be written in any combination of CJK ideographs, look for the byte sequence in either raw or URL-encoded GB2312.  The hexadecimal digits in URL-encoded sequences are matched case-insensitively.
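Those two rules can be mimicked in a few lines of Perl.  This is a guess at the matching logic based purely on the behavior described above, not the routers' actual code.  The sub names are made up, and the keywords ("forbidden" and the two-ideograph word U+4E2D U+56FD) are harmless placeholders rather than entries from any real list.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Turn one keyword (a decoded character string) into the patterns to search for.
sub build_patterns {
    my ($word) = @_;
    # Pure ASCII keyword: match it in any mixture of upper and lower case.
    return (qr/\Q$word\E/i) if $word =~ /\A[\x00-\x7F]+\z/;

    # Otherwise match the GB2312 bytes, either raw or URL-encoded.
    my $raw = encode('euc-cn', $word);   # EUC-CN is the usual byte form of GB2312
    my $url = join '', map { sprintf '%%%02X', ord($_) } split //, $raw;
    return (qr/\Q$raw\E/, qr/\Q$url\E/i);   # hex digits match in either case
}

# Return 1 if the raw packet payload contains any keyword, else 0.
sub payload_is_flagged {
    my ($payload, @words) = @_;
    for my $word (@words) {
        for my $re (build_patterns($word)) {
            return 1 if $payload =~ $re;
        }
    }
    return 0;
}
```

With the placeholder keyword "\x{4E2D}\x{56FD}", a payload containing the bytes D6 D0 B9 FA or the string %d6%d0%b9%fa is flagged, while the same word sent as UTF-8 (E4 B8 AD E5 9B BD) is not.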

Problems

Nearly all the problems of China's firewalls stem from one flaw in the routers: they all perform stateless packet inspection.  It doesn't matter what protocol the packets are using, nor what computer a packet comes from.  All the router is concerned with is finding packets and forging responses, not dropping content.

Unfortunately, that flaw puts the router owners and admins at an extreme disadvantage.  Anybody can do a Google/Yandex search for packet-forging software or libraries (such as libpcap) and whip up a script to flood Chinese routers with fake packets, and the routers will respond, no matter what.

It wouldn't be difficult to set up a botnet with DNS request forgers that can send billions of fake DNS requests to various routers, and in return have the victim think China is attacking his or her server!  It's also possible to forge a TCP data packet with fake source and destination addresses, which means that if you happened to know the IP addresses of two important diplomats, you could easily cut off their ability to communicate.

Popular Chinese web sites are just as vulnerable; email systems could be cut off for hours at a time.  The possibilities are endless.  The temporary block that follows a forged RST may be fairly short, but keep in mind that it only takes one fake packet to close a connection.

Getting Around It

The TCP Stack

One way to tell fake RST packets from real RST packets is to look at the Time-To-Live (TTL) parameter.  Forged packets will always have higher TTLs than the real ones.  Acting on that difference, however, would require both parties to run a stateful TTL comparison filter at the kernel level.  That's no good.
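To make the TTL comparison concrete, here is a small Perl sketch of the bookkeeping such a filter would need.  It only models the decision, not the kernel hook, and everything in it (the sub names, the flow key, the one-hop slack) is an assumption of mine rather than anything a real filter is known to use.

```perl
use strict;
use warnings;

my %data_ttl;   # flow key => TTL seen on the peer's ordinary packets

# Remember the TTL of the first normal (non-RST) packet seen on a flow.
sub note_data_packet {
    my ($flow, $ttl) = @_;
    $data_ttl{$flow} = $ttl unless exists $data_ttl{$flow};
    return;
}

# Treat an RST as suspect when its TTL is higher than the flow's
# baseline by more than $slack hops (default 1).
sub rst_is_suspect {
    my ($flow, $rst_ttl, $slack) = @_;
    $slack = 1 unless defined $slack;
    return 0 unless exists $data_ttl{$flow};   # nothing to compare against yet
    return $rst_ttl > $data_ttl{$flow} + $slack ? 1 : 0;
}
```

For example, after `note_data_packet('a', 47)`, the call `rst_is_suspect('a', 58)` returns 1, while an RST arriving with TTL 47 returns 0.  The state kept per flow is exactly what makes this "stateful", and it is why both ends would need the filter installed.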

You could, however, rewrite a TCP-based application to send "forbidden" words by using the TCP urgent flag (URG).  This only requires that both parties have a modified application; no kernel tweaking is necessary.  A great example of a program that sends data like that is a proof-of-concept C program called covertsession (search for it on Packet Storm Security).  It can bypass most stateful packet inspectors, so it easily gets around the stateless inspectors in China.

This is probably the best way to modify instant messaging (such as QQ) and IRC applications, assuming one couldn't just use encryption on both ends.

HTTP Traffic

There's nothing really special about how the firewall treats HTTP traffic.  Mind you, it only looks for certain strings, no matter where they are.  But notice how I said it only uses the GB2312 character set: there's nothing stopping us from simply using UTF-8 instead.  You can "switch" your web sites from GB2312 to UTF-8 by simply running them through iconv.  The UTF-8 encoding of a CJK word is a different byte sequence from its GB2312 encoding, and an accidental match is very unlikely, so you're partially assured good exposure (for a period of time).
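If you'd rather do the conversion from Perl than from the iconv command line (`iconv -f GB2312 -t UTF-8`), the core Encode module does the same job.  The sketch below is only an illustration; the sub names are mine, and the sample bytes are a harmless two-ideograph placeholder.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Re-encode a GB2312 (EUC-CN) byte string as UTF-8.
sub gb2312_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('euc-cn', $bytes));
}

# Show a byte string as space-separated hex pairs, for comparing encodings.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}
```

Feeding it the GB2312 bytes D6 D0 B9 FA gives back E4 B8 AD E5 9B BD: the same two characters, but not one byte in common for a GB2312 keyword match to latch onto.  If the page declares its character set in a header or meta tag, that declaration has to change to UTF-8 as well.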

Most China-based web hosts, such as Baidu and Yahoo! China, rely on the firewalls to block some content for them.  Google China, however, is the one huge exception.  Google's Chinese servers are located in the United States and their censorship is done entirely in-house.  What does that mean?  For one, we don't need to worry about text being sent in GB2312 format (Google insists on using UTF-8).

We can also exploit a "feature" in Google's text engine that was overlooked during the Google China development.

Google doesn't compare strings in their text engine like most of us do.  Instead of simply comparing bytes, Google considers some words and characters equal to other words and characters that wouldn't match with a byte comparison algorithm.  The character equality is what we want to look at here: mainly, how Google considers "fullwidth" ASCII characters (wide, fixed-width characters mostly used in Japanese character sets) equal to their ASCII counterparts.  If you were to search for "computers" using fullwidth characters, you'd get the same results as you would with a simple ASCII search (although some ads might not show up).

Now here's where the hack comes in: Google's censors don't look for those fullwidth characters.  So, if we were to search Google China for "tiananmen square" using fullwidth characters, the results wouldn't be filtered (although the connection may still be reset because of what Google sends back).  Luckily, this trick works for Google Images, meaning that it isn't too hard to get Google's cache of images normally unfindable in China!

Here's some sample Perl code to generate fullwidth characters from a shell (assuming you've got Unicode support in your terminal):

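#!/usr/bin/perl
# fw.pl - make text W-I-D-E (convert ASCII to fullwidth)
use strict;
use warnings;

my $input = $ARGV[0];
die("Need one argument for text.") unless defined $input && length $input;

binmode(STDOUT, ":encoding(UTF-8)");

foreach my $char (split //, $input) {
  my $code = ord($char);
  if ($code == 0x20) {
    print chr(0x3000);          # space has no U+FF00 form; use ideographic space
  } elsif ($code >= 0x21 && $code <= 0x7E) {
    print chr(0xFEE0 + $code);  # printable ASCII maps to U+FF01..U+FF5E
  } else {
    print $char;                # control and non-ASCII input is not converted
  }
}
print "\n";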

Just type whatever search term you want, plug the output into Google, and watch once-censored search results just show up!

Conclusion

Censorship isn't a profitable business.

If China were to release an honest budget (and if people and corporations found out a huge percentage of its GDP was going towards censorship and propaganda instead of food and health care), China's economy would collapse in a matter of hours.

Sadly, it isn't just Chinese citizens who believe the lies: corporations like Cisco and Google actually believe you can make money by keeping information from people.  The sooner the Chinese people and their government realize this, the better.

(There are far too many people to thank - you know who you are.)
