Obfuscation and Encoding in PHP

by Bryan Elliott

There are a few PHP obfuscators out there: programs that strip all the unnecessary comments, spaces, tabs, and returns out of your code, then go further and find all the functions, variables, and classes that mean the same thing within the same scope and change their names to something meaningless.

Indeed, these would make PHP scripts very hard to read.  The result is often something like what happens when you compile then "decompile" a program in C.  The English clues are gone and you're left trying to figure out what i012 happens to represent.

Still, the code remains readable after a fashion.  Run it through SciTE's auto-formatter and you can at least trace what a program's going to do.  It takes longer, but it's still doable.

There are additional options here: one is that you can replace a given PHP include file with an encoded string that is decoded and eval'd as the include file's only action.

An example (remember that the zlib extension, which provides gzcompress() and gzuncompress(), must be available):

eval("?".">".gzuncompress(base64_decode(*[data]*)));

The *[data]* portion should be the contents of the PHP file, gzcompressed and then Base64 encoded, written as a quoted string.

An easy way to do this in PHP is:

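function phpCompress($filename) {
     // Compress the source file, then Base64 encode it so it survives inside a string literal.
     $data = base64_encode(gzcompress(file_get_contents($filename)));
     // Wrap the payload in a stub that decodes and evals it.  The PHP tags are split so nothing trips on them.
     $data = "<"."?php eval(\"?\".\">\".gzuncompress(base64_decode(\"$data\"))); ?".">";
     $dest = fopen("obscured-".$filename, "w");
     fwrite($dest, $data);
     fclose($dest);
}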

Even on unmodified PHP code, an unscrupulous fellow would have to decode the data himself (or, you know, replace eval with echo, but who's counting?).  Still, the idea is to make it more obscure.
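
To see how thin that layer is, here is a minimal sketch of the round trip.  The variable names and the one-line sample script are illustrative assumptions, not part of the original listing; the only real calls are the same gzcompress/base64 pair used above.

```php
<?php
// Hypothetical one-line script standing in for a real include file.
$source = '<?php echo "hello"; ?>';

// What the packer does: compress, then Base64 encode.
$packed = base64_encode(gzcompress($source));

// What a curious reader does: the same decode calls, minus the eval().
$recovered = gzuncompress(base64_decode($packed));

echo $recovered === $source ? "recovered\n" : "mismatch\n";
```

Anyone who can read the stub can run those two decode calls, so this buys obscurity and nothing more.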

Anyway, this next trick was something that came about when attempting to make pronounceable passwords in PHP.  I figured, "Why just generate random syllables when you can have the password relate to something?"  This led me to design an encoding I call Phonic64.

Essentially, it's Base64.  It uses the built-in base64_encode() to get my six-bit stream.  I then translate each Base64 character into a number from 0 to 63.  From that, the upper four bits pick one of 16 consonants (one of which is the empty string) and the lower two bits pick one of four vowels to represent the Base64 number.
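
As a rough sketch of that split, the snippet below maps one Base64 character to its consonant and vowel indexes.  The helper name phonic_indexes is an assumption made for illustration; the listing does the same arithmetic inline with floor($v / 4) and $v & 3.

```php
<?php
// Base64 alphabet in index order, as in the listing.
$b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: split one Base64 character into its two table indexes.
function phonic_indexes(string $ch, string $b64): array
{
    $v = strpos($b64, $ch);      // six-bit value, 0..63
    return [$v >> 2, $v & 3];    // high four bits: consonant slot, low two bits: vowel slot
}

[$cons, $vowel] = phonic_indexes("d", $b64);
echo "$cons $vowel\n";   // prints "7 1"
```

With the starting tables, slot 7 is "n" and slot 1 is "e", so a leading "d" in the Base64 stream becomes "ne".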

Just to switch things up and to ensure the sound doesn't get repetitive, I keep two values called oc and ov, which hold a consonant and a vowel that are currently out of the substitution tables.  After each syllable, the consonant and vowel just used are swapped out of their table slots in exchange for oc and ov.

For example, if something would normally encode to gigi and we have oc=f and ov=u, it comes out as gifu instead.
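
The swap can be sketched in isolation like this.  The tables are trimmed to a few entries for illustration, and the index pairs are made up to reproduce the gigi case; the swap logic itself follows the listing.

```php
<?php
// Trimmed tables for illustration; only the slots this example touches matter.
$consonants = ["", "k", "g", "s"];
$vowels     = ["a", "e", "i", "o"];
$oc = "f";   // spare consonant, currently out of the table
$ov = "u";   // spare vowel, currently out of the table

$out = "";
// Encode the same (consonant slot, vowel slot) pair twice in a row.
foreach ([[2, 2], [2, 2]] as [$c, $v]) {
    $out .= $consonants[$c] . $vowels[$v];
    // Swap the letters just used with the spare ones.
    [$oc, $consonants[$c]] = [$consonants[$c], $oc];
    [$ov, $vowels[$v]]     = [$vowels[$v], $ov];
}
echo $out, "\n";   // prints "gifu"
```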

As the data is encoded, spaces, punctuation, and even paragraph breaks are added to the stream.

Keep in mind that data encoded in Phonic64 is far larger than it needs to be.  Consider that for every three 8-bit bytes, you're generating four 6-bit numbers and thus four phonic couples and eight characters.  This isn't even including the punctuation and such.
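
A back-of-the-envelope sketch of that expansion, ignoring the random spaces and punctuation.  The function name is hypothetical, and the result is an upper bound because a syllable that lands on the empty consonant is only one letter long.

```php
<?php
// Hypothetical helper: upper bound on Phonic64 letters for $n input bytes.
function phonic64_max_letters(int $n): int
{
    $sextets = (int) ceil($n * 8 / 6);   // Base64 digits, not counting "=" padding
    return $sextets * 2;                 // one consonant plus one vowel per digit
}

echo phonic64_max_letters(3), "\n";   // prints 8
echo phonic64_max_letters(4), "\n";   // prints 12
```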

As an example, phonic64_encode("test") returns Neki tuyonia or something similar.  Remember, the spaces and punctuation are random, designed to be sacrificial chaff.  Someday, they'll be a checksum of a sort.

Anyway, enough beating around the bush.

Here's the code for Phonic64:

   <?php
   /////////////////////////////////////////////////////
   /////////////////////////////////////////////////////
   ////// Phonic64 Phonetic Password Generator /////////
   // by Bryan Elliott, published in 2600 Magazine /////
   /////////////////////////////////////////////////////
   /////////////////////////////////////////////////////
   function phonic64_encode($s)
   {
       mt_srand((int) (microtime(true) * 1000000)); // explicit cast: mt_srand() expects an integer seed
       $med = base64_encode($s);
       $consonants = [
           "",
           "k",
           "g",
           "s",
           "z",
           "t",
           "d",
           "n",
           "h",
           "b",
           "p",
           "m",
           "y",
           "r",
           "w",
           "v",
           "j",
       ];
       $vowels = ["a", "e", "i", "o"];
       $b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
       $eos = ["! ", "? ", "!? ", "!!! "];
       $sspunct = [", ", "; ", "- "];
       $oc = "f";
       $ov = "u";
       $word = "";
       $wct = 0;
       $wln = mt_rand(1, 4);
       $sentence = "";
       $sct = 0;
       $sln = mt_rand(3, 10);
       $paragraph = "";
       $pct = 0;
       $pln = mt_rand(1, 10);
       $out = " ";
       for ($i = 0; $i < strlen($med); $i++) {
           $ch = substr($med, $i, 1);
           $v = strpos($b64, $ch);
           if (false === $v) {
               continue;
           }
           $cons = floor($v / 4);
           $vowel = $v & 3;
           if ($sct == 0 && $wct == 0) {
               $word .= strtoupper($consonants[$cons]);
           } else {
               $word .= $consonants[$cons];
           }
           if ($sct == 0 && $wct == 0 && $consonants[$cons] == "") {
               $word .= strtoupper($vowels[$vowel]);
           } else {
               $word .= $vowels[$vowel];
           }
           $wct++;
           if ($wct == $wln) {
               $sentence .= "$word";
               $word = "";
               $wct = 0;
               $wln = mt_rand(1, 4);
               $sct++;
               if ($sct != $sln) {
                   if (mt_rand(0, 9) == 5) {
                       $g = mt_rand(0, sizeof($sspunct) - 1);
                       $sentence .= $sspunct[$g];
                   } else {
                       $sentence .= " ";
                   }
               } else {
                   $paragraph .= $sentence;
                   $sentence = "";
                   $sct = 0;
                   $sln = mt_rand(3, 10);
                   if (mt_rand(0, 6) == 5) {
                       $g = mt_rand(0, sizeof($eos) - 1);
                       $paragraph .= $eos[$g];
                   } else {
                       $paragraph .= ". ";
                   }
                   $pct++;
                   if ($pct == $pln) {
                       $out .= $paragraph;
                       $paragraph = "";
                       $pct = 0;
                       $pln = mt_rand(1, 10);
                       $out .= "\r\n ";
                   }
               }
           }
           $t = $oc;
           $oc = $consonants[$cons];
           $consonants[$cons] = $t;
           $t = $ov;
           $ov = $vowels[$vowel];
           $vowels[$vowel] = $t;
       }
       if ($wct != 0) {
           $sentence .= $word;
           $sct++;
       }
       if ($sct != 0) {
           $paragraph .= trim($sentence);
           if (mt_rand(0, 6) == 5) {
               $g = mt_rand(0, sizeof($eos) - 1);
               $paragraph .= $eos[$g];
           } else {
               $paragraph .= ".";
           }
           $pct++;
       }
       if ($pct != 0) {
           $out .= $paragraph;
       }
       return $out;
   }
   function phonic64_decode($s)
   {
       $mid = strtolower(preg_replace("/[\s\.\!\?\;\-\,\r\n]/", "", $s));
       $consonants = [
           "",
           "k",
           "g",
           "s",
           "z",
           "t",
           "d",
           "n",
           "h",
           "b",
           "p",
           "m",
           "y",
           "r",
           "w",
           "v",
           "j",
       ];
       $vowels = ["a", "e", "i", "o"];
       $b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
       $oc = "f";
       $ov = "u";
       $state = false;
       $base = "";
       for ($i = 0; $i < strlen($mid); $i++) {
           $char = substr($mid, $i, 1);
           switch ($state) {
               case false:
                   $state = true;
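                   // Strict search: the empty "consonant" moves around the table as slots are swapped.
                   $cons = array_search($char, $consonants, true);
                   if ($cons === false) {
                       // Not a consonant, so this syllable used the empty slot; re-read $char as the vowel.
                       $cons = array_search("", $consonants, true);
                       $i--;
                   }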
                   break;
               case true:
                   $state = false;
                   $g = array_search($char, $vowels);
                   $t = $ov;
                   $ov = $vowels[$g];
                   $vowels[$g] = $t;
                   $t = $oc;
                   $oc = $consonants[$cons];
                   $consonants[$cons] = $t;
                   $v = $cons * 4 + $g;
                   $base .= substr($b64, $v, 1);
                   break;
           }
       }
       while (strlen($base) % 4 != 0) {
           $base .= "=";
       }
       return base64_decode($base);
   }
   function phonic_password($len)
   {
       mt_srand((int) (microtime(true) * 1000000)); // explicit cast: mt_srand() expects an integer seed
       $seed = "";
       $numTimes = mt_rand(0, 16); // I added this line
       // for ($i=0; $i<32; $i++) { // this is the original line
       for ($i = 0; $i < $numTimes; $i++) {
           $seed .= chr(mt_rand(0, 255));
       }
       $uncpass = phonic64_encode($seed);
       $midpass = preg_replace("/[\s\.\!\?\;\-\,\r\n]/", "", $uncpass);
       $finpass = strtolower(substr($midpass, 0, $len - mt_rand(1, 3)));
       while (strlen($finpass) < $len) {
           $finpass .= mt_rand(0, 9);
       }
       return $finpass;
   }

?>

That's all.  I hope you have fun with it.

An exercise for the astute reader: Get the Base95 input/output version of the RSA-128 algorithm.  There's a pretty good one written in JavaScript if you feel like translating.  Use that instead of Base64 and modify the arrays and numbers in question to use 19 (+oc=z) consonants and five vowels.

Then?  Use this "nearly-sensible gibberish" to pass messages to your friends.

I've used it to obscure my PHP code behind a wall of "Dabi ye ridotiepo.  Da towi ye."-like things.  I dunno.  Practical use didn't really rear its ugly head when I thought this up.  I just thought, "Hey that's a cool idea."

Meanwhile, I can't see how you can get yourself in trouble with this, but you know the drill.  Keep your collective noses clean.  Otherwise you'll make the rest of us respectable-type hackers look bad!

Code: Phonic64.php