JavaScript obfuscation using PHP
For a project of mine I needed a way to compress JavaScript code. There are numerous programs that do this very well, but I thought it would be a fun project to write one myself.
Obfuscation is most often used to keep users from viewing and simply copy-pasting your code. However, it's also a great way of reducing file size: I was able to reduce a file to less than half its original size simply by rewriting the original code.
If you don't want all the technical stuff just check out the demo and see for yourself.
The simplest way I found of doing JavaScript obfuscation is by using regular expressions. This is done in three steps:
- Analysis phase, where the code is searched for variable and function names
- Replacement phase, where function and variable names are replaced with shorter random names
- Compact phase, where whitespace, comments and line breaks are removed
In most cases these steps produce fully functional code, but errors can still occur. More on this later.
Analyzing
To search for variable, function and function argument names I use two different regular expression patterns. First the one for variables:
/(var)*\s+([\w]+)\s*=/
The most crucial bit is the = sign. Everything before it is either an object or a variable. Object properties won't get matched here: because we're only searching for strings of word characters (\w), everything with a dot in it is ignored. The var bit at the beginning is optional, since variables don't have to be declared explicitly.
Finding function and argument names is a little more obvious since functions are always declared starting with the word function.
/function\s+(\w+)\(((\w+,)*\w+)?\)/
Argument names are the tricky bit here. Since I found no easy way of selecting each individual name, I select the complete argument string and parse it using PHP.
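To make the analysis phase a bit more concrete, here is a minimal sketch of how the two patterns could be applied with preg_match_all. This is not the actual source of the app; the function name find_names and the sample input are made up for illustration.

```php
<?php
// Hypothetical helper: collect candidate names using the two patterns above.
function find_names(string $source): array
{
    $names = [];

    // Variables: group 2 holds the name in front of the "=" sign.
    preg_match_all('/(var)*\s+([\w]+)\s*=/', $source, $vars);
    foreach ($vars[2] as $name) {
        $names[] = $name;
    }

    // Functions: group 1 is the function name, group 2 the raw argument string.
    preg_match_all('/function\s+(\w+)\(((\w+,)*\w+)?\)/', $source, $funcs, PREG_SET_ORDER);
    foreach ($funcs as $match) {
        $names[] = $match[1];
        // The argument string (e.g. "first,second") is split in PHP, not by the regex.
        if (isset($match[2]) && $match[2] !== '') {
            foreach (explode(',', $match[2]) as $arg) {
                $names[] = $arg;
            }
        }
    }

    // Drop duplicates and reindex the array.
    return array_values(array_unique($names));
}

$js = "function add(first,second) {\n  var total = first + second;\n  return total;\n}";
print_r(find_names($js));
```

For this sample input the result contains total, add, first and second. Note that the function pattern as written does not allow spaces after the commas in the argument list.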
Replacing
All names that have been found get stored in an array. This is then looped through, and for each entry a search pattern is generated which looks something like this:
/(?![\'"])\bXXX\b(?![\'"])/m
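Here XXX is the name of the variable. The crucial parts here are the word boundaries (\b), which should prevent partial names from being matched. The 'look ahead' patterns in front of and after the word boundaries are my incomplete attempts at preventing text from being matched inside JavaScript strings. Note that the leading one, being a lookahead, only inspects the first character of the name itself and so never rejects anything; a lookbehind would be needed to check the character before the name.
Each match found is replaced by a randomly generated unique name, which looks something like Ae, KsyF, PMw, etc.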
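The replacement loop could look roughly like the sketch below. It is an illustration only, not the real code: random_name and rename_all are hypothetical helpers, and starting every generated name with an uppercase letter is my own assumption to stay clear of lowercase keywords such as do, if or in.

```php
<?php
// Hypothetical helper: a random name of 2 to 4 letters that is not in $taken.
// The uppercase first letter is an assumption of this sketch, see above.
function random_name(array $taken): string
{
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $any   = $upper . 'abcdefghijklmnopqrstuvwxyz';
    do {
        $name  = $upper[random_int(0, 25)];
        $extra = random_int(1, 3);
        for ($i = 0; $i < $extra; $i++) {
            $name .= $any[random_int(0, 51)];
        }
    } while (in_array($name, $taken, true));
    return $name;
}

// Replace every collected name using the search pattern shown above.
function rename_all(string $source, array $names): string
{
    // Pick all new names first, avoiding both the old names and earlier picks,
    // so that one replacement can never be replaced again by a later one.
    $map = [];
    foreach ($names as $name) {
        $map[$name] = random_name(array_merge($names, array_values($map)));
    }
    foreach ($map as $old => $new) {
        $pattern = '/(?![\'"])\b' . preg_quote((string) $old, '/') . '\b(?![\'"])/m';
        $source  = preg_replace($pattern, $new, $source);
    }
    return $source;
}

echo rename_all("var total = 1; alert(total);", ['total']);
```

The generated names can still clash with identifiers that were never collected (built-in objects, names from other scripts), so a real implementation would need to check for those as well.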
Stripping comments and whitespace
The final stage is to remove all unwanted text like whitespace and comments. For the comments we need two patterns: one for single line comments, the other for multiline comments. The single line pattern looks like this:
/\/\/.*$/m
and the multiline:
/\/\*.*?\*\//sm
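A minimal sketch of applying both comment patterns with preg_replace follows. The helper name strip_comments and the order of the two calls are assumptions for this example, not necessarily what the app does.

```php
<?php
// Hypothetical helper: strip both comment styles with the patterns above.
function strip_comments(string $source): string
{
    // Multiline comments are removed first; this order is a choice of the sketch.
    $source = preg_replace('/\/\*.*?\*\//sm', '', $source);
    // Then single line comments, from "//" up to the end of each line.
    return preg_replace('/\/\/.*$/m', '', $source);
}

$js = "var a = 1; // counter\n/* reset\n   later */\nvar b = 2;";
echo strip_comments($js);
```

Be aware that the single line pattern knows nothing about strings: a line like var u = "http://example.com"; is cut off right after http: because the // inside the string is treated as the start of a comment.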
The whitespace is a little more tricky. Since some of it has meaning in the JavaScript code, we can't just remove every whitespace character we find.
I made a pattern which should remove all whitespace where it isn't strictly necessary.
/\s*([,=!<>:;\-%\?\*\+\|\]\[\(\)\{\}]{1,2})\s*/m
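The pattern only makes sense together with a replacement. A plausible way to use it, assuming the captured operator or bracket is simply put back as $1, is sketched here; compact_whitespace is a hypothetical name.

```php
<?php
// Hypothetical helper: remove whitespace around operators and brackets.
function compact_whitespace(string $source): string
{
    // Keep the captured operator or bracket ($1), drop the whitespace around it.
    // Using '$1' as the replacement is an assumption of this sketch.
    $pattern = '/\s*([,=!<>:;\-%\?\*\+\|\]\[\(\)\{\}]{1,2})\s*/m';
    return trim(preg_replace($pattern, '$1', $source));
}

$js = "var total = first + second ;\nif ( total >= 10 ) {\n  total = 0;\n}";
echo compact_whitespace($js);
```

For the sample input this prints var total=first+second;if(total>=10){total=0;} on a single line. Whitespace between two words (var total, return total) is left alone because no operator is involved. Like the other patterns, this one also touches whitespace inside strings.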
Things that can be improved
This script isn't perfect and there's still room for improvement. Currently the following issues can be improved upon:
- This script doesn't add necessary semicolons. If you use semicolons at the end of lines you won't get into trouble very quickly, but it does happen when you declare a new object (var obj = {....}). I would normally leave the semicolon out after the closing bracket, but when all code is on a single line this results in a JavaScript error.
- Leaving JavaScript text strings alone. It can happen that text is replaced inside a JavaScript string (text between single or double quotes). Regular expressions aren't aware of the JavaScript language and replace everything that matches a certain pattern. I didn't find a suitable pattern which leaves all strings alone.
Update - I've looked into this a little further and it seems to be pretty hard to get this working using a regular expression. I tried lookaheads, lookbehinds and conditionals, but none of them really satisfies the requirements.
- Function and variable names aren't found inside object declarations (var obj = { func:.....}). This shouldn't be that difficult, but I haven't had time to implement it.
Update - They're currently in but need some serious testing.
- Probably other things I've overlooked, which should surface when other people test this app.
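For the string problem in the list above, one possible workaround is to avoid solving it inside the search pattern at all: swap every string literal for a placeholder, run the replacement, then put the literals back. This is a different technique from the lookahead attempt and is not part of the app; the helper name with_strings_protected and the "@0@" placeholder format are assumptions of this sketch.

```php
<?php
// Hypothetical helper: run $transform on the code while string literals are hidden.
function with_strings_protected(string $source, callable $transform): string
{
    $strings = [];
    // A single- or double-quoted literal on one line, allowing backslash escapes.
    $literal = '/"(?:[^"\\\\\n]|\\\\.)*"|\'(?:[^\'\\\\\n]|\\\\.)*\'/';
    $source = preg_replace_callback($literal, function (array $m) use (&$strings) {
        $strings[] = $m[0];
        // Placeholder format is an assumption: a quoted index between @ signs.
        return '"@' . (count($strings) - 1) . '@"';
    }, $source);

    $source = $transform($source);

    // Put the original literals back in place of the placeholders.
    return preg_replace_callback('/"@(\d+)@"/', function (array $m) use ($strings) {
        return $strings[(int) $m[1]];
    }, $source);
}

$js = 'var msg = "total is " + total;';
echo with_strings_protected($js, function (string $code): string {
    // Stand-in for the real replacement phase.
    return preg_replace('/\btotal\b/', 'Ae', $code);
});
```

Here the word total inside the string survives while the variable is renamed. The sketch still gets confused by quotes inside comments or regular expression literals, so comments would have to be stripped before it runs.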
For now I will keep the code for this app private since it needs a little more tinkering. When I feel the major problems have been resolved I will post the source here. In the meantime, try the demo.