How to create a JavaScript syntax highlighter

Most, if not all, IDEs come with syntax highlighting. It makes code easier to read and debugging less of a headache. Sometimes I share code snippets right here on my blog, and until recently I wasn't really paying attention to how they looked. Since most of the time it is my own code, I have no trouble understanding it, even though it looks like a screenshot from Notepad.

Even though it is formatted with the <pre> and <code> HTML tags, code is still hard to read in a two-color layout. There are plenty of JavaScript syntax highlighting libraries available to download, but since I also like to learn how things work, I took on the challenge of writing my own.

With syntax highlighted and without

The right side is with syntax highlighting. It is much easier to read.

I mostly share PHP, MySQL, JavaScript, and HTML, so my syntax highlighter generally covers those. These languages have enough in common that very little code takes care of all of them at once. Let's see how it works.

Assuming the code is inside a code tag, you can use jQuery or any other method to select all of those elements.

// With jQuery
var codeElements = $(".article pre code");
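
If jQuery is not on the page, the same selection can be sketched with the standard querySelectorAll method. The helper name selectCodeElements and its root parameter are made up for this illustration; the selector is the one used above.

```javascript
// Hypothetical helper: returns the matching <code> elements as a plain array.
// `root` is anything that implements querySelectorAll (normally `document`).
function selectCodeElements(root) {
    var nodeList = root.querySelectorAll(".article pre code");
    // Copy the NodeList into a real array so forEach and map are always available.
    return Array.prototype.slice.call(nodeList);
}

// In the browser:
// var codeElements = selectCodeElements(document);
```

With a plain array, the jQuery-style codeElements.each(...) call used later in this article would become codeElements.forEach(function (el) { ... }), with el in place of this.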

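In order to match the keywords we want to highlight, we can use regular expressions. We will be matching the following: strings in double or single quotes, general language keywords, JavaScript globals, DOM lookups and common methods, PHP functions, block and inline comments, HTML tags, and SQL keywords.

And here is the regex for each: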

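// Strings in double and single quotes (escaped quotes are not handled)
var strReg1 = /"(.*?)"/g,
    strReg2 = /'(.*?)'/g,
    // Language keywords
    specialReg = /\b(new|var|if|do|function|while|switch|for|foreach|in|continue|break)(?=[^\w])/g,
    // JavaScript globals. "$" gets its own branch: \b in front of "$" would require a word character right before it
    specialJsGlobReg = /(\b(?:document|window|Array|String|Object|Number)(?=[^\w])|\$(?=[^\w]))/g,
    // DOM lookups, operators, and common methods
    specialJsReg = /\b(getElementsBy(TagName|ClassName|Name)|getElementById|typeof|instanceof)(?=[^\w])/g,
    specialMethReg = /\b(indexOf|match|replace|toString|length)(?=[^\w])/g,
    // PHP functions
    specialPhpReg  = /\b(define|echo|print_r|var_dump)(?=[^\w])/g,
    // Block comments: [\s\S]*? also crosses line breaks and stops at the first "*/"
    specialCommentReg  = /(\/\*[\s\S]*?\*\/)/g,
    // Inline comments, up to the end of the line
    inlineCommentReg = /(\/\/.*)/g;

// HTML tags; innerHTML returns them escaped as &lt;...&gt;
var htmlTagReg = /(&lt;[^\&]*&gt;)/g;

// SQL keywords (uppercase only)
var sqlReg = /\b(CREATE|ALL|DATABASE|TABLE|GRANT|PRIVILEGES|IDENTIFIED|FLUSH|SELECT|UPDATE|DELETE|INSERT|FROM|WHERE|ORDER|BY|GROUP|LIMIT|INNER|OUTER|AS|ON|COUNT|CASE|TO|IF|WHEN|BETWEEN|AND|OR)(?=[^\w])/g;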
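
To see how the keyword patterns behave, here is a small sketch that can be pasted into a browser console. The names demoReg and wrapKeywords are made up for this demo; the pattern has the same shape as specialReg, reduced to two keywords.

```javascript
// Same shape as specialReg: a word boundary in front, and a lookahead that
// requires a non-word character right after the keyword.
var demoReg = /\b(if|for)(?=[^\w])/g;

function wrapKeywords(source) {
    return source.replace(demoReg, '<span class="special">$1</span>');
}

// "if" and "for" are wrapped; "simplified" and "forEach" are left alone.
var demoOutput = wrapKeywords("if (simplified) { forEach(); for (;;) {} }");

// The lookahead needs a following character, so a keyword that is the very
// last thing in the text is not matched.
var atEnd = wrapKeywords("if");
```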

The regular expressions are fairly self-explanatory. Each keyword pattern matches its set of keywords only on word boundaries. For example, it will match "if" only when it is not part of a longer word, so "simplified" will not be matched. Then we wrap each match in a span with a class name, so that related keywords share a uniform color:

codeElements.each(function (){
    // innerHTML is already escaped, so tags in the snippet appear as &lt; and &gt;
    // Strings go first: run later, strReg1 would also match the quoted class attributes of the inserted spans
    var string = this.innerHTML,
        parsed = string.replace(strReg1,'<span class="string">"$1"</span>');
    parsed = parsed.replace(strReg2,"<span class=\"string\">'$1'</span>");
    parsed = parsed.replace(specialReg,'<span class="special">$1</span>');
    parsed = parsed.replace(specialJsGlobReg,'<span class="special-js-glob">$1</span>');
    parsed = parsed.replace(specialJsReg,'<span class="special-js">$1</span>');
    parsed = parsed.replace(specialMethReg,'<span class="special-js-meth">$1</span>');
    parsed = parsed.replace(htmlTagReg,'<span class="special-html">$1</span>');
    parsed = parsed.replace(sqlReg,'<span class="special-sql">$1</span>');
    parsed = parsed.replace(specialPhpReg,'<span class="special-php">$1</span>');
    parsed = parsed.replace(specialCommentReg,'<span class="special-comment">$1</span>');
    parsed = parsed.replace(inlineCommentReg,'<span class="special-comment">$1</span>');

    // Write the highlighted markup back into the element
    this.innerHTML = parsed;
});
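
The same chain of replacements can also be written as data, which makes the order of the rules easier to see and to test outside the browser. This is only a sketch: the rules array and the highlightHtml function are hypothetical names, and the rule list is shortened to three patterns.

```javascript
// Each rule pairs a pattern with the class name for its <span>.
// Order matters: the string rule runs first, before any markup with
// quoted class attributes has been inserted.
var rules = [
    [/"(.*?)"/g, "string"],
    [/\b(new|var|if|function|for)(?=[^\w])/g, "special"],
    [/\b(document|window)(?=[^\w])/g, "special-js-glob"]
];

function highlightHtml(html, ruleList) {
    // Apply every rule in turn to the output of the previous one.
    return ruleList.reduce(function (parsed, rule) {
        return parsed.replace(rule[0], function (match) {
            return '<span class="' + rule[1] + '">' + match + "</span>";
        });
    }, html);
}

// In the browser, for each selected element `el`:
// el.innerHTML = highlightHtml(el.innerHTML, rules);
```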

For my blog I added these CSS rules to highlight the special words:

/**** Parsed Code  ****/
pre code .string {
    color:#A1E46D;
}
pre code .special {
    color:#D6665D;
}
pre code .special-js {
    color:#6DE4D1;
}
pre code .special-js-glob {
    color:#A1E46D;
    font-weight:bold;
}
pre code .special-comment{
    color:#aaa;
}
pre code .special-js-meth {
    color:#E46D8A;
}
pre code .special-html {
    color:#E4D95F;
}
pre code .special-sql {
    color:#1D968C;
}
pre code .special-php{
    color:#597EA7;
}

And that's it. Code on your page should now be highlighted. Feel free to modify the CSS to choose the colors that suit you best. Note that the JavaScript can always be improved. For example, right now when you highlight a string, if any other keywords are found in between, they will be highlighted too. The same goes for comments. My first approach was to use a callback function in the replace method to remove any HTML present in the code. If you find a better method, feel free to share it in the comment section.
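
One possible way around the keywords-inside-strings problem is to match everything in a single pass, so that text already claimed by a string or comment is never scanned again. The sketch below is not the script used on this blog; tokenReg and highlightOnce are hypothetical names, and it only covers comments, strings, and the general keywords.

```javascript
// One combined pattern. At each position the alternatives are tried left to
// right, so a comment or string swallows any keywords it contains.
// Group 1: comments, group 2: strings (with escaped quotes), group 3: keywords.
var tokenReg = /(\/\*[\s\S]*?\*\/|\/\/.*)|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|\b(new|var|if|do|function|while|switch|for|foreach|in|continue|break)\b/g;

function highlightOnce(html) {
    return html.replace(tokenReg, function (match, comment, string) {
        // Groups that did not take part in the match arrive as undefined.
        var className = comment !== undefined ? "special-comment"
                      : string !== undefined ? "string"
                      : "special";
        return '<span class="' + className + '">' + match + "</span>";
    });
}
```

Because the text produced by the callback is not scanned again, the inserted span markup can never be matched by a later rule. Regex literals and other constructs that contain quotes would still confuse this pattern.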


Comments (12)

caluba :

Very inspiring. It's a pleasure to come across minimalist, to-the-point posts like this one. You have a new follower.

Dinesh Kumar :

You gave me an idea, how to implement an easy Syntax Highlighter.

Thank you very much, friend.

Regards, Dinesh

Ronald Bowser :

First I want to thank you for such a clear explanation on the process you used to write a syntax highlighter with JavaScript. Secondly I found the page "Coding Horror" you reference in your about page awesome! Very good insights and reading. Thank You, Ron

Ibrahim Diallo :

I'm glad you liked the article, Ronald. I'm a big fan of Coding Horror and I find myself re-reading it every couple of months; there is just so much insight.

Wish you good luck with your website.

ahmed zuhar :

Thank you

Perry :

For example, right now when you highlight a string, if any other keywords are found in between, they will be highlighted too. The same goes for comments.

What I'm doing is just get the string/comment matches first, iterate over them and replace "function" with "func<i></i>tion", replace "document" with "docu<i></i>ment", etc. The <i> elements are empty so they don't do anything visually with the string/comment, but it prevents the special regex from touching them now. You can copy code from the page and paste it into notepad, and it won't carry over the <i> elements. Perfect!

Thanks for the article.

Perry :

I had some HTML in my last comment and it messed it up.

Basically you just want to replace "function" with "funcXYZtion" where XYZ is just an empty HTML span or i element. Do this before the other replaces and it should work.
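
A rough sketch of the idea described in this comment, under some assumptions: protectStrings is a hypothetical name, only double-quoted strings are handled, and the keyword list is shortened. It splits after the first character rather than in the middle, because a middle split can leave a keyword behind (for example "for" at the front of "foreach").

```javascript
var stringReg = /"(.*?)"/g;
var keywordReg = /\b(function|document|var|if)\b/g;

// Break up every keyword found inside a double-quoted string with an empty
// <i></i> element, so the keyword patterns no longer recognize it.
function protectStrings(html) {
    return html.replace(stringReg, function (stringMatch) {
        return stringMatch.replace(keywordReg, function (word) {
            return word.charAt(0) + "<i></i>" + word.slice(1);
        });
    });
}

// Run this before the other replaces:
// parsed = protectStrings(this.innerHTML);
```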

Ibrahim :

Hi Perry, Thank you for the feedback, this is definitely an improvement. I'll check to see how I can integrate this into the script.

Also I fixed the html issue on your comment :)

Angel :

Hello! Thanks for this tutorial.

I followed it and I realized this was the same thing I was trying myself before starting to search on the internet about this topic.

I have an issue with specialReg and specialJSReg. If there's a specialReg or a specialJSReg inside a String or comment, it gets painted as specialReg/specialJSReg and not as a string or comment.

So I made a function to remove those keywords from strings/comments by replacing them with [i][/i] as I read in another comment, but it messes up the <spans>.

Any solution? Thanks again!

Ibrahima Diallo :

Hi @Angel I appreciate your input.

I looked at your question and it is something that I haven't found a simple solution for yet.

The ultimate solution would be to build a language parser that will properly highlight keywords or strings. However, that is too complex for this simple tutorial.

My other quick fix, which is not very efficient, is to do something similar to what you did. The difference would be to access the actual DOM elements.

I get the elements with class name string, and set innerHTML = innerText. This will remove the special* html tags and you'll have pure strings and pure comments.
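
A minimal sketch of that cleanup, assuming it runs after the highlighted markup has been written back to the page. The name flattenNested and its root parameter are hypothetical. It assigns textContent instead of copying innerText into innerHTML, because text assigned to innerHTML is parsed as markup again, which would turn an escaped tag inside a string into a real element.

```javascript
// Flatten every highlighted string and comment back to plain text so that
// keyword spans nested inside them disappear.
// `root` is anything that implements querySelectorAll (normally `document`).
function flattenNested(root) {
    var nodes = root.querySelectorAll("pre code .string, pre code .special-comment");
    Array.prototype.forEach.call(nodes, function (el) {
        // Assigning textContent replaces all child nodes with a single text node.
        el.textContent = el.textContent;
    });
}

// In the browser:
// flattenNested(document);
```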

I hope this helps.

Good luck.

CodeTix :

Can you post the whole code in one single HTML file? I'm kinda confused..

letochagone :

Great! I was looking for something lighter than highlight.js, and this helped me move forward in my understanding of JavaScript.
