When to optimize or index your SQL queries

Not now, later

By Ibrahim Diallo

Published Apr 6 2018 ~ 3 minutes read

Premature optimization is the root of all evil. But not optimizing your queries is a major bottle neck in your web application. During the building process, queries are written and rewritten. It would be cumbersome to write a query, index the fields needed for it, change the query for new requirement, drop the old index and create a new one. Even a simple blog like this one has more then 50 queries that have to be maintained. My strategy is simple. I don't index them right away.

Before jumping into this, I think you should test your SQL index knowledge. No matter what I say here, if you are not familiar indexing tables, then my words are just pixels on the screen. So go ahead and test your skills:

How Much do You Know About MySQL Indexing

If you want to write very fast queries, you have to get the design of your database right from the very beginning. In the real world, this is impossible. New features are added, and removed. Requirements change, or you just discover that you don't need a query anymore.

When you start a new project, your database is more or less empty. It's very hard to test whether or not your indexes have any good effect. There are two common solutions.

Garbage data

Usually, you find developers filling their database with garbage data for testing. This is fine and should give you a good idea of what needs to be improved. Though, you must have a reliable system in place that allows you to test different indexing with different data. It works but it requires a lot of efforts.

Don't index your tables, yet

Don't spend too much time trying to index empty tables. This time could be better spent working on the application and optimizing other parts that are currently being used. If you have a slow query that will take a whole of 10 milliseconds when the number of rows reach 50 thousand, and it is growing by a few hundreds a day, I say leave it alone.

After each blog post, there is a list of related articles. To generate those there is a lot of calculation involved. When I had 12 blog posts in the database, running the query was instant. It took me years to reach 200 blog posts, and at that time, I was starting to notice that few seconds delay every time I saved a post. So I moved the generation of related articles to a cron job that ran every couple hours. Problem solved.

When I reached 400 posts, I checked my server logs and noticed that the CPU was at 100% for 15 minutes every couple hours. I went back to the query, updated it, indexed the right fields and now it runs just under 7 milliseconds.

It's just easier to index an established query as opposed to guessing problems that will occur. It was a very quick fix using EXPLAIN and even when it was slow it was not ruining the experience on the web site.

Conclusion

You can add random data to your database to allow you to test how your indexes will perform under a load. But sometimes it is not worth optimizing for a problem that will only occur years down the line. Write your queries and do the basic indexing of primary keys and what-not, and let the data grow. Down the line, look for performance bottlenecks and fix them then and only then.

Comments

There are no comments added yet.

Garbage data

Don't index your tables, yet

Conclusion

On a related note, here are some interesting articles.

Using MySQLi and PDO instead of MySQL_*

SQL injection and how to protect your website

Abstracting MySql results in PHP

Comments

Let's hear your thoughts