Search is an important feature on a website. When my few readers want to look for a particular passage on my blog, they use the search box. It used to be powered by Google Search, but I have since then changed it to my own home-brewed version not because I can do better but because it was an interesting challenge.
If you are in a hurry and just want your site to be searchable, well do what I did before, use Google.
<form action="/search.php" method="get">
<input type="text" name="query" placeholder="Search term"/>
<input type="submit" value="Search"/>
</form>
// In search.php file
$term = isset($_GET['query'])?$_GET['query']: '';
$term = urlencode($term);
$website = urlencode("www.yourwebsite.com");
$redirect = "https://www.google.com/search?q=site%3A{$website}+{$term}";
header("Location: $redirect");
exit;
What it does is pretty simple. Get the term passed by the user, and forward it to Google search page. Limit the search result to our current domain using the site:
keyword in the search query. All your pages that are indexed by Google will be available through search now. If you do want to handle your search in house however, then keep reading.
Homemade Search Solution
Before we go any further, try using the search box on this blog. It uses the same process that I will describe below. If you feel that this is what you want then please continue reading.
This solution is catered to small websites. I make use of LIKE
with wild cards on both ends, meaning your search cannot be indexed. This means the solution will work fine for your blog or personal website that doesn't contain tons of data. Port it to a bigger website and it might become very slow. MySQL offers Full Text Search which is not what we are doing here.
Note: If you have 5000 blog posts you are still fine. Computers are super fast.
We will take the structure of this blog as a reference. Each blog post has:
- A title
p_title
- A url
p_url
- A summary
p_summary
- A post content
p_content
- And catergories
category.tagname
For every field that matches with our search term, we will give it a score. The score will be based on the importance of the match :
// the exact term matches is found in the title
$scoreFullTitle = 6;
// match the title in part
$scoreTitleKeyword = 5;
// the exact term matches is found in the summary
$scoreFullSummary = 5;
// match the summary in part
$scoreSummaryKeyword = 4;
// the exact term matches is found in the content
$scoreFullDocument = 4;
// match the document in part
$scoreDocumentKeyword = 3;
// matches a category
$scoreCategoryKeyword = 2;
// matches the url
$scoreUrlKeyword = 1;
Before we get started, there are a few words that do not contribute much to a search that should be removed. Example "in","it","a","the","of" ...
. We will filter those out and feel free to add any word you think is irrelevant. Another thing is, we want to limit the length of our query. We don't want a user to write a novel in the search field and crash our MySQL server.
// Remove unnecessary words from the search term and return them as an array
function filterSearchKeys($query){
$query = trim(preg_replace("/(\s+)+/", " ", $query));
$words = array();
// expand this list with your words.
$list = array("in","it","a","the","of","or","I","you","he","me","us","they","she","to","but","that","this","those","then");
$c = 0;
foreach(explode(" ", $query) as $key){
if (in_array($key, $list)){
continue;
}
$words[] = $key;
if ($c >= 15){
break;
}
$c++;
}
return $words;
}
// limit words number of characters
function limitChars($query, $limit = 200){
return substr($query, 0,$limit);
}
Our helper functions can now limit character count and filter useless words. The way we will implement our algorithm is by giving a score every time we find a match. We will match words using the if statement and accumulate points as we match more words. At the end we can use that score to sort our results
Note: I will not be showing how to connect to MySQL database. If you are having problems to efficiently connect to the database I recommend reading this short post about database abstraction.
Let's give our function a structure first. Note I left placeholders so we can implement sections separately.
function search($query){
$query = trim($query);
if (mb_strlen($query)===0){
// no need for empty search right?
return false;
}
$query = limitChars($query);
// Weighing scores
$scoreFullTitle = 6;
$scoreTitleKeyword = 5;
$scoreFullSummary = 5;
$scoreSummaryKeyword = 4;
$scoreFullDocument = 4;
$scoreDocumentKeyword = 3;
$scoreCategoryKeyword = 2;
$scoreUrlKeyword = 1;
$keywords = filterSearchKeys($query);
$escQuery = DB::escape($query); // see note above to get db object
$titleSQL = array();
$sumSQL = array();
$docSQL = array();
$categorySQL = array();
$urlSQL = array();
/** Matching full occurrences PLACE HOLDER **/
/** Matching Keywords PLACE HOLDER **/
$sql = "SELECT p.p_id,p.p_title,p.p_date_published,p.p_url,
p.p_summary,p.p_content,p.thumbnail,
(
(-- Title score
".implode(" + ", $titleSQL)."
)+
(-- Summary
".implode(" + ", $sumSQL)."
)+
(-- document
".implode(" + ", $docSQL)."
)+
(-- tag/category
".implode(" + ", $categorySQL)."
)+
(-- url
".implode(" + ", $urlSQL)."
)
) as relevance
FROM post p
WHERE p.status = 'published'
HAVING relevance > 0
ORDER BY relevance DESC,p.page_views DESC
LIMIT 25";
$results = DB::query($sql);
if (!$results){
return false;
}
return $results;
}
In the query, all scores will be summed up as the relevance variable and we can use it to sort the results.
Matching full occurrences
We make sure we have some keywords first then add our query.
if (count($keywords) > 1){
$titleSQL[] = "if (p_title LIKE '%".$escQuery."%',{$scoreFullTitle},0)";
$sumSQL[] = "if (p_summary LIKE '%".$escQuery."%',{$scoreFullSummary},0)";
$docSQL[] = "if (p_content LIKE '%".$escQuery."%',{$scoreFullDocument},0)";
}
Those are the matches with higher score. If the search term matches an article that contains these, they will have higher chances of appearing on top.
Matching keywords occurrences
We loop through all keywords and check if they match any of the fields. For the category match, I used a sub-query since a post can have multiple categories.
foreach($keywords as $key){
$titleSQL[] = "if (p_title LIKE '%".DB::escape($key)."%',{$scoreTitleKeyword},0)";
$sumSQL[] = "if (p_summary LIKE '%".DB::escape($key)."%',{$scoreSummaryKeyword},0)";
$docSQL[] = "if (p_content LIKE '%".DB::escape($key)."%',{$scoreDocumentKeyword},0)";
$urlSQL[] = "if (p_url LIKE '%".DB::escape($key)."%',{$scoreUrlKeyword},0)";
$categorySQL[] = "if ((
SELECT count(category.tag_id)
FROM category
JOIN post_category ON post_category.tag_id = category.tag_id
WHERE post_category.post_id = p.post_id
AND category.name = '".DB::escape($key)."'
) > 0,{$scoreCategoryKeyword},0)";
}
Also as pointed by a commenter below, we have to make sure that the these variables are not empty arrays or the query will fail.
// Just incase it's empty, add 0
if (empty($titleSQL)){
$titleSQL[] = 0;
}
if (empty($sumSQL)){
$sumSQL[] = 0;
}
if (empty($docSQL)){
$docSQL[] = 0;
}
if (empty($urlSQL)){
$urlSQL[] = 0;
}
if (empty($tagSQL)){
$tagSQL[] = 0;
}
At the end the queries are all concatenated and added together to determine the relevance of the post to the search term.
The full code:
// Remove unnecessary words from the search term and return them as an array
function filterSearchKeys($query){
$query = trim(preg_replace("/(\s+)+/", " ", $query));
$words = array();
// expand this list with your words.
$list = array("in","it","a","the","of","or","I","you","he","me","us","they","she","to","but","that","this","those","then");
$c = 0;
foreach(explode(" ", $query) as $key){
if (in_array($key, $list)){
continue;
}
$words[] = $key;
if ($c >= 15){
break;
}
$c++;
}
return $words;
}
// limit words number of characters
function limitChars($query, $limit = 200){
return substr($query, 0,$limit);
}
function search($query){
$query = trim($query);
if (mb_strlen($query)===0){
// no need for empty search right?
return false;
}
$query = limitChars($query);
// Weighing scores
$scoreFullTitle = 6;
$scoreTitleKeyword = 5;
$scoreFullSummary = 5;
$scoreSummaryKeyword = 4;
$scoreFullDocument = 4;
$scoreDocumentKeyword = 3;
$scoreCategoryKeyword = 2;
$scoreUrlKeyword = 1;
$keywords = filterSearchKeys($query);
$escQuery = DB::escape($query); // see note above to get db object
$titleSQL = array();
$sumSQL = array();
$docSQL = array();
$categorySQL = array();
$urlSQL = array();
/** Matching full occurences **/
if (count($keywords) > 1){
$titleSQL[] = "if (p_title LIKE '%".$escQuery."%',{$scoreFullTitle},0)";
$sumSQL[] = "if (p_summary LIKE '%".$escQuery."%',{$scoreFullSummary},0)";
$docSQL[] = "if (p_content LIKE '%".$escQuery."%',{$scoreFullDocument},0)";
}
/** Matching Keywords **/
foreach($keywords as $key){
$titleSQL[] = "if (p_title LIKE '%".DB::escape($key)."%',{$scoreTitleKeyword},0)";
$sumSQL[] = "if (p_summary LIKE '%".DB::escape($key)."%',{$scoreSummaryKeyword},0)";
$docSQL[] = "if (p_content LIKE '%".DB::escape($key)."%',{$scoreDocumentKeyword},0)";
$urlSQL[] = "if (p_url LIKE '%".DB::escape($key)."%',{$scoreUrlKeyword},0)";
$categorySQL[] = "if ((
SELECT count(category.tag_id)
FROM category
JOIN post_category ON post_category.tag_id = category.tag_id
WHERE post_category.post_id = p.post_id
AND category.name = '".DB::escape($key)."'
) > 0,{$scoreCategoryKeyword},0)";
}
// Just incase it's empty, add 0
if (empty($titleSQL)){
$titleSQL[] = 0;
}
if (empty($sumSQL)){
$sumSQL[] = 0;
}
if (empty($docSQL)){
$docSQL[] = 0;
}
if (empty($urlSQL)){
$urlSQL[] = 0;
}
if (empty($tagSQL)){
$tagSQL[] = 0;
}
$sql = "SELECT p.p_id,p.p_title,p.p_date_published,p.p_url,
p.p_summary,p.p_content,p.thumbnail,
(
(-- Title score
".implode(" + ", $titleSQL)."
)+
(-- Summary
".implode(" + ", $sumSQL)."
)+
(-- document
".implode(" + ", $docSQL)."
)+
(-- tag/category
".implode(" + ", $categorySQL)."
)+
(-- url
".implode(" + ", $urlSQL)."
)
) as relevance
FROM post p
WHERE p.status = 'published'
HAVING relevance > 0
ORDER BY relevance DESC,p.page_views DESC
LIMIT 25";
$results = DB::query($sql);
if (!$results){
return false;
}
return $results;
}
Now your search.php
file can look like this:
$term = isset($_GET['query'])?$_GET['query']: '';
$search_results = search($term);
if (!$search_results) {
echo 'No results';
exit;
}
// Print page with results here.
We created a simple search algorithm that can handle a fair amount of content. I arbitrarily chose the score for each match, feel free to tweak it to something that works best for you. And there is always room for improvement.
It is a good idea to track the search term coming from your users, this way you can see if most users search for the same thing. If there is a pattern, then you can save them a trip and just cache the results using Memcached.
If you want to see this search algorithm in action, go ahead and try looking for an article on the search box on top of the page. I have added extra features like returning the part where the match was found in the text. Feel free to add features to yours.
Comments(47)
Zaryel :
Very nice tutorial ! It is a simple but still powerful algorithm. And your explanations are great.
Thank you.
Zaryel
Ibrahim author :
Thanks Zaryel, I'm glad I could help :)
Ian Mustafa :
This example could be implemented on SQL with nested queries, as other example like MATCH() AGAINST() wouldn't. Great work Ibrahim!
Rob :
Thank you for this interesting approach.
The code fails to run if the method escape() is not defined, and in your database abstraction example it is not.
It isn't obvious to me what that should be.
Thanks.
Ibrahim author :
Thanks for making this observation @Rob. I have updated the database abstraction post to include the escape function. Basically, all it is is a wrapper around the mysql escaping function.
adeem :
Great article, thank you!
Ivan Venediktov :
I have noticed that if you are searching for a single word you are getting and error where implode is used.
Ibrahim author :
Hi Ivan
Thank you for testing the code and finding a bug. It looks like on my own version I had fixed this issue and forgot to update the blog. Here is a few lines of code I added to make sure the variable are not empty arrays:
I have updated to code on the blog post.
Bilal :
This is so helpful. Is there a way to search based on a pattern? I really need it along with this for my Project.
Ibrahim author :
@Bilal you can always add an extra filter to the code to parse the keywords. But what kind of pattern are you referring to?
Bilal :
@ibrahim Any pattern at all, which could filter the results even more. Pattern could be based on anything. Need it really fast :| Would really appreciate it!
Ibrahim author :
@Bilal what is a pattern? If you want to write code that is going to be useful to you, you first have to identify what it is going to do for you.
A pattern by it self doesn't mean anything. But if you have a specific pattern you want to match, that is a start. I can't figure out what you need or write code for you, but I can definitely guide you in the right direction.
Bilal :
@ibrahim My project: It gather news from different sources around the world and plots it on to an interactive map. I used php as my main language. Now i need users to be able to search news which the crawler gathers and saves in an array. Your searching algorithm works on data stored in your mysql database right? I need to use all these features of your searching algorithm to work on the array which has all the news stored in it.
As far as what the pattern should be, i was hoping something like: user's search history should have an impact on the results being shown by a specific user. e.g: if all of his previous searches have been on football, the newer searches will find a link between the previous searches with the current search.
I hope i was able to convey my point. Only have a month to finish my project and i'm totally stuck. Can't seem to get done with the 2 problems mentioned above.
William :
What an excellent tutorial. Thank you very much for posting this.
Ibrahim author :
Thanks @William
lolzec :
hahahah you can't hear it..you can only read it...lol
Ibrahim author :
@lolzec I heard you laugh though!
evans :
greatly helpful ibrahim
Afamefunah Okoro :
Thank you for this, but i want to know if this is only application to a single blog?
Ibrahim author :
@Afamefunah you are welcome.
Actually, it can be used in any situation. Here is an example that someone has implemented as a Symfony component for any project:
https://github.com/folkcert/folkcert-api
If your data is not in the hundred of thousand records it can work just fine. Also remember you that you can modify it to be more efficient. For example, you can add more conditions on the Where clause to limit the results to something relevant.
Good luck
Tommy Ross :
hello so what exactly is the name of the file for the full code??? so i fute index.php just contains the html for the entry search box and then the form action action pointing to search.[h...but is the full code also in search.php...sorry but its not clear at all
Tommy Ross :
soory i made typos
i actually said index.php is where the form action points to search.php and there is the html form for the search box
but does the full codde you displayed go inside the search.php and if so where exactly
Ibrahim author :
Hi Tommy
You can separate it into two files.
search.php
will have the form (search box), andfunctions.php
will have the logic of the search algorithm:Then on top of the search.php file you can add an include to include the functions.php file. I hope that makes sense.
Tommy Ross :
Thank you Mr Ibrahim
Belay G. :
you are perfect
it simple and familiar rather than using database search.... tnxs
Taslim Buldani :
Assalamu'alaikum Ibrahim. I'm newbie here, so I will be very happy if you continue the script until finish. Not only until // Print page with results here :)
I got this error message: Fatal error: Class 'DB' not found in D:\xampp\htdocs\search\function.php on line 47.
Ibrahim author :
Hi Taslim,
The DB class is used to connect to the database, which I did not include on this tutorial but only linked to. You will have to define it, read more here. Note that I have called it
DataManager
there instead ofDB
Good luck
Aiman :
Assalamu'alaikum Ibrahim.
Thank you for such a great lesson.
Nasibu Njoka :
Hi, don't you think adding the keyword building by removing some characters is one of the action that can cause memory exhaustion?
Anyway how can you advice me if my db has over 2 hundred thousand of data? how can I perform search without running into memory issues?
Because limiting the number of rows per query isn't helpful if you have many rows the limit occurs after the search is complete so it's not helpful at this scenario
Thanks
Ibrahim author :
Hi Nasibu
I added a character limit (
limitChars()
) function to prevent the memory from growing out of control.For larger databases, this algorithm is not a good strategy, mostly because of the limit. I recommend you either use MySQL full text search or Elastic Search
However, depending on your table structures you can still use my algorithm and improve it to have a where clause that limits the number results.
If you have a table with 200k records, it is time to look for a more robust solution. Maybe even have an expert look at it for recommendations.
Timeka :
Hello there! Hope this is not a silly question but do you have to attach a separate crawler/indexer? I want people to search urls like a google but on a smaller scale. At the moment Im using SiteSearch360 which is awesome but maybe I can do the same thing on my own. If so do you recommend any crawlers that would be good to use on a linux server?
Thank ya:)
ibrahim author :
Hi @Timeka
For this particular case, the search is conducted in real time with no indexing whatsoever. Obviously this does not work well for a dedicated search engine that sifts through large amount of data.
For a solution that works on linux you should take a look at Elastic search. It should work for most use cases.
Good luck.
Timeka :
Sorry for the late response but THANK YOU for the info it is greatly appreciated :)
Jason :
Hi there. I am a student and this tut looks fantastic. I am new to PHP and SQL and wondered how you are displaying the resulting rows? Is each one the same $name or $unique?
Would it be something like:
Or some other way?
Thank you for your guidance and patience for student noobs.
:)
ibrahim author :
Hi Jason. I am glad this tutorial was useful to you.
Displaying the results is a matter of taste. Looking at your example, you are using bootstrap so as long as it display correctly on the page. My current one is as follows, although it may change in the future:
Then with CSS I can style it however I see fit.
Jason :
Hi Ibrahim, thank you for your reply. When adding the results inside a foreach loop, I'm unable to echo an echo. Can you please correct my understanding with an example of the right syntax for the foreach loop?
ibrahim author :
Hi Jason, this is the syntax for foreach. Note I am using it like a template. You can read more about this syntax here
If you want to know how I return my data from the database, here is a short tutorial.
samuel :
Hi, i try to use your code but i got a SQL error i think your idea is great and can fit to my project i need help on why am getting SQL error the error message was
The SQL being executed was:
Thanks in advance
Ibrahim author :
Hi Samuel,
Looks like you are adding quotes to each of your keys with
%
. This:is supposed to be:
Fix that and don't forget to escape the keys and it should work.
samuel :
Ibrahim thanks so much for your replay, yes you're right there was a single quote around the query string but this was because I used Yii2
quoteValue()
method to escape the string i can't get DB object in yii2 so i decided to escape it with that method but there was quote around it below is the full code, i will follow your suggestion but meanwhile how can i escape the string to avoid sql attack, i tried to usemysqli_escape_string()
but it doesn't work either as i need to provide $link which is the DB connection and what i got from yii was an object rather thatmysqli
and also this is the escape method i'm using
Ibrahim author :
I'm not familiar with
quoteValue()
, I don't know if it escapes properly so please double check. However, you can make the current code work if you include the percent signs in the function.Peter :
Great article! I have my job search website ajiriwa, this would be a much more efficient implementation
Sergei Anov :
I want to make a search for articles on my website - is it ok to use just plain 'LIKE' statement or is there a better search algorithm to use with MySQL? (its important it be efficient) .
emirt :
Another (minor) issue is typing your search queries l i k e t h i s. It will essentially return the entire database up to the 25 limit.
Ibrahim author :
@emirt that's a very good point. We can update the function
filterSearchKeys
to remove one letter keywords.Flask Jenos :
Hello @Ibrahim this is great article. But i am little bit confused while connecting it with the search.php file. It will be very great for me if you help solving this problem.
harsh :
hi my name is harsh gajera i
Let's hear your thoughts