Using Regular Expression to find position instead of strpos or stripos in PHP

I really can't believe there isn't an article explaining how to use a regular expression to find the position of a given string. It is very common to use strpos or stripos to find the first occurrence of a given string in PHP. However, a problem arises when strpos or stripos gives you the wrong result. Suppose you are looking for the symbol "RM" (Ringgit) in a given text, and the text is "RMX9182 is the code for this item selling at RM2000". Obviously you want your program to find the symbol in "RM2000" rather than in "RMX9182". Using strpos or stripos as follows gives you the wrong position, because it simply returns the first place where "RM" appears.

$text = "RMX9182 is the code for this item selling at RM2000";
$position = stripos($text, "RM"); // returns 0, the "RM" in "RMX9182"

With a regular expression we can pinpoint exactly the pattern we want. In this case, we look for "RM" immediately followed by a digit,

$text = "RMX9182 is the code for this item selling at RM2000";
$pattern = "/RM\\d/i";
preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

It will print out the following array

Array
(
    [0] => Array
        (
            [0] => RM2
            [1] => 45
        )

)

Notice that the second element, 45, is the offset of the matched text within the string (counted in bytes, starting from 0). Now suppose we have more than one match in our text,

$text = "RMX9182 RM90 is the code for this item selling at RM2000";
$pattern = "/RM\\d/i";
preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

It will still retrieve only the first match, as shown below

Array
(
    [0] => Array
        (
            [0] => RM9
            [1] => 8
        )

)

However, if you would like to find all the positions in a string, just use preg_match_all instead, as shown below,

$text = "RMX9182 RM90 is the code for this item selling at RM2000";
$pattern = "/RM\\d/i";
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

It will give you a result similar to the one shown below,

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => RM9
                    [1] => 8
                )

            [1] => Array
                (
                    [0] => RM2
                    [1] => 50
                )

        )

)

This may seem very simple and direct, but developers often find it easier to stick with stripos or strpos until things go wrong. For a plain string, the native functions might be just the right tool, but if a pattern is required, nothing beats a regular expression.
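
If you only care about the positions and not the matched text, the preg_match_all call above can be wrapped in a small function. This is a minimal sketch: findPositions is a made-up name, not a PHP built-in, and the offsets it returns are byte offsets, just like the ones in the arrays above.

```php
<?php
// Hypothetical helper (not part of PHP): returns the byte offset of every match.
function findPositions(string $pattern, string $text): array
{
    // preg_match_all returns the number of matches, or false on a regex error.
    if (!preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        return [];
    }
    // $matches[0] holds [matched text, offset] pairs; keep only the offsets.
    return array_column($matches[0], 1);
}

$text = "RMX9182 RM90 is the code for this item selling at RM2000";
print_r(findPositions('/RM\d/i', $text)); // offsets 8 and 50
```

An empty array means "not found", which avoids the classic strpos trap where a match at position 0 looks like false.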

Simple Html Dom Fatal error: Call to a member function on a non-object

Simple Html Dom is a PHP DOM manipulator. This library is one of the easiest and most powerful DOM manipulators for PHP. In fact, you can even use it to build your own web crawler, as I have done. However, the Simple Html Dom library isn't perfect. Although you can do almost everything with it without a problem, the most troublesome thing in a complex program is the variety of URLs it has to handle. The possible combinations of URLs are endless, and some of them can cause Simple Html Dom to fail.

I faced this problem with Simple Html Dom when a fatal error kept stopping my PHP crawler. The fatal error always read something like "call to a member function on a non-object at....", yet when I looked at the link being processed at the time, it was perfectly fine. In PHP, we cannot really stop a fatal error without resorting to some black magic that doesn't always work. As many people would say, prevention is better than cure. Hence, checking whether the variable is an object before proceeding should fix this problem. If you think like many other people out there, this is most likely what you would have done, betting that it will fix your problem.

$html = file_get_html($url);
if(is_object($html)){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

Well, the above might work in some cases but not all. When file_get_html fails, it returns false, regardless of whether a 404 or a 500 occurred on the remote server. Hence, you may do this as well,

$html = file_get_html($url);
if($html){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

But if that still doesn't solve your problem, and it really can't be expected to cover every case, you might turn to the following,

$html = file_get_html($url);
if($html && is_object($html)){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

Well, if this still doesn't work and your brain is stuck, you might feel lucky this time that you came across this blog of mine.

Simple_Html_Dom Fatal Error Solution

The solution to your problem is actually quite simple but not obvious. I have tested it with a few hundred thousand different URLs, so I am confident that it gets rid of the "call to a member function on a non-object" fatal error, especially when it is raised at "find". Using the above example, we just have to handle one more case: the call didn't fail and the object was created by the Simple Html Dom class, but it doesn't contain any nodes! In this case, you should write something like the following,

$html = file_get_html($url);
if($html && is_object($html) && isset($html->nodes)){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

In the above example I use all the positive conditions, whereas in my real program I was using the negated ones (checking for false), but it should work the same way. I tested this for a few days, as the bot had to run for quite a few hours before it would bang into a fatal error. That is really a waste of time, but the solution is rewarding. I hope this helps some fellow out there 🙂
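
For the negated form, a rough sketch could look like the one below. The function name isUsableDom is made up for illustration and is not part of Simple Html Dom. The objects in $results are only stand-ins for what file_get_html might return (an assumption), so the snippet runs without the library.

```php
<?php
// Hypothetical helper; the three checks are the ones from the condition above.
function isUsableDom($html): bool
{
    return $html && is_object($html) && isset($html->nodes);
}

// Stand-ins for what file_get_html() might hand back (assumed shapes, for
// illustration only): false on failure, an object without nodes, a parsed page.
$parsed = new stdClass();
$parsed->nodes = ['<img>'];
$results = [false, new stdClass(), $parsed];

foreach ($results as $html) {
    if (!isUsableDom($html)) {
        continue; // skip this URL instead of calling find() on something unusable
    }
    echo "safe to call find()\n";
}
```

In a crawler loop the early continue keeps the happy path unindented, and a bad URL simply gets skipped.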

SQL Remove Duplicate Records Solutions

Today I faced a little problem. I wanted to make one of my fields unique, but there were already some duplicate records (around 900+) in that field. Here are some of the ways to remove duplicate records that I thought I would share. However, before fixing our duplicate records, how about we find out which records are duplicated? If you want to know which records are duplicated, just fire the command below,

SELECT photo
FROM photograph
GROUP BY photo
HAVING count(photo) > 1

This sort of SQL query comes up in many interview questions. It is pretty simple, though: use GROUP BY and HAVING to show which values have more than one record.
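
If GROUP BY and HAVING feel abstract, the same counting logic can be sketched in PHP on an in-memory list. The file names below are made-up sample data standing in for the photo column.

```php
<?php
// Made-up sample values standing in for the photo column.
$photos = ['a.jpg', 'b.jpg', 'a.jpg', 'c.jpg', 'b.jpg', 'a.jpg'];

// Like GROUP BY photo with count(photo): value => number of occurrences.
$counts = array_count_values($photos);

// Like HAVING count(photo) > 1: keep only values that appear more than once.
$duplicates = array_keys(array_filter($counts, function ($n) {
    return $n > 1;
}));

print_r($duplicates); // a.jpg and b.jpg
```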

Duplicate another table

The easiest way is to extract all unique records into a temporary table, drop your current table, and rename the temporary table.


CREATE TABLE photograph_tmp AS SELECT * FROM photograph GROUP BY photo;
DROP TABLE photograph;
RENAME TABLE photograph_tmp TO photograph;

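The above will fix your duplication problem, but do remember to back up your records before doing anything silly. Note that SELECT * combined with GROUP BY photo relies on MySQL's lenient grouping and is rejected when the ONLY_FULL_GROUP_BY SQL mode is on.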

Similar Duplicate

A similar way: we copy the unique records into a backup table first, empty our original table, and insert the non-duplicated records back in.

CREATE TABLE photograph_backup AS SELECT * FROM photograph GROUP BY photo;
TRUNCATE TABLE photograph;
INSERT INTO photograph SELECT * FROM photograph_backup;

This way the original table is never dropped, so its structure stays intact, and the unique records remain in photograph_backup if something goes wrong.

Delete duplicate records

Another method is to delete the duplicated data in the table without creating another table.

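-- the extra derived table (dup) lets MySQL read from the table it is deleting from
DELETE FROM photograph
   WHERE photo IN
   (SELECT photo FROM
       (SELECT photo
           FROM photograph
           GROUP BY photo HAVING count(photo) > 1) AS dup);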

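Basically, we search for all our duplicated values and delete them from the real table. Be aware that this removes every row whose photo value is duplicated, including the first copy, not just the extras. This is quite risky, so don't be lazy and try out the safer ways above.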

Hope it helps 🙂

How to extract word from a string given a position in php

Today I wanted to extract a particular word (or piece of text, if "word" doesn't make sense) given a position in a string. I searched for a PHP solution but couldn't really find an answer. In fact, I could not find any such functionality for PHP through Google, although I managed to find it for Python. It is actually pretty simple and straightforward, and I believe most people will get it at a glance. But there is no need to reinvent the wheel, so here you go.

      function extractWord($text, $position){
         $words = explode(' ', $text);
         $end = -1; // index of the last character consumed so far
         foreach($words as $word){
            $end += strlen($word); // index of the last character of this word
            if($end >= $position){
               return $word;
            }
            $end++; // account for the space that follows the word
         }
         return '';
      }

Pretty easy, isn't it? The above basically splits the string into individual words, loops over them, and keeps track of the index of the last character of each word, counting the space that separates one word from the next. As soon as that index is larger than or equal to the position you provide, we have reached the word we want. Here is a little example of how to extract a word from a string given a position.

$text = 'This is an example of how to extract word from a string given a position in php';
$position = strpos($text, 'examp');
$word = extractWord($text, $position); // returns "example"

It's pretty simple and straightforward, but it saves some time so you can focus on something more important. Hope it helps 🙂
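
If your text may contain tabs, newlines, or several spaces in a row, a regular expression with PREG_OFFSET_CAPTURE is another way to do the same lookup. This is only a sketch under my own assumptions: wordAt is a made-up name, a "word" here means any run of non-whitespace characters, and positions are byte offsets.

```php
<?php
// Hypothetical alternative to extractWord(): returns the run of non-whitespace
// characters that covers $position, or '' when $position is not inside a word.
function wordAt(string $text, int $position): string
{
    preg_match_all('/\S+/', $text, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$word, $start]) {
        if ($position >= $start && $position < $start + strlen($word)) {
            return $word;
        }
    }
    return '';
}

$text = 'This is an example of how to extract word';
echo wordAt($text, strpos($text, 'examp')); // example
```

Unlike the explode version, a position that lands on whitespace returns an empty string here instead of a neighbouring word.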

Closing new tab on Google Chrome Extension

This is something new that I recently had a problem with. I wanted to open a new website from my Google Chrome extension, check it for certain criteria, and then close the new tab that I had just opened using chrome.tabs.create. However, the following code doesn't work when I inject it into the new tab

windows.close();

The condition for closing the new tab has been fulfilled, but I am not able to close the tab. Googling a few times didn't give me an answer until I tried something new and it worked. Look at the code below,

chrome.tabs.create({'url': tab.url + 'feed/'}, 
function(tab) {
	$("#url").val(tab.url.replace("feed/",""));
	chrome.tabs.executeScript(tab.id, {code: "this.close();"});
});

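Take note that I use chrome.tabs.create to open a new tab from my Google Chrome extension. After validating my URL, I inject a close command using "this" instead of "windows", targeting the id of the tab I just opened. Apparently, it works! (The browser's global object is named window, so windows.close() would throw a ReferenceError, which may be why the first attempt failed.)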

Hope it helps someone 🙂