Using Regular Expression to find position instead of strpos or stripos in PHP

I really can't believe that there isn't any article explaining how to use a regular expression to find the position of a given string. It is very common to use strpos or stripos to find the first occurrence of a given string in PHP. However, a problem arises when strpos or stripos gives you the wrong result. Assume you are looking for the symbol "RM" (Ringgit) in a given text, and the text is "RMX9182 is the code for this item selling at RM2000". Obviously you want your program to retrieve the symbol in "RM2000" instead of the one in "RMX9182". Using strpos or stripos as follows will give you the wrong result, because both simply report the first place where "RM" appears.

$text = "RMX9182 is the code for this item selling at RM2000";
$position = stripos($text, "RM"); // returns 0, the "RM" inside "RMX9182"

With a regular expression we can pinpoint exactly the pattern we want. In this case, "RM" followed by a digit,

$text = "RMX9182 is the code for this item selling at RM2000";
$pattern = "/RM\\d/i";
preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

It will print out the following array

Array
(
    [0] => Array
        (
            [0] => RM2
            [1] => 45
        )

)

Notice that the second element, 45, is the offset of the match in the text (counted in bytes, starting from 0). Assuming we have more than one match in our text,

$text = "RMX9182 RM90 is the code for this item selling at RM2000";
$pattern = "/RM\\d/i";
preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

It will still retrieve only the first match, as shown below

Array
(
    [0] => Array
        (
            [0] => RM9
            [1] => 8
        )

)

However, if you would like to find all the positions in a string, just use preg_match_all instead, as shown below,

$text = "RMX9182 RM90 is the code for this item selling at RM2000";
$pattern = "/RM\\d/i";
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

It will give you a result similar to the one shown below,

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => RM9
                    [1] => 8
                )

            [1] => Array
                (
                    [0] => RM2
                    [1] => 50
                )

        )

)

This may seem very simple and direct, but developers often find it easier to just stick to stripos or strpos until things get a bit off. If it's a pure string, the native functions might be just the right tool for you, but if a pattern is required, nothing beats a regular expression.
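To make the regex version as convenient as strpos, the calls above can be wrapped in small helpers. This is only a sketch: regexPos and regexPosAll are hypothetical names, not built-in functions, and the offsets they return are byte offsets, exactly as PREG_OFFSET_CAPTURE reports them.

```php
<?php
// Hypothetical helper (not a built-in): offset of the first regex match,
// or false when nothing matches, mirroring what strpos() returns.
function regexPos(string $pattern, string $text)
{
    if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE) === 1) {
        return $matches[0][1]; // byte offset of the whole match
    }
    return false;
}

// Hypothetical helper: byte offsets of every match, in order.
function regexPosAll(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
    // each entry of $matches[0] is [matched text, offset]; keep the offsets
    return array_column($matches[0] ?? [], 1);
}

$text = "RMX9182 RM90 is the code for this item selling at RM2000";
var_dump(regexPos('/RM\d/', $text));   // int(8)
print_r(regexPosAll('/RM\d/', $text)); // offsets 8 and 50
var_dump(regexPos('/USD\d/', $text));  // bool(false)
```

As with strpos, compare the result of regexPos with === false, because a match at offset 0 is a valid result.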

Simple Html Dom Fatal error: Call to a member function on a non-object

Simple Html Dom is a PHP DOM manipulator. This library is one of the easiest and most powerful DOM manipulators for PHP. In fact, you can even use it to create your own web crawler, as I have done. However, the Simple Html Dom library isn't perfect. Although you can do almost everything without a problem using Simple Html Dom, the most troublesome part of a complex program is the variety of URLs it has to handle. The combinations of URLs are endless, and this can cause Simple Html Dom to fail.

I faced this problem when a fatal error kept stopping my PHP crawler built on Simple Html Dom. The fatal error always occurred around "call to a member function on a non-object at...." and when I looked at the link being processed at the time of the error, it was perfectly fine. In PHP, we cannot really stop a fatal error without some black magic that does not always work. As many people would say, prevention is better than cure. Hence, checking whether the variable is an object before proceeding should fix this problem. If you think like many other people out there, this is most likely what you would have done, betting that it will definitely fix your problem.

$html = file_get_html($url);
if(is_object($html)){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

Well, the above might work in some cases but not all. When file_get_html fails, it returns false, regardless of whether a 404 or a 500 occurred on the other server. Hence, you may do this as well,

$html = file_get_html($url);
if($html){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

But if that still doesn't solve your problem (and it really can't take care of all cases), you might turn to the following,

$html = file_get_html($url);
if($html && is_object($html)){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

Well, if this still doesn't work and your brain is stuck, you might feel lucky this time that you came to this blog of mine.

Simple_Html_Dom Fatal Error Solution

The solution for your problem is actually quite simple but not obvious. I have tested it with a few hundred thousand different URLs, so I can confirm that it will solve your fatal error and get rid of the "call to a member function on a non-object", especially when it is raised at "find". Using the above example, we just have to add one more condition for the case where the call doesn't fail and the object was created by the Simple Html Dom class but doesn't contain any node! In this case, you should write something like the following,

$html = file_get_html($url);
if($html && is_object($html) && isset($html->nodes)){
   foreach($html->find('img') as $img){
      //bla bla bla..
   }
}

In the above example I use all the positive conditions, but in my real program I was using the negated conditions (checking for false). It should still work either way. I tested this for a few days, as the bot had to run for quite a few hours before it banged into a fatal error. That was a real waste of time, but the solution is rewarding. I hope this helps some fellow out there 🙂
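If the same check is needed in many places, it can be pulled into one small function. The sketch below is only an illustration: hasDomNodes is a hypothetical name, and plain stdClass objects stand in for whatever file_get_html returns so that the snippet runs without the library.

```php
<?php
// Hypothetical helper: true only when $html is an object exposing a
// non-null "nodes" property, i.e. the three checks chained above.
function hasDomNodes($html): bool
{
    return $html && is_object($html) && isset($html->nodes);
}

// Stand-ins (assumptions) for the values a fetch might hand back.
$failed = false;             // the request failed
$empty  = new stdClass();    // an object was created, but it has no nodes
$parsed = new stdClass();
$parsed->nodes = ['<img>'];  // an object that carries nodes

var_dump(hasDomNodes($failed)); // bool(false)
var_dump(hasDomNodes($empty));  // bool(false)
var_dump(hasDomNodes($parsed)); // bool(true)
```

In the crawler, the loop over $html->find('img') would then run only when hasDomNodes($html) is true.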

How to extract word from a string given a position in php

Today I wanted to extract a particular word (or piece of text, if "word" doesn't make sense) given a position in a string. I tried to search for this for PHP but couldn't really find an answer. In fact, I couldn't find anything through Google for such functionality in PHP, although I managed to find it for Python. It is actually pretty simple and straightforward, and I believe most people will get it at one look. But we don't reinvent the wheel, so here you go.

function extractWord($text, $position){
   $words = explode(' ', $text);
   // index of the separator that follows the words counted so far
   $characters = -1;
   foreach($words as $word){
      // +1 accounts for the space that explode() removed
      $characters += strlen($word) + 1;
      if($characters >= $position){
         return $word;
      }
   }
   return '';
}

Pretty easy, isn't it? The above basically splits the string into individual words, loops over these words and keeps a running total of the characters seen so far, including the space that follows each word. Once that total is larger than or equal to the position you provide, we have reached the word that we want. Here is a little example of how to extract a word from a string given a position.

$text = 'This is an example of how to extract word from a string given a position in php';
$position = strpos($text, 'examp');
$word = extractWord($text, $position); // returns "example"

It's pretty simple and straightforward, but it saves some time so you can focus on something more important. Hope it helps 🙂
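The same lookup can also be done with a regular expression, which copes with tabs and repeated spaces between words. This is a sketch of an alternative, not the function above: extractWordAt is a hypothetical name, and it treats any run of non-whitespace characters as a word.

```php
<?php
// Hypothetical variant: find the word covering a byte position using
// preg_match_all() with PREG_OFFSET_CAPTURE instead of explode().
function extractWordAt(string $text, int $position): string
{
    preg_match_all('/\S+/', $text, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$word, $start]) {
        // the word occupies bytes $start .. $start + strlen($word) - 1
        if ($position >= $start && $position < $start + strlen($word)) {
            return $word;
        }
    }
    return ''; // the position is on whitespace or outside the string
}

$text = "This  is an\texample of how to extract a word";
echo extractWordAt($text, strpos($text, 'examp')), "\n"; // example
echo extractWordAt($text, strpos($text, 'tract')), "\n"; // extract
```

Unlike the explode version, a position that lands on whitespace returns an empty string here instead of a neighbouring word.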

Determine Whether JavaScript Is Enabled/Disabled Via PHP

Recently I was working on a project where there was a need to determine whether JavaScript was enabled or disabled by the user. Depending on whether JavaScript is enabled, the system relies on JavaScript operations if it is and on PHP operations if it isn't. The system therefore has to detect whether JavaScript is enabled before it can decide which approach to use. However, there is no easy way for a server-side scripting language (PHP) to determine whether client-side scripting is enabled without first finishing loading the page! Therefore, in this article we will discuss whether it is possible to use PHP to determine whether JavaScript is enabled for your web application.

The Problem

The main problem is that a server-side scripting language can never determine by itself whether a client-side scripting language is available, because the server-side script always runs first. Furthermore, the client-side script always runs on the client and is never executed on the server. Therefore, while the server-side script is running and its output is being sent to the client for display, it has no idea what is going on in the client environment. Hence, strictly speaking, it is unable to determine whether JavaScript is enabled or disabled.

The Solution

Although it sounds impossible for the server side to determine whether client-side scripting such as JavaScript is available, certain tricks can be performed to achieve this. It won't be a convenient one, though. Recall that every web system should have a redirecting index.php page to prevent our code from showing in plain text if anything goes wrong? We can use that page to determine whether JavaScript is enabled, by writing a script that either appends a value and posts it over to the next page or, better, stores it in a cookie. If you post a value to the next page, the validation can only occur within the main page. However, if you use a cookie, you can always use PHP to check whether that cookie value is available. If it is not available (for example, the user deleted their browser cookies along the way) you can redirect that user to index.php to revalidate that JavaScript is enabled. Once this has been verified, you either show a message to the user after index.php has redirected or run on pure PHP.

In the index.php script, it will be something like this,

<script type='text/javascript'>
// store a cookie; a falsy "days" makes it a session cookie
function createCookie(name,value,days) {
	var expires = "";
	if (days) {
		var date = new Date();
		date.setTime(date.getTime()+(days*24*60*60*1000));
		expires = "; expires="+date.toUTCString();
	}
	document.cookie = name+"="+value+expires+"; path=/";
}
createCookie('verify_cookie', 'Y', 1);
</script>
<meta http-equiv="refresh" content="2;url=main.php?c=1">

We have a function that helps us create a cookie if JavaScript is available. Once this is done, we redirect the user to main.php, where our real page is located, with a GET value of c=1. This value is needed to avoid a redirect loop. We can't use the PHP header function because it would redirect before JavaScript has the opportunity to run, and the code should be placed inside the head element to keep the page valid. On all other pages we will have something like this at the top, before any output is sent.

<?php
	//filter the global variable first.
	if(!isset($_COOKIE['verify_cookie']) && isset($_GET['c']) && $_GET['c'] == 1){
		echo 'JavaScript is disabled';
	}else if(!isset($_COOKIE['verify_cookie'])){
		//perform check to determine whether the cookie expired OR it really was disabled.
		header('Location: index.php');
		exit; //stop here so the rest of the page is not processed
	}else{
		//perform another check on javascript similar to index.php if you are afraid that the cookie exists but javascript was disabled.
	}
?>

The above verifies on each page whether JavaScript is available, so that the page can run either on pure PHP or in combination with JavaScript, since these scripts can be included using PHP if needed. The solution above can be used as a reference and is not necessarily a solid solution.
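The three branches of that check can be expressed as a function that takes the cookie and query arrays as arguments, which makes the decision easy to test. This is a sketch under assumptions: javascriptState is a hypothetical name, and the cookie and parameter names are the ones used in this article. Keep in mind that a missing cookie can also mean the browser blocks cookies.

```php
<?php
// Hypothetical helper: classify a request the same way as the check above.
// $cookies and $query stand in for $_COOKIE and $_GET so it can be tested.
function javascriptState(array $cookies, array $query): string
{
    if (isset($cookies['verify_cookie'])) {
        return 'enabled';   // the script in index.php managed to set the cookie
    }
    if (isset($query['c']) && $query['c'] == 1) {
        return 'disabled';  // we came from index.php, yet no cookie was set
    }
    return 'unknown';       // not checked yet: send the visitor to index.php
}

echo javascriptState(['verify_cookie' => 'Y'], ['c' => '1']), "\n"; // enabled
echo javascriptState([], ['c' => '1']), "\n";                       // disabled
echo javascriptState([], []), "\n";                                 // unknown
```

On a real page it would be called as javascriptState($_COOKIE, $_GET), with the 'unknown' result leading to the redirect.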

Alternative Solution

The alternative solution is to use the noscript tag, which is very simple and makes your life a lot easier.

<script>
document.getElementsByTagName('body')[0].innerHTML = 'JavaScript is enabled.';
</script>
<noscript>
JavaScript is disabled.
</noscript>

Conclusion

Many will turn to the noscript tag, which can really ease and simplify the way we code. However, some systems need to determine whether scripting is enabled so that a different server-side script can run. The cookie approach might help those who need that, since the noscript tag only takes effect after the server has processed its information. On the other hand, you can combine this approach with the noscript tag approach to better validate your logic.

Best way to log details. Database or file log?

Unlike many articles on Hungred Dot Com where I share valuable web development information with my readers, this article is something that requires everyone to debate. Every system requires logging (unless it is a crappy system). Whether it is a transaction log, result log, database log, error log, etc., there is always a need for a quick, secure and reliable log to store this information for further investigation. Logged details usually end up in either a file or a database. There are three important things to consider when choosing a medium for our logs: performance, security and reliability. Let me elaborate on the importance of each point.

Log Performance

Performance, performance, performance! This is something we all want to know about: is a file-based log or a database log better? We are looking at the long run, where our log gets really huge! Delays and performance problems might arise, so which medium will be more resistant to such problems? Another thing to compare between the two media is the extra cost of a request to the database against a plain read and write of a file, and the delay that arises from sheer size. We wouldn't want to consider the alternative medium only after the problem appears, would we?

Log Security

Another thing that every hacker will be interested in is the log. Valuable information is stored in our logs, so it is necessary to consider how much security either medium can give us. A log may even carry sensitive details of our customers that were logged by our program. Hence, weighing the security risk of plain text against that of a database is important to prevent a security hole in our system environment. Each medium has its own ways to be secured further, but which is better?

Log Reliability

Why bother to have a log if it is unreliable? Reliability is necessary for a system that has to keep track of important transactions. An unreliable log might miss an entry for various reasons, such as manual query termination, a file lock, or the database being down during logging. It is necessary to have all our log entries in order to capture important incidents.

Other log criteria

Scalability and flexibility are other things some of you might want to mention. Server migration, ease of searching, etc. are also important points for us to consider, as a log in which details cannot be found is a useless log.

Database Logging

Performance-wise, a database might be slower when the amount of log data is small. But once the log becomes huge, database-based logging might really be much faster. The problems I can see are that log writes fight with other urgent queries that have a higher priority to be executed, and table locking. This is usually resolved by using MySQL's INSERT DELAYED operation (deprecated as of MySQL 5.6, and later versions ignore the DELAYED keyword). Another issue is latency, which delays the logging operation. In terms of searching, database logging surely has the upper hand. Security of the log depends solely on the security of the server and database. There might be a risk of SQL injection, but usually this should be taken care of by the developers.

In terms of reliability, using INSERT DELAYED increases the chance of our log entries getting lost, especially if the system is a very active one. In a very busy system there will be additional queries every few milliseconds, keeping the database so busy that the delayed log inserts pile up and have to wait until the database is quiet. Hence, in any accident, such as MySQL dying or being forcefully terminated, the queued log queries are gone. Furthermore, the additional overhead of delaying such inserts degrades MySQL performance a little.

Log file

A log file is the simplest way to achieve a logging system. It's basically just a few lines of code (depending on how paranoid a logger you are). While the greatest advantage is its simplicity, the worst problem of file-based logging is searching. Most developers who move to file-based logging end up not relying on the logs. But usually this can be overcome with some formatting and regular expressions. Performance-wise, it should be the direct opposite of database logging: the smaller the file the better, and the larger it gets the worse. Nonetheless, theoretically the cost of opening and closing the file should be the same regardless of size, and it should be easily solved by using a buffer. In terms of security, file-based logging usually uses a plain text file. Knowing the name of the log file is equivalent to exposing it to the public (especially in open source apps). But this is usually resolved using file permission settings.
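As an illustration of "some formatting and regular expressions", here is a minimal sketch. The line format "[timestamp] LEVEL message" and the names formatLogLine and filterByLevel are assumptions made up for this example, not an established convention.

```php
<?php
// Hypothetical format: "[YYYY-MM-DD HH:MM:SS] LEVEL message"
function formatLogLine(string $level, string $message, int $timestamp): string
{
    return sprintf('[%s] %s %s', date('Y-m-d H:i:s', $timestamp), strtoupper($level), $message);
}

// Return only the lines that carry the given level.
function filterByLevel(array $lines, string $level): array
{
    $pattern = '/^\[[^\]]+\] ' . preg_quote(strtoupper($level), '/') . ' /';
    // preg_grep() keeps the original keys, so reindex the result
    return array_values(preg_grep($pattern, $lines));
}

$lines = [
    formatLogLine('info', 'user 42 logged in', time()),
    formatLogLine('error', 'payment gateway timeout', time()),
    formatLogLine('info', 'user 42 logged out', time()),
];
print_r(filterByLevel($lines, 'error')); // only the "payment gateway timeout" line
```

With a fixed format like this, the same pattern works on lines read from the log file with file().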

Unlike database logging, file-based logging doesn't require a call to the database. Hence, everything is done by the server-side scripting language you are using, and the operation completes regardless of whether the connection is down (as long as the request from client to server is complete).

The other, more critical consideration when choosing file-based logging is file locking, where only one process is allowed to write to the log file at a time. Hence, in an active system where logging is done intensively this might really pose a big problem. The most expensive part of file-based logging should be searching. Hence, regular expressions can be really handy (or a pain in the ass).
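For reference, this is roughly what a locked append looks like in PHP. It is a minimal sketch: appendLog is a hypothetical name and the temporary file exists only for the demonstration. The LOCK_EX flag makes file_put_contents take an exclusive lock while writing, so concurrent writers wait for each other instead of mixing their lines.

```php
<?php
// Hypothetical helper: append one line while holding an exclusive lock.
// Returns false when the write fails.
function appendLog(string $file, string $line): bool
{
    return file_put_contents($file, $line . PHP_EOL, FILE_APPEND | LOCK_EX) !== false;
}

$file = tempnam(sys_get_temp_dir(), 'log'); // throwaway file for the demo
appendLog($file, 'first entry');
appendLog($file, 'second entry');
echo file_get_contents($file); // the two entries, one per line
unlink($file);
```

The waiting is exactly the cost described above: the busier the system, the longer each writer queues for the lock.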

Summary

Some use both file-based logging and database logging with a little help from an external batch program. But it really depends on the needs and requirements of your logging system. My job here is done; I have started the fire. Now it's time to heat it up. 😀