Best way to log details. Database or file log?

Unlike many articles on Hungred Dot Com, where I share web development information with my readers, this article is one that everyone is invited to debate. Every system needs logging (unless it is a crappy system). Whether it is a transaction log, result log, database log, error log and so on, there is always a need for a quick, secure and reliable log to store this information for later investigation. Logged details usually end up in either a file or a database. To choose a medium for our logs we need to look at three important things: performance, security and reliability. Let me elaborate on the importance of each.

Log Performance

Performance, performance, performance! This is what we all want to know about: is a file-based log or a database log better? We will be looking at the long run, when our log gets really huge. Delays and performance problems might arise, so which medium is more resistant to them? Another thing to weigh between the two media is the extra cost of a request compared with a plain read and write, and the delay that comes with sheer size. We don't want to start considering the alternative medium only after the problem appears, do we?

Log Security

Another thing that every hacker will be interested in is the log. Valuable information is stored in our logs, so it is necessary to consider how much security each medium can give us. A log may even carry sensitive details of our customers that were logged by our program. Hence, weighing the security risk of plain text against that of a database is important to prevent a security hole in our system environment. Each medium has its own ways of being secured further, but which is better?

Log Reliability

Why bother having a log if it is unreliable? Reliability is necessary for any system that has to keep track of important transactions. An unreliable log might miss an entry for various reasons, such as manual query termination, a file lock, or the database being down during logging. We need all of our log entries in order to capture important incidents.

Other log criteria

Scalability and flexibility are other things some of you might want to mention. Server migration and ease of searching are also important points to consider, since a log whose details cannot be found is a useless log.

Database Logging

Performance wise, a database might be slower when the amount of log data is small. But once the log grows huge, database-based logging might really be much faster. The problem I can see is that logging will compete with other, more urgent queries that have a higher priority, and with table locking. This is usually resolved with MySQL's INSERT DELAYED operation (note that INSERT DELAYED is deprecated in newer MySQL versions). Another issue is latency, which delays the logging operation. In terms of searching, database logging surely has the upper hand. The security of the log depends solely on the security of the server and database. There might be a risk of SQL injection, but this should usually be taken care of by the developers.

In terms of reliability, using INSERT DELAYED risks losing log entries, especially if the system is a very active one. In a very busy system there are additional queries every few milliseconds, which keeps the database so busy that the delayed inserts pile up and have to wait until the database is quiet. Hence, in any accident such as MySQL dying or being forcefully terminated, the queued log queries are gone. Furthermore, the additional overhead of delaying such inserts will degrade MySQL performance a little.

Log file

A log file is the simplest way to achieve a logging system. It is basically just a few lines of code (depending on how paranoid a logger you are). While its greatest advantage is simplicity, the worst problem of file-based logging is searching. Most developers who move to file-based logging end up not relying on their logs. But this can usually be overcome with some formatting and regular expressions. Performance wise, it should be the direct opposite of database logging: the smaller the file the better, and the larger it gets the worse it performs. Nonetheless, theoretically the cost of opening and closing the file should be the same regardless of its size, and any slowdown should be easily solved by buffering writes. In terms of security, file-based logging usually uses a plain text file. Anyone who knows the name of the log file can effectively read it as if it were public (a real concern for open source apps, where the file name is known). But this is usually resolved with file permission settings.

Unlike database logging, file-based logging doesn't require a call to the database. Hence, everything is done by the server-side scripting language you are using, and the operation completes regardless of whether the database connection is down (as long as the request from client to server is complete).

The other, more critical consideration when choosing file-based logging is file locking, where only one process is allowed to write to the log file at a time. Hence, in an active system where logging is done intensively, this might really pose a big problem. The most expensive part of file-based logging should be searching. Hence, regular expressions can be really handy (or a pain in the ass).
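To make the "few lines of code" and the "formatting and regular expression" points concrete, here is a minimal sketch in JavaScript (Node.js). The file location, the line format and the function names `writeLog` and `searchLog` are my own assumptions for illustration, not part of any particular logging library.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical log location; a real system would keep it outside the web root.
const LOG_FILE = path.join(os.tmpdir(), 'app-example.log');

// Append one line in a fixed "timestamp [LEVEL] message" format.
// mode 0o600 only applies when the file is created: owner read/write only.
function writeLog(level, message, file = LOG_FILE) {
  const line = `${new Date().toISOString()} [${level}] ${message}\n`;
  fs.appendFileSync(file, line, { mode: 0o600 });
}

// Return every line with the given level, using a regular expression.
// Assumes level is a plain word such as ERROR or INFO.
function searchLog(level, file = LOG_FILE) {
  const pattern = new RegExp(`^\\S+ \\[${level}\\] `);
  return fs.readFileSync(file, 'utf8')
    .split('\n')
    .filter((line) => pattern.test(line));
}
```

Because every line follows the same format, a search by level (or by date prefix) is a single regular expression. The trade-off is that the whole file is read on every search, which is exactly the cost discussed above once the log grows huge.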

Summary

Some use both file-based logging and database logging, with a little help from an external batch program. But it really depends on the needs and requirements of your logging system. My job here is done; I have started the fire. Now it's time to heat it up. 😀

25 PHP Form Validation Snippets

Recently I have involved myself in another application development. Regular Hungred Dot Com visitors will notice that the site currently offers advertisement space through this form. But really, we as developers are always looking for such snippets, or writing them from scratch every single time, regardless of how many times we know we have stored them somewhere on our laptop! Man, it's really frustrating to search Google, find all sorts of solutions and try to figure out whether the regular expression used is expensive or complete. So I came up with the idea of easing my life, and that of other developers, by putting up an article such as this for my/our reference. (This can be made into a class if you like.)

Validate Email

We can perform an email validation through this function.

	function isValidEmail($email){
		return eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$', $email);
	}

After fainting for a few seconds when I saw unreal4u's finding, I decided to offer a preg_match solution instead.

	function isValidEmail($email){
		return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i', $email);
	}

PHP 5.2 and above.

function fnValidateEmail($email)
{
  return filter_var($email, FILTER_VALIDATE_EMAIL);
}

Sanitize Email

We can further sanitize our email to ensure that everything is alright.

function fnSanitizeEmail($string) {
     return  preg_replace( '((?:\n|\r|\t|%0A|%0D|%08|%09)+)i' , '', $string );
}

PHP 5.2 and above.

function fnSanitizeEmail($email)
{
  return filter_var($email, FILTER_SANITIZE_EMAIL);
}

Validate Email Exist

It is not possible to be sure that an email address exists, but certain checks, such as looking up the MX record of its domain, can make existence more likely.

function check_email($email)
{
	$email = htmlspecialchars(stripslashes(strip_tags(trim($email)))); //strip unnecessary characters to prevent exploits
	if ($email == '') { return false; }
	//basic format check (preg_match replaces the deprecated eregi)
	if (!preg_match('/^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]{2,}/', $email)) { return false; }

	//the format looks fine; check that the domain has an MX record (explode replaces the deprecated split)
	list(, $domain) = explode('@', $email, 2);
	return checkdnsrr($domain, 'MX');
}

Validate Number Only

We can use PHP built-in function to validate whether a given value is a number.

function fnValidateNumber($value)
{
	#is_double($value);
	#is_float($value);
	#is_int($value);
	#is_integer($value);
	return is_numeric($value);
}

PHP 5.2 and above.

function fnValidateNumber($value)
{
	#return filter_var($value, FILTER_VALIDATE_FLOAT); // float
	return filter_var($value, FILTER_VALIDATE_INT); # int
}

Sanitize Number

We can force any value to be numeric only by sanitizing it.

function fnSanitizeNumber($str)
{
	#digits only
	return preg_replace('/[^0-9]/', '', $str);
}

PHP 5.2 and above.

function fnSanitizeNumber($value)
{
	#return filter_var($value, FILTER_SANITIZE_NUMBER_FLOAT); // float
	return filter_var($value, FILTER_SANITIZE_NUMBER_INT); # int
}

Validate String Only

Sometimes, for example to validate a name, we can use this function to allow only letters and spaces.

function fnValidateStringr($str)
{
	#letters and space only
	return preg_match('/^[A-Za-z\s ]+$/', $str);
}

Sanitize String

We can sanitize the user input instead of validating it.

function fnSanitizeStringr($str)
{
	#letters and space only
	return preg_replace('/[^A-Za-z\s ]/', '', $str);
}

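PHP 5.2 and above. PHP's built-in filter provides a much more powerful sanitizing capability. (FILTER_SANITIZE_STRIPPED is an alias of FILTER_SANITIZE_STRING; both are deprecated as of PHP 8.1.)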

function fnSanitizeStringr($str)
{
	return filter_var($str, FILTER_SANITIZE_STRIPPED); # only 'String' is allowed eg. '<br>HELLO</br>' => 'HELLO'
}

Validate Alphanumeric Characters

This validates alphanumeric characters.

function fnValidateAlphanumeric($string)
{
	return ctype_alnum ($string);
}

Sanitize Alphanumeric Characters

This strips everything that is not alphanumeric, spaces included. eg. "HELLO! Do we have 90 idiots running around here?" => "HELLODowehave90idiotsrunningaroundhere"

function fnSanitizeAlphanumeric($string)
{
	return preg_replace('/[^a-zA-Z0-9]/', '', $string);
}

Validate URL Exist

This function checks whether a given URL actually exists, rather than only validating its format.

	function url_exist($url)
	{
		$url = @parse_url($url);

		if (!$url)
		{
			return false;
		}

		$url = array_map('trim', $url);
		$url['port'] = (!isset($url['port'])) ? 80 : (int)$url['port'];
		$path = (isset($url['path'])) ? $url['path'] : '';

		if ($path == '')
		{
			$path = '/';
		}

		$path .= (isset($url['query'])) ? "?$url[query]" : '';

		if (isset($url['host']) AND $url['host'] != @gethostbyname($url['host']))
		{
			if (PHP_VERSION >= 5)
			{
				$headers = @get_headers("$url[scheme]://$url[host]:$url[port]$path");
			}
			else
			{
				$fp = fsockopen($url['host'], $url['port'], $errno, $errstr, 30);

				if (!$fp)
				{
					return false;
				}
				fputs($fp, "HEAD $path HTTP/1.1\r\nHost: $url[host]\r\n\r\n");
				$headers = fread($fp, 4096);
				fclose($fp);
			}
			$headers = (is_array($headers)) ? implode("\n", $headers) : $headers;
			return (bool)preg_match('#^HTTP/.*\s+(200|301|302)\s#i', $headers);
		}
		return false;
	}

Validate URL Format

This function will validate a given url to ensure the format is correct.

function fnValidateUrl($url){
return preg_match('/^(http(s?):\/\/|ftp:\/\/{1})((\w+\.){1,})\w{2,}$/i', $url);
}

PHP 5.2 and above.

function fnValidateUrl($url)
{
  return filter_var($url, FILTER_VALIDATE_URL);
}

Sanitize URL

PHP 5.2 and above.

function fnSanitizeUrl($url)
{
  return filter_var($url, FILTER_SANITIZE_URL);
}

Validate Image Exist

This function checks whether a given image link actually exists, rather than only validating its format.

	function image_exist($url) {
	if(@file_get_contents($url,0,NULL,0,1)){return 1;}else{ return 0;}
	}

Validate IP Address

This function will validate an IP address.

function fnValidateIP($IP){
	return preg_match('/^(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/', $IP);
}

PHP 5.2 and above. This can also validate specifically for IPv4 or IPv6 by passing the FILTER_FLAG_IPV4 or FILTER_FLAG_IPV6 flag.

function fnValidateIP($ip)
{
  return filter_var($ip, FILTER_VALIDATE_IP);
}
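Since forms are often checked in the browser before they reach PHP, here is a rough JavaScript sketch of the same IPv4 rule (four parts, each from 0 to 255, no leading zeros). The function name `isValidIPv4` is my own; this is only one possible way to write it and does not handle IPv6.

```javascript
// Validate a dotted IPv4 address by checking each of its four parts.
function isValidIPv4(ip) {
  const parts = String(ip).split('.');
  if (parts.length !== 4) {
    return false;
  }
  return parts.every((part) => {
    // digits only, no leading zeros (except "0" itself), at most 3 digits
    if (!/^(0|[1-9]\d{0,2})$/.test(part)) {
      return false;
    }
    return Number(part) <= 255;
  });
}

console.log(isValidIPv4('192.168.1.1')); // true
console.log(isValidIPv4('256.1.1.1'));   // false
```

Splitting on the dot and checking each part keeps the range check readable, compared with packing the whole 0 to 255 rule into one regular expression.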

Validate Proxy

This function lets us detect visitors who come through a proxy, even those behind an anonymous proxy.

function fnValidateProxy(){
	if (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])
	   || !empty($_SERVER['HTTP_X_FORWARDED'])
	   || !empty($_SERVER['HTTP_FORWARDED_FOR'])
	   || !empty($_SERVER['HTTP_VIA'])
	   || in_array($_SERVER['REMOTE_PORT'], array(8080,80,6588,8000,3128,553,554))
	   || @fsockopen($_SERVER['REMOTE_ADDR'], 80, $errno, $errstr, 30))
	{
		exit('Proxy detected');
	}
}

Validate Username

Before we check whether a given username matches the one in our database, we can perform a format check first to prevent an unnecessary SQL call.

function fnValidateUsername($username){
	#letters, digits, @, _ and . are allowed. Minimum 6 characters, maximum 50 characters (an email address may be longer)
	return preg_match('/^[a-zA-Z\d_@.]{6,50}$/i', $username);
}

Validate Strong Password

Another good thing is to validate whether the password given by the user is strong enough. You can do that with this function, which requires the password to have a minimum of 8 characters and at least 1 uppercase letter, 1 lowercase letter and 1 number.

function fnValidatePassword($password){
	#must contain at least 8 characters, 1 uppercase, 1 lowercase and 1 number
	return preg_match('/^(?=.{8,}$)(?=.*\d)(?=.*[A-Z])(?=.*[a-z]).*$/', $password);
}
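The rule described above (minimum 8 characters, at least 1 uppercase, 1 lowercase and 1 number) relies on lookaheads, which JavaScript regular expressions also support, so the same check can be sketched for the browser side. The names `STRONG_PASSWORD` and `isStrongPassword` are hypothetical; treat this as an illustration, not a complete password policy.

```javascript
// Each (?=...) lookahead asserts one requirement without consuming characters:
// length of 8 or more, at least one digit, one uppercase and one lowercase letter.
const STRONG_PASSWORD = /^(?=.{8,}$)(?=.*\d)(?=.*[A-Z])(?=.*[a-z]).*$/;

function isStrongPassword(password) {
  return STRONG_PASSWORD.test(password);
}

console.log(isStrongPassword('Passw0rdX')); // true
console.log(isStrongPassword('password'));  // false: no uppercase, no digit
console.log(isStrongPassword('Ab1'));       // false: too short
```

Client-side checks like this only save the user a round trip; the PHP validation on the server is still needed.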

Validate US Phone Number

This function validates a US phone number.

function fnValidateUSPhone($phoneNo){
	return preg_match('/^\(?\d{3}\)?[-\s.]?\d{3}[-\s.]\d{4}$/', $phoneNo);
}

Validate US Postal Code

This function validates a US postal code.

function fnValidateUSPostal($postalcode){
	#eg. 92345-3214
	return preg_match('/^([0-9]{5})(-[0-9]{4})?$/i',$postalcode);
}

Validate US Social Security Numbers

This function validates a US Social Security Number.

function fnValidateUSSocialSecurityCode($ssn){
	#eg. 531-63-5334
	return preg_match('/^[\d]{3}-[\d]{2}-[\d]{4}$/',$ssn);
}

Validate Credit Card

This function validates the format of a credit card number.

function fnValidateCreditCard($cc){
	#eg. 4111111111111111 (a commonly used test number)
	return preg_match('/^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6011[0-9]{12}|3(?:0[0-5]|[68][0-9])[0-9]{11}|3[47][0-9]{13})$/', $cc);
}

Validate Date

This validates a date in MM-DD-YYYY or MM-DD-YY format, for years 0000-9999.

function fnValidateDate($date){
	#05/12/2109
	#05-12-0009
	#05.12.9909
	#05.12.99
	return preg_match('#^((0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.][0-9]?[0-9]?[0-9]{2})*$#', $date);
}

This validates a date in YYYY-MM-DD or YY-MM-DD format, for years 0000-9999.

function fnValidateDate($date){
	#2009/12/11
	#2009-12-11
	#2009.12.11
	#09.12.11
	return preg_match('#^([0-9]?[0-9]?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01]))*$#', $date);
}

Validate Hexadecimal Colors

This is a good validation for people who allow their users to change colors in their system.

function fnValidateColor($color){
	#eg. #CCC
	#eg. #CCCCCC
	#eg. #FFFFFF
	return preg_match('/^#(?:(?:[a-f0-9]{3}){1,2})$/i', $color);
}

Make Query Safe

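This function helps sanitize our data before it goes into a query. It escapes quotes and backslashes, but prepared statements remain the more reliable defense against SQL injection.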

function _clean($str){
return is_array($str) ? array_map('_clean', $str) : str_replace('\\', '\\\\', htmlspecialchars((get_magic_quotes_gpc() ? stripslashes($str) : $str), ENT_QUOTES));
}

//usage: call it somewhere at the beginning of your script and keep the cleaned result
$_POST = _clean($_POST);
$_GET = _clean($_GET);
$_REQUEST = _clean($_REQUEST);// and so on..

Make Data Safe

This function helps protect us against XSS, JS and SQL injection by also removing tags.

function _clean($str){
return is_array($str) ? array_map('_clean', $str) : str_replace('\\', '\\\\', strip_tags(trim(htmlspecialchars((get_magic_quotes_gpc() ? stripslashes($str) : $str), ENT_QUOTES))));
}

//usage: call it somewhere at the beginning of your script and keep the cleaned result
$_POST = _clean($_POST);
$_GET = _clean($_GET);
$_REQUEST = _clean($_REQUEST);// and so on..

Summary

A paranoid way to perform form validation is to validate first and then sanitize your values as a precaution. If you think the above snippets suck, or you have any good or awesome snippets to share, please leave a comment and share them with us!

8 jQuery Optimization Tips and Tricks

It's been quite a while since I wrote anything about jQuery. I have been writing many different optimization articles, yet jQuery is something I have not covered for quite some time. Since many people use jQuery quite intensively in either design or application work, why not give some tips on jQuery too? I think it is beneficial for such optimization tips to be shared and used in future applications. Therefore, in this article you will see 8 jQuery optimization tips and tricks that might just benefit you during your jQuery coding.

1. Profiling

One of the most important things in every optimization is profiling. We can easily perform profiling using Firebug. Every developer, and even every designer, should be well aware that Firebug can assist in this task. Profiling allows us to see exactly how many times each function has run and how long it took. With the help of profiling you will be able to see which function is the bottleneck in your jQuery code.

2. Benchmarking

Another thing that every optimization effort comes across eventually is benchmarking. In programming, it is very common to have different ways of achieving the same result. To know which way gives the best result, benchmarking is required to show that one way of achieving the result is really better than another. Benchmarking across different JavaScript frameworks is likewise the only way to see which framework runs faster than the others at certain tasks. Hence, benchmarking is really required in order to compare the efficiency of two or more methods or scripts.
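As a rough illustration of benchmarking two ways of reaching the same result, here is a small timing sketch in plain JavaScript. The helper `benchmark` and the two string builders are hypothetical names of my own, and `Date.now()` only has millisecond resolution, so treat the numbers as a coarse comparison rather than a precise measurement.

```javascript
// Run fn the given number of times and return the elapsed time in milliseconds.
function benchmark(fn, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  return Date.now() - start;
}

// Way 1: build a list with string concatenation.
function buildWithConcat() {
  let html = '';
  for (let i = 0; i < 100; i++) {
    html += '<li>' + i + '</li>';
  }
  return html;
}

// Way 2: collect the parts in an array and join once.
function buildWithJoin() {
  const parts = [];
  for (let i = 0; i < 100; i++) {
    parts.push('<li>' + i + '</li>');
  }
  return parts.join('');
}

console.log('concat: ' + benchmark(buildWithConcat, 10000) + ' ms');
console.log('join:   ' + benchmark(buildWithJoin, 10000) + ' ms');
```

Both functions return the same string, so the only thing being compared is the time taken. Results differ between browsers and engines, which is the very reason to benchmark instead of guessing.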

3. Specific Selector

In jQuery versions below v1.3, being specific with a selector is necessary to optimize selector speed. A more specific selector finds its result faster. Hence, take an id selector such as this:

jQuery('#someid').attr('value');

An id is always the fastest compared to a class or a tag, with classes being the worst (we do not have a native document.getElementsByClassName in every browser, do we?). Nonetheless, we can improve the efficiency of the selector by being more specific. Hence, instead of just calling out classes like this:

jQuery('.classes').html();

we can improve it by telling jQuery exactly where the element is located:

jQuery('div ul li.classes').html();

By being specific in jQuery, we can improve the selector speed. However, this is what happened before jQuery v1.3. After the introduction of the 'Sizzle' selector engine, this is no longer the case. A selector test shows that selectors are faster than before. This means that being specific with classes makes no significant difference after v1.3 (because being specific requires more calls) unless you are using a rather odd selector such as this

jQuery('#id .class div').html();

It will be rather faster with just

jQuery('.class').html();

Try to avoid combinations of id, class and attribute selectors that might just hit your selector performance badly. The point is to be specific with a selector, but not so specific that selector performance degrades.

4. Avoid Unnecessary Selector

Common subexpression elimination is a common way to optimize any program. Running unnecessary selectors is expensive. Consider something like this:

jQuery('.class').each(function(){
	jQuery(this).html();
	jQuery(this).find('div').each(function(){
		//etc.
	});
});

This requires many selector calls. It is best to call the selector once and reuse the result, since otherwise we are repeating ourselves with redundant selectors.

jQuery('.class').each(function(){
	var obj = jQuery(this);
	obj.html();
	obj.find('div').each(function(){
		//etc.
	});
});

This is something that we often see in many plugins. Many plugins contain unnecessary selectors, which degrade the performance of the plugin little by little.

5. Avoid Unnecessary Styling

This is another common mistake that many jQuery developers make. jQuery gives us the ability to manipulate styling through its styling methods. Although this is very convenient, it also adds to the size of the script and makes maintenance difficult. Furthermore, some selectors are made solely for styling purposes, which makes it even more undesirable. To make things worse, the styles set with jQuery are all inline styles, which doesn't help caching. Hence, instead of styling everything with jQuery, it is much more advisable to give classes to your tags and style them in an external stylesheet. Instead of writing jQuery styling such as this,

jQuery('div').each(function(){
	var obj = jQuery(this);
	obj.css({'background-color' : 'yellow', 'font-weight' : 'bolder', 'color': '#000', 'font-size': '15px'})
});

where the styling is applied through jQuery, we can just add a class or, even better, do it in the HTML file itself!

jQuery('div').each(function(){
	var obj = jQuery(this);
	obj.addClass("mystyling");
});

This way we can easily eliminate a lot of extra code in our script. Unless it is absolutely necessary to manipulate styling through jQuery, it is better to leave styling to the stylesheet.

6. Optimize Selector

We can also optimize our selectors in ways other than being a bit more specific. The key is simple. Native JavaScript provides two methods to get elements by id and by tag (getElementById and getElementsByTagName). Hence, it is always faster to use a tag or an id in your selector. No matter what algorithm is used for class and attribute selection, the native built-in JavaScript methods will always have the upper hand. Nonetheless, class and attribute selectors have also improved dramatically with the introduction of the 'Sizzle' selector engine. However, it is always advisable to reduce the number of attribute selectors, as they are the slowest in many cases. On the other hand, making your selector as simple as possible is good for pure tag and id selectors but may not hold for attribute and class selectors. Hence,

jQuery('div')
jQuery('#id')
jQuery('.classes')
jQuery('p:last')

are considered good selectors, while the ones below might not give you very good selector performance.

jQuery('body div div')
jQuery('div#id')
jQuery('#id div.classes')
jQuery('.classes p:last')

It is important to treat jQuery as a helper of JavaScript instead of a new language.

7. Avoid DOM Manipulation

Since we are using jQuery as a framework, making life difficult for ourselves is the silliest thing anyone can do. For example, using jQuery to write a table in DOM style:

var table = $('<table></table>');
for (var i = 0; i < 6; i++) {
  var tr = $('<tr></tr>');
  for (var j = 0; j < 7; j++) {
    tr.append('<td></td>');
  }
  table.append(tr);
}

Leaving aside the additional work done by append() on each call, we are also performing an expensive DOM creation at every step, when we could really help ourselves by building one string and leaving the rest to jQuery. For a customized table you might do this.

var table = '<table>';
for (var i = 0; i < 6; i++) {
  var tr = '<tr>';
  for (var j = 0; j < 7; j++) {
    tr +='<td></td>';
  }
  tr +='</tr>';
  table += tr;
}
table += '</table>';
var table = $(table);

For a static table, it's really one line.

var table = $('<table><tr><td><td>...</tr><tr>...</tr>...</table>');

But if you really want a table, do it in HTML. You don't need to create everything using jQuery. Treat jQuery as an expensive action and use it only when absolutely needed. (The same goes for styling.)

8. Balance between JavaScript and jQuery

jQuery is excellent for speeding up ANY front-end developer's job. However, there are times when jQuery may take more time to perform a certain action than plain JavaScript. For example,

	css: function( key, value ) {
		// ignore negative width and height values
		if ( (key == 'width' || key == 'height') && parseFloat(value) < 0 )
			value = undefined;
		return this.attr( key, value, "curCSS" );
	}

The above is the css method in jQuery, which takes a key and a value and which we are familiar with using in this way

jQuery('#id').css('display', 'block');

We can see that jQuery adds a certain overhead for doing something simple that plain JavaScript can also achieve.

document.getElementById('id').style.display = 'block';

Other methods such as show(), hide() and hasClass() also carry some overhead, so you should balance complexity against efficiency. Hence, we must understand the cost of some jQuery actions and strike a balance between JavaScript and jQuery. jQuery is built for the general public, and a general-purpose framework may need additional operations to meet everyone's needs. Hence, if you are truly going all out on optimization, you should consider the trade-off of some actions between JavaScript and jQuery.
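To give a feel for what a plain JavaScript counterpart of hasClass() might look like, here is a minimal sketch. The function below is my own simplified version, not jQuery's implementation; it assumes class names are separated by single spaces and works on any object with a className property.

```javascript
// Check whether an element's className contains the given class name.
// Padding with spaces avoids matching "act" inside "active".
function hasClass(element, name) {
  return (' ' + element.className + ' ').indexOf(' ' + name + ' ') !== -1;
}

// A stand-in object is used here; in a browser this would be
// something like document.getElementById('id').
const fakeElement = { className: 'menu active' };

console.log(hasClass(fakeElement, 'active')); // true
console.log(hasClass(fakeElement, 'act'));    // false
```

It skips the selector lookup and the wrapping of the element that the jQuery call goes through, which is where the saving comes from. In exchange you have to maintain this helper yourself.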

Summary

jQuery is a great framework that eases all our development lives. Operations that get complicated can easily become simple with jQuery. However, as I have stressed, keeping unnecessary jQuery code out will certainly help produce a more efficient and optimized jQuery script.

15 Ways to Optimize Your SQL Queries

The previous article, 10 Ways To Destroy A SQL Database, covered the mistakes many companies make with their database that eventually lead to its destruction. In this article, you will get to know 15 ways to optimize your SQL queries. Some are common ways to optimize a query while others are less obvious.

Indexes

Indexing your columns is a common way to speed up your searches. Nonetheless, one must fully understand how indexing works in each database in order to make full use of indexes. On the other hand, indexing blindly without understanding how it works might just do the opposite.

Symbol Operator

Symbol operators such as >, <, =, != and so on are very helpful in our queries. We can optimize some of our queries with symbol operators, provided the column is indexed. For example,

SELECT * FROM TABLE WHERE COLUMN > 16

Now, the above query is not optimized because the DBMS has to locate the value 16 and THEN scan forward past it to the values greater than 16. On the other hand, for an integer column an optimized equivalent would be

SELECT * FROM TABLE WHERE COLUMN >= 17

This way the DBMS might jump straight to the value 17 instead. It is much like the difference between looking for the value 17 (we scan through and target ONLY 17) and looking for a value greater than 16 (we have to determine whether each value is greater than 16, which is an additional operation).

Wildcard

In SQL, the wildcard is provided by the '%' symbol. Using a wildcard will definitely slow down your query, especially on tables that are really huge. We can optimize a wildcard query by using a postfix wildcard instead of a prefix or full wildcard.

#Full wildcard
SELECT * FROM TABLE WHERE COLUMN LIKE '%hello%';
#Postfix wildcard
SELECT * FROM TABLE WHERE COLUMN LIKE  'hello%';
#Prefix wildcard
SELECT * FROM TABLE WHERE COLUMN LIKE  '%hello';

The column must be indexed for this optimization to apply.

P.S: Doing a full wildcard search on a table with a few million records is equivalent to killing the database.

NOT Operator

Try to avoid the NOT operator in SQL. It is much faster to search for an exact match (a positive operator) such as LIKE, IN, EXISTS or the = symbol than to use a negative operator such as NOT LIKE, NOT IN, NOT EXISTS or the != symbol. A negative operator forces the search to go through every single row to confirm that none of them match. A positive operator, on the other hand, can stop as soon as the result has been found. Imagine you have 1 million records in a table. That's bad.

COUNT VS EXISTS

Some of us might use the COUNT operator to determine whether a particular piece of data exists

SELECT COUNT(COLUMN) FROM TABLE WHERE COLUMN = 'value'

Similarly, this is a very bad query, since COUNT has to go through every matching record in the table to work out the number, when all we want to know is whether it is greater than zero. The better alternative is the EXISTS operator, which stops as soon as it finds the first record. Hence, it exists.

Wildcard VS Substr

Most developers practice indexing. Hence, if a particular COLUMN has been indexed, it is best to use a wildcard instead of substr.

#BAD
SELECT * FROM TABLE WHERE substr(COLUMN, 1, 1) = 'value';

The above will run substr on every single row in order to find the single character 'value'. On the other hand,

#BETTER
SELECT * FROM TABLE WHERE COLUMN LIKE 'value%';

The wildcard query will run faster when we are searching for all rows that have 'value' as the first character. For example,

#SEARCH FOR ALL ROWS WITH THE FIRST CHARACTER AS 'E'
SELECT * FROM TABLE WHERE COLUMN LIKE 'E%';

Index Unique Column

Some databases, such as MySQL, search better on columns that are unique and indexed. Hence, it is best to remember to index columns that are unique. And if a column is truly unique, declare it as such. However, if that particular column is never used for searching, there is no reason to index it even though it is unique.

Max and Min Operators

The Max and Min operators look for the maximum or minimum value in a column. They run fastest on columns that already have an index. If a particular column is frequently used in searches, having an index on it speeds up those searches and the Max and Min operators at the same time. Deliberately adding an index just to speed up Max and Min, however, is not advisable. It's like sacrificing the whole forest for a single tree.

Data Types

Use the most efficient (smallest) data types possible. It is unnecessary, and sometimes dangerous, to use a huge data type when a smaller one is more than sufficient. For example, use the smaller integer types where possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space. Likewise, VARCHAR is better than LONGTEXT for storing an email address or other small details.

Primary Index

The primary column that is used for indexing should be made as short as possible. This makes identification of each row easy and efficient by the DBMS.

String indexing

It is unnecessary to index the whole string when a prefix or postfix of the string can be indexed instead. Especially if the prefix or postfix of the string provides a unique identifier for the string, it is advisable to perform such indexing. Shorter indexes are faster, not only because they require less disk space, but because they also give you more hits in the index cache, and thus fewer disk seeks.

Limit The Result

Another common way of optimizing your query is to minimize the number of rows returned. If a table has a few billion records, a search without any limit can break the database with a simple SQL query such as this.

SELECT * FROM TABLE

Hence, don't be lazy: limit the results returned, which is both efficient and can help minimize the damage of an SQL injection attack.

SELECT * FROM TABLE WHERE 1 LIMIT 10

Use Default Value

If you are using MySQL, take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.

In Subquery

Some of us will use a subquery within the IN operator such as this.

SELECT * FROM TABLE WHERE COLUMN IN (SELECT COLUMN FROM TABLE)

Doing this is very expensive because the query evaluates the outer query first before proceeding with the inner query. We can use this instead.

SELECT * FROM TABLE, (SELECT COLUMN FROM TABLE) as dummytable WHERE dummytable.COLUMN = TABLE.COLUMN;

Using a dummy table is better than using an IN operator with a subquery. Alternatively, an EXISTS operator is also better.

Utilize Union instead of OR

Indexes lose their speed advantage when used in OR situations, in MySQL at least. Hence, the following will not benefit even though indexes are applied

SELECT * FROM TABLE WHERE COLUMN_A = 'value' OR COLUMN_B = 'value'

On the other hand, using UNION such as this will utilize the indexes.

SELECT * FROM TABLE WHERE COLUMN_A = 'value'
UNION
SELECT * FROM TABLE WHERE COLUMN_B = 'value'

Hence, it runs faster.
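A quick sketch to confirm that the two forms return the same rows, using Python's sqlite3 module as a stand-in for MySQL. The `people` table and its two indexes are hypothetical. It only checks that the results match; it says nothing about speed, which depends on your database.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE INDEX idx_first ON people (first_name);
    CREATE INDEX idx_last ON people (last_name);
    INSERT INTO people (first_name, last_name) VALUES
        ('lee', 'tan'), ('kim', 'lee'), ('lee', 'lee'), ('amy', 'wong');
    """
)

or_rows = conn.execute(
    """
    SELECT id, first_name, last_name FROM people
    WHERE first_name = ? OR last_name = ?
    ORDER BY id
    """,
    ("lee", "lee"),
).fetchall()

# Each half of the UNION filters on a single indexed column.
# UNION (without ALL) removes the row that matches both conditions twice.
union_rows = conn.execute(
    """
    SELECT id, first_name, last_name FROM people WHERE first_name = ?
    UNION
    SELECT id, first_name, last_name FROM people WHERE last_name = ?
    ORDER BY id
    """,
    ("lee", "lee"),
).fetchall()

print(or_rows == union_rows)
```

Note that UNION has to remove duplicates, which costs something. UNION ALL skips that step but would return the row matching both conditions twice.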

Summary

Definitely, these optimization tips don't guarantee that your queries won't become your system bottleneck. It will require much more benchmarking and profiling to further optimize your SQL queries. However, the simple optimizations above can be used by anyone and might just help save some colleague's rice bowl while you learn to write good queries. (It's either you or your team leader/manager.)

10 Ways To Destroy A SQL Database

The database is the asset of most online or internet-based companies. Everyone looks at how to improve and secure their databases to protect or improve their company. While everyone is searching for remedies or enhancement pills for their company, there are often simple mistakes made by some companies (especially the small to medium ones) that might just destroy their businesses. Rather than looking at how we can protect our database, this article will look at ways to destroy it instead! (Through mistakes, of course.)

Don't Monitor Error Log

The error log is the first line of defense that any database has. It may show the first time a problem occurs, or warnings that your database might be facing a problem. These troubles can be easily avoided or missed depending on what you do. Be my guest: ignoring the error log will definitely help to destroy your database.

Many company databases are designed to enforce availability. Hence, there will surely be primary and slave databases in such a company. These databases also have error files. However, if you would like your secondary or slave databases to go out of sync with the primary database, be sure to ignore the error file and give it some time. Depending on the size of your company, the data lost through the loss of synchronization might just cost you dearly when some hero DBA shuts down the primary database without first issuing "stop slave" on the slave database, or when some errant SQL comes down the line. Although this might take some time to destroy your database, it's worth thinking about.
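If you would rather not destroy your database this way, even a tiny script that pulls the warnings and errors out of the log is better than nothing. The sketch below is in Python. The log lines and their format are invented for illustration and will not match your server's real error log exactly, so treat the pattern as an assumption to adjust.

```python
import re

# Assumed format: severity appears in square brackets, e.g. [ERROR] or [Warning].
ALERT_PATTERN = re.compile(r"\[(ERROR|Warning)\]", re.IGNORECASE)


def find_alerts(lines):
    """Return only the lines that look like warnings or errors."""
    return [line.rstrip("\n") for line in lines if ALERT_PATTERN.search(line)]


# Made-up sample lines standing in for a real error log file.
sample_log = [
    "2024-01-01 10:00:00 [Note] Server started\n",
    "2024-01-01 10:05:00 [Warning] Replica is lagging behind the primary\n",
    "2024-01-01 10:06:00 [ERROR] Replication stopped: duplicate key\n",
]

alerts = find_alerts(sample_log)
for alert in alerts:
    print(alert)
```

In practice you would pass an open file object instead of `sample_log` and send the result somewhere a human will actually read it.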

Don't Fine Tune Queries

You have a big server, lots of memory and fast disks, so you don't have to worry. Continue with this attitude and you are on your way to success (destroying it!). Developers writing bad code that causes full table scans, trying their best to trash your query cache and overloading your InnoDB buffer pool with useless blocks? Plus hitting disk instead of main memory as much as possible? Well, there won't be ANY problem since every piece of hardware is the latest, fastest and most powerful one! Let's just wait and see until your database falls to its knees! Especially when the data is getting larger! That's the best time to see this happen!

Don't Document Procedures and Configurations

Ah! No documentation won't cause a single problem! No problem at all, my friend! Why enforce such a tedious job when 'we' can maintain the system? Just you wait until the 'we' becomes 'I' and 'I' becomes 'who'. Employees come and go nowadays. People always look for a better future in life and no matter what happens to you, it's really none of their concern. Man! What's so difficult? Let's just hire an expert to take the job. Oh boy! That's a great solution! Let's see when he presses the wrong button 🙂

Don't Backup

Another great way to destroy your database is to avoid making backups! Hardware failure is a common thing in a data center. Hard disks fail, power supplies go down, plugs get pulled, basically anything you can imagine. Not backing up regularly can do the trick!

It's not only hardware that might assist you in your task. Developers or DBAs who accidentally delete data can also be of help. Deleting columns, tables, rows, data or even a whole database! These sorts of things do happen, and they happen quite frequently. Other than deleting data, mistakes made in a program might just transfer the wrong data to the wrong place. All of these are things that might happen to any IT firm. Well, for some of you there is always the "it never happened before" situation. Just follow your instinct (backups suck!) on this and you will do just fine.
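For the few who do want a backup, here is a toy sketch of what one buys you. It uses Python's sqlite3 module and its `Connection.backup` method as a stand-in. A MySQL setup would rely on its own backup tooling instead, and the `payments` table is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the production database; data is made up.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany(
    "INSERT INTO payments (amount) VALUES (?)", [(10.0,), (25.5,)]
)
source.commit()

# Take a backup copy before anything goes wrong.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate the accident: someone deletes every row.
source.execute("DELETE FROM payments")
source.commit()

remaining = source.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
recovered = backup_copy.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(remaining, recovered)
```

A real backup should of course live on different hardware from the database it protects, otherwise the same failure takes out both.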

Don't Use Memory Wisely

Servers nowadays have huge amounts of memory installed. Technology advancement has made everything more powerful than before. Furthermore, it is so affordable that many companies can afford big and fast memory! With such powerful memory as backup, we can be sure that whatever developers throw in, the server and database will surely be able to take it! We can safely assume MySQL knows our database memory requirements! We just have to run the installation wizard and voila! Everything is automatic nowadays! There won't be such a thing as misallocation of memory. The system is perfect.

Good to know. You have everything that is required (even the attitude) to blow up your database.

Don't Worry About Indexes

Indexes are the most effective way to destroy your database. However, you must know the trick to do this. There are two ways to succeed. The first one requires you to do absolutely nothing. No indexing is required, purely full table scans. This requires certain criteria to be met, but it should do the trick. If this doesn't suit your taste, you can try a faster way by creating useless or unwanted indexes and ensuring that your table has tons of records. This will surely speed up the process of destroying your database.
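If you are curious which camp a query falls into, the query plan will tell you. The sketch below uses Python's sqlite3 module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in (MySQL has its own `EXPLAIN`). The `users` table and the `query_plan` helper are hypothetical names for this example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


lookup = "SELECT id FROM users WHERE email = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, ("user500@example.com",))

# With an index on the filtered column it can search instead.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("user500@example.com",))

print(plan_before)
print(plan_after)
```

The same check works in reverse for the second trick: an index that never shows up in any plan is only slowing down your writes.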

Don't Normalize Your Database Design

If you are just starting to build a system, you can consider skipping normalization in your database design. Skipping normalization can help contribute to bad database design which is part of the plan to destroy a database. Furthermore, without normalization, there is a good chance your system can be inaccurate, slow, and inefficient, and it might not even produce the data you expect.

Don't Make Policies For Database Patch

It's true that the majority of companies don't update their databases immediately after a new security vulnerability patch has been released. It would be irresponsible for a company to deploy a patch in production without first running it through quality assurance. Furthermore, some companies don't even bother to have policies to update their databases. If you think that databases are a little more isolated than the desktop so there's less of a security concern, and that your databases are more secure because they're behind firewalls and have good perimeter security, you are on the right track to destroying a database.

Don't Bother Caching

My database can take tons of crap, anything thrown at it. It's the fastest computer (why bother to have technology advancement when we already have the fastest? Dot) on the planet. Cache or no cache won't destroy my database. It's the fastest (ya ya, I get it). The dramatic performance gain from using a cache table might not interest you. Scalability, flexibility, availability and performance are just some of the benefits that caching can give. Multitier architecture, what bullshit. Your server will NEVER go down and you will NEVER require a cache table to be available. Millions of hits on the database server might just do the trick in helping you achieve what you want in this article. It will definitely work better when your table has a few million records and a few lousy queries. (It might not even require millions of hits to kill it.)
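For anyone who does bother, the basic idea fits in a few lines. This is a rough Python sketch of a read-through cache kept in the application's memory, which is a simpler cousin of the cache table mentioned above and not the same thing. The `products` table, the `get_product_name` helper and the `stats` counter are all made up, and sqlite3 stands in for MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (id, name) VALUES (1, 'widget')")

cache = {}
stats = {"db_hits": 0}  # counts how often we really query the database


def get_product_name(product_id):
    """Serve repeated lookups from memory; hit the database only on a miss."""
    if product_id in cache:
        return cache[product_id]
    stats["db_hits"] += 1
    row = conn.execute(
        "SELECT name FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    name = row[0] if row else None
    cache[product_id] = name
    return name


# A thousand requests for the same product reach the database only once.
names = [get_product_name(1) for _ in range(1000)]
print(stats["db_hits"])
```

The sketch never expires or invalidates entries, so a real cache would also need a rule for what happens when the underlying row changes.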

Don't Use Fast or Reliable Disk

Using something like a single disk or a mirror will definitely make I/O the main bottleneck of your system. With the help of a single disk, you can expect the OS and your database to fight for resources, serving one user at a time while the others wait their turn. To make things worse, you can try to utilize RAID-5 instead of RAID-10. Maybe you already are! Well, to compare the two, RAID-5 only performs reasonably well on reads, while RAID-10 is almost two times better than RAID-5 on writes. RAID-5 can only handle one drive failure, and any drive dying will cause approximately 64% degradation in read performance until the faulty drive is discovered. Furthermore, during recovery, read performance for a RAID-5 array is degraded by as much as 80%, compared to RAID-10, which only degrades performance on the faulty disk itself. There are more 'advantages' to RAID-5 but I will just stop here. RAID-5 seems to be better at destroying than RAID-10, don't you think so?

Conclusion

The points discussed here might only happen to sites with large data or traffic (well, your site will eventually grow to have big data, hopefully). However, the conclusion to all these jokes is more valuable: learn to save your ass. No one else will.