A Minimal Introduction to Regular Expressions

Minimal basics of regular expressions (non-technical)

This article is for readers with a computer science background or a basic knowledge of natural language processing concepts. I won't go into detailed code in this post; all the basic rules, with examples, will be covered in the next article.
During our graduation we had a paper called Theory of Automata, which dealt with NFAs, DFAs, etc. There we used some magic/wildcard symbols to write expressions such as “^(a|b*)+ababa?cd$”. I know it looks like nothing more than junk to a common man, but expressions like this save a lot of time and effort in real-world programming. What you just saw is a small example of a regular expression.
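To see that the automata-style pattern above really describes a set of strings, here is a minimal sketch that tests a few candidates against it. Python's re module is an assumption made for illustration (the article plans to use Perl later), and accepts is a hypothetical helper name.

```python
import re

# The pattern quoted above:
#   ^ and $     anchor the match to the whole string
#   (a|b*)+     one or more repetitions of "a" or "any number of b's"
#   ababa?cd    literal "abab", an optional "a", then "cd"
PATTERN = re.compile(r"^(a|b*)+ababa?cd$")

def accepts(s: str) -> bool:
    """Return True if the whole string s is described by PATTERN."""
    return PATTERN.search(s) is not None

for candidate in ["aababcd", "bbababacd", "abcd", "aababxcd"]:
    print(candidate, accepts(candidate))
# aababcd True
# bbababacd True
# abcd False
# aababxcd False
```

"abcd" fails because the fixed part "abab...cd" alone needs at least six characters, and "aababxcd" fails because no part of the pattern can match the "x".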

What Is a Regular Expression?
It is a way of describing a set of strings that follow some order or structure, so that we can work with them. It is also called a pattern. It lets us match, substitute and transliterate text. In the programming world it is also called a regex or regexp.
Ex: to replace every corrupt politician with an honest one, we just need the pattern s/corrupt/honest/ig

That’s it! Now, in the world of regular expressions, our nation has only honest politicians. This was just a small example; regex can do a lot more. (Think about it -> "sochiye jara", i.e. "just think about it" :P. It’s a very famous phrase in Indian television advertising right now 😉) Of course, to use this small but damn powerful tool we need to follow a set of rules, which I will try to cover in the next part of this article.
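The substitution above can also be tried in Python. This is a minimal sketch under the assumption that Python's re module stands in for Perl's s/// operator: re.IGNORECASE plays the role of the i modifier, and re.sub already replaces every match, like g. The function name make_honest is hypothetical.

```python
import re

def make_honest(text: str) -> str:
    # Same idea as Perl's s/corrupt/honest/ig:
    #   flags=re.IGNORECASE ~ the i modifier (case-insensitive match)
    #   re.sub replaces every match by default ~ the g modifier
    return re.sub(r"corrupt", "honest", text, flags=re.IGNORECASE)

print(make_honest("Corrupt politicians and more corrupt politicians"))
# honest politicians and more honest politicians
```

As with the Perl version, the replacement text is fixed, so a capitalised "Corrupt" also becomes lowercase "honest".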

Why Regular Expressions?
Whenever we need to do something with text, regex magic can be useful. Most text editors support regular expressions for manipulating text, such as finding something in a file or replacing one pattern with another. For example:

  • Replace every carbonate with polycarbonate
  • Find every occurrence of the word coders in a file. It should match only coders, not aliencoders, decoders or encoders
  • Remove nasty characters such as ^M (carriage returns) from Unix files that came from Windows
  • Find all Cisco PIX firewalls among different firewall products
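The first three tasks in the list above can each be sketched in a line or two. This is a minimal Python sketch, not the author's code; the function names are hypothetical, and the negative lookbehind in to_polycarbonate is one possible way to avoid turning an existing polycarbonate into polypolycarbonate.

```python
import re
from typing import List

def to_polycarbonate(text: str) -> str:
    # (?<!poly) is a negative lookbehind: "carbonate" is replaced only when it
    # is NOT already preceded by "poly", so running this twice is harmless.
    return re.sub(r"(?<!poly)carbonate", "polycarbonate", text)

def find_coders(text: str) -> List[str]:
    # \b marks a word boundary, so "aliencoders", "decoders" and "encoders"
    # are skipped and only the standalone word "coders" matches.
    return re.findall(r"\bcoders\b", text)

def strip_carriage_returns(text: str) -> str:
    # Windows ends lines with "\r\n"; the "\r" is what shows up as ^M on Unix.
    # With re.MULTILINE, $ matches just before every newline and at the end.
    return re.sub(r"\r$", "", text, flags=re.MULTILINE)

print(to_polycarbonate("carbonate and polycarbonate"))
print(find_coders("coders, aliencoders, decoders, encoders, coders"))
print(repr(strip_carriage_returns("line1\r\nline2\r\n")))
```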
Where Can We Use Regular Expressions?
  • It can be used anywhere we need to manipulate a text file or a set of strings using a pattern. Regular expressions are supported in almost all programming languages, and Perl is a well-known language whose power lies largely in its regex support.
  • It is used in text editors, and even in MS Word
  • Suppose that by mistake you wrote gray instead of grey, and there are more than 50 such occurrences in a file. Just write a pattern and apply it, e.g. s/gray/grey/ig, and every gray is changed to grey in a fraction of a second.
  • A single regular expression can often replace many lines of nested if-else checks.
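  • In Perl, the replacement part of a substitution can even be code rather than a plain string; you need the e modifier for that, e.g. s/(\d+)/$1*2/ge doubles every number. (The x modifier, by contrast, lets you spread a pattern over several lines with whitespace and comments.)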
  • In Perl, regular expressions can be used inside the grep and map functions, which can save many lines of code (an alternative to loops with if-else conditions). There are many more such cases.
  • Regular expressions are used by programmers of all levels, across languages such as C#/.NET, JavaScript, Perl, Python and PHP
  • For syntax highlighting and for data validation in your code
  • Even search engines like Google use regular expressions.
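Some of the uses listed above (the gray/grey fix, code as the replacement, and grep-style filtering) can be tried with Python's re module, used here as a stand-in for Perl; the helper names and sample strings are hypothetical. Passing a function to re.sub is Python's closest counterpart to Perl's e modifier, and a list comprehension plays the role of Perl's grep.

```python
import re

def fix_spelling(text: str) -> str:
    # Like s/gray/grey/ig: every "gray", in any letter case, becomes "grey".
    return re.sub(r"gray", "grey", text, flags=re.IGNORECASE)

def double_numbers(text: str) -> str:
    # Rough analogue of Perl's s/(\d+)/$1*2/ge: the replacement is computed by
    # code (a function receiving each match) instead of being a fixed string.
    return re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

def lines_with_errors(lines):
    # Rough analogue of Perl's grep { /\berror\b/i } @lines: keep only the
    # matching lines, with no explicit loop or if-else block.
    return [line for line in lines if re.search(r"\berror\b", line, re.IGNORECASE)]

print(fix_spelling("Gray cat, gray dog"))        # grey cat, grey dog
print(double_numbers("3 apples and 10 pears"))   # 6 apples and 20 pears
print(lines_with_errors(["ok", "Error: disk full", "no errors here"]))
```

"no errors here" is not kept because \b requires a word boundary after "error", and in "errors" the next character is a letter.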
When Should We Avoid Regular Expressions?
  • When the work can be done with a simple if-else or loop. Ex: for IP address validation, use a simple regex to check for four octets of 1 to 3 digits, then use if-else to check that each octet lies in the range 0-255.
  • Be careful with patterns that backtrack heavily, such as nested greedy quantifiers like (a+)+. On some inputs they take exponential time and can effectively hang your program.
  • If you do need a regex, don't try to put everything into one complex regex
  • Regex is a very powerful tool if used wisely. When you want the shortest possible match, use a non-greedy (minimal) pattern such as .*? instead of the greedy .*
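The IP address advice above can be sketched as follows, with Python standing in for Perl: a deliberately simple regex checks only the shape (four groups of one to three digits), and ordinary code checks the 0-255 range. IPV4_SHAPE and is_valid_ipv4 are hypothetical names.

```python
import re

# Step 1: a simple regex checks the shape only: four groups of 1-3 digits
# separated by dots. [0-9] is used instead of \d, which in Python also
# matches non-ASCII digits.
IPV4_SHAPE = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")

def is_valid_ipv4(address: str) -> bool:
    match = IPV4_SHAPE.fullmatch(address)
    if match is None:
        return False
    # Step 2: plain if-else logic checks that every octet is in 0-255.
    for octet in match.groups():
        if int(octet) > 255:
            return False
    return True

print(is_valid_ipv4("192.168.1.1"))   # True
print(is_valid_ipv4("256.1.1.1"))     # False
print(is_valid_ipv4("1.2.3"))         # False
```

fullmatch is used rather than ^...$ because in Python $ also matches just before a trailing newline, so "1.2.3.4\n" would otherwise slip through.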
What Are the Basic Elements of Regular Expressions?
There is a lot to learn before you can write regexes from scratch, but once you are familiar with a few technical terms and their meanings, writing and understanding complex regexes becomes much easier.
  • Character class: whatever you put inside [] is treated as a set of alternatives, and ranges are allowed, e.g. [a-z0-9A-Z] matches any single character from a to z, A to Z or 0 to 9
  • Metacharacters (special characters): characters that have a special meaning in a regular expression beyond their literal one. For example, + means one or more occurrences, not addition. Other examples are \w, \d, \S, ., \t, ^, $ and |
Note: ^, $, \A, \Z, \< and \> are called anchors because they match a position (such as the start or end of a string, line or word) rather than a character. \< and \> are word anchors in tools such as grep and vim; Perl uses \b for word boundaries instead.
  • Quantifiers (range operators): they define how many times the preceding item may repeat, e.g. one or more, at least 2, or between 3 and 8. Ex:
    • \d{3,} means at least 3 digits; (ha){3} matches ha three times, i.e. hahaha (a shortcut to laugh: (he){3} -> hehehe 😛)
    • \w+ means one or more word characters (letters, digits and ‘_’)
    • Other quantifiers are ?, * and {}; each has a greedy form and a non-greedy form (e.g. *?)
  • Modifiers (pattern modifiers): placed after the pattern to change how the regular expression works
    • Ex: s/old/new/ig replaces every occurrence of old with new: i makes the match case-insensitive and g makes it global (every match, not just the first). Note that it also changes old inside longer words, such as bold
    • Other modifiers include m, i, c, x, g, d, s and o; which ones are available depends on the regular expression operator. For example, the e modifier can be used only with the substitution operator
  • Grouping (capturing): a part of the pattern in parentheses, such as (\d+), saves what it matched; in Perl the text matched by the first group is available in $1
  • Regular expression operators: these define whether the pattern is used for matching, substitution or transliteration, i.e. m//, s/// and tr///
  • Other terms such as greedy patterns, non-greedy patterns and backtracking, e.g. how .* and .*? differ
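Most of the elements listed above can be seen at work in a few lines. This is a minimal Python sketch (the article plans to explain them in Perl, so Python's re is only a stand-in); the sample strings and variable names are made up for illustration, and match.group(1) is Python's counterpart to Perl's $1.

```python
import re

# Character class: [a-z0-9A-Z] matches ONE letter or digit.
class_hits = re.findall(r"[a-z0-9A-Z]", "a_B-7")
print(class_hits)            # ['a', 'B', '7']

# Quantifiers: \d{3,} is "at least three digits"; (ha){3} is "ha" three times.
long_numbers = re.findall(r"\d{3,}", "7 42 123 98765")
print(long_numbers)          # ['123', '98765']
laughs = re.fullmatch(r"(ha){3}", "hahaha") is not None
print(laughs)                # True

# \w+ is one or more word characters: letters, digits and '_'.
words = re.findall(r"\w+", "abc_9, x!")
print(words)                 # ['abc_9', 'x']

# Anchors match positions, not characters: ^ is the start of the string.
starts_with_order = re.search(r"^Order", "Order 1") is not None      # True
middle_order = re.search(r"^Order", "My Order 1") is not None        # False

# Grouping: what Perl stores in $1 is match.group(1) in Python.
m = re.search(r"(\d+)", "Order 12345 shipped")
first_group = m.group(1) if m else None
print(first_group)           # 12345

# Greedy vs non-greedy: .* takes as much as it can, .*? as little as it can.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)
print(greedy)                # ['<b>bold</b> and <i>italic</i>']
print(lazy)                  # ['<b>', '</b>', '<i>', '</i>']
```

The greedy pattern first runs .* to the end of the string and then backtracks only as far as the last ">", which is why it swallows everything in between.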
That’s all for now. I hope you now understand the basic uses of regular expressions and the technical terms used with them. I will explain all of these terms in detail, using the Perl programming language, in my next post.
