Learn Regular Expressions in 20 Minutes

You run into a problem and decide to use a regular expression. Now you have two problems. Or at least that is how the saying goes. Regular expressions are a powerful tool that skillful coders leave as a last resort, but when they do use them, they strike terror in the hearts of their enemies (and colleagues).

Regular expressions (or regex-es, as is the correct term for what we use in programming languages today) are specialized languages for defining pattern matching rules for text. They have their own grammar and syntax rules, which every beginner gets wrong. But you don’t have to! Here is what you need to know:

1. Matching a single character

Every programming language has a way of defining and using regular expressions. They have some differences, but the basics which are covered in this article should work anywhere. The examples here are written in JavaScript, so that you can try them out in your browser.

The most basic regexes are those that match a single character. Here are the rules:

  • The dot (.) matches any single character except a line break. If you want to match the dot as a character, escape it like this: \.
  • A question mark (?) means that the preceding character is optional. If you want to match an actual question mark, escape it: \?

var text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit lest. Donec convallis dignissim ligula, et rutrum est elat vistibulum eu.';

// Will match both "elit" and "elat". The dot can match any character.
// Tip: try removing the "g" modifier of the regex to see what happens.

var regex = /el.t/g;

console.log( text.match(regex) );


// Will match both "est" and "lest". The question mark makes "l" optional.

var regex2 = /l?est/g;

console.log( text.match(regex2) );
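
The escaping rule from the list above is easy to miss, so here is a minimal sketch of it. The sample string is made up for this illustration and is not part of the original examples.

```javascript
// Hypothetical sample string, made up for this sketch.
var sample = 'Is v1.0 out? v1x0 is not.';

// Unescaped dot: any character may sit between "1" and "0",
// so both "v1.0" and "v1x0" are matched.
console.log( sample.match(/v1.0/g) );

// Escaped dot: only a literal dot is accepted, so only "v1.0" is matched.
console.log( sample.match(/v1\.0/g) );

// Escaped question mark: matches the actual "?" after "out".
console.log( sample.match(/out\?/g) );
```

If a pattern matches more than you expect, an unescaped dot is a good first suspect.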

2. Matching a character of a set

Building up from the previous example, we can write regexes that match only certain characters by using sets:

  • A set is one or more characters enclosed in brackets [abc]. It matches only one of those characters – in this example only a, b or c. You can negate a set with ^. [^abc] will match any character that is not a, b or c. You can also specify a range [0-9], [a-z], which will match everything in the range.
  • There are built-in sets that make writing regexes easier (they are called shorthands). Instead of [0-9] you can write \d and for [^0-9] you can write \D. There are also sets for word characters (the letters a through z in either case, digits and the underscore) – \w and \W, and whitespace (spaces, tabs and new lines) – \s and \S.

This example will make things clearer:

// Match only "cat" and "can", but not "car".

var text = 'cat car can';

console.log( text.match(/ca[tn]/g) );

// Match "ca" followed by anything BUT t or n, so only "car" (notice the ^ symbol)

console.log( text.match(/ca[^tn]/g) );


// Here is another example, which matches only the number

text = 'I would like 8 cups of coffee, please.';


console.log('How many cups: ' + text.match( /[0-9]/g ));

// A better, shorter way to write it, using the \d character class

console.log('How many cups: ' + text.match( /\d/g ));


// Matching everything BUT the number (will return an array of chars)

console.log( text.match(/\D/g) );
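
The \w and \s shorthands and the ranges described above don't appear in the examples so far, so here is a small sketch of them. The sample string is hypothetical and chosen only to contain a tab, an underscore and a couple of capital letters.

```javascript
// Hypothetical sample string, made up for this sketch.
var order = 'Order A7_x: 3 kg\tapples';

// \w matches letters, digits and the underscore, one character at a time.
// The colon and the whitespace are skipped.
console.log( 'Word characters: ' + order.match(/\w/g).length );

// \s matches the three spaces and the tab.
console.log( 'Whitespace characters: ' + order.match(/\s/g).length );

// A range inside a set: only uppercase letters.
console.log( order.match(/[A-Z]/g) );
```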

3. Matching words

Most of the time, you will want to match entire words, instead of single characters. This is done by using quantifiers, which repeat a character or a character set. These are:

  • +, which repeats the preceding character or set one or more times
  • *, which repeats the preceding character or set zero or more times
  • {x} for an exact number of repetitions, {x,y} for varying number of repetitions (where x and y are numbers)

Also, there is the special \b pattern, which matches the boundary at the start or end of a word. It matches a position in the text, not an actual character.

var text = 'Hello people of 1974. I come from the future. In 2014 we have laser guns, hover boards and live on the moon!';

// Find the years. \d+ will match one or more digits

var yearRegex = /\d+/g;

console.log('Years: ', text.match( yearRegex ) );


// Find all sentences. Our sentences begin with a capital letter and end in either a dot or an exclamation mark.
// The question mark makes the regex non-greedy. Try removing it.

var sentenceRegex = /[A-Z].+?(\.|!)/g;

console.log('Sentences: ', text.match(sentenceRegex) );


// Find all words that begin with h. We want to match both lower and upper case, so we include the i modifier.
// Notice the \b for word boundary. Try removing it.

var hWords = /\bh\w+/ig;

console.log('H Words: ', text.match(hWords) );


// Find all words between four and six characters

var findWords = /\b\w{4,6}\b/g;

console.log( 'Words between 4 and 6 chars: ', text.match(findWords) );


// Find words that are 5 chars or longer

console.log( 'Words 5 chars or longer: ', text.match(/\b\w{5,}\b/g) );


// Find words exactly 6 chars long

console.log( 'Words exactly 6 chars long: ', text.match(/\b\w{6}\b/g) );
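
The difference between greedy and non-greedy matching, and between * and +, is easier to see side by side. This is a rough sketch with made-up sample strings, not part of the original examples.

```javascript
// Hypothetical sample string, made up for this sketch.
var html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ grabs as much as it can, so the single match
// runs from the first "<" all the way to the last ">".
console.log( html.match(/<.+>/g) );

// Non-greedy: .+? stops at the first ">" it finds,
// so every tag becomes a separate match.
console.log( html.match(/<.+?>/g) );

// * allows zero repetitions of "a", so "ct" matches too.
console.log( 'ct cat caat'.match(/ca*t/g) );

// + requires at least one "a", so "ct" is left out.
console.log( 'ct cat caat'.match(/ca+t/g) );
```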

4. Matching/validating entire lines

In JavaScript, this is the type of pattern you would use to validate user input from text fields. It is just an ordinary regex, but anchored to the start and end of the text using the ^ (start of line) and $ (end of line) expressions. This makes sure that the pattern you write spans the entire length of the text, and doesn’t match only a part of it.

Also, in this case we use the test() method of the regex object, which returns true if the regex matches the string and false otherwise.

// We have an array with strings. Let's extract only the URLs!

var strings = [
	'http://tutorialzine.com/posts/',
	'this is not a URL',
	'https://google.com/',
	'123461',
	'http://tutorialzine.com/?search=jquery',
	'http://not a valid url',
	'abc http://invalid.url/'
];

// Here is a simple regex that will do the job. Note the ^ and $ symbols for beginning and end of line.
// Try removing them and see which URLs are detected.

// The hyphen goes last in the set, so it is read as a literal character and not as a range.
var regex = /^https?:\/\/[\w\/?.&=-]+$/;

var urls = [];

for( var i = 0; i < strings.length; i++ ){

	if( regex.test(strings[i]) ){
		
		// This is a valid URL
		urls.push(strings[i]);

	}

}

console.log('Valid URLs: ', urls);
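
To see what the anchors actually change, here is a minimal sketch that runs one of the strings above through an anchored and an unanchored pattern. The character set is simplified, and the isPin helper is a hypothetical example of a text field check, not something from the original code.

```javascript
// The same idea as the URL check above, with and without ^ and $.
// The character set is simplified for this sketch.
var anchored   = /^https?:\/\/[\w\/.-]+$/;
var unanchored = /https?:\/\/[\w\/.-]+/;

var input = 'abc http://invalid.url/';

// false: the string as a whole is not a URL, it starts with "abc ".
console.log( anchored.test(input) );

// true: a URL-like part exists somewhere inside the string.
console.log( unanchored.test(input) );

// Hypothetical helper: accept only a value made of exactly four digits.
function isPin(value){
	return /^\d{4}$/.test(value);
}

console.log( isPin('1234') );   // exactly four digits
console.log( isPin('12345') );  // one digit too many
console.log( isPin('12a4') );   // contains a letter
```

Without the anchors, isPin would accept any string that merely contains four digits in a row.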

5. Search and replace

Another common task that often calls for the use of regular expressions is searching and replacing text. There are two basic ideas here:

  • A group is a set of patterns enclosed in parentheses (). Each group collects the text that was matched by the patterns inside it. The text matched by each group can be addressed later with indexes prefixed with dollar signs (starting from $1 for the first group).
  • Each group is available in the pattern itself as a back reference – a backslash followed by the group index, starting from \1 (see the example below). This is only rarely used, so you can blissfully forget about this feature.

// Using backreferences
// Find the words which consist of a single letter, repeated any number of times

var text = 'Abc ddefg, hijk lllll mnopqr ssss. Tuv wxyyy z.';

var sameLetterRegex = /\b(\w)\1*\b/g;

console.log( text.match(sameLetterRegex) );


// Let's turn "John Smith" into "Smith, John"
// Each group (\w+) matches a single word. Each group is assigned 
// an index, starting from $1

var name = 'John Smith';
var nameRegex = /(\w+) (\w+)/;

console.log( name.replace(nameRegex, '$2, $1') );


// For more advanced manipulations, we need to provide a JS callback. 
// For example, let's make the surname uppercase

var upcasename = name.replace(nameRegex, function(string, group1, group2){
	return group2.toUpperCase() + ', ' + group1;
});

console.log( upcasename );
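
The same two techniques work with more than two groups and with the g modifier, which replaces every match instead of only the first one. Here is a rough sketch that reorders dates. The sample string and variable names are made up for this illustration.

```javascript
// Hypothetical sample string, made up for this sketch.
var dates = 'Released 2014-03-25, updated 2015-11-02.';

// Three groups: year, month and day.
var dateRegex = /(\d{4})-(\d{2})-(\d{2})/g;

// Reorder the groups into day/month/year.
// Thanks to the g modifier both dates are replaced.
console.log( dates.replace(dateRegex, '$3/$2/$1') );

// With a callback, the groups arrive as arguments after the full match.
var shortYears = dates.replace(dateRegex, function(match, year, month, day){
	return day + '.' + month + '.' + year.slice(2);
});

console.log( shortYears );
```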

Resources and further reading

And this concludes our quick overview! If you learn what was presented in this article, you will be well prepared to solve 80% of the problems that involve regexes. For the other 20%, try these tools and resources:

by Martin Angelov

Martin is a web developer with an eye for design from Bulgaria. He founded Tutorialzine in 2009 and it still is his favorite side project.


10 Comments

  1. Danny Markov says:

    Another cool tool for testing regular expresssions is regex101. It supports php, js and python regexes.

    1. Martin Angelov says:

      Thank you Danny! This is a useful tool.

    2. NetHawk says:

      Great tip!

  2. Ed says:

    I have used regular expressions before, but I have always had a hard time getting everything just right. This is a great Article. Thanks!

  3. Ma says:

    This is really helpful. Keep it up :)

  4. Federica says:

    Very useful tank you! Bookmarked for future reference.

  5. Bhavesh Gohel says:

    JavaScript Regular Expression Visualizer : http://jex.im/regulex/

  6. Dmitriy says:

    Martin, interesting information!
    Thanks for fast explanation

  7. syed shaik shavali says:

    Thanks, Good information about regEx.

  8. Majid says:

    Martin , Thanks a lot for this simplified examples .... it's my first time with Regex
