Ruby Regular Expressions


A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.

A regular expression literal is a pattern between slashes or between arbitrary delimiters preceded by %r as follows:

Syntax:

      /pattern/
      /pattern/im    # options can follow the closing delimiter
      %r!/usr/local! # general delimited regular expression
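
As a brief illustration of the option letters shown above, the following sketch contrasts patterns with and without the i (ignore case) and m (dot matches newline) options. The helper name pattern_matches? is hypothetical and exists only for this illustration; Regexp#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not part of Ruby): true if +pattern+ matches +text+.
def pattern_matches?(pattern, text)
  pattern.match?(text)
end

puts pattern_matches?(/ruby/,  "RUBY")   # false: matching is case-sensitive by default
puts pattern_matches?(/ruby/i, "RUBY")   # true:  i ignores case
puts pattern_matches?(/a.b/,   "a\nb")   # false: by default . does not match a newline
puts pattern_matches?(/a.b/m,  "a\nb")   # true:  m lets . match a newline
```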
      

Example:

      line1 = "Cats like meat"
      line2 = "Dogs also like meat"
      
      # \A anchors the match at the start of the string
      if line1 =~ /\ACats/
        puts "Line1 starts with Cats"
      end
      if line2 =~ /\ADogs/
        puts "Line2 starts with Dogs"
      end
      
      # Without an anchor, the pattern may match anywhere in the string
      if line1 =~ /like/
        puts 'Line1 contains "like"'
      end
      if line2 =~ /like/
        puts 'Line2 contains "like"'
      end
      

This will produce the following result:

      Line1 starts with Cats
      Line2 starts with Dogs
      Line1 contains "like"
      Line2 contains "like"
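
The =~ operator does more than answer yes or no: it returns the index at which the match starts, or nil when there is no match. A parenthesized group such as (.*) additionally captures the text it matched. The sketch below illustrates both; the helper name first_capture is hypothetical and used only for this example.

```ruby
# Hypothetical helper (not part of Ruby): returns the text captured by the
# first group of +pattern+, or nil if +pattern+ does not match +line+.
def first_capture(pattern, line)
  match = pattern.match(line)
  match && match[1]
end

line1 = "Cats like meat"

p(line1 =~ /like/)                    # 5, the index where the match starts
p(line1 =~ /fish/)                    # nil, no match
p first_capture(/Cats(.*)/, line1)    # " like meat"
p first_capture(/like(.*)/, line1)    # " meat"
```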
      

 

Like string literals delimited with %Q, Ruby allows you to begin your regular expressions with %r followed by a delimiter of your choice. This is useful when the pattern you are describing contains a lot of forward slash characters that you don't want to escape:

      # The following matches a single slash character; no escape is required
      %r|/|               
      
      # Flag characters are allowed with this syntax, too
      %r[</(.*)>]i
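
To make the benefit concrete, the sketch below matches the same path with a slash-delimited literal and with a %r literal, then uses the bracket-delimited pattern from above to pull a name out of a closing tag. The helper name closing_tag_name and the sample strings are assumptions made for this illustration; Regexp#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not part of Ruby): extracts the name from a closing
# tag such as </DIV>, or returns nil if the text contains no closing tag.
def closing_tag_name(html)
  match = %r[</(.*)>]i.match(html)
  match && match[1]
end

path = "/usr/local/bin/ruby"

puts(/\/usr\/local\//.match?(path))        # true, every slash escaped
puts(%r{/usr/local/}.match?(path))         # true, no escapes needed
puts closing_tag_name("<DIV>text</DIV>")   # DIV
```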
      

 

Search and Replace:

Among the most important String methods that use regular expressions are sub and gsub, and their in-place variants sub! and gsub!.

All of these methods perform a search-and-replace operation using a Regexp pattern. sub and sub! replace the first occurrence of the pattern, while gsub and gsub! replace all occurrences.

sub and gsub return a new string and leave the original unmodified, whereas sub! and gsub! modify the string on which they are called and return nil if no substitution was made.

The following example demonstrates this:

      phone = "2004-959-559 #This is Phone Number"
      
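      # Delete Ruby-style comments. Use sub rather than sub! here:
      # sub! returns nil when nothing matches, which would lose the string.
      phone = phone.sub(/#.*$/, "")
      puts "Phone Num : #{phone}"
      
      # Remove anything other than digits
      phone = phone.gsub(/\D/, "")
      puts "Phone Num : #{phone}"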
      

This will produce the following result:

      Phone Num : 2004-959-559
      Phone Num : 2004959559
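
The difference in return values is easy to overlook, so here is a small sketch of it. The non-bang methods always return a string, while the bang methods return nil when the pattern does not match. The helper name strip_comment is hypothetical; the pattern also removes whitespace before the comment, which is an addition made for this illustration.

```ruby
# Hypothetical helper (not part of Ruby): returns a copy of +line+ without a
# trailing "#" comment. The argument itself is left unmodified.
def strip_comment(line)
  line.sub(/\s*#.*$/, "")
end

original = "2004-959-559 #This is Phone Number"
cleaned  = strip_comment(original)
puts cleaned     # 2004-959-559
puts original    # 2004-959-559 #This is Phone Number

digits = "2004959559".dup    # dup ensures a mutable string
p digits.gsub!(/\D/, "")     # nil: nothing matched, so nothing was changed
p digits                     # "2004959559"
```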
      

The following is another example:

      text = "rails are rails, really good Ruby on Rails"
      
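      # Replace every occurrence of the substring "rails" with "Rails"
      text.gsub!("rails", "Rails")
      
      # Replace only the whole word "rails"; \b matches at word boundaries.
      # Nothing is left to replace at this point, so this call returns nil.
      text.gsub!(/\brails\b/, "Rails")
      
      puts text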
      

This will produce the following result:

      Rails are Rails, really good Ruby on Rails
      

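Note: The metacharacter \b is an anchor, like the caret and the dollar sign. It matches at a word boundary: the position between a word character (a letter, digit, or underscore) and a non-word character, or between a word character and the start or end of the string. The match is zero-length.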
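
A short sketch may help show what the word boundary buys you. A plain string replacement also changes "rails" inside longer words, while the \b version touches only whole words. The helper name capitalize_rails and the sample sentence are assumptions made for this illustration.

```ruby
# Hypothetical helper (not part of Ruby): capitalizes "rails" only where it
# stands alone as a whole word.
def capitalize_rails(text)
  text.gsub(/\brails\b/, "Rails")
end

sample = "rails and guardrails on rails"

puts sample.gsub("rails", "Rails")   # Rails and guardRails on Rails
puts capitalize_rails(sample)        # Rails and guardrails on Rails

# \b consumes no characters; here it matches at index 0, before the "r"
p("rails" =~ /\b/)                   # 0
```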


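The Regexp class offers further tools. Regexp.new builds a pattern from a string and option constants, and Regexp.escape escapes the characters that have a special meaning in a pattern:

      # Regexp literals with the m and i options
      /[a-z0-9]+\s/mi
      %r{/path/to/gif\.gif}mi
      
      # Single quotes keep \s intact; in a double-quoted string, "\s" is a space
      puts Regexp.new('[a-z0-9]+\s', Regexp::IGNORECASE | Regexp::MULTILINE)
      
      string = '[my_file.gif]'
      puts Regexp.escape(string)
      

This will produce the following result:

      (?mi-x:[a-z0-9]+\s)
      \[my_file\.gif\]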
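
Regexp.escape and Regexp.new are often combined when a pattern must match arbitrary text literally, for example text typed by a user. The following is a minimal sketch of that idea; the helper name literal_pattern is hypothetical, and Regexp#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not part of Ruby): builds a case-insensitive Regexp
# that matches +text+ literally, with all metacharacters escaped.
def literal_pattern(text)
  Regexp.new(Regexp.escape(text), Regexp::IGNORECASE)
end

pattern = literal_pattern("[my_file.gif]")

puts pattern.source                        # \[my_file\.gif\]
puts pattern.match?("see [MY_FILE.GIF]")   # true: brackets and dot are literal
puts pattern.match?("my_fileXgif")         # false: no brackets, no literal dot
```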