Friday, January 8, 2010

Regular Expression 2: The rest of the Regex class

This is the second installment on regular expressions in Scala. The first installment covered the basics and inspected a few of the methods in the Regex class. This one looks at the rest of the methods in the Regex class.

Regex.findPrefixMatchOf
  /*
  returns the match if the regex matches at the start (prefix) of the string
  */
  scala> "(h)(e)|(l)".r findPrefixMatchOf "hello xyz"
  res2: Option[scala.util.matching.Regex.Match] = Some(he)
  scala> "lo".r findPrefixMatchOf "hello xyz"
  res3: Option[scala.util.matching.Regex.Match] = None
  /*
  The method is essentially the same as anchoring the regex with the
  start-of-input boundary character ^ and calling findFirstMatchIn
  */
  scala> "^ab".r findFirstMatchIn "ababab"
  res8: Option[scala.util.matching.Regex.Match] = Some(ab)
  scala> "^ab".r findFirstMatchIn "hababab"
  res9: Option[scala.util.matching.Regex.Match] = None
  /*
  findPrefixOf is the same but returns the matched string instead of the Match
  */
  scala> "ab".r findPrefixOf "haababab"
  res11: Option[String] = None
  scala> "ab".r findPrefixOf "ababab"
  res12: Option[String] = Some(ab)
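
Because findPrefixMatchOf returns a Match rather than a String, the captured groups and the unmatched remainder of the input are available. The following is only a minimal sketch: the pattern and the splitCommand helper are hypothetical names made up for illustration, not part of the Regex class.

```scala
import scala.util.matching.Regex

// Hypothetical pattern: a leading word followed by whitespace.
val command: Regex = """(\w+)\s+""".r

// Hypothetical helper: splits a line into its first word and the rest,
// but only when the word starts at the very beginning of the line.
def splitCommand(line: String): Option[(String, String)] =
  command.findPrefixMatchOf(line).map { m =>
    // group(1) is the captured word, after is the input following the match
    (m.group(1), m.after.toString)
  }

val matched = splitCommand("hello xyz")      // Some((hello,xyz))
val unmatched = splitCommand("  hello xyz")  // None, the input starts with spaces
```

If only the matched text is needed, findPrefixOf is the shorter choice.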

Regex.replaceAllIn -- Essentially the same as using String.replaceAll
Regex.replaceFirstIn -- Essentially the same as using String.replaceFirst
  scala> "(h)(e)|(l)".r replaceAllIn ("hello xyz","__")
  res13: String = ______o xyz
  scala> "hello xyz" replaceAll ("(h)(e)|(l)","__")
  res14: java.lang.String = ______o xyz
  scala> "hello xyz" replaceFirst ("(h)(e)|(l)","__")
  res16: java.lang.String = __llo xyz
  scala> "(h)(e)|(l)".r replaceFirstIn ("hello xyz","__")
  res17: String = __llo xyz
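
Since the replacement string is handed to the underlying Java matcher, it can refer to captured groups with $1, $2 and so on, just as with String.replaceAll. A small sketch under that assumption (the value names are made up for illustration):

```scala
import java.util.regex.Matcher

// Hypothetical pattern for illustration: capture "h" and "e" separately.
val he = "(h)(e)".r

// $1 and $2 in the replacement refer to the captured groups.
val swapped = he.replaceAllIn("hello xyz", "$2$1")        // "ehllo xyz"
val viaString = "hello xyz".replaceAll("(h)(e)", "$2$1")  // "ehllo xyz"

// To insert a literal dollar sign, quote the replacement first.
val literal = "l".r.replaceFirstIn("hello xyz", Matcher.quoteReplacement("$1"))
// literal is "he$1lo xyz"
```

The quoting step matters whenever the replacement text comes from user input, because an unescaped $ would otherwise be read as a group reference.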

The next section is not Scala-specific, but because Regex does not provide a way to set flags such as CASE_INSENSITIVE or DOTALL, it is useful to demonstrate how to set them as part of the standard regex syntax.
  // examples based on the Java blog at: http://www.javaranch.com/journal/2003/04/RegexTutorial.htm#flags
  scala> val input = """Hey, diddle, diddle,
       | |The cat and the fiddle,
       | |The cow jumped over the moon.
       | |The little dog laughed
       | |To see such sport,
       | |And the dish ran away with the spoon.""".stripMargin
  input: String =
  Hey, diddle, diddle,
  The cat and the fiddle,
  The cow jumped over the moon.
  The little dog laughed
  To see such sport,
  And the dish ran away with the spoon.
  // by default a regex is case sensitive
  scala> """the \w+?(?=\W)""".r findAllIn input foreach (println _)
  the fiddle
  the moon
  the dish
  the spoon
  /* The (?i) makes the match case insensitive. The complete set of options is:
  (?idmsux)
    • i - case insensitive
    • d - only Unix line endings are recognized as end of line
    • m - enable multiline mode
    • s - . matches any character, including line end
    • u - enables Unicode-aware case folding
    • x - permits whitespace and comments in the pattern
  */
  scala> """(?i)the \w+?(?=\W)""".r findAllIn input foreach (println _)
  The cat
  the fiddle
  The cow
  the moon
  The little
  the dish
  the spoon
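
The other embedded flags work the same way and can be combined in one group. A short sketch of the m and s flags, using a made-up three-line string rather than the rhyme above:

```scala
// Hypothetical input for illustration.
val text = "one\ntwo\nthree"

// Without (?m), ^ only matches at the start of the whole input.
val firstOnly = """^\w+""".r.findAllIn(text).toList      // List(one)

// With (?m), ^ also matches after each line terminator.
val everyLine = """(?m)^\w+""".r.findAllIn(text).toList  // List(one, two, three)

// Without (?s), . does not match a line terminator.
val noDotAll = "one.two".r.findFirstIn(text)             // None

// With (?s), . matches the newline as well.
val dotAll = "(?s)one.two".r.findFirstIn(text)           // Some("one\ntwo")

// Flags can be combined: case insensitive and dot-all together.
val combined = "(?is)ONE.TWO".r.findFirstIn(text)        // Some("one\ntwo")
```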
