Friday, May 27, 2016

Word Boundaries Regex

\b

This is the second time this week that I have had to ask myself, "How did I not know about this?"

There is a regex anchor for word boundaries: \b. It is a zero-length match, similar to the caret and dollar sign. It matches the position between a word character (\w) and a non-word character, or at the start or end of the string when a word character sits there. That lets you search for whole-word matches.
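To make the difference concrete, here is a minimal sketch comparing a plain pattern with one wrapped in \b. The class and method names (WordBoundaryDemo, ReplaceSubstring, ReplaceWholeWord) are made up for illustration; only Regex.Replace comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the extension method in this post.
public static class WordBoundaryDemo
{
    // Replaces every occurrence of "is", including those inside other words.
    public static string ReplaceSubstring(string input) =>
        Regex.Replace(input, "is", "was");

    // Replaces "is" only where it stands alone as a whole word.
    public static string ReplaceWholeWord(string input) =>
        Regex.Replace(input, @"\bis\b", "was");
}
```

With the input "This island is beautiful", the first method also rewrites the "is" inside "This" and "island", while the second only touches the standalone word.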

Below is a sample extension method that uses this to replace words in a string.

Implementation

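using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Matches each run of word characters (\w+) delimited by word boundaries (\b).
    private static readonly Regex WordRegex = new Regex(@"\b\w+\b", RegexOptions.Compiled);

    // Replaces every whole word equal to find (under the given comparison) with replace.
    public static string ReplaceWords(
        this string input,
        string find,
        string replace,
        StringComparison comparison = StringComparison.InvariantCulture)
    {
        // Inspect each matched word; swap it only when it equals find.
        return WordRegex.Replace(input, m => m.Value.Equals(find, comparison)
            ? replace
            : m.Value);
    }
}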
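One thing to keep in mind with the \b\w+\b pattern: apostrophes and hyphens are not word characters, so they split a token into separate words. The sketch below lists what the pattern matches; WordTokenDemo and Words are hypothetical names used only for this illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that lists the words the pattern above would visit.
public static class WordTokenDemo
{
    public static string[] Words(string input) =>
        Regex.Matches(input, @"\b\w+\b")
            .Cast<Match>()            // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .ToArray();
}
```

For "I don't know" this yields "I", "don", "t", and "know", so a call such as ReplaceWords("don't", ...) would never find a match.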

Unit Test

using System;
using Xunit;

public class StringExtensionsTests
{
    [Fact]
    public void ReplaceWords()
    {
        Assert.Equal(
            "This island can has beautiful",
            "This island is beautiful".ReplaceWords("is", "can has"));
 
        Assert.Equal(
            "This island are beautiful",
            "This island is beautiful".ReplaceWords(
                "IS", 
                "are", 
                StringComparison.InvariantCultureIgnoreCase));
    }
}

Enjoy,
Tom

