pattern

Patterns to search and match text

Description

A pattern defines rules for matching text with text-searching functions like contains, matches, and extract. You can build a pattern expression using pattern functions, operators, and literal text. For example, MATLAB® release names start with "R", followed by the four-digit year, and then either "a" or "b". Define a pattern to match the format of the release names:

pat = "R" + digitsPattern(4) + ("a"|"b");

Match that pattern in a string:

str = ["String was introduced in R2016b." 
       "Pattern was added in R2020b."];
extract(str,pat)
ans =
  2x1 string array
    "R2016b"
    "R2020b"

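As a quick check of what the release-name pattern accepts, the following sketch applies matches, which requires the whole string to fit the pattern. The candidate strings are illustrative and are not part of the original example.

```matlab
% Same pattern as above: "R", four digits, then "a" or "b".
pat = "R" + digitsPattern(4) + ("a"|"b");

% Illustrative inputs (assumed for this sketch).
candidates = ["R2021a" "R21a" "R2021c" "r2021b"];

% matches returns true only where the entire string fits the pattern.
tf = matches(candidates,pat)
```

Only the first element should match. The others have too few digits, an unsupported suffix, or a lowercase "r" (matching is case sensitive by default).
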
Creation

Patterns are composed of literal text and other patterns using the +, |, and ~ operators. You also can create common patterns using Object Functions, which use rules often associated with regular expressions:

  • Character-Matching Patterns – Ranges of letters or digits, wildcards, or whitespaces, such as lettersPattern.

  • Search Rules – How many times the pattern must occur, case sensitivity, optional patterns, and named expressions, such as optionalPattern.

  • Boundaries – Boundaries at the start or end of a run of specific characters, such as alphanumericBoundary. Boundary patterns can be negated using the ~ operator so that a match to the boundary prevents a match of the pattern expression.

  • Pattern Organization – Define pattern structure and specify how pattern expressions are displayed, such as maskedPattern and namedPattern.
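
A brief sketch of the search-rule and boundary categories listed above. The input strings and variable names are made up for illustration, and the comments describe the behavior expected from the descriptions in this list.

```matlab
% Search rule: optionalPattern lets "u" appear zero times or once.
colorPat = "colo" + optionalPattern("u") + "r";
tfColor = contains(["color" "colour" "colr"],colorPat)

% Boundary: ~digitBoundary succeeds only where there is no digit boundary,
% so this pattern matches a single digit that directly follows another digit.
innerDigit = ~digitBoundary + digitsPattern(1);
inner = extract("a1 234",innerDigit)
```

In this sketch, contains should accept both spellings but not "colr", and the negated boundary should skip any digit that begins a run of digits.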

The pattern function also creates a pattern with the syntax pat = pattern(txt), where txt is literal text that pat matches. This syntax is useful for specifying the pattern type in function argument validation. However, the pattern function is rarely needed in other cases because MATLAB text-matching functions accept text inputs.
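
A minimal sketch of the argument-validation use case, assuming a hypothetical helper named countMatches saved in its own file. The function name and its arguments are illustrative only. With class validation, a text input should be converted to a pattern before the function body runs.

```matlab
% Hypothetical helper, saved as countMatches.m (the name is illustrative).
function n = countMatches(txt,pat)
    arguments
        txt string
        pat (1,1) pattern   % text inputs are expected to convert to a pattern
    end
    % count accepts the validated pattern directly.
    n = count(txt,pat);
end
```

A call such as countMatches("R2016b and R2020b","R" + digitsPattern(4)) passes a pattern expression, while countMatches("a-b-c","-") relies on the conversion from text.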

Object Functions

contains – Determine if pattern is in strings
matches – Determine if pattern matches strings
count – Count occurrences of pattern in strings
endsWith – Determine if strings end with pattern
startsWith – Determine if strings start with pattern
extract – Extract substrings from strings
replace – Find and replace one or more substrings
replaceBetween – Replace substrings between start and end points
split – Split strings at delimiters
erase – Delete substrings within strings
eraseBetween – Delete substrings between start and end points
extractAfter – Extract substrings after specified positions
extractBefore – Extract substrings before specified positions
extractBetween – Extract substrings between start and end points
insertAfter – Insert strings after specified substrings
insertBefore – Insert strings before specified substrings
digitsPattern – Match digit characters
lettersPattern – Match letter characters
alphanumericsPattern – Match letter and digit characters
characterListPattern – Match characters from list
whitespacePattern – Match whitespace characters
wildcardPattern – Match as few characters of any type as possible
optionalPattern – Make pattern optional to match
possessivePattern – Match pattern without backtracking
caseSensitivePattern – Match pattern with case sensitivity
caseInsensitivePattern – Match pattern regardless of case
asFewOfPattern – Match pattern as few times as possible
asManyOfPattern – Match pattern as many times as possible
alphanumericBoundary – Match boundary between alphanumeric and non-alphanumeric characters
digitBoundary – Match boundary between digit characters and nondigit characters
letterBoundary – Match boundary between letter characters and nonletter characters
whitespaceBoundary – Match boundary between whitespace characters and non-whitespace characters
lineBoundary – Match start or end of line
textBoundary – Match start or end of text
lookAheadBoundary – Match boundary before specified pattern
lookBehindBoundary – Match boundary following specified pattern
regexpPattern – Pattern that matches specified regular expression
maskedPattern – Pattern with specified display name
namedPattern – Designate named pattern

Examples

lettersPattern is a typical character-matching pattern that matches letter characters. Create a pattern that matches one or more letter characters.

txt = ["This" "is a" "1x6" "string" "array" "."];
pat = lettersPattern;

Use contains to determine if characters matched by pat are present in each string. The output logical array shows that the first five of the strings in txt contain letters, but the sixth string does not.

contains(txt,pat)
ans = 1x6 logical array

   1   1   1   1   1   0

Determine if text starts with the specified pattern. The output logical array shows that four of the strings in txt start with letters, but two strings do not.

startsWith(txt,pat)
ans = 1x6 logical array

   1   1   0   1   1   0

Determine if the string fully matches the specified pattern. The output logical array shows which of the strings in txt contain nothing but letters.

matches(txt,pat)
ans = 1x6 logical array

   1   0   0   1   1   0

Count the number of times a pattern matched. The output numeric array shows how many times lettersPattern matched in each element of txt. Note that lettersPattern matches one or more letters, so a group of consecutive letters is a single match.

count(txt,pat)
ans = 1×6

     1     2     1     1     1     0
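
To count individual letters rather than runs of letters, one option is to restrict the pattern to exactly one character with lettersPattern(1). This is a small sketch on the same txt array; the variable name nLetters is illustrative.

```matlab
txt = ["This" "is a" "1x6" "string" "array" "."];

% lettersPattern(1) matches exactly one letter, so each letter is its own match.
nLetters = count(txt,lettersPattern(1))
```

With this pattern, "This" should contribute four matches instead of one, and "is a" three instead of two.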

digitsPattern is a typical character-matching pattern that matches digit characters. Create a pattern that matches digit characters.

txt = ["1 fish" "2 fish" "[1,0,0] fish" "[0,0,1] fish"];
pat = digitsPattern;

Use replace to edit pieces of text that match the pattern.

replace(txt,pat,"#")
ans = 1x4 string
    "# fish"    "# fish"    "[#,#,#] fish"    "[#,#,#] fish"

Create a new piece of text by inserting an "!" character after matched digits.

insertAfter(txt,pat,"!")
ans = 1x4 string
    "1! fish"    "2! fish"    "[1!,0!,0!] fish"    "[0!,0!,1!] fish"

Patterns can be created using the OR operator, |, with text. Erase text matched by the specified pattern.

txt = erase(txt,"," | "]" | "[")
txt = 1x4 string
    "1 fish"    "2 fish"    "100 fish"    "001 fish"

Extract pat from the new text.

extract(txt,pat)
ans = 1x4 string
    "1"    "2"    "100"    "001"

Use patterns to count the occurrences of individual characters in a piece of text.

txt = "She sells sea shells by the sea shore.";

Create pat as a pattern object that matches individual letter or digit characters using alphanumericsPattern(1). Extract the matches.

pat = alphanumericsPattern(1);
letters = extract(txt,pat);

Display a histogram of the number of occurrences of each letter.

letters = lower(letters);
letters = categorical(letters);
histogram(letters)
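
If the counts are needed as numbers rather than as a plot, one possible approach is to tabulate the categorical array with categories and countcats. This sketch rebuilds letters from the same text; the table and variable names are illustrative.

```matlab
txt = "She sells sea shells by the sea shore.";
letters = categorical(lower(extract(txt,alphanumericsPattern(1))));

% Tabulate each distinct letter with its number of occurrences.
letterCounts = table(categories(letters),countcats(letters), ...
    'VariableNames',{'Letter','Count'})
```

The Count column should sum to the number of characters extracted from txt.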

Use maskedPattern to display a variable in place of a complicated pattern expression.

Build a pattern that matches simple arithmetic expressions composed of numbers and arithmetic operators.

mathSymbols = asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
mathSymbols = pattern
  Matching:

    asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)

Build a pattern that matches arithmetic expressions with whitespaces between characters using mathSymbols.

longExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
longExpressionPat = pattern
  Matching:

    asManyOfPattern(asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1) + whitespacePattern) + asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)

The displayed pattern expression is long and difficult to read. Use maskedPattern to display the variable name, mathSymbols, in place of the pattern expression.

mathSymbols = maskedPattern(mathSymbols);
shortExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
shortExpressionPat = pattern
  Matching:

    asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols

  Show all details

Create a string containing some arithmetic expressions, and then extract the pattern from the text.

txt = "What is the answer to 1 + 1? Oh, I know! 1 + 1 = 2!";
arithmetic = extract(txt,shortExpressionPat)
arithmetic = 2x1 string
    "1 + 1"
    "1 + 1 = 2"

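Masking is meant to change only how a pattern is displayed. As a hedged check of that assumption, the sketch below rebuilds both forms of the expression pattern and compares what they extract from the same text. The mask name is passed explicitly here, and the variable names masked and sameMatches are illustrative.

```matlab
% Rebuild the unmasked and masked versions of the expression pattern.
mathSymbols = asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1);
longExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols;

masked = maskedPattern(mathSymbols,"mathSymbols");  % explicit mask name
shortExpressionPat = asManyOfPattern(masked + whitespacePattern) + masked;

txt = "What is the answer to 1 + 1? Oh, I know! 1 + 1 = 2!";

% Both patterns are expected to extract the same substrings.
sameMatches = isequal(extract(txt,longExpressionPat),extract(txt,shortExpressionPat))
```
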
Create a pattern from two named patterns. Naming patterns adds context to the display of the pattern.

Build two patterns: one that matches words that begin and end with the letter D, and one that matches words that begin and end with the letter R.

dWordsPat = letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary;
rWordsPat = letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary;

Build a pattern from these two patterns that finds a word that starts and ends with D followed by a word that starts and ends with R.

dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern
  Matching:

    letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary + whitespacePattern + letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary

This pattern is hard to read and does not convey much information about its purpose. Use namedPattern to designate the patterns as named patterns that display specified names and descriptions in place of the pattern expressions.

dWordsPat = namedPattern(dWordsPat,"dWords", "Words that start and end with D");
rWordsPat = namedPattern(rWordsPat,"rWords", "Words that start and end with R");
dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern
  Matching:

    dWords + whitespacePattern + rWords

  Using named patterns:

    dWords: Words that start and end with D
    rWords: Words that start and end with R

  Show more details

Create a string and extract the text that matches the pattern.

txt = "Dad, look at the divided river!";
words = extract(txt,dAndRWordsPat)
words = 
"divided river"

Build an easy-to-read pattern to match email addresses.

Email addresses follow the structure username@domain.TLD, where username and domain are made up of identifiers separated by periods. Build a pattern that matches identifiers composed of any combination of alphanumeric characters and "_" characters. Use maskedPattern to name this pattern identifier.

identifier = asManyOfPattern(alphanumericsPattern(1) | "_", 1);
identifier = maskedPattern(identifier);

Build patterns to match domains and subdomains composed of identifiers. Create a pattern that matches TLDs from a specified list.

subdomain = asManyOfPattern(identifier + ".") + identifier;
domainName = namedPattern(identifier,"domainName");
tld = "com" | "org" | "gov" | "net" | "edu";

Build a pattern for matching the local part of an email, which matches one or more identifiers separated by periods. Build a pattern for matching the domain, TLD, and any potential subdomains by combining the previously defined patterns. Use namedPattern to assign each of these patterns to a named pattern.

username = asManyOfPattern(identifier + ".") + identifier;
domain = optionalPattern(namedPattern(subdomain) + ".") + ...
            domainName + "." + ...
            namedPattern(tld);

Combine all of the patterns into a single pattern expression. Use namedPattern to assign username, domain, and emailPattern to named patterns.

emailAddress = namedPattern(username) + "@" + namedPattern(domain);
emailPattern = namedPattern(emailAddress)
emailPattern = pattern
  Matching emailAddress:

    username + "@" + domain

  Using named patterns:

    emailAddress  : username + "@" + domain
      username    : asManyOfPattern(identifier + ".") + identifier
      domain      : optionalPattern(subdomain + ".") + domainName + "." + tld
        subdomain : asManyOfPattern(identifier + ".") + identifier
        domainName: identifier
        tld       : "com" | "org" | "gov" | "net" | "edu"

  Show all details

Create a string that contains an email address, and then extract the pattern from the text.

txt = "You can reach me by email at John.Smith@department.organization.org";
extract(txt,emailPattern)
ans = 
"John.Smith@department.organization.org"

Named patterns allow dot-indexing in order to access named subpatterns. Use dot-indexing to assign a specific value to the named pattern domain.

emailPattern.emailAddress.domain = "mathworks.com"
emailPattern = pattern
  Matching emailAddress:

    username + "@" + domain

  Using named patterns:

    emailAddress: username + "@" + domain
      username  : asManyOfPattern(identifier + ".") + identifier
      domain    : "mathworks.com"

  Show all details

Introduced in R2020b