extractBetween
Extract substrings between start and end points
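Syntax
newStr = extractBetween(str,startPat,endPat)
newStr = extractBetween(str,startPos,endPos)
newStr = extractBetween(___,'Boundaries',bounds)
Description
newStr = extractBetween(str,startPat,endPat) extracts the substring from str that occurs between the substrings startPat and endPat. The extracted substring does not include startPat and endPat.
newStr is a string array if str is a string array. Otherwise, newStr is a cell array of character vectors.
If str is a string array or a cell array of character vectors, then extractBetween extracts substrings from each element of str.
newStr = extractBetween(str,startPos,endPos) extracts the substring from str that occurs between the positions startPos and endPos, including the characters at those positions.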
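As a minimal sketch of the element-wise behavior, the following applies the same start and end text to each element of a cell array of character vectors. The input values are made up for illustration, and the sketch assumes each element contains exactly one match.

```matlab
% Hypothetical input: a cell array of character vectors.
c = {'key[alpha]', 'key[beta]'};

% The same start and end text is applied to every element.
out = extractBetween(c, '[', ']');

% Because the input is not a string array, the result is a cell array
% of character vectors, one per input element.
disp(class(out))   % cell
```

Passing string(c) instead of c would be expected to return a string array of the same size.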
newStr = extractBetween(___,'Boundaries',bounds) forces the starts and ends specified in any of the previous syntaxes to be either inclusive or exclusive. They are inclusive when bounds is 'inclusive', and exclusive when bounds is 'exclusive'. For example, extractBetween(str,startPat,endPat,'Boundaries','inclusive') returns startPat, endPat, and all the text between them as newStr.
Examples
Select Text Between Substrings
Create string arrays and select text that occurs between substrings.
str = "The quick brown fox"
str = "The quick brown fox"
Select the text that occurs between the substrings "quick " and " fox". The extractBetween function selects the text but does not include "quick " or " fox" in the output.
newStr = extractBetween(str,"quick "," fox")
newStr = "brown"
Select substrings from each element of a string array. When you specify different substrings as start and end indicators, they must be contained in a string array or a cell array that is the same size as str.
str = ["The quick brown fox jumps";"over the lazy dog"]
str = 2x1 string
"The quick brown fox jumps"
"over the lazy dog"
newStr = extractBetween(str,["quick ";"the "],[" fox";" dog"])
newStr = 2x1 string
"brown"
"lazy"
Extract Text Between Tags Using Patterns
Since R2020b
Create a string array of text enclosed by tags.
str = ["<courseName>Calculus I</courseName>"; "<semester>Fall 2020</semester>"; "<schedule>MWF 8:00-8:50</schedule>"]
str = 3x1 string
"<courseName>Calculus I</courseName>"
"<semester>Fall 2020</semester>"
"<schedule>MWF 8:00-8:50</schedule>"
Extract the text enclosed by tags. First create patterns that match any start tag and end tag by using the wildcardPattern function.
startPat = "<" + wildcardPattern + ">"
startPat = pattern
Matching:
"<" + wildcardPattern + ">"
endPat = "</" + wildcardPattern + ">"
endPat = pattern
Matching:
"</" + wildcardPattern + ">"
Then call the extractBetween function.
newStr = extractBetween(str,startPat,endPat)
newStr = 3x1 string
"Calculus I"
"Fall 2020"
"MWF 8:00-8:50"
For a list of functions that create pattern objects, see pattern.
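Patterns can also be combined with plain text markers. The sketch below is illustrative only: the input strings are hypothetical, and it uses digitsPattern, a pattern function that is not otherwise shown on this page and that, like other pattern functions, requires R2020b or later.

```matlab
% Hypothetical entries; the item numbers vary in length.
str = ["item12: apple;"; "item7: pear;"];

% Start pattern: the literal "item", one or more digits, then ": ".
startPat = "item" + digitsPattern + ": ";

% The end marker is plain text, so a pattern and text are mixed here.
newStr = extractBetween(str, startPat, ";");
% newStr is expected to be ["apple"; "pear"]
```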
Select Substrings Between Start and End Positions
Create string arrays and select substrings between start and end positions that are specified as numbers.
str = "Edgar Allen Poe"
str = "Edgar Allen Poe"
Select the middle name. Specify the seventh and 11th positions in the string.
newStr = extractBetween(str,7,11)
newStr = "Allen"
Select substrings from each element of a string array. When you specify different start and end positions with numeric arrays, they must be the same size as the input string array.
str = ["Edgar Allen Poe";"Louisa May Alcott"]
str = 2x1 string
"Edgar Allen Poe"
"Louisa May Alcott"
newStr = extractBetween(str,[7;8],[11;10])
newStr = 2x1 string
"Allen"
"May"
Select Text with Inclusive and Exclusive Boundaries
Select text from string arrays with boundaries that are forced to be inclusive or exclusive. extractBetween includes the boundaries with the selected text when the boundaries are inclusive. extractBetween does not include the boundaries with the selected text when the boundaries are exclusive.
str1 = "small|medium|large"
str1 = "small|medium|large"
Select the text between the sixth and 13th positions, but do not include the characters at those positions.
newStr = extractBetween(str1,6,13,'Boundaries','exclusive')
newStr = "medium"
Select the text between two substrings, and also the substrings themselves.
str2 = "The quick brown fox jumps over the lazy dog"
str2 = "The quick brown fox jumps over the lazy dog"
newStr = extractBetween(str2," brown","jumps",'Boundaries','inclusive')
newStr = " brown fox jumps"
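The default boundary behavior differs between the two kinds of markers: positions are inclusive unless stated otherwise, while substrings are exclusive. As a small sketch that reuses the "small|medium|large" string from above, the four calls below compare each default with its opposite setting.

```matlab
str = "small|medium|large";

% Positions are inclusive by default: characters 7 through 12.
a = extractBetween(str, 7, 12);                               % "medium"

% Exclusive positions: the characters at 6 and 13 are dropped.
b = extractBetween(str, 6, 13, 'Boundaries', 'exclusive');    % "medium"

% Substrings are exclusive by default: the delimiters are dropped.
c = extractBetween(str, "|", "|");                            % "medium"

% Inclusive substrings: the delimiters are kept.
d = extractBetween(str, "|", "|", 'Boundaries', 'inclusive'); % "|medium|"
```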
Select Text Between Positions in Character Vector
Create a character vector and select text between start and end positions.
chr = 'mushrooms, peppers, and onions'
chr = 'mushrooms, peppers, and onions'
newChr = extractBetween(chr,12,18)
newChr = 1x1 cell array
{'peppers'}
Select text between substrings.
newChr = extractBetween(chr,'mushrooms, ',', and')
newChr = 1x1 cell array
{'peppers'}
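Because character vector input yields a cell array, the extracted text usually has to be unwrapped before further use. The following sketch shows two possible approaches; the variable names txt and s are illustrative.

```matlab
chr = 'mushrooms, peppers, and onions';

% For character vector input, the result is a 1-by-1 cell array.
newChr = extractBetween(chr, 12, 18);

% Index with braces to get the character vector itself.
txt = newChr{1};                           % 'peppers'

% Alternatively, convert the input to a string first so that the
% result is a string scalar rather than a cell array.
s = extractBetween(string(chr), 12, 18);   % "peppers"
```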
Input Arguments
str
— Input text
string array | character vector | cell array of character vectors
Input text, specified as a string array, character vector, or cell array of character vectors.
startPat
— Text or pattern that marks start position
string array | character vector | cell array of character vectors | pattern array (since R2020b)
Text or pattern that marks the start position of the text to extract, specified as one of the following:
- String array
- Character vector
- Cell array of character vectors
- pattern array (since R2020b)
If str is a string array or cell array of character vectors, then you can extract substrings from every element of str. You can specify that the substrings either all have the same start or have different starts in each element of str.
- To specify the same start, specify startPat as a character vector, string scalar, or pattern object.
- To specify different starts, specify startPat as a string array, cell array of character vectors, or pattern array.
Example: extractBetween(str,"AB","YZ") extracts the substrings between AB and YZ in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,["AB";"FG"],["YZ";"ST"]) extracts the substrings between AB and YZ in str(1), and between FG and ST in str(2).
endPat
— Text or pattern that marks end position
string array | character vector | cell array of character vectors | pattern array (since R2020b)
Text or pattern that marks the end position of the text to extract, specified as one of the following:
- String array
- Character vector
- Cell array of character vectors
- pattern array (since R2020b)
If str is a string array or cell array of character vectors, then you can extract substrings from every element of str. You can specify that the substrings either all have the same end or have different ends in each element of str.
- To specify the same end, specify endPat as a character vector, string scalar, or pattern object.
- To specify different ends, specify endPat as a string array, cell array of character vectors, or pattern array.
Example: extractBetween(str,"AB","YZ") extracts the substrings between AB and YZ in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,["AB";"FG"],["YZ";"ST"]) extracts the substrings between AB and YZ in str(1), and between FG and ST in str(2).
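The "same" and "different" forms can presumably be combined in one call, with a scalar start and a per-element end. The sketch below assumes that this mixing is accepted; the input strings are hypothetical.

```matlab
% Hypothetical 2-by-1 string array.
str = ["id=17;name=Ann"; "id=42|name=Bob"];

% Same start for every element (string scalar), but a different end
% for each element (string array the same size as str).
ids = extractBetween(str, "id=", [";"; "|"]);
% ids is expected to be ["17"; "42"]
```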
startPos
— Start position
numeric array
Start position, specified as a numeric array.
If str is an array with multiple pieces of text, then startPos can be a numeric scalar or numeric array of the same size as str.
Example: extractBetween(str,5,9) extracts the substrings from the fifth through the ninth positions in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,[5;10],[9;21]) extracts the substring from the fifth through the ninth positions in str(1), and from the 10th through the 21st positions in str(2).
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
endPos
— End position
numeric array
End position, specified as a numeric array.
If str is an array with multiple pieces of text, then endPos can be a numeric scalar or numeric array of the same size as str.
Example: extractBetween(str,5,9) extracts the substrings from the fifth through the ninth positions in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,[5;10],[9;21]) extracts the substring from the fifth through the ninth positions in str(1), and from the 10th through the 21st positions in str(2).
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
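A scalar position can be paired with a per-element array, and positions can be computed rather than typed in. As a sketch using the author names from the examples above, the second call takes its end positions from strlength so that extraction runs to the last character of each element. The start positions [13; 12] were counted by hand for these two strings.

```matlab
str = ["Edgar Allen Poe"; "Louisa May Alcott"];

% Scalar start, per-element end: first name of each author.
first = extractBetween(str, 1, [5; 6]);                % ["Edgar"; "Louisa"]

% Per-element start, end taken from strlength so that extraction
% runs through the final character of each element.
last = extractBetween(str, [13; 12], strlength(str));  % ["Poe"; "Alcott"]
```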
bounds
— Boundary behavior
'inclusive' | 'exclusive'
Boundary behavior, specified as 'inclusive' or 'exclusive'. When boundary behavior is inclusive, the start and end specified by previous arguments are included in the extracted text. If boundary behavior is exclusive, then the start and end are not included.
Output Arguments
newStr
— Output text
string array | cell array of character vectors
Output text, returned as a string array or cell array of character vectors.
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
Expansion in the first dimension is not supported with tall arrays.
Pattern objects are not supported.
For more information, see Tall Arrays.
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
Distributed Arrays
Partition large arrays across the combined memory of your cluster using Parallel Computing Toolbox™.
Usage notes and limitations:
startPat and endPat must be string arrays, character vectors, or cell arrays of character vectors.
For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).
Version History
Introduced in R2016b
See Also
split | join | erase | eraseBetween | extract | extractBefore | extractAfter | insertAfter | insertBefore | replace | replaceBetween | strlength | count | pattern | wildcardPattern