splitSentences

Split text into sentences

Description

newStr = splitSentences(str) splits str into an array of sentences.

newDocuments = splitSentences(document) splits a single tokenizedDocument object into a tokenizedDocument array of sentences.

Examples

Read the text from the example file sonnets.txt and split it into sentences.

filename = "sonnets.txt";
str = extractFileText(filename);
sentences = splitSentences(str);

View the first few sentences.

sentences(1:10)
ans = 10×1 string
    "THE SONNETS"
    "by William Shakespeare"
    "I"
    "From fairest creatures we desire increase,↵That thereby beauty's rose might never die,↵But as the riper should by time decease,↵His tender heir might bear his memory:↵But thou, contracted to thine own bright eyes,↵Feed'st thy light's flame with self-substantial fuel,↵Making a famine where abundance lies,↵Thy self thy foe, to thy sweet self too cruel:↵Thou that art now the world's fresh ornament,↵And only herald to the gaudy spring,↵Within thine own bud buriest thy content,↵And tender churl mak'st waste in niggarding:↵Pity the world, or else this glutton be,↵To eat the world's due, by the grave and thee."
    "II"
    "When forty winters shall besiege thy brow,↵And dig deep trenches in thy beauty's field,↵Thy youth's proud livery so gazed on now,↵Will be a tatter'd weed of small worth held:↵Then being asked, where all thy beauty lies,↵Where all the treasure of thy lusty days;↵To say, within thine own deep sunken eyes,↵Were an all-eating shame, and thriftless praise."
    "How much more praise deserv'd thy beauty's use,↵If thou couldst answer 'This fair child of mine↵Shall sum my count, and make my old excuse,'↵Proving his beauty by succession thine!"
    "This were to be new made when thou art old,↵And see thy blood warm when thou feel'st it cold."
    "III"
    "Look in thy glass and tell the face thou viewest↵Now is the time that face should form another;↵Whose fresh repair if now thou not renewest,↵Thou dost beguile the world, unbless some mother."

Input Arguments

str: Input text, specified as a string scalar, a character vector, or a scalar cell array containing a character vector.

Data Types: string | char | cell

document: Input document, specified as a scalar tokenizedDocument object.

Output Arguments

newStr: Output text, returned as a string array or a cell array of character vectors.

If str is a string scalar, then newStr is a string array. Otherwise, newStr is a cell array of character vectors.

Data Types: string | cell
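
The relationship between the input type and the output type can be checked with a short sketch like the one below. The sample text and the variable names txt, fromChar, fromString, and fromCell are illustrative assumptions, not part of the function's documentation.

```matlab
% Illustrative sample text (an assumption for this sketch).
txt = 'The first sentence ends here. The second one follows.';

% Character vector input: per the rule above, the result should be
% a cell array of character vectors.
fromChar = splitSentences(txt);

% String scalar input: per the rule above, the result should be
% a string array.
fromString = splitSentences(string(txt));

% Scalar cell array input: also expected to give a cell array.
fromCell = splitSentences({txt});

% Display the class of each result.
disp(class(fromChar))
disp(class(fromString))
disp(class(fromCell))
```

If downstream code expects one type regardless of the input, converting the result with string or cellstr after the call is one way to normalize it.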

newDocuments: Output documents, returned as a tokenizedDocument array.
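
The tokenizedDocument syntax can be sketched as follows. The sample text is an assumption made for illustration. The exact number of output documents depends on how the tokenizer detects sentence boundaries, so no output is shown.

```matlab
% Illustrative sample text (an assumption for this sketch).
str = "The quick brown fox jumps. It lands softly. Then it runs away.";

% Tokenize the text into a single (scalar) tokenizedDocument.
document = tokenizedDocument(str);

% Split the scalar document into an array of documents,
% one per detected sentence.
newDocuments = splitSentences(document);

% Inspect the result.
disp(class(newDocuments))
disp(numel(newDocuments))
```

Because the input must be a scalar tokenizedDocument, an array of documents would need to be processed one element at a time.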

Algorithms

If emoticons or emoji characters appear after a terminating punctuation character, then the function splits the sentence after the emoticons and emoji.
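
To see this behavior on your own installation, a small experiment such as the following may help. The sample text is an assumption. The output is deliberately not shown, because which character sequences are recognized as emoticons may vary.

```matlab
% Illustrative text with an emoji and an emoticon placed after
% terminating punctuation (an assumption for this sketch).
str = "I love this! 😀 Great job. :) See you soon.";

% Split into sentences. Per the note above, the emoji and emoticon
% are expected to stay attached to the sentence they follow.
sentences = splitSentences(str);

% Display each sentence on its own line for inspection.
disp(sentences)
```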

Version History

Introduced in R2018a