Split a cell array of character vectors at multiple-number delimiter

Hello Everyone,
I hope that you're doing well, and your loved ones as well.
I would like to ask for your help after many attempts. I have data on bridge construction types, imported from an Excel spreadsheet, and their data type is cell array of character vectors.
Below, I jotted down a few example lines of these data:
BEAMS-PREFLEX WITH SLAB-RC 4 NO SPANS
ARCH-MASONRY 1 NO STONE (LIMESTONE SADDLED 1924)
ARCH-MASONRY 2 NO SPAN WIDENED WITH SLAB-R.C. 2 NO SPAN
BOX-R.C. 1 NO SPAN
We can see that they have a typical structure: the construction system, followed by the superstructure material, and then the number of spans. However, these parameters should go into separate cells.
I'm a beginner at analysing text with computing techniques, so to use the built-in function 'split' I converted the data type to string and then defined the number of spans as the delimiter pattern, namely string([1 2 3 4 ...]). However, I couldn't save the split text in the variable spl because the for loop doesn't work, and the size of the result returned by split changes from one data line to the next.
For example, for the first data line I get:
spl, 2x1 string, that is:
BEAMS-PREFLEX WITH SLAB-R.C.
NO SPANS
Then, for the second data line I get:
spl, 6x1 string, that is:
ARCH-MASONRY
NO SPAN (LIMESTONE SADDLED)
3 WHITE SPACES
)
I also attached my simple code for any advice.
Thanks for your time and consideration, I look forward to hearing from you.
Best.
% T2 is the data table
% T2.ConstructionType = string(T2.ConstructionType)
% str = T2{:, 6}; %T2.ConstructionType is the sixth column of T2
% noOfSpans = [1:1:20]
% pat = string(noOfSpans)
% spl = []; %Initialisation of variable for the split results
% for i = 1:1:length(str)
% spl = split(str(i), pat)
% end
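A note on the loop above: it assigns to spl on every pass, so only the result for the last row survives, and because each row splits into a different number of pieces the results cannot be stacked into one rectangular array. A minimal sketch of one way to keep every result, assuming a cell array with one entry per row is acceptable. The two sample rows are copied from the question because the table T2 is not available here, and nPieces is a name introduced only for illustration.

```matlab
% Two sample rows copied from the question (the full table T2 is not available here).
str = ["BEAMS-PREFLEX WITH SLAB-RC 4 NO SPANS"; ...
       "ARCH-MASONRY 1 NO STONE (LIMESTONE SADDLED 1924)"];
pat = string(1:20);               % same delimiter list as in the question

% Rows split into different numbers of pieces, so keep one cell per row.
spl = cell(numel(str), 1);
for i = 1:numel(str)
    spl{i} = split(str(i), pat);  % column string array for row i
end

% Number of pieces found in each row (nPieces is an illustrative name).
nPieces = cellfun(@numel, spl);
disp(nPieces)
```

With this delimiter list the digits of the year 1924 are also treated as delimiters, which is consistent with the empty pieces reported for the second data line.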

Accepted Answer

Stephen23 on 24 Sep 2021
Edited: Stephen23 on 24 Sep 2021
Rather than telling us what you currently get, it is probably more useful if you tell us what you want.
I made some guesses about how that text is formatted:
str = ["BEAMS-PREFLEX WITH SLAB-RC 4 NO SPANS";...
"ARCH-MASONRY 1 NO STONE (LIMESTONE SADDLED 1924)";...
"ARCH-MASONRY 2 NO SPAN WIDENED WITH SLAB-R.C. 2 NO SPAN";...
"BOX-R.C. 1 NO SPAN"];
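% Capture groups, in order:
%   (\w+)   construction system: word characters before the first hyphen
%   (\D+?)  material: the shortest run of non-digit characters after that hyphen
%   (\d+)   the first run of digits after the material
%   (.*)    everything that follows that number
tkn = regexp(str,'^(\w+)-(\D+?)\s*(\d+)\s*(.*)','tokens','once');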
tkn = vertcat(tkn{:})
tkn = 4×4 string array
    "BEAMS"    "PREFLEX WITH SLAB-RC"    "4"    "NO SPANS"
    "ARCH"     "MASONRY"                 "1"    "NO STONE (LIMESTONE SADDLED 1924)"
    "ARCH"     "MASONRY"                 "2"    "NO SPAN WIDENED WITH SLAB-R.C. 2 NO SPAN"
    "BOX"      "R.C."                    "1"    "NO SPAN"
  1 Comment
Giuseppe Degan Di Dieco on 27 Sep 2021
Dear Stephen,
thanks for your time and help. Much appreciated, and I hope it will be useful to the whole Community.
Sorry for my naivety; I am new to asking for help in threads and a beginner at string formatting.
You guessed quite right; the desired output would be:
"BEAMS" "PREFLEX" "SLAB" "RC" "4 NO SPANS"
"ARCH" "MASONRY" "1 NO SPAN" "(LIMESTONE SADDLED 1924)"
"ARCH" "MASONRY" "2 NO SPAN" "WIDENED WITH SLAB-R.C." "2 NO SPAN"
"BOX" "R.C." "1 NO SPAN"
This is needed because the pieces above relate to different parameters, for example:
-BEAMS refers to the superstructure construction system;
-PREFLEX refers to the superstructure construction material, in this case a composite one;
-ARCH refers to the superstructure construction system;
-MASONRY refers to the superstructure construction material;
-4 NO SPANS refers to the number of spans;
and they must be separated in order to compute descriptive statistics.
I'm afraid that the data are a mess, but this happens quite often with real-life collected data.
I'll work on your suggested solution and, if you don't mind, let you know how it develops.
Best.
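One possible step toward the layout requested in this comment, offered as a sketch rather than a finished solution: pull out each span phrase with regexp, then cut the text in front of the first phrase at its first hyphen. It assumes a span phrase always reads as a number, then NO, then SPAN or SPANS. The names spanPat, systemName, material, spanText and parts are introduced here for illustration only.

```matlab
% Sample rows as posted in the thread.
str = ["BEAMS-PREFLEX WITH SLAB-RC 4 NO SPANS"; ...
       "ARCH-MASONRY 1 NO STONE (LIMESTONE SADDLED 1924)"; ...
       "ARCH-MASONRY 2 NO SPAN WIDENED WITH SLAB-R.C. 2 NO SPAN"; ...
       "BOX-R.C. 1 NO SPAN"];

% Assumed shape of a span phrase: number, "NO", then "SPAN" or "SPANS".
spanPat = '\d+\s+NO\s+SPANS?';

% 'match' returns the span phrases, 'split' returns the text between them.
[spans, rest] = regexp(str, spanPat, 'match', 'split');

n = numel(str);
systemName = strings(n, 1);
material   = strings(n, 1);
spanText   = strings(n, 1);
for i = 1:n
    head = strtrim(rest{i}(1));              % text before the first span phrase
    systemName(i) = extractBefore(head, "-"); % part before the first hyphen
    material(i)   = extractAfter(head, "-");  % part after the first hyphen
    if ~isempty(spans{i})
        spanText(i) = spans{i}(1);            % first span phrase, if any was found
    end
end

% One row per input line, ready for grouping or counting.
parts = table(systemName, material, spanText);
disp(parts)
```

The second sample line reads NO STONE in the thread, so no span phrase is found for it and its spanText stays empty. Splitting the material further, for example PREFLEX WITH SLAB-RC into PREFLEX, SLAB and RC, would need extra rules that the four samples alone do not pin down.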


Release

R2018a
