How to extract a specific series of strings from a character array

Hi guys,
I have a massive character array (1x187253) that I imported into MATLAB. Here is a small sample:
NEW SCOMPONENT /JBRC200XT
DESC 'CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax200A'
GTYP REDU
PARA 350 200 355.6 216.3 BWD $
330.2 0
END
NEW SCOMPONENT /JBRC200XV
DESC 'CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax250A'
GTYP REDU
PARA 350 250 355.6 267.4 BWD $
330.2 0
END
What I want to do with this character array is obtain all the 9-character codes that follow NEW SCOMPONENT (e.g. JBRC200XT and JBRC200XV in this case), as well as the text between the quotes on the DESC line (e.g. 'CARBON STEEL ORDINARY REDUCER... '), and place them side by side in a MATLAB table to be exported to Excel.
I know that this should be possible, but I have been stuck for the last few days on the very first step of obtaining the codes JBRC200...
Thanks for your help in advance,
KR,
KMT.

 Accepted Answer

Regular expressions are your friends:
regexp(text, 'NEW SCOMPONENT\s*\/?(?<component>[\w]{9})\s+DESC\s*''(?<desc>[^'']+)', 'names');
The result is a struct array with two fields, component and desc, containing your strings, with one element per match. For the first component in your sample:
ans =
struct with fields:
component: 'JBRC200XT'
desc: 'CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax200A'
I tried it on your sample, which I duplicated into a giant text file, and it runs very fast.
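To get from the struct array to the side-by-side table and Excel export asked about in the question, something along these lines may work. This is only a sketch: the variable names (lines, names, T) and the output file name components.xlsx are made up for illustration, and the sample text is rebuilt inline so the snippet runs on its own.

```matlab
% Rebuild the two-component sample from the question as one char array.
% In practice, text would be the imported 1x187253 array.
lines = { ...
    'NEW SCOMPONENT /JBRC200XT'
    'DESC ''CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax200A'''
    'GTYP REDU'
    'PARA 350 200 355.6 216.3 BWD $'
    '330.2 0'
    'END'
    'NEW SCOMPONENT /JBRC200XV'
    'DESC ''CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax250A'''
    'GTYP REDU'
    'PARA 350 250 355.6 267.4 BWD $'
    '330.2 0'
    'END'};
text = strjoin(lines.', newline);

% Same pattern as in the answer; 'names' returns one struct element per match.
pattern = 'NEW SCOMPONENT\s*\/?(?<component>[\w]{9})\s+DESC\s*''(?<desc>[^'']+)';
names = regexp(text, pattern, 'names');

% Collect each field into a column cell array and build the table.
codes = {names.component};
descs = {names.desc};
T = table(codes(:), descs(:), 'VariableNames', {'Component', 'Description'});

% Export to Excel (hypothetical file name, written to the current folder).
writetable(T, 'components.xlsx');
```

Each row of T then holds one component code next to its description, which is the layout writetable puts into the spreadsheet.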

6 Comments

@Konstantinos, you may want to spend some time understanding the regular expression that TADA has written so that you can be sure it fits the grammar of your files, since for now it is based only on your limited example. If the grammar of your files is more complex (for example, if there is sometimes something between the component line and the DESC line), the regular expression may need adjusting.
Amazing... I was working with strfind and indexing. I had 40 lines of code and this does the same thing in 2. Thanks, good to know.
One more question: how would I extract the numbers from 350 to 330.2 for each SCOMPONENT under specific headings? Sometimes these are 3-digit numbers, sometimes 2-digit, etc.
I'll leave the details to you, but you can use \d to match digits, and curly braces with comma-separated numbers (e.g. \d{2,3}) to specify a range for the number of repeats.
You can find some decent regex testers online to help with planning the right regex. Some of them also have good modules for learning regex patterns.
You should know that there are slight differences in the features of regex engines between environments (MATLAB, Python, JS, etc.), but the patterns are for the most part identical.
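As a rough illustration of the \d hint, the sketch below pulls the numbers out of each PARA block, including the continuation line after the $. It assumes every PARA block ends at the next END line, as in the sample; the variable names (sample, blocks, params) are made up for illustration. It uses \d+ with an optional decimal part rather than a fixed digit count; \d{2,3} could be swapped in if only 2- or 3-digit integers should match.

```matlab
% Only the PARA ... END parts of the sample from the question.
sample = strjoin({ ...
    'PARA 350 200 355.6 216.3 BWD $', ...
    '330.2 0', ...
    'END', ...
    'PARA 350 250 355.6 267.4 BWD $', ...
    '330.2 0', ...
    'END'}, newline);

% Capture everything between each PARA keyword and the following END.
% In MATLAB, '.' also matches newlines by default, so the lazy '.*?'
% spans the continuation line.
blocks = regexp(sample, 'PARA\s+(.*?)\s+END', 'tokens');

% Within each block, keep only whitespace-delimited integers or decimals,
% which skips tokens such as BWD and $.
params = cell(numel(blocks), 1);
for k = 1:numel(blocks)
    numText = regexp(blocks{k}{1}, '(?<!\S)\d+(?:\.\d+)?(?!\S)', 'match');
    params{k} = str2double(numText);   % row vector of doubles
end
```

params{k} then holds the numeric values of the k-th component in file order, so they can be assigned to whatever headings are needed by position.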
