A philosophical question regarding functions (how many tasks should be wrapped inside a function?)

My data come in two types of tables. They share common columns, and the second type has extra columns. I can't decide whether I should write two separate functions to process the tables, or have the extra columns processed outside one common function. I personally prefer the one-function approach. Below are toy examples to show what my question is about. What would be the best practice? What are the criteria for choosing? Thanks for any thoughts you'd like to share.
Two types of tables
T = cell2table({'matlab 101'; 'C++ 202'}, "VariableNames", "u");
T2 = cell2table({'algebra 101', 'jack 001'; 'calculus 202', 'jill 002'}, "VariableNames", ["u", "v"]);
Coding Philosophy I. One function processes the common tasks, while the extra columns are dealt with outside the function. What I don't like about this approach is that it doesn't look as clean.
sp = func(T.u);
desired_output1 = cell2table(sp, "VariableNames", ["course", "level"])
desired_output1 = 2×2 table
      course       level
    __________    _______
    {'matlab'}    {'101'}
    {'C++'   }    {'202'}
spu = func(T2.u);
spv = func(T2.v);
desired_output2 = cell2table([spu, spv], "VariableNames", ["course", "level", "teacher", "id"])
desired_output2 = 2×4 table
       course        level     teacher       id
    ____________    _______    ________    _______
    {'algebra' }    {'101'}    {'jack'}    {'001'}
    {'calculus'}    {'202'}    {'jill'}    {'002'}
Coding Philosophy II. Two separate functions handle the two slightly different tables separately. What I don't like about this approach is that the functions must know the tables' variable names. If the variable names change, the functions must be rewritten.
desired_output1 = gunc(T)
desired_output1 = 2×2 table
      course       level
    __________    _______
    {'matlab'}    {'101'}
    {'C++'   }    {'202'}
desired_output2 = hunc(T2)
desired_output2 = 2×4 table
       course        level     teacher       id
    ____________    _______    ________    _______
    {'algebra' }    {'101'}    {'jack'}    {'001'}
    {'calculus'}    {'202'}    {'jill'}    {'002'}
functions:
function sp = func(C)
sp = split(C);
% Other processing steps would go here. They are omitted to keep the demo
% simple.
end
function out = gunc(T)
sp = split(T.u);
out = cell2table(sp, "VariableNames", ["course", "level"]);
end
function out = hunc(T)
spu = split(T.u);
spv = split(T.v);
out = cell2table([spu, spv], "VariableNames", ["course", "level", "teacher", "id"]);
end
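
A possible middle ground between the two philosophies, offered only as a sketch: one function loops over whatever columns the table has and takes the output names from a mapping passed in as an argument, so a renamed variable only changes the mapping and not the function. The names splitByMap and nameMap are hypothetical and not part of the original question. The sketch assumes every entry splits into exactly two whitespace-separated pieces and that the table has at least two rows (split returns a column vector for a single string). It also returns string columns rather than the cell columns shown above.

```matlab
% Same toy tables as in the question.
T  = cell2table({'matlab 101'; 'C++ 202'}, "VariableNames", "u");
T2 = cell2table({'algebra 101', 'jack 001'; 'calculus 202', 'jill 002'}, "VariableNames", ["u", "v"]);

% Hypothetical mapping: input variable name -> names of its two split parts.
nameMap.u = ["course", "level"];
nameMap.v = ["teacher", "id"];

out1 = splitByMap(T,  nameMap);   % only the "u" entry is used
out2 = splitByMap(T2, nameMap);   % both "u" and "v" entries are used

function out = splitByMap(tab, nameMap)
% Split every column of tab on whitespace and name the pieces via nameMap.
cols  = tab.Properties.VariableNames;        % cell array of char vectors
parts = cell(1, numel(cols));
for k = 1:numel(cols)
    if ~isfield(nameMap, cols{k})
        error("No output names given for variable '%s'.", cols{k});
    end
    sp = split(string(tab.(cols{k})));       % n-by-2 string array when n > 1
    parts{k} = array2table(sp, "VariableNames", nameMap.(cols{k}));
end
out = [parts{:}];                            % join the per-column tables side by side
end
```

With this shape the function never mentions "u" or "v" itself, and a third table type with further columns would only need another entry in the mapping.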

Accepted Answer

Ive J on 21 Jun 2023
Assuming you're gonna stick with those variable names, you can do something like this:
T1 = array2table(["matlab 101"; "C++ 202"], "VariableNames", "u");
T2 = array2table(["algebra 101", "jack 001"; "calculus 202", "jill 002"], "VariableNames", ["u", "v"]);
function out = tableSplitter(tab)
cols = string(tab.Properties.VariableNames);
% reject tables with more than two columns or with any column not named u or v
if width(tab) > 2 || ~all(ismember(cols, ["u", "v"]))
error("input table must have max two columns with var names of u/[v]!")
end
out = splitvars(varfun(@split, tab));
if width(tab) == 1
out.Properties.VariableNames = ["course", "level"];
else
out.Properties.VariableNames = ["course", "level", "teacher", "id"];
end
end
  4 Comments
Simon on 23 Jun 2023
I didn't exactly know what I meant by that statement. I guess I meant that in the if-else condition, the 'if' part and the 'else' part have different input formats and different output formats, so it looks like two functions put together with an if-else statement. I don't avoid conditional statements completely; I just think it is better for a condition to handle homogeneous data, such as if (x is positive), if (year is before 2020), if (diagnosis is negative), etc. I do appreciate your code and thoughts!
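
For what it's worth, the if/else in the answer only selects the output names, so one way to sidestep the concern about a branch that joins two differently shaped cases is to index into a single list of names by the number of split columns. The sketch below is a variation on the answer's approach, not the answer's own code. The name splitAndName is hypothetical, and it assumes the columns always arrive in the order u, v, so names are assigned by position rather than by variable name. The one remaining condition is a plain validity check on the column count.

```matlab
% Same toy tables as in the answer.
T1 = array2table(["matlab 101"; "C++ 202"], "VariableNames", "u");
T2 = array2table(["algebra 101", "jack 001"; "calculus 202", "jill 002"], "VariableNames", ["u", "v"]);

out1 = splitAndName(T1);
out2 = splitAndName(T2);

function out = splitAndName(tab)
% Split each column on whitespace, then name the results by position.
allNames = ["course", "level", "teacher", "id"];    % assumed column order
out = splitvars(varfun(@split, tab));               % one variable per split piece
if width(out) > numel(allNames)
    error("Expected at most %d split columns, got %d.", numel(allNames), width(out));
end
out.Properties.VariableNames = allNames(1:width(out));
end
```

The trade-off is that positional naming silently mislabels the output if the input columns are ever reordered, which the name check in the answer guards against.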
Simon on 4 Jul 2023
@Ive J I now see some benefits to your idea. Using a conditional statement to wrap related processes in one function does have its advantages. Your answer is accepted.

Release

R2023a