Get all used variable names from a script
As in the check "Check usage of restricted variable names", I want to check the names of the variables used in a script, but against our own, more explicit naming conventions. However, symvar also returns keywords such as "function", "if" or "end" and, what is much worse, any word found in comments and even in "-delimited strings. Is there any function that can return me all variable names used in a script file or string, but nothing else?
Or, to be a bit more precise, since Stephen Cobeldick correctly pointed to the dynamic execution nature of scripting languages: I mean variable names that are explicitly used in a function header as input or output variables (not varargin, varargout), and variable names explicitly used on the left-hand side of assignments like a = <some expression> or [a, b] = <expression>. That would certainly be sufficient, as the execution context here is eml, so apart from local variables the data flow is pretty much under control, with signal I/O and data store memory requiring registration as Stateflow.Data objects.
1 Comment
"Is there any function that can return me all variable names used in a script file or string, but nothing else?"
No.
Variables can be created dynamically, even by functions called from your script/function (or by functions that they call...). Function scope can also change dynamically, so which functions get called can change as well (and even whether a name refers to a function or a variable). Only actually running the code can resolve this stack: static code analysis is not sufficient.
It might be possible to provide an "estimate" based on static code analysis, but on the understanding that it can diverge from what variables are "used" when the code is actually run.
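Such an estimate could be sketched as follows, restricted to the names the question asks for: header inputs/outputs and explicit left-hand sides. The function name estimateAssignedNames is hypothetical (it is not a MATLAB built-in), and the sketch assumes that comments and string literals have already been masked and that each line holds a single statement. It does not see dynamically created variables, nested parentheses in indexed targets, or continuation lines.

```matlab
function names = estimateAssignedNames(code)
% Rough static estimate of explicitly assigned names in CODE (char vector).
% Hypothetical helper; assumes comments and strings are already masked.
names = {};
rows  = regexp(code, '\r?\n', 'split');
for k = 1:numel(rows)
    ln = strtrim(rows{k});
    if ~isempty(regexp(ln, '^function\s', 'once'))
        % Function header: collect outputs (left of "=") and inputs.
        header = regexprep(ln, '^function\s+', '');
        eqPos  = find(header == '=', 1);
        if isempty(eqPos)
            outPart = '';
            rest    = header;
        else
            outPart = header(1:eqPos - 1);
            rest    = header(eqPos + 1:end);
        end
        inPart = '';
        tok = regexp(rest, '\(([^)]*)\)', 'tokens', 'once');
        if ~isempty(tok)
            inPart = tok{1};
        end
        found = regexp([outPart, ' ', inPart], '[A-Za-z]\w*', 'match');
        names = [names, found]; %#ok<AGROW>
        continue
    end
    % Loop variable: for k = ... / parfor k = ...
    tok = regexp(ln, '^(?:for|parfor)(?!\w)\s*\(?\s*([A-Za-z]\w*)\s*=', ...
        'tokens', 'once');
    if ~isempty(tok)
        names{end + 1} = tok{1}; %#ok<AGROW>
        continue
    end
    % Multiple outputs: [a, b] = ...
    tok = regexp(ln, '^\[([^\]]*)\]\s*=(?!=)', 'tokens', 'once');
    if ~isempty(tok)
        inner = tok{1};
        prev  = '';
        while ~strcmp(prev, inner)   % drop index expressions like a(i, j)
            prev  = inner;
            inner = regexprep(inner, '\([^()]*\)', '');
        end
        parts = regexp(inner, '[,\s]+', 'split');
        found = regexp(parts, '^[A-Za-z]\w*', 'match', 'once');
        found = found(~cellfun('isempty', found));
        names = [names, found]; %#ok<AGROW>
        continue
    end
    % Single target, optionally indexed: a = ..., a(1) = ..., a.b{2} = ...
    tok = regexp(ln, ...
        '^([A-Za-z]\w*)\s*(?:\([^()]*\)|\{[^{}]*\}|\.\w+)*\s*=(?!=)', ...
        'tokens', 'once');
    if ~isempty(tok)
        names{end + 1} = tok{1}; %#ok<AGROW>
    end
end
% Sorted, unique, without varargin/varargout; always a row cell.
names = setdiff(names, {'varargin', 'varargout'});
names = names(:).';
end
```

Because comparisons such as a == 1 and name=value arguments inside a call do not match these patterns, keywords like "if" are not reported. Everything that is created at run time stays invisible, which is exactly the divergence described above.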
Answers (1)
It is hard to parse the code exhaustively for names of variables:
- Mask strings and char vectors. This is not trivial:
'"asd"', '"asd', "'asd'", "'asd", "asd"', 'asd''', ...
- Recognize and remove comments. This includes block comments between %{ and %} as well as anything following a "..." continuation.
- Distinguish the creation of indexed variables from function calls:
f(1);
f(1) = 0;
v = f(1);
v = f ...
(1);
- Cope with eval, evalin, assignin
- If you are talking about scripts instead of functions, it is hard to tell whether sum(1:5) means the built-in function or whether another script has redefined sum as a variable before.
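To give an impression of the first two points, here is a minimal sketch of a masking step for a single line of code. The names maskLiterals and followsOperand are hypothetical, not existing MATLAB functions. The sketch blanks the contents of char vectors and strings and everything behind % or "...", and it treats a ' as transpose when it directly follows an operand. Block comments between %{ and %} and statements spread over several lines are not handled.

```matlab
function out = maskLiterals(ln)
% Blank the contents of char vectors and strings and cut comments in one
% line of code. The length of the line is preserved. Hypothetical helper.
out = ln;
n   = numel(ln);
k   = 1;
while k <= n
    c = ln(k);
    if c == '%'
        out(k:n) = ' ';                 % line comment: blank the rest
        break
    elseif k + 2 <= n && strcmp(ln(k:k + 2), '...')
        out(k + 3:n) = ' ';             % text after a continuation is a comment
        break
    elseif c == '"' || (c == '''' && ~followsOperand(ln, k))
        % Opening quote: search for the matching closing quote.
        j = k + 1;
        while j <= n
            if ln(j) == c
                if j < n && ln(j + 1) == c
                    j = j + 2;          % doubled quote is an escaped quote
                    continue
                end
                break
            end
            j = j + 1;
        end
        out(k + 1:min(j - 1, n)) = ' '; % blank the contents, keep the quotes
        k = j + 1;
    else
        k = k + 1;                      % ordinary code, or a transpose
    end
end
end

function tf = followsOperand(ln, k)
% A quote directly after a name, number, closing bracket, dot, quote or
% string is a transpose operator, not the start of a char vector.
tf = k > 1 && (isstrprop(ln(k - 1), 'alphanum') || ...
    any(ln(k - 1) == '_)]}.''"'));
end
```

For the line x = a' + '50% off'; % note this keeps the transpose after a, blanks 50% off between the quotes and blanks the trailing comment, so a following search for names does not see the words inside the literal or the comment.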
Maybe the best is to run the code and update a list of variables after each line of code:
function Out = TrackVariables(mFile, Data)
% USAGE:
% If you really want a hardcore debugging:
% 1. TrackVariables('D:\MatlabCodes\yourFcn.m')
%    This injects a DBSTOP in each line of the code, which calls the
%    function TrackVariables with the output of WHOS as 2nd input.
%    You can do this for multiple functions at the same time.
% 2. Call yourFcn() or the main routine.
%    After each line the output of WHOS is forwarded to TrackVariables and
%    the names are stored persistently. If you want, you can expand this
%    to store the sizes or types of the variables also.
% 3. Request the collected data by:
%    List = TrackVariables();
% 4. Clean up brutally:
%    dbclear all
%
% This is NOT a recommendation for using this function to control the
% quality of code, but a brute hack only. If you can identify a
% misspelled variable, it was useful.
% Advantage: It tracks even the evil dynamic creation of variables.
% Limitations: The code execution is slowed down. It tracks only branches
% of the code, which actually run, so this might remain invisible:
%   if rand < 0.001; KILLER = 17; end
%
% Use MLINT for a smart code analysis.
%
% (C) 2021, Jan, Heidelberg, License: CC BY-SA 3.0
persistent List
if isempty(List)
   List = struct();
end
switch nargin
   case 1  % Inject a dbstop in each line:
      [~, mName] = fileparts(mFile);
      Cmd = sprintf('TrackVariables(''%s'', whos)', mName);
      % Keep empty lines, otherwise k does not match the line number:
      Str = strsplit(fileread(mFile), '\n', 'CollapseDelimiters', false);
      for k = 1:numel(Str)
         if ~isempty(strtrim(Str{k}))
            dbstop('in', mName, 'at', sprintf('%d', k), 'if', Cmd)
         end
      end
      List.(mName) = {};
   case 2  % Called for collecting variables:
      List.(mFile) = unique(cat(2, List.(mFile), {Data.name}));
      Out = false;  % Do not stop the debugger
   case 0  % Flush the list:
      Out = List;
      List = [];
end
end
Call this as:
TrackVariables('YourFunc.m');
YourFunc % Or the main program
List = TrackVariables;
This does not consider whether a variable is created in subfunctions or nested functions.
I do not trust such meta-programming techniques. Exhaustive unit testing is more powerful. Most of all, avoid scripts if you need reliable code.
6 Comments
Jan
on 7 May 2021
A simple example:
% data-mat file contains variable a == 1 only
% Script file:
load data
function1();
run('script2');
b = a(1) * 3;
disp(b) % What do you get?
function function1
assignin('caller', 'a', @(x) 2)
end
% 2nd script file:
b = a(1) * 3;
Scripts and the dynamic creation of variables are a shot in the foot. There are no magic tools to parse this without running the code. So checking naming conventions in scripts is not reliable, while using functions is safe, secure and helps to write efficient code. The possibility of unit testing and of re-using established functions is too valuable to work with scripts.
Robert
on 7 May 2021
Jan
on 7 May 2021
I have a function which is more powerful than SYMVAR, but it depends on several further functions for masking strings, CHARs and comments first. It is not bullet-proof, e.g. if the indexing is separated from the variable by a "...". Distinguishing function calls from variables with a static text analysis is not reliable either. It fails to handle this correctly: a = cos(1); cos = 2. Therefore I hesitate to publish the code. Fragile methods are not sufficient to control code stability.
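The cos case can be tried with a few lines. This is only an illustration, assuming it is run as a script or at the command line and that no variable called cos exists before. The same token cos(1) is a function call in one line and an indexing operation in another, and only the state of the workspace at run time tells which one applies:

```matlab
clear cos                       % make sure no variable shadows the function
wasVar = exist('cos', 'var');   % 0: cos is not a variable yet
a = cos(1);                     % function call
cos = 2;                        % from here on, cos is a variable
isVar = exist('cos', 'var');    % 1: cos is a variable now
b = cos(1);                     % indexing: first element of the variable, 2
clear cos                       % restore the function
```

A text-based parser sees the identical characters cos(1) in both lines and has no way to tell them apart without tracking the order of execution.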
Jan
on 8 May 2021
I meant a parser that I have written as an M-function.