Is there any concept like dictionary or hash tables in matlab like in Python?

 Accepted Answer

More Answers (4)

Yes.
c = containers.Map
c('foo') = 1
c(' not a var name ') = 2
keys(c)
values(c)
struct
Works exactly the way I needed

6 Comments

Maybe struct don't scalates well. I think struct params are stored in "creation" order and not in "name" order. Maps use order in order to obtain the values faster with search algoristhm. I don't think struct does this at all.
Anyway ty for your proposal. In my case I don't really need a map and struct is enough good (and I didn't realized).
We do not know how struct is processed internally at JIT time. Constant field names for a struct level not assigned to could be optimized to constant offsets until the fieldnames were altered, and then recached. Dynamic field names we just don't know. Hypothetically matlab could create a perfect hash.
This could probably be timing tested somehow.
Timing tests, see enclosed code.
The timings for the smaller structure, s2, are notably faster than for the bigger structure. This establishes that a perfect hash is not being used: a perfect hash would have constant access time regardless of the size of the structure.
The accesses for dynamic names.
Zeroth mean in output is measuring overhead of the algorithms: the structures are passed in and values are calculated, but without any reference to the structures.
First mean in output is measuring cost of numerous mostly-different dynamic field names. WIth the s2 (smaller structure) group being measurably faster compared to, we can see that dynamic field names cost more for larger structures.
Second, third, fourth means deal with the cost of accessing constant field names that are at various offsets into the structures. Fourth (furthest in the structure) is a little slower than third (40% into the structure) or second (beginning of the structure). The differences are not as high as one might expect, and on some of my tests, third (40% through) was faster than second (first), so there is some questions about statistical significance here. Fifth (all the way through) was always slower than first, though, in my tests.
Fifth is testing constant dynamic field name. With the larger structure (first group) it is consistently faster than random field names, but still much slower than constant field names. This tells us that the implementation is doing some kind of internal caching of structure name use.
The dynamic fieldname (test 1, 5) accesses are on the order of 10 times slower than the static fieldname accesses (test 2, 3, 4). This tells us that some kind of compilation is going on. At the same time, the fact that access to the last field (test 4) is slower than to the first field (test 2) shows that it is not compiled to constant offsets.
At the moment, I do not have a model implementation in mind.
I revised my earlier code of https://www.mathworks.com/matlabcentral/answers/113258-is-there-any-concept-like-dictionary-or-hash-tables-in-matlab-like-in-python#comment_915892 to compare timing for struct and the new dictionary. I also improved the code make sure that we were comparing the same thing.
In the previous version of the code, the reason that some operations were faster turned out to be because I was not looking of the field name, but looking up the field name was taking a notable amount of time. So now I have all of the tests do the same field-name lookup -- even if they then ignore the looked-up name and use a constant name: the point is to do fair tests on how fast the datastructure is, not tests on how fast the testing function is.
In the below, "no operation" involves passing in the data structure (struct or dictionary), and pulling out the field name (as per above, to be fair timing), but then returning a constant value that does not use the data structure -- so "no operation" is testing overhead.
"1000 random" involves 1000 different random field names, a complete random access to the data structure. This would generally be expected to be the slowest operation -- however if the implementation is poor then potentially accessing the last field each time could be slower.
"first field" involves always accessing the first field or key in the data structure, using hard-coded name, which would be the best case scenario. (Again a field name is pulled out anyhow even if ignored, so that timing is fair.)
"middle field" involves pulling out a field/key in the middle -- 2000'th field out of 5000 -- using a hard-coded field name.
"last field" involves pulling out the last field/key, using a hard-coded field name. Potentially this would be the slowest operation, if the implementation does a linear search through the field names each time.
"fixed dynamic" selects the second last field of the data structure, but does so using dynamic field names rather than using hard-coded field names. Potentially MATLAB might mostly optimize away a hardcoded field access inside a loop -- but there is also the possibility that the implementation just has more overhead for dynamic names than for fixed names (as the case for table objects.)
The s2 series repeats the test but for a very small data structure. If you compare the timing between the full structure and the small structure, that gives you informaton about whether the operation is "constant time".
The series of tests is then repeated for a dictionary() with the same contents, and a small dictionary as well.
If you examine the times for dictionary first / middle / last then those are practically constant regardless of dictionary size, suggesting that a hash really is being used. On the other hand, timing for dynamic field names is measurably slower for dictionaries.
If you examine the timings for structure vs dictionary, you will see that even for larger structure, that the fixed-field access is several times faster than dictionary access. But dynamic field access for struct is slower than random field access for dictionaries.
Since timing for access to small structure is faster than access to large structures, we can predict that if the structure had enough fields (maybe 15000-ish??) then even with hard-coded field access, eventually dictionary lookup would start to be faster -- but that certainly in the 5000-ish field range and smaller, that if you have hard-coded field names then use struct rather than dictionary, since the constant-time overhead for dictionary is comparatively high.
And if you are doing dynamic field name access, then even with small data structures, dictionary can be faster, at least if you are going to repeat the operation enough times.
time_struct_and_dict();
no operation: mean = 0.00004839, med = 0.00004435, std = 0.00001725 1000 random: mean = 0.00081360, med = 0.00080943, std = 0.00001581 first field: mean = 0.00013630, med = 0.00013404, std = 0.00000560 middle field: mean = 0.00014403, med = 0.00014154, std = 0.00000650 last field: mean = 0.00014731, med = 0.00014519, std = 0.00000626 fixed dynamic: mean = 0.00071298, med = 0.00071093, std = 0.00000808 s2 no operation: mean = 0.00004321, med = 0.00004313, std = 0.00000035 s2 1000 random: mean = 0.00064086, med = 0.00064068, std = 0.00000578 s2 first field: mean = 0.00007192, med = 0.00007205, std = 0.00000050 s2 middle field: mean = 0.00007010, med = 0.00007005, std = 0.00000044 s2 last field: mean = 0.00006793, med = 0.00006785, std = 0.00000062 s2 fixed dynamic: mean = 0.00064101, med = 0.00064093, std = 0.00000443 dictionary no operation: mean = 0.00005415, med = 0.00004619, std = 0.00001498 dictionary 1000 random: mean = 0.00049794, med = 0.00050577, std = 0.00015563 dictionary first field: mean = 0.00028254, med = 0.00027996, std = 0.00000575 dictionary middle field: mean = 0.00028808, med = 0.00028746, std = 0.00000601 dictionary last field: mean = 0.00028021, med = 0.00027921, std = 0.00000599 dictionary fixed dynamic: mean = 0.00036190, med = 0.00031971, std = 0.00013159 dictionary2 no operation: mean = 0.00004485, med = 0.00004485, std = 0.00000028 dictionary2 1000 random: mean = 0.00029856, med = 0.00029809, std = 0.00000323 dictionary2 first field: mean = 0.00030205, med = 0.00028334, std = 0.00008003 dictionary2 middle field: mean = 0.00027994, med = 0.00027946, std = 0.00000361 dictionary2 last field: mean = 0.00028110, med = 0.00027959, std = 0.00000491 dictionary2 fixed dynamic: mean = 0.00028691, med = 0.00028546, std = 0.00000755
time_struct_dict_table();
I added support for timing table() accesses. This is for the case where each table variable contains a single scalar -- not the best use of table() but it is interesting to compare performance.
table() is enough slower that I cannot execute here on Answers as it exceeds the time limit. The results on my desktop are:
no operation: mean = 0.00007099, med = 0.00006009, std = 0.00003329
1000 random: mean = 0.00122110, med = 0.00123992, std = 0.00010826
first field: mean = 0.00024605, med = 0.00022060, std = 0.00006604
middle field: mean = 0.00035495, med = 0.00033114, std = 0.00015423
last field: mean = 0.00027000, med = 0.00024307, std = 0.00008898
fixed dynamic: mean = 0.00143618, med = 0.00148110, std = 0.00027858
s2 no operation: mean = 0.00011520, med = 0.00011546, std = 0.00000465
s2 1000 random: mean = 0.00146808, med = 0.00140250, std = 0.00024633
s2 first field: mean = 0.00023687, med = 0.00021517, std = 0.00005892
s2 middle field: mean = 0.00012845, med = 0.00009055, std = 0.00005018
s2 last field: mean = 0.00012263, med = 0.00010719, std = 0.00004112
s2 fixed dynamic: mean = 0.00091966, med = 0.00091064, std = 0.00010529
dictionary no operation: mean = 0.00006795, med = 0.00005281, std = 0.00002384
dictionary 1000 random: mean = 0.00053217, med = 0.00052016, std = 0.00004246
dictionary first field: mean = 0.00046640, med = 0.00046126, std = 0.00002666
dictionary middle field: mean = 0.00044165, med = 0.00043643, std = 0.00001592
dictionary last field: mean = 0.00043974, med = 0.00043391, std = 0.00001467
dictionary fixed dynamic: mean = 0.00050596, med = 0.00048529, std = 0.00004669
dictionary2 no operation: mean = 0.00005400, med = 0.00005221, std = 0.00000597
dictionary2 1000 random: mean = 0.00045773, med = 0.00044854, std = 0.00002464
dictionary2 first field: mean = 0.00050153, med = 0.00049391, std = 0.00005518
dictionary2 middle field: mean = 0.00044765, med = 0.00043619, std = 0.00002839
dictionary2 last field: mean = 0.00046707, med = 0.00043328, std = 0.00007996
dictionary2 fixed dynamic: mean = 0.00045607, med = 0.00045069, std = 0.00002115
table no operation: mean = 0.00005341, med = 0.00005138, std = 0.00000502
table 1000 random: mean = 0.09879826, med = 0.09191914, std = 0.02014155
table first field: mean = 0.07515565, med = 0.06741315, std = 0.02106712
table middle field: mean = 0.08363042, med = 0.06921861, std = 0.03160464
table last field: mean = 0.06500776, med = 0.05959190, std = 0.01808197
table fixed dynamic: mean = 0.08840865, med = 0.09286350, std = 0.00894378
table2 no operation: mean = 0.00005393, med = 0.00005415, std = 0.00000115
table2 1000 random: mean = 0.03217866, med = 0.03195556, std = 0.00051499
table2 first field: mean = 0.01194878, med = 0.01183239, std = 0.00052255
table2 middle field: mean = 0.01216132, med = 0.01192055, std = 0.00058499
table2 last field: mean = 0.01306926, med = 0.01159898, std = 0.00409964
table2 fixed dynamic: mean = 0.03837043, med = 0.03305997, std = 0.01260309
Those table() results are pretty poor compared to struct or dictionary! 75 to 175 times slower !
It appears that in the meantime, speeds have pretty much doubled.
Unless, that is, I happened to execute the code before on my older iMac ?
R2024b Pre-release timing on my intel iMac:
no operation: mean = 0.00003648, med = 0.00003586, std = 0.00000213
1000 random: mean = 0.00053238, med = 0.00053929, std = 0.00001971
first field: mean = 0.00012308, med = 0.00012347, std = 0.00000589
middle field: mean = 0.00012308, med = 0.00012172, std = 0.00000508
last field: mean = 0.00012482, med = 0.00012419, std = 0.00000332
fixed dynamic: mean = 0.00043523, med = 0.00043637, std = 0.00000451
s2 no operation: mean = 0.00003281, med = 0.00003290, std = 0.00000043
s2 1000 random: mean = 0.00038221, med = 0.00038425, std = 0.00001155
s2 first field: mean = 0.00006217, med = 0.00006207, std = 0.00000052
s2 middle field: mean = 0.00005924, med = 0.00005918, std = 0.00000085
s2 last field: mean = 0.00005553, med = 0.00005549, std = 0.00000101
s2 fixed dynamic: mean = 0.00037049, med = 0.00036893, std = 0.00000855
dictionary no operation: mean = 0.00003666, med = 0.00003632, std = 0.00000189
dictionary 1000 random: mean = 0.00056722, med = 0.00055974, std = 0.00001518
dictionary first field: mean = 0.00052580, med = 0.00051858, std = 0.00001562
dictionary middle field: mean = 0.00052420, med = 0.00051482, std = 0.00001838
dictionary last field: mean = 0.00050794, med = 0.00050233, std = 0.00001180
dictionary fixed dynamic: mean = 0.00057878, med = 0.00057393, std = 0.00001463
dictionary2 no operation: mean = 0.00003649, med = 0.00003652, std = 0.00000030
dictionary2 1000 random: mean = 0.00051898, med = 0.00051190, std = 0.00001326
dictionary2 first field: mean = 0.00055092, med = 0.00053418, std = 0.00008305
dictionary2 middle field: mean = 0.00051139, med = 0.00051072, std = 0.00000163
dictionary2 last field: mean = 0.00052065, med = 0.00051705, std = 0.00001230
dictionary2 fixed dynamic: mean = 0.00052063, med = 0.00051347, std = 0.00001601
table no operation: mean = 0.00003769, med = 0.00003737, std = 0.00000212
table 1000 random: mean = 0.04622849, med = 0.04621893, std = 0.00149274
table first field: mean = 0.04062485, med = 0.04034006, std = 0.00122831
table middle field: mean = 0.04080905, med = 0.04076088, std = 0.00183228
table last field: mean = 0.03929389, med = 0.03886761, std = 0.00162468
table fixed dynamic: mean = 0.04706765, med = 0.04702064, std = 0.00166777
table2 no operation: mean = 0.00003161, med = 0.00003153, std = 0.00000066
table2 1000 random: mean = 0.00463795, med = 0.00463153, std = 0.00006766
table2 first field: mean = 0.00166517, med = 0.00165833, std = 0.00002508
table2 middle field: mean = 0.00161921, med = 0.00162277, std = 0.00003672
table2 last field: mean = 0.00167462, med = 0.00167515, std = 0.00002166
table2 fixed dynamic: mean = 0.00462260, med = 0.00460212, std = 0.00009719

Sign in to comment.

struct seems to produce a much nicer text output than container.Map:
% Example with struct
settings = struct();
settings.open_loop = false;
settings.adaptive = true;
settings.estimator = 'RLSFF';
if settings.open_loop
do something...
end
>> disp(settings)
open_loop: 0
adaptive: 1
estimator: 'RLSFF'
>> settings
settings =
struct with fields:
open_loop: 0
adaptive: 1
estimator: 'RLSFF'
% Example with Map
settings2 = containers.Map;
settings2('open_loop') = false;
settings2('adaptive') = true;
settings2('estimator') = 'RLSFF';
if settings2('open_loop')
do something...
end
>> settings2
settings2 =
Map with properties:
Count: 3
KeyType: char
ValueType: any
>> disp(settings)
open_loop: 0
adaptive: 1
estimator: 'RLSFF'
Although they look identical when returning at the command line.
But is there a literal representation for a struct? So you can define it in code more concisely, something like this:
settings = struct(
'open_loop': false,
'adaptive': true,
'estimator': 'RLSFF'
);
(The above is not valid of course). Or is there some other way of doing this in a readable convenient way?

1 Comment

Your example doesn't work. Unfortunately, you need to add the line continuation syntax:
settings = struct(...
'open_loop', false, ...
'adaptive', true, ...
'estimator', 'RLSFF' ...
);

Sign in to comment.

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!