Function ecdf breaks down for large datasets
Hi,
I have a very large vector x (around 130 million elements). When I try to compute the empirical cumulative distribution function of its values with MATLAB's "ecdf(x)", the function breaks down: the plot shows the ECDF only for the smaller values of x, and nothing at all for the larger values. When I run ecdf on only part of the vector (say 10 million elements), the results seem OK. Does anyone know what could be wrong with the ecdf function that makes it break down in this way for very large datasets?
Thank you very much for your help.
Martin
Answers (1)
Mathieu Boutin
on 8 Sep 2011
Hi Martin. You could try this homemade function and see if it works for your data:
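%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [v_f,v_x] = homemade_ecdf(v_data)
% HOMEMADE_ECDF Empirical CDF of a numeric vector (assumes no NaN values).
% v_x holds the distinct data values (the smallest one repeated first);
% v_f holds the ECDF at those points, starting from 0.
v_data = v_data(:).';   % force a row vector so the concatenations below work
nb_data = numel(v_data);
v_sorted_data = sort(v_data);
% Index of the last occurrence of each distinct value in the sorted data.
% That index equals the number of samples <= the value, so the original
% loop over every distinct value (far too slow for 130 million elements)
% is not needed.
[v_unique_data,v_last_index] = unique(v_sorted_data,'last');
v_data_ecdf = v_last_index(:).'/nb_data;
v_x = [v_unique_data(1) v_unique_data];
v_f = [0 v_data_ecdf];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%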
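As a quick sanity check of what an ECDF routine should return, here is a small worked example. It is only a sketch: the vector x and the variable names are made up for illustration, and it computes the ECDF directly from the sorted data rather than calling any particular function.

```matlab
% Toy data with one repeated value (hypothetical, for illustration only).
x = [3 1 2 2 5];
n = numel(x);

% Sort, then take the last position of each distinct value in the sorted
% vector. That position is the number of samples <= the value.
xs = sort(x);
[ux, last_idx] = unique(xs, 'last');

% ECDF at each distinct value: fraction of samples <= that value.
f = last_idx(:).' / n;

% ux is [1 2 3 5] and f is [0.2 0.6 0.8 1.0]
disp([ux; f]);
```

Comparing the output of any ECDF routine against a hand-checkable case like this, before running it on the full 130 million elements, can help tell a problem in the routine apart from a problem in the data.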