File Exchange

image thumbnail

Uniform Manifold Approximation and Projection (UMAP)

version 1.3.1 (299 KB) by Stephen Meehan
An algorithm for manifold learning and dimension reduction.

82 Downloads

Updated 09 Oct 2019

View License

The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville. See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf). Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).

Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration.

This MATLAB implementation follows a very similar structure to the Python implementation, and many of the function descriptions are nearly identical.

Here are some major differences in this MATLAB implementation:
1) All nearest-neighbour searches are performed through the built-in MATLAB function knnsearch.m. The original Python implementation uses random projection trees and nearest-neighbour descent to approximate nearest neighbours of data points. The function knnsearch.m either uses an exhaustive approach or k-d trees, both of which are slow for high-dimensional data. As such, this implementation is slower in the case of large, high-dimensional data sets.

2) The MATLAB function eigs.m does not appear to be as fast as the function "eigsh" in the Python package Scipy. For large data sets, we initialize a low-dimensional transform by binning the data using an algorithm known as probability binning. If the user downloads and installs the function lobpcg.m, made available here (https://www.mathworks.com/matlabcentral/fileexchange/48-locally-optimal-block-preconditioned-conjugate-gradient) by Andrew Knyazev, this can be used to find exact eigenvectors for medium-sized data sets.

3) We have built in the optional ability to detect clusters in the low-dimensional output of UMAP. For users with MATLAB R2019a or later, DBSCAN can be used to produce cluster IDs as explained in the code examples.

Overall, this MATLAB UMAP implementation is slower than the the original Python implementation, which uses Numba to accelerate the calculations. In our example with 136,000 data points and 10 dimensions, our implementation is about 40% slower than the Python implementation.

This implementation is considered a first draft. It has been looked over by Leland McInnes, who considers it "a fairly faithful direct translation of the original Python code (except for the nearest neighbour search)". We hope to continue improving it in the future.

Provided by the Herzenberg Lab at Stanford University.

Cite As

Connor Meehan, Stephen Meehan, and Wayne Moore (2019). Uniform Manifold Approximation and Projection (UMAP) (https://www.mathworks.com/matlabcentral/fileexchange/71902), MATLAB Central File Exchange.

Comments and Ratings (14)

This code is fantastic. Thanks for putting it together. I use it daily.

One error that I've encountered though is in function "smooth_knn_dist" around line 81, reproduced below.

rho = aug_dists(idx) + interpolation*(aug_dists(idx) - aug_dists(idx+height));

Sometimes "idx+height" is out of bounds of "aug_dists". Since "idx" itself is defined to go up to numel(aug_dists), this makes sense that it could go over when added to. I just put in a corrective factor shown below and it seems to work. At the edge case, it interpolates one column inward, rather than outward.

correction = zeros(size(idx));
correction(idx+height>numel(aug_dists)) = -height;
rho = aug_dists(idx) + interpolation*(aug_dists(idx+correction) - aug_dists(idx+height+correction));

jiaxin

错误使用 -
矩阵维度必须一致。

出错 smooth_knn_dist (line 84)
d = distances(:,2:end) - rho;

出错 fuzzy_simplicial_set (line 108)
[sigmas, rhos] = smooth_knn_dist(knn_dists, n_neighbors, local_connectivity);

出错 UMAP/fit (line 420)
U.graph = fuzzy_simplicial_set(X, U.n_neighbors, randomState, U.metric,
'metric_kwds', U.metric_kwds,...

出错 UMAP/fit_transform (line 486)
U = fit(U, X, y);

出错 run_umap (line 495)
reduction = umap.fit_transform(inData);

jiaxin

Thanks for the code, it's been very useful! However, I have tried to reduce the model to a 3-dimensional system, but I come up with this error:

Error using UMAP/validate_parameters (line 303)
The Java and C methods currently only support reducing to 2 dimensions

Error in UMAP/fit (line 358)
validate_parameters(U);

Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);

Error in run_umap (line 441)
reduction = umap.fit_transform(inData);

When is this option going to be available?

John

Hi ageorge and Rasmus,

We've looked into the error that you are both receiving. We realized that one of the MATLAB functions that we call, fit.m, actually requires the MATLAB Curve Fitting Toolbox (https://www.mathworks.com/products/curvefitting.html) and we mistakenly did not list this requirement on the download page. If you do not have the Curve Fitting Toolbox installed, this would explain the errors that you are receiving. We have now listed this requirement on the download page.

As a workaround for MATLAB users who do not have the Curve Fitting Toolbox, we have now hard-coded in values for the outputs of find_ab_params.m when the inputs have particular default inputs. In particular, all the examples in the documentation of run_umap.m should now run in the current version 1.2.1 without any problems for users without the Curve Fitting Toolbox.

Rasmus Bro

Hi there

I am really interested in trying this, but I am also running into problems. I tried your updated version here and the one at your homepage. I get the following error (on matlab 2019a)

[reduction,umap] = run_umap(rand(10,100));

ans =

20

ans =

20

java.awt.Point[x=793,y=53] java.awt.Dimension[width=1146,height=1006]
DUDE [UMAP for 10x100
n\_neighbors=\color{blue}30\color{black}, min\_dist=\color{blue}0.3\color{black}, metric=\color{blue}euclidean\color{black},randomize=\color{blue}0\color{black}, labels=\color{blue}0
Undefined function 'fit' for input arguments of type 'function_handle'.

Error in find_ab_params (line 43)
f = fit(xv', yv', curve);

Error in UMAP/fit (line 352)
[U.a, U.b] = find_ab_params(U.spread, U.min_dist);

Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);

Error in run_umap (line 441)
reduction = umap.fit_transform(inData);

Thank you so much for the code. I wonder whether your set of codes includes re-embedding of new data to old embedding without modifying the old embeddings. Is init_transform relevant to that purpose?

One question: unless you change input parsing, it seems changing the 'n_epochs' are quite inflexible (I changed by myself). Like n_neighbor, for example, it would be great to have it as a free parameter.

ageorge

I'm getting the following error when run:

Undefined function 'fit' for input arguments of type 'function_handle'.

Error in find_ab_params (line 43)
f = fit(xv', yv', curve);

Error in UMAP/fit (line 352)
[U.a, U.b] = find_ab_params(U.spread, U.min_dist);

Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);

Error in run_umap (line 441)
reduction = umap.fit_transform(inData);

Lucy Davis

Hi Joanna,
Sorry for the delayed response to your issue. We have just uploaded a major update (version 1.2.0) that may resolve the issue, so try downloading the latest version and seeing if it is fixed! What was previously line 273 in version 1.1.0 should now be line 391 in 1.2.0.
If you are still receiving an error, would you mind sending us the full text of the exception so that we can investigate it further? We are having trouble reproducing the error. You can e-mail it to us at swmeehan@stanford.edu or connor.meehan@shaw.ca.
If you require a temporary workaround, we recommend downloading our UMAP distribution directly from our Web site at http://cgworkspace.cytogenie.org/GetDown2/demo/umapDistribution.zip. We are able to include some additional code in this distribution that does not meet File Exchange criteria. If an exception occurs with this version, it will switch to running the algorithm in C instead.

I got the same error as Damon. Line 273: nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;
How to deal with it?
Thanks, Joanna

How much slower is it than the python implementation?

Damon Clark

Thanks for putting this together! Line 273 in run_umap throws an error for me -- it looks like it may be calling some local variable. I believe I had everything in the path correctly.

nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;

Updates

1.3.1

-Fixed a GUI bug that would occur for users with MATLAB R2018b or earlier

1.3.0

-Data can now be reduced to any number of dimensions by changing the 'n_components' parameter; if reducing to more than 2 dimensions, a 3D plot is shown
-DBSCAN can be used to cluster UMAP output
-The 'n_epochs' parameter can now be manually changed

1.2.1

-Added precomputed parameter values for users without the Curve Fitting Toolbox
-Fixed an issue when using transform() on new data sets of same size of previous embedding and improved adjacency matrix for transform()
-Improved progress bars

1.2.0

-Added 2 examples (run_umap.m) showing how to perform supervised dimension reduction with UMAP
-Improved labelling of plots; for supervised UMAP, the plot includes a legend with labels from the categorical data
-Explained proper MATLAB path settings

MATLAB Release Compatibility
Created with R2019b
Compatible with R2017a to R2019b
Platform Compatibility
Windows macOS Linux