Start with HeatGeo

The Heat-Geodesic embedding preserves the heat-geodesic dissimilarity, defined as \[ d_t(x_i,x_j) = \bigg[ -4t \log (\mathbf{H}_t)_{ij} - \sigma 4 t \log(\mathbf{V})_{ij} \bigg] ^{1/2}, \] where \(\mathbf{H}_t\) is a heat kernel on a graph, \(\mathbf{V}\) is a volume regularization term, and \(\sigma\) is the weight of that term. This dissimilarity is inspired by Varadhan’s formula, which relates the heat kernel to the geodesic distance on a manifold. For more details on the heat-geodesic dissimilarity, read our preprint A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction.
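To make the formula concrete, here is a minimal NumPy sketch that evaluates it on a tiny path graph. It is an illustration under stated assumptions, not the package's implementation: the helper names `heat_kernel` and `heat_geodesic_dissimilarity` are hypothetical, the heat kernel is taken to be \(\exp(-t\mathbf{L})\) for the combinatorial Laplacian and computed by a dense eigendecomposition, and \(\mathbf{V}\) is left as an optional user-supplied matrix (with \(\sigma = 0\) the volume term drops out). For reference, Varadhan’s formula states that \(\lim_{t \to 0} -4t \log h_t(x,y) = d(x,y)^2\) for the heat kernel \(h_t\) and geodesic distance \(d\) on a manifold.

```python
import numpy as np


def heat_kernel(adjacency, t):
    """Dense heat kernel exp(-t L), assuming the combinatorial Laplacian L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # U diag(exp(-t * lambda)) U^T
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T


def heat_geodesic_dissimilarity(H, t, V=None, sigma=0.0, eps=1e-12):
    """Evaluate [-4t log H_ij - sigma 4t log V_ij]^(1/2) entrywise.

    V is an optional volume matrix supplied by the caller (an assumption of
    this sketch). Entries are floored at eps before the log, and the squared
    value is clipped at zero before the square root.
    """
    squared = -4.0 * t * np.log(np.maximum(H, eps))
    if V is not None and sigma != 0.0:
        squared = squared - sigma * 4.0 * t * np.log(np.maximum(V, eps))
    return np.sqrt(np.maximum(squared, 0.0))


# Path graph on three nodes: 0 - 1 - 2
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = heat_kernel(A, t=1.0)
D = heat_geodesic_dissimilarity(H, t=1.0)
```

In this toy graph, the dissimilarity between the two end nodes is larger than between neighbouring nodes, as expected from the geodesic interpretation. Without the volume term the diagonal of `D` is not zero, which is one reason the quantity is called a dissimilarity rather than a distance.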

Note

We are currently updating this repository to provide examples and improve the documentation.

Install

The package is available on PyPI. You can install it by running

pip install heatgeo

To reproduce the results in experiments/ or to try the embeddings with different graph constructions, you need additional packages, which are installed with the development version. In this case, run

pip install "heatgeo[dev]"

We provide an example below.

How to use

To create the embedding of a dataset data, run

from heatgeo.embedding import HeatGeo
emb_op = HeatGeo(knn=5)
emb = emb_op.fit_transform(data)
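As a hedged end-to-end sketch, the snippet below builds a small swiss roll with NumPy to play the role of data and then applies the call shown above. The sampling recipe and variable names are illustrative assumptions, not part of the package; only HeatGeo(knn=5) and fit_transform come from this page. The import is guarded so the data generation still runs if heatgeo is not installed.

```python
import numpy as np

# Illustrative toy data: a swiss roll sampled with NumPy (not part of heatgeo).
rng = np.random.default_rng(0)
n_samples = 500
angle = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n_samples))
height = 21.0 * rng.random(n_samples)
data = np.column_stack([angle * np.cos(angle), height, angle * np.sin(angle)])

# Guarded import: skip the embedding step if the package is missing.
try:
    from heatgeo.embedding import HeatGeo
except ImportError:
    HeatGeo = None

emb = None
if HeatGeo is not None:
    emb_op = HeatGeo(knn=5)  # same call as in the snippet above
    emb = emb_op.fit_transform(data)
```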

We provide a Google Colab example on the Swiss roll.

The directory experiments contains code to reproduce our main results. We used hydra, so the parameters can be changed in config or directly in the CLI. In notebooks, we provide examples on toy datasets.

Contributing

We are using nbdev for this package and the documentation. See this introduction to start using nbdev. The code and documentation should be modified in the notebooks in nbs/; then run nbdev_prepare before a commit. This command exports the notebooks to .py files in heatgeo, cleans the metadata, and runs some tests. The page will then be deployed automatically through GitHub Actions.

Acknowledgements

This repository is a simplified version of a larger codebase used for development. It loses the original commit history, which contains contributions from other authors of the paper. This repository uses or modifies code from the PHATE implementation and from the Chebyshev polynomials implementation of the paper Fast Multiscale Diffusion on Graphs.