Elyas Heidari
I build machine-learning systems for spatial biology, mostly graph models that read a whole tissue at once, at the scale of tens of millions of transcripts.
My main project, Segger, takes cell segmentation that used to be too slow to attempt on large spatial datasets and runs it in about ten minutes. I was also co-first author on a tumour-genomics study in Nature Biomedical Engineering.
PhD researcher, Stegle and Gerstung labs · DKFZ and EMBL, Heidelberg · finishing 2026
About
I'm from Iran. I studied computer engineering and applied mathematics at Sharif, in Tehran, and spent the years since moving across European labs (EMBL, ETH Zurich, the University of Zurich, and Cambridge) before ending up where I am now, a PhD between DKFZ and EMBL in Heidelberg.
Most of my work lives on graphs. Tissue has structure everywhere you look, and a graph is an honest way to write that structure down, so I reach for graph neural networks for almost everything. The last few years have gone into taking analyses that were slow or thought impossible and getting them to run in minutes, cell segmentation being the latest.
I have a confession that won't surprise anyone who does this work: the most scientific parts of my papers were usually the fastest to write, and the rest of the time I was keeping broken code alive. I've stopped treating that as the boring part, because the engineering is what turns a nice idea into something a biologist can actually use on a Monday morning, and I care about a clean repository about as much as I care about the loss curve coming down. The work I like best happens when a wet-lab biologist and a couple of people who think in code get properly stuck on the same problem. I'm finishing the PhD in 2026 and looking for where to do this next, somewhere that takes both the biology and the systems seriously. I work in English, German, and Persian.
How I work
Most of biology happens in place.
Where a cell sits, and which cells it sits next to, can matter as much as what type it is, and much of that spatial context is thrown away the moment you grind a tissue up to sequence it. A lot of what I do is try to read it back: to recover the structure of a whole tissue and connect it to what the tissue is doing, in a tumour or a demyelinating lesion.
That pushes me toward models that keep the structure rather than flatten it into a table of numbers. In practice that means graph neural networks for the bulk of it, with transformers and generative pieces where they fit, written in PyTorch and increasingly JAX, plus a lot of unglamorous work to make them run across many GPUs once a dataset stops fitting in memory. I don't consider a method finished until someone I've never met can run it on their own data without emailing me.
I also think single-cell benchmarking is quietly broken. Most benchmarks are built once and never extended, so the moment a new method or dataset arrives they stop measuring anything useful, and the field goes on citing them anyway. I co-authored a paper making that case.
Selected work
A few things I've built. Segger and SageNet have taken most of my recent attention; the last two are older but seeded much of it.
Segger
Cell segmentation as a graph problem. Thirty million transcripts in about ten minutes.
Segger started as a complaint. Cell segmentation in imaging-based spatial transcriptomics was slow, inaccurate, or both, and every tool I tried buckled once a dataset got genuinely large. So I rebuilt it as one big heterogeneous graph: transcripts and cells are the nodes, deciding which cell a transcript belongs to becomes a link-prediction problem, and the cell outlines from nuclear staining supply the labels; it can also fold in a matched single-cell reference to sharpen the calls. It runs across many GPUs and segments thirty million transcripts in about ten minutes, roughly a thousand times faster than the tools before it, without giving up accuracy. It's under revision at Nature Methods, and it's the work I'm proudest of, because people now run it on datasets they'd written off.
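A toy sketch of the link-prediction framing described above, not Segger's actual code. It assumes a handful of 2D points, circular stand-ins for the nuclear outlines, and a hand-written distance score in place of a trained graph neural network; every name in it (`candidate_edges`, `edge_label`, `edge_score`, `NUCLEUS_RADIUS`) is hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: nucleus centroids stand in for cells, and
# transcripts are scattered around them. Real inputs would be far larger.
cells = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
NUCLEUS_RADIUS = 1.5  # assumption: circular nuclei instead of stained outlines
transcripts = [
    (cx + random.gauss(0, 2.0), cy + random.gauss(0, 2.0))
    for cx, cy in cells
    for _ in range(50)
]

def candidate_edges(transcripts, cells, k=2):
    """Connect every transcript to its k nearest cells (bipartite edges)."""
    edges = []
    for t, txy in enumerate(transcripts):
        by_distance = sorted(range(len(cells)), key=lambda c: math.dist(txy, cells[c]))
        edges.extend((t, c) for c in by_distance[:k])
    return edges

def edge_label(edge):
    """Training label: positive if the transcript lies inside that cell's nucleus."""
    t, c = edge
    return math.dist(transcripts[t], cells[c]) <= NUCLEUS_RADIUS

def edge_score(edge, scale=2.0):
    """Stand-in for a trained link predictor: closer pairs score higher."""
    t, c = edge
    return math.exp(-math.dist(transcripts[t], cells[c]) / scale)

edges = candidate_edges(transcripts, cells)
labels = [edge_label(e) for e in edges]

# Segmentation step: each transcript goes to its best-scoring candidate cell.
assignment = {}
for t, c in edges:
    if t not in assignment or edge_score((t, c)) > edge_score((t, assignment[t])):
        assignment[t] = c
```

The point of the sketch is the shape of the problem: nodes of two types, candidate edges between them, labels that come from the nuclei, and a score per edge that decides the assignment. In a real model the score would be learned from the graph rather than read off a distance.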
SageNet
Putting dissociated cells back where they came from.
When you dissociate a tissue for single-cell sequencing, the first thing you lose is where each cell used to sit. SageNet trains on a spatial reference and predicts that lost position, and its trick is to build the graph over a gene-interaction network estimated from the reference, so it is genes, not just cells, that carry the spatial signal. We demonstrated it by reconstructing the spatial layout of the mouse embryo during gastrulation from seqFISH data, where it beat the standard tools of the time, Tangram and novoSpaRc. It was my master's thesis, done in John Marioni's lab at EMBL-EBI in Cambridge while I was enrolled at ETH Zurich, and it later won the ETH Medal, ETH Zurich's award for an outstanding thesis.
Before those two: scPotter (formerly scGCN), an early attempt at graph-convolutional learning over gene regulatory networks for cell annotation, from back when GNNs for single cells were barely a thing; and MUVis, an R package from my Sharif years for modelling dependencies in population-scale, mixed-type data. Both are older than my current taste, but the instinct behind them is the one I still work by.
Tehran to Heidelberg
DKFZ & EMBL Heidelberg
PhD researcher, Stegle and Gerstung labs
Shared between Oliver Stegle's and Moritz Gerstung's groups, on structured representation learning for large-scale spatial omics. Segger came out of this, along with most of what I now know about keeping a distributed, multi-GPU system honest once a single dataset stops fitting in memory. I contribute to the scverse ecosystem, mainly through SpatialData, and I led projects at the scverse × Owkin and SpaceHack hackathons.
ETH Zurich · UZH · EMBL-EBI Cambridge
MSc Computational Biology · RA, Robinson lab · thesis in the Marioni lab
Three overlapping things that make one story. A master's in computational biology at ETH Zurich (5.76/6.0, top three in the cohort); alongside it, a research assistant post in Mark Robinson's lab in Zurich building single-cell pipelines for quality control, integration, and cell typing, which is where I learned to write software other people depend on; and a final year on a competitive fellowship in John Marioni's lab at EMBL-EBI in Cambridge, where SageNet came out.
EMBL Heidelberg
Research trainee, Huber group (BSc thesis)
A summer fellowship in Wolfgang Huber's group, and my first real taste of graph-based representation learning for single-cell data. I spent it on random-walk and probabilistic graphical models, and on finding where they stop scaling. It was enough to decide the rest of this for me.
Sharif University of Technology, Tehran
BSc Computer Engineering & Applied Mathematics
My undergrad years, and where a lot of this started. I was head TA for advanced programming and for probability, founded Sharif DataDays, and wrote MUVis on the side. Iranian olympiad culture taught me to treat hard problems as the normal state of things, for better and for worse.
Selected publications
Six I usually point to first. My name is in bold and an asterisk marks equal contribution; the full list is on Google Scholar.
Segger: Fast and accurate cell segmentation of imaging-based spatial transcriptomics data
Heidari, E.*, Moorman, A.*, Unyi, D., et al.
bioRxiv 2025 · Under revision at Nature Methods · [docs] · [code]
Breinig, M.*, Lomakin, A.*, Heidari, E.*, et al.
Nature Biomedical Engineering 2025
SpatialData: an open and universal data framework for spatial omics
Marconato, L.*, Palla, G.*, Yamauchi, K. A.*, Virshup, I.*, Heidari, E., et al.
Nature Methods 22(1):58–62 2025
snRNA-seq stratifies multiple sclerosis patients into distinct white matter glial responses
Macnair, W., Calini, D., Agirre, E., Heidari, E., et al.
Neuron 113(3):396–410.e9 2025
Supervised spatial inference of dissociated single-cell data with SageNet
Heidari, E., Lohoff, T., Tyser, R. C. V., Marioni, J. C., Robinson, M. D., Ghazanfar, S.
bioRxiv 2022 · Master’s thesis; outperformed Tangram and novoSpaRc.
Sonrel, A., Luetge, A., Soneson, C., Mallona, I., Germain, P. L., Heidari, E., et al.
Genome Biology 24(1):119 2023
Writing
Occasional notes on how the work really goes, the science and the plumbing both.
2026-02-05
Why Picasso Made 147,000 Things (And Why You Should Too)
The exploration-exploitation trade-off in creative work: why volume is the only variable you control in the pursuit of masterpieces.
2026-02-03
PhD Researcher & Research Engineer: Navigating the AI x Biology Frontier
Reflections on the hybrid identity of a PhD researcher and research engineer, and the pursuit of balance in the high-stakes intersection of AI and Biology.
2026-02-02
Bioinformaticians' Tale: From Pipeline Plumber to Architects of Agentic Bot-Labs
A 2026 hot-take on the future of bioinformatics: as AI agents orchestrate entire experimental loops, the role of the scientist shifts from pipeline builder to architect of discovery.
Contact
If you work on machine learning for biology, or you have a spatial dataset that keeps breaking the tools you throw at it, I'd like to hear about it. Email is the surest way to reach me, and most of my code is on GitHub.