Accelerating cures for 10,000 rare genetic diseases
DNA to the power of AI
Advancing the cure
Our AI platform pinpoints the single faulty DNA token behind each rare genetic disease—radically accelerating the development of cures for millions of people.
Today, developing treatments for ~10,000 rare genetic diseases costs billions of dollars, takes ~7 years, and requires tens of thousands of genetic samples—putting cures beyond reach.
Now that the complete 3-billion-token human genome has been mapped (2022) and the FDA has approved the first CRISPR-based gene-editing treatment (2023), we can target the single token that causes a disease, provided we know its exact location.
Targeting and editing that single DNA token among 3 billion is the key to a cure. Ecotone’s AI platform helps locate the precise disease-causing token, slashing the time and cost of bringing cures to these patients.
Built on original research
We build AI foundational models from the ground up using genetics-first principles.
Acceleration
Ecotone’s foundational model, dnaSORA, slashes the cost, time, and genetic material needed to develop rare disease treatments while eliminating costly false positives.

Efficiency
We train on synthetic data and use transfer learning to carry that knowledge over to real genomes, enabling breakthroughs with just dozens of patient samples instead of tens of thousands and making cures for the rarest diseases achievable.
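To make the synthetic-data idea concrete, here is a toy Python sketch. It assumes, purely for illustration, that a genome "point cloud" can be modeled as (position, signal) pairs along one chromosome, with a Gaussian-like bump centered on a hidden causal position. The names `synthetic_point_cloud` and `make_training_set`, the 20 Mb chromosome length, the bump width, and the noise level are all hypothetical; this is not Ecotone's generator. What it shows is that every synthetic example carries its label by construction, so training data is limited only by compute.

```python
import math
import random

# Hypothetical toy model of a genome "point cloud": each point is
# (position in megabases, signal). Not Ecotone's actual data format.


def synthetic_point_cloud(n_points, chrom_length_mb, causal_mb, width_mb, rng):
    """Return n_points (position_mb, signal) pairs whose signal forms a
    Gaussian-like bump centred on the hidden causal position."""
    points = []
    for _ in range(n_points):
        pos = rng.uniform(0.0, chrom_length_mb)
        # Bump height is 1.0 at the causal position and decays with distance.
        bump = math.exp(-((pos - causal_mb) ** 2) / (2.0 * width_mb ** 2))
        # Small Gaussian measurement noise on every point.
        points.append((pos, bump + rng.gauss(0.0, 0.05)))
    return points


def make_training_set(n_examples, chrom_length_mb=20.0, n_points=500, width_mb=1.0, seed=0):
    """Build labelled examples: (point cloud, true causal position in Mb).
    The label is known by construction, so no patient samples are needed."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        causal_mb = rng.uniform(0.0, chrom_length_mb)
        cloud = synthetic_point_cloud(n_points, chrom_length_mb, causal_mb, width_mb, rng)
        examples.append((cloud, causal_mb))
    return examples


if __name__ == "__main__":
    train = make_training_set(1000)
    cloud, label = train[0]
    print(len(train), "examples; first label at", round(label, 2), "Mb")
```

A model pre-trained on such cheap, labeled examples could then be fine-tuned on a small set of real genomes; how closely the toy bump matches real data is exactly the kind of assumption that transfer learning has to bridge.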

Precision
Our unified model acts as both a generator and a discriminator to deliver ultra-precise targeting (0.3-megabase resolution vs. the pharmaceutical industry’s hundreds), reducing complexity.
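To put the 0.3 Mb figure in perspective, assume a genome of about 3 billion tokens with one token per base pair, as stated above. A 0.3-megabase window then works out as:

```latex
0.3\ \text{Mb} = 3 \times 10^{5}\ \text{bases}, \qquad
\frac{3 \times 10^{9}\ \text{bases}}{3 \times 10^{5}\ \text{bases per window}} = 10^{4}\ \text{windows}
```

At this resolution the search narrows to one of roughly 10,000 windows, each still spanning about 300,000 candidate tokens.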

Whitepapers
Abstract
dnaSORA - A Unified Diffusion Transformer for DNA point clouds
The relatively obscure C. elegans Hawaiian experiment collapses diverse phenotypes, including nearly all human genetic diseases, into a single Gaussian-like point cloud feature, structuring unstructured information. The uniformity of this feature space gives AI models a straightforward way to learn all three billion tokens of the human genome and read it as a first language. We propose a diffusion transformer, dnaSORA, for learning these features. dnaSORA has generative capacity similar to Stable Diffusion, but for DNA point clouds. The model’s architecture is novel because it is unified: it also functions as a discriminator that uses a frozen latent representation for classification. dnaSORA uses transfer learning from synthetic data emulating real genome point clouds to classify misrepresented tokens in C. elegans Hawaiian data at state-of-the-art 0.3 Mb resolution. Pre-training large genome models typically requires expensive, difficult-to-obtain genomes; our solution instead provides nearly unlimited synthetic training data at negligible compute cost. Inference for new token assignments (e.g., new diseases) requires genomes from several dozen rather than thousands of individuals. These efficiencies, combined with state-of-the-art resolution, provide a pathway to rapid, massive scaling of token annotation across the entire human genome at orders of magnitude below expected costs.
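The abstract compares dnaSORA's generative capacity to Stable Diffusion. As a generic reminder of how diffusion models corrupt data during training, the Python sketch below applies the standard closed-form DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, to a list of 2-D points. The linear β schedule (1e-4 to 0.02 over 1,000 steps) is the one used in the original DDPM paper and is an assumption here; dnaSORA's actual noise schedule, point representation, and transformer denoiser are not described in the abstract and are not reproduced.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (DDPM defaults, assumed)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(points, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form for every point.

    Returns the noised points and the Gaussian noise used, which is the
    regression target a denoising network would be trained to predict.
    """
    a_bar = alpha_bars[t]
    signal_scale = math.sqrt(a_bar)
    noise_scale = math.sqrt(1.0 - a_bar)
    noised, noises = [], []
    for coords in points:
        eps = [rng.gauss(0.0, 1.0) for _ in coords]
        noised.append(tuple(signal_scale * c + noise_scale * e for c, e in zip(coords, eps)))
        noises.append(eps)
    return noised, noises


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    cloud = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5)]
    for t in (0, 500, 999):
        noised, _ = q_sample(cloud, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 5), noised[0])
```

At small t the points are almost unchanged; by t = 999, ᾱ_t is close to zero and the cloud is essentially pure Gaussian noise, which is where generation starts when the learned denoiser runs the process in reverse.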
bioRxiv (January 29, 2025)

Oleksandr Koreniuk, Dr. eMalick G. Njie
Abstract
Seq2KING - An Unsupervised Internal Transformer Representation of Global Human Heritages
Deciphering the intricate tapestry of human genetic relationships is a central challenge in population genetics and precision medicine. We propose that the principles of lexical connectivity, in which words derive meaning from their contextual interactions, can be adapted to genetic data, enabling transformer models to reveal that individuals with higher genetic similarity form stronger latent connections. We explore this by transposing KING kinship matrices into the QKV (query, key, value) latent space of transformer models and find that attention mechanisms can capture genetic relatedness in an unsupervised fashion. Individuals from the same continent showed an attention-weight connectivity of 85.34% (p<0.05), compared with individuals from different continents. Surprisingly, some encoder layers required their latent representations to be inverted before this connectivity became apparent. Lastly, we used BertViz to render the hyperdense connectivity patterns among individuals in human-readable form. Our approach is based purely on attention, yields a continuous (non-discrete) spectrum of relatedness, and thus uncovers patterns from first principles. Seq2KING addresses the significant challenge of discovering population structure, constructing a global human relatedness map without relying on predefined labels. This exploration of the latent space is a paradigm shift from legacy supervised genetic methods, offering a new way to understand the human pangenome and to discern population substructure for creating precision genetic medicines.
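The abstract does not specify how Seq2KING projects KING matrices into QKV space, so the following Python sketch makes a deliberately simple assumption: each individual's row of a kinship matrix is used directly as both query and key (identity projections), and standard scaled dot-product attention weights, softmax(QKᵀ/√d), are computed across individuals. The four-person matrix is hypothetical, with values following KING's convention of roughly 0.5 for self and roughly 0.25 for first-degree relatives. Even this toy setup shows relatives receiving more attention than unrelated individuals.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)), row by row."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights.append(softmax(scores))
    return weights


# Hypothetical KING-style kinship matrix for four people:
# A and B are first-degree relatives, C and D are first-degree relatives,
# and the two pairs are unrelated to each other.
kinship = [
    [0.50, 0.25, 0.00, 0.00],
    [0.25, 0.50, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.25],
    [0.00, 0.00, 0.25, 0.50],
]

# Assumption: identity projections, i.e. each kinship row serves as both query and key.
weights = attention_weights(kinship, kinship)

if __name__ == "__main__":
    for name, row in zip("ABCD", weights):
        print(name, [round(w, 3) for w in row])
```

In a trained transformer, learned query and key projection matrices would replace the identity mapping assumed here, which is also why individual layers can end up encoding relatedness with flipped sign.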
Draft (June 2025)

Bhavana Jonnalagadda, MsDS, Dr. eMalick G. Njie
News
    Get updates on our journey to help cure rare genetic diseases
    You will receive occasional emails from us about our latest company and technical developments. You can unsubscribe at any time from any email you receive.
    Contact
    For more information, please contact: info@ecotone.ai