DiGenoma Lab software

At DiGenoma Lab, we have developed a suite of open-source software tools to facilitate high-throughput genomics research. Our software is designed with reproducibility in mind, and we use containerization technologies like Docker and Singularity to ensure that our pipelines run consistently across different environments and platforms.

Our software is optimized to run on HPC clusters, and we take full advantage of high-performance computing resources to enable rapid analysis of large-scale genomic datasets.

For example, our genome assembly software includes Wengan, an ultra-fast and accurate hybrid assembler that can leverage long- and short-read data for optimal genome reconstruction. We also offer Fast-SG, an alignment-free algorithm for constructing scaffolding graphs from reads, and FastKM, a lightning-fast tool for matching k-mers using rolling and perfect hashing.
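
As a rough illustration of the rolling-hash idea mentioned above, the following Python sketch hashes every k-mer of a sequence in constant time per step and uses the hashes to find k-mers shared by two sequences. It is a minimal example under stated assumptions, not FastKM's actual implementation: the function names, the base-4 polynomial hash, and the modulus are all hypothetical choices, and perfect hashing is not shown.

```python
# Illustrative polynomial rolling hash over DNA k-mers.
# Assumption: this is a teaching sketch, not the code used by FastKM.
BASE = 4
MOD = (1 << 61) - 1
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # other symbols (e.g. N) raise KeyError


def direct_hash(kmer):
    """Hash a single k-mer from scratch (used as a reference)."""
    h = 0
    for ch in kmer:
        h = (h * BASE + CODE[ch]) % MOD
    return h


def rolling_kmer_hashes(seq, k):
    """Yield (position, hash) for each k-mer, updating the hash in O(1) per step."""
    if k <= 0 or len(seq) < k:
        return
    top = pow(BASE, k - 1, MOD)  # weight of the base that leaves the window
    h = direct_hash(seq[:k])
    yield 0, h
    for i in range(k, len(seq)):
        h = (h - CODE[seq[i - k]] * top) % MOD  # drop the leftmost base
        h = (h * BASE + CODE[seq[i]]) % MOD     # append the new base
        yield i - k + 1, h


def shared_kmer_positions(reference, query, k):
    """Return query positions whose k-mer hash also occurs in the reference."""
    ref_hashes = {h for _, h in rolling_kmer_hashes(reference, k)}
    return [pos for pos, h in rolling_kmer_hashes(query, k) if h in ref_hashes]


if __name__ == "__main__":
    print(shared_kmer_positions("ACGTTT", "GGACGT", 4))
```

The sliding update avoids rehashing the whole window for every k-mer, which is what makes rolling hashes attractive for scanning long reads.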

In the field of cancer genomics, we have developed a range of tools for detecting somatic mutations and structural variants, including purple-nf, a Nextflow pipeline for somatic copy number variant calling using PURPLE, and sv_somatic_cns, a consensus-calling pipeline for somatic structural variants that integrates popular SV callers. We also offer tools for detecting mRNA fusions and discovering extrachromosomal DNA in cancer genomes.

Our software is also designed to handle large genomic datasets, such as those generated by high-throughput sequencing machines like MGI-G400 or Nanopore P2-solo. For example, our k-count pipeline uses k-mer counting to estimate genome size from whole-genome sequencing data, while our alnsl pipeline provides fast and accurate alignment of short WGS reads.

In summary, our software tools are designed to be scalable, reproducible, and optimized for high-performance computing environments. Whether you are working with large genomic datasets or tackling difficult biological problems, we have the software tools you need to get the job done.

Here are some of the software tools we’ve developed:

Genome Assembly

  1. Wengan: An accurate and ultra-fast hybrid genome assembler that can handle short-read and long-read data from various sequencing platforms. Wengan is designed to be highly scalable and can assemble genomes of various sizes and complexity. Notably, it can assemble a human genome in a single day on a modest machine (20 cores, 40 GB of RAM).
  2. Fast-SG: An alignment-free algorithm for ultrafast scaffolding graph construction from short or long reads. Fast-SG can be used to generate high-quality genome assemblies in a fraction of the time of traditional assembly methods.
  3. FastKM: A tool for ultra-fast matching of k-mers using rolling and perfect hashing. FastKM is designed to be highly efficient, allowing researchers to generate haplotype-resolved assemblies quickly and accurately.
  4. hic-scaffolding-nf: A Nextflow pipeline for scaffolding genome assemblies with Hi-C reads. This tool can help researchers improve the contiguity and accuracy of their genome assemblies by leveraging long-range chromatin interaction data.
  5. k-count: A Nextflow pipeline to count k-mers and estimate genome size from WGS data. k-count is a fast and efficient tool for genome size estimation, which is essential for many downstream genomic analyses.
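
To make the idea behind k-mer-based genome size estimation concrete, here is a minimal Python sketch. It assumes the common textbook approach: build a histogram of k-mer multiplicities, take the main coverage peak, and divide the total number of non-error k-mers by that peak depth. The function names and the `min_multiplicity` cutoff are hypothetical, canonical (reverse-complement) k-mers are ignored for brevity, and the k-count pipeline itself may work differently.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(histogram, min_multiplicity=3):
    """Estimate genome size as (total solid k-mers) / (peak k-mer depth).

    Assumption: k-mers below `min_multiplicity` are treated as sequencing
    errors and discarded; the cutoff value is illustrative only.
    """
    solid = {m: n for m, n in histogram.items() if m >= min_multiplicity}
    if not solid:
        raise ValueError("no k-mers at or above the multiplicity cutoff")
    peak_depth = max(solid, key=solid.get)          # most common multiplicity
    total_kmers = sum(m * n for m, n in solid.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Toy histogram: error k-mers at depth 1-2, main peak at depth 20.
    toy = {1: 1000, 2: 50, 19: 100, 20: 300, 21: 100}
    print(estimate_genome_size(toy))
```

In practice the histogram would come from a dedicated k-mer counter run on the full read set, and heterozygosity or repeats can shift the estimate, so the result is best read as an approximation.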

Cancer Genomics

  1. purple-nf: A Nextflow pipeline for somatic Copy Number Variant (CNV) calling with PURPLE. This tool can help researchers identify somatic CNVs in tumor samples, providing insights into cancer development and progression.
  2. sv_somatic_cns: A tool for consensus calling of somatic structural variants from paired WGS data. This tool can help researchers identify somatic structural variants in cancer genomes, shedding light on the mechanisms underlying cancer development and revealing complex mutational processes such as chromothripsis.
  3. ampliconarchitect-nf: A Nextflow pipeline to discover extrachromosomal DNA (ecDNA) in cancer genomes. This tool can help researchers identify ecDNA in tumor samples, which has been associated with aggressive tumor growth and poor prognosis.
  4. nf-gene-fusions: A Nextflow pipeline to call somatic mRNA fusions. This tool can help researchers identify gene fusions in cancer genomes, which can play a role in tumorigenesis.

General Utilities

  1. alnsl: A Nextflow pipeline for alignment of short WGS reads. alnsl is a fast and efficient tool for read alignment, which is an essential step in many genomic analyses.
  2. longreadstats: A bioinformatics best-practice analysis pipeline for computing long-read statistics with NanoPlot.
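
For readers unfamiliar with long-read statistics, the short Python sketch below computes two commonly reported values, the mean read length and the read-length N50, from a list of read lengths. It is only an illustration of what such statistics mean; the function names are hypothetical, and it does not reproduce the longreadstats pipeline or NanoPlot's code.

```python
from statistics import mean


def read_length_n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("at least one read length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def summarize_read_lengths(lengths):
    """Collect a few basic summary statistics for a set of reads."""
    return {
        "reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": mean(lengths),
        "n50": read_length_n50(lengths),
    }


if __name__ == "__main__":
    print(summarize_read_lengths([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Unlike the mean, the N50 is weighted by bases rather than by read count, so a few very long reads raise it more than many short reads lower it.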

We’re constantly updating and improving our software tools to stay at the forefront of genomic analysis. If you’re interested in using any of our software, or if you have ideas for new tools we could develop, don’t hesitate to get in touch!