Bayesian mixture models for estimating and clustering cancer cell fractions



ccube is an R package for clustering and estimating cancer cell fractions (CCFs) of somatic variants (SNVs and SVs) from bulk whole-genome or whole-exome sequencing data. The method takes the reference and alternative allele read counts of called variants, corrects for copy number alterations and tumour purity, and produces a CCF estimate for every variant in the tumour sample. It also identifies clusters of mutations, which can be used to determine the clonal architecture of the sample.
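
To make the copy number and purity correction concrete, here is a minimal sketch of the commonly used point conversion from a variant allele fraction (VAF) to a CCF. It is not ccube's model or API: the function `vaf_to_ccf` and its arguments are hypothetical, the read counts are made up, and the mutation multiplicity (`mult`) is assumed to be known rather than inferred.

```r
# Hypothetical helper for illustration only; not part of the ccube package.
# Assumes the standard relation between expected VAF and CCF:
#   E[VAF] = purity * mult * CCF /
#            (purity * cn_tumour + (1 - purity) * cn_normal)
vaf_to_ccf <- function(alt_reads, ref_reads, purity, cn_tumour,
                       cn_normal = 2, mult = 1) {
  # Observed fraction of reads carrying the variant
  vaf <- alt_reads / (alt_reads + ref_reads)
  # Average number of copies of the locus per cell in the bulk sample
  avg_copies <- purity * cn_tumour + (1 - purity) * cn_normal
  # Invert the expected-VAF relation to get a point estimate of the CCF
  vaf * avg_copies / (purity * mult)
}

# Made-up read counts for two variants in a diploid region, purity 0.6
ccf <- vaf_to_ccf(alt_reads = c(30, 15), ref_reads = c(70, 85),
                  purity = 0.6, cn_tumour = 2)
print(ccf)
```

A point conversion like this ignores sampling noise in the read counts, which is one reason to fit a mixture model over all variants instead of converting each variant independently.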


The package contains four Bayesian mixture models, all fitted with variational inference.

  • Ccube: Normal-Binomial mixture model, used for clustering and estimating CCFs of SNVs. Details can be found in this manuscript.
  • CcubeSV: Normal-Binomial mixture model, similar to Ccube but modified for clustering and estimating CCFs of SVs. Details can be found in this manuscript and repo.
  • Student-t mixture model: Main model for purity estimation. Implements the model described in this paper.
  • Normal mixture model: Alternative model for purity estimation, also used for code calibration and testing.



Getting Started