Jellyfisher is an R package (an htmlwidget) for visualizing tumor evolution and subclonal compositions using Jellyfish plots. The package is based on the Jellyfish visualization tool and brings its functionality to R users. Jellyfisher supports both ClonEvol results and plain data frames, making it compatible with various tools and workflows.
Input data
The input data should follow specific structures for samples, phylogeny, and subclonal compositions.
The Jellyfisher package includes an example data set based on the following publication: Lahtinen, A., Lavikka, K., Virtanen, A., et al. “Evolutionary states and trajectories characterized by distinct pathways stratify patients with ovarian high-grade serous carcinoma.” Cancer Cell 41, 1103–1117.e12 (2023). DOI: 10.1016/j.ccell.2023.04.017.
data(jellyfisher_example_tables)
Samples
- sample (string): specifies the unique identifier for each sample.
- displayName (string, optional): allows for specifying a custom name for each sample. If the column is omitted, the sample column is used as the display name.
- rank (integer): specifies the position of each sample in the Jellyfish plot. For example, different stages of a disease can be ranked in chronological order: diagnosis (1), interval (2), and relapse (3). The zeroth rank is reserved for the root of the sample tree. Ranks can be any integer, and unused ranks are automatically excluded from the plot. If the rank column is absent, ranks are assigned based on each sample’s depth in the sample tree.
- parent (string): identifies the parent sample for each entry. Samples without a specified parent are treated as children of an imaginary root sample.
head(jellyfisher_example_tables$samples, 25)
#>                    sample displayName rank               parent patient
#> 35       EOC69_pOme1_DNA1       pOme1    1                        EOC69
#> 36       EOC69_pOva1_DNA2       pOva1    1                        EOC69
#> 37      EOC69_r1Vag1_DNA1      r1Vag1   10                        EOC69
#> 262     EOC495_pLNL1_DNA1       pLNL1    1                       EOC495
#> 263     EOC495_pLNL2_DNA1       pLNL2    1                       EOC495
#> 264      EOC495_pLNR_DNA1        pLNR    1                       EOC495
#> 265    EOC495_pOvaL6_DNA1      pOvaL6    1                       EOC495
#> 266    EOC495_pOvaL7_DNA1      pOvaL7    1                       EOC495
#> 267     EOC495_pPerL_DNA1       pPerL    1                       EOC495
#> 322      EOC677_pAsc_DNA1        pAsc    1                       EOC677
#> 323      EOC677_pPer1_DNA       pPer1    1                       EOC677
#> 324      EOC677_r2Asc_DNA       r2Asc   11     EOC677_rAsc_DNA4  EOC677
#> 325      EOC677_rAsc_DNA4        rAsc    9     EOC677_pAsc_DNA1  EOC677
#> 368  EOC809_p2Bow1_c_DNA2    p2Bow1_c    3                       EOC809
#> 369  EOC809_p2Ome1_c_DNA1    p2Ome1_c    3                       EOC809
#> 370 EOC809_p2OvaL1_c_DNA6   p2OvaL1_c    3                       EOC809
#> 371 EOC809_p2Per1_cO_DNA2   p2Per1_cO    3                       EOC809
#> 372    EOC809_r1Bow1_DNA1      r1Bow1   10 EOC809_p2Bow1_c_DNA2  EOC809
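As a rough illustration of the columns described above, a samples table can be assembled by hand as a plain data frame. This is only a sketch: the sample IDs and display names are invented, and representing "no parent" as an empty string is an assumption (a missing value may be equally appropriate).

```r
# Hypothetical toy samples table; all IDs and names are made up.
samples <- data.frame(
  sample      = c("P1_diagnosis", "P1_interval", "P1_relapse"),
  displayName = c("Dx", "Int", "Rel"),
  rank        = c(1L, 2L, 3L),
  # Assumption: an empty string means "no explicit parent", so the sample
  # hangs off the imaginary root.
  parent      = c("", "P1_diagnosis", "P1_interval"),
  stringsAsFactors = FALSE
)

# Sanity checks: sample IDs are unique, and every named parent is a sample.
has_parent <- !is.na(samples$parent) & nzchar(samples$parent)
stopifnot(!anyDuplicated(samples$sample))
stopifnot(all(samples$parent[has_parent] %in% samples$sample))
```

Checks like these are cheap to run before plotting and catch typos in parent IDs early.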
Phylogeny
- subclone (string): specifies subclone IDs, which can be any string.
- parent (string): designates the parent subclone. The subclone without a parent is considered the root of the phylogeny.
- color (string, optional): specifies the color for the subclone. If the column is omitted, colors will be generated automatically.
- branchLength (number): specifies the length of the branch leading to the subclone. The length may be based on, for example, the number of unique mutations in the subclone. The branch length is shown in the Jellyfish plot’s legend as a bar chart. It is also used when generating a phylogeny-aware color scheme.
head(jellyfisher_example_tables$phylogeny, 25)
#> subclone parent color branchLength patient
#> 44 1 -1 #cccccc 2742 EOC69
#> 45 2 12 #a6cee3 478 EOC69
#> 46 5 9 #ff99ff 68 EOC69
#> 47 6 5 #fdbf6f 244 EOC69
#> 48 8 1 #bbbb77 2433 EOC69
#> 49 9 8 #cf8d30 313 EOC69
#> 50 11 8 #ff7f00 868 EOC69
#> 51 12 1 #3de4c5 4762 EOC69
#> 52 13 9 #ff1aff 1017 EOC69
#> 420 1 -1 #cccccc 1426 EOC495
#> 421 2 5 #a6cee3 184 EOC495
#> 422 3 6 #b2df8a 246 EOC495
#> 423 4 1 #cab2d6 2874 EOC495
#> 424 5 7 #ff99ff 864 EOC495
#> 425 6 5 #fdbf6f 154 EOC495
#> 426 7 1 #fb9a99 179 EOC495
#> 427 8 7 #bbbb77 631 EOC495
#> 428 9 4 #cf8d30 415 EOC495
#> 510 1 -1 #cccccc 4961 EOC677
#> 511 2 1 #a6cee3 239 EOC677
#> 512 4 5 #cab2d6 437 EOC677
#> 513 5 10 #ff99ff 1802 EOC677
#> 514 6 9 #fdbf6f 979 EOC677
#> 515 9 10 #cf8d30 223 EOC677
#> 516 10 1 #41ae76 314 EOC677
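The following sketch builds a small, invented phylogeny table and checks that it has exactly one root. The root test used here (parent missing, or parent not among the listed subclones, as with the -1 values in the example output) is an assumption about how such tables can be read, not a description of Jellyfisher's internals. The lineage helper is hypothetical.

```r
# Hypothetical toy phylogeny; subclone IDs and branch lengths are made up.
phylogeny <- data.frame(
  subclone     = c("A", "B", "C", "D"),
  parent       = c(NA, "A", "A", "B"),
  branchLength = c(2000, 500, 300, 100),
  stringsAsFactors = FALSE
)

# Assumption: a subclone is a root when its parent is missing or is not
# itself a listed subclone.
is_root <- is.na(phylogeny$parent) |
  !(phylogeny$parent %in% phylogeny$subclone)
roots <- phylogeny$subclone[is_root]
stopifnot(length(roots) == 1)

# Hypothetical helper: walk from a subclone up to the root.
# It assumes the parent links contain no cycles.
lineage <- function(id) {
  path <- id
  while (!(id %in% roots)) {
    id <- phylogeny$parent[match(id, phylogeny$subclone)]
    path <- c(path, id)
  }
  path
}

lineage("D")
```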
Subclonal compositions
Subclonal compositions are specified in a tidy format, where each row represents a subclone in a sample.
- sample (string): specifies the sample ID.
- subclone (string): specifies the subclone ID.
- clonalPrevalence (number): specifies the clonal prevalence of the subclone in the sample. The clonal prevalence is the proportion of the subclone in the sample. The clonal prevalences in a sample must sum to 1.
head(jellyfisher_example_tables$compositions, 25)
#> sample subclone clonalPrevalence patient
#> 98 EOC69_pOme1_DNA1 5 0.2250 EOC69
#> 99 EOC69_pOme1_DNA1 6 0.0965 EOC69
#> 100 EOC69_pOme1_DNA1 13 0.6660 EOC69
#> 101 EOC69_pOva1_DNA2 6 0.4175 EOC69
#> 102 EOC69_pOva1_DNA2 11 0.5225 EOC69
#> 103 EOC69_pOva1_DNA2 13 0.0360 EOC69
#> 104 EOC69_r1Vag1_DNA1 2 0.3845 EOC69
#> 105 EOC69_r1Vag1_DNA1 12 0.5970 EOC69
#> 902 EOC495_pLNL1_DNA1 4 0.5575 EOC495
#> 903 EOC495_pLNL1_DNA1 9 0.4405 EOC495
#> 904 EOC495_pLNL2_DNA1 4 0.4635 EOC495
#> 905 EOC495_pLNL2_DNA1 9 0.5345 EOC495
#> 906 EOC495_pLNR_DNA1 1 0.1595 EOC495
#> 907 EOC495_pLNR_DNA1 4 0.5060 EOC495
#> 908 EOC495_pLNR_DNA1 5 0.0350 EOC495
#> 909 EOC495_pLNR_DNA1 9 0.2950 EOC495
#> 910 EOC495_pOvaL6_DNA1 3 0.5320 EOC495
#> 911 EOC495_pOvaL6_DNA1 5 0.0665 EOC495
#> 912 EOC495_pOvaL6_DNA1 6 0.3995 EOC495
#> 913 EOC495_pOvaL7_DNA1 2 0.5440 EOC495
#> 914 EOC495_pOvaL7_DNA1 5 0.4390 EOC495
#> 915 EOC495_pPerL_DNA1 1 0.1155 EOC495
#> 916 EOC495_pPerL_DNA1 8 0.8850 EOC495
#> 1085 EOC677_pAsc_DNA1 2 0.3180 EOC677
#> 1086 EOC677_pAsc_DNA1 9 0.2440 EOC677
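Because the prevalences of each sample are expected to sum to 1, it can be useful to check the totals before plotting. The sketch below uses invented data; the tolerance value is an arbitrary assumption, chosen only to allow for rounding, and is not something the package defines.

```r
# Hypothetical compositions in tidy format: one row per subclone per sample.
compositions <- data.frame(
  sample           = c("S1", "S1", "S1", "S2", "S2"),
  subclone         = c("A", "B", "C", "A", "D"),
  clonalPrevalence = c(0.5, 0.3, 0.2, 0.6, 0.4),
  stringsAsFactors = FALSE
)

# Sum the prevalences within each sample.
totals <- tapply(compositions$clonalPrevalence, compositions$sample, sum)

# Assumed tolerance for rounding error; adjust to taste.
tolerance <- 0.05
off <- names(totals)[abs(totals - 1) > tolerance]
if (length(off) > 0) {
  warning("Prevalences do not sum to 1 for: ", paste(off, collapse = ", "))
}

# A wide sample-by-subclone view can make the tidy table easier to inspect.
wide <- xtabs(clonalPrevalence ~ sample + subclone, data = compositions)
wide
```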
Plotting
Basic plotting
The three tables are passed to the jellyfisher function as a named list. The function generates an interactive Jellyfish plot based on the input data. If the data set contains multiple patients, the Jellyfisher htmlwidget shows navigation buttons to switch between patients.
jellyfisher(jellyfisher_example_tables,
            width = "100%", height = 500)
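To plot your own data, the same named-list shape can be built by hand. The sketch below is hypothetical: the element names samples, phylogeny, and compositions mirror the example data set, all IDs and values are invented, and including a patient column in every table is an assumption copied from the example tables. The plotting call is left commented out so that the snippet runs without the package installed.

```r
# Hypothetical one-patient data set in the shape of jellyfisher_example_tables.
tables <- list(
  samples = data.frame(
    sample  = c("S1", "S2"),
    rank    = c(1L, 2L),
    parent  = c("", "S1"),
    patient = "P1",
    stringsAsFactors = FALSE
  ),
  phylogeny = data.frame(
    subclone     = c("A", "B", "C"),
    parent       = c(NA, "A", "A"),
    branchLength = c(1000, 300, 200),
    patient      = "P1",
    stringsAsFactors = FALSE
  ),
  compositions = data.frame(
    sample           = c("S1", "S1", "S2", "S2"),
    subclone         = c("A", "B", "B", "C"),
    clonalPrevalence = c(0.6, 0.4, 0.3, 0.7),
    patient          = "P1",
    stringsAsFactors = FALSE
  )
)

# Cross-table consistency: compositions may only refer to known IDs.
stopifnot(all(tables$compositions$sample %in% tables$samples$sample))
stopifnot(all(tables$compositions$subclone %in% tables$phylogeny$subclone))

# With the package installed, the list would be passed on like this:
# jellyfisher(tables, width = "100%", height = 400)
```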
Plotting with custom options
jellyfisher(jellyfisher_example_tables,
            options = list(
              sampleHeight = 70,
              sampleTakenGuide = "none",
              tentacleWidth = 3,
              showLegend = FALSE
            ),
            width = "100%", height = 500)
Plotting a single patient
When plotting multiple patients, Jellyfisher shows buttons (Previous and Next) to navigate between patients. When the data contains only one patient, these buttons are hidden. The package also provides a select_patients function to filter the data set with ease.
jellyfisher_example_tables |>
  select_patients("EOC677") |>
  jellyfisher(width = "100%", height = 500)
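Conceptually, selecting patients amounts to keeping the matching rows in each of the three tables. The base R sketch below illustrates that idea on invented data; filter_by_patient is a hypothetical helper, not the package's implementation of select_patients, and it assumes every table carries a patient column as the example tables do.

```r
# Hypothetical helper: keep only the rows of the given patients in each table.
filter_by_patient <- function(tables, patient_ids) {
  lapply(tables, function(df) {
    df[df$patient %in% patient_ids, , drop = FALSE]
  })
}

# Invented two-patient data set.
tables <- list(
  samples = data.frame(
    sample  = c("P1_a", "P1_b", "P2_a"),
    patient = c("P1", "P1", "P2"),
    stringsAsFactors = FALSE
  ),
  phylogeny = data.frame(
    subclone = c("1", "2", "1"),
    parent   = c(NA, "1", NA),
    patient  = c("P1", "P1", "P2"),
    stringsAsFactors = FALSE
  ),
  compositions = data.frame(
    sample           = c("P1_a", "P1_b", "P2_a"),
    subclone         = c("1", "2", "1"),
    clonalPrevalence = c(1, 1, 1),
    patient          = c("P1", "P1", "P2"),
    stringsAsFactors = FALSE
  )
)

p1 <- filter_by_patient(tables, "P1")
sapply(p1, nrow)
```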
Adjusting the sample tree structures
The sample trees in the example data set were constructed as follows:
“For each sample, we checked whether an earlier time point included exactly one sample from the same anatomical location. If such a sample existed, it was assigned as the parent; otherwise, the inferred root was used as the parent.”
However, this mechanistic approach may not always produce credible sample trees.
Changing parent
The r1Bow1 (bowel) sample in the following Jellyfish plot is derived from an earlier bowel sample, p2Bow1_c, which shows no trace of subclone 12.
jellyfisher_example_tables |>
  select_patients("EOC809") |>
  jellyfisher(width = "100%", height = 600)
Using the set_parents function, we can change the parent of the r1Bow1 sample to p2Per1_cO (peritoneum), which is a plausible source of the metastasis due to its proximity. The high prevalence of subclone 12 in the peritoneal sample suggests that it is the likely source of the metastasis in the r1Bow1 sample.
jellyfisher_example_tables |>
  select_patients("EOC809") |>
  set_parents(list("EOC809_r1Bow1_DNA1" = "EOC809_p2Per1_cO_DNA2")) |>
  jellyfisher(width = "100%", height = 600)
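In terms of the samples table, changing a parent comes down to overwriting one cell of the parent column. The sketch below shows that on the EOC809 rows printed earlier; reparent is a hypothetical base R helper written for illustration, not the package's set_parents, and the empty-string parents are an assumption about how missing parents are stored.

```r
# EOC809 rows from the example samples table (parent "" = no explicit parent).
samples <- data.frame(
  sample = c("EOC809_p2Bow1_c_DNA2", "EOC809_p2Ome1_c_DNA1",
             "EOC809_p2OvaL1_c_DNA6", "EOC809_p2Per1_cO_DNA2",
             "EOC809_r1Bow1_DNA1"),
  rank   = c(3L, 3L, 3L, 3L, 10L),
  parent = c("", "", "", "", "EOC809_p2Bow1_c_DNA2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: new_parents is a named list, child ID = new parent ID.
reparent <- function(samples, new_parents) {
  idx <- match(names(new_parents), samples$sample)
  # Both the children and the new parents must be known samples.
  stopifnot(!anyNA(idx))
  stopifnot(all(unlist(new_parents) %in% samples$sample))
  samples$parent[idx] <- unname(unlist(new_parents))
  samples
}

updated <- reparent(
  samples,
  list("EOC809_r1Bow1_DNA1" = "EOC809_p2Per1_cO_DNA2")
)
updated[updated$sample == "EOC809_r1Bow1_DNA1", ]
```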
Changing topology
While ranks (the columns) can indicate the time points when the samples were acquired, they can also be used to simply show the sample’s depth in the sample tree. For instance, the following plot shows all the samples on the same rank, indicating that they were diagnostic samples acquired at the same time.
jellyfisher_example_tables |>
  select_patients("EOC495") |>
  jellyfisher(width = "100%", height = 600)
However, one can argue that the LN (lymph node) samples represent a later development in the disease, and thus, they should be placed on a later rank. We can remove the existing ranks, define new parent-child relationships, and let Jellyfisher assign the ranks based on the sample tree depth.
tables <- jellyfisher_example_tables |>
  select_patients("EOC495")

# Remove existing ranks. The ranks will be assigned automatically based
# on samples' depths in the sample tree.
tables$samples$rank <- NA

tables |>
  set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1",
                   "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |>
  jellyfisher(width = "100%", height = 500)
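To make the idea of depth-based ranks concrete, the sketch below computes each sample's depth in the adjusted EOC495 sample tree with base R. It is an illustration only: sample_depth is a hypothetical helper, and counting direct children of the imaginary root as depth 1 is an assumption derived from rank 0 being reserved for the root. The package may compute ranks differently.

```r
# EOC495 samples with the adjusted parents ("" = child of the imaginary root).
samples <- data.frame(
  sample = c("EOC495_pLNL1_DNA1", "EOC495_pLNL2_DNA1", "EOC495_pLNR_DNA1",
             "EOC495_pOvaL6_DNA1", "EOC495_pOvaL7_DNA1", "EOC495_pPerL_DNA1"),
  parent = c("EOC495_pLNR_DNA1", "EOC495_pLNL1_DNA1", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the steps from a sample up to the imaginary root.
# It assumes the parent links contain no cycles.
sample_depth <- function(id, samples) {
  depth <- 1L
  parent <- samples$parent[match(id, samples$sample)]
  while (nzchar(parent)) {
    depth <- depth + 1L
    parent <- samples$parent[match(parent, samples$sample)]
  }
  depth
}

samples$rank <- vapply(samples$sample, sample_depth, integer(1),
                       samples = samples, USE.NAMES = FALSE)
samples[, c("sample", "rank")]
```

Under these assumptions, pLNR stays on the first rank with the other diagnostic samples, while pLNL1 and pLNL2 move to the second and third ranks.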
If we think that the lymph node samples represent even later development, we can manually assign ranks to the samples. The set_ranks function provides an easy way to do this.
tables |>
  set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1",
                   "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |>
  set_ranks(list("EOC495_pLNR_DNA1" = 2,
                 "EOC495_pLNL1_DNA1" = 3,
                 "EOC495_pLNL2_DNA1" = 4),
            default = 1) |>
  jellyfisher(width = "100%", height = 400)
Session info
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] jellyfisher_0.0.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.3 knitr_1.49 rlang_1.1.4
#> [5] xfun_0.50 stringi_1.8.4 generics_0.1.3 textshaping_0.4.1
#> [9] jsonlite_1.8.9 glue_1.8.0 htmltools_0.5.8.1 ragg_1.3.3
#> [13] sass_0.4.9 rmarkdown_2.29 tibble_3.2.1 evaluate_1.0.3
#> [17] jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4
#> [21] stringr_1.5.1 compiler_4.4.2 dplyr_1.1.4 fs_1.6.5
#> [25] pkgconfig_2.0.3 htmlwidgets_1.6.4 systemfonts_1.1.0 digest_0.6.37
#> [29] R6_2.5.1 tidyselect_1.2.1 pillar_1.10.1 magrittr_2.0.3
#> [33] bslib_0.8.0 tools_4.4.2 pkgdown_2.1.1 cachem_1.1.0
#> [37] desc_1.4.3