Processing Multiplexed Samples in PhyloSift

We are steadily working to increase support for running multiplexed samples in PhyloSift. At the moment, the client workflow (the "all" mode, as executed using the phylosift wrapper script) assumes that all input sequences are derived from a single sample.

Luckily, there is an easy workaround. To analyze multiplexed samples in PhyloSift, follow these steps:

Step 1: Demultiplex raw sequence data

If you have raw data files containing sequences from a number of samples multiplexed with different barcodes (e.g. paired-end data from a lane of Illumina), you will first need to demultiplex the data (assign reads to samples by barcode and remove the barcodes) and generate a separate input file for each sample site. This pre-processing can be accomplished using external software such as QIIME (we recommend using this comprehensive tutorial for demultiplexing Illumina data within QIIME).

Step 2: Run samples separately through PhyloSift

Run the full PhyloSift client workflow separately for each sample site. Accepted file formats and full documentation can be found here: Running PhyloSift – an overview
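As a minimal sketch of this step, the loop below runs the workflow once per demultiplexed file. It assumes that the wrapper script is invoked as ./phylosift with the "all" mode described above, and that each sample is stored as a separate .fasta file in one directory. The function name, the PHYLOSIFT variable and the .fasta extension are illustrative assumptions, not part of PhyloSift.

```shell
#!/bin/sh
# Path to the phylosift wrapper script (assumption: adjust to your install).
PHYLOSIFT="${PHYLOSIFT:-./phylosift}"

# Run the "all" mode once for every demultiplexed file in a directory.
# Assumption: one .fasta file per sample (hypothetical naming).
run_all_samples() {
  dir="$1"
  for f in "$dir"/*.fasta; do
    # Skip the unexpanded pattern when the directory holds no matches.
    [ -e "$f" ] || continue
    echo "Processing $f"
    # Stop at the first sample that fails so the error is not overlooked.
    "$PHYLOSIFT" all "$f" || return 1
  done
}
```

Calling run_all_samples with the directory that holds the demultiplexed files then processes each sample in turn, so that every sample gets its own set of outputs.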

Step 3: Execute multisample comparisons in guppy

Once you have successfully run all samples through this workflow, you can explore the sample-specific PhyloSift outputs (individual Krona outputs, taxonomy summaries, pplacer placefiles, etc.), and additionally conduct multisample comparisons using the guppy software that comes prepackaged in the PhyloSift download. The guppy package accepts .jplace files (pplacer placefiles), which can be found in the /PS_temp/user_filename/treeDir/ output directory. Note that PhyloSift generates one .jplace file PER MARKER GENE. When making multisample comparisons in guppy, you must use .jplace files representing the same gene alignment in every sample, e.g. the concat.jplace file from each sample to analyze the concatenated DNGNGWU marker genes.
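Because each comparison has to use the same marker in every sample, it can help to gather the matching placefiles first. The sketch below assumes the PS_temp/<sample>/treeDir/ layout described above and that the marker name (for example "concat") appears somewhere in the .jplace file name. The function name and the exact file naming pattern are assumptions, so check them against your own output directories.

```shell
#!/bin/sh
# Print the .jplace files for one marker across all sample output directories.
# Assumptions: outputs live under <root>/<sample>/treeDir/, and the marker
# name is part of the file name (hypothetical naming pattern).
list_marker_placefiles() {
  marker="$1"
  root="${2:-PS_temp}"
  for f in "$root"/*/treeDir/*"$marker"*.jplace; do
    # Print only real files; an unmatched pattern is silently skipped.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
  return 0
}
```

For example, list_marker_placefiles concat prints one path per sample. A sample that is missing from the list has no placefile for that marker and should be checked before running guppy.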

Comprehensive guppy documentation can be found here. The most common multisample comparisons can be executed using the following commands:

Edge Principal Components Analysis (akin to PCA using UniFrac)

./guppy epca [options] placefiles

Squash Clustering (akin to UPGMA using UniFrac)

./guppy squash [options] placefiles

Kantorovich-Rubinstein distance (akin to weighted UniFrac)

./guppy kr [options] placefiles

For example, to compare community similarity across four samples using Edge Principal Components Analysis on the phylogenetic patterns observed in tree placements for the concatenated DNGNGWU markers, you would run the command:

./guppy epca sample1_concat.jplace sample2_concat.jplace sample3_concat.jplace sample4_concat.jplace
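When the same comparison is run repeatedly, a small wrapper can catch missing inputs before guppy starts. The following is a sketch only: the function name and the GUPPY variable are hypothetical, and the wrapper passes the subcommand (epca, squash or kr) and the placefiles through to ./guppy unchanged, without adding any options of its own.

```shell
#!/bin/sh
# Path to the guppy binary shipped with PhyloSift (assumption: adjust as needed).
GUPPY="${GUPPY:-./guppy}"

# Run one guppy multisample subcommand on a set of placefiles,
# after checking that there are at least two non-empty inputs.
run_guppy_comparison() {
  subcommand="$1"
  shift
  if [ "$#" -lt 2 ]; then
    echo "error: a multisample comparison needs at least two placefiles" >&2
    return 1
  fi
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "error: missing or empty placefile: $f" >&2
      return 1
    fi
  done
  "$GUPPY" "$subcommand" "$@"
}
```

With this in place, run_guppy_comparison epca sample1_concat.jplace sample2_concat.jplace sample3_concat.jplace sample4_concat.jplace is equivalent to the command above, but stops early if any of the four files is absent.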