PhyloSift Reference Marker Genes

The marker database has a new download link:
https://figshare.com/articles/PhyloSift_markers_database/5755404/1

In addition to rRNA genes (16S/18S), PhyloSift relies on protein-coding marker genes for assigning taxonomy to bacteria, archaea, and eukaryotes. Although we’ve identified marker genes computationally via name-independent approaches, gene annotations are listed here for reference and biological ponderings.

We’re working to update PhyloSift with an expanded marker set that will incorporate additional marker genes for eukaryotes (e.g. plastid genes), taxon-specific sets of gene families, and support for viral marker genes.

The following markers (DNGNGWU*) were originally mined from bacterial and archaeal genomes, but our in-house tests show that at least 33 of them have full-length eukaryotic homologs (based on searches against the yeast genome). The percentage values given for Archaea and Bacteria indicate the proportion of taxa in each group whose genomes contain the marker gene (obtained from Wu et al. 2013). An asterisk denotes a potentially multi-copy marker, as determined by in-house assessments of microbial genome assemblies at UC Davis. NOTE: As of 1/10/14, the three markers with an asterisk (DNGNGWU00004, DNGNGWU00008, and DNGNGWU00038) are disabled during PhyloSift runs, even though they are still included in automatic marker package downloads.

PhyloSift Marker    Gene Name

DNGNGWU00001 ribosomal protein S2 rpsB (Archaea: 100%, Bacteria: 99.5%)
DNGNGWU00002 ribosomal protein S10 rpsJ (Archaea: 100%, Bacteria: 98.51%)
DNGNGWU00003 ribosomal protein L1 rplA (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00004* translation elongation factor EF-2 (Archaea: 100%, Bacteria: 99.67%)
DNGNGWU00005 translation initiation factor IF-2 (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00006 metalloendopeptidase (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00007 ribosomal protein L22 (Archaea: 100%, Bacteria: 99.67%)
DNGNGWU00008* ffh signal recognition particle protein (Archaea: 100%, Bacteria: 98.18%)
DNGNGWU00009 ribosomal protein L4/L1e rplD (Archaea: 100%, Bacteria: 99.67%)
DNGNGWU00010 ribosomal protein L2 rplB (Archaea: 100%, Bacteria: 99.5%)
DNGNGWU00011 ribosomal protein S9 rpsI (Archaea: 100%, Bacteria: 100%)
DNGNGWU00012 ribosomal protein L3 rplC (Archaea: 100%, Bacteria: 99.5%)
DNGNGWU00013 phenylalanyl-tRNA synthetase beta subunit (Archaea: 100%, Bacteria: 99.67%)
DNGNGWU00014 ribosomal protein L14b/L23e rplN (Archaea: 100%, Bacteria: 99.34%)
DNGNGWU00015 ribosomal protein S5 (Archaea: 100%, Bacteria: 99.5%)
DNGNGWU00016 ribosomal protein S19 rpsS (Archaea: 100%, Bacteria: 99.17%)
DNGNGWU00017 ribosomal protein S7 (Archaea: 100%, Bacteria: 99.67%)
DNGNGWU00018 ribosomal protein L16/L10E rplP (Archaea: 100%, Bacteria: 99.67%)
DNGNGWU00019 ribosomal protein S13 rpsM (Archaea: 100%, Bacteria: 99.17%)
DNGNGWU00020 phenylalanyl-tRNA synthetase alpha subunit (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00021 ribosomal protein L15 (Archaea: 100%, Bacteria: 99.5%)
DNGNGWU00022 ribosomal protein L25/L23 (Archaea: 100%, Bacteria: 99.17%)
DNGNGWU00023 ribosomal protein L6 rplF (Archaea: 100%, Bacteria: 99.5%)
DNGNGWU00024 ribosomal protein L11 rplK (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00025 ribosomal protein L5 rplE (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00026 ribosomal protein S12/S23 (Archaea: 100%, Bacteria: 99.17%)
DNGNGWU00027 ribosomal protein L29 (Archaea: 98.39%, Bacteria: 98.68%)
DNGNGWU00028 ribosomal protein S3 rpsC (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00029 ribosomal protein S11 rpsK (Archaea: 100%, Bacteria: 99.17%)
DNGNGWU00030 ribosomal protein L10 (Archaea: 98.39%, Bacteria: 99.67%)
DNGNGWU00031 ribosomal protein S8 (Archaea: 100%, Bacteria: 99.5%)
DNGNGWU00032 tRNA pseudouridine synthase B (Archaea: 95.16%, Bacteria: 97.35%)
DNGNGWU00033 ribosomal protein L18P/L5E (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00034 ribosomal protein S15P/S13e (Archaea: 100%, Bacteria: 99.84%)
DNGNGWU00035 porphobilinogen deaminase (Archaea: 85.48%, Bacteria: 86.59%)
DNGNGWU00036 ribosomal protein S17 (Archaea: 100%, Bacteria: 99.17%)
DNGNGWU00037 ribosomal protein L13 rplM (Archaea: 100%, Bacteria: 99.83%)
DNGNGWU00038* phosphoribosylformylglycinamidine cyclo-ligase purM (Archaea: 90.32%, Bacteria: 92.38%)
DNGNGWU00039 ribonuclease HII (Archaea: 100%, Bacteria: 98.51%)
DNGNGWU00040 ribosomal protein L24 (Archaea: 100%, Bacteria: 99.5%)
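
For anyone who wants to work with this list programmatically, the rows above follow a regular pattern (marker ID, optional asterisk, gene name, prevalence in Archaea and Bacteria). The following is a minimal Python sketch, not part of PhyloSift itself, that parses rows in that form and sets aside the asterisked (potentially multi-copy) markers. The names ROW, Marker, and parse_marker are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern for one row of the table above, e.g.
# "DNGNGWU00004* translation elongation factor EF-2 (Archaea: 100%, Bacteria: 99.67%)"
ROW = re.compile(
    r"^(?P<id>DNGNGWU\d{5})(?P<star>\*?)\s+"
    r"(?P<gene>.+?)\s+"
    r"\(Archaea:\s*(?P<arch>[\d.]+)%,\s*Bacteria:\s*(?P<bact>[\d.]+)%\)$"
)


class Marker(NamedTuple):
    marker_id: str
    gene: str
    archaea_pct: float   # percent of archaeal taxa carrying the marker
    bacteria_pct: float  # percent of bacterial taxa carrying the marker
    multi_copy: bool     # True if the row carries an asterisk


def parse_marker(line: str) -> Optional[Marker]:
    """Parse one table row; return None if the line does not match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return Marker(
        marker_id=m.group("id"),
        gene=m.group("gene"),
        archaea_pct=float(m.group("arch")),
        bacteria_pct=float(m.group("bact")),
        multi_copy=m.group("star") == "*",
    )


# A few rows copied from the table above.
rows = [
    "DNGNGWU00001 ribosomal protein S2 rpsB (Archaea: 100%, Bacteria: 99.5%)",
    "DNGNGWU00004* translation elongation factor EF-2 (Archaea: 100%, Bacteria: 99.67%)",
    "DNGNGWU00032 tRNA pseudouridine synthase B (Archaea: 95.16%, Bacteria: 97.35%)",
]
markers = [parse_marker(r) for r in rows]

# Keep only markers without an asterisk.
single_copy = [m.marker_id for m in markers if m is not None and not m.multi_copy]
```

Feeding the full table through parse_marker in the same way would give a list that can be filtered by prevalence or by the multi-copy flag.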

PhyloSift also includes a suite of markers that are more narrowly focused on eukaryotes, including both nuclear and mitochondrial markers:

PhyloSift Marker (Eukaryotic)    Gene Name

14-3-3 5-monooxygenase activation protein (HomoloGene ID: 100743)
40S 40S ribosomal protein S4 (HomoloGene ID: 90857)
Actin actin, beta (HomoloGene ID: 110648)
Atub tubulin, alpha 4a (HomoloGene ID: 68496)
Btub tubulin, beta 4 (HomoloGene ID: 55952)
ef1aLike eukaryotic translation elongation factor 1, alpha 1 (HomoloGene ID: 105313)
ef2 eukaryotic translation elongation factor 2 (HomoloGene ID: 100816)
enolase enolase 1 (HomoloGene ID: 68183)
gamma tubulin, gamma
grc5 60S ribosomal protein L10 (HomoloGene ID: 68830)
hsp70 Hsp70 protein
hsp70cyt heat shock 70kDa protein 8 (HomoloGene ID: 68524)
hsp70er predicted Hsp70 protein
Hsp90 heat shock protein 90kDa alpha (cytosolic) (HomoloGene ID: 74306)
metk methionine adenosyltransferase II alpha, S-adenosylmethionine synthetase (HomoloGene ID: 38112)
Rad51 RAD-associated protein
rps22 Rps15a (ribosomal protein S15A) (HomoloGene ID: 128371)
Rps23a 40S ribosomal protein S23 (HomoloGene ID: 799)
TFIIH (hypothetical protein)
Tsec61 Sec61 alpha 1 subunit (HomoloGene ID: 55537)
U5 splicing factor Prp8
mtDNA_ATP6 Mitochondrial ATP synthase subunit 6
mtDNA_ATP8 Mitochondrial ATP synthase subunit 8
mtDNA_Cox1 Mitochondrial cytochrome c oxidase subunit 1
mtDNA_Cox2 Mitochondrial cytochrome c oxidase subunit 2
mtDNA_Cox3 Mitochondrial cytochrome c oxidase subunit 3
mtDNA_CytB Mitochondrial cytochrome b
mtDNA_ND1 Mitochondrial NADH dehydrogenase subunit 1
mtDNA_ND2 Mitochondrial NADH dehydrogenase subunit 2
mtDNA_ND4 Mitochondrial NADH dehydrogenase subunit 4
mtDNA_ND4L Mitochondrial NADH dehydrogenase subunit 4L
mtDNA_ND5 Mitochondrial NADH dehydrogenase subunit 5
mtDNA_ND6 Mitochondrial NADH dehydrogenase subunit 6
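
The eukaryotic list mixes nuclear and mitochondrial markers, and only some rows carry a HomoloGene ID. As a rough illustration, the Python sketch below splits a row into its parts. It assumes that the first whitespace-delimited token is the marker name and that mitochondrial markers are exactly those whose name starts with "mtDNA_", as in the list above. The function name parse_euk_marker is hypothetical and is not part of PhyloSift.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "(HomoloGene ID: 12345)" annotation, if present.
HOMOLOGENE = re.compile(r"\(HomoloGene ID:\s*(\d+)\)\s*$")


def parse_euk_marker(line: str) -> Tuple[str, str, Optional[int], bool]:
    """Return (marker name, description, HomoloGene ID or None, is_mitochondrial)."""
    # Assumption: the marker name is the first token and contains no spaces.
    name, _, rest = line.strip().partition(" ")
    match = HOMOLOGENE.search(rest)
    homologene_id = int(match.group(1)) if match else None
    # Drop the HomoloGene annotation from the description text.
    description = HOMOLOGENE.sub("", rest).strip()
    return name, description, homologene_id, name.startswith("mtDNA_")


# Rows copied from the list above.
nuclear = parse_euk_marker("Actin actin, beta (HomoloGene ID: 110648)")
no_id = parse_euk_marker("hsp70 Hsp70 protein")
mito = parse_euk_marker("mtDNA_Cox1 Mitochondrial cytochrome c oxidase subunit 1")
```

Rows without a HomoloGene ID simply yield None in the third position, so the same function can be applied to the whole list.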