------------------------------------------------------
EnTAP Run Information - Execution
------------------------------------------------------
Current EnTAP Version: 0.10.4
Start time: Thu Sep 24 12:50:43 2020
Working directory has been set to: /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results

User Inputs:
out-dir: /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results
config: false
runP: true
runN: false
overwrite: false
ini: /home/FCAM/egrau/gmap/Qulo/v1.0/entap_config.ini
input: Qulo.1_0.filtered.pep.fa
database: /isg/shared/databases/Diamond/RefSeq/complete.protein.faa.200.dmnd,
graph: false
no-trim: false
threads: 12
state: +
no-check: false
output-format: 1,3,4,
entap-db-bin: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/entap_database.bin
entap-db-sql:
entap-graph: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/src/entap_graphing.py
data-generate: false
data-type: 0,
fpkm: 0.500000
align:
single-end: false
rsem-calculate-expression: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-calculate-expression
rsem-sam-validator: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-sam-validator
rsem-prepare-reference: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-prepare-reference
convert-sam-for-rsem: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/convert-sam-for-rsem
complete: false
frame-selection: 2
genemarkst-exe: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/gmst_linux_64/gmst.pl
transdecoder-long-exe: TransDecoder.LongOrfs
transdecoder-predict-exe: TransDecoder.Predict
transdecoder-m: 100
diamond-exe: diamond
taxon: Quercus
qcoverage: 50.000000
tcoverage: 50.000000
contam:
e-value: 0.000010
uninformative:
ontology: 0,
level: 1,
eggnog-sql: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/eggnog.db
eggnog-dmnd: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/eggnog_proteins.dmnd
interproscan-exe: interproscan.sh
protein:

------------------------------------------------------
Transcriptome Statistics
------------------------------------------------------
Protein sequences found
Total sequences: 94394
Total length of transcriptome(bp): 411784667
Average sequence length(bp): 4362.00
n50: 35948
n90: 3873
Longest sequence(bp): 65533 (scaffold4232_1)
Shortest sequence(bp): 2 (scaffold8573_1)

------------------------------------------------------
Similarity Search - DIAMOND - complete
------------------------------------------------------
Search results: /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/blastp_Qulo_final_complete.out
Total alignments: 1090
Total unselected results: 725
Written to: /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/unselected_lvl0
Total unique transcripts with an alignment: 365
Reference transcriptome sequences with an alignment (FASTA): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_lvl0
Search results (TSV): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_lvl0
Total unique transcripts without an alignment: 94029
Reference transcriptome sequences without an alignment (FASTA): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/no_hits.faa
Total unique informative alignments: 60
Total unique uninformative alignments: 305
Total unique contaminants: 0(0.00%):
Transcriptome reference sequences labeled as a contaminant (FASTA): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
Transcriptome reference sequences labeled as a contaminant (TSV): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
Top 10 alignments by species:
    1)quercus lobata: 134(36.71%)
    2)quercus suber: 68(18.63%)
    3)camellia sinensis: 15(4.11%)
    4)nicotiana tomentosiformis: 13(3.56%)
    5)juglans regia: 12(3.29%)
    6)papaver somniferum: 9(2.47%)
    7)vitis vinifera: 8(2.19%)
    8)erythranthe guttata: 7(1.92%)
    9)brassica oleracea var. oleracea: 5(1.37%)
    10)pistacia vera: 4(1.10%)

------------------------------------------------------
Compiled Similarity Search - DIAMOND - Best Overall
------------------------------------------------------
Total unique transcripts with an alignment: 365
Reference transcriptome sequences with an alignment (FASTA): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_lvl0
Search results (TSV): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_lvl0
Total unique transcripts without an alignment: 94029
Reference transcriptome sequences without an alignment (FASTA): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/overall_results/no_hits.faa
Total unique informative alignments: 60
Total unique uninformative alignments: 305
Total unique contaminants: 0(0.00%):
Transcriptome reference sequences labeled as a contaminant (FASTA): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
Transcriptome reference sequences labeled as a contaminant (TSV): /home/FCAM/egrau/gmap/Qulo/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
Top 10 alignments by species:
    1)quercus lobata: 129(35.34%)
    2)quercus suber: 73(20.00%)
    3)camellia sinensis: 15(4.11%)
    4)nicotiana tomentosiformis: 14(3.84%)
    5)juglans regia: 12(3.29%)
    6)papaver somniferum: 9(2.47%)
    7)vitis vinifera: 8(2.19%)
    8)erythranthe guttata: 6(1.64%)
    9)pistacia vera: 4(1.10%)
    10)nicotiana attenuata: 4(1.10%)

------------------------------------------------------
Gene Family - Gene Ontology and Pathway - EggNOG
------------------------------------------------------
Statistics for overall Eggnog results:
Total unique sequences with family assignment: 11827
Total unique sequences without family assignment: 82567
Top 10 Taxonomic Scopes Assigned:
    1)Viridiplantae: 9343(79.00%)
    2)Eukaryotes: 2360(19.95%)
    3)Ancestor: 82(0.69%)
    4)Animals: 16(0.14%)
    5)Fungi: 12(0.10%)
    6)Arthropoda: 6(0.05%)
    7)Bacteria: 3(0.03%)
    8)Mammals: 2(0.02%)
    9)Opisthokonts: 2(0.02%)
    10)Nematodes: 1(0.01%)
Total unique sequences with at least one GO term: 11819
Total unique sequences without GO terms: 8
Total GO terms assigned: 542611

Total cellular_component terms (lvl=1): 11050
Total unique cellular_component terms (lvl=1): 12
Top 10 cellular_component terms assigned (lvl=1):
    1)GO:0005623-cell(L=1): 3580(32.40%)
    2)GO:0043226-organelle(L=1): 2710(24.52%)
    3)GO:0016020-membrane(L=1): 2057(18.62%)
    4)GO:0032991-macromolecular complex(L=1): 942(8.52%)
    5)GO:0030054-cell junction(L=1): 489(4.43%)
    6)GO:0055044-symplast(L=1): 473(4.28%)
    7)GO:0005576-extracellular region(L=1): 387(3.50%)
    8)GO:0031974-membrane-enclosed lumen(L=1): 316(2.86%)
    9)GO:0009295-nucleoid(L=1): 63(0.57%)
    10)GO:0045202-synapse(L=1): 20(0.18%)

Total molecular_function terms (lvl=1): 15827
Total unique molecular_function terms (lvl=1): 21
Top 10 molecular_function terms assigned (lvl=1):
    1)GO:0005488-binding(L=1): 7142(45.13%)
    2)GO:0003824-catalytic activity(L=1): 6683(42.23%)
    3)GO:0005215-transporter activity(L=1): 600(3.79%)
    4)GO:0001071-nucleic acid binding transcription factor activity(L=1): 376(2.38%)
    5)GO:0005198-structural molecule activity(L=1): 285(1.80%)
    6)GO:0009055-electron carrier activity(L=1): 280(1.77%)
    7)GO:0060089-molecular transducer activity(L=1): 149(0.94%)
    8)GO:0004871-signal transducer activity(L=1): 149(0.94%)
    9)GO:0016209-antioxidant activity(L=1): 86(0.54%)
    10)GO:0045735-nutrient reservoir activity(L=1): 18(0.11%)

Total overall terms (lvl=1): 54241
Total unique overall terms (lvl=1): 114
Top 10 overall terms assigned (lvl=1):
    1)GO:0008152-metabolic process(L=1): 7379(13.60%)
    2)GO:0005488-binding(L=1): 7142(13.17%)
    3)GO:0009987-cellular process(L=1): 6906(12.73%)
    4)GO:0003824-catalytic activity(L=1): 6683(12.32%)
    5)GO:0005623-cell(L=1): 3580(6.60%)
    6)GO:0043226-organelle(L=1): 2710(5.00%)
    7)GO:0044699-single-organism process(L=1): 2519(4.64%)
    8)GO:0050896-response to stimulus(L=1): 2058(3.79%)
    9)GO:0016020-membrane(L=1): 2057(3.79%)
    10)GO:0065007-biological regulation(L=1): 1656(3.05%)

Total biological_process terms (lvl=1): 27364
Total unique biological_process terms (lvl=1): 81
Top 10 biological_process terms assigned (lvl=1):
    1)GO:0008152-metabolic process(L=1): 7379(26.97%)
    2)GO:0009987-cellular process(L=1): 6906(25.24%)
    3)GO:0044699-single-organism process(L=1): 2519(9.21%)
    4)GO:0050896-response to stimulus(L=1): 2058(7.52%)
    5)GO:0065007-biological regulation(L=1): 1656(6.05%)
    6)GO:0051179-localization(L=1): 1079(3.94%)
    7)GO:0032501-multicellular organismal process(L=1): 1026(3.75%)
    8)GO:0032502-developmental process(L=1): 1006(3.68%)
    9)GO:0071840-cellular component organization or biogenesis(L=1): 879(3.21%)
    10)GO:0000003-reproduction(L=1): 712(2.60%)

Total unique sequences with at least one pathway (KEGG) assignment: 1600
Total unique sequences without pathways (KEGG): 10227
Total pathways (KEGG) assigned: 5427

------------------------------------------------------
Final Annotation Statistics
------------------------------------------------------
Total Sequences: 94394
Similarity Search
    Total unique sequences with an alignment: 365
    Total unique sequences without an alignment: 94029
Gene Families
    Total unique sequences with family assignment: 11827
    Total unique sequences without family assignment: 82567
    Total unique sequences with at least one GO term: 8915
    Total unique sequences with at least one pathway (KEGG) assignment: 1589
Totals
    Total unique sequences annotated (similarity search alignments only): 49
    Total unique sequences annotated (gene family assignment only): 11511
    Total unique sequences annotated (gene family and/or similarity search): 11876
    Total unique sequences unannotated (gene family and/or similarity search): 82518

EnTAP has completed!
Total runtime (minutes): 863