------------------------------------------------------
EnTAP Run Information - Execution
------------------------------------------------------
Current EnTAP Version: 0.10.4
Start time: Thu Sep 24 12:48:21 2020
Working directory has been set to: /home/FCAM/egrau/gmap/Potr/v2.0/entap_results
User Inputs:
  out-dir: /home/FCAM/egrau/gmap/Potr/v2.0/entap_results
  config: false
  runP: true
  runN: false
  overwrite: false
  ini: /home/FCAM/egrau/gmap/Potr/v2.0/entap_config.ini
  input: Potr.2_0.filtered.pep.fa
  database: /isg/shared/databases/Diamond/RefSeq/complete.protein.faa.200.dmnd,
  graph: false
  no-trim: false
  threads: 12
  state: +
  no-check: false
  output-format: 1,3,4,
  entap-db-bin: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/entap_database.bin
  entap-db-sql:
  entap-graph: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/src/entap_graphing.py
  data-generate: false
  data-type: 0,
  fpkm: 0.500000
  align:
  single-end: false
  rsem-calculate-expression: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-calculate-expression
  rsem-sam-validator: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-sam-validator
  rsem-prepare-reference: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-prepare-reference
  convert-sam-for-rsem: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/convert-sam-for-rsem
  complete: false
  frame-selection: 2
  genemarkst-exe: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/gmst_linux_64/gmst.pl
  transdecoder-long-exe: TransDecoder.LongOrfs
  transdecoder-predict-exe: TransDecoder.Predict
  transdecoder-m: 100
  diamond-exe: diamond
  taxon: Populus
  qcoverage: 50.000000
  tcoverage: 50.000000
  contam:
  e-value: 0.000010
  uninformative:
  ontology: 0,
  level: 1,
  eggnog-sql: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/eggnog.db
  eggnog-dmnd: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/eggnog_proteins.dmnd
  interproscan-exe: interproscan.sh
  protein:
------------------------------------------------------
Transcriptome Statistics
------------------------------------------------------
Protein sequences found
Total sequences: 45033
Total length of transcriptome(bp): 56429232
Average sequence length(bp): 1253.00
n50: 1575
n90: 654
Longest sequence(bp): 16056 (POPTR_0017s06640.1|PACid:18210658)
Shortest sequence(bp): 165 (POPTR_0015s10110.1|PACid:18232727)
------------------------------------------------------
Similarity Search - DIAMOND - complete
------------------------------------------------------
Search results: /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/blastp_Potr_final_complete.out
Total alignments: 296836
Total unselected results: 258265
Written to: /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/processed//complete/unselected_lvl0
Total unique transcripts with an alignment: 38571
Reference transcriptome sequences with an alignment (FASTA): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_lvl0
Search results (TSV): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_lvl0
Total unique transcripts without an alignment: 6462
Reference transcriptome sequences without an alignment (FASTA): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/processed//complete/no_hits.faa
Total unique informative alignments: 28439
Total unique uninformative alignments: 10132
Total unique contaminants: 0(0.00%):
Transcriptome reference sequences labeled as a contaminant (FASTA): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
Transcriptome reference sequences labeled as a contaminant (TSV): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
Top 10 alignments by species:
  1)populus trichocarpa: 33957(88.04%)
  2)populus euphratica: 1941(5.03%)
  3)hevea brasiliensis: 299(0.78%)
  4)manihot esculenta: 146(0.38%)
  5)quercus lobata: 116(0.30%)
  6)camellia sinensis: 115(0.30%)
  7)jatropha curcas: 115(0.30%)
  8)quercus suber: 112(0.29%)
  9)pistacia vera: 86(0.22%)
  10)ricinus communis: 84(0.22%)
------------------------------------------------------
Compiled Similarity Search - DIAMOND - Best Overall
------------------------------------------------------
Total unique transcripts with an alignment: 38571
Reference transcriptome sequences with an alignment (FASTA): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_lvl0
Search results (TSV): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_lvl0
Total unique transcripts without an alignment: 6462
Reference transcriptome sequences without an alignment (FASTA): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/overall_results/no_hits.faa
Total unique informative alignments: 28512
Total unique uninformative alignments: 10059
Total unique contaminants: 0(0.00%):
Transcriptome reference sequences labeled as a contaminant (FASTA): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
Transcriptome reference sequences labeled as a contaminant (TSV): /home/FCAM/egrau/gmap/Potr/v2.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
Top 10 alignments by species:
  1)populus trichocarpa: 34012(88.18%)
  2)populus euphratica: 1886(4.89%)
  3)hevea brasiliensis: 300(0.78%)
  4)manihot esculenta: 144(0.37%)
  5)camellia sinensis: 116(0.30%)
  6)quercus lobata: 115(0.30%)
  7)quercus suber: 113(0.29%)
  8)jatropha curcas: 112(0.29%)
  9)ricinus communis: 89(0.23%)
  10)pistacia vera: 86(0.22%)
------------------------------------------------------
Gene Family - Gene Ontology and Pathway - EggNOG
------------------------------------------------------
Statistics for overall Eggnog results:
Total unique sequences with family assignment: 41611
Total unique sequences without family assignment: 3422
Top 10 Taxonomic Scopes Assigned:
  1)Viridiplantae: 34964(84.03%)
  2)Eukaryotes: 5826(14.00%)
  3)Ancestor: 815(1.96%)
  4)Bacteria: 3(0.01%)
  5)Animals: 2(0.00%)
  6)Archaea: 1(0.00%)
Total unique sequences with at least one GO term: 41607
Total unique sequences without GO terms: 4
Total GO terms assigned: 2316009
Total molecular_function terms (lvl=1): 50360
Total unique molecular_function terms (lvl=1): 36
Top 10 molecular_function terms assigned (lvl=1):
  1)GO:0005488-binding(L=1): 20464(40.64%)
  2)GO:0003824-catalytic activity(L=1): 18970(37.67%)
  3)GO:0005215-transporter activity(L=1): 2845(5.65%)
  4)GO:0001071-nucleic acid binding transcription factor activity(L=1): 2655(5.27%)
  5)GO:0005198-structural molecule activity(L=1): 1397(2.77%)
  6)GO:0009055-electron carrier activity(L=1): 1123(2.23%)
  7)GO:0060089-molecular transducer activity(L=1): 932(1.85%)
  8)GO:0004871-signal transducer activity(L=1): 932(1.85%)
  9)GO:0016209-antioxidant activity(L=1): 487(0.97%)
  10)GO:0000988-transcription factor activity, protein binding(L=1): 244(0.48%)
Total cellular_component terms (lvl=1): 69266
Total unique cellular_component terms (lvl=1): 15
Top 10 cellular_component terms assigned (lvl=1):
  1)GO:0005623-cell(L=1): 22703(32.78%)
  2)GO:0043226-organelle(L=1): 17958(25.93%)
  3)GO:0016020-membrane(L=1): 12911(18.64%)
  4)GO:0032991-macromolecular complex(L=1): 5496(7.93%)
  5)GO:0030054-cell junction(L=1): 2517(3.63%)
  6)GO:0005576-extracellular region(L=1): 2514(3.63%)
  7)GO:0031974-membrane-enclosed lumen(L=1): 2456(3.55%)
  8)GO:0055044-symplast(L=1): 2372(3.42%)
  9)GO:0045202-synapse(L=1): 155(0.22%)
  10)GO:0009295-nucleoid(L=1): 109(0.16%)
Total overall terms (lvl=1): 241756
Total unique overall terms (lvl=1): 279
Top 10 overall terms assigned (lvl=1):
  1)GO:0008152-metabolic process(L=1): 23668(9.79%)
  2)GO:0005623-cell(L=1): 22703(9.39%)
  3)GO:0009987-cellular process(L=1): 21925(9.07%)
  4)GO:0005488-binding(L=1): 20464(8.46%)
  5)GO:0003824-catalytic activity(L=1): 18970(7.85%)
  6)GO:0043226-organelle(L=1): 17958(7.43%)
  7)GO:0044699-single-organism process(L=1): 13995(5.79%)
  8)GO:0016020-membrane(L=1): 12911(5.34%)
  9)GO:0050896-response to stimulus(L=1): 11579(4.79%)
  10)GO:0065007-biological regulation(L=1): 10453(4.32%)
Total biological_process terms (lvl=1): 122130
Total unique biological_process terms (lvl=1): 228
Top 10 biological_process terms assigned (lvl=1):
  1)GO:0008152-metabolic process(L=1): 23668(19.38%)
  2)GO:0009987-cellular process(L=1): 21925(17.95%)
  3)GO:0044699-single-organism process(L=1): 13995(11.46%)
  4)GO:0050896-response to stimulus(L=1): 11579(9.48%)
  5)GO:0065007-biological regulation(L=1): 10453(8.56%)
  6)GO:0032501-multicellular organismal process(L=1): 6244(5.11%)
  7)GO:0032502-developmental process(L=1): 6229(5.10%)
  8)GO:0051179-localization(L=1): 6119(5.01%)
  9)GO:0071840-cellular component organization or biogenesis(L=1): 5340(4.37%)
  10)GO:0000003-reproduction(L=1): 3851(3.15%)
Total unique sequences with at least one pathway (KEGG) assignment: 10049
Total unique sequences without pathways (KEGG): 31562
Total pathways (KEGG) assigned: 36236
------------------------------------------------------
Final Annotation Statistics
------------------------------------------------------
Total Sequences: 45033
Similarity Search
  Total unique sequences with an alignment: 38571
  Total unique sequences without an alignment: 6462
Gene Families
  Total unique sequences with family assignment: 41611
  Total unique sequences without family assignment: 3422
  Total unique sequences with at least one GO term: 33531
  Total unique sequences with at least one pathway (KEGG) assignment: 9977
Totals
  Total unique sequences annotated (similarity search alignments only): 366
  Total unique sequences annotated (gene family assignment only): 3406
  Total unique sequences annotated (gene family and/or similarity search): 41977
  Total unique sequences unannotated (gene family and/or similarity search): 3056
EnTAP has completed!
Total runtime (minutes): 1083
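The "Totals" block of the final annotation statistics is an inclusion-exclusion count over the two annotation sources, DIAMOND similarity search and EggNOG gene families. Below is a minimal Python sketch that re-derives those totals using only numbers copied from this log. The helper name `annotation_totals` is hypothetical and is not part of EnTAP. The sketch assumes "alignment only" and "family only" mean sequences annotated by exactly one source, which is how the logged numbers add up.

```python
def annotation_totals(total, with_alignment, with_family,
                      alignment_only, family_only):
    """Re-derive the EnTAP 'Totals' lines via inclusion-exclusion.

    Hypothetical helper for sanity-checking a run log; not part of EnTAP.
    Returns (annotated_by_both, annotated, unannotated).
    """
    # Sequences annotated by both sources, computed two independent ways.
    both = with_alignment - alignment_only
    if both != with_family - family_only:
        raise ValueError("overlap differs between the two sources")
    # |A ∪ F| = |A| + |F| - |A ∩ F|
    annotated = with_alignment + with_family - both
    unannotated = total - annotated
    return both, annotated, unannotated


# Values taken from "Final Annotation Statistics" in this run.
both, annotated, unannotated = annotation_totals(
    total=45033, with_alignment=38571, with_family=41611,
    alignment_only=366, family_only=3406,
)
print(both, annotated, unannotated)  # 38205 41977 3056
```

Both ways of computing the overlap give 38205 sequences. The derived 41977 annotated and 3056 unannotated sequences match the logged totals, and together they sum to the 45033 input sequences.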