------------------------------------------------------
EnTAP Run Information - Execution
------------------------------------------------------
Current EnTAP Version: 0.10.4
Start time: Thu Sep 24 12:44:26 2020

Working directory has been set to: /home/FCAM/egrau/gmap/Potr/v1.0/entap_results

User Inputs:

out-dir: /home/FCAM/egrau/gmap/Potr/v1.0/entap_results
config: false
runP: true
runN: false
overwrite: false
ini: /home/FCAM/egrau/gmap/Potr/v1.0/entap_config.ini
input: Potr.1_0.filtered.pep.fa
database: /isg/shared/databases/Diamond/RefSeq/complete.protein.faa.200.dmnd,
graph: false
no-trim: false
threads: 12
state: +
no-check: false
output-format: 1,3,4,
entap-db-bin: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/entap_database.bin
entap-db-sql:
entap-graph: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/src/entap_graphing.py
data-generate: false
data-type: 0,
fpkm: 0.500000
align:
single-end: false
rsem-calculate-expression: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-calculate-expression
rsem-sam-validator: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-sam-validator
rsem-prepare-reference: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/rsem-prepare-reference
convert-sam-for-rsem: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/RSEM-1.3.0/convert-sam-for-rsem
complete: false
frame-selection: 2
genemarkst-exe: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/libs/gmst_linux_64/gmst.pl
transdecoder-long-exe: TransDecoder.LongOrfs
transdecoder-predict-exe: TransDecoder.Predict
transdecoder-m: 100
diamond-exe: diamond
taxon: Populus
qcoverage: 50.000000
tcoverage: 50.000000
contam:
e-value: 0.000010
uninformative:
ontology: 0,
level: 1,
eggnog-sql: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/eggnog.db
eggnog-dmnd: /labs/Wegrzyn/EnTAP/EnTAP_v0.10.4/EnTAP/databases/eggnog_proteins.dmnd
interproscan-exe: interproscan.sh
protein:

------------------------------------------------------
Transcriptome Statistics
------------------------------------------------------
Protein sequences found
Total sequences: 58036
Total length of transcriptome(bp): 58875123
Average sequence length(bp): 1014.00
n50: 1344
n90: 528
Longest sequence(bp): 15816 (jgi|Poptr1|95010|fgenesh1_pg.C_scaffold_64000031)
Shortest sequence(bp): 39 (jgi|Poptr1|566091|eugene3.00100649)

------------------------------------------------------
Similarity Search - DIAMOND - complete
------------------------------------------------------
Search results:
    /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/blastp_Potr_final_complete.out
    Total alignments: 367620
    Total unselected results: 325975
        Written to: /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/unselected_lvl0
    Total unique transcripts with an alignment: 41645
        Reference transcriptome sequences with an alignment (FASTA):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_lvl0
        Search results (TSV):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_lvl0
    Total unique transcripts without an alignment: 16391
        Reference transcriptome sequences without an alignment (FASTA):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/no_hits.faa
    Total unique informative alignments: 26301
    Total unique uninformative alignments: 15344
    Total unique contaminants: 0(0.00%):
        Transcriptome reference sequences labeled as a contaminant (FASTA):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
        Transcriptome reference sequences labeled as a contaminant (TSV):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
    Top 10 alignments by species:
        1)populus trichocarpa: 30038(72.13%)
        2)populus euphratica: 2967(7.12%)
        3)hevea brasiliensis: 425(1.02%)
        4)camellia sinensis: 404(0.97%)
        5)pyrus x bretschneideri: 304(0.73%)
        6)nicotiana tomentosiformis: 288(0.69%)
        7)pistacia vera: 268(0.64%)
        8)quercus suber: 258(0.62%)
        9)olea europaea var. sylvestris: 255(0.61%)
        10)quercus lobata: 217(0.52%)

------------------------------------------------------
Compiled Similarity Search - DIAMOND - Best Overall
------------------------------------------------------
    Total unique transcripts with an alignment: 41645
        Reference transcriptome sequences with an alignment (FASTA):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_lvl0
        Search results (TSV):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_lvl0
    Total unique transcripts without an alignment: 16391
        Reference transcriptome sequences without an alignment (FASTA):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/overall_results/no_hits.faa
    Total unique informative alignments: 26364
    Total unique uninformative alignments: 15281
    Total unique contaminants: 0(0.00%):
        Transcriptome reference sequences labeled as a contaminant (FASTA):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
        Transcriptome reference sequences labeled as a contaminant (TSV):
            /home/FCAM/egrau/gmap/Potr/v1.0/entap_results/similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
    Top 10 alignments by species:
        1)populus trichocarpa: 30084(72.24%)
        2)populus euphratica: 2922(7.02%)
        3)hevea brasiliensis: 423(1.02%)
        4)camellia sinensis: 395(0.95%)
        5)pyrus x bretschneideri: 301(0.72%)
        6)nicotiana tomentosiformis: 285(0.68%)
        7)pistacia vera: 269(0.65%)
        8)olea europaea var. sylvestris: 262(0.63%)
        9)quercus suber: 249(0.60%)
        10)quercus lobata: 218(0.52%)

------------------------------------------------------
Gene Family - Gene Ontology and Pathway - EggNOG
------------------------------------------------------
Statistics for overall Eggnog results:
Total unique sequences with family assignment: 49308
Total unique sequences without family assignment: 8728
Top 10 Taxonomic Scopes Assigned:
    1)Viridiplantae: 38983(79.06%)
    2)Eukaryotes: 7342(14.89%)
    3)Bacteria: 1980(4.02%)
    4)Ancestor: 925(1.88%)
    5)Mammals: 28(0.06%)
    6)Animals: 27(0.05%)
    7)Archaea: 14(0.03%)
    8)Opisthokonts: 5(0.01%)
    9)Arthropoda: 3(0.01%)
    10)Fungi: 1(0.00%)
Total unique sequences with at least one GO term: 49305
Total unique sequences without GO terms: 3
Total GO terms assigned: 2605054
Total molecular_function terms (lvl=1): 60249
Total unique molecular_function terms (lvl=1): 34
Top 10 molecular_function terms assigned (lvl=1):
    1)GO:0005488-binding(L=1): 24704(41.00%)
    2)GO:0003824-catalytic activity(L=1): 23047(38.25%)
    3)GO:0005215-transporter activity(L=1): 3353(5.57%)
    4)GO:0001071-nucleic acid binding transcription factor activity(L=1): 2729(4.53%)
    5)GO:0005198-structural molecule activity(L=1): 1499(2.49%)
    6)GO:0009055-electron carrier activity(L=1): 1433(2.38%)
    7)GO:0060089-molecular transducer activity(L=1): 1206(2.00%)
    8)GO:0004871-signal transducer activity(L=1): 1206(2.00%)
    9)GO:0016209-antioxidant activity(L=1): 525(0.87%)
    10)GO:0000988-transcription factor activity, protein binding(L=1): 247(0.41%)
Total cellular_component terms (lvl=1): 74695
Total unique cellular_component terms (lvl=1): 15
Top 10 cellular_component terms assigned (lvl=1):
    1)GO:0005623-cell(L=1): 24975(33.44%)
    2)GO:0043226-organelle(L=1): 18828(25.21%)
    3)GO:0016020-membrane(L=1): 14199(19.01%)
    4)GO:0032991-macromolecular complex(L=1): 6273(8.40%)
    5)GO:0005576-extracellular region(L=1): 2640(3.53%)
    6)GO:0030054-cell junction(L=1): 2600(3.48%)
    7)GO:0031974-membrane-enclosed lumen(L=1): 2346(3.14%)
    8)GO:0055044-symplast(L=1): 2327(3.12%)
    9)GO:0045202-synapse(L=1): 273(0.37%)
    10)GO:0009295-nucleoid(L=1): 121(0.16%)
Total overall terms (lvl=1): 268854
Total unique overall terms (lvl=1): 270
Top 10 overall terms assigned (lvl=1):
    1)GO:0008152-metabolic process(L=1): 28063(10.44%)
    2)GO:0009987-cellular process(L=1): 25590(9.52%)
    3)GO:0005623-cell(L=1): 24975(9.29%)
    4)GO:0005488-binding(L=1): 24704(9.19%)
    5)GO:0003824-catalytic activity(L=1): 23047(8.57%)
    6)GO:0043226-organelle(L=1): 18828(7.00%)
    7)GO:0044699-single-organism process(L=1): 15133(5.63%)
    8)GO:0016020-membrane(L=1): 14199(5.28%)
    9)GO:0050896-response to stimulus(L=1): 12482(4.64%)
    10)GO:0065007-biological regulation(L=1): 11064(4.12%)
Total biological_process terms (lvl=1): 133910
Total unique biological_process terms (lvl=1): 221
Top 10 biological_process terms assigned (lvl=1):
    1)GO:0008152-metabolic process(L=1): 28063(20.96%)
    2)GO:0009987-cellular process(L=1): 25590(19.11%)
    3)GO:0044699-single-organism process(L=1): 15133(11.30%)
    4)GO:0050896-response to stimulus(L=1): 12482(9.32%)
    5)GO:0065007-biological regulation(L=1): 11064(8.26%)
    6)GO:0051179-localization(L=1): 6834(5.10%)
    7)GO:0032501-multicellular organismal process(L=1): 6216(4.64%)
    8)GO:0032502-developmental process(L=1): 6152(4.59%)
    9)GO:0071840-cellular component organization or biogenesis(L=1): 5623(4.20%)
    10)GO:0023052-signaling(L=1): 3821(2.85%)
Total unique sequences with at least one pathway (KEGG) assignment: 11132
Total unique sequences without pathways (KEGG): 38176
Total pathways (KEGG) assigned: 41026

------------------------------------------------------
Final Annotation Statistics
------------------------------------------------------
Total Sequences: 58036
Similarity Search
    Total unique sequences with an alignment: 41645
    Total unique sequences without an alignment: 16391
Gene Families
    Total unique sequences with family assignment: 49308
    Total unique sequences without family assignment: 8728
    Total unique sequences with at least one GO term: 38702
    Total unique sequences with at least one pathway (KEGG) assignment: 11058
Totals
    Total unique sequences annotated (similarity search alignments only): 891
    Total unique sequences annotated (gene family assignment only): 8554
    Total unique sequences annotated (gene family and/or similarity search): 50199
    Total unique sequences unannotated (gene family and/or similarity search): 7837

EnTAP has completed!
Total runtime (minutes): 1423