------------------------------------------------------
EnTAP Run Information - Execution
------------------------------------------------------
Current EnTAP Version: 0.9.1
Start time: Fri Apr 24 20:00:13 2020
Working directory has been set to: ./en_results/

Execution Paths/Commands:
    RSEM Directory: /labs/Wegrzyn/EnTAP/libs/RSEM-1.3.0
    GeneMarkS-T: perl /labs/Wegrzyn/EnTAP/libs/gmst_linux_64/gmst.pl
    DIAMOND: diamond
    InterPro: interproscan.sh
    EggNOG SQL Database: /labs/Wegrzyn/EnTAP/EnTAP_v0.9.0/EnTAP/databases/databases/eggnog.db
    EggNOG DIAMOND Database: /labs/Wegrzyn/EnTAP/EnTAP_v0.9.0/EnTAP/databases/bin/eggnog_proteins.dmnd
    EnTAP Database (binary): /labs/Wegrzyn/EnTAP/EnTAP_v0.9.0/EnTAP/databases/bin/entap_database.bin
    EnTAP Database (SQL): /labs/Wegrzyn/EnTAP/EnTAP_v0.9.1/EnTAP//databases/entap_database.db
    EnTAP Graphing Script: /labs/Wegrzyn/EnTAP/EnTAP_v0.9.1/EnTAP//src/entap_graphing.py

User Inputs:
    contam: null
    data-type: 0
    database: /isg/shared/databases/Diamond/Uniprot/uniprot_sprot.dmnd /isg/shared/databases/Diamond/RefSeq/complete.protein.faa.92.dmnd /isg/shared/databases/Diamond/Uniprot/uniprot_sprot.dmnd
    e: 1.00e-05
    fpkm: 0.50
    input: ./annotation/Frex.0_4.pep.fa
    level: 0 3 4
    ontology: 0
    out-dir: ./en_results/
    output-format: 1 4 3
    paths: /labs/Wegrzyn/EnTAP/EnTAP_v0.9.1/EnTAP/entap_config.txt
    protein: pfam
    qcoverage: 50.00
    runP: null
    state: +
    tcoverage: 50.00
    threads: 25

------------------------------------------------------
Transcriptome Statistics
------------------------------------------------------
Protein sequences found
Total sequences: 59154
Total length of transcriptome(bp): 71852970
Average sequence length(bp): 1214.00
n50: 1527
n90: 645
Longest sequence(bp): 16566 (FRAEX38873_1.0_000227250.3)
Shortest sequence(bp): 132 (FRAEX38873_1.0_000271540.1)

------------------------------------------------------
Similarity Search - DIAMOND - uniprot_sprot
------------------------------------------------------
Search results: ./en_results//similarity_search/DIAMOND/blastp_Frex_final_uniprot_sprot.out
    Total alignments: 62868
    Total unselected results: 28655
        Written to: ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/unselected.tsv
    Total unique transcripts with an alignment: 34213
        Reference transcriptome sequences with an alignment (FASTA): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_lvl0
        Search results (TSV): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_lvl0
    Total unique transcripts without an alignment: 24941
        Reference transcriptome sequences without an alignment (FASTA): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/no_hits.faa
    Total unique informative alignments: 32801
    Total unique uninformative alignments: 1412
    Total unique contaminants: 0(0.00%):
        Transcriptome reference sequences labeled as a contaminant (FASTA): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_contam_lvl0
        Transcriptome reference sequences labeled as a contaminant (TSV): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_contam_lvl0
    Top 10 alignments by species:
        1)arabidopsis thaliana: 24647(72.04%)
        2)oryza sativa subsp. japonica: 1353(3.95%)
        3)nicotiana tabacum: 504(1.47%)
        4)homo sapiens: 427(1.25%)
        5)solanum lycopersicum: 414(1.21%)
        6)mus musculus: 385(1.13%)
        7)solanum tuberosum: 302(0.88%)
        8)dictyostelium discoideum: 299(0.87%)
        9)oryza sativa subsp. indica: 255(0.75%)
        10)bos taurus: 218(0.64%)

------------------------------------------------------
Similarity Search - DIAMOND - complete
------------------------------------------------------
Search results: ./en_results//similarity_search/DIAMOND/blastp_Frex_final_complete.out
    Total alignments: 443549
    Total unselected results: 393573
        Written to: ./en_results//similarity_search/DIAMOND/processed//complete/unselected.tsv
    Total unique transcripts with an alignment: 49976
        Reference transcriptome sequences with an alignment (FASTA): ./en_results//similarity_search/DIAMOND/processed//complete/best_hits_lvl0
        Search results (TSV): ./en_results//similarity_search/DIAMOND/processed//complete/best_hits_lvl0
    Total unique transcripts without an alignment: 9178
        Reference transcriptome sequences without an alignment (FASTA): ./en_results//similarity_search/DIAMOND/processed//complete/no_hits.faa
    Total unique informative alignments: 37813
    Total unique uninformative alignments: 12163
    Total unique contaminants: 0(0.00%):
        Transcriptome reference sequences labeled as a contaminant (FASTA): ./en_results//similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
        Transcriptome reference sequences labeled as a contaminant (TSV): ./en_results//similarity_search/DIAMOND/processed//complete/best_hits_contam_lvl0
    Top 10 alignments by species:
        1)olea europaea var. sylvestris: 40524(81.09%)
        2)sesamum indicum: 3912(7.83%)
        3)coffea arabica: 524(1.05%)
        4)nicotiana tabacum: 397(0.79%)
        5)erythranthe guttata: 340(0.68%)
        6)coffea eugenioides: 255(0.51%)
        7)solanum lycopersicum: 202(0.40%)
        8)ziziphus jujuba: 188(0.38%)
        9)juglans regia: 137(0.27%)
        10)nicotiana sylvestris: 136(0.27%)

------------------------------------------------------
Similarity Search - DIAMOND - uniprot_sprot
------------------------------------------------------
Search results: ./en_results//similarity_search/DIAMOND/blastp_Frex_final_uniprot_sprot.out
    Total alignments: 125736
    Total unselected results: 91523
        Written to: ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/unselected.tsv
    Total unique transcripts with an alignment: 34213
        Reference transcriptome sequences with an alignment (FASTA): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_lvl0
        Search results (TSV): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_lvl0
    Total unique transcripts without an alignment: 24941
        Reference transcriptome sequences without an alignment (FASTA): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/no_hits.faa
    Total unique informative alignments: 32801
    Total unique uninformative alignments: 1412
    Total unique contaminants: 0(0.00%):
        Transcriptome reference sequences labeled as a contaminant (FASTA): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_contam_lvl0
        Transcriptome reference sequences labeled as a contaminant (TSV): ./en_results//similarity_search/DIAMOND/processed//uniprot_sprot/best_hits_contam_lvl0
    Top 10 alignments by species:
        1)arabidopsis thaliana: 24633(72.00%)
        2)oryza sativa subsp. japonica: 1354(3.96%)
        3)nicotiana tabacum: 506(1.48%)
        4)homo sapiens: 426(1.25%)
        5)solanum lycopersicum: 414(1.21%)
        6)mus musculus: 386(1.13%)
        7)solanum tuberosum: 303(0.89%)
        8)dictyostelium discoideum: 299(0.87%)
        9)oryza sativa subsp. indica: 256(0.75%)
        10)bos taurus: 214(0.63%)

------------------------------------------------------
Compiled Similarity Search - DIAMOND - Best Overall
------------------------------------------------------
    Total unique transcripts with an alignment: 49998
        Reference transcriptome sequences with an alignment (FASTA): ./en_results//similarity_search/DIAMOND/overall_results/best_hits_lvl0
        Search results (TSV): ./en_results//similarity_search/DIAMOND/overall_results/best_hits_lvl0
    Total unique transcripts without an alignment: 9156
        Reference transcriptome sequences without an alignment (FASTA): ./en_results//similarity_search/DIAMOND/overall_results/no_hits.faa
    Total unique informative alignments: 38268
    Total unique uninformative alignments: 11730
    Total unique contaminants: 0(0.00%):
        Transcriptome reference sequences labeled as a contaminant (FASTA): ./en_results//similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
        Transcriptome reference sequences labeled as a contaminant (TSV): ./en_results//similarity_search/DIAMOND/overall_results/best_hits_contam_lvl0
    Top 10 alignments by species:
        1)olea europaea var. sylvestris: 32496(64.99%)
        2)arabidopsis thaliana: 6793(13.59%)
        3)sesamum indicum: 2919(5.84%)
        4)nicotiana tabacum: 655(1.31%)
        5)coffea arabica: 435(0.87%)
        6)oryza sativa subsp. japonica: 402(0.80%)
        7)solanum lycopersicum: 386(0.77%)
        8)erythranthe guttata: 294(0.59%)
        9)solanum tuberosum: 239(0.48%)
        10)coffea eugenioides: 212(0.42%)

------------------------------------------------------
Gene Family - Gene Ontology and Pathway - EggNOG
------------------------------------------------------
Statistics for overall Eggnog results:
Total unique sequences with family assignment: 56308
Total unique sequences without family assignment: 2846
Top 10 Taxonomic Scopes Assigned:
    1)Viridiplantae: 54117(96.11%)
    2)Eukaryotes: 1757(3.12%)
    3)Bacteria: 216(0.38%)
    4)Ancestor: 187(0.33%)
    5)Fungi: 21(0.04%)
    6)Animals: 3(0.01%)
    7)Arthropoda: 3(0.01%)
    8)Apicomplexa: 2(0.00%)
    9)Fishes: 1(0.00%)
    10)Opisthokonts: 1(0.00%)
Total unique sequences with at least one GO term: 56304
Total unique sequences without GO terms: 4
Total GO terms assigned: 2820502

Total molecular_function terms (lvl=0): 578846
Total unique molecular_function terms (lvl=0): 2987
Top 10 molecular_function terms assigned (lvl=0):
    1)GO:0003674-molecular_function(L=0): 39472(6.82%)
    2)GO:0005488-binding(L=1): 27852(4.81%)
    3)GO:0003824-catalytic activity(L=1): 24917(4.30%)
    4)GO:0097159-organic cyclic compound binding(L=2): 20313(3.51%)
    5)GO:1901363-heterocyclic compound binding(L=2): 20293(3.51%)
    6)GO:0043167-ion binding(L=2): 18642(3.22%)
    7)GO:0003676-nucleic acid binding(L=3): 11834(2.04%)
    8)GO:0043169-cation binding(L=3): 11177(1.93%)
    9)GO:0046872-metal ion binding(L=4): 10933(1.89%)
    10)GO:0036094-small molecule binding(L=2): 10828(1.87%)

Total cellular_component terms (lvl=0): 625259
Total unique cellular_component terms (lvl=0): 1082
Top 10 cellular_component terms assigned (lvl=0):
    1)GO:0005575-cellular_component(L=0): 37458(5.99%)
    2)GO:0005623-cell(L=1): 34068(5.45%)
    3)GO:0044464-cell part(L=2): 34068(5.45%)
    4)GO:0005622-intracellular(L=3): 30092(4.81%)
    5)GO:0044424-intracellular part(L=3): 29514(4.72%)
    6)GO:0043226-organelle(L=1): 26335(4.21%)
    7)GO:0043229-intracellular organelle(L=3): 26329(4.21%)
    8)GO:0043227-membrane-bounded organelle(L=2): 25316(4.05%)
    9)GO:0043231-intracellular membrane-bounded organelle(L=4): 25310(4.05%)
    10)GO:0005737-cytoplasm(L=4): 22130(3.54%)

Total overall terms (lvl=0): 2820502
Total unique overall terms (lvl=0): 11540
Top 10 overall terms assigned (lvl=0):
    1)GO:0008150-biological_process(L=0): 41335(1.47%)
    2)GO:0003674-molecular_function(L=0): 39472(1.40%)
    3)GO:0005575-cellular_component(L=0): 37458(1.33%)
    4)GO:0005623-cell(L=1): 34068(1.21%)
    5)GO:0044464-cell part(L=2): 34068(1.21%)
    6)GO:0008152-metabolic process(L=1): 32621(1.16%)
    7)GO:0009987-cellular process(L=1): 31064(1.10%)
    8)GO:0005622-intracellular(L=3): 30092(1.07%)
    9)GO:0044424-intracellular part(L=3): 29514(1.05%)
    10)GO:0005488-binding(L=1): 27852(0.99%)

Total biological_process terms (lvl=0): 1616397
Total unique biological_process terms (lvl=0): 7471
Top 10 biological_process terms assigned (lvl=0):
    1)GO:0008150-biological_process(L=0): 41335(2.56%)
    2)GO:0008152-metabolic process(L=1): 32621(2.02%)
    3)GO:0009987-cellular process(L=1): 31064(1.92%)
    4)GO:0071704-organic substance metabolic process(L=2): 26001(1.61%)
    5)GO:0044237-cellular metabolic process(L=2): 25149(1.56%)
    6)GO:0044238-primary metabolic process(L=2): 24973(1.54%)
    7)GO:0043170-macromolecule metabolic process(L=3): 19261(1.19%)
    8)GO:0044699-single-organism process(L=1): 18838(1.17%)
    9)GO:0044260-cellular macromolecule metabolic process(L=3): 17354(1.07%)
    10)GO:0044763-single-organism cellular process(L=2): 15818(0.98%)

Total molecular_function terms (lvl=3): 117718
Total unique molecular_function terms (lvl=3): 270
Top 10 molecular_function terms assigned (lvl=3):
    1)GO:0003676-nucleic acid binding(L=3): 11834(10.05%)
    2)GO:0043169-cation binding(L=3): 11177(9.49%)
    3)GO:0000166-nucleotide binding(L=3): 10631(9.03%)
    4)GO:1901265-nucleoside phosphate binding(L=3): 10631(9.03%)
    5)GO:0043168-anion binding(L=3): 9847(8.36%)
    6)GO:0032553-ribonucleotide binding(L=3): 8466(7.19%)
    7)GO:0001882-nucleoside binding(L=3): 8351(7.09%)
    8)GO:0016772-transferase activity, transferring phosphorus-containing groups(L=3): 6166(5.24%)
    9)GO:0016817-hydrolase activity, acting on acid anhydrides(L=3): 3089(2.62%)
    10)GO:0016788-hydrolase activity, acting on ester bonds(L=3): 3024(2.57%)

Total cellular_component terms (lvl=3): 165458
Total unique cellular_component terms (lvl=3): 120
Top 10 cellular_component terms assigned (lvl=3):
    1)GO:0005622-intracellular(L=3): 30092(18.19%)
    2)GO:0044424-intracellular part(L=3): 29514(17.84%)
    3)GO:0043229-intracellular organelle(L=3): 26329(15.91%)
    4)GO:0044446-intracellular organelle part(L=3): 12291(7.43%)
    5)GO:0071944-cell periphery(L=3): 11040(6.67%)
    6)GO:0005886-plasma membrane(L=3): 9515(5.75%)
    7)GO:0031224-intrinsic component of membrane(L=3): 8701(5.26%)
    8)GO:0031090-organelle membrane(L=3): 5556(3.36%)
    9)GO:0043232-intracellular non-membrane-bounded organelle(L=3): 4597(2.78%)
    10)GO:0031975-envelope(L=3): 3444(2.08%)

Total overall terms (lvl=3): 644354
Total unique overall terms (lvl=3): 806
Top 10 overall terms assigned (lvl=3):
    1)GO:0005622-intracellular(L=3): 30092(4.67%)
    2)GO:0044424-intracellular part(L=3): 29514(4.58%)
    3)GO:0043229-intracellular organelle(L=3): 26329(4.09%)
    4)GO:0043170-macromolecule metabolic process(L=3): 19261(2.99%)
    5)GO:0044260-cellular macromolecule metabolic process(L=3): 17354(2.69%)
    6)GO:1901360-organic cyclic compound metabolic process(L=3): 12474(1.94%)
    7)GO:0044446-intracellular organelle part(L=3): 12291(1.91%)
    8)GO:0006725-cellular aromatic compound metabolic process(L=3): 12198(1.89%)
    9)GO:0034641-cellular nitrogen compound metabolic process(L=3): 12039(1.87%)
    10)GO:0046483-heterocycle metabolic process(L=3): 11947(1.85%)

Total biological_process terms (lvl=3): 361178
Total unique biological_process terms (lvl=3): 416
Top 10 biological_process terms assigned (lvl=3):
    1)GO:0043170-macromolecule metabolic process(L=3): 19261(5.33%)
    2)GO:0044260-cellular macromolecule metabolic process(L=3): 17354(4.80%)
    3)GO:1901360-organic cyclic compound metabolic process(L=3): 12474(3.45%)
    4)GO:0006725-cellular aromatic compound metabolic process(L=3): 12198(3.38%)
    5)GO:0034641-cellular nitrogen compound metabolic process(L=3): 12039(3.33%)
    6)GO:0046483-heterocycle metabolic process(L=3): 11947(3.31%)
    7)GO:1901576-organic substance biosynthetic process(L=3): 11925(3.30%)
    8)GO:0050794-regulation of cellular process(L=3): 11909(3.30%)
    9)GO:0044249-cellular biosynthetic process(L=3): 11793(3.27%)
    10)GO:0006139-nucleobase-containing compound metabolic process(L=3): 11107(3.08%)

Total molecular_function terms (lvl=4): 110420
Total unique molecular_function terms (lvl=4): 553
Top 10 molecular_function terms assigned (lvl=4):
    1)GO:0046872-metal ion binding(L=4): 10933(9.90%)
    2)GO:0017076-purine nucleotide binding(L=4): 8367(7.58%)
    3)GO:0032555-purine ribonucleotide binding(L=4): 8361(7.57%)
    4)GO:0032549-ribonucleoside binding(L=4): 8336(7.55%)
    5)GO:0001883-purine nucleoside binding(L=4): 8316(7.53%)
    6)GO:0035639-purine ribonucleoside triphosphate binding(L=4): 8100(7.34%)
    7)GO:0003677-DNA binding(L=4): 7235(6.55%)
    8)GO:0016301-kinase activity(L=4): 4792(4.34%)
    9)GO:0016773-phosphotransferase activity, alcohol group as acceptor(L=4): 4035(3.65%)
    10)GO:0003723-RNA binding(L=4): 3565(3.23%)

Total cellular_component terms (lvl=4): 106440
Total unique cellular_component terms (lvl=4): 198
Top 10 cellular_component terms assigned (lvl=4):
    1)GO:0043231-intracellular membrane-bounded organelle(L=4): 25310(23.78%)
    2)GO:0005737-cytoplasm(L=4): 22130(20.79%)
    3)GO:0044444-cytoplasmic part(L=4): 20135(18.92%)
    4)GO:0016021-integral component of membrane(L=4): 8268(7.77%)
    5)GO:0005829-cytosol(L=4): 6232(5.85%)
    6)GO:0005794-Golgi apparatus(L=4): 3168(2.98%)
    7)GO:0005618-cell wall(L=4): 2467(2.32%)
    8)GO:0030529-intracellular ribonucleoprotein complex(L=4): 2336(2.19%)
    9)GO:0005783-endoplasmic reticulum(L=4): 2327(2.19%)
    10)GO:0009579-thylakoid(L=4): 1486(1.40%)

Total overall terms (lvl=4): 584229
Total unique overall terms (lvl=4): 2013
Top 10 overall terms assigned (lvl=4):
    1)GO:0043231-intracellular membrane-bounded organelle(L=4): 25310(4.33%)
    2)GO:0005737-cytoplasm(L=4): 22130(3.79%)
    3)GO:0044444-cytoplasmic part(L=4): 20135(3.45%)
    4)GO:0046872-metal ion binding(L=4): 10933(1.87%)
    5)GO:0044267-cellular protein metabolic process(L=4): 9578(1.64%)
    6)GO:0090304-nucleic acid metabolic process(L=4): 9027(1.55%)
    7)GO:0006796-phosphate-containing compound metabolic process(L=4): 8498(1.45%)
    8)GO:0017076-purine nucleotide binding(L=4): 8367(1.43%)
    9)GO:0032555-purine ribonucleotide binding(L=4): 8361(1.43%)
    10)GO:0032549-ribonucleoside binding(L=4): 8336(1.43%)

Total biological_process terms (lvl=4): 367369
Total unique biological_process terms (lvl=4): 1262
Top 10 biological_process terms assigned (lvl=4):
    1)GO:0044267-cellular protein metabolic process(L=4): 9578(2.61%)
    2)GO:0090304-nucleic acid metabolic process(L=4): 9027(2.46%)
    3)GO:0006796-phosphate-containing compound metabolic process(L=4): 8498(2.31%)
    4)GO:0009059-macromolecule biosynthetic process(L=4): 8331(2.27%)
    5)GO:0034645-cellular macromolecule biosynthetic process(L=4): 8211(2.24%)
    6)GO:0010467-gene expression(L=4): 8081(2.20%)
    7)GO:0031323-regulation of cellular metabolic process(L=4): 7951(2.16%)
    8)GO:0080090-regulation of primary metabolic process(L=4): 7775(2.12%)
    9)GO:0060255-regulation of macromolecule metabolic process(L=4): 7610(2.07%)
    10)GO:0043412-macromolecule modification(L=4): 7194(1.96%)

Total unique sequences with at least one pathway (KEGG) assignment: 14588
Total unique sequences without pathways (KEGG): 41720
Total pathways (KEGG) assigned: 49928

------------------------------------------------------
Final Annotation Statistics
------------------------------------------------------
Total Sequences: 59154
Similarity Search
    Total unique sequences with an alignment: 49998
    Total unique sequences without an alignment: 9156
Gene Families
    Total unique sequences with family assignment: 56308
    Total unique sequences without family assignment: 2846
    Total unique sequences with at least one GO term: 47662
    Total unique sequences with at least one pathway (KEGG) assignment: 14435
Totals
    Total unique sequences annotated (similarity search alignments only): 311
    Total unique sequences annotated (gene family assignment only): 6621
    Total unique sequences annotated (gene family and/or similarity search): 56619
    Total unique sequences unannotated (gene family and/or similarity search): 2535

EnTAP has completed!
Total runtime (minutes): 981