File Formats
ORFanage is designed to use simple and common file formats.
Re-Annotated GTF Output
| Column | Example | Description |
|---|---|---|
| SeqID | chr3 | Chromosome ID |
| Source | BestRefSeq | Source of the annotation |
| Type | transcript | Type of the record. Possible values are: transcript/exon/CDS |
| Start | 139577792 | Start coordinate of the record |
| End | 139583319 | End coordinate of the record |
| Score | . | Unused by ORFanage |
| Strand | | Strand of the record |
| Phase | . | Number of bases that must be trimmed from the start of the CDS record to reach the start of the next codon. Only available in CDS records. Possible values are: 0/1/2 |
| Attributes | transcript_id "rna-NR_121609.1" | Field containing extended descriptors of each record |

| Attribute | Example | Description |
|---|---|---|
| transcript_id | rna-NM_001363968.1 | Unique Transcript Identifier |
| gene_id | gene-NMNAT3 | Unique Gene Identifier |
| orfanage_status | 1 | Flag set if ORFanage was able to find an ORF. Possible values are 0 and 1 |
| orfanage_template | ENST00000643695.2 | transcript_id of the reference transcript used to find the best ORF |
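As an illustration of how the columns and attributes above fit together, the following is a minimal Python sketch that splits one GTF record and reads the ORFanage attributes. The function names (`parse_attributes`, `parse_gtf_line`) are hypothetical, the example record is assembled from the example values in the tables with an assumed `+` strand and assumed quoting of attribute values, and the attribute parser is a simplification that does not handle semicolons inside quoted values.

```python
def parse_attributes(field):
    """Turn a GTF attributes string into a dict (simplified: splits on ';')."""
    attrs = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # Values may or may not be quoted; strip quotes if present.
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line):
    """Split one GTF record into its nine columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, rtype, start, end, score, strand, phase, attrs = cols
    return {
        "seqid": seqid,
        "source": source,
        "type": rtype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": parse_attributes(attrs),
    }


# Hypothetical record built from the example values above ('+' strand assumed).
example = "\t".join([
    "chr3", "BestRefSeq", "transcript", "139577792", "139583319", ".", "+", ".",
    'transcript_id "rna-NM_001363968.1"; gene_id "gene-NMNAT3"; '
    'orfanage_status "1"; orfanage_template "ENST00000643695.2";',
])
record = parse_gtf_line(example)
has_orf = record["attributes"].get("orfanage_status") == "1"
```

Here `has_orf` reflects the orfanage_status flag, and `record["attributes"]["orfanage_template"]` gives the reference transcript that was used.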
ORFanage Stats Output (TSV)
The stats output file (--stats) contains detailed metrics for each query/template comparison.
| Column | Description | Example 1 | Example 2 | Example 3 | Example 4 | Example 5 |
|---|---|---|---|---|---|---|
| query_id | Transcript ID of the query transcript. In ALL mode may include CDS coordinates. | rna-NR_121609.1 | rna-NM_001363968.1 | rna-NM_001363968.1 | rna-NM_001363968.1 | rna-XM_011512790.1 |
| template_id | Transcript ID of the template (reference) transcript. ‘-’ if no template found. | ENST00000643695.2 | ENST00000643695.2 | ENST00000643695.2 | ENST00000643695.2 | |
| segment | Genomic coordinates of the CDS segment in format start-end. | 139561010-139561392 | 139573598-139579055 | 139627616-139627724 | 139561010-139579055 | |
| notes | Status annotation: best_gtf, gtf, dup, keep_cds, no-overlap, segment_len<3, cds_len<3. | gtf | dup | gtf | | |
| num_templates | Number of template transcripts contributing to this ORF annotation. | 1 | 1 | 1 | 1 | |
| query_len | Length of the query CDS in nucleotides (bp). | 393 | 513 | 513 | 648 | |
| template_len | Length of the template CDS in nucleotides (bp). | 1041 | 1041 | 1041 | 1041 | |
| union_len | Length of the union of query and template CDS chains in nucleotides. | 1051 | 1168 | 1168 | 1041 | |
| pass | Boolean (0/1) indicating if candidate CDS passed all filtering criteria. | 1 | 1 | 0 | 1 | |
| len_match | Number of nucleotides overlapping between query and template CDS. | 383 | 386 | 386 | 648 | |
| len_inframe | Number of overlapping nucleotides in the same reading frame as template. | 383 | 376 | 376 | 648 | |
| len_outframe | Number of overlapping nucleotides out of frame relative to template. | 0 | 10 | 10 | 0 | |
| len_extra | Nucleotides in query CDS extending beyond template CDS boundaries. | 10 | 127 | 127 | 0 | |
| len_missing | Nucleotides from template CDS not covered by query CDS. | 658 | 655 | 655 | 393 | |
| length_pi | Length Percent Identity: 100 * query_len / union_len. | 37.7522 | 49.2795 | 49.2795 | 62.2478 | |
| match_length_pi | Match Length Percent Identity: 100 * len_match / union_len. Measures actual overlap between query and template CDS, accepting both in-frame and out-of-frame matching positions. | 36.7915 | 37.0797 | 37.0797 | 62.2478 | |
| inframe_length_pi | In-frame Length Percent Identity: 100 * len_inframe / union_len. Measures overlap maintaining correct reading frame. | 36.7915 | 36.1191 | 36.1191 | 62.2478 | |
| alignment_match | Nucleotides with identical sequence (requires --reference for alignment). | 0 | 0 | 0 | 0 | |
| start_match | Boolean (0/1) if query start codon matches template start codon position. | 0 | 0 | 0 | 0 | |
| stop_match | Boolean (0/1) if query stop codon matches template stop codon position. | 0 | 0 | 0 | 0 | |
| pi | Sequence-level percent identity from alignment (requires --reference). | 0 | 0 | 0 | 0 | |
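A stats file with these columns can be filtered with Python's standard `csv` module. The sketch below assumes the TSV begins with a header line carrying the column names listed above, which this page does not state explicitly; `passing_rows` is a hypothetical helper name, and the in-memory sample uses only a subset of columns with values taken from Examples 1 and 3.

```python
import csv
import io


def passing_rows(handle):
    """Yield stats rows whose 'pass' column is 1 (assumes a header line)."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row.get("pass") == "1":
            yield row


# Small in-memory stand-in for a stats file (subset of columns only).
sample = io.StringIO(
    "query_id\ttemplate_id\tpass\tlen_inframe\n"
    "rna-NR_121609.1\tENST00000643695.2\t1\t383\n"
    "rna-NM_001363968.1\tENST00000643695.2\t0\t376\n"
)
kept = list(passing_rows(sample))
print([row["query_id"] for row in kept])
```

For a real file, the same function can be given an open file handle, for example `open(path, newline="")`, in place of the `StringIO` object.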
Key Metrics:
- length_pi (LPI): Measures how much of the combined query+template CDS region is covered by the query. Formula: 100 * query_len / union_len.
- match_length_pi (MLPI): Measures actual overlap between query and template, excluding query extensions. Formula: 100 * len_match / union_len.
- inframe_length_pi (ILPI): The most biologically meaningful metric; measures what fraction maintains the correct reading frame. Formula: 100 * len_inframe / union_len.
- pi: Requires genome sequence (--reference); measures actual nucleotide sequence identity in the aligned regions at the codon level.
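The three length-based formulas can be written out directly. The following is a minimal sketch of the formulas as stated on this page, not ORFanage's own implementation; `length_metrics` is a hypothetical name, and the inputs are taken from Example 4 of the stats table.

```python
def length_metrics(query_len, len_match, len_inframe, union_len):
    """Compute LPI, MLPI and ILPI as percentages of union_len."""
    if union_len <= 0:
        raise ValueError("union_len must be positive")
    return {
        "length_pi": 100 * query_len / union_len,
        "match_length_pi": 100 * len_match / union_len,
        "inframe_length_pi": 100 * len_inframe / union_len,
    }


# Inputs from Example 4: query_len=648, len_match=648,
# len_inframe=648, union_len=1041.
metrics = length_metrics(
    query_len=648, len_match=648, len_inframe=648, union_len=1041
)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because len_inframe can never exceed len_match, ILPI is always less than or equal to MLPI; a gap between the two indicates overlap that falls out of frame.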
ORFcompare Stats Output (TSV)
The orfcompare output file compares CDS annotations between query and template transcripts.
| Column | Description | Example 1 | Example 2 | Example 3 |
|---|---|---|---|---|
| query_id | Transcript ID of the query transcript. | CHS.8812.2 | CHS.8812.5 | CHS.8812.6 |
| template_id | Transcript ID of the template transcript. ‘-’ if no overlapping template. | ENST00000300146.10 | ENST00000300146.10 | |
| query_len | Length of the query CDS in nucleotides (bp). | 1566 | 2310 | 2223 |
| template_len | Length of the template CDS in nucleotides (bp). | 2313 | 2313 | |
| len_match | Number of nucleotides overlapping between query and template CDS. | 1566 | 2310 | |
| len_inframe | Number of overlapping nucleotides in the same reading frame as template. | 1566 | 2310 | |
| len_outframe | Number of overlapping nucleotides out of frame relative to template. | 0 | 0 | |
| len_extra | Nucleotides in query CDS extending beyond template CDS boundaries. | 0 | 0 | |
| len_missing | Nucleotides from template CDS not covered by query CDS. | 747 | 3 | |
| lpi | Length Percent Identity: 100 * query_len / union_len. | 92.5532 | 99.8703 | |
| mlpi | Match Length Percent Identity: 100 * len_match / union_len. | 92.5532 | 99.8703 | |
| ilpi | In-frame Length Percent Identity: 100 * len_inframe / union_len. | 92.5532 | 99.8703 | |
| query_start_codon | Amino acid at query CDS start position (requires --reference). ‘M’ for methionine. | M | M | |
| template_start_codon | Amino acid at template CDS start position (requires --reference). | M | M | |
| query_stop_codon | Amino acid at query CDS end position (requires --reference). ‘.’ for stop codon. | . | R | |
| template_stop_codon | Amino acid at template CDS end position (requires --reference). | . | . | |
Codon Columns (require --reference):
- query_start_codon / template_start_codon: Amino acid at the CDS start. ‘M’ (methionine) indicates a canonical start codon.
- query_stop_codon / template_stop_codon: Amino acid at the CDS end. ‘.’ indicates a proper stop codon; any other letter indicates the ORF is incomplete.
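The codon columns lend themselves to a simple completeness check per query. The sketch below is one possible way to apply the rules above: `orf_completeness` and its label strings are hypothetical, and treating an empty codon value as "unknown" (for instance when the columns were not populated) is an assumption rather than documented behaviour. The rows reuse the query-side values from the three examples in the table.

```python
def orf_completeness(row):
    """Classify one orfcompare row by its query start/stop codon columns."""
    start = row.get("query_start_codon", "")
    stop = row.get("query_stop_codon", "")
    if not start or not stop:
        # Assumption: an empty value means the codon could not be reported.
        return "unknown"
    has_start = start == "M"  # 'M' marks a canonical start codon
    has_stop = stop == "."    # '.' marks a proper stop codon
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "missing_stop"
    if has_stop:
        return "missing_start"
    return "missing_both"


rows = [
    {"query_id": "CHS.8812.2", "query_start_codon": "M", "query_stop_codon": "."},
    {"query_id": "CHS.8812.5", "query_start_codon": "M", "query_stop_codon": "R"},
    {"query_id": "CHS.8812.6", "query_start_codon": "", "query_stop_codon": ""},
]
labels = {row["query_id"]: orf_completeness(row) for row in rows}
print(labels)
```

The same function can be applied to the template_start_codon and template_stop_codon columns by passing the corresponding keys instead.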