File Formats

ORFanage is designed to use simple and common file formats.

Re-Annotated GTF Output

Re-annotated GTF Output (TAB-separated)

| Column | Example | Description |
| --- | --- | --- |
| SeqID | chr3 | Chromosome ID |
| Source | BestRefSeq | Source of the annotation |
| Type | transcript | Type of the record. Possible values are: transcript/exon/CDS |
| Start | 139577792 | Start coordinate of the record |
| End | 139583319 | End coordinate of the record |
| Score | . | Unused by ORFanage |
| Strand |  | Strand of the record |
| Phase | . | Number of bases that must be trimmed from the start of the CDS record to reach the start of the next codon. Only available in CDS records. Possible values are: 0/1/2 |
| Attributes | transcript_id "rna-NR_121609.1"; | Field containing extended descriptors of each record |
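The Phase definition above can be illustrated with a short sketch. It assumes the CDS segments are given in translation order and that the first segment starts on a codon boundary (phase 0); the function name `cds_phases` is hypothetical and not part of ORFanage.

```python
def cds_phases(segment_lengths):
    """Return the phase (0, 1, or 2) of each CDS segment, in translation order."""
    phases = []
    phase = 0  # assumption: the first CDS segment starts on a codon boundary
    for length in segment_lengths:
        phases.append(phase)
        # After trimming `phase` bases and consuming all full codons, any
        # leftover bases start a codon that is completed in the next segment.
        phase = (3 - (length - phase) % 3) % 3
    return phases


# Three CDS segments of 100, 50, and 30 bases.
print(cds_phases([100, 50, 30]))  # [0, 2, 0]
```

Here the first segment holds 33 full codons plus one base, so the second segment must give up two bases to complete that codon, which gives it phase 2.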

Attributes (column #9) (TAB-separated)

| Attribute | Example | Description |
| --- | --- | --- |
| transcript_id | rna-NM_001363968.1 | Unique Transcript Identifier |
| gene_id | gene-NMNAT3 | Unique Gene Identifier |
| orfanage_status | 1 | Flag set if ORFanage was able to find an ORF. Possible values are 0 and 1 |
| orfanage_template | ENST00000643695.2 | transcript_id of the reference transcript used to find the best ORF |
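As a rough illustration of how these columns and attributes fit together, the sketch below splits one GTF line into its nine fields and turns column 9 into a dictionary. It assumes attributes are written as `key "value";` pairs. The example line is assembled from the example values above for illustration only (the strand `+` is a placeholder) and is not real ORFanage output. The names `GTF_COLUMNS` and `parse_gtf_line` are hypothetical.

```python
import re

GTF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def parse_gtf_line(line):
    """Split a TAB-separated GTF line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 TAB-separated fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Assumption: every attribute is written as: key "value";
    record["attributes"] = dict(
        re.findall(r'(\S+)\s+"([^"]*)"', record["attributes"])
    )
    return record


# Illustrative line built from the example values; the strand is a placeholder.
example = "\t".join([
    "chr3", "BestRefSeq", "transcript", "139577792", "139583319", ".", "+", ".",
    'transcript_id "rna-NM_001363968.1"; gene_id "gene-NMNAT3"; '
    'orfanage_status "1"; orfanage_template "ENST00000643695.2";',
])
rec = parse_gtf_line(example)
print(rec["seqid"], rec["start"], rec["end"])
print(rec["attributes"]["orfanage_status"], rec["attributes"]["orfanage_template"])
```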

ORFanage Stats Output (TSV)

The stats output file (--stats) contains detailed metrics for each query/template comparison.

Column

Description

Example 1

Example 2

Example 3

Example 4

Example 5

query_id

Transcript ID of the query transcript. In ALL mode may include CDS coordinates.

rna-NR_121609.1

rna-NM_001363968.1

rna-NM_001363968.1

rna-NM_001363968.1

rna-XM_011512790.1

template_id

Transcript ID of the template (reference) transcript. ‘-’ if no template found.

ENST00000643695.2

ENST00000643695.2

ENST00000643695.2

ENST00000643695.2

segment

Genomic coordinates of the CDS segment in format start-end.

139561010-139561392

139573598-139579055

139627616-139627724

139561010-139579055

notes

Status annotation: best_gtf, gtf, dup, keep_cds, no-overlap, segment_len<3, cds_len<3.

gtf

dup

gtf

num_templates

Number of template transcripts contributing to this ORF annotation.

1

1

1

1

query_len

Length of the query CDS in nucleotides (bp).

393

513

513

648

template_len

Length of the template CDS in nucleotides (bp).

1041

1041

1041

1041

union_len

Length of the union of query and template CDS chains in nucleotides.

1051

1168

1168

1041

pass

Boolean (0/1) indicating if candidate CDS passed all filtering criteria.

1

1

0

1

len_match

Number of nucleotides overlapping between query and template CDS.

383

386

386

648

len_inframe

Number of overlapping nucleotides in the same reading frame as template.

383

376

376

648

len_outframe

Number of overlapping nucleotides out of frame relative to template.

0

10

10

0

len_extra

Nucleotides in query CDS extending beyond template CDS boundaries.

10

127

127

0

len_missing

Nucleotides from template CDS not covered by query CDS.

658

655

655

393

length_pi

Length Percent Identity: 100 * query_len / union_len.

37.7522

49.2795

49.2795

62.2478

match_length_pi

Match Length Percent Identity: 100 * len_match / union_len. Measures actual overlap between query and template CDS accepting both in and out of frame matching positions.

36.7915

37.0797

37.0797

62.2478

inframe_length_pi

In-frame Length Percent Identity: 100 * len_inframe / union_len. Measures overlap maintaining correct reading frame.

36.7915

36.1191

36.1191

62.2478

alignment_match

Nucleotides with identical sequence (requires --reference for alignment).

0

0

0

0

start_match

Boolean (0/1) if query start codon matches template start codon position.

0

0

0

0

stop_match

Boolean (0/1) if query stop codon matches template stop codon position.

0

0

0

0

pi

Sequence-level percent identity from alignment (requires --reference).

0

0

0

0
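A minimal sketch of reading the stats file in Python follows. It assumes the file starts with a header row that uses the column names listed above, which should be checked against a real `--stats` file. The input here is made up for illustration and carries only four of the columns; `passing_rows` is a hypothetical helper name.

```python
import csv
import io


def passing_rows(handle):
    """Yield stats rows whose `pass` column is 1."""
    # Assumption: the first line of the file is a header with the column names.
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["pass"] == "1":
            yield row


# Made-up input for illustration only; a real file has every column listed above.
sample = io.StringIO(
    "query_id\ttemplate_id\tpass\tinframe_length_pi\n"
    "tx1\tref1\t1\t95.5\n"
    "tx2\tref1\t0\t12.0\n"
    "tx3\tref2\t1\t80.25\n"
)
kept = list(passing_rows(sample))
for row in kept:
    print(row["query_id"], row["template_id"], float(row["inframe_length_pi"]))
```

For a file on disk, the same function can be given a handle from `open(path, newline="")`.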

Key Metrics:

  • length_pi (LPI): Measures how much of the combined query+template CDS region is covered by the query. Formula: 100 * query_len / union_len.

  • match_length_pi (MLPI): Measures actual overlap between query and template, excluding query extensions. Formula: 100 * len_match / union_len.

  • inframe_length_pi (ILPI): The most biologically meaningful metric. Measures the fraction of the combined region in which the query overlaps the template in the correct reading frame. Formula: 100 * len_inframe / union_len.

  • pi: Requires genome sequence (--reference); measures actual nucleotide sequence identity in the aligned regions at the codon level.
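The three length-based formulas above can be written out as a small function. This is only a sketch of the formulas as stated, not ORFanage's own code; the function name `length_metrics` and the input numbers are hypothetical.

```python
def length_metrics(query_len, len_match, len_inframe, union_len):
    """Compute LPI, MLPI, and ILPI as percentages of union_len."""
    if union_len <= 0:
        raise ValueError("union_len must be positive")
    return {
        "length_pi": 100 * query_len / union_len,
        "match_length_pi": 100 * len_match / union_len,
        "inframe_length_pi": 100 * len_inframe / union_len,
    }


# Hypothetical lengths chosen for easy arithmetic, not taken from real output.
metrics = length_metrics(query_len=300, len_match=240, len_inframe=210, union_len=400)
print(metrics)
```

Because len_inframe cannot exceed len_match, and len_match cannot exceed query_len, the three values are ordered ILPI ≤ MLPI ≤ LPI for any single comparison.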

ORFcompare Stats Output (TSV)

The orfcompare output file compares CDS annotations between query and template transcripts.

Column

Description

Example 1

Example 2

Example 3

query_id

Transcript ID of the query transcript.

CHS.8812.2

CHS.8812.5

CHS.8812.6

template_id

Transcript ID of the template transcript. ‘-’ if no overlapping template.

ENST00000300146.10

ENST00000300146.10

query_len

Length of the query CDS in nucleotides (bp).

1566

2310

2223

template_len

Length of the template CDS in nucleotides (bp).

2313

2313

len_match

Number of nucleotides overlapping between query and template CDS.

1566

2310

len_inframe

Number of overlapping nucleotides in the same reading frame as template.

1566

2310

len_outframe

Number of overlapping nucleotides out of frame relative to template.

0

0

len_extra

Nucleotides in query CDS extending beyond template CDS boundaries.

0

0

len_missing

Nucleotides from template CDS not covered by query CDS.

747

3

lpi

Length Percent Identity: 100 * query_len / union_len.

92.5532

99.8703

mlpi

Match Length Percent Identity: 100 * len_match / union_len.

92.5532

99.8703

ilpi

In-frame Length Percent Identity: 100 * len_inframe / union_len.

92.5532

99.8703

query_start_codon

Amino acid at query CDS start position (requires --reference). ‘M’ for methionine.

M

M

template_start_codon

Amino acid at template CDS start position (requires --reference).

M

M

query_stop_codon

Amino acid at query CDS end position (requires --reference). ‘.’ for stop codon.

.

R

template_stop_codon

Amino acid at template CDS end position (requires --reference).

.

.

Codon Columns (require --reference):

  • query_start_codon / template_start_codon: Amino acid at the CDS start. M (methionine) indicates a canonical start codon.

  • query_stop_codon / template_stop_codon: Amino acid at the CDS end. . indicates a proper stop codon; any other letter indicates the ORF is incomplete.
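The interpretation of the codon columns can be sketched as a small classifier. The function name `orf_completeness` and the returned labels are hypothetical and chosen for illustration; they are not values that orfcompare itself reports.

```python
def orf_completeness(start_aa, stop_aa):
    """Classify an ORF from its start and stop codon columns."""
    has_start = start_aa == "M"  # M marks a canonical start codon
    has_stop = stop_aa == "."    # . marks a proper stop codon
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "missing_stop"
    if has_stop:
        return "missing_start"
    return "missing_both"


print(orf_completeness("M", "."))  # complete
print(orf_completeness("M", "R"))  # missing_stop
```

The second call mirrors the situation described above, where a letter other than `.` in a stop codon column indicates an incomplete ORF.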