[ensembl-dev] Relevant UTR's in GTF file

Transposon Research transposon.research at gmail.com
Thu Jan 23 14:28:00 GMT 2020


Dear Ensembl Team,

I am somewhat confused about how to handle UTRs in the reference
GTF file for the human genome, Homo_sapiens.GRCh38.99.chr.gtf.
Specifically, when I query the 3'- or 5'-UTR of a gene of interest, I
often get a long list of different versions of the coordinates. I
understand that this depends on the source (Ensembl, Ensembl-Havana,
Havana), but how can I tell which UTR is the most relevant? To add to the
confusion, in some cases a given gene has several versions of the
coordinates for the same source: all other fields are identical and
only the coordinates change.

My use case is that I have to collect all the UTRs of the target genes
of all existing miRNAs.

For now, as I don't have sufficient knowledge to judge which of the
various versions of a given UTR is the most relevant, I ended up making a
somewhat arbitrary selection following this algorithm:

1) If there are different versions for the same source, keep only the first
instance that appears in the GTF -> unique source-coordinates combinations
2) If an Ensembl version is present, select it
3) If no Ensembl version is available, pick Ensembl-Havana
4) If no Ensembl-Havana version is available, pick Havana
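
The four steps above can be sketched in Python roughly as follows. This is
only a minimal illustration under some assumptions: that the 3'-UTR records
carry the feature type `three_prime_utr`, that the source column uses the
lowercase spellings `ensembl`, `ensembl_havana` and `havana`, and that
"first instance" is taken per gene and per source. The function names
`parse_utr_lines` and `select_utrs` are hypothetical helpers, not part of
any Ensembl tool.

```python
import re

# Priority order from steps 2-4; the exact spellings are an assumption
# about how the GTF source column names these annotation sources.
SOURCE_PRIORITY = ["ensembl", "ensembl_havana", "havana"]


def parse_utr_lines(lines, feature="three_prime_utr"):
    """Yield (gene_id, source, chrom, start, end, strand) for UTR records."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # not a well-formed record of the requested feature
        match = re.search(r'gene_id "([^"]+)"', cols[8])
        if match is None:
            continue
        yield (match.group(1), cols[1], cols[0],
               int(cols[3]), int(cols[4]), cols[6])


def select_utrs(records):
    """Apply steps 1-4: first record per (gene, source), then by priority."""
    first_per_source = {}
    for gene_id, source, chrom, start, end, strand in records:
        # Step 1: setdefault keeps only the first instance seen in the file.
        by_source = first_per_source.setdefault(gene_id, {})
        by_source.setdefault(source, (chrom, start, end, strand))

    selected = {}
    for gene_id, by_source in first_per_source.items():
        # Steps 2-4: take the highest-priority source that is present.
        for source in SOURCE_PRIORITY:
            if source in by_source:
                selected[gene_id] = (source,) + by_source[source]
                break
    return selected
```

With the real file this could be called as
`select_utrs(parse_utr_lines(open("Homo_sapiens.GRCh38.99.chr.gtf")))`.
Genes annotated only by a source outside the three listed are silently
skipped in this sketch.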

The UTRs I get match those on other reference sites for almost all
queries, but there are some differences. This matters because, for
example, the seed sequence of a target site in a UTR is often reported as
the position of its start nucleotide relative to the beginning of the
3'-UTR, so I need to be certain that the UTR I am picking is the
correct one (here I am comparing with the ones published on miRDB, like
this one: http://mirdb.org/cgi-bin/target_detail.cgi?targetID=1605627).
More generally, it would be great to really understand the nitty-gritty of
the differences between the UTR versions, and to be able to select the
one that is most relevant for a specific situation.
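
To illustrate why the choice of UTR matters for seed positions, here is a
small Python sketch that converts an offset counted from the beginning of a
3'-UTR into a genomic coordinate. It assumes 1-based, inclusive GTF
coordinates, a 1-based offset (whether miRDB counts this way is an
assumption on my part), and that a UTR may consist of several segments when
it spans more than one exon. The function name `utr_offset_to_genomic` is
hypothetical.

```python
def utr_offset_to_genomic(segments, strand, offset):
    """Map a 1-based offset within a UTR to a genomic position.

    segments: list of (start, end) pairs, 1-based inclusive, as in a GTF.
    strand:   "+" or "-".
    offset:   1-based position counted from the 5' end of the UTR.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if offset < 1:
        raise ValueError("offset must be >= 1")

    # On the minus strand the UTR begins at the highest coordinate,
    # so the segments are walked in descending genomic order.
    ordered = sorted(segments, reverse=(strand == "-"))

    remaining = offset
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("offset lies beyond the end of the UTR")
```

For example, with segments `[(100, 109), (200, 209)]` on the plus strand,
offset 11 falls on genomic position 200. A UTR version with a different
start or segment layout would place the same offset somewhere else.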

Could you possibly enlighten me on this topic?

Best regards,
Lexa