[ensembl-dev] Relevant UTR's in GTF file
    Transposon Research 
    transposon.research at gmail.com
       
    Thu Jan 23 14:28:00 GMT 2020
    
    
  
Dear Ensembl Team,
I am somewhat confused about how UTRs are represented in the reference
GTF file for the human genome, Homo_sapiens.GRCh38.99.chr.gtf.
Specifically, when I query the 3'- or 5'-UTR of a gene of interest, I
often get a long list of different versions of the coordinates. I
understand that this depends on the source (Ensembl, Ensembl-Havana,
Havana), but how can I tell which UTR is the most relevant? To add to the
confusion, for some genes there are several versions of the coordinates
for the same source: all other fields are identical and only the
coordinates change.
My use case is that I have to collect all the UTRs of the target genes of
all existing miRNAs.
For now, as I don't have sufficient knowledge to judge which of the
various versions of a given UTR is the most relevant, I ended up making a
somewhat arbitrary selection following this algorithm:
1) If there are different versions for the same source, keep only the
first instance that appears in the GTF -> unique source-coordinates
combinations
2) If an Ensembl version is present, select it
3) If no Ensembl version is available, pick Ensembl-Havana
4) If no Ensembl-Havana version is available, pick Havana
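
The four steps above can be sketched in Python as follows. This is only an
illustration under several assumptions that should be checked against the
actual file: that the source column is spelled "ensembl", "ensembl_havana"
and "havana", that UTR rows use the feature names "three_prime_utr" and
"five_prime_utr", and that step 1 means "keep the first row seen per gene,
feature and source". The function names are hypothetical.

```python
# Source priority from steps 2-4; the lowercase spellings are an assumption
# about how the GTF source column (column 2) is written.
SOURCE_PRIORITY = ["ensembl", "ensembl_havana", "havana"]
# Assumed feature names for UTR rows (column 3).
UTR_FEATURES = {"three_prime_utr", "five_prime_utr"}


def parse_gene_id(attributes):
    """Return the gene_id value from a GTF attribute column, or None."""
    for field in attributes.split(";"):
        field = field.strip()
        if field.startswith("gene_id "):
            return field.split(" ", 1)[1].strip('"')
    return None


def select_utrs(lines):
    """Map (gene_id, feature) -> (source, start, end, strand)."""
    # Step 1: remember only the first row per (gene, feature, source).
    first_seen = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in UTR_FEATURES:
            continue
        gene_id = parse_gene_id(cols[8])
        if gene_id is None:
            continue
        key = (gene_id, cols[2], cols[1])
        first_seen.setdefault(key, (int(cols[3]), int(cols[4]), cols[6]))

    # Steps 2-4: walk the sources in priority order; the first hit wins.
    selected = {}
    for source in SOURCE_PRIORITY:
        for (gene_id, feature, src), coords in first_seen.items():
            if src == source:
                selected.setdefault((gene_id, feature), (src,) + coords)
    return selected
```

It could be called as select_utrs(open("Homo_sapiens.GRCh38.99.chr.gtf")).
Rows from any source outside the priority list are silently ignored, which
mirrors the selection described above rather than any recommendation.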
For almost all queries, the UTRs I get closely match those of other
reference sites, but there are some differences. When locating, for
example, the seed sequence of a target site in a UTR, the site is often
given as the position of its first nucleotide counted from the beginning
of the 3'-UTR, so I need to be certain that the UTR I am picking is the
correct one (here I am comparing with the ones published on miRDB, such as
http://mirdb.org/cgi-bin/target_detail.cgi?targetID=1605627). More
generally, it would be great to understand the nitty-gritty of the
differences between the UTR versions, and to be able to select the one
that is most relevant for a specific situation.
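
To illustrate why the choice of UTR matters for such positions: an offset
counted from the beginning of the 3'-UTR can be mapped to a genomic
coordinate with a little strand-aware arithmetic. The sketch below assumes
1-based inclusive GTF coordinates, a 1-based offset (whether miRDB counts
from 0 or 1 is not verified here), and a 3'-UTR made of a single contiguous
segment; a UTR split across several exons would need its segments joined
first. The function name is hypothetical.

```python
def utr_offset_to_genomic(start, end, strand, offset):
    """Map a 1-based offset within a single-segment UTR to a genomic position.

    start and end are the GTF coordinates of the UTR (1-based, inclusive).
    """
    length = end - start + 1
    if not 1 <= offset <= length:
        raise ValueError("offset lies outside the UTR")
    if strand == "+":
        # On the forward strand the UTR begins at its lowest coordinate.
        return start + offset - 1
    if strand == "-":
        # On the reverse strand the UTR begins at its highest coordinate.
        return end - offset + 1
    raise ValueError("strand must be '+' or '-'")
```

With two candidate UTRs that start at different coordinates, the same
offset lands on different genomic positions, so a seed match reported at a
given offset can only be checked against the UTR version it was computed on.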
Could you possibly shed some light on this topic?
Best regards,
Lexa