[ensembl-dev] HG375_PATCH features out of bounds

Bronwen Aken ba1 at sanger.ac.uk
Thu Aug 29 18:34:57 BST 2013


Hi Kamil,

Yes, this is expected. The Ensembl patch pipeline allows us to annotate genes where part of a gene lies outside of the patch.

The features that you see lying outside of the patch coordinates are the two long transcripts ENST00000594988 and ENST00000593441 from gene IL1RAPL2. You can see them spanning across both sides of the patch here:
http://www.ensembl.org/Homo_sapiens/Share/a902f2d99653b079b5c39f494fec090c102145539

In Ensembl, we are annotating and displaying the assembly patches within a genomic context; in the picture link above you will see that HG375_PATCH (green) is embedded within chromosome X. 

This means that we have the DNA from chromosome X, both up- and downstream of HG375_PATCH, available at the time of annotating the patch. Annotating the patch within its genomic context means that we are able to annotate genes that span across the boundary of a patch. 
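
As a rough illustration of how such features can be told apart, here is a minimal Python sketch that reads the patch bounds from the FASTA header and labels each feature by where it falls relative to the patch. The function names are hypothetical and not part of any Ensembl API; the header and coordinates are the ones from the FASTA and GTF lines quoted below, and the sketch assumes the header always has the location as its third whitespace-separated field.

```python
def parse_patch_bounds(fasta_header):
    """Return (name, start, end) parsed from an Ensembl FASTA header line."""
    # Assumption: the third whitespace-separated field has the form
    # chromosome:GRCh37:HG375_PATCH:104423968:104489001:1
    location = fasta_header.lstrip(">").split()[2]
    _, _, name, start, end, _ = location.split(":")
    return name, int(start), int(end)


def classify_feature(feat_start, feat_end, patch_start, patch_end):
    """Label a feature as inside, outside, or overlapping the patch."""
    if feat_end < patch_start or feat_start > patch_end:
        return "outside"
    if feat_start >= patch_start and feat_end <= patch_end:
        return "inside"
    return "overlapping"


header = (">HG375_PATCH dna:chromosome "
          "chromosome:GRCh37:HG375_PATCH:104423968:104489001:1 PATCH_FIX")
name, patch_start, patch_end = parse_patch_bounds(header)

# Exon coordinates taken from the GTF lines in the quoted message.
exons = [
    (103810996, 103811732),
    (104440157, 104440430),
    (104478500, 104478686),
    (104512069, 104512222),
]

for start, end in exons:
    label = classify_feature(start, end, patch_start, patch_end)
    print(name, start, end, label)
```

Features labelled "outside" here are not errors: they belong to transcripts that extend into the chromosome X sequence flanking the patch.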

Hope that helps,
Bronwen


On 14 Aug 2013, at 17:26, Kamil Slowikowski <kslowikowski at gmail.com> wrote:

> There exist features outside the coordinates listed for HG375_PATCH. I'm wondering if this is expected or if this is an error.
> 
> 
> ftp://ftp.ensembl.org/pub/release-72/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.72.dna.chromosome.HG375_PATCH.fa.gz
> 
> zcat Homo_sapiens.GRCh37.72.dna.chromosome.HG375_PATCH.fa.gz | head -n1
> >HG375_PATCH dna:chromosome chromosome:GRCh37:HG375_PATCH:104423968:104489001:1 PATCH_FIX
> 
> Notice that the last position is 104489001.
> 
> 
> ftp://ftp.ensembl.org/pub/release-72/gtf/homo_sapiens/Homo_sapiens.GRCh37.72.gtf.gz
> 
> zcat Homo_sapiens.GRCh37.72.gtf.gz | grep HG375_PATCH | cut -f1-5 | head
> HG375_PATCH	protein_coding	exon	103810996	103811732
> HG375_PATCH	protein_coding	exon	103903576	103903676
> HG375_PATCH	protein_coding	CDS	103903595	103903676
> HG375_PATCH	protein_coding	start_codon	103903595	103903597
> HG375_PATCH	protein_coding	exon	104440157	104440430
> HG375_PATCH	protein_coding	CDS	104440157	104440430
> HG375_PATCH	protein_coding	exon	104478500	104478686
> HG375_PATCH	protein_coding	CDS	104478500	104478686
> HG375_PATCH	protein_coding	exon	104512069	104512222
> HG375_PATCH	protein_coding	CDS	104512069	104512222
> 
> Notice the positions such as 104512069 are greater than 104489001.
