[ensembl-dev] HG375_PATCH features out of bounds
Bronwen Aken
ba1 at sanger.ac.uk
Thu Aug 29 18:34:57 BST 2013
Hi Kamil,
Yes, the Ensembl patch pipeline does allow us to annotate genes where part of a gene lies outside the patch.
The features that you see lying outside the patch coordinates belong to two long transcripts, ENST00000594988 and ENST00000593441, of the gene IL1RAPL2. You can see them spanning both sides of the patch here:
http://www.ensembl.org/Homo_sapiens/Share/a902f2d99653b079b5c39f494fec090c102145539
In Ensembl, we annotate and display assembly patches within their genomic context; in the view linked above you will see that HG375_PATCH (green) is embedded within chromosome X.
This means that the chromosome X DNA both up- and downstream of HG375_PATCH is available when we annotate the patch, so we are able to annotate genes that span the patch boundary.
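To see which annotated features on a patch fall outside the bounds given in its FASTA header, a small shell sketch along these lines may help. It assumes the release-72 layouts shown in the quoted message: GTF column 1 is the sequence name and columns 4 and 5 are start and end, and the patch bounds are the fourth and fifth colon-separated fields of the header's third word. The function names patch_bounds and patch_outliers, and the labels "outside" and "spans_boundary", are hypothetical and not part of any Ensembl tool.

```shell
#!/bin/sh
# Hypothetical helpers (not Ensembl tools) for checking patch annotation bounds.

# Read a FASTA file (or just its header) on stdin, e.g.
#   >HG375_PATCH dna:chromosome chromosome:GRCh37:HG375_PATCH:104423968:104489001:1 PATCH_FIX
# and print "START END" taken from fields 4 and 5 of the colon-separated location.
patch_bounds() {
  awk 'NR == 1 { split($3, loc, ":"); print loc[4], loc[5]; exit }'
}

# Read tab-separated GTF on stdin and print "feature start end status" for every
# feature on sequence $1 that is not fully inside the interval [$2, $3].
patch_outliers() {
  awk -F'\t' -v name="$1" -v s="$2" -v e="$3" '
    BEGIN { s += 0; e += 0 }                # force numeric comparison
    $1 == name {
      fstart = $4 + 0; fend = $5 + 0
      if (fend < s || fstart > e)         status = "outside"
      else if (fstart < s || fend > e)    status = "spans_boundary"
      else                                next    # fully inside the patch
      print $3, $4, $5, status
    }'
}
```

Usage, with the release-72 files from the quoted message:

set -- $(zcat Homo_sapiens.GRCh37.72.dna.chromosome.HG375_PATCH.fa.gz | patch_bounds)
zcat Homo_sapiens.GRCh37.72.gtf.gz | patch_outliers HG375_PATCH "$1" "$2"

On the excerpt quoted below, exons such as the ones starting at 103810996 and 104512069 would be reported as "outside", while the exon at 104440157 lies inside the patch and is skipped. Individual exons rarely straddle the boundary; "spans_boundary" would mainly apply to transcript- or gene-level lines, if the GTF contains them.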
Hope that helps,
Bronwen
On 14 Aug 2013, at 17:26, Kamil Slowikowski <kslowikowski at gmail.com> wrote:
> There are features outside the coordinates listed for HG375_PATCH. I'm wondering whether this is expected or an error.
>
>
> ftp://ftp.ensembl.org/pub/release-72/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.72.dna.chromosome.HG375_PATCH.fa.gz
>
> zcat Homo_sapiens.GRCh37.72.dna.chromosome.HG375_PATCH.fa.gz | head -n1
> >HG375_PATCH dna:chromosome chromosome:GRCh37:HG375_PATCH:104423968:104489001:1 PATCH_FIX
>
> Notice that the last position is 104489001.
>
>
> ftp://ftp.ensembl.org/pub/release-72/gtf/homo_sapiens/Homo_sapiens.GRCh37.72.gtf.gz
>
> zcat Homo_sapiens.GRCh37.72.gtf.gz | grep HG375_PATCH | cut -f1-5 | head
> HG375_PATCH protein_coding exon 103810996 103811732
> HG375_PATCH protein_coding exon 103903576 103903676
> HG375_PATCH protein_coding CDS 103903595 103903676
> HG375_PATCH protein_coding start_codon 103903595 103903597
> HG375_PATCH protein_coding exon 104440157 104440430
> HG375_PATCH protein_coding CDS 104440157 104440430
> HG375_PATCH protein_coding exon 104478500 104478686
> HG375_PATCH protein_coding CDS 104478500 104478686
> HG375_PATCH protein_coding exon 104512069 104512222
> HG375_PATCH protein_coding CDS 104512069 104512222
>
> Notice that positions such as 104512069 are greater than 104489001.
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/