[ensembl-dev] Exporting regions

Andy Yates ayates at ebi.ac.uk
Wed Sep 12 15:54:33 BST 2012


Hi Stefan,

If you're exporting the region from the website, you can edit the region you wish to pull out and then export that. Converting from the sub-slice to a larger contig is something I'd rather not do, as it pushes the limits of what the format is attempting to convey. I'd much prefer to point you at our feature formats like GTF, GFF or BED files.
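
For anyone who still needs to do the conversion by hand, here is a minimal sketch in Python. It is not part of the Ensembl API or the exporter. It assumes the slice was exported on the forward strand, that coordinates are 1-based and inclusive, and that you know the slice start from the r= parameter of the export URL (133043926 for the VNN3 example quoted below). The function names slice_to_genomic and shift_location are made up for illustration. Ranges that already name another sequence, such as AC009121.8:51066..51103, are left untouched, and single-base locations are not handled.

```python
import re
from typing import Tuple


def slice_to_genomic(slice_start: int, local_start: int, local_end: int) -> Tuple[int, int]:
    """Shift 1-based, inclusive slice-relative coordinates onto the parent sequence.

    Assumes the slice was exported on the forward strand.
    """
    offset = slice_start - 1
    return local_start + offset, local_end + offset


# Matches "start..end" ranges that are NOT prefixed by "ACCESSION:",
# i.e. ranges that are local to the exported slice.
_LOCAL_RANGE = re.compile(r"(?<![\w.:])(\d+)\.\.(\d+)")


def shift_location(location: str, slice_start: int) -> str:
    """Rewrite the slice-relative ranges in a GenBank location string.

    Ranges that refer to another sequence (e.g. "AC009121.8:51066..51103")
    are returned unchanged.
    """
    def _shift(match):
        start, end = slice_to_genomic(
            slice_start, int(match.group(1)), int(match.group(2))
        )
        return f"{start}..{end}"

    return _LOCAL_RANGE.sub(_shift, location)


# VNN3 example from this thread: slice 6:133043926-133055904
print(shift_location("complement(1..11979)", 133043926))
# prints: complement(133043926..133055904)
```

The offset is slice_start - 1 because position 1 of the export corresponds to the first base of the requested region.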

Should you want full-length GenBank files, please use the pre-dumped ones from our FTP site, which suffer from no such problems.

Cheers,

Andy

On 10 Sep 2012, at 17:52, Stefan Kirov wrote:

> Thanks a lot Andy! That makes sense...
> If this is not too much to ask: in one of the next releases, could you include data on how to convert the sub-slice to the larger contig/slice? Perhaps as a note in the source feature... Just in case you have too much free time on your hands ;-)
> Cheers!
> Stefan
> 
> On 09/10/2012 10:52 AM, Andy Yates wrote:
>> Hi Stefan,
>> 
>> This is because the coordinates given in the GenBank files are relative to the slice you requested, so the numbering starts at 1 at the beginning of that sub-slice. The attributes which refer to other contigs are the elements which extend beyond the bounds of the requested slice, e.g.
>> 
>> http://www.ensembl.org/Homo_sapiens/Export/Output/Location?db=core;flank3_display=0;flank5_display=0;output=genbank;r=6:133017695-133161157;strand=feature;param=similarity;param=repeat;param=genscan;param=contig;param=variation;param=marker;param=gene;param=vegagene;param=estgene;_format=Text
>> 
>> This will return a set of features from VNN1, which starts upstream of the region 6:133017695-133161157. Compare this to a very gene-specific sub-slice, e.g. VNN3 at 6:133043926-133055904:
>> 
>> http://www.ensembl.org/Homo_sapiens/Export/Output/Location?db=core;flank3_display=0;flank5_display=0;g=ENSG00000093134;output=genbank;r=6:133043926-133055904;strand=feature;param=similarity;param=repeat;param=genscan;param=contig;param=variation;param=marker;param=gene;param=vegagene;param=estgene;_format=Text
>> 
>> This will contain no coordinates referring to other slices, but feature locations will be relative to the requested slice, i.e. the gene location for VNN3 is "complement(1..11979)" and not "complement(133043926..133055904)".
>> 
>> Hope this helps,
>> 
>> Andy
>> 
>> Andrew Yates                   Ensembl Core Software Project Leader
>> EMBL-EBI                       Tel: +44-(0)1223-492538
>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>> Cambridge CB10 1SD, UK         http://www.ensembl.org/
>> 
>> On 10 Sep 2012, at 03:45, Stefan Kirov wrote:
>> 
>>> Guys,
>>> When exporting GenBank for a region, I see things like this:
>>> 
>>>     misc_RNA        join(complement(AC009121.8:51066..51103),
>>>                     complement(AC009121.8:46967..47862))
>>> 
>>> I was under the impression that the coordinates for any feature would be based on the coordinate system used to export, not a contig-based coordinate system.
>>> To make things worse, sometimes the coordinate system seems to be that of the exported slice and sometimes it is mixed.
>>> Is this the expected behavior or a bug?
>>> Thanks!
>>> Stefan
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 




