[ensembl-dev] affy_hg_u133_plus_2 to ensg mappings
Oliver Burren
oliver.burren at cimr.cam.ac.uk
Tue Jun 18 07:58:24 BST 2013
Hi Nathan,
Thanks a lot for the reply - very helpful. I was trying to add ENSG IDs
(E71) to the GEO annotation file available for this platform. I expected
some dropout (due to annotation differences etc.) but wasn't expecting
~9000 protein-coding genes (14000 probesets) to go missing between GEO
and Ensembl. I guess a more stringent QC strategy would probably explain
that. I was worried that there was a problem with my way of doing
this, but this doesn't seem to be the case.
Thanks for your help.
Olly
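
For anyone wanting to repeat the comparison, here is a minimal Python sketch of one way to do it, not necessarily how it was done in this thread. It builds the same BioMart XML query as in the quoted message below for a list of probeset IDs, then reports which IDs have no gene in the TSV result. Assumptions: the function names build_query and unmapped_probesets are made up for illustration, affy_hg_u133_plus_2 is requested as an attribute as well as a filter so each row can be tied back to its probeset, and uniqueRows is set to 1 to collapse duplicate rows. Submitting the XML to a BioMart martservice endpoint is left out.

```python
import csv
import io
from xml.sax.saxutils import quoteattr


def build_query(probeset_ids):
    """Return a BioMart XML query for the given probeset IDs.

    Mirrors the query in the quoted message; the probeset attribute and
    uniqueRows="1" are additions made for this sketch (assumptions).
    """
    # BioMart filters accept a comma-separated list of values.
    value = quoteattr(",".join(probeset_ids))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE Query>\n"
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="1" count="" datasetConfigVersion="0.6">\n'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">\n'
        f'<Filter name="affy_hg_u133_plus_2" value={value}/>\n'
        '<Attribute name="affy_hg_u133_plus_2"/>\n'
        '<Attribute name="ensembl_gene_id"/>\n'
        '<Attribute name="ensembl_transcript_id"/>\n'
        "</Dataset>\n"
        "</Query>\n"
    )


def unmapped_probesets(requested_ids, tsv_text):
    """Return the requested probeset IDs that have no gene in the TSV result.

    Expects rows of: probeset ID, gene ID, transcript ID (tab-separated).
    """
    mapped = set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Count a probeset as mapped only if a gene ID is present.
        if len(row) >= 2 and row[0] and row[1]:
            mapped.add(row[0])
    return [p for p in requested_ids if p not in mapped]
```

Usage, with placeholder gene and transcript IDs rather than real results:

```python
ids = ["205332_at", "1007_s_at"]
xml = build_query(ids)
# Pretend result in which only the second probeset came back with a gene.
fake_tsv = "1007_s_at\tGENE_PLACEHOLDER\tTRANSCRIPT_PLACEHOLDER\n"
print(unmapped_probesets(ids, fake_tsv))
```

Note that a probeset missing from the result only means no transcript xref was assigned; as explained below, the probes can still align to the genome on the opposite strand.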
On 17/06/13 21:39, Nathan Johnson wrote:
> Hi Oliver
>
> The reason why this isn't being considered as a transcript xref is
> because it is on the wrong strand. This is an easy mistake to make as
> many of the array technologies differ in how they process the RNA
> sample and hence what strand is actually hybridised when it eventually
> meets the array.
>
> There is a diagram of the IVT processing on this page:
>
> http://www.affymetrix.com/estore/browse/products.jsp?categoryIdClicked=&productId=131415#1_1
>
> In saying that, that particular set of alignments does look like it
> was designed for the exons of that gene, albeit with some exon
> boundary overlap. However, IVT arrays normally target 3' ends and UTRs
> specifically, which makes this particular probeset even more odd.
>
> Sorry I can't be of more help.
>
> Nathan
>
>
>
> On 17 Jun 2013, at 15:58, Oliver Burren <oliver.burren at cimr.cam.ac.uk> wrote:
>
>> Hi,
>>
>> I'm trying to retrieve all probeset ID mappings to Ensembl genes for
>> [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array
>> (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570) using
>> ensmart 71. However, I noticed a large dropout with respect to the GEO
>> annotation file, so I did some digging...
>>
>>
>> If I query BioMart with something like this:
>>
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE Query>
>> <Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
>>
>> <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
>> <Filter name = "affy_hg_u133_plus_2" value = "205332_at"/>
>> <Attribute name = "ensembl_gene_id" />
>> <Attribute name = "ensembl_transcript_id" />
>> </Dataset>
>> </Query>
>>
>> I get no results. However if I search the website for 205332_at and
>> turn on the track for AFFY:HG-U133_Plus_2 it shows that the probeset
>> (6 features) maps to the gene. The help on this page
>> http://www.ensembl.org/info/docs/microarray_probe_set_mapping.html
>> says 'it is normally required that more than 50% of the probes in a
>> probe set hit a given transcript sequence'. Is this the reason why
>> this probeset isn't being tagged to this gene (although this appears
>> to be 60%)?
>>
>> Any light that you could shed would be appreciated. Thanks,
>>
>> Olly Burren
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>
>