[ensembl-dev] affy_hg_u133_plus_2 to ensg mappings

Tue Jun 18 07:58:24 BST 2013

Hi Nathan,

Thanks a lot for the reply - very helpful. I was trying to add ENSG IDs 
(E71) to the GEO annotation file available for this platform. I expected 
some dropout (due to annotation differences etc.) but wasn't expecting 
~9000 protein-coding genes (14000 probesets) to go missing between GEO 
and Ensembl. I guess a more stringent QC strategy would probably explain 
that. I was worried that there was a problem with my way of doing 
this, but this doesn't seem to be the case.
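
For anyone wanting to measure this kind of dropout, here is a minimal sketch. It assumes a BioMart TSV export with two columns (probeset ID, then Ensembl gene ID) and a plain list of probeset IDs taken from the GEO annotation file. The function name and the toy rows are hypothetical, for illustration only.

```python
import csv
import io


def probesets_without_gene(geo_probesets, biomart_tsv):
    """Return GEO probeset IDs that have no Ensembl gene in a BioMart TSV export.

    Assumption: each row is "probeset_id<TAB>ensembl_gene_id".
    """
    mapped = set()
    for row in csv.reader(io.StringIO(biomart_tsv), delimiter="\t"):
        # Only count rows where both the probeset and the gene ID are present.
        if len(row) >= 2 and row[0] and row[1]:
            mapped.add(row[0])
    return sorted(set(geo_probesets) - mapped)


# Toy input: "probe_A" and "ENSG_EXAMPLE_1" are placeholders, not real IDs.
example_tsv = "probe_A\tENSG_EXAMPLE_1\n"
missing = probesets_without_gene(["probe_A", "205332_at"], example_tsv)
```

Here `missing` lists the probesets present in the GEO list but absent from the export.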

Thanks for your help.

Olly

On 17/06/13 21:39, Nathan Johnson wrote:
> Hi Oliver
>
> The reason why this isn't being considered as a transcript xref is 
> because it is on the wrong strand.  This is an easy mistake to make as 
> many of the array technologies differ in how they process the RNA 
> sample and hence what strand is actually hybridised when it eventually 
> meets the array.
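
The strand point can be illustrated with a small sketch. Everything in it is hypothetical: the `Alignment` record, the `on_expected_strand` helper and the `probes_are_antisense` flag are names made up for illustration and are not part of the Ensembl mapping pipeline. Which orientation is expected depends on the array protocol, so it is left as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    probe_id: str
    strand: int  # +1 or -1: genomic strand the probe sequence aligned to


def on_expected_strand(alignments, transcript_strand, probes_are_antisense=False):
    """Keep only alignments whose strand matches what the protocol would hybridise.

    Assumption: the caller knows whether the array's probes are expected on the
    same strand as the transcript or on the opposite one.
    """
    expected = -transcript_strand if probes_are_antisense else transcript_strand
    return [a for a in alignments if a.strand == expected]


# Toy alignments; "p1" and "p2" are placeholder probe names.
alignments = [Alignment("p1", 1), Alignment("p2", -1)]
kept = on_expected_strand(alignments, transcript_strand=1)
```

An alignment that overlaps a gene's exons but sits on the other strand is dropped by a filter like this, so it would show up as a track feature without ever counting towards a transcript xref.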
>
> There is a diagram of the IVT processing on this page:
>
> http://www.affymetrix.com/estore/browse/products.jsp?categoryIdClicked=&productId=131415#1_1
>
> In saying that, that particular set of alignments does look like it 
> was designed for the exons of that gene, albeit with some exon 
> boundary overlap. However, IVT arrays normally target 3' ends and UTRs 
> specifically, which makes this particular probeset even more odd.
>
>  Sorry I can't be of more help.
>
> Nathan
>
>
>
> On 17 Jun 2013, at 15:58, Oliver Burren <oliver.burren at cimr.cam.ac.uk> wrote:
>
>> Hi,
>>
>> I'm trying to retrieve all probeset ID mappings to Ensembl genes for 
>> [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array 
>> (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570) using 
>> ensmart 71. However, I noticed a large dropout with respect to the GEO 
>> annotation file, so I did some digging...
>>
>>
>> If I query BioMart with something like this:
>>
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE Query>
>> <Query  virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
>> 			
>> 	<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
>> 		<Filter name = "affy_hg_u133_plus_2" value = "205332_at"/>
>> 		<Attribute name = "ensembl_gene_id" />
>> 		<Attribute name = "ensembl_transcript_id" />
>> 	</Dataset>
>> </Query>
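
For running the same query for other probesets, a sketch like the following builds the XML and a request URL. The `martservice` endpoint and its `query` parameter are assumptions about the public Ensembl BioMart service and should be checked against the current documentation. The helper names are hypothetical, and no request is sent here.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint; verify before use.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query(probeset_id):
    """Build the BioMart XML query from the message above for one probeset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(
        query, "Dataset", {"name": "hsapiens_gene_ensembl", "interface": "default"}
    )
    ET.SubElement(
        dataset, "Filter", {"name": "affy_hg_u133_plus_2", "value": probeset_id}
    )
    for attribute in ("ensembl_gene_id", "ensembl_transcript_id"):
        ET.SubElement(dataset, "Attribute", {"name": attribute})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_url(probeset_id):
    """URL-encode the XML into the (assumed) `query` parameter."""
    params = urllib.parse.urlencode({"query": build_query(probeset_id)})
    return MARTSERVICE_URL + "?" + params


url = build_url("205332_at")
```

The resulting URL could then be fetched with any HTTP client. An empty response body would correspond to the "no results" case described below.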
>>
>> I get no results. However, if I search the website for 205332_at and 
>> turn on the track for AFFY:HG-U133_Plus_2, it shows that the probeset 
>> (6 features) maps to the gene. The help on this page 
>> http://www.ensembl.org/info/docs/microarray_probe_set_mapping.html 
>> says 'it is normally required that more than 50% of the probes in a 
>> probe set hit a given transcript sequence'. Is this the reason why 
>> this probeset isn't being tagged to this gene (although this appears 
>> to be 60%)?
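
The quoted rule can be written as a one-line check. This is only a sketch of the sentence from the help page, not the actual Ensembl implementation. The function name, its parameters and the example counts are hypothetical.

```python
def probeset_maps_to_transcript(probes_hitting, probes_in_set, min_fraction=0.5):
    """True when strictly more than `min_fraction` of the probes hit the transcript."""
    if probes_in_set <= 0:
        raise ValueError("probes_in_set must be positive")
    return probes_hitting / probes_in_set > min_fraction


# Made-up counts for illustration: 6 of 10 passes, exactly half does not.
passes = probeset_maps_to_transcript(6, 10)
borderline = probeset_maps_to_transcript(5, 10)
```

Note that `probes_hitting` only makes sense once it is decided which alignments count as hits. If alignments on the wrong strand are excluded first, a probeset can have visible features over a gene and still score zero here.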
>>
>> Any light that you could shed would be appreciated. Thanks,
>>
>> Olly Burren
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>> Posting guidelines and subscribe/unsubscribe info: 
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
