[ensembl-dev] Annotation problem in GRCh37 - LCE2B?

Thibaut Hourlier thibaut at ebi.ac.uk
Thu Aug 6 13:39:44 BST 2015


Hi Chris,
The annotation is poor in this region because it contains a gene family. The genes are highly similar to one another, so our alignment tools get confused and the aligner may not pick the biologically best alignment. Only manual curation can give you a good annotation in these cases.
The evidence used for the transcript ENST00000417924 is NM_014357.4 (http://grch37.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000159455;r=1:152625712-152699381;t=ENST00000417924), and the alignment was treated as an alternate isoform of ENST00000368780, which is wrong in this case. You can see that transcript ENST00000368780 is a merged transcript (coloured in yellow; source Havana/Ensembl), which means that the Ensembl annotation (automatic) and the Havana annotation (manual) were the same. This implies that transcript ENST00000368780 should be more reliable. As you've already seen, we have corrected the annotation in GRCh38.

If you are using the latest version of GENCODE, the annotation has been made on GRCh38 and is the same as the Ensembl annotation on GRCh38. Unless I am wrong, RefSeq also uses GRCh38 as its default assembly.
I would recommend that you use the GRCh38 assembly/annotations, for multiple reasons:
- some of the regions with errors in GRCh37 have been corrected in GRCh38
- the new annotations use up-to-date evidence and are actively curated
- unless you need to realign terabytes of data, the move from GRCh37 to GRCh38 should be quite easy, and we provide tools to do it:
	http://www.ensembl.org/Homo_sapiens/Tools/AssemblyConverter?db=core
	REST API (http://rest.ensembl.org/map/human/GRCh37/X:1000000..1000100:1/GRCh38?content-type=application/json, from the thread: transcript coordinates via REST?)
	Perl API, transform method
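
As a rough illustration of the REST option above, here is a minimal Python sketch that builds the same kind of map URL and pulls the GRCh38 coordinates out of the reply. The function names are hypothetical, and the parsing assumes the reply is a JSON object with a "mappings" list whose entries each hold a "mapped" block (seq_region_name, start, end, strand); please check this against the REST documentation before relying on it.

```python
import json
import urllib.request

# Host and path layout copied from the example URL in the list above.
REST_SERVER = "http://rest.ensembl.org"


def build_map_url(chrom, start, end, strand=1,
                  species="human", source="GRCh37", target="GRCh38"):
    """Build a map URL for one region (hypothetical helper name)."""
    region = f"{chrom}:{start}..{end}:{strand}"
    return (f"{REST_SERVER}/map/{species}/{source}/{region}/{target}"
            "?content-type=application/json")


def extract_mappings(payload):
    """Return (name, start, end, strand) for each mapped segment.

    Assumption: the payload looks like
    {"mappings": [{"original": {...}, "mapped": {...}}, ...]}.
    """
    segments = []
    for mapping in payload.get("mappings", []):
        mapped = mapping["mapped"]
        segments.append((mapped["seq_region_name"], mapped["start"],
                         mapped["end"], mapped["strand"]))
    return segments


def map_region(chrom, start, end, strand=1):
    """Query the REST server (needs network access) and parse the reply."""
    url = build_map_url(chrom, start, end, strand)
    with urllib.request.urlopen(url, timeout=30) as response:
        return extract_mappings(json.load(response))
```

Calling map_region("X", 1000000, 1000100) would request the same URL as in the list above. A region may come back as several segments, or none, if it does not map cleanly between the assemblies, so the result is a list rather than a single interval.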

At the moment we don't have plans to update the GRCh37 annotation.

Cheers
Thibaut

> On 6 Aug 2015, at 11:08, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:
> 
> Hi,
> 
> I've noticed that adjacent genes LCE2B and LCE2C have exactly the same starting coordinate 1:152,647,771 despite the main transcripts being 10kb apart. This is inconsistent with RefSeq, GENCODE and GRCh38. What's the evidence behind transcript ENST00000417924, which spans both gene loci and has a 12kb intron?
> 
> GRCh38 doesn't have this. Any chance for a 'fix'?
> Cheers,
> 
> Chris
> 
> 
> The University of Dundee is a registered Scottish Charity, No: SC015096
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




