[ensembl-dev] Annotation problem in GRCh37 - LCE2B?
Christian Cole (Staff)
C.Cole at dundee.ac.uk
Thu Aug 6 14:20:58 BST 2015
Hi Thibaut,
Thanks for the detailed response.
It's not always possible to use GRCh38 (although your AssemblyConverter is a nice tip), and the RefSeq GRCh37 annotation has the 'correct' gene models.
One to bear in mind, and we'll push the move to GRCh38 as much as possible.
Cheers,
Chris
On 06/08/2015 13:39, "dev-bounces at ensembl.org on behalf of Thibaut Hourlier" <dev-bounces at ensembl.org on behalf of thibaut at ebi.ac.uk> wrote:
>Hi Chris,
>The annotation is poor in this region because it contains a gene family. The genes share a high level of sequence similarity, so our alignment tools can get confused and the aligner may not pick the biologically best alignment. Only manual curation can give a good annotation in such cases.
>The evidence used for transcript ENST00000417924 is NM_014357.4 (http://grch37.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000159455;r=1:152625712-152699381;t=ENST00000417924), and that alignment was treated as an alternative isoform of ENST00000368780, which is wrong in this case. Transcript ENST00000368780 is a merged transcript (coloured in yellow; source: Havana/Ensembl), meaning that the automatic Ensembl annotation and the manual Havana annotation agreed on it, so ENST00000368780 should be the more reliable of the two. As you've already seen, we have corrected the annotation in GRCh38.
>
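Why both genes end up reporting the same start follows from how gene coordinates are derived: a gene's extent is the envelope (minimum start, maximum end) of all transcripts assigned to it, so a single misassigned transcript that reaches into a neighbouring locus drags the gene's start along with it. A minimal Python sketch of that effect, using hypothetical gene/transcript IDs and toy coordinates rather than the real LCE2B/LCE2C data:

```python
from collections import defaultdict


def gene_spans(transcripts):
    """Compute each gene's extent from its transcripts.

    transcripts: iterable of (gene_id, transcript_id, start, end).
    Returns {gene_id: (min start, max end)}, i.e. the envelope of
    every transcript assigned to the gene.
    """
    spans = {}
    for gene, _tid, start, end in transcripts:
        if gene in spans:
            s, e = spans[gene]
            spans[gene] = (min(s, start), max(e, end))
        else:
            spans[gene] = (start, end)
    return spans


def shared_starts(spans):
    """Group genes that report exactly the same start coordinate."""
    by_start = defaultdict(list)
    for gene, (start, _end) in spans.items():
        by_start[start].append(gene)
    return {s: sorted(genes) for s, genes in by_start.items() if len(genes) > 1}


# Hypothetical toy data: GENE_A's main transcript sits ~10 kb from GENE_B,
# but a long transcript wrongly assigned to GENE_A starts at GENE_B's start.
toy = [
    ("GENE_A", "tA1", 11_000, 12_000),  # main transcript of GENE_A
    ("GENE_A", "tA2", 1_000, 12_000),   # misassigned long transcript
    ("GENE_B", "tB1", 1_000, 2_000),    # main transcript of GENE_B
]
print(gene_spans(toy))                 # {'GENE_A': (1000, 12000), 'GENE_B': (1000, 2000)}
print(shared_starts(gene_spans(toy)))  # {1000: ['GENE_A', 'GENE_B']}
```

Dropping the misassigned transcript (tA2) restores GENE_A's span to (11000, 12000), and the shared start disappears.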
>If you are using the latest version of GENCODE, that annotation was made on GRCh38 and is the same as the Ensembl annotation on GRCh38. Unless I am mistaken, RefSeq also uses GRCh38 as its default assembly.
>I would recommend using the GRCh38 assembly and annotation for several reasons:
>- some of the regions with errors in GRCh37 have been corrected in GRCh38
>- the new annotation is built from up-to-date evidence and is actively curated
>- unless you need to realign terabytes of data, moving from GRCh37 to GRCh38 should be fairly easy, and we provide tools to do it:
>AssemblyConverter: http://www.ensembl.org/Homo_sapiens/Tools/AssemblyConverter?db=core
>REST API: http://rest.ensembl.org/map/human/GRCh37/X:1000000..1000100:1/GRCh38?content-type=application/json (see also the thread "transcript coordinates via REST?")
>Perl API: the transform method
>
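The REST route listed above can be scripted. Below is a minimal Python sketch using only the standard library. It assumes the /map/<species>/<asm_one>/<region>/<asm_two> endpoint shape from the example URL, and a JSON response whose "mappings" entries each carry "original" and "mapped" objects with seq_region_name/start/end/strand keys. The helper names are hypothetical, and only fetch_mappings touches the network:

```python
import json
import urllib.request

# Public Ensembl REST server, as in the example URL (assumed reachable over https).
REST_SERVER = "https://rest.ensembl.org"


def map_url(species, asm_from, region, asm_to):
    """Build a /map URL; region is "chrom:start..end:strand", e.g. "X:1000000..1000100:1"."""
    return (f"{REST_SERVER}/map/{species}/{asm_from}/{region}/{asm_to}"
            "?content-type=application/json")


def parse_mappings(payload):
    """Turn a decoded /map response into (original, mapped) coordinate tuples.

    Assumes each entry under "mappings" has "original" and "mapped" objects.
    """
    pairs = []
    for m in payload.get("mappings", []):
        o, t = m["original"], m["mapped"]
        pairs.append((
            (o["seq_region_name"], o["start"], o["end"], o["strand"]),
            (t["seq_region_name"], t["start"], t["end"], t["strand"]),
        ))
    return pairs


def fetch_mappings(species, asm_from, region, asm_to, timeout=30):
    """Query the REST server (requires network access) and parse the result."""
    url = map_url(species, asm_from, region, asm_to)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_mappings(json.load(resp))
```

For example, fetch_mappings("human", "GRCh37", "1:152647771..152647771:1", "GRCh38") would ask the server to lift over the start coordinate from the original question. A region that spans an assembly difference may come back as several mapped pieces, which is why parse_mappings returns a list.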
>At the moment we don't have plans to update the GRCh37 annotation.
>
>Cheers
>Thibaut
>
>> On 6 Aug 2015, at 11:08, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:
>>
>> Hi,
>>
>> I've noticed that adjacent genes LCE2B and LCE2C have exactly the same start coordinate, 1:152,647,771, despite their main transcripts being 10 kb apart. This is inconsistent with RefSeq, GENCODE and GRCh38. What's the evidence behind transcript ENST00000417924, which spans both gene loci and has a 12 kb intron?
>>
>> GRCh38 doesn't have this. Any chance of a 'fix'?
>> Cheers,
>>
>> Chris
>>
>>
>> The University of Dundee is a registered Scottish Charity, No: SC015096
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>