[ensembl-dev] How to handle N-content ?

João Eiras joao.eiras at gmail.com
Tue Sep 27 15:56:55 BST 2016


>> If I use --check_ref, then VEP complains
>> WARNING: Could not fetch sub-slice from 1:99772780-99772780(1) on line 15
>> WARNING: Specified reference allele N does not match Ensembl reference
>> allele on line 15
>

I have the whole genome cached locally, including the fasta files. But
this is unimportant for me. I can easily "fix" the data before passing
it to VEP.

I raised this issue to find out whether supporting N would be of
interest for VEP.

> Of course, but this is only marginally worse than all the combinations that
> would have to be computed if you did give N as the ALT. You cite the codons
> that have N in the third position in the genetic code, but this doesn't
> account for variants that fall in any other position in the codon. And nor
> does it offer any better solution for variants of longer than 1 nucleotide,
> or variants that fall in splicing or other non-coding regions.
>

True, there are many combinations, which is why I suggested just
reporting X as the amino acid when N makes the codon ambiguous. VEP
already does that for amino acids at the start site of a frameshift,
or when annotating an incomplete codon at the end of an incomplete
transcript.
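
To make the suggestion concrete, here is a rough sketch in Python (not
Perl, and not taken from VEP). It assumes the standard genetic code and
only handles N, not the other IUPAC ambiguity codes. The names
CODON_TABLE and translate_codon are made up for this illustration. A
codon containing N is expanded to every possible base; if all
expansions give the same amino acid, that amino acid is reported,
otherwise X.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (assumption: no alternative
# codon tables such as the mitochondrial one are considered here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one codon, returning 'X' when N makes it ambiguous.

    Hypothetical helper for illustration only; it is not part of VEP.
    """
    codon = codon.upper()
    if len(codon) != 3:
        return "X"  # incomplete codon
    # Each N can be any of the four bases; other characters stay as they are.
    options = [BASES if base == "N" else base for base in codon]
    amino_acids = {
        CODON_TABLE.get("".join(candidate), "X")
        for candidate in product(*options)
    }
    # Unambiguous only if every expansion yields the same amino acid.
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


if __name__ == "__main__":
    for example in ("ATG", "GCN", "GAN", "NTG"):
        print(example, translate_codon(example))
```

With this, GCN still translates to A because all four GC* codons encode
alanine, while GAN and NTG come out as X. It says nothing about
multi-nucleotide variants or splice and other non-coding regions, which
would need separate handling.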

> If you can describe (or even better write code!) to do as you are suggesting then feel free to contribute

I'd be happy to, but a) my Perl skills are useless and b) before
making such changes one should first ask whether they have any
relevance to the project.

Thank you for your time.



