暗能星系

    Is there any difference between hg19 (from UCSC) and GRCh37 (from NCBI)?

    Bioinformatics analysis
    • anneng

      GRCh37/hg19 and GRCh38 are genome builds, not annotations; an annotation describes where features are located in a given genome build. The actual sequences you'll get from NCBI/UCSC/Ensembl for a given build will be identical, but their annotations will be different and (importantly) updated at different frequencies. NCBI's annotation is the RefSeq dataset (the "refGene" track in UCSC), which is essentially a subset of the UCSC and Ensembl annotations. UCSC's annotations are kind of a mess: you'll find genes with the same ID on multiple strands and multiple chromosomes, which makes them a bit useless. Ensembl's annotations typically contain more features than UCSC's (so a bit more noise), but they're otherwise much better put together (e.g., you'll never find a gene ID on different strands or different chromosomes), and their IDs are typically easier to map to other things (e.g., gene names, GO terms and pathway memberships). Ensembl also updates its annotation fairly often and versions everything nicely, so it's quite convenient to report which version you used in a paper (reproducibility is always a good thing). Given the choice, use the Ensembl annotation.
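
      If you want to check whether a particular annotation file has the duplicated-ID problem described above, something like the following minimal sketch may help. It assumes a GTF-style file (tab-separated, nine columns, with a `gene_id "..."` attribute in the last column); the function name `find_ambiguous_genes` is made up for illustration, and the records you feed it are your own.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "ABC";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def find_ambiguous_genes(gtf_lines):
    """Return gene IDs seen on more than one (chromosome, strand) pair.

    gtf_lines is any iterable of GTF-formatted text lines.
    The result maps each offending gene ID to its sorted locations.
    """
    locations = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match:
            # Column 1 is the sequence name, column 7 is the strand.
            locations[match.group(1)].add((fields[0], fields[6]))
    return {
        gene: sorted(places)
        for gene, places in locations.items()
        if len(places) > 1
    }
```

      To run it on a file, pass an open file handle: `with open(path) as handle: find_ambiguous_genes(handle)`. An empty result means every gene ID sits on a single chromosome and strand.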

      BTW, don't forget that the various sources can use different names for chromosomes (e.g., chr1 in UCSC is just 1 in Ensembl), so don't mix and match them.
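
      If you do need to move coordinates between the two naming schemes, a small converter is usually enough for the primary chromosomes. The sketch below is only an illustration: the function names are made up, and it deliberately handles just 1-22, X, Y and the mitochondrion (chrM in UCSC, MT in Ensembl). Unplaced contigs and alternate haplotypes are named quite differently in each source and need a proper lookup table, so the code raises an error for them rather than guessing.

```python
# Primary chromosome names in Ensembl style (no "chr" prefix).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_ensembl(name):
    """Convert a UCSC-style primary chromosome name (chr1) to Ensembl style (1)."""
    if name == "chrM":
        return "MT"
    if name.startswith("chr") and name[3:] in PRIMARY:
        return name[3:]
    raise ValueError(f"no simple mapping for sequence name: {name!r}")


def ensembl_to_ucsc(name):
    """Convert an Ensembl-style primary chromosome name (1) to UCSC style (chr1)."""
    if name == "MT":
        return "chrM"
    if name in PRIMARY:
        return "chr" + name
    raise ValueError(f"no simple mapping for sequence name: {name!r}")
```

      Failing loudly on unknown names is a deliberate choice here: silently passing through a contig name is how mixed-up files tend to end up with features that quietly disappear from downstream results.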

      • anneng

        There are a few minor differences between GRCh37 and hg19. The contig sequences are the same but the names are different, i.e. "1" may need to be converted to "chr1". In addition, UCSC hg19 is currently using the old mitochondrial sequence (NC_001807), whereas NCBI and Ensembl have transitioned to NC_012920.
        Citing UCSC:
        "Since the release of the UCSC hg19 assembly, the Homo sapiens mitochondrion sequence (represented as "chrM" in the Genome Browser) has been replaced in GenBank with the record NC_012920. We have not replaced the original sequence, NC_001807, in the hg19 Genome. We plan to use the Revised Cambridge Reference Sequence (rCRS) in the next human assembly release."
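
        One quick way to tell which mitochondrial sequence a reference FASTA contains is to look at its length in the `.fai` index produced by `samtools faidx`: NC_001807 is 16,571 bp, while NC_012920 (rCRS) is 16,569 bp. The sketch below assumes the standard tab-separated `.fai` layout (name in column 1, length in column 2); the function name and the set of accepted mitochondrial sequence names are my own choices, not anything official.

```python
# Known lengths of the two mitochondrial reference sequences.
MITO_LENGTHS = {
    16571: "NC_001807 (original hg19 chrM)",
    16569: "NC_012920 (rCRS)",
}

# Names commonly used for the mitochondrial sequence (assumed list).
MITO_NAMES = {"chrM", "MT", "chrMT", "M"}


def mito_version(fai_lines):
    """Identify the mitochondrial sequence from .fai index lines.

    Returns a description string, "unknown" if the length matches
    neither record, or None if no mitochondrial entry is found.
    """
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] in MITO_NAMES:
            return MITO_LENGTHS.get(int(fields[1]), "unknown")
    return None
```

        Length alone is a heuristic rather than proof of identity, so treat the answer as a hint and compare checksums if it matters for your analysis.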
