暗能星系


    SARS-CoV-2 Data Analysis

    Bioinformatics Analysis
    • anneng (last edited by anneng)

      https://sci-hub.st/10.3390/cimb43020061
      Next-Generation Sequencing (NGS) in COVID-19: A Tool for
      SARS-CoV-2 Diagnosis, Monitoring New Strains and
      Phylodynamic Modeling in Molecular Epidemiology

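      This article describes the situation during the early Wuhan outbreak: 5 samples were processed by China CDC and 4 samples by BGI.
      BGI aligned reads against the human genome (hg19) with bwa to remove host reads, then aligned the remaining reads against coronavirus sequences from NCBI (I do not yet know exactly which sequences), and then used SPAdes to build a consensus sequence.
      China CDC used CLCBio (CLC Workbench, the competing product I have been studying): "software version 11.0.1 was used for de novo assembly, variant calling, and alignment".
      With these assembly results in hand, phylogenetic analysis can then be carried out.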
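A minimal sketch of the host-removal idea above, assuming the reads have already been aligned to the host reference and the result is available as SAM text. It only uses the SAM flag bit 0x4 (segment unmapped) to pick the reads that did not map to the host. The function name and the toy records are hypothetical, and this is not BGI's actual procedure.

```python
# Sketch: keep reads that did NOT align to the host reference.
UNMAPPED_FLAG = 0x4  # SAM flag bit meaning "segment unmapped"


def non_host_read_names(sam_lines):
    """Return names of unmapped reads, in first-seen order, from SAM text lines."""
    names = []
    seen = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Toy SAM records (made up for illustration only).
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCGTAA\tIIIIIIIIII",
    "read3\t16\tchr1\t500\t60\t10M\t*\t0\t0\tGGGTTTAAAC\tIIIIIIIIII",
]
print(non_host_read_names(example_sam))
```

In this toy input only read2 carries the unmapped flag, so it is the only candidate non-host read.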

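      The article reviews both the wet-lab and the bioinformatics side of COVID-19 NGS. Some of the papers it cites are worth reading:
      1. https://pubmed.ncbi.nlm.nih.gov/29154853/
      Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists
      This paper summarizes an expert consensus and proposes 17 best-practice consensus recommendations for validating clinical NGS bioinformatics pipelines.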
      470f8ff7-4d5e-4bc7-b759-344216c77ed1-image.png
      Terminology explained
      | Terminology | Description |
      |---|---|
      | BAM | Compressed (binary) format of a SAM file intended for faster random access (search) of aligned and unaligned sequences and its related metadata. Compression enables smaller file size and storage efficiency which makes it popular over the SAM format. This is a default output of many alignment and post-alignment softwares used in bioinformatics pipeline and commonly used as an input by many variant callers. |
      | FASTQ | It is a de facto, human readable, file format that stores nucleotide sequences and corresponding quality (PHRED) scores for each nucleotide as an ASCII encoded character [4]. This is commonly used for storing unaligned short sequence reads after the steps of base calling and is a typical starting point for NGS bioinformatics pipeline. |
      | PHRED Score | It is a per base (nucleotide) quality score that is defined as an estimated probability for a called base to be incorrect (erroneous call). Mathematically, it is expressed as [4] $Q = -10 \log_{10} P_e$, where $Q$ is Phred quality score, $P_e$ is the probability for an erroneous base call. The $P_e$ is typically generated by the base calling software which is sequence instrument specific. Therefore, $Q$ values in isolation cannot be used to compare sequence quality across different sequencing platforms. |
      | SAM | Stands for Sequence Alignment/Map format. It is a human readable (text file) file format specification for storing information on aligned sequence. This is a default output of many alignment softwares used in bioinformatics pipeline. Given the large file size and slower random access, BAM format is preferred for routine bioinformatics data processing. This format is helpful for technical troubleshooting when manual review of the stored information is necessary. |
      | Variant - horizontally complex | When two or more sequence alterations are present on the same read in close proximity such that they may represent a single complex variant. These variants are frequently represented as deletion-insertions and may result in ambiguous sequence description or HGVS nomenclature. |
      | Variant - Left-aligned | If there are multiple potential VCF entries of the same allele length that represent the same variant, then left-alignment refers to the VCF entry with the smallest base position. The base position is typically represented in genomic coordinate for a given primary assembly (eg, GRCh38) and represents the most 5′ position [5]. |
      | Variant - Normalized | A normalized variant must be parsimonious as well as left-aligned [5]. |
      | Variant - Parsimony | If there are more than one way to represent the same variant in a VCF file, parsimony refers to the representation with the shortest possible allele length [5] (positive and non-zero length). |
      | Variant - vertically complex | A vertically complex variant occurs when three or more alleles are represented by different sequence reads, typically with or uncommonly without a reference (normal) allele, at the same genomic coordinate or set of coordinates. |
      | VCF | Variant Call Format is a versioned, text-file (human readable) specification for storing sequence variant calls. The file contains meta-information containing various details of the variant calling process and definition of headers and format tags, a header line and data lines. Each data line represents a sequence variant defined using a combination of chromosome, position, reference allele, and alternate allele. |
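To make the PHRED definition in the glossary concrete, here is a small sketch of the conversion between Q and the error probability Pe (Q = -10·log10(Pe)), plus decoding of a FASTQ quality string. The ASCII offset of 33 is an assumption (the common Phred+33 encoding); the glossary only says the scores are ASCII encoded. The function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Pe = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p_error):
    """Q = -10 * log10(Pe)."""
    return -10 * math.log10(p_error)


def decode_fastq_quality(qual_string, offset=33):
    """Turn a FASTQ quality string into per-base Phred scores.

    offset=33 assumes Phred+33 encoding; older files may use another offset.
    """
    return [ord(ch) - offset for ch in qual_string]


scores = decode_fastq_quality("I5!")
print(scores)
print([phred_to_error_prob(q) for q in scores])
```

Under this encoding, Q20 corresponds to a 1-in-100 chance of a wrong call and Q40 to 1-in-10,000. As the glossary notes, Q values from different platforms are not directly comparable.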

      • anneng

        https://pubmed.ncbi.nlm.nih.gov/31978945/
        A Novel Coronavirus from Patients with Pneumonia in China, 2019
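        This article mentions that the first few Wuhan samples were sequenced with a hybrid of second-generation and third-generation sequencing.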

        • anneng (last edited by anneng)

          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7929396/
          Bioinformatics resources for SARS-CoV-2 discovery and surveillance
          Several experimental approaches
          6ac373b5-27c1-4674-b73a-8e083137b005-image.png
          The workflow of different NGS sequencing approaches currently available for virus discovery and genomic surveillance. The library construction scheme employed in (A) metatranscriptomic sequencing, (B) a hybrid capture-based approach based on a metatranscriptomic library, (C) multiplex PCR amplification for NGS platforms and (D) the Oxford Nanopore sequencing platform.
          The basic workflow and tools for novel virus discovery
          a9c9d66c-b37a-4cd5-b221-d1929cc3c715-image.png
          For the host-removal step, the paper says rRNA should be removed. It also says that host removal can be skipped when the viral load is low.

          • anneng

            covid19.sfb.uit.no
            COVID-19 database

            • anneng

              https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015676/
              A comparison of multiple sequence alignment software

              • anneng (last edited by anneng)

                https://www.frontiersin.org/articles/10.3389/fmicb.2021.665041/full
                A Novel SARS-CoV-2 Viral Sequence Bioinformatic Pipeline Has Found Genetic Evidence That the Viral 3′ Untranslated Region (UTR) Is Evolving and Generating Increased Viral Diversity

                This article analyzes mutations in the UTR regions of SARS-CoV-2.
                c553f6b4-1c84-47c6-87a3-bcc14e818150-image.png

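                The supplementary material of this article includes a table of SRR accessions, and it lists a large number of runs. For us, this means we need to support batch data download of this kind.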
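A rough sketch of what batch-download support could start from: pull SRR accessions out of the text of a supplementary table, de-duplicate them, and group them into `prefetch` command lines (the SRA Toolkit downloader, assumed to be installed). The table text, the accessions in it, the batch size, and the function names are all hypothetical. The commands are only built here, not executed.

```python
import re

SRR_PATTERN = re.compile(r"\bSRR\d+\b")


def extract_srr_accessions(text):
    """Return unique SRR accessions in the order they first appear."""
    seen = set()
    accessions = []
    for acc in SRR_PATTERN.findall(text):
        if acc not in seen:
            seen.add(acc)
            accessions.append(acc)
    return accessions


def prefetch_commands(accessions, batch_size=50):
    """Group accessions into argument lists, one `prefetch` call per batch."""
    return [
        ["prefetch", *accessions[i:i + batch_size]]
        for i in range(0, len(accessions), batch_size)
    ]


# Made-up table content; real supplementary tables will look different.
table_text = "sample\trun\nA\tSRR0000001\nB\tSRR0000002\nC\tSRR0000001\n"
runs = extract_srr_accessions(table_text)
for cmd in prefetch_commands(runs, batch_size=1):
    print(" ".join(cmd))  # could be passed to subprocess.run(cmd, check=True)
```

Keeping the commands as argument lists rather than shell strings makes it easier to hand them to a job queue and retry failed batches individually.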

                • anneng

                  https://www.cdc.gov/amd/pdf/slidesets/toolkitmodule_3.5-508c.pdf
                  SARS-CoV-2 sequence repositories: GISAID and NCBI

                  • anneng

                    https://www.mdpi.com/2075-1729/12/1/69/htm
                    Direct RNA Nanopore Sequencing of SARS-CoV-2 Extracted from Critical Material from Swabs
                    Detecting SARS-CoV-2 directly with Nanopore direct RNA sequencing

                    • Basecalling
                      Nanopore Guppy base caller (v3.4.4) tool
                      "flow cell = FLO-MIN106" and "kit = SQK-RNA002".

                    • Quality control
                      PycoQC (v2.5.0.21) software

                    • Filtering
                      NanoFilt (v2.7.0)
                      minimum read length ≥500 nt and read quality ≥8.

                    • Removing host and other microbial reads (decontamination)
                      Remove reads from the human (GRCh38/hg38), fungal, and bacterial genomes
                      minimap2 (v2.17–r941)

                    • Extracting SARS-CoV-2 reads
                      samtools (v1.7) view: keep unmapped reads and reads with mapping quality lower than 10

                    • Alignment and variant calling
                      minimap2
                      BCFtools mpileup / call
                      Inspect the variants with the Integrative Genomics Viewer (IGV) (v2.8.2)
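The filtering step above (read length ≥500 nt, read quality ≥8) can be sketched in plain Python as follows. This is not NanoFilt itself. The mean quality is computed by averaging per-base error probabilities and converting back to a Phred score, which is one common convention and an assumption here. The Phred+33 encoding and the function names are likewise assumptions.

```python
import math


def mean_quality(qual_string, offset=33):
    """Mean read quality: average the error probabilities, then convert to Phred."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual_string]
    return -10 * math.log10(sum(probs) / len(probs))


def parse_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


def filter_reads(records, min_length=500, min_quality=8):
    """Keep reads that are long enough and whose mean quality is high enough."""
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if len(seq) >= min_length and mean_quality(qual) >= min_quality
    ]


# Toy reads: one good, one too short, one with very low quality.
fastq_lines = [
    "@good", "A" * 600, "+", "I" * 600,
    "@short", "A" * 100, "+", "I" * 100,
    "@lowq", "A" * 600, "+", "#" * 600,
]
kept = filter_reads(parse_fastq(fastq_lines))
print([name for name, _seq, _qual in kept])
```

Averaging probabilities rather than raw Q values is stricter: a few very poor bases pull the mean down more than they would in a simple arithmetic mean of scores.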

                    • anneng (last edited by anneng)

                      SARS-CoV-2 variant naming
                      https://covariants.org/
                      https://cov-lineages.org/ (the official Pango website)

                      • anneng

                        https://www.nature.com/articles/s41467-020-20075-6

                        • anneng (last edited by anneng)

                          https://terra.bio/workflow-updates-to-the-covid-19-workspace-better-viral-assembly-and-phylogenetics-with-nextstrain/
                          Terra's update notes for its SARS-CoV-2 workflows
                          Public workflow workspace
                          https://app.terra.bio/#workspaces/pathogen-genomic-surveillance/COVID-19

                          https://support.terra.bio/hc/en-us/articles/360041068771
                          7192e3a3-c3c9-4efb-bbc5-075ba9f62616-image.png

                          d326298f-067b-44f4-9c3d-26d5f6af6d7f-image.png
                          2020.08.23.20178236v1.full.sars.kraken2.terra.pdf

                          This paper mentions using Kraken2 to detect other viruses, mainly to rule out co-infection of SARS-CoV-2 with other viruses.

                          We used Kraken2 (46) to identify other viral taxa present in NP swab samples from COVID positive patients, excluding those removed by filters i and ii described above. To do so, we ran the classify_single workflow on all reads from all samples (with kraken2_db_tgz="gs://pathogen-public-dbs/v1/kraken2-broad-20200505.tar.zst", krona_taxonomy_db_kraken2_tgz="gs://pathogen-public-dbs/v1/krona.taxonomy-20200505.tab.zst", ncbi_taxdump_tgz="gs://pathogen-public-dbs/v1/taxdump-20200505.tar.gz", trim_clip_db="gs://pathogen-public-dbs/v0/contaminants.clip_db.fasta", spikein_db="gs://pathogen-public-dbs/v0/ERCC_96_nopolyA.fasta"). Our kraken2 database was
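As a rough illustration of the co-infection check, the sketch below scans a Kraken2 report (tab-separated: percentage, clade reads, direct reads, rank code, NCBI taxid, indented name) for species-level taxa other than SARS-CoV-2. The read threshold, the function name, and the report rows are made up for illustration. The paper ran the classify_single workflow, and this is not a reimplementation of it.

```python
SARS_COV_2_TAXID = 2697049  # NCBI taxonomy ID for SARS-CoV-2


def other_taxa(report_lines, target_taxid=SARS_COV_2_TAXID, min_reads=10):
    """List (name, taxid, clade_reads) for species-level hits other than the target."""
    hits = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a standard report row
        clade_reads = int(fields[1])
        rank = fields[3].strip()
        taxid = int(fields[4])
        name = fields[5].strip()
        if not rank.startswith("S"):
            continue  # keep species-level rows only (S, S1, ...)
        if taxid == target_taxid or clade_reads < min_reads:
            continue
        hits.append((name, taxid, clade_reads))
    return hits


# Fabricated report rows, for illustration only.
report = [
    "4.49\t449\t449\tU\t0\tunclassified",
    "95.00\t9500\t9500\tS1\t2697049\t        Severe acute respiratory syndrome coronavirus 2",
    "0.50\t50\t50\tS\t11320\t        Influenza A virus",
    "0.01\t1\t1\tS\t11137\t        Human coronavirus 229E",
]
print(other_taxa(report))
```

A minimum read count is one simple way to suppress spurious single-read hits. A real pipeline would also want negative controls before calling a co-infection.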
                          

                          This Terra workflow targets Illumina (second-generation) sequencing data.
                          https://app.terra.bio/#workspaces/pathogen-genomic-surveillance/COVID-19_Broad_Viral_NGS

                          • anneng

                            https://dockstore.org/organizations/BroadInstitute/collections/pgs

                            • anneng (last edited by anneng)

                              https://artic.readthedocs.io/en/latest/primer-schemes/
                              ARTIC primer schemes
                              8a721750-2510-4f0c-8cf0-dd8ff18b6f56-image.png
                              This figure comes from the Nature Protocols paper below, which explains multiplex PCR in detail.
                              https://www.nature.com/articles/nprot.2017.066
                              Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
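                              The paper explains that the method is mainly aimed at clinical samples, where the viral load is low and metagenomic sequencing recovers too few viral reads: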
                              genome sequencing directly from clinical samples (i.e., without isolation and culture) remains challenging for viruses such as Zika, for which metagenomic sequencing methods may generate insufficient numbers of viral reads.
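                              The paper also mentions an online primer design tool. Primer design is a key step of the experiment, so we could build this kind of tool into our system, or even ship it as a standalone app.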
                              5d442beb-a955-439a-b12d-4def69ab3642-image.png
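If a primer design helper were built into the system, the geometric part of a tiling scheme might start from something like the sketch below: lay overlapping amplicon windows across the genome and alternate them between two pools, so that neighbouring (overlapping) amplicons are never amplified in the same reaction. The amplicon length, overlap, and function name are illustrative assumptions. Real tools pick primer positions from sequence properties, which this sketch ignores entirely.

```python
def tile_amplicons(genome_length, amplicon_length=400, overlap=75):
    """Return (index, start, end, pool) windows covering [0, genome_length).

    Coordinates are 0-based and half-open; pools alternate 1, 2, 1, 2, ...
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be >= 0 and smaller than amplicon_length")
    step = amplicon_length - overlap
    amplicons = []
    start, index = 0, 1
    while True:
        end = min(start + amplicon_length, genome_length)
        pool = 1 if index % 2 == 1 else 2
        amplicons.append((index, start, end, pool))
        if end == genome_length:
            break
        start += step
        index += 1
    return amplicons


# Small toy genome so the result is easy to check by hand.
for amplicon in tile_amplicons(1000, amplicon_length=400, overlap=100):
    print(amplicon)
```

For the toy call this yields three windows (0-400, 300-700, 600-1000) in pools 1, 2, 1, so each overlap region is covered by amplicons from different pools.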

                              • anneng

                                https://bugseq.com/demo/metagenomic
                                58413ace-956d-41e9-a1e6-a55667bea09a-image.png

                                • anneng

                                  https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/sars-cov-2-variant-discovery/tutorial.html
                                  3d45b431-f0f2-4fb8-8097-95ef568404a1-image.png

                                  b3923b0d-86a8-4cf0-a330-7faf0d318bfb-image.png

                                  8e618714-840f-4f27-985a-50420fdd0571-image.png
