<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[简化基因组流程开发笔记]]></title><description><![CDATA[<h2>process_radtags</h2>
<pre><code>process_radtags -p in_dir [--paired [--interleaved&rsqb;&rsqb; [-b barcode_file] -o out_dir -e enz [-c] [-q] [-r] [-t len]
process_radtags -f in_file [-b barcode_file] -o out_dir -e enz [-c] [-q] [-r] [-t len]
process_radtags -1 pair_1 -2 pair_2 [-b barcode_file] -o out_dir -e enz [-c] [-q] [-r] [-t len]

  p: path to a directory of files.
  P,--paired: files contained within the directory are paired.
  I,--interleaved: specify that the paired-end reads are interleaved in single files.
  i: input file type, either 'fastq', 'gzfastq' (gzipped fastq), 'bam', or 'bustard' (default: guess, or gzfastq if unable to).
  b: path to a file containing barcodes for this run, omit to ignore any barcoding.
  o: path to output the processed files.
  f: path to the input file if processing single-end sequences.
  1: first input file in a set of paired-end sequences.
  2: second input file in a set of paired-end sequences.
  c,--clean: clean data, remove any read with an uncalled base.
  q,--quality: discard reads with low quality scores.
  r,--rescue: rescue barcodes and RAD-Tags.
  t: truncate final read length to this value.
  D: capture discarded reads to a file.
  E: specify how quality scores are encoded, 'phred33' (Illumina 1.8+/Sanger, default) or 'phred64' (Illumina 1.3-1.5).
  w: set the size of the sliding window as a fraction of the read length, between 0 and 1 (default 0.15).
  s: set the score limit. If the average score within the sliding window drops below this value, the read is discarded (default 10).
  y: output type, either 'fastq', 'gzfastq', 'fasta', or 'gzfasta' (default: match input type).

  Barcode options:
    --inline-null:   barcode is inline with sequence, occurs only on single-end read (default).
    --index-null:    barcode is provded in FASTQ header (Illumina i5 or i7 read).
    --null-index:    barcode is provded in FASTQ header (Illumina i7 read if both i5 and i7 read are provided).
    --inline-inline: barcode is inline with sequence, occurs on single and paired-end read.
    --index-index:   barcode is provded in FASTQ header (Illumina i5 and i7 reads).
    --inline-index:  barcode is inline with sequence on single-end read and occurs in FASTQ header (from either i5 or i7 read).
    --index-inline:  barcode occurs in FASTQ header (Illumina i5 or i7 read) and is inline with single-end sequence (for single-end data) on paired-end read (for paired-end data).

  Restriction enzyme options:
    -e &lt;enz&gt;, --renz-1 &lt;enz&gt;: provide the restriction enzyme used (cut site occurs on single-end read)
    --renz-2 &lt;enz&gt;: if a double digest was used, provide the second restriction enzyme used (cut site occurs on the paired-end read).
    Currently supported enzymes include:
      'aciI', 'ageI', 'aluI', 'apaLI', 'apeKI', 'apoI', 'aseI', 'bamHI', 
      'bbvCI', 'bfaI', 'bfuCI', 'bgIII', 'bsaHI', 'bspDI', 'bstYI', 'btgI', 
      'cac8I', 'claI', 'csp6I', 'ddeI', 'dpnII', 'eaeI', 'ecoRI', 'ecoRV', 
      'ecoT22I', 'haeIII', 'hinP1I', 'hindIII', 'hpaII', 'hpyCH4IV', 'kpnI', 'mluCI', 
      'mseI', 'mslI', 'mspI', 'ncoI', 'ndeI', 'nheI', 'nlaIII', 'notI', 
      'nsiI', 'nspI', 'pacI', 'pspXI', 'pstI', 'rsaI', 'sacI', 'sau3AI', 
      'sbfI', 'sexAI', 'sgrAI', 'speI', 'sphI', 'taqI', 'xbaI', or 'xhoI'
      

  Protocol-specific options:
    --bestrad: library was generated using BestRAD, check for restriction enzyme on either read and potentially tranpose reads.

  Adapter options:
    --adapter-1 &lt;sequence&gt;: provide adaptor sequence that may occur on the single-end read for filtering.
    --adapter-2 &lt;sequence&gt;: provide adaptor sequence that may occur on the paired-read for filtering.
      --adapter-mm &lt;mismatches&gt;: number of mismatches allowed in the adapter sequence.

  Output options:
    --retain-header: retain unmodified FASTQ headers in the output.
    --merge: if no barcodes are specified, merge all input files into a single output file.

  Advanced options:
    --filter-illumina: discard reads that have been marked by Illumina's chastity/purity filter as failing.
    --disable-rad-check: disable checking if the RAD site is intact.
    --len-limit &lt;limit&gt;: specify a minimum sequence length (useful if your data has already been trimmed).
    --barcode-dist-1: the number of allowed mismatches when rescuing single-end barcodes (default 1).
    --barcode-dist-2: the number of allowed mismatches when rescuing paired-end barcodes (defaults to --barcode-dist-1).

</code></pre>
<p dir="auto">分析：</p>
<ul>
<li>输入支持目录和文件两种方式
<ul>
<li>目录：通过<code>-P/--paired </code>指定是否双端数据</li>
<li>文件：单端<code>-f</code>指定文件；双端<code>-1\-2</code>指定R1和R2文件</li>
</ul>
</li>
<li>可选参数 <code>[-c] [-q] [-r] [-t len]</code>
<ul>
<li><code>-c</code> 表示数据清洗时去除表示为N的碱基，</li>
<li><code>-q</code> 表示数据清理时要去除低质量碱基 ,</li>
<li><code>-r</code> 表示要抢救下barcode和RAD-tag。</li>
</ul>
</li>
<li>单选参数：
<ul>
<li>barcode 类型 <code>-b</code></li>
<li>建库所用的限制性内切酶 <code>-e, --renz-1\--renz-2</code></li>
<li>质量分数编码 <code>-E</code>,有默认值 phred33</li>
</ul>
</li>
</ul>
]]></description><link>http://an.forum.genostack.com/topic/174/简化基因组流程开发笔记</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 14:27:41 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/174.rss" rel="self" type="application/rss+xml"/><pubDate>Wed, 20 Jan 2021 07:31:29 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 简化基因组流程开发笔记 on Thu, 28 Jan 2021 08:43:17 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="http://an.forum.genostack.com/uid/3">@ice-melt</a></p>
<p dir="auto">process_radtags 双端处理格式</p>
<pre><code>
    // Parse a file name that looks like: lane6_NoIndex_L006_R1_003.fastq
    // but exclude the paired-end files:  lane6_NoIndex_L006_R2_003.fastq
    //
    // Another example could be:          GfddRAD1_001_ATCACG_L008_R1_001.fastq.gz
    // and excluding the paired-end file: GfddRAD1_001_ATCACG_L008_R2_001.fastq.gz
</code></pre>
]]></description><link>http://an.forum.genostack.com/post/354</link><guid isPermaLink="true">http://an.forum.genostack.com/post/354</guid><dc:creator><![CDATA[ice-melt]]></dc:creator><pubDate>Thu, 28 Jan 2021 08:43:17 GMT</pubDate></item><item><title><![CDATA[Reply to 简化基因组流程开发笔记 on Fri, 22 Jan 2021 02:06:08 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="http://an.forum.genostack.com/uid/3">@ice-melt</a> 在 <a href="/post/331">简化基因组流程开发笔记</a> 中说：</p>
<blockquote>
<h2>参考基因组创建索引</h2>
<p dir="auto">烟草的参考基因组下载链接：<br />
<a href="https://www.ncbi.nlm.nih.gov/assembly/GCA_000715135.1" rel="nofollow ugc">https://www.ncbi.nlm.nih.gov/assembly/GCA_000715135.1</a><br />
三刺鱼参考基因组：<br />
<a href="http://asia.ensembl.org/Gasterosteus_aculeatus/Info/Index" rel="nofollow ugc">http://asia.ensembl.org/Gasterosteus_aculeatus/Info/Index</a></p>
<pre><code>wget -q ftp://ftp.ensembl.org/pub/release-91/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz

gzip -d Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz
</code></pre>
<h4>创建索引</h4>
<p dir="auto">stacks不直接参与比对，而是处理不同比对软件得到的BAM文件<br />
这里建立bwa的索引</p>
<pre><code>genome_fa=genome/Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa

bwa index -p index/bwa/gac $genome_fa &amp;&gt; index/bwa/bwa_index.oe

# 结果如下
|-- genome
|   |-- Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa
|-- index
    |-- bwa
        |-- bwa_index.oe
        |-- gac.amb
        |-- gac.ann
        |-- gac.bwt
        |-- gac.pac
        |-- gac.sa
</code></pre>
</blockquote>
<h2>生成 bam/sam 文件</h2>
<pre><code># sam 文件
bwa mem -M ../genome/index/bwa/gcf T5_R1.fq.gz T5_R2.fq.gz &gt; T5_R2.sam &amp;

# bam 文件
bwa mem -M ../genome/index/bwa/gcf T5_R1.fq.gz T5_R2.fq.gz |samtools view -b &gt; T5_R2.bam &amp;
</code></pre>
<ul>
<li>如果创建参考基因组索引没有指定索引文件夹， <code>bwa mem</code> 不需要指定<code>-M</code>参数</li>
<li>samtools 可以把bam转成sam格式</li>
<li>可以指定 <code>-t</code>  增加线程数</li>
<li>比对的结果可以查看一下质量<pre><code>samtools flagstat T5_R2.bam 
</code></pre>
</li>
</ul>
]]></description><link>http://an.forum.genostack.com/post/340</link><guid isPermaLink="true">http://an.forum.genostack.com/post/340</guid><dc:creator><![CDATA[ice-melt]]></dc:creator><pubDate>Fri, 22 Jan 2021 02:06:08 GMT</pubDate></item><item><title><![CDATA[Reply to 简化基因组流程开发笔记 on Wed, 20 Jan 2021 10:51:15 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="http://an.forum.genostack.com/uid/3">@ice-melt</a></p>
<p dir="auto">这一步需要留意自己的单端测序，还是双端测序，barcode是在read中，还是在FASTQ的header中，是否还需要去接头序列，是否是双酶切建库等。</p>
<p dir="auto">另外这一步比较耗时，尽量脱机运行或者提交到计算节点中，不然突然断网导致运行终止就更浪费时间了。<br />
将运行结果记录到日志文件中，方便后期检查报错。</p>
<pre><code>process_radtags -p $raw_dir -b $barcode_file \
    -o $processed_data/ -e sbfI --inline_null \
    -c -q -r &amp;&gt; $processed_data/process_radtags.lane1.oe&amp;
</code></pre>
]]></description><link>http://an.forum.genostack.com/post/332</link><guid isPermaLink="true">http://an.forum.genostack.com/post/332</guid><dc:creator><![CDATA[ice-melt]]></dc:creator><pubDate>Wed, 20 Jan 2021 10:51:15 GMT</pubDate></item><item><title><![CDATA[Reply to 简化基因组流程开发笔记 on Wed, 20 Jan 2021 10:46:18 GMT]]></title><description><![CDATA[<h2>参考基因组创建索引</h2>
<p dir="auto">烟草的参考基因组下载链接：<br />
<a href="https://www.ncbi.nlm.nih.gov/assembly/GCA_000715135.1" rel="nofollow ugc">https://www.ncbi.nlm.nih.gov/assembly/GCA_000715135.1</a><br />
三刺鱼参考基因组：<br />
<a href="http://asia.ensembl.org/Gasterosteus_aculeatus/Info/Index" rel="nofollow ugc">http://asia.ensembl.org/Gasterosteus_aculeatus/Info/Index</a></p>
<pre><code>wget -q ftp://ftp.ensembl.org/pub/release-91/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz

gzip -d Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz
</code></pre>
<h4>创建索引</h4>
<p dir="auto">stacks不直接参与比对，而是处理不同比对软件得到的BAM文件<br />
这里建立bwa的索引</p>
<pre><code>genome_fa=genome/Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa

bwa index -p index/bwa/gac $genome_fa &amp;&gt; index/bwa/bwa_index.oe

# 结果如下
|-- genome
|   |-- Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa
|-- index
    |-- bwa
        |-- bwa_index.oe
        |-- gac.amb
        |-- gac.ann
        |-- gac.bwt
        |-- gac.pac
        |-- gac.sa
</code></pre>
]]></description><link>http://an.forum.genostack.com/post/331</link><guid isPermaLink="true">http://an.forum.genostack.com/post/331</guid><dc:creator><![CDATA[ice-melt]]></dc:creator><pubDate>Wed, 20 Jan 2021 10:46:18 GMT</pubDate></item></channel></rss>