<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[view-toolkit Detailed Design]]></title><description><![CDATA[<p dir="auto">Support large files (how large, exactly?)<br />
Statistical analysis</p>
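<p dir="auto">1. Processing fasta/fastq files<br />
<a href="https://bioinf.shenwei.me/seqkit/usage/#grep" rel="nofollow ugc">https://bioinf.shenwei.me/seqkit/usage/#grep</a><br />
seqkit is a toolkit for processing fastq/fasta files; let's see whether our tool can implement its functionality.<br />
1.1 Open and display fasta/fastq content, with lazy loading (load on demand as the user scrolls).<br />
1.2 Query by ID: allow searching for multiple IDs at once, and highlight the matches.<br />
1.3 Query by subsequence, and highlight the matches.<br />
1.4 Compute sequence lengths and their distribution (violin plot or box plot); filter by sequence length.<br />
1.5 Count duplicate sequences and how many times each occurs.<br />
1.6 Count the total number of sequences.<br />
1.7 Compute N50  <a href="https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics#N50" rel="nofollow ugc">https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics#N50</a><br />
1.8 Distribution of Q20(%), Q30(%), Q40(%); provide a FastQC-style sequence quality plot (a quality box plot for each position).<br />
1.9 GC(%) content<br />
1.10 Sorting<br />
1.11 Data export, with support for random sampling, similar to seqtk sample</p>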
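<p dir="auto">As a rough illustration of items 1.6 to 1.9, here is a minimal Python sketch of the statistics involved. It is not how view-toolkit implements them. The function names (parse_fastq, n50, gc_fraction, q_fraction) are hypothetical, and the sketch assumes unwrapped 4-line FASTQ records with Phred+33 quality encoding.</p>

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, sequence, quality) from 4-line FASTQ records (assumes no line wrapping)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        # The ID is the first whitespace-delimited token after the leading '@'.
        yield header[1:].split()[0], seq, qual


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def q_fraction(qual: str, threshold: int, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= threshold (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(1 for ch in qual if ord(ch) - offset >= threshold) / len(qual)


if __name__ == "__main__":
    demo = ["@r1 example", "ACGT", "+", "II5!"]
    records = list(parse_fastq(demo))
    print("total sequences:", len(records))
    print("N50:", n50([len(seq) for _, seq, _ in records]))
    for rid, seq, qual in records:
        print(rid, "GC:", gc_fraction(seq), "Q20:", q_fraction(qual, 20), "Q30:", q_fraction(qual, 30))
```

<p dir="auto">For large files, the same functions can be fed from a streaming reader so that only one record is held in memory at a time.</p>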
]]></description><link>http://an.forum.genostack.com/topic/1031/view-toolkit详细设计</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 09:38:28 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/1031.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 18 Jan 2024 08:23:40 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to view-toolkit Detailed Design on Thu, 01 Feb 2024 03:17:19 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://sql.quacking.cloud/" rel="nofollow ugc">https://sql.quacking.cloud/</a><br />
<a href="https://tobilg.com/using-duckdb-wasm-for-in-browser-data-engineering" rel="nofollow ugc">https://tobilg.com/using-duckdb-wasm-for-in-browser-data-engineering</a></p>
<p dir="auto">An online SQL tool built on duckdb-wasm</p>
]]></description><link>http://an.forum.genostack.com/post/2465</link><guid isPermaLink="true">http://an.forum.genostack.com/post/2465</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 01 Feb 2024 03:17:19 GMT</pubDate></item><item><title><![CDATA[Reply to view-toolkit Detailed Design on Thu, 01 Feb 2024 03:47:07 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://milicendev.netlify.app/article/install-the-latest-cmake-version-in-ubuntu-18-04-bionic/" rel="nofollow ugc">https://milicendev.netlify.app/article/install-the-latest-cmake-version-in-ubuntu-18-04-bionic/</a></p>
<p dir="auto">Upgrading cmake on the big server</p>
<p dir="auto">Upgrading Rust<br />
sudo apt update<br />
sudo apt install rustc</p>
]]></description><link>http://an.forum.genostack.com/post/2463</link><guid isPermaLink="true">http://an.forum.genostack.com/post/2463</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 01 Feb 2024 03:47:07 GMT</pubDate></item><item><title><![CDATA[Reply to view-toolkit Detailed Design on Wed, 24 Jan 2024 07:31:22 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.linkedin.com/pulse/how-query-azure-blob-using-duckdb-vivek-anandaraman-f5bfc/" rel="nofollow ugc">https://www.linkedin.com/pulse/how-query-azure-blob-using-duckdb-vivek-anandaraman-f5bfc/</a><br />
How to query Azure Blob with DuckDB? This is a general approach for reading data over the network.</p>
<pre><code>import duckdb
from adlfs.spec import AzureBlobFileSystem

# Placeholders: fill in the credentials of your Azure AD application.
active_directory_application_id = "Your Application ID"
active_directory_application_secret = "Your Client Secret"
active_directory_tenant_id = "Your Tenant ID"
accountname = "Your Storage account name"

connection = duckdb.connect()

# Register the fsspec filesystem so DuckDB can resolve abfs:// paths.
connection.register_filesystem(
    AzureBlobFileSystem(
        account_name=accountname,
        tenant_id=active_directory_tenant_id,
        client_id=active_directory_application_id,
        client_secret=active_directory_application_secret,
    )
)

# Count the rows of a CSV blob read directly over the network.
query = connection.sql('''
  SELECT count(*) FROM read_csv_auto('abfs://container/path/blob.csv')
''')
print(query.fetchall())
</code></pre>
]]></description><link>http://an.forum.genostack.com/post/2449</link><guid isPermaLink="true">http://an.forum.genostack.com/post/2449</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 24 Jan 2024 07:31:22 GMT</pubDate></item><item><title><![CDATA[Reply to view-toolkit Detailed Design on Wed, 24 Jan 2024 06:20:43 GMT]]></title><description><![CDATA[<pre><code>install parquet;
load parquet;
load exon;
create table testfastq as from read_fastq('test.fastq');
copy testfastq to 'test.parquet' (format parquet);
</code></pre>
]]></description><link>http://an.forum.genostack.com/post/2448</link><guid isPermaLink="true">http://an.forum.genostack.com/post/2448</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 24 Jan 2024 06:20:43 GMT</pubDate></item><item><title><![CDATA[Reply to view-toolkit Detailed Design on Wed, 31 Jan 2024 14:53:02 GMT]]></title><description><![CDATA[<p dir="auto">The cmake version on the big server is fairly old, and we have not dared to upgrade it yet.<br />
This software requires at least cmake 3.11; the server has 3.10.<br />
Verifying exon-duckdb</p>
<pre><code>git clone --recursive https://mirror.ghproxy.com//https://github.com/wheretrue/exon-duckdb.git
curl --proto '=https' --tlsv1.3 https://sh.rustup.rs -sSf | sh
cd exon-duckdb
# Replace CMakeLists.txt in this directory with the attached version,
# which mainly changes some of the GitHub download URLs.
make
</code></pre>
<p dir="auto"><a href="/assets/uploads/files/1705659699810-cmakelists.txt">CMakeLists.txt</a></p>
]]></description><link>http://an.forum.genostack.com/post/2445</link><guid isPermaLink="true">http://an.forum.genostack.com/post/2445</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 31 Jan 2024 14:53:02 GMT</pubDate></item></channel></rss>