<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[pandas vs sql]]></title><description><![CDATA[<p dir="auto"><a href="https://datascience.stackexchange.com/questions/34357/why-do-people-prefer-pandas-to-sql" rel="nofollow ugc">https://datascience.stackexchange.com/questions/34357/why-do-people-prefer-pandas-to-sql</a></p>
]]></description><link>http://an.forum.genostack.com/topic/389/pandas-vs-sql</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 10:43:51 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/389.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 16 Aug 2021 12:54:11 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to pandas vs sql on Tue, 30 Nov 2021 06:27:53 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://drops.dagstuhl.de/opus/volltexte/2020/11960/pdf/OASIcs-PLATEAU-2019-6.pdf" rel="nofollow ugc">https://drops.dagstuhl.de/opus/volltexte/2020/11960/pdf/OASIcs-PLATEAU-2019-6.pdf</a></p>
]]></description><link>http://an.forum.genostack.com/post/933</link><guid isPermaLink="true">http://an.forum.genostack.com/post/933</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 30 Nov 2021 06:27:53 GMT</pubDate></item><item><title><![CDATA[Reply to pandas vs sql on Thu, 26 Aug 2021 03:38:39 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://pandas.pydata.org/docs/user_guide/scale.html?highlight=postgresql" rel="nofollow ugc">https://pandas.pydata.org/docs/user_guide/scale.html?highlight=postgresql</a></p>
<p dir="auto">Using pandas to process large-scale data<br />
pandas provides data structures for in-memory analytics, which makes using pandas to analyze datasets that are larger than memory somewhat tricky. Even datasets that are a sizable fraction of memory become unwieldy, as some pandas operations need to make intermediate copies.</p>
<p dir="auto">This document provides a few recommendations for scaling your analysis to larger datasets. It’s a complement to Enhancing performance, which focuses on speeding up analysis for datasets that fit in memory.</p>
<p dir="auto">But first, it’s worth considering not using pandas. pandas isn’t the right tool for all situations. <strong>If you’re working with very large datasets and a tool like PostgreSQL fits your needs, then you should probably be using that.</strong> Assuming you want or need the expressiveness and power of pandas, let’s carry on.<br />
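Because pandas computes in memory, it suggests several approaches for when the data is too large. The better ones are to keep the large dataset in storage as Parquet files split into chunks and process each chunk with pandas, or to use Dask (whose API follows the pandas interface) for multi-threaded or cluster-wide parallel processing.</p>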
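<p dir="auto">A minimal sketch of the chunked approach, assuming only that pandas is installed. The helper name <code>chunked_value_counts</code> and the sample data are hypothetical. A small in-memory CSV stands in for the large dataset here; with Parquet, each chunk would instead be one file loaded with <code>pd.read_parquet</code>. The point is that only one chunk is in memory at a time and the per-chunk results are combined at the end.</p>

```python
import io

import pandas as pd


def chunked_value_counts(chunks, column):
    """Combine per-chunk value counts so only one chunk is held in memory."""
    total = None
    for chunk in chunks:
        counts = chunk[column].value_counts()
        # Align on the index; a label missing from one side counts as 0.
        total = counts if total is None else total.add(counts, fill_value=0)
    if total is None:
        # No chunks at all: return an empty result.
        return pd.Series(dtype="int64")
    return total.astype("int64").sort_index()


# Hypothetical sample data standing in for a dataset too large to load at once.
csv_text = "name,x\nalice,1\nbob,2\nalice,3\ncarol,4\nbob,5\nalice,6\n"

# chunksize=2 makes read_csv yield DataFrames of at most two rows each.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
counts = chunked_value_counts(chunks, "name")
print(counts.to_dict())
```

<p dir="auto">This only works for operations whose partial results can be merged, such as counts and sums. For anything that needs coordination between chunks, Dask or a database like PostgreSQL is likely the better fit.</p>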
]]></description><link>http://an.forum.genostack.com/post/797</link><guid isPermaLink="true">http://an.forum.genostack.com/post/797</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 26 Aug 2021 03:38:39 GMT</pubDate></item></channel></rss>