暗能星系


    pandas vs sql

    Big Data
    • anneng

      https://datascience.stackexchange.com/questions/34357/why-do-people-prefer-pandas-to-sql

      • anneng

        https://pandas.pydata.org/docs/user_guide/scale.html?highlight=postgresql

        Using pandas to process large-scale data
        pandas provides data structures for in-memory analytics, which makes using pandas to analyze datasets that are larger than memory datasets somewhat tricky. Even datasets that are a sizable fraction of memory become unwieldy, as some pandas operations need to make intermediate copies.

        This document provides a few recommendations for scaling your analysis to larger datasets. It’s a complement to Enhancing performance, which focuses on speeding up analysis for datasets that fit in memory.

        But first, it’s worth considering not using pandas. pandas isn’t the right tool for all situations. If you’re working with very large datasets and a tool like PostgreSQL fits your needs, then you should probably be using that. Assuming you want or need the expressiveness and power of pandas, let’s carry on.
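To make the "consider not using pandas" point concrete, here is a minimal sketch that runs the same aggregation two ways: once inside the database, and once in pandas after loading every row. It assumes the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the `orders` table with its columns is hypothetical.

```python
import sqlite3

import pandas as pd

# sqlite3 (in-memory) stands in for PostgreSQL here; the table and its
# columns are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("bob", 1.0)],
)

# Option 1: aggregate in the database, so only the summary rows reach pandas.
in_db = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY customer",
    conn,
)

# Option 2: pull every row into memory, then aggregate with pandas.
all_rows = pd.read_sql_query("SELECT customer, amount FROM orders", conn)
in_pandas = (
    all_rows.groupby("customer", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total"})
)
conn.close()

print(in_db)
print(in_pandas)
```

Both options produce the same totals. The difference is where the work happens: with the first, memory use in Python grows with the number of groups rather than the number of rows, which is why a database can be the better fit for very large tables.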
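        Because pandas does its computation in memory, the pandas docs suggest several approaches for data that is too large. The better options are to keep the data in big-data storage, split it into chunks as Parquet files, and process each chunk with pandas, or to use Dask (whose interface mirrors the pandas API) for multi-threaded or cross-cluster large-scale parallel processing.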
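The chunked approach mentioned above (data split across several files, each processed with pandas in turn) can be sketched roughly as follows. This is only an illustration under assumptions: the helper `count_values_in_chunks`, the `name` column, and the file names are hypothetical, and the demo writes CSV files in place of Parquet so it runs without a Parquet engine such as pyarrow installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def count_values_in_chunks(paths, column, reader=pd.read_parquet):
    """Count the values of `column` across many files, one file at a time."""
    total = pd.Series(dtype="int64")
    for path in paths:
        chunk = reader(path)  # only this one chunk is held in memory
        counts = chunk[column].value_counts()
        # Align on the index and add; a value missing on one side counts as 0.
        total = total.add(counts, fill_value=0)
    return total.astype("int64").sort_index()


# Demo with made-up data. CSV stands in for Parquet; with real Parquet
# files, the default reader (pd.read_parquet) would be used instead.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, names in enumerate([["a", "b", "a"], ["b", "c"], ["a"]]):
        path = Path(tmp) / f"part-{i}.csv"
        pd.DataFrame({"name": names}).to_csv(path, index=False)
        paths.append(path)
    result = count_values_in_chunks(paths, "name", reader=pd.read_csv)

print(result.to_dict())
```

This pattern works when the per-chunk results can be combined afterwards, as counts can be by addition. Operations that need to coordinate across chunks are where a library like Dask, which handles the partitioning and scheduling itself, becomes the more natural choice.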

        • anneng

          https://drops.dagstuhl.de/opus/volltexte/2020/11960/pdf/OASIcs-PLATEAU-2019-6.pdf
