2017 Outlook: pandas, Arrow, Feather, Parquet, Spark, Ibis
McKinney argues that 2017 will be pivotal for Python data tooling as pandas, Arrow, Parquet, Feather, and PySpark converge on a shared, high-performance columnar foundation. He frames his new role at Two Sigma as aligned with long-term open source development and stresses that companies must engage with open source to stay competitive and attract top engineers. The post lays out pandas 2.0 goals focused on fixing technical debt, improving memory efficiency, and enabling true multithreading to keep pandas relevant at larger data scales. Apache Arrow is positioned as the interoperability layer that will make cross-language, high-performance IO practical, including for Spark and pandas. He also highlights ongoing work on Parquet, consolidation of Feather into Arrow, and plans to accelerate PySpark and deepen Ibis. The conclusion is an outlook of coordinated ecosystem work that improves performance, composability, and sustainability across the Python data stack.
Data Infrastructure, Apache Arrow