Saturday, March 18, 2017

[Spark] Apache Spark 2.0 Notes

Features of Spark 2.0

Easier: ANSI SQL and Streamlined APIs

  • Unifying DataFrames and Datasets in Scala/Java
  • SparkSession
  • Simpler, more performant Accumulator API
  • DataFrame-based Machine Learning API emerges as the primary ML API
  • Machine learning pipeline persistence
  • Distributed algorithms in R
  • User-defined functions (UDFs) in R

Faster: Apache Spark as a Compiler

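  • Ships with the second-generation Tungsten engine, whose core technique is officially called "whole-stage code generation": it collapses an entire query stage into a single generated function, eliminating per-row virtual function calls between operators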
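To make the idea concrete, here is a minimal pure-Python sketch. It is not Spark code: Spark generates Java source at runtime, and everything below (the `Scan`, `Filter` and `Project` classes and the toy query) is hypothetical and chosen only for illustration. It contrasts the classic iterator ("Volcano") model, where each operator pulls one row at a time from its child through a method call, with a fused loop that evaluates the same query in one pass. The fused loop is the shape that whole-stage code generation aims to produce.

```python
# Conceptual sketch only (hypothetical classes, no Spark involved).
# Toy query: SELECT SUM(x * 2) FROM t WHERE x % 3 == 0
from typing import Callable, Iterable, Optional


class Scan:
    """Leaf operator: yields input rows one at a time."""

    def __init__(self, rows: Iterable[int]) -> None:
        self._it = iter(rows)

    def next_row(self) -> Optional[int]:
        return next(self._it, None)  # None signals end of input


class Filter:
    """Pulls rows from its child and passes on those matching the predicate."""

    def __init__(self, child, pred: Callable[[int], bool]) -> None:
        self.child, self.pred = child, pred

    def next_row(self) -> Optional[int]:
        while (row := self.child.next_row()) is not None:
            if self.pred(row):
                return row
        return None


class Project:
    """Pulls a row from its child and applies an expression to it."""

    def __init__(self, child, fn: Callable[[int], int]) -> None:
        self.child, self.fn = child, fn

    def next_row(self) -> Optional[int]:
        row = self.child.next_row()
        return None if row is None else self.fn(row)


def volcano_sum(rows: Iterable[int]) -> int:
    # Iterator model: several method calls per row as data crosses operator boundaries.
    plan = Project(Filter(Scan(rows), lambda x: x % 3 == 0), lambda x: x * 2)
    total = 0
    while (row := plan.next_row()) is not None:
        total += row
    return total


def fused_sum(rows: Iterable[int]) -> int:
    # "Whole-stage" style: scan, filter, project and aggregate fused into one loop.
    total = 0
    for x in rows:
        if x % 3 == 0:
            total += x * 2
    return total


data = range(1, 11)
print(volcano_sum(data), fused_sum(data))  # prints "36 36"
```

Both functions compute the same answer. The difference is that the fused version keeps intermediate values in local variables instead of passing them through operator objects. This is the source of the speedup that the Spark 2.0 announcement attributes to whole-stage code generation.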

Smarter: Structured Streaming

  • Integrated API with batch jobs
  • Transactional interaction with storage systems
  • Rich integration with the rest of Spark
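The "integrated API with batch jobs" point rests on Structured Streaming's model: a stream is treated as an unbounded input table, and at every trigger the streaming result should equal the batch query run over all input received so far. The following pure-Python sketch illustrates that invariant. It does not use Spark's API, and `word_count` and `IncrementalWordCount` are hypothetical names. A word-count query runs incrementally over micro-batches, and the test checks the running result against the batch version after each micro-batch.

```python
# Conceptual sketch only (hypothetical names, not Spark's API).
# Model: stream = unbounded table; the incremental result at each trigger
# must equal the batch query over the prefix of input seen so far.
from collections import Counter
from typing import Iterable, List


def word_count(lines: Iterable[str]) -> Counter:
    """The 'batch' query: count words across all lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Runs the same query incrementally, keeping state between micro-batches."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def process(self, micro_batch: List[str]) -> Counter:
        # Only the new rows are scanned; earlier results live in self.state.
        self.state.update(word_count(micro_batch))
        return Counter(self.state)  # full snapshot, akin to a "complete" output


micro_batches = [["spark streaming", "spark sql"], ["structured streaming"], ["spark"]]
query = IncrementalWordCount()
seen: List[str] = []
for batch in micro_batches:
    result = query.process(batch)
    seen.extend(batch)
    # Invariant: incremental result == batch query over everything seen so far.
    assert result == word_count(seen)

print(query.state["spark"], query.state["streaming"])  # prints "3 2"
```

The real engine must also handle late data, fault-tolerant state and output sinks. This sketch shows only why the same query logic can serve both batch and streaming jobs.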

Official Spark 2.0 introduction video: Spark Summit East in New York (2016/02/16)


Spark 2.x release dates:
Spark 2.2.0 (2017/07/11)
Spark 2.1.1 (2017/05/02)
Spark 2.1.0 (2016/12/28)
Spark 2.0.2 (2016/11/14)
Spark 2.0.1 (2016/10/03)
Spark 2.0.0 (2016/07/26)

