Transformations
Selecting columns
<b>Python: <br></b>df.select("DEST_COUNTRY_NAME").show(2)<br>
<b>Spark SQL:<br></b>SELECT columnName * 10, otherColumn, someOtherCol as c FROM dataFrameTable<b><br></b>
Adding a column
<b>Python:<br></b>df.withColumn("numberOne", lit(1)).show(2) <br>
<b>Spark SQL:<br></b>SELECT *, 1 as numberOne FROM dfTable LIMIT 2<b><br></b>
Renaming a column
<b>Python:</b><br>df.withColumnRenamed("DEST_COUNTRY_NAME", "dest").columns<br>
Dropping a column
<b>Python: </b><br>df.drop("ORIGIN_COUNTRY_NAME").columns<br>
Changing a column's type
<b>Python:</b><br>df.withColumn("count2", col("count").cast("long"))<br>
Filtering rows
<b>Python:</b><br>df.filter(col("count") < 2).show(2) <br>df.where("count < 2").show(2)<br>
<b>Spark SQL:<br></b>SELECT * FROM dfTable WHERE count < 2 LIMIT 2<b><br></b>
Removing duplicates
<b>Python:</b><br>df.select("id","sno").distinct().count()<br>
<b>Spark SQL:<br></b>SELECT COUNT(DISTINCT(id, sno)) FROM dfTable<b><br></b>
Sorting
<b>Python:</b><br>df.orderBy(col("count").desc(), col("DEST_COUNTRY_NAME").asc()).show(2)<br>
<b>Spark SQL:<br></b>SELECT * FROM dfTable ORDER BY count DESC, DEST_COUNTRY_NAME ASC LIMIT 2<b><br></b>
Limiting results
<b>Python:</b><br>df.orderBy(col("count").desc()).limit(6).show()<br>
<b>Spark SQL:<br></b>SELECT * FROM dfTable ORDER BY count desc LIMIT 6<b><br></b>