Spark - RDD Actions in Scala
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into a U and one operation for merging two Us, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.
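As a minimal sketch (assuming a live SparkContext named sc, e.g. from spark-shell), aggregate can compute a sum and a count in a single pass, with the result type U = (Int, Int) differing from the element type T = Int:

```scala
// Assumes a live SparkContext named sc.
val rdd = sc.parallelize(1 to 4)

val (sum, count) = rdd.aggregate((0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),  // seqOp: merge a T (Int) into a U
  (a, b) => (a._1 + b._1, a._2 + b._2)   // combOp: merge two Us across partitions
)
// returns (10, 4)
```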
Aggregates the elements of this RDD in a multi-level tree pattern.
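treeAggregate takes the same arguments as aggregate, plus an optional tree depth (default 2). A sketch, again assuming a SparkContext named sc:

```scala
// Assumes a live SparkContext named sc. With many partitions, the
// tree pattern merges partial results in stages instead of sending
// them all to the driver at once.
val rdd = sc.parallelize(1 to 100, numSlices = 50)
val total = rdd.treeAggregate(0)(_ + _, _ + _, depth = 2)
// returns 5050
```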
Aggregate the elements of each partition, and then the results for all the partitions, using a given associative and commutative function and a neutral "zero value". The function op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object allocation; however, it should not modify t2.
This behaves somewhat differently from fold operations implemented for non-distributed collections in functional languages like Scala. This fold operation may be applied to each partition individually, and the partition results then folded into the final result, rather than applying the fold to each element sequentially in some defined ordering. For functions that are not commutative, the result may differ from that of a fold applied to a non-distributed collection.
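A sketch assuming a SparkContext named sc. Because the zero value is applied once per partition and once more in the final merge, it must be a true identity for the operator (0 for addition):

```scala
// Assumes a live SparkContext named sc.
val rdd = sc.parallelize(Seq(1, 2, 3, 4))
val sum = rdd.fold(0)(_ + _)
// returns 10
```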
Reduces the elements of this RDD using the specified commutative and associative binary operator.
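For example (assuming a SparkContext named sc), any commutative and associative binary operator works, such as addition or max:

```scala
// Assumes a live SparkContext named sc.
val rdd = sc.parallelize(Seq(3, 1, 4, 1, 5))
rdd.reduce(_ + _)                    // returns 14
rdd.reduce((a, b) => math.max(a, b)) // returns 5
```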
Reduces the elements of this RDD in a multi-level tree pattern.
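treeReduce gives the same result as reduce, but combines partial results in a multi-level tree (optional depth, default 2), easing the load on the driver when there are many partitions. A sketch assuming a SparkContext named sc:

```scala
// Assumes a live SparkContext named sc.
val rdd = sc.parallelize(1 to 1000, numSlices = 100)
rdd.treeReduce(_ + _, depth = 3)
// returns 500500
```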
Return the count of each unique value in this RDD as a local map of (value, count) pairs.
Note that this method should only be used if the resulting map is expected to be small, as the whole thing is loaded into the driver's memory. To handle very large results, consider using rdd.map(x => (x, 1L)).reduceByKey(_ + _), which returns an RDD[(T, Long)] instead of a map.
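A sketch of both forms, assuming a SparkContext named sc:

```scala
// Assumes a live SparkContext named sc.
val rdd = sc.parallelize(Seq("a", "b", "a", "c", "a"))
rdd.countByValue()
// returns Map(a -> 3, b -> 1, c -> 1)

// Scalable alternative for large result sets, kept distributed:
val counts = rdd.map(x => (x, 1L)).reduceByKey(_ + _)  // RDD[(String, Long)]
```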
Return the first element in this RDD.
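For example (assuming a SparkContext named sc); note that first throws an exception on an empty RDD:

```scala
// Assumes a live SparkContext named sc.
sc.parallelize(Seq(10, 4, 2, 12, 3)).first()
// returns 10
```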
Returns the top k (largest) elements from this RDD as defined by the specified implicit Ordering[T] and maintains the ordering. This does the opposite of takeOrdered. For example:
sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1)
// returns Array(12)
sc.parallelize(Seq(2, 3, 4, 5, 6)).top(2)
// returns Array(6, 5)
Displays all elements of a collection in a string using a separator string. Note that mkString is a standard Scala collection method, typically applied to the result of collect, rather than an RDD action. For example:
List(1, 2, 3).mkString("|")
// returns "1|2|3"
Returns the min of this RDD as defined by the implicit Ordering[T].
Returns the max of this RDD as defined by the implicit Ordering[T].
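Both can be sketched on the same RDD (assuming a SparkContext named sc):

```scala
// Assumes a live SparkContext named sc.
val rdd = sc.parallelize(Seq(10, 4, 2, 12, 3))
rdd.min()  // returns 2
rdd.max()  // returns 12
```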
Returns the first k (smallest) elements from this RDD as defined by the specified implicit Ordering[T] and maintains the ordering. This does the opposite of top. For example:
sc.parallelize(Seq(10, 4, 2, 12, 3)).takeOrdered(1)
// returns Array(2)
sc.parallelize(Seq(2, 3, 4, 5, 6)).takeOrdered(2)
// returns Array(2, 3)
Return approximate number of distinct elements in the RDD.
The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".
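A sketch assuming a SparkContext named sc; the relativeSD parameter controls the accuracy of the estimate (smaller is more accurate but uses more memory), with a default of 0.05:

```scala
// Assumes a live SparkContext named sc.
// 10,000 elements, but only 1,000 distinct values.
val rdd = sc.parallelize((1 to 10000).map(_ % 1000))
rdd.countApproxDistinct(relativeSD = 0.05)
// returns an estimate close to 1000 (not necessarily exact)
```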
Return the number of elements in the RDD.
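For example (assuming a SparkContext named sc):

```scala
// Assumes a live SparkContext named sc.
sc.parallelize(Seq(1, 2, 2, 3)).count()
// returns 4 (duplicates are counted)
```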
· JDK 1.7 or higher
· Scala 2.10.3