Top K Quickselect in Clojure Published Thu Apr 17 02:00:00 SAST 2014
This is my implementation of a Top K quickselect in Clojure. The beauty of this algorithm is that on average it runs in O(n)
time, unlike quicksort's O(n log n)
. Note the :keys
destructuring.
(defn top-k [coll k]
(let [pivot (rand-nth coll)
{:keys [left middle right]} (group-by
#(cond (= % pivot) :middle (< % pivot) :left :else :right) coll)
left-cnt (count left)]
(cond
(= left-cnt k) left
(= (inc left-cnt) k) (concat left middle)
(> left-cnt k) (top-k left k)
:else ; left-cnt < k
(concat left middle (top-k right (- k left-cnt 1))))))
Example
(top-k [4 3 2 1 0] 3) ; => (0 1 2)
(top-k [4 5 3 2 8 3 8 9 0 9] 3) ; => (0 2 3)
How does it work?
The top-k
function chooses a random pivot element in a collection coll
and groups smaller elements into left
and larger elements into right
. Based on the length of left
, it recursively applies top-k
to the right side until it has k values and returns left
.
This function could be rewritten with recur
tail-call optimization to prevent blowing the stack for massive collections.