Cookies

A small problem with complex divisions.

Today we go on with one more problem from LeetCode: Assign Cookies.

We will do a small variant. We have children and cookies and want to give cookies to a maximum number of children (at most one cookie per child, one child per cookie). Cookies have different integer sizes, stored in a slice s: &[u32]. Children have expectations (called greed here, wow, so nice) about the minimum size they will accept for a cookie. All children's greeds are stored in a slice g: &[u32].

Both slices are initially sorted in decreasing order.

Can you write a sequential and a parallel version of the code for this problem? As usual, for the parallel version you need to think in a divide-and-conquer way.

Let's take some examples:

  • g = [5,4,3,3,2], s = [7,6,6,2,1]: we can serve 4 children
  • g = [7,6,5,4,4,4], s = [5,5,4,2]: we can serve 3 children
  • g = [7,6,6,3,2,2], s = [6,5,5,1]: we can serve 3 children

Sequential Solution

So, this is a bit reminiscent of a merge algorithm, where you advance on two slices but in an irregular manner. As with the merge, it is easier to solve using peekable iterators.

// note: `Peekable::next_if` is stable since Rust 1.51; on older nightlies
// it required #![feature(peekable_next_if)]
pub fn find_content_children(g: &[u32], s: &[u32]) -> usize {
    let mut sizes = s.iter().peekable();
    g.iter()
        .filter_map(|greed| sizes.next_if(|&s| s >= greed))
        .take(s.len())
        .count()
}

In this algorithm we start by converting s into a peekable sizes iterator. We then loop on all greeds and filter out those which cannot be satisfied. Since sizes are processed in decreasing order, we only need to test each greed against the first available size. If this size is large enough, we consume it by removing it from the sizes iterator. We could do this with peek and next, but next_if is a nice shortcut doing both at once.

We take at most s.len() cookies, thus aborting the loop early once all cookies have been served. Finally we consume the iterator by counting the number of cookies served.
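To check the sequential version against the three examples above, here is a self-contained sketch (the function is repeated so the snippet compiles on its own):

```rust
// Greedy assignment: both slices are sorted in decreasing order.
pub fn find_content_children(g: &[u32], s: &[u32]) -> usize {
    let mut sizes = s.iter().peekable();
    g.iter()
        // consume the largest remaining cookie only if it satisfies this greed
        .filter_map(|greed| sizes.next_if(|&size| size >= greed))
        // at most s.len() cookies can ever be served: stop early
        .take(s.len())
        .count()
}

fn main() {
    assert_eq!(find_content_children(&[5, 4, 3, 3, 2], &[7, 6, 6, 2, 1]), 4);
    assert_eq!(find_content_children(&[7, 6, 5, 4, 4, 4], &[5, 5, 4, 2]), 3);
    assert_eq!(find_content_children(&[7, 6, 6, 3, 2, 2], &[6, 5, 5, 1]), 3);
}
```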

Parallel Solution

Parallel algorithm

Ok, let’s go for the parallel code. Let’s try divide and conquer.

First step: divide the input into two sub-inputs. Our input is composed of two slices, g and s, so it makes sense to divide each of them into two sub-slices.

Let’s give it a try:

Let's take g = [7,6,6,3,2,2], s = [6,5,5,1]. We divide g into two equally sized halves: [7,6,6] on the left and [3,2,2] on the right. Now we could also divide s equally into [6,5] on the left and [5,1] on the right. Sadly, this division of s is not super nice. Why?

Well, we need to go on with step 2 to figure it out: conquer!

We recurse on both sub-inputs.

  • [7,6,6] and [6,5] will give 1 cookie assigned (sized 6, assigned to child of greed 6)
  • [3,2,2] and [5,1] will give 1 cookie assigned (sized 5, assigned to child of greed 3)

So we have our two sub-solutions and now need to merge them in order to obtain the final solution, which is here 3 cookies assigned (6 to 6, 5 to 3, 5 to 2). Now, how can 1 cookie + 1 cookie become 3 cookies?

Well, we can see that we must re-use the cookie of size 5 which was assigned on the left, but this time on the right. This means that some cookies (one in this example) were assigned to the wrong side, and we now need to recompute whether they can be used on the other side.

It would have been better to divide directly with the cookies on the correct side:

  • [7,6,6] and [6] will give 1 cookie assigned
  • [3,2,2] and [5,5,1] will give 2 cookies assigned

This means s needs to be divided unequally. What we must do is take the last (smallest) element of the left sub-slice of g and use this value for a binary search in s.

By doing that, we are not sure all 'left' cookies will be consumed by 'left' greeds, but any unconsumed 'left' cookie can be consumed by any 'right' greed: left cookies are all at least as large as the smallest left greed, which itself is at least as large as every right greed.
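Here is a minimal standalone sketch of this uneven division (split_sizes is a hypothetical helper name, not part of the final code):

```rust
// Split s (sorted in decreasing order) so that every cookie on the left
// is at least as large as `target`, the smallest greed of the left half of g.
fn split_sizes(s: &[u32], target: u32) -> (&[u32], &[u32]) {
    // s is decreasing, so the comparison is reversed for binary_search_by
    let i = match s.binary_search_by(|a| target.cmp(a)) {
        Ok(i) => i + 1, // keep the matching cookie on the left
        Err(i) => i,
    };
    s.split_at(i)
}

fn main() {
    // with g split into [7,6,6] | [3,2,2], target is 6
    let (left_s, right_s) = split_sizes(&[6, 5, 5, 1], 6);
    assert_eq!(left_s, &[6]);
    assert_eq!(right_s, &[5, 5, 1]);
}
```

Note that with duplicated sizes equal to target, binary_search_by may stop on any of the duplicates, so the cut can land in the middle of a run of equal cookies. This is harmless: a cookie equal to target can serve children on either side.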

Parallel code

Let's go:

pub fn par_find_content_children(g: &[u32], s: &[u32]) -> usize {
    rayon::iter::split((g, s), |(g, s)| {
        let n = g.len();
        let (left_g, right_g) = g.split_at(n / 2);
        if let Some(target) = left_g.last() {
            let i = match s.binary_search_by(|a| target.cmp(a)) {
                Ok(i) => i + 1,
                Err(i) => i,
            };
            let (left_s, right_s) = s.split_at(i);
            ((left_g, left_s), Some((right_g, right_s)))
        } else {
            ((g, s), None)
        }
    })

We start by writing our parallel iterator on our sub-slices: g is split in half, and s at the index found by the binary search.

    .map(|(g, s)| {
        let served = find_content_children(g, s);
        let unserved = g.len() - served;
        let remaining_sizes = s.len() - served;
        (served, unserved, remaining_sizes)
    })

Now we solve each sub-input with the sequential algorithm. Instead of returning only a single value, we return:

  • how many cookies have been assigned
  • how many children are left without cookies
  • how many cookies are remaining

    .reduce(
        || (0, 0, 0),
        |left, right| {
            let newly_served = (right.1).min(left.2);
            (
                left.0 + right.0 + newly_served,
                left.1 + right.1 - newly_served,
                left.2 + right.2 - newly_served,
            )
        },
    )
    .0
}

Now for the reduction:

  • we simply add the left and right counts of assigned cookies
  • for the left children with no cookies, well, we cannot do anything
  • the right children, however, can be fed the remaining left cookies. Because of the binary search, any right child can be fed any unused left cookie.

So we compute newly_served, the number of left cookies fed to right children, and use its value to compute the new values of the result triplet.
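To convince ourselves that the triplet bookkeeping is correct, here is a purely sequential sketch of the whole recursion (no rayon; solve is a hypothetical helper name), checked against the greedy version on the example above:

```rust
pub fn find_content_children(g: &[u32], s: &[u32]) -> usize {
    let mut sizes = s.iter().peekable();
    g.iter()
        .filter_map(|greed| sizes.next_if(|&size| size >= greed))
        .take(s.len())
        .count()
}

// Returns (served, unserved children, remaining cookies).
fn solve(g: &[u32], s: &[u32]) -> (usize, usize, usize) {
    if g.len() < 2 {
        let served = find_content_children(g, s);
        return (served, g.len() - served, s.len() - served);
    }
    let (left_g, right_g) = g.split_at(g.len() / 2);
    // binary search the smallest left greed in the decreasing sizes
    let target = *left_g.last().unwrap();
    let i = match s.binary_search_by(|a| target.cmp(a)) {
        Ok(i) => i + 1,
        Err(i) => i,
    };
    let (left_s, right_s) = s.split_at(i);
    let (l, r) = (solve(left_g, left_s), solve(right_g, right_s));
    // leftover left cookies feed unserved right children
    let newly_served = r.1.min(l.2);
    (
        l.0 + r.0 + newly_served,
        l.1 + r.1 - newly_served,
        l.2 + r.2 - newly_served,
    )
}

fn main() {
    let (g, s) = ([7, 6, 6, 3, 2, 2], [6, 5, 5, 1]);
    // [7,6,6] with [6] gives (1, 2, 0); [3,2,2] with [5,5,1] gives (2, 1, 1)
    assert_eq!(solve(&g, &s), (3, 3, 1));
    assert_eq!(solve(&g, &s).0, find_content_children(&g, &s));
}
```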

Performance

This is my code for the performance evaluation:

const SIZE: usize = 10_000_000;
fn main() {
    let mut greeds: Vec<_> = std::iter::repeat_with(rand::random::<u32>)
        .map(|e| e % (2 * SIZE as u32))
        .take(SIZE)
        .collect();
    let mut sizes: Vec<_> = std::iter::repeat_with(rand::random::<u32>)
        .map(|e| e % (2 * SIZE as u32))
        .take(SIZE)
        .collect();
    // establish the pre-condition: greeds and sizes are sorted in decreasing order
    greeds.sort_by(|a, b| b.cmp(a));
    sizes.sort_by(|a, b| b.cmp(a));

    let start = std::time::Instant::now();
    let seq = find_content_children(&greeds, &sizes);
    println!("seq took {:?}", start.elapsed());

    let start = std::time::Instant::now();
    let par = par_find_content_children(&greeds, &sizes);
    println!("par took {:?}", start.elapsed());

    assert_eq!(seq, par);
}

On my machine I get the following output with two threads:

seq took 15.18747ms
par took 10.416638ms

(larger input sizes have better speedups)

Not too bad again even for such small running times.

Note that a fun side effect of the parallel algorithm is that it could theoretically have a speedup larger than one on a single thread :-). Can you figure out why?

Solution

Take s=[5,5,5....,5,4,3,2] with 10 million elements (that is, almost all elements are fives). If the last value of the left half of greeds is 4, then the sizes will be divided into left_s=[5,5,5,.....,5,4] and right_s=[3,2]: extremely uneven sizes. The 10 million greeds, however, are cut into two equally sized sub-slices of 5 million elements. Now there is a good chance that right_s gets consumed very fast, in which case the take of the sequential algorithm stops the loop early on the right side. We then end up with a work of W = n/2 + O(log(n)), because we only need to process half of the greeds, while the purely sequential algorithm has a work of n.