Number of Lines to Write a String

A nifty parallel algorithm which seems difficult at first glance.

Continuing with some leetcode problems, I present today my take on number of lines to write a string.

We have as input a String containing only letters (we will turn it into a bytes slice with .as_bytes()). What is more we also know the width of any letter, stored in a width slice such that width[0] contains the width of ‘a’, width[1] the width of ‘b’ and … width[25] the width of ‘z’.

We want to print a text but our page only has a total width of 100 so we need to print it on several lines. The algorithm will greedily fill each line sequentially until we try to fill the current line but there is not enough space left. If a character does not fit the remaining space we just start a new line and put it there. All chars have a width <= 100.

Your goal is to find the number of lines needed and the space used on the last line.

Here is a small project to complete.

pub fn number_of_lines(widths: &[i32], s: &str) -> (i32, i32) {
    unimplemented!()
}

Sequential Solution

Ok, here is the sequential solution.

pub fn number_of_lines(widths: &[i32], s: &str) -> (i32, i32) {
    s.as_bytes()

We start by looking at the string like a slice of bytes. This will speed up iterations and allow us some more parallel operations later on.

        .iter()

Let’s loop on all chars.

        .map(|c| *c as usize - 'a' as usize)

Turn each char into the corresponding index between 0 and 25.

        .map(|i| widths[i])

Now each index into the corresponding width so we end up looping on all widths.

        .fold((1, 0), |(lines, used_space), s| {
            if used_space + s > 100 {
                (lines + 1, s)
            } else {
                (lines, used_space + s)
            }
        })
}

Last step, we fold, producing the lines count and the space count. For each width s if it fits we update the space used. If it does not fit the current line we increase the line count and reset space use.

Parallel Solution

So, the parallel solution does not seem trivial because of the fold which propagates a sequential state. At first glance it seems even very bad. Let’s take for example the widths [45, 60, 13, 22, 17, 80]. If we divide and try to recurse we end up with

45, 60, 13 which uses 2 lines with a use of 13 on the last line
22, 17, 80 uses 2 lines with a use of 80 on the last line

The final answer should be [45] [60,13,22] [17, 80] so three lines and 97 space used on the last line.

Now how should we turn (2,13) and (2,80) into (3, 97) ??

It seems bad, right ?

Well, turns out there is a solution:

First I’ll start by adding an intermediate sequential function:

fn scan_string<'a>(
    init: (i32, i32),
    widths: &'a [i32],
    s: &'a [u8],
) -> impl Iterator<Item = (i32, i32)> + 'a {
    s.iter()
        .map(|c| *c as usize - 'a' as usize)
        .map(move |i| widths[i])
        .scan(init, |(ref mut lines, ref mut used_space), s| {
            if *used_space + s > 100 {
                *lines += 1;
                *used_space = s;
            } else {
                *used_space += s;
            }
            Some((*lines, *used_space))
        })
}

This is very similar to our previous code except that the fold is replaced by a scan. This will allow us to loop on all partial results.

Ok, let’s get started.

pub fn par_number_of_lines(widths: &[i32], s: &str) -> (i32, i32) {
    let letters = s.as_bytes();
    let n = letters.len();
    let mut res = letters
        .par_chunks(n / 8)

Let’s take a parallel iterator on 8 subslices. We could have gone with dynamic scheduling but it will be easier to control what’s going on this way.

        .map(|s| (s, scan_string((1, 0), &widths, s).last().unwrap()))

Each slice gets solved sequentially (this time using the scan). Note that we also return the slice itself since we will need it for fusing partial results later on.

        .map(|t| std::iter::once(t).collect::<LinkedList<_>>())
        .reduce(LinkedList::new, |mut l1, mut l2| {
            l1.append(&mut l2);
            l1
        })
        .into_iter();

We turn each result into a linked list and reduce to obtain a list of partial results which is then turned into a sequential iterator.

We now need to loop on all partial results and fuse them.

    let init = res.next().map(|(_, r)| r).unwrap_or((0, 0));

Let’s initialize the final result with the result from the first slice and continue by fusing all results sequentially.

    res.fold(
        init,
        |(previous_lines, previous_space): (i32, i32), (slice, (lines, space))| {
            let (l, s) = match scan_string((0, previous_space), &widths, slice)
                .zip(scan_string((1, 0), &widths, slice))
                .try_fold(
                    (0, 0),
                    |_, ((new_line, new_space), (old_line, old_space))| {
                        if old_space == new_space {
                            Err((lines + new_line - old_line, space))
                        } else {
                            Ok((new_line, new_space))
                        }
                    },
                ) {
                Ok(r) => r,
                Err(r) => r,
            };
            (previous_lines + l, s)
        },
    )
}

Wow wow wow, what is that ? Let’s take again an example, this time with sizes [30, 20, 10, 80, 17, 90, 40, 24]. We have:

on the left : [30, 20, 10] [80] which gives 2 lines and space 80,
on the right : [17] [90] [40, 24] which gives 3 lines and space 64,
the ‘real’ solution : [30, 20, 10] [80, 17] [90] [40, 24] which gives 4 lines and space 64.

Now the nice observation is that [90] [40, 24] is actually the same in the partial solution and in the final solution. This means that if we have the partial solutions then when the real solution reaches ‘90’ there is no need to continue the computations.

So what we can do is compare the solution and the partial solution until there is a match. We don’t store solutions but scan_string is able to loop on them lazily. We can play the real solution by calling:

scan_string((0, previous_space), &widths, slice)

(initial line does not count but we must continue from previous space) and the partial one by calling (again):

scan_string((1, 0), &widths, slice)

(we start a new line with no used space).

So we zip both iterators and replay them until the amount of space used matches. From then, we can abort the loop because both iterators will advance in sync. So we use a try_fold to be able to abort the loop when spaces become equal. The ’else’ branch correspond to a last on the real solution iterator but the ‘if’ branch allows us an early break.

The amazing stuff is that while the worst case is catastrophic O(n), it actually converges very fast in practice.

Here is a log of a run (the given code) if you don’t believe me:

You can see all reduction tasks are instantaneous. The whole run takes 60ms (without logs) on my laptop with two threads while the sequential run takes 120ms. Additional threads don’t improve performances since we become memory bound.