Alice's DVD night...ruined! [Sep. 8th, 2006|03:02 pm]
Unixish - Solving problems the UNIX way
Well done to everybody who had a go at the first problem. skx provided a neat and tidy script, tuomov showed that he's apparently comfortable with pipelining, and tjernobyl came up with a solution with a bit of help from everyone's best friend/enemy, Perl.

Anyway, here's the second problem, again a day late!

Problem 2:

Alice was feeling tired and a bit fed up after a long and laborious week of work. So she settled down and decided she'd watch her favourite DVD to cheer herself up. Unfortunately, she found that, due to deep scratches, the DVD wouldn't play. Frustrated and angry at the situation she'd got herself into, she promised herself she'd never let it happen again. So that night, one by one, she backed up and compressed all of her legally purchased DVDs into a directory on her PC.

Over the next few days, she decided she'd like to copy a subset of the files to DVD. The problem, however, is that not all the files will fit on a single DVD, and at the same time she'd like to make the most of the space on each disc. So what she really needs help with is determining the optimal subset of files (which range from 500MB to 1GB each) to burn to a DVD[1], wasting the least amount of space. Oh, and ideally she'd like to keep files from the same series together and organised. Can you help Alice out?

[1] 4.7GB DVD-Rs.

(thanks to pauldoo for the problem suggestion)

From: mdlbear
2006-09-08 02:29 pm (UTC)
You've just described the knapsack problem -- it's NP-complete.

A greedy algorithm (copy as many as will fit, in order, then move on to the next) works fairly well, but it's not optimal. Computing an optimal solution takes an amount of time that increases very rapidly with the number of files, and is impractical in most cases.
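A minimal sketch of that greedy in-order packing, in Ruby. The ~4482MB usable capacity of a 4.7GB DVD-R and all names here are assumptions for illustration, not anything from the thread:

```ruby
DVD_CAPACITY_MB = 4482  # assumed usable capacity of a 4.7GB DVD-R

# Greedy packing sketch: take files in the given order and start a new
# disc whenever the next file no longer fits. Fast, but not optimal.
def greedy_pack(sizes, capacity = DVD_CAPACITY_MB)
  discs = [[]]
  used = 0
  sizes.each do |s|
    if used + s > capacity && !discs.last.empty?
      discs << []   # current disc is full; open a fresh one
      used = 0
    end
    discs.last << s
    used += s
  end
  discs
end
```

Because it never revisits an earlier disc, an unlucky ordering can leave large gaps that a smarter packing would have filled.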
From: yaarg
2006-09-08 02:34 pm (UTC)
You're correct of course. I think it's a bad choice of words on my part. I guess what I meant is 'best as possible'?
From: mdlbear
2006-09-08 02:59 pm (UTC)
Anything with "best" in it is going to imply "optimal". You probably mean something like "doing a reasonably good job of packing".

I'd guess that an optimal solution would require something like "tar -c --multi-volume" (which splits a monster tar archive over multiple physical volumes), though it would require extra effort to extract the files.
From: yaarg
2006-09-08 03:38 pm (UTC)
Yeah you're right again! I'm being awfully sloppy - sorry!
From: lightning_rose
2006-09-08 06:31 pm (UTC)

You're right, the best (and probably easiest) solution would be to use an archival program that can span volumes. That way you use the full capacity of the media.

From: (Anonymous)
2006-09-08 03:22 pm (UTC)
Actually, this particular problem is called 'minimum bin packing'. In the knapsack problem there's only one DVD, and you want to pack as much stuff onto it as possible, possibly by some measure other than the sizes.

While the exact solution of this problem is intractable, the simple "first fit" method -- which sorts the files by decreasing size, then assigns each file to the first disk on which it fits -- does provide an approximate solution, using at most 1.5 m^* + 1 disks when the optimal solution would use m^* disks. If the files don't have widely different sizes, with most episodes in a series having the same size and so on, it may be possible to accommodate the preferred ordering by varying the sort.

There's also an asymptotic polynomial-time approximation scheme for the problem, consisting of the following:

1. Remove all "small" files from the list, small depending on a parameter.
2. Quantise the sizes of the remaining items into a constant (parameter-dependent) number of sizes.
3. Solve the resulting restricted bin packing problem, whose complexity does not depend on the input size, only on the parameters.
4. Insert the remaining small items using "first fit".

This algorithm is a bit complex, but it can achieve approximation with performance r m^* + 1 for any rational 1 < r < 2; see e.g. [Ausiello et al.: Complexity and Approximation].


Now, as for actually implementing even the first algorithm: it is too much work for my lazy ass, especially in sh (and anything else would feel like cheating), since it would likely involve some very tedious sed'ing of lists as strings, so I'll leave it to someone else.

-- tuomov
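The first-fit (decreasing) heuristic described above can be sketched in Ruby. The 4482MB capacity and all names are assumptions for illustration:

```ruby
# First fit decreasing: sort files largest-first, then place each file
# on the first disc that still has room, opening a new disc if none does.
def first_fit_decreasing(sizes, capacity = 4482)
  packs = []  # sizes placed on each disc
  used  = []  # running total of space used per disc
  sizes.sort.reverse.each do |s|
    i = used.index { |u| u + s <= capacity }
    if i.nil?            # no existing disc has room
      packs << []
      used << 0
      i = packs.size - 1
    end
    packs[i] << s
    used[i] += s
  end
  packs
end
```

Keeping a series together, as Alice wants, would need the sort tweaked (e.g. grouping by series before sorting by size), at some cost in packing quality.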
From: mdlbear
2006-09-08 03:28 pm (UTC)
Thanks for the correction. Yes, first fit seems pretty straightforward. Like you, I'll let somebody else implement it.
(Deleted comment)
From: yaarg
2006-09-08 03:38 pm (UTC)
Yeah fair enough. I think the problem was perhaps a bit ambitious!
From: pauldoo
2006-09-08 05:41 pm (UTC)
I hadn't realised the problem I suggested was so hairy (I suspected it might be but I hadn't investigated).

It makes me feel better about the totally naive solution I implemented in Ruby (happened to be the scripting language I was sort-of learning at the time):
#!/usr/bin/env ruby

def Toggle(files, total, desc)
    files = files.clone();
    if files.size == 0
        # base case: print the total size (in MB) and the file list
        printf("%05i", total / 1024);
        print ":" + desc + "\n";
    else
        # recurse twice: once without the file, once with it
        file = files.pop;
        #size = File.stat(file).size;
        size = `du -k "#{file}"`;
        size = Integer(size[/\d+/]);
        Toggle(files, total, desc);
        Toggle(files, total + size, " \"" + file + "\"" + desc);
    end
end

Toggle(ARGV, 0, "");

I run this as dvdfitter.rb * | sort | less. I then skim down the list to find the highest value that is under the size of a DVD. This last part isn't very elegant, but I couldn't be bothered adding it to the Ruby.
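That last manual skim could be automated with a small filter. A hypothetical Ruby companion (the 4482MB capacity figure and the names are assumptions), which picks the largest subset that still fits on one disc from the script's "size:files" output lines:

```ruby
# From lines of the form "NNNNN:files" (size in MB first), keep those
# at or under the disc capacity and return the largest one.
def best_fit(lines, capacity_mb = 4482)
  candidates = lines.map { |l| [Integer(l[/\d+/], 10), l] }
                    .select { |mb, _| mb <= capacity_mb }
  best = candidates.max_by { |mb, _| mb }
  best && best.last
end
```

Piped in place of the `sort | less` step, this would print just the winning line instead of leaving the skim to the reader.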