﻿ Exploring an algorithm to find minimum chain containing certain items

# Exploring an algorithm to find minimum chain containing certain items

Sorry, I cannot come up with the name of an algorithm or problem for the following algorithm. I will state the problem and then what I have tried and perhaps someone can point me in the right direction.

Imagine you have a bag of items (unordered, duplicates permitted). In practice the bag may contain 2-20 items in case this relaxation helps.

The goal is to find the minimum length chain (ordered link list in case we have different notions of a chain) which contains all of the items in the bag in any order.

A chain consists of a start token (not present in bag) followed by any number of items followed by an end token (also not in bag).

The chain is formed by piecing together n-tuples (order is important) and as a further relaxation let us say the n value is the same for all tuples. In practice, I am working with n = 3. Chains may be "blended" as opposed to concatenated if they have overlapping elements. For example, consider (a,b,c) and (c,d,e). The may be joined as (a,b,c,d,e). Likewise, (a,b,c) and (b,c,d) may be joined as (a,b,c,d). Some tuples may have a start token in the first position and some tokens make have an end token in the last position which of course allows there to be a solution to the problem.

So it would seem to me that an exact solution to the problem is not tractable in general. Some sort of optimization algorithm would be necessary in order to get a "good" solution to the problem. "Good" solutions I can live with.

What I have started with is a greedy approach where on the first pass you find the tuple which contains the most number of elements in the bag, arbitrarily breaking ties. Create a data structure which holds the chain we have built so far, and stick the chosen tuple into this data structure. Split the problem into 2 subproblems, the start token side and the end token side. Until the first token of the data structure of subproblem 1 is a start token and the last token of subproblem 2 is an end token, grow the chain such that we are trying to find the stop condition as soon as possible (start or end token depending on the subproblem) while also trying to exhaust the contents of the bag as soon as possible. This may not be good because each subproblem has to communicate with each other as to how many items are left in the bag which need to be included.

Anyone seen this problem anywhere? Any thoughts as to how to improve (or get to work correctly) this algorithm? This is a real problem I am tackling which is a smart part of a much larger system and is not a toy problem or a homework problem.

EDIT

Sorry all I been away from the computer today. I will try to post an example solution which is not too trivial, yet not too intricate to see.

Given:

1. `Bag = {A, B, C, D}` (I make it a set for the sake of example, but each item can appear more than once)
2. `/ = Start Token`
3. `\ = End Token`
4. 3-Tuples (Triples): I label them a-g for simplicity in naming. The lowercase letters have no actual function in the problem.

``````(/,A, E) a
(/,C, D) b
(/,G, H) c
(D,B, A) d
(C,G, H) e
(B,A, \) f
(G,H, \) g
``````

Solution: If we chain together b, d and f we get `(/,C,D,B,A,\)`.
This is the shortest possible chain containing all elements in the bag which is length 6 if you count both the start and end token. In general, the shortest possible path is of length |BAG| + 2, if it in fact exists. I hope my problem statement makes more sense now.

As you only have up to 20 items I think you can compute an exact solution in a reasonable time (e.g. under a minute).

One approach would be to use dynamic programming where the state is given by:

``````A) a 20 bit number m (which will represent which items have been visited so far)
B) a number b in the range 1..20
C) a number c in the range 1..20
``````

This state would correspond to a chain looking like Start,?,?,?,...,?,b,c. i.e. b and c are the 2 most recent elements.

The number m is a bitfield that represents which other elements have been visited in the chain. In other words, bit i of m is 1 if and only if the chain includes the i th element in the bag.

The algorithm to find the shortest chain would then be:

1. Let S = set of states consisting of all tuples that have the start token. (All these states will have the same chain length of 2)
2. For each chain length y from 3 upwards, go through all states in S and try extending the state to have length y by using an appropriate tuple. If this is possible, add the extended state to set S.
3. Only allow tuples with the end token to be added if the bitfield m would end up with all bits set.

If you ever manage to add a tuple containing the end state then you have found the shortest chain to contain all the elements.

For N items in the bag, there are roughly 2^N.N.N states which should be just about manageable.