Finding The Longest Common Subsequence In A Variable Amount Of Sets With No Repeating Characters?
Solution 1:
Not sure how efficient an adaption of the normal LCS algorithm would be, so this may be inefficient, but since there aren't many eligible letters, it's not too horrible.
Make a successor matrix. For each string s, for each pair of letters (s[i],s[j]) with i < j, increment matrix[s[i]][s[j]]. A pair of letters can only be part of a longest common subsequence if matrix[a][b] = n = number of strings. Build a directed graph from the letters participating in such a pair, with an edge from a to b iff matrix[a][b] = n. Find a longest path in that graph.
Solution 2:
What you are trying to do is called multiple sequence alignment. The algorithm 'could' be rewritten to be n-dimensional, but that would increase the space complexity to Length ^ Dimensions.
Since none of the letters repeat, is there a more efficient algorithm I should consider rather than the one described in the Wikipedia article?
Perhaps. First create a mapping (a,b) such that a < b. For example the mappings for the first three characters of each sets is as follows
--- Mapping for character[0]
Z - BA - C A - N Z - IA N IA
N I C N
IB O C
C Z T O
O I Z T
T O BB
--- Mapping for character[1]B - A C - N N - II - A
N I C N
IB O C
C Z T O
O I Z T
T O BB
--- Mapping for character[2]A - N N - I N - C I - N
IB O C
C Z T O
O I Z T
T O BBThen retain only mappings that are common on all sets. Here's an example of following this process, traversing Set[A] (ZBANICOT), which will give us a set of relations that are common to all sets.
So with the letter Z you will observe that in Set[C], the only character that follows Z is B, so you can eliminate just about every mapping from Z that is not to B. But to cut a long story short, every mapping for Z is removed. Note, only mappings from Z are removed, not mappings to Z.
Also note that since Set[D] ends with B, we can safely eliminate all mappings from B. By following the process one can observe that A maps to N, C, O and T. Next is N, which maps to T and O.
I maps to O (honestly, "I/O") while C maps to O and T (how cute). And funnily enough O maps to nothing!!! How befitting.
Finally there is one more character, and T maps to nothing too (since ZBANICOT ends with T).
In the end you are left with the following mappings:
A - C C - O I - O N - O
N TT
O
TNow all you have to do is start from each of the keys (A, C, I and N) and search for the longest sequence without any repeating nodes (characters in this case).
Here are all of the common subsequences (all using the simple algorithm):
ACO
ACT
ANO
ANT
AO
AT
CO
CT
IO
NT
NOSolution 3:
What if I stored character mapping for each set of characters and then incremented this count each time...
For the 1st set (ZBANICOT), this would create the following character pairs (28 of them):
ZB, ZA, ZN, ZI, ZC, ZO, ZT
BA, BN, BI, BC, BO, BT
AN, AI, AC, AO, AT
NI, NC, NO, NT
IC, IO, IT
CO, CT
OT
For the 2nd set (ACNTBZIO), I would have 28 pairs again:
AC, AN, AT, AB, AZ, AI, AO
CN, CT, CB, CZ, CI, CO
NT, NB, NZ, NI, NO
TB, TZ, TI, TO
BZ, BI, BO
ZI, ZO
IO
For the last 2 sets, I would have:
AN, AI, AC, AO, AT, AZ, AB
NI, NC, NO, NT, NZ, NB
IC, IO, IT, IZ, IB
CO, CT, CZ, CB
OT, OZ, OB
TZ, TB
ZB
ZI, ZA, ZN, ZC, ZO, ZT, ZB
IA, IN, IC, IO, IT, IB
AN, AC, AO, AT, AB
NC, NO, NT, NB
CO, CT, CB
OT, OB
TB
(So to figure out the iteration count so far, let N equal the number of sets, and M be the number of characters in each set, we have N*(M)((M-1)/2) or 4*28=112 in this example)
Next, I would count the number of times each pair exists. For example,
PairCounts["ZB"] == 3
PairCounts["ZA"] == 2// ...
PairCounts["AO"] == 4
PairCounts["AT"] == 4// ...
PairCounts["AN"] == 4
PairCounts["AC"] == 4
PairCounts["CT"] == 4
PairCounts["NO"] == 4
PairCounts["NT"] == 4Then, since we have 4 pairs, I would discard all pairs which don't have a count of 4. And then I would try concatting each pair in order to form the longest string possible.
Post a Comment for "Finding The Longest Common Subsequence In A Variable Amount Of Sets With No Repeating Characters?"