algorithm-notes
  • Preface
  • Outline
    • Asymptotic-Analysis
    • Divide-and-Conquer (DnC)
      • Overview
      • Multiplication
      • DnC Strategy
      • Master Method
    • Randomization
      • Overview
      • Discrete Probability
      • Randomized Algorithms
      • Reservoir Sampling
    • Sorting
      • Overview
      • Selection Sort
      • Bubble Sort
      • Insertion Sort
      • Shell Sort
      • Merge Sort
      • Heap Sort
      • Quick Sort
      • Non-Comparison Sort
    • Graph Algorithms
      • Overview
      • Minimum Spanning Tree
      • Shortest Path
      • Topological Sort
    • Tree
      • Overview
      • Balanced Search Tree
        • AVL Tree
        • Red-Black Tree
        • B Tree
      • Heap
      • Disjoint Set
      • Threaded Binary Tree
      • Segment Tree
    • Searching
      • Overview
      • Binary Search
      • Graph Search
      • Hash Table
      • String Search
    • Dynamic Programming
      • Overview
      • Fibonacci Numbers
      • LCS problem
    • Greedy Algorithms
      • Overview
Powered by GitBook
On this page
  • Analysis of this problem
  • Pseudocode Solution

Was this helpful?

  1. Outline
  2. Dynamic Programming

LCS problem

PreviousFibonacci NumbersNextGreedy Algorithms

Last updated 6 years ago

Was this helpful?

differs from problem of finding the ; and it has wide applications such as and .

The problem is formally defined as follows: define two strings for example abcgf and achfe, find the same longest subsequence from both of them. A subsequence is composed of characters in same relative order within the string but not necessarily being contiguous. Strings like ac, bc, bf, abf are all subsequences of string abcgf.

From all the subsequences of strings abcgf and achfe, the common ones are a, ac, cf, c, f, acf, the longest common subsequence is acf.

Analysis of this problem

Define two sequences X[0..m-1] and Y[0..n-1]. And L(X[0..m-1], Y[0..n-1]) be the LCS of the two sequences X and Y. LCS problem can be solved in dynamic programming for its satisfaction of two important factors:

  • Optimal Substructure

    If a problem has an optimal solution and its sub-problems also have optimal solutions, then we say this problem has optimal substructure.

    In the above example, sequence X is abcgf and sequence Y is achfe; Then, if the last characters of X and Y match we only need to find out if the preceding characters match: L(X[0..m-1], Y[0..n-1]) = 1 + L(X[0..m-2], Y[0..n-2]).

    If the last characters of X and Y do not match, we need to find out the maximal LCS between L(X[0..m-2], Y[0..n-1]) and L(X[0..m-1], Y[0..n-2]).

  • Overlapping Subproblems

    In the recursion of finding common sequences, there are overlapping function calls; they are called overlapping subproblems:

    In the above example, finding the LCS of X and Y is broken down into finding L(abcgf, achf) and L(abcg, achfe), the next level recursions of both have overlapping subproblems: L(abcg, achf). By memoization, the computing cost can be saved significantly (from exponential time to polynomial time).

Then, the following formula is given for this problem to be solved in dynamic programming:

Figure 1. LCS Formula

Pseudocode Solution

X of size m, Y of size n

LCS(X, Y)
  initialize a 2-D array L of size m+1 by n+1

  for i in m
    for j in n
      if i == 0 and j == 0
        L[i][j] = 0
      else if X[i - 1] == Y[j - 1]
        L[i][j] = 1 + L[i - 1][j - 1]
      else
        L[i][j] = max(L[i][j - 1], L[i - 1][j])

  return L[m][n]

The time and the space complexity are both Ο(n × m).

longest-common-subsequence
longest common substring
diff utility
bioinformatics