Occurences of one string in another.

The string “ABC” occurs in “ABBC” twice, if you remove any characters you wish from “ABBC”.

The following algorithm finds such occurences in O(n) time given any two strings.
One assumption worth noting for the O(n) time bound is that the size of the alphabet is limited to some constant size.

Nested loops in this algorithm scream O(n^2)!! However, those loops are bounded by a constant that is proportional to the size of the alphabet, squared. Since the alphabet is bounded by a constant (by my assumption above), these nested loops will be bounded by a constant — so they don’t take us out of O(n) time.

__author__ = "Kris Olhovsky"
__date__ = "$Dec 29, 2009 6:54:11 PM$"

class Char:
    def __init__(self, char):
        self.char = char
        self.paths = {} # Key is length of paths, value is the # of paths.

    ''' Return the number of paths to this char of length len. '''
    def retrieve_matches(self, len):
        count = 0
        for c in self.paths:
            if len in self.paths[c]:
                count += self.paths[c][len]
                self.paths[c][len] = 0 # Delete recorded matches.
        return count

    ''' Copies paths from char c to this char, adding 1 to the
        length of every path. Paths of length greater than len are ignored. '''
    def add_paths(self, c, len):
        temp = {} # Avoid altering map during iteration.
        for char in c.paths:
            for length in c.paths[char]:
                if length + 1 <= len:
                    if c not in temp:
                        temp[c] = {}
                    if length + 1 not in temp[c]:
                        temp[c][length+1] = 0
                    temp[c][length+1] += c.paths[char][length]

        for k in temp: # Apply changes to map.
            for j in temp[k]:
                if k not in self.paths:
                    self.paths[c] = {}
                if j not in self.paths[k]:
                    self.paths[k][j] = 0
                self.paths[k][j] += temp[k][j]

''' Returns the number of occurences of string s1 in string s2. '''
def subseq(s1, s2):
    if s1 == "": # Special case defined by the problem.
        return 1

    prev = {}   # Maps a char to a dict of chars that appear one index to
                # the left of that char anywhere in s1.

    chars = {}  # All of the chars that appear in s1.

    for c in s1: # Initialize maps.
        prev[c] = {}
        chars[c] = Char(c)

    i = 1
    while i < len(s1): # Populate maps with data from s1.
        prev[s1[i]][s1[i-1]] = chars[s1[i-1]]
        i += 1

    start = Char('start')
    chars[s1[0]].paths[start] = {}
    chars[s1[0]].paths[start][0] = 0
    occurences = 0 # Record number of occurences of s1 in s2.
    for char in s2:

        if char in chars: # A char in s2 matches a char in s1.
            if char in prev[char]: # If same exists, prioritize this.
                chars[char].add_paths(prev[char][char], len(s1) - 1)
            for c in prev[char]:
                if c != char:
                    chars[char].add_paths(prev[char][c], len(s1) - 1)
            if char == s1[0]: # First char in s1 always starts a new path.
                chars[s1[0]].paths[start][0] += 1

            if char == s1[-1]: # Update occurences when last char of s1 seen.
                occurences += chars[char].retrieve_matches(len(s1) - 1)

    return occurences

Here are some example usages:

if __name__ == "__main__":
    s1 = "ABC"
    s2 = "ABBC"
    print subseq(s1, s2)

    s1 = "ABCB"
    s2 = "ABBCBBCB"
    print subseq(s1, s2)

    s1 = "ABCB"
    s2 = "ABCB"
    print subseq(s1, s2)

    s1 = "ABBB"
    s2 = "ABBBBB"
    print subseq(s1, s2)

    s1 = "ABBB"
    s2 = "ABABB"
    print subseq(s1, s2)
This entry was posted in Coding, Misc and tagged , . Bookmark the permalink.

One Response to Occurences of one string in another.

  1. nails says:

    You should be a part of a contest for one
    of the highest quality websites on the net. I’m going to recommend
    this blog!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>