<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Olhovsky &#187; Misc</title>
	<atom:link href="http://www.olhovsky.com/category/misc/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.olhovsky.com</link>
	<description>Miscellaneous Coding</description>
	<lastBuildDate>Thu, 03 Jun 2010 16:30:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Occurences of one string in another.</title>
		<link>http://www.olhovsky.com/2009/12/occurences-of-one-string-in-another/</link>
		<comments>http://www.olhovsky.com/2009/12/occurences-of-one-string-in-another/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 20:24:40 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Misc]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=128</guid>
		<description><![CDATA[The string &#8220;ABC&#8221; occurs in &#8220;ABBC&#8221; twice, if you remove any characters you wish from &#8220;ABBC&#8221;.
The following algorithm finds such occurences in O(n) time given any two strings.
One assumption worth noting for the O(n) time bound is that the size of the alphabet is limited to some constant size.
Nested loops in this algorithm scream O(n^2)!! [...]]]></description>
			<content:encoded><![CDATA[<p>The string &#8220;ABC&#8221; occurs in &#8220;ABBC&#8221; twice, if you remove any characters you wish from &#8220;ABBC&#8221;.</p>
<p>The following algorithm finds such occurences in O(n) time given any two strings.<br />
One assumption worth noting for the O(n) time bound is that the size of the alphabet is limited to some constant size.</p>
<p>Nested loops in this algorithm scream O(n^2)!! However, those loops are bounded by a constant that is proportional to the size of the alphabet, squared. Since the alphabet is bounded by a constant (by my assumption above), these nested loops will be bounded by a constant &#8212; so they don&#8217;t take us out of O(n) time.</p>
<pre class="brush:python">__author__ = "Kris Olhovsky"
__date__ = "$Dec 29, 2009 6:54:11 PM$"

class Char:
    def __init__(self, char):
        self.char = char
        self.paths = {} # Key is length of paths, value is the # of paths.

    ''' Return the number of paths to this char of length len. '''
    def retrieve_matches(self, len):
        count = 0
        for c in self.paths:
            if len in self.paths[c]:
                count += self.paths[c][len]
                self.paths[c][len] = 0 # Delete recorded matches.
        return count

    ''' Copies paths from char c to this char, adding 1 to the
        length of every path. Paths of length greater than len are ignored. '''
    def add_paths(self, c, len):
        temp = {} # Avoid altering map during iteration.
        for char in c.paths:
            for length in c.paths[char]:
                if length + 1 <= len:
                    if c not in temp:
                        temp[c] = {}
                    if length + 1 not in temp[c]:
                        temp[c][length+1] = 0
                    temp[c][length+1] += c.paths[char][length]

        for k in temp: # Apply changes to map.
            for j in temp[k]:
                if k not in self.paths:
                    self.paths[c] = {}
                if j not in self.paths[k]:
                    self.paths[k][j] = 0
                self.paths[k][j] += temp[k][j]

''' Returns the number of occurences of string s1 in string s2. '''
def subseq(s1, s2):
    if s1 == "": # Special case defined by the problem.
        return 1

    prev = {}   # Maps a char to a dict of chars that appear one index to
                # the left of that char anywhere in s1.

    chars = {}  # All of the chars that appear in s1.

    for c in s1: # Initialize maps.
        prev[c] = {}
        chars[c] = Char(c)

    i = 1
    while i < len(s1): # Populate maps with data from s1.
        prev[s1[i]][s1[i-1]] = chars[s1[i-1]]
        i += 1

    start = Char('start')
    chars[s1[0]].paths[start] = {}
    chars[s1[0]].paths[start][0] = 0
    occurences = 0 # Record number of occurences of s1 in s2.
    for char in s2:

        if char in chars: # A char in s2 matches a char in s1.
            if char in prev[char]: # If same exists, prioritize this.
                chars[char].add_paths(prev[char][char], len(s1) - 1)
            for c in prev[char]:
                if c != char:
                    chars[char].add_paths(prev[char][c], len(s1) - 1)
            if char == s1[0]: # First char in s1 always starts a new path.
                chars[s1[0]].paths[start][0] += 1

            if char == s1[-1]: # Update occurences when last char of s1 seen.
                occurences += chars[char].retrieve_matches(len(s1) - 1)

    return occurences</pre>
<p>Here are some example usages:</p>
<pre class="brush:python">if __name__ == "__main__":
    s1 = "ABC"
    s2 = "ABBC"
    print subseq(s1, s2)

    s1 = "ABCB"
    s2 = "ABBCBBCB"
    print subseq(s1, s2)

    s1 = "ABCB"
    s2 = "ABCB"
    print subseq(s1, s2)

    s1 = "ABBB"
    s2 = "ABBBBB"
    print subseq(s1, s2)

    s1 = "ABBB"
    s2 = "ABABB"
    print subseq(s1, s2)</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2009/12/occurences-of-one-string-in-another/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extract Longest Non-Decreasing Sequence From Any Sequence</title>
		<link>http://www.olhovsky.com/2009/11/extract-longest-increasing-sequence-from-any-sequence/</link>
		<comments>http://www.olhovsky.com/2009/11/extract-longest-increasing-sequence-from-any-sequence/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 01:07:19 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Misc]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=106</guid>
		<description><![CDATA[I wrote some python code that extracts the longest non-decreasing subsequence from any given sequence.
This runs in O(n log n) time, and uses O(n) memory.
''' An item in the final sequence, used to form a linked list. '''
class SeqItem():
    val = 0      # This item's value.
  [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote some python code that extracts the longest non-decreasing subsequence from any given sequence.</p>
<p>This runs in O(n log n) time, and uses O(n) memory.</p>
<pre class="brush:python">''' An item in the final sequence, used to form a linked list. '''
class SeqItem():
    val = 0      # This item's value.
    prev = None  # The value before this one.
    def __init__(self, val, prev):
        self.val = val
        self.prev = prev

''' Extract longest non-decreasing subsequence from sequence seq.'''
def extract_sorted(seq):
    subseqs = [SeqItem(seq[0], None)] # Track decreasing subsequences in seq.
    result_list = [subseqs[0]]
    for i in range(1, len(seq)):
        result = search_insert(subseqs, seq[i], 0, len(subseqs))

    # Build Python list from custom linked list:
    final_list = []
    result = subseqs[-1] # Longest nondecreasing subsequence is found by
                         # traversing the linked list backwards starting from
                         # the final smallest value in the last nonincreasing
                         # subsequence found.
    while(result != None and result.val != None):
        final_list.append(result.val)
        result = result.prev # Walk backwards through longest sequence.

    final_list.reverse()
    return final_list

''' Seq tracks the smallest value of each nonincreasing subsequence constructed.
Find smallest item in seq that is greater than search_val.
If such a value does not exist, append search_val to seq, creating the beginning
of a new nonincreasing subsequence.
If such a value does exist, replace the value in seq at that position, and
search_val will be considered the new candidate for the longest subseq if
a value in the following nonincreasing subsequence is added.
Seq is guaranteed to be in increasing sorted order.
Returns the index of the element in seq that should be added to results. '''
def search_insert(seq, search_val, start, end):
    median = (start + end)/2

    if end - start < 2: # End of the search.
        if seq[start].val > search_val:
            if start > 0:
                new_item = SeqItem(search_val, seq[start - 1])
            else:
                new_item = SeqItem(search_val, None)

            seq[start] = new_item
            return new_item
        else: # seq[start].val <= search_val
            if start + 1 < len(seq):
                new_item = SeqItem(search_val, seq[start])
                seq[start + 1] = new_item
                return new_item
            else:
                new_item = SeqItem(search_val, seq[start])
                seq.append(new_item)
                return new_item

    if search_val < seq[median].val: # Search left side
        return search_insert(seq, search_val, start, median)
    else: #search_val >= seq[median].val: # Search right side
        return search_insert(seq, search_val, median, end)</pre>
<p>And here is an example usage:</p>
<pre class="brush:python">
import random
if __name__ == '__main__':
    seq = []
    for i in range(100000):
        seq.append(int(random.random() * 1000))

    print extract_sorted(seq)</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2009/11/extract-longest-increasing-sequence-from-any-sequence/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
