Add Martin's patch for rewrite of command-line option parsing. Add CGI and mod_python versions of a web interface.

2009-05-22 09:04:40 -05:00
commit b9a9020feb
parent e163fb348c d8ffea56e5
11 changed files with 2802 additions and 135 deletions
--- a/1
+++ b/1
@@ -8,3 +8,4 @@ Thanks to the following contributors to scour:
 * Martin:
 	- better methods of handling string-to-float conversions in Python
 	- document functions in the traditional Python way
+	- rewrite option parsing code
--- a/crunch.sh
+++ b/crunch.sh
@@ -3,6 +3,5 @@ mkdir $1
 for FILE in `ls fulltests`
 do
 	echo Doing $FILE:
-	./scour.py -i fulltests/$FILE -o $1/$FILE >> $1/report.txt
+	./scour.py -i fulltests/$FILE -o $1/$FILE 2>> $1/report.txt
 done
-	
--- a/python-modules-pre24/fixedpoint.py
+++ b/python-modules-pre24/fixedpoint.py
@@ -0,0 +1,619 @@
+#!/usr/bin/env python
+"""
+FixedPoint objects support decimal arithmetic with a fixed number of
+digits (called the object's precision) after the decimal point.  The
+number of digits before the decimal point is variable & unbounded.
+
+The precision is user-settable on a per-object basis when a FixedPoint
+is constructed, and may vary across FixedPoint objects.  The precision
+may also be changed after construction via FixedPoint.set_precision(p).
+Note that if the precision of a FixedPoint is reduced via set_precision,
+information may be lost to rounding.
+
+>>> x = FixedPoint("5.55")  # precision defaults to 2
+>>> print x
+5.55
+>>> x.set_precision(1)      # round to one fraction digit
+>>> print x
+5.6
+>>> print FixedPoint("5.55", 1)  # same thing setting to 1 in constructor
+5.6
+>>> repr(x) #  returns constructor string that reproduces object exactly
+"FixedPoint('5.6', 1)"
+>>>
+
+When FixedPoint objects of different precision are combined via + - * /,
+the result is computed to the larger of the inputs' precisions, which also
+becomes the precision of the resulting FixedPoint object.
+
+>>> print FixedPoint("3.42") + FixedPoint("100.005", 3)
+103.425
+>>>
+
+When a FixedPoint is combined with other numeric types (ints, floats,
+strings representing a number) via + - * /, then similarly the computation
+is carried out using-- and the result inherits --the FixedPoint's
+precision.
+
+>>> print FixedPoint(1) / 7
+0.14
+>>> print FixedPoint(1, 30) / 7
+0.142857142857142857142857142857
+>>>
+
+The string produced by str(x) (implicitly invoked by "print") always
+contains at least one digit before the decimal point, followed by a
+decimal point, followed by exactly x.get_precision() digits.  If x is
+negative, str(x)[0] == "-".
+
+The FixedPoint constructor can be passed an int, long, string, float,
+FixedPoint, or any object convertible to a float via float() or to a
+long via long().  Passing a precision is optional; if specified, the
+precision must be a non-negative int.  There is no inherent limit on
+the size of the precision, but if very very large you'll probably run
+out of memory.
+
+Note that conversion of floats to FixedPoint can be surprising, and
+should be avoided whenever possible.  Conversion from string is exact
+(up to final rounding to the requested precision), so is greatly
+preferred.
+
+>>> print FixedPoint(1.1e30)
+1099999999999999993725589651456.00
+>>> print FixedPoint("1.1e30")
+1100000000000000000000000000000.00
+>>>
+
+The following Python operators and functions accept FixedPoints in the
+expected ways:
+
+    binary + - * / % divmod
+        with auto-coercion of other types to FixedPoint.
+        + - % divmod  of FixedPoints are always exact.
+        * / of FixedPoints may lose information to rounding, in
+            which case the result is the infinitely precise answer
+            rounded to the result's precision.
+        divmod(x, y) returns (q, r) where q is a long equal to
+            floor(x/y) as if x/y were computed to infinite precision,
+            and r is a FixedPoint equal to x - q * y; no information
+            is lost.  Note that q has the sign of y, and abs(r) < abs(y).
+    unary -
+    == != < > <= >=  cmp
+    min  max
+    float  int  long    (int and long truncate)
+    abs
+    str  repr
+    hash
+    use as dict keys
+    use as boolean (e.g. "if some_FixedPoint:" -- true iff not zero)
+
+Methods unique to FixedPoints:
+   .copy()              return new FixedPoint with same value
+   .frac()              long(x) + x.frac() == x
+   .get_precision()     return the precision(p) of this FixedPoint object
+   .set_precision(p)    set the precision of this FixedPoint object
+   
+Provided as-is; use at your own risk; no warranty; no promises; enjoy!
+"""
+
+# Released to the public domain 28-Mar-2001,
+# by Tim Peters (tim.one@home.com).
+
+
+# 28-Mar-01 ver 0,0,4
+#     Use repr() instead of str() inside __str__, because str(long) changed
+#     since this was first written (used to produce trailing "L", doesn't
+#     now).
+#
+# 09-May-99 ver 0,0,3
+#     Repaired __sub__(FixedPoint, string); was blowing up.
+#     Much more careful conversion of float (now best possible).
+#     Implemented exact % and divmod.
+#
+# 14-Oct-98 ver 0,0,2
+#     Added int, long, frac.  Beefed up docs.  Removed DECIMAL_POINT
+#     and MINUS_SIGN globals to discourage bloating this class instead
+#     of writing formatting wrapper classes (or subclasses)
+#
+# 11-Oct-98 ver 0,0,1
+#     posted to c.l.py
+
+__copyright__ = "Copyright (C) Python Software Foundation"
+__author__ = "Tim Peters"
+__version__ = 0, 1, 0
+
+def bankersRounding(self, dividend, divisor, quotient, remainder):
+    """
+    rounding via nearest-even
+    increment the quotient if
+         the remainder is more than half of the divisor
+      or the remainder is exactly half the divisor and the quotient is odd
+    """
+    c = cmp(remainder << 1, divisor)
+    # c < 0 <-> remainder < divisor/2, etc
+    if c > 0 or (c == 0 and (quotient & 1) == 1):
+        quotient += 1
+    return quotient
+
+def addHalfAndChop(self, dividend, divisor, quotient, remainder):
+    """
+    the equivalent of 'add half and chop'
+    increment the quotient if
+         the remainder is greater than half of the divisor
+      or the remainder is exactly half the divisor and the quotient is >= 0
+    """
+    c = cmp(remainder << 1, divisor)
+    # c < 0 <-> remainder < divisor/2, etc
+    if c > 0 or (c == 0 and quotient >= 0):
+        quotient += 1
+    return quotient
+
+# 2002-10-20 dougfort - fake classes for pre 2.2 compatibility
+try:
+    object
+except NameError:
+    class object:
+        pass
+    def property(x, y):
+        return None
+
+# The default value for the number of decimal digits carried after the
+# decimal point.  This only has effect at compile-time.
+DEFAULT_PRECISION = 2
+
+class FixedPoint(object):
+    """Basic FixedPoint object class,
+        The exact value is self.n / 10**self.p;
+        self.n is a long; self.p is an int
+    """
+    __slots__ = ['n', 'p']
+    def __init__(self, value=0, precision=DEFAULT_PRECISION):
+        self.n = self.p = 0
+        self.set_precision(precision)
+        p = self.p
+
+        if isinstance(value, type("42.3e5")):
+            n, exp = _string2exact(value)
+            # exact value is n*10**exp = n*10**(exp+p)/10**p
+            effective_exp = exp + p
+            if effective_exp > 0:
+                n = n * _tento(effective_exp)
+            elif effective_exp < 0:
+                n = self._roundquotient(n, _tento(-effective_exp))
+            self.n = n
+            return
+
+        if isinstance(value, type(42)) or isinstance(value, type(42L)):
+            self.n = long(value) * _tento(p)
+            return
+
+        if isinstance(value, type(self)):
+            temp = value.copy()
+            temp.set_precision(p)
+            self.n, self.p = temp.n, temp.p
+            return
+
+        if isinstance(value, type(42.0)):
+            # XXX ignoring infinities and NaNs and overflows for now
+            import math
+            f, e = math.frexp(abs(value))
+            assert f == 0 or 0.5 <= f < 1.0
+            # |value| = f * 2**e exactly
+
+            # Suck up CHUNK bits at a time; 28 is enough so that we suck
+            # up all bits in 2 iterations for all known binary double-
+            # precision formats, and small enough to fit in an int.
+            CHUNK = 28
+            top = 0L
+            # invariant: |value| = (top + f) * 2**e exactly
+            while f:
+                f = math.ldexp(f, CHUNK)
+                digit = int(f)
+                assert digit >> CHUNK == 0
+                top = (top << CHUNK) | digit
+                f = f - digit
+                assert 0.0 <= f < 1.0
+                e = e - CHUNK
+
+            # now |value| = top * 2**e exactly
+            # want n such that n / 10**p = top * 2**e, or
+            # n = top * 10**p * 2**e
+            top = top * _tento(p)
+            if e >= 0:
+                n = top << e
+            else:
+                n = self._roundquotient(top, 1L << -e)
+            if value < 0:
+                n = -n
+            self.n = n
+            return
+
+        if isinstance(value, type(42-42j)):
+            raise TypeError("can't convert complex to FixedPoint: " +
+                            `value`)
+
+        # can we coerce to a float?
+        yes = 1
+        try:
+            asfloat = float(value)
+        except:
+            yes = 0
+        if yes:
+            self.__init__(asfloat, p)
+            return
+
+        # similarly for long
+        yes = 1
+        try:
+            aslong = long(value)
+        except:
+            yes = 0
+        if yes:
+            self.__init__(aslong, p)
+            return
+
+        raise TypeError("can't convert to FixedPoint: " + `value`)
+
+    def get_precision(self):
+        """Return the precision of this FixedPoint.
+
+           The precision is the number of decimal digits carried after
+           the decimal point, and is an int >= 0.
+        """
+
+        return self.p
+
+    def set_precision(self, precision=DEFAULT_PRECISION):
+        """Change the precision carried by this FixedPoint to p.
+
+           precision must be an int >= 0, and defaults to
+           DEFAULT_PRECISION.
+
+           If precision is less than this FixedPoint's current precision,
+           information may be lost to rounding.
+        """
+
+        try:
+            p = int(precision)
+        except:
+            raise TypeError("precision not convertible to int: " +
+                            `precision`)
+        if p < 0:
+            raise ValueError("precision must be >= 0: " + `precision`)
+
+        if p > self.p:
+            self.n = self.n * _tento(p - self.p)
+        elif p < self.p:
+            self.n = self._roundquotient(self.n, _tento(self.p - p))
+        self.p = p
+
+    precision = property(get_precision, set_precision)
+
+    def __str__(self):
+        n, p = self.n, self.p
+        i, f = divmod(abs(n), _tento(p))
+        if p:
+            frac = repr(f)[:-1]
+            frac = "0" * (p - len(frac)) + frac
+        else:
+            frac = ""
+        return "-"[:n<0] + \
+               repr(i)[:-1] + \
+               "." + frac
+
+    def __repr__(self):
+        return "FixedPoint" + `(str(self), self.p)`
+
+    def copy(self):
+        return _mkFP(self.n, self.p, type(self))
+
+    __copy__ = copy
+
+    def __deepcopy__(self, memo):
+        return self.copy()
+
+    def __cmp__(self, other):
+        xn, yn, p = _norm(self, other, FixedPoint=type(self))
+        return cmp(xn, yn)
+
+    def __hash__(self):
+        """ Caution!  == values must have equal hashes, and a FixedPoint
+            is essentially a rational in unnormalized form.  There's
+            really no choice here but to normalize it, so hash is
+            potentially expensive.
+            n, p = self.__reduce()
+
+            Obscurity: if the value is an exact integer, p will be 0 now,
+            so the hash expression reduces to hash(n).  So FixedPoints
+            that happen to be exact integers hash to the same things as
+            their int or long equivalents.  This is Good.  But if a
+            FixedPoint happens to have a value exactly representable as
+            a float, their hashes may differ.  This is a teensy bit Bad.
+        """
+        n, p = self.__reduce()
+        return hash(n) ^ hash(p)
+
+    def __nonzero__(self):
+        """ Returns true if this FixedPoint is not equal to zero"""
+        return self.n != 0
+
+    def __neg__(self):
+        return _mkFP(-self.n, self.p, type(self))
+
+    def __abs__(self):
+        """ Returns new FixedPoint containing the absolute value of this FixedPoint"""
+        if self.n >= 0:
+            return self.copy()
+        else:
+            return -self
+
+    def __add__(self, other):
+        n1, n2, p = _norm(self, other, FixedPoint=type(self))
+        # n1/10**p + n2/10**p = (n1+n2)/10**p
+        return _mkFP(n1 + n2, p, type(self))
+
+    __radd__ = __add__
+
+    def __sub__(self, other):
+        if not isinstance(other, type(self)):
+            other = type(self)(other, self.p)
+        return self.__add__(-other)
+
+    def __rsub__(self, other):
+        return (-self) + other
+
+    def __mul__(self, other):
+        n1, n2, p = _norm(self, other, FixedPoint=type(self))
+        # n1/10**p * n2/10**p = (n1*n2/10**p)/10**p
+        return _mkFP(self._roundquotient(n1 * n2, _tento(p)), p, type(self))
+
+    __rmul__ = __mul__
+
+    def __div__(self, other):
+        n1, n2, p = _norm(self, other, FixedPoint=type(self))
+        if n2 == 0:
+            raise ZeroDivisionError("FixedPoint division")
+        if n2 < 0:
+            n1, n2 = -n1, -n2
+        # n1/10**p / (n2/10**p) = n1/n2 = (n1*10**p/n2)/10**p
+        return _mkFP(self._roundquotient(n1 * _tento(p), n2), p, type(self))
+
+    def __rdiv__(self, other):
+        n1, n2, p = _norm(self, other, FixedPoint=type(self))
+        return _mkFP(n2, p, FixedPoint=type(self)) / self
+
+    def __divmod__(self, other):
+        n1, n2, p = _norm(self, other, FixedPoint=type(self))
+        if n2 == 0:
+            raise ZeroDivisionError("FixedPoint modulo")
+        # floor((n1/10**p)/(n2/10**p)) = floor(n1/n2)
+        q = n1 / n2
+        # n1/10**p - q * n2/10**p = (n1 - q * n2)/10**p
+        return q, _mkFP(n1 - q * n2, p, type(self))
+
+    def __rdivmod__(self, other):
+        n1, n2, p = _norm(self, other, FixedPoint=type(self))
+        return divmod(_mkFP(n2, p), self)
+
+    def __mod__(self, other):
+        return self.__divmod__(other)[1]
+
+    def __rmod__(self, other):
+        n1, n2, p = _norm(self, other, FixedPoint=type(self))
+        return _mkFP(n2, p, type(self)).__mod__(self)
+
+    def __float__(self):
+        """Return the floating point representation of this FixedPoint. 
+            Caution! float can lose precision.
+        """
+        n, p = self.__reduce()
+        return float(n) / float(_tento(p))
+
+    def __long__(self):
+        """EJG/DF - Should this round instead?
+            Note e.g. long(-1.9) == -1L and long(1.9) == 1L in Python
+            Note that __int__ inherits whatever __long__ does,
+                 and .frac() is affected too
+        """
+        answer = abs(self.n) / _tento(self.p)
+        if self.n < 0:
+            answer = -answer
+        return answer
+
+    def __int__(self):
+        """Return integer value of FixedPoint object."""
+        return int(self.__long__())
+    
+    def frac(self):
+        """Return fractional portion as a FixedPoint.
+
+           x.frac() + long(x) == x
+        """
+        return self - long(self)
+
+    def _roundquotient(self, x, y):
+        """
+        Divide x by y,
+        return the result of rounding
+        Developers may substitute their own 'round' for custom rounding
+        y must be > 0
+        """
+        assert y > 0
+        n, leftover = divmod(x, y)
+        return self.round(x, y, n, leftover)
+
+    def __reduce(self):
+        """ Return n, p s.t. self == n/10**p and n % 10 != 0"""
+        n, p = self.n, self.p
+        if n == 0:
+            p = 0
+        while p and n % 10 == 0:
+            p = p - 1
+            n = n / 10
+        return n, p
+
+# 2002-10-04 dougfort - Default to Banker's Rounding for backward compatibility
+FixedPoint.round = bankersRounding
+
+# return 10L**n
+
+def _tento(n, cache={}):
+    """Cached computation of 10**n"""
+    try:
+        return cache[n]
+    except KeyError:
+        answer = cache[n] = 10L ** n
+        return answer
+
+def _norm(x, y, isinstance=isinstance, FixedPoint=FixedPoint,
+                _tento=_tento):
+    """Return xn, yn, p s.t.
+           p = max(x.p, y.p)
+           x = xn / 10**p
+           y = yn / 10**p
+
+        x must be FixedPoint to begin with; if y is not FixedPoint,
+        it inherits its precision from x.
+
+        Note that this method is called a lot, so default-arg tricks are helpful.
+    """
+    assert isinstance(x, FixedPoint)
+    if not isinstance(y, FixedPoint):
+        y = FixedPoint(y, x.p)
+    xn, yn = x.n, y.n
+    xp, yp = x.p, y.p
+    if xp > yp:
+        yn = yn * _tento(xp - yp)
+        p = xp
+    elif xp < yp:
+        xn = xn * _tento(yp - xp)
+        p = yp
+    else:
+        p = xp  # same as yp
+    return xn, yn, p
+
+def _mkFP(n, p, FixedPoint=FixedPoint):
+    """Make FixedPoint object - Return a new FixedPoint object with the selected precision."""
+    f = FixedPoint()
+    #print '_mkFP Debug: %s, value=%s' % (type(f),n)
+    f.n = n
+    f.p = p
+    return f
+
+# crud for parsing strings
+import re
+
+# There's an optional sign at the start, and an optional exponent
+# at the end.  The exponent has an optional sign and at least one
+# digit.  In between, must have either at least one digit followed
+# by an optional fraction, or a decimal point followed by at least
+# one digit.  Yuck.
+
+_parser = re.compile(r"""
+    \s*
+    (?P<sign>[-+])?
+    (
+        (?P<int>\d+) (\. (?P<frac>\d*))?
+    |
+        \. (?P<onlyfrac>\d+)
+    )
+    ([eE](?P<exp>[-+]? \d+))?
+    \s* $
+""", re.VERBOSE).match
+
+del re
+
+
+def _string2exact(s):
+    """Return n, p s.t. float string value == n * 10**p exactly."""
+    m = _parser(s)
+    if m is None:
+        raise ValueError("can't parse as number: " + `s`)
+
+    exp = m.group('exp')
+    if exp is None:
+        exp = 0
+    else:
+        exp = int(exp)
+
+    intpart = m.group('int')
+    if intpart is None:
+        intpart = "0"
+        fracpart = m.group('onlyfrac')
+    else:
+        fracpart = m.group('frac')
+        if fracpart is None or fracpart == "":
+            fracpart = "0"
+    assert intpart
+    assert fracpart
+
+    i, f = long(intpart), long(fracpart)
+    nfrac = len(fracpart)
+    i = i * _tento(nfrac) + f
+    exp = exp - nfrac
+
+    if m.group('sign') == "-":
+        i = -i
+
+    return i, exp
+
+def _test():
+    """Unit testing framework"""
+    fp = FixedPoint
+    o = fp("0.1")
+    assert str(o) == "0.10"
+    t = fp("-20e-2", 5)
+    assert str(t) == "-0.20000"
+    assert t < o
+    assert o > t
+    assert min(o, t) == min(t, o) == t
+    assert max(o, t) == max(t, o) == o
+    assert o != t
+    assert --t == t
+    assert abs(t) > abs(o)
+    assert abs(o) < abs(t)
+    assert o == o and t == t
+    assert t.copy() == t
+    assert o == -t/2 == -.5 * t
+    assert abs(t) == o + o
+    assert abs(o) == o
+    assert o/t == -0.5
+    assert -(t/o) == (-t)/o == t/-o == 2
+    assert 1 + o == o + 1 == fp(" +00.000011e+5  ")
+    assert 1/o == 10
+    assert o + t == t + o == -o
+    assert 2.0 * t == t * 2 == "2" * t == o/o * 2L * t
+    assert 1 - t == -(t - 1) == fp(6L)/5
+    assert t*t == 4*o*o == o*4*o == o*o*4
+    assert fp(2) - "1" == 1
+    assert float(-1/t) == 5.0
+    for p in range(20):
+        assert 42 + fp("1e-20", p) - 42 == 0
+    assert 1/(42 + fp("1e-20", 20) - 42) == fp("100.0E18")
+    o = fp(".9995", 4)
+    assert 1 - o == fp("5e-4", 10)
+    o.set_precision(3)
+    assert o == 1
+    o = fp(".9985", 4)
+    o.set_precision(3)
+    assert o == fp(".998", 10)
+    assert o == o.frac()
+    o.set_precision(100)
+    assert o == fp(".998", 10)
+    o.set_precision(2)
+    assert o == 1
+    x = fp(1.99)
+    assert long(x) == -long(-x) == 1L
+    assert int(x) == -int(-x) == 1
+    assert x == long(x) + x.frac()
+    assert -x == long(-x) + (-x).frac()
+    assert fp(7) % 4 == 7 % fp(4) == 3
+    assert fp(-7) % 4 == -7 % fp(4) == 1
+    assert fp(-7) % -4 == -7 % fp(-4) == -3
+    assert fp(7.0) % "-4.0" == 7 % fp(-4) == -1
+    assert fp("5.5") % fp("1.1") == fp("5.5e100") % fp("1.1e100") == 0
+    assert divmod(fp("1e100"), 3) == (long(fp("1e100")/3), 1)
+
+if __name__ == '__main__':
+    _test()
+
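
Note on the rounding hooks in fixedpoint.py above: the following is a minimal Python 3 sketch of the two tie-breaking rules (`bankersRounding` and `addHalfAndChop`) applied to an integer quotient. The name `divmod_round` and its `mode` argument are hypothetical and not part of this patch; the vendored module itself targets Python 2 and selects the rule by assigning `FixedPoint.round`.

```python
def divmod_round(x, y, mode="even"):
    """Divide integer x by positive integer y and round the quotient.

    mode="even"    mirrors bankersRounding (ties go to the even quotient).
    mode="half_up" mirrors addHalfAndChop (ties bump non-negative quotients).
    Both names are illustrative only.
    """
    if y <= 0:
        raise ValueError("y must be > 0")
    if mode not in ("even", "half_up"):
        raise ValueError("unknown mode: %r" % (mode,))

    # Floor division, so 0 <= remainder < y even when x is negative.
    quotient, remainder = divmod(x, y)
    doubled = 2 * remainder

    if doubled > y:
        # More than half of the divisor: always round up.
        return quotient + 1
    if doubled == y:
        # Exactly half: this is where the two rules differ.
        if mode == "even":
            return quotient + (quotient & 1)
        return quotient + 1 if quotient >= 0 else quotient
    # Less than half: keep the floored quotient.
    return quotient


# 5.55 stored as n=555 at precision 2, reduced to precision 1
print(divmod_round(555, 10))              # 56, i.e. 5.6
# 0.9985 stored as n=9985 at precision 4, reduced to precision 3
print(divmod_round(9985, 10))             # 998 under nearest-even
print(divmod_round(9985, 10, "half_up"))  # 999 under add-half-and-chop
```

The first two calls correspond to the `set_precision` cases in the module docstring and in `_test()`: the tie is resolved toward the even quotient, so 5.55 becomes 5.6 while 0.9985 becomes 0.998.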
--- a/python-modules-pre24/optparse.py
+++ b/python-modules-pre24/optparse.py
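
The body of the optparse.py diff is not shown here. As a rough illustration of the kind of option parsing the commit message refers to, here is a small sketch using the standard `optparse` module. Only the `-i` and `-o` flags are taken from crunch.sh; the `dest` names, help strings, usage text and the `build_parser` helper are assumptions, not scour's actual code.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser for the two flags used in crunch.sh (hypothetical helper)."""
    parser = OptionParser(usage="%prog [-i input.svg] [-o output.svg]")
    # dest names below are illustrative; the real scour options may differ.
    parser.add_option("-i", dest="infilename", metavar="FILE",
                      help="input file (assumed meaning)")
    parser.add_option("-o", dest="outfilename", metavar="FILE",
                      help="output file (assumed meaning)")
    return parser


# Parse an explicit argument list instead of sys.argv, so the sketch can be
# run anywhere without a real command line.
options, args = build_parser().parse_args(["-i", "in.svg", "-o", "out.svg"])
print(options.infilename, options.outfilename, args)
```

`parse_args` returns the parsed options object and the list of leftover positional arguments; options that are not given on the command line default to `None`.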
--- a/python-modules-pre24/textwrap.py
+++ b/python-modules-pre24/textwrap.py
@@ -0,0 +1,374 @@
+"""Text wrapping and filling.
+"""
+
+# Copyright (C) 1999-2001 Gregory P. Ward.
+# Copyright (C) 2002, 2003 Python Software Foundation.
+# Written by Greg Ward <gward@python.net>
+
+__revision__ = "$Id: textwrap.py 46863 2006-06-11 19:42:51Z tim.peters $"
+
+import string, re
+
+# Do the right thing with boolean values for all known Python versions
+# (so this module can be copied to projects that don't depend on Python
+# 2.3, e.g. Optik and Docutils).
+try:
+    True, False
+except NameError:
+    (True, False) = (1, 0)
+
+__all__ = ['TextWrapper', 'wrap', 'fill']
+
+# Hardcode the recognized whitespace characters to the US-ASCII
+# whitespace characters.  The main reason for doing this is that in
+# ISO-8859-1, 0xa0 is non-breaking whitespace, so in certain locales
+# that character winds up in string.whitespace.  Respecting
+# string.whitespace in those cases would 1) make textwrap treat 0xa0 the
+# same as any other whitespace char, which is clearly wrong (it's a
+# *non-breaking* space), 2) possibly cause problems with Unicode,
+# since 0xa0 is not in range(128).
+_whitespace = '\t\n\x0b\x0c\r '
+
+class TextWrapper:
+    """
+    Object for wrapping/filling text.  The public interface consists of
+    the wrap() and fill() methods; the other methods are just there for
+    subclasses to override in order to tweak the default behaviour.
+    If you want to completely replace the main wrapping algorithm,
+    you'll probably have to override _wrap_chunks().
+
+    Several instance attributes control various aspects of wrapping:
+      width (default: 70)
+        the maximum width of wrapped lines (unless break_long_words
+        is false)
+      initial_indent (default: "")
+        string that will be prepended to the first line of wrapped
+        output.  Counts towards the line's width.
+      subsequent_indent (default: "")
+        string that will be prepended to all lines save the first
+        of wrapped output; also counts towards each line's width.
+      expand_tabs (default: true)
+        Expand tabs in input text to spaces before further processing.
+        Each tab will become 1 .. 8 spaces, depending on its position in
+        its line.  If false, each tab is treated as a single character.
+      replace_whitespace (default: true)
+        Replace all whitespace characters in the input text by spaces
+        after tab expansion.  Note that if expand_tabs is false and
+        replace_whitespace is true, every tab will be converted to a
+        single space!
+      fix_sentence_endings (default: false)
+        Ensure that sentence-ending punctuation is always followed
+        by two spaces.  Off by default because the algorithm is
+        (unavoidably) imperfect.
+      break_long_words (default: true)
+        Break words longer than 'width'.  If false, those words will not
+        be broken, and some lines might be longer than 'width'.
+    """
+
+    whitespace_trans = string.maketrans(_whitespace, ' ' * len(_whitespace))
+
+    unicode_whitespace_trans = {}
+    uspace = ord(u' ')
+    for x in map(ord, _whitespace):
+        unicode_whitespace_trans[x] = uspace
+
+    # This funky little regex is just the trick for splitting
+    # text up into word-wrappable chunks.  E.g.
+    #   "Hello there -- you goof-ball, use the -b option!"
+    # splits into
+    #   Hello/ /there/ /--/ /you/ /goof-/ball,/ /use/ /the/ /-b/ /option!
+    # (after stripping out empty strings).
+    wordsep_re = re.compile(
+        r'(\s+|'                                  # any whitespace
+        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
+        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash
+
+    # XXX this is not locale- or charset-aware -- string.lowercase
+    # is US-ASCII only (and therefore English-only)
+    sentence_end_re = re.compile(r'[%s]'              # lowercase letter
+                                 r'[\.\!\?]'          # sentence-ending punct.
+                                 r'[\"\']?'           # optional end-of-quote
+                                 % string.lowercase)
+
+
+    def __init__(self,
+                 width=70,
+                 initial_indent="",
+                 subsequent_indent="",
+                 expand_tabs=True,
+                 replace_whitespace=True,
+                 fix_sentence_endings=False,
+                 break_long_words=True):
+        self.width = width
+        self.initial_indent = initial_indent
+        self.subsequent_indent = subsequent_indent
+        self.expand_tabs = expand_tabs
+        self.replace_whitespace = replace_whitespace
+        self.fix_sentence_endings = fix_sentence_endings
+        self.break_long_words = break_long_words
+
+
+    # -- Private methods -----------------------------------------------
+    # (possibly useful for subclasses to override)
+
+    def _munge_whitespace(self, text):
+        """_munge_whitespace(text : string) -> string
+
+        Munge whitespace in text: expand tabs and convert all other
+        whitespace characters to spaces.  Eg. " foo\tbar\n\nbaz"
+        becomes " foo    bar  baz".
+        """
+        if self.expand_tabs:
+            text = text.expandtabs()
+        if self.replace_whitespace:
+            if isinstance(text, str):
+                text = text.translate(self.whitespace_trans)
+            elif isinstance(text, unicode):
+                text = text.translate(self.unicode_whitespace_trans)
+        return text
+
+
+    def _split(self, text):
+        """_split(text : string) -> [string]
+
+        Split the text to wrap into indivisible chunks.  Chunks are
+        not quite the same as words; see _wrap_chunks() for full
+        details.  As an example, the text
+          Look, goof-ball -- use the -b option!
+        breaks into the following chunks:
+          'Look,', ' ', 'goof-', 'ball', ' ', '--', ' ',
+          'use', ' ', 'the', ' ', '-b', ' ', 'option!'
+        """
+        chunks = self.wordsep_re.split(text)
+        chunks = filter(None, chunks)
+        return chunks
+
+    def _fix_sentence_endings(self, chunks):
+        """_fix_sentence_endings(chunks : [string])
+
+        Correct for sentence endings buried in 'chunks'.  Eg. when the
+        original text contains "... foo.\nBar ...", _munge_whitespace()
+        and _split() will convert that to [..., "foo.", " ", "Bar", ...]
+        which has one too few spaces; this method simply changes the one
+        space to two.
+        """
+        i = 0
+        pat = self.sentence_end_re
+        while i < len(chunks)-1:
+            if chunks[i+1] == " " and pat.search(chunks[i]):
+                chunks[i+1] = "  "
+                i += 2
+            else:
+                i += 1
+
+    def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
+        """_handle_long_word(chunks : [string],
+                             cur_line : [string],
+                             cur_len : int, width : int)
+
+        Handle a chunk of text (most likely a word, not whitespace) that
+        is too long to fit in any line.
+        """
+        space_left = max(width - cur_len, 1)
+
+        # If we're allowed to break long words, then do so: put as much
+        # of the next chunk onto the current line as will fit.
+        if self.break_long_words:
+            cur_line.append(reversed_chunks[-1][:space_left])
+            reversed_chunks[-1] = reversed_chunks[-1][space_left:]
+
+        # Otherwise, we have to preserve the long word intact.  Only add
+        # it to the current line if there's nothing already there --
+        # that minimizes how much we violate the width constraint.
+        elif not cur_line:
+            cur_line.append(reversed_chunks.pop())
+
+        # If we're not allowed to break long words, and there's already
+        # text on the current line, do nothing.  Next time through the
+        # main loop of _wrap_chunks(), we'll wind up here again, but
+        # cur_len will be zero, so the next line will be entirely
+        # devoted to the long word that we can't handle right now.
+
+    def _wrap_chunks(self, chunks):
+        """_wrap_chunks(chunks : [string]) -> [string]
+
+        Wrap a sequence of text chunks and return a list of lines of
+        length 'self.width' or less.  (If 'break_long_words' is false,
+        some lines may be longer than this.)  Chunks correspond roughly
+        to words and the whitespace between them: each chunk is
+        indivisible (modulo 'break_long_words'), but a line break can
+        come between any two chunks.  Chunks should not have internal
+        whitespace; ie. a chunk is either all whitespace or a "word".
+        Whitespace chunks will be removed from the beginning and end of
+        lines, but apart from that whitespace is preserved.
+        """
+        lines = []
+        if self.width <= 0:
+            raise ValueError("invalid width %r (must be > 0)" % self.width)
+
+        # Arrange in reverse order so items can be efficiently popped
+        # from a stack of chunks.
+        chunks.reverse()
+
+        while chunks:
+
+            # Start the list of chunks that will make up the current line.
+            # cur_len is just the length of all the chunks in cur_line.
+            cur_line = []
+            cur_len = 0
+
+            # Figure out which static string will prefix this line.
+            if lines:
+                indent = self.subsequent_indent
+            else:
+                indent = self.initial_indent
+
+            # Maximum width for this line.
+            width = self.width - len(indent)
+
+            # First chunk on line is whitespace -- drop it, unless this
+            # is the very beginning of the text (ie. no lines started yet).
+            if chunks[-1].strip() == '' and lines:
+                del chunks[-1]
+
+            while chunks:
+                l = len(chunks[-1])
+
+                # Can at least squeeze this chunk onto the current line.
+                if cur_len + l <= width:
+                    cur_line.append(chunks.pop())
+                    cur_len += l
+
+                # Nope, this line is full.
+                else:
+                    break
+
+            # The current line is full, and the next chunk is too big to
+            # fit on *any* line (not just this one).
+            if chunks and len(chunks[-1]) > width:
+                self._handle_long_word(chunks, cur_line, cur_len, width)
+
+            # If the last chunk on this line is all whitespace, drop it.
+            if cur_line and cur_line[-1].strip() == '':
+                del cur_line[-1]
+
+            # Convert current line back to a string and store it in list
+            # of all lines (return value).
+            if cur_line:
+                lines.append(indent + ''.join(cur_line))
+
+        return lines
+
+
+    # -- Public interface ----------------------------------------------
+
+    def wrap(self, text):
+        """wrap(text : string) -> [string]
+
+        Reformat the single paragraph in 'text' so it fits in lines of
+        no more than 'self.width' columns, and return a list of wrapped
+        lines.  Tabs in 'text' are expanded with string.expandtabs(),
+        and all other whitespace characters (including newline) are
+        converted to space.
+        """
+        text = self._munge_whitespace(text)
+        chunks = self._split(text)
+        if self.fix_sentence_endings:
+            self._fix_sentence_endings(chunks)
+        return self._wrap_chunks(chunks)
+
+    def fill(self, text):
+        """fill(text : string) -> string
+
+        Reformat the single paragraph in 'text' to fit in lines of no
+        more than 'self.width' columns, and return a new string
+        containing the entire wrapped paragraph.
+        """
+        return "\n".join(self.wrap(text))
+
+
+# -- Convenience interface ---------------------------------------------
+
+def wrap(text, width=70, **kwargs):
+    """Wrap a single paragraph of text, returning a list of wrapped lines.
+
+    Reformat the single paragraph in 'text' so it fits in lines of no
+    more than 'width' columns, and return a list of wrapped lines.  By
+    default, tabs in 'text' are expanded with string.expandtabs(), and
+    all other whitespace characters (including newline) are converted to
+    space.  See TextWrapper class for available keyword args to customize
+    wrapping behaviour.
+    """
+    w = TextWrapper(width=width, **kwargs)
+    return w.wrap(text)
+
+def fill(text, width=70, **kwargs):
+    """Fill a single paragraph of text, returning a new string.
+
+    Reformat the single paragraph in 'text' to fit in lines of no more
+    than 'width' columns, and return a new string containing the entire
+    wrapped paragraph.  As with wrap(), tabs are expanded and other
+    whitespace characters converted to space.  See TextWrapper class for
+    available keyword args to customize wrapping behaviour.
+    """
+    w = TextWrapper(width=width, **kwargs)
+    return w.fill(text)
+
+
+# -- Loosely related functionality -------------------------------------
+
+_whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE)
+_leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)
+
+def dedent(text):
+    """Remove any common leading whitespace from every line in `text`.
+
+    This can be used to make triple-quoted strings line up with the left
+    edge of the display, while still presenting them in the source code
+    in indented form.
+
+    Note that tabs and spaces are both treated as whitespace, but they
+    are not equal: the lines "  hello" and "\thello" are
+    considered to have no common leading whitespace.  (This behaviour is
+    new in Python 2.5; older versions of this module incorrectly
+    expanded tabs before searching for common leading whitespace.)
+    """
+    # Look for the longest leading string of spaces and tabs common to
+    # all lines.
+    margin = None
+    text = _whitespace_only_re.sub('', text)
+    indents = _leading_whitespace_re.findall(text)
+    for indent in indents:
+        if margin is None:
+            margin = indent
+
+        # Current line more deeply indented than previous winner:
+        # no change (previous winner is still on top).
+        elif indent.startswith(margin):
+            pass
+
+        # Current line consistent with and no deeper than previous winner:
+        # it's the new winner.
+        elif margin.startswith(indent):
+            margin = indent
+
+        # Current line and previous winner have no common whitespace:
+        # there is no margin.
+        else:
+            margin = ""
+            break
+
+    # sanity check (testing/debugging only)
+    if 0 and margin:
+        for line in text.split("\n"):
+            assert not line or line.startswith(margin), \
+                   "line = %r, margin = %r" % (line, margin)
+
+    if margin:
+        text = re.sub(r'(?m)^' + margin, '', text)
+    return text
+
+if __name__ == "__main__":
+    #print dedent("\tfoo\n\tbar")
+    #print dedent("  \thello there\n  \t  how are you?")
+    print dedent("Hello there.\n  This is indented.")
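The wrapping and dedenting behaviour documented in the docstrings above can be tried out with a short sketch. It assumes the vendored module mirrors the standard-library `textwrap` and therefore uses that module directly; it is written for Python 3, whereas the code in this commit targets Python 2. The sample strings are illustrative only.

```python
import textwrap

# dedent() strips the whitespace prefix shared by every non-blank line.
sample = """\
    Hello there.
      This is indented.
"""
dedented = textwrap.dedent(sample)

# Tabs and spaces are not treated as equal, so these two lines share no
# common margin and the text comes back unchanged.
mixed = "  hello\n\thello"
unchanged = textwrap.dedent(mixed)

# wrap() returns a list of lines no wider than `width`;
# fill() joins those same lines with newlines.
wrapped = textwrap.wrap("The quick brown fox", width=10)
filled = textwrap.fill("The quick brown fox", width=10)

print(dedented)
print(wrapped)
```

Whitespace chunks at the end of a wrapped line are dropped, which is why no line in `wrapped` carries a trailing space.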
--- a/release-notes.html
+++ b/release-notes.html
@ -18,6 +18,7 @@
 		<li>Collapse adjacent commands of the same type</li>
 		<li>Convert straight curves into line commands</li>
 		<li>Eliminate last segment in a polygon</li>
+		<li>Rework command-line argument parsing</li>
 	</ul>
 </section>

--- a/scour.py
+++ b/scour.py
@ -59,14 +59,17 @@ import xml.dom.minidom
 import re
 import math
 import base64
-import os.path
 import urllib
 from svg_regex import svg_parser
-from decimal import *
 import gzip
+import optparse

-# set precision to 5 decimal places by default
-getcontext().prec = 5
+# Python 2.3 and earlier did not have the decimal module
+try:
+	from decimal import *
+except ImportError:
+	from fixedpoint import *
+	Decimal = FixedPoint

 APP = 'scour'
 VER = '0.14'
@ -293,7 +296,7 @@ class Unit(object):
 	MM = 8
 	IN = 9
 	
-	@staticmethod
+#	@staticmethod
 	def get(str):
 		# GZ: shadowing builtins like 'str' is generally bad form
 		# GZ: encoding stuff like this in a dict makes for nicer code
@ -308,6 +311,8 @@ class Unit(object):
 		elif str == 'mm': return Unit.MM
 		elif str == 'in': return Unit.IN
 		return Unit.INVALID
+		
+	get = staticmethod(get)
 	
 class SVGLength(object):
 	def __init__(self, str):
@ -839,7 +844,7 @@ def repairStyle(node, options):
 		
 		# now if any of the properties match known SVG attributes we prefer attributes 
 		# over style so emit them and remove them from the style map
-		if not '--disable-style-to-xml' in options:
+		if options.style_to_xml:
 			for propName in styleMap.keys() :
 				if propName in svgAttributes :
 					node.setAttribute(propName, styleMap[propName])
@ -1307,9 +1312,9 @@ def cleanPath(element) :

 def parseListOfPoints(s):
 	"""
-	Parse string into a list of points.
+		Parse string into a list of points.
 	
-	Returns a list of (x,y) tuples where x and y are strings
+		Returns a list of (x,y) tuples where x and y are strings
 	"""
 	
 	# (wsp)? comma-or-wsp-separated coordinate pairs (wsp)?
@ -1329,7 +1334,7 @@ def parseListOfPoints(s):
 	
 def cleanPolygon(elem):
 	"""
-	Remove unnecessary closing point of polygon points attribute
+		Remove unnecessary closing point of polygon points attribute
 	"""
 	global numPointsRemovedFromPolygon
 	
@ -1347,11 +1352,11 @@ def cleanPolygon(elem):
 	
 def serializePath(pathObj):
 	"""
-	Reserializes the path data with some cleanups:
-		- removes scientific notation (exponents)
-		- removes all trailing zeros after the decimal
-		- removes extraneous whitespace
-		- adds commas between values in a subcommand if required
+		Reserializes the path data with some cleanups:
+			- removes scientific notation (exponents)
+			- removes all trailing zeros after the decimal
+			- removes extraneous whitespace
+			- adds commas between values in a subcommand if required
 	"""
 	pathStr = ""
 	for (cmd,data) in pathObj:
@ -1371,8 +1376,8 @@ def serializePath(pathObj):

 def embedRasters(element) :
 	"""
-	Converts raster references to inline images.
-	NOTE: there are size limits to base64-encoding handling in browsers 
+		Converts raster references to inline images.
+		NOTE: there are size limits to base64-encoding handling in browsers 
 	"""
 	global numRastersEmbedded

@ -1463,7 +1468,10 @@ def properlySizeDoc(docElement):
 # this is the main method
 # input is a string representation of the input XML
 # returns a string representation of the output XML
-def scourString(in_string, options=[]):
+def scourString(in_string, options=None):
+	if options is None:
+		options = _options_parser.get_default_values()
+	getcontext().prec = options.digits
 	global numAttrsRemoved
 	global numStylePropsFixed
 	global numElemsRemoved
@ -1493,7 +1501,7 @@ def scourString(in_string, options=[]):
 	numStylePropsFixed = repairStyle(doc.documentElement, options)

 	# convert colors to #RRGGBB format
-	if not '--disable-simplify-colors' in options:
+	if options.simple_colors:
 		numBytesSavedInColors = convertColors(doc.documentElement)
 	
 	# remove empty defs, metadata, g
@ -1516,14 +1524,14 @@ def scourString(in_string, options=[]):
 	while removeUnreferencedElements(doc) > 0:
 		pass

-	if '--enable-id-stripping' in options:
+	if options.strip_ids:
 		bContinueLooping = True
 		while bContinueLooping:
 			identifiedElements = findElementsWithId(doc.documentElement)
 			referencedIDs = findReferencedElements(doc.documentElement)
 			bContinueLooping = (removeUnreferencedIDs(referencedIDs, identifiedElements) > 0)
 	
-	if not '--disable-group-collapsing' in options:
+	if options.group_collapse:
 		while removeNestedGroups(doc.documentElement) > 0:
 			pass

@ -1571,135 +1579,121 @@ def scourString(in_string, options=[]):
 # used mostly by unit tests
 # input is a filename
 # returns the minidom doc representation of the SVG
-def scourXmlFile(filename, options=[]):
+def scourXmlFile(filename, options=None):
 	in_string = open(filename).read()
-#	print 'IN=',in_string
 	out_string = scourString(in_string, options)
-#	print 'OUT=',out_string
 	return xml.dom.minidom.parseString(out_string.encode('utf-8'))

-def printHeader():
-	print APP , VER
-	print COPYRIGHT
+# GZ: Seems most other commandline tools don't do this, is it really wanted?
+class HeaderedFormatter(optparse.IndentedHelpFormatter):
+	"""
+		Show application name, version number, and copyright statement
+		above usage information.
+	"""
+	def format_usage(self, usage):
+		return "%s %s\n%s\n%s" % (APP, VER, COPYRIGHT,
+			optparse.IndentedHelpFormatter.format_usage(self, usage))

-def printSyntaxAndQuit():
-	printHeader()
-	print 'usage: scour.py [-i input.svg] [-o output.svg] [OPTIONS]\n'
-	print 'If the input/output files are specified with a svgz extension, then compressed SVG is assumed.\n'
-	print 'If the input file is not specified, stdin is used.'
-	print 'If the output file is not specified, stdout is used.'
-	print 'If an option is not available below that means it occurs automatically'
-	print 'when scour is invoked.  Available OPTIONS:\n'
-	print '  --disable-simplify-colors  : Scour will not convert all colors to #RRGGBB format'
-	print '  --disable-style-to-xml     : Scour will not convert style properties into XML attributes'
-	print '  --disable-group-collapsing : Scour will not collapse <g> elements'
-	print '  --enable-id-stripping      : Scour will remove all un-referenced ID attributes'
-	print '  --set-precision N          : Scour will set the number of significant digits (default: 6)'
-	print ''
-	quit()	
+# GZ: would prefer this to be in a function or class scope, but tests etc need
+#     access to the defaults anyway
+_options_parser = optparse.OptionParser(
+	usage="%prog [-i input.svg] [-o output.svg] [OPTIONS]",
+	description=("If the input/output files are specified with a svgz"
+	" extension, then compressed SVG is assumed. If the input file is not"
+	" specified, stdin is used. If the output file is not specified,"
+	" stdout is used."),
+	formatter=HeaderedFormatter(max_help_position=30),
+	version=VER)

-# returns a tuple with:
-# input stream, output stream, a list of options specified on the command-line, 
-# input filename, and output filename
-def parseCLA():
-	args = sys.argv[1:]
+_options_parser.add_option("--disable-simplify-colors",
+	action="store_false", dest="simple_colors", default=True,
+	help="won't convert all colors to #RRGGBB format")
+_options_parser.add_option("--disable-style-to-xml",
+	action="store_false", dest="style_to_xml", default=True,
+	help="won't convert styles into XML attributes")
+_options_parser.add_option("--disable-group-collapsing",
+	action="store_false", dest="group_collapse", default=True,
+	help="won't collapse <g> elements")
+_options_parser.add_option("--enable-id-stripping",
+	action="store_true", dest="strip_ids", default=False,
+	help="remove all un-referenced ID attributes")
+# GZ: this is confusing, most people will be thinking in terms of
+#     decimal places, which is not what decimal precision is doing
+_options_parser.add_option("-p", "--set-precision",
+	action="store", type=int, dest="digits", default=5,
+	help="set number of significant digits (default: %default)")
+_options_parser.add_option("-i",
+	action="store", dest="infilename", help=optparse.SUPPRESS_HELP)
+_options_parser.add_option("-o",
+	action="store", dest="outfilename", help=optparse.SUPPRESS_HELP)

-	# by default the input and output are the standard streams
-	inputfilename = ''
-	outputfilename = ''
-	input = sys.stdin
-	output = sys.stdout
-	options = []
-	validOptions = [
-					'--disable-simplify-colors',
-					'--disable-style-to-xml',
-					'--disable-group-collapsing',
-					'--enable-id-stripping',
-					'--set-precision',
-					]
-					
-	i = 0
-	while i < len(args):
-		arg = args[i]
-		i += 1
-		if arg == '-i' :
-			if i < len(args) :
-				inputfilename = args[i]
-				if args[i][-5:] == '.svgz':
-					input = gzip.open(args[i], 'rb')
-				else:
-					input = open(args[i], 'r')
-				i += 1
-				continue
-			else:
-				printSyntaxAndQuit()
-		elif arg == '-o' :
-			if i < len(args) :
-				outputfilename = args[i]
-				if args[i][-5:] == '.svgz':
-					output = gzip.open(args[i], 'wb')
-				else:
-					output = open(args[i], 'w')
-				i += 1
-				continue
-			else:
-				printSyntaxAndQuit()
-		elif arg == '--set-precision':
-			if i < len(args):
-				getcontext().prec = int(args[i])
-				i += 1
-				continue
-			else:
-				printSyntaxAndQuit()
-		elif arg in validOptions :
-			options.append(arg)
-		else :
-			print 'Error!  Invalid argument:', arg
-			printSyntaxAndQuit()
-			
-	return (input, output, options, inputfilename, outputfilename)
+def maybe_gziped_file(filename, mode="r"):
+	if os.path.splitext(filename)[1].lower() in (".svgz", ".gz"):
+		return gzip.GzipFile(filename, mode)
+	return file(filename, mode)
+
+def parse_args(args=None):
+	options, rargs = _options_parser.parse_args(args)
+
+	if rargs:
+		_options_parser.error("Additional arguments not handled: %r, see --help" % rargs)
+	if options.digits < 0:
+		_options_parser.error("Can't have negative significant digits, see --help")
+	if options.infilename:
+		infile = maybe_gziped_file(options.infilename)
+		# GZ: could catch a raised IOError here and report
+	else:
+		# GZ: could sniff for gzip compression here
+		infile = sys.stdin
+	if options.outfilename:
+		outfile = maybe_gziped_file(options.outfilename, "w")
+	else:
+		outfile = sys.stdout
+
+	return options, [infile, outfile]

 if __name__ == '__main__':
+	if sys.platform == "win32":
+		from time import clock as get_tick
+	else:
+		# GZ: is this different from time.time() in any way?
+		def get_tick():
+			return os.times()[0]

-	startTimes = os.times()
+	start = get_tick()
 	
-	(input, output, options, inputfilename, outputfilename) = parseCLA()
+	options, (input, output) = parse_args()
 	
-	# if we are not sending to stdout, then print out app information
-	bOutputReport = False
-	if output != sys.stdout :
-		bOutputReport = True
-		printHeader()
+	print >>sys.stderr, "%s %s\n%s" % (APP, VER, COPYRIGHT)

 	# do the work
 	in_string = input.read()
-	out_string = scourString(in_string, options)
-	output.write(out_string.encode("utf-8"))
+	out_string = scourString(in_string, options).encode("UTF-8")
+	output.write(out_string)

 	# Close input and output files
 	input.close()
 	output.close()

-	endTimes = os.times()
+	end = get_tick()

-	# output some statistics if we are not using stdout
-	if bOutputReport :
-	    if inputfilename != '': 
-	    	print ' File:', inputfilename
-		print ' Time taken:', str(endTimes[0]-startTimes[0]) + 's'
-		print ' Number of elements removed:', numElemsRemoved
-		print ' Number of attributes removed:', numAttrsRemoved
-		print ' Number of unreferenced id attributes removed:', numIDsRemoved 
-		print ' Number of style properties fixed:', numStylePropsFixed
-		print ' Number of raster images embedded inline:', numRastersEmbedded
-		print ' Number of path segments reduced/removed:', numPathSegmentsReduced
-		print ' Number of curves straightened:', numCurvesStraightened
-		print ' Number of bytes saved in path data:', numBytesSavedInPathData
-		print ' Number of bytes saved in colors:', numBytesSavedInColors
-		print ' Number of points removed from polygons:',numPointsRemovedFromPolygon
-		oldsize = os.path.getsize(inputfilename)
-		newsize = os.path.getsize(outputfilename)
-		sizediff = (newsize / oldsize) * 100;
-		print ' Original file size:', oldsize, 'bytes; new file size:', newsize, 'bytes (' + str(sizediff)[:5] + '%)'
+	# GZ: unless silenced by -q or something?
+	# GZ: not using globals would be good too
+	print >>sys.stderr, ' File:', input.name, \
+		'\n Time taken:', str(end-start) + 's', \
+		'\n Number of elements removed:', numElemsRemoved, \
+		'\n Number of attributes removed:', numAttrsRemoved, \
+		'\n Number of unreferenced id attributes removed:', numIDsRemoved, \
+		'\n Number of style properties fixed:', numStylePropsFixed, \
+		'\n Number of raster images embedded inline:', numRastersEmbedded, \
+		'\n Number of path segments reduced/removed:', numPathSegmentsReduced, \
+		'\n Number of bytes saved in path data:', numBytesSavedInPathData, \
+		'\n Number of bytes saved in colors:', numBytesSavedInColors, \
+		'\n Number of points removed from polygons:',numPointsRemovedFromPolygon
+	oldsize = len(in_string)
+	newsize = len(out_string)
+	sizediff = (float(newsize) / oldsize) * 100
+	print >>sys.stderr, ' Original file size:', oldsize, 'bytes;', \
+		'new file size:', newsize, 'bytes (' + str(sizediff)[:5] + '%)'
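How the `store_false` and `store_true` flags in the rewritten parser turn into attributes on the options object can be seen in a minimal, standalone sketch. The option names and `dest` values are copied from the diff above, but the `parser` object here is a stand-in built for illustration, not scour's own `_options_parser`, and the sketch is written for Python 3.

```python
import optparse

# Stand-in parser reusing three of the options defined in the diff.
parser = optparse.OptionParser(
    usage="%prog [-i input.svg] [-o output.svg] [OPTIONS]")
parser.add_option("--disable-group-collapsing",
    action="store_false", dest="group_collapse", default=True)
parser.add_option("--enable-id-stripping",
    action="store_true", dest="strip_ids", default=False)
parser.add_option("-p", "--set-precision",
    action="store", type="int", dest="digits", default=5)

# With no command line at all, callers can still get a fully
# populated options object holding the declared defaults.
defaults = parser.get_default_values()

# Passing an explicit list avoids reading sys.argv, which is what
# makes this style convenient in unit tests.
options, leftover = parser.parse_args(["--enable-id-stripping", "-p", "3"])

print(defaults.group_collapse, defaults.strip_ids, defaults.digits)
print(options.group_collapse, options.strip_ids, options.digits)
```

Because each "disable" flag stores `False` into a positively named attribute, the calling code can test `options.group_collapse` directly instead of checking for the absence of a string in a list.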


--- a/svg_regex.py
+++ b/svg_regex.py
@ -138,7 +138,8 @@ class SVGPathParser(object):
            'a': self.rule_elliptical_arc,
        }

-        self.number_tokens = set(['int', 'float'])
+        # Python 2.3 has no built-in set(), so fall back to a plain list.
+        self.number_tokens = ['int', 'float']

    def parse(self, text):
        """ Parse a string of SVG <path> data.
--- a/testscour.py
+++ b/testscour.py
@ -156,7 +156,8 @@ class KeepUnreferencedIDsWhenEnabled(unittest.TestCase):
 			
 class RemoveUnreferencedIDsWhenEnabled(unittest.TestCase):
 	def runTest(self):
-		doc = scour.scourXmlFile('unittests/ids-to-strip.svg', ['--enable-id-stripping'])
+		doc = scour.scourXmlFile('unittests/ids-to-strip.svg',
+			scour.parse_args(['--enable-id-stripping'])[0])
 		self.assertEquals(doc.getElementsByTagNameNS(SVGNS, 'svg')[0].getAttribute('id'), '',
 			'<svg> ID not stripped' )

@ -168,7 +169,8 @@ class RemoveUselessNestedGroups(unittest.TestCase):

 class DoNotRemoveUselessNestedGroups(unittest.TestCase):
 	def runTest(self):
-		doc = scour.scourXmlFile('unittests/nested-useless-groups.svg', ['--disable-group-collapsing'])
+		doc = scour.scourXmlFile('unittests/nested-useless-groups.svg',
+			scour.parse_args(['--disable-group-collapsing'])[0])
 		self.assertEquals(len(doc.getElementsByTagNameNS(SVGNS, 'g')), 2,
 			'Useless nested groups were removed despite --disable-group-collapsing' )

@ -388,7 +390,8 @@ class RemoveFillOpacityWhenFillNone(unittest.TestCase):

 class ConvertFillPropertyToAttr(unittest.TestCase):
 	def runTest(self):
-		doc = scour.scourXmlFile('unittests/fill-none.svg', '--disable-simplify-colors')
+		doc = scour.scourXmlFile('unittests/fill-none.svg',
+			scour.parse_args(['--disable-simplify-colors'])[0])
 		self.assertEquals(doc.getElementsByTagNameNS(SVGNS, 'path')[1].getAttribute('fill'), 'black',
 			'fill property not converted to XML attribute' )

--- a/web.py
+++ b/web.py
@ -0,0 +1,45 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+#  Scour Web
+#
+#  Copyright 2009 Jeff Schiller
+#
+#  This file is part of Scour, http://www.codedread.com/scour/
+#
+#   Licensed under the Apache License, Version 2.0 (the "License");
+#   you may not use this file except in compliance with the License.
+#   You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+#   Unless required by applicable law or agreed to in writing, software
+#   distributed under the License is distributed on an "AS IS" BASIS,
+#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#   See the License for the specific language governing permissions and
+#   limitations under the License.
+
+from mod_python import apache
+from scour import scourString
+
+def form(req):
+	return """<!DOCTYPE html>
+<html>
+<head>
+	<title>Scour it!</title>
+</head>
+<body>
+<form method="POST" action="fetch">
+	<p>Paste the SVG file here</p>
+	<textarea cols="80" rows="30" name="indoc" id="indoc"></textarea>
+	<p>Click "Go!" to Scour</p><input type="submit" value="Go!"></input>
+</form>
+</body>
+</html>
+	"""
+
+def fetch(req,indoc):
+	req.content_type = "image/svg+xml"
+	req.write(scourString(indoc))
+
+
--- a/webscour.py
+++ b/webscour.py
@ -0,0 +1,61 @@
+#!/usr/bin/python2.4
+# -*- coding: utf-8 -*-
+
+#  Scour Web
+#
+#  Copyright 2009 Jeff Schiller
+#
+#  This file is part of Scour, http://www.codedread.com/scour/
+#
+#   Licensed under the Apache License, Version 2.0 (the "License");
+#   you may not use this file except in compliance with the License.
+#   You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+#   Unless required by applicable law or agreed to in writing, software
+#   distributed under the License is distributed on an "AS IS" BASIS,
+#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#   See the License for the specific language governing permissions and
+#   limitations under the License.
+
+import cgi
+import cgitb
+cgitb.enable()
+from scour import scourString
+
+def main():
+	# cgi.FieldStorage() abstracts away whether this is a GET or a POST request.
+	# According to http://www.linuxjournal.com/article/3616, POST data arrives on stdin
+	# and GET data arrives in the QUERY_STRING environment variable.
+	form = cgi.FieldStorage()
+	
+	if not form.has_key('indoc'):
+		doGet()
+	else:
+		doPut(form)
+
+def doPut(form):
+	print "Content-type: image/svg+xml\n"
+	print scourString(form['indoc'].value, None)
+
+def doGet():
+	print "Content-type: text/html\n"
+
+	print """
+<!DOCTYPE html>
+<html>
+<head>
+	<title>Scour it!</title>
+</head>
+<body>
+<form method="POST" action="webscour.py">
+	<p>Paste the SVG file here</p>
+	<textarea cols="100" rows="40" name="indoc" id="indoc"></textarea>
+	<p>Click "Go!" to Scour</p><input type="submit" value="Go!"></input>
+</form>
+</body>
+</html>
+	"""
+	
+main()
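The comments in `main()` describe where form data comes from: the `QUERY_STRING` environment variable for GET and stdin for POST. A rough sketch of that dispatch follows, under several assumptions. It is written for Python 3 and uses `urllib.parse.parse_qs` rather than `cgi.FieldStorage`, since the `cgi` module is no longer part of recent Python 3 standard libraries. `read_fields` and `dispatch` are hypothetical helper names, not part of this commit, and the sketch handles only URL-encoded bodies.

```python
import io
from urllib.parse import parse_qs

def read_fields(environ, body):
    """Hypothetical helper: collect form fields from a CGI-style request.

    `environ` is a dict of CGI environment variables and `body` is a
    binary file-like object standing in for stdin.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the URL-encoded fields arrive on stdin.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = body.read(length).decode("utf-8")
    else:
        # GET: the fields arrive in QUERY_STRING.
        raw = environ.get("QUERY_STRING", "")
    # parse_qs returns a list per field; keep the first value of each.
    return {name: values[0] for name, values in parse_qs(raw).items()}

def dispatch(fields):
    """Mirror main(): show the form unless an 'indoc' field was sent."""
    return "scour" if "indoc" in fields else "form"

payload = b"indoc=%3Csvg%2F%3E"
post_fields = read_fields(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(payload))},
    io.BytesIO(payload))
get_fields = read_fields(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": ""}, io.BytesIO(b""))

print(dispatch(post_fields), dispatch(get_fields))
```

Passing the environment and the body in as arguments, instead of reading `os.environ` and `sys.stdin` directly, keeps the helper testable without a web server.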