Commit graph

253 commits

Author SHA1 Message Date
Patrick Storz
c29ecd9304 Run tests with Python 3.7 / 3.8 2020-05-17 15:57:31 +02:00
Eduard Braun
049264eba6 Scour v0.37 2018-07-04 19:16:55 +02:00
Eduard Braun
5ccba31ff9 Update HISTORY.md 2018-07-04 19:05:25 +02:00
Patrick Storz
718748ff22
Merge pull request #199 from Ede123/newline_handling
Several improvements for handling whitespace including newlines, especially in text nodes
2018-07-03 22:56:36 +02:00
Eduard Braun
651694a6c0 Add unittests for whitespace handling in text node
Also expand/fix the test for line endings
2018-07-03 22:53:05 +02:00
Eduard Braun
703122369e Strip newlines from text nodes and be done with it
Follow the spec "blindly" as it turns out covering all the border
and getting reasonably styled output is just to cumbersome.
This way at least scour output is consistent and it also saves us
some bytes (a lot in some cases as we do not indent <tspan>s etc.
anymore)
2018-07-02 22:14:14 +02:00
Eduard Braun
2200f8dc81 temp 2018-07-02 01:05:54 +02:00
Eduard Braun
e1c2699f07 Improve whitespace handling in text content elements
SVG specifies special logic for handling whitespace, see
   https://www.w3.org/TR/SVG/text.html#WhiteSpace
by implementing it we can even shave off some unneeded bytes here
and there (e.g. consecutive spaces).

Unfortunately handling of newlines by renderers is inconsistent:
Sometimes they are replaced by a single space, sometimes they
are removed in the output.
As we can not know the expected behavior work around this by keeping
newlines inside text content elements intact.

Fixes #160.
2018-07-01 20:19:58 +02:00
Eduard Braun
7d28f5e051 Improve handling of newlines
Previously we added way to many and removed empty lines afterwards
(potentially destructive if xml:space="preserve")

Also adds proper indentation for comment nodes
2018-07-01 19:48:18 +02:00
Eduard Braun
06ea23d0e1 fix typo 2018-07-01 13:52:51 +02:00
Patrick Storz
8c95d950af
Merge pull request #192 from nthykier/gh-189-order-vs-SVGLength
Work around an exception in removeDefaultAttributeValue() caused by some rarely used filter attributes that allow an optional second value which SVGLength does not handle properly
2018-06-30 19:03:15 +02:00
Patrick Storz
5d579f8927
Also special-case baseFrequency and add 'radius 2018-06-30 18:58:36 +02:00
Eduard Braun
3c64623a12 Discontinue official support for Python 3.3
(testing failed due to wheel now requiring Python >= 3.4)

Also run flake8 in latest Python 3.6
(3.7 is not supported on Travis yet)
2018-06-29 19:29:09 +02:00
Patrick Storz
9f4a707bb7
Merge pull request #178 from nthykier/gh-163-path-rewrite
Correct handling of "m0 0" vs. "z" commands
2018-06-29 19:11:53 +02:00
Niels Thykier
8a2892b458 Avoid crashing on stdDeviation attribute
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-21 06:39:08 +00:00
Niels Thykier
c504891bd7 test: Use number-optional-number variant of kernelUnitLength
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-21 06:19:38 +00:00
Tobias Oberstein
47f918e696
Merge pull request #191 from nthykier/gh-190-optimizeTransform-IndexError
Avoid crashing on "scale(1)" (short for "scale(1, 1)")
2018-04-18 19:25:48 +02:00
Niels Thykier
18e57cddae Avoid crashing on "scale(1)" (short for "scale(1, 1)")
The scale function on the transform attribute has a short form, where
only the first argument is used.  But optimizeTransform would always
assume that there were two when checking for the identity scale.

Closes: #190
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-18 05:41:35 +00:00
Niels Thykier
a459d629c1 removeDefaultAttributeValue: Special-case order attribute
Scour tried to handle "order" attribute as a SVGLength.  However, the
"order" attribute *can* consist of two integers according to the
[SVG 1.1 Specification] and SVGLength is not designed to handle that.

With this change, we now pretend that "order" is a string, which side
steps this issue.

[SVG 1.1 Specification]: https://www.w3.org/TR/SVG11/single-page.html#filters-feConvolveMatrixElementOrderAttribute

Closes: #189
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-17 19:48:37 +00:00
Eduard Braun
8ddb7d8913 Add valid elements for 'spreadMethod' attribute
Turns out 'default_attributes_universal' is actually empty right now
so we might consider removing it altogether...
2018-04-15 18:40:06 +02:00
Eduard Braun
0ec0732447 Simplify 'default_attributes' handling a bit 2018-04-15 18:33:46 +02:00
Eduard Braun
20dcbcbe64 'default_attributes': make sure 'elements' is a list 2018-04-15 18:31:51 +02:00
Niels Thykier
1650f91ea4 Optimize removeDefaultAttributeValues
Avoid looping over DefaultAttribute(s) that are not relevant for a
given node.  This skips a lot of calls to removeDefaultAttributeValue
but more importantly, it avoids "node.nodeName not in attribute.elements"
line in removeDefaultAttributeValue.  As attribute.elements is a list, this
becomes expensive for "larger lists" (or in this case when there are a lot
of attributes).

This seems to remove about 1½-2 minutes of runtime (out of ~8) on the
1_42_polytope_7-cube.svg test case provided in #184.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-15 18:29:58 +02:00
Niels Thykier
5dc1b7a820 scour: Make optimized default_attribute data structures
There are a lot of "DefaultAttribute"s and for a given tag, most of
the "DefaultAttribute"s are not applicable.  Therefore, we create two
data structures to assist us with only dealing with the attributes
that matter.

Here there are two cases:

 * Those that always matter.  These go into
   default_attributes_unrestricted list.
 * Those that matter only based on the node name.  These go into the
   default_attributes_restricted_by_tag with the node name as key
   (with the value being a list of matching attributes).

In the next commit, we will use those for optimizing the removal of
default attributes.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-15 18:29:58 +02:00
Eduard Braun
3283d6d5ec Simplify control point detection logic
- make controlPoints() return a consistent type like flags()
- rename the ambiguous "reduce_precision" to "is_control_point"
2018-04-08 16:48:33 +02:00
Eduard Braun
103dcc0a48
Fix handling of boolean flags in elliptical path commands (#183)
* properly parse paths without space after boolean flags (fixes #161)
* omit space after boolean flag to shave off a few bytes when not using renderer workarounds
2018-04-08 15:32:47 +02:00
Niels Thykier
ba7f4b5f18 Remove more redundant uses of .keys()
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-26 22:36:19 +02:00
Niels Thykier
f8d5af0e56 Remove now unused variable
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-26 22:36:19 +02:00
Eduard Braun
d508f59aa6 Completely remove "walltime" variable and use time.time() directly 2018-03-26 22:34:11 +02:00
Niels Thykier
b622642aa1 Simplify timer selection to always use time.time() (#175)
In python2.7 and python3.3, time.time() is sufficient accurate for our
purpose and avoids going through hoops to select the best available
function.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-26 22:30:25 +02:00
Niels Thykier
38274f75bc Implement a basic rewrite of redundant commands
This basic implementation can drop and rewrite some cases of "m0 0"
and "z" without triggering the issues experienced in #163.  It works
by analysing the path backwards and tracking "z" and "m" commands.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-11 08:33:50 +00:00
Niels Thykier
a2c94c96fb Disable the "m0 0"-optimization as it is wrong in some cases
The "m0 0" rewrite gets some cases wrong, like:

         m150 240h200m0 0 150 150v-300z

Scour rewrote that into the following
         m150 240h200l150 150v-300z

However, these two paths do not produce an identical figure at all.
The first is a line followed by a triangle while the second is a
quadrilateral.

While there are some instances we can rewrite (that scour will no
longer rewrite), these will require an analysis over multiple commands
to determine whether the rewrite is safe.  This will reappear in the
next commit.

Closes: #163
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-11 08:25:46 +00:00
Niels Thykier
6ea126d290 Gracefully handle unreferenced gradients with --keep-unreferenced-defs (#173)
Closes: #156
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-10 16:06:50 +01:00
Niels Thykier
cae0faefa0 Avoid O(n²) in removeDuplicateGradient (#171)
The original implementation of removeDuplicateGradient does O(n²)
search over all gradients to remove duplicates.  In images with many
gradients (such as [MediaWiki_logo_1.svg]), this becomes a significant
overhead as that logo has over 900 duplicated gradients.

We solve this by creating a key for each gradient based on the
attributes we use for duplication detection.  This key is generated
such that if two gradients have the same key, they are duplicates (for
our purpose) and the keys are different then the gradients are
guaranteed to be different as well.  With such a key, we can rely on a
dict to handle the duplication detection (which it does very well).

This change improves the runtime performance on [MediaWiki_logo_1.svg]
by about 25% (8m51s -> 1m56s on 5 runs).

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-10 15:47:29 +01:00
Niels Thykier
7115e82cb8 Fix typo in HISTORY.md 2018-03-10 13:36:49 +01:00
Niels Thykier
0776d32179 Remove an unnecessary loop
The unprotected_ids function returns all unprotected ids and
removeUnreferencedIDs removes all of them that does not appear in the
return value of findReferencedElements.

On closer observation it turns out that removeUnreferencedIDs cannot
cause nodes/IDs to become unprotected nor unreferenced (as it only
remove the "id" attribute, not the node).  With this in mind, we can
just remove the loop and save a call to all of these functions.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-20 19:46:49 +01:00
Eduard Braun
5360db86d9 Fix the last instance of "list(dict.keys())" 2018-02-17 15:59:18 +01:00
Niels Thykier
2f0b3ea362 Rewrite redundant codepatterns introduced by py2 -> py3 conversion
The automated python2 -> python3 converter creates some suboptimal
code patterns in some cases, notably in its handling of dicts.

This commit handles the following cases:

  * "if x in list(y.keys()):" => "if x in y:"

    The original code is neuters the O(1) lookup effeciency of a dict
    by turning it into a list.  This occurs a O(n) in converting it to
    a list and then another O(n) for the lookup.  When done in a loop,
    this becomes O(n * m) rather than the optimal O(m).

  * "for x in list(y.keys()):" => "for x in y:" OR "for x in list(y):"

    A dict (y in these cases) operates as an iterator over keys in the
    dict by default.  This makes the entire "list(y.keys())" dance
    redundant _in most cases_.  In a some cases, scour modifies the
    dict while iterating over it and in those cases, we need a
    "list(y)" (but not a "y.keys()").

    The benefit of this differs between python2 and python3.  In
    python3, we basically "only" avoid function call.  In python2,
    y.keys() generates a list, so here we avoid generating a
    "throw-away list".

The test suite succeed both with "python testscour.py" and "python3
testscour.py" (used 2.7.14+ and 3.6.4 from Debian testing).

On a 341kB flame-graph generated by "nytprof" (a perl profiler), this
commit changes the runtimes of scour from the range 3.39s - 3.45s to
3.27s - 3.35s making it roughly 3% faster in this case (YMMV,
particularly with different input).  The timings were recorded using
the following command line:

  time PYTHONPATH=. python3 -m scour.scour --enable-id-stripping  \
     --shorten-ids --indent=none  --enable-comment-stripping
     -i input.svg -o output.svg

This was used 5 times with and 5 times without the patch picking the
worst and best time to define the range.  The runtime test was only
preformed on python3.

All changed lines where found with:
  grep -rE ' in list[(].*[.]keys[(][)][)]:'

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 15:55:40 +01:00
Eduard Braun
cb093e9171 Update docstring of findReferencedElements
(to match changes in c54a7239e7)
2018-02-17 14:27:19 +01:00
Niels Thykier
c54a7239e7 Simplify the "ids" structure returned by findReferencedElements
It was a dict with a two element list a la:

  {
    "id1": [len(nodeListX), nodeListX]],
    "id2": [len(nodeListY), nodeListY]],
    ...
  }

This can trivially be simplified to:

  {
    "id1": nodeListX,
    "id2": nodeListY,
    ...
  }

The two call-sites that actually needs the length (e.g. to sort by how
often the id is used) can trivially compute that via a call to "len".

All other call sites either just need to tell if an ID is used at all
or work the nodes referencing the id (e.g. to remap the id).  The
former are unaffected by this change and the latter can now avoid a
layer of indirection.

This refactoring has negiable changes to the runtime and probably also
to memory (not tested, but it is a minor constant improvement per
referenced id).

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 14:11:51 +01:00
Niels Thykier
b916a189e9 Avoid recomputing findReferencedElements in removeUnusedDefs
The removeUnusedDefs function does not actually remove anything (that
is left for its callers to do).  This implies that
findReferencedElements will return the same value before, during and
after a call to removeUnusedDefs.  Therefore, we can reuse the value
from findReferencedElements when recursing into child nodes.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 14:11:51 +01:00
Niels Thykier
633b381d87 findReferencedElements: Handle referencingProps separately
Split the handling of referencingProps into a separate loop that calls
findReferencingProperty directly.  This saves a bunch of "make list,
join list, append to another list and eventually split text into two
elements" operations.

This gives approximately 10% faster runtimes on 341 kB flamegraph
generated by the "nytprof" Perl profiler.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 14:11:51 +01:00
Eduard Braun
7249ae8b0a user clearer variable names for indent type and indent depth 2018-02-17 11:59:35 +01:00
Niels Thykier
843706be39 Catch specific exception rather than anything
The bare "except" also catches exceptions like "NameError" and
"SystemExit", which we really should not catch.  In scour.py, use the
most specific exception (NotFoundErr) and in the tests just catch any
"regular" exception.

Reported by flake8.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 11:47:36 +01:00
Niels Thykier
f3d8936b5e Rename "I" to "line_prefix" to avoid flake8 E741
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 11:47:36 +01:00
Michael Witten
b20a0698cc tests: Add unit tests for preservation of quotes in CSS styles
These tests will ensure that issues #21 and #56 do not return.
2017-09-03 18:07:27 +02:00
Michael Witten
0a146b7fef tests: Add unit tests for the escaping of quote characters in attribute values 2017-09-03 18:07:27 +02:00
Michael Witten
7e14cd352f scour.py: Escape quote characters in attribute values, as necessary and minimally
Either double quotes or single quotes are escaped; the choice is made
so as to minimize the length of the escaped string.
2017-09-03 18:07:27 +02:00
Michael Witten
f14784b01f scour.py: handle id' and xml:id' in the same code that handles other attributes 2017-08-26 19:11:37 +00:00
Michael Witten
fef2786c5e scour.py: minor rearrangement for the sake of clarity
There has been a minor rearrangement of the code that handles the children
of the element being serialized: The relevant `if' statement has had its
condition effectively negated and thus has also had its consequent and
alternative swapped; now, there is a very short consequent, followed by a
very long alternative, rather than a very long consequent followed by a
very short alternative.
2017-08-26 19:11:37 +00:00