Rewrite the code for ordering attributes in the output and extract it
into a function. As a side-effect, we ensure we only use the
`.item(index)` method once per attribute because it is inefficient
(see https://bugs.python.org/issue40689).
Signed-off-by: Niels Thykier <niels@thykier.net>
There is no need to create a list of it only to discard it after a
single use with join (which gladly accepts an iterator/generator
instead).
Signed-off-by: Niels Thykier <niels@thykier.net>
This rename makes py.test/py.test-3 find the test suite out of the
box. Example command lines:
# Running the test suite (optionally include "-v")
$ py.test-3
# Running the test suite with coverage enabled (and branch
# coverage).
$ py.test-3 --cov=scour --cov-report=html --cov-branch
Signed-off-by: Niels Thykier <niels@thykier.net>
In some cases, gnuplot generates a very suboptimal SVG content of the
following pattern:
<g color="black" fill="none" stroke="currentColor">
<path d="m82.5 323.3v-4.1" stroke="#000"/>
</g>
<g color="black" fill="none" stroke="currentColor">
<path d="m116.4 323.3v-4.1" stroke="#000"/>
</g>
... repeated 10+ more times here ...
<g color="black" fill="none" stroke="currentColor">
<path d="m65.4 72.8v250.5h420v-250.5h-420z" stroke="#000"/>
</g>
A more optimal pattern would be:
<g color="black" fill="none" stroke="#000">
<path d="m82.5 323.3v-4.1"/>
<path d="m116.4 323.3v-4.1"/>
... 10+ more paths here ...
<path d="m65.4 72.8v250.5h420v-250.5h-420z"/>
</g>
This patch enables that optimization by handling the merging of two
sibling <g> entries that have identical attributes. In the above
example that does not solve the rewrite from "currentColor" to "#000"
for the stroke attribute. However, the existing code already handles
that automatically after the <g> elements have been merged.
This change provides comparable results to --create-groups as shown by
the following diagram while being a distinct optimization:
+----------------------------+-------+--------+
| Test | Size | in % |
+----------------------------+-------+--------+
| baseline | 17961 | 100% |
| baseline + --create-groups | 17418 | 97.0% |
| patched | 16939 | 94.3% |
| patched + --create-groups | 16855 | 93.8% |
+----------------------------+-------+--------+
The image used in the size table above was generated based on the
instructions from https://bugs.debian.org/858039#10 with gnuplot 5.2
patchlevel 2. Beyond the test-based "--create-groups", the following
scour command-line parameters were used:
--enable-id-stripping --enable-comment-stripping \
--shorten-ids --indent=none
Note that the baseline was scour'ed repeatedly to stablize the image
size.
Signed-off-by: Niels Thykier <niels@thykier.net>
Follow the spec "blindly" as it turns out covering all the border
and getting reasonably styled output is just to cumbersome.
This way at least scour output is consistent and it also saves us
some bytes (a lot in some cases as we do not indent <tspan>s etc.
anymore)
SVG specifies special logic for handling whitespace, see
https://www.w3.org/TR/SVG/text.html#WhiteSpace
by implementing it we can even shave off some unneeded bytes here
and there (e.g. consecutive spaces).
Unfortunately handling of newlines by renderers is inconsistent:
Sometimes they are replaced by a single space, sometimes they
are removed in the output.
As we can not know the expected behavior work around this by keeping
newlines inside text content elements intact.
Fixes#160.
Previously we added way to many and removed empty lines afterwards
(potentially destructive if xml:space="preserve")
Also adds proper indentation for comment nodes
Work around an exception in removeDefaultAttributeValue() caused by some rarely used filter attributes that allow an optional second value which SVGLength does not handle properly
The scale function on the transform attribute has a short form, where
only the first argument is used. But optimizeTransform would always
assume that there were two when checking for the identity scale.
Closes: #190
Signed-off-by: Niels Thykier <niels@thykier.net>
Scour tried to handle "order" attribute as a SVGLength. However, the
"order" attribute *can* consist of two integers according to the
[SVG 1.1 Specification] and SVGLength is not designed to handle that.
With this change, we now pretend that "order" is a string, which side
steps this issue.
[SVG 1.1 Specification]: https://www.w3.org/TR/SVG11/single-page.html#filters-feConvolveMatrixElementOrderAttributeCloses: #189
Signed-off-by: Niels Thykier <niels@thykier.net>
This change pushes the responsibility of updating referencedIDs to its
callers where needed. The only caller of renameIDs is shortenIDs and
that works perfectly fine without updating its copy of referencedIDs.
In shortenIDs, we need to be able to lookup which nodes referenced the
"original ID" (and not the "new ID").
While shortenIDs *could* update referencedIDs so it remained valid, it
is extra complexity for no gain. As an example of this complexity,
imagine if two or more IDs are "rotated" like so:
Original IDs: a, bb, ccc, dddd
Mapping:
dddd -> ccc
ccc -> bb
bb -> a
a -> dddd
While doable within reasonable performance, we do not need to support
it at the moment, so there is no reason to handle that complexity.
Signed-off-by: Niels Thykier <niels@thykier.net>
With the current code, scour could do a pointless remap of an ID,
where there is no benefit in it. Consider:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<rect id="a" width="80" height="50" fill="red"/>
<rect id="b" width="80" height="50" fill="blue"/>
</defs>
<use xlink:href="#a"/>
<use xlink:href="#b"/>
<use xlink:href="#b"/>
</svg>
```
In this example, there is no point in swapping the IDs - even if "#b"
is used more often than "#a", they have the same length. Besides a
performance win on an already scour'ed image, it also mean scour will
behave like a function with a fixed-point (i.e. scour eventually stops
altering the image).
To solve this, we no longer check whether an we find exactly the same
ID. Instead, we look at the length of the new ID compared to the
original. This gives us a slight complication as we can now "reserve"
a "future" ID to avoid the rename.
Thanks to Eduard "Ede_123" Braun for providing the test case.
Signed-off-by: Niels Thykier <niels@thykier.net>
Avoid looping over DefaultAttribute(s) that are not relevant for a
given node. This skips a lot of calls to removeDefaultAttributeValue
but more importantly, it avoids "node.nodeName not in attribute.elements"
line in removeDefaultAttributeValue. As attribute.elements is a list, this
becomes expensive for "larger lists" (or in this case when there are a lot
of attributes).
This seems to remove about 1½-2 minutes of runtime (out of ~8) on the
1_42_polytope_7-cube.svg test case provided in #184.
Signed-off-by: Niels Thykier <niels@thykier.net>
There are a lot of "DefaultAttribute"s and for a given tag, most of
the "DefaultAttribute"s are not applicable. Therefore, we create two
data structures to assist us with only dealing with the attributes
that matter.
Here there are two cases:
* Those that always matter. These go into
default_attributes_unrestricted list.
* Those that matter only based on the node name. These go into the
default_attributes_restricted_by_tag with the node name as key
(with the value being a list of matching attributes).
In the next commit, we will use those for optimizing the removal of
default attributes.
Signed-off-by: Niels Thykier <niels@thykier.net>
This patch enables shortenIDs to remap IDs currently in use. This is
very helpful to ensure that scour does not change been "optimal" and
"suboptimal" choices for IDs as observed in GH#186.
Closes: #186
Signed-off-by: Niels Thykier <niels@thykier.net>
* properly parse paths without space after boolean flags (fixes#161)
* omit space after boolean flag to shave off a few bytes when not using renderer workarounds
In python2.7 and python3.3, time.time() is sufficient accurate for our
purpose and avoids going through hoops to select the best available
function.
Signed-off-by: Niels Thykier <niels@thykier.net>
This basic implementation can drop and rewrite some cases of "m0 0"
and "z" without triggering the issues experienced in #163. It works
by analysing the path backwards and tracking "z" and "m" commands.
Signed-off-by: Niels Thykier <niels@thykier.net>