Preserve prefix of attribute names when copying them over to the new
node. This fixes an unintentional rewrite of `xml:space` to `space`
that also caused scour to strip whitespace that should have been
preserved.
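
A minimal sketch of the idea, using `xml.dom.minidom` from the standard
library. The helper name `copy_attributes` is hypothetical and is not
taken from scour; it only illustrates copying by qualified name rather
than by local name:

```python
from xml.dom import minidom


def copy_attributes(src, dst):
    """Copy all attributes from src to dst, keeping any namespace prefix."""
    attrs = src.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        # nodeName is the qualified name ("xml:space"); localName would
        # drop the prefix and yield just "space".
        dst.setAttribute(attr.nodeName, attr.nodeValue)


doc = minidom.parseString(
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text xml:space="preserve"> a  b </text></svg>'
)
old_node = doc.getElementsByTagName("text")[0]
new_node = doc.createElement("text")
copy_attributes(old_node, new_node)
```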
Closes: #239
Signed-off-by: Niels Thykier <niels@thykier.net>
This avoids calling `findReferencedElements` more than once per call to
`removeDuplicateGradients`. This is good for performance, as
`findReferencedElements` is one of the slowest functions in scour.
Signed-off-by: Niels Thykier <niels@thykier.net>
Except for one caller, nothing cares what kind of collection is used.
By migrating to a set, we can enable a future rewrite.
Signed-off-by: Niels Thykier <niels@thykier.net>
Regex compilation is by far the most expensive part of
removeDuplicateGradients. This commit reduces the pain a bit by
trading "many small regexes" for "few larger regexes", which avoids some
of the compilation overhead.
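
As a rough illustration of that trade (not the actual patterns used in
scour), the sketch below compiles one alternation over all IDs instead
of one regex per ID. The names `build_id_pattern` and `rewrite_refs`,
and the simplified `url(#id)` syntax, are assumptions made for this
example:

```python
import re


def build_id_pattern(ids):
    """Compile a single regex matching url(#id) for any of the given IDs."""
    alternation = "|".join(re.escape(i) for i in ids)
    return re.compile(r"url\(\s*#(" + alternation + r")\s*\)")


def rewrite_refs(value, replacements):
    """Rewrite every url(#old) in value to url(#new) using one compiled regex."""
    pattern = build_id_pattern(replacements)
    return pattern.sub(
        lambda m: "url(#" + replacements[m.group(1)] + ")", value
    )


style = "fill:url(#g2);stroke:url(#g10)"
rewritten = rewrite_refs(style, {"g2": "g1", "g10": "g1"})
```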
Signed-off-by: Niels Thykier <niels@thykier.net>
This commit is mostly to enable the following commit to make
improvements. It does reduce the number of duplicate getAttribute
calls by a tiny bit but it is unlikely to matter in practice.
Signed-off-by: Niels Thykier <niels@thykier.net>
_getStyle accounted for ~8.9% (~17700) of all calls to getAttribute on
the devices/hidef/secure-card.svgz file from the Oxygen icon theme. This
commit removes this part of the dead weight.
Signed-off-by: Niels Thykier <niels@thykier.net>
The `removeUnusedDefs` function can take `referencedIDs` as a parameter
and its work does not invalidate it. By moving it up in
`removeUnreferencedElements` we can save a call to
`findReferencedElements` per call to `removeUnreferencedElements`.
Signed-off-by: Niels Thykier <niels@thykier.net>
Rewrite the code for ordering attributes in the output and extract it
into a function. As a side-effect, we ensure we only use the
`.item(index)` method once per attribute because it is inefficient
(see https://bugs.python.org/issue40689).
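
A hedged sketch of what such a helper could look like. The function
name `ordered_attributes` and the `PREFERRED_ORDER` list are made up
for illustration and do not reflect scour's real ordering; the point is
only that each attribute is fetched via `.item(index)` exactly once:

```python
from xml.dom import minidom

# Hypothetical preferred order; unknown attributes sort after these.
PREFERRED_ORDER = ["id", "x", "y", "width", "height"]


def ordered_attributes(element):
    """Return the element's attributes sorted by preferred order, then name."""
    attrs = element.attributes
    # Fetch every attribute node once and keep it in a plain list.
    items = [attrs.item(i) for i in range(attrs.length)]
    rank = {name: pos for pos, name in enumerate(PREFERRED_ORDER)}
    return sorted(
        items, key=lambda a: (rank.get(a.nodeName, len(rank)), a.nodeName)
    )


doc = minidom.parseString('<rect y="2" fill="red" width="3" id="r" x="1"/>')
names = [a.nodeName for a in ordered_attributes(doc.documentElement)]
```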
Signed-off-by: Niels Thykier <niels@thykier.net>
There is no need to create a list only to discard it after a single
use with join (which gladly accepts an iterator/generator instead).
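
For illustration only, with a hypothetical helper name
`serialize_attributes`, passing a generator expression straight to
`str.join` looks like this:

```python
def serialize_attributes(pairs):
    """Serialize (name, value) pairs without building an intermediate list."""
    return "".join(' %s="%s"' % (name, value) for name, value in pairs)


serialized = serialize_attributes([("x", "1"), ("y", "2")])
```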
Signed-off-by: Niels Thykier <niels@thykier.net>
This rename makes py.test/py.test-3 find the test suite out of the
box. Example command lines:
# Running the test suite (optionally include "-v")
$ py.test-3
# Running the test suite with coverage enabled (and branch
# coverage).
$ py.test-3 --cov=scour --cov-report=html --cov-branch
Signed-off-by: Niels Thykier <niels@thykier.net>
In some cases, gnuplot generates very suboptimal SVG content of the
following pattern:
<g color="black" fill="none" stroke="currentColor">
<path d="m82.5 323.3v-4.1" stroke="#000"/>
</g>
<g color="black" fill="none" stroke="currentColor">
<path d="m116.4 323.3v-4.1" stroke="#000"/>
</g>
... repeated 10+ more times here ...
<g color="black" fill="none" stroke="currentColor">
<path d="m65.4 72.8v250.5h420v-250.5h-420z" stroke="#000"/>
</g>
A more optimal pattern would be:
<g color="black" fill="none" stroke="#000">
<path d="m82.5 323.3v-4.1"/>
<path d="m116.4 323.3v-4.1"/>
... 10+ more paths here ...
<path d="m65.4 72.8v250.5h420v-250.5h-420z"/>
</g>
This patch enables that optimization by merging sibling <g> elements
that have identical attributes. In the above example, the merge alone
does not rewrite the stroke attribute from "currentColor" to "#000".
However, the existing code already handles that automatically after
the <g> elements have been merged.
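
The merge step can be sketched roughly as follows with
`xml.dom.minidom`. This is a simplified illustration, not the code from
this patch: the helper names are hypothetical, and it only merges <g>
elements that are directly adjacent (no text nodes in between):

```python
from xml.dom import minidom


def is_group(node):
    """True if node is a <g> element."""
    return (
        node is not None
        and node.nodeType == node.ELEMENT_NODE
        and node.tagName == "g"
    )


def attributes_of(element):
    """Return the element's attributes as a plain dict."""
    attrs = element.attributes
    items = [attrs.item(i) for i in range(attrs.length)]
    return {a.nodeName: a.nodeValue for a in items}


def merge_sibling_groups(parent):
    """Merge adjacent <g> children with identical attributes; return count."""
    merged = 0
    node = parent.firstChild
    while node is not None:
        nxt = node.nextSibling
        if (
            is_group(node)
            and is_group(nxt)
            and attributes_of(node) == attributes_of(nxt)
        ):
            # Move all children of the second group into the first one.
            while nxt.firstChild is not None:
                node.appendChild(nxt.firstChild)
            parent.removeChild(nxt)
            merged += 1
            continue  # Re-check node against its new next sibling.
        node = nxt
    return merged


doc = minidom.parseString(
    '<svg><g stroke="#000"><path d="a"/></g>'
    '<g stroke="#000"><path d="b"/></g>'
    '<g stroke="#000"><path d="c"/></g>'
    '<g fill="red"><path d="e"/></g></svg>'
)
merge_count = merge_sibling_groups(doc.documentElement)
```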
This change provides comparable results to --create-groups as shown by
the following table while being a distinct optimization:
+----------------------------+-------+--------+
| Test                       | Size  | in %   |
+----------------------------+-------+--------+
| baseline                   | 17961 | 100%   |
| baseline + --create-groups | 17418 | 97.0%  |
| patched                    | 16939 | 94.3%  |
| patched + --create-groups  | 16855 | 93.8%  |
+----------------------------+-------+--------+
The image used in the size table above was generated based on the
instructions from https://bugs.debian.org/858039#10 with gnuplot 5.2
patchlevel 2. Besides "--create-groups", which varies per test, the
following scour command-line parameters were used:
--enable-id-stripping --enable-comment-stripping \
--shorten-ids --indent=none
Note that the baseline was scour'ed repeatedly to stabilize the image
size.
Signed-off-by: Niels Thykier <niels@thykier.net>
Follow the spec "blindly", as it turns out that covering all the border
cases and getting reasonably styled output is just too cumbersome.
This way, at least scour's output is consistent, and it also saves us
some bytes (a lot in some cases, as we do not indent <tspan>s etc.
anymore).
SVG specifies special logic for handling whitespace, see
https://www.w3.org/TR/SVG/text.html#WhiteSpace
By implementing it, we can even shave off some unneeded bytes here
and there (e.g. consecutive spaces).
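
As a sketch of the rules the SVG 1.1 spec gives for the default
whitespace mode (xml:space="default"), and not necessarily what scour
itself does with newlines, the normalization could look like this. The
function name `normalize_default_whitespace` is hypothetical:

```python
import re


def normalize_default_whitespace(text):
    """Apply the SVG 1.1 xml:space="default" rules to a text string."""
    text = text.replace("\r", "").replace("\n", "")  # 1. drop newlines
    text = text.replace("\t", " ")                   # 2. tabs become spaces
    text = text.strip(" ")                           # 3. trim both ends
    return re.sub(r" {2,}", " ", text)               # 4. collapse space runs


normalized = normalize_default_whitespace("  a \t b\n c  ")
```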
Unfortunately handling of newlines by renderers is inconsistent:
Sometimes they are replaced by a single space, sometimes they
are removed in the output.
As we cannot know the expected behavior, work around this by keeping
newlines inside text content elements intact.
Fixes #160.
Previously we added way too many and removed empty lines afterwards
(potentially destructive if xml:space="preserve").
Also adds proper indentation for comment nodes.
Work around an exception in removeDefaultAttributeValue() caused by
some rarely used filter attributes that allow an optional second value,
which SVGLength does not handle properly.