In some cases, gnuplot generates a very suboptimal SVG content of the
following pattern:
<g color="black" fill="none" stroke="currentColor">
<path d="m82.5 323.3v-4.1" stroke="#000"/>
</g>
<g color="black" fill="none" stroke="currentColor">
<path d="m116.4 323.3v-4.1" stroke="#000"/>
</g>
... repeated 10+ more times here ...
<g color="black" fill="none" stroke="currentColor">
<path d="m65.4 72.8v250.5h420v-250.5h-420z" stroke="#000"/>
</g>
A more optimal pattern would be:
<g color="black" fill="none" stroke="#000">
<path d="m82.5 323.3v-4.1"/>
<path d="m116.4 323.3v-4.1"/>
... 10+ more paths here ...
<path d="m65.4 72.8v250.5h420v-250.5h-420z"/>
</g>
This patch enables that optimization by handling the merging of two
sibling <g> entries that have identical attributes. In the above
example that does not solve the rewrite from "currentColor" to "#000"
for the stroke attribute. However, the existing code already handles
that automatically after the <g> elements have been merged.
This change provides comparable results to --create-groups as shown by
the following diagram while being a distinct optimization:
+----------------------------+-------+--------+
| Test | Size | in % |
+----------------------------+-------+--------+
| baseline | 17961 | 100% |
| baseline + --create-groups | 17418 | 97.0% |
| patched | 16939 | 94.3% |
| patched + --create-groups | 16855 | 93.8% |
+----------------------------+-------+--------+
The image used in the size table above was generated based on the
instructions from https://bugs.debian.org/858039#10 with gnuplot 5.2
patchlevel 2. Beyond the test-based "--create-groups", the following
scour command-line parameters were used:
--enable-id-stripping --enable-comment-stripping \
--shorten-ids --indent=none
Note that the baseline was scour'ed repeatedly to stablize the image
size.
Signed-off-by: Niels Thykier <niels@thykier.net>
Work around an exception in removeDefaultAttributeValue() caused by some rarely used filter attributes that allow an optional second value which SVGLength does not handle properly
The scale function on the transform attribute has a short form, where
only the first argument is used. But optimizeTransform would always
assume that there were two when checking for the identity scale.
Closes: #190
Signed-off-by: Niels Thykier <niels@thykier.net>
Scour tried to handle "order" attribute as a SVGLength. However, the
"order" attribute *can* consist of two integers according to the
[SVG 1.1 Specification] and SVGLength is not designed to handle that.
With this change, we now pretend that "order" is a string, which side
steps this issue.
[SVG 1.1 Specification]: https://www.w3.org/TR/SVG11/single-page.html#filters-feConvolveMatrixElementOrderAttributeCloses: #189
Signed-off-by: Niels Thykier <niels@thykier.net>
With the current code, scour could do a pointless remap of an ID,
where there is no benefit in it. Consider:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<rect id="a" width="80" height="50" fill="red"/>
<rect id="b" width="80" height="50" fill="blue"/>
</defs>
<use xlink:href="#a"/>
<use xlink:href="#b"/>
<use xlink:href="#b"/>
</svg>
```
In this example, there is no point in swapping the IDs - even if "#b"
is used more often than "#a", they have the same length. Besides a
performance win on an already scour'ed image, it also mean scour will
behave like a function with a fixed-point (i.e. scour eventually stops
altering the image).
To solve this, we no longer check whether an we find exactly the same
ID. Instead, we look at the length of the new ID compared to the
original. This gives us a slight complication as we can now "reserve"
a "future" ID to avoid the rename.
Thanks to Eduard "Ede_123" Braun for providing the test case.
Signed-off-by: Niels Thykier <niels@thykier.net>
* properly parse paths without space after boolean flags (fixes#161)
* omit space after boolean flag to shave off a few bytes when not using renderer workarounds
This basic implementation can drop and rewrite some cases of "m0 0"
and "z" without triggering the issues experienced in #163. It works
by analysing the path backwards and tracking "z" and "m" commands.
Signed-off-by: Niels Thykier <niels@thykier.net>
The "m0 0" rewrite gets some cases wrong, like:
m150 240h200m0 0 150 150v-300z
Scour rewrote that into the following
m150 240h200l150 150v-300z
However, these two paths do not produce an identical figure at all.
The first is a line followed by a triangle while the second is a
quadrilateral.
While there are some instances we can rewrite (that scour will no
longer rewrite), these will require an analysis over multiple commands
to determine whether the rewrite is safe. This will reappear in the
next commit.
Closes: #163
Signed-off-by: Niels Thykier <niels@thykier.net>
The bare "except" also catches exceptions like "NameError" and
"SystemExit", which we really should not catch. In scour.py, use the
most specific exception (NotFoundErr) and in the tests just catch any
"regular" exception.
Reported by flake8.
Signed-off-by: Niels Thykier <niels@thykier.net>
* Do not collapse straight path segments in paths that have intermediate markers (see #145). The intermediate nodes might be unnecessary for the shape of the path, but their markers would be lost.
* Collapse subpaths of moveto `m` and lineto `l` commands if they have the same direction (before we only collapsed horizontal/vertical `h`/`v` lineto commands)
* Attempt to collapse lineto `l` commands into a preceding moveto `m` command (these are then called "implicit lineto commands")
* Preserve empty path segments if they have `stroke-linecap` set to `round` or `square`. They render no visible line but a tiny dot or square.
When the preceeding path segment is a Bézier curve, too, the first control point of the shorthand defaults to the mirrored version of the second control point of this preceeding path segment. Scour always assumed (0,0) as the control point in this case which could result in modified path data (e.g. #91).
For example for `orient="auto"` SVGLength() returns (value=0, units=Unit.INVALID); since the default value for `orient` is zero it was removed as there was check for a valid unit.
- In text nodes quotes are fine
- In attributes quotes are fine if used reciprocally.
Escaping in the latter case often causes issues, e.g. with quoted font names (#21) or inline CSS styles (#56), while it probably does not gain anything (if quotes are wrongly used in attribute names the XML is most likely invalid to start with)