Commit graph

232 commits

Author SHA1 Message Date
Niels Thykier
e25b0dae73 Remove a (now) unused parameter to renameID
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-15 17:36:07 +00:00
Niels Thykier
91503c6d7e renameID: Replace referencedIDs with referringNodes
This change pushes the responsibility of updating referencedIDs to its
callers where needed.  The only caller of renameIDs is shortenIDs and
that works perfectly fine without updating its copy of referencedIDs.
In shortenIDs, we need to be able to lookup which nodes referenced the
"original ID" (and not the "new ID").

While shortenIDs *could* update referencedIDs so it remained valid, it
is extra complexity for no gain.  As an example of this complexity,
imagine if two or more IDs are "rotated" like so:

      Original IDs: a, bb, ccc, dddd
      Mapping:
        dddd -> ccc
        ccc  -> bb
        bb   -> a
        a    -> dddd

While doable within reasonable performance, we do not need to support
it at the moment, so there is no reason to handle that complexity.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-15 17:35:05 +00:00
Niels Thykier
d6406a3470 shortenIDs: Avoid pointless renames of IDs
With the current code, scour could do a pointless remap of an ID,
where there is no benefit in it.  Consider:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
 <defs>
  <rect id="a" width="80" height="50" fill="red"/>
  <rect id="b" width="80" height="50" fill="blue"/>
 </defs>
 <use xlink:href="#a"/>
 <use xlink:href="#b"/>
 <use xlink:href="#b"/>
</svg>
```

In this example, there is no point in swapping the IDs - even if "#b"
is used more often than "#a", they have the same length.  Besides a
performance win on an already scour'ed image, it also mean scour will
behave like a function with a fixed-point (i.e. scour eventually stops
altering the image).

To solve this, we no longer check whether an we find exactly the same
ID.  Instead, we look at the length of the new ID compared to the
original.  This gives us a slight complication as we can now "reserve"
a "future" ID to avoid the rename.

Thanks to Eduard "Ede_123" Braun for providing the test case.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-15 17:34:24 +00:00
Niels Thykier
00cf42b554 Rename function to match DEP8 conventions
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-15 16:22:00 +00:00
Niels Thykier
0254014e06 Enable shortenIDs to recycle existing IDs
This patch enables shortenIDs to remap IDs currently in use.  This is
very helpful to ensure that scour does not change been "optimal" and
"suboptimal" choices for IDs as observed in GH#186.

Closes: #186
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-04-13 21:00:35 +00:00
Eduard Braun
3283d6d5ec Simplify control point detection logic
- make controlPoints() return a consistent type like flags()
- rename the ambiguous "reduce_precision" to "is_control_point"
2018-04-08 16:48:33 +02:00
Eduard Braun
103dcc0a48
Fix handling of boolean flags in elliptical path commands (#183)
* properly parse paths without space after boolean flags (fixes #161)
* omit space after boolean flag to shave off a few bytes when not using renderer workarounds
2018-04-08 15:32:47 +02:00
Niels Thykier
ba7f4b5f18 Remove more redundant uses of .keys()
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-26 22:36:19 +02:00
Niels Thykier
f8d5af0e56 Remove now unused variable
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-26 22:36:19 +02:00
Eduard Braun
d508f59aa6 Completely remove "walltime" variable and use time.time() directly 2018-03-26 22:34:11 +02:00
Niels Thykier
b622642aa1 Simplify timer selection to always use time.time() (#175)
In python2.7 and python3.3, time.time() is sufficient accurate for our
purpose and avoids going through hoops to select the best available
function.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-26 22:30:25 +02:00
Niels Thykier
6ea126d290 Gracefully handle unreferenced gradients with --keep-unreferenced-defs (#173)
Closes: #156
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-10 16:06:50 +01:00
Niels Thykier
cae0faefa0 Avoid O(n²) in removeDuplicateGradient (#171)
The original implementation of removeDuplicateGradient does O(n²)
search over all gradients to remove duplicates.  In images with many
gradients (such as [MediaWiki_logo_1.svg]), this becomes a significant
overhead as that logo has over 900 duplicated gradients.

We solve this by creating a key for each gradient based on the
attributes we use for duplication detection.  This key is generated
such that if two gradients have the same key, they are duplicates (for
our purpose) and the keys are different then the gradients are
guaranteed to be different as well.  With such a key, we can rely on a
dict to handle the duplication detection (which it does very well).

This change improves the runtime performance on [MediaWiki_logo_1.svg]
by about 25% (8m51s -> 1m56s on 5 runs).

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-03-10 15:47:29 +01:00
Niels Thykier
7115e82cb8 Fix typo in HISTORY.md 2018-03-10 13:36:49 +01:00
Niels Thykier
0776d32179 Remove an unnecessary loop
The unprotected_ids function returns all unprotected ids and
removeUnreferencedIDs removes all of them that does not appear in the
return value of findReferencedElements.

On closer observation it turns out that removeUnreferencedIDs cannot
cause nodes/IDs to become unprotected nor unreferenced (as it only
remove the "id" attribute, not the node).  With this in mind, we can
just remove the loop and save a call to all of these functions.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-20 19:46:49 +01:00
Eduard Braun
5360db86d9 Fix the last instance of "list(dict.keys())" 2018-02-17 15:59:18 +01:00
Niels Thykier
2f0b3ea362 Rewrite redundant codepatterns introduced by py2 -> py3 conversion
The automated python2 -> python3 converter creates some suboptimal
code patterns in some cases, notably in its handling of dicts.

This commit handles the following cases:

  * "if x in list(y.keys()):" => "if x in y:"

    The original code is neuters the O(1) lookup effeciency of a dict
    by turning it into a list.  This occurs a O(n) in converting it to
    a list and then another O(n) for the lookup.  When done in a loop,
    this becomes O(n * m) rather than the optimal O(m).

  * "for x in list(y.keys()):" => "for x in y:" OR "for x in list(y):"

    A dict (y in these cases) operates as an iterator over keys in the
    dict by default.  This makes the entire "list(y.keys())" dance
    redundant _in most cases_.  In a some cases, scour modifies the
    dict while iterating over it and in those cases, we need a
    "list(y)" (but not a "y.keys()").

    The benefit of this differs between python2 and python3.  In
    python3, we basically "only" avoid function call.  In python2,
    y.keys() generates a list, so here we avoid generating a
    "throw-away list".

The test suite succeed both with "python testscour.py" and "python3
testscour.py" (used 2.7.14+ and 3.6.4 from Debian testing).

On a 341kB flame-graph generated by "nytprof" (a perl profiler), this
commit changes the runtimes of scour from the range 3.39s - 3.45s to
3.27s - 3.35s making it roughly 3% faster in this case (YMMV,
particularly with different input).  The timings were recorded using
the following command line:

  time PYTHONPATH=. python3 -m scour.scour --enable-id-stripping  \
     --shorten-ids --indent=none  --enable-comment-stripping
     -i input.svg -o output.svg

This was used 5 times with and 5 times without the patch picking the
worst and best time to define the range.  The runtime test was only
preformed on python3.

All changed lines where found with:
  grep -rE ' in list[(].*[.]keys[(][)][)]:'

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 15:55:40 +01:00
Eduard Braun
cb093e9171 Update docstring of findReferencedElements
(to match changes in c54a7239e7)
2018-02-17 14:27:19 +01:00
Niels Thykier
c54a7239e7 Simplify the "ids" structure returned by findReferencedElements
It was a dict with a two element list a la:

  {
    "id1": [len(nodeListX), nodeListX]],
    "id2": [len(nodeListY), nodeListY]],
    ...
  }

This can trivially be simplified to:

  {
    "id1": nodeListX,
    "id2": nodeListY,
    ...
  }

The two call-sites that actually needs the length (e.g. to sort by how
often the id is used) can trivially compute that via a call to "len".

All other call sites either just need to tell if an ID is used at all
or work the nodes referencing the id (e.g. to remap the id).  The
former are unaffected by this change and the latter can now avoid a
layer of indirection.

This refactoring has negiable changes to the runtime and probably also
to memory (not tested, but it is a minor constant improvement per
referenced id).

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 14:11:51 +01:00
Niels Thykier
b916a189e9 Avoid recomputing findReferencedElements in removeUnusedDefs
The removeUnusedDefs function does not actually remove anything (that
is left for its callers to do).  This implies that
findReferencedElements will return the same value before, during and
after a call to removeUnusedDefs.  Therefore, we can reuse the value
from findReferencedElements when recursing into child nodes.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 14:11:51 +01:00
Niels Thykier
633b381d87 findReferencedElements: Handle referencingProps separately
Split the handling of referencingProps into a separate loop that calls
findReferencingProperty directly.  This saves a bunch of "make list,
join list, append to another list and eventually split text into two
elements" operations.

This gives approximately 10% faster runtimes on 341 kB flamegraph
generated by the "nytprof" Perl profiler.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 14:11:51 +01:00
Eduard Braun
7249ae8b0a user clearer variable names for indent type and indent depth 2018-02-17 11:59:35 +01:00
Niels Thykier
843706be39 Catch specific exception rather than anything
The bare "except" also catches exceptions like "NameError" and
"SystemExit", which we really should not catch.  In scour.py, use the
most specific exception (NotFoundErr) and in the tests just catch any
"regular" exception.

Reported by flake8.

Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 11:47:36 +01:00
Niels Thykier
f3d8936b5e Rename "I" to "line_prefix" to avoid flake8 E741
Signed-off-by: Niels Thykier <niels@thykier.net>
2018-02-17 11:47:36 +01:00
Michael Witten
b20a0698cc tests: Add unit tests for preservation of quotes in CSS styles
These tests will ensure that issues #21 and #56 do not return.
2017-09-03 18:07:27 +02:00
Michael Witten
0a146b7fef tests: Add unit tests for the escaping of quote characters in attribute values 2017-09-03 18:07:27 +02:00
Michael Witten
7e14cd352f scour.py: Escape quote characters in attribute values, as necessary and minimally
Either double quotes or single quotes are escaped; the choice is made
so as to minimize the length of the escaped string.
2017-09-03 18:07:27 +02:00
Michael Witten
f14784b01f scour.py: handle id' and xml:id' in the same code that handles other attributes 2017-08-26 19:11:37 +00:00
Michael Witten
fef2786c5e scour.py: minor rearrangement for the sake of clarity
There has been a minor rearrangement of the code that handles the children
of the element being serialized: The relevant `if' statement has had its
condition effectively negated and thus has also had its consequent and
alternative swapped; now, there is a very short consequent, followed by a
very long alternative, rather than a very long consequent followed by a
very short alternative.
2017-08-26 19:11:37 +00:00
Michael Witten
695a91e447 scour.py: Satisfy the identing rules of PEP8 2017-08-26 18:44:58 +02:00
Michael Witten
7ee5f9774d scour.py: Use named constants rather than literal integers for `nodeType' 2017-08-26 18:44:58 +02:00
Eduard Braun
e36cd4832a Scour v0.36 2017-08-06 04:55:59 +02:00
Eduard Braun
49f3664f82 make flake8 happier 2017-08-06 04:55:43 +02:00
Eduard Braun
53d87ed35a make flake8 happy 2017-08-06 04:47:33 +02:00
Eduard Braun
c089448bb5 Update project description and use in both, README.md and setup.py 2017-08-06 04:38:33 +02:00
Eduard Braun
e0bfad272b Update README.md
- the original page only has a link these days
- project identity should be established by now
2017-08-06 04:13:23 +02:00
Eduard Braun
992b6850c6 Update HISTORY.md 2017-08-06 03:52:20 +02:00
Eduard Braun
cc592c8e8a Improve and fix behaviour when collapsing straight paths segments (#146)
* Do not collapse straight path segments in paths that have intermediate markers (see #145). The intermediate nodes might be unnecessary for the shape of the path, but their markers would be lost.
* Collapse subpaths of moveto `m` and lineto `l` commands if they have the same direction (before we only collapsed horizontal/vertical `h`/`v` lineto commands)
* Attempt to collapse lineto `l` commands into a preceding moveto `m` command (these are then called "implicit lineto commands")
* Preserve empty path segments if they have `stroke-linecap` set to `round` or `square`. They render no visible line but a tiny dot or square.
2017-05-18 00:53:25 +02:00
Ville Skyttä
75bacbc8e6 Python 3.6 invalid escape sequence deprecation fix (#144)
(see https://docs.python.org/3/whatsnew/3.6.html#deprecated-python-behavior)
2017-05-09 22:07:06 +02:00
Ville Skyttä
62b16c11d8 Spelling fixes 2017-05-09 21:45:04 +02:00
Eduard Braun
5bfffc2ca8 Hardcode printing of "flowtext" warning to stderr
Third-party applications obviously can not handle additional output on stdout nor can they be expected to do any weird stdout/sterr redirection as we do via `options.stdout`
We probably shouldn't print anything in `scourString()` to start with unless we offer an option to disable all non-SVG output for third-party libraries to use.
2017-04-30 04:13:44 +02:00
Eduard Braun
98e3040645 Add unittest for --set-c-precision (7cb0d36d72) 2017-02-25 19:44:18 +01:00
Eduard Braun
51c1e6af23 Improve options handling for precision options
- prevent '--set-precision=0' by requiring >=1
- warn user if '--set-c-precision' > '--set-precision' instead of silently ignoring the value
- some code cleanup
2017-02-25 19:44:18 +01:00
Eduard Braun
c2a65a772e Some code refactoring 2017-02-25 19:44:18 +01:00
Eduard Braun
12237e01c8 Refactor logic to detect control points from 7cb0d36d72 and also include control points of quadratic Bézier curve commands ("q") 2017-02-25 19:44:18 +01:00
Eduard Braun
090884a70f Don't force whitespace for elliptical paths (fixes #89)
This was only required in an early draft of the SVG spec (an error that was corrected later, see [1,2])

[1] https://github.com/scour-project/scour/issues/89#issuecomment-244216600
[2] https://github.com/scour-project/scour/issues/89#issuecomment-244337118
2017-02-25 19:44:18 +01:00
Eduard Braun
2ebe9741b2 Rename a variable plus some editing of comments 2017-02-25 19:44:18 +01:00
Eduard Braun
a7e7b4c21d Cleanup options.
Also omit short option strings of advanced options for now (if we offer them again in future, they should be chosen very carefully as should the options for which we offer them)
2017-02-23 23:39:27 +01:00
pborunda
7cb0d36d72 Improve precision options for smaller output size (#131)
Add a separate precision option for curve control points (--set-c-precision)
This can considerably reduce file size with marginal effect on visual appearance.
2017-02-23 22:00:32 +01:00
Eduard Braun
ffeb76c894 Unittests: remove temporary file 'testscour_temp.svg' after running tests 2017-02-22 22:13:04 +01:00