scour

Author	SHA1	Message	Date
Patrick Storz	c29ecd9304	Run tests with Python 3.7 / 3.8	2020-05-17 15:57:31 +02:00
Eduard Braun	049264eba6	Scour v0.37	2018-07-04 19:16:55 +02:00
Eduard Braun	5ccba31ff9	Update HISTORY.md	2018-07-04 19:05:25 +02:00
Patrick Storz	718748ff22	Merge pull request #199 from Ede123/newline_handling Several improvements for handling whitespace including newlines, especially in text nodes	2018-07-03 22:56:36 +02:00
Eduard Braun	651694a6c0	Add unittests for whitespace handling in text node Also expand/fix the test for line endings	2018-07-03 22:53:05 +02:00
Eduard Braun	703122369e	Strip newlines from text nodes and be done with it Follow the spec "blindly" as it turns out covering all the border and getting reasonably styled output is just to cumbersome. This way at least scour output is consistent and it also saves us some bytes (a lot in some cases as we do not indent <tspan>s etc. anymore)	2018-07-02 22:14:14 +02:00
Eduard Braun	2200f8dc81	temp	2018-07-02 01:05:54 +02:00
Eduard Braun	e1c2699f07	Improve whitespace handling in text content elements SVG specifies special logic for handling whitespace, see https://www.w3.org/TR/SVG/text.html#WhiteSpace by implementing it we can even shave off some unneeded bytes here and there (e.g. consecutive spaces). Unfortunately handling of newlines by renderers is inconsistent: Sometimes they are replaced by a single space, sometimes they are removed in the output. As we can not know the expected behavior work around this by keeping newlines inside text content elements intact. Fixes #160.	2018-07-01 20:19:58 +02:00
Eduard Braun	7d28f5e051	Improve handling of newlines Previously we added way to many and removed empty lines afterwards (potentially destructive if xml:space="preserve") Also adds proper indentation for comment nodes	2018-07-01 19:48:18 +02:00
Eduard Braun	06ea23d0e1	fix typo	2018-07-01 13:52:51 +02:00
Patrick Storz	8c95d950af	Merge pull request #192 from nthykier/gh-189-order-vs-SVGLength Work around an exception in removeDefaultAttributeValue() caused by some rarely used filter attributes that allow an optional second value which SVGLength does not handle properly	2018-06-30 19:03:15 +02:00
Patrick Storz	5d579f8927	Also special-case baseFrequency and add 'radius	2018-06-30 18:58:36 +02:00
Eduard Braun	3c64623a12	Discontinue official support for Python 3.3 (testing failed due to wheel now requiring Python >= 3.4) Also run flake8 in latest Python 3.6 (3.7 is not supported on Travis yet)	2018-06-29 19:29:09 +02:00
Patrick Storz	9f4a707bb7	Merge pull request #178 from nthykier/gh-163-path-rewrite Correct handling of "m0 0" vs. "z" commands	2018-06-29 19:11:53 +02:00
Niels Thykier	8a2892b458	Avoid crashing on stdDeviation attribute Signed-off-by: Niels Thykier <niels@thykier.net>	2018-04-21 06:39:08 +00:00
Niels Thykier	c504891bd7	test: Use number-optional-number variant of kernelUnitLength Signed-off-by: Niels Thykier <niels@thykier.net>	2018-04-21 06:19:38 +00:00
Tobias Oberstein	47f918e696	Merge pull request #191 from nthykier/gh-190-optimizeTransform-IndexError Avoid crashing on "scale(1)" (short for "scale(1, 1)")	2018-04-18 19:25:48 +02:00
Niels Thykier	18e57cddae	Avoid crashing on "scale(1)" (short for "scale(1, 1)") The scale function on the transform attribute has a short form, where only the first argument is used. But optimizeTransform would always assume that there were two when checking for the identity scale. Closes: #190 Signed-off-by: Niels Thykier <niels@thykier.net>	2018-04-18 05:41:35 +00:00
Niels Thykier	a459d629c1	removeDefaultAttributeValue: Special-case order attribute Scour tried to handle "order" attribute as a SVGLength. However, the "order" attribute can consist of two integers according to the [SVG 1.1 Specification] and SVGLength is not designed to handle that. With this change, we now pretend that "order" is a string, which side steps this issue. [SVG 1.1 Specification]: https://www.w3.org/TR/SVG11/single-page.html#filters-feConvolveMatrixElementOrderAttribute Closes: #189 Signed-off-by: Niels Thykier <niels@thykier.net>	2018-04-17 19:48:37 +00:00
Eduard Braun	8ddb7d8913	Add valid elements for 'spreadMethod' attribute Turns out 'default_attributes_universal' is actually empty right now so we might consider removing it altogether...	2018-04-15 18:40:06 +02:00
Eduard Braun	0ec0732447	Simplify 'default_attributes' handling a bit	2018-04-15 18:33:46 +02:00
Eduard Braun	20dcbcbe64	'default_attributes': make sure 'elements' is a list	2018-04-15 18:31:51 +02:00
Niels Thykier	1650f91ea4	Optimize removeDefaultAttributeValues Avoid looping over DefaultAttribute(s) that are not relevant for a given node. This skips a lot of calls to removeDefaultAttributeValue but more importantly, it avoids "node.nodeName not in attribute.elements" line in removeDefaultAttributeValue. As attribute.elements is a list, this becomes expensive for "larger lists" (or in this case when there are a lot of attributes). This seems to remove about 1½-2 minutes of runtime (out of ~8) on the 1_42_polytope_7-cube.svg test case provided in #184. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-04-15 18:29:58 +02:00
Niels Thykier	5dc1b7a820	scour: Make optimized default_attribute data structures There are a lot of "DefaultAttribute"s and for a given tag, most of the "DefaultAttribute"s are not applicable. Therefore, we create two data structures to assist us with only dealing with the attributes that matter. Here there are two cases: * Those that always matter. These go into default_attributes_unrestricted list. * Those that matter only based on the node name. These go into the default_attributes_restricted_by_tag with the node name as key (with the value being a list of matching attributes). In the next commit, we will use those for optimizing the removal of default attributes. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-04-15 18:29:58 +02:00
Eduard Braun	3283d6d5ec	Simplify control point detection logic - make controlPoints() return a consistent type like flags() - rename the ambiguous "reduce_precision" to "is_control_point"	2018-04-08 16:48:33 +02:00
Eduard Braun	103dcc0a48	Fix handling of boolean flags in elliptical path commands (#183 ) * properly parse paths without space after boolean flags (fixes #161) * omit space after boolean flag to shave off a few bytes when not using renderer workarounds	2018-04-08 15:32:47 +02:00
Niels Thykier	ba7f4b5f18	Remove more redundant uses of .keys() Signed-off-by: Niels Thykier <niels@thykier.net>	2018-03-26 22:36:19 +02:00
Niels Thykier	f8d5af0e56	Remove now unused variable Signed-off-by: Niels Thykier <niels@thykier.net>	2018-03-26 22:36:19 +02:00
Eduard Braun	d508f59aa6	Completely remove "walltime" variable and use time.time() directly	2018-03-26 22:34:11 +02:00
Niels Thykier	b622642aa1	Simplify timer selection to always use time.time() (#175 ) In python2.7 and python3.3, time.time() is sufficient accurate for our purpose and avoids going through hoops to select the best available function. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-03-26 22:30:25 +02:00
Niels Thykier	38274f75bc	Implement a basic rewrite of redundant commands This basic implementation can drop and rewrite some cases of "m0 0" and "z" without triggering the issues experienced in #163. It works by analysing the path backwards and tracking "z" and "m" commands. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-03-11 08:33:50 +00:00
Niels Thykier	a2c94c96fb	Disable the "m0 0"-optimization as it is wrong in some cases The "m0 0" rewrite gets some cases wrong, like: m150 240h200m0 0 150 150v-300z Scour rewrote that into the following m150 240h200l150 150v-300z However, these two paths do not produce an identical figure at all. The first is a line followed by a triangle while the second is a quadrilateral. While there are some instances we can rewrite (that scour will no longer rewrite), these will require an analysis over multiple commands to determine whether the rewrite is safe. This will reappear in the next commit. Closes: #163 Signed-off-by: Niels Thykier <niels@thykier.net>	2018-03-11 08:25:46 +00:00
Niels Thykier	6ea126d290	Gracefully handle unreferenced gradients with --keep-unreferenced-defs (#173 ) Closes: #156 Signed-off-by: Niels Thykier <niels@thykier.net>	2018-03-10 16:06:50 +01:00
Niels Thykier	cae0faefa0	Avoid O(n²) in removeDuplicateGradient (#171 ) The original implementation of removeDuplicateGradient does O(n²) search over all gradients to remove duplicates. In images with many gradients (such as [MediaWiki_logo_1.svg]), this becomes a significant overhead as that logo has over 900 duplicated gradients. We solve this by creating a key for each gradient based on the attributes we use for duplication detection. This key is generated such that if two gradients have the same key, they are duplicates (for our purpose) and the keys are different then the gradients are guaranteed to be different as well. With such a key, we can rely on a dict to handle the duplication detection (which it does very well). This change improves the runtime performance on [MediaWiki_logo_1.svg] by about 25% (8m51s -> 1m56s on 5 runs). Signed-off-by: Niels Thykier <niels@thykier.net>	2018-03-10 15:47:29 +01:00
Niels Thykier	7115e82cb8	Fix typo in HISTORY.md	2018-03-10 13:36:49 +01:00
Niels Thykier	0776d32179	Remove an unnecessary loop The unprotected_ids function returns all unprotected ids and removeUnreferencedIDs removes all of them that does not appear in the return value of findReferencedElements. On closer observation it turns out that removeUnreferencedIDs cannot cause nodes/IDs to become unprotected nor unreferenced (as it only remove the "id" attribute, not the node). With this in mind, we can just remove the loop and save a call to all of these functions. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-02-20 19:46:49 +01:00
Eduard Braun	5360db86d9	Fix the last instance of "list(dict.keys())"	2018-02-17 15:59:18 +01:00
Niels Thykier	2f0b3ea362	Rewrite redundant codepatterns introduced by py2 -> py3 conversion The automated python2 -> python3 converter creates some suboptimal code patterns in some cases, notably in its handling of dicts. This commit handles the following cases: * "if x in list(y.keys()):" => "if x in y:" The original code is neuters the O(1) lookup effeciency of a dict by turning it into a list. This occurs a O(n) in converting it to a list and then another O(n) for the lookup. When done in a loop, this becomes O(n * m) rather than the optimal O(m). * "for x in list(y.keys()):" => "for x in y:" OR "for x in list(y):" A dict (y in these cases) operates as an iterator over keys in the dict by default. This makes the entire "list(y.keys())" dance redundant _in most cases_. In a some cases, scour modifies the dict while iterating over it and in those cases, we need a "list(y)" (but not a "y.keys()"). The benefit of this differs between python2 and python3. In python3, we basically "only" avoid function call. In python2, y.keys() generates a list, so here we avoid generating a "throw-away list". The test suite succeed both with "python testscour.py" and "python3 testscour.py" (used 2.7.14+ and 3.6.4 from Debian testing). On a 341kB flame-graph generated by "nytprof" (a perl profiler), this commit changes the runtimes of scour from the range 3.39s - 3.45s to 3.27s - 3.35s making it roughly 3% faster in this case (YMMV, particularly with different input). The timings were recorded using the following command line: time PYTHONPATH=. python3 -m scour.scour --enable-id-stripping \ --shorten-ids --indent=none --enable-comment-stripping -i input.svg -o output.svg This was used 5 times with and 5 times without the patch picking the worst and best time to define the range. The runtime test was only preformed on python3. All changed lines where found with: grep -rE ' in list[(].*[.]keys[(][)][)]:' Signed-off-by: Niels Thykier <niels@thykier.net>	2018-02-17 15:55:40 +01:00
Eduard Braun	cb093e9171	Update docstring of findReferencedElements (to match changes in `c54a7239e7`)	2018-02-17 14:27:19 +01:00
Niels Thykier	c54a7239e7	Simplify the "ids" structure returned by findReferencedElements It was a dict with a two element list a la: { "id1": [len(nodeListX), nodeListX]], "id2": [len(nodeListY), nodeListY]], ... } This can trivially be simplified to: { "id1": nodeListX, "id2": nodeListY, ... } The two call-sites that actually needs the length (e.g. to sort by how often the id is used) can trivially compute that via a call to "len". All other call sites either just need to tell if an ID is used at all or work the nodes referencing the id (e.g. to remap the id). The former are unaffected by this change and the latter can now avoid a layer of indirection. This refactoring has negiable changes to the runtime and probably also to memory (not tested, but it is a minor constant improvement per referenced id). Signed-off-by: Niels Thykier <niels@thykier.net>	2018-02-17 14:11:51 +01:00
Niels Thykier	b916a189e9	Avoid recomputing findReferencedElements in removeUnusedDefs The removeUnusedDefs function does not actually remove anything (that is left for its callers to do). This implies that findReferencedElements will return the same value before, during and after a call to removeUnusedDefs. Therefore, we can reuse the value from findReferencedElements when recursing into child nodes. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-02-17 14:11:51 +01:00
Niels Thykier	633b381d87	findReferencedElements: Handle referencingProps separately Split the handling of referencingProps into a separate loop that calls findReferencingProperty directly. This saves a bunch of "make list, join list, append to another list and eventually split text into two elements" operations. This gives approximately 10% faster runtimes on 341 kB flamegraph generated by the "nytprof" Perl profiler. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-02-17 14:11:51 +01:00
Eduard Braun	7249ae8b0a	user clearer variable names for indent type and indent depth	2018-02-17 11:59:35 +01:00
Niels Thykier	843706be39	Catch specific exception rather than anything The bare "except" also catches exceptions like "NameError" and "SystemExit", which we really should not catch. In scour.py, use the most specific exception (NotFoundErr) and in the tests just catch any "regular" exception. Reported by flake8. Signed-off-by: Niels Thykier <niels@thykier.net>	2018-02-17 11:47:36 +01:00
Niels Thykier	f3d8936b5e	Rename "I" to "line_prefix" to avoid flake8 E741 Signed-off-by: Niels Thykier <niels@thykier.net>	2018-02-17 11:47:36 +01:00
Michael Witten	b20a0698cc	tests: Add unit tests for preservation of quotes in CSS styles These tests will ensure that issues #21 and #56 do not return.	2017-09-03 18:07:27 +02:00
Michael Witten	0a146b7fef	tests: Add unit tests for the escaping of quote characters in attribute values	2017-09-03 18:07:27 +02:00
Michael Witten	7e14cd352f	scour.py: Escape quote characters in attribute values, as necessary and minimally Either double quotes or single quotes are escaped; the choice is made so as to minimize the length of the escaped string.	2017-09-03 18:07:27 +02:00
Michael Witten	f14784b01f	scour.py: handle `id' and` xml:id' in the same code that handles other attributes	2017-08-26 19:11:37 +00:00
Michael Witten	fef2786c5e	scour.py: minor rearrangement for the sake of clarity There has been a minor rearrangement of the code that handles the children of the element being serialized: The relevant `if' statement has had its condition effectively negated and thus has also had its consequent and alternative swapped; now, there is a very short consequent, followed by a very long alternative, rather than a very long consequent followed by a very short alternative.	2017-08-26 19:11:37 +00:00

1 2 3 4 5 ...

253 commits