Comment 1 by Patrick Georgi, Mar 8, 2011
Attached patch is a rewrite of the diff parser. It parses diffs more directly: 1. skip non-diff data 2. read header (--- and +++ lines) 2a. if header malformed, go to 1 3. while hunk header is at the current position, parse chunk 3a. if hunk malformed, try if a new hunk starts at the current location 4. go to 1 So it tries to be robust in light of misformatted diffs, but I didn't test that yet. For common errors (eg. extra newlines in the text) it will drop some hunks that it could recover with some extra complexity, but the patch would be broken regardless. Tested with a couple of mtn and git repos, but I don't have svn or hg around for testing.
- rewrite-diff-parser.diff - 12.29 kB - view
Status:
Started
Comment 2 by Patrick Georgi, Mar 8, 2011
Actually, I found an svn repo - looks good. So only mercurial is still untested.
Comment 3 by Thomas Keller, Mar 9, 2011
We definitely need more tests for this change. There is at least one cornercase I found where the hunk parser will fail - for single line adds / deletes. GNU diff leaves ",1" optional, so lines like "@@ -l -l,s @@" or "@@ -l,s -l @@" are valid and must be parsable as well. In fact, if you create a new single line file in monotone and add it, it will return you such a hunk. Other than that very good work. To get this into 1.1 we just need more confidence that it really works for at least all the cases it worked previously :)
Comment 4 by Thomas Keller, Mar 9, 2011
...and of course I meant "@@ -l +l,s @@" and "@@ -l,s +l @@" Sorry for the noise.
Comment 5 by Patrick Georgi, Mar 9, 2011
Ah, wasn't aware of that quirk - the new patch has support for it. I tested the issue with an ad-hoc test suite, but that requires more work before it's ready for public consumption.
- rewrite-diff-parser.diff - 12.43 kB - view
Comment 6 by Patrick Georgi, Mar 16, 2011
Attached:
- Updated parser (slightly more compatible, though it shouldn't
matter in real life)
- Patch to hopefully reduce memory use (by discarding references to
string representations early)
- Test suite: Unpack to indefero toplevel directory, run with
"php testdiff.php". Compares src/IDF/Diff.php to the
original version (added as Diff.original.php)
Result: Identical behaviour except for svn, where some additional
bits are added to the internal data structures (which are not used
later). These additional bits are misinterpretations
("empty" files for empty file additions which svn emits by
its svn-header only, without a diff; svn metadata is added to the
diff data, where it definitely doesn't belong)
I refuse to implement _that_ quirk in the new parser, too.
- 0001-New-diff-parser.patch - 13.17 kB - view
- 0002-Cut-down-memory-use-when-parsing-diffs.patch - 1.46 kB - view
- diff-parser-test.tar.bz2 - 13.59 kB
Comment 7 by Thomas Keller, Mar 16, 2011
The first two patches have been committed in revision baa8841. I added the missing method to set the strip path level for IDF_Scm_Mercurial as well and fixed a couple of whitespace issues. I'll add some phpunit test infrastructure and the Diff tests in a few.
Comment 8 by Thomas Keller, Mar 17, 2011
Test infrastructure and test itself have been added in commit 61d7b4a .
Status:
Fixed
Sign in to reply to this comment.
Reported by Thomas Keller, Mar 8, 2011