Reported by Thomas Keller, Mar 8, 2011
A regression originating from commit 336faa4503 breaks some commit views for the monotone backend (for interested parties, monotone commit 3d96895e9ff2ea7d2eaa5ad2c46f60cfbf146461). The issue is that the parser won't recognize any next file diff in a multi-file diff output as long as the variable $indiff stays `true`. Reverting the previously mentioned change fixes the issue for mtn, but possibly breaks again with svn. Maybe we should rework the parser to ultimatively recognize "+++" as start of a new file diff and not any of the ever-changing header lines of all the different SCMs. And we should really add some unit tests for the parser to ensure we do not break with any SCM's diff output again in the future.
Comment 1 by Patrick Georgi, Mar 8, 2011
Attached patch is a rewrite of the diff parser. It parses diffs more directly: 1. skip non-diff data 2. read header (--- and +++ lines) 2a. if header malformed, go to 1 3. while hunk header is at the current position, parse chunk 3a. if hunk malformed, try if a new hunk starts at the current location 4. go to 1 So it tries to be robust in light of misformatted diffs, but I didn't test that yet. For common errors (eg. extra newlines in the text) it will drop some hunks that it could recover with some extra complexity, but the patch would be broken regardless. Tested with a couple of mtn and git repos, but I don't have svn or hg around for testing.
Comment 2 by Patrick Georgi, Mar 8, 2011
Actually, I found an svn repo - looks good. So only mercurial is still untested.
Comment 3 by Thomas Keller, Mar 9, 2011
We definitely need more tests for this change. There is at least one cornercase I found where the hunk parser will fail - for single line adds / deletes. GNU diff leaves ",1" optional, so lines like "@@ -l -l,s @@" or "@@ -l,s -l @@" are valid and must be parsable as well. In fact, if you create a new single line file in monotone and add it, it will return you such a hunk. Other than that very good work. To get this into 1.1 we just need more confidence that it really works for at least all the cases it worked previously :)
Comment 4 by Thomas Keller, Mar 9, 2011
...and of course I meant "@@ -l +l,s @@" and "@@ -l,s +l @@" Sorry for the noise.
Comment 5 by Patrick Georgi, Mar 9, 2011
Ah, wasn't aware of that quirk - the new patch has support for it. I tested the issue with an ad-hoc test suite, but that requires more work before it's ready for public consumption.
Comment 6 by Patrick Georgi, Mar 16, 2011
Attached: - Updated parser (slightly more compatible, though it shouldn't matter in real life) - Patch to hopefully reduce memory use (by discarding references to string representations early) - Test suite: Unpack to indefero toplevel directory, run with "php testdiff.php". Compares src/IDF/Diff.php to the original version (added as Diff.original.php) Result: Identical behaviour except for svn, where some additional bits are added to the internal data structures (which are not used later). These additional bits are misinterpretations ("empty" files for empty file additions which svn emits by its svn-header only, without a diff; svn metadata is added to the diff data, where it definitely doesn't belong) I refuse to implement _that_ quirk in the new parser, too.
Comment 7 by Thomas Keller, Mar 16, 2011
The first two patches have been committed in revision baa8841. I added the missing method to set the strip path level for IDF_Scm_Mercurial as well and fixed a couple of whitespace issues. I'll add some phpunit test infrastructure and the Diff tests in a few.
Comment 8 by Thomas Keller, Mar 17, 2011
Test infrastructure and test itself have been added in commit 61d7b4a .