Future Projects
Here are some ideas for improving GNU diff and patch. The GNU project has identified some improvements as potential programming projects for volunteers. You can also help by reporting any bugs that you find.
If you are a programmer and would like to contribute something to the GNU project, please consider volunteering for one of these projects. If you are seriously contemplating work, please write to gvc@gnu.org to coordinate with other volunteers.
Suggested Projects for Improving GNU diff and patch
One should be able to use GNU diff to generate a patch from any pair of directory trees, and given the patch and a copy of one such tree, use patch to generate a faithful copy of the other. Unfortunately, some changes to directory trees cannot be expressed using current patch formats; also, patch does not handle some of the existing formats. These shortcomings motivate the following suggested projects.
Handling Multibyte and Varying-Width Characters
diff, diff3 and sdiff treat each line of input as a string of unibyte characters. This can mishandle multibyte characters in some cases. For example, when asked to ignore spaces, diff does not properly ignore a multibyte space character.
Also, diff currently assumes that each byte is one column wide, and this assumption is incorrect in some locales, e.g., locales that use UTF-8 encoding. This causes problems with the -y or --side-by-side option of diff.
These problems need to be fixed without unduly affecting the performance of the utilities in unibyte environments.
The IBM GNU/Linux Technology Center Internationalization Team has proposed patches to support internationalized diff. Unfortunately, these patches are incomplete and are to an older version of diff, so more work needs to be done in this area.
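The mismatch between bytes, characters, and display columns described in this section can be illustrated outside diff with a short Python sketch. It is only an illustration, not part of diff: the helper display_columns is hypothetical, and it approximates column width from Unicode East Asian Width data, which real terminals may not follow exactly.

```python
import unicodedata

def display_columns(text: str) -> int:
    """Approximate terminal columns (hypothetical helper, not part of diff).

    Wide and fullwidth East Asian characters are counted as two columns,
    combining marks as zero, everything else as one.
    """
    cols = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols

# Two ASCII letters separated by U+3000 IDEOGRAPHIC SPACE.
line = "a\u3000b"
raw = line.encode("utf-8")

byte_count = len(raw)                # what a byte-oriented tool sees
char_count = len(line)               # number of characters
column_count = display_columns(line) # approximate width on screen
ascii_space_bytes = raw.count(b" ")  # a byte-wise search for ' ' finds nothing

print(byte_count, char_count, column_count, ascii_space_bytes)
```

The three counts differ for the same line, and a byte-wise search for an ASCII space never sees the multibyte space, which is the kind of discrepancy that affects space-ignoring options and side-by-side alignment.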
Handling Changes to the Directory Structure
diff and patch do not handle some changes to directory structure. For example, suppose one directory tree contains a directory named ‘D’ with some subsidiary files, and another contains a file with the same name ‘D’. ‘diff -r’ does not output enough information for patch to transform the directory subtree into the file.
There should be a way to specify that a file has been removed without having to include its entire contents in the patch file. There should also be a way to tell patch that a file was renamed, even if there is no way for diff to generate such information.
There should be a way to tell patch that a file’s timestamp has changed, even if its contents have not changed.
These problems can be fixed by extending the diff output format to represent changes in directory structure, and extending patch to understand these extensions.
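As a rough illustration of the ‘D’ example above, the following Python sketch walks one tree and reports names that exist in both trees but with different file types. It is not a proposal for the extended output format; the function names kind and type_conflicts and the tree layout are assumptions made only for this example.

```python
import os
import tempfile

def kind(path):
    """Classify a path without following symbolic links (hypothetical helper)."""
    if os.path.islink(path):
        return "symlink"
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "other"

def type_conflicts(tree_a, tree_b):
    """Return (relative name, kind in tree_a, kind in tree_b) for type mismatches."""
    conflicts = []
    for root, dirs, files in os.walk(tree_a):
        rel_root = os.path.relpath(root, tree_a)
        for name in dirs + files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            other = os.path.join(tree_b, rel)
            if not os.path.lexists(other):
                continue  # present in only one tree: not a type conflict
            a_kind, b_kind = kind(os.path.join(root, name)), kind(other)
            if a_kind != b_kind:
                conflicts.append((rel, a_kind, b_kind))
    return sorted(conflicts)

# Build the situation from the text: 'D' is a directory in one tree, a file in the other.
with tempfile.TemporaryDirectory() as tmp:
    old, new = os.path.join(tmp, "old"), os.path.join(tmp, "new")
    os.makedirs(os.path.join(old, "D"))
    os.makedirs(new)
    with open(os.path.join(old, "D", "member"), "w") as f:
        f.write("inside the directory\n")
    with open(os.path.join(new, "D"), "w") as f:
        f.write("now a plain file\n")
    conflicts = type_conflicts(old, new)

print(conflicts)
```

A report like this only detects the mismatch; expressing it in a patch would still require the format extensions discussed above.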
Files that are Neither Directories Nor Regular Files
Some files are neither directories nor regular files: they are unusual files like symbolic links, device special files, named pipes, and sockets. Currently, diff treats symbolic links as if they were the pointed-to files, except that a recursive diff reports an error if it detects infinite loops of symbolic links (e.g., symbolic links to ..). diff treats other special files like regular files if they are specified at the top level, but simply reports their presence when comparing directories. This means that patch cannot represent changes to such files. For example, if you change which file a symbolic link points to, diff outputs the difference between the two files, instead of the change to the symbolic link.
diff should optionally report changes to special files specially, and patch should be extended to understand these extensions.
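To make the symbolic link case concrete, here is a minimal Python sketch that compares the links themselves rather than the files they point to. It assumes a POSIX system where unprivileged processes can create symbolic links; the function name symlink_change and the file names are hypothetical.

```python
import os
import tempfile

def symlink_change(old_path, new_path):
    """Return (old_target, new_target) if both paths are symbolic links
    whose targets differ; otherwise return None."""
    if os.path.islink(old_path) and os.path.islink(new_path):
        old_target = os.readlink(old_path)
        new_target = os.readlink(new_path)
        if old_target != new_target:
            return old_target, new_target
    return None

with tempfile.TemporaryDirectory() as tmp:
    old_dir, new_dir = os.path.join(tmp, "old"), os.path.join(tmp, "new")
    os.makedirs(old_dir)
    os.makedirs(new_dir)
    # The link targets need not exist; only the link text is compared.
    os.symlink("release-1.txt", os.path.join(old_dir, "current"))
    os.symlink("release-2.txt", os.path.join(new_dir, "current"))
    change = symlink_change(os.path.join(old_dir, "current"),
                            os.path.join(new_dir, "current"))

print(change)
```

The result records the change to the link itself, which is the information a patch would need in order to recreate the new link.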
File Names that Contain Unusual Characters
When a file name contains an unusual character like a newline or white space, ‘diff -r’ generates a patch that patch cannot parse. The problem is with the format of diff output, not just with patch, because with odd enough file names one can cause diff to generate a patch that is syntactically correct but patches the wrong files. The format of diff output should be extended to handle all possible file names.
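The ambiguity can be reproduced with any line-oriented header format. The sketch below uses Python's difflib module, not GNU diff, to build a unified-format header from a hypothetical file name that contains a newline; it makes no claim about what a particular version of diff or patch does with such a name.

```python
import difflib

# Hypothetical file name containing a newline followed by text that
# looks like a unified-diff header line.
odd_name = "notes\n+++ other.txt"

patch_text = "".join(
    difflib.unified_diff(["old\n"], ["new\n"],
                         fromfile=odd_name, tofile="notes.new")
)

# A parser that reads the patch one line at a time sees these header lines.
header_lines = [
    line for line in patch_text.splitlines()
    if line.startswith(("--- ", "+++ "))
]

print(header_lines)
```

The output is still well formed line by line, yet the first ‘+++’ line names ‘other.txt’, a file that was never compared. An unambiguous format would need some way to quote or escape such characters.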
Outputting Diffs in Timestamp Order
Applying patch to a multiple-file diff can result in files whose timestamps are out of order. GNU patch has options to restore the timestamps of the updated files (see Updating Timestamps on Patched Files), but sometimes it is useful to generate a patch that works even if the recipient does not have GNU patch, or does not use these options. One way to do this would be to implement a diff option to output diffs in timestamp order.
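The ordering step of such an option might look like the following Python sketch, which sorts the files of the newer tree by modification time before their diffs would be emitted. No such diff option exists in the text above; the function name in_timestamp_order, the file names, and the timestamps are all assumptions for illustration.

```python
import os
import tempfile

def in_timestamp_order(paths):
    """Return paths sorted by modification time, oldest first.
    Ties are broken by path name so the order is deterministic."""
    return sorted(paths, key=lambda p: (os.stat(p).st_mtime_ns, p))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files with explicit modification times (seconds since the epoch).
    mtimes = {"Makefile": 1_000_000_300, "main.c": 1_000_000_100, "main.h": 1_000_000_200}
    paths = []
    for name, seconds in mtimes.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(name + "\n")
        os.utime(path, (seconds, seconds))  # set access and modification time
        paths.append(path)
    ordered_names = [os.path.basename(p) for p in in_timestamp_order(paths)]

print(ordered_names)
```

If the per-file diffs appear in this order, a patch program that updates files one after another leaves them with timestamps in the same relative order as the originals.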
Ignoring Certain Changes
It would be nice to have a feature for specifying two strings, one in from-file and one in to-file, which should be considered to match. Thus, if the two strings are ‘foo’ and ‘bar’, then if two lines differ only in that ‘foo’ in file 1 corresponds to ‘bar’ in file 2, the lines are treated as identical.
It is not clear how general this feature can or should be, or what syntax should be used for it.
A partial substitute is to filter one or both files before comparing, e.g.:
sed 's/foo/bar/g' file1 | diff - file2
However, this outputs the filtered text, not the original.
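One way to keep the original text in the output is to match on filtered copies of the lines but print the unfiltered ones. The following Python sketch does this with difflib.SequenceMatcher; like the sed command above it filters only the first file, and the function name diff_treating_as_equal and the ‘<’ and ‘>’ output style are assumptions, not an existing diff feature.

```python
import difflib

def diff_treating_as_equal(from_lines, to_lines, from_str, to_str):
    """List differing lines, treating from_str in the first file as
    matching to_str in the second, but printing the original text."""
    # Match on filtered keys, as `sed 's/foo/bar/g' file1` would produce.
    from_keys = [line.replace(from_str, to_str) for line in from_lines]
    matcher = difflib.SequenceMatcher(None, from_keys, to_lines, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Report the unfiltered lines from each side.
        out.extend("< " + line for line in from_lines[i1:i2])
        out.extend("> " + line for line in to_lines[j1:j2])
    return out

file1 = ["foo = 1", "foo(1)", "x"]
file2 = ["bar = 1", "bar(2)", "x"]
result = diff_treating_as_equal(file1, file2, "foo", "bar")

print("\n".join(result))
```

The first pair of lines is treated as identical, and the second pair is reported with ‘foo(1)’ intact rather than the filtered ‘bar(1)’.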
Improving Performance
When comparing two large directory structures, one of which was originally copied from the other with timestamps preserved (e.g., with ‘cp -pR’), it would greatly improve performance if an option told diff to assume that two files with the same size and timestamps have the same content. See diff Performance Tradeoffs.
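Python's standard filecmp module offers a comparable shortcut, which makes the tradeoff easy to see. In the sketch below, filecmp.cmp with shallow=True accepts two files as equal when their type, size, and modification time match, without reading them. The file names and contents are made up for the example, and this shows the idea only, not how diff would implement it.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    # Same size, different content.
    with open(a, "w") as f:
        f.write("abc\n")
    with open(b, "w") as f:
        f.write("xyz\n")
    # Give both files the same timestamps.
    for path in (a, b):
        os.utime(path, (1_000_000_000, 1_000_000_000))

    # Shallow: trust matching type, size, and modification time.
    assumed_equal = filecmp.cmp(a, b, shallow=True)
    # Deep: actually read and compare the contents.
    really_equal = filecmp.cmp(a, b, shallow=False)

print(assumed_equal, really_equal)
```

The shallow comparison is fast but wrong here, which is why such an option would only be safe when the user knows one tree was copied from the other with timestamps preserved.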
Reporting Bugs
If you think you have found a bug in GNU cmp, diff, diff3, or sdiff, please report it by electronic mail to the GNU utilities bug report mailing list bug-diffutils@gnu.org. Please send bug reports for GNU patch to bug-patch@gnu.org. Send as precise a description of the problem as you can, including the output of the --version option and sample input files that produce the bug, if applicable. If you have a nontrivial fix or a patch for the bug, please send it as well. It may simplify the maintainer’s job if the patch is relative to a recent test release, which you can find in the directory ftp://alpha.gnu.org/gnu/diffutils/.