table of contents
- NAME
- SYNOPSIS
- DESCRIPTION
- SAFETY WARNINGS
- OPERATION
- WORKING WITH MERCURIAL
- WORKING WITH SUBVERSION
- IGNORE PATTERNS
- TRANSLATION STYLE
- ADVANCED EXAMPLES
- STREAM SYNTAX EXTENSIONS
- INCOMPATIBLE LANGUAGE CHANGES
- LIMITATIONS AND GUARANTEES
- REQUIREMENTS
- CANONICALIZATION RULES
- CRASH RECOVERY
- ERROR RETURNS
- SEE ALSO
- AUTHOR
- NOTES
REPOSURGEON(1) | Development Tools | REPOSURGEON(1) |
NAME
reposurgeon - surgical operations on repositories
SYNOPSIS
reposurgeon [command...]
DESCRIPTION
The purpose of reposurgeon is to enable risky operations that VCSes (version-control systems) don't want to let you do, such as (a) editing past comments and metadata, (b) excising commits, (c) coalescing and splitting commits, (d) removing files and subtrees from repo history, (e) merging or grafting two or more repos, and (f) cutting a repo in two by cutting a parent-child link, preserving the branch structure of both child repos.
A major use of reposurgeon is to assist a human operator to perform higher-quality conversions among version control systems than can be achieved with fully automated converters.
The original motivation for reposurgeon was to clean up artifacts created by repository conversions. It was foreseen that the tool would also have applications when code needs to be removed from repositories for legal or policy reasons.
To keep reposurgeon simple and flexible, it normally does not do its own repository reading and writing. Instead, it relies on being able to parse and emit the command streams created by git-fast-export and read by git-fast-import. This means that it can be used on any version-control system that has both fast-export and fast-import utilities. The git-import stream format also implicitly defines a common language of primitive operations for reposurgeon to speak.
Fully supported systems (those for which reposurgeon can both read and write repositories) include git, hg, bzr, svn, darcs, bk, RCS, and SRC. For a complete list, with dependencies and technical notes, type prefer at the reposurgeon prompt.
Writing to the file-oriented systems RCS and SRC is done via rcs-fast-import(1) and has some serious limitations because those systems cannot represent all the metadata in a git-fast-export stream. Consult that tool's documentation for details and partial workarounds.
Writing Subversion repositories also has some significant limitations, discussed in the section on Working With Subversion.
Fossil repository files can be read in using the --format=fossil option of the read command and written out with the --format=fossil option of the write command. Ignore patterns are not translated in either direction.
CVS is supported for read only, not write. For CVS, reposurgeon must be run from within a repository directory (one with a CVSROOT subdirectory). Each module becomes a subdirectory in the reposurgeon representation of the change history.
In order to deal with version-control systems that do not have fast-export equivalents, reposurgeon can also host extractor code that reads repositories directly. For each version-control system supported through an extractor, reposurgeon uses a small amount of knowledge about the system's command-line tools to (in effect) replay repository history into an input stream internally. Repositories under systems supported through extractors can be read by reposurgeon, but not modified by it. In particular, reposurgeon can be used to move a repository history from any VCS supported by an extractor to any VCS supported by a normal importer/exporter pair.
Mercurial repository reading is implemented with an extractor class; writing is handled with the stock "hg fastimport" command. A test extractor exists for git, but is normally disabled in favor of the regular exporter.
For guidance on the pragmatics of repository conversion, see the DVCS Migration HOWTO[1].
SAFETY WARNINGS
reposurgeon is a sharp enough tool to cut you. It takes care not to ever write a repository in an actually inconsistent state, and will terminate with an error message rather than proceed when its internal data structures are confused. However, there are lots of things you can do with it - like altering stored commit timestamps so they no longer match the commit sequence - that are likely to cause havoc after you're done. Proceed with caution and check your work.
Also note that, if your DVCS does the usual thing of making commit IDs a cryptographic hash of content and parent links, editing a publicly-accessible repository with this tool would be a bad idea. All of the surgical operations in reposurgeon will modify the hash chains.
Please also see the notes on system-specific issues under the section called “LIMITATIONS AND GUARANTEES”.
OPERATION
The program can be run in one of two modes, either as an interactive command interpreter or in batch mode to execute commands given as arguments on the reposurgeon invocation line. The only differences between these modes are (1) the interactive one begins by turning on the 'verbose 1' option, (2) in batch mode all errors (including normally recoverable errors in selection-set syntax) are fatal, and (3) each command-line argument beginning with “--” has that stripped off (which, in particular, means that --help and --version will work as expected). Also, in interactive mode, Ctrl-P and Ctrl-N will be available to scroll through your command history, and tab completion of both command keywords and name arguments (wherever that makes semantic sense) is available.
A git-fast-import stream consists of a sequence of commands which must be executed in the specified sequence to build the repo; to avoid confusion with reposurgeon commands we will refer to the stream commands as events in this documentation. These events are implicitly numbered from 1 upwards. Most commands require specifying a selection of event sequence numbers so reposurgeon will know which events to modify or delete.
For all the details of event types and semantics, see the git-fast-import(1) manual page; the rest of this paragraph is a quick start for the impatient. Most events in a stream are commits describing revision states of the repository; these group together under a single change comment one or more fileops (file operations), which usually point to blobs that are revision states of individual files. A fileop may also be a delete operation indicating that a specified previously-existing file was deleted as part of the version commit; there are a couple of other special fileop types of lesser importance.
Commands to reposurgeon consist of a command keyword, sometimes preceded by a selection set, sometimes followed by whitespace-separated arguments. It is often possible to omit the selection-set argument and have it default to something reasonable.
Here are some motivating examples. The commands will be explained in more detail after the description of selection syntax.
    :15 edit                 ;; edit the object associated with mark :15
    edit                     ;; edit all editable objects
    29..71 list              ;; list summary index of events 29..71
    236..$ list              ;; List events from 236 to the last
    <#523> inspect           ;; Look for commit #523; they are numbered
                             ;; 1-origin from the beginning of the repository.
    <2317> inspect           ;; Look for a tag with the name 2317, a tip commit
                             ;; of a branch named 2317, or a commit with legacy ID
                             ;; 2317. Inspect what is found. A plain number is
                             ;; probably a legacy ID inherited from a Subversion
                             ;; revision number.
    /regression/ list        ;; list all commits and tags with comments or
                             ;; committer headers or author headers containing
                             ;; the string "regression"
    1..:97 & =T delete       ;; delete tags from event 1 to mark 97
    [Makefile] inspect       ;; Inspect all commits with a file op touching Makefile
                             ;; and all blobs referred to in a fileop
                             ;; touching Makefile.
    :46 tip                  ;; Display the branch tip that owns commit :46.
    @dsc(:55) list           ;; Display all commits with ancestry tracing to :55
    @min([.gitignore]) remove .gitignore delete
                             ;; Remove the first .gitignore fileop in the repo.
SELECTION SYNTAX
A selection set is ordered; that is, any given element may occur only once, and the set is ordered by when its members were first added.
The selection-set specification syntax is an expression-oriented minilanguage. The most basic term in this language is a location. The following sorts of primitive locations are supported:
event numbers
marks
tag and branch names
legacy IDs
commit numbers
reset@ names
$
These may be grouped into sets in the following ways:
ranges
lists
There are some other ways to construct event sets:
visibility sets
B | blobs | Most default selection sets exclude blobs; they have to be manipulated through the commits they are attached to. |
C | commits | |
D | all-delete commits | These are artifacts produced by some older repository-conversion tools. |
H | head (branch tip) commits | |
O | orphaned (parentless) commits | |
U | commits with callouts as parents | |
Z | commits with no fileops | |
M | merge (multi-parent) commits | |
F | fork (multi-child) commits | |
L | commits with unclean multi-line comments (without a separating empty line after the first) | |
I | commits for which metadata cannot be decoded to UTF-8 | |
T | tags | |
R | resets | |
P | Passthrough | All event types simply passed through, including comments, progress commands, and checkpoint commands. |
N | Legacy IDs | Any string matching a cookie (legacy-ID) format. |
references
type | interpretation |
tag name | annotated tag with that name |
branch name | the branch tip commit |
legacy ID | commit with that legacy ID |
assigned name | name equated to a selection by assign |
Note that if an annotated tag and a branch have the same name foo, <foo> will resolve to the tag rather than the branch tip commit.
dates and action stamps
type | interpretation |
RFC3339 timestamp | commit or tag with that time/date |
action stamp (timestamp!email) | commits or tags with that timestamp and author (or committer if no author). |
yyyy-mm-dd part of RFC3339 timestamp | all commits and tags with that date |
To refine the match to a single commit, use a 1-origin index suffix separated by '#'. Thus "<2000-02-06T09:35:10Z>" can match multiple commits, but "<2000-02-06T09:35:10Z#2>" matches only the second in the set.
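The matching rules for timestamps, action stamps and the '#' index suffix can be sketched in Python as follows. This is only an illustration of the rules stated above, not reposurgeon's own code; the names action_stamp and resolve, and the representation of events as (timestamp, email) pairs, are assumptions made for the example.

```python
def action_stamp(timestamp, email):
    """Build an action stamp in the timestamp!email form described above."""
    return f"{timestamp}!{email}"


def resolve(reference, events):
    """Return the events matched by a date or action-stamp reference.

    events is a list of (timestamp, email) pairs in event order
    (a hypothetical stand-in for commits and tags).
    An optional '#n' suffix selects the n-th match, counting from 1.
    """
    base, _, index = reference.partition("#")
    matches = [
        (ts, email)
        for ts, email in events
        if base == ts                          # full RFC3339 timestamp
        or base == action_stamp(ts, email)     # timestamp!email
        or base == ts[:10]                     # yyyy-mm-dd part only
    ]
    if index:
        return [matches[int(index) - 1]]       # 1-origin index suffix
    return matches


events = [
    ("2000-02-06T09:35:10Z", "alice@example.com"),
    ("2000-02-06T09:35:10Z", "bob@example.com"),
    ("2001-01-01T00:00:00Z", "alice@example.com"),
]
second = resolve("2000-02-06T09:35:10Z#2", events)
```

Here the bare timestamp matches two events, and the '#2' suffix narrows the result to the second of them.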
text search
A text search normally matches against the comment fields of commits and annotated tags, or against their author/committer names, or against the names of tags; also the text of passthrough objects.
The scope of a text search can be changed with qualifier letters after the trailing slash. These are as follows:
letter | interpretation |
a | author name in commit |
b | branch name in commit; also matches blobs referenced by commits on matching branches, and tags which point to commits on matching branches. |
c | comment text of commit or tag |
r | committish reference in tag or reset |
p | text in passthrough |
t | tagger in tag |
n | name of tag |
B | blob content |
Multiple qualifier letters can add more search scopes.
(The “b” qualifier replaces the branchset syntax in earlier versions of reposurgeon.)
paths
By default, a path is related to a commit if the latter has a fileop that touches that file path - modifies that change it, deletes that remove it, renames and copies that have it as a source or target. When the 'c' flag is in use the meaning changes: the paths related to a commit become all paths that would be present in a checkout for that commit.
A path literal matches a commit if and only if the path literal is exactly one of the paths related to the commit (no prefix or suffix operation is done). In particular a path literal won't match if it corresponds to a directory in the chosen repository.
A regular expression matches a commit if it matches any path related to the commit anywhere in the path. You can use '^' or '$' if you want the expression to only match at the beginning or end of paths. When the 'a' flag is in use, the path expression selects commits whose every path matches the regular expression. This is not always a subset of commits selected without the 'a' flag because it also selects commits with no related paths (e.g. empty commits, deletealls and commits with empty trees). If you want to avoid those, you can use e.g. '[/regex/] & [/regex/a]'.
The flags 'D', 'M', 'R', 'C', 'N' restrict match checking to the corresponding fileop types. Note that this means an 'a' match is easier (not harder) to achieve. These are no-ops when used with 'c'.
A path or literal matches a blob if it matches any path that appeared in a modification fileop that referred to that blob. To select purely matching blobs or matching commits, compose a path expression with =B or =C.
If you need to embed '[^/]' into your regular expression (e.g. to express "all characters but a slash") you can use a Python string escape such as \x2f.
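The difference between the default "any path" match and the 'a' flag's "every path" match, including the case of commits with no related paths, can be sketched in Python. This is an illustration of the rule only; the function names are hypothetical and paths are modeled as plain lists of strings.

```python
import re


def matches_any(regex, paths):
    """Default behaviour: some related path matches somewhere."""
    return any(re.search(regex, p) for p in paths)


def matches_all(regex, paths):
    """'a' flag behaviour: every related path matches.

    all() over an empty list is True, which is why commits with
    no related paths are selected by the 'a' form.
    """
    return all(re.search(regex, p) for p in paths)


def matches_all_nonempty(regex, paths):
    """Equivalent of '[/regex/] & [/regex/a]': every path, at least one."""
    return matches_any(regex, paths) and matches_all(regex, paths)


mixed = ["src/main.c", "README"]
only_c = ["src/main.c", "src/util.c"]
empty = []
```

With the regular expression '\.c$', the list mixed is selected only by the default form, only_c by both forms, and empty only by the 'a' form.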
function calls
name | interpretation |
min | minimum member of a selection set |
max | maximum member of a selection set |
amp | nonempty selection set becomes all objects, empty set is returned empty |
par | all parents of commits in the argument set |
chn | all children of commits in the argument set |
dsc | all commits descended from the argument set (argument set included) |
anc | all commits whom the argument set is descended from (argument set included) |
pre | events before the argument set; empty if the argument set includes the first event. |
suc | events after the argument set; empty if the argument set includes the last event. |
srt | sort the argument set by event number. |
Set expressions may be combined with the operators | and &; these are, respectively, set union and intersection. The | has lower precedence than intersection, but you may use parentheses '(' and ')' to group expressions in case there is ambiguity (this replaces the curly brackets used in older versions of the syntax).
Any set operation may be followed by '?' to add the set members' neighbors and referents. This extends the set to include the parents and children of all commits in the set, and the referents of any tags and resets in the set. Each blob reference in the set is replaced by all commit events that refer to it. The '?' can be repeated to extend the neighborhood depth. The result of a '?' extension is sorted so the result is in ascending order.
Do set negation with prefix ~; it has higher precedence than & and | but lower than ?
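The operators and their precedence can be modeled with ordered lists of event numbers. The following Python sketch is an assumption about how such ordered sets might behave, not a description of reposurgeon's internals; the helper names are hypothetical.

```python
def union(a, b):
    """a | b: members of a, then members of b not already present."""
    seen = set(a)
    return list(a) + [e for e in b if e not in seen]


def intersection(a, b):
    """a & b: members of a that also occur in b, in a's order."""
    members = set(b)
    return [e for e in a if e in members]


def negation(a, all_events):
    """~a: every event in the repository that is not in a."""
    members = set(a)
    return [e for e in all_events if e not in members]


all_events = list(range(1, 8))   # events 1..7
A, B, C = [1, 2, 3], [3, 4, 5], [7]

# '~A & B | C' groups as '((~A) & B) | C' because ~ binds tighter
# than &, and & binds tighter than |.
result = union(intersection(negation(A, all_events), B), C)
```

Parentheses in a selection expression would correspond to changing the nesting of these calls.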
IMPORT AND EXPORT
reposurgeon can hold multiple repository states in core. Each has a name. At any given time, one may be selected for editing. Commands in this group import repositories, export them, and manipulate the in-core list and the selection.
read [--format=fossil] [directory|-|<infile]
If the contents is a fast-import stream, any "cvs-revision" property on a commit is taken to be a newline-separated list of CVS revision cookies pointing to the commit, and used for reference lifting.
If the contents is a fast-import stream, any "legacy-id" property on a commit is taken to be a legacy ID token pointing to the commit, and used for reference-lifting.
If the read location is a git repository and contains a .git/cvsauthors file (such as is left in place by git cvsimport -A) that file will be read in as if it had been given to the authors read command.
If the read location is a directory, and its repository subdirectory has a file named legacy-map, that file will be read as though passed to a legacy read command.
If the read location is a file and the --format=fossil option is used, the file is interpreted as a Fossil repository.
The --preserve option is interpreted in a way dependent on the type of the incoming repository or stream. Presently it only affects the processing of Subversion repositories; see the section called “WORKING WITH SUBVERSION” for details.
The just-read-in repo is added to the list of loaded repositories and becomes the current one, selected for surgery. If it was read from a plain file and the file name ends with one of the extensions .fi or .svn, that extension is removed from the load list name.
Note: this command does not take a selection set.
write [--legacy] [--format=fossil] [--noincremental] [--callout] [>outfile|-]
Alternatively, if there is no redirect and the argument names a directory, the repository is rebuilt into that directory, with any selection set being ignored; if that target directory is nonempty its contents are backed up to a save directory.
If the write location is a file and the --format=fossil option is used, the file is written in Fossil repository format.
With the --legacy option, the Legacy-ID of each commit is appended to its commit comment at write time. This option is mainly useful for debugging conversion edge cases.
If you specify a partial selection set such that some commits are included but their parents are not, the output will include incremental dump cookies for each branch with an origin outside the selection set, just before the first reference to that branch in a commit. An incremental dump cookie looks like "refs/heads/foo^0" and is a clue to export-stream loaders that the branch should be glued to the tip of a pre-existing branch of the same name. The --noincremental option suppresses this behavior.
When you specify a partial selection set, including a commit object forces the inclusion of every blob to which it refers and every tag that refers to it.
Specifying a partial selection may cause a situation in which some parent marks in merges don't correspond to commits present in the dump. When this happens and --callout option was specified, the write code replaces the merge mark with a callout, the action stamp of the parent commit; otherwise the parent mark is omitted. Importers will fail when reading a stream dump with callouts; it is intended to be used by the graft command.
Specifying a write selection set with gaps in it is allowed but unlikely to lead to good results if it is loaded by an importer.
Property extensions will be omitted from the output if the importer for the preferred repository type cannot digest them.
Note: to examine small groups of commits without the progress meter, use inspect.
choose [reponame]
With no argument, lists the names of the currently stored repositories and their load times. The second column is '*' for the currently selected repository, '-' for others.
drop [reponame]
rename reponame
REBUILDS IN PLACE
reposurgeon can rebuild an altered repository in place. Untracked files are normally saved and restored when the contents of the new repository is checked out (but see the documentation of the “preserve” command for a caveat).
rebuild [directory]
The single argument, if present, specifies the target directory in which to do the rebuild; if the repository read was from a repo directory (and not a git-import stream), it defaults to that directory. If the target directory is nonempty its contents are backed up to a save directory. Files and directories on the repository's preserve list are copied back from the backup directory after repo rebuild. The default preserve list depends on the repository type, and can be displayed with the stats command.
If reposurgeon has a nonempty legacy map, it will be written to a file named legacy-map in the repository subdirectory as though by a legacy write command. (This will normally be the case for Subversion and CVS conversions.)
preserve [file...]
It is only necessary to use this feature if your version-control system lacks a command to list files under version control. Under systems with such a command (which include git and hg), all files that are neither beneath the repository dot directory nor under reposurgeon temporary directories are preserved automatically.
unpreserve [file...]
TIMEQUAKES AND TIMEBUMPS
Modifying a repository so every commit in it has a unique timestamp is often a useful thing to do, so that every commit has a unique action stamp that can be referred to in surgical commands.
timequake
Because commits are checked in ascending order, this logic will normally do the right thing on chains of three or more commits with identical timestamps.
Any timestamp collisions left after this operation are probably cross-branch and have to be individually dealt with using 'timebump' commands.
timebump [seconds]
Those of you twitchy about "rewriting history" should bear in mind that the commit stamps in many older repositories were never very reliable to begin with.
CVS in particular is notorious for shipping client-side timestamps with timezone and DST issues (as opposed to UTC) that don't necessarily compare well with stamps from different clients of the same CVS server. Thus, inducing a timequake in a CVS repo seldom produces effects anywhere near as large as the measurement noise of the repository's own timestamps.
Subversion was somewhat better about this, as commits were stamped at the server, but older Subversion repositories often have sections that predate the era of ubiquitous NTP time.
INFORMATION AND REPORTS
Commands in this group report information about the selected repository.
The output of these commands can individually be redirected to a named output file. Where indicated in the syntax, you can prefix the output filename with “>” and give it as a following argument. If you use “>>” the file is opened for append rather than write.
list [>outfile]
stamp [>outfile]
tip [>outfile]
If a commit is at a branch tip, its tip is its branch name. If it has only one child, its tip is the child's tip. If it has multiple children, then if there is a child with a matching branch name its tip is the child's tip. Otherwise this function throws a recoverable error.
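The tip rules above amount to a short recursive walk down the commit graph. The Python sketch below illustrates them under two assumptions that the text does not state: a commit is taken to be "at a branch tip" when it has no children, and commits are modeled by a hypothetical Commit class carrying only a branch name and a child list.

```python
class Commit:
    """Hypothetical minimal commit: a branch name plus child commits."""

    def __init__(self, branch, children=None):
        self.branch = branch
        self.children = children or []


def tip(commit):
    """Return the name of the branch tip that owns this commit."""
    if not commit.children:              # assumed meaning of "at a branch tip"
        return commit.branch
    if len(commit.children) == 1:        # single child: inherit its tip
        return tip(commit.children[0])
    for child in commit.children:        # several children: follow own branch
        if child.branch == commit.branch:
            return tip(child)
    raise ValueError("cannot determine branch tip")   # recoverable error


master_tip = Commit("refs/heads/master")
feature_tip = Commit("refs/heads/feature")
fork = Commit("refs/heads/master", [feature_tip, master_tip])
root = Commit("refs/heads/master", [fork])
```

In this example root and fork both resolve to refs/heads/master, because the fork has a child on its own branch.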
tags [>outfile]
stats [repo-name...] [>outfile]
count [>outfile]
inspect [>outfile]
graph [>outfile]
You may find a script like this useful:
    graph $1 >/tmp/foo$$
    shell dot </tmp/foo$$ -Tpng | display -; rm /tmp/foo$$
You can substitute in your own preferred image viewer, of course.
sizes [>outfile]
The numbers are not an exact measure of storage size: they are intended mainly as a way to get information on how to efficiently partition a repository that has become large enough to be unwieldy.
Supports > redirection.
lint [>outfile]
Options to issue only partial reports are supported; "lint --options" or "lint -?" lists them.
The options and output format of this command are unstable; they may change without notice as more sanity checks are added.
when >timespec
SURGICAL OPERATIONS
These are the operations the rest of reposurgeon is designed to support.
squash [policy...]
Normally, when a commit is squashed, its file operation list (and any associated blob references) gets either prepended to the beginning of the operation list of each of the commit's children or appended to the operation list of each of the commit's parents. Then children of a deleted commit get it removed from their parent set and its parents added to their parent set.
The analogous operation is performed on commit comments, so no comment text is ever outright discarded. Exception: comments consisting of "*** empty log message ***", as generated by CVS, are ignored.
The default is to squash forward, modifying children; but see the list of policy modifiers below for how to change this.
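The default squash-forward behavior described above can be sketched in Python. This is a simplified model, not reposurgeon's implementation: the Commit class and the squash_forward function are hypothetical, fileops are plain strings, and the later normalization of the merged operation lists is left out.

```python
CVS_EMPTY = "*** empty log message ***"


class Commit:
    """Hypothetical commit with a comment, fileops and graph links."""

    def __init__(self, comment, fileops, parents=None):
        self.comment = comment
        self.fileops = list(fileops)
        self.parents = list(parents or [])
        self.children = []
        for parent in self.parents:
            parent.children.append(self)


def squash_forward(commit):
    """Fold a commit into each of its children and unlink it."""
    for child in commit.children:
        # Prepend the squashed commit's fileops to the child's list.
        child.fileops = commit.fileops + child.fileops
        # Do the analogous thing for comments, skipping the CVS marker.
        if commit.comment.strip() != CVS_EMPTY:
            child.comment = commit.comment + child.comment
        # The child drops the squashed commit and gains its parents.
        i = child.parents.index(commit)
        child.parents[i:i + 1] = commit.parents
    for parent in commit.parents:
        i = parent.children.index(commit)
        parent.children[i:i + 1] = commit.children
    commit.parents, commit.children = [], []


a = Commit("A\n", ["M a.txt"])
b = Commit("B\n", ["M b.txt"], [a])
c = Commit("C\n", ["M c.txt"], [b])
squash_forward(b)
```

After the call, c carries the fileops and comment text of b and has a as its parent, which is the sense in which the changes end up attributed to the first surviving child.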
Warning
It is easy to get the bounds of a squash command wrong, with confusing and destructive results. Beware thinking you can squash on a selection set to merge all commits except the last one into the last one; what you will actually do is to merge all of them to the first commit after the selected set.
Following all operation moves, every one of the altered file operation lists is reduced to a shortest normalized form. The normalized form detects various combinations of modification, deletion, and renaming and simplifies the operation sequence as much as it can without losing any information.
After canonicalization, a file op list may still end up containing multiple M operations on the same file. Normally the tool utters a warning when this occurs but does not try to resolve it.
The following modifiers change these policies:
--delete
--coalesce
--pushback
--pushforward
--tagforward
--tagback
--quiet
--complain
--empty-only
Under any of these policies except “--delete”, deleting a commit that has children does not back out the changes made by that commit, as they will still be present in the blobs attached to versions past the end of the deletion set. All a delete does when the commit has children is lose the metadata information about when and by whom those changes were actually made; after the delete any such changes will be attributed to the first undeleted children of the deleted commits. It is expected that this command will be useful mainly for removing commits mechanically generated by repository converters such as cvs2svn.
delete [policy...]
divide parent [child]
If the repo was named 'foo', you will normally end up with two repos named 'foo-early' and 'foo-late' (option and feature events at the beginning of the early segment will be duplicated onto the beginning of the late one). But if the commit graph would remain connected through another path after the cut, the behavior changes. In this case, if the parent and child were on the same branch 'qux', the branch segments are renamed 'qux-early' and 'qux-late' but the repo is not divided.
expunge [--notagify] [path | /regexp/]...
All filemodify (M) operations and delete (D) operations involving a matched file in the selected set of events are disconnected from the repo and put in a removal set. Renames are followed as the tool walks forward in the selection set; each triggers a warning message. If a selected file is a copy (C) target, the copy will be deleted and a warning message issued. If a selected file is a copy source, the copy target will be added to the list of paths to be deleted and a warning issued.
After file expunges have been performed, any commits with no remaining file operations will be removed, and any tags pointing to them. By default each deleted commit is replaced with a tag of the form 'emptycommit-ident' on the preceding commit unless --notagify is specified as an argument. Commits with deleted fileops pointing both in and outside the path set are not deleted, but are cloned into the removal set.
The removal set is not discarded. It is assembled into a new repository named after the old one with the suffix "-expunges" added. Thus, this command can be used to carve a repository into sections by file path matches.
tagify [--canonicalize] [--tipdeletes] [--tagify-merges]
The name of the generated tag will be 'emptycommit-ident', where ident is generated from the legacy ID of the deleted commit, or from its mark, or from its index in the repository, with a disambiguation suffix if needed.
With the --canonicalize option, tagify tries harder to detect trivial commits by first ensuring that all fileops of selected commits will have an actual effect when processed by fast-import.
With the --tipdeletes option, tagify also considers branch tips with only deleteall fileops to be candidates for tagification. The corresponding tags get names of the form 'tipdelete-branchname' rather than the default 'emptycommit-ident'.
With the --tagify-merges option, tagify also tagifies merge commits that have no fileops. When this is done the merge link is moved to the tagified commit's parent.
coalesce [--debug|--changelog] [timefuzz]
The optional second argument, if present, is a maximum time separation in seconds; the default is 90 seconds.
The default selection set for this command is =C, all commits. Occasionally you may want to restrict it, for example to avoid coalescing unrelated cliques of "*** empty log message ***" commits from CVS lifts.
With the --debug option, show messages about mismatches.
With the --changelog option, any commit with a comment containing the string 'empty log message' (such as is generated by CVS) and containing exactly one file operation modifying a path ending in ChangeLog is treated specially. Such ChangeLog commits are considered to match any commit before them by content, and will coalesce with it if the committer matches and the commit separation is small enough. This option handles a convention used by Free Software Foundation projects.
split {at|by} item
The commit is copied and inserted into a new position in the event sequence, immediately following itself; the duplicate becomes the child of the original, and replaces it as parent of the original's children. Commit metadata is duplicated; the new commit then gets a new mark. If the new commit has a legacy ID, the suffix '.split' is appended to it.
Finally, some file operations - starting at the one matched or indexed by the split argument - are moved forward from the original commit into the new one. Legal indices are 2-n, where n is the number of file operations in the original commit.
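The index form of the split can be pictured as cutting the original commit's operation list in two. The Python sketch below shows only that list arithmetic, with a hypothetical function name and fileops modeled as strings; it does not create commits or marks.

```python
def split_fileops(fileops, index):
    """Split a fileop list at a 1-origin index.

    Operations from position index onward move to the new commit;
    the earlier ones stay in the original. Legal indices are 2..n,
    so neither half can end up empty.
    """
    n = len(fileops)
    if not 2 <= index <= n:
        raise ValueError(f"index must be between 2 and {n}")
    return fileops[:index - 1], fileops[index - 1:]


ops = ["M Makefile", "M src/main.c", "D old.txt"]
original_ops, new_ops = split_fileops(ops, 2)
```

Index 1 is rejected because it would leave the original commit with no file operations at all.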
add {D path | M perm mark path | R source target | C source target}
For a D operation to be valid there must be an M operation for the path in the commit's ancestry. For an M operation to be valid, the 'perm' part must be a token ending with 755 or 644 and the 'mark' must refer to a blob that precedes the commit location. For an R or C operation to be valid, there must be an M operation for the source in the commit's ancestry.
remove [index | path | deletes] [to commit]
If the “to” clause is present, the removed op is appended to the commit specified by the following singleton selection set. This option cannot be combined with “deletes”.
Note that this command does not attempt to scavenge blobs even if the deleted fileop might be the only reference to them. This behavior may change in a future release.
blob
renumber
A side effect of this command is to clean up stray "done" passthroughs that may have entered the repository via graft operations. After a renumber, the repository will have at most one "done" and it will be at the end of the events.
dedup
mailbox_out [>outfile]
The output from this command can optionally be redirected to a named output file. Prefix the filename with “>” and give it as a following argument.
May have an option --filter, followed by = and a /-enclosed regular expression. If this is given, only headers with names matching it are emitted. In this context the name of the header includes its trailing colon.
mailbox_in [--create] [--empty-only] [<infile] [--changed >outfile]
Users should be aware that modifying an Event-Number or Event-Mark field will change which event the update from that message is applied to. This is unlikely to have good results.
The header CheckText, if present, is examined to see if the comment text of the associated event begins with it. If not, the mailbox modification is aborted. This helps ensure that you are landing updates on the events you intend.
If the “--create” modifier is present, new tags and commits will be appended to the repository. In this case it is an error for a tag name to match any existing tag name. Commit objects are created with no fileops. If Committer-Date or Tagger-Date fields are not present they are filled in with the time at which this command is executed. If Committer or Tagger fields are not present, reposurgeon will attempt to deduce the user's git-style identity and fill it in. If a singleton commit set was specified for commit creations, the new commits are made children of that commit.
Otherwise, if the Event-Number and Event-Mark fields are absent, the mailbox_in logic will attempt to match the commit or tag first by Legacy-ID, then by a unique committer ID and timestamp pair.
If output is redirected and the modifier “--changed” appears, a minimal set of modifications actually made is written to the output file in a form that can be fed back in. Supports > redirection.
If the option --empty-only is given, this command will throw a recoverable error if it tries to alter a message body that is neither empty nor consists of the CVS empty-comment marker.
setfield attribute value
Attempts to set nonexistent attributes are ignored. Valid values for the attribute are internal Python field names; in particular, for commits, “comment” and “branch” are legal. Consult the source code for other interesting values.
The special fieldnames 'author', 'commitdate' and 'authdate' apply only to commits in the range. The latter two set attribution dates. The former sets the author's name and email address (assuming the value can be parsed for both), copying the committer timestamp. The author's timezone may be deduced from the email address.
setperm 100644|100755|120000 path...
append [--rstrip] [>text]
If the option --rstrip is given, the comment is right-stripped before the new text is appended.
filter [--shell|--regex|--replace|--dedos]
In any mode other than --dedos, attempting to specify a selection set including both blobs and non-blobs (that is, commits or tags) throws an error. Inline content in commits is filtered when the selection set contains (only) blobs and the commit is within the range bounded by the earliest and latest blob in the specification.
When filtering blobs, if the command line contains the magic cookie '%PATHS%' it is replaced with a space-separated list of all paths that reference the blob.
With --shell, the remainder of the line specifies a filter as a shell command. Each blob or comment is presented to the filter on standard input; the content is replaced with whatever the filter emits to standard output.
With --regex, the remainder of the line is expected to be a Python regular expression substitution written as /from/to/, with from and to being passed as arguments to the standard re.sub() function, which is applied to modify the content. Actually, any non-space character will work as a delimiter in place of the /; this makes it easier to use / in patterns. Ordinarily only the first such substitution is performed; putting 'g' after the slash replaces globally, and a numeric literal gives the maximum number of substitutions to perform. Other flags available restrict substitution scope - 'c' for comment text only, 'C' for committer name only, 'a' for author names only. Note that parsing of a --regex argument will be confused by any substring consisting of whitespace followed by #; use "\s" rather than whitespace to avoid this.
With --replace, the behavior is like --regex but the expressions are not interpreted as regular expressions. (This is slightly faster.)
With --dedos, DOS/Windows-style \r\n line terminators are replaced with \n.
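The substitution-count rules for --regex correspond closely to the count argument of Python's re.sub(). The following is a minimal illustrative sketch, not reposurgeon's own code; the helper names apply_regex and dedos are hypothetical, and only the 'g' flag and a numeric limit are modelled (the 'c', 'C' and 'a' scope flags are not).

```python
import re

def apply_regex(content, pattern, replacement, flags=""):
    """Apply a /from/to/ style substitution to a string.

    Hypothetical helper: with no flags only the first match is replaced,
    'g' replaces every match, and a numeric literal caps the number of
    substitutions performed.
    """
    if "g" in flags:
        count = 0  # re.sub() treats count=0 as "replace all matches"
    else:
        digits = "".join(ch for ch in flags if ch.isdigit())
        count = int(digits) if digits else 1
    return re.sub(pattern, replacement, content, count=count)

def dedos(content):
    """Replace DOS/Windows CR-LF line terminators with a bare LF."""
    return content.replace("\r\n", "\n")
```

With these definitions, apply_regex("foo foo foo", "foo", "bar", "2") rewrites only the first two occurrences.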
transcode codec
Attempting to specify a selection set including both blobs and non-blobs (that is, commits or tags) throws an error. Inline content in commits is filtered when the selection set contains (only) blobs and the commit is within the range bounded by the earliest and latest blob in the specification.
The encoding argument must name one of the codecs known to the Python standard codecs library. In particular, 'latin-1' is a valid codec name.
Errors in this command are fatal, because an error may leave repository objects in a damaged state.
The theory behind the design of this command is that the repository might contain a mixture of encodings used to enter commit metadata by different people at different times. After using =I to identify metadata containing non-Unicode high bytes in text, a human must use context to identify which particular encodings were used in particular event spans and compose appropriate transcode commands to fix them up.
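What a transcode step amounts to can be sketched in Python, assuming the target encoding is UTF-8. This is an illustration of the idea only, not reposurgeon's implementation; the function names are hypothetical, and treating "fails to decode as UTF-8" as the test for suspect metadata is an assumption of this sketch.

```python
def needs_transcoding(raw):
    """Return True if the bytes are not valid UTF-8 (assumed test)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False

def transcode(raw, codec):
    """Decode bytes written in the named codec and re-encode as UTF-8.

    A wrong codec guess either raises an error or silently produces
    mojibake, which is why a human must choose the codec from context.
    """
    return raw.decode(codec).encode("utf-8")
```

For example, the Latin-1 bytes b"Andr\xe9" are flagged by needs_transcoding() and become valid UTF-8 after transcode(raw, "latin-1").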
edit
Normally this command ignores blobs because mailbox_out does. However, if you specify a selection set consisting of a single blob, your editor will be called directly on the blob file.
Supports < and > redirection.
timeoffset offset [timezone]
Optionally you may also specify another argument in the form [+-]hhmm, a timezone literal to apply. To apply a timezone without an offset, use an offset literal of +0 or -0.
unite [--prune] reponame...
The root of each repo (other than the oldest repo) will be grafted as a child to the last commit in the dump with a preceding commit date. This will produce a union repository with one branch for each part. Running last to first, duplicate tag and branch names will be disambiguated using the source repository name (thus, recent duplicates will get priority over older ones). After all grafts, marks will be renumbered.
The name of the new repo will be the names of all parts concatenated, separated by '+'. It will have no source directory or preferred system type.
With the option --prune, at each join D operations for every ancestral file existing will be prepended to the root commit, then it will be canonicalized using the rules for squashing; the effect will be that only files with properly matching M, R, and C operations in the root survive.
graft [--prune] reponame
If the selection set is of size 1, it must identify a single commit in the currently chosen repo; in this case the named repo's root will become a child of the specified commit. If the selection set is empty, the named repo must contain one or more callouts matching commits in the currently chosen repo.
Labels and branches in the named repo are prefixed with its name; then it is grafted to the selected one. Any other callouts in the named repo are also resolved in the context of the currently chosen one. Finally, the named repo is removed from the load list.
With the option --prune, prepend a deleteall operation into the root of the grafted repository.
path [source] rename [--force] [target]
Ordinarily, if the target path already exists in the fileops, or is visible in the ancestry of the commit, this command throws an error. With the --force option, these checks are skipped.
paths [{sub|sup}] [dirname] [>outfile]
With the 'sub' modifier, take a second argument that is a directory name and prepend it to every path. With the 'sup' modifier, strip any directory argument from the start of the path if it appears there; with no argument, strip the first directory component from every path.
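The path arithmetic behind the two modifiers can be sketched in Python as follows. The function names are hypothetical and this is not reposurgeon's code; in particular, leaving a path with no directory component untouched is an assumption of the sketch.

```python
def sub_path(path, dirname):
    """'sub' sketch: prepend a directory name to a path."""
    return dirname.rstrip("/") + "/" + path

def sup_path(path, dirname=None):
    """'sup' sketch: strip a leading directory from a path.

    With a directory argument, strip it only if the path starts with it.
    With no argument, strip the first directory component.
    """
    if dirname is None:
        head, sep, tail = path.partition("/")
        return tail if sep else path  # assumption: bare filenames are kept
    prefix = dirname.rstrip("/") + "/"
    return path[len(prefix):] if path.startswith(prefix) else path
```

Applying sub_path and then sup_path with the same directory name returns the original path.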
merge
unmerge
It is equivalent to reparent --rebase first_parent,commit, where commit is the same selection set as used with unmerge and first_parent is a set resolving commit's first parent (see the reparent command below).
The main interest of unmerge is that you don't have to find and specify the first parent yourself, saving time and avoiding errors when nearby surgery would make a manual first parent argument stale.
reparent [options...] [policy]
Selection set:
    # this makes 17 the parent of 33
    17,33 reparent
    # this also makes 17 the parent of 33
    33,17 reparent
    # this makes 33 a root (parentless) commit
    33 reparent
    # this makes 33 an octopus merge commit. its first parent
    # is commit 15, second parent is 17, and third parent is 22
    22,33,15,17 reparent
Options:
--use-order
    # this makes 33 the parent of 17
    33|17 reparent --use-order
    # this makes 17 an octopus merge commit. its first parent
    # is commit 22, second parent is 33, and third parent is 15
    22,33,15|17 reparent --use-order
Because ancestor commit events must appear before their descendants, giving a commit with a low event number a parent with a high event number triggers a re-sort of the events. A re-sort assigns different event numbers to some or all of the events. Re-sorting only works if the reparenting does not introduce any cycles. To swap the order of two commits that have an ancestor–descendant relationship without introducing a cycle during the process, you must reparent the descendant commit first.
Policy:
--rebase
reorder [--quiet]
Older revision control systems tracked change history on a per-file basis, rather than as a series of atomic changesets, which often made it difficult to determine the relationships between changes. Some tools which convert a history from one revision control system to another attempt to infer changesets by comparing file commit comment and time-stamp against those of other nearby commits, but such inference is a heuristic and can easily fail. In the best case, when inference fails, a range of commits in the resulting conversion which should have been coalesced into a single changeset instead end up as a contiguous range of separate commits. This situation typically can be repaired easily enough with the coalesce or squash commands. However, in the worst case, numerous commits from several different topics, each of which should have been one or more distinct changesets, may end up interleaved in an apparently chaotic fashion. To deal with such cases, the commits need to be re-ordered, so that those pertaining to each particular topic are clumped together, and then possibly squashed into one or more changesets pertaining to each topic. This command, reorder, can help with the first task; the squash command with the second.
Selected commits are re-arranged in the order specified; for instance: ":7,:5,:9,:3 reorder". The specified commit range must be contiguous; each commit must be accounted for after re-ordering. Thus, for example, ':5' cannot be omitted from ":7,:5,:9,:3 reorder". (To drop a commit, use the delete or squash command.) The selected commits must represent a linear history; however, the lowest numbered commit being re-ordered may have multiple parents, and the highest numbered may have multiple children.
Re-ordered commits and their immediate descendants are inspected for rudimentary fileops inconsistencies. Warns if re-ordering results in a commit trying to delete, rename, or copy a file before it was ever created. Likewise, warns if all of a commit's fileops become no-ops after re-ordering. Other fileops inconsistencies may arise from re-ordering, both within the range of affected commits and beyond; for instance, moving a commit which renames a file ahead of a commit which references the original name. Such anomalies can be discovered via manual inspection and repaired with the add and remove (and possibly path) commands. Warnings can be suppressed with --quiet.
In addition to adjusting their parent/child relationships, re-ordering commits also re-orders the underlying events since ancestors must appear before descendants, and blobs must appear before commits which reference them. This means that events within the specified range will have different event numbers after the operation.
branch branchname {rename|delete} [arg]
For a 'rename', the third argument may be any token that is a syntactically valid branch name (but not the name of an existing branch).
For a 'delete', no third argument is required. The name portion of a delete may be a regexp wrapped in //; if so, all objects of the specified type with names matching the regexp are deleted. This is useful for mass deletion of junk tags such as CVS branch-root tags.
For either name, if it does not contain a '/' the prefix 'refs/heads' is prepended.
tag tagname {move|rename|delete} [arg]
Creation is a special case. First argument is a name, which must not be an existing tag. Takes a singleton event second argument which must point to a commit. A tag object pointing to the commit is created and inserted just after the last tag in the repo (or just after the last commit if there are no tags). The tagger, committish, and comment fields are copied from the commit's committer, mark, and comment fields.
Otherwise, first argument must be an existing tag name; second argument must be one of the verbs “move”, “rename”, or “delete”.
For a “move”, a third argument must be a singleton selection set. For a “rename”, the third argument may be any token that is a syntactically valid tag name (but not the name of an existing tag). For a “delete”, no third argument is required.
The tagname may use backslash escapes interpreted by the Python string-escape codec, such as \s.
The behavior of this command is complex because features which present as tags may be any of three things: (1) True tag objects, (2) lightweight tags, actually sequences of commits with a common branchname beginning with “refs/tags” - in this case the tag is considered to point to the last commit in the sequence, (3) Reset objects. These may occur in combination; in fact, stream exporters from systems with annotation tags commonly express each of these as a true tag object (1) pointing at the tip commit of a sequence (2) in which the basename of the common branch field is identical to the tag name. An exporter that generates lightweight-tagged commit sequences (2) may or may not generate resets pointing at their tip commits.
This command tries to handle all combinations in a natural way by doing up to three operations on any true tag, commit sequence, and reset matching the source name. In a rename, all are renamed together. In a delete, any matching tag or reset is deleted; then matching branch fields are changed to match the branch of the unique descendent of the tagged commit, if there is one. When a tag is moved, no branch fields are changed and a warning is issued.
Attempts to delete a lightweight tag may fail with the message “couldn't determine a unique successor”. When this happens, the tag is on a commit with multiple children that have different branch labels. There is a hole in the specification of git fast-import streams that leaves it uncertain how branch labels can be safely reassigned in this case; rather than do something risky, reposurgeon throws a recoverable error.
reset resetname {create|move|rename|delete} [arg]
In the other modes, the first argument must match an existing reset name; second argument must be one of the verbs “move”, “rename”, or “delete”.
The reset name may use backslash escapes interpreted by the Python string-escape codec, such as \s.
For a “move”, a third argument must be a singleton selection set. For a “rename”, the third argument may be any token that matches a syntactically valid reset name (but not the name of an existing reset). For a “delete”, no third argument is required.
For either name, if it does not contain a “/” the prefix “heads/” is prepended. If it does not begin with “refs/”, “refs/” is prepended.
An argument matches a reset's name if it is either the entire reference (refs/heads/FOO or refs/tags/FOO for some value of FOO) or the basename (e.g. FOO), or a suffix of the form heads/FOO or tags/FOO. An unqualified basename is assumed to refer to a head.
When a reset is renamed, commit branch fields matching the reset are renamed with it to match. When a reset is deleted, matching branch fields are changed to match the branch of the unique descendent of the tip commit of the associated branch, if there is one. When a reset is moved, no branch fields are changed.
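The name-qualification rule for reset names (prepend “heads/” when there is no slash, then “refs/” when that prefix is missing) can be written out as a short Python sketch. The function names are hypothetical; this illustrates the rule as documented above rather than reposurgeon's actual code.

```python
def qualify_reset_name(name):
    """Expand a possibly abbreviated reset name to a full reference."""
    if "/" not in name:
        name = "heads/" + name  # an unqualified basename refers to a head
    if not name.startswith("refs/"):
        name = "refs/" + name
    return name

def reset_matches(argument, reference):
    """Return True if the argument names the given full reference."""
    return qualify_reset_name(argument) == reference
```

Under this rule FOO, heads/FOO and refs/heads/FOO all resolve to refs/heads/FOO, while tags/FOO resolves to refs/tags/FOO.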
debranch source-branch... [target-branch]
The history of the source branch is merged into the history of the target branch, becoming the history of a subdirectory with the name of the source branch. Any resets of the source branch are removed.
strip [blobs|reduce]
With the modifier 'blobs', replace each blob in the repository with a small, self-identifying stub, leaving all metadata and DAG topology intact. This is useful when you are reporting a bug, for reducing large repositories to test cases of manageable size.
A selection set is effective only with the 'blobs' option, defaulting to all blobs. The 'reduce' mode always acts on the entire repository.
With the modifier 'reduce', perform a topological reduction that throws out uninteresting commits. If a commit has all file modifications (no deletions or copies or renames) and has exactly one ancestor and one descendant, then it may be boring. To be fully boring, it must also not be referred to by any tag or reset. Interesting commits are not boring, or have a non-boring parent or non-boring child.
With no modifiers, this command strips blobs.
ignores [rename] [translate] [defaults]
If the rename modifier is present, this command attempts to rename all ignore-pattern files to whatever is appropriate for the preferred type - e.g. .gitignore for git, .hgignore for hg, etc. This option does not cause any translation of the ignore files it renames.
If the translate modifier is present, syntax translation of each ignore file is attempted. At present, the only transformation the code knows is to prepend a 'syntax: glob' header if the preferred type is hg.
If the defaults modifier is present, the command attempts to prepend these default patterns to all ignore files. If no ignore file is created by the first commit, it will be modified to create one containing the defaults. This command will error out on prefer types that have no default ignore patterns (git and hg, in particular). It will also error out when it knows the import tool has already set default patterns.
attribution [selection] {show | set | delete | prepend | append} [args]
Attributions upon which to operate are selected in much the same way as events are selected, as described in SELECTION SYNTAX. selection is an expression composed of 1-origin attribution-sequence numbers, '$' for last attribution, '..' ranges, comma-separated items, '(...)' grouping, set operations '|' union, '&' intersection, and '~' negation, and function calls @min(), @max(), @amp(), @pre(), @suc(), @srt(). Attributions can also be selected by visibility set '=C' for committers, '=A' for authors, and '=T' for taggers. Finally, /regex/ will attempt to match the Python regular expression regex against an attribution name and email address; '/n' limits the match to only the name, and '/e' to only the email address.
With the exception of show, all actions require an explicit event selection upon which to operate. Available actions are:
[selection] [show] [>file]
selection set name [email] [date]
selection set [name] email [date]
selection set [name] [email] date
[selection] delete
[selection] prepend name [email] [date]
[selection] prepend [name] email [date]
If name is omitted, an attempt is made to infer it from email by trying to match email against an existing attribution of the event, with preference given to the attribution before which the new attribution is being inserted. Similarly, email is inferred from an existing matching name. Likewise, for date.
As a convenience, if selection is empty or not specified a new author is prepended to the author list.
It is presently an error to insert a new committer or tagger attribution. To change a committer or tagger, use set instead.
[selection] append name [email] [date]
[selection] append [name] email [date]
If name is omitted, an attempt is made to infer it from email by trying to match email against an existing attribution of the event, with preference given to the attribution after which the new attribution is being inserted. Similarly, email is inferred from an existing matching name. Likewise, for date.
As a convenience, if selection is empty or not specified a new author is appended to the author list.
It is presently an error to insert a new committer or tagger attribution. To change a committer or tagger, use set instead.
REFERENCE LIFTING¶
This group of commands is meant for fixing up references in commits that are in the format of older version control systems. The general workflow is this: first, go over the comment history and change all old-fashioned commit references into machine-parseable cookies. Then, automatically turn the machine-parseable cookies into action stamps. The point of dividing the process this way is that the first part is hard for a machine to get right, while the second part is prone to errors when a human does it.
A Subversion cookie is a comment substring of the form [[SVN:ddddd]] (example: [[SVN:2355]]), with the revision read directly via the Subversion exporter, deduced from git-svn metadata, or matching a $Revision$ header embedded in blob data for the filename.
A CVS cookie is a comment substring of the form [[CVS:filename:revision]] (example: [[CVS:src/README:1.23]]), with the revision matching a CVS $Id$ or $Revision$ header embedded in blob data for the filename.
A mark cookie is of the form [[:dddd]] and is simply a reference to the specified mark. You may want to hand-patch this in when one of the previous forms is inconvenient.
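The three cookie forms are regular enough to be recognized mechanically. The Python sketch below shows one way to find them in a comment; the patterns and the find_cookies name are assumptions made for illustration and are not taken from reposurgeon's source.

```python
import re

# Assumed patterns for the three cookie forms described above.
SVN_COOKIE = re.compile(r"\[\[SVN:(\d+)\]\]")
CVS_COOKIE = re.compile(r"\[\[CVS:([^:\]]+):([0-9.]+)\]\]")
MARK_COOKIE = re.compile(r"\[\[(:\d+)\]\]")

def find_cookies(comment):
    """Return (kind, payload) pairs, grouped by kind: SVN, CVS, then mark."""
    found = []
    found += [("SVN", rev) for rev in SVN_COOKIE.findall(comment)]
    found += [("CVS", pair) for pair in CVS_COOKIE.findall(comment)]
    found += [("mark", mark) for mark in MARK_COOKIE.findall(comment)]
    return found
```

The CVS payload is a (filename, revision) tuple; the other two payloads are plain strings.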
An action stamp is an RFC3339 timestamp, followed by a '!', followed by an author email address (author rather than committer because that timestamp is not changed when a patch is replayed on to a branch). It attempts to refer to a commit without being VCS-specific. Thus, instead of "commit 304a53c2" or "r2355", "2011-10-25T15:11:09Z!fred@foonly.com".
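As an illustration of the format, the following Python sketch builds and splits an action stamp. The function names are hypothetical, and the sketch assumes the timestamp is rendered in UTC with a trailing Z, as in the example above.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def action_stamp(when, email):
    """Build an action stamp from a timezone-aware datetime and an email."""
    utc = when.astimezone(timezone.utc)  # normalize to UTC before formatting
    return utc.strftime(STAMP_FORMAT) + "!" + email

def parse_action_stamp(stamp):
    """Split an action stamp into a UTC datetime and an email address."""
    date_part, _, email = stamp.partition("!")
    when = datetime.strptime(date_part, STAMP_FORMAT)
    return when.replace(tzinfo=timezone.utc), email
```

Because the stamp carries only a timestamp and an address, it stays meaningful after the history has been moved to a different version-control system.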
The following git aliases allow git to work directly with action stamps. Append them to your ~/.gitconfig; if you already have an [alias] section, leave off the first line.
    [alias]
        # git stamp <commit-ish> - print a reposurgeon-style action stamp
        stamp = show -s --format='%cI!%ce'

        # git scommit <stamp> <rev-list-args> - list most recent commit that matches <stamp>.
        # Must also specify a branch to search or --all, after these arguments.
        scommit = "!f(){ d=${1%%!*}; a=${1##*!}; arg=\"--until=$d -1\"; if [ $a != $1 ]; then arg=\"$arg --committer=$a\"; fi; shift; git rev-list $arg ${1:+\"$@\"}; }; f"

        # git scommits <stamp> <rev-list-args> - as above, but list all matching commits.
        scommits = "!f(){ d=${1%%!*}; a=${1##*!}; arg=\"--until=$d --after $d\"; if [ $a != $1 ]; then arg=\"$arg --committer=$a\"; fi; shift; git rev-list $arg ${1:+\"$@\"}; }; f"

        # git smaster <stamp> - list most recent commit on master that matches <stamp>.
        smaster = "!f(){ git scommit \"$1\" master --first-parent; }; f"
        smasters = "!f(){ git scommits \"$1\" master --first-parent; }; f"

        # git shs <stamp> - show the commits on master that match <stamp>.
        shs = "!f(){ stamp=$(git smasters $1); shift; git show ${stamp:?not found} $*; }; f"

        # git slog <stamp> <log-args> - start git log at <stamp> on master
        slog = "!f(){ stamp=$(git smaster $1); shift; git log ${stamp:?not found} $*; }; f"

        # git sco <stamp> - check out most recent commit on master that matches <stamp>.
        sco = "!f(){ stamp=$(git smaster $1); shift; git checkout ${stamp:?not found} $*; }; f"
There is a rare case in which an action stamp will not refer uniquely to one commit. It is theoretically possible that the same author might check in revisions on different branches within the one-second resolution of the timestamps in a fast-import stream. There is nothing to be done about this; tools using action stamps need to be aware of the possibility and throw a warning when it occurs.
In order to support reference lifting, reposurgeon internally builds a legacy-reference map that associates revision identifiers in older version-control systems with commits. The contents of this map come from three places: (1) cvs2svn:rev properties if the repository was read from a Subversion dump stream, (2) $Id$ and $Revision$ headers in repository files, and (3) the .git/cvs-revisions created by git cvsimport.
The detailed sequence for lifting possible references is this: first, find possible CVS and Subversion references with the references or =N visibility set; then replace them with equivalent cookies; then run references lift to turn the cookies into action stamps (using the information in the legacy-reference map) without having to do the lookup by hand.
references [list|edit|lift] [>outfile]
With the modifier 'edit', edit the set where revision IDs are found. This version of the command supports < and > redirection. This is equivalent to '=N edit'.
With the modifier "lift", attempt to resolve Subversion and CVS cookies in comments into action stamps using the legacy map. An action stamp is a timestamp/email/sequence-number combination uniquely identifying the commit associated with that blob, as described in the section called “TRANSLATION STYLE”.
It is not guaranteed that every such reference will be resolved, or even that any at all will be. Normally all references in history from a Subversion repository will resolve, but CVS references are less likely to be resolvable.
CHANGELOGS¶
CVS and Subversion do not have separated notions of committer and author for changesets; when lifted to a VCS that does, like git, their one author field is used for both.
However, if the project used the FSF ChangeLog convention, many changesets will include a ChangeLog modification listing an author for the commit. In the common case that the changeset was derived from a patch and committed by a project maintainer, but the ChangeLog entry names the actual author, this information can be recovered.
Use the "changelogs" command. This takes neither arguments nor a selection set. It mines the ChangeLog files for authorship data.
It assumes such files have the basename 'ChangeLog', and that they are in the format used by FSF projects: entry header lines begin with YYYY-MM-DD and are followed by a fullname/address. When a ChangeLog file modification is found in a clique, the entry header at or before the section changed since its last revision is parsed and the address is inserted as the commit author.
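An FSF-style entry header is simple to recognize. The Python sketch below parses one; the regular expression and the function name are assumptions chosen for illustration, and reposurgeon's own parser may accept more variants.

```python
import re

# Assumed shape: YYYY-MM-DD, optional full name, then <address>.
ENTRY_HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.*?)\s*<([^>]+)>\s*$")

def parse_entry_header(line):
    """Return (date, name, email) for a ChangeLog entry header, else None.

    The name is an empty string when the header carries only an address.
    """
    match = ENTRY_HEADER.match(line)
    if match is None:
        return None
    return match.groups()
```

Indented change descriptions such as a line beginning with a tab and an asterisk do not match and yield None.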
If the entry header contains an email address but no name, a name will be filled in if possible by looking for the address in author map entries.
In accordance with FSF policy for ChangeLogs, any date in an attribution header is discarded and the committer date is used. However, if the name is an author-map alias with an associated timezone, that zone is used.
The command reports statistics on how many commits were altered.
RELEASE TARBALLS¶
When converting a legacy repository, it sometimes happens that there are archived releases of the project surviving from before the date of the repository's initial commit. It may be desirable to insert those releases at the front of the repository history.
To do this, use the "incorporate" command. This command takes a single argument naming a tarball, the content of which is to be inserted as a commit. It may be a gzipped or bzipped tarball. The initial segment of each path is assumed to be a version directory and stripped off. The number of segments stripped off can be set with the option --strip=n, n defaulting to 1.
Takes a singleton selection set. Normally inserts before that commit; with the option --after, insert after it. The default selection set is the very first commit of the repository.
The option --date can be used to set the commit date. It takes an argument, which is expected to be an RFC3339 timestamp.
The generated commit has a committer field (the invoking user) and gets as its commit date the modification time of the newest file in the tarball (not the mod time of the tarball itself). No author field is generated. A comment recording the tarball name is generated.
Note that the import stream generated by this command is - while correct - not optimal, and may in particular contain duplicate blobs.
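Two of the rules above (the commit date comes from the newest file in the tarball, and leading path segments are stripped) can be sketched with Python's tarfile module. This is an illustration under stated assumptions, not reposurgeon's code; the function names are hypothetical, and only regular files are considered when looking for the newest member.

```python
import tarfile

def newest_mtime(tar):
    """Return the modification time of the newest regular file in a tarball.

    'tar' is an open tarfile.TarFile; the tarball's own mtime is ignored.
    """
    return max(member.mtime for member in tar.getmembers() if member.isfile())

def strip_segments(path, n=1):
    """Drop the first n path segments, as --strip=n is described to do."""
    return "/".join(path.split("/")[n:])
```

A caller would open the archive with tarfile.open(name), which detects gzip and bzip2 compression automatically in its default read mode.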
VARIABLES, MACROS AND EXTENSIONS¶
Occasionally you will need to issue a large number of complex surgical commands of very similar form, and it's convenient to be able to package that form so you don't need to do a lot of error-prone typing. For those occasions, reposurgeon supports simple forms of named variables and macro expansion.
assign [name]
With no selection set and no name, list all assignments.
If the option --singleton is given, the assignment will throw an error if the selection set is not a singleton.
Use this to optimize out location and selection computations that would otherwise be performed repeatedly, e.g. in macro calls.
unassign [name]
names [>outfile]
define name body
A later “do” call can invoke this macro.
The command “define” by itself without a name or body produces a macro list.
do name arguments...
If the macro expansion does not itself begin with a selection set, whatever set was specified before the "do" keyword is available to the command generated by the expansion.
undefine name
Here's an example to illustrate how you might use this. In CVS repositories of projects that use the GNU ChangeLog convention, a very common pre-conversion artifact is a commit with the comment "*** empty log message ***" that modifies only a ChangeLog entry explaining the commit immediately previous to it. The following
    define changelog <{0}> & /empty log message/ squash --pushback
    do changelog 2012-08-14T21:51:35Z
    do changelog 2012-08-08T22:52:14Z
    do changelog 2012-08-07T04:48:26Z
    do changelog 2012-08-08T07:19:09Z
    do changelog 2012-07-28T18:40:10Z
is equivalent to the more verbose
    <2012-08-14T21:51:35Z> & /empty log message/ squash --pushback
    <2012-08-08T22:52:14Z> & /empty log message/ squash --pushback
    <2012-08-07T04:48:26Z> & /empty log message/ squash --pushback
    <2012-08-08T07:19:09Z> & /empty log message/ squash --pushback
    <2012-07-28T18:40:10Z> & /empty log message/ squash --pushback
but you are less likely to make difficult-to-notice errors typing the first version.
(Also note how the text regexp acts as a failsafe against the possibility of typing a wrong date that doesn't refer to a commit with an empty comment. This was a real-world example from the CVS-to-git conversion of groff.)
When even a macro is not enough, you can write and call custom Python extensions.
exec name
This can be called in a script with the extension code in a here-doc.
eval function-name
script filename [arg...]
During execution of the script, the script name replaces the string $0 and the optional following arguments (if any) replace the strings $1, $2 ... $n in the script text. This is done before tokenization, so the $1 in a string like “foo$1bar” will be expanded. Additionally, $$ is expanded to the current process ID (which may be useful for scripts that use tempfiles).
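The textual nature of this expansion can be shown with a small Python sketch. It is not reposurgeon's implementation; the function name is hypothetical, and leaving a placeholder untouched when no corresponding argument was supplied is an assumption of the sketch.

```python
import os
import re

def expand_script_line(line, script_name, args):
    """Expand $0, $1..$n and $$ in one line of script text.

    The substitution is purely textual and happens before tokenization,
    so a placeholder embedded in a longer word is still expanded.
    """
    def substitute(match):
        token = match.group(1)
        if token == "$":
            return str(os.getpid())  # $$ becomes the current process ID
        index = int(token)
        if index == 0:
            return script_name
        if index <= len(args):
            return args[index - 1]
        return match.group(0)  # assumption: unknown placeholders are kept
    return re.sub(r"\$(\$|\d+)", substitute, line)
```

For instance, with a single argument X the text foo$1bar expands to fooXbar.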
Within scripts (and only within scripts) reposurgeon accepts a slightly extended syntax: First, a backslash ending a line signals that the command continues on the next line. Any number of consecutive lines thus escaped are concatenated, without the ending backslashes, prior to evaluation. Second, a command that takes an input filename argument can instead take literal following data in the syntax of a shell here-document. That is: if the filename is replaced by "<<EOF", all following lines in the script up to a terminating line consisting only of "EOF" will be read, placed in a temporary file, and that file fed to the command and afterwards deleted. EOF may be replaced by any string. Backslashes have no special meaning while reading a here-document.
Scripts may have comments. Any line beginning with a '#' is ignored. If a line has a trailing portion that begins with one or more whitespace characters followed by '#', that trailing portion is ignored.
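The continuation and comment rules can be approximated by the following Python sketch. It is a simplification for illustration only: the function name is hypothetical, here-documents are not handled, and the order in which reposurgeon itself applies these steps is not specified here.

```python
import re

def preprocess(lines):
    """Join backslash-continued lines and drop comments.

    Returns the list of commands that would remain for evaluation.
    """
    commands = []
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]  # drop the backslash, keep accumulating
            continue
        line = pending + line
        pending = ""
        if line.startswith("#"):
            continue  # whole-line comment
        line = re.sub(r"\s+#.*$", "", line)  # trailing comment
        if line.strip():
            commands.append(line)
    if pending.strip():
        commands.append(pending)  # unterminated continuation at end of input
    return commands
```

The trailing-comment rule is also the reason a regular expression containing whitespace followed by '#' needs "\s" in place of the literal whitespace.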
ARTIFACT REMOVAL¶
Some commands automate fixing various kinds of artifacts associated with repository conversions from older systems.
authors [read|write] [<filename] [>filename]
Lifts from CVS and Subversion may have only usernames local to the repository host in committer and author IDs. DVCSes want email addresses (net-wide identifiers) and complete names. To supply the map from one to the other, an authors file is expected to consist of lines each beginning with a local user ID, followed by a '=' (possibly surrounded by whitespace) followed by a full name and email address, optionally followed by a timezone offset field. Thus:
ferd = Ferd J. Foonly <foonly@foo.com> America/New_York
An authors file may also contain lines of this form
+ Ferd J. Foonly <foonly@foobar.com> America/Los_Angeles
These are interpreted as aliases for the last preceding '=' entry that may appear in ChangeLog files. When such an alias is matched on a ChangeLog attribution line, the author attribution for the commit is mapped to the basename, but the timezone is used as is. This accommodates people with past addresses (possibly at different locations), unifying such aliases in metadata so searches and statistical aggregation will work better.
An authors file may have comment lines beginning with '#'; these are ignored.
When an authors file is applied, email addresses in committer and author metadata for which the local ID matches between < and @ are replaced according to the mapping (this handles git-svn lifts). Alternatively, if the local ID is the entire address, this is also considered a match (this handles what git-cvsimport and cvs2git do). If a timezone was specified in the map entry, that person's author and committer dates are mapped to it.
With the 'read' modifier, or no modifier, apply author mapping data (from standard input or a <-redirected file). May be useful if you are editing a repo or dump created by cvs2git or by git-svn invoked without -A.
With the 'write' modifier, write a mapping file that could be interpreted by authors read, with entries for each unique committer, author, and tagger (to standard output or a >-redirected mapping file). This may be helpful as a start on building an authors file, though each part to the right of an equals sign will need editing.
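As a rough illustration of the authors file format described above, the following sketch parses '=' entries, '+' alias lines, and '#' comments. The function name parse_authors and the regular expressions are assumptions made for this example; reposurgeon's real parser may accept or reject inputs differently.

```python
import re

# Hypothetical patterns for the two line forms described in the text.
ENTRY = re.compile(
    r"^(?P<local>[^=\s]+)\s*=\s*(?P<name>[^<]+?)\s*<(?P<email>[^>]+)>\s*(?P<tz>\S+)?\s*$")
ALIAS = re.compile(
    r"^\+\s*(?P<name>[^<]+?)\s*<(?P<email>[^>]+)>\s*(?P<tz>\S+)?\s*$")

def parse_authors(text):
    """Return (entries, aliases) parsed from authors-file text."""
    entries = {}   # local ID -> (full name, email, timezone or None)
    aliases = {}   # alias email -> (local ID of preceding entry, timezone)
    last = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        m = ENTRY.match(line)
        if m:
            last = m.group("local")
            entries[last] = (m.group("name"), m.group("email"), m.group("tz"))
            continue
        m = ALIAS.match(line)
        if m and last is not None:
            # An alias belongs to the last preceding '=' entry.
            aliases[m.group("email")] = (last, m.group("tz"))
    return entries, aliases

sample = """# example map
ferd = Ferd J. Foonly <foonly@foo.com> America/New_York
+ Ferd J. Foonly <foonly@foobar.com> America/Los_Angeles
"""
entries, aliases = parse_authors(sample)
print(entries)
print(aliases)
```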
branchify [path-set]
With no arguments, displays the current branchification set.
An asterisk at the end of a path in the set means 'all immediate subdirectories of this path, unless they are part of another (longer) path in the branchify set'.
Note that the branchify set is a property of the reposurgeon interpreter, not of any individual repository, and will persist across Subversion dumpfile reads. This may lead to unexpected results if you forget to re-set it.
branchify_map [/regex/branch/...]
With no arguments the current regex replacement pairs are shown. Passing 'reset' will clear the mapping.
The branchify command will match each branch name against regex1 and, if it matches, rewrite its branch name to branch1. If not, it will try regex2 and so forth until it either finds a matching regex or there are no regexes left. The regular expressions should be in Python's[2] format. The branch name can use backreferences (see the re.sub function in the Python documentation).
Note that the regular expressions are appended to 'refs/' without either the needed 'heads/' or 'tags/'. This allows for choosing the right kind of branch type.
While the syntax template above uses slashes, any first character will be used as a delimiter (and you will need to use a different one in the common case that the paths contain slashes).
You must give this command before the Subversion repository read it is supposed to affect!
Note that the branchify_map set is a property of the reposurgeon interpreter, not of any individual repository, and will persist across Subversion dumpfile or repository reads. This may lead to unexpected results if you forget to re-set it.
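The first-match-wins rewriting described for branchify_map can be sketched with Python's re module. Everything named here (parse_pair, map_branch, the sample branch names) is hypothetical; whether reposurgeon anchors the match, and what it does when nothing matches, are assumptions of this sketch.

```python
import re

def parse_pair(arg):
    """Split a 'XregexXbranchX' argument, where X is its first character."""
    delim = arg[0]
    parts = arg.split(delim)
    # Expect: empty string, regex, replacement, empty string.
    if len(parts) != 4 or parts[0] or parts[3]:
        raise ValueError("malformed regex/branch pair: %r" % arg)
    return re.compile(parts[1]), parts[2]

def map_branch(name, pairs):
    """Try each pair in order; the first matching regex rewrites the name."""
    for regex, replacement in pairs:
        if regex.search(name):
            # The replacement is appended to 'refs/', so it must supply
            # 'heads/' or 'tags/' itself.
            return "refs/" + regex.sub(replacement, name)
    return None  # assumption: no mapping applies

# ':' is used as the delimiter because the patterns contain slashes.
pairs = [
    parse_pair(r":^releases/(.*)$:tags/release-\1:"),
    parse_pair(r":^(.*)$:heads/\1:"),
]
print(map_branch("releases/1.0", pairs))
print(map_branch("trunk", pairs))
```

Choosing a delimiter that cannot occur in either field keeps the split unambiguous, which is why the text recommends a non-slash delimiter for path-like patterns.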
EXAMINING TREE STATES¶
manifest [regular expression] [>outfile]
checkout directory
diff [>outfile]
HOUSEKEEPING¶
These are backed up by the following housekeeping commands, none of which take a selection set:
help
shell
prefer [repotype]
First, if there are multiple repositories in a directory you do a read on, reposurgeon will read the preferred one (otherwise it will complain that it can't choose among them).
Secondly, this will change reposurgeon's preferred type for output. This means that if you do a write to a directory, it will build a repo of the preferred type rather than its original type (if it had one).
If no preferred type has been explicitly selected, reading in a repository (but not a fast-import stream) will implicitly set the preferred type to the type of that repository.
In older versions of reposurgeon this command changed the type of the selected repository, if there is one. That behavior interacted badly with attempts to interpret legacy IDs and has been removed.
sourcetype [repotype]
The source type affects the interpretation of legacy IDs (for purposes of the =N visibility set and the 'references' command) by controlling the regular expressions used to recognize them. If no preferred output type has been set, it may also change the output format of stream files made from the repository.
The source type is reliably set whenever a live repository is read, or when a Subversion stream or Fossil dump is interpreted, but not necessarily by other stream files. Streams generated by cvs-fast-export(1) using the --reposurgeon option are detected as CVS. In some other cases, the source system is detected from the presence of magic $-headers in contents blobs.
INSTRUMENTATION¶
A few commands have been implemented primarily for debugging and regression-testing purposes, but may be useful in unusual circumstances.
The output of most of these commands can individually be redirected to a named output file. Where indicated in the syntax, you can prefix the output filename with “>” and give it as a following argument.
index [>outfile]
The default selection set for this command is =CTRU, all objects except blobs.
resolve [label-text...]
Implemented mainly for regression testing, but may be useful for exploring the selection-set language.
attribution selection resolve [>outfile] [label-text...]
Implemented mainly for regression testing, but may be useful for exploring the selection-set language.
verbose [n]
quiet [on | off]
print output-text...
echo [number]
version [version...]
It is good practice to start your lift script with a version requirement, especially if you are going to archive it for later reference.
prompt [format...]
chosen
Thus, one useful format might be 'rs[%(chosen)s]%% '.
More format items may be added in the future. The default prompt corresponds to the format 'reposurgeon%% '. The format line is evaluated with shell quoting of tokens, so that spaces can be included.
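The prompt formats shown above look like Python %-style formats with named fields, where '%%' stands for a literal percent sign. Assuming that is how they are expanded (the text does not say so explicitly), their effect can be previewed as follows; render_prompt and the repository name are hypothetical.

```python
def render_prompt(fmt, chosen):
    """Expand a prompt format, assuming Python %-style named fields."""
    return fmt % {"chosen": chosen}

print(repr(render_prompt("rs[%(chosen)s]%% ", "myrepo")))   # 'rs[myrepo]% '
print(repr(render_prompt("reposurgeon%% ", "myrepo")))      # 'reposurgeon% '
```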
history
legacy [read|write] [<filename] [>filename]
A legacy-reference file maps reference cookies to (committer, commit-date, sequence-number) pairs; these in turn (should) uniquely identify a commit. The format is two whitespace-separated fields: the cookie followed by an action stamp identifying the commit.
It should not normally be necessary to use this command. The legacy map is automatically preserved through repository reads and rebuilds, being stored in the file legacy-map under the repository subdirectory.
set [option]
Most options are described in conjunction with the specific operations that they modify. One of general interest is “compressblobs”; this enables compression on the blob files in the internal representation reposurgeon uses for editing repositories. With this option, reading and writing of repositories is slower, but editing a repository requires less (sometimes much less) disk space.
clear [option]
profile
timing
exit
WORKING WITH MERCURIAL¶
reposurgeon uses a built-in extractor class to perform extractions from Mercurial repositories.
Mercurial branches are exported as branches in the exported repository and tags are exported as tags. By default, bookmarks are ignored. You can specify explicit handling for bookmarks by setting reposurgeon.bookmarks in your .hg/hgrc. Set the value to the prefix that reposurgeon should use for bookmarks.
For example, if your bookmarks represent branches, put this at the bottom of your .hg/hgrc:
[reposurgeon]
bookmarks=heads/
If you do that, it's your responsibility to ensure that branch names do not conflict with bookmark names. You can add a prefix like bookmarks=heads/feature- to disambiguate as necessary.
WORKING WITH SUBVERSION¶
reposurgeon can read Subversion dumpfiles or edit a Subversion repository (and you must point it at a repository, not a checkout directory).
READING SUBVERSION REPOSITORIES¶
Certain optional modifiers on the read command change its behavior when reading Subversion repositories:
--nobranch
--preserve
--ignore-properties
--user-ignores
--use-uuid
These modifiers can go anywhere in any order on the read command line after the read verb. They must be whitespace-separated.
It is also possible to embed a magic comment in a Subversion stream file to set these options. Prefix a space-separated list of them with the magic comment " # reposurgeon-read-options:"; the leading space is required. This may be useful when synthesizing test loads; in particular, a stream file that does not set up a standard trunk/branches/tags directory layout can use this to perform a mapping of all commits onto the master branch that the git importer will accept.
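A minimal sketch of recognizing the magic comment, assuming it occupies a line of its own with the options following on the same line. The helper name read_options is hypothetical, and where in the stream file reposurgeon looks for the comment is not covered here.

```python
MAGIC = " # reposurgeon-read-options:"  # the leading space is required

def read_options(line):
    """Return the option list from a magic-comment line, or [] otherwise."""
    if not line.startswith(MAGIC):
        return []
    return line[len(MAGIC):].split()

print(read_options(" # reposurgeon-read-options: --nobranch --preserve\n"))
print(read_options("# reposurgeon-read-options: --nobranch\n"))  # no leading space
```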
Here are the rules used for mapping subdirectories in a Subversion repository to branches:
If you give the option --nobranch when reading a Subversion repository, branch analysis is skipped and the repository is treated as though flat (left as a linear sequence of commits on refs/heads/master). This may be useful if your repository configuration is highly unusual and you need to do your own branch surgery. Note that this option will disable partitioning of mixed commits.
Branch-creation operations with no following commits are treated differently depending on whether or not the --preserve option is on. If it is off (the default) the branch creation becomes an empty gitspace branch represented by a reset operation; any comment on the commit is issued with a warning. If --preserve is on, the comment metadata is preserved in an empty commit attached to the branchpoint.
Otherwise, each commit that only creates or deletes directories (in particular, copy commits for tags and branches, and commits that only change properties) will be transformed into a tag named after the tag or branch, containing the date/author/comment metadata from the commit.
Subversion branch deletions are turned into deletealls, clearing the fileset of the import-stream branch. When a branch finishes with a deleteall at its tip, the deleteall is transformed into a tag. This rule cleans up after aborted branch renames.
Occasionally (and usually by mistake) a branchy Subversion repository will contain revisions that touch multiple branches. These are handled by partitioning them into multiple import-stream commits, one on each affected branch. The Legacy-ID of such a split commit will have a pseudo-decimal part - for example, if Subversion revision 2317 touches three branches, the three generated commits will have IDs 2317.1, 2317.2, and 2317.3.
The svn:executable and svn:special properties are translated into permission settings in the input stream; svn:executable becomes 100755 and svn:special becomes 120000 (indicating a symlink; the blob contents will be the path to which the symlink should resolve).
Any cvs2svn:rev properties generated by cvs2svn are incorporated into the internal map used for reference-lifting, then discarded.
Normally, per-directory svn:ignore properties become .gitignore files. Actual .gitignore files in a Subversion directory are presumed to have been created by git-svn users separately from native Subversion ignore properties and discarded with a warning. It is up to the user to merge the content of such files into the target repository by hand. But this behavior is inverted by the --user-ignores option; if that is on, .gitignore files are passed through and Subversion svn:ignore properties are discarded.
(Regardless of the setting of the --user-ignores option, .cvsignore files found in Subversion repositories always become .gitignores in the translation. The assumption is that these date from before a CVS-to-SVN lift and should be preserved to affect behavior when browsing that section of the repository.)
svn:mergeinfo properties are interpreted. Any svn:mergeinfo property on a revision A with a merge source range ending in revision B produces a merge link such that B becomes a parent of A.
All other Subversion properties are discarded. (This may change in a future release.) The property for which this is most likely to cause semantic problems is svn:eol-style. However, since property-change-only commits get turned into annotated tags, the translated tags will retain information about setting changes.
The sub-second resolution on Subversion commit dates is discarded; Git wants integer timestamps only.
Because fast-import format cannot represent an empty directory, empty directories in Subversion repositories will be lost in translation.
Normally, Subversion local usernames are mapped in the style of git cvs-import; thus user "foo" becomes "foo <foo>", which is sufficient to pacify git and other systems that require email addresses. With the option "svn_use_uuid", usernames are mapped in the git-svn style, with the repository's UUID used as a fake domain in the email address. Both forms can be remapped to real addresses using the authors read command.
Reading a Subversion stream enables writing of the legacy map as 'legacy' passthroughs when the repo is written to a stream file.
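Three of the mapping rules above are mechanical enough to sketch directly: the pseudo-decimal legacy IDs for split commits, the permission modes derived from svn:executable and svn:special, and the two username styles. The function names are hypothetical, and two details are assumptions not stated in the text: svn:special is given precedence when both properties are set, and 100644 is used as the mode for ordinary files.

```python
def split_legacy_ids(revision, branch_count):
    """Legacy IDs for a mixed commit split across branch_count branches."""
    return ["%d.%d" % (revision, i) for i in range(1, branch_count + 1)]

def file_mode(properties):
    """Map Subversion properties to an import-stream mode string."""
    if "svn:special" in properties:      # symlink (precedence is assumed)
        return "120000"
    if "svn:executable" in properties:
        return "100755"
    return "100644"                      # assumed default for plain files

def map_username(user, uuid=None):
    """Map a local Subversion username to a 'name <email>' attribution."""
    if uuid is None:
        return "%s <%s>" % (user, user)          # git cvs-import style
    return "%s <%s@%s>" % (user, user, uuid)     # git-svn style, UUID as fake domain

print(split_legacy_ids(2317, 3))
print(file_mode({"svn:executable": "*"}))
print(map_username("foo"))
```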
reposurgeon tries hard to silently do the right thing, but there are Subversion edge cases in which it emits warnings because a human may need to intervene and perform fixups by hand. Here are the less obvious messages it may emit:
user-generated .gitignore
can't connect nonempty branch XXXX to origin
permission information may be lost
properties set
branch links detected by file ops only
could not tagify root commit
deleting parentless tip delete
mid-branch deleteall
lookback for XXX failed, not making branch link
WRITING SUBVERSION REPOSITORIES¶
reposurgeon has support for writing Subversion repositories. Due to mismatches between the ontology of Subversion and that of git import streams, this support has some significant limitations and bugs.
In summary, Subversion repository histories do not round-trip through reposurgeon editing. File content changes are preserved but some metadata is unavoidably lost. Furthermore, writing out a DVCS history in Subversion also loses significant portions of its metadata. Details follow.
Writing a Subversion repository or dump stream discards author information, the committer's name, and the hostname part of the commit address; only the commit timestamp and the local part of the committer's email address are preserved, the latter becoming the Subversion author field. However, reading a Subversion repository and writing it out again will preserve the author fields.
Import-stream timestamps have 1-second granularity. The sub-second parts of Subversion commit timestamps will be lost on their way through reposurgeon.
Empty directories aren't represented in import streams. Consequently, reading and writing Subversion repositories preserves file content, but not empty directories. It is also not guaranteed that, after editing a Subversion repository, the sequence of directory creations and deletions relative to other operations will be identical; the only guarantee is that enclosing directories will be created before any files in them are.
When reading a Subversion repository, reposurgeon discards the special directory-copy nodes associated with branch creations. These can't be recreated if and when the repository is written back out to Subversion; rather, each branch copy node from the original translates into a branch creation plus the first set of file modifications on the branch.
When reading a Subversion repository, reposurgeon also automatically breaks apart mixed-branch commits. These are not re-united if the repository is written back out.
When writing to a Subversion repository, all lightweight tags become Subversion tag copies with empty log comments, named for the tag basename. The committer name and timestamp are copied from the commit the tag points to. The distinction between heads and tags is lost.
Because of the preceding two points, it is not guaranteed that even revision numbers will be stable when a Subversion repository is read in and then written out!
Subversion repositories are always written with a standard (trunk/tags/branches) layout. Thus, a repository with a nonstandard shape that has been analyzed by reposurgeon won't be written out with the same shape.
When writing a Subversion repository, branch merges are translated into svn:mergeinfo properties in the simplest possible way - as an svn:mergeinfo property of the translated merge commit listing the merge source revisions.
Subversion has a concept of "flows"; that is, named segments of history corresponding to files or directories that are created when the path is added, cloned when the path is copied, and deleted when the path is deleted. This information is not preserved in import streams or the internal representation that reposurgeon uses. Thus, after editing, the flow boundaries of a Subversion history may be arbitrarily changed.
IGNORE PATTERNS¶
reposurgeon recognizes how supported VCSes represent file ignores (CVS .cvsignore files lurking untranslated in older Subversion repositories, Subversion ignore properties, .gitignore/.hgignore/.bzrignore files in other systems) and moves ignore declarations among these containers on repo input and output. This will be sufficient if the ignore patterns are exact filenames.
Translation may not, however, be perfect when the ignore patterns are Unix glob patterns or regular expressions. This compatibility table describes which patterns will translate; “plain” indicates a plain filename with no glob or regexp syntax or negation.
RCS has no ignore files or patterns and is therefore not included in the table.
 | from CVS | from svn | from git | from hg | from bzr | from darcs | from SRC | from bk |
to CVS | all | all | all except !-prefixed but nonempty | all | all except RE:- and !-prefixed | plain | all | all |
to svn | all except ! | all | all except !-prefixed | all | all except RE:- and !-prefixed | plain | all | all |
to git | all | all | all | all except !-prefixed | all except RE:-prefixed | plain | all | all |
to hg | all except ! | all | all except !-prefixed | all | all except RE:- and !-prefixed | plain | all | all |
to bzr | all | all | all | all | all | plain | all | all |
to darcs | plain | plain | plain | plain | plain | all | all | all |
to SRC | all except ! | all | all except !-prefixed | all | all except RE:- and !-prefixed | plain | all | all |
The hg rows and columns of the table describe compatibility with hg's glob syntax rather than its default regular-expression syntax. When writing to an hg repository from any other kind, reposurgeon prepends to the output .hgignore a "syntax: glob" line.
TRANSLATION STYLE¶
After converting a CVS, SVN, or BitKeeper repository, check for and remove $-cookies in the head revision(s) of the files. The full Subversion set is $Date:, $Revision:, $Author:, $HeadURL:, and $Id:. CVS uses $Author:, $Date:, $Header:, $Id:, $Log:, $Revision:, also (rarely) $Locker:, $Name:, $RCSfile:, $Source:, and $State:.
When you need to specify a commit, use the action-stamp format that references lift generates when it can resolve an SVN or CVS reference in a comment. It is best that you not vary from this format, even in trivial ways like omitting the 'Z' or changing the 'T' or '!' or ':'. Making action stamps uniform and machine-parseable will have good consequences for future repository-browsing tools.
Sometimes, in converting a repository, you may need to insert an explanatory comment - for example, if metadata has been garbled or missing and you need to point to that fact. It's helpful for repository-browsing tools if there is a uniform syntax for this that is highly unlikely to show up in repository comments. We recommend enclosing translation notes in [[ ]]. This has the advantage of being visually similar to the [ ] traditionally used for editorial comments in text.
It is good practice to include, in the comment for the root commit of the repository, a note dating and attributing the conversion work and explaining these conventions. Example:
[[This repository was converted from Subversion to git on 2011-10-24 by Eric S. Raymond <esr@thyrsus.com>. Here and elsewhere, conversion notes are enclosed in double square brackets. Junk commits generated by cvs2svn have been removed, commit references have been mapped into a uniform VCS-independent syntax, and some comments edited into summary-plus-continuation form.]]
It is also good practice to include a generated tag at the point of conversion. For example:
mailbox_in --create <<EOF
Tag-Name: git-conversion

Marks the spot at which this repository was converted from Subversion to git.
EOF
ADVANCED EXAMPLES¶
define lastchange {
@max(=B & [/ChangeLog/] & /{0}/B)? list
}
List the last commit that refers to a ChangeLog file containing a specified string. (The trick here is that ? extends the singleton set consisting of the last eligible ChangeLog blob to its set of referring commits, and list only notices the commits.)
STREAM SYNTAX EXTENSIONS¶
The event-stream parser in “reposurgeon” supports some extended syntax. Exporters designed to work with “reposurgeon” may have a --reposurgeon option that enables emission of extended syntax; notably, this is true of cvs-fast-export(1). The remainder of this section describes these syntax extensions. The properties they set are (usually) preserved and re-output when the stream file is written.
The token “#reposurgeon” at the start of a comment line in a fast-import stream signals reposurgeon that the remainder is an extension command to be interpreted by “reposurgeon”.
One such extension command is implemented: #sourcetype, which behaves identically to the reposurgeon sourcetype command. An exporter for a version-control system named “frobozz” could, for example, say
#reposurgeon sourcetype frobozz
Within a commit, a magic comment of the form “#legacy-id” declares a legacy ID from the stream file's source version-control system.
Also accepted is the bzr syntax for setting per-commit properties. While parsing commit syntax, a line beginning with the token “property” must continue with a whitespace-separated property-name token. If it is then followed by a newline it is taken to set that boolean-valued property to true. Otherwise it must be followed by a numeric token specifying a data length, a space, following data (which may contain newlines) and a terminating newline. For example:
commit refs/heads/master
mark :1
committer Eric S. Raymond <esr@thyrsus.com> 1289147634 -0500
data 16
Example commit.
property legacy-id 2 r1
M 644 inline README
Unlike other extensions, bzr properties are only preserved on stream output if the preferred type is bzr, because any importer other than bzr's will choke on them.
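The property syntax described above (a name alone for a boolean, or a name followed by a byte count and that many bytes of data) can be parsed along these lines. This is a sketch written from the prose description only; parse_property, its return convention, and its error handling are assumptions rather than reposurgeon's actual parser.

```python
import re

# 'property', a name token, and optionally a length followed by one space.
HEAD = re.compile(rb"property (\S+)(?: (\d+) )?")

def parse_property(buf, pos=0):
    """Parse one property at buf[pos:]; return (name, value, next_pos)."""
    m = HEAD.match(buf, pos)
    if m is None:
        raise ValueError("not a property line")
    name = m.group(1).decode("utf-8")
    if m.group(2) is None:
        # Boolean form: the name must be followed directly by a newline.
        if buf[m.end():m.end() + 1] != b"\n":
            raise ValueError("malformed boolean property")
        return name, True, m.end() + 1
    length = int(m.group(2))
    start = m.end()
    value = buf[start:start + length]   # may itself contain newlines
    if len(value) != length or buf[start + length:start + length + 1] != b"\n":
        raise ValueError("property data is truncated or unterminated")
    return name, value, start + length + 1

stream = b"property legacy-id 2 r1\nproperty flag\nM 644 inline README\n"
name, value, pos = parse_property(stream)
print(name, value)
name, value, pos = parse_property(stream, pos)
print(name, value)
```

Reading the data by length rather than by line is what allows values with embedded newlines to pass through intact.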
INCOMPATIBLE LANGUAGE CHANGES¶
In versions before 3.23, “prefer” changed the repository type as well as the preferred output format.
In versions before 3.0, the general command syntax put the command verb first, then the selection set (if any) then modifiers (VSO). It has changed to optional selection set first, then command verb, then modifiers (SVO). The change made parsing simpler, allowed abolishing some noise keywords, and recapitulates a successful design pattern in some other Unix tools - notably sed(1).
In versions before 3.0, path expressions only matched commits, not commits and the associated blobs as well. The names of the “a” and “c” flags were different.
In reposurgeon versions before 3.0, the delete command had the semantics of squash; also, the policy flags did not require a “--” prefix. The “--delete” flag was named “obliterate”.
In reposurgeon versions before 3.0, read and write optionally took file arguments rather than requiring redirects (and the write command never wrote into directories). This was changed in order to allow these commands to have modifiers. These modifiers replaced several global options that no longer exist.
In reposurgeon versions before 3.0, the earliest factor in a unite command always kept its tag and branch names unaltered. The new rule for resolving name conflicts, giving priority to the latest factor, produces more natural behavior when uniting two repositories end to end; the master branch of the second (later) one keeps its name.
In reposurgeon versions before 3.0, the tagify command expected policies as trailing arguments to alter its behaviour. The new syntax uses similarly named options with leading dashes, which can appear anywhere after the tagify command.
In versions before 2.9, the syntax of "authors", "legacy", "list", and "mailbox_{in|out}" was different (and "legacy" was "fossils"). They took plain filename arguments rather than using redirect < and >.
LIMITATIONS AND GUARANTEES¶
Guarantee: In DVCSes that use commit hashes, editing with reposurgeon never changes the hash of a commit object unless (a) you edit the commit, or (b) it is a descendant of an edited commit in a VCS that includes parent hashes in the input of a child object's hash (git and hg both do this).
Guarantee: reposurgeon only requires main memory proportional to the size of a repository's metadata history, not its entire content history. (Exception: the data from inline content is held in memory.)
Guarantee: In the worst case, reposurgeon makes its own copy of every content blob in the repository's history and thus uses intermediate disk space approximately equal to the size of a repository's content history. However, when the repository to be edited is presented as a stream file, reposurgeon requires no or only very little extra disk space to represent it; the internal representation of content blobs is a (seek-offset, length) pair pointing into the stream file.
Guarantee: reposurgeon never modifies the contents of a repository it reads, nor deletes any repository. The results of surgery are always expressed in a new repository.
Guarantee: Any line in a fast-import stream that is not a part of a command reposurgeon parses and understands will be passed through unaltered. At present the set of potential passthroughs is known to include the progress, options, and checkpoint commands as well as comments led by #.
Guarantee: All reposurgeon operations either preserve all repository state they are not explicitly told to modify or warn you when they cannot do so.
Guarantee: reposurgeon handles the bzr commit-properties extension, correctly passing through property items including those with embedded newlines. (Such properties are also editable in the mailbox format.)
Limitation: Because reposurgeon relies on other programs to generate and interpret the fast-import command stream, it is subject to bugs in those programs.
Limitation: bzr suffers from deep confusion over whether its unit of work is a repository or a floating branch that might have been cloned from a repo or created from scratch, and might or might not be destined to be merged to a repo one day. Its exporter only works on branches, but its importer creates repos. Thus, a rebuild operation will produce a subdirectory structure that differs from what you expect. Look for your content under the subdirectory 'trunk'.
Limitation: under git, signed tags are imported verbatim. However, any operation that modifies any commit upstream of the target of the tag will invalidate it.
Limitation: Stock git (at least as of version 1.7.3.2) will choke on property extension commands. Accordingly, reposurgeon omits them when rebuilding a repo with git type.
Limitation: Converting an hg repo that uses bookmarks (not branches) to git can lose information; the branch ref that git assigns to each commit may not be the same as the hg bookmark that was active when the commit was originally made under hg. Unfortunately, this is a real ontological mismatch, not a problem that can be fixed by cleverness in reposurgeon.
Limitation: Converting an hg repo that uses branches to git can lose information because git does not store an explicit branch as part of commit metadata, but colors commits with branch or tag names on the fly using a specific coloring algorithm, which might not match the explicit branch assignments to commits in the original hg repo. Reposurgeon preserves the hg branch information when reading an hg repo, so it is available from within reposurgeon itself, but there is no way to preserve it if the repo is written to git.
Limitation: While the Subversion read-side support is in good shape, the write-side support is more of a sketch or proof-of-concept than a robust implementation; it only works on very simple cases and does not round-trip. It may improve in future releases.
Limitation: Not all BitKeeper versions have the fast-import and fast-export commands that reposurgeon requires. They are present back to the 7.3 opensource version.
Limitation: reposurgeon may misbehave under a filesystem which smashes case in filenames, or which nominally preserves case but maps names differing only by case to the same filesystem node (Mac OS X behaves like this by default). Problems will arise if any two paths in a repo differ by case only. To avoid the problem on a Mac, do all your surgery on an HFS+ file system formatted with case sensitivity specifically enabled.
Limitation: If whitespace followed by # appears in a string or regexp command argument, it will be misinterpreted as the beginning of a line-ending comment and screw up parsing.
Guarantee: As version-control systems add support for the fast-import format, their repositories will become editable by reposurgeon.
Limitations described above are unlikely to change. Do "help bugs" at the reposurgeon prompt to see up-to-date information on reposurgeon bugs and internal problems that are expected to be fixed in some future release.
REQUIREMENTS¶
reposurgeon relies on importers and exporters associated with the VCSes it supports.
git
bzr
hg
svn
darcs
CVS
RCS
CANONICALIZATION RULES¶
It is expected that reposurgeon will be extended with more deletion policies. Policy authors may need to know more about how a commit's file operation sequence is reduced to normal form after operations from deleted commits are prepended to it.
Recall that each commit has a list of file operations, each a M (modify), D (delete), R (rename), C (copy), or 'deleteall' (delete all files). Only M operations have associated blobs. Normally there is only one M operation per individual file in a commit's operation list.
To understand how the reduction process works, it's enough to understand the case where all the operations in the list are working on the same file. Sublists of operations referring to different files don't affect each other and reducing them can be thought of as separate operations. Also, a "deleteall" acts as a D for everything and cancels all operations before it in the list.
The reduction process walks through the list from the beginning looking for adjacent pairs of operations it can compose. The following table describes all possible cases and all but one of the reductions.
M + D → D | If a file is modified then deleted, the result is as though it had been deleted. If the M was the only modify for the file, it's removed too. |
M a + R a b → R a b + M b | The purpose of this transformation is to push renames toward the beginning of the list, where they may become adjacent to another R or C they can be composed with. If the M is the only modify operation for this file, the rename is dropped. |
M a + C a b | No reduction. |
M b + R a b → nothing | Should be impossible, and may indicate repository corruption. |
M b + C a b → nothing | The copy undoes the modification. |
D + M → M | If a file is deleted and modified, the result is as though the deletion had not taken place (because M operations store entire files, not deltas). |
D + {D|R|C} | These cases should be impossible and would suggest the repository has been corrupted. |
R a b + D a | Should never happen, and is another case that would suggest repository corruption. |
R a b + D b → D a | The delete removes the just-renamed file. |
{R|C} + M | No reduction. |
R a b + R b c → R a c | The b terms have to match for these operations to have made sense when they lived in separate commits; if they don't, it indicates repository corruption. |
R a b + C b c | No reduction. |
C a b + D a → R a b | Copy followed by delete of the source is a rename. |
C a b + D b → nothing | This delete undoes the copy. |
C a b + R a c | No reduction. |
C a b + R b c → C a c | Copy followed by a rename of the target reduces to a single copy. |
C + C | No reduction. |
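The pairwise walk described above can be sketched for a subset of the table. This illustration is not reposurgeon's implementation: the tuple representation and the names compose and reduce_ops are hypothetical, only adjacent pairs are examined, and the rows involving M followed by R or C (including the rename-pushing transformation), deleteall, and the "should be impossible" cases are deliberately left out.

```python
def compose(first, second):
    """Return the ops replacing an adjacent pair, or None if no reduction.

    Ops are tuples: ("M", path), ("D", path), ("R", src, dst), ("C", src, dst).
    """
    k1, k2 = first[0], second[0]
    if k1 == "M" and k2 == "D" and first[1] == second[1]:
        return [second]                           # M + D -> D
    if k1 == "D" and k2 == "M" and first[1] == second[1]:
        return [second]                           # D + M -> M
    if k1 == "R" and k2 == "D" and first[2] == second[1]:
        return [("D", first[1])]                  # R a b + D b -> D a
    if k1 == "R" and k2 == "R" and first[2] == second[1]:
        return [("R", first[1], second[2])]       # R a b + R b c -> R a c
    if k1 == "C" and k2 == "D" and first[1] == second[1]:
        return [("R", first[1], first[2])]        # C a b + D a -> R a b
    if k1 == "C" and k2 == "D" and first[2] == second[1]:
        return []                                 # C a b + D b -> nothing
    if k1 == "C" and k2 == "R" and first[2] == second[1]:
        return [("C", first[1], second[2])]       # C a b + R b c -> C a c
    return None

def reduce_ops(ops):
    """Walk the list from the beginning, composing adjacent pairs."""
    ops = list(ops)
    i = 0
    while i + 1 < len(ops):
        merged = compose(ops[i], ops[i + 1])
        if merged is None:
            i += 1
        else:
            ops[i:i + 2] = merged
            i = max(i - 1, 0)   # the new op may compose with its predecessor
    return ops

print(reduce_ops([("C", "a", "b"), ("R", "b", "c"), ("D", "a")]))
```

Every successful composition shortens the list, so the loop always terminates.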
CRASH RECOVERY¶
This section will become relevant only if reposurgeon or something underneath it in the software and hardware stack crashes while in the middle of writing out a repository, in particular if the target directory of the rebuild is your current directory.
The tool has two conflicting objectives. On the one hand, we never want to risk clobbering a pre-existing repo. On the other hand, we want to be able to run this tool in a directory with a repo and modify it in place.
We resolve this dilemma by playing a game of three-directory monte.
So far, all operations are safe; the worst that can happen up to this point if the process gets interrupted is that the staging and backup directories get left behind.
During the critical region, all signals that can be ignored are ignored.
ERROR RETURNS¶
Returns 1 on fatal error, 0 otherwise. In batch mode all errors are fatal.
SEE ALSO¶
bzr(1), cvs(1), darcs(1), git(1), hg(1), rcs(1), svn(1), bk(1).
AUTHOR¶
Eric S. Raymond <esr@thyrsus.com>; project page at http://www.catb.org/~esr/reposurgeon.
NOTES¶
- 1. DVCS Migration HOWTO
- 2. Python's
12/22/2018 | reposurgeon |