
datalad diff(1) datalad datalad diff(1)


datalad diff - report changes of dataset components.


datalad diff [-h] [-d DATASET] [--revision [REVISION EXPRESSION]] [--staged] [--ignore-subdatasets IGNORE_SUBDATASETS] [--report-untracked REPORT_UNTRACKED] [-r] [--recursion-limit LEVELS] [PATH [PATH ...]]
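As an illustration of the synopsis above, the following minimal sketch assembles an argument list for the command from Python. The helper name build_diff_command and its keyword arguments are hypothetical and not part of datalad; only the option names are taken from the synopsis. Options left at their documented defaults are not emitted.

```python
# Hypothetical helper (not part of datalad): assemble an argv list for
# 'datalad diff' using only the options listed in the synopsis.
IGNORE_SUBDATASETS = ("none", "untracked", "dirty", "all")
REPORT_UNTRACKED = ("no", "normal", "all")


def build_diff_command(paths=(), dataset=None, revision=None, staged=False,
                       ignore_subdatasets="none", report_untracked="normal",
                       recursive=False, recursion_limit=None):
    """Return a list of arguments suitable for e.g. subprocess.run()."""
    if ignore_subdatasets not in IGNORE_SUBDATASETS:
        raise ValueError("ignore_subdatasets must be one of %r"
                         % (IGNORE_SUBDATASETS,))
    if report_untracked not in REPORT_UNTRACKED:
        raise ValueError("report_untracked must be one of %r"
                         % (REPORT_UNTRACKED,))

    argv = ["datalad", "diff"]
    if dataset is not None:
        argv += ["--dataset", str(dataset)]
    if revision is not None:
        argv += ["--revision", revision]
    if staged:
        argv.append("--staged")
    # Only emit options that differ from the documented defaults.
    if ignore_subdatasets != "none":
        argv += ["--ignore-subdatasets", ignore_subdatasets]
    if report_untracked != "normal":
        argv += ["--report-untracked", report_untracked]
    if recursive:
        argv.append("--recursive")
    if recursion_limit is not None:
        argv += ["--recursion-limit", str(int(recursion_limit))]
    argv += [str(p) for p in paths]
    return argv


# Example: changes in 'a.txt' relative to HEAD~1, recursing two levels deep.
command = build_diff_command(paths=["a.txt"], revision="HEAD~1",
                             recursive=True, recursion_limit=2)
```

The resulting list can be passed to subprocess.run() on a system where datalad is installed; building a list rather than a single string avoids shell quoting issues with unusual path names.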


Reports can be generated for changes between recorded revisions, or between a revision and the state of a dataset's work tree.

Unlike 'git diff', this command also reports untracked content when comparing a revision to the state of the work tree. Such content is marked with the property STATE='UNTRACKED' in the command results.

The following types of changes are distinguished and reported via the STATE result property:

- added
- copied
- deleted
- modified
- renamed
- typechange
- unmerged
- untracked

Whenever applicable, source and/or destination revisions are reported to indicate when exactly within the requested revision range a particular component changed its status.

Optionally, the reported changes can be limited to a subset of paths within a dataset.
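To illustrate how the STATE property described above might be consumed, here is a minimal sketch that tallies result records by state. It assumes each result is a dict-like record with 'state' and 'path' keys; that record layout and the sample data are assumptions for illustration, not output captured from datalad.

```python
from collections import Counter

# The change types listed in this manual page.
KNOWN_STATES = {
    "added", "copied", "deleted", "modified",
    "renamed", "typechange", "unmerged", "untracked",
}


def summarize_states(results):
    """Count result records per state; unknown states go under 'other'."""
    counts = Counter()
    for record in results:
        # Normalize case so 'UNTRACKED' and 'untracked' are counted together.
        state = str(record.get("state", "")).lower()
        counts[state if state in KNOWN_STATES else "other"] += 1
    return dict(counts)


# Hypothetical sample records, for illustration only.
sample = [
    {"path": "data/a.csv", "state": "modified"},
    {"path": "data/b.csv", "state": "added"},
    {"path": "notes.txt", "state": "UNTRACKED"},
    {"path": "data/c.csv", "state": "modified"},
]
summary = summarize_states(sample)
```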


PATH
path to be evaluated. Constraints: value must be a string [Default: None]

-h, --help, --help-np
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
-d DATASET, --dataset DATASET
specify the dataset to query. If no dataset is given, an attempt is made to identify the dataset based on the input and/or the current working directory. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) [Default: None]
--revision [REVISION EXPRESSION]
comparison reference specification. Three modes are supported: 1) <revision>: changes you have in your working tree relative to the named revision (this can also be a branch name, tag, commit or any label Git can understand). 2) <revision>..<revision>: changes between two arbitrary revisions. 3) <revision>...<revision>: changes on the branch containing and up to the second <revision>, starting at a common ancestor of both revisions. [Default: None]
--staged
get the changes already staged for a commit relative to an optionally given revision (by default the most recent one). [Default: False]
--ignore-subdatasets IGNORE_SUBDATASETS
speed up execution by (partially) not evaluating the state of subdatasets in a parent dataset. With "none", a subdataset is considered modified when it either contains untracked or modified content or its last saved state differs from that recorded in the parent dataset. When "untracked" is used, subdatasets are not considered modified when they only contain untracked content (but they are still scanned for modified content). Using "dirty" ignores all changes to the work tree of subdatasets; only changes to the revisions stored in the parent dataset are shown. Using "all" hides all changes to subdatasets. Note that even with "all", recursive execution will still report other changes in any existing subdataset; only the subdataset record in a parent dataset is not evaluated. Constraints: value must be one of ('none', 'untracked', 'dirty', 'all') [Default: 'none']
--report-untracked REPORT_UNTRACKED
If and how untracked content is reported when comparing a revision to the state of the work tree. 'no': no untracked files are reported; 'normal': untracked files and entire untracked directories are reported as such; 'all': report individual files even in fully untracked directories. Constraints: value must be one of ('no', 'normal', 'all') [Default: 'normal']
-r, --recursive
if set, recurse into potential subdatasets. [Default: False]
--recursion-limit LEVELS
limit recursion into subdatasets to the given number of levels. Constraints: value must be convertible to type 'int' [Default: None]
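
The three comparison modes accepted by --revision can be told apart by the separator in the expression. The sketch below is only an illustration of that distinction; the function name and the mode labels are hypothetical, and this is not how datalad or Git parse the expression internally.

```python
def revision_mode(expr):
    """Classify a revision expression into one of the three documented modes.

    Returns a tuple (mode, first, second); 'second' is None in mode 1.
    """
    # Check for '...' first, because every '...' also contains '..'.
    if "..." in expr:
        first, second = expr.split("...", 1)
        return ("since-common-ancestor", first, second)
    if ".." in expr:
        first, second = expr.split("..", 1)
        return ("between-revisions", first, second)
    # A single revision: compare it against the state of the work tree.
    return ("worktree-vs-revision", expr, None)


mode_1 = revision_mode("HEAD~2")
mode_2 = revision_mode("v1.0..v2.0")
mode_3 = revision_mode("master...feature")
```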


datalad is developed by The DataLad Team and Contributors.