.\" Automatically generated by Pandoc 2.9.2.1
.\"
.TH "VAST" "1" "" "" ""
.hy
.SH NAME
.PP
\f[C]vast\f[R] \[en] manage a VAST node
.SH OVERVIEW
.PP
This section describes the VAST system and its components from a user interaction point of view.
.SS vast
.PP
\f[B]VAST\f[R] is a platform for network forensics at scale. It ingests security telemetry in a unified data model and offers a type-safe search interface to extract data in various formats.
.PP
The \f[C]vast\f[R] executable manages a VAST deployment by starting and interacting with a \f[B]node\f[R], the server-side component that manages the application state.
.SS Usage
.PP
The command line interface (CLI) is the primary way to interact with VAST. All functionality is available in the form of \f[I]commands\f[R], each of which has its own set of options:
.IP
.nf
\f[C]
vast [options] [command] [options] [command] ...
\f[R]
.fi
.PP
Commands are recursive and the top-level root command is the \f[C]vast\f[R] executable itself. Usage follows typical UNIX applications:
.IP \[bu] 2
\f[I]standard input\f[R] feeds data to commands
.IP \[bu] 2
\f[I]standard output\f[R] represents the result of a command
.IP \[bu] 2
\f[I]standard error\f[R] includes logging output
.PP
The \f[C]help\f[R] subcommand always prints the usage instructions for a given command, e.g., \f[C]vast help\f[R] lists all available top-level subcommands.
.PP
More information about subcommands is available via the \f[C]help\f[R] and \f[C]documentation\f[R] subcommands. E.g., \f[C]vast import suricata help\f[R] prints a help text for \f[C]vast import suricata\f[R], and \f[C]vast start documentation\f[R] prints longer documentation for \f[C]vast start\f[R].
.SS Configuration
.PP
In addition to command options, a YAML configuration file \f[C]vast.yaml\f[R] allows for persisting option values and tweaking system parameters. Command line options always override configuration file values.
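.PP
For example, a minimal \f[C]vast.yaml\f[R] might persist a couple of options so that they apply to every invocation. This is only a sketch; the exact set of available options is listed by \f[C]vast help\f[R], and the key names below are assumptions that mirror the command line option names:
.IP
.nf
\f[C]
vast:
  # Assumed option names; verify with \[aq]vast help\[aq].
  db-directory: /var/lib/vast
  import:
    batch-size: 65536
\f[R]
.fi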
.PP
During startup, VAST looks for configuration files in the following places, and merges their content, with more specific files taking higher precedence:
.IP "1." 3
\f[C]/vast/vast.yaml\f[R] for system-wide configuration, where \f[C]\f[R] is the platform-specific directory for configuration files, e.g., \f[C]/etc/vast\f[R].
.IP "2." 3
\f[C]\[ti]/.config/vast/vast.yaml\f[R] for user-specific configuration. VAST respects the XDG base directory specification and its environment variables.
.IP "3." 3
A configuration file passed using \f[C]--config=path/to/vast.yaml\f[R] on the command line.
.SS System Architecture
.PP
VAST consists of multiple \f[I]components\f[R], each of which implements specific system functionality. The following key components exist:
.PP
\f[B]source\f[R] Generates events by parsing a particular data format, such as packets from a network interface, IDS log files, or generic CSV or JSON data.
.PP
\f[B]sink\f[R] Produces events by printing them in a particular format, such as ASCII, CSV, JSON, PCAP, or Zeek logs.
.PP
\f[B]archive\f[R] Stores the raw event data.
.PP
\f[B]index\f[R] Accelerates queries by constructing index structures that point into the \f[B]archive\f[R].
.PP
\f[B]importer\f[R] Ingests events from \f[B]source\f[R]s, assigns them unique IDs, and relays them to \f[B]archive\f[R] and \f[B]index\f[R] for persistence.
.PP
\f[B]exporter\f[R] Accepts query expressions from users, extracts events, and relays results to \f[B]sink\f[R]s.
.SS Schematic
.IP
.nf
\f[C]
+--------+     +---------------------------------------+     +--------+
| source |     | node                                  |     |  sink  |
| (zeek) +--+  |        +--> archive <--+              |  +->| (pcap) |
+--------+  |  |        |               |              |  |  +--------+
            +---->importer          exporter -------------+
+--------+  |  |        |               |              |  |  +--------+
| source +--+  |        +--> index <----+              |  +->|  sink  |
| (json) |     |                                       |     | (ascii)|
+--------+     +---------------------------------------+     +--------+
\f[R]
.fi
.PP
The above diagram illustrates the default configuration of a single node and the flow of messages between the components. The \f[B]importer\f[R], \f[B]index\f[R], and \f[B]archive\f[R] are singleton instances within the \f[B]node\f[R]. \f[B]Source\f[R]s are spawned on demand for each data import. \f[B]Sink\f[R]s and \f[B]exporter\f[R]s form pairs that are spawned on demand for each query. \f[B]Source\f[R]s and \f[B]sink\f[R]s exist in their own \f[C]vast\f[R] processes, and are responsible for parsing the input and formatting the search results.
.SS count
.PP
The \f[C]count\f[R] command counts the number of events that a given query expression yields. For example:
.IP
.nf
\f[C]
vast count \[aq]:addr in 192.168.0.0/16\[aq]
\f[R]
.fi
.PP
This prints the number of events in the database that have an address field in the subnet \f[C]192.168.0.0/16\f[R].
.PP
An optional \f[C]--estimate\f[R] flag skips the candidate checks, i.e., it only asks the index and does not verify the hits against the database. This is a faster operation and useful when an upper bound suffices.
.SS dump
.PP
The \f[C]dump\f[R] command prints configuration and schema-related information. By default, the output is JSON-formatted. The \f[C]--yaml\f[R] flag switches to YAML output.
.PP
For example, to see all registered concept definitions, use the following command:
.IP
.nf
\f[C]
vast dump concepts
\f[R]
.fi
.PP
To dump all models in YAML format, use:
.IP
.nf
\f[C]
vast dump --yaml models
\f[R]
.fi
.PP
Specifying \f[C]dump\f[R] alone without a subcommand shows the concatenated output from all subcommands.
.SS concepts
.PP
The \f[C]dump concepts\f[R] command prints all registered concept definitions.
.IP
.nf
\f[C]
vast dump concepts
\f[R]
.fi
.SS models
.PP
The \f[C]dump models\f[R] command prints all registered model definitions.
.IP
.nf
\f[C]
vast dump models
\f[R]
.fi
.SS export
.PP
The \f[C]export\f[R] command retrieves a subset of data according to a given query expression. The export format must be explicitly specified:
.IP
.nf
\f[C]
vast export [options] <format> [options] <expr>
\f[R]
.fi
.PP
This is best explained with an example:
.IP
.nf
\f[C]
vast export --max-events=100 --continuous json \[aq]:timestamp < 1 hour ago\[aq]
\f[R]
.fi
.PP
The above command outputs line-delimited JSON like this, showing one event per line:
.IP
.nf
\f[C]
{\[dq]ts\[dq]: \[dq]2020-08-06T09:30:12.530972\[dq], \[dq]nodeid\[dq]: \[dq]1E96ADC85ABCA4BF7EE5440CCD5EB324BEFB6B00#85879\[dq], \[dq]aid\[dq]: 9, \[dq]actor_name\[dq]: \[dq]pcap-reader\[dq], \[dq]key\[dq]: \[dq]source.start\[dq], \[dq]value\[dq]: \[dq]1596706212530\[dq]}
\f[R]
.fi
.PP
The above command signals the running server to export 100 events to the \f[C]export\f[R] command, and to do so continuously (i.e., matching only newly arriving data, not data that was previously imported). Only events that have a field of type \f[C]timestamp\f[R] will be exported, and only if the timestamp in that field is older than 1 hour relative to the current time at the node.
.PP
The default mode of operation for the \f[C]export\f[R] command is historical queries, which export data that was already archived and indexed by the node. The \f[C]--unified\f[R] flag can be used to export both historical and continuous data.
.PP
For more information on the query expression, see the query language documentation (https://docs.tenzir.com/vast/query-language/overview).
.PP
Some export formats have format-specific options. For example, the \f[C]pcap\f[R] export format has a \f[C]--flush-interval\f[R] option that determines after how many packets the output is flushed to disk.
A list of format-specific options can be retrieved using \f[C]vast export help\f[R], and individual documentation is available using \f[C]vast export documentation\f[R].
.SS zeek
.PP
The Zeek (https://zeek.org) export format writes events in Zeek\[cq]s tab-separated value (TSV) style.
.SS csv
.PP
The \f[C]export csv\f[R] command renders comma-separated values (https://en.wikipedia.org/wiki/Comma-separated_values) in tabular form. The first line in a CSV file contains a header that describes the field names. The remaining lines contain concrete values. Except for the header, one line corresponds to one event.
.SS ascii
.PP
The ASCII export format renders events according to VAST\[cq]s data grammar. It merely dumps the data, without type information, and is therefore useful when digging for specific values.
.SS json
.PP
The JSON export format renders events in newline-delimited JSON (aka JSONL (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON)).
.SS null
.PP
The null export format does not render its results, and is used for debugging and benchmarking only.
.SS explore
.PP
The \f[C]explore\f[R] command correlates spatially and temporally related activity.
.PP
\f[B]Note (work in progress):\f[R] This documentation does not represent the current state of the \f[C]vast explore\f[R] command. Only some of the options shown below are currently implemented.
.PP
First, VAST evaluates the provided query expression. The results serve as input to generate further queries. Temporal constraints (\f[C]--after\f[R], \f[C]--before\f[R], or \f[C]--context\f[R]) apply relative to the timestamp field of the results. Spatial constraints can include a join field (\f[C]--by\f[R]) or a join expression (\f[C]--where\f[R]) that references fields from the result set. Restricting the exploration to specific sets of types (\f[C]--for\f[R]) works in both cases.
.PP
The \f[C]--before\f[R], \f[C]--after\f[R], and \f[C]--context\f[R] parameters create a time box around every result of the query. For example, this invocation shows all events that happened up to five minutes after each connection to 192.168.1.10:
.IP
.nf
\f[C]
vast explore --after=5min \[aq]zeek.conn.id.resp_h == 192.168.1.10\[aq]
\f[R]
.fi
.PP
The \f[C]--for\f[R] option restricts the result set to specific types. Note that \f[C]--for\f[R] cannot appear alone but must occur with at least one other of the selection options. For example, this invocation shows all DNS requests captured by Zeek up to 60 seconds after a connection to 192.168.1.10:
.IP
.nf
\f[C]
vast explore --after=60s --for=zeek.dns \[aq]zeek.conn.id.resp_h == 192.168.1.10\[aq]
\f[R]
.fi
.PP
The \f[C]--by\f[R] option takes a field name as argument and restricts the set of returned records to those that have a field with the same name, where the value of that field equals the value of the corresponding field in the original record. In other words, it performs an equi-join over the given field.
.PP
For example, to select all outgoing connections from some address up to five minutes after a connection to host 192.168.1.10 was made from that address:
.IP
.nf
\f[C]
vast explore --after=5min --by=orig_h \[aq]zeek.conn.id.resp_h == 192.168.1.10\[aq]
\f[R]
.fi
.PP
The \f[C]--where\f[R] option specifies a dynamic filter expression that restricts the set of returned records to those for which the expression returns true. Syntactically, the expression must be a boolean expression in the VAST query language. Inside the expression, the special character \f[C]$\f[R] refers to an element of the result set. Semantically, the \f[C]where\f[R] expression generates a new query for each result of the original query. In every copy of the query, the \f[C]$\f[R] character refers to one specific result of the original query.
.PP
For example, the following query first looks for all DNS queries to the host \f[C]evil.com\f[R] captured by Zeek, and then generates a result for every outgoing connection where the destination IP was one of the IPs inside the \f[C]answer\f[R] field of the DNS result.
.IP
.nf
\f[C]
vast explore --where=\[aq]resp_h in $.answer\[aq] \[aq]zeek.dns.query == \[dq]evil.com\[dq]\[aq]
\f[R]
.fi
.PP
Combined specification of the \f[C]--where\f[R], \f[C]--for\f[R], or \f[C]--by\f[R] options results in the intersection of the result sets of the individual options. Omitting all of the \f[C]--after\f[R], \f[C]--before\f[R], and \f[C]--context\f[R] options implicitly sets an infinite range, i.e., it removes the temporal constraint.
.PP
Unlike the \f[C]export\f[R] command, the output format can be selected using \f[C]--format=<format>\f[R]. The default export format is \f[C]json\f[R].
.SS get
.PP
The \f[C]get\f[R] command retrieves events by the unique IDs that were assigned to them when they were imported.
.IP
.nf
\f[C]
vast get [options] [ids]
\f[R]
.fi
.PP
Let\[cq]s look at an example:
.IP
.nf
\f[C]
vast get 0 42 1234
\f[R]
.fi
.PP
The above command outputs the requested events in JSON format. Other formatters can be selected with the \f[C]--format\f[R] option.
.SS infer
.PP
The \f[C]infer\f[R] command attempts to derive a schema from user input. Upon success, it prints a schema template to standard output.
.PP
The \f[C]infer\f[R] command allows for inferring schemas for the Zeek TSV and JSON formats. For JSON, the input must be a single JSON object; unlike other VAST commands, JSONL (newline-delimited JSON) is not supported.
.PP
Example usage:
.IP
.nf
\f[C]
gunzip -c integration/data/json/conn.log.json.gz | head -1 | vast infer
\f[R]
.fi
.PP
Note that the output of the \f[C]vast infer\f[R] command still needs to be manually edited in case there was an ambiguity, as the type system of the data source format may be less strict than the data model used by VAST.
E.g., there is no way to represent an IP address in JSON other than using a string type.
.PP
The \f[C]vast infer\f[R] command is a good starting point for writing custom schemas, but it is not designed to be a replacement for writing them by hand.
.PP
For more information on VAST\[cq]s data model, head over to our data model documentation page (https://docs.tenzir.com/vast/data-model/overview).
.SS import
.PP
The \f[C]import\f[R] command ingests data. An optional filter expression allows for restricting the input to matching events. The format of the imported data must be explicitly specified:
.IP
.nf
\f[C]
vast import [options] <format> [options] [expr]
\f[R]
.fi
.PP
The \f[C]import\f[R] command is the dual to the \f[C]export\f[R] command.
.PP
This is best explained with an example:
.IP
.nf
\f[C]
vast import suricata < path/to/eve.json
\f[R]
.fi
.PP
The above command signals the running node to ingest (i.e., to archive and index for later export) all Suricata events from the Eve JSON file passed via standard input.
.SS Filter Expressions
.PP
An optional filter expression allows for importing only the relevant subset of information. For example, a user might want to import Suricata Eve JSON, but skip over all events of type \f[C]suricata.stats\f[R].
.IP
.nf
\f[C]
vast import suricata \[aq]#type != \[dq]suricata.stats\[dq]\[aq] < path/to/eve.json
\f[R]
.fi
.PP
For more information on the optional filter expression, see the query language documentation (https://docs.tenzir.com/vast/query-language/overview).
.SS Format-Specific Options
.PP
Some import formats have format-specific options. For example, the \f[C]pcap\f[R] import format has an \f[C]interface\f[R] option that can be used to ingest PCAPs from a network interface directly. To retrieve a list of format-specific options, run \f[C]vast import help\f[R], and similarly to retrieve format-specific documentation, run \f[C]vast import documentation\f[R].
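.PP
As a sketch, ingesting live traffic with the \f[C]pcap\f[R] format might look as follows. The exact spelling of the interface option is an assumption here; confirm it with \f[C]vast import help\f[R]:
.IP
.nf
\f[C]
vast import pcap --interface=eth0
\f[R]
.fi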
.SS Type Filtering
.PP
The \f[C]--type\f[R] option filters known event types based on a prefix. E.g., \f[C]vast import json --type=zeek\f[R] matches all event types that begin with \f[C]zeek\f[R], and restricts the event types known to the import command accordingly.
.PP
VAST permanently tracks imported event types. They do not need to be specified again for consecutive imports.
.SS Batching
.PP
The import command parses events into table slices (batches). The following options control the batching:
.SS \f[C]vast.import.batch-encoding\f[R]
.PP
Selects the encoding of table slices. Available options are \f[C]msgpack\f[R] (row-based) and \f[C]arrow\f[R] (column-based).
.SS \f[C]vast.import.batch-size\f[R]
.PP
Sets an upper bound for the number of events per table slice.
.PP
Most components in VAST operate on table slices, which makes the table slice size a fundamental tuning knob on the spectrum of throughput and latency. Small table slices allow for shorter processing times, resulting in more scheduler context switches and a more balanced workload. However, the increased pressure on the scheduler comes at the cost of throughput. A large table slice size allows actors to spend more time processing a block of memory, but makes them yield less frequently to the scheduler. As a result, other actors scheduled on the same thread may have to wait a little longer.
.PP
The \f[C]vast.import.batch-size\f[R] option merely controls the number of events per table slice, but not necessarily the number of events until a component forwards a batch to the next stage in a stream. The CAF streaming framework (https://actor-framework.readthedocs.io/en/latest/Streaming.html) uses a credit-based flow-control mechanism to determine buffering of table slices. Setting \f[C]vast.import.batch-size\f[R] to 0 causes the table slice size to be unbounded and leaves it to other parameters to determine the actual table slice size.
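.PP
The batching options above can also be persisted in \f[C]vast.yaml\f[R]. A minimal sketch with illustrative values:
.IP
.nf
\f[C]
vast:
  import:
    # Column-based encoding; the alternative is msgpack.
    batch-encoding: arrow
    # Upper bound on events per table slice; 0 means unbounded.
    batch-size: 65536
\f[R]
.fi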
.SS \f[C]vast.import.batch-timeout\f[R]
.PP
Sets a timeout for forwarding buffered table slices to the importer.
.PP
The \f[C]vast.import.batch-timeout\f[R] option controls the maximum buffering period until table slices are forwarded to the node. The default batch timeout is one second.
.SS \f[C]vast.import.read-timeout\f[R]
.PP
Sets a timeout for reading from input sources.
.PP
The \f[C]vast.import.read-timeout\f[R] option determines how long a call to read data from the input will block. If no data arrives within that period, the process yields and tries again at a later time. The default read timeout is 20 milliseconds.
.SS zeek
.PP
The \f[C]import zeek\f[R] command consumes Zeek (https://zeek.org) logs in tab-separated value (TSV) style, and the \f[C]import zeek-json\f[R] command consumes Zeek logs as line-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON) objects as produced by the json-streaming-logs (https://github.com/corelight/json-streaming-logs) package. Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional \f[C]_path\f[R] field to disambiguate the log type. For stock Zeek JSON logs, use the existing \f[C]import json\f[R] with the \f[C]-t\f[R] flag to specify the log type.
.PP
Here\[cq]s an example of a typical Zeek \f[C]conn.log\f[R]:
.IP
.nf
\f[C]
#separator \[rs]x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path conn
#open 2014-05-23-18-02-04
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents
#types time string addr port addr port enum string interval count count string bool count string count count count count table[string]
1258531221.486539 Pii6cUUq1v4 192.168.1.102 68 192.168.1.1 67 udp - 0.163820 301 300 SF - 0 Dd 1 329 1 328 (empty)
1258531680.237254 nkCxlvNN8pi 192.168.1.103 137 192.168.1.255 137 udp dns 3.780125 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531693.816224 9VdICMMnxQ7 192.168.1.102 137 192.168.1.255 137 udp dns 3.748647 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531635.800933 bEgBnkI31Vf 192.168.1.103 138 192.168.1.255 138 udp - 46.725380 560 0 S0 - 0 D 3 644 0 0 (empty)
1258531693.825212 Ol4qkvXOksc 192.168.1.102 138 192.168.1.255 138 udp - 2.248589 348 0 S0 - 0 D 2 404 0 0 (empty)
1258531803.872834 kmnBNBtl96d 192.168.1.104 137 192.168.1.255 137 udp dns 3.748893 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531747.077012 CFIX6YVTFp2 192.168.1.104 138 192.168.1.255 138 udp - 59.052898 549 0 S0 - 0 D 3 633 0 0 (empty)
1258531924.321413 KlF6tbPUSQ1 192.168.1.103 68 192.168.1.1 67 udp - 0.044779 303 300 SF - 0 Dd 1 331 1 328 (empty)
1258531939.613071 tP3DM6npTdj 192.168.1.102 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 229 0 0 (empty)
1258532046.693816 Jb4jIDToo77 192.168.1.104 68 192.168.1.1 67 udp - 0.002103 311 300 SF - 0 Dd 1 339 1 328 (empty)
1258532143.457078 xvWLhxgUmj5 192.168.1.102 1170 192.168.1.1 53 udp dns 0.068511 36 215 SF - 0 Dd 1 64 1 243 (empty)
1258532203.657268 feNcvrZfDbf 192.168.1.104 1174 192.168.1.1 53 udp dns 0.170962 36 215 SF - 0 Dd 1 64 1 243 (empty)
1258532331.365294 aLsTcZJHAwa 192.168.1.1 5353 224.0.0.251 5353 udp dns 0.100381 273 0 S0 - 0 D 2 329 0 0 (empty)
\f[R]
.fi
.PP
When Zeek rotates logs (https://docs.zeek.org/en/stable/frameworks/logging.html#rotation), it regularly produces compressed batches of \f[C]*.tar.gz\f[R] files. Ingesting a compressed batch involves unpacking and concatenating the input before sending it to VAST:
.IP
.nf
\f[C]
gunzip -c *.gz | vast import zeek
\f[R]
.fi
.SS zeek-json
.PP
The \f[C]import zeek-json\f[R] command consumes Zeek logs as line-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON) objects as produced by the json-streaming-logs (https://github.com/corelight/json-streaming-logs) package. Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional \f[C]_path\f[R] field to disambiguate the log type. For stock Zeek JSON logs, use the existing \f[C]import json\f[R] with the \f[C]-t\f[R] flag to specify the log type.
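.PP
For example, to ingest a stream produced by the json-streaming-logs package (the file name below is illustrative):
.IP
.nf
\f[C]
vast import zeek-json < json_streaming_conn.log
\f[R]
.fi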
.SS csv
.PP
The \f[C]import csv\f[R] command imports comma-separated values (https://en.wikipedia.org/wiki/Comma-separated_values) in tabular form. The first line in a CSV file must contain a header that describes the field names. The remaining lines contain concrete values. Except for the header, one line corresponds to one event.
.PP
Because CSV has no notion of typing, it is necessary to select a layout via \f[C]--type\f[R] whose field names correspond to the CSV header field names. Such a layout must either be defined in a schema file known to VAST, or be defined in a schema passed using \f[C]--schema\f[R] or \f[C]--schema-file\f[R].
.PP
E.g., to import Threat Intelligence data into VAST, the known type \f[C]intel.indicator\f[R] can be used:
.IP
.nf
\f[C]
vast import --type=intel.indicator --read=path/to/indicators.csv csv
\f[R]
.fi
.SS json
.PP
The \f[C]json\f[R] import format consumes line-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON) objects according to a specified schema. That is, one line corresponds to one event. The object field names correspond to record field names.
.PP
JSON can express only a subset of VAST\[cq]s data model. For example, VAST has first-class support for IP addresses but JSON can only represent them as strings. To get the most out of your data, it is therefore important to define a schema to get a differentiated view of the data.
.PP
The \f[C]infer\f[R] command also supports schema inference for JSON data.
For example, \f[C]head data.json | vast infer\f[R] will print a raw schema that can be supplied to \f[C]--schema-file\f[R] / \f[C]-s\f[R] as a file, or to \f[C]--schema\f[R] / \f[C]-S\f[R] as a string. However, after \f[C]infer\f[R] dumps the schema, the generic type name should still be adjusted. This is also the time to use more precise types, such as \f[C]timestamp\f[R] instead of \f[C]time\f[R], or to annotate fields with additional attributes, such as \f[C]#skip\f[R].
.PP
If no type prefix is specified with \f[C]--type\f[R] / \f[C]-t\f[R], or multiple types match based on the prefix, VAST uses an exact match based on the field names to automatically deduce the event type for every line in the input.
.SS suricata
.PP
The \f[C]import suricata\f[R] command consumes Eve JSON (https://suricata.readthedocs.io/en/latest/output/eve/eve-json-output.html) logs from Suricata (https://suricata-ids.org). Eve JSON is Suricata\[cq]s unified format to log all types of activity as a single stream of line-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON).
.PP
For each log entry, VAST parses the field \f[C]event_type\f[R] to determine the specific record type and then parses the data according to the known schema.
.PP
To add support for additional fields and event types, adapt the \f[C]suricata.schema\f[R] file that ships with every installation of VAST.
.IP
.nf
\f[C]
vast import suricata < path/to/eve.log
\f[R]
.fi
.SS syslog
.PP
Ingest Syslog messages into VAST. The following formats are supported:
.IP \[bu] 2
RFC 5424 (https://tools.ietf.org/html/rfc5424)
.IP \[bu] 2
A fallback format that consists only of the Syslog message.
.IP
.nf
\f[C]
# Import from file.
vast import syslog --read=path/to/sys.log

# Continuously import from a stream.
syslog | vast import syslog
\f[R]
.fi
.SS test
.PP
The \f[C]import test\f[R] command exists primarily for testing and benchmarking purposes. It generates and ingests random data for a given schema.
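.PP
For instance, to generate synthetic data for a smoke test (any options the test source accepts can be discovered via \f[C]vast import test help\f[R]):
.IP
.nf
\f[C]
vast import test
\f[R]
.fi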
.SS pivot
.PP
The \f[C]pivot\f[R] command retrieves data of a related type. It inspects each event in a query result to find an event of the requested type. If the related type exists in the schema, VAST dynamically creates a new query to fetch the contextual data according to the type relationship.
.IP
.nf
\f[C]
vast pivot [options] <type> <expr>
\f[R]
.fi
.PP
VAST uses the field \f[C]community_id\f[R] to pivot between logs and packets. Pivoting is currently implemented for Suricata, Zeek (with community ID computation (https://github.com/corelight/bro-community-id) enabled), and PCAP. For Zeek specifically, the \f[C]uid\f[R] field is supported as well.
.PP
For example, to get all events of type \f[C]pcap.packet\f[R] that can be pivoted to over common fields from other events that match the query \f[C]dest_ip == 72.247.178.18\f[R], use this command:
.IP
.nf
\f[C]
vast pivot pcap.packet \[aq]dest_ip == 72.247.178.18\[aq]
\f[R]
.fi
.PP
The \f[C]pivot\f[R] command is similar to the \f[C]explore\f[R] command in that both allow for querying additional context.
.PP
Unlike the \f[C]export\f[R] command, the output format can be selected using \f[C]--format=<format>\f[R]. The default export format is \f[C]json\f[R].
.PP
For more information on schema pivoting, head over to docs.tenzir.com (https://docs.tenzir.com/vast/features/schema-pivoting).
.SS spawn
.PP
The \f[C]spawn\f[R] command spawns a component inside the node. This is useful when the server process itself is to be used for importing events, e.g., because the latency for sending events to the server process is too high.
.PP
Currently, only the \f[C]spawn source\f[R] command is documented. See \f[C]vast spawn source help\f[R] for more information.
.SS source
.PP
The \f[C]spawn source\f[R] command spawns a new source inside the node.
.PP
The following commands do the same thing, except that the \f[C]spawn source\f[R] version does not run in a separate process:
.IP
.nf
\f[C]
vast spawn source [options] <format> [options] [expr]
vast import [options] <format> [options] [expr]
\f[R]
.fi
.PP
For more information, please refer to the documentation for the \f[C]import\f[R] command (https://docs.tenzir.com/vast/cli/vast/import).
.SS csv
.PP
The \f[C]spawn source csv\f[R] command spawns a CSV source inside the node and is the analog to the \f[C]import csv\f[R] command.
.PP
For more information, please refer to the documentation for the commands \f[C]spawn source\f[R] (https://docs.tenzir.com/vast/cli/vast/spawn/source) and \f[C]import csv\f[R] (https://docs.tenzir.com/vast/cli/vast/import#import-csv).
.SS json
.PP
The \f[C]spawn source json\f[R] command spawns a JSON source inside the node and is the analog to the \f[C]import json\f[R] command.
.PP
For more information, please refer to the documentation for the commands \f[C]spawn source\f[R] (https://docs.tenzir.com/vast/cli/vast/spawn/source) and \f[C]import json\f[R] (https://docs.tenzir.com/vast/cli/vast/import#import-json).
.SS suricata
.PP
The \f[C]spawn source suricata\f[R] command spawns a Suricata source inside the node and is the analog to the \f[C]import suricata\f[R] command.
.PP
For more information, please refer to the documentation for the commands \f[C]spawn source\f[R] (https://docs.tenzir.com/vast/cli/vast/spawn/source) and \f[C]import suricata\f[R] (https://docs.tenzir.com/vast/cli/vast/import#import-suricata).
.SS syslog
.PP
The \f[C]spawn source syslog\f[R] command spawns a Syslog source inside the node and is the analog to the \f[C]import syslog\f[R] command.
.PP
For more information, please refer to the documentation for the commands \f[C]spawn source\f[R] (https://docs.tenzir.com/vast/cli/vast/spawn/source) and \f[C]import syslog\f[R] (https://docs.tenzir.com/vast/cli/vast/import#import-syslog).
.SS test .PP The \f[C]spawn source test\f[R] command spawns a test source inside the node and is the analog to the \f[C]import test\f[R] command. .PP For more information, please refer to the documentation for the commands \f[C]spawn source\f[R] (https://docs.tenzir.com/vast/cli/vast/spawn/source) and \f[C]import test\f[R] (https://docs.tenzir.com/vast/cli/vast/import#import-test). .SS zeek .PP The \f[C]spawn source zeek\f[R] command spawns a Zeek source inside the node and is the analog to the \f[C]import zeek\f[R] command. .PP For more information, please refer to the documentation for the commands \f[C]spawn source\f[R] (https://docs.tenzir.com/vast/cli/vast/spawn/source) and \f[C]import zeek\f[R] (https://docs.tenzir.com/vast/cli/vast/import#import-zeek). .SS start .PP The \f[C]start\f[R] command spins up a VAST node. Starting a node is the first step when deploying VAST as a continuously running server. The process runs in the foreground and uses standard error for logging. Standard output remains unused, unless the \f[C]--print-endpoint\f[R] option is enabled. .PP By default, the \f[C]start\f[R] command creates a \f[C]vast.db\f[R] directory in the current working directory. It is recommended to set the options for the node in the \f[C]vast.yaml\f[R] file, such that they are picked up by all client commands as well. .PP In the most basic form, VAST spawns one server process that contains all core actors that manage the persistent state, i.e., archive and index. This process spawns only one \[lq]container\[rq] actor that we call a \f[C]node\f[R]. .PP The \f[C]node\f[R] is the core piece of VAST that is continuously running in the background, and can be interacted with using the \f[C]import\f[R] and \f[C]export\f[R] commands (among others). To gracefully stop the node, the \f[C]stop\f[R] command can be used. .PP To use VAST without running a central node, pass the \f[C]--node\f[R] flag to commands interacting with the node. 
This is useful mostly for quick experiments, and spawns an ad-hoc node instead of connecting to one.
.PP
Only one node can run at the same time for a given database. This is ensured using a lock file named \f[C]pid.lock\f[R] that lives inside the \f[C]vast.db\f[R] directory.
.PP
Further information on getting started with using VAST is available on docs.tenzir.com (https://docs.tenzir.com/vast/quick-start/introduction).
.SS status
.PP
The \f[C]status\f[R] command dumps VAST\[cq]s runtime state in JSON format.
.PP
The unit of measurement for memory sizes is kilobytes.
.PP
For example, to see how many events of each type are indexed, this command can be used:
.IP
.nf
\f[C]
vast status --detailed | jq \[aq].index.statistics.layouts\[aq]
\f[R]
.fi
.SS stop
.PP
The \f[C]stop\f[R] command gracefully brings down a VAST server, and is the analog of the \f[C]start\f[R] command.
.PP
While it is technically possible to shut down a VAST server gracefully by sending \f[C]SIGINT\f[R] to the \f[C]vast start\f[R] process, it is recommended to use \f[C]vast stop\f[R] to shut down the server process, as it works over the wire as well and guarantees a proper shutdown. The command blocks execution until the node has quit, and returns a zero exit code when it succeeds, making it ideal for use in launch system scripts.
.SS version
.PP
The \f[C]version\f[R] command prints the version of the VAST executable and its major dependencies in JSON format.
.SH ISSUES
.PP
If you encounter a bug or have suggestions for improvement, please file an issue at .
.SH SEE ALSO
.PP
Visit for more information about VAST.
.SH AUTHORS
Tenzir GmbH.