Scroll to navigation

API(3) libcrush API(3)

NAME

API

SYNOPSIS

Data Structures


struct crush_bucket
struct crush_bucket_uniform
struct crush_bucket_list
struct crush_bucket_straw2
struct crush_map

Macros


#define CRUSH_ITEM_NONE 0x7fffffff

Enumerations


enum crush_opcodes { CRUSH_RULE_NOOP = 0, CRUSH_RULE_TAKE = 1, CRUSH_RULE_CHOOSE_FIRSTN = 2, CRUSH_RULE_CHOOSE_INDEP = 3, CRUSH_RULE_EMIT = 4, CRUSH_RULE_CHOOSELEAF_FIRSTN = 6, CRUSH_RULE_CHOOSELEAF_INDEP = 7, CRUSH_RULE_SET_CHOOSE_TRIES = 8, CRUSH_RULE_SET_CHOOSELEAF_TRIES = 9, CRUSH_RULE_SET_CHOOSE_LOCAL_TRIES = 10, CRUSH_RULE_SET_CHOOSE_LOCAL_FALLBACK_TRIES = 11, CRUSH_RULE_SET_CHOOSELEAF_VARY_R = 12, CRUSH_RULE_SET_CHOOSELEAF_STABLE = 13 }
enum crush_algorithm { CRUSH_BUCKET_UNIFORM = 1, CRUSH_BUCKET_LIST = 2, CRUSH_BUCKET_STRAW2 = 5 }

Functions


struct crush_map * crush_create ()
void crush_finalize (struct crush_map *map)
struct crush_rule * crush_make_rule (int len, int ruleset, int type, int minsize, int maxsize)
void crush_rule_set_step (struct crush_rule *rule, int pos, int op, int arg1, int arg2)
int crush_add_rule (struct crush_map *map, struct crush_rule *rule, int ruleno)
int crush_add_bucket (struct crush_map *map, int bucketno, struct crush_bucket *bucket, int *idout)
struct crush_bucket * crush_make_bucket (struct crush_map *map, int alg, int hash, int type, int size, int *items, int *weights)
int crush_bucket_add_item (struct crush_map *map, struct crush_bucket *bucket, int item, int weight)
int crush_bucket_adjust_item_weight (struct crush_map *map, struct crush_bucket *bucket, int item, int weight)
int crush_reweight_bucket (struct crush_map *crush, struct crush_bucket *bucket)
int crush_remove_bucket (struct crush_map *map, struct crush_bucket *bucket)
int crush_bucket_remove_item (struct crush_map *map, struct crush_bucket *bucket, int item)
void crush_destroy (struct crush_map *map)
int crush_do_rule (const struct crush_map *map, int ruleno, int x, int *result, int result_max, const __u32 *weights, int weight_max, void *cwin)

Detailed Description

Macro Definition Documentation

#define CRUSH_ITEM_NONE 0x7fffffff

The equivalent of NULL for an item, i.e. the absence of an item.

Enumeration Type Documentation

enum crush_algorithm

Items within a bucket are chosen with crush_do_rule() using one of three algorithms representing a tradeoff between performance and reorganization efficiency. If you are unsure of which bucket type to use, we recommend using CRUSH_BUCKET_STRAW2.

The table summarizes how the speed of each option measures up against mapping stability when items are added or removed.

Bucket Alg     Speed       Additions    Removals
------------------------------------------------
uniform         O(1)       poor         poor
list            O(n)       optimal      poor
straw2          O(n)       optimal      optimal

Enumerator

Devices are rarely added individually in a large system. Instead, new storage is typically deployed in blocks of identical devices, often as an additional shelf in a server rack or perhaps an entire cabinet. Devices reaching their end of life are often similarly decommissioned as a set (individual failures aside), making it natural to treat them as a unit. CRUSH uniform buckets are used to represent an identical set of devices in such circumstances. The key advantage in doing so is performance related: CRUSH can map replicas into uniform buckets in constant time. In cases where the uniformity restrictions are not appropriate, other bucket types can be used. If the size of a uniform bucket changes, there is a complete reshuffling of data between devices, much like conventional hash-based distribution strategies.
List buckets structure their contents as a linked list, and can contain items with arbitrary weights. To place a replica, CRUSH begins at the head of the list with the most recently added item and compares its weight to the sum of all remaining items’ weights. Depending on the value of hash( x , r , item), either the current item is chosen with the appropriate probability, or the process continues recursively down the list. This is a natural and intuitive choice for an expanding cluster: either an object is relocated to the newest device with some appropriate probability, or it remains on the older devices as before. The result is optimal data migration when items are added to the bucket. Items removed from the middle or tail of the list, however, can result in a significant amount of unnecessary movement, making list buckets most suitable for circumstances in which they never (or very rarely) shrink.
List and tree buckets are structured such that a limited number of hash values need to be calculated and compared to weights in order to select a bucket item. In doing so, they divide and conquer in a way that either gives certain items precedence (e. g., those at the beginning of a list) or obviates the need to consider entire subtrees of items at all. That im- proves the performance of the replica placement process, but can also introduce suboptimal reorganization behavior when the contents of a bucket change due an addition, removal, or re-weighting of an item.

The straw2 bucket type allows all items to fairly “compete” against each other for replica placement through a process analogous to a draw of straws. To place a replica, a straw of random length is drawn for each item in the bucket. The item with the longest straw wins. The length of each straw is initially a value in a fixed range. Each straw length is scaled by a factor based on the item’s weight so that heavily weighted items are more likely to win the draw. Although this process is almost twice as slow (on average) than a list bucket and even slower than a tree bucket (which scales logarithmically), straw2 buckets result in optimal data movement between nested items when modified.

enum crush_opcodes

Enumerator

do nothing

Function Documentation

int crush_add_bucket (struct crush_map * map, int bucketno, struct crush_bucket * bucket, int * idout)

Add bucket into the crush map and assign it the bucketno unique identifier. If bucketno is -1, the function will assign the lowest available identifier. The bucket identifier must be a negative integer. The bucket identifier is returned via idout.

  • return -ENOMEM if realloc(3) fails to expand the array of buckets in the map
  • return -EEXIST if the bucketno identifier is already assigned to another bucket.

Parameters:

map the crush_map
bucketno the bucket unique identifer or -1
bucket the bucket to add to the map
idout a pointer to the bucket identifier

Returns:

0 on success, < 0 on error

int crush_add_rule (struct crush_map * map, struct crush_rule * rule, int ruleno)

Add the rule into the crush map and assign it the ruleno unique identifier. If ruleno is -1, the function will assign the lowest available identifier. The ruleno value must be a positive integer lower than CRUSH_MAX_RULES.

  • return -ENOSPC if the rule identifier is >= CRUSH_MAX_RULES
  • return -ENOMEM if realloc(3) fails to expand the array of rules in the map

Parameters:

map the crush_map
rule the rule to add to the map
ruleno a positive integer < CRUSH_MAX_RULES or -1

Returns:

the rule unique identifier on success, < 0 on error

int crush_bucket_add_item (struct crush_map * map, struct crush_bucket * bucket, int item, int weight)

Add item to bucket with weight. The weight of the new item is added to the weight of the bucket so that it reflects the total weight of all items.

If bucket->alg is CRUSH_BUCKET_UNIFORM, the value of weight must be equal to __(struct crush_bucket_uniform *)bucket->item_weight__.

  • return -ENOMEN if the bucket cannot be resized with realloc(3).
  • return -ERANGE if adding weight to the weight of the bucket overflows.
  • return -1 if the value of bucket->alg is unknown.

Returns:

0 on success, < 0 on error

int crush_bucket_adjust_item_weight (struct crush_map * map, struct crush_bucket * bucket, int item, int weight)

If bucket->alg is CRUSH_BUCKET_UNIFORM, __(struct crush_bucket_uniform *)bucket->item_weight__ is set to weight and the weight of the bucket is set to be the number of items in the bucket times the weight. The return value is the difference between the new bucket weight and the former bucket weight. The item argument is ignored.

If bucket->alg is different from CRUSH_BUCKET_UNIFORM, set the weight of item in bucket. The former weight of the item is subtracted from the weight of of the bucket and the new weight is added. The return value is the difference between the new item weight and the former item weight.

Returns:

the difference between the new weight and the former weight

int crush_bucket_remove_item (struct crush_map * map, struct crush_bucket * bucket, int item)

Remove item from bucket and subtract the item weight from the bucket weight. If the weight of the item is greater than the weight of the bucket, silentely set the bucket weight to zero.

  • return -ENOMEN if the bucket cannot be sized down with realloc(3).
  • return -1 if the value of bucket->alg is unknown.

Parameters:

map unused
bucket the bucket from which item is removed
item the item to remove from bucket

Returns:

0 on success, < 0 on error

struct crush_map* crush_create ()

Allocate a crush_map with malloc(3) and initialize it. The caller is responsible for deallocating the crush_map with crush_destroy().

The content of the allocated crush_map is undefined and it is the responsibility of the caller to set meaningful values.

The recommended values are:


struct crush_map *m = crush_create();
m->choose_local_tries = 0;
m->choose_local_fallback_tries = 0;
m->choose_total_tries = 50;
m->chooseleaf_descend_once = 1;
m->chooseleaf_vary_r = 1;
m->chooseleaf_stable = 1;
m->allowed_bucket_algs =
(1 << CRUSH_BUCKET_UNIFORM) |
(1 << CRUSH_BUCKET_LIST) |
(1 << CRUSH_BUCKET_STRAW2);

Returns:

a pointer to the newly created crush_map or NULL

void crush_destroy (struct crush_map * map)

Deallocate the map, previously allocated with crush_create.

Parameters:

map the crush map

int crush_do_rule (const struct crush_map * map, int ruleno, int x, int * result, int result_max, const __u32 * weights, int weight_max, void * cwin)

Map x to result_max items and store them in the result array. The mapping is done by following each step of the rule ruleno. See crush_make_rule(), crush_rule_set_step() and crush_add_rule() for more information on how the rules are created, populated and added to the crush map.

The return value is the the number of items in the result array. If the caller asked for result_max items and the return value is X where X < result_max, the content of __result[0,X[__ is defined but the content of __result[X,result_max[__ is undefined. For example:

crush_do_rule(map, ruleno=1, x=1, result, result_max=3,...) == 1
result[0] is set
result[1] is undefined
result[2] is undefined

An entry in the result array is either an item in the crush map or CRUSH_ITEM_NONE if no item was found. For example:

crush_do_rule(map, ruleno=1, x=1, result, result_max=4,...) == 2
result[0] is CRUSH_ITEM_NONE
result[1] is item number 5
result[2] is undefined
result[3] is undefined

The weight array contains the probabilities that a leaf is ignored even if it is selected. It is a 16.16 fixed point number in the range [0x00000,0x10000]. The lower the value, the more often the leaf is ignored. For instance:

  • weight[leaf] == 0x00000 == 0.0 always ignore
  • weight[leaf] == 0x10000 == 1.0 never ignore
  • weight[leaf] == 0x08000 == 0.5 ignore 50% of the time
  • weight[leaf] == 0x04000 == 0.25 ignore 75% of the time
  • etc.

During mapping, each leaf is checked against the weight array, using the leaf as an index. If there is no entry in weight for the leaf, it is ignored. If there is an entry, the leaf will be ignored some of the time, depending on the probability.

The cwin argument must be set as follows:


char __cwin__[crush_work_size(__map__, __result_max__)];
crush_init_workspace(__map__, __cwin__);

Parameters:

map the crush_map
ruleno a positive integer < CRUSH_MAX_RULES
x the value to map to result_max items
result an array of items of size result_max
result_max the size of the result array
weights an array of weights of size weight_max
weight_max the size of the weights array
cwin must be the value of crush_work_size(map, result_max)

Returns:

0 on error or the size of result on success

void crush_finalize (struct crush_map * map)

Analyze the content of map and set the internal values required before it can be used to map values with crush_do_rule(). The caller must make sure it is run before crush_do_rule() and after any function that modifies the map (crush_add_bucket(), etc.).

Parameters:

map the crush_map

struct crush_bucket* crush_make_bucket (struct crush_map * map, int alg, int hash, int type, int size, int * items, int * weights)

Allocate a crush_bucket with malloc(3) and initialize it. The content of the bucket is filled with size items from items. The item selection is set to use alg which is one of CRUSH_BUCKET_UNIFORM , CRUSH_BUCKET_LIST or CRUSH_BUCKET_STRAW2. The initial items are assigned a weight from the weights array, depending on the value of alg. If alg is CRUSH_BUCKET_UNIFORM, all items are set to have a weight equal to weights[0], otherwise the weight of items[x] is set to be the value of weights[x].

Parameters:

map unused
alg algorithm for item selection
hash always set to CRUSH_HASH_RJENKINS1
type user defined bucket type
size of the items array
items array of size items
weights the weight of each item in items, depending on alg

struct crush_rule* crush_make_rule (int len, int ruleset, int type, int minsize, int maxsize)

Allocate an empty crush_rule structure large enough to store len steps. Steps can be added to a rule via crush_rule_set_step(). The ruleset is a user defined integer, not used by libcrush and stored in the allocated rule at rule->ruleset.

The rule is designed to allow crush_do_rule() to get at least minsize items and at most maxsize items.

The type is defined by the caller and will be used by crush_find_rule() when looking for a rule and by CRUSH_RULE_CHOOSE* steps when looking for items.

If malloc(3) fails, return NULL.

Parameters:

len number of steps in the rule
ruleset user defined value
type user defined value
minsize minimum number of items the rule can map
maxsize maximum number of items the rule can map

Returns:

a pointer to the newly created rule or NULL

int crush_remove_bucket (struct crush_map * map, struct crush_bucket * bucket)

Remove bucket from map and deallocate it via crush_destroy_bucket(). assert(3) that bucket is in map. The caller is responsible for making sure the bucket is not the child of any other bucket in the map.

Parameters:

map a crush_map containing bucket
bucket the bucket to remove from map

Returns:

0

int crush_reweight_bucket (struct crush_map * crush, struct crush_bucket * bucket)

Recursively update the weight of bucket and its children, deep first. The bucket weight is set to the sum of the weight of the items it contains.

  • return -ERANGE if the sum of the weight of the items in bucket overflows.
  • return -1 if the value of bucket->alg is unknown.

Parameters:

crush a crush_map containing bucket
bucket the root of the tree to reweight

Returns:

0 on success, < 0 on error

void crush_rule_set_step (struct crush_rule * rule, int pos, int op, int arg1, int arg2)

Set the pos step of the rule to an operand and up to two arguments. The value of the operand op determines if the arguments are used and how:

  • CRUSH_RULE_NOOP do nothing.
  • CRUSH_RULE_TAKE select the arg1 item
  • CRUSH_RULE_CHOOSE_FIRSTN and CRUSH_RULE_CHOOSE_INDEP recursively explore each bucket currently selected, looking for arg1 items of type arg2 and select them.
  • CRUSH_RULE_CHOOSELEAF_FIRSTN and CRUSH_RULE_CHOOSELEAF_INDEP recursively explore each bucket currently selected, looking for arg1 leaves within all the buckets of type arg2 and select them.
  • CRUSH_RULE_EMIT append the selection to the results and clear the selection

In all CHOOSE steps, if arg1 is zero, the number of items to select is determined by the max_result argument of crush_do_rule(), i.e. arg1 is max_result minus the number of items already in the result.

Parameters:

rule the rule in which the step is inserted
pos the zero based step index
op one of CRUSH_RULE_NOOP, CRUSH_RULE_TAKE, CRUSH_RULE_CHOOSE_FIRSTN, CRUSH_RULE_CHOOSE_INDEP, CRUSH_RULE_CHOOSELEAF_FIRSTN, CRUSH_RULE_CHOOSELEAF_INDEP or CRUSH_RULE_EMIT
arg1 first argument for op
arg2 second argument for op

Author

Generated automatically by Doxygen for libcrush from the source code.

Sun Feb 12 2017 Version 1.0.0