.TH binary 3erl "stdlib 3.14" "Ericsson AB" "Erlang Module Definition" .SH NAME binary \- Library for handling binary data. .SH DESCRIPTION .LP This module contains functions for manipulating byte-oriented binaries\&. Although the majority of functions could be implemented using the bit syntax, the functions in this library are highly optimized and are expected to either execute faster or consume less memory, or both, than a counterpart written in pure Erlang\&. .LP The module is provided according to Erlang Enhancement Proposal (EEP) 31\&. .LP .RS -4 .B Note: .RE The library handles byte-oriented data\&. For bitstrings that are not binaries (that is, bitstrings that do not contain a whole number of octets), every function in this module raises a \fIbadarg\fR\& exception\&. .SH DATA TYPES .nf \fBcp()\fR\& .br .fi .RS .LP Opaque data type representing a compiled search pattern\&. Guaranteed to be a \fItuple()\fR\& to allow programs to distinguish it from non-precompiled search patterns\&. .RE .nf \fBpart()\fR\& = {Start :: integer() >= 0, Length :: integer()} .br .fi .RS .LP A representation of a part (or range) in a binary\&. \fIStart\fR\& is a zero-based offset into a \fIbinary()\fR\& and \fILength\fR\& is the length of that part\&. As input to functions in this module, a reverse part specification is allowed, constructed with a negative \fILength\fR\&, so that the part of the binary begins at \fIStart\fR\& + \fILength\fR\& and is -\fILength\fR\& long\&. This is useful for referencing the last \fIN\fR\& bytes of a binary as \fI{size(Binary), -N}\fR\&\&. The functions in this module always return \fIpart()\fR\&s with positive \fILength\fR\&\&. .RE .SH EXPORTS .LP .nf .B at(Subject, Pos) -> byte() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pos = integer() >= 0 .br .RE .RE .RS .LP Returns the byte at position \fIPos\fR\& (zero-based) in binary \fISubject\fR\& as an integer\&. If \fIPos\fR\& >= \fIbyte_size(Subject)\fR\&, a \fIbadarg\fR\& exception is raised\&. 
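.LP
A short shell sketch of the zero-based indexing described above\&. The subject binary is made up for this illustration, and the atom \fIout_of_range\fR\& is only a placeholder chosen here; it is not something the module itself returns\&.
.LP
.nf

1> binary:at(<<"erlang">>, 0).
101
2> binary:at(<<"erlang">>, 5).
103
3> try binary:at(<<"erlang">>, 6) catch error:badarg -> out_of_range end.
out_of_range
.fi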
.RE .LP .nf .B bin_to_list(Subject) -> [byte()] .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP Same as \fIbin_to_list(Subject, {0,byte_size(Subject)})\fR\&\&. .RE .LP .nf .B bin_to_list(Subject, PosLen) -> [byte()] .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br PosLen = part() .br .RE .RE .RS .LP Converts \fISubject\fR\& to a list of \fIbyte()\fR\&s, each representing the value of one byte\&. \fIPosLen\fR\& denotes which part of the \fIbinary()\fR\& to convert\&. .LP \fIExample:\fR\& .LP .nf 1> binary:bin_to_list(<<"erlang">>, {1,3}). "rla" %% or [114,108,97] in list notation. .fi .LP If \fIPosLen\fR\& references any position outside the binary, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B bin_to_list(Subject, Pos, Len) -> [byte()] .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pos = integer() >= 0 .br Len = integer() .br .RE .RE .RS .LP Same as \fIbin_to_list(Subject, {Pos, Len})\fR\&\&. .RE .LP .nf .B compile_pattern(Pattern) -> cp() .br .fi .br .RS .LP Types: .RS 3 Pattern = binary() | [binary()] .br .RE .RE .RS .LP Builds an internal structure representing a compilation of a search pattern, later to be used in functions \fImatch/3\fR\&, \fImatches/3\fR\&, \fIsplit/3\fR\&, or \fIreplace/4\fR\&\&. The \fIcp()\fR\& returned is guaranteed to be a \fItuple()\fR\& to allow programs to distinguish it from non-precompiled search patterns\&. .LP When a list of binaries is specified, it denotes a set of alternative binaries to search for\&. For example, if \fI[<<"functional">>,<<"programming">>]\fR\& is specified as \fIPattern\fR\&, this means either \fI<<"functional">>\fR\& or \fI<<"programming">>\fR\&\&. The pattern is a set of alternatives; when only a single binary is specified, the set has only one element\&. The order of alternatives in a pattern is not significant\&. .LP The list of binaries used for search alternatives must be flat and proper\&. 
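.LP
As a rough sketch of how a compiled pattern might be reused across several calls (the subject binary is made up for this example, and the compiled value itself is not printed because the type is opaque):
.LP
.nf

1> CP = binary:compile_pattern([<<"functional">>, <<"programming">>]), is_tuple(CP).
true
2> binary:match(<<"functional programming">>, CP).
{0,10}
3> binary:matches(<<"functional programming">>, CP).
[{0,10},{11,11}]
.fi
.LP
Compiling once is mainly worthwhile when the same pattern is applied to many subjects; for a single search, passing the binary or list directly is equivalent\&.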
.LP If \fIPattern\fR\& is not a binary or a flat proper list of binaries with length > 0, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B copy(Subject) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP Same as \fIcopy(Subject, 1)\fR\&\&. .RE .LP .nf .B copy(Subject, N) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br N = integer() >= 0 .br .RE .RE .RS .LP Creates a binary with the content of \fISubject\fR\& duplicated \fIN\fR\& times\&. .LP This function always creates a new binary, even if \fIN = 1\fR\&\&. By using \fIcopy/1\fR\& on a binary referencing a larger binary, one can free up the larger binary for garbage collection\&. .LP .RS -4 .B Note: .RE By deliberately copying a single binary to avoid referencing a larger binary, one can, instead of freeing up the larger binary for later garbage collection, create much more binary data than needed\&. Sharing binary data is usually good\&. Only in special cases, when small parts reference large binaries and the large binaries are no longer used in any process, deliberate copying can be a good idea\&. .LP If \fIN\fR\& < \fI0\fR\&, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B decode_unsigned(Subject) -> Unsigned .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Unsigned = integer() >= 0 .br .RE .RE .RS .LP Same as \fIdecode_unsigned(Subject, big)\fR\&\&. .RE .LP .nf .B decode_unsigned(Subject, Endianness) -> Unsigned .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Endianness = big | little .br Unsigned = integer() >= 0 .br .RE .RE .RS .LP Converts the binary digit representation, in big endian or little endian, of a positive integer in \fISubject\fR\& to an Erlang \fIinteger()\fR\&\&. .LP \fIExample:\fR\& .LP .nf 1> binary:decode_unsigned(<<169,138,199>>,big). 
11111111 .fi .RE .LP .nf .B encode_unsigned(Unsigned) -> binary() .br .fi .br .RS .LP Types: .RS 3 Unsigned = integer() >= 0 .br .RE .RE .RS .LP Same as \fIencode_unsigned(Unsigned, big)\fR\&\&. .RE .LP .nf .B encode_unsigned(Unsigned, Endianness) -> binary() .br .fi .br .RS .LP Types: .RS 3 Unsigned = integer() >= 0 .br Endianness = big | little .br .RE .RE .RS .LP Converts a positive integer to the smallest possible representation in a binary digit representation, either big endian or little endian\&. .LP \fIExample:\fR\& .LP .nf 1> binary:encode_unsigned(11111111, big). <<169,138,199>> .fi .RE .LP .nf .B first(Subject) -> byte() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP Returns the first byte of binary \fISubject\fR\& as an integer\&. If the size of \fISubject\fR\& is zero, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B last(Subject) -> byte() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP Returns the last byte of binary \fISubject\fR\& as an integer\&. If the size of \fISubject\fR\& is zero, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B list_to_bin(ByteList) -> binary() .br .fi .br .RS .LP Types: .RS 3 ByteList = iolist() .br .RE .RE .RS .LP Works exactly as \fIerlang:list_to_binary/1\fR\&, added for completeness\&. .RE .LP .nf .B longest_common_prefix(Binaries) -> integer() >= 0 .br .fi .br .RS .LP Types: .RS 3 Binaries = [binary()] .br .RE .RE .RS .LP Returns the length of the longest common prefix of the binaries in list \fIBinaries\fR\&\&. .LP \fIExample:\fR\& .LP .nf 1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]). 2 2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]). 0 .fi .LP If \fIBinaries\fR\& is not a flat list of binaries, a \fIbadarg\fR\& exception is raised\&. 
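.LP
The returned length can be combined with \fIpart/3\fR\& to obtain the common prefix itself\&. The following is only a sketch; the list \fIBins\fR\& is an assumed input invented for illustration:
.LP
.nf

1> Bins = [<<"erlang">>, <<"ergonomy">>, <<"error">>].
[<<"erlang">>,<<"ergonomy">>,<<"error">>]
2> Len = binary:longest_common_prefix(Bins).
2
3> binary:part(hd(Bins), 0, Len).
<<"er">>
.fi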
.RE .LP .nf .B longest_common_suffix(Binaries) -> integer() >= 0 .br .fi .br .RS .LP Types: .RS 3 Binaries = [binary()] .br .RE .RE .RS .LP Returns the length of the longest common suffix of the binaries in list \fIBinaries\fR\&\&. .LP \fIExample:\fR\& .LP .nf 1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]). 3 2> binary:longest_common_suffix([<<"erlang">>, <<"perl">>]). 0 .fi .LP If \fIBinaries\fR\& is not a flat list of binaries, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B match(Subject, Pattern) -> Found | nomatch .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Found = part() .br .RE .RE .RS .LP Same as \fImatch(Subject, Pattern, [])\fR\&\&. .RE .LP .nf .B match(Subject, Pattern, Options) -> Found | nomatch .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Found = part() .br Options = [Option] .br Option = {scope, part()} .br .nf \fBpart()\fR\& = {Start :: integer() >= 0, Length :: integer()} .fi .br .RE .RE .RS .LP Searches for the first occurrence of \fIPattern\fR\& in \fISubject\fR\& and returns the position and length\&. .LP The function returns \fI{Pos, Length}\fR\& for the binary in \fIPattern\fR\&, starting at the lowest position in \fISubject\fR\&\&. .LP \fIExample:\fR\& .LP .nf 1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]). {1,4} .fi .LP Even though \fI<<"cd">>\fR\& ends before \fI<<"bcde">>\fR\&, \fI<<"bcde">>\fR\& begins first and is therefore the first match\&. If two overlapping matches begin at the same position, the longest is returned\&. .LP Summary of the options: .RS 2 .TP 2 .B {scope, {Start, Length}}: Only the specified part is searched\&. Return values still have offsets from the beginning of \fISubject\fR\&\&. A negative \fILength\fR\& is allowed as described in section Data Types in this manual\&. .RE .LP If none of the strings in \fIPattern\fR\& is found, the atom \fInomatch\fR\& is returned\&. 
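.LP
The effect of option \fIscope\fR\& can be sketched as follows\&. The subject binary is made up for this illustration; note that the reported position is still counted from the start of \fISubject\fR\&, and that a negative \fILength\fR\& selects the same range from the other end:
.LP
.nf

1> binary:match(<<"abcabc">>, <<"bc">>).
{1,2}
2> binary:match(<<"abcabc">>, <<"bc">>, [{scope, {2, 4}}]).
{4,2}
3> binary:match(<<"abcabc">>, <<"bc">>, [{scope, {6, -4}}]).
{4,2}
4> binary:match(<<"abcabc">>, <<"x">>).
nomatch
.fi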
.LP For a description of \fIPattern\fR\&, see function \fIcompile_pattern/1\fR\&\&. .LP If \fI{scope, {Start,Length}}\fR\& is specified in the options such that \fIStart\fR\& > size of \fISubject\fR\&, \fIStart\fR\& + \fILength\fR\& < 0 or \fIStart\fR\& + \fILength\fR\& > size of \fISubject\fR\&, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B matches(Subject, Pattern) -> Found .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Found = [part()] .br .RE .RE .RS .LP Same as \fImatches(Subject, Pattern, [])\fR\&\&. .RE .LP .nf .B matches(Subject, Pattern, Options) -> Found .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Found = [part()] .br Options = [Option] .br Option = {scope, part()} .br .nf \fBpart()\fR\& = {Start :: integer() >= 0, Length :: integer()} .fi .br .RE .RE .RS .LP Works like \fImatch/2\fR\&, but \fISubject\fR\& is searched until exhausted and a list of all non-overlapping parts matching \fIPattern\fR\& is returned (in order)\&. .LP The first and longest match is preferred to a shorter one, which is illustrated by the following example: .LP .nf 1> binary:matches(<<"abcde">>, [<<"bcde">>,<<"bc">>,<<"de">>],[]). [{1,4}] .fi .LP The result shows that <<"bcde">> is selected instead of the shorter match <<"bc">> (which would have given rise to one more match, <<"de">>)\&. This corresponds to the behavior of POSIX regular expressions (and programs like awk), but is not consistent with alternative matches in \fIre\fR\& (and Perl), where instead lexical ordering in the search pattern selects which string matches\&. .LP If none of the strings in a pattern is found, an empty list is returned\&. .LP For a description of \fIPattern\fR\&, see \fIcompile_pattern/1\fR\&\&. For a description of available options, see \fImatch/3\fR\&\&. 
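.LP
A brief sketch of the non-overlapping search and of an empty result, using a subject binary invented for this illustration:
.LP
.nf

1> binary:matches(<<"banana">>, <<"an">>).
[{1,2},{3,2}]
2> binary:matches(<<"banana">>, <<"ana">>).
[{1,3}]
3> binary:matches(<<"banana">>, <<"a">>, [{scope, {2, 3}}]).
[{3,1}]
4> binary:matches(<<"banana">>, <<"x">>).
[]
.fi
.LP
In the second call, the occurrence of \fI<<"ana">>\fR\& starting at position 3 overlaps the match already found at position 1 and is therefore not reported\&.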
.LP If \fI{scope, {Start,Length}}\fR\& is specified in the options such that \fIStart\fR\& > size of \fISubject\fR\&, \fIStart + Length\fR\& < 0 or \fIStart + Length\fR\& is > size of \fISubject\fR\&, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B part(Subject, PosLen) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br PosLen = part() .br .RE .RE .RS .LP Extracts the part of binary \fISubject\fR\& described by \fIPosLen\fR\&\&. .LP A negative length can be used to extract bytes at the end of a binary: .LP .nf 1> Bin = <<1,2,3,4,5,6,7,8,9,10>>. 2> binary:part(Bin, {byte_size(Bin), -5}). <<6,7,8,9,10>> .fi .LP .RS -4 .B Note: .RE part/2 and part/3 are also available in the \fIerlang\fR\& module under the names \fIbinary_part/2\fR\& and \fIbinary_part/3\fR\&\&. Those BIFs are allowed in guard tests\&. .LP If \fIPosLen\fR\& in any way references outside the binary, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B part(Subject, Pos, Len) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pos = integer() >= 0 .br Len = integer() .br .RE .RE .RS .LP Same as \fIpart(Subject, {Pos, Len})\fR\&\&. .RE .LP .nf .B referenced_byte_size(Binary) -> integer() >= 0 .br .fi .br .RS .LP Types: .RS 3 Binary = binary() .br .RE .RE .RS .LP If a binary references a larger binary (often described as being a subbinary), it can be useful to get the size of the referenced binary\&. This function can be used in a program to trigger the use of \fIcopy/1\fR\&\&. By copying a binary, one can dereference the original, possibly large, binary that a smaller binary is a reference to\&. .LP \fIExample:\fR\& .LP .nf store(Binary, GBSet) -> NewBin = case binary:referenced_byte_size(Binary) of Large when Large > 2 * byte_size(Binary) -> binary:copy(Binary); _ -> Binary end, gb_sets:insert(NewBin,GBSet). 
.fi .LP In this example, the binary content is copied before it is inserted into the \fIgb_sets:set()\fR\& if it references a binary more than twice the size of the data we want to keep\&. Of course, different programs call for different rules about when to copy\&. .LP Binary sharing occurs whenever binaries are taken apart\&. This is the fundamental reason why binaries are fast: decomposition can always be done with O(1) complexity\&. In rare circumstances, however, this data sharing is undesirable, which is why this function together with \fIcopy/1\fR\& can be useful when optimizing for memory use\&. .LP Example of binary sharing: .LP .nf 1> A = binary:copy(<<1>>, 100). <<1,1,1,1,1 ... 2> byte_size(A). 100 3> binary:referenced_byte_size(A). 100 4> <<B:10/binary, C:90/binary>> = A. <<1,1,1,1,1 ... 5> {byte_size(B), binary:referenced_byte_size(B)}. {10,10} 6> {byte_size(C), binary:referenced_byte_size(C)}. {90,100} .fi .LP In the above example, the small binary \fIB\fR\& was copied while the larger binary \fIC\fR\& references binary \fIA\fR\&\&. .LP .RS -4 .B Note: .RE Binary data is shared among processes\&. If another process still references the larger binary, copying the part this process uses only consumes more memory and does not free up the larger binary for garbage collection\&. Use this kind of intrusive function with extreme care and only if a real problem is detected\&. .RE .LP .nf .B replace(Subject, Pattern, Replacement) -> Result .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Replacement = Result = binary() .br .RE .RE .RS .LP Same as \fIreplace(Subject, Pattern, Replacement,[])\fR\&\&. 
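.LP
With an empty option list, only the first match is replaced\&. The sketch below contrasts this with option \fIglobal\fR\& of \fIreplace/4\fR\&; the subject binary is made up for illustration:
.LP
.nf

1> binary:replace(<<"a-b-c">>, <<"-">>, <<"+">>).
<<"a+b-c">>
2> binary:replace(<<"a-b-c">>, <<"-">>, <<"+">>, [global]).
<<"a+b+c">>
.fi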
.RE .LP .nf .B replace(Subject, Pattern, Replacement, Options) -> Result .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Replacement = binary() .br Options = [Option] .br Option = global | {scope, part()} | {insert_replaced, InsPos} .br InsPos = OnePos | [OnePos] .br OnePos = integer() >= 0 .br .RS 2 An integer() =< byte_size(Replacement) .RE Result = binary() .br .RE .RE .RS .LP Constructs a new binary by replacing the parts in \fISubject\fR\& matching \fIPattern\fR\& with the content of \fIReplacement\fR\&\&. .LP If the matching subpart of \fISubject\fR\& giving rise to the replacement is to be inserted in the result, option \fI{insert_replaced, InsPos}\fR\& inserts the matching part into \fIReplacement\fR\& at the specified position (or positions) before inserting \fIReplacement\fR\& into \fISubject\fR\&\&. .LP \fIExample:\fR\& .LP .nf 1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]). <<"a[b]cde">> 2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]). <<"a[b]c[d]e">> 3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]). <<"a[bb]c[dd]e">> 4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]). <<"a[b-b]c[d-d]e">> .fi .LP If any position specified in \fIInsPos\fR\& > size of the replacement binary, a \fIbadarg\fR\& exception is raised\&. .LP Options \fIglobal\fR\& and \fI{scope, part()}\fR\& work as for \fIsplit/3\fR\&\&. The return type is always a \fIbinary()\fR\&\&. .LP For a description of \fIPattern\fR\&, see \fIcompile_pattern/1\fR\&\&. .RE .LP .nf .B split(Subject, Pattern) -> Parts .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Parts = [binary()] .br .RE .RE .RS .LP Same as \fIsplit(Subject, Pattern, [])\fR\&\&. 
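.LP
Because no options are passed, at most one split is made\&. A minimal sketch, with subject binaries invented for illustration:
.LP
.nf

1> binary:split(<<"key=value=more">>, <<"=">>).
[<<"key">>,<<"value=more">>]
2> binary:split(<<"no separator">>, <<"=">>).
[<<"no separator">>]
.fi
.LP
When \fIPattern\fR\& does not occur, the result is a one-element list holding \fISubject\fR\& itself\&.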
.RE .LP .nf .B split(Subject, Pattern, Options) -> Parts .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | cp() .br Options = [Option] .br Option = {scope, part()} | trim | global | trim_all .br Parts = [binary()] .br .RE .RE .RS .LP Splits \fISubject\fR\& into a list of binaries based on \fIPattern\fR\&\&. If option \fIglobal\fR\& is not specified, only the first occurrence of \fIPattern\fR\& in \fISubject\fR\& gives rise to a split\&. .LP The parts of \fIPattern\fR\& found in \fISubject\fR\& are not included in the result\&. .LP \fIExample:\fR\& .LP .nf 1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]). [<<1,255,4>>, <<2,3>>] 2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]). [<<0,1>>,<<4>>,<<9>>] .fi .LP Summary of options: .RS 2 .TP 2 .B {scope, part()}: Works as in \fImatch/3\fR\& and \fImatches/3\fR\&\&. Notice that this only defines the scope of the search for matching strings; it does not cut the binary before splitting\&. The bytes before and after the scope are kept in the result\&. See the example below\&. .TP 2 .B trim: Removes trailing empty parts of the result (as does \fItrim\fR\& in \fIre:split/3\fR\&)\&. .TP 2 .B trim_all: Removes all empty parts of the result\&. .TP 2 .B global: Repeats the split until \fISubject\fR\& is exhausted\&. Conceptually, option \fIglobal\fR\& makes split work on the positions returned by \fImatches/3\fR\&, while it normally works on the position returned by \fImatch/3\fR\&\&. .RE .LP Example of the difference between a scope and taking the binary apart before splitting: .LP .nf 1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]). [<<"ban">>,<<"na">>] 2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]). [<<"n">>,<<"n">>] .fi .LP The return type is always a list of binaries that are all referencing \fISubject\fR\&\&. 
This means that the data in \fISubject\fR\& is not copied to new binaries, and that \fISubject\fR\& cannot be garbage collected until the results of the split are no longer referenced\&. .LP For a description of \fIPattern\fR\&, see \fIcompile_pattern/1\fR\&\&. .RE
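.RS
.LP
The difference between options \fItrim\fR\& and \fItrim_all\fR\& of \fIsplit/3\fR\& can be sketched as follows, using a subject binary made up for this illustration:
.LP
.nf

1> binary:split(<<"a,,b,">>, <<",">>, [global]).
[<<"a">>,<<>>,<<"b">>,<<>>]
2> binary:split(<<"a,,b,">>, <<",">>, [global, trim]).
[<<"a">>,<<>>,<<"b">>]
3> binary:split(<<"a,,b,">>, <<",">>, [global, trim_all]).
[<<"a">>,<<"b">>]
.fi
.LP
Only the trailing empty part is dropped by \fItrim\fR\&, whereas \fItrim_all\fR\& also drops the empty part between the two adjacent separators\&.
.RE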