.TH binary 3erl "stdlib 2.2" "Ericsson AB" "Erlang Module Definition" .SH NAME binary \- Library for handling binary data .SH DESCRIPTION .LP This module contains functions for manipulating byte-oriented binaries\&. Although the majority of functions could be implemented using bit-syntax, the functions in this library are highly optimized and are expected to either execute faster or consume less memory (or both) than a counterpart written in pure Erlang\&. .LP The module is implemented according to the EEP (Erlang Enhancement Proposal) 31\&. .LP .RS -4 .B Note: .RE The library handles byte-oriented data\&. Bitstrings that are not binaries (does not contain whole octets of bits) will result in a \fIbadarg\fR\& exception being thrown from any of the functions in this module\&. .SH DATA TYPES .nf \fBcp()\fR\& .br .fi .RS .LP Opaque data-type representing a compiled search-pattern\&. Guaranteed to be a tuple() to allow programs to distinguish it from non precompiled search patterns\&. .RE .nf \fBpart()\fR\& = {Start :: integer() >= 0, Length :: integer()} .br .fi .RS .LP A representaion of a part (or range) in a binary\&. Start is a zero-based offset into a binary() and Length is the length of that part\&. As input to functions in this module, a reverse part specification is allowed, constructed with a negative Length, so that the part of the binary begins at Start + Length and is -Length long\&. This is useful for referencing the last N bytes of a binary as {size(Binary), -N}\&. The functions in this module always return part()\&'s with positive Length\&. .RE .SH EXPORTS .LP .nf .B at(Subject, Pos) -> byte() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pos = integer() >= 0 .br .RE .RE .RS .LP Returns the byte at position \fIPos\fR\& (zero-based) in the binary \fISubject\fR\& as an integer\&. If \fIPos\fR\& >= \fIbyte_size(Subject)\fR\&, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B bin_to_list(Subject) -> [byte()] .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP The same as \fIbin_to_list(Subject,{0,byte_size(Subject)})\fR\&\&. .RE .LP .nf .B bin_to_list(Subject, PosLen) -> [byte()] .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br PosLen = \fBpart()\fR\& .br .RE .RE .RS .LP Converts \fISubject\fR\& to a list of \fIbyte()\fR\&s, each representing the value of one byte\&. The \fIpart()\fR\& denotes which part of the \fIbinary()\fR\& to convert\&. Example: .LP .nf 1> binary:bin_to_list(<<"erlang">>,{1,3}). "rla" %% or [114,108,97] in list notation. .fi .LP If \fIPosLen\fR\& in any way references outside the binary, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B bin_to_list(Subject, Pos, Len) -> [byte()] .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pos = integer() >= 0 .br Len = integer() .br .RE .RE .RS .LP The same as\fI bin_to_list(Subject,{Pos,Len})\fR\&\&. .RE .LP .nf .B compile_pattern(Pattern) -> cp() .br .fi .br .RS .LP Types: .RS 3 Pattern = binary() | [binary()] .br .RE .RE .RS .LP Builds an internal structure representing a compilation of a search-pattern, later to be used in the \fBmatch/3\fR\&, \fBmatches/3\fR\&, \fBsplit/3\fR\& or \fBreplace/4\fR\& functions\&. The \fIcp()\fR\& returned is guaranteed to be a \fItuple()\fR\& to allow programs to distinguish it from non pre-compiled search patterns .LP When a list of binaries is given, it denotes a set of alternative binaries to search for\&. I\&.e if \fI[<<"functional">>,<<"programming">>]\fR\& is given as \fIPattern\fR\&, this means "either \fI<<"functional">>\fR\& or \fI<<"programming">>\fR\&"\&. The pattern is a set of alternatives; when only a single binary is given, the set has only one element\&. The order of alternatives in a pattern is not significant\&. .LP The list of binaries used for search alternatives shall be flat and proper\&. .LP If \fIPattern\fR\& is not a binary or a flat proper list of binaries with length > 0, a \fIbadarg\fR\& exception will be raised\&. .RE .LP .nf .B copy(Subject) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP The same as \fIcopy(Subject, 1)\fR\&\&. .RE .LP .nf .B copy(Subject, N) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br N = integer() >= 0 .br .RE .RE .RS .LP Creates a binary with the content of \fISubject\fR\& duplicated \fIN\fR\& times\&. .LP This function will always create a new binary, even if \fIN = 1\fR\&\&. By using \fIcopy/1\fR\& on a binary referencing a larger binary, one might free up the larger binary for garbage collection\&. .LP .RS -4 .B Note: .RE By deliberately copying a single binary to avoid referencing a larger binary, one might, instead of freeing up the larger binary for later garbage collection, create much more binary data than needed\&. Sharing binary data is usually good\&. Only in special cases, when small parts reference large binaries and the large binaries are no longer used in any process, deliberate copying might be a good idea\&. .LP If \fIN\fR\& < \fI0\fR\&, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B decode_unsigned(Subject) -> Unsigned .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Unsigned = integer() >= 0 .br .RE .RE .RS .LP The same as \fIdecode_unsigned(Subject, big)\fR\&\&. .RE .LP .nf .B decode_unsigned(Subject, Endianess) -> Unsigned .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Endianess = big | little .br Unsigned = integer() >= 0 .br .RE .RE .RS .LP Converts the binary digit representation, in big or little endian, of a positive integer in \fISubject\fR\& to an Erlang \fIinteger()\fR\&\&. .LP Example: .LP .nf 1> binary:decode_unsigned(<<169,138,199>>,big). 11111111 .fi .RE .LP .nf .B encode_unsigned(Unsigned) -> binary() .br .fi .br .RS .LP Types: .RS 3 Unsigned = integer() >= 0 .br .RE .RE .RS .LP The same as \fIencode_unsigned(Unsigned, big)\fR\&\&. .RE .LP .nf .B encode_unsigned(Unsigned, Endianess) -> binary() .br .fi .br .RS .LP Types: .RS 3 Unsigned = integer() >= 0 .br Endianess = big | little .br .RE .RE .RS .LP Converts a positive integer to the smallest possible representation in a binary digit representation, either big or little endian\&. .LP Example: .LP .nf 1> binary:encode_unsigned(11111111,big). <<169,138,199>> .fi .RE .LP .nf .B first(Subject) -> byte() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP Returns the first byte of the binary \fISubject\fR\& as an integer\&. If the size of \fISubject\fR\& is zero, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B last(Subject) -> byte() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br .RE .RE .RS .LP Returns the last byte of the binary \fISubject\fR\& as an integer\&. If the size of \fISubject\fR\& is zero, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B list_to_bin(ByteList) -> binary() .br .fi .br .RS .LP Types: .RS 3 ByteList = iodata() .br .RE .RE .RS .LP Works exactly as \fIerlang:list_to_binary/1\fR\&, added for completeness\&. .RE .LP .nf .B longest_common_prefix(Binaries) -> integer() >= 0 .br .fi .br .RS .LP Types: .RS 3 Binaries = [binary()] .br .RE .RE .RS .LP Returns the length of the longest common prefix of the binaries in the list \fIBinaries\fR\&\&. Example: .LP .nf 1> binary:longest_common_prefix([<<"erlang">>,<<"ergonomy">>]). 2 2> binary:longest_common_prefix([<<"erlang">>,<<"perl">>]). 0 .fi .LP If \fIBinaries\fR\& is not a flat list of binaries, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B longest_common_suffix(Binaries) -> integer() >= 0 .br .fi .br .RS .LP Types: .RS 3 Binaries = [binary()] .br .RE .RE .RS .LP Returns the length of the longest common suffix of the binaries in the list \fIBinaries\fR\&\&. Example: .LP .nf 1> binary:longest_common_suffix([<<"erlang">>,<<"fang">>]). 3 2> binary:longest_common_suffix([<<"erlang">>,<<"perl">>]). 0 .fi .LP If \fIBinaries\fR\& is not a flat list of binaries, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B match(Subject, Pattern) -> Found | nomatch .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Found = \fBpart()\fR\& .br .RE .RE .RS .LP The same as \fImatch(Subject, Pattern, [])\fR\&\&. .RE .LP .nf .B match(Subject, Pattern, Options) -> Found | nomatch .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Found = \fBpart()\fR\& .br Options = [Option] .br Option = {scope, \fBpart()\fR\&} .br .nf \fBpart()\fR\& = {Start :: integer() >= 0, Length :: integer()} .fi .br .RE .RE .RS .LP Searches for the first occurrence of \fIPattern\fR\& in \fISubject\fR\& and returns the position and length\&. .LP The function will return \fI{Pos, Length}\fR\& for the binary in \fIPattern\fR\& starting at the lowest position in \fISubject\fR\&, Example: .LP .nf 1> binary:match(<<"abcde">>, [<<"bcde">>,<<"cd">>],[]). {1,4} .fi .LP Even though \fI<<"cd">>\fR\& ends before \fI<<"bcde">>\fR\&, \fI<<"bcde">>\fR\& begins first and is therefore the first match\&. If two overlapping matches begin at the same position, the longest is returned\&. .LP Summary of the options: .RS 2 .TP 2 .B {scope, {Start, Length}}: Only the given part is searched\&. Return values still have offsets from the beginning of \fISubject\fR\&\&. A negative \fILength\fR\& is allowed as described in the \fIDATA TYPES\fR\& section of this manual\&. .RE .LP If none of the strings in \fIPattern\fR\& is found, the atom \fInomatch\fR\& is returned\&. .LP For a description of \fIPattern\fR\&, see \fBcompile_pattern/1\fR\&\&. .LP If \fI{scope, {Start,Length}}\fR\& is given in the options such that \fIStart\fR\& is larger than the size of \fISubject\fR\&, \fIStart + Length\fR\& is less than zero or \fIStart + Length\fR\& is larger than the size of \fISubject\fR\&, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B matches(Subject, Pattern) -> Found .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Found = [\fBpart()\fR\&] .br .RE .RE .RS .LP The same as \fImatches(Subject, Pattern, [])\fR\&\&. .RE .LP .nf .B matches(Subject, Pattern, Options) -> Found .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Found = [\fBpart()\fR\&] .br Options = [Option] .br Option = {scope, \fBpart()\fR\&} .br .nf \fBpart()\fR\& = {Start :: integer() >= 0, Length :: integer()} .fi .br .RE .RE .RS .LP Works like \fImatch/2\fR\&, but the \fISubject\fR\& is searched until exhausted and a list of all non-overlapping parts matching \fIPattern\fR\& is returned (in order)\&. .LP The first and longest match is preferred to a shorter, which is illustrated by the following example: .LP .nf 1> binary:matches(<<"abcde">>, [<<"bcde">>,<<"bc">>>,<<"de">>],[]). [{1,4}] .fi .LP The result shows that <<"bcde">> is selected instead of the shorter match <<"bc">> (which would have given raise to one more match,<<"de">>)\&. This corresponds to the behavior of posix regular expressions (and programs like awk), but is not consistent with alternative matches in re (and Perl), where instead lexical ordering in the search pattern selects which string matches\&. .LP If none of the strings in pattern is found, an empty list is returned\&. .LP For a description of \fIPattern\fR\&, see \fBcompile_pattern/1\fR\& and for a description of available options, see \fBmatch/3\fR\&\&. .LP If \fI{scope, {Start,Length}}\fR\& is given in the options such that \fIStart\fR\& is larger than the size of \fISubject\fR\&, \fIStart + Length\fR\& is less than zero or \fIStart + Length\fR\& is larger than the size of \fISubject\fR\&, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B part(Subject, PosLen) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br PosLen = \fBpart()\fR\& .br .RE .RE .RS .LP Extracts the part of the binary \fISubject\fR\& described by \fIPosLen\fR\&\&. .LP Negative length can be used to extract bytes at the end of a binary: .LP .nf 1> Bin = <<1,2,3,4,5,6,7,8,9,10>>. 2> binary:part(Bin,{byte_size(Bin), -5}). <<6,7,8,9,10>> .fi .LP .RS -4 .B Note: .RE \fBpart/2\fR\&and \fBpart/3\fR\& are also available in the \fIerlang\fR\& module under the names \fIbinary_part/2\fR\& and \fIbinary_part/3\fR\&\&. Those BIFs are allowed in guard tests\&. .LP If \fIPosLen\fR\& in any way references outside the binary, a \fIbadarg\fR\& exception is raised\&. .RE .LP .nf .B part(Subject, Pos, Len) -> binary() .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pos = integer() >= 0 .br Len = integer() .br .RE .RE .RS .LP The same as \fIpart(Subject, {Pos, Len})\fR\&\&. .RE .LP .nf .B referenced_byte_size(Binary) -> integer() >= 0 .br .fi .br .RS .LP Types: .RS 3 Binary = binary() .br .RE .RE .RS .LP If a binary references a larger binary (often described as being a sub-binary), it can be useful to get the size of the actual referenced binary\&. This function can be used in a program to trigger the use of \fIcopy/1\fR\&\&. By copying a binary, one might dereference the original, possibly large, binary which a smaller binary is a reference to\&. .LP Example: .LP .nf store(Binary, GBSet) -> NewBin = case binary:referenced_byte_size(Binary) of Large when Large > 2 * byte_size(Binary) -> binary:copy(Binary); _ -> Binary end, gb_sets:insert(NewBin,GBSet). .fi .LP In this example, we chose to copy the binary content before inserting it in the \fIgb_set()\fR\& if it references a binary more than twice the size of the data we\&'re going to keep\&. Of course different rules for when copying will apply to different programs\&. .LP Binary sharing will occur whenever binaries are taken apart, this is the fundamental reason why binaries are fast, decomposition can always be done with O(1) complexity\&. In rare circumstances this data sharing is however undesirable, why this function together with \fIcopy/1\fR\& might be useful when optimizing for memory use\&. .LP Example of binary sharing: .LP .nf 1> A = binary:copy(<<1>>,100). <<1,1,1,1,1 ... 2> byte_size(A). 100 3> binary:referenced_byte_size(A) 100 4> <<_:10/binary,B:10/binary,_/binary>> = A. <<1,1,1,1,1 ... 5> byte_size(B). 10 6> binary:referenced_byte_size(B) 100 .fi .LP .RS -4 .B Note: .RE Binary data is shared among processes\&. If another process still references the larger binary, copying the part this process uses only consumes more memory and will not free up the larger binary for garbage collection\&. Use this kind of intrusive functions with extreme care, and only if a real problem is detected\&. .RE .LP .nf .B replace(Subject, Pattern, Replacement) -> Result .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Replacement = Result = binary() .br .RE .RE .RS .LP The same as \fIreplace(Subject,Pattern,Replacement,[])\fR\&\&. .RE .LP .nf .B replace(Subject, Pattern, Replacement, Options) -> Result .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Replacement = binary() .br Options = [Option] .br Option = global | {scope, \fBpart()\fR\&} | {insert_replaced, InsPos} .br InsPos = OnePos | [OnePos] .br OnePos = integer() >= 0 .br .RS 2 An integer() =< byte_size(Replacement) .RE Result = binary() .br .RE .RE .RS .LP Constructs a new binary by replacing the parts in \fISubject\fR\& matching \fIPattern\fR\& with the content of \fIReplacement\fR\&\&. .LP If the matching sub-part of \fISubject\fR\& giving raise to the replacement is to be inserted in the result, the option \fI{insert_replaced, InsPos}\fR\& will insert the matching part into \fIReplacement\fR\& at the given position (or positions) before actually inserting \fIReplacement\fR\& into the \fISubject\fR\&\&. Example: .LP .nf 1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>,[{insert_replaced,1}]). <<"a[b]cde">> 2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, [global,{insert_replaced,1}]). <<"a[b]c[d]e">> 3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, [global,{insert_replaced,[1,1]}]). <<"a[bb]c[dd]e">> 4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>, [global,{insert_replaced,[1,2]}]). <<"a[b-b]c[d-d]e">> .fi .LP If any position given in \fIInsPos\fR\& is greater than the size of the replacement binary, a \fIbadarg\fR\& exception is raised\&. .LP The options \fIglobal\fR\& and \fI{scope, part()}\fR\& work as for \fBsplit/3\fR\&\&. The return type is always a \fIbinary()\fR\&\&. .LP For a description of \fIPattern\fR\&, see \fBcompile_pattern/1\fR\&\&. .RE .LP .nf .B split(Subject, Pattern) -> Parts .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Parts = [binary()] .br .RE .RE .RS .LP The same as \fIsplit(Subject, Pattern, [])\fR\&\&. .RE .LP .nf .B split(Subject, Pattern, Options) -> Parts .br .fi .br .RS .LP Types: .RS 3 Subject = binary() .br Pattern = binary() | [binary()] | \fBcp()\fR\& .br Options = [Option] .br Option = {scope, \fBpart()\fR\&} | trim | global .br Parts = [binary()] .br .RE .RE .RS .LP Splits \fISubject\fR\& into a list of binaries based on \fIPattern\fR\&\&. If the option global is not given, only the first occurrence of \fIPattern\fR\& in \fISubject\fR\& will give rise to a split\&. .LP The parts of \fIPattern\fR\& actually found in \fISubject\fR\& are not included in the result\&. .LP Example: .LP .nf 1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]). [<<1,255,4>>, <<2,3>>] 2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]). [<<0,1>>,<<4>>,<<9>>] .fi .LP Summary of options: .RS 2 .TP 2 .B {scope, part()}: Works as in \fBmatch/3\fR\& and \fBmatches/3\fR\&\&. Note that this only defines the scope of the search for matching strings, it does not cut the binary before splitting\&. The bytes before and after the scope will be kept in the result\&. See example below\&. .TP 2 .B trim: Removes trailing empty parts of the result (as does trim in \fIre:split/3\fR\&) .TP 2 .B global: Repeats the split until the \fISubject\fR\& is exhausted\&. Conceptually the global option makes split work on the positions returned by \fBmatches/3\fR\&, while it normally works on the position returned by \fBmatch/3\fR\&\&. .RE .LP Example of the difference between a scope and taking the binary apart before splitting: .LP .nf 1> binary:split(<<"banana">>,[<<"a">>],[{scope,{2,3}}]). [<<"ban">>,<<"na">>] 2> binary:split(binary:part(<<"banana">>,{2,3}),[<<"a">>],[]). [<<"n">>,<<"n">>] .fi .LP The return type is always a list of binaries that are all referencing \fISubject\fR\&\&. This means that the data in \fISubject\fR\& is not actually copied to new binaries and that \fISubject\fR\& cannot be garbage collected until the results of the split are no longer referenced\&. .LP For a description of \fIPattern\fR\&, see \fBcompile_pattern/1\fR\&\&. .RE