home:Ledest:erlang:23
File 5471-Fix-factual-errors-and-omissions-for-the-bit-syntax.patch of Package erlang
From 373ea1feb8c011fa5bf83e7cbd470660e37969eb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Gustavsson?= <bjorn@erlang.org>
Date: Thu, 23 Feb 2023 05:59:05 +0100
Subject: [PATCH] Fix factual errors and omissions for the bit syntax

While at it, reorganize the documentation for the bit syntax into
separate sections for each type of segment.

Closes #6706
---
 system/doc/reference_manual/expressions.xml | 213 +++++++++++++++-----
 1 file changed, 161 insertions(+), 52 deletions(-)

diff --git a/system/doc/reference_manual/expressions.xml b/system/doc/reference_manual/expressions.xml
index 8f0ec51479..51d23baacf 100644
--- a/system/doc/reference_manual/expressions.xml
+++ b/system/doc/reference_manual/expressions.xml
@@ -1390,14 +1390,27 @@ Ei = Value |
     <item>For <c>binary</c> and <c>bitstring</c> it is the whole binary or bit string.</item>
     </list>

-    <p>In matching, this default value is only
-    valid for the last element. All other bit string or binary
-    elements in the matching must have a size specification.</p>
+    <p>In matching, the default value for a binary or bit string
+    segment is only valid for the last element. All other bit string
+    or binary elements in the matching must have a size
+    specification.</p>
+
+    <p><strong>Example:</strong></p>
+
+    <pre>
+1> <input><<A/binary, B/binary>> = <<"abcde">>.</input>
+* 1:3: a binary field without size is only allowed at the end of a binary pattern
+2> <input><<A:3/binary, B/binary>> = <<"abcde">>.</input>
+<<"abcde">>
+3> <input>A.</input>
+<<"abc">>
+4> <input>B.</input>
+<<"de">></pre>

     <p>For the <c>utf8</c>, <c>utf16</c>, and <c>utf32</c> types,
     <c>Size</c> must not be given. The size of the segment is implicitly
     determined by the type and value itself.</p>
-
+
     <p><c>TypeSpecifierList</c> is a list of type specifiers, in any order,
     separated by hyphens (-). Default values are used for any omitted type
     specifiers.</p>
@@ -1423,60 +1436,156 @@ Ei = Value |
     </item>

     <tag><c>Unit</c>= <c>unit:IntegerLiteral</c></tag>
-    <item>The allowed range is 1..256. Defaults to 1 for <c>integer</c>,
-    <c>float</c>, and <c>bitstring</c>, and to 8 for <c>binary</c>.
-    No unit specifier must be given for the types
-    <c>utf8</c>, <c>utf16</c>, and <c>utf32</c>.
-    </item>
+    <item>The allowed range is 1 through 256. Defaults to 1 for <c>integer</c>,
+    <c>float</c>, and <c>bitstring</c>, and to 8 for <c>binary</c>.
+    For types <c>bitstring</c>, <c>bits</c>, and <c>bytes</c>, it is not allowed
+    to specify a unit value different from the default value.
+    A unit specifier must not be given for the types <c>utf8</c>, <c>utf16</c>,
+    and <c>utf32</c>.
+    </item>
     </taglist>

-    <p>The value of <c>Size</c> multiplied with the unit gives
-    the number of bits. A segment of type <c>binary</c> must have
-    a size that is evenly divisible by 8. For a segment of type <c>float</c>
-    the size must be either 64, 32, or 16.</p>
-
-    <note><p>When constructing binaries, if the size <c>N</c> of an integer
-    segment is too small to contain the given integer, the most significant
-    bits of the integer are silently discarded and only the <c>N</c> least
-    significant bits are put into the binary.</p></note>
-
-    <p>The types <c>utf8</c>, <c>utf16</c>, and <c>utf32</c> specifies
-    encoding/decoding of the <em>Unicode Transformation Format</em>s UTF-8, UTF-16,
-    and UTF-32, respectively.</p>
-
-    <p>When constructing a segment of a <c>utf</c> type, <c>Value</c>
-    must be an integer in the range 0..16#D7FF or
-    16#E000....16#10FFFF. Construction
-    fails with a <c>badarg</c> exception if <c>Value</c> is
-    outside the allowed ranges. The size of the resulting binary
-    segment depends on the type or <c>Value</c>, or both:</p>
-    <list type="bulleted">
-    <item>For <c>utf8</c>, <c>Value</c> is encoded in 1-4 bytes.</item>
-    <item>For <c>utf16</c>, <c>Value</c> is encoded in 2 or 4 bytes.</item>
-    <item>For <c>utf32</c>, <c>Value</c> is always be encoded in 4 bytes.</item>
-    </list>
-    <p>When constructing, a literal string can be given followed
-    by one of the UTF types, for example: <c><![CDATA[<<"abc"/utf8>>]]></c>
-    which is syntactic sugar for
-    <c><![CDATA[<<$a/utf8,$b/utf8,$c/utf8>>]]></c>.</p>
+    <section>
+    <title>Integer segments</title>
+    <p>The value of <c>Size</c> multiplied by the unit gives the
+    size of the segment in bits.</p>
+
+    <p>When constructing binaries, if the size <c>N</c> of an integer
+    segment is too small to contain the given integer, the most significant
+    bits of the integer are silently discarded and only the <c>N</c> least
+    significant bits are put into the binary. For example, <c><<16#ff:4>></c>
+    will result in the binary <c><<15:4>></c>.</p>
+    </section>
+
+    <section>
+    <title>Float segments</title>
+    <p>The value of <c>Size</c> multiplied by the unit gives
+    the size of the segment in bits. The size of a float segment in bits must be
+    one of 16, 32, or 64.</p>
+
+    <p>When constructing binaries, if the size of a float segment is too small
+    to contain the representation of the given float value, an exception is raised.</p>
+
+    <p>When matching binaries, matching of float segments fails if the bits of the segment
+    do not contain the representation of a finite floating point value.</p>
+    </section>
+
+    <section>
+    <title>Binary segments</title>
+    <p>In this section, the phrase "binary segment" refers to any
+    one of the segment types <c>binary</c>, <c>bitstring</c>,
+    <c>bytes</c>, and <c>bits</c>.</p>
+
+    <p>When constructing binaries and no size is specified for a
+    binary segment, the entire binary value is interpolated into the
+    binary being constructed. However, the size in bits of the
+    binary being interpolated must be evenly divisible by the unit
+    value for the segment; otherwise an exception is raised.</p>

-    <p>A successful match of a segment of a <c>utf</c> type, results
-    in an integer in the range 0..16#D7FF or 16#E000..16#10FFFF.
-    The match fails if the returned value falls outside those ranges.</p>
+    <p>The following examples all succeed:</p>
+
+    <pre>
+1> <input><<(<<"abc">>)/bitstring>>.</input>
+<<"abc">>
+2> <input><<(<<"abc">>)/binary-unit:1>>.</input>
+<<"abc">>
+3> <input><<(<<"abc">>)/binary>>.</input>
+<<"abc">></pre>

-    <p>A segment of type <c>utf8</c> matches 1-4 bytes in the binary,
-    if the binary at the match position contains a valid UTF-8 sequence.
-    (See RFC-3629 or the Unicode standard.)</p>
+    <p>The first two examples have a unit value of 1 for the segment,
+    while the third example has a unit value of 8.</p>

-    <p>A segment of type <c>utf16</c> can match 2 or 4 bytes in the binary.
-    The match fails if the binary at the match position does not contain
-    a legal UTF-16 encoding of a Unicode code point. (See RFC-2781 or
-    the Unicode standard.)</p>
+    <p>Attempting to interpolate a bit string of size 1 into a
+    binary segment with unit 8 (the default unit for <c>binary</c>)
+    fails as shown in this example:</p>

-    <p>A segment of type <c>utf32</c> can match 4 bytes in the binary in the
-    same way as an <c>integer</c> segment matches 32 bits.
-    The match fails if the resulting integer is outside the legal ranges
-    mentioned above.</p>
+    <pre>
+1> <input><<(<<1:1>>)/binary>>.</input>
+** exception error: bad argument</pre>
+
+    <p>For the construction to succeed, the unit value of the
+    segment must be 1:</p>
+
+    <pre>
+2> <input><<(<<1:1>>)/bitstring>>.</input>
+<<1:1>>
+3> <input><<(<<1:1>>)/binary-unit:1>>.</input>
+<<1:1>></pre>
+
+    <p>Similarly, when matching a binary segment with no size
+    specified, the match succeeds if and only if the size in bits of
+    the rest of the binary is evenly divisible by the unit
+    value:</p>
+
+    <pre>
+1> <input><<_/binary-unit:16>> = <<"">>.</input>
+<<>>
+2> <input><<_/binary-unit:16>> = <<"a">>.</input>
+** exception error: no match of right hand side value <<"a">>
+3> <input><<_/binary-unit:16>> = <<"ab">>.</input>
+<<"ab">>
+4> <input><<_/binary-unit:16>> = <<"abc">>.</input>
+** exception error: no match of right hand side value <<"abc">>
+5> <input><<_/binary-unit:16>> = <<"abcd">>.</input>
+<<"abcd">></pre>
+
+    <p>When a size is explicitly specified for a binary segment,
+    the segment size in bits is the value of <c>Size</c> multiplied
+    by the default or explicit unit value.</p>
+
+    <p>When constructing binaries, the size of the binary being interpolated
+    into the constructed binary must be at least as large as the size of
+    the binary segment.</p>
+
+    <p><strong>Examples:</strong></p>
+    <pre>
+1> <input><<(<<"abc">>):2/binary>>.</input>
+<<"ab">>
+2> <input><<(<<"a">>):2/binary>>.</input>
+** exception error: construction of binary failed
+     *** segment 1 of type 'binary': the value <<"a">> is shorter than the size of the segment</pre>
+    </section>
+
+    <section>
+    <title>Unicode segments</title>
+    <p>The types <c>utf8</c>, <c>utf16</c>, and <c>utf32</c> specify
+    encoding/decoding of the <em>Unicode Transformation Format</em>s UTF-8, UTF-16,
+    and UTF-32, respectively.</p>
+
+    <p>When constructing a segment of a <c>utf</c> type,
+    <c>Value</c> must be an integer in the range 0 through 16#D7FF
+    or 16#E000 through 16#10FFFF. Construction fails with a
+    <c>badarg</c> exception if <c>Value</c> is outside the allowed
+    ranges. The sizes of the encoded values are as follows:</p>
+    <list type="bulleted">
+    <item>For <c>utf8</c>, <c>Value</c> is encoded in 1-4 bytes.</item>
+    <item>For <c>utf16</c>, <c>Value</c> is encoded in 2 or 4 bytes.</item>
+    <item>For <c>utf32</c>, <c>Value</c> is encoded in 4 bytes.</item>
+    </list>
+
+    <p>When constructing, a literal string can be given followed
+    by one of the UTF types, for example: <c><![CDATA[<<"abc"/utf8>>]]></c>
+    which is syntactic sugar for
+    <c><![CDATA[<<$a/utf8,$b/utf8,$c/utf8>>]]></c>.</p>
+
+    <p>A successful match of a segment of a <c>utf</c> type results
+    in an integer in the range 0 through 16#D7FF or 16#E000 through 16#10FFFF.
+    The match fails if the returned value falls outside those ranges.</p>
+
+    <p>A segment of type <c>utf8</c> matches 1-4 bytes in the binary
+    if the binary at the match position contains a valid UTF-8 sequence.
+    (See RFC-3629 or the Unicode standard.)</p>
+
+    <p>A segment of type <c>utf16</c> can match 2 or 4 bytes in the binary.
+    The match fails if the binary at the match position does not contain
+    a legal UTF-16 encoding of a Unicode code point. (See RFC-2781 or
+    the Unicode standard.)</p>
+
+    <p>A segment of type <c>utf32</c> can match 4 bytes in the binary in the
+    same way as an <c>integer</c> segment matches 32 bits.
+    The match fails if the resulting integer is outside the legal ranges
+    previously mentioned.</p>
+    </section>

     <p><em>Examples:</em></p>
     <pre>
-- 
2.35.3
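You can check the segment rules described in this patch with a small module. The sketch below is illustrative only. The module name `bitsyntax_demo` and its functions are hypothetical and not part of OTP. Catching `error:badarg` rests on one assumption: that a failed binary construction raises `badarg`, as the "bad argument" shell output in the patch suggests.

```erlang
%% Hypothetical demo module (not part of OTP) that exercises some of the
%% bit syntax rules documented in the patch above.
-module(bitsyntax_demo).
-export([truncate/2, interpolate/2, unit16_match/1, take_bytes/2, utf_sizes/1]).

%% Integer segment of N bits. If Int does not fit, the most significant
%% bits are silently dropped and only the N least significant bits remain.
truncate(Int, N) ->
    <<Int:N>>.

%% Binary segment without a size: the whole bit string is interpolated, but
%% its size in bits must be evenly divisible by the segment's unit.
%% A unit must be a literal, so only units 8 and 1 are handled here;
%% any other unit raises function_clause.
interpolate(Bits, 8) ->
    try
        {ok, <<Bits/binary>>}
    catch
        error:badarg -> {error, badarg}
    end;
interpolate(Bits, 1) ->
    {ok, <<Bits/binary-unit:1>>}.

%% Binary segment without a size in a pattern: it matches only if the size
%% in bits of the rest of the binary is evenly divisible by the unit (16).
unit16_match(Bin) ->
    case Bin of
        <<_/binary-unit:16>> -> true;
        _ -> false
    end.

%% Binary segment with an explicit size: N * 8 bits are taken from the
%% front of Bin, which must be at least that long or construction fails.
take_bytes(Bin, N) ->
    <<Bin:N/binary>>.

%% Unicode segments: the number of bytes used to encode CodePoint in
%% UTF-8, UTF-16, and UTF-32. Surrogates (16#D800..16#DFFF) raise badarg.
utf_sizes(CodePoint) ->
    {byte_size(<<CodePoint/utf8>>),
     byte_size(<<CodePoint/utf16>>),
     byte_size(<<CodePoint/utf32>>)}.
```

After compiling the module in the shell, these calls mirror the transcripts in the patch:

- `bitsyntax_demo:truncate(16#ff, 4)` returns `<<15:4>>`.
- `bitsyntax_demo:interpolate(<<1:1>>, 8)` returns `{error, badarg}`.
- `bitsyntax_demo:interpolate(<<1:1>>, 1)` returns `{ok, <<1:1>>}`.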