A variablelength quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eightbit bytes) to represent an arbitrarily large integer. It was defined for use in the standard MIDI file format^{[1]} to save additional space for a resource constrained system, and is also used in the later Extensible Music Format (XMF). A VLQ is essentially a base128 representation of an unsigned integer with the addition of the eighth bit to mark continuation of bytes. See the example below.
Base128 is also used in ASN.1 BER encoding to encode tag numbers and Object Identifiers.^{[2]} It is also used in the WAP environment, where it is called variable length unsigned integer or uintvar. The DWARF debugging format^{[3]} defines a variant called LEB128 (or ULEB128 for unsigned numbers), where the least significant group of 7 bits are encoded in the first byte and the most significant bits are in the last byte (so effectively it is the littleendian analog of a variablelength quantity). Google Protocol Buffers use a similar format to have compact representation of integer values,^{[4]} as does Oracle Portable Object Format (POF)^{[5]} and the Microsoft .NET Framework "7bit encoded int" in the BinaryReader and BinaryWriter classes.^{[6]}
It is also used extensively in web browsers for source mapping – which contain a lot of integer line & column number mappings – to keep the size of the map to a minimum.^{[7]}
YouTube Encyclopedic

1/1Views:3 489

✪ XOR Electronics Nerdseq Review1: Features, Sequencer Screen Use/Edits
Transcription
Contents
General structure
The encoding assumes an octet (an eightbit byte) where the most significant bit (MSB), also commonly known as the sign bit, is reserved to indicate whether another VLQ octet follows.
7  6  5  4  3  2  1  0 

2^{7}  2^{6}  2^{5}  2^{4}  2^{3}  2^{2}  2^{1}  2^{0} 
A  B_{n} 
If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.
B is a 7bit number [0x00, 0x7F] and n is the position of the VLQ octet where B_{0} is the least significant. The VLQ octets are arranged most significant first in a stream.
Variants
The general VLQ encoding is simple, but in basic form is only defined for unsigned integers (nonnegative, positive or zero), and is somewhat redundant, since prepending 0x80 octets corresponds to zero padding. There are various signed number representations to handle negative numbers, and techniques to remove the redundancy.
Signed numbers
Sign bit
Negative numbers can be handled using a sign bit, which only needs to be present in the first octet.
In the data format for Unreal Packages used by the Unreal Engine, a variable length quantity scheme called Compact Indices^{[8]} is used. The only difference in this encoding is that the first VLQ has the sixth bit reserved to indicate whether the encoded integer is positive or negative. Any consecutive VLQ octet follows the general structure.
First VLQ Octet  Other VLQ Octets  

7  6  5  4  3  2  1  0  7  6  5  4  3  2  1  0 
2^{7}  2^{6}  2^{5}  2^{4}  2^{3}  2^{2}  2^{1}  2^{0}  2^{7}  2^{6}  2^{5}  2^{4}  2^{3}  2^{2}  2^{1}  2^{0} 
A  B  C_{0}  A  C_{n} (n>0) 
If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.
If B is 0, then the VLQ represents a positive integer. If B is 1, then the VLQ represents a negative number.
C is number chunk being encoded and n is the position of the VLQ octet where C_{0} is the least significant. The VLQ octets are arranged most significant first in a stream.^{[dubious – discuss]}
Zigzag encoding
An alternative way to encode negative numbers is to use the leastsignificant bit for sign. This is notably done for Google Protocol Buffers, and is known as a zigzag encoding for signed integers.^{[9]} One can encode the numbers so that encoded 0 corresponds to 0, 1 to −1, 10 to 1, 11 to −2, 100 to 2, etc.: counting up alternates between nonnegative (starting at 0) and negative (since each step changes the leastsignificant bit, hence the sign), whence the name "zigzag encoding". Concretely, transform the integer as (n << 1) ^ (n >> k  1)
for fixed kbit integers.
Two's complement
LEB128 uses two's complement to represent signed numbers. In this scheme of representation, n bits encodes a range from 2^{n} to 2^{n}1, and all negative numbers start with a 1 in the most significant bit. In Signed LEB128, the input is sign extended so that its length is a multiple of 7 bits. From there the encoding proceeds as usual.^{[10]}
In LEB128, the stream is arranged least significant first.^{[10]}
Removing Redundancy
With the VLQ encoding described above, any number that can be encoded with N octets can also be encoded with more than N octets simply by prepending additional 0x80 octets as zeropadding. For example, the decimal number 358 can be encoded as the 2octet VLQ 0x8266 or the number 0358 can be encoded as 3octet VLQ 0x808266 or 00358 as the 4octet VLQ 0x80808266 and so forth.
However, the VLQ format used in Git^{[11]} removes this prepending redundancy and extends the representable range of shorter VLQs by adding an offset to VLQs of 2 or more octets in such a way that the lowest possible value for such an (N+1)octet VLQ becomes exactly one more than the maximum possible value for an Noctet VLQ. In particular, since a 1octet VLQ can store a maximum value of 127, the minimum 2octet VLQ (0x8000) is assigned the value 128 instead of 0. Conversely, the maximum value of such a 2octet VLQ (0xff7f) is 16511 instead of just 16383. Similarly, the minimum 3octet VLQ (0x808000) has a value of 16512 instead of zero, which means that the maximum 3octet VLQ (0xffff7f) is 2113663 instead of just 2097151. And so forth.
Examples
Here is a worked out example for the decimal number 137:
 Represent the value in binary notation (e.g. 137 as 10001001)
 Break it up in groups of 7 bits starting from the lowest significant bit (e.g. 137 as 0000001 0001001). This is equivalent to representing the number in base 128.
 Take the lowest 7 bits and that gives you the least significant byte (0000 1001). This byte comes last.
 For all the other groups of 7 bits (in the example, this is 000 0001), set the MSB to 1 (which gives 1000 0001 in our example). Thus 137 becomes 1000 0001 0000 1001 where the bits in boldface are something we added. These added bits denote if there is another byte to follow or not. Thus, by definition, the very last byte of a variable length integer will have 0 as its MSB.
Another way to look at this is to represent the value in base128, and then set the MSB of all but the last base128 digit to 1.
The Standard MIDI File format specification gives more examples:^{[1]}^{[12]}
Integer (decimal)  Integer (hexadecimal)  Integer (binary)  Variablelength quantity (hexadecimal)  Variablelength quantity (binary) 

0  0x00000000  00000000 00000000 00000000 00000000 
0x00  00000000 
127  0x0000007F  00000000 00000000 00000000 01111111 
0x7F  01111111 
128  0x00000080  00000000 00000000 00000000 10000000 
0x81 0x00  10000001 00000000 
8192  0x00002000  00000000 00000000 00100000 00000000 
0xC0 0x00  11000000 00000000 
16383  0x00003FFF  00000000 00000000 00111111 11111111 
0xFF 0x7F  11111111 01111111 
16384  0x00004000  00000000 00000000 01000000 00000000 
0x81 0x80 0x00  10000001 10000000 00000000 
2097151  0x001FFFFF  00000000 00011111 11111111 11111111 
0xFF 0xFF 0x7F  11111111 11111111 01111111 
2097152  0x00200000  00000000 00100000 00000000 00000000 
0x81 0x80 0x80 0x00  10000001 10000000 10000000 00000000 
134217728  0x08000000  00001000 00000000 00000000 00000000 
0xC0 0x80 0x80 0x00  11000000 10000000 10000000 00000000 
268435455  0x0FFFFFFF  00001111 11111111 11111111 11111111 
0xFF 0xFF 0xFF 0x7F  11111111 11111111 11111111 01111111 
References
 ^ ^{a} ^{b} MIDI File Format: Variable Quantities
 ^ http://www.itu.int/ITUT/studygroups/com17/languages/X.6900207.pdf
 ^ DWARF Standard
 ^ Google Protocol Buffers
 ^ Oracle Portable Object Format (POF)
 ^ System.IO.BinaryWriter.Write7BitEncodedInt(int) method and System.IO.BinaryReader.Read7BitEncodedInt() method
 ^ Introduction to javascript source maps
 ^ http://unreal.epicgames.com/Packages.htm
 ^ Protocol Buffers: Encoding: Signed Integers
 ^ ^{a} ^{b} Free Standards Group (December 2005). "DWARF Debugging Information Format Specification Version 3.0" (PDF). p. 70. Retrieved 20090719.
 ^ https://github.com/git/git/blob/7fb6aefd2aaffe66e614f7f7b83e5b7ab16d4806/varint.c#L4
 ^ Standard MIDIFile Format Spec. 1.1 (PDF)
External links
 MIDI Manufacturers Association (MMA)  Source for Englishlanguage MIDI specs
 Association of Musical Electronics Industry (AMEI) Source for Japaneselanguage MIDI specs