BAR, or the Binary Artifact Reference system, is a protocol for providing a
"universal schema" for characterizing and accessing binary data.
The goal of BAR is to promote a greater understanding of binary file formats
and to make the maintenance, versioning, interpretation, and creation of files
in various formats much easier than before.
To understand BAR, it helps to compare it with another kind of universal
schema: the W3C's XML Schema.
XML Schema: Pros and Cons
XML Schema offers the following advantages for data processing:
- Ability to characterize any type of data
- Easily readable data format
- Robust implementation
- Built-in validation
- Basic serialization and deserialization
XML Schema falls short in the following areas:
- Limits control over methods of serialization and deserialization
- Requires binary data to be converted to text before it becomes readable
- Without such conversion, requires binary data to be packed into and unpacked
  from an encoding such as Base64, which adds overhead
- Forces larger file sizes, since text takes up much more space than binary
- Requires text parsing, which takes more processing time than binary parsing
- Assumes the client can parse and generate XML
Useful as XML Schema is, it has one important limitation: it can only
characterize XML documents. Moving an implementation to XML therefore forces
the program to convert legacy binary formats to XML and back again as part of
its normal operation. When text parsing and generation are a critical part of
a program’s operation, the program tends to run slower. Misused, XML can easily
turn an otherwise efficient binary parsing algorithm into a grossly inefficient
text-parsing algorithm that adds little value.
Text-based formats also need more memory to store the same data, and they are
costlier in bandwidth: a larger overall data size takes longer to transfer over
a network.
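To put rough numbers on the size argument, the sketch below uses only Python's standard library (not BAR) to encode the same 1,000 arbitrary integers three ways: as raw binary, as Base64 (a common way to embed binary data in XML), and as XML text elements. The element names `<values>` and `<v>` are made up for the illustration.

```python
import base64
import struct

# 1,000 arbitrary sample values that fit in an unsigned 32-bit integer.
values = [i * 65_537 for i in range(1000)]

# Raw binary: each value packed as a 4-byte little-endian unsigned integer.
binary = struct.pack(f"<{len(values)}I", *values)

# The same bytes Base64-encoded so they can travel inside a text document.
b64 = base64.b64encode(binary)

# The same values written out as XML text (element names are illustrative).
xml = ("<values>" + "".join(f"<v>{v}</v>" for v in values) + "</values>").encode("ascii")

# binary: 4000 bytes; Base64: 5336 bytes (4 output bytes per 3 input bytes, padded).
print(len(binary), len(b64), len(xml))
```

Base64 always emits 4 bytes for every 3 input bytes, so its overhead is fixed at about one third. The XML text version is larger still, because every number is spelled out digit by digit and wrapped in tags.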
The security of transferring XML over a network also leaves much to be desired.
BAR: Pros and Cons
BAR shares many of XML Schema's strengths and adds several of its own:
- Ability to characterize nearly any type of data
- Easily readable data format
- Robust implementation
- Built-in validation
- Extensive control over methods of serialization and deserialization
- Allows serialization or deserialization to occur either forwards or backwards
- Allows nonlinear serialization or deserialization
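To make "backwards" and "nonlinear" deserialization concrete, here is a plain-Python sketch of the access pattern. It is not BAR's API or specification language. It assumes a hypothetical container layout, similar in spirit to how a ZIP reader locates its central directory from a record at the end of the archive: records first, then an index of record offsets, then a fixed-size trailer. The reader starts at the end, follows the trailer to the index, and jumps straight to the requested record without reading the ones before it.

```python
import io
import struct

# Hypothetical layout (illustration only, not a real format):
#   [record 0][record 1]...[index: u32 offsets][trailer: u32 index_offset, u32 count]
# Each record is a 2-byte little-endian length prefix followed by its payload.
TRAILER = struct.Struct("<II")

def build(records: list[bytes]) -> bytes:
    """Serialize records, then an index of their offsets, then a fixed-size trailer."""
    out = io.BytesIO()
    offsets = []
    for rec in records:
        offsets.append(out.tell())
        out.write(struct.pack("<H", len(rec)))
        out.write(rec)
    index_offset = out.tell()
    out.write(struct.pack(f"<{len(offsets)}I", *offsets))
    out.write(TRAILER.pack(index_offset, len(offsets)))
    return out.getvalue()

def read_record(data: bytes, n: int) -> bytes:
    """Deserialize record n without reading the records that precede it."""
    f = io.BytesIO(data)
    # Step 1: read backwards from the end of the data to find the trailer.
    f.seek(-TRAILER.size, io.SEEK_END)
    index_offset, count = TRAILER.unpack(f.read(TRAILER.size))
    if not 0 <= n < count:
        raise IndexError(n)
    # Step 2: jump to the index entry for record n.
    f.seek(index_offset + 4 * n)
    (offset,) = struct.unpack("<I", f.read(4))
    # Step 3: jump straight to the record itself.
    f.seek(offset)
    (length,) = struct.unpack("<H", f.read(2))
    return f.read(length)

blob = build([b"alpha", b"beta", b"gamma"])
print(read_record(blob, 2))  # b'gamma'
```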
BAR has the following disadvantages:
- Assumes BAR services exist on the client
- Requires stricter controls over parsing
- Has a slightly steeper learning curve than XML Schema
BAR surpasses XML Schema in a whole range of categories. BAR does not require
data conversion, and it adds no overhead when interpretation cannot or will not
be done. BAR does not force a larger file size. Nor does it require much text
parsing, although text parsing is permitted for text-specific file formats or
the text portions of file formats.
The principal advantage of BAR over XML Schema is that BAR can, in theory,
process any existing format while minimizing the impact on existing file format
implementations: file formats can stay as they are.
The principal disadvantage of BAR is that parsing binary file formats forces
the protocol to grant a great deal of flexibility to a specification’s
implementation file. Parsing is defined both by structural elements and by
code, so the parsing operation must be “sandboxed”: carefully controlled so that
rogue code cannot destabilize either the BAR services or the platform on which
they run.
Of course, BAR takes care of this disadvantage, so you don't have to worry
about it! A BAR implementation file, which contains a "schema" for a binary
file, is processed under very tight control by the BAR engine itself. The BAR
engine provides the API programmer with universal, fault-tolerant interface
functions while enforcing the internal consistency of its implementation files,
no matter how complex they are.
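This section does not show what BAR's fault-tolerant interface functions look like. As a generic illustration of the idea only (not BAR's API), the hypothetical `BoundedReader` below checks every read against the end of the buffer and against a maximum field length. A corrupt or hostile length field in the data then produces a clean, catchable error instead of a silently truncated read or an oversized request.

```python
import struct

class BoundedReader:
    """Hypothetical helper: every read is validated before any bytes are returned."""

    def __init__(self, data: bytes, max_field_len: int = 1 << 20) -> None:
        self._data = memoryview(data)
        self._pos = 0
        self._max = max_field_len

    def read(self, n: int) -> bytes:
        # Reject lengths outside the allowed range before touching the buffer.
        if n < 0 or n > self._max:
            raise ValueError(f"field length {n} outside allowed range")
        end = self._pos + n
        # Refuse to read past the end instead of returning a short slice.
        if end > len(self._data):
            raise EOFError(f"read of {n} bytes at offset {self._pos} past end of data")
        chunk = self._data[self._pos:end].tobytes()
        self._pos = end
        return chunk

    def u32le(self) -> int:
        return struct.unpack("<I", self.read(4))[0]

def read_blob(data: bytes) -> bytes:
    """Read a length-prefixed blob whose length field comes from untrusted data."""
    r = BoundedReader(data, max_field_len=1024)
    length = r.u32le()
    return r.read(length)

print(read_blob(struct.pack("<I", 3) + b"abc"))  # b'abc'
try:
    read_blob(struct.pack("<I", 0xFFFFFFFF) + b"abc")
except ValueError as e:
    print("rejected:", e)
```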
BAR: Core Features
The BAR protocol is primarily concerned with the serialization and
deserialization of file formats. BAR can also perform format conversions and
data type lookups. The basic features are described below.
- Deserialization: Deserialization is the process of constructing conceptual
objects from binary file data. In BAR terminology, deserialization is
equivalent to “reading” a file into storage structures in RAM.
- Serialization: Serialization is the process of packing and translating
conceptual objects into a binary file. In BAR terminology, serialization is
equivalent to “writing” a file from storage structures.
- Format Conversions: Because BAR is designed to provide binary file data in
unaltered form, format conversions are often unnecessary. However, it is often
convenient to have services that perform format conversions automatically,
sparing the client very low-level parsing tasks:
  - Byte ordering: Inconsistent ordering of the bytes within multi-byte
  numbers (least significant byte first versus most significant byte first) is
  a common issue with binary file formats. Certain types can be used to convert
  automatically from little-endian to big-endian format and vice versa.
  - String conversion: At one time, strings were composed only of one-byte
  characters. Now, multi-byte character strings such as Unicode are in
  widespread use. Certain types can be used to perform automatic string format
  conversions.
  - Floating point implementation: Variations in floating-point number format
  can be problematic to address, since the representation is often
  machine-dependent. Certain types and routines can be used to convert
  floating-point numbers between formats automatically.
  - Bit field extraction/population: Bit fields are sometimes tricky to
  extract from binary files, since they normally require bit fiddling. For
  programming languages that do not directly support bit field extraction and
  population, certain types can be used to integrate bit fields into the
  specification so that extraction and population happen automatically.
  - Bit scan reading/writing: Bit scans are notoriously difficult to process
  because they tend to require advanced bit fiddling and have lengths that are
  often very difficult to calculate. The characteristics of a bit scan can be
  specified in BAR so that much of the bit fiddling is handled automatically,
  for both read and write operations.
- Data Type Lookups: Some serialization and deserialization implementations do
not let you "explore" the document object model; they only let you use it. BAR
is not so limited: it allows in-depth querying of specification features such
as types, names, unique identifiers, sizes, and other attributes.
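The Deserialization, Serialization, and Data Type Lookups features can be pictured together in a short Python sketch. BAR's own implementation-file syntax is not shown in this section, so the sketch does not attempt it. Instead, a hypothetical schema (a list of `Field` entries built on the standard `struct` module) drives reading, writing, and querying a made-up 12-byte little-endian header. `Field`, `HEADER`, `deserialize`, `serialize`, and `describe` are illustrative names, not part of BAR.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    code: str  # struct format code, e.g. "H" for an unsigned 16-bit integer

    @property
    def size(self) -> int:
        return struct.calcsize("<" + self.code)

# Hypothetical schema for a tiny little-endian header (not a real format).
HEADER = [Field("magic", "4s"), Field("version", "H"), Field("flags", "H"), Field("length", "I")]
FMT = "<" + "".join(f.code for f in HEADER)  # "<" = standard sizes, no padding

def deserialize(data: bytes) -> dict:
    """'Read' the header: unpack bytes into a name -> value mapping."""
    values = struct.unpack_from(FMT, data)
    return {f.name: v for f, v in zip(HEADER, values)}

def serialize(obj: dict) -> bytes:
    """'Write' the header: pack a name -> value mapping back into bytes."""
    return struct.pack(FMT, *(obj[f.name] for f in HEADER))

def describe(name: str) -> tuple[str, int, int]:
    """Type lookup: return (format code, byte offset, size) for a named field."""
    offset = 0
    for f in HEADER:
        if f.name == name:
            return f.code, offset, f.size
        offset += f.size
    raise KeyError(name)

raw = serialize({"magic": b"DEMO", "version": 1, "flags": 0x0003, "length": 512})
print(len(raw))            # 12
print(deserialize(raw))    # {'magic': b'DEMO', 'version': 1, 'flags': 3, 'length': 512}
print(describe("length"))  # ('I', 8, 4)
```

Keeping the layout as data rather than hard-coding it in the read and write functions is what makes the third operation possible: the same description that drives parsing can also be queried.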
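For the Byte ordering and String conversion items, this is what those conversions amount to in plain Python, using the standard `struct` module and built-in codecs. The 16-byte, NUL-padded UTF-16LE field is a made-up example of a fixed-width string slot. BAR's conversion types presumably hide this work, but their names and syntax are not given here, so none are assumed.

```python
import struct

raw = bytes([0x12, 0x34, 0x56, 0x78])  # four bytes as they sit in a file

# Byte ordering: the same bytes read under each convention.
little = struct.unpack("<I", raw)[0]   # least significant byte first
big = struct.unpack(">I", raw)[0]      # most significant byte first
print(hex(little), hex(big))           # 0x78563412 0x12345678

# Converting between orders: unpack with one convention, repack with the other.
swapped = struct.pack(">I", little)
print(swapped.hex())                   # 78563412

# String conversion: a NUL-padded, fixed 16-byte UTF-16LE field decoded to str.
field = "Hello".encode("utf-16-le").ljust(16, b"\x00")
decoded = field.decode("utf-16-le").rstrip("\x00")
print(decoded)                         # Hello

# The same text occupies different byte counts in one-byte and multi-byte encodings.
sizes = {enc: len("Ünïcode".encode(enc)) for enc in ("latin-1", "utf-8", "utf-16-le")}
print(sizes)                           # {'latin-1': 7, 'utf-8': 9, 'utf-16-le': 14}
```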
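As one example of the machine-dependent formats mentioned under Floating point implementation, the sketch below converts IBM System/360 single-precision hexadecimal floats (a sign bit, a 7-bit excess-64 exponent in base 16, and a 24-bit fraction) to Python floats, which are IEEE 754 doubles. The IBM format is chosen here purely for illustration; the section does not say which formats BAR supports, and `ibm32_to_float` and `read_ibm32_be` are hypothetical helpers.

```python
import struct

def ibm32_to_float(word: int) -> float:
    """Convert a 32-bit IBM System/360 hexadecimal float to a Python float."""
    sign = -1.0 if word & 0x8000_0000 else 1.0
    exponent = (word >> 24) & 0x7F                     # excess-64, base 16
    fraction = (word & 0x00FF_FFFF) / float(1 << 24)   # 0.F, 24 fraction bits
    return sign * fraction * 16.0 ** (exponent - 64)

def read_ibm32_be(data: bytes) -> float:
    """Read a big-endian IBM single from 4 bytes of file data."""
    (word,) = struct.unpack(">I", data)
    return ibm32_to_float(word)

print(ibm32_to_float(0x42640000))   # 100.0
print(ibm32_to_float(0xC276A000))   # -118.625
# The same 100.0 as a big-endian IEEE 754 single, for comparison:
print(struct.pack(">f", 100.0).hex())  # 42c80000
```

The two encodings of 100.0 share no common bit pattern, which is why a reader must know which format a file uses before it can interpret the bytes at all.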
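The bit fiddling that the Bit field extraction/population item refers to looks like this in practice. The sketch uses the MS-DOS/FAT packed date as a real-world example (bits 15 to 9 hold years since 1980, bits 8 to 5 the month, bits 4 to 0 the day); `get_bits` and `set_bits` are generic helper names chosen for the illustration, not BAR functions.

```python
def get_bits(value: int, shift: int, width: int) -> int:
    """Extract a `width`-bit field starting at bit `shift` (bit 0 = least significant)."""
    return (value >> shift) & ((1 << width) - 1)

def set_bits(value: int, shift: int, width: int, field: int) -> int:
    """Return `value` with the `width`-bit field at `shift` replaced by `field`."""
    if field < 0 or field >> width:
        raise ValueError(f"{field} does not fit in {width} bits")
    mask = ((1 << width) - 1) << shift
    return (value & ~mask) | (field << shift)

# MS-DOS/FAT packed date: bits 15-9 = years since 1980, 8-5 = month, 4-0 = day.
def unpack_dos_date(word: int) -> tuple[int, int, int]:
    return 1980 + get_bits(word, 9, 7), get_bits(word, 5, 4), get_bits(word, 0, 5)

def pack_dos_date(year: int, month: int, day: int) -> int:
    word = set_bits(0, 9, 7, year - 1980)
    word = set_bits(word, 5, 4, month)
    return set_bits(word, 0, 5, day)

packed = pack_dos_date(2024, 3, 15)
print(hex(packed), unpack_dos_date(packed))  # 0x586f (2024, 3, 15)
```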
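The section does not define the exact layout of a "bit scan." Assuming it means a run of values packed at arbitrary bit widths with no byte alignment, the sketch below shows the read and write bit fiddling the Bit scan reading/writing item describes. `BitWriter` and `BitReader` are hypothetical classes that pack values most significant bit first and zero-pad the final byte.

```python
class BitWriter:
    """Pack unsigned values of arbitrary bit width, most significant bit first."""

    def __init__(self) -> None:
        self._acc = 0      # bits collected so far
        self._nbits = 0    # number of valid bits in _acc

    def write(self, value: int, width: int) -> None:
        if value < 0 or value >> width:
            raise ValueError(f"{value} does not fit in {width} bits")
        self._acc = (self._acc << width) | value
        self._nbits += width

    def getvalue(self) -> bytes:
        pad = -self._nbits % 8  # zero bits needed to fill the final byte
        return (self._acc << pad).to_bytes((self._nbits + pad) // 8, "big")


class BitReader:
    """Read unsigned values of arbitrary bit width, most significant bit first."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0      # current position in bits

    def read(self, width: int) -> int:
        if self._pos + width > len(self._data) * 8:
            raise EOFError("not enough bits left")
        value = 0
        for _ in range(width):
            byte = self._data[self._pos // 8]
            bit = (byte >> (7 - self._pos % 8)) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value

w = BitWriter()
for value, width in [(5, 3), (1, 1), (300, 9), (2, 2)]:
    w.write(value, width)
data = w.getvalue()           # 15 bits of payload, padded to 2 bytes
r = BitReader(data)
print([r.read(width) for width in (3, 1, 9, 2)])  # [5, 1, 300, 2]
```

Even in this small case, the encoded length (2 bytes) only follows from summing every field width and rounding up, which is the kind of length calculation the item describes as hard to get right by hand.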
See also: [What is BARfly?] [Who should use BARfly?]