
What is BAR?

BAR, or the Binary Artifact Reference system, is a protocol that provides a "universal schema" for characterizing and accessing binary data. The goal of BAR is to promote a greater understanding of binary file formats and to make maintaining, versioning, interpreting, and creating files in various formats much easier than before.

To understand BAR, it helps to first understand another universal schema: the W3C's XML Schema.


 XML Schema:  Pros and Cons

XML Schema offers the following advantages for data processing:

  • Ability to characterize any type of data
  • Easily readable data format
  • Robust implementation
  • Built-in validation
  • Offers basic serialization and deserialization

XML Schema falls short in the following areas:

  • Limits control over methods of serialization and deserialization
  • Requires binary data to be converted to text before it becomes readable
  • Without such conversion, adds the overhead of packing and unpacking binary data in an encoding such as Base64
  • Forces larger file sizes: text generally takes up more space than the equivalent binary data
  • Requires text parsing, which typically takes more processing time than binary parsing
  • Assumes XML parsing/generation capability exists on the client

However useful XML Schema is, it has one fundamental limitation: it can characterize only XML documents. Moving an implementation to XML therefore forces the program to convert legacy binary formats to XML, and XML back to binary, as part of its operation. When text parsing and generation are a critical part of a program’s operation, the program tends to run slower. Improper use of XML can easily turn an otherwise efficient binary parsing algorithm into a grossly inefficient text-parsing algorithm that adds little value.

Text-based formats also need more memory and storage to hold the same data. They are costlier in bandwidth as well: a larger overall data size takes longer to transfer over a network. Transferring XML over a network also raises security concerns of its own.
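
As a rough illustration of the size overhead described above, the following Python sketch stores the same data three ways: as raw binary, as Base64 text inside an XML element, and as one XML element per value. The record (1,000 unsigned 32-bit integers) and the element names are assumptions invented for this example; they do not come from BAR or from any real schema.

```python
import base64
import struct

# Hypothetical record: 1,000 unsigned 32-bit samples, packed little-endian.
samples = list(range(1000))
binary = struct.pack("<1000I", *samples)

# Option 1: carry the raw bytes inside XML as Base64 text.
# Base64 emits 4 output characters for every 3 input bytes.
b64_xml = b"<samples>" + base64.b64encode(binary) + b"</samples>"

# Option 2: convert every value to readable text, one element per sample.
text_xml = "<samples>" + "".join(f"<s>{v}</s>" for v in samples) + "</samples>"

binary_size = len(binary)
b64_size = len(b64_xml)
text_size = len(text_xml.encode("utf-8"))

print("raw binary :", binary_size, "bytes")
print("Base64 XML :", b64_size, "bytes")
print("text XML   :", text_size, "bytes")
```

Base64 alone adds roughly one third to the payload before any markup is counted, and the fully readable form is larger still for this data. The exact ratios depend on the values and on the markup chosen.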


 BAR:  Pros and Cons

BAR offers many of the same advantages as XML Schema, along with several of its own:

  • Ability to characterize nearly any type of data
  • Easily readable data format
  • Robust implementation
  • Built-in validation
  • Extensive control over methods of serialization and deserialization
  • Allows serialization or deserialization to occur either forwards or backwards
  • Allows nonlinear serialization or deserialization

BAR has the following disadvantages:

  • Assumes BAR services exist on the client
  • Requires stricter controls over parsing
  • Has a slightly steeper learning curve than XML Schema

BAR surpasses XML Schema in several respects. BAR does not require data conversion, and it adds no overhead when data cannot or need not be interpreted. BAR does not force a larger file size. BAR does not require a great deal of text parsing, although text parsing is permitted for text-specific file formats or text portions of file formats.

The principal advantage of BAR over XML Schema is that BAR can theoretically process any format in existence with minimal impact on existing file format implementations: file formats can stay as they are.

The principal disadvantage of BAR is that binary file format parsing forces the protocol to grant a great deal of flexibility to a specification’s implementation file. Parsing is characterized both by defined structural elements and by code, so the parsing operation must be “sandboxed,” or carefully controlled, so that rogue code does not destabilize either the BAR services or the platform on which the services run.

Fortunately, the BAR engine takes care of this disadvantage for you. A BAR implementation file, which contains a "schema" for a binary file, is a format whose processing is controlled very tightly by the BAR engine itself. The BAR engine provides the API programmer with universal, fault-tolerant interface functions, while at the same time enforcing the internal consistency of its implementation files, no matter how complex they are.


 BAR:  Core Features 

The BAR protocol is primarily concerned with serialization and deserialization of file formats. BAR also has the ability to perform format conversions and data type look-ups. The basic features are described as follows.

  • Deserialization:  Deserialization is the process of constructing conceptual objects from binary file data. For the purposes of BAR terminology, deserialization is the equivalent of “reading” a file into storage structures in RAM.
  • Serialization:  Serialization is the process of packing and translating conceptual objects into a binary file. For the purposes of BAR terminology, serialization is the equivalent of “writing” a file from storage structures.
  • Format Conversions:  Because BAR is designed to provide binary file data in unaltered form, format conversions are often unnecessary. However, it is often convenient to provide services that perform automatic format conversions in order to eliminate very low-level parsing tasks on the client side:
    • Byte ordering:  Inconsistent byte order in multi-byte numbers (least significant byte first versus most significant byte first) is a common issue with binary file formats. Certain types can be used to perform automatic little-endian-to-big-endian format conversions and vice versa.
    • String conversion:  At one time, strings were composed of only a series of one-byte characters. Now, multi-byte character encodings such as those used for Unicode have come into widespread use. Certain types can be used to perform automatic string format conversions.
    • Floating point implementation:  Potential variations in floating-point number format can be problematic to address, since implementation is often machine-dependent. Certain types and routines can be used to perform automatic floating-point number format conversions.
    • Bit field extraction/population:  Bit fields are sometimes tricky to extract from binary files, since they normally require bit fiddling. In case a programming language does not directly support bit field extraction and population, certain types can be used to integrate bit fields into the specification, causing automatic extraction and population.
    • Bit scan reading/writing:  Bit scans are notoriously difficult to process because they tend to require advanced bit fiddling and have lengths that are often very difficult to calculate. Characteristics of a bit scan can be specified in BAR such that much of the bit fiddling is handled automatically, for both read and write operations.
  • Data Type Lookups:  Some implementations that perform serialization and deserialization do not allow you to "explore" the document object model; they only allow you to use it.  BAR is not so limited:  it allows for extremely in-depth querying of specification features, such as types, names, unique identifiers, sizes, and other attributes.
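
To make the terms above concrete, here is a minimal Python sketch of deserialization, serialization, byte-order handling, and bit field extraction and population, written by hand with the standard struct module. It does not use BAR or show BAR syntax; the 8-byte header layout and the field names (magic, version, kind, length) are hypothetical and exist only for this example. It shows the kind of low-level work that the features above are meant to automate.

```python
import struct

# Hypothetical 8-byte header (not a BAR-defined format):
#   bytes 0-3  magic    unsigned 32-bit, big-endian
#   bytes 4-5  version  unsigned 16-bit, little-endian
#   bytes 6-7  flags    unsigned 16-bit, big-endian, holding two bit fields:
#                       bits 15-12 = kind (4 bits), bits 11-0 = length (12 bits)

def deserialize(data: bytes) -> dict:
    """'Read' the header bytes into a dictionary of fields."""
    (magic,) = struct.unpack_from(">I", data, 0)    # most significant byte first
    (version,) = struct.unpack_from("<H", data, 4)  # least significant byte first
    (flags,) = struct.unpack_from(">H", data, 6)
    return {
        "magic": magic,
        "version": version,
        "kind": (flags >> 12) & 0xF,   # extract the upper 4 bits
        "length": flags & 0xFFF,       # extract the lower 12 bits
    }

def serialize(fields: dict) -> bytes:
    """'Write' the fields back into the same 8-byte layout."""
    # Populate the two bit fields inside one 16-bit value.
    flags = ((fields["kind"] & 0xF) << 12) | (fields["length"] & 0xFFF)
    return (
        struct.pack(">I", fields["magic"])
        + struct.pack("<H", fields["version"])
        + struct.pack(">H", flags)
    )

raw = bytes([0x42, 0x41, 0x52, 0x00, 0x02, 0x00, 0xA1, 0x2C])
header = deserialize(raw)
print(header)
print(serialize(header) == raw)
```

Every byte order, shift, and mask here is spelled out in client code. A schema-driven approach would instead record those details once, in the description of the format.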

  See also:  [What is BARfly?] [Who should use BARfly?]


BARfly Help Copyright © 2009 Christopher Allen