Why Randomize?

December 18th, 2009

I’m going to talk about a format-related topic that isn’t strictly related to BARfly. It deals with the formatting of early video games with randomly set-up designs, and how they were able to accomplish lots of replay value with “a different style of game every time you play.”

To make a game more enjoyable, you try to make the experience as unique as possible each time it is played or replayed. One way to do this is to randomize parts of the game: mainly the map design, the character placement, the protagonist’s starting position, and the order in which items or other objects appear.

But why did some games employ this and not others? And why is randomization mostly a thing of past games and not more modern ones?

In part it’s because the programming and design effort necessary for a quality-controlled, randomized game environment requires more time and resources, but this isn’t a huge issue for the most part. The real reason so many early games had randomization was compression. When you don’t have a lot of memory to store your map and character data, the question becomes: how do we best represent a level design with less?

The answer was often to just remix the level differently every time, because that way, you don’t require a lot of memory to store static designs. I’ve already mentioned Diablo, Kroz, and Vintage Hyperactive, all games with randomizations. But I’d like to talk about some games that came out even earlier, with far less obvious implementations of randomization.

The first example is Impossible Mission. Below are screenshots of the game, which, by themselves, are static, meaning they look the same every time you play the game. But the rooms, as they appear in the overall map, are in different locations in the game each time you play. Think of each room as a very large “brick” that can be “moved around” at will.



There are other randomizations, too, like robot AI, puzzle piece locations, and puzzle piece appearance and interlock abilities. I’ve talked in depth about this game on my Impossible Mission tribute site, so I won’t repeat too much. But I will say that the Commodore 64 developer, lacking a whole lot of memory with which to store maps, would have had an incentive to make an algorithm for level generation instead of a static map, which takes up more memory.
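To make the memory argument concrete, here is a minimal C++ sketch. It is not Impossible Mission's actual algorithm; the function name shuffle_rooms and the idea of a single 32-bit seed are assumptions for illustration. The room designs stay static, and only a seed plus a short shuffle routine decide where each room lands on the map.

```cpp
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical illustration: layout[slot] is the index of the static room
// design placed in that map slot.
std::vector<int> shuffle_rooms(int room_count, std::uint32_t seed) {
    std::vector<int> layout(room_count);
    std::iota(layout.begin(), layout.end(), 0);  // rooms 0, 1, 2, ... in fixed order

    std::mt19937 rng(seed);  // the same seed always reproduces the same layout
    // Fisher-Yates shuffle, written out by hand so the result does not depend
    // on how a given standard library implements std::shuffle. The slight
    // modulo bias is ignored in this sketch.
    for (int i = room_count - 1; i > 0; --i) {
        int j = static_cast<int>(rng() % static_cast<std::uint32_t>(i + 1));
        std::swap(layout[i], layout[j]);
    }
    return layout;
}
```

Storing one seed per game takes a few bytes, whereas storing every possible arrangement as a static map would not fit in the memory of a machine like the Commodore 64.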

The next example is far less obvious. This is Tarzan, for the ColecoVision. You might think, do entire screens get moved around like “bricks” like in Impossible Mission? Well, no. It’s a lot more subtle than that.

The top (trees and vine background) and bottom (jungle floor and rivers) each have a set of “lanes” that can feature a sequence of individual background terrain segments. It’s not a total mishmash of disparate terrains, of course: quality steps are implemented to ensure that smooth transitions between ground, water, and tree are always present.

It certainly doesn’t LOOK like randomly generated terrain. But each conventional screen is different, every time you play the game. The basic “appearance” of the terrain doesn’t change much, but each screen is unique save a handful of scripted screens, which are statically put together.

When such quality games have random level generation, it inevitably makes me wonder why some of the most impressive games with level design out there did NOT have random designs. In particular, I would have loved to see randomization in the maps for games like Metroid or Zelda. After all, these games also had to compress data, and it should come as no surprise that they employed some of the same compression strategies.

But you can’t always get what you want. Budgets and deadlines often determine what a software developer is allowed to do, and static design allows for more consistent human-supervised quality control.

Of course, game design has become so advanced, and there are such incredible tools at the developer’s disposal, that it’s a wonder randomization is not more standard fare these days. Budgets are bigger; teams are larger. Maybe starting with my planned game Brian’s Journey, we can trend towards randomization?

MDB.BAR: Making a database out of…a database

November 18th, 2009

Since mid-September, I’ve hosted a limited BAR I.F. of the Microsoft Jet database format, or MDB. This I.F. breaks the database records into appropriately-sized chunks, identifying various fields.

Of course, the first I.F. for MDB only allows one to look at data table format information. It does not support the “magic row-column” formula used to extract the data, which is to say, you can plug in a record number, a column number, and then you’ve got your data for that row-column combination.
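As a rough picture of what a row-column lookup involves, here is a C++ sketch for a simplified table with fixed-width columns. The TableLayout structure and both function names are hypothetical, and the real Jet/MDB page layout is considerably more involved than this.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-width table layout; real Jet/MDB pages are more complex.
struct TableLayout {
    std::size_t data_start;                  // byte offset of record 0
    std::vector<std::size_t> column_widths;  // width of each column in bytes
};

// Size of one whole record: the sum of all column widths.
std::size_t record_size(const TableLayout& t) {
    std::size_t total = 0;
    for (std::size_t w : t.column_widths) total += w;
    return total;
}

// Row-column lookup: byte offset of the field at (row, column).
std::size_t field_offset(const TableLayout& t, std::size_t row, std::size_t column) {
    std::size_t offset = t.data_start + row * record_size(t);
    for (std::size_t c = 0; c < column; ++c) offset += t.column_widths[c];
    return offset;
}
```

Plug in a record number and a column number, and the result is where that field's bytes begin.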

I’m working to correct this, which will allow a person to get information out of any field in a data table. But even this seems like it’s too limited. So you get the information out of row 3, column 7. What does this mean? Most people reference records by a primary key, which is one or more of the columns. Oftentimes, non-key fields must be searched, with each matching record returned. And, of course, let’s not forget that columns are usually referred to in queries by name, and not by number.

BARfly would do wonders to extract entire data tables at once, given this modification. But what would it mean to use a BAR I.F. to manage the database file like you would expect an actual database to be managed? What would you have to do?

Well, I’m not going to discuss all the details, because it would take too long. So I’ll limit my focus to just a fundamental operation programmers and DBAs routinely perform on databases: the select query.

Queries can be written in many forms, but I’m just going to focus on MDB’s preferred format, which is called SQL. A SQL query has the following general form:

SELECT alpha,gamma,beta FROM greekdb.alphabetletters WHERE (caps="true" AND highlite<>"yellow") ORDER BY delta

Roughly translated, we want to extract a variable number of records, with three columns (“alpha”, “gamma”, and “beta”) per record returned, from the table “alphabetletters,” which belongs to database greekdb. Limit the records returned to those whose column “caps” has a true value, and whose column “highlite” has any value other than yellow. Finally, order the results, ascending, by the contents of the “delta” field in each returned record.

Any decent SQL processing system will take a database query and strip it down to the fundamental inputs that really matter in the query itself. Once all the text is read and understood, the database query looks more like this to the query handling logic:

Search database 2, table 5, for records where (column 10 is true and column 12 has a negative string comparison result with the text yellow). From these records, extract columns 4, 6, and 5 into the result dataset, in that order, ignoring the other columns. Finally, order the records of the result dataset using column 7 (ascending by numeric comparison).
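The numeric form of the query can be sketched in C++ as a small plan structure applied to an in-memory table. Everything here (Row, SelectPlan, run_select) is a hypothetical illustration, not BAR or Jet code, and it assumes every field is stored as text and that the sort column always holds a valid integer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;  // one record, every field kept as text

// Hypothetical numeric form of the example query.
struct SelectPlan {
    std::size_t flag_column;                  // must equal "true"
    std::size_t excluded_column;              // must NOT equal excluded_text
    std::string excluded_text;
    std::vector<std::size_t> output_columns;  // columns to return, in order
    std::size_t order_column;                 // ascending numeric sort key
};

std::vector<Row> run_select(const std::vector<Row>& table, const SelectPlan& p) {
    // WHERE: keep only the records that pass both tests.
    std::vector<Row> matches;
    for (const Row& r : table)
        if (r[p.flag_column] == "true" && r[p.excluded_column] != p.excluded_text)
            matches.push_back(r);

    // ORDER BY: ascending, comparing the sort column as a number.
    std::stable_sort(matches.begin(), matches.end(), [&](const Row& a, const Row& b) {
        return std::stol(a[p.order_column]) < std::stol(b[p.order_column]);
    });

    // SELECT: copy out only the requested columns, in the requested order.
    std::vector<Row> result;
    for (const Row& r : matches) {
        Row out;
        for (std::size_t c : p.output_columns) out.push_back(r[c]);
        result.push_back(out);
    }
    return result;
}
```

Note that the sort runs before the projection, because the ordering column is not necessarily one of the columns returned.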

Computers are nothing but numbers, and everything ultimately breaks down into a number. Once all the SQL has been translated, the real work is retrieving (or populating) all the pertinent data in the database for the user.

So, could a BAR I.F. do all that? Translate the query text, AND collect the necessary data, AND package the results in a format the user had requested? Absolutely.

BAR, remember, has decent regular expression processing capabilities, and can translate the SQL query as needed. Offset tweaking and length calculations are a fundamental part of the BAR deserialization procedure, allowing one to perform the “magic row-column” formula as many times as needed. Finally, the Deserialize functionality allows endless possibilities for data translation after nodes have already been characterized, yielding an appropriate dataset in the order the user had requested.

SQL gurus know that many queries are far more complex than the example I’ve just given. One-to-many relationships, multiple keys, internal consistency rules, etc. are not covered by the example. Clearly, I would be reinventing many wheels by making BARfly encompass nearly every possible creative query that a person could ever dream up, with few obvious returns. It’s far more worth it to use a genuine frontend, like Microsoft Access, if you need to use all the features of the database software.

But if you need to perform data recovery, or if you want to perform easy translation of data without regard to internal consistency, or if you want to migrate the data into XML, UTD, or another format in preparation for a big move to a new format, or if you are writing a program that requires a light memory footprint with little need for all the advanced features of a database-access library of functions, consider the I.F. approach.

And don’t forget this–an I.F. is platform-independent.

I want to mention one more point. Microsoft has released 2.x of the Entity Framework, which attempts to provide more universal access to different types of databases, with SQL Server being the preferred target. Query format is no longer important, nor is programmatic knowledge of database-specific features (Oracle has different features from SQL Server, for instance). Great stuff. But the framework does not, to my knowledge, allow anyone to customize what, exactly, the backend is. You’re locked into a predefined set of known database formats.

BAR represents the final piece of the puzzle. With BAR, there is the very real possibility of having a unique component that defines the format for a custom database backend. Having one or more implementation files act as “backend definitions” could, theoretically, allow for a framework with unfettered access to any type of data, regardless of whether or not the data accessed is even a database at all!

Software Development Case Study!

September 30th, 2009

Now here’s something software developers will find very useful! The following is a case study on how to use BARfly to design and implement levels for a video game.

I’ve already got several video game file types supported here. In particular, WAD (for Doom), NIB and NGR (for Nibbler), and a special Kroz level extractor from source code, which I’ve mentioned in previous entries.

But what if your level designs are more complex than linear? Whether it’s in binary or text format, many developers can get by with just a linear file format. A grid speaks for itself, with or without run-length encoding, and objects can be spot-placed with coordinate pairs (or triplets). Unfortunately, many games have trickier design formats. Like these:
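For reference, the "linear" case really is simple. The following C++ sketch decodes a run-length-encoded grid; the (count, tile) pair format and the name decode_grid are assumptions for illustration and do not correspond to any specific game's file format.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical run-length format: each pair is (repeat count, tile character).
using Run = std::pair<int, char>;

// Expands the runs into rows of exactly `width` tiles each. Any leftover
// tiles that do not fill a complete row are dropped.
std::vector<std::string> decode_grid(const std::vector<Run>& runs, std::size_t width) {
    std::vector<std::string> rows;
    if (width == 0) return rows;

    std::string flat;
    for (const Run& run : runs)
        if (run.first > 0)
            flat.append(static_cast<std::size_t>(run.first), run.second);

    for (std::size_t i = 0; i + width <= flat.size(); i += width)
        rows.push_back(flat.substr(i, width));
    return rows;
}
```

Nothing in this format can express a link between two objects, which is exactly where the trickier designs listed next depart from it.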

  1. Doom. This and other FPS games geometrically organize the points, lines, and textures. But when it comes to linking all these objects together, you’ll need a system to “compile” them into something usable before a game can be played.
  2. Abuse. Not as well-known as Doom, but still worth a mention. Abuse is a very curious beast because LISP is used to drive almost all parts of the gameplay. What is LISP? It’s a linked-list based language, one you’re likely to encounter only in grad-level computer science. Each object, each enemy, etc. has one or more “links” to another object or enemy. Pretty open-ended!
  3. Diablo. This game, and its sequel, Diablo II, have dynamic level design. This means you don’t just have a level grid or spot-placements of objects: you have patterns, probabilities, and schemes determining what a level “might” look like. Levels are mixed in a similar fashion to Nethack or Rogue, which are text-based predecessor games that influenced Diablo’s creators significantly.

Questions to ask: how do you store links and relationships between objects? How do such relationships shape the data structures and storage formats themselves? How much work must be done in the construction of the levels versus the game engine’s actual setup process?

For credibility’s sake, I have no choice but to answer the above questions myself. I’m coming out with a new game soon, called Brian’s Journey. This uses a new game engine, which, at the core, is just a platform-independent graphical wrapper around the BAR engine. To design a level in this game, you’ll need two files: a room definition file and a shape interpretation file.

Text or binary? How about both? This is a profound “gotcha” for game developers. People care immensely about the end product (the game), but the path to the end product can be impossibly bumpy. Do you make text files, which means you use just a text editor, saving time, but perhaps making visualization harder? And making a tricky text-to-binary conversion part of the engine? Or do you pack everything in binary, forcing you to construct a complex editor that few people might ever use, and even fewer like using? Giving you the advantage, perhaps, of less pre-map processing when the level is actually played?

Thanks to BARfly, you can actually have both. Here’s what the shape interpretation file looks like for Brian’s Journey:

Shape file (text)

Looks kind of like source code. In a manner of speaking, it is. Now look at what happens when BARfly loads it:

Shape file (BARfly)

Awesome. Everything’s a node. But do we really want to cycle through all that filler syntax to get at the associations we need? After running the I.F. function “Normalize,” the data looks like this:

Shape file (final)

This function strips out comments and whitespace and gives us just name-value pairs! Architecturally, a program won’t have any trouble accessing this vectored data. Here’s my actual game engine code that loads a shape interpretation file:

//Load interface to shape file.
sh8 = i.Create_BAR("lf8r_shapeinterp.bar");
if (!sh8) return 1;

//Load file.
if (sh8->Load(filename) < 0) return 2;

//Normalize (get rid of extra components).
assoc_count = sh8->Function_Call_L("ShapeCharAssociations.Normalize");

Not many lines, huh? With so much of the parsing and conversion done in the I.F., the game developer is free to keep the game’s business logic pegged to architecture, rather than low-level parsing.

Okay, great. But that was the easy part. What about the room definition file format? Here’s a snippet of a room definition file:

Room file (text)

Quite complex, indeed. The room definition has some parts ordered and some parts random–taking the best principles from Nethack, Diablo, Kroz, Vintage Hyperactive, and a variety of static level design formats. And because it’s text-based, you won’t need to design rooms in an editor (although you can later). Loading in BARfly gives you the following:

Room file (BARfly)

Do we have normalization I.F. functions here, too? We do. We run first Assimilate_Binary_Style, which removes comments, whitespace, etc…

Room file (intermediate)

…and then Normalize, which translates the remaining text-based nodes to binary-based nodes.

Room file (final)

Before, everything was a text field. Now, everything is a binary structure. If you save this file and reload it, it stays a binary file!

This is a rather novel method, from a software architect’s perspective, when translating text into binary. Traditionally, an architect will need to store text and binary fields separately, forcing a kind of “double vision” to develop. On one side you have your human-readable text, and on the other side you have your binary structures and relationships, which require their own separate architectures for processing.

But with BARfly, you can implement and test all the parsing and conversion logic without ever having to make the video game that uses it (I haven’t yet). There is no “dual architecture.” There is only one. Rather than throw away the source, you preserve the physical relationships of the nodes, and gradually “nibble around the edges” of the text architecture until it becomes a binary format.

Because the I.F. for the room definition files does all the heavy lifting, the game engine only needs to deal with the binary format. Compilation isn’t just implicit as part of the level’s load process: it’s totally invisible to the game’s architecture!

Some of you sharp developers might have started to wonder what this means for games that would like to have the ability to read other games’ file formats. Currently, it’s not worth it, because you’d need to invest in a whole separate library of code devoted to reading just that other format. But if the architecture is pre-built into its own BAR implementation file? You’d just need to distribute that I.F., nothing more.

From Data to Code and Back Again

September 8th, 2009

I have had the file INITDATA.BAR for a long time for internal use. Only today did I decide to release it to the public.

How many times has a software developer wanted to get data incorporated into storage in a local executable…only to discover that you have to use the native data-initialization syntax, very much unlike the format in which the data is stored?

You can be very patient and hand-enter the data, but you’d never do that unless there is only a small amount of it.

Take this example. I want to enter the following bytes as static constant data, to be used in the body of a program later: 40H 7FH 23H A8H. In C-based languages, you’d initialize an array like this:

static const unsigned char mydata[] = { 0x40, 0x7F, 0x23, 0xA8 };

But chances are, the data isn’t going to be written out as “0x40, 0x7F, 0x23, 0xA8” or anything even remotely like it. Furthermore, you might want the data written out as short integers, signed or unsigned, in decimal instead of hexadecimal, etc.

Four bytes? Just hand-translate it. Over a hundred? Maybe not. Different base? Ugh. Wrong endian order? Super-ugh.

A simple solution, if not straightforward, is to load the data as a separate file from the disk, like this:


FILE *afile = fopen("mydata.dat", "rb");
if (afile) {
    fread(mydata, sizeof(mydata), 1, afile);
    fclose(afile);
}

Ah, but what about…

  1. Is pathname always relative? Should it be absolute? How do I reconcile with installation folder, etc?
  2. What if file can’t be loaded?
  3. Is file same size as declared variable? How do I resize it if source data changes? Should I dynamically allocate memory for it?
  4. What if I forget to go through all the proper value checks and closes because as a programmer I’m kind of lazy?
  5. Does this separate load step REALLY need to be done at all?

That last question is the clincher. The answer is NO.

The C++ initialized data converter is the answer. It reads a binary data file and spits out C++ initialized data. Or, alternatively, it can read initialized data and spit out binary data. It’s a two-way converter with many F8-mode selection functions.
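The binary-to-source direction of such a converter can be sketched in a few lines of C++. This is not the INITDATA.BAR implementation; the function name to_initializer and the fixed choice of unsigned char with uppercase hexadecimal are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: renders raw bytes as a C/C++ initialized array.
std::string to_initializer(const std::string& name,
                           const std::vector<unsigned char>& bytes) {
    std::string out = "static const unsigned char " + name + "[] = { ";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        char hex[8];
        // Two uppercase hex digits per byte, prefixed with 0x.
        std::snprintf(hex, sizeof(hex), "0x%02X", static_cast<unsigned>(bytes[i]));
        out += hex;
        if (i + 1 < bytes.size()) out += ", ";
    }
    out += " };";
    return out;
}
```

Calling to_initializer("mydata", {0x40, 0x7F, 0x23, 0xA8}) yields a declaration that can be pasted straight into a source file. Supporting other unit types, number bases, or byte orders would mean adding parameters to the same loop.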

If you’ve ever looked at how many files are in the BARfly installation folder, you’d know there are surprisingly few. BARfly uses BAR to run its architecture…but where are all the BAR-based architectural components? They’re not BAR files, and they’re not in the master registry.

The answer: INITDATA.BAR. I’ve long since made initialized data out of such components. The BARfly.exe executable has literally swallowed them.

Engineers of all stripes love the idea of making a machine work with fewer moving parts. Would you rather carry around a large package or several smaller ones? Same load. One’s a lot easier to carry.

RegExp in BAR: An Application

August 24th, 2009

We know regular expressions are used all the time. But what do they look like, and how do they fit into the larger scheme of things with BAR?

I’ve provided a case study here. There is an I.F. on this site for level files from Kroz (one of Apogee’s earliest game creations). When I designed the I.F. originally, it picked apart data from source code by using BAR’s 1.0 functionality, which was relatively limited–binary parsing only. But source code is best parsed as text, by regular expressions…which only debuted with version 1.3b of BAR.

No one should reasonably expect to write binary parsing code for most text formats, especially if you aren’t dealing with a cornucopia of powerfully optimized functions in your arsenal. Instead, just write regular expressions for the simplest of syntaxes and work your way up in complexity. Here’s a complete list of the regular expression assignments used in kroz_alt.bar, the 1.3b-capable version of the Kroz level file reader:


//Unordered level syntax
block unorganized textual df_header ::= "DF[" ["0-9"]+ "]:=" [^"'"]* "'";
block unorganized textual df_chars ::= [^"'"]*;
block unorganized textual df_transition ::= "'" ["\x0-\x20"]* "+" [^"'"]* "'";
block unorganized nofragment df_spawncounts { unittype = unsigned short; };

//Ordered level syntax
block unorganized textual procedure_header ::= "procedure Level" ["0-9"]+ ";" ["\x0-\x20"]* "begin";
block unorganized textual fp_padding ::= ^"FP[";
block unorganized textual fp_header ::= "FP[" ["0-9"]+ "]:=" ["\x0-\x20"]* "'";
block unorganized textual fp_chars ::= [^"'"]*;
block unorganized textual fp_remainder ::= ^"end;";
block unorganized textual nofragment fp_symbol_line {
unittype = unsigned char;
enum = kroz_char_enums;
};

//Transition syntax
block unorganized textual df_filler ::= ^"DF[";
block unorganized textual df_filler2 ::= "DF[" ^"DF[";
block unorganized textual fp_filler ::= ^"procedure Level";
block unorganized textual fp_filler2 ::= "procedure Level" ^"procedure Level";

You'll never need to write memcmps and strcmps and strtoks and mids and strlens and...you get the idea. All the above unorganized blocks instantly validate and size properly if the pattern matches!

Of course, there is often a need to put fields together in particular patterns, in which you must extract individual portions of each field. You don't want to make a truly massive regular expression for the entire dataset in such a case--it gives you only one node of data. Instead, you'll want to employ organized blocks to characterize the text fields in a more list-friendly fashion:


block organized ordered_level_line {
mainbody nodelist {
block fp_padding;
block fp_header;
block fp_chars;
};
bool Termination() { return (++iterations >= max_ylines); };
};

block organized ordered_level {
void Initialization() { iterations = 0; };

mainbody nodelist {
block procedure_header;
block organized ordered_level_lines {
mainbody nodelist repeats {
block ordered_level_line;
};
};
block fp_remainder;
};
};

block organized unordered_level {
mainbody nodelist {
block df_header;
block df_chars;
choice optional { block df_transition; };
choice optional { block df_chars; };
choice optional { block df_transition; };
choice optional { block df_chars; };
};
};

In the final release of the I.F., I've made the block definitions a bit more complex, of course. This is because the above definitions only give you the text fields as an ordered list--people would find the data a heckuva lot more useful if the fields had undergone alphanumeric-to-binary conversion, had enumerations broken out, etc.

To do this, just add some Deserialize calls and you're good to go.

The output for an unordered level looks like this:


struct unordered_spawn_counts {
slow_enemy = 600,
medium_enemy = 0,
fast_enemy = 0,
breakable_block = 0,
whip = 20,
stairs = 1,
chest = 0,
slow_time = 5,
gem = 30,
blindness_potion = 0,
teleport_scroll = 5,
key = 0,
door = 0,
solid_wall = 0,
speed_time = 0,
teleport_trap = 0,
river = 0,
power_ring = 0,
forest = 0,
tree = 0,
bomb = 0,
lava = 0,
pit = 0,
staff = 0,
tunnel = 0,
freeze_time = 0,
nugget_or_artifact = 20,
quake_trap = 5,
invisible_breakable_block = 0,
invisible_solid_wall = 0,
invisible_door = 0,
enemy_stop_space = 0,
enemy_activator = 0,
enemy_zap_spell = 0,
enemy_creation_trap = 0,
enemy_generator = 0,
enemy_activator2 = 0,
moving_block = 700...

A list of descriptive spawn counts for a randomly generated level! Now that's useful. But is it an improvement? See for yourself how Scott Miller encoded them originally:


DF[21]:=
{ 1 2 3 X W L C S + I T K D # F . R Q B V = A U Z * E ; : - @ ] G ( M )}
'600 20 1 5 30 5 20 5 700 '+
{ P ! O H N [ | " 4 5 6 7 8 9 Y 0 ~ $}

Good Lord! I’m dead serious! If you can make sense of that, you let me know!

As far as parsing difficulty is concerned, I’d place Kroz level files in the “moderate” category when it comes to what you can do with regular expressions. XML would fall into the “easy” category because it’s very sound and has a well-defined syntax. METAR would fall into the “hard” category because it’s ill-defined, inconsistent, and barely even human-readable. Not that any text format would be impossible.

Since BAR is a relatively new technology, I’m all ears for interesting new challenges people have with ETL or other readability/portability/conversion issues. Chances are good that BAR can tear it to pieces within hours.

Regular Expressions: A subset of BAR

July 27th, 2009

Great news! We’re now at 1.3b of the BAR engine. This means that you can define both text and binary syntaxes easily with BAR.

BAR was syntactically weak when it came to validating and sizing text strings in earlier versions. Take the text “procedure Level17 ;” for example. If this is a de-facto header, it doesn’t fit within a neat header structure in BAR. You’ll need to account for lots of variable-length portions of data, with optional whitespace, and character combinations not easily reconciled by the basic scripting functionality:

char procedure_start_string[] = "procedure Level";
block unorganized textual nofragment procedure_header_1 {
unittype = char;

bool Validation() {
return (!memcmp(this, procedure_start_string, strlen(procedure_start_string)));
};
long BlockSize() {
return strlen(procedure_start_string);
};
};

And this is only one portion! The entire portion is characterized by the following:

block organized procedure_header {
mainbody nodelist {
block procedure_header_1;
block numerals;
choice optional { block whitespace; };
block semicolon;
};
};

Note we haven’t even declared how whitespace, numerals, or semicolon are supposed to characterize our fields. The bottom line, folks: this is a yucky, yucky way to characterize text formats.

With BAR 1.3b, you can simplify everything. Replace all the above with just this one line:

block unorganized textual procedure_header ::= "procedure Level" ["0-9"]+ ["\x0-\x20"]* ";";

That’s it! Just one line for a node with a complex syntax.
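For readers more used to conventional regular expression libraries, here is a rough C++ counterpart using std::regex. It is only an approximation: the function name is_procedure_header is hypothetical, and the BAR range ["\x0-\x20"] is stood in for by the common whitespace characters (space, tab, carriage return, newline) rather than the full control-character range.

```cpp
#include <regex>
#include <string>

// Approximate std::regex counterpart of the BAR definition above.
// "procedure Level", one or more digits, optional whitespace, then ";".
bool is_procedure_header(const std::string& text) {
    static const std::regex pattern("procedure Level[0-9]+[ \\t\\r\\n]*;");
    return std::regex_match(text, pattern);  // the whole string must match
}
```

The BAR version does one thing this sketch does not: besides validating the text, it also sizes the node, so the parser knows where the next field begins.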

Regular expressions, which are often defined using either Perl “slashed” syntax or Extended Backus-Naur Form (EBNF), are rather difficult to read if you’re not familiar with them. However, they are easy to understand once you get the hang of them, and syntactically, they are incredibly powerful.

In BAR’s case, I have chosen to use regular expression syntax that closely resembles the EBNF definitions found on W3C’s website for XML and other formats (http://www.w3.org). I’ve also been designing a still-unreleased I.F. called XML.BAR, which uses many of the same expressions from W3C as a way to characterize unorganized blocks in BAR.

BAR now supports most of the staples found in regular expressions:

  • Quoted strings: using “abc” or ‘abc’, indicates presence of whole, case-sensitive strings.
  • Character classes []: using brackets, indicates multiple character choices that can be present at any one particular character location.
  • Asterisk (*): place on end of expression to repeat indefinitely, and make expression optional.
  • Plus (+): place on end of expression to repeat indefinitely, and force presence of at least one iteration.
  • Question (?): place on end of expression to make expression optional (0 or 1 instance only).
  • Specific Repeat Counts {3, 5}: place on end of expression to make expression have a repeat count within a specific range. In this example, minimum is 3 iterations, maximum is 5 iterations.
  • NOT operator (^): place inside character class, in front of quoted string, or in front of parenthetical notation to match every possibility BUT the combination to the right.
  • AND, OR, and AND NOT operators (space, |, -): adjacent expressions with just a space between them (AND), a pipe between them (OR), or a hyphen between them (AND NOT) act as boolean operators when testing multiple conditions in expressions.

There are still limitations:

  • Character classes allow a NOT operator inside brackets, but it must not be quoted.
  • Character classes have valid characters or ranges inside quotes (single or double). All markup is consistent with BAR’s backslash-oriented markup for string literals; there is no Perl-like markup for whitespace such as \s or related escapes.
  • To specify the hyphen character in a character class, it must be placed at the very beginning of the string. All other appearances count as range specifiers.
  • ^”abc” Has the effect of returning all characters leading UP to the combination “abc”, if it exists. If “abc” doesn’t exist, the entire set of remaining characters is returned.
  • [^"0-9"] Has the effect of matching all characters EXCEPT numerals.
  • ^(“abc” | “123”) Only looks at IMMEDIATE location for non-match to “abc” or “123”. Will not scan for either combination and then stop.
  • ["a-z"]* - “aa” Only excludes a combination that starts with “aa”. Will not extend to the first arbitrary point at which “aa” is found.
  • AND has higher priority than OR, which in turn has higher priority than AND NOT.
  • Unorganized blocks are forced to have 1-byte character unit type as well as the nofragment attribute.
  • You cannot specify already-declared names of unorganized blocks in these expressions. For example, you can’t declare “Name” first, then declare Name2 ::= Name ” ” Name ” ” Name.
  • Organized blocks cannot be declared using regular expressions.

I would eventually like to relax many of these restrictions, especially the last two. Feedback on what sort of improvements you’d like to see in this arena is more than welcome.
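The scanning behavior described in the list above for a NOT operator in front of a quoted string can be modeled in a few lines of C++. This is a sketch of the described semantics only, not BAR's implementation, and the name take_until is hypothetical.

```cpp
#include <string>

// Models the described behavior of ^"abc": take everything leading up to the
// first occurrence of `stop`, or the entire remaining input if `stop` never
// appears.
std::string take_until(const std::string& input, const std::string& stop) {
    std::string::size_type pos = input.find(stop);
    return (pos == std::string::npos) ? input : input.substr(0, pos);
}
```

This is why patterns such as ^"FP[" work well as filler blocks: they swallow everything up to the next recognizable header.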

The file size and time-to-implementation for many of my formats in the works have dropped dramatically as a result of these changes! A few syntactic changes can go a long way. When each BAR I.F. can act as a unique integrated compiler, the possibilities are endless.

BAR Engine Update: 1.3

July 15th, 2009

The demo version of the software is now running 1.3. While BARfly still has most of these features hidden from the user, the engine is a lot more powerful than it used to be. In particular:

1) BAR Navigational Strings (BNS) now officially supported
2) Bookmark stacks (Push and Pop for bookmarks) now officially supported
3) More advanced reading, writing, insertion, and deletion operations
4) Many new I.F. operators and built-in functions
5) Ability to insert and delete nodes from I.F. functions
6) Native callback functions from I.F. functions
7) More diverse auto-advancement options for reading, writing, insertion, and deletion operations

BAR developers can take advantage of many new programming features, like scripts that can insert and delete nodes, more “STDLIB.H-type functions” like atof, atol, stricmp, and strtok, and more optimized compiled opcode execution.

Why does BARfly not look different? For a simple reason: most of these updates would only give you a few more function-call choices when you press F8. The real power lies in the type of implementation files you can create now!

Right on the heels of this update will come two even more important updates:

1) DecideChoice: a new method for large decision lists, which picks a choice using a numerical index rather than evaluating each individual list choice.
2) Unorganized block “regular expression” definitions: the ability to build text blocks using EBNF notation (like W3C uses to describe XML). This will enhance BAR, allowing it to perform high-quality text-parsing operations with ease!
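The difference between the two decision styles in the first item can be illustrated with a small C++ sketch. This is only a guess at the general idea, since DecideChoice is not yet released; the Choice structure and both function names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical decision-list entry.
struct Choice {
    int id;                            // identifies this choice
    std::function<bool(int)> matches;  // per-choice test against a tag value
};

// Sequential style: test each choice in turn until one matches.
// Cost grows with the length of the list. Returns -1 if nothing matches.
int pick_by_scanning(const std::vector<Choice>& choices, int tag) {
    for (const Choice& c : choices)
        if (c.matches(tag)) return c.id;
    return -1;
}

// Index style: the tag value selects the choice directly in one step.
// Returns -1 if the tag is out of range.
int pick_by_index(const std::vector<Choice>& choices, int tag) {
    if (tag < 0 || static_cast<std::size_t>(tag) >= choices.size()) return -1;
    return choices[static_cast<std::size_t>(tag)].id;
}
```

With a list of several hundred choices, the index style avoids running hundreds of individual tests per node.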

Related to that last point, should BAR be changed to BATAR, or “Binary AND Text Artifact Reference?” Well, not really. Text is really a subset of binary, so it’s still fair to call it BAR.

BNS: BAR Navigational Strings

July 15th, 2009

Here’s something that was indirectly supported by BARfly, but only recently supported by the underlying BAR engine with version 1.3.

A BAR Navigational String, or BNS, is a “path” string used to locate BAR nodes in the tree control using absolute or relative positions. I’ve actually modeled it off the UNIX/DOS/Windows method of accessing files and directories through pathnames.

For example, if you want to access a file, called “thisfile.txt”, located inside the “thisdir” directory, which is itself nested inside another “outerdir” directory, the pathname would look like this:

“outerdir/thisdir/thisfile.txt”

Similarly, in BAR, you can access a node several levels “deep” in the tree by referring to children with names or numbers. Each token, separated by forward or backward slashes, represents one navigation. Take the following example:

“outernode/thisnode/62”

This BNS says, execute Child(“outernode”), followed by Child(“thisnode”), followed by Child(62).

Neat, isn’t it? You can compress many navigation operations into only one function argument. The best part is that almost nothing new needs to be learned. The philosophy behind node navigation is totally synonymous with file system navigation!

There are other parallels, too. A summary of how BNS compares with file system pathnames:

“/”
File system: Go to root directory
BNS: Execute Toplevel()

“a/b/c”
File system: Go to subdirectory “a,” then subdirectory “b,” then refer to “c” (can be either file or directory)
BNS: Execute Child(“a”), then Child(“b”), then Child(“c”)

“../otherdir”
File system: Go up to parent directory, then down again to subdirectory “otherdir”
BNS: Execute Parent(), then Child(“otherdir”)

“.”
File system: Refer to same directory
BNS: No navigation
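The four parallels above can be sketched on a toy tree. This is a minimal Python sketch, not the BAR engine: the `Node` class and the `navigate` function are stand-ins invented for illustration, and only the file-system-like subset of BNS is handled.

```python
class Node:
    """A toy tree node; a stand-in for a BAR node, not the real engine API."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add(self, name):
        child = Node(name, self)
        self.children.append(child)
        return child


def navigate(current, bns):
    """Follow the file-system-like subset of BNS: "/", names, numbers, "..", "."."""
    path = bns.replace("\\", "/")            # forward or backward slashes
    if path.startswith("/"):
        while current.parent is not None:    # Toplevel()
            current = current.parent
    for token in path.split("/"):
        if token in ("", "."):
            continue                         # no navigation
        if token == "..":                    # Parent()
            if current.parent is None:
                raise LookupError("already at top level")
            current = current.parent
        elif token.isdigit():                # Child(number), zero-based
            current = current.children[int(token)]
        else:                                # Child("name")
            matches = [c for c in current.children if c.name == token]
            if not matches:
                raise LookupError(f"no child named {token!r}")
            current = matches[0]
    return current


root = Node("root")
a = root.add("a")
b = a.add("b")
c = b.add("c")
other = a.add("otherdir")

print(navigate(root, "a/b/c") is c)            # Child("a"), Child("b"), Child("c")
print(navigate(b, "../otherdir") is other)     # Parent(), Child("otherdir")
print(navigate(c, "/") is root)                # Toplevel()
```

Each token maps to exactly one navigation call, which is the whole point: one string argument stands in for a chain of calls.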

Beyond this point, BNS and file system pathnames diverge. For example, BNS does not currently support wildcards, and BNS is capable of other operations that file systems cannot perform. Some additional BNS syntax:

“44”
Navigate to the child at zero-based index 44 (the 45th child of the current node)

“>>”
Navigate to next 2 siblings (call Next(1) twice)

“<<<”
Navigate to previous 3 siblings (call Previous(1) three times)

“+50”
Navigate 50 siblings forward (call Next(50))

“-30”
Navigate 30 siblings backward (call Previous(30))

“^13”
Search forward for construct UID of 13 (call Search_Forward(13))

“^container7”
Search forward for construct or variable name of “container7” (call Search_Forward(“container7”))

Eventually, BNS will support even more radical navigation possibilities, like wildcards, named bookmarks, and maybe even special subroutine-based navigations.

Ultimately, my goal is to merge BNS with file system logic. Think about what you could do if your ability to access information is not confined to just the file system or even the file contents: you could “collect” or “populate” information inside a file from the command line of an application! Too bad no operating systems let you dive this “deep” right now.

Pop quiz: what does this BNS do?

“/1078/mychild/./nextchild/../+8/otherchild/-1/32/>>>>>/^1”

Answer:

Toplevel(); Child(1078); Child(“mychild”); ; Child(“nextchild”); Parent(); Next(8); Child(“otherchild”); Previous(1); Child(32); Next(5); Search_Forward(1);
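If you want to check quiz answers mechanically, a small translator will do it. The following is a Python sketch under my own assumptions, not the engine's parser: the `translate` function is hypothetical, it only knows the syntax described in this post, and it collapses a run of “>” or “<” into a single Next(n) or Previous(n) call.

```python
import re


def translate(bns):
    """Translate a BNS into the list of navigation calls described in this post."""
    calls = []
    if bns.startswith(("/", "\\")):
        calls.append("Toplevel()")
    for token in re.split(r"[/\\]", bns):
        if token in ("", "."):
            continue                                  # no navigation
        if token == "..":
            calls.append("Parent()")
        elif token.isdigit():
            calls.append(f"Child({token})")           # zero-based child index
        elif set(token) == {">"}:
            calls.append(f"Next({len(token)})")       # same effect as repeated Next(1)
        elif set(token) == {"<"}:
            calls.append(f"Previous({len(token)})")
        elif token[0] == "+" and token[1:].isdigit():
            calls.append(f"Next({token[1:]})")
        elif token[0] == "-" and token[1:].isdigit():
            calls.append(f"Previous({token[1:]})")
        elif token[0] == "^":
            target = token[1:]
            arg = target if target.isdigit() else f'"{target}"'
            calls.append(f"Search_Forward({arg})")    # UID if numeric, else name
        else:
            calls.append(f'Child("{token}")')
    return calls


quiz = "/1078/mychild/./nextchild/../+8/otherchild/-1/32/>>>>>/^1"
print("; ".join(translate(quiz)))
```

Note that the “.” token produces no call at all, so it simply disappears from the output.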

Characterization vs. Conversion

July 9th, 2009

NOTE: This entry is rather technical in nature, geared towards programmers.

A fellow named Robby recently posed a software engineer’s dilemma when it comes to characterizing a file like a database. The question is, at what point is the data useful to me? It might be useful only when converted into the data I want. Or, alternatively, it might only be useful completely raw. Or, possibly, somewhere in the middle.

In other words, where, in the process of deserialization, do you “stop?”

The nice thing about BAR is that you can choose, in the schema, exactly how far you wish to go when you deserialize. You are still limited by the nature of the file format itself, of course: those formats that are constructed with little consideration given to hierarchy, organization, or resynchronization on error will limit a person’s options.

Robby’s example was a JPEG file. Good example, because I offer a free JPEG I.F. on the website.

At the rawest of raw, use FLAT. This yields the entire JPEG file as a single unorganized block. You can read or write anything with FLAT–but chances are you want just a bit more detail.

The next step up is the free format, which breaks the data into segments. The actual image scan itself, though, is untouched.
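To make this stopping point concrete, here is a minimal Python sketch of segment-level characterization. It is not the JPEG I.F. itself, and `split_jpeg_segments` is a name I made up. It assumes a simple single-scan file with no fill bytes between segments: every segment before the scan is a marker plus a two-byte big-endian length, and everything after the Start-of-Scan header is kept as one untouched block.

```python
import struct


def split_jpeg_segments(data):
    """Split JPEG bytes into (marker, payload) pairs, leaving scan data untouched."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = [(0xD8, b"")]                     # SOI has no payload
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        pos += 2
        if marker == 0xD9:                       # EOI has no payload
            segments.append((marker, b""))
            break
        # The length field counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        segments.append((marker, data[pos + 2:pos + length]))
        pos += length
        if marker == 0xDA:                       # SOS: the rest is the image scan
            end = len(data) - 2 if data.endswith(b"\xff\xd9") else len(data)
            segments.append(("scan", data[pos:end]))
            pos = end
    return segments


# A tiny synthetic stream (not a real image) just to exercise the splitter.
sample = (b"\xff\xd8" + b"\xff\xe0\x00\x04ab" + b"\xff\xda\x00\x03\x01"
          + b"\x11\x22\xff\x00\x33" + b"\xff\xd9")
for marker, payload in split_jpeg_segments(sample):
    label = marker if isinstance(marker, str) else f"0xFF{marker:02X}"
    print(label, len(payload))
```

The scan bytes come back exactly as stored, byte stuffing and all. Decoding them is the business of the later stopping points.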

The next step up is characterization of the bit scan fields (Arithmetic or Huffman). However, no attempt at converting the fields takes place.

The next step up is converting the bit fields into data that can be used. But…I’m being too kind here! In fact, there are four or five individual “stopping points” you could rest at, since many decoding steps are necessary for JPEG. These include…

1) Arithmetic/Huffman field translation
2) Dequantization
3) IDCT (Inverse Discrete Cosine Transform) translation
4) Component generation (generally YCbCr, often loosely called YUV)
5) Image pixel generation (generally RGB)
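As an example of what the very last stopping point involves, here is a short Python sketch of turning one component sample into an RGB pixel. It assumes the common JFIF convention of full-range (0 to 255) luma and chroma values; files using other color spaces would need a different conversion, and the function name `ycbcr_to_rgb` is mine, not part of BAR.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range YCbCr sample (0-255 each) to an RGB triple, JFIF-style."""

    def clamp(value):
        # Round to the nearest integer and keep the result inside 0..255.
        return max(0, min(255, int(round(value))))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(128, 128, 128))   # neutral chroma gives a mid gray
print(ycbcr_to_rgb(255, 128, 128))   # full luma, neutral chroma gives white
```

If you stop one step earlier, you keep the components as they are and skip this arithmetic entirely, which may be exactly what you want if you are re-encoding rather than displaying.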

Robby’s question was what was useful to him. But I’m asking a more radical question: what are all the possible ways this format can be useful to you?

BAR gives you an unprecedented luxury in being able to “see” the progress as it’s being done. If you need to develop an encoding or decoding implementation on your own, you generally have to rely on classic debugging and testing techniques: conditional breakpoints, single-stepping, debug-output dumps of iterative data. Not to mention cumbersome exception-handlers when you’ve inevitably screwed up.

If you screw up in BAR? No exceptions. Full call stack report. Full node record report. Immediate API return. All that, and it’s platform-independent, AND language-independent.

BAR can be used to characterize. BAR can be used to convert. I won’t presume to know exactly what each individual wants to do with his or her files. The power of BAR is really in the question, not the answer.

Why Use BAR?

July 9th, 2009

One of the most common questions I’ve found people asking me is this: Why would I want to use BAR or BARfly? What advantage do I gain by using this product?

Hmmm…if the product features described in the Main BARfly Website don’t provide a good answer, it will be a hard question to answer.

It’s possible your needs are very specific. For this reason, I’ve provided in the documentation some ideas about who you might be and why you would want to use BARfly. A quick rehash:

  1. Software Developers: People wanting to write code to support and maintain particular file formats
  2. Software Architects: People wanting to design structural elements of a software application
  3. Software Testers: People wanting to examine the contents of generated files or memory content
  4. Security Auditors: People wanting to study a company’s ability to keep data secure from hackers and crackers
  5. Database Administrators: People wanting to detect flaws and inefficiencies in a database, as well as develop solutions
  6. System Troubleshooters: People wanting to audit, diagnose, and fix files (tasks that were expensive or impossible before BARfly)
  7. Network Administrators: People wanting to examine traffic over a network in a schema-oriented fashion
  8. System Administrators: People wanting to do a number of the things listed above
  9. Cryptographers and Cryptanalysts: People trying to design and crack encrypted formats
  10. Casual Validators: Analysts who want to check a file for consistency
  11. Data Entry Specialists: Individuals who must perform high-throughput data entry and format conversions
  12. Very Curious People: Individuals wanting to find out what all that weird unreadable stuff on their hard drive is

There are three builds of BARfly, which have capabilities reflecting the needs of the user:

  1. BARfly Bronze: Contains only viewing capability. You can view files, but you cannot edit them. Nor can you develop your own BAR implementation files.
  2. BARfly Silver: Contains viewing and editing capability. You can view files, edit them, and save them. You cannot develop your own BAR implementation files.
  3. BARfly Gold: Contains viewing and editing capability, plus the ability to develop BAR implementation files. This build comes with an integrated compiler, BARCC, that allows a user unlimited ability to create, edit, test, and use customized schemas.

Now, as for using the BAR engine in your own application, I’ll let you answer that question yourself. There are a HUGE number of technologies and languages being used. The fact that many “meta-languages” have been created has actually made the problem worse, because when people program in a meta, it actually reduces the readability and comprehensibility of the code or markup being written.

What’s to gain by learning an entirely new one? A lot, actually. For the most part, there’s very little to learn. If you know C++, you’ll know about 98% of BAR. BAR tries to keep the scripting and data-definition language low-level for handling low-level data. Where BAR is truly unique is in two simple definition types: blocks, and lists.

Think of BAR as the “XML Schema” for all files, text or binary. There are profound advantages to having applications reference platform-independent schemas for all their architecture, whether it’s in-memory only, secondary-storage only, or some combination of the two.

As of this writing, no attempt has been made to offer the BAR software development kit on this website. If enough people have looked at the documentation and are interested in trying it out, I’ll release it at a very reasonable cost, perhaps even for free.