Posts Tagged ‘conversion’

The Format of Music

Saturday, December 3rd, 2011

Note: You will need to have your speakers turned up to fully appreciate this entry.

I’m going to depart from the normal technical focus of my blog and talk about something more people can relate to: music.

That is, the format of music, as used in the context of computers. Most people can clearly see how enormous an influence technology has exerted on music in the last few decades. In the case of computers, there have been two main revolutions in the musical realm:

1) The ability to synthesize and create music by means not limited to the classical methods of rehearsal and performance.
2) The ability to script playback of digitized music in an automated fashion.

The first revolution came in the form of the synthesizer-loving 1980s, as well as in the various hardware and software that facilitated note-by-note recording and playback of wired instruments. The second revolution occurred at roughly the same time, perhaps a bit later, in the form of video games, CD-ROM players, and MP3 players.

Both of these revolutions needed digital storage formats to keep track of the musical data. Without them, you’d have no way to “store” the music for playback beyond straight analog recordings.

The best-known digital music format is the Musical Instrument Digital Interface, or MIDI. I originally created the 14 musical tracks of Cruz as MIDI files. A MIDI file stores notes and instruments as metadata: C key octave 4 note down, E key octave 3 note down, C key octave 4 note up, etc. The sequence of “command” and “data” bytes in a MIDI file, when interpreted by software and hardware, can be used to play back a composition that a musician is working with.
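To make the “command” and “data” bytes concrete, here is a minimal Python sketch of the three events named above. It assumes the common convention in which C in octave 4 is MIDI note number 60 (some vendors number octaves differently), picks channel 0 and a velocity of 64 arbitrarily, and leaves out the timing information a real MIDI file stores between events. The helper names are made up for illustration.

```python
# Semitone offsets of the natural notes, counted up from C.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

NOTE_ON = 0x90   # "command" byte for note down; the low nibble holds the channel
NOTE_OFF = 0x80  # "command" byte for note up


def note_number(name: str, octave: int) -> int:
    """Map a note name and octave to a MIDI note number (assumes C4 == 60)."""
    return 12 * (octave + 1) + SEMITONES[name]


def channel_message(status: int, channel: int, note: int, velocity: int) -> bytes:
    """Build one three-byte message: a command byte followed by two data bytes."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([status | channel, note, velocity])


events = [
    channel_message(NOTE_ON, 0, note_number("C", 4), 64),   # C octave 4, note down
    channel_message(NOTE_ON, 0, note_number("E", 3), 64),   # E octave 3, note down
    channel_message(NOTE_OFF, 0, note_number("C", 4), 0),   # C octave 4, note up
]

for event in events:
    print(event.hex(" "))
```

Each printed line is one command byte followed by its two data bytes, which is the kind of stream a sequencer interprets during playback.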

In this respect, MIDI has immeasurably helped in the composition process. The ability to listen to how your music sounds, when not being performed, is critical when deciding which music is good and which music still needs work.

Take the “Cruz Puzzle Theme” track’s music, composed in Anvil Studio. The metadata exists as command and data bytes, but it can be rendered on the computer screen as a musical staff customarily seen in sheet music:

Anvil Studio lets the composer add effects and notations to the staff. I didn’t in this case, because I was more concerned with how the music sounded than how it would look printed.

MIDI sequencers like Anvil Studio are invaluable assets to people like me–I absolutely need to hear the music while composing it to determine if I’m on the right path or not. This is very much unlike programming, where so much of a program must be written before compiling it to see if the entire thing works at once.

With only a few tweaks, a composer can move notes, change the tempo, change keys, change instruments, and make other changes that might have required countless hours if hand corrections were used on paper!

So MIDI is great. But it’s not enough.

Anyone who appreciates good musical performances knows that music, as performed, is a lot more nuanced than what simply appears on a staff. Many effects are instrument-specific, and the need for custom instruments is very important. MIDI locks the composer into a relatively limited number of instruments.

But it gets worse. Each MIDI playback device has its own wavetable “soundfont” for the instrument set! This means that the instruments for a MIDI song composed in one MIDI context might sound very different for some instruments when played back in a different MIDI context. If a composer works hard on making a song sound a certain way, he or she will not want to see it “ruined” by the wrong soundfont.

Compare the two versions, and judge for yourself whether they sound the same:

Cruz Puzzle Theme – MP3
Cruz Puzzle Theme – MIDI

If the MIDI theme sounds different (and probably worse), it’s because the MIDI playback device on your computer isn’t using the SoundMAX default soundfont. I converted the MIDI file to MP3, which is purely digital waveform data, in order to head off the problem of inconsistent playback.

Which leads me to the next part of this topic: tracker software.

An early alternative to MIDI that allowed for direct wavetable configuration is MOD. The idea behind MOD, UMX, and other “tracker” formats is that both the note-by-note metadata and the wavetable samples used in playback are stored in the same file, resulting in consistent-sounding music regardless of playback device.

There are many brands of “tracker” software, but the idea behind them is essentially the same: you edit note metadata, applying effects, etc. This is much like Anvil Studio, although the editing display is often different from the “sheet music” view seen in Anvil Studio.

Above is a screenshot from FamiTracker, a utility that makes tracks compatible with the sound-generating capabilities of the 8-bit NES. I used it to make the “chiptune” variant of the Cruz Puzzle Theme. Have a listen:

Cruz Chiptune Puzzle Theme – MP3

The differences between MIDI and FamiTracker formats required some time to port over, but because I had already composed the original, it came together in only a single evening.

It certainly demonstrates the utility of FamiTracker to have allowed me to customize the note sampling periods. If you’re sharp, you noted from the Anvil Studio staff screenshot that I’ve hit you over the head with a very unusual feature in music: the 5/4 time signature!

The 5-count time signature is extremely rare. The vast majority of songs (and musicians, it would appear) never leave the comfortable realm of 4/4. But well-made software will accommodate nearly any “trick” the composer might have up his sleeve.

I made a 5-count song for two reasons: influence from Jesus Christ Superstar, which has 5-count and 7-count songs, and because I really wanted to challenge myself to make a quality 5-count song.

Of course, composing also requires audio editing, which is challenging no matter how difficult the source material was to create. This leads me to the final part of this topic: purely digital formats.

To represent digital waveform data, no matter what the source is (voice included), you must use a format like WAV or MP3. The WAV format contains lots (and I do mean lots) of individual sample points of the audio waveform for any one second of playback. For CD-quality audio, that’s 44,100 sample points per second for each channel!
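A quick back-of-the-envelope calculation shows how fast those sample points add up. This sketch assumes the usual CD parameters (16-bit samples, two channels) and counts only the raw sample data, not the small WAV header; the function name is just for illustration.

```python
def pcm_bytes(seconds: float, sample_rate: int = 44_100,
              channels: int = 2, bits_per_sample: int = 16) -> int:
    """Size in bytes of raw, uncompressed PCM audio (file header not included)."""
    bytes_per_frame = channels * bits_per_sample // 8   # one sample per channel
    return int(seconds * sample_rate) * bytes_per_frame


one_second = pcm_bytes(1)
three_minutes = pcm_bytes(180)
print(f"1 second:  {one_second:,} bytes")
print(f"3 minutes: {three_minutes:,} bytes ({three_minutes / 1_000_000:.1f} MB)")
```

A three-minute song works out to roughly 32 MB of raw data under these assumptions.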

Obviously, the large storage size required for waveform audio has created a need for compressed formats. The most common compression format is MP3, which sacrifices a bit of quality for reduced file size. There are a few others, such as Ogg Vorbis.

After music is composed, one must translate it to digital waveform data. But just one direct-translation operation is rarely adequate–filtering operations, volume control, fade-in and fade-out, making stereo tracks from mono, and a whole lot of other clipping, segueing, and tweaking operations must occur to make a song ready for use in the world of computing.
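Two of the operations mentioned above, fades and mono-to-stereo conversion, are simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how Audacity or any other editor implements it: samples are plain floats between -1.0 and 1.0, the fade is linear, and all function names are made up.

```python
import math


def sine_tone(freq_hz: float, seconds: float, sample_rate: int = 44_100) -> list[float]:
    """Generate a mono test tone as floats in the range -1.0..1.0."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


def apply_fades(samples: list[float], fade_len: int) -> list[float]:
    """Linear fade-in over the first fade_len samples, fade-out over the last."""
    fade_len = min(fade_len, len(samples) // 2)   # keep the two ramps from overlapping
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain          # fade-in: gain ramps up from 0 at the start
        out[-1 - i] *= gain     # fade-out: gain ramps down to 0 at the end
    return out


def mono_to_stereo(samples: list[float]) -> list[tuple[float, float]]:
    """Duplicate each mono sample into a (left, right) pair."""
    return [(s, s) for s in samples]


tone = sine_tone(440.0, 0.01)        # 10 ms of A440
faded = apply_fades(tone, 100)
stereo = mono_to_stereo(faded)
print(len(tone), "mono samples ->", len(stereo), "stereo frames")
```

Real editors chain many such passes (filters, normalization, clipping repair) over the same sample data.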

Several waveform-editing applications exist, such as Audacity. The screenshot below is of the waveform of the Cruz Puzzle Theme, as seen in Audacity.

In conclusion, I hope I haven’t blown anyone’s brains out. I’ve just summarized a huge topic, leaving out countless details, in the hopes of giving people a general idea of the “world” of musical composition that computer-minded composers live in.

The skill required to master all of these tools and formats? Hard to say. It definitely makes a difference how musically inclined you are in the first place, as well as how computer-savvy you already are.

Take me, for instance. I’ve used audio editing software for years, but I never composed anything (in Anvil Studio, OR FamiTracker, OR anything else) before 2011!

Only one thing is required for sure: you must love music.

Reverse Engineering with Style

Tuesday, April 13th, 2010

Folks, it was only a matter of time before I would talk about this subject. I’ve covered many different types of data formats, including secondary storage, network communications, and representation in RAM. But what about executable files?

If you’ve evaluated BARfly, you’ll know that the software is merciless. It doesn’t know when it’s being pushed too far. If you want to represent machine code instruction sets with it, you’re perfectly welcome to do so. Machine language is just another data format.

I won’t go into all the ethics of (or reasons for) reverse engineering. I will say this, though: most of the time, license agreements you sign when you purchase software or hardware have a clause forbidding the reverse engineering of your newly acquired property. Please obey those.

Now that I’ve said that, let’s talk about reverse engineering with style.

When a program’s source code is compiled and linked, it becomes machine language, a series of bytes that, when interpreted, tell a processor what to do. A machine language “instruction,” as it is known, generally has the form of an operation code (often known as an “opcode”), followed by one or more optional “mod bytes” that influence the performance of the opcode.

Most programmers nowadays don’t know about machine language. This is because compilers and linkers are so good these days, and optimization so reliable, that everyone just thinks, “I’ll just create the source code and have the compiler handle the tough translation tasks.” Unless you’re tasked with making or modifying the compiler itself, you normally don’t need to know about the target instruction set.

But if you are performing reverse engineering, you’ll need to know about every instruction set you want to work with. Each processor family has its own instruction set, and each computer manufacturer has its own interface for programming the hardware, which isn’t necessarily linked to the instruction set.

Why do you need to know this info? Because once machine language is generated, it can’t be immediately molded back into source code. Machine language instructions tend to be simple operations. The equation A = B + C * D / E would require at least 5 different instructions.
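As a rough illustration of that count, here is the same equation on a toy machine with a single accumulator register, written in Python. The machine and its five operation names are invented for this sketch; real instruction sets differ (the 6502, for one, has no multiply or divide instruction at all).

```python
def run(program, memory):
    """Execute (operation, variable) pairs on a hypothetical one-register machine."""
    acc = 0
    for op, name in program:
        if op == "LOAD":
            acc = memory[name]
        elif op == "MUL":
            acc *= memory[name]
        elif op == "DIV":
            acc //= memory[name]   # integer division, an assumption of this toy machine
        elif op == "ADD":
            acc += memory[name]
        elif op == "STORE":
            memory[name] = acc
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


# A = B + C * D / E, one simple operation per instruction.
program = [
    ("LOAD", "C"),
    ("MUL", "D"),
    ("DIV", "E"),
    ("ADD", "B"),
    ("STORE", "A"),
]

memory = run(program, {"B": 2, "C": 6, "D": 4, "E": 3})
print(memory["A"])
```

Going the other way, from five anonymous instructions back to one readable equation, is the hard part of reverse engineering.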

As you can imagine, reverse engineering requires both a significant knowledge base and a whole lot of patience. The quality of the tools used in the process can make a huge difference in overall effectiveness.

A short list of tools you have:

  1. DEBUG: A very elementary disassembler and debugger, present since the early days of DOS.
  2. IDA Pro: A sophisticated, platform-independent graphical disassembly suite.
  3. Various off-the-shelf tools: Varying functionality and usefulness.
  4. Emulators: Programs designed to run a simulated version of the actual machine in a “box,” whose input and output can be easily manipulated.

DEBUG and similar disassemblers operate on the principle that any portion of memory can be reverse-engineered, as long as you know (a) the starting address, and (b) the range over which the code needs to be reverse-engineered. This is the “forward only” principle.
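The “forward only” principle can be sketched in Python as a loop that decodes one instruction, advances by its length, and repeats. The opcode table below covers only the nine 6502 opcodes needed for the sample bytes, which are the Metroid start-up bytes from the listing later in this post; anything else is dumped as a raw byte. This is an illustration, not the code of DEBUG or any other tool.

```python
# Opcode -> (mnemonic, operand format, total instruction length in bytes).
OPCODES = {
    0x78: ("sei", "", 1),
    0xD8: ("cld", "", 1),
    0xA2: ("ldx", "imm", 2),
    0x09: ("ora", "imm", 2),
    0x10: ("bpl", "rel", 2),
    0x8E: ("stx", "abs", 3),
    0x8D: ("sta", "abs", 3),
    0xAD: ("lda", "abs", 3),
    0x4C: ("jmp", "abs", 3),
}


def linear_sweep(code: bytes, base: int) -> list[str]:
    """Disassemble every byte from start to end, never following jumps."""
    lines = []
    pc = 0
    while pc < len(code):
        addr = base + pc
        entry = OPCODES.get(code[pc])
        if entry is None or pc + entry[2] > len(code):
            lines.append(f"{addr:04X}: .byte ${code[pc]:02X}")   # not decodable
            pc += 1
            continue
        mnemonic, mode, length = entry
        operand = code[pc + 1:pc + length]
        if mode == "imm":
            text = f"{mnemonic} #${operand[0]:02X}"
        elif mode == "abs":
            text = f"{mnemonic} ${int.from_bytes(operand, 'little'):04X}"
        elif mode == "rel":
            offset = operand[0] - 256 if operand[0] >= 0x80 else operand[0]
            text = f"{mnemonic} ${(addr + length + offset) & 0xFFFF:04X}"
        else:
            text = mnemonic
        lines.append(f"{addr:04X}: {text}")
        pc += length
    return lines


SAMPLE = bytes.fromhex(
    "78 D8 A2 00 8E 00 20 8E 01 20 AD 02 20 10 FB AD 02 20 10 FB"
    " 09 FF 8D 00 80 8D 00 A0 8D 00 C0 8D 00 E0 4C 1A C0"
)
listing = linear_sweep(SAMPLE, 0xFFB0)
print("\n".join(listing))
```

Note that the loop never asks whether a byte is really code. If data were mixed in, it would be decoded as instructions all the same.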

IDA Pro and similar tools operate on a “follow the path only” principle. Instead of linearly (and blindly) stepping through instructions in a memory buffer, IDA Pro allows you to keep track of the stack frame, the call tree of the program, the references to local and global variables, jump labels, etc. Generally, output from applications like IDA Pro is more useful than that of “forward and blind” applications like DEBUG.

Programmers have made custom disassemblers that try to get around some of the limitations of DEBUG and/or IDA Pro. These tools have varying strengths and weaknesses, which I will discuss shortly. Finding the right tool can make a world of difference.

Emulation is the last resort of reverse engineering. With an emulator, you can get a realistic state of the processor and its memory at all times. DOSBox is a common emulator for 16-bit DOS applications. Emulators are also commonly used to play classic games built for antiquated hardware; MAME is the best-known example.

The biggest strength of the “forward-only” principle in reverse engineering is that you cover all bases–you reach all portions of code, regardless of how they are actually reached at run time. The weaknesses include possibly disassembling non-code (resulting in gibberish), straddling valid opcodes (resulting in incorrect disassembly), and limited ability to interpret the results.

The core strength of the “follow the path only” principle is that only the paths actually discovered through natural flow from points of entry will be disassembled. This yields much more useful and valid output than forward-only, and allows applications like IDA Pro to display all sorts of useful diagrams and relationships. The weakness of path-following is that it is restricted to just known points of entry and deterministic paths (not all code is deterministic).
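The “follow the path only” principle can be sketched in Python as a worklist of addresses: start at the known points of entry, decode one instruction, and queue wherever control can go next. The six 6502 opcodes in the table are real, but the 14-byte program is a made-up example with two data bytes planted in the middle, and the function is an illustration rather than how IDA Pro works internally.

```python
# Opcode -> (instruction length, control-flow kind) for a handful of 6502 opcodes.
FLOW = {
    0xA9: (2, "next"),      # LDA #imm: falls through to the next instruction
    0x10: (2, "branch"),    # BPL rel: may branch or fall through
    0x4C: (3, "jump"),      # JMP abs: always goes to the target
    0x20: (3, "call"),      # JSR abs: visits the target, then returns here
    0x60: (1, "return"),    # RTS: the path ends
    0x6C: (3, "indirect"),  # JMP (abs): target unknown until run time
}


def follow_paths(code: bytes, base: int, entry_points: list[int]):
    """Return (addresses of reachable instructions, addresses of indirect jumps)."""
    visited, unresolved = set(), []
    worklist = list(entry_points)
    while worklist:
        addr = worklist.pop()
        offset = addr - base
        if addr in visited or not (0 <= offset < len(code)):
            continue
        if code[offset] not in FLOW:
            continue                       # unknown opcode: abandon this path
        length, kind = FLOW[code[offset]]
        if offset + length > len(code):
            continue                       # instruction runs off the buffer
        visited.add(addr)
        operand = code[offset + 1:offset + length]
        next_addr = addr + length
        if kind == "next":
            worklist.append(next_addr)
        elif kind == "jump":
            worklist.append(int.from_bytes(operand, "little"))
        elif kind == "call":
            worklist += [int.from_bytes(operand, "little"), next_addr]
        elif kind == "branch":
            rel = operand[0] - 256 if operand[0] >= 0x80 else operand[0]
            worklist += [next_addr + rel, next_addr]
        elif kind == "indirect":
            unresolved.append(addr)        # cannot follow without run-time data
    return sorted(visited), unresolved


# 8000: JMP $8005 / 8003: two data bytes / 8005: LDA #$01 / 8007: BPL $800B
# 8009: LDA #$00 / 800B: JMP ($000A)
PROGRAM = bytes.fromhex("4C 05 80 FF FF A9 01 10 02 A9 00 6C 0A 00")
reached, unresolved = follow_paths(PROGRAM, 0x8000, [0x8000])
print([f"{a:04X}" for a in reached], [f"{a:04X}" for a in unresolved])
```

The two data bytes at 8003 and 8004 are never decoded, which is the strength of this approach. The indirect jump at 800B is reported but not followed, which is its weakness.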

I’m going to provide a case study in reverse engineering, using my own experience. I’ve already done a ROM hack total conversion with Metroid Master, but I wanted to do the next ROM hack without relying so much on others’ tools. This time, I decided to reverse engineer the code by building my own disassembler.

(NOTE: I feel I can safely talk about reverse engineering 8-bit Metroid because it’s been 25 years since it was popular. If I were talking about hacking Wii games, Nintendo might not like what I’m doing.)

There are many quality 6502 disassemblers out there. I’m mostly used to Intel’s x86 family of instruction sets. But the 6502 processor was once widely used, so learning it was something I absolutely had to do.

The rationale behind creating my own disassembler is that I wanted to circumvent the weaknesses of the existing disassemblers. My disassembler performs limited emulation, and can generate multiple types of readable assembly from machine language.

While 6502 disassembly output usually looks like this…

0001FFB0: sei
0001FFB1: cld
0001FFB2: ldx #$00
0001FFB4: stx $2000
0001FFB7: stx $2001
0001FFBA: lda $2002
0001FFBD: bpl $0001FFBA
0001FFBF: lda $2002
0001FFC2: bpl $0001FFBF
0001FFC4: ora #$FF
0001FFC6: sta $8000
0001FFC9: sta $A000
0001FFCC: sta $C000
0001FFCF: sta $E000
0001FFD2: jmp $C01A

…my disassembler, NESDis, can display it like this:

label_07_FFB0:
07/FFB0: 78 P |= F_INTERRUPT_DISABLE;
07/FFB1: D8 P &= ~F_DECIMAL_MODE;
07/FFB2: A2 00 X = 0x00;
07/FFB4: 8E 00 20 m[0x2000] = X; // PPU Command
07/FFB7: 8E 01 20 m[0x2001] = X; // PPU Command
label_07_FFBA:
07/FFBA: AD 02 20 A = m[0x2002]; // PPU Read
07/FFBD: 10 FB if (!(P & F_SIGN)) goto label_FFBA;
label_07_FFBF:
07/FFBF: AD 02 20 A = m[0x2002]; // PPU Read
07/FFC2: 10 FB if (!(P & F_SIGN)) goto label_FFBF;
label_07_FFC4:
07/FFC4: 09 FF A |= 0xFF;
07/FFC6: 8D 00 80 m[0x8000] = A; // MMC1 Control
07/FFC9: 8D 00 A0 m[0xA000] = A; // MMC1 CHR-ROM 1
07/FFCC: 8D 00 C0 m[0xC000] = A; // MMC1 CHR-ROM 2
07/FFCF: 8D 00 E0 m[0xE000] = A; // MMC1 PRG-ROM
07/FFD2: 4C 1A C0 goto label_0xC01A

Same machine language, but the assembly output looks COMPLETELY different. I think opcodes are much more readable if put into C-language equivalents. Wouldn’t you agree?

Critical hardware port programming is revealed. Original machine language and readable assembly are displayed concurrently. Addresses and bank numbers are known. More importantly, paths are followed much as they are in IDA Pro, meaning that they need not be dumped in a linear order (they can be dumped hierarchically).

BUT…determinism is always an issue. Most instruction sets, including both x86 and 6502, have indirect jump opcodes. This constitutes a serious problem because the address to jump to or call is determined at run time. Emulation can help find the valid addresses, but it’s almost impossible to know all the valid addresses unless you watch every path being followed at run time, which is usually not worth it.

Fortunately, programmers rarely get fancy with such opcodes for just any reason. A programmer will generally implement an indirect jump or call for one of the following reasons:

  1. Callback or “hook” procedure. Equivalent to a C-language pointer to a function.
  2. Switch control block. C-language “switch” statements can use a “jump table” to quickly direct program control to a wide variety of locations based on a scalar input. This is most economical when there are many scalar input choices that are numerically close to each other (e.g. 5, 6, 7, 8, and 9).

These are very specific circumstances. Therefore, the number of indirect jump opcodes in a program tends to be very small. When confronted with an indirect jump opcode, I had to perform manual inspection to figure out what the processor was trying to do:

label_07_C510:
07/C510: 0A A <<= 1;
07/C511: A8 Y = A;
07/C512: B9 1F C5 A = m[0xC51F + Y];
07/C515: 85 0A m[0x0A] = A;
07/C517: B9 20 C5 A = m[0xC520 + Y];
07/C51A: 85 0B m[0x0B] = A;
07/C51C: 6C 0A 00 indirect_jump(ia(0x000A));

What does this mean? Well, roughly translated, it means “double the value of A, copy it to Y, and use Y as an index into a jump table of two-byte addresses.” Fortunately, the jump table is located immediately after the indirect jump opcode, at 0xC51F. I’ve found that the jump addresses stored there evaluate to the following:

{ 0xC531, 0xC552, 0xC583, 0xC590, 0xC5B6, 0xC5C3, 0xC45C, 0xC45C, 0xC45C }
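The lookup can be replayed in a few lines of Python. The sketch below rebuilds the table bytes from the nine addresses listed above, low byte first as the 6502 stores 16-bit values, and then mimics the seven instructions at 0xC510 for a given value of A. The dictionary standing in for memory and the function name are assumptions made for illustration.

```python
TABLE_BASE = 0xC51F   # first byte after the 3-byte indirect jump at 0xC51C
TARGETS = [0xC531, 0xC552, 0xC583, 0xC590, 0xC5B6, 0xC5C3, 0xC45C, 0xC45C, 0xC45C]

# Sparse stand-in for the CPU address space: address -> byte.
memory = {}
for i, target in enumerate(TARGETS):
    memory[TABLE_BASE + 2 * i] = target & 0xFF       # low byte first
    memory[TABLE_BASE + 2 * i + 1] = target >> 8     # then high byte


def resolve_indirect_jump(a: int) -> int:
    """Mimic the code at 0xC510 and return the address it would jump to."""
    y = (a << 1) & 0xFF                       # A <<= 1; Y = A;
    memory[0x0A] = memory[0xC51F + y]         # A = m[0xC51F + Y]; m[0x0A] = A;
    memory[0x0B] = memory[0xC520 + y]         # A = m[0xC520 + Y]; m[0x0B] = A;
    return memory[0x0A] | (memory[0x0B] << 8) # indirect_jump(ia(0x000A));


for a in range(len(TARGETS)):
    print(f"A = {a}: jump to 0x{resolve_indirect_jump(a):04X}")
```

One detail falls out of the arithmetic: nine two-byte entries starting at 0xC51F end exactly at 0xC531, which is the first address in the table.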

NESDis stops when it encounters an indirect jump opcode, because of lack of determinism. But you can pick any point of entry with NESDis. The next time I ran NESDis, I chose the point of entry as the first entry in the jump table, 0xC531. And lo! An entirely new set of paths was revealed:

label_07_C531:
07/C531: A0 00 Y = 0x00;
07/C533: 84 31 m[0x31] = Y;
07/C535: C8 Y++;
07/C536: 84 1D m[0x1D] = Y;
07/C538: 20 5D C4 funccall_0xC45D();
label_07_C53B:
07/C53B: 20 3E A9 funccall_0xA93E();
label_07_C53E:
07/C53E: 20 58 C1 funccall_0xC158();

By combining the results of individual NESDis outputs, we can eventually build a complete picture of the Metroid source code.

Individuals who are interested in reverse engineering now have an idea of where to begin. Of course, there are many other advanced tricks, but they have to be discovered on one’s own time.

I strongly suggest you check out emulation central, http://www.zophar.net, for more information.

Getting that image file to your screen

Thursday, January 28th, 2010

Something we take for granted: image files.

What are image files? How do we get to look at them? We use them all the time, for a variety of reasons, but how do you get from point A, a data file in storage, to point B, the pixels on your computer screen?

When people think of images, they normally think that the computer will just do whatever it takes to display the image, leaving us full control over what we do with the picture once it is rendered. They look at a file on the disk, click on it, and see an image thumbnail pop up. “Good and well,” a person will tell himself, “Because I know that this filename is associated with this picture. End of story.”

It’s unfortunate that image formatting is taken for granted so much. Computers have to do a LOT of things in order to go from an image file to a picture on the screen. Formats are very important to displaying an image, and the inability to display one, even if it’s only a thumbnail that can’t be seen in an explorer window, will frustrate users immensely.

What happens when an image is processed, even when displaying just a thumbnail, is this:

1) File format must be identified. Tags and headers in the file identify the file as a BMP, a JPEG, a PNG, a TIFF, etc.
2) Dimensions (width and height), pixel depth (8-bit, 32-bit, etc.), palettes, and other information must be collected and stored in a format that best promotes the display of the image data on the computer screen.
3) The pixel data itself must be unpacked. There might be compression, or not, meaning a variety of algorithms could be used to get the unpacked pixel data.
4) The unpacked pixel data must be translated into the computer screen’s target format. Pixel depth conversions are the most common types of translation, for example, 8-bit to 16-bit. Re-ordering of the pixels within the bitmap (flipping or rotation) is also common.
5) The computer blits the pixels to the screen. This step is itself a complex process, because it might be a raw transfer of bytes, or the image might be scaled up or down in size to fit into a specific window, or it might be dithered or anti-aliased, or it might require transparency or alpha-blending.
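Steps 1 and 4 are easy to sketch in Python. The first function identifies a format from the signature bytes at the start of the file; the signatures listed are the well-known ones for these five formats, but the table is far from complete. The other two functions convert 8-bit palette indexes into 16-bit pixels, assuming the target is the common 5-6-5 RGB layout. All function names are made up for illustration.

```python
from typing import Optional

# (signature bytes at offset 0, format name)
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
]


def identify_format(header: bytes) -> Optional[str]:
    """Step 1: identify the file format from its first few bytes."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit red, green, and blue into one 16-bit pixel (5-6-5 layout)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def expand_indexed(pixels: bytes, palette: list[tuple[int, int, int]]) -> list[int]:
    """Step 4: translate 8-bit palette indexes into 16-bit pixels."""
    return [rgb888_to_rgb565(*palette[index]) for index in pixels]


palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
print(identify_format(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print([f"0x{p:04X}" for p in expand_indexed(bytes([0, 1, 2, 3]), palette)])
```

Steps 2, 3, and 5 are where most of the format-specific work lives, which is why a full decoder is far longer than this.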

And these steps apply only to raster formats. Vector formats, such as SVG, CGM, or WMF, also require the CPU to effectively “draw” the image from scratch, using a series of internal commands stored within the file.

That’s a lot of work behind the scenes! And the worst part of it is that if any one of these steps breaks down for a particular file format (for example, TIFF), you can forget about “seeing” the image. That’s it. Over. No room for error.

Obviously, the above problem I have outlined provides a strong incentive to standardize image formats. There’s nothing wrong with that, of course. But software developers are still their own worst enemies here: they continue to develop more formats, and enhance old formats, over time.

The massively disparate mechanisms used to read, draw, and render so many types of image formats have made it difficult for developers to support them all. Core formats like GIF, JPEG, and BMP are widely supported, but each format has its own unique applications, strengths, and weaknesses. So you can’t ever have a “master” format. If you WANTED to have a “master” format, you would end up creating another format…which means–you guessed it–you’re contributing to the problem.

The fundamental issue here is not that developers are bad people, or that operating systems have flaws in them that prevent all image formats from being viewed and used. The issue is knowledge, and lack of education. To make a decent image reader or writer, you need lots of programming code, usually optimized code, which accomplishes a specific task. And who has the ability to write or understand this code? Just the hardest-core of the already hard-core software developers, who also contribute to the problem.

BARfly confers an advantage on developers and other technical folks: it exposes the “guts” of an image file. It lets you know what’s “really” in it, letting you dissect or even edit the contents that paint programs will not touch. The long-planned “ImageMaker” protocol is not yet available in the software, but once it is, you will have, for the first time ever, a development platform capable of supporting every type of image file ever made, or that ever could be made.

Until then, we’ll just have to be content with an elite group of uber-nerds who mercilessly change formatting rules with no prior notice and hold massive contempt for everyone else. Okay, I made that last part up. But the problem is known–we just need to finally start getting around to fixing it!