
MPEG Audio Datafile Format Specification

An MPEG audio datafile (*.m3d) is a database of information gathered from MPEG audio files. Once a file's information has been stored in the database, you no longer need the MPEG audio file itself to retrieve it. This is very useful for cataloguing when the MPEG audio files are on removable media. The main purpose of this file format is to distribute MPEG audio information among applications. It may be used as a personal database or as a catalogue. It is currently supported by the MPGScript MPEG audio cataloguing application and by the MPGTools Delphi unit for accessing information in MPEG files. The latest version of this document may be found at m3dspecs.htm

File format specification

Generally, an M3D file is divided into four sections: file signature, application identification, header info, and MPEG data. In the file they appear in that order, as described below.
<file signature>

This section marks the file as an MPEG Audio Datafile and determines the file version. Version numbers are defined by the author and may not be changed by third parties. You are entitled to create files only according to the current version definition. The current version is 1.2, and this document describes it. If you want to read older versions, see the support site of the MPGTools Delphi unit for details. Check that site for future structure updates.

<file_id>     8 characters  Always contains the characters #9'MP3DATA'. Third-party applications may use this part to recognize the file as an MPEG Audio Datafile.
<version>     1 byte        Contains the file version number (currently 1).
<subversion>  1 byte        Contains the file subversion number (currently 2).
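As an illustration of the signature layout above, here is a minimal Python sketch that checks the first 10 bytes of a file. The function name `parse_signature` is hypothetical, and the sketch assumes that #9 denotes the single byte with value 9 (a tab character) in front of the seven letters.

```python
import struct

# #9'MP3DATA' in Pascal notation: assumed to be byte 9 (tab) plus 7 letters.
M3D_FILE_ID = b"\tMP3DATA"


def parse_signature(data: bytes):
    """Return (version, subversion) read from the start of an M3D file."""
    if len(data) < 10:
        raise ValueError("file too short to hold an M3D signature")
    # 8-byte file id, then one byte each for version and subversion.
    file_id, version, subversion = struct.unpack_from("<8sBB", data, 0)
    if file_id != M3D_FILE_ID:
        raise ValueError("not an MPEG Audio Datafile")
    return version, subversion


sample = M3D_FILE_ID + bytes([1, 2])
print(parse_signature(sample))  # prints (1, 2)
```

A reader written this way can reject foreign files before touching the later sections, and can branch on the returned version pair.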

<application id>

This section describes the application used to create the file. It may also contain other, application-specific data. Third-party applications may read the data in this section or simply skip it.

<block_length>     1 byte               Contains the length of the <application id> section, excluding the <block_length> byte itself.
<app_name_length>  1 byte               Contains the number of bytes used for the application name (maximum value is 15).
<app_name>         up to 15 characters  Contains the application name. It should be used to determine which application created the M3D file. The application name must be registered with the author to avoid identical IDs for different applications.
<custom data>                           Space used by the application. It may contain any additional data the application needs. The author of that application is responsible for publishing the data structure of this block if third parties are meant to use the data, but is not obliged to do so. The length of this block may be calculated as <block_length> - <app_name_length> - 1.
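The length arithmetic for the <application id> section can be sketched in Python as follows. This is only an illustration: `parse_application_id` is a hypothetical name, the default offset of 10 assumes the section directly follows the 10-byte file signature, and decoding the name as Latin-1 is an assumption (the specification does not name a character set).

```python
def parse_application_id(data: bytes, offset: int = 10):
    """Return (app_name, custom_data, next_offset) for the section at offset."""
    block_length = data[offset]
    block = data[offset + 1 : offset + 1 + block_length]
    if block_length < 1 or len(block) != block_length:
        raise ValueError("truncated <application id> section")
    app_name_length = block[0]
    if app_name_length > 15 or app_name_length > block_length - 1:
        raise ValueError("invalid <app_name_length>")
    # Latin-1 is an assumption; it maps every byte to one character.
    app_name = block[1 : 1 + app_name_length].decode("latin-1")
    # What remains is <block_length> - <app_name_length> - 1 bytes long.
    custom_data = block[1 + app_name_length :]
    return app_name, custom_data, offset + 1 + block_length
```

Because <block_length> covers the whole section, an application that does not care about the custom data can jump straight to the returned `next_offset`.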

<header>

Contains header info about the owner of the file, catalogue info, order info, or other data. It may use different structures, so it is divided into three blocks: <header_type>, <header_length> and <header_data>.

<header_type>  1 byte  Describes the type of header:

0 - custom header used by the third-party application that created it (use <application id> to see which one). Other applications should simply skip the header information. Third-party applications may publish their custom format. In that case, it is advisable to contact the author of M3D to register a type of their own.

1 - catalogue header. Should be used for distributed MPEG catalogues. Pascal (Delphi) structure definition:
TMPEGDataCatalogue = packed record
  Title     : String[30];  { Catalogue title }
  Publisher : String[30];  { Catalogue publisher name }
  City      : String[30];  { Publisher's contact info }
  ZIP       : String[10];
  Country   : String[20];
  Address   : String[30];
  Phone     : String[15];
  Fax       : String[15];
  Email     : String[30];
  WWWURL    : String[30];
end;

2 - order header. Should be used for MPEG orders generated from catalogues. Pascal (Delphi) structure definition:
TMPEGDataOrder1v1 = packed record
  CustomerID : String[15];  { Customer unique ID used by catalogue publisher }
  Name       : String[30];  { Customer name and }
  City       : String[30];  { other contact data }
  ZIP        : String[10];
  Country    : String[20];
  Address    : String[30];
  Phone      : String[15];
  Fax        : String[15];
  Email      : String[30];
end;

Other - we may define other publicly available header types that may be used by all applications. We recommend letting us know about any specific headers you use in your application, so that we may add them to the public header types if they are of general interest.

<header_length>  2 bytes            Contains the length of the header data. Maximum value is 65535. It may be used to simplify reading header info and to skip unsupported header types.
<header_data>    up to 65535 bytes  Contains additional info about the owner and the data in the file. It may have a predefined or custom structure, as described by <header_type>.
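The following Python sketch shows one way the <header> section might be read, including decoding a type 1 catalogue header. It rests on several assumptions not stated in the specification: the blocks are stored in the order <header_type>, <header_length>, <header_data>; <header_length> is little-endian (as Delphi on x86 would write it); each String[N] occupies N + 1 bytes with the length in the first byte; and text is Latin-1. The names `parse_header` and `read_short_strings` are hypothetical.

```python
import struct

# (field name, declared maximum length) for TMPEGDataCatalogue.
CATALOGUE_FIELDS = [
    ("Title", 30), ("Publisher", 30), ("City", 30), ("ZIP", 10),
    ("Country", 20), ("Address", 30), ("Phone", 15), ("Fax", 15),
    ("Email", 30), ("WWWURL", 30),
]
# Each String[N] is assumed to occupy N + 1 bytes in a packed record.
CATALOGUE_SIZE = sum(max_len + 1 for _, max_len in CATALOGUE_FIELDS)


def read_short_strings(data: bytes, fields):
    """Decode consecutive Pascal short strings into a dict."""
    result, pos = {}, 0
    for name, max_len in fields:
        length = min(data[pos], max_len)  # first byte holds the length
        result[name] = data[pos + 1 : pos + 1 + length].decode("latin-1")
        pos += max_len + 1
    return result


def parse_header(data: bytes, offset: int):
    """Return (header_type, content, next_offset)."""
    header_type, header_length = struct.unpack_from("<BH", data, offset)
    start = offset + 3
    header_data = data[start : start + header_length]
    if len(header_data) != header_length:
        raise ValueError("truncated <header_data>")
    next_offset = start + header_length
    if header_type == 1:
        if header_length < CATALOGUE_SIZE:
            raise ValueError("catalogue header too short")
        content = read_short_strings(header_data, CATALOGUE_FIELDS)
        return header_type, content, next_offset
    # Unknown or custom types: hand back the raw bytes so callers can skip.
    return header_type, header_data, next_offset
```

Returning `next_offset` for every type reflects the stated purpose of <header_length>: a reader can skip header types it does not support and continue with the MPEG data records.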

<mpeg_data_records>

This section contains an unlimited number of MPEG data records. This is the Pascal (Delphi) record structure (note that String3, String30, String4, String255 and String20 are actually String[3], String[30], String[4], String[255] and String[20]):

TMPEGData1v2 = packed record
  Header          : String3;    { Should contain "TAG" if header is correct }
  Title           : String30;   { Song title }
  Artist          : String30;   { Artist name }
  Album           : String30;   { Album }
  Year            : String4;    { Year }
  Comment         : String30;   { Comment }
  Genre           : Byte;       { Genre code }
  Track           : Byte;       { Track number on album }
  Duration        : Word;       { Song duration }
  FileLength      : LongInt;    { File length }
  Version         : Byte;       { MPEG audio version index (1 - Version 1, 2 - Version 2, 3 - Version 2.5, 0 - unknown) }
  Layer           : Byte;       { Layer (1, 2, 3, 0 - unknown) }
  SampleRate      : LongInt;    { Sampling rate in Hz }
  BitRate         : LongInt;    { Bit rate }
  BPM             : Word;       { Bits per minute - for future use }
  Mode            : Byte;       { Number of channels (0 - Stereo, 1 - Joint-Stereo, 2 - Dual-channel, 3 - Single-Channel) }
  Copyright       : Boolean;    { Copyrighted? }
  Original        : Boolean;    { Original? }
  ErrorProtection : Boolean;    { Error protected? }
  Padding         : Boolean;    { If frame is padded }
  FrameLength     : Word;       { Total frame size including CRC }
  CRC             : Word;       { 16 bit file CRC (without TAG). Not implemented yet. }
  FileName        : String255;  { MPEG audio file name }
  FileDateTime    : LongInt;    { File last modification date and time in DOS internal format }
  FileAttr        : Word;       { File attributes }
  VolumeLabel     : String20;   { Disk label }
  Selected        : Word;       { If this field's value is greater than zero then file is selected. Value determines order of selection. }
  Reserved        : array[1..45] of Byte;  { For future use }
end;
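For readers working outside Delphi, the record above can be mirrored with Python's `struct` module, whose `p` format code reads Pascal short strings. This is a sketch under stated assumptions rather than a reference reader: it assumes little-endian integers, one byte per Boolean, records that run back to back until the end of the file, and the usual DOS packed date/time layout for FileDateTime. The names `RECORD`, `MpegData`, `iter_records` and `dos_datetime` are hypothetical.

```python
import struct
from collections import namedtuple
from datetime import datetime

# "Np" = Pascal short string occupying N bytes (length byte + N-1 chars).
RECORD = struct.Struct(
    "<4p31p31p31p5p31p"  # Header, Title, Artist, Album, Year, Comment
    "BBHi"               # Genre, Track, Duration, FileLength
    "BBiiHB"             # Version, Layer, SampleRate, BitRate, BPM, Mode
    "????"               # Copyright, Original, ErrorProtection, Padding
    "HH"                 # FrameLength, CRC
    "256piH21pH"         # FileName, FileDateTime, FileAttr, VolumeLabel, Selected
    "45s"                # Reserved
)

MpegData = namedtuple("MpegData", [
    "Header", "Title", "Artist", "Album", "Year", "Comment", "Genre",
    "Track", "Duration", "FileLength", "Version", "Layer", "SampleRate",
    "BitRate", "BPM", "Mode", "Copyright", "Original", "ErrorProtection",
    "Padding", "FrameLength", "CRC", "FileName", "FileDateTime",
    "FileAttr", "VolumeLabel", "Selected", "Reserved",
])


def iter_records(data: bytes, offset: int):
    """Yield one MpegData per complete record found from offset onward."""
    while offset + RECORD.size <= len(data):
        yield MpegData._make(RECORD.unpack_from(data, offset))
        offset += RECORD.size


def dos_datetime(value: int) -> datetime:
    """Decode a DOS packed date/time (date in the high word, time in the low)."""
    date, time = (value >> 16) & 0xFFFF, value & 0xFFFF
    return datetime(1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
                    time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)


print(RECORD.size)  # prints 492
```

Under these assumptions each record is 492 bytes long, so the number of records can be estimated from the file size once the offset of the first record is known. String fields come back as raw `bytes`; decoding them is left to the caller because the specification does not name a character set.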

This document and file specification are copyrighted by Predrag Supurovic (c)1998. You may use it freely.

Package Name
DvmMpeg

Description
Dali supports reading and processing of MPEG-1 Video, Audio and System streams. Currently, only Video encoding is supported.

MPEG Video

The format of MPEG-1 Video is as follows. An MPEG-1 Video stream starts with an arbitrary number of bytes, followed by a sequence header, followed by zero or more alternating GOP (Group of Pictures) headers and GOPs, followed by a sequence end marker. A GOP is a series of pictures (frames), each of which consists of a picture header and the actual picture data. A picture can be of type I (intracoded), P (predicted) or B (bidirectionally predicted). Each GOP must have at least one I frame.

Dali provides an abstraction for each of these headers: MpegSeqHdr, MpegGopHdr and MpegPicHdr. Each header type supports the following basic primitives: find, dump, skip, parse, encode. A picture is decoded into three ScImages and zero (I), one (P) or two (B) VectorImages.

To encode an MPEG-1 Video sequence, convert each RGB image frame to YUV ByteImages; perform a motion vector search on the Y ByteImage (for P and B frames); compress the ByteImages to ScImages; and finally encode the ScImages into a Bitstream.

Dali provides support for frame extraction through the MpegVideoIndex abstraction. An MpegVideoIndex stores information about each frame: the frame type, the length, the number of reference frames (0 for I, 1 for P, or 2 for B), and the offsets needed to reach those frames.

MPEG Audio

MPEG-1 Audio is a sequence of frames. Each frame has a header and a body. The body encodes some number of samples for the current frame. Samples from the next few frames can also be encoded in the current frame if space permits.

We have four main abstractions for an MPEG-1 Audio file: one for the MPEG audio header (MpegAudioHdr), and three for the encoded audio frames (one for each layer of encoding: MpegAudioL1, MpegAudioL2, MpegAudioL3). The audio frame abstractions for layer 1 and layer 2 contain compressed data for one channel of audio. For layer 3 stereo audio, since the encodings of the left and right channels depend on each other, the audio data contains compressed data for both channels.

The encoding of each audio frame depends on the previous frames. To store these dependencies, we have two auxiliary structures: MpegSynData and MpegGraData. Each corresponds to the dependencies of one channel of an audio stream and is updated during each decoding. This makes direct access into the middle of a stream impossible at the moment.

Note: Direct access would still be possible if these dependencies were precomputed and stored in files, but this feature is not available right now.

MPEG System

An MPEG-1 System stream is a multiplex of video and audio streams. It consists of a sequence of packets, each of which contains raw compressed data from one of the streams. Dali provides abstractions for all three header structures that exist in MPEG-1 System streams: system header, pack header, and packet header. Dali also allows creation of an MpegSysToc, a table-of-contents structure that stores the offset and length of each packet, together with the timestamp from each packet.
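To make the table-of-contents idea concrete, here is a small Python sketch that walks an MPEG-1 System stream and records the offset and length of each packet. It does not use or reproduce Dali's API: `build_toc` is a hypothetical name, timestamps are left out for brevity, and the constants reflect the MPEG-1 system layer as commonly documented (start code prefix 00 00 01, pack start code BA, system header BB, end code B9, a 12-byte pack header, and a 16-bit big-endian length after each packet start code).

```python
import struct

PACK_START = 0xBA
SYSTEM_HEADER = 0xBB
END_CODE = 0xB9
PACK_HEADER_SIZE = 12  # 4-byte start code + 8 bytes of SCR and mux rate


def build_toc(data: bytes):
    """Return a list of (offset, total_length, stream_id) for each packet."""
    toc, pos = [], 0
    while pos + 4 <= len(data):
        if data[pos : pos + 3] != b"\x00\x00\x01":
            pos += 1  # not at a start code: resynchronise byte by byte
            continue
        code = data[pos + 3]
        if code == END_CODE:
            break
        if code == PACK_START:
            pos += PACK_HEADER_SIZE
            continue
        if code < SYSTEM_HEADER:
            pos += 4  # a video-level start code, not part of the system layer
            continue
        if pos + 6 > len(data):
            break
        # System headers and packets both carry a 16-bit length field.
        (length,) = struct.unpack_from(">H", data, pos + 4)
        if code != SYSTEM_HEADER:
            toc.append((pos, 6 + length, code))
        pos += 6 + length
    return toc
```

In this sketch stream ids 0xC0 to 0xDF denote audio packets and 0xE0 to 0xEF video packets, so the third tuple element is enough to separate the multiplexed streams.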

MPEG-1 Data Structures


The ISO/IEC 11172 specification defines the audio, video and multiplexing standards collectively and colloquially referred to as the MPEG-1 (Moving Picture Experts Group) compression standard. The data structures for the various components of an encoded bitstream are given in a pseudo-C syntax and are extensively discussed. However, it is difficult to get the big picture from reading the spec. More practically, in order to parse an MPEG-1 bitstream, it is necessary to know the byte offsets within each structure. To make this information more readily accessible, we have condensed it into graphic form. Of course, this is no substitute for the original spec. Where more information is required than can be squeezed into the diagram, references to the spec are provided.

The Big Picture


A multiplexed MPEG-1 stream is composed of distinct Packs. Each Pack consists of a Pack header and any number of Packets. Each Packet holds either video or audio data. The structures above the video or audio level are called the system layer. Video and audio data are divided into Packets without regard to lower-level structures: Groups, Pictures, etc. may break across Packet boundaries.

Video information is composed of individual Pictures. We will not discuss the substructures of Pictures. Pictures themselves are of three types: I (intra), P (predictive), and B (bidirectional). I Pictures are self-contained, compressing the image using Discrete Cosine Transform (DCT) processing. P Pictures use less data and are predicted from the preceding I or P Picture. B Pictures use the least data and are interpolated using information from the surrounding P and I Pictures. Pictures are organized into Groups of (typically) 15 or so Pictures. If a Group is preceded by a Sequence header, its first Picture is called an entrypoint.

Audio information is composed of Frames. We will not discuss the substructure of Frames. There are no higher-level audio structures.
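The video hierarchy described above can be observed directly in an MPEG-1 video elementary stream (that is, video data already extracted from its Packets) by scanning for start codes. The Python sketch below is illustrative only, and `scan_video` is a hypothetical name. It relies on the start code values commonly documented for MPEG-1 video (B3 sequence header, B8 group, 00 picture, B7 sequence end) and on the picture header layout in which a 10-bit temporal reference is followed by a 3-bit coding type (1 = I, 2 = P, 3 = B).

```python
SEQUENCE_HEADER = 0xB3
GROUP_START = 0xB8
PICTURE_START = 0x00
SEQUENCE_END = 0xB7
PICTURE_TYPES = {1: "I", 2: "P", 3: "B"}
PREFIX = b"\x00\x00\x01"


def scan_video(data: bytes):
    """Return a list of (offset, description) for the headers that were found."""
    events = []
    pos = data.find(PREFIX)
    while pos != -1 and pos + 4 <= len(data):
        code = data[pos + 3]
        if code == SEQUENCE_HEADER:
            events.append((pos, "sequence header"))
        elif code == GROUP_START:
            events.append((pos, "group"))
        elif code == SEQUENCE_END:
            events.append((pos, "sequence end"))
        elif code == PICTURE_START and pos + 6 <= len(data):
            # Bits 10..12 after the start code hold the picture coding type.
            coding_type = (data[pos + 5] >> 3) & 0x07
            events.append((pos, "picture " + PICTURE_TYPES.get(coding_type, "?")))
        # Other codes (for example slices) are ignored in this sketch.
        pos = data.find(PREFIX, pos + 3)
    return events
```

A Group whose offset is immediately preceded by a "sequence header" event corresponds to the entrypoint case mentioned above.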
