FLV (Flash Video)
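Introduction
When multimedia data is sent over RTMP, the RTMP payload is encapsulated in the FLV tag data format rather than as a complete FLV container.
To understand what that means, let's take a closer look at FLV and its structure.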
FLV (Flash Video)
FLV is a container format developed by Adobe Systems for encapsulating multimedia data.
It is a fairly old container format and has seen little use since the MPEG-4 Part 12 container format appeared.
FLV Structure
Let's walk through the FLV structure with a focus on tags. (See the official specification for the full details.)
FLV uses big-endian byte order, so keep this in mind when analyzing the protocol.
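As a small illustration of the byte order, the multi-byte integer types used in the tables below (UI24, UI32) can be read as follows. This is only a sketch; the helper names `read_ui24` and `read_ui32` are made up for this example and do not come from the specification.

```python
def read_ui24(buf: bytes, offset: int = 0) -> int:
    """Read an unsigned 24-bit big-endian integer (UI24)."""
    return int.from_bytes(buf[offset:offset + 3], "big")


def read_ui32(buf: bytes, offset: int = 0) -> int:
    """Read an unsigned 32-bit big-endian integer (UI32)."""
    return int.from_bytes(buf[offset:offset + 4], "big")


# The most significant byte comes first.
print(read_ui24(b"\x00\x01\x02"))      # 0x000102
print(read_ui32(b"\x00\x00\x00\x09"))  # 0x00000009
```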
FLV Header
Field | Type | Comment |
---|---|---|
Signature | UI8 | Signature byte always ‘F’ (0x46) |
Signature | UI8 | Signature byte always ‘L’ (0x4C) |
Signature | UI8 | Signature byte always ‘V’ (0x56) |
Version | UI8 | File version (for example, 0x01 for FLV version 1) |
TypeFlagsReserved | UB[5] | Must be 0 |
TypeFlagsAudio | UB[1] | Audio tags are present == 1 |
TypeFlagsReserved | UB[1] | Must be 0 |
TypeFlagsVideo | UB[1] | Video tags are present == 1 |
DataOffset | UI32 | The length of this header in bytes (The DataOffset field usually has a value of 9 for FLV version 1) This field is present to accommodate larger headers in future versions. |
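The header table above maps onto a fixed 9-byte layout, which could be parsed roughly like this. It is a minimal sketch: `FlvHeader` and `parse_flv_header` are hypothetical names, and the bit masks simply follow the field order in the table (5 reserved bits, audio flag, 1 reserved bit, video flag).

```python
import struct
from typing import NamedTuple


class FlvHeader(NamedTuple):
    version: int
    has_audio: bool
    has_video: bool
    data_offset: int


def parse_flv_header(buf: bytes) -> FlvHeader:
    """Parse the 9-byte FLV file header described in the table above."""
    if len(buf) < 9:
        raise ValueError("FLV header needs at least 9 bytes")
    # ">" selects big-endian with no padding: 3s + B + B + I = 9 bytes.
    signature, version, flags, data_offset = struct.unpack(">3sBBI", buf[:9])
    if signature != b"FLV":
        raise ValueError("missing 'FLV' signature")
    has_audio = bool(flags & 0x04)  # TypeFlagsAudio
    has_video = bool(flags & 0x01)  # TypeFlagsVideo
    return FlvHeader(version, has_audio, has_video, data_offset)


header = parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09")
print(header)
```

The example input has both flags set (0x05), so it describes a file that carries audio and video tags.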
FLV Body
Field | Type | Comment |
---|---|---|
PreviousTagSize0 | UI32 | Always 0 |
Tag1 | FLVTAG | First tag |
PreviousTagSize1 | UI32 | Size of previous tag, including its header, in bytes. For FLV version 1, this value is 11 plus the DataSize of the previous tag |
Tag2 | FLVTAG | Second tag |
… | ||
PreviousTagSizeN-1 | UI32 | Size of second-to-last tag, including its header, in bytes |
TagN | FLVTAG | Last tag |
PreviousTagSizeN | UI32 | Size of last tag, including its header, in bytes |
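The alternating PreviousTagSize / Tag layout can be walked with a simple loop. The sketch below assumes FLV version 1 (an 11-byte tag header) and a complete, untruncated body; `iter_tags` is a hypothetical helper name, and `body` is assumed to be the bytes that follow the file header.

```python
import struct

TAG_HEADER_SIZE = 11  # FLV version 1


def iter_tags(body: bytes):
    """Yield (tag_type, data) for every tag in an FLV body."""
    (prev_size,) = struct.unpack_from(">I", body, 0)
    if prev_size != 0:
        raise ValueError("PreviousTagSize0 must be 0")
    pos = 4
    while pos + TAG_HEADER_SIZE <= len(body):
        tag_type = body[pos]
        data_size = int.from_bytes(body[pos + 1:pos + 4], "big")
        end = pos + TAG_HEADER_SIZE + data_size
        data = body[pos + TAG_HEADER_SIZE:end]
        # Each tag is followed by a UI32 holding its total size.
        (prev_size,) = struct.unpack_from(">I", body, end)
        if prev_size != TAG_HEADER_SIZE + data_size:
            raise ValueError("PreviousTagSize does not match the tag size")
        yield tag_type, data
        pos = end + 4


# One video tag (type 9) with a 2-byte payload.
tag = bytes([9]) + (2).to_bytes(3, "big") + bytes(7) + b"\x17\x01"
body = bytes(4) + tag + struct.pack(">I", len(tag))
print(list(iter_tags(body)))
```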
FLV Tags
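Each tag type in an FLV file forms a single stream. (In other words, a file cannot contain multiple streams of the same type.)
The format that an RTMP payload encapsulates is the tag data described here.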
Field | Type | Comment |
---|---|---|
TagType | UI8 | Type of contents in this tag. The following types are defined: 8 = audio 9 = video 18 = script data all others: reserved |
DataSize | UI24 | Length of the data in Data field |
Timestamp | UI24 | Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0. In playback, the time sequencing of FLV tags depends on the FLV timestamps only. Any timing mechanisms built into the payload data format shall be ignored. |
TimestampExtended | UI8 | Extension of the Timestamp field to form a SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds |
StreamID | UI24 | Always 0 |
Data | IF TagType == 8 * AUDIODATA IF TagType == 9 * VIDEODATA IF TagType == 18 * SCRIPTDATAOBJECT | Body of the tag |
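The 11-byte tag header, including the way TimestampExtended supplies the upper 8 bits of the timestamp, might be decoded as in the sketch below. `FlvTagHeader` and `parse_tag_header` are hypothetical names, and treating the combined value as signed follows the SI32 wording in the table.

```python
from typing import NamedTuple


class FlvTagHeader(NamedTuple):
    tag_type: int
    data_size: int
    timestamp_ms: int
    stream_id: int


def parse_tag_header(buf: bytes) -> FlvTagHeader:
    """Parse the 11-byte FLV tag header that precedes the tag data."""
    if len(buf) < 11:
        raise ValueError("FLV tag header needs 11 bytes")
    tag_type = buf[0]
    data_size = int.from_bytes(buf[1:4], "big")
    ts_lower = int.from_bytes(buf[4:7], "big")  # Timestamp (lower 24 bits)
    ts_upper = buf[7]                           # TimestampExtended (upper 8 bits)
    timestamp = (ts_upper << 24) | ts_lower
    if timestamp & 0x80000000:                  # interpret as SI32
        timestamp -= 1 << 32
    stream_id = int.from_bytes(buf[8:11], "big")
    return FlvTagHeader(tag_type, data_size, timestamp, stream_id)


raw = bytes([8]) + (4).to_bytes(3, "big") + b"\x00\x00\x01" + b"\x01" + bytes(3)
print(parse_tag_header(raw))
```

In the example, the extended byte 0x01 pushes the timestamp past the 24-bit limit, giving (1 << 24) + 1 milliseconds.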
Audio Tags
AUDIODATA
Field | Type | Comment |
---|---|---|
SoundFormat | UB[4] | Format of SoundData. The following values are defined: 0 = Linear PCM, platform endian 1 = ADPCM 2 = MP3 3 = Linear PCM, little endian 4 = Nellymoser 16 kHz mono 5 = Nellymoser 8 kHz mono 6 = Nellymoser 7 = G.711 A-law logarithmic PCM 8 = G.711 mu-law logarithmic PCM 9 = reserved 10 = AAC 11 = Speex 14 = MP3 8 kHz 15 = Device-specific sound Formats 7, 8, 14, and 15 are reserved. AAC is supported in Flash Player 9,0,115,0 and higher. Speex is supported in Flash Player 10 and higher. |
SoundRate | UB[2] | Sampling rate. The following values are defined: 0 = 5.5 kHz 1 = 11 kHz 2 = 22 kHz 3 = 44 kHz (AAC always 3) |
SoundSize | UB[1] | Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally. 0 = 8-bit samples 1 = 16-bit samples |
SoundType | UB[1] | Mono or stereo sound 0 = Mono sound (Nellymoser always 0) 1 = Stereo sound (AAC always 1) |
SoundData | UI8[size of sound data] | IF SoundFormat == 10 * AACAUDIODATA ELSE * Sound data-varies by format |
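The first byte of AUDIODATA packs four bit fields, which can be separated with shifts and masks. The following is only a sketch; `parse_audio_header` is a hypothetical helper name, and it returns the raw field values rather than interpreting them.

```python
def parse_audio_header(first_byte: int) -> dict:
    """Split the first AUDIODATA byte into its four bit fields."""
    return {
        "sound_format": (first_byte >> 4) & 0x0F,  # UB[4]
        "sound_rate": (first_byte >> 2) & 0x03,    # UB[2]
        "sound_size": (first_byte >> 1) & 0x01,    # UB[1]
        "sound_type": first_byte & 0x01,           # UB[1]
    }


# 0xAF = 1010 11 1 1: AAC (10), 44 kHz (3), 16-bit (1), stereo (1)
print(parse_audio_header(0xAF))
```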
AACAUDIODATA
Field | Type | Comment |
---|---|---|
AACPacketType | UI8 | 0: AAC Sequence header 1: AAC raw |
Data | UI8[n] | IF AACPacketType == 0 * AudioSpecificConfig ELSE IF AACPacketType == 1 * Raw AAC frame data |
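When SoundFormat is 10, the byte after the audio header is the AACPacketType, which tells a receiver whether the rest is an AudioSpecificConfig or a raw AAC frame. A minimal sketch of that split, with the hypothetical helper name `split_aac_audio_data` and made-up example bytes:

```python
def split_aac_audio_data(audio_data: bytes) -> tuple:
    """Return (kind, payload) for an AUDIODATA body that carries AAC."""
    if len(audio_data) < 2:
        raise ValueError("AAC audio data needs at least 2 bytes")
    sound_format = audio_data[0] >> 4
    if sound_format != 10:
        raise ValueError("SoundFormat is not AAC")
    packet_type = audio_data[1]  # AACPacketType
    if packet_type == 0:
        kind = "sequence_header"  # payload is an AudioSpecificConfig
    elif packet_type == 1:
        kind = "raw"              # payload is raw AAC frame data
    else:
        raise ValueError("unknown AACPacketType")
    return kind, audio_data[2:]


print(split_aac_audio_data(b"\xaf\x00\x12\x10"))
```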
Video Tags
VIDEODATA
Field | Type | Comment |
---|---|---|
FrameType | UB[4] | Type of video frame. The following values are defined: 1 = key frame (for AVC, a seekable frame) 2 = inter frame (for AVC, a non-seekable frame) 3 = disposable inter frame (H.263 only) 4 = generated key frame (reserved for server use only) 5 = video info/command frame |
CodecID | UB[4] | Codec Identifier. The following values are defined: 1 = JPEG (currently unused) 2 = Sorenson H.263 3 = Screen video 4 = On2 VP6 5 = On2 VP6 with alpha channel 6 = Screen video version 2 7 = AVC |
VideoData | IF CodecID == 2 * H263VIDEOPACKET IF CodecID == 3 * SCREENVIDEOPACKET IF CodecID == 4 * VP6FLVVIDEOPACKET IF CodecID == 5 * VP6FLVALPHAVIDEOPACKET IF CodecID == 6 * SCREENV2VIDEOPACKET IF CodecID == 7 * AVCVIDEOPACKET | Video frame payload or UI8. IF FrameType == 5, instead of a video payload, the message stream contains a UI8 with the following meaning: 0 = Start of client-side seeking video frame sequence 1 = End of client-side seeking video frame sequence |
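As with audio, the first byte of VIDEODATA holds two 4-bit fields. The sketch below shows one way to split them; `parse_video_header` is a hypothetical helper name.

```python
def parse_video_header(first_byte: int) -> tuple:
    """Split the first VIDEODATA byte into (frame_type, codec_id)."""
    frame_type = (first_byte >> 4) & 0x0F  # upper 4 bits
    codec_id = first_byte & 0x0F           # lower 4 bits
    return frame_type, codec_id


print(parse_video_header(0x17))  # key frame, AVC
print(parse_video_header(0x27))  # inter frame, AVC
```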
AVCVIDEOPACKET
Field | Type | Comment |
---|---|---|
AVCPacketType | UI8 | The following values are defined: 0 = AVC sequence header 1 = AVC NALU 2 = AVC end of sequence (lower level NALU sequence ender is not required or supported) |
CompositionTime | SI24 | IF AVCPacketType == 1 * Composition time offset ELSE * 0 See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds. |
Data | UI8[n] | IF AVCPacketType == 0 * AVCDecoderConfigurationRecord (same information as avcC box in MP4/FLV files) ELSE IF AVCPacketType == 1 * One or more NALUs (can be individual slices per FLV packet; that is, full frames are not strictly required) ELSE IF AVCPacketType == 2 * Empty |
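For AVC, the video header byte is followed by AVCPacketType and a signed 24-bit CompositionTime. A minimal sketch of reading those fields, assuming the input starts at the VIDEODATA header byte; `parse_avc_video_packet` is a hypothetical name and the payload bytes in the example are arbitrary.

```python
def parse_avc_video_packet(video_data: bytes) -> tuple:
    """Return (avc_packet_type, composition_time_ms, payload)."""
    if len(video_data) < 5:
        raise ValueError("AVC video data needs at least 5 bytes")
    codec_id = video_data[0] & 0x0F
    if codec_id != 7:
        raise ValueError("CodecID is not AVC")
    packet_type = video_data[1]
    # CompositionTime is SI24, so decode it as a signed big-endian value.
    composition_time = int.from_bytes(video_data[2:5], "big", signed=True)
    return packet_type, composition_time, video_data[5:]


packet = b"\x17\x01\x00\x00\x28" + b"\x00\x00\x00\x02\x65\x88"
print(parse_avc_video_packet(packet))
```

Adding the composition time to the tag timestamp gives the presentation time of the frame, while the tag timestamp itself acts as the decode time.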
Data Tags
Data tags cover a lot of ground, so only a brief summary is given here.
SCRIPTDATA
SCRIPTDATA contains data encoded in AMF0.
AMF (Action Message Format): a binary format that serializes ActionScript object graphs. It is also used to exchange messages in Adobe Flash.
Field | Type | Comment |
---|---|---|
Objects | SCRIPTDATAOBJECT[] | |
Object.ObjectName | SCRIPTDATASTRING | Name of the object |
Object.ObjectData | SCRIPTDATAVALUE | Data of the object |
Object.ObjectData.Type | UI8 | Type of the variable: 0 = Number 1 = Boolean 2 = String 3 = Object 4 = MovieClip (reserved, not supported) 5 = Null 6 = Undefined 7 = Reference 8 = ECMA array 9 = Object end marker 10 = Strict array 11 = Date 12 = Long string |
Object.ObjectData.ECMAArrayLength | IF Type == 8 * UI32 | Approximate number of fields of ECMA array |
Object.ObjectData.ScriptDataValue | IF Type == 0 * DOUBLE IF Type == 1 * UI8 IF Type == 2 * SCRIPTDATASTRING IF Type == 3 * SCRIPTDATAOBJECT[n] IF Type == 7 * UI16 IF Type == 8 * SCRIPTDATAVARIABLE[ECMAArrayLength] IF Type == 10 * SCRIPTDATAVARIABLE[n] IF Type == 11 * SCRIPTDATADATE IF Type == 12 * SCRIPTDATALONGSTRING | Script data value. IF Type == 8 (ECMA array type), the ECMAArrayLength provides a hint to the software about how many items might be in the array. The array continues until SCRIPTDATAVARIABLEEND appears. IF Type == 10 (strict array type), the array begins with a UI32 type and contains that exact number of items. The array does not terminate with a SCRIPTDATAVARIABLEEND tag. |
Object.ObjectData.ScriptDataValueTerminator | IF Type == 3 * SCRIPTDATAOBJECTEND IF Type == 8 * SCRIPTDATAVARIABLEEND | Terminators for Object and Strict array lists |
ObjectEndMarker | UI24 | Always 9, also known as a SCRIPTDATAOBJECTEND |
onMetaData
The FLV metadata object is carried in a SCRIPTDATA tag whose name is onMetaData.
onMetaData is also used to exchange metadata in RTMP Data Messages. (In that case the data appears to use only the ECMA array format rather than the full SCRIPTDATA format.)
Property Name | Type | Comment |
---|---|---|
audiocodecid | Number | Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values) |
audiodatarate | Number | Audio bit rate in kilobits per second |
audiodelay | Number | Delay introduced by the audio codec in seconds |
audiosamplerate | Number | Frequency at which the audio stream is replayed |
audiosamplesize | Number | Resolution of a single audio sample |
canSeekToEnd | Boolean | Indicating the last video frame is a key frame |
creationdate | String | Creation date and time |
duration | Number | Total duration of the file in seconds |
filesize | Number | Total size of the file in bytes |
framerate | Number | Number of frames per second |
height | Number | Height of the video in pixels |
stereo | Boolean | Indicating stereo audio |
videocodecid | Number | Video codec ID used in the file (see E.4.3.1 for available CodecID values) |
videodatarate | Number | Video bit rate in kilobits per second |
width | Number | Width of the video in pixels |
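Going the other way, a small onMetaData payload could be produced along these lines. This is a sketch under assumptions: it writes the name as an AMF0 string followed by an ECMA array, supports only Number, Boolean, and String properties, and uses the hypothetical helper name `encode_on_metadata`. Real encoders may lay the payload out differently.

```python
import struct


def _string_body(text: str) -> bytes:
    """UI16 length followed by UTF-8 bytes (no type marker)."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def encode_on_metadata(props: dict) -> bytes:
    """Encode props as the string "onMetaData" plus an ECMA array."""
    out = bytearray(b"\x02" + _string_body("onMetaData"))
    out += b"\x08" + struct.pack(">I", len(props))  # ECMA array + length hint
    for key, value in props.items():
        out += _string_body(key)
        if isinstance(value, bool):          # check bool before int
            out += b"\x01" + (b"\x01" if value else b"\x00")
        elif isinstance(value, (int, float)):
            out += b"\x00" + struct.pack(">d", float(value))
        elif isinstance(value, str):
            out += b"\x02" + _string_body(value)
        else:
            raise TypeError(f"unsupported property type for {key!r}")
    out += b"\x00\x00\x09"                   # empty name + object end marker
    return bytes(out)


print(encode_on_metadata({"width": 1280, "stereo": True}).hex())
```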
Reference
- https://en.wikipedia.org/wiki/Flash_Video
- https://heesu0.github.io/rfc/rtmp/video_file_format_spec_v10.pdf
- https://heesu0.github.io/rfc/rtmp/amf0-file-format-spec.pdf
- https://heesu0.github.io/rfc/rtmp/video_file_format_spec_v10_1.pdf