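Introduction

When multimedia data is sent over RTMP, the RTMP payload is encapsulated in the FLV tag data format rather than in a complete FLV container.

To understand what that means, let's take a closer look at FLV and its structure.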


FLV (Flash Video)

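FLV is a container format developed by Adobe Systems for encapsulating multimedia data.

FLV is a fairly old container format and has seen little use since the MPEG-4 Part 12 container format (the ISO base media file format) appeared.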


FLV Structure

Let's walk through the FLV structure, focusing on tags. (See the official specification for more detail.)

FLV uses big-endian byte order, so keep that in mind when analyzing the protocol.



FLV Header


| Field | Type | Comment |
|---|---|---|
| Signature | UI8 | Signature byte, always 'F' (0x46) |
| Signature | UI8 | Signature byte, always 'L' (0x4C) |
| Signature | UI8 | Signature byte, always 'V' (0x56) |
| Version | UI8 | File version (for example, 0x01 for FLV version 1) |
| TypeFlagsReserved | UB[5] | Must be 0 |
| TypeFlagsAudio | UB[1] | 1 = audio tags are present |
| TypeFlagsReserved | UB[1] | Must be 0 |
| TypeFlagsVideo | UB[1] | 1 = video tags are present |
| DataOffset | UI32 | Length of this header in bytes (usually 9 for FLV version 1). This field is present to accommodate larger headers in future versions. |
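
The header layout above maps directly onto a fixed 9-byte big-endian structure. The following is a minimal sketch of reading it in Python; the names `FlvHeader` and `parse_flv_header` are hypothetical and only serve this illustration.

```python
import struct
from typing import NamedTuple


class FlvHeader(NamedTuple):
    version: int
    has_audio: bool
    has_video: bool
    data_offset: int


def parse_flv_header(buf: bytes) -> FlvHeader:
    """Parse the 9-byte FLV header (hypothetical helper for illustration)."""
    if len(buf) < 9:
        raise ValueError("FLV header needs at least 9 bytes")
    # ">3sBBI": 3 signature bytes, UI8 version, UI8 flags, UI32 DataOffset,
    # all big-endian with no padding.
    signature, version, flags, data_offset = struct.unpack(">3sBBI", buf[:9])
    if signature != b"FLV":
        raise ValueError("missing 'FLV' signature")
    return FlvHeader(
        version=version,
        has_audio=bool(flags & 0x04),  # TypeFlagsAudio (bit 2)
        has_video=bool(flags & 0x01),  # TypeFlagsVideo (bit 0)
        data_offset=data_offset,
    )
```

For example, the bytes `46 4C 56 01 05 00 00 00 09` describe a version 1 file that carries both audio and video tags, with the body starting at offset 9.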


FLV Body


| Field | Type | Comment |
|---|---|---|
| PreviousTagSize0 | UI32 | Always 0 |
| Tag1 | FLVTAG | First tag |
| PreviousTagSize1 | UI32 | Size of previous tag, including its header, in bytes. For FLV version 1, this value is 11 plus the DataSize of the previous tag. |
| Tag2 | FLVTAG | Second tag |
| ... | ... | ... |
| PreviousTagSizeN-1 | UI32 | Size of second-to-last tag, including its header, in bytes |
| TagN | FLVTAG | Last tag |
| PreviousTagSizeN | UI32 | Size of last tag, including its header, in bytes |
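
The alternating PreviousTagSize / tag layout can be walked with a simple loop. The sketch below assumes `body` starts at PreviousTagSize0 (that is, at DataOffset) and that every tag uses the 11-byte FLV version 1 tag header; `iter_flv_tags` is a hypothetical name.

```python
import struct
from typing import Iterator, Tuple


def iter_flv_tags(body: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (tag_type, data) for each tag in an FLV body (illustrative sketch)."""
    pos = 0
    expected_prev = 0  # PreviousTagSize0 is always 0
    while pos + 4 <= len(body):
        (prev_size,) = struct.unpack_from(">I", body, pos)
        if prev_size != expected_prev:
            raise ValueError(f"PreviousTagSize mismatch at offset {pos}")
        pos += 4
        if pos + 11 > len(body):
            break  # only the trailing PreviousTagSizeN was left
        tag_type = body[pos]
        data_size = int.from_bytes(body[pos + 1:pos + 4], "big")  # UI24
        data = body[pos + 11:pos + 11 + data_size]
        if len(data) != data_size:
            raise ValueError("truncated tag")
        yield tag_type, data
        # For FLV version 1: 11-byte tag header plus DataSize
        expected_prev = 11 + data_size
        pos += 11 + data_size
```

Checking each PreviousTagSize against the tag that was just read is optional for playback, but it is a cheap way to detect a corrupted or misaligned file.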


FLV Tags

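Each tag type in an FLV file forms a single stream. (In other words, an FLV file cannot carry multiple streams of the same type.)

The format that an RTMP payload encapsulates is the tag data described here.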


| Field | Type | Comment |
|---|---|---|
| TagType | UI8 | Type of contents in this tag: 8 = audio; 9 = video; 18 = script data; all others reserved |
| DataSize | UI24 | Length of the data in the Data field |
| Timestamp | UI24 | Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0. In playback, the time sequencing of FLV tags depends on the FLV timestamps only. Any timing mechanisms built into the payload data format shall be ignored. |
| TimestampExtended | UI8 | Extension of the Timestamp field to form an SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds. |
| StreamID | UI24 | Always 0 |
| Data | AUDIODATA if TagType == 8; VIDEODATA if TagType == 9; SCRIPTDATAOBJECT if TagType == 18 | Body of the tag |
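
The one field that tends to trip people up is the timestamp, because its upper 8 bits are stored after the lower 24 bits. As a rough sketch (with the hypothetical names `FlvTagHeader` and `parse_tag_header`), the 11-byte tag header could be decoded like this:

```python
from typing import NamedTuple


class FlvTagHeader(NamedTuple):
    tag_type: int
    data_size: int
    timestamp_ms: int
    stream_id: int


def parse_tag_header(buf: bytes) -> FlvTagHeader:
    """Decode the 11 bytes that precede the tag's Data field (illustrative)."""
    if len(buf) < 11:
        raise ValueError("FLV tag header needs 11 bytes")
    tag_type = buf[0]                               # UI8
    data_size = int.from_bytes(buf[1:4], "big")     # UI24
    ts_low = int.from_bytes(buf[4:7], "big")        # Timestamp: lower 24 bits
    ts_ext = buf[7]                                 # TimestampExtended: upper 8 bits
    timestamp = (ts_ext << 24) | ts_low
    if timestamp & 0x80000000:                      # reinterpret as SI32
        timestamp -= 1 << 32
    stream_id = int.from_bytes(buf[8:11], "big")    # UI24, always 0
    return FlvTagHeader(tag_type, data_size, timestamp, stream_id)
```

With Timestamp bytes `00 00 10` and TimestampExtended `01`, the combined value is (1 << 24) | 16 = 16777232 ms.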


Audio Tags

AUDIODATA


| Field | Type | Comment |
|---|---|---|
| SoundFormat | UB[4] | Format of SoundData: 0 = Linear PCM, platform endian; 1 = ADPCM; 2 = MP3; 3 = Linear PCM, little endian; 4 = Nellymoser 16 kHz mono; 5 = Nellymoser 8 kHz mono; 6 = Nellymoser; 7 = G.711 A-law logarithmic PCM; 8 = G.711 mu-law logarithmic PCM; 9 = reserved; 10 = AAC; 11 = Speex; 14 = MP3 8 kHz; 15 = Device-specific sound. Formats 7, 8, 14, and 15 are reserved. AAC is supported in Flash Player 9,0,115,0 and higher. Speex is supported in Flash Player 10 and higher. |
| SoundRate | UB[2] | Sampling rate: 0 = 5.5 kHz; 1 = 11 kHz; 2 = 22 kHz; 3 = 44 kHz (AAC: always 3) |
| SoundSize | UB[1] | Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally. 0 = 8-bit samples; 1 = 16-bit samples |
| SoundType | UB[1] | Mono or stereo sound: 0 = mono (Nellymoser: always 0); 1 = stereo (AAC: always 1) |
| SoundData | UI8[size of sound data] | AACAUDIODATA if SoundFormat == 10; otherwise sound data that varies by format |
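
The first four fields of AUDIODATA are packed into a single byte, so decoding them is a matter of shifting and masking. The sketch below uses a hypothetical `parse_audio_header` helper and reports the nominal rates from the table.

```python
from typing import Dict, Union

# Nominal sampling rates for SoundRate values 0..3, as listed in the table
SOUND_RATES_HZ = (5500, 11000, 22000, 44000)


def parse_audio_header(first_byte: int) -> Dict[str, Union[int, bool]]:
    """Split the first AUDIODATA byte into its bit fields (illustrative)."""
    return {
        "sound_format": (first_byte >> 4) & 0x0F,                   # UB[4]
        "sound_rate_hz": SOUND_RATES_HZ[(first_byte >> 2) & 0x03],  # UB[2]
        "sample_bits": 16 if (first_byte >> 1) & 0x01 else 8,       # UB[1]
        "stereo": bool(first_byte & 0x01),                          # UB[1]
    }
```

An AAC audio tag therefore typically starts with `0xAF`: SoundFormat 10, SoundRate 3, SoundSize 1, SoundType 1.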

AACAUDIODATA


| Field | Type | Comment |
|---|---|---|
| AACPacketType | UI8 | 0 = AAC sequence header; 1 = AAC raw |
| Data | UI8[n] | AudioSpecificConfig if AACPacketType == 0; raw AAC frame data if AACPacketType == 1 |
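
A receiver usually needs the sequence header before it can decode any raw frames. The following sketch separates the two packet types and, for a sequence header, reads the leading fields of AudioSpecificConfig. The bit layout used here (5-bit audio object type, 4-bit sampling frequency index, 4-bit channel configuration) comes from ISO 14496-3 rather than from the FLV tables above, and it is assumed that the object type is below 31 and the frequency index is not the escape value 15. `parse_aac_audio_data` is a hypothetical name.

```python
from typing import Any, Dict


def parse_aac_audio_data(sound_data: bytes) -> Dict[str, Any]:
    """Interpret AACAUDIODATA, i.e. the bytes after the AUDIODATA header byte."""
    if not sound_data:
        raise ValueError("empty AACAUDIODATA")
    packet_type, payload = sound_data[0], sound_data[1:]
    if packet_type == 1:
        return {"kind": "raw", "frame": payload}
    if packet_type != 0:
        raise ValueError(f"unknown AACPacketType {packet_type}")
    if len(payload) < 2:
        raise ValueError("AudioSpecificConfig too short")
    # Assumption: simple AudioSpecificConfig without escape values
    bits = int.from_bytes(payload[:2], "big")
    return {
        "kind": "sequence_header",
        "audio_object_type": bits >> 11,                 # top 5 bits
        "sampling_frequency_index": (bits >> 7) & 0x0F,  # next 4 bits
        "channel_configuration": (bits >> 3) & 0x0F,     # next 4 bits
    }
```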


Video Tags

VIDEODATA


| Field | Type | Comment |
|---|---|---|
| FrameType | UB[4] | Type of video frame: 1 = key frame (for AVC, a seekable frame); 2 = inter frame (for AVC, a non-seekable frame); 3 = disposable inter frame (H.263 only); 4 = generated key frame (reserved for server use only); 5 = video info/command frame |
| CodecID | UB[4] | Codec identifier: 1 = JPEG (currently unused); 2 = Sorenson H.263; 3 = Screen video; 4 = On2 VP6; 5 = On2 VP6 with alpha channel; 6 = Screen video version 2; 7 = AVC |
| VideoData | H263VIDEOPACKET if CodecID == 2; SCREENVIDEOPACKET if CodecID == 3; VP6FLVVIDEOPACKET if CodecID == 4; VP6FLVALPHAVIDEOPACKET if CodecID == 5; SCREENV2VIDEOPACKET if CodecID == 6; AVCVIDEOPACKET if CodecID == 7 | Video frame payload or UI8. If FrameType == 5, instead of a video payload, the message stream contains a UI8 with the following meaning: 0 = start of client-side seeking video frame sequence; 1 = end of client-side seeking video frame sequence |
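
As with audio, FrameType and CodecID share the first byte of VIDEODATA. A minimal sketch of splitting that byte, with the hypothetical names `VideoHeader` and `parse_video_header`:

```python
from typing import NamedTuple


class VideoHeader(NamedTuple):
    frame_type: int
    codec_id: int

    @property
    def is_keyframe(self) -> bool:
        # FrameType 1 = key frame (for AVC, a seekable frame)
        return self.frame_type == 1


def parse_video_header(first_byte: int) -> VideoHeader:
    """Split the first VIDEODATA byte into FrameType and CodecID (illustrative)."""
    return VideoHeader(
        frame_type=(first_byte >> 4) & 0x0F,  # upper 4 bits
        codec_id=first_byte & 0x0F,           # lower 4 bits
    )
```

For AVC streams this means a key frame tag starts with `0x17` and an inter frame tag starts with `0x27`.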

AVCVIDEOPACKET


| Field | Type | Comment |
|---|---|---|
| AVCPacketType | UI8 | 0 = AVC sequence header; 1 = AVC NALU; 2 = AVC end of sequence (lower level NALU sequence ender is not required or supported) |
| CompositionTime | SI24 | Composition time offset if AVCPacketType == 1; otherwise 0. See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds. |
| Data | UI8[n] | AVCDecoderConfigurationRecord if AVCPacketType == 0 (same information as the avcC box in MP4/FLV files); one or more NALUs if AVCPacketType == 1 (can be individual slices per FLV packet; that is, full frames are not strictly required); empty if AVCPacketType == 2 |
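
CompositionTime is a signed 24-bit value, which needs a little care because most languages have no native 3-byte integer. Here is a small sketch using the hypothetical helper `parse_avc_video_packet`; it takes the bytes that follow the FrameType/CodecID byte.

```python
from typing import Any, Dict


def parse_avc_video_packet(video_data: bytes) -> Dict[str, Any]:
    """Split AVCVIDEOPACKET into its three fields (illustrative sketch)."""
    if len(video_data) < 4:
        raise ValueError("AVCVIDEOPACKET needs at least 4 bytes")
    packet_type = video_data[0]  # 0 = sequence header, 1 = NALU, 2 = end of sequence
    # SI24, big-endian two's complement; int.from_bytes handles the sign
    composition_time = int.from_bytes(video_data[1:4], "big", signed=True)
    return {
        "packet_type": packet_type,
        "composition_time_ms": composition_time,
        "data": video_data[4:],
    }
```

Under the usual reading of composition time offsets, the tag timestamp acts as the decode time and adding `composition_time_ms` gives the presentation time, although the tables above do not spell this out.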


Data Tags

Data tags cover a lot of ground, so only a brief summary is given here.

SCRIPTDATA

SCRIPTDATA contains data encoded in AMF0.

AMF (Action Message Format): a binary format for serializing ActionScript object graphs. It is also used to exchange messages in Adobe Flash.

| Field | Type | Comment |
|---|---|---|
| Objects | SCRIPTDATAOBJECT[] | |
| Object.ObjectName | SCRIPTDATASTRING | Name of the object |
| Object.ObjectData | SCRIPTDATAVALUE | Data of the object |
| Object.ObjectData.Type | UI8 | Type of the variable: 0 = Number; 1 = Boolean; 2 = String; 3 = Object; 4 = MovieClip (reserved, not supported); 5 = Null; 6 = Undefined; 7 = Reference; 8 = ECMA array; 9 = Object end marker; 10 = Strict array; 11 = Date; 12 = Long string |
| Object.ObjectData.ECMAArrayLength | UI32 if Type == 8 | Approximate number of fields of the ECMA array |
| Object.ObjectData.ScriptDataValue | DOUBLE if Type == 0; UI8 if Type == 1; SCRIPTDATASTRING if Type == 2; SCRIPTDATAOBJECT[n] if Type == 3; UI16 if Type == 7; SCRIPTDATAVARIABLE[ECMAArrayLength] if Type == 8; SCRIPTDATAVARIABLE[n] if Type == 10; SCRIPTDATADATE if Type == 11; SCRIPTDATALONGSTRING if Type == 12 | Script data value. If Type == 8 (ECMA array type), ECMAArrayLength provides a hint to the software about how many items might be in the array; the array continues until SCRIPTDATAVARIABLEEND appears. If Type == 10 (strict array type), the array begins with a UI32 count and contains exactly that number of items; it does not terminate with a SCRIPTDATAVARIABLEEND tag. |
| Object.ObjectData.ScriptDataValueTerminator | SCRIPTDATAOBJECTEND if Type == 3; SCRIPTDATAVARIABLEEND if Type == 8 | Terminators for Object and ECMA array lists |
| ObjectEndMarker | UI24 | Always 9, also known as SCRIPTDATAOBJECTEND |
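
To make the table more concrete, here is a minimal sketch of an AMF0 value reader covering the most common types (Number, Boolean, String, Object, Null, Undefined, ECMA array, Strict array, Long string). It assumes the script tag body is a sequence of typed AMF0 values, which is how onMetaData tags are commonly laid out in practice; Reference and Date are left out for brevity. `read_value` and its helpers are hypothetical names.

```python
import struct
from typing import Any, Dict, Tuple

OBJECT_END = b"\x00\x00\x09"  # UI24 end marker, always 9


def _read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    (length,) = struct.unpack_from(">H", buf, pos)  # UI16 length prefix
    pos += 2
    return buf[pos:pos + length].decode("utf-8"), pos + length


def _read_properties(buf: bytes, pos: int) -> Tuple[Dict[str, Any], int]:
    props: Dict[str, Any] = {}
    while buf[pos:pos + 3] != OBJECT_END:
        if pos >= len(buf):
            raise ValueError("missing object end marker")
        name, pos = _read_string(buf, pos)
        props[name], pos = read_value(buf, pos)
    return props, pos + 3  # skip the end marker


def read_value(buf: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Read one AMF0 value at `pos`; return (value, next position)."""
    type_id = buf[pos]
    pos += 1
    if type_id == 0:  # Number: big-endian DOUBLE
        (number,) = struct.unpack_from(">d", buf, pos)
        return number, pos + 8
    if type_id == 1:  # Boolean: UI8
        return buf[pos] != 0, pos + 1
    if type_id == 2:  # String
        return _read_string(buf, pos)
    if type_id == 3:  # Object: properties until the end marker
        return _read_properties(buf, pos)
    if type_id in (5, 6):  # Null / Undefined
        return None, pos
    if type_id == 8:  # ECMA array: the UI32 length is only a hint, so skip it
        return _read_properties(buf, pos + 4)
    if type_id == 10:  # Strict array: exact UI32 count, no terminator
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = read_value(buf, pos)
            items.append(item)
        return items, pos
    if type_id == 12:  # Long string: UI32 length prefix
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        return buf[pos:pos + length].decode("utf-8"), pos + length
    raise ValueError(f"unsupported AMF0 type {type_id}")
```

Calling `read_value` twice on a script tag body would then yield the name string first and the property array second, each call returning the position where the next value starts.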

onMetaData

The FLV metadata object is delivered in a SCRIPTDATA tag named onMetaData.

onMetaData is also used to exchange metadata in RTMP Data Messages. (In that case, the data appears to use only the ECMA array format rather than the SCRIPTDATA format.)

| Property Name | Type | Comment |
|---|---|---|
| audiocodecid | Number | Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values) |
| audiodatarate | Number | Audio bit rate in kilobits per second |
| audiodelay | Number | Delay introduced by the audio codec in seconds |
| audiosamplerate | Number | Frequency at which the audio stream is replayed |
| audiosamplesize | Number | Resolution of a single audio sample |
| canSeekToEnd | Boolean | Indicating the last video frame is a key frame |
| creationdate | String | Creation date and time |
| duration | Number | Total duration of the file in seconds |
| filesize | Number | Total size of the file in bytes |
| framerate | Number | Number of frames per second |
| height | Number | Height of the video in pixels |
| stereo | Boolean | Indicating stereo audio |
| videocodecid | Number | Video codec ID used in the file (see E.4.3.1 for available CodecID values) |
| videodatarate | Number | Video bit rate in kilobits per second |
| width | Number | Width of the video in pixels |


Reference

  • https://en.wikipedia.org/wiki/Flash_Video
  • https://heesu0.github.io/rfc/rtmp/video_file_format_spec_v10.pdf
  • https://heesu0.github.io/rfc/rtmp/amf0-file-format-spec.pdf
  • https://heesu0.github.io/rfc/rtmp/video_file_format_spec_v10_1.pdf
