Question:
Curious to know about the different video, audio, and picture formats?
anonymous
2006-05-02 10:47:28 UTC
Could somebody explain the different kinds of audio, video, and picture formats? What's the difference between them? I mean, why can't we use one in place of another? What does conversion between formats do, exactly? Or is there a good place where I can get this information?
Three answers:
Saurabhz
2006-05-02 11:07:35 UTC
Dear friend,

Why so many formats? You'd think we would be happy with one or two, so why are there so many of them? Let's try to categorize them:

1. Raster vs. vector: raster formats, like the BMP files we make in Paint, are simply grids of pixels, so they pixelate when you zoom in, but they are easy to create. Vector formats, on the other hand, are scripted drawing instructions (used by AutoCAD and the like) and can be zoomed many times over without any pixelation.

2. Compressed formats: like JPEG, which is encoded with a specific compression scheme.

3. GIF: an image format that supports animation through multiple frames.

4. OS-specific formats: some are used on one operating system and some on another.

5. Program-specific formats: Photoshop uses PSD, CorelDRAW uses CDR; these are unique formats that store program-specific information.

All of this applies to video formats too: some are open formats (DAT, MPEG), some are heavily compressed (MPEG-4, DivX), and some are tied to one player because they are proprietary (RealMedia = .rm, QuickTime = .mov).

A major reason there are so many formats is commercial: you have to pay somewhere to use many of them.

How you'll play these formats:

For images: download IrfanView (www.irfanview.com) to open each format.

For video: find the K-Lite Codec Pack at www.k-litecodecpack.com.

For music: good old Winamp is perfect, at www.winamp.com.
?
2016-05-20 06:43:17 UTC
He is right. You need to have codecs to be able to play different kinds of audio and video.
anonymous
2006-05-02 11:25:16 UTC
Multimedia Technology Basics: FourCCs, AVI Codecs, ASF Codecs, WAV Codecs,

MOV Codecs, RM Codecs, YUV Codecs, RGB Codecs, Lossy and Lossless Codecs

and More

by Mike Melanson (mike at multimedia.cx)

v1.1: September 25, 2005





Copyright (c) 2003-2005 Mike Melanson

Permission is granted to copy, distribute and/or modify this document

under the terms of the GNU Free Documentation License, Version 1.2

or any later version published by the Free Software Foundation;

with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

A copy of the license is included in the section entitled "GNU

Free Documentation License".





Contents

--------

* Introduction

* Codecs

* FourCCs

* Multimedia Files

* What Application Can "Play" This File?

* Interleaving

* RGB and YUV Colorspaces

* References

* Acknowledgements

* Changelog

* GNU Free Documentation License





Introduction

------------

This document is intended as a very brief overview of assorted technical

topics that will help a developer begin to understand computer

multimedia technology. There are many other references for the detailed

theory underlying some of the concepts presented here, particularly YUV

colorspaces; this document is long on technical explanation and short on

abstract concepts.



I run a technical multimedia website. Occasionally, I browse my ISP's web

server logs which contain information about search engine queries that

brought visitors to my site. I am curious to know if people are actually

finding what they are looking for.



I often see web log records that indicate visitors looking for "asf codec"

or "mov codec" or "yuv codec". With any luck, the search engines will

index this document and point visitors to more useful information.





Codecs

------

"Codec" is an abbreviation for COder/DECoder. Briefly, this refers to any

algorithm that codes data into another form and then decodes the coded

data in order to recover the original data (more or less). In the context

of multimedia technology, this means taking raw audio or video data, which

tends to be enormous, and sending it through a coder algorithm to

compress it to a considerably smaller size. Then it is stored on disk,

transmitted over the network, etc. until it is time to play it back. At

such time, the compressed data is sent through the decoder portion of the

codec, which reconstructs the original audio or video data for

playback.



Actually, a large majority of multimedia codecs do not reconstruct the

original audio or video data upon decompression. These codecs fall into

the category "lossy". Codecs that reconstruct the original data exactly

upon decompression are categorized as "lossless". Why would it be okay to

lose information during encoding? Many multimedia codecs throw away subtle

pieces of information which, according to empirical research, have little

impact on human perception. As a very simple example, 2 adjacent pixels

might be so close in color that the coder declares them to be the same

color and codes them together as "2 x color1" instead of "1 x color1, 1 x

color2". The decoded data will not be exactly the same as the original

data, but the goal is to be able to reconstruct a picture that will be

"good enough".





FourCCs

-------

A FourCC is short for "four-character code". FourCCs are very commonly

seen in multimedia files in order to identify audio or video codecs, as

well as to mark boundaries within the file.



A FourCC is generally composed of 4 ASCII-range characters which, when

examined as a hex dump, form a human-readable, four-character string. For

example:



08 77 73 74 62 6C 00 00 00 7F 73 74 73 64 00 00 .wstbl....stsd..

00 00 00 00 00 01 00 00 00 6F 53 56 51 33 00 00 .........oSVQ3..



This is taken from an Apple QuickTime file. There is at least one FourCC

('stsd') and two more ('stbl' and 'SVQ3') which are not immediately

discernible since the 'w' and 'o' characters preceding them are valid

ASCII characters.



Since a FourCC is made up of 4 ASCII bytes and each byte is 8 bits, a

FourCC is 32 bits long. This works well with modern 32-bit CPUs. As seen

in the above example, 'SVQ3' is also represented as 0x53565133 in

big-endian hexadecimal notation, or 0x33515653 in little-endian hex

notation. Such knowledge alleviates the need for memcmp() and strncmp()

functions when scanning for FourCCs.
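
As an illustration, here is a short C sketch of treating a FourCC as a single 32-bit integer; the FOURCC macro below is a common idiom (FFmpeg's MKTAG is similar), not part of any standard API:

#include <stdio.h>
#include <stdint.h>

/* Build a 32-bit FourCC from 4 characters, in the byte order the
 * characters appear in the file (a little-endian read of "SVQ3"
 * yields 0x33515653). */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

int main(void)
{
    uint32_t tag = FOURCC('S', 'V', 'Q', '3');

    /* a single integer compare replaces memcmp()/strncmp() */
    if (tag == 0x33515653)
        printf("found SVQ3 (0x%08X)\n", (unsigned int)tag);
    return 0;
}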



It is important to note that FourCCs do not necessarily need to contain

4 valid alphanumeric ASCII characters. For example, there are a variety of

FourCCs in the QuickTime format which are well outside the range.





Multimedia Files

----------------

Many multimedia files that carry both audio and video bear extensions

such as .avi (Microsoft AVI files), .asf (a.k.a., .wmv and .wma,

collectively known as Microsoft ASF files), .mov (Apple QuickTime files),

and .rm (RealMedia files). Confusion often arises as one wonders what

application can, for example, "play .mov files". That is a very difficult

question to answer and here is why:



All of the formats mentioned in the preceding paragraph are also

referred to as multimedia container formats. All they do is pack chunks

of audio and video data together, interleaved, along with some

instructions to inform a playback application how the data is to be

decoded and presented to the user. This is the typical layout of many

multimedia file formats:



file header
    title, creator, other meta-info
    video header
        video codec FourCC
        width, height, colorspace, playback framerate
    audio header
        audio codec FourCC
        bits/sample, playback frequency, channel count
file data
    encoded audio chunk #0
    encoded video chunk #0
    encoded audio chunk #1
    encoded video chunk #1
    encoded audio chunk #2
    encoded video chunk #2
    encoded audio chunk #3
    encoded video chunk #3
    ..
    ..



Those audio and video chunks can be encoded with any number of audio or

video codecs, the FourCCs of which are specified in the file header.
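
As a rough sketch of what a demuxer does with such a layout, the following C fragment walks a RIFF-style stream (the framing AVI uses: a FourCC followed by a 32-bit little-endian chunk size). It is a simplification that ignores nested LIST chunks and error handling:

#include <stdio.h>
#include <stdint.h>

/* Walk a RIFF-style stream: each chunk starts with a 4-byte FourCC
 * followed by a 32-bit little-endian payload size. Other containers
 * (MOV, ASF, RM) use different but conceptually similar framing. */
static void walk_chunks(FILE *f)
{
    unsigned char hdr[8];

    while (fread(hdr, 1, 8, f) == 8) {
        uint32_t size = hdr[4] | (hdr[5] << 8) |
                        ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);

        printf("chunk '%c%c%c%c', %u payload bytes\n",
               hdr[0], hdr[1], hdr[2], hdr[3], (unsigned int)size);

        /* skip the payload (RIFF pads chunks to even sizes) */
        if (fseek(f, (long)(size + (size & 1)), SEEK_CUR) != 0)
            break;
    }
}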



See The Almost Definitive FourCC Definition List in the references

for more information on the jungle of FourCCs out there, and where they

commonly appear.





What Application Can "Play" This File?

--------------------------------------

Here comes the big question. You have some random Apple QuickTime file.

Perhaps you are running some non-Microsoft, non-Apple operating system and

there is no official Apple QuickTime application available. Is there a

program that can "play" the QT file?



Since a QuickTime file can contain many different types of audio or video

data, it is not enough to be able to simply decode the QuickTime container

format; the audio and video codec formats must be supported as well.



This is why there is no simple answer to whether or not a particular

multimedia application can "play" a type of multimedia container file

format. A player application needs to be able to decode the container

format and decode the audio and video codec formats inside.





Interleaving

------------

Interleaving is the process of storing alternating audio and video chunks

in the data section of a multimedia file:



encoded audio chunk #0

encoded video chunk #0

encoded audio chunk #1

encoded video chunk #1

encoded audio chunk #2

encoded video chunk #2

..

..

encoded audio chunk #n

encoded video chunk #n



Why is this done? Why not just place all of the video data in the file,

followed by all of the audio data? For example:



encoded video chunk #0

encoded video chunk #1

encoded video chunk #2

..

..

encoded video chunk #n

encoded audio chunk #0

encoded audio chunk #1

encoded audio chunk #2

..

..

encoded audio chunk #n



Conceptually, this appears to be a valid solution. In practice, however,

it falls over. Assuming these audio and video streams are part of the same

file on the same disk (almost always the case), the disk read head has to

constantly seek back and forth between two distant positions on the disk

as playback progresses. When the chunks are

interleaved, the read head does not need to seek at all; it can read all

the data off in a contiguous fashion.
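
A hypothetical muxer loop makes the point: by alternating the writes, the audio and video chunks land next to each other on disk in presentation order. The fixed chunk sizes here are an assumption made for brevity:

#include <stdio.h>

/* Hypothetical muxer loop: alternating audio and video writes puts the
 * chunks adjacent on disk in presentation order, so a player can read
 * the file front to back without seeking. */
static void mux_interleaved(FILE *out,
                            const unsigned char *audio, size_t audio_chunk,
                            const unsigned char *video, size_t video_chunk,
                            int chunk_count)
{
    for (int i = 0; i < chunk_count; i++) {
        fwrite(audio + (size_t)i * audio_chunk, 1, audio_chunk, out);
        fwrite(video + (size_t)i * video_chunk, 1, video_chunk, out);
    }
}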





RGB and YUV Colorspaces

-----------------------



There are two general families of colorspaces for video: RGB and YUV. If

you have any experience with computer graphics at all, you have probably

been exposed to the red-green-blue (RGB) colorspace. More specifically,

you have probably seen packed RGB colorspaces. A packed colorspace has

all of the elements interleaved. For example, a packed RGB24 colorspace

with 8 bits for each R, G, or B element, is laid out in memory as:



R G B R G B R G B ...



Sometimes, the opposite ordering is required. This would be expressed as

BGR24:



B G R B G R B G R ...



24 bits is awkward for many CPUs; 32 bits is far more convenient.

Therefore, packed 32-bit RGB formats are often used for video output in

the interest of speed. When this is done, a fourth component, usually

labeled 'A', is added.



ARGB: A R G B A R G B A R G B

BGRA: B G R A B G R A B G R A



Sometimes, the 'A' component actually represents an alpha transparency

value, used for blending RGB images together. For video playback, it is

generally ignored.
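
For example, here is a minimal C sketch of packing and unpacking a 32-bit ARGB pixel value; the byte order the components take in memory still depends on endianness:

#include <stdint.h>

/* Pack and unpack a 32-bit ARGB pixel: 8 bits per component, with
 * 'A' in the high byte of the value. These helpers operate on the
 * 32-bit value, not on the in-memory byte layout. */
static uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

static void unpack_argb(uint32_t px,
                        uint8_t *a, uint8_t *r, uint8_t *g, uint8_t *b)
{
    *a = (px >> 24) & 0xFF;
    *r = (px >> 16) & 0xFF;
    *g = (px >> 8)  & 0xFF;
    *b =  px        & 0xFF;
}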



There are also many variations of 15- and 16-bit packed RGB formats. For

example, an RGB15 format may pack 5 bits for each component into the lower

15 bits of a 2-byte word and leave the top bit for some other use:



byte 0 byte 1

Xrrrrrgg gggbbbbb



Of course, how those 2 bytes are stored in memory (high or low byte

first) is dependent upon the application. BGR15 may also be seen. RGB16

formats typically allocate an extra bit for green:



byte 0 byte 1

rrrrrggg gggbbbbb
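
Here is a small C sketch of pulling the components back out of an RGB16 (5-6-5) word and scaling them up to 8 bits; the bit-replication step is one common way to fill in the low bits:

#include <stdint.h>

/* Extract 5-6-5 components from a 16-bit RGB16 word
 * (rrrrrggg gggbbbbb) and scale them up to 8 bits each. */
static void rgb565_to_rgb888(uint16_t px,
                             uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (px >> 11) & 0x1F;
    uint8_t g6 = (px >> 5)  & 0x3F;
    uint8_t b5 =  px        & 0x1F;

    *r = (r5 << 3) | (r5 >> 2);   /* 5 bits -> 8 bits */
    *g = (g6 << 2) | (g6 >> 4);   /* 6 bits -> 8 bits */
    *b = (b5 << 3) | (b5 >> 2);
}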



Many older video codecs rely on packed RGB colorspaces since that is

what the hardware was capable of displaying natively. Certain modern

codecs still use RGB colorspaces if the source material is suitable,

i.e., if it is non-photorealistic or just plain simple.



However, many modern video codecs rely on a YUV colorspace. 'YUV' is a

frustrating acronym since it is so difficult to guess what the letters

could possibly stand for. The colorspace was originally known as YCbCr,

with the 'b' and 'r' characters written as subscripts. This is what the

components break down as:



Y = luminance, or intensity

U = Cb = blue chrominance value

V = Cr = red chrominance value



Where is green represented? Green can be derived from the Y, U, and V

values. See the references for more information on converting YUV to RGB

and back.
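
For reference, one commonly used set of conversion equations is the full-range BT.601 ("JPEG") variant sketched below; other standards use different coefficients and ranges, so treat this as an example rather than the one true conversion:

#include <stdint.h>

static uint8_t clamp8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* Full-range BT.601 YUV -> RGB. U and V are centered on 128.
 * Studio-range BT.601 and BT.709 use different coefficients and
 * ranges; see the fourcc.org reference. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y, d = u - 128, e = v - 128;

    *r = clamp8(c + (int)(1.402 * e));
    *g = clamp8(c - (int)(0.344 * d + 0.714 * e));
    *b = clamp8(c + (int)(1.772 * d));
}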



Note that with RGB colorspaces, every single pixel has a different R, G,

and B sample. The same is not true with YUV colorspaces. YUV operates on

the empirical evidence that the human eye is more sensitive to variations

in the intensity of a pixel than to variations in its color. Thus, every

pixel in a YUV image has an associated Y sample, but groups of pixels

share U and V samples.



For example, examine the YUY2 colorspace, a.k.a., YUV 4:2:2 or just

YUV422. This is a packed YUV colorspace, which means that the Y, U, and V

samples are interleaved. The YUV data is laid out in memory as follows

(each sample is one byte):



Y0 U Y1 V Y0 U Y1 V Y0 U Y1 V



Each group of 4 bytes represents 2 pixels. The first pixel is represented

by (Y0, U, V) and the second by (Y1, U, V). So each pixel gets a Y sample

but has to share a U and a V sample.
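
In C, unpacking one of those 4-byte groups might look like this minimal sketch:

#include <stdint.h>

/* Unpack one 4-byte YUY2 group (Y0 U Y1 V) into two (Y, U, V) pixels.
 * Both pixels share the same U and V samples. */
static void yuy2_unpack(const uint8_t group[4],
                        uint8_t yuv0[3], uint8_t yuv1[3])
{
    uint8_t y0 = group[0], u = group[1], y1 = group[2], v = group[3];

    yuv0[0] = y0; yuv0[1] = u; yuv0[2] = v;
    yuv1[0] = y1; yuv1[1] = u; yuv1[2] = v;
}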



Perhaps the most common YUV format is I420, a.k.a. YUV 4:2:0 or just

YUV420. This is the format used in JPEG, MPEG, and many other modern video

codecs. The most notable difference between this colorspace and any other

discussed up to this point is that it is a planar format, not a packed

format. This means that when the data is stored in memory, all of the Y

data is stored first, then all of the U data, then all of the V data.



In I420 data, pixels are grouped in 2x2 blocks:



p0 p1

p2 p3



For each 2x2 block, each pixel is represented by a Y sample. But each pixel

in the block shares a U and a V sample:



Y0 Y1 U V

Y2 Y3



As a highly contrived example, consider an I420 image that is 6x2 pixels.



p0 p1 p2 p3 p4 p5

p6 p7 p8 p9 p10 p11



The image will be broken up into 3 2x2 blocks for the purpose of

representing it as I420:



p0 p1 | Y0 Y1 U0 V0

p6 p7 | Y6 Y7

|

p2 p3 | Y2 Y3 U1 V1

p8 p9 | Y8 Y9

|

p4 p5 | Y4 Y5 U2 V2

p10 p11 | Y10 Y11



The planes of data will be stored in memory as:



Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 U0 U1 U2 V0 V1 V2
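
A short C sketch shows how a decoder would locate the three samples for a given pixel in such a buffer; width and height are assumed to be even:

#include <stdint.h>

/* Fetch the (Y, U, V) samples for pixel (x, y) of a width x height
 * I420 image stored as one contiguous buffer: the full-resolution Y
 * plane first, then the quarter-resolution U plane, then V. */
static void i420_sample(const uint8_t *buf, int width, int height,
                        int x, int y,
                        uint8_t *Y, uint8_t *U, uint8_t *V)
{
    const uint8_t *y_plane = buf;
    const uint8_t *u_plane = buf + width * height;
    const uint8_t *v_plane = u_plane + (width / 2) * (height / 2);

    *Y = y_plane[y * width + x];
    /* each 2x2 block of pixels shares one U and one V sample */
    *U = u_plane[(y / 2) * (width / 2) + (x / 2)];
    *V = v_plane[(y / 2) * (width / 2) + (x / 2)];
}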



Another common planar format is YV12. This is precisely the same as I420

except that U and V data planes are reversed.



Another planar YUV format occasionally seen is YUV9, a.k.a. YUV 4:1:0 or

just YUV410. This is equivalent to I420 except that an image is broken

into 4x4 pixel blocks. Each pixel gets its own Y sample while each block

shares one U and one V sample over the entire block.



Of course, there is also non-subsampled planar YUV available, YUV 4:4:4.

In other words, every pixel is represented by a Y, U, and V sample.



Notice that YUY2, I420, and YUV9 are all valid FourCCs. Where do these

FourCCs come from? I strongly suspect it is related to how many bits or

bytes are required to store a single pixel, on average. For YUY2 data, 4

bytes represent 2 pixels, so 2 bytes on average are required to

represent 1 pixel. In I420 data:



(4 + 1 + 1) * 8 = 48 bits / 4 pixels = 12 bits/pixel



And for YUV9 data:



(16 + 1 + 1) * 8 = 144 bits / 16 pixels = 9 bits/pixel



Note that it is conceptually possible for RGB data to be stored in a

planar manner rather than packed. In practice, this is rarely done.





References

----------

The Almost Definitive FOURCC Definition List

http://www.fourcc.org/



RGB/YUV Pixel Conversion

http://www.fourcc.org/fccyvrgb.htm





Acknowledgements

----------------

Torben Nielsen (torben at Hawaii.Edu) for corrections.



Diego Biurrun (diego at biurrun.de) for cosmetic English composition

fixes.





Changelog

---------

v1.1: September 25, 2005

- replaced YV12 with I420 (correct FourCC) and noted what YV12 really

means

- English composition fixes



v1.0: June 14, 2003

- initial release





GNU Free Documentation License

------------------------------

see http://www.gnu.org/licenses/fdl.html


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.