Falsehoods programmers believe about [video]

by Niklas Haas on December 25, 2016

Tagged as: mpv, video.

Inspired by numerous other such lists of falsehoods. Pretty much every video player in existence gets a good chunk, if not the vast majority, of these wrong. (Some of these also/mostly apply to users, though.)

… video decoding

  • decoding is bit-exact, so the decoder used does not affect the quality
  • since H.264 decoding is bit-exact, the decoder used does not affect the quality¹
  • hardware decoding means I don’t have to worry about performance
  • hardware decoding is always faster than software decoding
  • an H.264 hardware decoder can decode all H.264 files
  • an H.264 software decoder can decode all H.264 files
  • video decoding is easily parallelizable
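
To make the hardware-decoding falsehoods above concrete, here is a minimal sketch of the fallback logic they imply. decode_hw() and decode_sw() are hypothetical stand-ins, not any real decoder API; the point is only that a hardware decoder rejecting a perfectly valid file is a normal event, not a fatal error:

```c
/* A sketch (not mpv's actual code) of hardware-decoding fallback. */
#include <stdbool.h>
#include <stdio.h>

static bool decode_hw(const char *file)
{
    (void)file;
    /* Real hardware decoders reject plenty of valid H.264: Hi10P,
     * 4:4:4, out-of-level resolutions, too many reference frames... */
    return false; /* pretend the file was rejected */
}

static bool decode_sw(const char *file)
{
    (void)file;
    return true; /* software decoders cope with more, but not everything */
}

int main(void)
{
    const char *file = "input.h264";
    if (!decode_hw(file)) {
        /* Rejection is not an error condition: fall back, don't quit. */
        fprintf(stderr, "hw decoding failed, trying software\n");
        if (!decode_sw(file))
            return 1;
    }
    return 0;
}
```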

… video playback

  • the display’s refresh rate will be an integer multiple of the video file’s frame rate
  • the display’s clock will be in sync with the audio clock
  • I can accurately measure the display’s clock
  • I can accurately measure the audio clock
  • I can exclusively use the audio clock for timing
  • I can exclusively use the video clock for timing
  • my hardware contexts will survive the user’s coffee break
  • my hardware contexts will never disappear in the middle of playback
  • I can always request a new hardware context after my previous one disappeared
  • it’s okay to error and quit if I can’t request a hardware context
  • hardware decoding and video playback will happen on the same device
  • transferring frames from one device to another is easy
  • the user will not notice 3:2 pulldown
  • the user will not notice the odd dropped or duplicated frame
  • all video frames will be unique
  • all video frames will be decoded in order
  • all video sources can be seeked in
  • the user will never want to seek to non-keyframes
  • seeking to a position will produce the same output as decoding to a position
  • I can seek to a specific frame number
  • videos have a fixed frame rate
  • all frame timestamps are precise
  • all frame timestamps are precise in modern formats like .mkv
  • all frame timestamps are monotonically increasing
  • all frame timestamps are monotonically increasing as long as you don’t seek
  • all frame timestamps are unique
  • the duration of the final video frame is always known
  • users will not notice if I skip the final video frame
  • users will never want to play videos in reverse
  • users will not notice if I skip a video frame when pausing
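
As a sketch of what the timestamp falsehoods mean in practice, here is some illustrative C (the PTS values are invented) that patches up duplicate and non-monotonic timestamps instead of trusting the container:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* PTS in seconds as demuxed; note the duplicate and the backwards jump. */
    double pts[] = { 0.000, 0.042, 0.083, 0.083, 0.066, 0.125 };
    int n = sizeof(pts) / sizeof(pts[0]);
    double last = -HUGE_VAL;

    for (int i = 0; i < n; i++) {
        if (pts[i] <= last) {
            /* Duplicate or non-monotonic PTS: nudge it forward rather
             * than trusting it blindly or aborting playback. */
            printf("frame %d: bad pts %.3f, patched to %.3f\n",
                   i, pts[i], last + 0.001);
            pts[i] = last + 0.001;
        }
        last = pts[i];
    }
    return 0;
}
```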

… video/image files

  • all video files have 8-bit per channel color
  • all video files have 8-bit or 10-bit per channel color
  • fine, but at least all channels are going to have the same number of bits
  • all samples are going to fit into a 32-bit integer
  • every pixel consists of three samples
  • every pixel consists of three or four samples
  • fine, every pixel consists of n samples
  • all image files are sRGB
  • all video files are BT.601 or BT.709
  • all image files are either sRGB or contain an ICC profile
  • 4:2:0 is the only way to subsample images
  • all image files contain correct tags indicating their color space
  • interlaced video files no longer exist
  • I can detect whether a file is interlaced or not
  • the chroma location is the same for every YCbCr file
  • all HD videos are BT.709
  • video files will have the same frame rate throughout the stream
  • video files will have the same resolution throughout the stream
  • video files will have the same color space throughout the stream
  • video files will have the same pixel format throughout the stream
  • fine, videos will have the same video codec throughout the stream
  • the video and audio tracks will start at the same time
  • the video and audio tracks will both be present throughout the stream
  • I can start playing an audio file at the first decoded sample, and stop playing it at the last
  • virtual timelines can be implemented on the demuxer level
  • adjacent frames will have similar durations
  • all multimedia formats have easily identifiable headers
  • a file will never be a legal JPEG and MP3 at the same time
  • applying heuristics to guess the right filetype is easy
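
The last few falsehoods deserve a demonstration. The sniffers below are naive illustrations, not any real demuxer's code, but they show the underlying problem: MP3 readers scan for a frame sync anywhere in the stream (to survive junk prefixes), so the same bytes can legitimately identify as both JPEG and MP3:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool looks_like_jpeg(const unsigned char *p, size_t n)
{
    return n >= 3 && p[0] == 0xFF && p[1] == 0xD8 && p[2] == 0xFF;
}

static bool looks_like_mp3(const unsigned char *p, size_t n)
{
    /* Accept an ID3 tag, or an MPEG frame sync (11 set bits) anywhere. */
    if (n >= 3 && memcmp(p, "ID3", 3) == 0)
        return true;
    for (size_t i = 0; i + 1 < n; i++) {
        if (p[i] == 0xFF && (p[i + 1] & 0xE0) == 0xE0)
            return true;
    }
    return false;
}

int main(void)
{
    /* The first bytes of an ordinary JPEG: the JFIF marker FF E0
     * already matches the loose MP3 sync check. */
    unsigned char buf[] = { 0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10,
                            'J', 'F', 'I', 'F', 0x00 };
    printf("jpeg? %d  mp3? %d\n",
           looks_like_jpeg(buf, sizeof(buf)),
           looks_like_mp3(buf, sizeof(buf)));
    return 0;
}
```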

… image scaling

  • the GPU’s built-in bilinear scaling is sufficient for everybody
  • bicubic scaling is sufficient for everybody
  • the image can just be scaled in its native color space
  • I should linearize before scaling
  • I shouldn’t linearize before scaling
  • upscaling is the same as downscaling
  • the quality of scaling algorithms can be objectively measured
  • the slower a scaling algorithm is to compute, the better it will be
  • upscaling algorithms can invent information that doesn’t exist in the image
  • my scaling ratio is going to be the same in the x axis and the y axis
  • chroma upscaling isn’t as important as luma upscaling
  • chroma and luma can/should be scaled separately
  • I can ignore sub-pixel offsets when scaling and aligning planes
  • I should always take sub-pixel offsets into account when scaling
  • images contain no information above the Nyquist frequency
  • images contain no information outside the TV signal range
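
To see why both linearization falsehoods are on the list, here is a small C example using the standard sRGB transfer function: blending black and white gives a visibly different result depending on whether you average the encoded values or linear light. Which answer is "right" depends on what you're doing, which is exactly the point:

```c
#include <math.h>
#include <stdio.h>

static double srgb_to_linear(double c)
{
    return c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
}

static double linear_to_srgb(double l)
{
    return l <= 0.0031308 ? l * 12.92 : 1.055 * pow(l, 1.0 / 2.4) - 0.055;
}

int main(void)
{
    double black = 0.0, white = 1.0;

    /* 50% blend computed naively on the gamma-encoded values... */
    double gamma_avg = (black + white) / 2;

    /* ...versus the same blend computed in linear light. */
    double linear_avg = linear_to_srgb(
        (srgb_to_linear(black) + srgb_to_linear(white)) / 2);

    printf("gamma-space avg:  %.3f\n", gamma_avg);  /* 0.500  */
    printf("linear-light avg: %.3f\n", linear_avg); /* ~0.735 */
    return 0;
}
```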

… color spaces

  • all colors are specified in (R,G,B) triples
  • all colors are specified in RGB or CMYK
  • fine, all colors are specified in RGB, CMYK, HSV, HSL, YCbCr or XYZ
  • there is only one RGB color space
  • there is only one YCbCr color space for each RGB color space
  • fine, there is only one YCbCr color space for each RGB color space up to linear isomorphism
  • an RGB triple unambiguously specifies a color
  • an RGB triple + primaries unambiguously specifies a color
  • fine, a CIE XYZ triple unambiguously specifies a color
  • black is RGB (0,0,0), and white is RGB (255,255,255)
  • all color spaces have the same white point
  • color spaces are defined by the RGB primaries and white point
  • my users are not going to notice the difference between BT.601 and BT.709
  • there’s only one BT.601 color space
  • TV range YCbCr is the same thing as TV range RGB
  • full-range YCbCr doesn’t exist
  • standards bodies can agree on what full-range YCbCr means
  • b-bit full range means the interval [0, 2^b-1]
  • a full range 8-bit color value of 255 maps to the float 1.0
  • color spaces are two-dimensional
  • “linear light” means “linear light”
  • information outside of the interval [0,1] should always be discarded/clamped
  • all gamma curves are well defined outside of the interval [0,1]
  • HDR encoding is about making the image brighter
  • HDR encoding means darker blacks
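
Some worked numbers for the matrix and range falsehoods, using the standard BT.601/BT.709 luma coefficients and the usual 8-bit limited-range quantization:

```c
#include <stdio.h>

int main(void)
{
    /* Pure green, gamma-encoded R'G'B' = (0, 1, 0). */
    double r = 0, g = 1, b = 0;

    /* BT.601 and BT.709 weight the channels differently, so decoding
     * a stream with the wrong matrix visibly shifts its colors. */
    double y601 = 0.299  * r + 0.587  * g + 0.114  * b;
    double y709 = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    printf("luma of green: BT.601 %.4f vs BT.709 %.4f\n", y601, y709);

    /* 8-bit limited ("TV") range maps Y' [0,1] to codes [16,235],
     * not [0,255] -- and what "full range" means is its own mess. */
    printf("TV-range codes: black %g, white %g\n",
           16 + 219 * 0.0, 16 + 219 * 1.0);
    return 0;
}
```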

… color conversion

  • I don’t need to convert an image’s colors before displaying it on the screen
  • all color spaces are just linearly related
  • there’s only one way to convert between color spaces
  • I can just clip out-of-gamut colors after conversion
  • there’s only one way to pull 10-bit colors up to 16-bit precision
  • linearization happens after RGB conversion
  • I can freely convert between color spaces as long as I allow out-of-gamut colors
  • converting between color spaces is a mathematical process so it doesn’t depend on the display
  • converting from A to B is just the inverse of converting from B to A
  • the OOTF is conceptually part of the OETF
  • the OOTF is conceptually part of the EOTF
  • all OOTFs are reversible
  • all CMMs implement color conversion correctly
  • all professional CMMs implement color conversion correctly
  • I don’t need to dither after converting if the target colorspace is the same bit depth or higher
  • converting between bit depths is just a logical shift
  • converting between bit depths is just a multiplication
  • all ICC profiles contain tables for conversion in both directions
  • HDR tone-mapping is well-defined
  • HDR tone-mapping is well-defined if you know the source and target display capabilities
  • HDR metadata will always match the video stream
  • you can easily convert between PQ and HLG
  • you can easily convert between PQ and HLG if you know the mastering display’s metadata
  • converting from A to linear light to B gives you the same result as converting from A to B
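
A worked example for the bit-depth falsehoods: a plain left shift cannot reach the new maximum, so 8-bit white stops being white in 16 bits. Bit replication (equivalently, multiplying by 65535/255 = 257) is one correct way to do the expansion:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t white8 = 255;

    uint16_t shifted    = (uint16_t)(white8 << 8);          /* 65280: not white */
    uint16_t replicated = (uint16_t)(white8 << 8 | white8); /* 65535: white     */

    printf("shift:     %u\n", shifted);
    printf("replicate: %u (= 255 * 257 = %u)\n", replicated, 255u * 257u);
    return 0;
}
```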

… video output

  • the graphics API will dither my output for me
  • there’s only one way to dither output
  • I need to dither to whatever my backbuffer precision is
  • dithering with random noise looks good
  • dithering artifacts are not visible at 6-bit precision
  • dithering artifacts are not visible at 7-bit precision
  • dithering artifacts are not visible at 8-bit precision
  • temporal dithering is better than static dithering
  • OpenGL is well-supported on all operating systems
  • OpenGL is well-supported on any operating system
  • waiting until the next vsync is easy in OpenGL
  • video drivers correctly implement the texture formats they advertise
  • I can accurately measure vsync timings
  • vsync timings are consistent for a fixed refresh rate
  • all displays with the same rate will vsync at the same time
  • I can control the window size and position
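
To illustrate why dithering matters (and why you can't count on the graphics API to do it for you), here is a toy C example quantizing a flat 8-bit mid-gray down to 6 bits with a classic 4x4 Bayer matrix. Plain rounding would emit 31 for every pixel, losing the fractional level entirely; the dithered pattern preserves it on average:

```c
#include <stdio.h>

int main(void)
{
    double gray = 127.0 / 255.0;   /* flat 8-bit mid-gray            */
    double target = gray * 63.0;   /* ideal 6-bit value: ~31.376     */

    /* Classic 4x4 Bayer matrix; thresholds are bayer[y][x] / 16. */
    static const int bayer[4][4] = {
        {  0,  8,  2, 10 },
        { 12,  4, 14,  6 },
        {  3, 11,  1,  9 },
        { 15,  7, 13,  5 },
    };

    double sum = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int q = (int)(target + bayer[y][x] / 16.0); /* dither + truncate */
            printf("%2d ", q);
            sum += q;
        }
        printf("\n");
    }
    printf("dithered average %.3f vs ideal %.3f\n", sum / 16, target);
    return 0;
}
```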

… displays

  • all displays are 60 Hz
  • all refresh rates are integers
  • all displays have a fixed refresh rate
  • all displays are sRGB
  • all displays are approximately sRGB
  • displays have an infinite contrast
  • all displays have a contrast of around 1000:1
  • all displays have a white point of D65
  • all displays have square pixels
  • all displays use 8-bit per channel color
  • all displays are PC displays
  • my users will provide an ICC profile for their display
  • my users will only use a single display
  • my users will only use a single display for the duration of a video
  • all ICC profiles for displays will have the same rendering intent
  • all ICC profiles for displays will be black-scaled
  • all ICC profiles for displays won’t be black-scaled
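
A quick arithmetic sketch for the refresh-rate falsehoods: NTSC-derived rates are off by a factor of 1000/1001 (59.94… Hz is really 60000/1001), so naive 3:2 pulldown of 24000/1001 fps film onto a true 60 Hz display drifts by a full frame roughly every 42 seconds:

```c
#include <stdio.h>

int main(void)
{
    double fps_video   = 24000.0 / 1001.0; /* 23.976... */
    double fps_display = 60.0;

    /* 3:2 pulldown assumes the video runs at exactly 2/5 of the
     * display rate, i.e. 24.0 fps here. */
    double fps_assumed = fps_display * 2.0 / 5.0;

    double drift = fps_assumed - fps_video; /* frames per second */
    printf("drift: %.6f frames/s, one full frame every %.0f s\n",
           drift, 1.0 / drift);
    return 0;
}
```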

… subtitles

  • all subtitle files are UTF-8 encoded
  • all subtitles are stored/rendered as RGB
  • I can paint RGB subtitles on top of my RGB video files
  • I don’t need to worry about color management for subtitles
  • the subtitle color space will be the same as the video color space
  • rendering subtitles at the output resolution is always better than rendering them at the video resolution
  • there’s an ASS specification
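
And for the first subtitle falsehood, a minimal UTF-8 validity check of the kind a player needs before it can even consider charset fallbacks. (This sketch is deliberately simplified: it ignores overlong encodings and surrogates.)

```c
#include <stdbool.h>
#include <stdio.h>

static bool valid_utf8(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; ) {
        int len;
        unsigned char c = s[i];
        if      (c < 0x80)           len = 1;
        else if ((c & 0xE0) == 0xC0) len = 2;
        else if ((c & 0xF0) == 0xE0) len = 3;
        else if ((c & 0xF8) == 0xF0) len = 4;
        else return false;
        if (i + len > n) return false;
        for (int k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80) return false;
        i += len;
    }
    return true;
}

int main(void)
{
    const unsigned char latin1[] = { 's', 0xFC, 0xDF }; /* "süß" in Latin-1 */
    printf("valid utf-8? %d\n", valid_utf8(latin1, sizeof(latin1)));
    /* When this fails, players typically guess (e.g. via uchardet) or
     * honor a user-configured codepage -- silently assuming UTF-8 is
     * how subtitles turn into mojibake. */
    return 0;
}
```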

¹ Except when there is.