Decoding to Surface with AMediaCodec: Mastering Input/Output Queues and Buffer Release
In the previous chapter, we successfully extracted compressed video packets from an MP4. Now, we must feed these compressed payloads into a hardware decoder to transform them into raw, displayable images.
AMediaCodec is the NDK interface for media encoding and decoding. For our player, we operate it strictly in Decode mode.
Establish Intuition: The Decoder Model
A hardware decoder behaves like an industrial processing plant with an intake valve and an exhaust valve.
Intake: Compressed Packets (e.g., H.264 NAL Units).
Exhaust: Decoded Raw Image Frames (e.g., YUV buffers or GraphicBuffers).
However, you cannot arbitrarily shove bytes into the decoder, nor can you directly read images out of it. AMediaCodec enforces a strict Asynchronous Buffer Queue model.
1. You request an empty Input Buffer from the codec.
2. You copy your compressed payload into this buffer.
3. You return the filled Input Buffer back to the codec.
4. The codec processes the data internally.
5. You request a finished Output Buffer from the codec.
6. You release the Output Buffer, instructing it to render to the Surface.
This protocol is embodied in four foundational APIs: dequeueInputBuffer, queueInputBuffer, dequeueOutputBuffer, and releaseOutputBuffer.
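The ownership rule behind this handshake is what trips people up: every index a dequeue call hands out must be handed back. The toy model below makes that concrete in plain C++ with no NDK calls; `FakeCodec` is an illustrative stand-in for the codec's two buffer pools, not the real API.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Toy stand-in for the codec's buffer pools. It tracks only index ownership:
// every index handed out by a dequeue call must eventually be handed back.
class FakeCodec {
 public:
  FakeCodec() {
    for (int32_t i = 0; i < 4; ++i) freeInputs_.push(i);  // empty input slots
  }
  // Models dequeueInputBuffer: -1 means "try again later".
  int32_t dequeueInput() {
    if (freeInputs_.empty()) return -1;
    int32_t idx = freeInputs_.front();
    freeInputs_.pop();
    return idx;
  }
  // Models queueInputBuffer: the codec "decodes" and an output becomes ready.
  void queueInput(int32_t idx) { readyOutputs_.push(idx); }
  // Models dequeueOutputBuffer.
  int32_t dequeueOutput() {
    if (readyOutputs_.empty()) return -1;
    int32_t idx = readyOutputs_.front();
    readyOutputs_.pop();
    return idx;
  }
  // Models releaseOutputBuffer: the slot becomes a free input slot again.
  void releaseOutput(int32_t idx) { freeInputs_.push(idx); }

 private:
  std::queue<int32_t> freeInputs_;
  std::queue<int32_t> readyOutputs_;
};
```

Run four iterations that dequeue an output but never release it, and `dequeueInput()` returns -1 forever afterward: the same starvation you get from a real decoder when releaseOutputBuffer is forgotten.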
A Surface is Not a Bitmap
Surface is the ingestion endpoint for the Android Graphics System (SurfaceFlinger). If you configure AMediaCodec with a Surface, the decoded image frames will typically never touch your C++ memory space. The hardware decoder drops the pixels directly into a GraphicBuffer owned by the OS compositing pipeline.
This architecture is deliberately designed for Zero-Copy rendering. It spares the CPU from needlessly copying massive 4K YUV byte arrays across memory boundaries.
Instantiating the Decoder
You must instantiate the decoder using the exact mime type extracted from the track metadata (e.g., video/avc or video/hevc).
class CodecOwner {
 public:
  explicit CodecOwner(const std::string& mime) {
    codec_ = AMediaCodec_createDecoderByType(mime.c_str());
  }
  ~CodecOwner() {
    if (codec_ != nullptr) {
      AMediaCodec_stop(codec_);
      AMediaCodec_delete(codec_);
    }
  }
  // Non-copyable: the raw AMediaCodec* must have exactly one owner.
  CodecOwner(const CodecOwner&) = delete;
  CodecOwner& operator=(const CodecOwner&) = delete;

  AMediaCodec* get() const { return codec_; }

 private:
  AMediaCodec* codec_ = nullptr;
};
Warning: AMediaCodec_createDecoderByType is not guaranteed to succeed. The OEM hardware may lack a decoder for that specific MIME type, or the device's concurrent-decoder limit may be exhausted. Production code must check the result against nullptr before making any other call.
Bridging Kotlin Surface to ANativeWindow
At the Kotlin tier, rendering targets are usually SurfaceView or TextureView.
// Kotlin Boundary
override fun surfaceCreated(holder: SurfaceHolder) {
    nativeAttachSurface(nativeHandle, holder.surface)
}

override fun surfaceDestroyed(holder: SurfaceHolder) {
    nativeDetachSurface(nativeHandle)
}
At the native boundary, you must cast the JNI Surface object to an ANativeWindow.
// Native Boundary
ANativeWindow* attachWindow(JNIEnv* env, jobject surface) {
  return ANativeWindow_fromSurface(env, surface);
}

void releaseWindow(ANativeWindow* window) {
  if (window != nullptr) {
    ANativeWindow_release(window);
  }
}
This allocation is strictly bound to the UI lifecycle. An ANativeWindow generated via ANativeWindow_fromSurface must be explicitly released when detached.
Configuring the Codec
AMediaCodec_configure binds the format and the rendering Surface to the hardware block.
bool configureDecoder(
    AMediaCodec* codec,
    AMediaFormat* format,
    ANativeWindow* window
) {
  media_status_t status = AMediaCodec_configure(
      codec,
      format,
      window,
      nullptr,  // Crypto (DRM)
      0         // Flags: 0 for decode; AMEDIACODEC_CONFIGURE_FLAG_ENCODE for encode
  );
  if (status != AMEDIA_OK) return false;
  return AMediaCodec_start(codec) == AMEDIA_OK;
}
Ensure the final parameter is explicitly 0.
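For video/avc, the format must also carry the codec-specific data: the csd-0 key holds the SPS and csd-1 the PPS, each in Annex-B form (a 00 00 00 01 start code followed by the raw NAL unit). A format pulled from the extractor already contains them; if you ever assemble a format by hand, you would build the blobs yourself. A minimal sketch of the wrapping step (the byte values in the usage check are placeholders, not a real stream):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Wrap one raw NAL unit in an Annex-B start code, the layout MediaCodec
// expects inside the csd-0 (SPS) and csd-1 (PPS) buffers for video/avc.
std::vector<uint8_t> annexB(const std::vector<uint8_t>& nal) {
  std::vector<uint8_t> out = {0x00, 0x00, 0x00, 0x01};
  out.insert(out.end(), nal.begin(), nal.end());
  return out;
}
```

The resulting blob would then be attached with AMediaFormat_setBuffer(format, "csd-0", csd0.data(), csd0.size()) before AMediaCodec_configure.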
Feeding the Input Queue
dequeueInputBuffer requests a writable memory slot from the hardware.
bool feedInput(AMediaCodec* codec, const Packet& packet) {
  // Timeout in microseconds. A short 10 ms blocking poll keeps the loop responsive.
  const ssize_t index = AMediaCodec_dequeueInputBuffer(codec, 10000);
  if (index < 0) {
    return false;  // No free slot yet; the codec's input queue is saturated.
  }
  size_t capacity = 0;
  uint8_t* dst = AMediaCodec_getInputBuffer(codec, index, &capacity);
  if (dst == nullptr || packet.data.size() > capacity) {
    // Unrecoverable mismatch. The dequeued slot must still be returned, so
    // queue it empty with EOS to terminate the pipeline gracefully.
    AMediaCodec_queueInputBuffer(
        codec,
        index,
        0,
        0,
        packet.ptsUs,
        AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM
    );
    return false;
  }
  // Copy the compressed payload into the codec-owned buffer.
  memcpy(dst, packet.data.data(), packet.data.size());
  // Submit the filled buffer back to the hardware.
  media_status_t status = AMediaCodec_queueInputBuffer(
      codec,
      index,
      0,
      packet.data.size(),
      packet.ptsUs,
      packet.flags
  );
  return status == AMEDIA_OK;
}
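feedInput assumes a Packet type carried over from the extractor chapter. A minimal definition consistent with the fields used above (the exact layout is an assumption):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal compressed-packet carrier, matching the fields feedInput reads.
struct Packet {
  std::vector<uint8_t> data;  // one compressed access unit from the extractor
  int64_t ptsUs = 0;          // presentation timestamp, microseconds
  uint32_t flags = 0;         // e.g. AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM
};
```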
The ptsUs metadata attached here is passed through the decoder untouched and reattached to the corresponding Output Buffer. Downstream AVSync relies entirely on this value surviving the pipeline.
Draining the Output Queue
dequeueOutputBuffer retrieves a fully processed image frame.
enum class DrainResult {
  TryAgain,
  FormatChanged,
  Rendered,
  Ended,
  Error,
};

DrainResult drainOutput(AMediaCodec* codec) {
  AMediaCodecBufferInfo info{};
  const ssize_t index = AMediaCodec_dequeueOutputBuffer(codec, &info, 10000);
  if (index == AMEDIACODEC_INFO_TRY_AGAIN_LATER) {
    return DrainResult::TryAgain;
  }
  if (index == AMEDIACODEC_INFO_OUTPUT_BUFFERS_CHANGED) {
    return DrainResult::TryAgain;  // Legacy signal; safe to ignore on modern APIs.
  }
  if (index == AMEDIACODEC_INFO_OUTPUT_FORMAT_CHANGED) {
    // The hardware has altered dimensions or color formats.
    AMediaFormat* outputFormat = AMediaCodec_getOutputFormat(codec);
    // Read what you need (width, height, stride) here.
    // CRITICAL: getOutputFormat returns an owned copy; failing to delete it leaks.
    AMediaFormat_delete(outputFormat);
    return DrainResult::FormatChanged;
  }
  if (index < 0) {
    return DrainResult::Error;
  }
  const bool eos = (info.flags & AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM) != 0;
  const bool render = info.size > 0;
  // THE MOST CRITICAL API CALL IN RENDERING
  AMediaCodec_releaseOutputBuffer(codec, index, render);
  return eos ? DrainResult::Ended : DrainResult::Rendered;
}
AMediaCodec_releaseOutputBuffer is the linchpin of the API. Official documentation enforces this rule: When you are finished with an output buffer, you must release it. For a Surface-configured decoder, the boolean parameter determines if the frame is pushed to the screen (true) or silently discarded (false).
If you fail to release an output buffer, the hardware queue will fill up, the decoder will deadlock, and playback will permanently freeze.
Utilizing releaseOutputBufferAtTime
AMediaCodec_releaseOutputBufferAtTime allows you to delegate rendering synchronization directly to SurfaceFlinger by providing an exact nanosecond monotonic system timestamp.
// delayUs: how far in the future this frame should appear, per your AVSync logic.
// systemTimeNs: the current CLOCK_MONOTONIC time (e.g. via clock_gettime).
int64_t renderTimeNs = systemTimeNs + delayUs * 1000;
AMediaCodec_releaseOutputBufferAtTime(codec, index, renderTimeNs);
While powerful, it obscures the actual display timing. Beginners should exclusively use releaseOutputBuffer(codec, index, true) to construct their own manual AVSync loop before graduating to timestamp-delegated rendering.
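Either way, the underlying sync math is the same: compare the frame's PTS against a master clock and derive how long to hold the frame. A minimal NDK-free sketch (the function name and the choice of master clock are illustrative, not part of the API):

```cpp
#include <cassert>
#include <cstdint>

// How long to hold a decoded frame before releasing it to the Surface.
// masterClockUs: the current playback position (typically the audio clock).
// Returns 0 when the frame is already late and should be shown immediately
// (or dropped, per your policy).
int64_t frameDelayUs(int64_t framePtsUs, int64_t masterClockUs) {
  const int64_t delay = framePtsUs - masterClockUs;
  return delay > 0 ? delay : 0;
}
```

The result feeds either path: sleep for the delay and then call releaseOutputBuffer(codec, index, true), or convert it to nanoseconds, add it to the current monotonic time, and hand it to releaseOutputBufferAtTime.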
The Decode Thread Event Loop
void decodeLoop(AMediaCodec* codec, PacketQueue* queue) {
  bool inputEnded = false;
  bool outputEnded = false;
  while (!outputEnded) {
    if (!inputEnded) {
      Packet packet;
      if (queue->pop(&packet)) {
        inputEnded = (packet.flags & AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM) != 0;
        // Note: a false return here drops the packet; production code retries.
        feedInput(codec, packet);
      }
    }
    DrainResult result = drainOutput(codec);
    if (result == DrainResult::Ended) {
      outputEnded = true;
    }
  }
}
This represents the absolute minimum functional loop. A production decoder loop must asynchronously process pause, seek, surface detachment, and hardware error recovery states.
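As one example of those extra states, a seek must flush the codec before new packets arrive, or you get the dirty-seek corruption described below. A portable sketch of the control flow — the real AMediaCodec_flush call appears only as a comment, and the Command and DecodeState names are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

enum class Command { None, Seek };

// Models what a seek must do to decoder-side state before decoding resumes:
// drop queued packets, clear the EOS latches, and (in real code) flush the
// codec so stale frames never reach the Surface.
struct DecodeState {
  std::deque<int64_t> pendingPtsUs;  // stand-in for the packet queue
  bool inputEnded = false;
  bool outputEnded = false;

  void handle(Command cmd) {
    if (cmd != Command::Seek) return;
    pendingPtsUs.clear();  // stale packets must never reach the codec
    inputEnded = false;    // a seek revives a stream that had already hit EOS
    outputEnded = false;
    // Real code: AMediaCodec_flush(codec); then reset the master clock.
  }
};
```

The ordering matters: clear the queues and flush before the loop resumes feeding, otherwise packets buffered from the old position race into the freshly flushed codec.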
Common Catastrophes
- Failure to Release: You acquire an output buffer but forget to call `releaseOutputBuffer`. The symptom is playback halting abruptly after roughly 5-15 frames (the depth of the output buffer pool).
- Zombie Rendering: Rendering with `true` after the `Surface` has been destroyed by the OS. The symptom is a black screen on resume or an instantaneous `SIGSEGV` crash.
- Dirty Seeks: Executing a seek on the Extractor but failing to call `AMediaCodec_flush`. The symptom is the screen flashing old visual data or aggressive macroblock corruption.
- API Confusion: Attempting to invoke encoder-only APIs (like `createInputSurface`) on a playback decoder.
Laboratory Verification
Provision a 5-second MP4 and isolate the video track. Execute without audio.
You should witness this execution trace:
decoder created: video/avc
configure ok
first packet ptsUs=0
output format changed (the hardware reports the actual output geometry)
first frame rendered
eos received
If the first frame never appears, audit the pipeline in this strict sequence:
1. Did the format include `csd-0`/`csd-1` bytes?
2. Is the `ANativeWindow` valid and currently attached?
3. Did `queueInputBuffer` return `AMEDIA_OK`?
4. Is `dequeueOutputBuffer` permanently trapped in `TRY_AGAIN_LATER`?
5. Did you explicitly execute `releaseOutputBuffer(codec, index, true)`?
Engineering Risks and Telemetry
The decode stage depends entirely on keeping the input and output queues balanced. Telemetry must track:
- `input_dequeue_count`
- `input_queue_count`
- `output_dequeue_count`
- `output_release_count`
- `format_changed_count`
- `try_again_later_count`
- `codec_recreate_count`
If output_dequeue_count keeps pulling ahead of output_release_count, you are leaking hardware buffers.
If try_again_later_count climbs without bound, the decoder is starved of input, deadlocked, or suffering thread starvation.
If render=true occurs on a dead Surface, you have a critical concurrency race.
Error handling must be stratified:
- Transient dequeue timeout: log and retry the loop.
- Persistent configure failure: trigger a fatal error -> reset the player.
- Surface detached: suspend the render thread; wait for UI reattachment.
- Seek command: flush the codec -> clear the queues -> reset the master time.
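The queue-balance invariants above reduce to simple counter arithmetic that is cheap to check on every loop iteration. A sketch (field names mirror the telemetry list; the struct itself is an assumption, not an NDK type):

```cpp
#include <cassert>
#include <cstdint>

// Counters mirroring the telemetry list above.
struct DecodeTelemetry {
  int64_t output_dequeue_count = 0;
  int64_t output_release_count = 0;
  int64_t try_again_later_count = 0;

  // Output buffers currently held by the application. If this number keeps
  // growing, output buffers are leaking and the decoder will eventually stall.
  int64_t heldOutputBuffers() const {
    return output_dequeue_count - output_release_count;
  }
};
```

In practice you would snapshot heldOutputBuffers() periodically and alert when it exceeds the codec's output pool depth for a sustained interval, rather than on any momentary imbalance.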
Conclusion
AMediaCodec is not a "decode function"; it is a stateful hardware pipeline. Success demands flawless discipline regarding queue management, PTS passthrough, rigorous buffer releasing, and absolute alignment with the chaotic Android Surface lifecycle.