dlib/examples/ffmpeg_video_decoding_ex.cpp

// The contents of this file are in the public domain. See LICENSE_FOR_EXAMPLE_PROGRAMS.txt
/*

    This is an example illustrating the use of the ffmpeg wrappers, 
    in this case the decoding API.

    This is a pretty simple example. It loads a raw codec file, passes chunks of 
    data to the decoder and displays the decoded images in a GUI window.

    Background about video files:

        Using FFmpeg's terminology, a video/audio file has the following structure:
            - container:
                - stream 0
                - stream 1
                - stream ...
        A `container` is a file format like MP4, MP3, WAV.
        A `stream` is encoded media like video, audio (or subtitles)
        using a codec like H264, H265, VP9, AAC, AC3, etc.

        MP4 isn't a codec and H264 isn't (strictly speaking) a file format. 
        The former describes a packet structure for saving encoded streams to file. 
        It contains header information, trailer information, and describes how to 
        interleave multiple streams in a file.
        The latter is a protocol for compressing raw media streams into something smaller in size, 
        suitable for saving to file, transmitting over a network connection or adding
        to a `container` file.
        Note, FFmpeg treats network protocols like HTTP, RTMP, RTSP as containers.

        Dlib's dlib::ffmpeg::demuxer class reads `container` sources like MP4 or MP3 files or RTSP streams, 
        then extracts and decodes each stream.

        Dlib's dlib::ffmpeg::decoder class reads raw encoded DATA like H264 or PCM data
        and decodes it to images or audio frames.
*/

#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>
#include <dlib/media.h>
#include <dlib/gui_widgets.h>
#include <dlib/cmd_line_parser.h>

using namespace std;
using namespace dlib;

int main(const int argc, const char** argv)
try
{
    command_line_parser parser;
    parser.add_option("i",      "input video encoded stream. e.g. dlib/test/ffmpeg_data/MOT20-08-raw.h264", 1);
    parser.add_option("codec",  "codec name. e.g. h264", 1);
    parser.set_group_name("Help Options");
    parser.add_option("h",      "alias of --help");
    parser.add_option("help",   "display this message and exit");

    parser.parse(argc, argv);
    const char* one_time_opts[] = {"i"};
    parser.check_one_time_options(one_time_opts);

    if (parser.option("h") || parser.option("help"))
    {
        parser.print_options();
        return 0;
    }

    const std::string filepath = get_option(parser, "i", "");
    const std::string codec    = get_option(parser, "codec", "h264");

    image_window win;

    ffmpeg::decoder::args args;
    args.args_codec.codec_name = codec;

    ffmpeg::decoder dec(args);
    if (!dec.is_open())
    {
        printf("Failed to create decoder.\n");
        return EXIT_FAILURE;
    }

    dlib::ffmpeg::frame     frame;
    array2d<rgb_pixel>      img;
    ffmpeg::decoder_status  status{ffmpeg::DECODER_EAGAIN};

    // Drain every frame the decoder currently has ready and display the RGB images.
    const auto pull = [&]
    {
        while ((status = dec.read(frame)) == ffmpeg::DECODER_FRAME_AVAILABLE)
        {
            if (frame.is_image() && frame.pixfmt() == AV_PIX_FMT_RGB24)
            {
                convert(frame, img);
                win.set_image(img);
            }
        }
    };

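    std::ifstream fin{filepath, std::ios::binary};
    if (!fin)
    {
        printf("Failed to open input file '%s'.\n", filepath.c_str());
        return EXIT_FAILURE;
    }
    std::vector<char> buf(1024);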

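    // Feed the encoded data to the decoder in fixed-size chunks,
    // draining any decoded frames after each push.
    while (fin && status != ffmpeg::DECODER_CLOSED)
    {
        fin.read(buf.data(), buf.size());
        size_t ret = fin.gcount();
        dec.push_encoded(reinterpret_cast<const uint8_t*>(buf.data()), ret);
        pull();
    }

    // Signal end of input, then drain the frames still held by the decoder.
    dec.flush();
    pull();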

    return EXIT_SUCCESS;
}
catch (const std::exception& e)
{
    printf("%s\n", e.what());
    return EXIT_FAILURE;
}
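
The push/pull loop in the example above can be tried without FFmpeg or dlib. The following is a minimal sketch under stated assumptions: `toy_status`, `toy_decoder` and `count_frames` are hypothetical names invented for illustration and are not part of dlib. The toy "codec" simply treats every `frame_size` input bytes as one frame, whereas a real decoder deals with variable-size packets and internal delay. What the sketch keeps is the control flow: push a chunk, drain until the decoder asks for more data, and after the last chunk flush and drain once more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical statuses, loosely modelled on the three used in the example.
enum class toy_status { again, frame_available, closed };

// Hypothetical stand-in for a decoder: every `frame_size` bytes form one "frame".
class toy_decoder
{
public:
    explicit toy_decoder(std::size_t frame_size) : frame_size_(frame_size)
    {
        assert(frame_size > 0);
    }

    // Buffer a chunk of input; chunk boundaries need not align with frames.
    bool push_encoded(const std::uint8_t* data, std::size_t n)
    {
        if (flushed_) return false;
        pending_.insert(pending_.end(), data, data + n);
        return true;
    }

    // Signal end of input. Any trailing partial frame is emitted as a short frame.
    void flush() { flushed_ = true; }

    // Return one frame if available, otherwise ask for more input or report closed.
    toy_status read(std::vector<std::uint8_t>& frame)
    {
        const bool full = pending_.size() >= frame_size_;
        const bool tail = flushed_ && !pending_.empty();
        if (!full && !tail)
            return flushed_ ? toy_status::closed : toy_status::again;

        const std::size_t n = full ? frame_size_ : pending_.size();
        const auto last = pending_.begin() + static_cast<std::ptrdiff_t>(n);
        frame.assign(pending_.begin(), last);
        pending_.erase(pending_.begin(), last);
        return toy_status::frame_available;
    }

private:
    std::size_t frame_size_;
    std::vector<std::uint8_t> pending_;
    bool flushed_ = false;
};

// Feed `in` to the toy decoder in `chunk_size` pieces and count the frames produced.
std::size_t count_frames(std::istream& in, std::size_t chunk_size, std::size_t frame_size)
{
    assert(chunk_size > 0);
    toy_decoder dec(frame_size);
    std::vector<std::uint8_t> frame;
    std::vector<char> buf(chunk_size);
    std::size_t nframes = 0;
    toy_status status = toy_status::again;

    // Drain every frame that is currently ready.
    const auto pull = [&] {
        while ((status = dec.read(frame)) == toy_status::frame_available)
            ++nframes;
    };

    while (in && status != toy_status::closed)
    {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const auto got = static_cast<std::size_t>(in.gcount()); // may be short on the last read
        dec.push_encoded(reinterpret_cast<const std::uint8_t*>(buf.data()), got);
        pull();
    }

    dec.flush(); // no more input: release whatever is still buffered
    pull();
    return nframes;
}
```

The final `flush()` followed by one more drain matters here: in this sketch, any bytes still buffered when the input ends are only released once the decoder is told that no more data is coming.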