Commit ae5853e2 authored by Stefan Westerfeld

Merge branch 'streaming'

parents 5c6b6c71 d16e67c1
Overview of Changes in audiowmark-0.2.0:
* support input/output streams
* support raw streams
* some performance optimizations
* unified logging and --quiet option
* improved mp3 detection to avoid false positives
* split up watermarking source (wmadd/wmget/wmcommon)
Overview of Changes in audiowmark-0.1.0:
* initial release
......@@ -150,9 +150,116 @@ watermark. Fractional strengths (like 7.5) are possible.
audiowmark add --strength 15 in.wav out.wav 0123456789abcdef0011223344556677
audiowmark get --strength 15 out.wav
== Output as Stream
Usually, an input file is read, watermarked and an output file is written.
This means that it takes some time before the watermarked file can be used.
An alternative is to output the watermarked file as a stream to stdout. One use
case is sending the watermarked file to a user via network while the
watermarker is still working on the rest of the file. Here is an example of how to
watermark a wav file to stdout:
audiowmark add in.wav - 0123456789abcdef0011223344556677 | play -
In this case the file in.wav is read, watermarked, and the output is sent
to stdout. The "play -" can start playing the watermarked stream while the
rest of the file is being watermarked.
If - is used as output, the output is a valid .wav file, so the programs
running after `audiowmark` will be able to determine sample rate, number of
channels, bit depth, encoding and so on from the wav header.
Note that all input formats supported by audiowmark can be used in this way,
for instance flac/mp3:
audiowmark add in.flac - 0123456789abcdef0011223344556677 | play -
audiowmark add in.mp3 - 0123456789abcdef0011223344556677 | play -
== Input from Stream
Similar to the output, the `audiowmark` input can be a stream. In this case,
the input must be a valid .wav file. The watermarker will be able to
start watermarking the input stream before all data is available. An
example would be:
cat in.wav | audiowmark add - out.wav 0123456789abcdef0011223344556677
It is possible to combine both, input from a stream and output as a stream:
cat in.wav | audiowmark add - - 0123456789abcdef0011223344556677 | play -
Streaming input is also supported for watermark detection.
cat in.wav | audiowmark get -
== Raw Streams
So far, all streams described here are essentially wav streams, which means
that the wav header allows `audiowmark` to determine sample rate, number of
channels, bit depth, encoding and so forth from the stream itself, and that a
wav header is written for the program after `audiowmark`, so that it can
figure out the parameters of the stream.
There are two cases where this is problematic. The first case is if the full
length of the stream is not known at the time processing starts. Then a wav
header cannot be used, as the wav file contains the length of the stream. The
second case is that the program before or after `audiowmark` doesn't support wav
headers.
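To illustrate the first case, here is a minimal sketch of the canonical 44 byte PCM wav header. The helper names (`put_le`, `put_tag`, `make_wav_header`) are made up for this example, and the header that `audiowmark` actually writes may be laid out differently. The point is that two fields depend on the total amount of sample data, so the number of frames has to be known before the first byte is written.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// append an integer in little endian byte order, as used by RIFF/WAVE
void
put_le (std::vector<unsigned char>& out, uint32_t value, int n_bytes)
{
  for (int i = 0; i < n_bytes; i++)
    out.push_back (static_cast<unsigned char> ((value >> (8 * i)) & 0xff));
}

// append a four character chunk identifier
void
put_tag (std::vector<unsigned char>& out, const std::string& tag)
{
  out.insert (out.end(), tag.begin(), tag.end());
}

// Build a minimal PCM wav header (illustrative helper, not audiowmark code).
std::vector<unsigned char>
make_wav_header (int n_channels, int sample_rate, int bit_depth, uint32_t n_frames)
{
  assert (bit_depth % 8 == 0);

  const uint32_t frame_bytes = n_channels * (bit_depth / 8);
  const uint32_t data_bytes  = n_frames * frame_bytes; // needs n_frames up front

  std::vector<unsigned char> header;
  put_tag (header, "RIFF");
  put_le  (header, 36 + data_bytes, 4);           // size of everything after this field
  put_tag (header, "WAVE");
  put_tag (header, "fmt ");
  put_le  (header, 16, 4);                        // size of the fmt chunk
  put_le  (header, 1, 2);                         // format tag 1 = integer PCM
  put_le  (header, n_channels, 2);
  put_le  (header, sample_rate, 4);
  put_le  (header, sample_rate * frame_bytes, 4); // bytes per second
  put_le  (header, frame_bytes, 2);               // block align
  put_le  (header, bit_depth, 2);
  put_tag (header, "data");
  put_le  (header, data_bytes, 4);                // length of the sample data
  return header;
}
```

For example, `make_wav_header (2, 44100, 16, n_frames)` describes a 16 bit stereo stream at 44100 Hz, and both size fields change with `n_frames`.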
For these two cases, raw streams are available. The idea is to manually set
all information that is needed, such as sample rate and number of channels.
Then, headerless data can be processed from stdin and/or sent to stdout.
--input-format raw::
--output-format raw::
--format raw::
These can be used to set the input format or output format to raw. The
last variant sets both the input and the output format to raw.
--raw-rate <rate>::
This should be used to set the sample rate. The input sample rate and
the output sample rate will always be the same (no resampling is
done by the watermarker). There is no default for the sampling rate,
so this parameter must always be specified for raw streams.
--raw-input-bits <bits>::
--raw-output-bits <bits>::
--raw-bits <bits>::
These options can be used to set the number of bits for the input, the
output, or both. The number of bits can either be `16` or `24`. The default
number of bits is `16`.
--raw-input-endian <endian>::
--raw-output-endian <endian>::
--raw-endian <endian>::
These options can be used to set the input/output endianness or both.
The <endian> parameter can either be `little` or `big`. The default
endianness is `little`.
--raw-input-encoding <encoding>::
--raw-output-encoding <encoding>::
--raw-encoding <encoding>::
These options can be used to set the input/output encoding or both.
The <encoding> parameter can either be `signed` or `unsigned`. The
default encoding is `signed`.
--raw-channels <channels>::
This can be used to set the number of channels. Note that the number
of input channels and the number of output channels must always be the
same. The watermarker has been designed and tested for stereo files,
so the number of channels should really be `2`. This is also the
default.
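To make the meaning of the bits, endianness and encoding parameters concrete, here is a minimal sketch of how headerless sample data could be turned into floating point values. The names `Endian`, `Encoding` and `decode_raw` are hypothetical, and scaling by 2^(bits-1) is an assumption; the converter in `audiowmark` may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not taken from the audiowmark sources.
enum class Endian   { LITTLE, BIG };
enum class Encoding { SIGNED, UNSIGNED };

// Convert headerless PCM bytes into floats in the range [-1, 1).
// bits must be 16 or 24, the two values accepted by --raw-bits.
std::vector<float>
decode_raw (const std::vector<unsigned char>& bytes, int bits, Endian endian, Encoding encoding)
{
  assert (bits == 16 || bits == 24);

  const size_t  width = bits / 8;                  // bytes per sample
  const int32_t half  = int32_t (1) << (bits - 1); // 32768 or 8388608

  std::vector<float> samples;
  for (size_t pos = 0; pos + width <= bytes.size(); pos += width)
    {
      // assemble the sample value, most significant byte first
      uint32_t value = 0;
      for (size_t i = 0; i < width; i++)
        {
          const size_t index = (endian == Endian::BIG) ? i : width - 1 - i;
          value = (value << 8) | bytes[pos + index];
        }

      // unsigned data is offset by half the range, signed data is two's complement
      int32_t centered = int32_t (value);
      if (encoding == Encoding::UNSIGNED)
        centered -= half;
      else if (centered >= half)
        centered -= 2 * half;

      samples.push_back (float (centered) / float (half));
    }
  return samples;
}
```

With the defaults described above (16 bits, little endian, signed), the two bytes `0x00 0x40` decode to `0.5` in this sketch. Because the channels are interleaved, a stereo stream yields the left and right values alternately.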
== Dependencies
If you compile from source, audiowmark needs the following libraries:
If you compile from source, `audiowmark` needs the following libraries:
* libfftw3
* libsndfile
......@@ -162,7 +269,7 @@ If you compile from source, audiowmark needs the following libraries:
== Building fftw
audiowmark needs the single precision variant of fftw3.
`audiowmark` needs the single precision variant of fftw3.
If you are building fftw3 from source, use the `--enable-float`
configure parameter to build it, e.g.::
......@@ -181,7 +288,7 @@ or, when building from git
== Docker Build
You should be able to execute audiowmark via Docker.
You should be able to execute `audiowmark` via Docker.
Example that outputs the usage message:
docker build -t audiowmark .
......
AC_INIT([audiowmark], [0.1.0])
AC_INIT([audiowmark], [0.2.0])
AC_CONFIG_SRCDIR([src/audiowmark.cc])
AC_CONFIG_AUX_DIR([build-aux])
AC_CONFIG_MACRO_DIR([m4])
......
......@@ -6,3 +6,4 @@ audiowmark
testconvcode
testmp3
testrandom
teststream
bin_PROGRAMS = audiowmark
COMMON_SRC = utils.hh utils.cc convcode.hh convcode.cc random.hh random.cc mp3.cc mp3.hh wavdata.cc wavdata.hh
COMMON_SRC = utils.hh utils.cc convcode.hh convcode.cc random.hh random.cc wavdata.cc wavdata.hh \
audiostream.cc audiostream.hh sfinputstream.cc sfinputstream.hh stdoutwavoutputstream.cc stdoutwavoutputstream.hh \
sfoutputstream.cc sfoutputstream.hh rawinputstream.cc rawinputstream.hh rawoutputstream.cc rawoutputstream.hh \
rawconverter.cc rawconverter.hh mp3inputstream.cc mp3inputstream.hh wmcommon.cc wmcommon.hh fft.cc fft.hh
COMMON_LIBS = $(SNDFILE_LIBS) $(FFTW_LIBS) $(LIBGCRYPT_LIBS) $(LIBMPG123_LIBS)
audiowmark_SOURCES = audiowmark.cc fft.cc fft.hh $(COMMON_SRC)
audiowmark_SOURCES = audiowmark.cc wmget.cc wmadd.cc $(COMMON_SRC)
audiowmark_LDFLAGS = $(COMMON_LIBS)
noinst_PROGRAMS = testconvcode testrandom testmp3
noinst_PROGRAMS = testconvcode testrandom testmp3 teststream
testconvcode_SOURCES = testconvcode.cc $(COMMON_SRC)
testconvcode_LDFLAGS = $(COMMON_LIBS)
......@@ -16,3 +19,6 @@ testrandom_LDFLAGS = $(COMMON_LIBS)
testmp3_SOURCES = testmp3.cc $(COMMON_SRC)
testmp3_LDFLAGS = $(COMMON_LIBS)
teststream_SOURCES = teststream.cc $(COMMON_SRC)
teststream_LDFLAGS = $(COMMON_LIBS)
possible improvements:
- dynamic bit strength
- mp3 support with libmad
streaming:
- final documentation
- merge mp3 code
- fix virtual constructor FIXMEs
#include "audiostream.hh"
#include "wmcommon.hh"
#include "sfinputstream.hh"
#include "sfoutputstream.hh"
#include "mp3inputstream.hh"
#include "rawconverter.hh"
#include "rawoutputstream.hh"
#include "stdoutwavoutputstream.hh"
using std::string;
AudioStream::~AudioStream()
{
}
std::unique_ptr<AudioInputStream>
AudioInputStream::create (const string& filename, Error& err)
{
std::unique_ptr<AudioInputStream> in_stream;
if (Params::input_format == Format::AUTO)
{
SFInputStream *sistream = new SFInputStream();
in_stream.reset (sistream);
err = sistream->open (filename);
if (err && MP3InputStream::detect (filename))
{
MP3InputStream *mistream = new MP3InputStream();
in_stream.reset (mistream);
err = mistream->open (filename);
if (err)
return nullptr;
}
else if (err)
return nullptr;
}
else
{
RawInputStream *ristream = new RawInputStream();
in_stream.reset (ristream);
err = ristream->open (filename, Params::raw_input_format);
if (err)
return nullptr;
}
return in_stream;
}
std::unique_ptr<AudioOutputStream>
AudioOutputStream::create (const string& filename, int n_channels, int sample_rate, int bit_depth, size_t n_frames, Error& err)
{
std::unique_ptr<AudioOutputStream> out_stream;
if (Params::output_format == Format::RAW)
{
RawOutputStream *rostream = new RawOutputStream();
out_stream.reset (rostream);
err = rostream->open (filename, Params::raw_output_format);
if (err)
return nullptr;
}
else if (filename == "-")
{
StdoutWavOutputStream *swstream = new StdoutWavOutputStream();
out_stream.reset (swstream);
err = swstream->open (n_channels, sample_rate, bit_depth, n_frames);
if (err)
return nullptr;
}
else
{
SFOutputStream *sfostream = new SFOutputStream();
out_stream.reset (sfostream);
err = sfostream->open (filename, n_channels, sample_rate, bit_depth, n_frames);
if (err)
return nullptr;
}
return out_stream;
}
#ifndef AUDIOWMARK_AUDIO_STREAM_HH
#define AUDIOWMARK_AUDIO_STREAM_HH
#include <vector>
#include <memory>
#include "utils.hh"
class AudioStream
{
public:
virtual int bit_depth() const = 0;
virtual int sample_rate() const = 0;
virtual int n_channels() const = 0;
virtual ~AudioStream();
};
class AudioInputStream : public AudioStream
{
public:
static std::unique_ptr<AudioInputStream> create (const std::string& filename, Error& err);
// for streams that do not know the number of frames in advance (i.e. raw input stream)
static constexpr size_t N_FRAMES_UNKNOWN = ~size_t (0);
virtual size_t n_frames() const = 0;
virtual Error read_frames (std::vector<float>& samples, size_t count) = 0;
};
class AudioOutputStream : public AudioStream
{
public:
static std::unique_ptr<AudioOutputStream> create (const std::string& filename,
int n_channels, int sample_rate, int bit_depth, size_t n_frames, Error& err);
virtual Error write_frames (const std::vector<float>& frames) = 0;
virtual Error close() = 0;
};
#endif /* AUDIOWMARK_AUDIO_STREAM_HH */
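The interface above lets a caller process input block by block, even if `n_frames()` reports `N_FRAMES_UNKNOWN`. The following sketch shows such a read loop against a simplified stand-in class. `MemoryInputStream` and `count_frames` are hypothetical, error handling is left out, and treating an empty block as end of stream is an assumption about the calling convention.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-in for AudioInputStream (hypothetical, without Error handling).
class MemoryInputStream
{
  std::vector<float> m_samples;
  size_t             m_pos = 0;
  int                m_n_channels;
public:
  static constexpr size_t N_FRAMES_UNKNOWN = ~size_t (0);

  MemoryInputStream (const std::vector<float>& samples, int n_channels) :
    m_samples (samples), m_n_channels (n_channels)
  {
  }

  // behave like a raw stream: the length is not known in advance
  size_t n_frames() const { return N_FRAMES_UNKNOWN; }

  // deliver up to count frames; an empty result signals the end of the stream
  void
  read_frames (std::vector<float>& samples, size_t count)
  {
    const size_t n_values = std::min (count * m_n_channels, m_samples.size() - m_pos);
    samples.assign (m_samples.begin() + m_pos, m_samples.begin() + m_pos + n_values);
    m_pos += n_values;
  }
};

// Count frames by reading fixed size blocks until the stream is exhausted.
size_t
count_frames (MemoryInputStream& in, int n_channels, size_t block_frames)
{
  size_t total = 0;
  std::vector<float> block;
  do
    {
      in.read_frames (block, block_frames);
      total += block.size() / n_channels;
    }
  while (!block.empty());
  return total;
}
```

A loop of this shape never needs the total length, which is what makes processing of raw input with unknown length possible.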
#ifndef AUDIOWMARK_MP3_HH
#define AUDIOWMARK_MP3_HH
#include <string>
#include "wavdata.hh"
bool mp3_detect (const std::string& filename);
std::string mp3_load (const std::string& filename, WavData& wav_data);
#endif /* AUDIOWMARK_MP3_HH */
#include "mp3.hh"
#include "mp3inputstream.hh"
#include <mpg123.h>
#include <assert.h>
#include <stdio.h>
#include <vector>
using std::vector;
using std::min;
using std::string;
struct ScopedMHandle
{
mpg123_handle *mh = nullptr;
bool need_close = false;
~ScopedMHandle()
{
if (mh && need_close)
mpg123_close (mh);
if (mh)
mpg123_delete (mh);
}
};
void
static void
mp3_init()
{
static bool mpg123_init_ok = false;
......@@ -32,152 +15,238 @@ mp3_init()
int err = mpg123_init();
if (err != MPG123_OK)
{
fprintf (stderr, "audiowmark: init mpg123 lib failed\n");
error ("audiowmark: init mpg123 lib failed\n");
exit (1);
}
mpg123_init_ok = true;
}
}
/* there is no really simple way of detecting if something is an mp3
*
* so we try to decode a few frames; if that works without error the
* file is probably a valid mp3
*/
bool
mp3_detect (const string& filename)
MP3InputStream::~MP3InputStream()
{
close();
}
Error
MP3InputStream::open (const string& filename)
{
int err = 0;
mp3_init();
mpg123_handle *mh = mpg123_new (NULL, &err);
m_handle = mpg123_new (nullptr, &err);
if (err != MPG123_OK)
return false;
return Error ("mpg123_new failed");
auto smh = ScopedMHandle { mh }; // cleanup on return
err = mpg123_param (m_handle, MPG123_ADD_FLAGS, MPG123_QUIET, 0);
if (err != MPG123_OK)
return Error ("setting quiet mode failed");
err = mpg123_param (mh, MPG123_ADD_FLAGS, MPG123_QUIET, 0);
// allow arbitrary amount of data for resync
err = mpg123_param (m_handle, MPG123_RESYNC_LIMIT, -1, 0);
if (err != MPG123_OK)
return false;
return Error ("setting resync limit parameter failed");
err = mpg123_open (mh, filename.c_str());
// force floating point output
{
const long *rates;
size_t rate_count;
mpg123_format_none (m_handle);
mpg123_rates (&rates, &rate_count);
for (size_t i = 0; i < rate_count; i++)
{
err = mpg123_format (m_handle, rates[i], MPG123_MONO|MPG123_STEREO, MPG123_ENC_FLOAT_32);
if (err != MPG123_OK)
return Error (mpg123_strerror (m_handle));
}
}
err = mpg123_open (m_handle, filename.c_str());
if (err != MPG123_OK)
return false;
return Error (mpg123_strerror (m_handle));
smh.need_close = true;
m_need_close = true;
/* scan headers to get best possible length estimate */
err = mpg123_scan (m_handle);
if (err != MPG123_OK)
return Error (mpg123_strerror (m_handle));
long rate;
int channels;
int encoding;
err = mpg123_getformat (mh, &rate, &channels, &encoding);
err = mpg123_getformat (m_handle, &rate, &channels, &encoding);
if (err != MPG123_OK)
return false;
return Error (mpg123_strerror (m_handle));
size_t buffer_bytes = mpg123_outblock (mh);
unsigned char buffer[buffer_bytes];
/* ensure that the format will not change */
mpg123_format_none (m_handle);
mpg123_format (m_handle, rate, channels, encoding);
m_n_values = mpg123_length (m_handle) * channels;
m_n_channels = channels;
m_sample_rate = rate;
m_frames_left = m_n_values / m_n_channels;
for (size_t i = 0; i < 10; i++)
return Error::Code::NONE;
}
Error
MP3InputStream::read_frames (std::vector<float>& samples, size_t count)
{
while (!m_eof && m_read_buffer.size() < count * m_n_channels)
{
size_t buffer_bytes = mpg123_outblock (m_handle);
assert (buffer_bytes % sizeof (float) == 0);
std::vector<float> buffer (buffer_bytes / sizeof (float));
size_t done;
err = mpg123_read (mh, buffer, buffer_bytes, &done);
if (err == MPG123_DONE)
int err = mpg123_read (m_handle, reinterpret_cast<unsigned char *> (&buffer[0]), buffer_bytes, &done);
if (err == MPG123_OK)
{
return true;
const size_t n_values = done / sizeof (float);
m_read_buffer.insert (m_read_buffer.end(), buffer.begin(), buffer.begin() + n_values);
}
else if (err != MPG123_OK)
else if (err == MPG123_DONE)
{
return false;
m_eof = true;
}
else if (err == MPG123_NEED_MORE)
{
// some mp3s have this error before reaching eof -> harmless
m_eof = true;
}
else
{
return Error (mpg123_strerror (m_handle));
}
}
return true;
/* pad zero samples at end if necessary to match the number of frames we promised to deliver */
if (m_eof && m_read_buffer.size() < m_frames_left * m_n_channels)
m_read_buffer.resize (m_frames_left * m_n_channels);
/* never read past the promised number of frames */
if (count > m_frames_left)
count = m_frames_left;
const auto begin = m_read_buffer.begin();
const auto end = begin + min (count * m_n_channels, m_read_buffer.size());
samples.assign (begin, end);
m_read_buffer.erase (begin, end);
m_frames_left -= count;
return Error::Code::NONE;
}
void
MP3InputStream::close()
{
if (m_state == State::OPEN)
{
if (m_handle && m_need_close)
mpg123_close (m_handle);
if (m_handle)
{
mpg123_delete (m_handle);
m_handle = nullptr;
}
m_state = State::CLOSED;
}
}
string
mp3_load (const string& filename, WavData& wav_data)
int
MP3InputStream::bit_depth() const
{
return 24; /* mp3 decoder is running on floats */
}
int
MP3InputStream::sample_rate() const
{
return m_sample_rate;
}
int
MP3InputStream::n_channels() const
{
return m_n_channels;
}
size_t
MP3InputStream::n_frames() const
{
return m_n_values / m_n_channels;
}
/* there is no really simple way of detecting if something is an mp3
*
* so we try to decode a few frames; if that works without error the
* file is probably a valid mp3
*/
bool
MP3InputStream::detect (const string& filename)
{
struct ScopedMHandle
{
mpg123_handle *mh = nullptr;
bool need_close = false;
~ScopedMHandle()
{
if (mh && need_close)
mpg123_close (mh);
if (mh)
mpg123_delete (mh);
}
};
int err = 0;
mp3_init();
mpg123_handle *mh = mpg123_new (NULL, &err);
if (err != MPG123_OK)
return "mpg123_new failed";
return false;
auto smh = ScopedMHandle { mh }; // cleanup on return
err = mpg123_param (mh, MPG123_ADD_FLAGS, MPG123_QUIET, 0);
if (err != MPG123_OK)
return "setting quiet mode failed";
// allow arbitrary amount of data for resync
err = mpg123_param (mh, MPG123_RESYNC_LIMIT, -1, 0);
if (err != MPG123_OK)
return "setting resync limit parameter failed";
// force floating point output
{
const long *rates;
size_t rate_count;
mpg123_format_none (mh);
mpg123_rates (&rates, &rate_count);
for (size_t i = 0; i < rate_count; i++)
{
err = mpg123_format (mh, rates[i], MPG123_MONO|MPG123_STEREO, MPG123_ENC_FLOAT_32);
if (err != MPG123_OK)
return mpg123_strerror (mh);
}
}
return false;
err = mpg123_open (mh, filename.c_str());
if (err != MPG123_OK)
return mpg123_strerror (mh);
return false;
smh.need_close = true;
long rate;
int channels;
int encoding;
err = mpg123_getformat (mh, &rate, &channels, &encoding);
if (err != MPG123_OK)
return mpg123_strerror (mh);
/* ensure that the format will not change */
mpg123_format_none (mh);
mpg123_format (mh, rate, channels, encoding);
return false;
size_t buffer_bytes = mpg123_outblock (mh);
assert (buffer_bytes % sizeof (float) == 0);
vector<float> buffer (buffer_bytes / sizeof (float));
vector<float> samples;
unsigned char buffer[buffer_bytes];
while (true)
for (size_t i = 0; i < 30; i++)
{
size_t done = 0;
err = mpg123_read (mh, reinterpret_cast<unsigned char *> (&buffer[0]), buffer_bytes, &done);
if (err == MPG123_OK)
{
const size_t n_values = done / sizeof (float);
samples.insert (samples.end(), buffer.begin(), buffer.begin() + n_values);
}
else if (err == MPG123_DONE)
{
wav_data = WavData (samples, channels, rate, 24);
return ""; /* success */
}
else if (err == MPG123_NEED_MORE)
size_t done;
err = mpg123_read (mh, buffer, buffer_bytes, &done);
if (err == MPG123_DONE)
{
// some mp3s have this error before reaching eof -> harmless
return true;
}
else
else if (err != MPG123_OK)
{
return mpg123_strerror (mh);
return false;
}
}
return true;
}
#ifndef AUDIOWMARK_MP3_INPUT_STREAM_HH
#define AUDIOWMARK_MP3_INPUT_STREAM_HH
#include <string>
#include <mpg123.h>
#include "audiostream.hh"
class MP3InputStream : public AudioInputStream
{
enum class State {
NEW,
OPEN,
CLOSED
};
int m_n_values = 0;
int m_n_channels = 0;
int m_sample_rate = 0;
size_t m_frames_left = 0;
bool m_need_close = false;
bool m_eof = false;
State m_state = State::NEW;
mpg123_handle *m_handle = nullptr;
std::vector<float> m_read_buffer;
public:
~MP3InputStream();
Error open (const std::string& filename);
Error read_frames (std::vector<float>& samples, size_t count);
void close();
int bit_depth() const override;
int sample_rate() const override;
int n_channels() const override;
size_t n_frames() const override;
static bool detect (const std::string& filename);
};
#endif /* AUDIOWMARK_MP3_INPUT_STREAM_HH */
......@@ -20,7 +20,7 @@ gcrypt_init()
/* version check: start libgcrypt initialization */
if (!gcry_check_version (GCRYPT_VERSION))
{
fprintf (stderr, "audiowmark: libgcrypt version mismatch\n");
error ("audiowmark: libgcrypt version mismatch\n");
exit (1);
}