Table of Contents
Cipher modes usage is usually split to three stages:
- Initialization. Implicit stage with creation of accumulator to be used.
- Accumulation. Performed one or more times. Calling update several times is equivalent to calling it once with all of the arguments concatenated.
- Finalization. Accumulated hash data is required to be finalized, padded and prepared to be retrieved by user.
Architecture Overview
Hashes library architecture consists of several parts listed below:
- Algorithms
- Stream Processors
- Constructions and Compressors
- Accumulators
- Value Processors
Algorithms
Implementation of a library is considered to be highly compliant with STL. So the crucial point is to have ciphers to be usable in the same way as STL algorithms do.
STL algorithms library mostly consists of generic iterator and since C++20 range-based algorithms over generic concept-compliant types. Great example is std::transform
algorithm:
Input values of type InputIterator
operate over any iterable range, no matter which particular type is supposed to be processed. While OutputIterator
provides a type-independent output place for the algorithm to put results no matter which particular range this OutputIterator
represents.
Since C++20 this algorithm got it analogous inside Ranges library as follows:
This particular modification takes no difference if InputRange
is a Container
or something else. The algorithm is generic just as data representation types are.
As much as such algorithms are implemented as generic ones, hash algorithms should follow that too:
Hash
is a policy type which represents the particular hash will be used. InputIterator
represents the input data coming to be hashed. OutputIterator
is exactly the same as it was in std::transform
algorithm - it handles all the output storage operations.
The most obvious difference between std::transform
is a representation of a policy defining the particular behaviour of an algorithm. std::transform
proposes to pass it as a reference to Functor
, which is also possible in case of Hash
policy used in function already pre-scheduled:
Algorithms are no more than an internal structures initializer wrapper. In this particular case algorithm would initialize stream processor fed with accumulator set with `hash` accumulator inside initialized with Hash
.
Stream Data Processing
Hashes are usually defined for processing Integral
value typed byte sequences of specific size packed in blocks (e.g. sha2 is defined for blocks of words which are actually plain n
-sized arrays of uint32_t
). Input data in the implementation proposed is supposed to be a various-length input stream, which length could be not even to block size.
This requires an introduction of stream processor specified with particular parameter set unique for each `Hash` type, which takes input data stream and gets it split to blocks filled with converted to appropriate size integers (words in the cryptography meaning, not machine words).
Example. Lets assume input data stream consists of 16 bytes as follows.
Lets assume the selected cipher to be used is Rijndael with 32 bit word size, 128 bit block size and 128 bit key size. This means input data stream needs to be converted to 32 bit words and merged to 128 bit blocks as follows:
Now with this a `Hash` instance of sha2 can be fed.
This mechanism is handled with stream_processor
template class specified for each particular cipher with parameters required. Hashes suppose only one type of stream processor exist - the one which split the data to blocks, converts them and passes to AccumulatorSet
reference as cipher input of format required. The rest of data not even to block size gets converted too and fed value by value to the same AccumulatorSet
reference.
Data Type Conversion
Since block cipher algorithms are usually defined for Integral
types or byte sequences of unique format for each cipher, encryption function being generic requirement should be handled with particular cipher-specific input data format converter.
For example Rijndael
cipher is defined over blocks of 32 bit words, which could be represented with uint32_t
. This means all the input data should be in some way converted to 4 byte sized Integral
type. In case of InputIterator
is defined over some range of Integral
value type, this is is handled with plain byte repack as shown in previous section. This is a case with both input stream and required data format are satisfy the same concept.
The more case with input data being presented by sequence of various type T
requires for the T
to has conversion operator operator Integral()
to the type required by particular BlockCipher
policy.
Example. Let us assume the following class is presented:
Now let us assume there exists an initialized and filled with random values
To feed the BlockCipher
with the data presented, it is required to convert A
to Integral
type which is only available if A
has conversion operator in some way as follows:
This part is handled internally with stream_processor
configured for each particular cipher.