Hash usage is usually split into three stages:

  1. Initialization. An implicit stage that creates the accumulator to be used.
  2. Accumulation. Performed one or more times. Calling update several times is equivalent to calling it once with all of the arguments concatenated.
  3. Finalization. The accumulated hash data is finalized: padded and prepared to be retrieved by the user.

Architecture Overview

The hashes library architecture consists of several parts, listed below:

  1. Algorithms
  2. Stream Processors
  3. Hash Policies
  4. Constructions and Compressors
  5. Accumulators
  6. Value Processors
The library implementation is designed to be highly compliant with the STL, so the crucial point is for hashes to be usable in the same way as STL algorithms.

The STL algorithms library mostly consists of generic iterator-based (and, since C++20, range-based) algorithms over concept-compliant types. A great example is the std::transform algorithm:

template<typename InputIterator, typename OutputIterator, typename UnaryOperation>
OutputIterator transform(InputIterator first, InputIterator last, OutputIterator out, UnaryOperation unary_op);

The InputIterator parameters let the algorithm operate over any iterable range, no matter which particular type is being processed, while OutputIterator provides a type-independent place for the algorithm to put its results, no matter which particular range that OutputIterator represents.

Since C++20 this algorithm has an analogue inside the Ranges library, roughly as follows:

template<typename InputRange, typename OutputRange, typename UnaryOperation>
OutputRange transform(InputRange rng, OutputRange out, UnaryOperation unary_op);

For this variant it makes no difference whether InputRange is a Container or something else: the algorithm is generic, just as the data representation types are.

Just as such algorithms are implemented generically, hash algorithms should follow suit:

template<typename Hash, typename InputIterator, typename OutputIterator>
OutputIterator hash(InputIterator first, InputIterator last, OutputIterator out);

Hash is a policy type which selects the particular hash to be used. InputIterator iterates over the input data to be hashed. OutputIterator is exactly the same as in the std::transform algorithm: it handles all the output storage operations.

The most obvious difference from std::transform is the representation of the policy that defines the particular behaviour of the algorithm. std::transform passes it as a reference to a Functor, which is also possible for the Hash policy in a function with the policy already pre-scheduled:

template<typename Hash, typename InputIterator, typename OutputIterator>
OutputIterator hash(InputIterator first, InputIterator last, OutputIterator out);

Algorithms are no more than wrappers that initialize internal structures. In this particular case, the algorithm would initialize a stream processor fed by an accumulator set containing a `hash` accumulator initialized with Hash.

Stream Data Processing

Hashes are usually defined to process byte sequences of Integral value type, packed in fixed-size blocks (e.g. `SHA2` is defined for blocks of words, which are actually plain n-sized arrays of uint32_t). In the proposed implementation, input data is a variable-length input stream whose length may not be a multiple of the block size.

This requires introducing a stream processor, specified with a parameter set unique to each `Hash` type, which takes the input data stream and splits it into blocks filled with integers converted to the appropriate size (words in the cryptographic sense, not machine words).

Example. Let's assume the input data stream consists of 16 bytes.

Let's assume the selected hash is SHA2, with a 32-bit word size and a 512-bit block size. This means the input data stream needs to be converted to 32-bit words and merged into 512-bit blocks.

Now a `SHA2` `Hash` instance can be fed with this data.

This mechanism is handled by the stream_processor template class, specialized for each particular hash with the required parameters. Hashes assume only one type of stream processor exists: the one which splits the data into blocks, converts them, and passes them to an AccumulatorSet reference as hash input in the required format. Any remaining data that does not fill a whole block is converted too and fed value by value to the same AccumulatorSet reference.

Data Type Conversion

Since hash algorithms are usually defined for Integral types or byte sequences whose format is unique to each hash, the requirement that the function be generic has to be handled by a hash-specific input data format converter.

For example, the SHA2 hash is defined over blocks of 32-bit words, which can be represented with uint32_t. This means all input data should be converted in some way to a 4-byte Integral type. When InputIterator is defined over a range of Integral value type, this is handled with a plain byte repack, as shown in the previous section. This is the case where both the input stream and the required data format satisfy the same concept.

The more complex case, with input data presented as a sequence of some type T, requires T to have a conversion operator operator Integral() to the type required by the particular Hash policy.

Example. Let us assume the following class is presented:

class A {
    std::size_t vals;
    std::uint16_t val16;
    char valc;
};
Now let us assume there exists an initialized and filled with random values SequenceContainer of value type A:

std::vector<A> a;

To feed the Hash with the data presented, A has to be converted to an Integral type, which is only possible if A has a conversion operator, for example as follows:

class A {
public:
    operator uint128_t() const {
        return (uint128_t(vals) << (3U * CHAR_BIT)) | (uint128_t(val16) << 16) | valc;
    }

private:
    std::size_t vals;
    std::uint16_t val16;
    char valc;
};

This part is handled internally by the stream_processor configured for each particular hash.

Hash Policies

Architecturally, hash policies are completely stateless. They are required to be compliant with the `Hash` concept, and thus a policy has to contain all the data corresponding to the hash as defined in the `Hash` concept.

Among other things, a hash policy should contain information about its compressor and construction. For example, SHA2 uses the Merkle-Damgard construction and the Davies-Meyer compressor.

Constructions and Compressors

Constructions and compressors used by a Hash should be defined in its `policy`. The construction defines how the message should be padded if its size is not a multiple of block_bits, and how the hashed message should be finalized. The construction also calls the compressor while processing a message block.

For example, `sha2` hashes use the Merkle-Damgard construction with the default Merkle-Damgard padding and the Davies-Meyer compressor. This means each message block is processed by the Davies-Meyer compressor, and the last message block is padded using Merkle-Damgard padding before processing. After the last block is processed, the message is finalized by appending the length of the input message to the end of the result.


Accumulators

Hashing contains an accumulation step, which is implemented with the Boost.Accumulators library.

All the corresponding concepts are satisfied.

Each hash has a pre-defined `accumulator_set`, which is a `boost::accumulators::accumulator_set` with a pre-filled `hash` accumulator.

The hash accumulator accepts either block_type::value_type or block_type on insertion.

The accumulator is implemented as a caching one. This means there is an input cache, sized the same as the particular Hash::block_type, which accumulates unprocessed data. Once it is filled, the data gets hashed and moved to the main accumulator storage, and the cache is emptied.

The `hash` accumulator internally uses the `bit_count` accumulator and is designed to be combined with the other accumulators available in Boost.Accumulators.

Example. Let's assume there is an accumulator set whose intention is to encrypt all the incoming data with the `rijndael<128, 128>` cipher and to compute a `sha2<256>` hash of all the incoming data as well.

This means there will be an accumulator set defined as follows:

using namespace boost::accumulators;
using namespace nil::crypto3;

accumulator_set<accumulators::block<block::rijndael<128, 128>>,
                accumulators::hashes<hashes::sha2<256>>> acc;

Extraction is supposed to be defined as follows:

std::string digest = extract::hash<hashes::sha2<256>>(acc);
std::string ciphertext = extract::block<block::rijndael<128, 128>>(acc);

Value Postprocessors

Since the accumulator output type is strictly tied to the `digest_type` of the particular `Hash` policy, the generic output format is closely tied to the digest type too. The digest type is usually defined as a fixed- or variable-length byte array, which is not always the container or range format the user would like to store the output in. It could easily be a std::vector<uint32_t> or a std::string, so there is a `hash_value` state holder which is made to be implicitly convertible to various container and range types, with internal data repacking implemented.

Such a state holder is split into two types:

  1. Value holder. Has internal output data storage; it actually stores the AccumulatorSet with the digest data.
  2. Reference holder. Stores a reference to an external AccumulatorSet, which is useful when data gets appended to an existing accumulator.