This data structure maintains a sequence, supporting range modification (the range [l,r)
gets an elementwise update) and range queries (the monoid product of a subarray).
According to http://web.ntnu.edu.tw/~algo/Sequence2.html, the data structure originated from Baltic OI 2001: Mars Maps. It is commonly referred to as "segment tree" in competitive programming, even if segment tree really means a different storing intervals in research papers.
Each leaf stores an element in the sequence.
Representation
Explicit tree structure
Usually built in a top-down manner. Each node contains pointers to its left and right children. When the size is not a concern, the left and right boundaries can be stored into the node.
This representation is not recommended. However, since the top-down approach can place more than one elements into one node, some of its usage cannot (easily) be replaced by the iterative bottom-up approach.
1 | const int N = 1000000, P = 1000000; |
Implicit tree structure
Binary heap representation
This representation requires the length of the whole range is a power of 2, which can form a full binary tree. If the length is not a power of 2, we can add fake elements at the end.
Let the root node's index be 1. For any node i
, its left and right children are 2*i
and 2*i+1
respectively. This representation wastes the index 0. I prefer zero based numbering (Edsger W. Dijkstra: Why numbering should start at zero). However, the segment tree is an exception. The reason is that if you have the index of a leaf, you can subtract n
from it to get the original sequence index. While with one based numbering, you need to subtract n-1
, which is slightly inconvenient. In addition, getting the index of an ancestor node is simply a right shift operation while with zero based numbering you need extra tweaks. This especially matters for lazy propagation with the iterative bottom-up representation.
1 | 1:[0,8) |
1 | void build(int i, int l, int r) { |
l+r
This is an alternative representation for a full binary tree. We can compute the node index by adding its left and right boundaries.
1 | 8:[0,8) |
1 | void build(int l, int r) { |
This simple rule works because no two node share the same l+r
.
However, if the length is not a power of two, we may have a non-leaf node whose l+r
is odd, say, 0+5
. The leftmost node of its right subtree has the same index: (l+r-1)/2 + (l+r+1)/2
. Note that such a collision only happens with a leaf node and an internal node. We can therefore special case the index computation for leaf nodes.
1 | int id(int l, int r) { |
The space consumption is 3*n
. There are some unused indexes, though.
Bottom-up
To the best of my knowledge, it was independently discovered by 张昆玮 in 《统计的力量——线段树全接触》 and Dmitry Urbanovich in https://codeforces.com/blog/entry/1256. This representation is about open intervals, which are a bit inconvenient if you do need an element at index 0.
There is a great write-up https://codeforces.com/blog/entry/18051. I prefer this style because it uses half-open intervals:)
Note: if the length is a power of two, the node at index one (root) stores the monoid product of the whole sequence. Top-down traversals are straightforward.
Caveat: if the length is not a power of two, the node at index one does not aggregate the whole sequence. Instead, the monoid product of the whole sequence can be divided into the product of several subtrees with arbitrarily looking indexes.
For simplicity, pad the original sequence if you need top-down traversals.
Lazy propagation
This is needed if modification operations are not commutative. Say you have applied f
on a[0]
, g
on a[0,2)
, h
on a[0,4)
. The final value of a[0]
is h(g(f(a[0])))
. Note that the outer functions apply to larger ranges. If you want to apply an operation f'
on a smaller range (say, a[0]
), the final value should be f'(h(g(f(a[0]))))
.
If the operations commute, f'(h(g(f(a[0])))) = h(g(f'(f(a[0]))))
. We may pass through the ancestor ranges without applying the accumulated operations into subranges. The classicaly example is range add. We can store the accumulated addend in each internal node.
If the operations do not commute, we have to unfold the operations. For the outmost f
(we need to remember this operation), we apply it onto both subtrees.
Fenwick tree
If the binary operation supports subtraction, you can answer a [l,r)
range query with sum[0,r) - sum[0,l)
, and therefore switch to a Fenwick tree (binary indexed tree). Unlike a segment tree (2n-1
nodes), a Fenwick tree needs exactly n elements.