This is the last post in the series. We have a fully functional **tuple** class, with a decent implementation of **tuple_cat** — but some reservations about its performance. In this post we’re going to improve it considerably, at the expense of making the implementation more difficult. The main problem with what we did in the previous post is that we are creating many temporary tuple objects. When concatenating *n* **tuple**s, we create *n-2* intermediate **tuple**s that hold partial results. But can it be avoided?

**UPDATE (Feb 24)**: I forgot to credit Stephan T. Lavavej (a.k.a. STL, also the maintainer of Microsoft’s STL) for this idea. He obviously deserves it, and I apologize. On Reddit he also points out that the Standard requires somewhat different behavior from **tuple_cat**.

The idea is very elegant and simple. The core of the concatenation is going to be just one line of code. But it’s an ingenious line of code. Let me show it to you first, so we can bask in its glory:

return make_tuple(get<Jx>(get<Ix>(forward<Tuples>(tuples)))...);

Now, let’s try to understand what’s going on here; the implementation is going to follow automatically once we figure out the basic idea. The **tuples** variable is a **tuple** of **tuple**s. **Jx** and **Ix** are parameter packs of **size_t**‘s, so they can be expanded with **get<I>**. We set them up in a very specific fashion. For example, suppose we are given three **tuple**s: **tuple<int, char>**, **tuple<float>**, and **tuple<char, bool, long>**. The **tuples** variable is then of type **tuple<tuple<int, char>, tuple<float>, tuple<char, bool, long>>**, and the parameter packs are: **Ix** = [0, 0, 1, 2, 2, 2]; **Jx** = [0, 1, 0, 0, 1, 2]. Now think about the expansion:

// Remember the type of 'tuples': // tuple< tuple<int,char> , tuple<float> , tuple<char,bool,long> > tuples; make_tuple( get<0>(get<0>(tuples)), get<1>(get<0>(tuples)), get<0>(get<1>(tuples)), get<0>(get<2>(tuples)), get<1>(get<2>(tuples)), get<2>(get<2>(tuples)) );

Do you see it? **get<0>(tuples)** returns the first **tuple**, and then we retrieve the first and second elements from it; **get<1>(tuples)** returns the second **tuple**, and then we retrieve its only element; and so on. If we carefully generate the parameter packs **Ix** and **Jx**, we can make the nested **get** invocation retrieve individual elements from the **tuple** of **tuple**s, indexing into them like a two-dimensional lattice. Specifically, given *n* **tuple**s of sizes *S(0), …, S(n-1)*, we need to generate the following sequences:

S(0) times S(1) times S(n-1) times Ix = [0, 0, ..., 0, 1, 1, ..., 1, ..., n-1, n-1, ..., n-1 ] Jx = [0, 1, ..., S(0)-1, 0, 1, ..., S(1)-1, ..., 0 , 1 , ..., S(n-1)-1]

So, assuming that we have a way of generating both sequences, we can implement **tuple_cat** very easily now. Here it goes:

template <typename... Tuples, size_t... Ix, size_t... Jx> auto cat2d(Tuples&& tuples, index_sequence<Ix...>, index_sequence<Jx...>) { return make_tuple(get<Jx>(get<Ix>(std::forward<Tuples>(tuples)))...); } template <typename... Tuples> auto tuple_cat(Tuples&&... tuples) { auto ixs = repeating_index_sequence< std::integral_constant<size_t, 0>, tuple_size<std::decay_t<Tuples>::type>::value... >{}; auto jxs = variable_index_sequence< tuple_size<std::decay_t<Tuples>::type>::value... >{}; return cat2d(forward_as_tuple(std::forward<Tuples>(tuples)...), ixs, jxs); }

As always, we use the **index_sequence** trick to force-feed the parameter packs into the **cat2d** method. Then, all it has to do is invoke the two-dimensional **get**. All that remains is implementing the **repeating_index_sequence** and **variable_index_sequence** helpers.

**Exercise 16**: Implement the **repeating_index_sequence** metafunction. It takes an integral constant 0 as a start value, and the sizes of all **tuple**s, and generates the sequence **Ix**. For example: **repeating_index_sequence<std::integral_constant<size_t, 0>, 2, 1, 3>::type** is **index_sequence<0, 0, 1, 2, 2, 2>**.

**Exercise 17**: Implement the **variable_index_sequence** metafunction. It takes the sizes of all **tuple**s and generates the sequence **Jx**. For example: **variable_index_sequence<2, 1, 3>::type** is **index_sequence<0, 1, 0, 0, 1, 2>**.

Here are the implementations — again, emphasis was put on clarity and not brevity. First, we start with a helper that concatenates index sequences:

template <typename...> struct cat_index_sequence; template <size_t... Indices> struct cat_index_sequence<index_sequence<Indices...>> : index_sequence<Indices...> { }; template <size_t... Indices1, size_t... Indices2> struct cat_index_sequence<index_sequence<Indices1...>, index_sequence<Indices2...>> : index_sequence<Indices1..., Indices2...> { }; template <size_t... Indices1, size_t... Indices2, typename... Sequences> struct cat_index_sequence< index_sequence<Indices1...>, index_sequence<Indices2...>, Sequences... > : cat_index_sequence< index_sequence<Indices1..., Indices2...>, cat_index_sequence<Sequences...> > { };

Next, a helper that makes a sequence that repeats the element **I**, **Length** times:

template <size_t I, size_t Length> struct make_repeated_sequence : cat_index_sequence< index_sequence<I>, typename make_repeated_sequence<I, Length - 1>::type > { }; template <size_t I> struct make_repeated_sequence<I, 1> : index_sequence<I> { };

Now we’re ready for **repeating_index_sequence**. The base case is when there is just one **Length**, in which case we generate a sequence with the element **I** and length **Length**. Otherwise, we concatenate that sequence with a sequence of **I+1** repeated for as many times as specified by the next element in **…Lengths**:

template <typename, size_t...> struct repeating_index_sequence; template <size_t I, size_t Length> struct repeating_index_sequence<std::integral_constant<size_t, I>, Length> : make_repeated_sequence<I, Length> { }; template <size_t I, size_t Length, size_t... TailLengths> struct repeating_index_sequence< std::integral_constant<size_t, I>, Length, TailLengths... > : cat_index_sequence< typename make_repeated_sequence<I, Length>::type, typename repeating_index_sequence< std::integral_constant<size_t, I + 1>, TailLengths... >::type > { };

And finally **variable_index_sequence**. The base case is simply **make_index_sequence**, and the recursive case is a concatenation of that base and the recursively produced sequence:

template <size_t...> struct variable_index_sequence; template <size_t Head> struct variable_index_sequence: make_index_sequence<Head> { }; template <size_t Head, size_t... Tail> struct variable_index_sequence<Head, Tail...> : cat_index_sequence< typename variable_index_sequence<Head>::type, typename variable_index_sequence<Tail...>::type > { };

If you fell asleep while we were generating these index sequences, that’s fine. It’s time to wake up though. We now have a complete implementation of **tuple_cat**. Here are some timing results, measured with clang 3.5. The simple **tuple**s consist only of primitives, whereas the complex **tuple**s have larger objects such as strings embedded in them. (Times are relative to simple tuple/simple cat, which is an arbitrary baseline of 100.)

simple cat two-dimensional cat simple tuples 100 71 complex tuples 193 88

I think it’s reasonable to say that our efforts paid off — the more complex two-dimensional approach is faster, and doesn’t become much slower if the **tuple**s are more complex.

This concludes our series on implementing **tuple**s. I hope you found it instructive and interesting. **tuple**s offer a variety of opportunities for metaprogramming, optimization, clever and original compile-time thinking, and emphasize the importance of attention to detail. If you liked this series, you might also like the Modern C++ course which I teach at Sela. It’s a 4-day course that covers many topics from modern C++, including advanced template metaprogramming. Oh, and if you followed along and feel confident that you can implement a **tuple** or extend it? We’re hiring! 🙂

*I am posting short links and updates on Twitter as well as on this blog. You can follow me: @goldshtn*