I followed the RaffK project to its completion and did much more feature development, such as a complete chat feature. I wrote a lot of Modern and screaming Future C++; it is sort of crazy. This ChatGPT2 animation experiment involved training ChatGPT2 on scrolling Shakespeare 1000 times using SGD with a learning rate of 0.0002. After each SGD update of the weights/bias, I record a subsection of the GPT2 Tensors to file. The generated file is about 2GB and contains 1000 frames of ChatGPT2 data.
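In rough pseudocode, the capture loop has this shape (the names here are hypothetical, not NetworkLib's actual API):

constexpr float learningRate = 0.0002f;
for (int step = 0; step < 1000; ++step) {
    model.sgdStep(learningRate);           // SGD updates the weights/bias
    file.writeFrame(model.subTensors());   // record a subsection of the Tensors as one frame
}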
Another project, Float Space Animator, can view and convert Float Spaces in a real-time interface. It was used to create three programs: an animation of std::mt19937, a viewer for the ChatGPT2 Tensor Space (weights/bias), and an animation of ChatGPT2 that plays each saved frame in sequence at 7 frames per second.
The Animator project has a Windows release on GitHub. It can easily be ported to another OS because I use SDL and OpenGL, which are portable. At some point, I plan to create an Apple release.
Future Animator work is going to be a project to view .fits files.
Now that the ChatGPT2 portion of my project seems to be complete, the next phase of NetworkLib is to recreate my old work using my new abilities, in an open-source format. This will allow real-time animations of relatively small MLPs, and I am curious about NVIDIA performance enhancements for larger MLPs.
This will involve training on MNIST and will, at the least, animate MNIST.
Future ChatGPT2 work would include using the larger models and training from scratch. I am also going to make an animation of the Activation Space Tensor, rather than the weights/bias shown in the current ChatGPT2 animation.
There are so many C++ concepts that are straight from the future, and they keep arriving at a rapid pace. One plan for the MLP animator is to adopt std::mdspan in my Tensor object, which will perhaps change Tensor a lot.
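A minimal sketch of the idea, assuming a simple two-dimensional Tensor; Tensor2D and its members are hypothetical, not the actual NetworkLib type:

#include <mdspan>
#include <vector>

struct Tensor2D {
    std::vector<float> data;
    std::size_t rows = 0, cols = 0;

    Tensor2D(std::size_t r, std::size_t c) : data(r * c), rows(r), cols(c) {}

    // A 2-D std::mdspan view over the flat storage (C++23)
    auto view() { return std::mdspan(data.data(), rows, cols); }
};

With C++23's multidimensional subscript, an element is then just tensor.view()[row, col].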
One of the most interesting new features is std::views::zip and how it changes the expression of math.
This is a C/C++ expression of math:
for (int i = 0; i < output.size(); ++i)
    output[i] += weights[i] * in;
There are two ways to write this in Modern C++: for two parameters, you could use std::transform, and for n-many parameters, there is std::views::zip. Both express the math in a new way.
std::transform(output.begin(), output.end(), weights.begin(), output.begin(),
    [in](auto o, auto w) { return o + w * in; });
With transform, the limit is two input ranges, but with zip, there can be many, and the code is much cleaner; it is easy to cope with the Future in this case:
for (const auto& [o, w] : std::views::zip(output, weights))
    o += w * in;
It is hard to believe that mathematics gets more complicated than two parameters, but when it does, it now looks pretty good.
This could be done with two transforms, a lot of i’s, or one zip.
for (const auto& [q, dq, k, dk] : std::views::zip(qh, dqh, kh, dkh)) {
    dq += o * k;
    dk += o * q;
}
This code is used in layer normalization:
for (const auto& [i, w, b, o] : std::views::zip(in, weight, bias, out)) {
    const float norm = (i - mean) * r_stdDev;
    o = norm * w + b;
}
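For completeness, mean and r_stdDev come from the usual statistics over in; a hedged sketch of how they might be computed (not the project's exact code):

#include <cmath>
#include <numeric>

const float mean = std::reduce(in.begin(), in.end(), 0.0f) / in.size();
float variance = 0;
for (const float i : in)
    variance += (i - mean) * (i - mean);
variance /= in.size();
const float r_stdDev = 1.0f / std::sqrt(variance + 1e-5f); // epsilon for numerical stability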
The C/C++ loop has been left in the dust by ever-advancing range- and view-based loops; i-loops look much worse and will soon be an old code smell.
These zip loops are basically the range-based for loop introduced in C++11, extended to multiple ranges.
for (const auto& i : in) // a single range
for (const auto& [a, b] : std::views::zip(as, bs)) // two ranges
The way ranges and views are iterated over can be altered using the | operator.
for (const auto& i : in | std::views::reverse)
There are lots of cool | features and transforms for ranges and views, and I expect this operator to become super common in Modern/Future C++, so it should be on your C++ radar if it isn't.
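For example, adaptors compose left to right with |; here is a small self-contained sketch (the data is made up):

#include <print>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> values{ -2, 5, -1, 8, 3 };
    for (const auto& v : values
        | std::views::filter([](int x) { return x > 0; })
        | std::views::transform([](int x) { return x * x; }))
        std::println("{}", v); // prints 25, 64, 9
}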
for (const auto& text : document | std::views::slide(size))
In this code, we iterate over document by sliding a window of size elements across it. I considered this code when fetching the training data for ChatGPT2, but I decided the step size of 1 was unacceptable, and went with another method.
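For what it's worth, C++23 also offers std::views::stride, which composes with slide to give a larger step; a sketch of that combination (this is not the method I went with, and size, step, and train are hypothetical names):

for (const auto& window : document | std::views::slide(size) | std::views::stride(step))
    train(window); // windows of 'size' tokens, advancing by 'step' instead of 1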
The i-loop is sometimes necessary; in that case, there is std::views::iota:
for (const auto& i : std::views::iota(0, 1000))
In this case, we see i, a begin value, and an end bound: the half-open range [0, 1000). No equality expressions or increments here. We can write Future code like so:
for (const auto& i : std::views::iota(0, 1000) | std::views::reverse)
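And when both the index and the element are needed, zip with an unbounded iota works (C++23 also has std::views::enumerate for exactly this); a small sketch:

// iota(0) is unbounded; zip stops at the shorter range
for (const auto& [i, o] : std::views::zip(std::views::iota(0), output))
    std::println("output[{}] = {}", i, o);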
A normal C/C++ reverse loop is a good way to look like a C programmer from decades past, and that’s super old in 2025.
At this point there is a clear departure between code based on the standard library and projects not based on the library. Projects not based on the library have lost access to ever-advancing C++ programming expressions and constructs. Projects built on the library are becoming so advanced and different that soon C++ will be as different from C/C++ as C/C++ is from C, and even much more different.