Homepage of Thomas Karl

Documents

open boundaries CV

Weierstrass-Durand-Kerner Iteration

General Remarks on GZIP Compression

High-Performance Computing with Xilinx Accelerator Cards

Machine Learning Cheatsheet; online: link

More Information

LinkedIn Profile

Xing Profile

GitHub

Software Projects \(\uparrow\)

Zero Points in the Complex Plane with OpenCL \(\uparrow\)

The program computes complex zero points of arbitrary complex polynomials numerically using Weierstraß Iteration. The procedure is accelerated on GPUs with OpenCL 3.0.
Source code at github.com.
Documentation

Setup

Inputs are the number of polynomials \(n\) and its degree \(d\). Let \(p > d\) be the number of threads. The number of polynomials \(n_p\) that can be computed concurrently is \(n_p = \lfloor p/d \rfloor\). If \(dn_p \neq p\), some cores remain idle.
Each core computes an initial guess and randomly a complex coefficient for the polynomial. Real and imaginary part lie uniformly distributed between two parameters, which are also required as user input. The zero points have to be complex and lie initially on a circle in the complex plane. The radius is the half distance between the parameters.
Each core improves its \(k\)-th zero point \(x_k\) using the iteration,

\[x_k^{i+1} = x_k^i + \frac{\text{polynomial}(x_k^i)}{\prod_{j=1,...,d;j\neq k}(x_k^i-\zeta_j)} \longrightarrow x_k \quad\text{for}\quad i\rightarrow\infty\]

All remaining zero points are denoted by \(\zeta_i\). A dynamically change of the \(\zeta_j\) in each iteration further accelerates the convergence. Therefore, dataraces can be ignored.
The iteration breaks after \(10d\) runs, or if for a user given \(\varepsilon > 0\) the equation

\[\max\left\{ \frac{|\text{polynomial}(x)|}{|x|} \right\}\leq \varepsilon\]

holds. This iteration is commonly known as Weierstraß-Durand-Kerner Iteration. The zero points are converted in coordinates in a \(1080 \times 1080\) pixel image. In the end, the image is generated on the CPU in ppm format. Figure 1 shows an example.

deg8Repm30M.jpg deg8Impm30M.jpg

left: \(n=30\times 10^6\), \(d=8\), real part of coefficients \(\pm\)1; right: \(n=30\times 10^6\), \(d=8\), imaginary part of coefficients \(\pm\)1

deg4Both60M.jpg deg12Repm40M.jpg

left: \(n=60\times 10^6\), \(d=4\), real part of coefficients \(\pm\)1; right: \(n=40\times 10^6\), \(d=12\), imaginary part of coefficients \(\pm\)1

Figure 1: Graphical representation of zero points in complex plane (\(1080 \times 1080\) pixel, \(\varepsilon=0.0001\)). The zero points are approximately distributed on a circle with radius one.

Evaluation

In order to judge the result the computation times were recorded an fitted linearly. Figure 2 shows the comparison of computation times on a CPU and a GPU. The GPU was a Nvidia Quadro P4000, the CPU an Intel i7 8700K 4.8GHz. Using 1280 cores the speedup with respect to the sequential program is roughly 70. The offset in time due to the data traffic (latency) reads about \(0.5\) seconds.

computation times

Figure 2: Computation times \(T\) for certain numbers of polynomials \(n\).

In the last step the dependency on the polynomial degree is evaluated (Fig. 3). When the degree is doubled, for the same number of polynomials twice the number of zero points are required in order to keep the overall number constant. The number of floating point operations and the iterations increase also with the degree. For degree 8 and 32 the computation time is almost the same for a larger number of polynomials.

computation times

Figure 3: Computation times \(t\) for certain number of zero points \(dn\).

C++ Library for Simulations in the Heisenberg Model \(\uparrow\)

This is a parallel C++ library for Simulations in the Heisenberg, XY, Potts and Ising model. It can also be used for spin glasses. For now, the project is incomplete and only meant for educational reasons. Debugging is in process.

Source code at github.com.
Documentation

ising

Visualization of the 2d Ising Model with Qt \(\uparrow\)

This application aims to visualize the two dimensional Ising model from above for different thermal settings. download ising

Parallel Algorithm for the 2d Ising Model in MPI \(\uparrow\)

This module implements a cluster algorithm for the two-dimensional quadratic Ising model with MPI. comparison

Figure 4: Comparison of computation times depending on the grid size \(s\) for different number of MPI processes.

processor layout

Figure 5: Visualization of the exchange of the boundary values (for 4 MPI processes).

Kanji Training with Qt \(\uparrow\)

This is a training tool for Japanese kanji in C++/Qt. Linux/Windows kanji

Text Encryption with Turning-Grilles \(\uparrow\)

See https://scienceblogs.de

Encryption

Message for encryption:
Insert grid dimension:

Decryption

Message for decryption:
Decryption key:

Parallel DBSCAN with CUDA \(\uparrow\)

This project aims to implement a hardware-accelerated version of G-DBSCAN as introduced in: https://www.sciencedirect.com Each data point is considered to be a cluster if at least a given number of points lie within a circle with radius \(\varepsilon\). Consider the following data set:

cluster

Figure 6: Four different clusters sampled randomly via gaussian mixture models. Each cluster consists of 250 two-dimensional points.

After tuning the parameters outliers can be detected and filtered:

outlier

Figure 7: Outliers (red) and clusters (gray)

The main advantages of the algorithm are:

  • does not require the number of clusters a priori
  • can find arbitrarily shaped clusters
  • is robust to outliers
  • requires just two parameters
  • is designed for use with databases that can accelerate region queries
Source code at github.com.
Documentation

Machine learning \(\uparrow\)

\(\nearrow\)

Electronic Projects \(\uparrow\)

Drone \(\uparrow\)

This project is a self-made racing quadrocopter.
Fotostrecke
bottom view

Bug \(\uparrow\)

This project is a self-made hexapod.
Fotostrecke
left:control board, right: Arduino Uno R3

Chess

Chess is one of my hobbies and I am also intersted in high performance computing. In 2017 Google Deepmind's Alpha Zero defeated the famous open-source chess engine Stockfish without a single loss. How could that happen? I collected some information about that topic.
  1. The original paper on Arxiv: https://arxiv.org/pdf/1712.01815.pdf

  2. Some exercises, which will exhaust even the best engine (in german): download

  3. Currently, the best heuristic engine available: https://stockfishchess.org/download/

  4. An open-source neural network based engine: https://lczero.org/play/download/ (can be trained by user, many prebuilt networks available)

  5. A how-to in compiling the Stockfish engine on Windows Systems: https://github.com/glinscott/fishtest/wiki/Building-stockfish-on-Windows

  6. The Syzygy endgame tablebase, supports up to seven pieces: https://lichess.org/blog/W3WeMyQAACQAdfAL/7-piece-syzygy-tablebases-are-complete

  7. Eigenmann Rapid Engine Test (ERET) as a test benchmark: www.glarean-verlag.ch

  8. Some information about Monte-Carlo chess (in german): https://www.ke.tu-darmstadt.de/lehre/archiv/ss08/challenge/Ausarbeitungen/Waechter.pdf

  9. Some information and exercises for Go, the motivation for Alpha Zero (in german): download

  10. Slides from some of my presentations in german (ask for the password): Part 1, Part 2, about ELO rating