Author : Shams-Ul-Haq Syed
Advisor : Dr. Manton Matthews
Date : December 11, 2023
Time: 11 am – 1 pm
Place : Meeting room 2267 (Innovation Building)
The degree of hardware level parallelism offered by today’s GPU architecture makes it ideal for problem domains with massive inherent parallelism potential, fields such as computer vision, image processing, graph theory and graph computations. We have identified three problem areas for purpose of this research dissertation, under the umbrella of performance improvement by harnessing the power of GPUs for novel applications. The first area is concerned with k-vertex connectivity in graph theory, the second area deals performance evaluation using extended roofline models for GPU parallel applications and finally the third problem area is related to synthesis 3D printable objects from 2D images.
In this thesis we examined k-vertex connectivity in undirected graphs, its applications and measure the performance of GPU computations using the CUDA Toolkit. Matthews and Sumner in 1984 presented the conjecture that every 4-connected claw-free graph is Hamiltonian. In the initial paper  it was shown that every 3-connected claw-free graph on fewer than 20 vertices is Hamiltonian. Over the years there have been several papers establishing the result for connectivity higher than 4. So, all that remains is the case for 4 connected claw-free graphs conjectured by C. Thomassen . We present a new CUDA based parallel k-vertex connectivity test algorithm to determine the connectivity of any vi given claw-free graph. The parallel algorithm is several orders of magnitude faster when compared to the serial counterpart. It is a major step towards efficiently finding whether the conjecture holds for graphs with connectivity exactly equal to 4. Our parallel algorithm can also be applied to find the value of k (connectedness) for a given graph. It is validated using number of different types of graphs such as complete graphs, complete bipartite graphs, and chorded cycle graphs of sizes ranging from 𝐺𝑛,𝑘 , 20 ≤ 𝑛 ≤ 300.
For GPU architecture we proposed the unified cache aware roofline model which provides better insights by capturing details such as memory transfers between host and device. Unlike traditional roofline models that strictly focused on memory bandwidth and computation performance of either only CPU or GPU. Our model provides more holistic picture of application performance in a single view by capturing computations on CPUs and GPUs along with the data transfers from host to device including theoretical bandwidths of host and device memories.
Finally, a novel approach to synthesize 3D printable objects from a single input image source is presented. The algorithm employs probabilistic machine learning based framework along with multiple shapes and depth cues such as manifolds, contours, gradients, and prior knowledge about the shapes to generate a plausible 3D model from a single 2D image. Our algorithm intelligently combines isolated shapes into a single object while retaining their relative positions. It also considers minimal 3D printable area vii and strength while generating watertight mesh object. Consequently, the resultant 3D model is 3D printer compatible; and the actual 3D printed object is sturdy and less prone to breakage. In addition, our scalable algorithm runs in a quadratic time of the size of the image. The preliminary results have demonstrated several different 2D images turned into actual 3D printed objects that are sturdy and aesthetically pleasing.