Step 1, the theory

Voronoi Diagram is a classic data structure in computational geometry with a lot of applications in many fields. Given a set of sites, it consists in partitioning the domain into cells C(s) containing all points closer to s than to any other site. If the domain and sites are subsets of Z^2, we obtain a Discrete Voronoi Diagram.

In this context, the Jump Flooding Algorithm is a fast technique to approximate the Discrete Voronoi Diagram proposed by Guodong Rong and Tiow-Seng Tan in 2006. The main idea of this technique is to consider jumps with step k: for each point p of the domain, we collect information (i.e. the closest site) from points P + ({ -k, 0, k},{ -k, 0, k}) to decide the value attached to p.

If V(p) maps the point p to its closest site, the internal update function can be sketched as follows: we read the Voronoi labeling already computed at points

 {V(p), V(p+(0,k)),V(p+(0,-k)),V(p+(k,0)),V(p+(-k,0)), 
  V(p+(-k,k)),V(p+(k,-k)),V(p+(k,k)),V(p+(-k,-k)) }

and we update the site V(p) with the closest one to p from the previous set.

To obtain a fast approximation, Rong et al. suggest to iterate on jumps with k={n/2, n/4, …, 1} if n is the image with (plus a first jump with k=1 to obtain the JFA-1).

Step 2 GPU Implementation

On GPU, the parallel algorithm is trivial an can be sketched as follows:

    k = n/2
    while (k>=1)
      for each point p in parallel
          Collect the 9 sites associated to the jump with step k
          Store the closest site at p in V(p)
      synchronization
      k = k/2

The aim of this tutorial is to implement JFA with Thrust. The image domain will be represented as a 1D thrust::device_vector<int> (coordinate mapping (x,y)→ (y*m+x) for a m*n image).

Let us start with a simple internal function to decid the closest site between p and q at (x_i,y_i).

  __host__ __device__
      int minVoro(int x_i, int y_i, int p, int q)
      {    
          if (q == m*n)
              return p;
 
          // coordinates of points p and q
          int y_q =  q / m;
          int x_q =  q % m;
          int y_p =  p / m;
          int x_p =  p % m;
 
          // squared distances
          int d_iq = (x_i-x_q) * (x_i-x_q) + (y_i-y_q) * (y_i-y_q);
          int d_ip = (x_i-x_p) * (x_i-x_p) + (y_i-y_p) * (y_i-y_p);
 
          if (d_iq < d_ip)
              return q;  // q is closer
          else
              return p;
      }

In thrust, we will a thrust::tranform function to apply to each point of the image a given Functor. The main idea is to collect the sites at distance k into a tuple. The operator() in this Functor implements the main atomic process (the 9 comparisons).

Be careful : this code snippet will produce a runtime error ! See below for a workaround.

// minFunctor
// Tuple  = <seeds,seeds + k,seeds + m*k, seeds - k, 
//           seeds - m*k, seeds+ k+m*k,seeds + k-m*k,
//           seeds- k+m*k,seeds - k+m*k, i>
struct minFunctor
{
  int k,m,n;
 
  __host__ __device__
    minFunctor(int _m,int _n,int _k) : n(_n),m(_m),k(_k) {}
 
 
  //To decide I have to change my current Voronoi site
  __host__ __device__
      int minVoro(int x_i, int y_i, int p, int q)
      {    
         /*
            code presented above
         */
      }
 
  //For each point p+{-k,0,k}, we keep the Site with minimum distance
  template <typename Tuple>
  __host__ __device__
  int operator()(const Tuple &t)
  {
      //Current point and site
      int i = thrust::get<9>(t);
      int v = thrust::get<0>(t);
 
      //Current point coordinates
      int y = i / m;    
      int x = i % m;
 
      v = minVoro(x, y, v, thrust::get<3>(t));
      v = minVoro(x, y, v, thrust::get<8>(t));
      v = minVoro(x, y, v, thrust::get<7>(t));
      v = minVoro(x, y, v, thrust::get<1>(t));
      v = minVoro(x, y, v, thrust::get<6>(t));
      v = minVoro(x, y, v, thrust::get<5>(t));
      v = minVoro(x, y, v, thrust::get<4>(t));
      v = minVoro(x, y, v, thrust::get<2>(t));
 
      //global return
      return v;
  }
};

To apply this Functor with a single kernel call thrust::transform, we will use a nice trick proposed by Nathan Bell (Thrust developer): if you have a device_vector<int> tab, iterators tab.begin(), tab.begin()+k and tab.begin()-k exist. Thus we can define the function:

// Perform a jump with step k
void jfa(thrust::device_vector<int>& in,thrust::device_vector<int>& out, uint k, int m, int n)
{
   thrust::transform(
        thrust::make_zip_iterator(
            thrust::make_tuple(in.begin(), 
                               in.begin() + k, 
                               in.begin() + m*k, 
                               in.begin() - k, 
                               in.begin() - m*k, 
                               in.begin() + k+m*k,
                               in.begin() + k-m*k,
                               in.begin() - k+m*k,
                               in.begin() - k-m*k,
                               thrust::counting_iterator<int>(0))),
        thrust::make_zip_iterator(
            thrust::make_tuple(in.begin(), 
                               in.begin() + k, 
                               in.begin() + m*k, 
                               in.begin() - k, 
                               in.begin() - m*k, 
                               in.begin() + k+m*k,
                               in.begin() + k-m*k,
                               in.begin() - k+m*k,
                               in.begin() - k-m*k,
                               thrust::counting_iterator<int>(0)))+ n*m,
        out.begin(),
        minFunctor(m,n,k));
}

But if we dereference the iterator pointer it = tab.begin()-10 we obtain a runtime error. Hence before using the iterator, we have to make sure that we are in the vector limits. This is the reason why we add to the Tuple a thrust::counting_iterator<int>(0) to keep the vector index.

The Functor operator() must be updated to protect the thrust::get<i> commands:

//For each point p+{-k,0,k}, we keep the Site with minimum distance
  template <typename Tuple>
  __host__ __device__
  int operator()(const Tuple &t)
  {
      //Current point and site
      int i = thrust::get<9>(t);
      int v = thrust::get<0>(t);
 
      //Current point coordinates
      int y = i / m;    
      int x = i % m;
 
      if (x >= k)
      {
          v = minVoro(x, y, v, thrust::get<3>(t));
 
          if (y >= k)
              v = minVoro(x, y, v, thrust::get<8>(t));
 
          if (y + k < n)
              v = minVoro(x, y, v, thrust::get<7>(t));
      }
 
      if (x + k < m)
      { 
          v = minVoro(x, y, v, thrust::get<1>(t));
 
          if (y >= k)
              v = minVoro(x, y, v, thrust::get<6>(t));
          if (y + k < n)
              v = minVoro(x, y, v, thrust::get<5>(t));
      }
 
      if (y >= k)
          v = minVoro(x, y, v, thrust::get<4>(t));
      if (y + k < n)
          v = minVoro(x, y, v, thrust::get<2>(t));
 
      //global return
      return v;
  }
 

Step 3 Additional functions and Experiments

The overall JFA source code is available http://code.google.com/p/thrust/source/browse/examples/discrete_voronoi.cu. This code also contains timer functions and export functions to PGM file format. The sites are randomly spread in the image domain.

Here you have the console output on NVidia GTX285:

[Inititialize 1280x1280 Image]
  ( 4.42614ms )
[Copy to Device]
  ( 3.03962ms )
[JFA stepping]
  ( 33.2508ms )
  (number of cycles = 11)
  ( 49.2739 MPixel/s ) 
[Device to Host Copy]
  ( 10.3948ms )
[PGM Export]
  ( 217.517ms )

 
blog/jump_flooding_algorithm_on_gpu.txt · Last modified: 2012/05/07 16:36 by dcoeurjo