Adding scenario_1c with a 2D view and default layout
I have added scenario_1c, which uses a 2D view on the array with the default layout. I also added check_bounds_2d for the 2D view case.
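For reviewers who want a picture of what a 2D bounds check does, here is a minimal sketch. The signature, the bool return value and the parameter names extent0 and extent1 are assumptions for illustration; the actual check_bounds_2d in this change may look different and may report errors in another way.

```cpp
#include <cstddef>

// Hypothetical helper: the real check_bounds_2d in this change may have a
// different signature and may abort or log instead of returning a bool.
inline bool check_bounds_2d(std::size_t i, std::size_t j,
                            std::size_t extent0, std::size_t extent1) {
    // Indexes are unsigned, so only the upper bounds need checking:
    // each index must be strictly smaller than its extent.
    return i < extent0 && j < extent1;
}
```

A check like this costs two comparisons per access, which is why it is useful during development but distorts timing runs.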
On the CPU, scenario_1c is slightly faster than scenario_7b (the fastest CPU scenario so far), and on the GPU it is as fast as scenario_1b (the fastest GPU scenario so far). For details, see the attached file demo_results.txt.
For the performance measurements I commented out the check_bounds calls, as in the original file. I used check_bounds only during development.
Although we now have a scenario that covers both CPU and GPU, one problem remains: we still need an #ifdef inside the parallel_for to distinguish between the CPU code and the GPU code.
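To make the #ifdef issue concrete, here is a rough sketch of the pattern, not the code from this change. It assumes a CUDA build, where __CUDA_ARCH__ is defined only while device code is being compiled. The function flat_index and the choice of index order per branch are hypothetical and only illustrate that one loop body ends up with two code paths.

```cpp
#include <cstddef>

// Hypothetical loop-body helper. The name flat_index and the index order
// used in each branch are assumptions made for illustration only.
inline std::size_t flat_index(std::size_t i, std::size_t j,
                              std::size_t nrows, std::size_t ncols) {
#ifdef __CUDA_ARCH__
    // Device compilation pass: column-major style indexing.
    (void)ncols;  // unused in this branch
    return i + j * nrows;
#else
    // Host compilation pass: row-major style indexing.
    (void)nrows;  // unused in this branch
    return i * ncols + j;
#endif
}
```

The drawback is visible even in this small example: the two branches have to be kept in sync by hand, and only one of them is compiled and tested in any given pass.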
Please review thoroughly and, if possible, reproduce the results.