Commit 503168bc authored by Uwe Schulzweida

arradd: disable SSE2 optimization (bug in residual loop)

parent 7a366f62
2012-07-26 Uwe Schulzweida <Uwe.Schulzweida@zmaw.de>
* Version 1.5.6.1 released
* arradd: disable SSE2 optimization (bug in residual loop)
The following statistical functions are affected:
*mean, *avg, *sum, *var, *std
if all of the following conditions are met:
- x86_64 machine (tornado, squall, thunder, lizard)
- dataset has no missing values
- the horizontal grid size is > 1 and not a multiple of 8
This bug was introduced in CDO version 1.5.5.
2012-07-23 Uwe Schulzweida <Uwe.Schulzweida@zmaw.de>
* using CDI library version 1.5.6
......
CDO NEWS
--------
Version 1.5.6.1 (26 July 2012):
Fixed bugs:
Wrong results with the following statistical functions:
*mean, *avg, *sum, *var, *std
only if all of the following conditions are met:
- x86_64 machine (tornado, squall, thunder, lizard)
- dataset has no missing values
- the horizontal grid size is > 1 and not a multiple of 8
This bug was introduced in CDO version 1.5.5.
Version 1.5.6 (23 July 2012):
New features:
......
# Process this file with autoconf to produce a configure script.
-AC_INIT([cdo], [1.5.6], [http://code.zmaw.de/projects/cdo])
+AC_INIT([cdo], [1.5.6.1], [http://code.zmaw.de/projects/cdo])
CONFIG_ABORT=yes
AC_CONFIG_AUX_DIR(config)
......
......@@ -203,6 +203,11 @@
/* Version number of package */
#undef VERSION
/* Enable large inode numbers on Mac OS X 10.5. */
#ifndef _DARWIN_USE_64_BIT_INODE
# define _DARWIN_USE_64_BIT_INODE 1
#endif
/* Number of bits in a file offset, on hosts where this is settable. */
#undef _FILE_OFFSET_BITS
......
......@@ -38,13 +38,14 @@ void farfun(field_t *field1, field_t field2, int function)
else cdoAbort("function %d not implemented!", function);
}
static
void arradd(const long n, double * const restrict a, const double * const restrict b)
{
long i;
// SSE2 version is 15% faster than the original loop (tested with gcc47)
-#ifdef __SSE2__
+#if 0
+//#ifdef __SSE2__ /*__SSE2__*/ // bug in this code!!!
long residual = n % 8;
long ofs = n - residual;
......@@ -57,7 +58,7 @@ void arradd(const long n, double * const restrict a, const double * const restri
av[i+2] = _mm_add_pd(av[i+2], bv[i+2]);
av[i+3] = _mm_add_pd(av[i+3], bv[i+3]);
}
printf("residual, ofs, n %ld %ld %ld\n", residual, ofs, n);
for ( i = 0; i < residual; i++ ) a[ofs+i] += b[ofs+i];
#else
......
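The residual-loop pattern that this commit disables can be illustrated with a small portable sketch. This is not the CDO code: `add_reference`, `add_blocked` and `count_mismatches` are hypothetical names, and the block of 8 elements is added with plain scalar operations instead of `_mm_add_pd`, so the sketch only shows how `residual` and `ofs` are meant to split the array and how a blocked variant can be checked against the plain loop for sizes that are not a multiple of 8.

```c
#include <assert.h>

/* Reference: plain element-wise a[i] += b[i]. */
static void add_reference(long n, double *restrict a, const double *restrict b)
{
  for (long i = 0; i < n; i++) a[i] += b[i];
}

/* Blocked variant (hypothetical, scalar stand-in for the SSE2 path):
   handle 8 elements per iteration, then the remaining n % 8 elements. */
static void add_blocked(long n, double *restrict a, const double *restrict b)
{
  assert(n >= 0);
  const long residual = n % 8;      /* elements left over after full blocks */
  const long ofs = n - residual;    /* index of the first leftover element  */

  for (long i = 0; i < ofs; i += 8)
    for (long k = 0; k < 8; k++) a[i + k] += b[i + k];

  for (long i = 0; i < residual; i++) a[ofs + i] += b[ofs + i];
}

/* Compare both variants on arrays of length n (0 <= n <= 64) and
   return the number of elements where the results differ. */
static long count_mismatches(long n)
{
  double a1[64], a2[64], b[64];
  assert(n >= 0 && n <= 64);

  for (long i = 0; i < n; i++)
    {
      a1[i] = a2[i] = (double) i;
      b[i] = 0.5 * (double) i + 1.0;
    }

  add_reference(n, a1, b);
  add_blocked(n, a2, b);

  long bad = 0;
  for (long i = 0; i < n; i++)
    if (a1[i] != a2[i]) bad++;
  return bad;
}
```

Calling `count_mismatches` for sizes such as 1, 7, 8, 9 and 13 covers the cases named in the ChangeLog entry (sizes that are and are not a multiple of 8); a correct blocked implementation should report zero mismatches for each of them.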