New Vectorization Diagnostics starting from Intel® Fortran Compiler 15.0


Product Version: Intel® Fortran Compiler 15.0 and above

Cause:

The vectorization report generated with the Intel® Fortran Compiler's optimization options (/O2 /Qopt-report:2) states that a loop was not vectorized because the loop body became empty after optimizations.

Example:

The example below generates the following remark in the optimization report:

integer function foo(a, b, n) 
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: a
    real, intent (in)   :: b
    integer :: i
    
    do i=1,n
           a = b + 1
    end do
    
    foo = a
    
end function 

ifort -c /O2 /Qopt-report:2 /Qopt-report-file:stdout f15414.f90

 

Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -Qinline-factor: 100
  -Qinline-min-size: 30
  -Qinline-max-size: 230
  -Qinline-max-total-size: 2000
  -Qinline-max-per-routine: 10000
  -Qinline-max-per-compile: 500000

Begin optimization report for: FOO

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (FOO) [1] f15414.f90(1,18)

 

Resolution:

In the example above, the loop contains a single statement, a = b + 1, whose result does not depend on the loop index. The compiler's optimizer moves this loop-invariant statement outside the loop, after which nothing is left inside the loop to vectorize.
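As a rough illustration of the effect, the function can be rewritten by hand with the invariant assignment moved out of the loop. This is only a sketch: the names hoist_sketch and foo_hoisted are hypothetical, and the guard on n is an assumption about how the zero-trip case is preserved; the code the compiler actually generates may differ.

```fortran
module hoist_sketch
    implicit none
contains
    ! Hand-written equivalent of foo after the loop-invariant
    ! assignment has been moved out: no loop remains to vectorize.
    integer function foo_hoisted(a, b, n)
        integer, intent(in)    :: n
        real,    intent(inout) :: a
        real,    intent(in)    :: b

        ! The original loop body only executed when n >= 1.
        if (n >= 1) a = b + 1

        ! Real-to-integer assignment truncates toward zero.
        foo_hoisted = a
    end function foo_hoisted
end module hoist_sketch
```

For n >= 1 this returns the same value as the original function, which is why the compiler is free to discard the loop.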

Product Version: Intel® Fortran Compiler 15.0 and later

Cause:

A vectorizable loop contains loads from memory locations that are not contiguous in memory (sometimes known as a “gather”). These may be indexed loads, as in the example below, or loads with non-unit stride. The compiler has issued a hardware gather instruction for these loads.

(Note that for compiler versions 16.0.1 and earlier, the compiler may also emit this message when gather operations are emulated in software).

 

The vectorization report is generated using the Intel® Fortran Compiler's optimization and vectorization report options:

Windows* OS:  /O2  /Qopt-report:2  /Qopt-report-phase:vec    

Linux OS or OS X:  -O2 -qopt-report2  -qopt-report-phase=vec

Example:

The example below generates the following remark in the optimization report:

subroutine gathr(n, a, b, index)
   implicit none
   integer,                intent(in)  :: n
   integer,  dimension(n), intent(in)  :: index
   real(RT), dimension(n), intent(in)  :: a
   real(RT), dimension(n), intent(out) :: b
   integer                             :: i

   do i=1,n
       b(i) = 1.0_RT + 0.1_RT*a(index(i))
   enddo

end subroutine gathr

$ ifort -c -xcore-avx2 -qopt-report=4 -qopt-report-file=stdout gathr.F90 -DRT=4 -S | egrep 'gather|VECTORIZED'
   remark #15415: vectorization support: gather was generated for the variable a:  indirect access    [ gathr.F90(10,29) ]
   remark #15300: LOOP WAS VECTORIZED
   remark #15458: masked indexed (or gather) loads: 1
   remark #15301: REMAINDER LOOP WAS VECTORIZED
$ egrep gather gathr.s
        vgatherdps %ymm4, -4(%r8,%ymm3,4), %ymm5                #10.29
        vgatherdps %ymm7, -4(%r8,%ymm6,4), %ymm8                #10.29
        vgatherdps %ymm3, -4(%r8,%ymm2,4), %ymm4                #10.29

The compiler has vectorized the loop using a “gather” instruction from Intel® Advanced Vector Extensions 2 (Intel® AVX2).

Compare to the behavior when compiling with -DRT=8  as described in the article for diagnostic #15328.
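The cause described above also mentions loads with non-unit stride. A minimal sketch of such a loop is shown here for comparison with the indexed case; the names stride_sketch and strided_load are hypothetical. Whether a given compiler version and target report remark #15415 for this loop, or a different strided-load remark, has not been verified here.

```fortran
module stride_sketch
    implicit none
contains
    ! Reads every stride-th element of a, so consecutive iterations
    ! load from memory locations that are not contiguous.
    subroutine strided_load(n, stride, a, b)
        integer,            intent(in)  :: n, stride
        real, dimension(*), intent(in)  :: a
        real, dimension(n), intent(out) :: b
        integer                         :: i

        do i = 1, n
            b(i) = 1.0 + 0.1*a(1 + (i-1)*stride)
        end do
    end subroutine strided_load
end module stride_sketch
```

The caller is assumed to supply at least (n-1)*stride + 1 elements in a.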

Product Version: Intel® Fortran Compiler 15.0 and later

Cause:

The Intel® Fortran Compiler will not vectorize a loop when it knows the loop has only one iteration. If the user requests vectorization with a SIMD directive anyway, the compiler emits a warning diagnostic.

 

The vectorization report is generated using the Intel® Fortran Compiler's optimization and vectorization report options:

Windows* OS:  /O2  /Qopt-report:2  /Qopt-report-phase:vec    

Linux OS or OS X:  -O2 -qopt-report2  -qopt-report-phase=vec

Example:

The example below generates the following remark in the optimization report:

subroutine f15423( a, b, n ) 
  implicit none
  real, dimension(*) :: a, b
  integer            :: i, n
  
  n=1
 
!$omp simd
  do i=1,n 
     b(i) = 1. - a(i)**2
  end do
   
end subroutine f15423

$ ifort -c -qopenmp-simd f15423.f90

f15423.f90(8): (col. 7) remark: simd loop has only one iteration

f15423.f90(8): (col. 7) warning #13379:  was not vectorized with "simd"

Resolution:

If the loop really has only one iteration, either remove the SIMD directive or replace the loop with straight-line code.

If the statement  n=1  was inserted unintentionally, remove it and the loop will vectorize.
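If the single iteration is intentional, a minimal sketch of the first option is to drop both the directive and the loop and update the one element directly. The names single_iter_sketch and f15423_noloop are hypothetical.

```fortran
module single_iter_sketch
    implicit none
contains
    ! Same computation as the loop body of f15423 for i = 1,
    ! written as straight-line code with no SIMD directive.
    subroutine f15423_noloop(a, b)
        real, dimension(*), intent(in)    :: a
        real, dimension(*), intent(inout) :: b

        b(1) = 1. - a(1)**2
    end subroutine f15423_noloop
end module single_iter_sketch
```

With no loop and no directive there is nothing for the vectorizer to reject, so the warning should not appear.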

Product Version: Intel® Fortran Compiler 15.0 and later

Cause:

The vectorization report generated using the Intel® Fortran Compiler's optimization and vectorization report options lists a loop that was not vectorized:

Windows* OS:  /O2  /Qopt-report:2  /Qopt-report-phase:vec    

Linux OS or OS X:  -O2 -qopt-report2  -qopt-report-phase=vec

Example:

The example below generates the following remark in the optimization report:

subroutine f15516(a,b,n)
   implicit none
   integer,                  intent(in ) :: n
   complex(8), dimension(n), intent(in ) :: a
   complex(8), dimension(n), intent(out) :: b
   integer                               :: i 
   
   do i=1,n
      b(i) = 1. / sqrt(1.+a(i)**2)
   enddo
   
end subroutine f15516

ifort -c /O2 /Qopt-report:2 /Qopt-report-phase:vec /Qopt-report-file:stdout f15516.f90
 
Begin optimization report for: F15516

    Report from: Vector optimizations [vec]

LOOP BEGIN at f15516.f90(8,4)
   remark #15516: loop was not vectorized: cost model has chosen vectorlength of 1 -- maybe possible to override via pragma/directive with vectorlength clause
LOOP END

Resolution: 
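The remark itself hints at one thing to try: overriding the cost model with a vector-length clause. A minimal sketch, assuming the OpenMP simdlen clause as the portable counterpart of the vectorlength clause named in the remark, and assuming the file is compiled with -qopenmp-simd so that the directive is honored. The names simdlen_sketch and f15516_simdlen and the value 2 are illustrative assumptions, not recommendations. Because the cost model judged vectorization unprofitable here, any speedup should be measured rather than assumed.

```fortran
module simdlen_sketch
    implicit none
contains
    subroutine f15516_simdlen(a, b, n)
        integer,                  intent(in)  :: n
        complex(8), dimension(n), intent(in)  :: a
        complex(8), dimension(n), intent(out) :: b
        integer                               :: i

        ! Request a vector length of 2 instead of the cost model's
        ! choice of 1. Without OpenMP SIMD support enabled, this line
        ! is an ordinary comment and the loop runs as scalar code.
        !$omp simd simdlen(2)
        do i = 1, n
            b(i) = 1. / sqrt(1. + a(i)**2)
        end do
    end subroutine f15516_simdlen
end module simdlen_sketch
```

Check the optimization report again after this change to see whether the loop was actually vectorized.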

Updated 9/13/2018

Product Version:  Intel® Fortran Compiler 15.0 and above

Cause:

This diagnostic is emitted when a loop contains a conditional statement that controls the assignment of a scalar value and that scalar is referenced after the loop exits. In that case, the vectorization report generated using the Intel® Fortran Compiler's optimization and vectorization report options lists the loop as not vectorized:

Windows* OS:  /O2  /Qopt-report:2  /Qopt-report-phase:vec    

Linux OS or OS X:  -O2 -qopt-report2  -qopt-report-phase=vec

Example:

The example below generates the following remark in the optimization report:

subroutine f13379( a, b, n )  
  implicit none 
  integer, intent(in)               :: n
  integer, intent(in),  dimension(n) :: a
  integer, intent(out),dimension(n) :: b
   
  integer                           :: i, x=10  
   
!$omp simd  
  do i=1,n  
    if( a(i) > 0 ) then 
     x = i                    !...here is the scalar assignment 
    end if 
    b(i) = x  
  end do 
!... reference the scalar outside of the loop  
  write(*,*) "last value of x: ", x  
end subroutine f13379

$ ifort -c  -O2 -qopt-report2 -qopenmp-simd -qopt-report-file=stderr -qopt-report-phase=vec f13379.f90

LOOP BEGIN at f13379.f90(12,3)
   remark #15316:simd loop was not vectorized: scalar assignment in simd loop is prohibited, consider private, lastprivate or reduction clauses 
   remark #15552: loop was not vectorized with "simd"
LOOP END

Resolution:

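Using !$omp simd lastprivate(x) instead of !$omp simd allows the loop to be vectorized: x becomes private inside the loop, and the value from the sequentially last iteration is copied back to x when the loop ends. Note that a private x no longer carries its value over from earlier iterations, so check that this matches the intended behavior.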

Example:

subroutine f13379( a, b, n )  
  implicit none 
  integer, intent(in)               :: n
  integer, intent(in),  dimension(n) :: a
  integer, intent(out),dimension(n) :: b
   
  integer                           :: i, x=10  
   
!$omp simd lastprivate(x)  
  do i=1,n  
    if( a(i) > 0 ) then 
     x = i                    !...here is the scalar assignment 
    end if 
    b(i) = x  
  end do 
!... reference the scalar outside of the loop  
  write(*,*) "last value of x: ", x  
end subroutine f13379

$ ifort -c  -O2 -qopt-report2 -qopenmp-simd -qopt-report-file=stderr -qopt-report-phase=vec f13379.f90


Begin optimization report for: F13379

    Report from: Vector optimizations [vec]

LOOP BEGIN at f13379.f90(10,3)
   remark #15301: SIMD LOOP WAS VECTORIZED
LOOP END