New Vectorization Diagnostics starting from Intel® Fortran Compiler 15.0

Product Version: Intel® Fortran Compiler 15.0 and above


The vectorization report generated with the Intel® Fortran Compiler optimization options (/O2 /Qopt-report:2) states that the loop was not vectorized because the loop body became empty after optimizations.


The example below generates this remark in the optimization report:

integer function foo(a, b, n) 
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: a
    real, intent (in)   :: b
    integer :: i
    do i=1,n
           a = b + 1
    end do
    foo = a
end function 

ifort -c /O2 /Qopt-report:2 /Qopt-report-file:stdout f15414.f90


Report from: Interprocedural optimizations [ipo]



  -Qinline-factor: 100

  -Qinline-min-size: 30

  -Qinline-max-size: 230

  -Qinline-max-total-size: 2000

  -Qinline-max-per-routine: 10000

  -Qinline-max-per-compile: 500000


Begin optimization report for: FOO


    Report from: Interprocedural optimizations [ipo]


INLINE REPORT: (FOO) [1] f15414.f90(1,18)



In the example above, the loop contains a single statement, and that statement does not depend on the loop index. Once the compiler moves it out of the loop, nothing is left inside the loop to vectorize.
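For contrast, here is a minimal sketch of a loop whose body cannot be hoisted, because each iteration reads and writes a different array element. The name foo_array and the array arguments are hypothetical and not part of the original example. Whether a given compiler actually vectorizes it still depends on the options used and on its cost model.

```fortran
! Hypothetical variant of foo (not from the original article).
! The statement depends on the loop index i, so it cannot be moved
! out of the loop and the loop body does not become empty.
subroutine foo_array(a, b, n)
    implicit none
    integer, intent(in)  :: n
    real,    intent(out) :: a(n)
    real,    intent(in)  :: b(n)
    integer :: i
    do i = 1, n
        a(i) = b(i) + 1.0
    end do
end subroutine foo_array
```

Compiling this variant with the same report options should show whether the remark about an empty loop body still appears.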

A vectorizable loop contains loads from memory locations that are not contiguous in memory (sometimes known as a “gather”). These may be indexed loads, as in the example below, or loads with non-unit stride. The compiler has issued a hardware gather instruction for these loads.

(Note that for compiler versions 16.0.1 and earlier, the compiler may also emit this message when gather operations are emulated in software.)


subroutine gathr(n, a, b, index)
   implicit none
   integer,                intent(in)  :: n
   integer,  dimension(n), intent(in)  :: index
   real(RT), dimension(n), intent(in)  :: a
   real(RT), dimension(n), intent(out) :: b
   integer                             :: i

   do i=1,n
       b(i) = 1.0_RT + 0.1_RT*a(index(i))
   end do
end subroutine gathr

$ ifort -c -xcore-avx2 -qopt-report=4 -qopt-report-file=stdout gathr.F90 -DRT=4 -S | egrep 'gather|VECTORIZED'

   remark #15415: vectorization support: gather was generated for the variable a:  indirect access    [ gathr.F90(10,29) ]

   remark #15300: LOOP WAS VECTORIZED

   remark #15458: masked indexed (or gather) loads: 1


$ egrep gather gathr.s

        vgatherdps %ymm4, -4(%r8,%ymm3,4), %ymm5                #10.29

        vgatherdps %ymm7, -4(%r8,%ymm6,4), %ymm8                #10.29

        vgatherdps %ymm3, -4(%r8,%ymm2,4), %ymm4                #10.29


The compiler has vectorized the loop using a “gather” instruction from Intel® Advanced Vector Extensions 2 (Intel® AVX2).

Compare to the behavior when compiling with -DRT=8  as described in the article for diagnostic #15328.
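The other kind of non-contiguous load mentioned above, a load with non-unit stride, can be sketched as follows. This example is not from the original article: the name strided and the layout of a (2*n elements, of which only the odd-numbered ones are read) are assumptions made for illustration. Which instructions the compiler selects for such a load (a hardware gather, shuffles, or scalar loads) is its own decision, and the optimization report shows what was chosen.

```fortran
! Hypothetical sketch of a non-unit-stride load.
! a holds 2*n elements; the loop reads a(1), a(3), a(5), ...
subroutine strided(n, a, b)
   implicit none
   integer, intent(in)  :: n
   real,    intent(in)  :: a(2*n)
   real,    intent(out) :: b(n)
   integer              :: i

   do i = 1, n
       b(i) = 1.0 + 0.1*a(2*i-1)   ! stride-2 access to a
   end do
end subroutine strided
```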

The Intel® Fortran Compiler will not vectorize a loop when it knows that the loop has only one iteration. If the user requests vectorization with a SIMD directive, the compiler emits a warning diagnostic.


subroutine f15423( a, b, n ) 
  implicit none
  real, dimension(*) :: a, b
  integer            :: i, n

  n = 1
!$omp simd
  do i=1,n 
     b(i) = 1. - a(i)**2
  end do
end subroutine f15423

$ ifort -c -qopenmp-simd f15423.f90

f15423.f90(8): (col. 7) remark: simd loop has only one iteration

f15423.f90(8): (col. 7) warning #13379:  was not vectorized with "simd"


If the loop really has only one iteration, do not use a SIMD directive, or replace the loop with a single statement.

If the statement  n=1  was inserted unintentionally, remove it and the loop will vectorize.
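For the first case, where only one element is ever processed, a minimal sketch of the loop-free form might look like this. The name f15423_single is hypothetical, and the routine assumes that the single iteration always refers to element 1.

```fortran
! Hypothetical rewrite for the one-iteration case (not from the original
! article): no loop and no SIMD directive, just the scalar statement.
subroutine f15423_single( a, b )
  implicit none
  real, dimension(*) :: a, b
  b(1) = 1. - a(1)**2
end subroutine f15423_single
```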

subroutine f15516(a,b,n)
   implicit none
   complex(8), dimension(n), intent(in ) :: a
   complex(8), dimension(n), intent(out) :: b
   integer,                  intent(in ) :: n
   integer                               :: i 
   do i=1,n
      b(i) = 1. / sqrt(1.+a(i)**2)
   end do
end subroutine f15516

ifort -c /O2 /Qopt-report:2 /Qopt-report-phase:vec /Qopt-report-file:stdout f15516.f90
Begin optimization report for: F15516

    Report from: Vector optimizations [vec]

LOOP BEGIN at f15516.f90(8,4)
   remark #15516: loop was not vectorized: cost model has chosen vectorlength of 1 -- maybe possible to override via pragma/directive with vectorlength clause
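The remark suggests that a directive with a vectorlength clause may override the cost model. One possible form, sketched here under the assumption that the compiler version in use accepts the Intel-specific SIMD directive, is shown below. The name f15516_vl and the chosen length of 2 are illustrative only. Overriding the cost model does not guarantee a speedup, so the result is worth measuring.

```fortran
! Sketch only: requests a vector length of 2 with the Intel-specific
! SIMD directive. Compilers that do not recognize the directive treat
! the line as an ordinary comment.
subroutine f15516_vl(a, b, n)
   implicit none
   integer,                  intent(in ) :: n
   complex(8), dimension(n), intent(in ) :: a
   complex(8), dimension(n), intent(out) :: b
   integer                               :: i
!dir$ simd vectorlength(2)
   do i = 1, n
      b(i) = 1. / sqrt(1. + a(i)**2)
   end do
end subroutine f15516_vl
```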



subroutine f13379( a, b, n )  
  implicit none 
  integer, intent(in)               :: n
  integer, intent(in),  dimension(n) :: a
  integer, intent(out),dimension(n) :: b
  integer                           :: i, x=10  
!$omp simd  
  do i=1,n  
    if( a(i) > 0 ) then 
     x = i                    ! this is the scalar assignment
    end if 
    b(i) = x  
  end do 
!... reference the scalar outside of the loop  
  write(*,*) "last value of x: ", x  
end subroutine f13379

$ ifort -c  -O2 -qopt-report2 -qopenmp-simd -qopt-report-file=stderr -qopt-report-phase=vec f13379.f90

LOOP BEGIN at f13379.f90(12,3)

   remark #15316:simd loop was not vectorized: scalar assignment in simd loop is prohibited, consider private, lastprivate or reduction clauses 

   remark #15552: loop was not vectorized with "simd"



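Using !$omp simd lastprivate(x) instead of !$omp simd gives each SIMD lane its own private copy of x, and the value of x from the sequentially last iteration is copied back to the original x after the loop. This removes the dependence on a shared scalar, so the loop can be vectorized. Note that a private copy starts out undefined, so x is only well defined after the loop if the last iteration assigns it.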


subroutine f13379( a, b, n )  
  implicit none 
  integer, intent(in)               :: n
  integer, intent(in),  dimension(n) :: a
  integer, intent(out),dimension(n) :: b
  integer                           :: i, x=10  
!$omp simd lastprivate(x)  
  do i=1,n  
    if( a(i) > 0 ) then 
     x = i                    ! this is the scalar assignment
    end if 
    b(i) = x  
  end do 
!... reference the scalar outside of the loop  
  write(*,*) "last value of x: ", x  
end subroutine f13379

$ ifort -c  -O2 -qopt-report2 -qopenmp-simd -qopt-report-file=stderr -qopt-report-phase=vec f13379.f90

Begin optimization report for: F13379

    Report from: Vector optimizations [vec]

LOOP BEGIN at f13379.f90(10,3)
   remark #15301: SIMD LOOP WAS VECTORIZED
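With lastprivate, the value of x after the loop is the one assigned in the sequentially last iteration. The following sketch is not from the original article. It uses the hypothetical name last_index and assigns x in every iteration, so that the value copied out of the loop is always defined when the loop runs at least once.

```fortran
! Hypothetical example: x is assigned unconditionally in each iteration,
! so the lastprivate copy-out after the loop yields the value from i = n.
subroutine last_index( b, n, x )
  implicit none
  integer, intent(in)                :: n
  integer, intent(out), dimension(n) :: b
  integer, intent(out)               :: x
  integer                            :: i
  x = 0                       ! value before the loop
!$omp simd lastprivate(x)
  do i = 1, n
    x = 2*i                   ! assigned in every iteration
    b(i) = x
  end do
end subroutine last_index
```

Called with n = 4, this fills b with 2, 4, 6, 8 and leaves x equal to 8, the same result the loop gives when the directive is ignored.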