Intel® Optimization for TensorFlow* and Intel® Extension for TensorFlow* Cheat Sheet

Kevin Ta, Rachel Oberman, Preethi Venkatesh

Get started with Intel® Optimization for TensorFlow* and Intel® Extension for TensorFlow* using the following commands.

Intel® Optimization for TensorFlow*: A Public Release from Google

Features and optimizations for TensorFlow* on Intel hardware are frequently upstreamed and included in stock TensorFlow* releases. As of TensorFlow* v2.9, Intel® oneAPI Deep Neural Network Library (oneDNN) optimization is automatically enabled.

For more information, see TensorFlow.

Basic Installation Using PyPI*
    pip install tensorflow

Basic Installation Using Anaconda*
    conda install -c conda-forge tensorflow

Import TensorFlow
    import tensorflow as tf

Capture a Verbose Log (Command Prompt)
    export ONEDNN_VERBOSE=1

Parallelize Execution (in the Code)
    tf.config.threading.set_intra_op_parallelism_threads(<number of physical cores per socket>)
    tf.config.threading.set_inter_op_parallelism_threads(<number of sockets>)
    tf.config.set_soft_device_placement(True)
    # Tune the intra-op and inter-op settings for your workload

Parallelize Execution (Command Prompt)
    export TF_NUM_INTRAOP_THREADS=<number of physical cores per socket>
    export TF_NUM_INTEROP_THREADS=<number of sockets>
    # Tune the intra-op and inter-op settings for your workload

Non-Uniform Memory Access (NUMA)
    numactl --cpunodebind N --membind N python <script>

Enable Keras Mixed Precision with BF16
    from tensorflow.keras import mixed_precision
    mixed_precision.set_global_policy('mixed_bfloat16')
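
The threading and NUMA settings above can also be assembled programmatically. The following is a minimal sketch, not an official utility: `thread_env` and `numactl_command` are hypothetical helper names, the core and socket counts are supplied by the caller rather than detected, and `train.py` is a placeholder script name. It assumes the variables are read when TensorFlow starts, so they are set for the child process before it launches.

```python
import os
import shlex


def thread_env(cores_per_socket, sockets):
    """Return the TensorFlow threading variables from the table above as a dict."""
    if cores_per_socket < 1 or sockets < 1:
        raise ValueError("cores_per_socket and sockets must be positive")
    return {
        "TF_NUM_INTRAOP_THREADS": str(cores_per_socket),
        "TF_NUM_INTEROP_THREADS": str(sockets),
    }


def numactl_command(node, script):
    """Build the numactl argument list that binds CPU and memory to one NUMA node."""
    return [
        "numactl",
        "--cpunodebind", str(node),
        "--membind", str(node),
        "python", script,
    ]


# Example values only: 2 sockets with 28 physical cores each.
env = dict(os.environ, **thread_env(cores_per_socket=28, sockets=2))
cmd = numactl_command(node=0, script="train.py")
print(shlex.join(cmd))
# To launch: subprocess.run(cmd, env=env, check=True)
```

Keeping the values in one function makes it easier to sweep different intra-op and inter-op settings when tuning for a specific workload.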



Intel® Optimization for TensorFlow*: A Public Release from Intel

In addition to the performance tuning options listed under the Google public release, the Intel public release offers OpenMP* optimizations for further performance enhancements.

For additional installation methods, see the Intel® Optimization for TensorFlow* Installation Guide.

For more information about performance, see Maximize TensorFlow* Performance on CPU and Getting Started with Mixed Precision Support in oneDNN Bfloat16.

Basic Installation Using PyPI*
    pip install intel-tensorflow

Basic Installation Using Anaconda*
    conda install tensorflow      # Linux/macOS
    conda install tensorflow-mkl  # Windows

Import TensorFlow
    import tensorflow as tf

Capture a Verbose Log (Command Prompt)
    export ONEDNN_VERBOSE=1

Parallelize Execution (in the Code)
    tf.config.threading.set_intra_op_parallelism_threads(<number of physical cores per socket>)
    tf.config.threading.set_inter_op_parallelism_threads(<number of sockets>)
    tf.config.set_soft_device_placement(True)
    # Tune the intra-op and inter-op settings for your workload

Parallelize Execution (Command Prompt)
    export TF_NUM_INTRAOP_THREADS=<number of physical cores per socket>
    export TF_NUM_INTEROP_THREADS=<number of sockets>
    # Tune the intra-op and inter-op settings for your workload

Non-Uniform Memory Access (NUMA)
    numactl --cpunodebind N --membind N python <script>

Enable Keras Mixed Precision with BF16
    from tensorflow.keras import mixed_precision
    mixed_precision.set_global_policy('mixed_bfloat16')

Set the Maximum Number of Threads (Command Prompt)
    export OMP_NUM_THREADS=<number of physical cores per socket>

Bind OpenMP Threads to Physical Processing Units
    export KMP_AFFINITY=granularity=fine,compact,1,0

Set a Wait Time (ms) After Completing the Execution of a Parallel Region Before Sleeping
    export KMP_BLOCKTIME=<time>
    # Recommended: 0 for CNN or 1 for non-CNN models (verify empirically)

Print OpenMP Runtime Library Environment Variables During Execution
    export KMP_SETTINGS=TRUE
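
The OpenMP variables above are usually set together. As a rough sketch, they can be collected in one place and passed to a child process. The helper name `openmp_env` and its `is_cnn` flag are hypothetical, and the block-time values simply follow the recommendation in the comment above, so treat them as starting points rather than tuned settings.

```python
import os


def openmp_env(cores_per_socket, is_cnn=True):
    """Collect the OpenMP settings from the table above into one dict.

    KMP_BLOCKTIME uses the suggested starting points (0 for CNNs,
    1 otherwise); verify them empirically for your workload.
    """
    if cores_per_socket < 1:
        raise ValueError("cores_per_socket must be positive")
    return {
        "OMP_NUM_THREADS": str(cores_per_socket),
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
        "KMP_BLOCKTIME": "0" if is_cnn else "1",
        "KMP_SETTINGS": "TRUE",
    }


# Example value only: 28 physical cores per socket, non-CNN model.
settings = openmp_env(cores_per_socket=28, is_cnn=False)
child_env = dict(os.environ, **settings)  # pass as env= to subprocess.run
for name, value in settings.items():
    print(f"export {name}={value}")
```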



Intel® Extension for TensorFlow*

This extension provides the most up-to-date features and optimizations for Intel hardware and supports both Intel CPU and Intel GPU devices. Most of these features will eventually be upstreamed to stock TensorFlow* releases. While many optimizations apply by default without additional setup, Intel® Extension for TensorFlow* also provides further tuning options and custom operations to boost performance.

For additional installation methods, see the Intel® Extension for TensorFlow* Installation Guide.

For more information, see Intel® Extension for TensorFlow*.

Basic Installation Using PyPI*
    pip install --upgrade intel-extension-for-tensorflow[gpu] # Install for GPU
    pip install --upgrade intel-extension-for-tensorflow[cpu] # Install for CPU [Experimental]

Import Intel® Extension for TensorFlow*
    import intel_extension_for_tensorflow as itex

Get the Current XPU Backend Type
    itex.get_backend()

Set the Specific Backend Type (in the Code): Set by Default
    itex.set_backend('GPU') # or 'CPU'

Set the Specific Backend Type (Command Prompt): Set by Default
    export ITEX_XPU_BACKEND="GPU" # or "CPU"

Advanced Automatic Mixed Precision (in the Code): A Basic Configuration with Improved Inference Speed with Reduced Memory Consumption
    auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
    auto_mixed_precision_options.data_type = itex.BFLOAT16 # or itex.FLOAT16
    graph_options = itex.GraphOptions(auto_mixed_precision_options=auto_mixed_precision_options)
    graph_options.auto_mixed_precision = itex.ON
    config = itex.ConfigProto(graph_options=graph_options)
    itex.set_config(config)

Advanced Automatic Mixed Precision (Command Prompt): A Basic Configuration with Improved Inference Speed with Reduced Memory Consumption
    export ITEX_AUTO_MIXED_PRECISION=1
    export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE="BFLOAT16" # or "FLOAT16"

Customized AdamW Optimizer (in the Code)
    itex.ops.AdamWithWeightDecayOptimizer(
      weight_decay_rate=0.001, learning_rate=0.001, beta_1=0.9, beta_2=0.999,
      epsilon=1e-07, name='Adam',
      exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"], **kwargs
    )

Customized Layer Normalization (in the Code)
    itex.ops.LayerNormalization(
      axis=-1, epsilon=0.001, center=True, scale=True,
      beta_initializer='zeros', gamma_initializer='ones',
      beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
      gamma_constraint=None, **kwargs
    )

Customized GELU (in the Code)
    itex.ops.gelu(
      features, approximate=False, name=None
    )

Customized LSTM (in the Code)
    itex.ops.ItexLSTM(
      200, activation='tanh',
      recurrent_activation='sigmoid',
      use_bias=True,
      kernel_initializer='glorot_uniform',
      recurrent_initializer='orthogonal',
      bias_initializer='zeros', **kwargs
    )
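
To make the `approximate` flag of the GELU operation above concrete, here is a plain-Python reference for the two standard GELU formulas. This is a sketch for intuition only: it assumes `itex.ops.gelu` follows the usual definitions (the exact erf form when `approximate=False`, the tanh approximation when `approximate=True`), and `gelu_reference` is a hypothetical name, not part of the extension.

```python
import math


def gelu_reference(x, approximate=False):
    """Scalar GELU using the standard exact (erf) or tanh-approximate formula."""
    if approximate:
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Compare the two variants on a few sample inputs.
for value in (-2.0, 0.0, 1.0, 3.0):
    exact = gelu_reference(value)
    approx = gelu_reference(value, approximate=True)
    print(f"x={value:+.1f}  exact={exact:.6f}  tanh={approx:.6f}")
```

The two variants agree closely on typical activations, so the choice mostly trades a small numerical difference for a cheaper computation.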


For more information and support, or to report any issues, see:


Intel® Extension for TensorFlow* Issues on GitHub*

TensorFlow* Issues on GitHub*

Intel® AI Analytics Toolkit Forum


Sign up and try this extension for free using Intel® Developer Cloud for oneAPI.