PERFORMANCE TUNINING IN INFORMATICA
The first step in performance tuning is to identify performance bottlenecks. Performance bottlenecks can occur in the source and target, the mapping, the session, and the system.
Data Transformation Manager (DTM):
When a PowerCenter session is triggered, integration service start Data Transformation Manager (DTM), which is responsible to start reader thread, transformation thread and writer thread.
Reader thread is responsible to read data from the sources. Transformation threads process data according to the transformation logic in the mapping and writer thread connects to the target and loads the data. Any data processing delay in these threads leads to a performance issue.
Source Bottlenecks:
Performance bottlenecks can occur when the Integration Service reads from a source database.
Slowness in reading data from the source leads to delay in filling enough data into DTM buffer. So the transformation and writer threads wait for data. This delay causes the entire session to run slower.
Inefficient query or small database network packet sizes can cause source bottlenecks.
Identify bottleneck in Source:
If source is a relational table, put a filter transformation in the mapping, just after source qualifier; make the condition of filter to FALSE. So, all records will be filtered off and none will proceed to other parts of the mapping. Isssssn original case, without the test filter, total time taken is as follows:-
▹ Total Time = time taken by (source + transformations + target load)
▹ Now because of filter, Total Time = time taken by source
So if source was fine, then in the latter case, session should take less time. Still if the session takes near equal time as former case, then there is a source bottleneck.
Target Bottlenecks:
• When target bottleneck occurs, writer thread will not be able to free up space for reader and transformer threads, until the data is written to the target. So the reader and transformer threads to wait for free blocks. This causes the entire session to run slower.
• Small database checkpoint intervals, small database network packet sizes, or problems during heavy loading operations can cause target bottlenecks.
Identify bottleneck in Target:
If the target is a relational table, then substitute it with a flat file and run the session. If the time taken now is very much less than the time taken for the session to load to table, then the target table is the bottleneck.
Mapping Bottlenecks:
A complex mapping logic or a not well written mapping logic can lead to mapping bottleneck. With mapping bottleneck, transformation thread runs slower causing the reader thread to wait for free blocks and writer thread to wait blocks filled up for writing to target.
Session Bottlenecks:
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck.
Session bottleneck occurs normally when you have the session memory configuration is not turned correctly.
This in turn leads to a bottleneck on the reader, transformation or writer thread. Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.
System Bottlenecks:
After you tune the source, target, mapping, and session, consider tuning the system to prevent system bottlenecks.
The Integration Service uses system resources to process transformations, run sessions, and read and write data.
The Integration Service also uses system memory to create cache files for transformations, such as Aggregator, Joiner, Lookup, Sorter, XML, and Rank.
Source, Target & Mapping Bottlenecks Using Thread Statistics:
Thread statics gives run time information from all the three threads; reader, transformation and writer thread. The session log provides enough run time thread statistics to help us understand and pinpoint the performance bottleneck.
Gathering Thread Statistics:
You can get thread statistics from the session long file. When you run a session, the session log file lists run time information and thread statistics with below details.
Gathering Thread Statistics:
You can get thread statistics from the session long file. When you run a session, the session log file lists run time information and thread statistics with below details.
▹ Run Time: Amount of time the thread runs.
▹ Idle Time: Amount of time the thread is idle. Includes the time the thread waits for other thread processing.
▹ Busy Time: Percentage of the run time. It is (run time – idle time) / run time x 100.
▹ Thread Work Time: The percentage of time taken to process each transformation in a thread.
Note: Session Log file with normal tracing level is required to get the thread statistics. Amount of time the thread runs.
Optimizing the Source: INCREASE THE PERFORMANCE OF A SOURCE
1. Optimizing the Query
▹ If a session joins multiple source tables in one Source Qualifier, you might be able to improve performance by optimizing the query with optimizing hints.
▹ The database administrator (DBA) can create optimizer hints to tell the database how to execute the query for a particular set of source tables.
2. Increasing Database Network Packet Size
If you read from Oracle, Sybase ASE, or Microsoft SQL Server sources, you can improve the performance by increasing the network packet size. Increase the network packet size to allow larger packets of data to cross the network at one time.
Optimizing the Target:
1. Using Bulk Loads:
▹ You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database.
▹ When bulk loading, the Integration Service bypasses the database log, which speeds performance.
▹ Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery.
2. Using External Loaders:
To increase session performance, configure PowerCenter to use an external loader for the following types of target databases. External loader can be used for Oracle, DB2, Sybase and Teradata.
3. Dropping Indexes and Key Constraints:
▹ When you define key constraints or indexes in target tables, you slow the loading of data to those tables.
▹ To improve performance, drop indexes and key constraints before you run the session.
▹ You can rebuild those indexes and key constraints after the session completes.
4.Increasing Database Network Packet Size:
▹ If you write to Oracle, Sybase ASE, or Microsoft SQL Server targets, you can improve the performance by increasing the network packet size.
▹ Increase the network packet size to allow larger packets of data to cross the network at one time.
Optimizing the Mappings:
Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping.
Configure the mapping with the least number of transformations and expressions to do the most amount of work possible.
Delete unnecessary links between transformations to minimize the amount of data moved.
1. Optimizing Datatype Conversions:
You can increase performance by eliminating unnecessary datatype conversions. For example, if a mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions from mappings.
2. Optimizing Expressions:
Minimizing Aggregate Function Calls: When writing expressions, factor out as many aggregate function calls as possible. Each time you use an aggregate function call, the Integration Service must search and group the data.
Example:
SUM(COL_A + COL_B) performs better than SUM(COL_A) + SUM(COL_B)
Replacing Common Expressions with Local Variables: If you use the same expression multiple times in one transformation, you can make that expression a local variable by using VARIABLE PORT
Choosing Numeric Versus String Operations: The Integration Service processes numeric operations faster than string operations. For example, if you look up large amounts of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.
Using Operators Instead of Functions:
▹ The Integration Service reads expressions written with operators faster than expressions with functions. Where possible, use operators to write expressions.
▹ If your logic contains multiple Informatica Inbuilt function (IIF) functions, try replacing them using DECODE. The decode function is faster compared to multiple IIFs.
▹ The integration service processes operators faster compared to functions. For example, consider using || (pipe) in place of the CONCAT function to concatenate the data.
▹ Passing unnecessary data through the mapping will hamper the performance. It is recommended that if you do not need certain data, disable the output port of the column so that you don’t pass the data.
Optimizing Transformations:
1.Optimize Lookup transformation:
a) Caching the lookup table: When caching is enabled the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled the server queries the lookup table on a row-by row basis.
If your mapping contains multiple lookups that look up on the same lookup table, it is suggested you share the cache in order to avoid performing caching multiple times.
b) Optimizing the lookup condition: Whenever multiple conditions are placed, the condition with equality sign should take precedence.
c) Lookup Override:
▹ You can reduce the processing time if you use lookup SQL override properly in the lookup transformation. If you are using lookup to look up on the database table, you can use the lookup SQL override to reduce the amount of data that you look up. This also helps in saving the cache space.
▹ Suppress ORDER BY statement by appending two dashes (–)
▹ Remove all ports not used downstream or in the SQL Override
d) Indexing the lookup table: The cached lookup table should be indexed on order by columns. The session log contains the ORDER BY statement. The un-cached lookup since the server issues a SELECT statement for each row passing into lookup transformation, it is better to index the lookup table on the columns in the condition.
e) Replace large lookup tables with joins in the Source Qualifier when possible (take always less number of records table as a lookup table).s
2.Optimize Filter transformation:
▹ You can improve the efficiency by filtering early in the data flow.
▹ Use source qualifier to filter the data. You can also use source qualifier SQL override to filter the records, instead of using filter transformation.
▹ Replace multiple filters with a router
3.Source Qualifier Transformation:
▹ Bring only the required columns from the source to the source qualifier. Most of the times not all the columns of the source table are required, so bring only the required fields by deleting the unnecessary columns.
▹ Avoid using order by clause inside the source qualifier SQL override. The order by clause requires additional processing and performance can be increased by avoiding it.
Optimize Aggregate transformation:
▹ Group by simpler columns. Preferably numeric columns.
▹ Use Sorted input. The sorted input decreases the use of aggregate caches. The server assumes all input data are sorted and as it reads it performs aggregate calculations.
▹ Use as early as possible.
▹ Filter data before aggregating.
▹ Limit the number of ports used in the aggregator transformation. This will reduce the volume of data that aggregator transformation stores inside the cache.
▹ Minimize aggregate function calls.
5.Optimize Seq. Generator transformation:
▹ Try creating a reusable Seq. Generator transformation and use it in multiple mappings.
▹ The number of cached value property determines the number of values the Informatica server caches at one time.
6.Optimize Expression transformation:
▹ Minimize aggregate function calls.
▹ Replace common sub-expressions with local variables (Variable ports).
▹ Use operators instead of functions.
7.Optimizing joiner transformation:
▹ It is recommended to assign the table with lesser number of records as master while using joiner transformation.
▹ It is also recommended to perform joining in the source qualifier using SQL override as performing joins on the database is sometimes faster compared to performing in Informatica.
▹ Additionally, pass the sorted data to joiner transformation to enhance the performance as this utilizes less disk space compared to unsorted data.
▹ Perform normal joins whenever possible.
Common sources of problems:
▹ too many transforms
▹ unused links between ports
▹ too many input/output or outputs ports connected out of aggregator, ranking, lookup transformations
▹ unnecessary data-type conversions
Common solutions:
▹ Eliminate transformation errors
▹ If several mappings read from the same source, try single pass reading
▹ Optimize datatypes, use integers for comparisons.
▹ Don’t convert back and forth between datatypes
▹ Optimize lookups and lookup tables, using cache and indexing tables
▹ Put your filters early in the data flow, use a simple filter condition
▹ For aggregators, use sorted input, integer columns to group by and simplify expressions
▹ If you use reusable sequence generators, increase number of cached values
▹ If you use the same logic in different data streams, apply it before the streams branch off
▹ Optimize expressions:
– isolate slow and complex expressions
– reduce or simplify aggregate functions
– use local variables to encapsulate repeated computations
– integer computations are faster than character computations
– use operators rather that the equivalent function, ‘||’ faster than CONCAT().
Thank you Raj sir nice explanation no one can have this much patience to explain like you way of teaching is very nice it will understand everyone
Leave a comment