Interview Questionary Set 1

1. What is ETL and what are the ETL tools?

ETL is process of Extract, Transform and Load the data into Datawarehousing in required format for decision making.

ETL Tools List:
·         Informatica PowerCenter
·         IBM DataStage
·         Ab Initio
·         Oracle Warehouse Builder
·         Microsoft SQL Server Integration
·         DT/Studio
·         Talend Studio..etc 

2. What is Informatica?
Informatica is an ETL tool provided by 'Informatica Corporation', it is used to Extraction, Transformation and Load process all types databases. Now a day’s Informatica is also being used as an Integration tool and it is Service Oriented Architecture.

3. What are the Informatica PowerCenter Tools.?
Informatica PowerCenter had 2 types of Tools:
Server Tools: Administrator Console and Integration Service.
Client Tools: Repository Manager, Mapping Designer, Workflow Manager and Workflow Monitor.

4. What are the PowerCenter Client Side Tools and their uses?
Informatica PowerCenter Client Side Tools are Repository Manager, Mapping Designer, Workflow Manager, Workflow Monitor, Data Transformation and Developer Client.

Repository Manager: It is use to organize and secure metadata by creating folders.
Mapping Designer: It is use to create and store mapping metadata in the repository. The Designer helps you create source definitions, target definitions and transformations to build the mappings. The Designer has tools to help to build mappings and mapplets so you can specify how to move and transform data between sources and targets.
Workflow Manager: It is use to store workflow metadata and connection object information in the repository. A workflow contains a session and any other task you may want to perform when you run a session. Tasks can include a session, email notification, or scheduling information. You connect each task with links in the workflow. In the Workflow Manager, you define a set of instructions called a workflow to execute mappings you build in the Designer.
Workflow Monitory: It is use to retrieve workflow run status information and session logs written by the Integration Service. A workflow is a set of instructions that tells an Integration Service how to run tasks. Integration Services run on nodes or grids. The nodes, grids, and services are all part of a domain. You can monitor workflows and tasks in the Workflow Monitor.

5. What are the PowerCenter Server Side Tools and their uses?
Informatica PowerCenter Server Side Tools are Administrator Console and Integration Service.

Administrator Console: Informatica Administrator is the administration tool that you use to administer the Informatica domain and Informatica security. It is used to Domain administrative tasks and Domain administrative tasks
Integration Service: The PowerCenter Integration Service is an application service that runs sessions and workflows. Use the Administrator tool to manage the PowerCenter Integration Service. It is used to create a PowerCenter Integration Service, Enable or disable the PowerCenter Integration Service, Configure normal or safe mode, Configure the PowerCenter Integration Service properties, Configure the associated repository, Configure the PowerCenter Integration Service processes, Configure permissions on the PowerCenter Integration Service and Remove a PowerCenter Integration Service.

6. What happen if the no of source records are more than the sequence generator no?
Error Message: Overflow error. The Sequence Generator transformation has reached the end of the configured end value.

7. What is Source Qualifier and tasks it performs?
The Source Qualifier transformation represents the rows that the Integration Service reads when it runs a session. The Source Qualifier Transformation convert relational or flat file datatypes into Informatica datatypes.

Tasks it performs:
·         Join data originating from the same source database
·         Filter rows when the Integration Service reads source data
·         Specify an outer join rather than the default inner join
·         Specify sorted ports
·         Select only distinct values from the source
·         Create a custom query to issue a special SELECT statement for the Integration Service to read source data

8. What is Filter, Router and what is difference between them?
Filter: The Filter transformation to filter out rows in a mapping which are passing through it. The Filter transformation allows rows that meet the specified filter condition to pass through. It drops rows that do not meet the condition. You can filter data based on one or more conditions.
Router: A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.

9. What is Joiner and why we use sorted input option?
The Joiner transformation is used join source data from two related heterogeneous sources residing in different locations or file systems. We can also join data from the same source. The use Joiner transformation sources should have at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources.

User of Sorted Input: To improve session performance by configuring the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. If the sorted input option is select then we are telling Informatica integration service that we have passing sorted data to the joiner transformation.

10. What is Union Transformation?
The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources similar to the 'UNION ALL' SQL statement to combine the results from two or more SQL statements. Similar to the UNION ALL statement but the Union transformation does not remove duplicate rows.

11. What is Update Strategy and Forward rejected rows?
Update Strategy is used to update/delete the data from target table other than insert the records. In General the data passes through source to target is only for insert.

Forward Rejected Rows:
The Forward Rejected Rows option is used to either pass rejected rows to next transformation or drop them. By default the Integration Service forward rejected rows to next transformation. The Integration Service flags the rows for reject and writes them to the session log file and passes to reject file. If you do not select Forward Rejected Rows option the Integration Service drops rejected rows and writes them to the session log file.

12. What is transaction control transformation?
The Transaction Control transformation is used to control the data flow through it by commit and roll back transactions. A transaction is the set of rows bound by commit or roll back rows. We can define a transaction based on a varying number of input rows. It can be controlled two level within a mapping or within a session.

13. What is Lookup and How many types of Lookup?
Lookup transformation is used to look up data in a flat file, relational table, view, or synonym based on lookup conditions before loading data into target.

Types of Lookup:

Connected:
·         Connected Lookup transformation to receive input directly from the mapping pipeline.
·         It can be a dynamic or static cache.
·         Cache includes the lookup source columns in the lookup condition and the lookup source columns that are output ports.
·         Can return multiple columns from the same row or insert into the dynamic lookup cache.
·         If there is no match for the lookup condition, the Integration Service returns the default value for all output ports. If it configure dynamic caching, the Integration Service inserts rows into the cache or leaves it unchanged.
·         If there is a match for the lookup condition, the Integration Service returns the result of the lookup condition for all lookup/output ports. If it configure dynamic caching, the Integration Service either updates the row the in the cache or leaves the row unchanged.
·        Pass multiple output values to another transformation. Link lookup/output ports to another transformation.
·         Supports user-defined default values.

Un-Connected:
·         Unconnected Lookup transformation to receive input from the result of a :LKP expression in another transformation.
·         Use a static cache.
·        Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
·         Designate one return port (R). Returns one column from each row.
·         If there is no match for the lookup condition, the Integration Service returns NULL.
·         If there is a match for the lookup condition, the Integration Service returns the result of the lookup condition into the return port.
·         Pass one output value to another transformation. The lookup/output/return port passes the value to the transformation calling :LKP expression.
·         Does not support user-defined default values.

14. What is cache and How many types of cache?
The cache is memory allocation for that instance in the server. The Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties.

Types of cache:

Persistent cache: Once save the lookup cache files and reuse them the next time the Integration Service processes a Lookup transformation configured to use the cache.
Re-cache from lookup source: If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Static cache: The cache that cannot be changed through the session. It caches the lookup file or table and looks up values in the cache for each row that comes into the transformation. When the lookup condition is true, the Integration Service returns a value from the lookup cache. The Integration Service does not update the cache while it processes the Lookup transformation. By default the Integration Service creates a static cache and it read only.
Dynamic cache: The cache that can be changed through the session. The Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target. To cache a table, flat file, or source definition and update the cache, configure a Lookup transformation with dynamic cache.
Shared cache: The cache which can be used between multiple transformations. A named cache can be used more than one place within the same mapping. A unnamed cache can be used more than one or more places within mapping or different mappings.
Pre-build lookup cache: When you configured for sequential caches, the Integration Service creates caches as the source rows enter the Lookup transformation. When you configure the session to build concurrent caches, the Integration Service does not wait for the first row to enter the Lookup transformation before it creates caches. Instead, it builds multiple caches concurrently.

15. What is difference between static and dynamic cache?
Static cache:
·         We cannot insert or update the cache
·         We can use relational, flat-file or pipeline lookup.
·         When the condition is true the Integration Service returns a value from the lookup table or cache.
·        When the condition is false the Integration Service returns the default value for connected lookup and NULL for un-connected lookup.

Dynamic cache:
·         We can insert or update rows in the cache as the rows pass to the target.
·         We can use relational, flat-file or pipeline lookup.
·         When the condition is true the Integration Service either updates rows in the cache or leaves the cache unchanged, depending on the row type. This indicates that the row is in the cache and target table. It can pass updated rows to a target.
·        When the condition is not true, the Integration Service either inserts rows into the cache or leaves the cache unchanged, depending on the row type. This indicates that the row is not in the cache or target. It can pass inserted rows to a target table.

16. What is named cache?
When you want to share a persistent cache file across mappings then they it will be named and that can be used. The caching structures must match or be compatible with a named cache. It can share static and dynamic named caches.

17. What is difference between re-usable transformation and mapplet?
A transformation which is created in Transformation developer is called re-usable transformation and which is used more than one mapping.

A mapplet is a set of transformations with a business logic which can be used more than one mapping.

18. What is Normalizer and why we use it?
The Normalizer transformation is which convert a each single row into set of multiple rows.

Uses of it:
The Normalizer transformation is used to convert a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data.

The Normalizer transformation is used as Source Qualifier for Cobol/VSAM files.

19. What happen if the cache overflows?
If the cache data overflow then the Integration Service stores the overflow values in the cache files. Once the session completes, the Integration Service delete cache files unless you configure the lookup transformation to use a persistent cache.

20. What is data driven?
Integration Service follows instructions coded into Update Strategy and Custom transformations within the session mapping to determine how to flag rows for insert, delete, update, or reject.

21. What is target load plan?
Target Load Plan is the order in which the Integration Service sends rows to targets in different target load order groups in a mapping.

A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. We can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the primary key and foreign key constraints.

The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially.
 
22. What happen if the Source Qualifier one of port not connected to source definition?
The Integration Service through an internal error. The Source Qualifier contains an unbounded field [LOC].

23. What is constraint based load, can we update in constraint based load?
The Constraint based load order is process to load the data first in Parent table then Child Table. When you specify constraint-based loading for a session then the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. The Constraint based load ordering attribute applies only to insert operations.

When you enable complete constraint-based loading, change data is loaded to targets in the same Transaction Control Unit (TCU) by using the row ID assigned to the data by the CDC Reader. As a result, data is applied to the targets in the same order in which it was applied to the sources. The following message will be issued in the session log to indicate that this support is enabled:

WRT_8417 Complete Constraint-Based Load Ordering is enabled.

To enable complete constraint-based loading, specify FullCBLOSupport=Yes in the Custom Properties attribute on the Config Object tab. This property can also be set in the PowerCenter Integration Service, which makes it applicable to all workflows and sessions that use that the PowerCenter Integration Service.

If you use complete constraint-based loading, your mapping must not contain active transformations which change the row ID generated by the CDC Reader. The following transformations change the row Id value: Aggregator, Custom, configured as an active transformation, Joiner, Normalizer, Rank and  Sorter.

24. How to update target without using update strategy?
By selecting Treat source rows as “update” in session properties tab.

25. How many ways we can update target table?
We can update target table in 3 ways.
1.    By selecting Treat source rows as “update” in session properties tab.
2.    By using Update Strategy in mapping with DD_UPDATE function in it and by selecting Treat source rows as “Data driven” in session properties tab.
3.    By using Update Override in Target instance in mapping.

No comments:

Post a Comment