
Data Warehouse Interview Questions and Answers


Question - 61 : - What is the difference between agglomerative and divisive hierarchical clustering?

Answer - 61 : -

Agglomerative hierarchical clustering works bottom-up: each object starts in its own cluster, and the algorithm reads from the sub-components first, repeatedly merging the closest clusters into larger ones as it moves upward. This merging continues until all the single clusters have been combined into one big cluster containing every object. Divisive hierarchical clustering takes the opposite, top-down approach: the parent cluster containing all objects is visited first and divided into smaller clusters, and the splitting continues until each cluster holds a single object.
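The contrast between the two directions can be sketched in a few lines of Python. This is an illustrative toy on one-dimensional points, not a production implementation (real libraries use linkage criteria such as single, complete, or average linkage):

```python
# Toy contrast of the two directions of hierarchical clustering
# on 1-D points (illustrative only).

def agglomerative(points):
    """Bottom-up: start with singleton clusters, repeatedly merge the
    closest pair until one cluster remains. Returns the merge history."""
    clusters = [[p] for p in sorted(points)]
    history = []
    while len(clusters) > 1:
        # find the pair of clusters whose closest members are nearest
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append(merged)
    return history

def divisive(points):
    """Top-down: start with one cluster holding everything, repeatedly
    split at the widest gap until every cluster is a single object."""
    result, queue = [], [sorted(points)]
    while queue:
        c = queue.pop()
        if len(c) == 1:
            result.append(c)
            continue
        gaps = [c[k + 1] - c[k] for k in range(len(c) - 1)]
        cut = gaps.index(max(gaps)) + 1   # split at the largest gap
        queue.extend([c[:cut], c[cut:]])
    return sorted(result)

data = [1, 2, 9, 10, 25]
print(agglomerative(data)[-1])  # [1, 2, 9, 10, 25] -- everything merged
print(divisive(data))           # [[1], [2], [9], [10], [25]] -- singletons
```

Note that both produce the same hierarchy here; only the direction of construction differs.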

Question - 62 : - What is the level of granularity of a fact table?

Answer - 62 : -

A fact table is usually designed at a low level of granularity, which means we need to identify the lowest level of detail that can be stored in it. For example, "employee performance" is a very high level of granularity, while "employee performance per day" or "employee performance per week" are low levels of granularity, because the data is recorded much more frequently. Granularity is the lowest level of information stored in the fact table; in the date dimension, the depth of the data is the granularity, and the levels could be year, quarter, month, week, or day, with day being the lowest level and year the highest. Determining the grain consists of two steps: deciding which dimensions are to be included, and locating where in each dimension's hierarchy the information sits. Both determinations are made according to the requirements.
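The key property of a low grain is that it can always be rolled up to a coarser one, but never the reverse. A hypothetical sketch, with employee performance facts stored at the daily grain and rolled up to weekly:

```python
# Hypothetical fact rows at daily grain: facts stored at the lowest
# (daily) level can be aggregated upward to weekly or monthly grains,
# but weekly facts could never be split back into days.
from collections import defaultdict
from datetime import date

# (employee, day, tasks_completed) -- daily grain
daily_facts = [
    ("alice", date(2024, 1, 1), 5),
    ("alice", date(2024, 1, 2), 3),
    ("alice", date(2024, 1, 8), 4),  # falls in the next ISO week
]

def roll_up_weekly(facts):
    """Aggregate daily facts to the weekly grain via the ISO week number."""
    weekly = defaultdict(int)
    for emp, day, n in facts:
        weekly[(emp, day.isocalendar()[1])] += n
    return dict(weekly)

print(roll_up_weekly(daily_facts))
# {('alice', 1): 8, ('alice', 2): 4}
```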

Question - 63 : - What’s the biggest difference between the Inmon and Kimball philosophies of data warehousing?

Answer - 63 : -

These are the two main philosophies in data warehousing. In the Kimball philosophy, the data warehouse is viewed as a constituency of data marts. Data marts are focused on delivering business objectives for individual departments in an organization, and the data warehouse is built from the conformed dimensions of those data marts, so a unified view of the enterprise can be obtained from dimensional modeling at the departmental level. In the Inmon philosophy, the data warehouse is created on a subject-by-subject basis: development can start with, say, data from the online store, and other subject areas are added to the warehouse as the need arises; point-of-sale (POS) data can be added later if management decides it is required. Put algorithmically, in the Kimball philosophy we first build the data marts and then combine them to get our data warehouse, while in the Inmon philosophy we first create the data warehouse and then derive the data marts from it.

In short, the two differ in how the warehouse is built: Kimball is bottom-up (data marts first, unified through conformed dimensions into the warehouse), while Inmon is top-down (an enterprise warehouse first, grown subject area by subject area, with data marts derived from it).

Question - 64 : - Explain the ETL cycles three-layer architecture.

Answer - 64 : -

ETL stands for extraction, transformation, and loading, and the cycle involves three layers, one for each phase: the staging layer, the data integration layer, and the access layer. The staging layer is used to extract data from the various data structures of the sources. In the data integration layer, data from the staging layer is transformed and transferred to the database; this layer arranges the data into hierarchical groups, often referred to as dimensions, facts, or aggregates (in a data warehousing system, a combination of fact and dimension tables is called a schema). Finally, once the data has been extracted and transformed in the staging layer and loaded in the integration layer, the access layer is where the data is accessed and can be loaded for further analytics.
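The three layers above can be sketched as three small functions. All names and sources here are hypothetical; it is a minimal illustration of the flow, not a real ETL tool:

```python
# Illustrative sketch of the three ETL layers; sources are hypothetical.

# Staging layer: extract raw records from heterogeneous sources as-is.
def stage(sources):
    return [row for source in sources for row in source]

# Integration layer: transform staged rows and arrange them into
# dimension and fact structures (a simple star-style grouping here).
def integrate(staged):
    facts, dims = [], {}
    for row in staged:
        product = row["product"].strip().lower()   # cleanse/conform
        dims.setdefault(product, {"product": product})
        facts.append({"product": product, "amount": row["amount"]})
    return {"facts": facts, "dimensions": dims}

# Access layer: expose the integrated data for analytics.
def total_by_product(warehouse):
    totals = {}
    for f in warehouse["facts"]:
        totals[f["product"]] = totals.get(f["product"], 0) + f["amount"]
    return totals

crm = [{"product": " Widget", "amount": 10}]
erp = [{"product": "widget ", "amount": 5}, {"product": "Gadget", "amount": 2}]
warehouse = integrate(stage([crm, erp]))
print(total_by_product(warehouse))  # {'widget': 15, 'gadget': 2}
```

Note how the dirty product names from the two sources are conformed in the integration layer before the access layer ever sees them.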

Question - 65 : - What’s an OLAP Cube?

Answer - 65 : -

The idea behind OLAP is to pre-compute all the calculations needed for reporting. Generally, the calculations are done by a scheduled batch job during non-business hours, when the database server is normally idle. The calculated fields are stored in a special database called an OLAP Cube.

An OLAP Cube doesn’t need to loop through any transactions, because all the calculations are pre-computed, providing instant access.

An OLAP Cube is a snapshot of data at a specific point in time, perhaps at the end of a particular day, week, month, or year.

At any time, you can refresh the Cube using the current values in the source tables.

With very large data sets, it could take an appreciable amount of time for Excel to reconstruct the Cube.

But with the data sets we have been using (just a few thousand rows), the process appears to be instantaneous.
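The pre-computation idea can be shown in miniature: aggregate a measure ahead of time for every combination of dimension values (including an "all" wildcard), so a report is a dictionary lookup rather than a scan of the transactions. A minimal sketch, with hypothetical region/year data:

```python
# Minimal sketch of OLAP-cube pre-computation: every (value-or-ALL)
# combination of the dimensions gets its aggregate computed up front.
from itertools import product

ALL = "*"  # wildcard meaning "all members of this dimension"

transactions = [
    {"region": "east", "year": 2023, "sales": 100},
    {"region": "east", "year": 2024, "sales": 150},
    {"region": "west", "year": 2024, "sales": 200},
]

def build_cube(rows, dims, measure):
    cube = {}
    for row in rows:
        # each row contributes to every combination of (value, ALL)
        choices = [(row[d], ALL) for d in dims]
        for key in product(*choices):
            cube[key] = cube.get(key, 0) + row[measure]
    return cube

cube = build_cube(transactions, ["region", "year"], "sales")
print(cube[("east", ALL)])   # 250: all years for the east region
print(cube[(ALL, 2024)])     # 350: all regions in 2024
print(cube[(ALL, ALL)])      # 450: grand total
```

After the batch build, every report query is an O(1) lookup, which is exactly why cube access feels instant.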

Question - 66 : - Explain the Chameleon method used in data warehousing.

Answer - 66 : -

Chameleon is a hierarchical clustering algorithm that overcomes the limitations of the existing models and methods in data warehousing. It operates on a sparse graph whose nodes represent data items and whose edges represent the weights (similarities) between those items; this representation allows large data sets to be created and operated on successfully. The method finds the clusters in the data set using a two-phase algorithm. The first phase partitions the graph, clustering the data items into a large number of small sub-clusters; the second phase uses an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining the sub-clusters produced in the first phase.
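A heavily simplified toy of the two-phase idea (not the real Chameleon algorithm, which uses k-nearest-neighbor graphs, a min-cut partitioner, and relative interconnectivity/closeness measures): phase 1 over-partitions a sparse weighted graph by cutting its weakest edges, and phase 2 agglomeratively re-merges sub-clusters whose total cross-edge weight is strong. The thresholds here are arbitrary illustration values:

```python
# Toy two-phase clustering in the spirit of Chameleon (simplified).
# Edges are (node_a, node_b, weight) triples on a sparse graph.

def connected_components(nodes, edges):
    """Union-find over the kept edges -> list of node sets."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b, _ in edges:
        parent[find(a)] = find(b)
    comps = {}
    for n in nodes:
        comps.setdefault(find(n), set()).add(n)
    return list(comps.values())

def chameleon_toy(nodes, edges, weak=0.5, strong=1.5):
    # Phase 1: drop edges weaker than `weak` to get many sub-clusters.
    kept = [e for e in edges if e[2] >= weak]
    subs = connected_components(nodes, kept)
    # Phase 2: agglomeratively merge sub-cluster pairs whose total
    # cross-edge weight (interconnectivity) reaches `strong`.
    merged = True
    while merged:
        merged = False
        for i in range(len(subs)):
            for j in range(i + 1, len(subs)):
                cross = sum(w for a, b, w in edges
                            if (a in subs[i] and b in subs[j])
                            or (a in subs[j] and b in subs[i]))
                if cross >= strong:
                    subs[i] |= subs.pop(j)
                    merged = True
                    break
            if merged:
                break
    return subs

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 3.0), ("b", "c", 2.5), ("c", "d", 0.2),
         ("d", "e", 3.0), ("a", "d", 0.4)]
clusters = chameleon_toy(nodes, edges)
print(sorted(sorted(c) for c in clusters))  # [['a', 'b', 'c'], ['d', 'e']]
```

The weak c-d and a-d links are cut in phase 1, and their combined weight (0.6) is too low to trigger a merge in phase 2, so two genuine clusters remain.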

Question - 67 : - What’s virtual data warehousing?

Answer - 67 : -

A virtual data warehouse provides a collective view of the completed data without holding any historical data of its own; it can be considered a logical data model of the given metadata. Virtual data warehousing is a de facto information-system strategy for supporting analytical decision making. It is one of the best ways of translating raw data and presenting it in a form that decision-makers can use, providing a semantic map that lets the end user view the data as it is virtualized.
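The "no data of its own, only a semantic map" idea can be sketched as a registry of logical names mapped to fetch functions that hit the live sources on demand. All source names here are hypothetical:

```python
# Toy sketch of data virtualization: the "warehouse" stores nothing,
# only a semantic map from logical names to source-fetching callables.

def crm_customers():          # hypothetical live CRM source
    return [{"id": 1, "name": "Acme"}]

def erp_orders():             # hypothetical live ERP source
    return [{"customer_id": 1, "total": 99.0}]

class VirtualWarehouse:
    def __init__(self):
        self.semantic_map = {}    # logical name -> fetch function

    def register(self, name, fetch):
        self.semantic_map[name] = fetch

    def query(self, name):
        # nothing is copied or persisted; every query hits the source,
        # so there is no historical data retained in the warehouse
        return self.semantic_map[name]()

vdw = VirtualWarehouse()
vdw.register("customers", crm_customers)
vdw.register("orders", erp_orders)
print(vdw.query("customers"))  # [{'id': 1, 'name': 'Acme'}]
```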

Question - 68 : - What is active data warehousing?

Answer - 68 : -

An active data warehouse represents the single, current state of a business. Active data warehousing considers the analytical perspectives of customers and suppliers, and it helps in presenting up-to-date data through reports. It is now the most common form of data warehousing for large businesses, especially those in e-commerce and commerce. An active data warehouse is a repository of captured transactional data; using this concept, trends and patterns are found and used for future decision making, so further business decisions can be made based on the analytical results from the warehouse. A key feature is that it can integrate changes to the data while scheduled refresh cycles run. Enterprises use an active data warehouse to draw the company's picture in a statistical manner: essentially, the data present in the various data sources is combined, and analytics are performed on it to obtain insights for further business decisions.

Question - 69 : - What is a snapshot with reference to a data warehouse?

Answer - 69 : -

Snapshots are pretty common in software, especially in databases. As the name suggests, a snapshot is a complete view of the data at the time of extraction. It occupies less space and can be used to back up and restore data quickly, so essentially you snapshot a data warehouse whenever you want to create a backup of it. Using the data warehouse catalog, a report is created, and the report is generated as soon as the session is disconnected from the data warehouse.
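The essential behavior of a snapshot, that it is frozen at extraction time and unaffected by later changes to the live data, can be illustrated with a deep copy. A toy sketch, not how a real database implements snapshots (those typically use copy-on-write):

```python
# Toy illustration of a snapshot: a point-in-time copy of warehouse
# state, independent of subsequent changes to the live tables.
import copy

warehouse = {"sales": [{"day": "2024-01-01", "amount": 100}]}

snapshot = copy.deepcopy(warehouse)   # freeze state at extraction time

# the live warehouse keeps changing...
warehouse["sales"].append({"day": "2024-01-02", "amount": 50})

print(len(warehouse["sales"]))  # 2 rows in the live data
print(len(snapshot["sales"]))   # still 1 row in the snapshot

warehouse = copy.deepcopy(snapshot)   # restore from the backup
print(len(warehouse["sales"]))  # 1 -- back to the snapshotted state
```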

Question - 70 : - What is XMLA?

Answer - 70 : -

XMLA stands for XML for Analysis. It is a SOAP-based XML protocol that is considered the standard for accessing data in analytical systems, such as OLAP and data-mining data sources, over the internet. XMLA uses two methods: Discover, which fetches information (metadata) from the data source, and Execute, which allows an application to run commands against the XMLA data source. XMLA is based on XML, SOAP, and HTTP, and the XMLA 1.1 specification names MDXML as its query language; its only construct is an MDX statement enclosed in the Statement tag.
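A sketch of what an Execute request looks like on the wire, built with Python's standard library: a SOAP envelope carrying an MDX statement in the Statement tag. The namespaces follow the XMLA 1.1 specification; the MDX query and cube name are hypothetical:

```python
# Build an XMLA Execute request: a SOAP envelope whose body carries
# an Execute/Command/Statement element containing an MDX query.
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"      # SOAP 1.1 envelope
XMLA = "urn:schemas-microsoft-com:xml-analysis"         # XMLA namespace

def execute_request(mdx):
    envelope = ET.Element(f"{{{SOAP}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA}}}Execute")
    command = ET.SubElement(execute, f"{{{XMLA}}}Command")
    # the MDX statement enclosed in the <Statement> tag
    ET.SubElement(command, f"{{{XMLA}}}Statement").text = mdx
    return ET.tostring(envelope, encoding="unicode")

req = execute_request("SELECT [Measures].Members ON COLUMNS FROM [Sales]")
print(req)
```

In a real deployment this XML would be POSTed over HTTP to the analytical server's XMLA endpoint, and the response would come back as a SOAP envelope as well.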

