Questions about the book contents
This page offers questions for each section of the book that the reader should be able to answer after working through that section. You can use these questions to check that you have not missed an important point conveyed in a section. If you do not yet own a copy of the book, the questions can help you develop a feeling for its contents and technical depth.
Desirability, Feasibility, Viability – The Impact of In-Memory
1.1 Information in Real Time – Anything, Anytime, Anywhere
- What makes enterprise reporting different from web search, and what are the similarities?
- Explain the notion of “speed of thought”.
- Name reasons why computer systems should strive for “zero” response time.
- What restrictions are placed on reporting due to materialization of data?
- What potential lies in on-the-fly computation?
1.2 The Impact of Recent Hardware Trends
- What are the ACID properties and why are they important for enterprise data management?
- Why has analytical processing been separated from transactional processing in the past?
- Discuss the application of Moore’s law to multi-core technology.
- Name and explain two different main memory access architectures.
- Software has been feasting on advances in processor clock speed. Discuss whether this also holds for multi-core technology.
1.3 Reducing Cost through In-Memory Data Management
- What is the total cost of ownership?
- Name and explain different cost types that are accumulated in total cost of ownership.
- What are cost factors driving total cost of ownership in enterprise systems?
- Discuss why in-memory data management can help to reduce total cost of ownership.
Why Are Enterprise Applications So Diverse?
- What does it mean that enterprise applications are 'integrated' and what are the advantages of an 'integrated' enterprise application?
- What are functional areas in companies?
- List the challenges that an enterprise application vendor faces, taking into consideration that the most successful vendors develop standard software.
- Describe the process of demand planning.
- What is an "Available-to-Promise" check?
- What different kinds of enterprise application architectures exist?
- On which layers can business logic be implemented? Which vendors favor which concept?
- What are access patterns in enterprise applications? Are they more single-instance oriented, or do they rather perform set processing?
SanssouciDB – Blueprint for an In-Memory Enterprise Database System
- What is the “One Size Fits All” approach in the context of database systems? Why is it no longer applicable?
- Why is SanssouciDB deployed on a small number of powerful high-end blades?
- What are the advantages of a column-oriented data layout?
- Explain the term “selectivity” in the context of database queries.
- How does SanssouciDB handle updates?
The Technical Foundations of SanssouciDB
4.1 Understanding Memory Hierarchies
- What is the memory hierarchy?
- Explain the different members of the main memory cache hierarchy.
- Describe an experiment that makes it possible to detect the different members of the memory hierarchy.
- Why is it important to optimize the data access along the memory hierarchy?
- Explain the correlation between price and performance in the memory hierarchy.
- Why can't SRAM caches be indefinitely large?
- What are future trends in the memory hierarchy?
- Why is alignment important for main memory access?
- Why is prefetching important?
- Describe the impact of virtualization on main memory access.
4.2 Parallel Data Processing Using Multi-Core and Across Servers
- Explain the difference between Moore’s law and Amdahl’s law and relate them to multi-core CPUs.
- What is the difference between Shared Memory, Shared Disk and Shared Nothing?
- What is the difference between Intra- and Inter-Operator Parallelism?
- How can you parallelize an aggregation or a join operation?
4.3 Compression for Speed and Memory Consumption
- Which factors influence the data compression rate and how can a system decide which compression technique fits best?
- What is the preferred compression technique for status variables? Explain the advantages and disadvantages compared to other techniques.
- Discuss the differences between light-weight and heavy-weight compression techniques and their impact on memory bandwidth and CPU performance.
- Which compression techniques are best suited for columnar in-memory databases?
- Explain which additional resources are needed when using dictionary encoding.
4.4 Column, Row, Hybrid – Optimizing the Data Layout
- What is vertical partitioning?
- Explain how different access patterns of enterprise applications can be mapped to different physical layouts.
- Can a hybrid, vertically partitioned layout achieve better results compared to traditional layouts?
- What measures can be used to compare different layouts?
- Why is finding the optimal layout so complex?
- Given a workload, how are the queries analyzed to determine the optimal layout?
4.5 The Impact of Virtualization
- What incentives do data center operators have to employ virtualization?
- Do scan-intensive processes incur a large overhead in a virtual machine?
- In what way does the CPU become a bottleneck when running multiple virtual machines?
- In what way do main memory accesses become a bottleneck when running multiple virtual machines?
- What impacts does the underlying hardware architecture have on sharing CPU and DRAM among multiple virtual machines?
Organizing and Accessing Data in SanssouciDB
5.1 SQL for Accessing In-Memory Data
- What is SQL and what role does it play for a DBMS?
- Is SQL a procedural, functional or declarative language? Explain your answer.
- Name and briefly describe the stages of query processing.
- What are stored procedures? How do they differ from SQL queries?
- What are the benefits of stored procedures?
- What is the main reason for using index structures?
- Name and briefly describe the major data structures used for organizing collections in databases.
- Hash-organized collections provide constant search time no matter what the size of the collection is. If these collections are so good, why not use them exclusively?
- What are B-Trees?
- How do they organize data?
- When is a B-Tree useful?
- Define a heap-organized collection. What are the benefits of a heap-organized collection?
5.2 Increasing Performance with Data Aging
- Why is about 20% of the data in enterprise systems 'passive'?
- What is the difference between 'active' and 'passive' data?
- What is the difference between partitioning in initial database design and dynamic partitioning?
- Who defines which data can be aged?
- Many applications access the same data. What happens if the aging criteria of these applications are not homogeneous?
- Describe the lifecycle of a lead.
5.3 Efficient Retrieval of Business Objects
- What is a business object?
- Why may accessing a single instance of a business object result in performance issues?
- How does the object data guide resolve the performance issues of accessing a complete instance of a business object?
5.5 Append, Never Delete, to Keep the History Complete
- Explain the differences between interval and point representation.
- What is the impact of insert-only on concurrency control?
- Explain the concept of time travel queries.
- What are the advantages and disadvantages of using insert-only in a columnar in-memory database?
- Name and explain the different types of updates typically found in ERP systems.
- Discuss advantages of the insert-only approach from a business perspective. Which additional questions can be answered?
- Discuss the feasibility of the insert-only approach with regard to memory consumption.
5.6 Enabling Analytics on Transactional Data
- Why is the join operation particularly important for analytics on transactional data?
- What are basic join algorithms?
- Regarding join and aggregation, what are the advantages (disadvantages) of the column (row) layout with respect to data access and performance?
- What are the basic join and aggregation algorithms?
- How are they related and why?
- Why should a database system implement more than one of these alternatives?
- What are the alternatives for distributed aggregation computation over multiple blades?
- Which implementation should be preferred?
- What would a distributed aggregation algorithm for holistic aggregation functions look like?
- How can the introduced shared-memory aggregation algorithms be extended to support holistic aggregation functions?
- Can they be parallelized as well and, if so, how?
- Can the redistribution strategy (for distributed aggregation) be applied for distributed join processing?
- What would such an algorithm look like?
- In which cases would this algorithm be better than the shown distributed join algorithm?
- How can the basic join algorithms be improved for better utilization of distributed processors?
- How do column-oriented table layout and data compression influence the performance of join algorithms and why?
5.7 Extending Data Layout without Downtime
- Why are modifications to database tables necessary for enterprise applications?
- What kinds of enterprise systems require those changes to be made on-line (i.e. without downtime)?
- What steps are necessary to append a column to a table in a row-oriented database?
- What application-level techniques exist to make this process easier?
- What steps are necessary to append a column to a table in a column-oriented database?
5.8 Business Resilience through Advanced Logging Techniques
- How do in-memory databases react to power outages?
- What is the difference between a logical and a physical logging scheme?
- What is the difference between logging and snapshotting?
- Discuss the necessity of redo and undo logs for the case of an in-memory database.
- What is the key advantage of differential logging and how does it work?
5.9 The Importance of Optimal Scheduling for Mixed Workloads
- What is scheduling?
- Why is scheduling important for a system that can operate a mixed workload?
- What are the advantages (disadvantages) of DBMS-level scheduling vs. OS-level scheduling?
- Is fairness an important goal for a DBMS-level scheduler?
- What makes finding a good schedule a hard problem?
- Discuss the different possibilities to define the machine environment, task granularity, and the task characteristics for a mixed workload scheduler.
- How can optimal scheduling be achieved?
- What are the major properties / objects that need to be observed to perform scheduling?
- What differentiates scheduling for mixed workloads from traditional approaches?
- Which system properties are important to model the scheduling of tasks?
- What kind of application properties might influence the scheduling decision?
Application Development
- What are the main layers that comprise modern enterprise applications?
- What factors influence the layer in which application logic should be encoded?
- What are the benefits of using stored procedures for encoding business logic?
- Why are diverse teams a necessity for the creation of innovative new enterprise applications?
- Explain two core principles that should be adhered to when creating enterprise applications.
Finally, a Real Business Intelligence System Is at Hand
7.1 Analytics on Operational Data
- What are the goals of analytical processing?
- Explain the differences between operational and analytical processing as seen in the past.
- What are the components that can facilitate business intelligence and what are their fields of application?
- Name and explain three drawbacks of separating analytics from daily operations.
- What is ETL and why can it become a complex process?
- Characterize the dedicated database designs used in analytical systems.
- Explain the view layer concept.
7.2 How to Evaluate Databases after the Game Has Changed
- Explain the goal of benchmarking.
- Which categories of benchmarking do you know? Name example benchmarks for each category.
- What are the typical measuring units to compare database products?
- Why can two existing benchmarks for operational and analytical systems not simply be run concurrently to simulate a mixed workload?
Scaling SanssouciDB in the Cloud
- What is cloud computing?
- What types of cloud applications exist?
- What is multi-tenancy and why is it important for vendors of enterprise software?
- Discuss the pros and cons of building a cloud from low-end hardware vs. using recent server technologies.
- What implications does in-memory computing have on the energy-efficiency of a data center?
The In-Memory Revolution Has Begun
- What are the motivating factors behind a non-disruptive transition from traditional to in-memory data management?
- Describe how companies can transform their separated operational and analytical enterprise systems, step by step, into a combined in-memory OLAP/OLTP system.
- How does in-memory data management help to increase the feasibility, desirability, and viability of future enterprise applications?
- Give examples of how in-memory technology has already started to improve the IT processes of real companies. Discuss the potential business impact.
- Sketch out further industry scenarios in which in-memory technology will create new business opportunities, improve end-user experiences, or change the way businesses are operated and decisions are made.