Labels

Monday, August 13, 2012

HANA Features

Features of HANA (Hybrid In-Memory Database):


No Disk

Modern servers provide up to multiple Terabytes of main memory and allow keeping the complete transactional data in memory. This eliminates multiple I/O layers and simplifies database design, allowing for high throughput of transactional and analytical queries.



In-Memory Computing


High performance in-memory computing will change how enterprises work. Currently, enterprise data is split into two databases for performance reasons. Disk-based row-oriented database systems are used for operational data (e.g. "insert or update a specific sales order") and column-oriented databases are used for analytics (e.g. “sum of all sales in China grouped by product”). While analytical databases are often kept in-memory, they can also be mixed with disk-based storage media.
It is possible to store data sets of entire Fortune 500 companies in main memory. At the same time, orders of magnitude faster performance than with disk-based systems can be achieved.


Active and Passive Data Store


By default, all data is stored in-memory to achieve high-speed data access. However, not all data is accessed or updated frequently and needs to reside in-memory, as this increases the required amount of main memory unnecessarily. This so-called historic or passive data can be stored in a specific passive data storage based on less expensive storage media, such as SSDs or hard disks, still providing sufficient performance for possible accesses at lower cost. The dynamic transition from active to passive data is supported by the database, based on custom rules defined as per customer needs.

We define two categories of data stores: active and passive: We refer to active data when it is accessed frequently and updates are expected (e.g., access rules). In contrast, we refer to passive data when this data either is not used frequently and neither updated nor read. Passive data is purely used for analytical and statistical purposes or in exceptional situations where specific investigations require this data. A possible storage hierarchy is given by memory registers, cache memory, main memory, flash storages, solid state disks, SAS hard disk drives, SATA hard disk drives, tapes, etc. As a result, rules for migrating data from one store to another need to be defined, we refer to it as aging strategy or aging rules. The process of aging data, i.e. migrating it from a faster store to a slower one, is considered as background tasks, which occurs on regularly basis, e.g. weekly or daily. Since this process involves reorganization of the entire data set, it should be processed during times with low data access, e.g. during nights or weekends.
Combined Row and Column Store

To support analytical and transactional workloads, two different types of database systems evolved. On the one hand, database systems for transactional workloads store and process every day’s business data in rows, i.e. attributes are stored side-by-side. On the other hand, analytical database systems aim to analyze selected attributes of huge data sets in a very short time.
  SQL Interface on Columns and Rows

Column and row-oriented storage in HANA provides the foundation to store data according to its frequent usage patterns in column or in row-oriented manner to achieve optimal performance. Through the usage of SQL, that supports column as well as row-oriented storage, the applications on top stay oblivious to the choice of storage layout.




Minimal Projections
To increase the overall performance, it is required to select only the minimal set of attributes that should be projected for each query. This has two important advantages: First, it dramatically reduces the amount of accessed data that is transferred between client and server. Second, it reduces the number of accesses to random memory locations and thus increases the overall performance


Any Attribute as an Index
Traditional row-oriented databases store tables as collections of tuples. To improve access to specific values within columns and to avoid scanning the entire table, that is, all columns and rows, indexes are typically created for these columns. The offsets of the matching values are used as an index to retrieve the values of the remaining attributes, avoiding the need to read data that is not required for the result set. Consequently, complex objects can be filtered and retrieved via any of their attributes.




Insert-only
Insert-only or append-only describes how data is managed when inserting new data. The principle idea of insert-only is that changes to existing data are handled by appending new tuples to the data storage. Insert-only enables storing the complete history of value changes and the latest value for a certain attribute.For the history-based access control, insert-only builds the basis to store the entire history of queries for access decision. In addition, insert-only enables tracing of access decision, which can be used to perform incident analysis
   




Group-Key 
A common access pattern of enterprise applications is to select a small group of records from larger relations. To speed up such queries, group-key indexes can be defined that build on the compressed dictionary. A group key index maps a dictionary-encoded value of a column to a list of positions where this value can be found in a relation. 



No Aggregate Tables
Given the incredible aggregation speed provided by HANA, all aggregates required by any application can now be computed from the source data on-the-fly, providing the same or better performance as before and dramatically decreasing code complexity which makes system maintenance a lot easier.




On-the-fly Extensibility

 The possibility of adding new columns to existing database tables dramatically simplifies a wide range of customization projects. In a column store database, such as HANA, all columns are stored in physical separation from one another. This allows for a simple implementation of column extensibility, which does not need to update any other existing columns of the table. This reduces a schema change to a pure metadata operation, allowing for flexible and real-time schema extensions.

Reduction of Layers
To avoid redundant data, logical layers, describing the transformations, are executed during runtime, thus increasing efficient use of hardware resources by removing all materialized data maintenance task. Moving formalizable application logic to the data it operates on results in a smaller application stack and increases maintainability by code reduction. (Use Stored Procedures & other Business Fuction Libraries)




Partitioning
We distinguish between two partitioning approaches: vertical and horizontal partitioning, whereas a combination of both approaches is also possible. Vertical partitioning refers to the rearranging of individual database columns. It is achieved by splitting columns of a database table into two or more column sets. Each of the column sets can be distributed on individual databases servers. This can also be used to build up database columns with different ordering to achieve better search performance while guaranteeing high-availability of data. In contrast, horizontal partitioning addresses large database tables and how to divide them into smaller pieces of data. As a result, each piece of the database table contains a subset of the complete data within the table. Splitting data into equivalent long horizontal partitions is used to support search operations and better scalability.

Lightweight Compression


For in-memory databases, compression is applied to reduce the amount of data that is transferred between main memory and CPU, as well as to reduce overall main memory consumption. However, the more complex the compression algorithm is, the more CPU cycles it will take to decompress the data to perform query execution. As a result, in-memory databases choose a trade-off between compression ration and performance using so called light-weight compression algorithms.
An example for a light-weight compression algorithm is dictionary compression. With dictionary compression, all value occurrences are replaced by a fixed length encoded value. This algorithm has two major advantages for in memory databases: First, it reduces the amount of required storage and second, it allows to perform predicate evaluation directly on the compressed data.

Bulk Load

Besides transactional inserts, HANA also supports a bulk load mode. This mode is designed to insert large sets of data without the transactional overhead and thus enables significant speed-ups when setting up systems or restoring previously collected data.



Multi-core and Parallelization


A CPU core can be considered as single worker on a construction area. If it is possible to map each query to a single core, the system’s response time is optimal. Query processing also involves data processing, i.e. the database needs to be queried in parallel, too. If the database is able to distribute the workload across multiple cores of a single system, this is optimal. If the workload exceeds physical capacities of a single system, multiple servers or blades need to be involved for work distribution to achieve optimal processing behavior. From the database perspective, the partitioning of data sets enables parallelization since multiple cores across servers can be involved for data processing.
MapReduce
MapReduce is a programming model to parallelize the processing of large amounts of data. HANA emulates the MapReduce programming model and allows the developer to define map functions as user-defined procedures. Support for the MapReduce programming model enables developers to implement specific analysis algorithms on HANA faster, without worrying about parallelization and efficient execution by HANA’s calculation engine.

Dynamic Multithreading within Nodes 
partitioning database tasks on large data sets into as many jobs as threads are available on a given node. This way, the maximal utilization of any supported hardware can be achieved.
 
Single and Multi-Tenancy 
To achieve the highest level of operational efficiency, the data of multiple customers can be consolidated onto a single HANA server. Multi-tenancy allows making HANA accessible for smaller customers at lower cost, as a benefit from the consolidation. Already today HANA is equipped with the technology to enable such consolidation while ensuring that no critical resources are contending between the customers sharing a server and while ensuring a reliant and highly-available storage of the customers data at the hosting site.

Analytics on Historical Data
For analytics the historical data is the key. In HANA, historical data is instantly available for analytical processing from solid state disk (SSD) drives. Only active data is required to reside in-memory permanently.

Text Retrieval and Exploration



Elements of search in unstructured data, such as linguistic or fuzzy search find their way into the domain of structured data, changing system interaction. Furthermore, for business environments added value lies in combining search in unstructured data with analytics of structured data.


Object Data Guides

The in-memory database improves the retrieving performance of a business object by adding
which is called object data guide and includes two aspects: In addition to the parent instance, every node instance can contain the link to the corresponding root instance. Using this additional attribute, it is possible to retrieve all nodes in parallel instead of waiting for the information from the parent level. Additionally, each node type in a business object can be numbered. Then, for every root instance, a bit vector (Object Data Guide) is stored, whose bit at position i indicates if an instance of node number i exists for this root instance. Using this bit vector, a table only needs to be checked if the corresponding bit is set, reducing the complexity of queries to a minimum. This is mainly for Sparse tree data.
 

Thursday, August 9, 2012

Features of the Column Storage Engine (HANA)



Both the row and column storage engine each have a built-in query capability optimized for the respective storage method. Consequently each of the storage engines has its strengths when used for differing tasks. The row engine supports all features provided by SAP HANA SQL. The column engine, however, has been optimized for analytics and supports a narrower set of SQL features. These features are much faster than the corresponding functions of the row store. Since not all features provided by SAP HANA SQL are natively supported by the column engine, the query optimizer sometimes creates an execution plan that involves both storage engines even if the data reside completely in the column store. On the other hand it is sometimes desirable to transfer intermediate results from the row store to the column store because subsequent calculation steps can be processed much faster there. As a general rule of thumb we recommend to use the "Explain Plan" feature of the SAP HANA studio to explore execution of critical SQL statements and optimize the data model and SQL statements accordingly.

The following list explains the query features of the column engine. The column engine can natively execute queries that are logically equivalent to queries that can be expressed in the following format.

SELECT
FROM
[ WHERE ]
[ GROUP BY ]
[ HAVING ]
[ ORDER BY
ASC | DESC separated by comma> ]
[ LIMIT [OFFSET ] ]

A column engine FROM specification can be one of the following:

·        
·         ( SELECT * FROM WHERE )
·         INNER JOIN ON
·         LEFT OUTER JOIN ON
·         RIGHT OUTER JOIN ON
·         FULL OUTER JOIN ON

The column engine supports following join conditions.

·         =
§  Data types of two columns should be the same. Decimals with different precisions or scales are also treated as different data types.
·         AND operations between column engine join conditions
§  For outer joins, cyclical joining is not supported. For example, T.a = S.a AND T.b = R.b is not supported if there is already a join condition between S and R.

The column engine supports the following filter conditions
·         Binary comparison (=, <>, <, >, <= and >=) between column engine expressions
·         LIKE, IS NULL and IN predicate on a column engine expression
·         AND, OR, NOT operations over column engine filter conditions

The column engine supports the following scalar expressions.
·         Constant values
·         Binary arithmetic operation (+, -, * and /) between column engine expressions
·        
·         String concatenation (||) between column engine expressions
·         CASE WHEN THEN
·         expression> ... [ELSE ] END
·         CASE WHEN THEN ... [ELSE ] END
·         Functions listed below with column engine expressions as arguments
§  TO_DECIMAL, TO_NUMBER, TO_BIGINT, TO_REAL, TO_DOUBLE, TO_CHAR,
§  TO_NCHAR, TO_DATE, TO_TIMESTAMP and BINTOHEX/HEXTOBIN
§  Format strings are not supported for TO_CHAR, TO_NCHAR, TO_DATE and
§  TO_TIMESTAMP
§  LTRIM | RTRIM | TRIM, LENGTH, SUBSTR, INSTR and LOWER | UPPER
§  WEEKDAY, DAYS_BETWEEN, SECONDS_BETWEEN, ADD_DAYS, UTCTOLOCAL,
§  LOCALTOUTC, ISOWEEK and QUARTER
§  LN | LOG, EXP | POWER | SQRT, SIN | COS | TAN, ASIN | ACOS | ATAN, SINH | COSH
§  and FLOOR | CEIL
§  NULLIF, NVL, NVL2 and COALESCE
§  EXTRACT( YEAR | MONTH FROM )
·         Functions without arguments are evaluated by the SQL parser. The evaluated results are passed to the column engine as constant values.
§  CURRENT_CONNECTION, CURRENT_SCHEMA, CURRENT_USER, SYSUUID, CURRENT_DATE/ CURRENT_TIME/ CURRENT_TIMESTAMP and CURRENT_UTCDATE/ CURRENT_UTCTIME/ CURRENT_UTCTIMESTAMP
§  Expressions equivalent to one of the listed above (e.g. CAST function)
·         The column engine supports the following aggregation expressions.
§  MIN | MAX | COUNT | SUM | AVG ([DISTINCT] )
§   
Since SQL provides multiple possible ways to formulate a query, an SQL query that is not in the above format can be processed natively by the column engine if the query is equivalent to a query that can be represented in the correct format. Below are examples of such equivalence. Equivalence is denoted as 􀑞.
SELECT * FROM T, S WHERE T.a = S.a
􀑞.SELECT * FROM T INNER JOIN S ON T.a = S.a
SELECT DISTINCT a, b, c FROM T
􀑞.SELECT a, b, c FROM T GROUP BY a, b, c
SELECT * FROM T WHERE EXISTS (SELECT * FROM S WHERE T.a = S.a)
􀑞.SELECT DISTINCT T.* FROM T INNER JOIN S ON T.a = S.a
Equivalent only if table T has a primary key.
SELECT * FROM T WHERE NOT EXISTS (SELECT * FROM S WHERE T.a =
S.a)
􀑞.SELECT T.* FROM T LEFT OUTER JOIN S ON T.a = S.a WHERE S.a
IS NULL

HANA Data Types Mapping to Java





UNIX supported locales for BOE


Locales supported for UNIX BOE Install:


Language Browser Locales Vesion Solari 10 AIX Locales Linux Locales HP IA Locales
(5.3 & 6.1) RedHat 5.2 & 5.3
Suse 10 & 11
11.23 & 11.31
Tier 1 English English (*) en_US.UTF-8 EN_US.UTF-8 en_US.utf8 en_US.utf8
Tier 1 French French (*) fr_FR.UTF-8 FR_FR.UTF-8 fr_FR.utf8 fr_FR.utf8
Tier 1 Japanese Japanese ja_JP.UTF-8 JA_JP.UTF-8 ja_JP.utf8 ja_JP.utf8
Tier 1 Simlified Chinese Chinese (China) [zh-cn]
Chinese (Singapore) [zh-sg]
Chinese [zh]
zh_CN.UTF-8@pinyin
zh_CN.UTF-8@radical
zh_CN.UTF-8@stroke
ZH_CN.UTF-8 zh_CN.utf8 zh_CN.utf8
Tier 1 German German (*) de_DE.UTF-8 DE_DE.UTF-8 de_DE.utf8 de_DE.utf8
Tier 1 Italian Italian (*) it_IT.UTF-8 IT_IT.UTF-8 it_IT.utf8 it_IT.utf8
Tier 1 Spanish Spanish (*) es_ES.UTF-8 ES_ES.UTF-8 es_ES.utf8 es_ES.utf8
Tier 1 Dutch Dutch (*) nl_NL.UTF-8 NL_NL.UTF-8 nl_NL.utf8 nl_NL.utf8
Tier 1 Russian Russian ru_RU.UTF-8 RU_RU.UTF-8 ru_RU.utf8 ru_RU.utf8
Tier 2 Korean Korean ko_KR.UTF-8@dict KO_KR.UTF-8 ko_KR.utf8 ko_KR.utf8
Tier 2 Traditional Chinese Chinese (Hong Kong SAR) [zh-hk]
Chinese (Macau SAR) [zh-mo]
Chinese (Taiwan) [zh-tw]
zh_TW.UTF-8@pinyin
zh_TW.UTF-8@radical
zh_TW.UTF-8@stroke
zh_TW.UTF-8@zhuyin
zh_HK.UTF-8@radical
zh_HK.UTF-8@stroke
ZH_TW.UTF-8
ZH_HK.UTF-8
zh_TW.utf8
zh_HK.utf8
zh_TW.utf8
zh_HK.utf8
Tier 2 Portuguese Portuguese (Brazil) [pt-br] pt_BR.UTF-8 PT_BR.UTF-8 pt_BR.utf8 pt_BR.utf8
Tier 2 Swedish Swedish [sv] sv_SE.UTF-8 SV_SE.UTF-8 sv_SE.utf8 sv_SE.utf8
Tier-3 Polish Polish pl_PL.UTF-8 PL_PL.UTF-8 pl_PL.utf8 pl_PL.utf8
Tier-3 Danish Danish da_DK.UTF-8 DA_DK.UTF-8 da_DK.utf8 da_DK.utf8
Tier-3 Thai Thai th_TH.UTF-8 TH_TH.UTF-8 th_TH.utf8 No Locale
Tier-3 Norwegian Bokmal nb_NO.UTF-8 NO_NO.UTF-8 no_NO.utf8 no_NO.utf8
Tier-3 Czechoslovakia Czech cs_CZ.UTF-8 CS_CZ.UTF-8 cs_CZ.utf8 cs_CZ.utf8
Tier-3 Finnish Finnish fi_FI.UTF-8 FI_FI.UTF-8 fi_FI.utf8 fi_FI.utf8
Tier-3 Turkish Turkish tr_TR.UTF-8 TR_TR.UTF-8 tr_TR.utf8 tr_TR.utf8
Tier-3 Hungarian Hungarian hu_HU.UTF-8 HU_HU.UTF-8 hu_HU.utf8 hu_HU.utf8
Tier-3 Slovakian Slovakian sk_SK.UTF-8 SK_SK.UTF-8 sk_SK.utf8 sk_SK.utf8 

Learn German - Quick Reference