LINQ to SQL Statements: Select with a Where Clause

public void SimpleQuery3()
{
    DataClasses1DataContext dc = new DataClasses1DataContext();

    var q =
        from a in dc.GetTable<Order>()
        where a.CustomerID == "VINET"
        select a;

    dataGridView1.DataSource = q;
}

Simple LINQ to SQL Statement

public void SimpleQuery()
{
    DataClasses1DataContext dc = new DataClasses1DataContext();

    var q =
        from a in dc.GetTable<Order>()
        select a;

    dataGridView1.DataSource = q;
}

SQL Server Performance Tips and Guidelines

• As a common practice, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases, such as an identity column or some other column where the value is unique. In many cases, the primary key is the ideal column for a clustered index.

• Indexes should be considered on all columns that are frequently used in WHERE, ORDER BY, GROUP BY, TOP and DISTINCT clauses.

• Do not automatically add indexes on a table because it seems like the right thing to do. Only add indexes if you know that they will be used by the queries run against the table.

• For historical (static) tables, create the indexes with a FILLFACTOR of 100 and PAD_INDEX set to ON to ensure there is no wasted space. This reduces disk I/O, helping to boost overall performance.
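
For example, here is a minimal sketch using the SQL Server 2005 syntax (the table and index names are hypothetical):

create index IX_SalesHistory_SaleDate
on SalesHistory (SaleDate)
with (fillfactor = 100, pad_index = on)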

• Queries that return a single row are just as fast using a non-clustered index as a clustered index.

• Queries that return a range of rows are generally faster using a clustered index than a non-clustered index.

• Do not add more indexes to your OLTP tables than you need; this minimizes the overhead that indexes incur during data modifications.

• Do not add the same index more than once to a table under different names.

• Generally, drop any indexes that are not used by the Query Optimizer. You probably won't want to add an index to a table under the following conditions:

If the index is not used by the query optimizer. Use the Query Analyzer's "Show Execution Plan" option to see if your queries against a particular table use an index or not.
If the table is small, most likely indexes will not be used.
If the column(s) to be indexed are very wide.
If the column(s) are defined as TEXT, NTEXT or IMAGE data types.
If the table is rarely queried but insertions and updates are frequent.

• The query optimizer needs up-to-date statistics to make smart query optimization decisions. You will generally want to leave the "Auto Update Statistics" database option on. This helps to ensure that the optimizer's statistics are valid, ensuring that queries are properly optimized when they are run.

• Keep the "width" of your indexes as narrow as possible. This reduces the size of the index and reduces the number of disk I/O reads required to read the index.

• If possible, try to create indexes on columns that have integer values instead of characters. Integer values use less overhead than character values.

• If you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index. If the columns used for the joins are not naturally compact, then consider adding compact surrogate keys to the tables in order to reduce the size of the keys. This will decrease I/O during the join process, which increases overall performance.

• When creating indexes, try to make them unique indexes if at all possible. SQL Server can often search through a unique index faster than a non-unique index. This is because, in a unique index, each row is unique and once the needed record is found, SQL Server doesn't have to look any further.

• If a particular query against a table is run infrequently and the addition of an index greatly speeds the performance of the query, but the performance of INSERTS, UPDATES and DELETES is negatively affected by the addition of the index, consider creating the index for the table for the duration of when the query is run and then dropping the index. An example of this is when monthly reports are run at the end of the month on an OLTP application.

• Avoid using FLOAT or REAL data types as primary keys, as they add unnecessary overhead that can hurt performance.

• If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:

Of the search criteria in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
If at least one of the search criteria in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.
If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.
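
As an illustration of the last point, a covering index includes every column the query touches, so the query can be answered from the index alone. A minimal sketch, assuming a hypothetical Orders table (INCLUDE requires SQL Server 2005 or later):

create nonclustered index IX_Orders_CustomerID_OrderDate
on Orders (CustomerID)
include (OrderDate, ShipCity)

-- this query can now be satisfied entirely from the index:
select OrderDate, ShipCity from Orders where CustomerID = 'VINET'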

• The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and if any of the referenced columns in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has an index.

• If you have a query that uses ORs and it is not making the best use of indexes, consider rewriting it as a UNION and then testing performance. Only through testing can you be sure that one version of your query will be faster than another.
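
For example, a sketch using the tasks table that appears later in these tips (the task_name value is illustrative). Note that UNION removes duplicates, matching the semantics of the OR version:

-- OR version; may scan if either column lacks a useful index:
select task_id, task_name from tasks
where task_id = 1000 or task_name = 'Backup'

-- UNION version; each branch can seek on its own index:
select task_id, task_name from tasks where task_id = 1000
union
select task_id, task_name from tasks where task_name = 'Backup'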

• If you use the SOUNDEX function against a table column in a WHERE clause, the Query Optimizer will ignore any available indexes and perform a table scan.

• Queries that include either the DISTINCT or the GROUP BY clauses can be optimized by including appropriate indexes. Any of the following indexing strategies can be used:

Include a covering, non-clustered index on the columns used in the DISTINCT or GROUP BY clause.
Include a clustered index on the columns in the GROUP BY clause.
Include a clustered index on the columns found in the SELECT clause.
Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for those queries that run often.

• Avoid clustered indexes on columns that are already "covered" by non-clustered indexes. A clustered index on a column that is already "covered" is redundant. Use the clustered index for columns that can better make use of it.

• Ideally, a clustered index should be based on a single column (not multiple columns) that is as narrow as possible. This not only reduces the clustered index's physical size, it also reduces the physical size of non-clustered indexes and boosts SQL Server's overall performance.

• When you create a clustered index, try to create it as a unique clustered index, not a non-unique clustered index.

• Use SET NOCOUNT ON at the beginning of each stored procedure you write. This statement should be included in every stored procedure, trigger, etc. that you write.
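
For example (the procedure and table are a hypothetical sketch):

create procedure dbo.GetTaskList
as
set nocount on -- suppresses the "rows affected" messages sent to the client
select task_id, task_name from tasks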

• Keep Transact-SQL transactions as short as possible within a stored procedure. This helps to reduce the number of locks, helping to speed up the overall performance of your SQL Server application.

• If you are creating a stored procedure to run in a database other than the Master database, don't use the prefix sp_ in its name. This special prefix is reserved for system stored procedures. Although using this prefix will not prevent a user-defined stored procedure from working, it can slow down its execution ever so slightly.

• Before you are done with your stored procedure code, review it for any unused code, parameters or variables that you may have forgotten to remove while you were making changes and remove them. Unused code just adds unnecessary bloat to your stored procedures, although it will not necessarily negatively affect performance of the stored procedure.

• For best performance, all objects that are called within the same stored procedure should be owned by the same object owner or schema, preferably dbo, and should also be referred to in the format of object_owner.object_name or schema_owner.object_name.

• When you need to execute a string of Transact-SQL, you should use the sp_executesql stored procedure instead of the EXECUTE statement.
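
For example, a sketch of the parameterized sp_executesql form (the table and variable names are illustrative); because the statement text stays constant, SQL Server can cache and reuse the plan across parameter values:

declare @sql nvarchar(200), @id int
set @id = 1000

exec sp_executesql
N'select task_name from tasks where task_id = @task_id',
N'@task_id int',
@task_id = @id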

• If you use input parameters in your stored procedures, you should validate all of them at the beginning of your stored procedure. This way, if there is a validation problem and the client application needs to be notified of the problem, it happens before any stored procedure processing takes place, preventing wasted effort and boosting performance.

• When calling a stored procedure from your application, it is important that you call it using its qualified name, for example:

exec dbo.myProc

...instead of:

exec myProc

• If you think a stored procedure will return only a single value and not a record set, consider returning the single value as an output parameter.
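
For example, a hypothetical sketch:

create procedure dbo.GetTaskName
@task_id int,
@task_name varchar(100) output
as
set nocount on
select @task_name = task_name from tasks where task_id = @task_id

-- caller:
declare @name varchar(100)
exec dbo.GetTaskName @task_id = 1000, @task_name = @name output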

• Use stored procedures instead of views. They offer better performance.

• Don't include code, variables or parameters that don't do anything.

• Don't be afraid to make liberal use of in-line and block comments in your Transact-SQL code. They will not affect the performance of your application and they will enhance your productivity when you have to come back to the code and try to modify it.

• If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications.

• If you have the choice of using a join or a sub-query to perform the same task within a query, generally the join is faster. This is not always the case, however, and you may want to test the query using both methods to determine which is faster for your particular application.

• If your application requires you to create temporary tables for use on a global or per connection use, consider the possibility of creating indexes for these temporary tables. While most temporary tables probably won't need -- or even use -- an index, some larger temporary tables can benefit from them. A properly designed index on a temporary table can be as great a benefit as a properly designed index on a standard database table.

• Instead of using temporary tables, consider using a derived table. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and often boost your application's performance.
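
For example, a sketch assuming a hypothetical category column on the tasks table:

select d.category, d.task_count
from (select category, count(*) as task_count
from tasks
group by category) as d
where d.task_count > 10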

• For better performance, if you need a temporary table in your Transact-SQL code, consider using a table variable instead of creating a conventional temporary table.
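
For example:

declare @RecentTasks table (task_id int primary key, task_name varchar(100))

insert into @RecentTasks (task_id, task_name)
select task_id, task_name from tasks where task_id between 1000 and 1004

select * from @RecentTasks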

• Don't repeatedly call the same function to calculate the same result within your Transact-SQL code; calculate it once, store the result in a variable, and reuse it.

• If you use BULK INSERT to import data into SQL Server, then use the TABLOCK hint along with it. This will prevent SQL Server from running out of locks during very large imports and will also boost performance due to the reduction of lock contention.
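
For example (the file path and table are hypothetical):

bulk insert dbo.tasks
from 'C:\imports\tasks.dat'
with (tablock)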

• Always specify the narrowest columns you can. The narrower the column, the less amount of data SQL Server has to store and the faster SQL Server is able to read and write data. In addition, if any sorts need to be performed on the column, the narrower the column, the faster the sort will be.

• If you need to store large strings of data and they are less than 8000 characters, use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drag down performance.

• Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting unnecessary space in your buffer cache.

• If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. The amount of space saved by using VARCHAR over CHAR on variable length columns can greatly reduce the I/O reads that the cache memory uses to hold data, improving overall SQL Server performance.

• If a column's data does not vary widely in length, consider using a fixed-length CHAR field instead of a VARCHAR. While it may take up a little more space to store the data, processing fixed-length columns is faster in SQL Server than processing variable-length columns.

• If you have a column that is designed to hold only numbers, use a numeric data type such as INTEGER instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value than does a character data type. This helps to reduce the size of the columns and can boost performance when the columns are searched (WHERE clause), joined to another column or sorted.

• If you use the CONVERT function to convert a value to a variable length data type such as VARCHAR, always specify the length of the variable data type. If you do not, SQL Server assumes a default length of 30. Ideally, you should specify the shortest length to accomplish the required task. This helps to reduce memory use and SQL Server resources.
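
For example:

select convert(varchar(10), 123456789)  -- explicit length: preferred
select convert(varchar, replicate('x', 40))  -- no length given: treated as varchar(30), silently truncated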

• Avoid using the new BIGINT data type unless you really need its additional storage capacity. The BIGINT data type uses 8 bytes of memory, versus 4 bytes for the INT data type.

• Don't use the DATETIME data type as a primary key. From a performance perspective, it is more efficient to use a data type that uses less space. For example, the DATETIME data type uses 8 bytes of space, while the INT data type only takes up 4 bytes. The less space used, the smaller the table and index, and the less I/O overhead that is required to access the primary key.

• If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data much faster than character data.

• Carefully evaluate whether your SELECT query needs the DISTINCT clause or not. Some developers automatically add this clause to every one of their SELECT statements, even when it is not necessary. This is a bad habit that should be stopped.

• When you need to use the SELECT INTO option, keep in mind that it can lock system tables, preventing other users from accessing the data they need while the data is being inserted. In order to prevent or minimize the problems caused by locked tables, try to schedule the use of SELECT INTO when your SQL Server is less busy. In addition, try to keep the amount of data inserted to a minimum. In some cases, it may be better to perform several smaller SELECT INTOs instead of performing one large SELECT INTO.

• If you need to verify the existence of a record in a table, don't use SELECT COUNT (*) in your Transact-SQL code to identify it. This is very inefficient and wastes server resources. Instead, use the Transact-SQL IF EXISTS to determine if the record in question exists, which is much more efficient.
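
For example:

-- inefficient; counts every matching row:
if (select count(*) from tasks where task_id = 1000) > 0
print 'found'

-- efficient; stops at the first matching row:
if exists (select * from tasks where task_id = 1000)
print 'found'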

• By default, some developers -- especially those who have not worked with SQL Server before -- routinely include code similar to this in their WHERE clauses when they make string comparisons:

SELECT column_name FROM table_name
WHERE LOWER (column_name) = 'name'

In other words, these developers are making the assumption that the data in SQL Server is case-sensitive, which it generally is not. If your SQL Server database is not configured to be case sensitive, you don't need to use LOWER or UPPER to force the case of text to be equal for a comparison to be performed. Just leave these functions out of your code. This will speed up the performance of your query, as any use of text functions in a WHERE clause hurts performance.

However, what if your database has been configured to be case-sensitive? Should you then use the LOWER and UPPER functions to ensure that comparisons are properly performed? No. The above example is still poor coding. If you have to deal with ensuring case is consistent for proper comparisons, use the technique described below, along with appropriate indexes on the column in question:

SELECT column_name FROM table_name
WHERE column_name = 'NAME' or column_name = 'name'

This code will run much faster than the first example.

• If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, instead try to use one of the following options, all of which offer better performance:

Use EXISTS or NOT EXISTS
Use IN
Perform a LEFT OUTER JOIN and check for a NULL condition
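
For example, here is how the last two options might look against a hypothetical archived_tasks table:

-- NOT EXISTS version:
select t.task_id from tasks t
where not exists (select * from archived_tasks a where a.task_id = t.task_id)

-- LEFT OUTER JOIN version; unmatched rows surface as NULLs:
select t.task_id
from tasks t
left outer join archived_tasks a on t.task_id = a.task_id
where a.task_id is null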

• When you have a choice of using the IN or the EXISTS clause in your Transact-SQL, you will generally want to use the EXISTS clause, as it is usually more efficient and performs faster.

• If you find that SQL Server uses a TABLE SCAN instead of an INDEX SEEK when you use an IN/OR clause as part of your WHERE clause, even when those columns are covered by an index, consider using an index hint to force the Query Optimizer to use the index.
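
For example, a sketch with a hypothetical index name:

select task_id, task_name
from tasks with (index (IX_tasks_task_id))
where task_id in (1000, 1001, 1002)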

• If you use LIKE in your WHERE clause, try to use one or more leading characters in the clause, if possible. For example, use:

LIKE 'm%' instead of LIKE '%m'

• If your application needs to retrieve summary data often, but you don't want to have the overhead of calculating it on the fly every time it is needed, consider using a trigger that updates summary values after each transaction into a summary table.
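
For example, a very simplified sketch, assuming hypothetical tasks and TaskSummary tables:

create trigger tasks_MaintainSummary_instrg
on tasks
after insert
as
set nocount on
-- keep the running total current after each insert:
update TaskSummary
set task_count = task_count + (select count(*) from inserted)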

• When you have a choice of using the IN or the BETWEEN clauses in your Transact-SQL, you will generally want to use the BETWEEN clause, as it is much more efficient. For example...

SELECT task_id, task_name
FROM tasks
WHERE task_id IN (1000, 1001, 1002, 1003, 1004)

...is much less efficient than this:

SELECT task_id, task_name
FROM tasks
WHERE task_id BETWEEN 1000 AND 1004

• If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on how it is constructed, using the SUBSTRING function can force a table scan instead of allowing the optimizer to use an index (assuming there is one). If the substring you are searching for does not include the first character of the column you are searching for, then a table scan is performed.

• If possible, you should avoid using the SUBSTRING function and use the LIKE condition instead for better performance. Instead of doing this:

WHERE SUBSTRING(task_name,1,1) = 'b'

Try using this instead:

WHERE task_name LIKE 'b%'

• Avoid using optimizer hints in your WHERE clauses. This is because it is generally very hard to out-guess the Query Optimizer. Optimizer hints are special keywords that you include with your query to force how the Query Optimizer runs. If you decide to include a hint in a query, this forces the Query Optimizer to become static, preventing the Query Optimizer from dynamically adapting to the current environment for the given query. More often than not, this hurts -- not helps -- performance.

• If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written. This assumes that no parentheses have been used to change the order of execution. Because of this, you may want to consider one of the following when using AND:

Locate the least likely true AND expression first.
If both parts of an AND expression are equally likely to be false, put the least complex AND expression first.
You may want to consider using Query Analyzer or Management Studio to look at the execution plans of your queries to see which is best for your situation.

• Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, perhaps it may be more efficient to sort the data at the client than at the server.

• Whenever SQL Server has to perform a sorting operation, additional resources have to be used to perform this task. Sorting often occurs when any of the following Transact-SQL statements are executed:

ORDER BY
GROUP BY
SELECT DISTINCT
UNION

• If you have to sort by a particular column often, consider making that column a clustered index. This is because the data is already presorted for you and SQL Server is smart enough not to re-sort the data.

• If your WHERE clause includes an IN operator along with a list of values to be tested in the query, order the list of values so that the most frequently found ones are placed at the start of the list and the less frequently found ones are placed at the end of the list. This can speed up performance because the IN option returns true as soon as any of the values in the list produce a match. The sooner the match is made, the faster the query completes.

• If your application performs many wildcard (LIKE %) text searches on CHAR or VARCHAR columns, consider using SQL Server's full-text search option. The Search Service can significantly speed up wildcard searches of text stored in a database.

• The GROUP BY clause can be used with or without an aggregate function. However, if you want optimum performance, don't use the GROUP BY clause without an aggregate function. This is because you can accomplish the same end result by using the DISTINCT option instead, and it is faster. For example, you could write your query two different ways:

SELECT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20
GROUP BY task_id

...or:

SELECT DISTINCT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20

• It is important to design applications that keep transactions as short as possible. This reduces locking and increases application concurrency, which helps to boost performance.

• In order to reduce network traffic between the client or middle-tier and SQL Server -- and also to boost your SQL Server-based application's performance -- only the data needed by the client or middle-tier should be returned by SQL Server. In other words, don't return more data (both rows and columns) from SQL Server than you need, only to further reduce it at the client or middle-tier; this wastes SQL Server resources and network bandwidth.

• To make complex queries easier to analyze, consider breaking them down into their smaller constituent parts. One way to do this is to simply create lists of the key components of the query, such as:

List all of the columns that are to be returned
List all of the columns that are used in the WHERE clause
List all of the columns used in the JOINs (if applicable)
List all the tables used in JOINs (if applicable)
Once you have the above information organized into this easy-to-comprehend form, it is much easier to identify those columns that could potentially make use of indexes when executed.

What is SQL Server Monitoring?

Server monitoring is a challenge for SQL Server administrators. On average, SQL Server DBAs are responsible for twice as many servers as their peers using other platforms. The DBA also has a number of tasks to accomplish. How do DBAs get everything done in a day? They use scripting languages to automate their processes.

In an ideal world, databases would be self-monitoring. The database itself would kill a hanging process automatically. The creation of a new database would cause the automatic creation of backup jobs. The database would do what's needed to keep query performance above a certain accepted level. Unfortunately, none of the database providers are even close to providing this functionality. That's why DBAs are still needed.

There are two kinds of monitoring scenarios:

  • Reactive monitoring deals with resolving existing issues as they crop up.

  • Proactive monitoring is the process of looking at a current server state and making estimates and plans for changes to the underlying objects. The objective is to prevent issues, increase scalability, and maintain availability.

Command to Start SQLCMD

To get started with SQLCMD, go to the command prompt and type SQLCMD. The basic switch for working with SQLCMD is -S, which identifies the server.

To provide an instance, the command would be:

sqlcmd -S ComputerName\InstanceName

The authentication type has three switches:

-E is the default and uses Windows Authentication (the credentials of the currently logged-in user).

-U lets you specify a user, such as SA.

-P is the password. Passwords are case-sensitive. If the -P option is not used, and the SQLCMDPASSWORD environment variable has not been set, SQLCMD prompts the user for a password. If the -P option is used at the end of the command prompt without a password, SQLCMD uses the default password (NULL).
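
For example (the server name and password below are placeholders; -Q runs the query and exits):

sqlcmd -S MyServer\InstanceName -E -Q "SELECT @@VERSION"

sqlcmd -S MyServer\InstanceName -U sa -P StrongPassword -Q "SELECT @@VERSION"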

SQL Server 2005 Remote Management Features

One of the real challenges facing administrators is how to manage more and more servers. Operations such as manufacturing and customer support are done around the globe, with local copies of data being synchronized during smaller and smaller windows. Administrators need to be able to remotely install, monitor, troubleshoot, and maintain remote servers. The SQL Server tools team has delivered three categories of interfaces to manage remote environments:

  • SQLCMD is for folks who are comfortable with command prompt applications. For customers migrating or currently using the command prompt languages, OSQL and ISQL are deprecated and have been replaced with SQLCMD. SQLCMD is a command-line executable. You invoke it at the command prompt by typing SQLCMD.

  • SQL Server Management Objects (SMO) is for those who build user interface–oriented applications. DBAs who want to build custom management tools will find that SMO has replaced DMO. SMO is a new API architecture that overcomes the limitations of DMO. SMO is scalable, reliable, and flexible. SMO is significantly more robust than DMO, because it is used by SQL Server Management Studio to connect and work with SQL Server instances. Every function found in SQL Server Management Studio is made possible by SMO.

  • Windows Management Instrumentation (WMI) allows the use of Windows scripting languages, such as VBScript, and it's more complicated than SMO or SQLCMD. WMI is powerful and provides deep hooks into the operating system, which is beyond the reach of SQLCMD and SMO. In an extremely complex infrastructure, the WMI provider may prove to be the most complete solution.

Remote functionality is also enhanced with new capabilities for scripting and working with Replication and Analysis Services (via respective .NET libraries), Replication Management Objects, and Analysis Management Objects.

The Top 10 Good Leadership Qualities

1. Vision
=========
A leader with vision has a clear, vivid picture of where to go, as well as a firm grasp on what success looks like and how to achieve it. But it's not enough to have a vision; leaders must also share it and act upon it. Jack Welch, former chairman and CEO of General Electric Co., said, "Good business leaders create a vision, articulate the vision, passionately own the vision and relentlessly drive it to completion."

A leader must be able to communicate his or her vision in terms that cause followers to buy into it. He or she must communicate clearly and passionately, as passion is contagious.

A good leader must have the discipline to work toward his or her vision single-mindedly, as well as to direct his or her actions and those of the team toward the goal. Action is the mark of a leader. A leader does not suffer "analysis paralysis" but is always doing something in pursuit of the vision, inspiring others to do the same.

2. Integrity
============
Integrity is the integration of outward actions and inner values. A person of integrity is the same on the outside and on the inside. Such an individual can be trusted because he or she never veers from inner values, even when it might be expeditious to do so. A leader must have the trust of followers and therefore must display integrity.

Honest dealings, predictable reactions, well-controlled emotions, and an absence of tantrums and harsh outbursts are all signs of integrity. A leader who is centered in integrity will be more approachable by followers.

3. Dedication
============
Dedication means spending whatever time or energy is necessary to accomplish the task at hand. A leader inspires dedication by example, doing whatever it takes to complete the next step toward the vision. By setting an excellent example, leaders can show followers that there are no nine-to-five jobs on the team, only opportunities to achieve something great.

4. Magnanimity
==============
Magnanimity means giving credit where it is due. A magnanimous leader ensures that credit for successes is spread as widely as possible throughout the company. Conversely, a good leader takes personal responsibility for failures. This sort of reverse magnanimity helps other people feel good about themselves and draws the team closer together. To spread the fame and take the blame is a hallmark of effective leadership.

5. Humility
===========
Leaders with humility recognize that they are no better or worse than other members of the team. A humble leader is not self-effacing but rather tries to elevate everyone. Leaders with humility also understand that their status does not make them a god. Mahatma Gandhi is a role model for Indian leaders, and he pursued a "follower-centric" leadership role.

6. Openness
===========
Openness means being able to listen to new ideas, even if they do not conform to the usual way of thinking. Good leaders are able to suspend judgment while listening to others' ideas, as well as accept new ways of doing things that someone else thought of. Openness builds mutual respect and trust between leaders and followers, and it also keeps the team well supplied with new ideas that can further its vision.

7. Creativity
=============
Creativity is the ability to think differently, to get outside of the box that constrains solutions. Creativity gives leaders the ability to see things that others have not seen and thus lead followers in new directions. The most important question that a leader can ask is, "What if … ?" Possibly the worst thing a leader can say is, "I know this is a dumb question ... "

8. Fairness
===========
Fairness means dealing with others consistently and justly. A leader must check all the facts and hear everyone out before passing judgment. He or she must avoid leaping to conclusions based on incomplete evidence. When people feel that they are being treated fairly, they reward a leader with loyalty and dedication.

9. Assertiveness
================
Assertiveness is not the same as aggressiveness. Rather, it is the ability to clearly state what one expects so that there will be no misunderstandings. A leader must be assertive to get the desired results. Along with assertiveness comes the responsibility to clearly understand what followers expect from their leader.

Many leaders have difficulty striking the right amount of assertiveness, according to a study in the February 2007 issue of the Journal of Personality and Social Psychology, published by the APA (American Psychological Association). It seems that being underassertive or overassertive may be the most common weakness among aspiring leaders.

10. Sense of Humor
==================
A sense of humor is vital to relieve tension and boredom, as well as to defuse hostility. Effective leaders know how to use humor to energize followers. Humor is a form of power that provides some control over the work environment. And simply put, humor fosters good camaraderie.

Intrinsic traits such as intelligence, good looks, height and so on are not necessary to become a leader. Anyone can cultivate the proper leadership traits.

What is SOAP?

The basic Web services platform is XML plus HTTP.

  • SOAP stands for Simple Object Access Protocol
  • SOAP is a communication protocol
  • SOAP is for communication between applications
  • SOAP is a format for sending messages
  • SOAP is designed to communicate via the Internet
  • SOAP is platform independent
  • SOAP is language independent
  • SOAP is based on XML
  • SOAP is simple and extensible
  • SOAP allows you to get around firewalls
  • SOAP is a W3C standard

Why Web Services?

Interoperability has the highest priority.

Once all major platforms could access the Web using Web browsers, different platforms could interact. For these platforms to work together, Web applications were developed.

Web applications are simple applications that run on the Web. They are built around Web browser standards and can be used by almost any browser on any platform.

Web services take Web applications to the next level.

Using Web services, your application can publish its function or message to the rest of the world.

Web services use XML to code and decode your data, and SOAP to transport it, using open protocols.

With Web services, your accounting department's Win 2K server's billing system can connect with your IT supplier's UNIX server.

Web services have two types of uses.

Reusable application components

There are things different applications need very often. So why make these over and over again?

Web services can offer application components like currency conversion, weather reports or even language translation as services.

Ideally, there will only be one type of each application component, and anyone can use it in their application.

Connect existing software

Web services help solve the interoperability problem by giving different applications a way to link their data.

Using Web services you can exchange data between different applications and different platforms.

What is a Web service?

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.

Database naming convention

This article references Microsoft SQL Server databases in some examples, but the conventions can be used generically with other RDBMSs like Oracle, Sybase, etc. too. So, here's the preferred naming convention for:

  • Tables
  • Views
  • Stored procedures
  • User defined functions
  • Triggers
  • Indexes
  • Columns
  • User defined data types
  • Primary keys
  • Foreign keys
  • Default and Check constraints
  • Variables
Tables:

Tables represent the instances of an entity. For example, you store all your customer information in a table. Here, 'customer' is an entity and all the rows in the customers table represent the instances of the entity 'customer'. So, why not name your table after the entity it represents: 'Customer'? Since the table is storing 'multiple instances' of customers, make your table name a plural word.

So, name your customer table 'Customers'.
Name your order storage table 'Orders'.
Name your error messages table 'ErrorMessages'.

This is a more natural way of naming tables, when compared to approaches which name tables as tblCustomers or tbl_Orders. Further, when you look at your queries it's very obvious that a particular name refers to a table, as table names are always preceded by the FROM clause of the SELECT statement.

If your database deals with different logical functions and you want to group your tables according to the logical group they belong to, it won't hurt prefixing your table name with a two or three character prefix that can identify the group.

For example, if your database has tables which store information about the Sales and Human Resources departments, you could name all your tables related to the Sales department as shown below:

SL_NewLeads
SL_Territories
SL_TerritoriesManagers


You could name all your tables related to Human resources department as shown below:

HR_Candidates
HR_PremierInstitutes
HR_InterviewSchedules


This kind of naming convention makes sure all the related tables are grouped together when you list all your tables in alphabetical order. However, if your database deals with only one logical group of tables, you need not use this naming convention.

Note that sometimes you end up vertically partitioning tables into two or more tables, though these partitions effectively represent the same entity. In this case, append a word that best identifies the partition to the entity name.

Views:

A view is nothing but a table, as far as any application accessing it is concerned. So, the same naming convention defined above for tables applies to views as well, but not always. Here are some exceptions:

1) Views do not always represent a single entity. A view can be a combination of two tables based on a join condition, thus effectively representing two entities. In this case, consider combining the names of both the base tables. Here's an example:

If there is a view combining two tables 'Customers' and 'Addresses', name the view 'CustomersAddresses'. The same naming convention can be used with junction tables that are used to link two many-to-many related base tables. The most popular example is the 'TitleAuthor' table from the 'Pubs' database of SQL Server.

2) Views can summarize data from existing base tables in the form of reports. You can see this type of view in the 'Northwind' database that ships with SQL Server 7.0 and above. Here's the convention that database follows (I prefer this):

'Product Sales for 1997'
'Summary of Sales by Quarter'
'Summary of Sales by Year'

However, try to stay away from spaces within object names.

Stored procedures:

Stored procedures always do some work for you; they are action-oriented. So, let their name describe the work they do, using a verb.

This is how I would name a stored procedure that fetches the customer details given the customer identification number: 'GetCustomerDetails'. Similarly, you could name a procedure that inserts new customer information 'InsertCustomerInfo'. Here are some more names based on the same convention: 'WriteAuditRecord', 'ArchiveTransactions', 'AuthorizeUser', etc.

As explained above in the case of tables, you could use a prefix to group stored procedures as well, depending upon the logical group they belong to. For example, all stored procedures that deal with 'Order processing' could be prefixed with ORD_ as shown below:

ORD_InsertOrder
ORD_InsertOrderDetails
ORD_ValidateOrder


If you are using Microsoft SQL Server, never prefix your stored procedures with 'sp_', unless you are storing the procedure in the master database. If you call a stored procedure prefixed with sp_, SQL Server always looks for the procedure in the master database first. Only after checking the master database (and not finding the procedure there) does it search the current database.

I do not agree with the approach of prefixing stored procedures with prefixes like 'sproc_' just to make it obvious that the object is a stored procedure. Any database developer/DBA can identify stored procedures, as procedure calls are always preceded by the EXEC or EXECUTE keyword.

User defined functions:

In Microsoft SQL Server 2000, User Defined Functions (UDFs) are quite similar to stored procedures, except for the fact that UDFs can be used in SELECT statements. Otherwise, both stored procedures and UDFs are similar. So, the naming conventions discussed above for stored procedures apply to UDFs as well. You could even use a prefix to logically group your UDFs. For example, you could name all your string manipulation UDFs as shown below:

str_MakeProperCase
str_ParseString

Triggers:

Though triggers are a special kind of stored procedure, it doesn't make sense to follow the same naming convention as we do for stored procedures.

While naming triggers we have to extend the stored procedure naming convention in two ways:

  • Triggers always depend on a base table and can't exist on their own. So, it's better to link the base table's name with the trigger name
  • Triggers are associated with one or more of the following operations: Insert, Update, Delete. So, the name of the trigger should reflect its nature

So, here's how I would name the insert, update and delete triggers on the titles table:

titles_instrg
titles_updtrg
titles_deltrg

Microsoft SQL Server 7.0 started allowing more than one trigger per action per table. So, you could have 2 insert triggers, 3 update triggers and 4 delete triggers, if you want to! In SQL Server 7.0 you can't control the order of firing of these triggers; however, you have some control over the order of firing in SQL Server 2000. Coming back to the point, if you have 2 insert triggers on the titles table, use the following naming convention to distinguish the triggers:

titles_ValidateData_instrg
titles_MakeAuditEntries_instrg


Same naming convention could be used with update and delete triggers.

If you have a single trigger for more than one action (same trigger for insert and update, or update and delete, or any such combination), use the words 'ins', 'upd', 'del' together in the name of the trigger. Here's an example. If you have a single trigger for both insert and update on the titles table, name the trigger titles_InsUpdtrg.

Indexes:

Just like triggers, indexes also can't exist on their own and they are dependent on the underlying base tables. So, again it makes sense to include the 'name of the table' and 'column on which it's built' in the index name. Further, indexes can be of two types, clustered and nonclustered. These two types of indexes could be either unique or non-unique. So, the naming convention should take care of the index types too.

My index naming convention is:
Table name + Column name(s) + Unique/Non-uniqueness + Clustered/Non-clustered

For example, I would name the unique, clustered index on the TitleID column of Titles table as shown below:

Titles_TitleID_U_Cidx

I would name the unique, nonclustered index on the PubID column of Publishers table as shown below:

Publishers_PubID_U_Nidx

Here's how I would name a non-unique, non-clustered index on the OrderID column of the OrderDetails table:

OrderDetails_OrderID_NU_Nidx

Indexes can be composite too, meaning, an index can be built on more than one column. In this case, just concatenate the column names together, just the way we did with junction tables and views above. So, here's how I would name a composite, unique, clustered index on OrderID and OrderDetailID columns of OrderDetails table:

OrderDetails_OrderIDOrderDetailID_U_Cidx

Sure, these index names look long and ugly, but who is complaining? You'll never need to reference these index names in code, unless you are creating/dropping/rebuilding the indexes. So, it's not a pain, but it's a very useful naming convention.
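
For completeness, here is a sketch of the DDL behind the first two names:

create unique clustered index Titles_TitleID_U_Cidx
on Titles (TitleID)

create unique nonclustered index Publishers_PubID_U_Nidx
on Publishers (PubID)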

Columns:

Columns are attributes of an entity, that is, columns describe the properties of an entity. So, let the column names be meaningful and natural.

Here's the simplest way of naming the columns of the Customers table:

CustomerID
CustomerFirstName
CustomerAddress


As shown above, it'll be a good idea to prefix the column names with the entity that they are representing.

Here's another idea. Decide on a standard two to four character code for each table in your database and make sure it's unique in the database. For example, 'Cust' for the Customers table, 'Ord' for the Orders table, 'OrdD' for the OrderDetails table, 'Adt' for Audit tables, etc. Use this table code to prefix all the column names in that table. The advantage of this convention is that in multi-table queries involving complex joins, you don't have to worry about ambiguous column names, and don't have to use table aliases to prefix the columns. It also makes your queries more readable.

If you have to name the columns in a junction/mapping table, concatenate the table codes of mapped tables, or come up with a new code for that combination of tables.

So, here's how the CustomerID column would appear in Customers table:

Cust_CustomerID

The same CustomerID column appears in the Orders table too, but in Orders table, here's how it's named:

Ord_CustomerID

Some naming conventions even go to the extent of prefixing the column name with its data type. But I don't like this approach, as I feel the DBA or the developer dealing with these columns should be familiar with the data types these columns belong to.

User defined data types:

User defined data types are just a wrapper around the base types provided by the database management system. They are used to maintain consistency of data types across different tables for the same attribute. For example, if the CustomerID column appears in half a dozen tables, you must use the same data type for all the occurrences of the CustomerID column. This is where user defined data types come in handy. Just create a user defined data type for CustomerID and use it as the data type for all the occurrences of the CustomerID column.

So, the simplest way of naming these user defined data types would be: Column_Name + '_type'. So, I would name the CustomerID type as:

CustomerID_type
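
For example, a sketch of creating and using the type (sp_addtype is the SQL Server 2000 way; CREATE TYPE works in SQL Server 2005 and above; the table below is illustrative):

exec sp_addtype 'CustomerID_type', 'int', 'NOT NULL'

create table Customers
(
Cust_CustomerID CustomerID_type primary key,
Cust_CustomerName varchar(100)
)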

Primary keys:

The primary key is the column(s) that can uniquely identify each row in a table. So, just use 'pk_' + table name + '_' + column name for naming primary keys.

Here's how I would name the primary key on the CustomerID column of Customers table:

pk_Customers_CustomerID

Consider concatenating the column names in case of composite primary keys.

Foreign keys:

Foreign keys are used to represent the relationships between tables which are related. So, a foreign key can be considered a link between the 'column of a referencing table' and the 'primary key column of the referenced table'.

I prefer the following naming convention for foreign keys:

fk_referencing table + referencing column_referenced table + referenced column.

Based on the above convention, I would name the foreign key which references the CustomerID column of the Customers table from the Orders table's CustomerID column as:

fk_OrdersCustomerID_CustomersCustomerID

Foreign keys can be composite too; in that case, consider concatenating the column names of the referencing and referenced tables while naming the foreign key. This might make the name of the foreign key lengthy, but you shouldn't be worried about it, as you will never reference this name from your code, except while creating/dropping these constraints.
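
For example, a sketch of the DDL for the primary key and foreign key names used above (the column names are left unprefixed for brevity):

alter table Customers
add constraint pk_Customers_CustomerID primary key (CustomerID)

alter table Orders
add constraint fk_OrdersCustomerID_CustomersCustomerID
foreign key (CustomerID) references Customers (CustomerID)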

Default and Check constraints:

Use the name of the column to which the default/check constraint is bound, prefixed with 'def' or 'chk' respectively for Default and Check constraints.


I would name the default constraint for the OrderDate column def_OrderDate, and the check constraint for the OrderDate column chk_OrderDate.
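
Here's a sketch of the corresponding DDL, assuming a hypothetical Orders table:

alter table Orders
add constraint def_OrderDate default (getdate()) for OrderDate

alter table Orders
add constraint chk_OrderDate check (OrderDate <= getdate())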

Variables:

For variables that store the contents of columns, you could use the same naming convention that we use for column names.

Implementing the Table Interface

Table "Interfaces"

Continuing to use OOP terminology, I like to think of this situation as modeling "Table Interfaces". Unlike Table Inheritance, we don't want to create a base entity to be inherited by multiple entities, we just want some entities to "implement" the same interface. So, in our example, both Employees and Offices would implement the "PhoneNumbers" interface. This concept allows us to pick and choose which interfaces an entity has, whereas using inheritance means that all sub-tables always have the same columns/relations. We can add or remove interfaces from different entities without worrying about affecting other entities that have the same base.

This is not as straight-forward in SQL, and there really is no direct, easy, standard way to handle this situation, unlike with Table Inheritance. Here's just one possible idea for you to consider if you need to implement this concept.

First, let's start with two tables, Employees and Offices:

create table Employees
(
EmpID int primary key,
EmpName varchar(100)
)

create table Offices
(
OfficeID int primary key,
OfficeName varchar(100)
)

We wish for our data model to allow both an Employee and an Office to have multiple phone numbers. Of course, to make our data meaningful and not contain just a random list of phone numbers, we will create a table of Phone Number Types:

create table PhoneNumberTypes
(
PhoneTypeCode varchar(10) primary key,
Description varchar(100),
Sort int not null default 0
)

So far, so good. Now, here's where we have some options and we really need to think a little. Let's start by examining a common attempt to create a re-usable and generic "PhoneNumbers" table that I feel doesn't work so well.

A Common Approach

Many database architects will start by creating a table of phone numbers, and then they'd create two more tables: one to relate PhoneNumbers to Offices, and another to relate PhoneNumbers to Employees:

create table PhoneNumbers
(
PhoneID int identity primary key,
AreaCode char(3),
Exchange char(3),
Number char(4),
Extension varchar(10)
)

create table EmployeePhoneNumbers
(
EmpID int references Employees(EmpID) on delete cascade,
PhoneID int references PhoneNumbers(PhoneID),
PhoneTypeCode varchar(10) references PhoneNumberTypes(PhoneTypeCode),
primary key (EmpID, PhoneTypeCode)
)

create table OfficePhoneNumbers
(
OfficeID int references Offices(OfficeID) on delete cascade,
PhoneID int references PhoneNumbers(PhoneID),
PhoneTypeCode varchar(10) references PhoneNumberTypes(PhoneTypeCode),
primary key (OfficeID, PhoneTypeCode)
)

However, there are several problems with this situation.

  1. Our EmployeePhoneNumbers and OfficePhoneNumbers tables are redundant, which is the problem we are trying to avoid. We may be able to use one set of stored procedures for update and delete operations, but we still need separate stored procedures for inserts and selects.
  2. Creating a table just for PhoneNumbers and relating PhoneNumbers to Offices and/or Employees establishes phone numbers as an entity, not an attribute. Unless we are keeping track of specific phone numbers over time, and who is assigned which phone number, and who "shares" phone numbers, a phone number is not an entity, it is just an attribute. This is a bad design because introducing a "phoneID" is unnecessary and confusing. Consider changing a phone number for an employee: if PhoneNumbers are attributes, we just change the area code or extension or whatever we need to do by directly updating a row in a table. However, if it is an entity, then we should create a new phone number entity, and then assign the employee the new "phoneID" created instead of the old one, and then delete the old phone number (perhaps? Or do we keep it around?) That really should not be necessary -- we should just be able to update Bob's extension directly, right?
  3. This design will not cascade deletes to the PhoneNumber table when an Office or Employee phone number is deleted; that must be done in a separate step. Thus, it is very easy to have orphan phone numbers that have no one assigned to them.
  4. No constraints are in place to disallow an Office and an Employee from sharing the same PhoneID.

So, I would recommend avoiding this design as it complicates the situation without much benefit. If you're going to do this, you might as well just store the phone number data itself in the EmployeePhoneNumbers and OfficePhoneNumbers tables and get rid of the PhoneNumbers table completely.

A More Generic, Flexible Alternative

Instead, consider this approach:

create table PhoneNumbers
(
EmpID int references Employees(EmpID) on delete cascade,
OfficeID int references Offices(OfficeID) on delete cascade,

PhoneTypeCode varchar(10) not null references PhoneNumberTypes(PhoneTypeCode),

AreaCode char(3),
Exchange char(3),
Number char(4),
Extension varchar(10),

check (case when EmpID is null then 0 else 1 end +
case when OfficeID is null then 0 else 1 end = 1),
constraint PhoneNumbers_UC unique (EmpID, OfficeID, PhoneTypeCode)
)

Let's take a minute to dissect that. We have created a table of Phone Numbers, with columns that allow us to relate the phone number to an Employee and/or an Office, but neither is required. We also have a required PhoneTypeCode column, and we have the standard phone number attributes in there as well. We could easily add a notes column, or description, or last updated/modified columns as well.

Notice the check constraint: that expression is simply ensuring that exactly one of EmpID and OfficeID is not null. There is no "exclusive or" boolean operator in T-SQL, so this is one easy way to handle it using CASE expressions that return 0 if a column is null or 1 if it has a non-null value. If we wrote:

	check (EmpID is not null OR OfficeID is not null)	

Then the table would allow an entry in both the EmpID and OfficeID columns, which we do not want. Each row in this table will relate to only an office or an employee. Plus, we can easily add more columns to this check constraint as necessary (more on that later).

Next, instead of a standard primary key, we have a unique constraint on the columns EmpID/OfficeID/PhoneTypeCode. This ensures that only one row per EmpID/OfficeID/PhoneTypeCode is allowed. A primary key constraint cannot allow NULL values in any of the primary key columns, so we cannot use that type of constraint. If you recall, either EmpID or OfficeID is always null, but never both. Since PhoneTypeCode is never null, this constraint effectively ensures that there are never two entries for a single EmpID/PhoneTypeCode or OfficeID/PhoneTypeCode.

With that, we are done. We can now write a single stored procedure that allows us add a PhoneNumber to either an Office or an Employee like this:

create procedure PhoneNumberAdd
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10),
@AreaCode char(3),
@Exchange char(3),
@Number char(4),
@Extension varchar(10)
as
insert into PhoneNumbers (OfficeID, EmpID, PhoneTypeCode, AreaCode, Exchange, Number, Extension)
values (@OfficeID, @EmpID, @PhoneTypeCode, @AreaCode, @Exchange, @Number, @Extension)

This procedure effectively requires that you pass either an @EmpID or an @OfficeID: if both are null, or both are non-null, the INSERT will violate the check constraint and fail.
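
For example, illustrative calls (the parameter values are made up):

exec PhoneNumberAdd @EmpID = 45, @PhoneTypeCode = 'WORK',
@AreaCode = '212', @Exchange = '555', @Number = '0123', @Extension = null

exec PhoneNumberAdd @OfficeID = 2, @PhoneTypeCode = 'MAIN',
@AreaCode = '212', @Exchange = '555', @Number = '0100', @Extension = '12'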

Writing a procedure that returns the phone number for an entity that "implements" the PhoneNumber interface is slightly more complicated, but still quite simple. We must write our criteria expression to check for the possibility that either an @EmpID or an @OfficeID is passed in, since both conditions are possible. Thus, that results in something like this:

create procedure PhoneNumberSelect 
@OfficeID int = null,
@EmpID int = null
as
select
p.OfficeID, p.EmpID, p.PhoneTypeCode,
pt.Description as PhoneTypeDescription,
p.AreaCode, p.Exchange, p.Number, p.Extension
from
PhoneNumbers p
inner join
PhoneNumberTypes pt on p.PhoneTypeCode = pt.PhoneTypeCode
where
p.EmpID = @EmpID or p.OfficeID = @OfficeID
order by
pt.Sort

Again, effectively this requires that you pass a value for either the @OfficeID or the @EmpID parameter, but not both, and the matching data will be returned. We don't have to worry about issues where NULL != NULL, because we never want to return a row where both a parameter value and a column value are NULL -- we only want to return rows where both are not null, and therefore the comparison will return TRUE. We could of course also put a @PhoneTypeCode parameter in there as well if you only wanted to return a single, specific phone number of the specified type.

Update and Delete will work in much the same way:

create procedure PhoneNumberUpdate 
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10),
@AreaCode char(3),
@Exchange char(3),
@Number char(4),
@Extension varchar(10)
as
update
PhoneNumbers
set
AreaCode=@areacode,
Exchange=@exchange,
Number=@number,
Extension=@extension
where
(PhoneTypeCode = @PhoneTypeCode) and
(EmpID = @EmpID or OfficeID = @OfficeID)
go

create procedure PhoneNumberDelete
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10)
as
delete from PhoneNumbers
where
(PhoneTypeCode = @PhoneTypeCode) and
(EmpID = @EmpID or OfficeID = @OfficeID)

Calling our code is very easy. From within T-SQL or from a client, we just set the parameter that corresponds to the entity we are referencing:

    exec PhoneNumberDelete @OfficeID=2, @PhoneTypeCode='HOME'

exec PhoneNumberDelete @EmpID=45, @PhoneTypeCode='WORK'

... etc ...

Implementing the Interface

With all this in place, we now can use one set of stored procedures to Add/Select/Update/Delete phone numbers. The only variable is the parameter that we pass in to the stored procedure, which indicates the entity that we are referencing.

That's all fine and dandy, but the question is: how would we set up a new entity to implement the "Phone Number" interface? For example, suppose that we now decide that we need to track "Contacts", and Contacts will have multiple phone numbers. (Please note that we could argue that Employees and Contacts should simply inherit from the same base class, and thus an interface is not necessary, but let's ignore that for the sake of this example.)

Per usual, we would start by creating a table of Contacts:

create table Contacts
(
ContactID int primary key,
ContactName varchar(100)
)

Now, it is time to "implement" the interface. Where do we begin? Well, it is actually done rather backwards. Normally, you alter the class you have created so that it implements a particular interface, but in this case, we need to alter the interface so that it works with the class. So, we actually don't do anything to the Contacts table -- we alter the PhoneNumbers table and the corresponding PhoneNumber stored procedures. This is a bit of a drawback of this approach, and a big reason why implementing an interface in a relational database is not as clean as implementing inheritance.

So, the steps we must take are:

  1. Add a ContactID column to the PhoneNumbers table, which is a foreign key reference back to the Contacts table's ContactID column.
  2. Alter the unique constraint on the PhoneNumbers table to include the ContactID column.
  3. Alter the check constraint on the PhoneNumbers table to include the ContactID column (a sketch of the DDL for these steps follows).
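A sketch of the DDL for these steps might look like the following. Note that the constraint names here are assumed for illustration, since they depend on how the PhoneNumbers table was originally created:

alter table PhoneNumbers add ContactID int null
references Contacts(ContactID)

-- re-create the unique constraint to include the new column
alter table PhoneNumbers drop constraint PhoneNumbers_UC
alter table PhoneNumbers add constraint PhoneNumbers_UC
unique (OfficeID, EmpID, ContactID, PhoneTypeCode)

-- re-create the check constraint: exactly one parent reference may be non-null
alter table PhoneNumbers drop constraint PhoneNumbers_CK
alter table PhoneNumbers add constraint PhoneNumbers_CK check
(case when OfficeID is null then 0 else 1 end +
case when EmpID is null then 0 else 1 end +
case when ContactID is null then 0 else 1 end = 1)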

Having done that, we now need to edit each of the stored procedures to include ContactID as an optional parameter, and to include ContactID throughout the SELECT and WHERE clauses.

It sounds like a lot of work, but it's really not too bad and can be done quite quickly. The "template" we've created for this interface is very easy to maintain. Despite the maintenance required, the overall process is pretty nice because:

  1. This doesn't break any existing code -- everything already written to use the existing stored procedures will work just fine without any modification.
  2. Once this is done, there is nothing else to do -- Contacts now completely implement the "Phone Numbers" interface, and you do not need to create any additional stored procedures to allow for your application to maintain multiple phone numbers for Contacts. It "just works"!

The "Phone Entity" Approach

Another option is to create one table with foreign key references back to the tables that "implement" your interface, with an identity primary key on that table, and then put all of the actual data in another table. In other words, you might have one table that just defines "PhoneEntities", with links back to the parents, and then we put the actual phone number data in a table with a primary key of (PhoneEntityID, PhoneTypeCode).

For example, you might have a PhoneEntities table like this:

create table PhoneEntities
(
PhoneEntityID int identity primary key,
EmpID int references Employees(EmpID) on delete cascade,
OfficeID int references Offices(OfficeID) on delete cascade,

check (case when EmpID is null then 0 else 1 end +
case when OfficeID is null then 0 else 1 end = 1),

constraint PhoneEntities_UC unique (EmpID, OfficeID)
)

Notice that this table just stores a reference back to either an Office or an Employee, and our unique constraint ensures that there is at most one row in this table for any given Office or Employee. It is a bit simpler because we aren't worrying about PhoneTypes here, or actual phone numbers. That data is stored in a more traditional PhoneNumbers table, with a foreign key reference to the PhoneEntities table:

create table PhoneNumbers
(
PhoneEntityID int references PhoneEntities(PhoneEntityID) on delete cascade,
PhoneTypeCode varchar(10) references PhoneTypes(PhoneTypeCode),
AreaCode char(3),
Exchange char(3),
Number char(4),
Extension varchar(10),
primary key (PhoneEntityID, PhoneTypeCode)
)

That allows the PhoneNumbers table to look a bit more "normal", and a bit simpler as well. This does complicate the stored procedures a little, since first you need to determine whether a particular entity already has an entry in the PhoneEntities table, and if not, you need to add one.

For example, with that schema, here's a PhoneNumberAdd stored procedure:

create procedure PhoneNumberAdd
@OfficeID int = null,
@EmpID int = null,
@PhoneTypeCode varchar(10),
@AreaCode char(3),
@Exchange char(3),
@Number char(4),
@Extension varchar(10)
as
declare @EntityID int
set @EntityID = (select PhoneEntityID from PhoneEntities where OfficeID = @OfficeID or EmpID = @EmpID)

if (@EntityID is null)
begin
insert into PhoneEntities (EmpID, OfficeID)
values (@EmpID, @OfficeID)
set @EntityID = scope_identity()
end

insert into PhoneNumbers (PhoneEntityID, PhoneTypeCode, AreaCode, Exchange, Number, Extension)
values (@EntityID, @PhoneTypeCode, @AreaCode, @Exchange, @Number, @Extension)

This approach may or may not work better, depending on your specific situation. As you can see, the logic in the stored procedures can potentially get a little complicated, since now we have two tables to deal with. But this does make the PhoneNumbers table itself much simpler, so you may get better performance with this approach.
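For completeness, here is a sketch of what the matching Select might look like against this schema; the extra join through PhoneEntities is really the only difference:

create procedure PhoneNumberSelect
@OfficeID int = null,
@EmpID int = null
as
select
pe.OfficeID, pe.EmpID, p.PhoneTypeCode,
pt.Description as PhoneTypeDescription,
p.AreaCode, p.Exchange, p.Number, p.Extension
from
PhoneEntities pe
inner join
PhoneNumbers p on p.PhoneEntityID = pe.PhoneEntityID
inner join
PhoneTypes pt on pt.PhoneTypeCode = p.PhoneTypeCode
where
pe.EmpID = @EmpID or pe.OfficeID = @OfficeID
order by
pt.Sort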

Important to keep in mind when designing an effective database

The following concepts and techniques are important to keep in mind when designing an effective database:
  1. An entity is a logical collection of things that are relevant to your database. The physical counterpart of an entity is a database table. Name your entities in singular form and in ALL CAPS. For example, an entity that contains data about your company's employees would be named EMPLOYEE.

  2. An attribute is a descriptive or quantitative characteristic of an entity. The physical counterpart of an attribute is a database column (or field). Name your attributes in singular form with either Initial Capital Letters or in all lower case. For example, some attribute names for your EMPLOYEE entity might be: EmployeeId (or employee_id) and BirthDate (or birthdate).

  3. A primary key is an attribute (or combination of attributes) that uniquely identifies each instance of an entity. A primary key cannot be null, and the value assigned to a primary key should not change over time. A primary key also needs to be efficient: for example, a primary key associated with an INTEGER datatype will be more efficient than one associated with a CHAR datatype. Primary keys should also be non-intelligent; that is, their values should be assigned arbitrarily, without any hidden meaning. Sometimes none of the attributes of an entity are sufficient to meet the criteria of an effective primary key, in which case the database designer is best served by creating an "artificial" (surrogate) primary key.

  4. A relationship is a logical link between two entities. A relationship represents a business rule and can be expressed as a verb phrase. Most relationships between entities are of the "one-to-many" type in which one instance of the parent entity relates to many instances of the child entity. For example, the relationship between EMPLOYEE and STORE_LOCATION would be represented as: one STORE_LOCATION (parent entity) employs many EMPLOYEEs (child entity).

  5. The second type of relationship is the "many-to-many" relationship. In a "many-to-many" relationship, many instances of one entity relate to many instances of the other entity. "Many-to-many" relationships need to be resolved in order to avoid data redundancy, and may be resolved by creating an intermediate entity known as a cross-reference (or XREF) entity. The XREF entity is made up of the primary keys of the two original entities, both of which become parent entities of the XREF entity. Thus, the "many-to-many" relationship is resolved into two "one-to-many" relationships. For example, the "many-to-many" relationship of (many) EMPLOYEEs are assigned (many) TASKs can be resolved by creating a new entity named EMPLOYEE_TASK, giving two "one-to-many" relationships: EMPLOYEE (parent entity) is assigned EMPLOYEE_TASK (child entity), and TASK (parent entity) is assigned to EMPLOYEE_TASK (child entity). (A DDL sketch of this XREF entity follows the list.)

  6. A "foreign key" exists when the primary key of a parent entity exists in a child entity. A foreign key requires that values must be present in the parent entity before like values may be inserted in the child entity. The concept of maintaining foreign keys is known as "referential integrity".

  7. Relationships between two entities may be classified as being either "identifying" or "non-identifying". Identifying relationships exist when the primary key of the parent entity is included in the primary key of the child entity. On the other hand, a non-identifying relationship exists when the primary key of the parent entity is included in the child entity but not as part of the child entity's primary key. In addition, non-identifying relationships may be further classified as being either "mandatory" or "non-mandatory". A mandatory non-identifying relationship exists when the value in the child table cannot be null. On the other hand, a non-mandatory non-identifying relationship exists when the value in the child table can be null. (A sketch contrasting the two cases follows the list, after the XREF example.)

  8. Cardinality helps us further understand the nature of the relationship between the child entity and the parent entity. The cardinality of a relationship may be determined by asking the following question: "How many instances of the child entity relate to each instance of the parent entity?". There are four types of cardinality: (1.) One to zero or more (common cardinality), (2.) One to one or more (P cardinality), (3.) One to zero or one (Z cardinality), and (4.) One to exactly N (N cardinality).
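Returning to item 5, the XREF entity for the EMPLOYEE/TASK example could be sketched like this (the TASK entity and its TaskId attribute are assumed for illustration):

create table EMPLOYEE_TASK
(
EmployeeId int references EMPLOYEE (EmployeeId),
TaskId int references TASK (TaskId),
primary key (EmployeeId, TaskId)
)

The composite primary key is exactly the pair of parent keys, which is what makes this a pure cross-reference entity.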
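And for item 7, a quick sketch to make the identifying/non-identifying distinction concrete (all names here are invented for illustration):

-- identifying: the parent's key is part of the child's primary key
create table STORE_PHONE
(
StoreLocationId int references STORE_LOCATION (StoreLocationId),
PhoneTypeCode varchar(10),
primary key (StoreLocationId, PhoneTypeCode)
)

-- mandatory non-identifying: the parent's key is a required (not null)
-- column, but not part of the child's primary key; making it nullable
-- would turn this into a non-mandatory relationship
create table EMPLOYEE
(
EmployeeId int primary key,
StoreLocationId int not null references STORE_LOCATION (StoreLocationId)
)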

Object database models

The object-oriented paradigm has been applied to database technology,
creating a new programming model known as object databases. These
databases attempt to bring the database world and the application
programming world closer together, in particular by ensuring that the
database uses the same type system as the application program. This
aims to avoid the overhead (sometimes referred to as the impedance
mismatch) of converting information between its representation in the
database (for example as rows in tables) and its representation in the
application program (typically as objects). At the same time, object
databases attempt to introduce the key ideas of object programming,
such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a
database. Some products have approached the problem from the
application programming end, by making the objects manipulated by the
program persistent. This also typically requires the addition of some
kind of query language, since conventional programming languages do
not have the ability to find objects based on their information
content. Others have attacked the problem from the database end, by
defining an object-oriented data model for the database, and defining
a database programming language that allows full programming
capabilities as well as traditional query facilities.

Object databases suffered because of a lack of standardization:
although standards were defined by ODMG, they were never implemented
well enough to ensure interoperability between products. Nevertheless,
object databases have been used successfully in many applications:
usually specialized applications such as engineering databases or
molecular biology databases rather than mainstream commercial data
processing. However, object database ideas were picked up by the
relational vendors and influenced extensions made to these products
and indeed to the SQL language.

Network model

The network model (defined by the CODASYL specification) organizes
data using two fundamental constructs, called records and sets.
Records contain fields (which may be organized hierarchically, as in
the programming language COBOL). Sets (not to be confused with
mathematical sets) define one-to-many relationships between records:
one owner, many members. A record may be an owner in any number of
sets, and a member in any number of sets.

The network model is a variation on the hierarchical model, to the
extent that it is built on the concept of multiple branches
(lower-level structures) emanating from one or more nodes
(higher-level structures), while the model differs from the hierarchical
model in that branches can be connected to multiple nodes. The network
model is able to represent redundancy in data more efficiently than is
the hierarchical model.

The operations of the network model are navigational in style: a
program maintains a current position, and navigates from one record to
another by following the relationships in which the record
participates. Records can also be located by supplying key values.

Although it is not an essential feature of the model, network
databases generally implement the set relationships by means of
pointers that directly address the location of a record on disk. This
gives excellent retrieval performance, at the expense of operations
such as database loading and reorganization.

Most Object databases use the navigational concept to provide fast
navigation across networks of objects, generally using Object
Identifiers as "smart" pointers to related objects. Objectivity/DB,
for instance, implements named 1:1, 1:many, many:1 and many:many
relationships that can cross databases. Many object databases also
support SQL, combining the strengths of both models.

Hierarchical model

In a hierarchical model, data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one 1:N relationship between two types of data, and it is very efficient for describing many relationships in the real world: recipes, tables of contents, the ordering of paragraphs/verses, and any other nested and sorted information. However, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to the upward link and sort field) is not also included for each record.
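In relational terms, the upward link and sort field can be pictured as a self-referencing table, sketched here with invented names:

create table TOC_ENTRY
(
EntryId int primary key,
ParentId int null references TOC_ENTRY (EntryId), -- the single upward link
Sort int, -- keeps records ordered within each same-level list
Title varchar(100)
)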

Flat model

The flat (or table) model consists of a single, two-dimensional array
of data elements, where all members of a given column are assumed to
be similar values, and all members of a row are assumed to be related
to one another. For instance, a system security database might consist
of columns for name and password; each row would then hold the password
associated with an individual user. Columns of
the table often have a type associated with them, defining them as
character data, date or time information, integers, or floating point
numbers.
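A minimal sketch of that security-database example (names invented for illustration):

create table SYSTEM_USER
(
Name varchar(50),
Password varchar(50)
)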

What are database models?

A data model is not just a way of structuring data: it also defines a
set of operations that can be performed on the data. The relational
model, for example, defines operations such as select, project, and
join. Although these operations may not be explicit in a particular
query language, they provide the foundation on which a query language
is built. Examples of applications backed by databases include word
processors' dictionaries, mail merges, user preferences, and information
management systems.
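To make the mapping concrete, here is a sketch using the EMPLOYEE and STORE_LOCATION entities from the design section above (the StoreLocationId and StoreName columns are assumed for illustration):

select e.EmployeeId, s.StoreName -- projection: the column list picks attributes
from EMPLOYEE e
inner join STORE_LOCATION s -- join: combines the two relations on a common key
on s.StoreLocationId = e.StoreLocationId
where e.BirthDate >= '1980-01-01' -- selection: the WHERE clause filters rows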

Oracle Certification Program

The Oracle Certification Program, a professional certification program, includes the administration of Oracle Databases as one of its main certification paths. It contains three levels:

Oracle Certified Associate (OCA)
Oracle Certified Professional (OCP)
Oracle Certified Master (OCM)

Different versions of the Oracle database


  • Enterprise Edition (EE) includes more features than the 'Standard Edition', especially in the areas of performance and security. Oracle Corporation licenses this product on the basis of users or of processors, typically for servers running 4 or more CPUs. EE has no memory limits, and can utilize clustering using Oracle RAC software.

  • Standard Edition (SE) contains base database functionality. Oracle Corporation licenses this product on the basis of users or of processors, typically for servers running from one to four CPUs. If the number of CPUs exceeds four, the user must convert to an Enterprise Edition license. SE has no memory limits, and can utilize clustering with Oracle RAC at no additional charge.

  • Standard Edition One, introduced with Oracle 10g, has some additional feature-restrictions. Oracle Corporation markets it for use on systems with one or two CPUs. It has no memory limitations.

  • Express Edition ('Oracle Database XE'), introduced in 2005, offers Oracle 10g free to distribute on Windows and Linux platforms (with a footprint of only 150 MB and restricted to the use of a single CPU, a maximum of 4 GB of user data and 1 GB of memory). Support for this version comes exclusively through on-line forums and not through Oracle support.

  • Oracle Personal Edition provides the functionality of the "high end" Enterprise Edition, but is marketed to (and licensed for) single-user developers working on personal workstations.

  • Oracle Database Lite is intended to run on mobile devices. The database, partially located on the mobile device, can synchronize with a server-based installation.

Oracle Process architecture

The Oracle RDBMS typically relies on a group of processes running simultaneously in the background, interacting to carry out and monitor database operations. Such processes (and their standard abbreviations) can include:

  • Archiver processes (ARCn)
  • Checkpoint process (CKPT)
  • Database writer processes (DBWn)
  • Dispatcher processes (Dnnn): multiplex server-processes on behalf of users
  • Memory-manager process (MMAN): used for internal database tasks such as Automatic Shared Memory Management
  • Job-queue processes (CJQn)
  • Log-writer process (LGWR)
  • Log-write network-server (LNSn): transmits redo logs in Data Guard environments
  • Logical standby coordinator process (LSP0): controls Data Guard log-application
  • Media-recovery process (MRP): detached recovery-server process
  • Memory-monitor process (MMON)
  • Memory-monitor light process (MMNL): gathers and stores Automatic Workload Repository (AWR) data
  • Process-monitor process (PMON)
  • Process-spawner (PSP0): spawns Oracle processes
  • Queue-monitor processes (QMNn)
  • Recoverer process (RECO)
  • Remote file-server process (RFS)
  • Shared server processes (Snnn): serve client-requests
  • System-monitor process (SMON)
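To see which of these background processes a given instance is actually running, you can query the v$bgprocess dynamic performance view; a non-zero process address indicates a running process:

select name, description
from v$bgprocess
where paddr <> hextoraw('00')
order by name;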

Oracle Memory architecture

System Global Area

Each Oracle instance uses a System Global Area or SGA — a shared-memory area — to store its data and control-information.

Each Oracle instance allocates itself an SGA when it starts and de-allocates it at shut-down time. The information in the SGA consists of the following elements, each of which has a fixed size, established at instance startup:

  * The database buffer cache: this stores the most recently used data blocks. These blocks can contain modified data not yet written to disk (sometimes known as "dirty blocks"), unmodified blocks, or blocks written to disk since modification (sometimes known as clean blocks). Because the buffer cache ages blocks out on a least-recently-used basis, the most active buffers stay in memory to reduce I/O and to improve performance.

  * The redo log buffer: this stores redo entries — a log of changes made to the database. The instance writes redo log buffers to the redo log as quickly and efficiently as possible. The redo log aids in instance recovery in the event of a system failure.

  * The shared pool: this area of the SGA stores shared-memory structures such as shared SQL areas in the library cache and internal information in the data dictionary. An insufficient amount of memory allocated to the shared pool can cause performance degradation.
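The current sizes of these SGA components can be inspected through the dynamic performance views, privileges permitting:

-- high-level summary: fixed size, variable size, database buffers, redo buffers
select * from v$sga;

-- detailed breakdown, including individual shared pool allocations
select pool, name, bytes
from v$sgastat
order by bytes desc;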

Library cache


The library cache stores shared SQL, caching the parse tree and the execution plan for every unique SQL statement.

If multiple applications issue the same SQL statement, each application can access the shared SQL area. This reduces the amount of memory needed and reduces the processing-time used for parsing and execution planning.
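For example (the table and column names here are invented for illustration), literal values defeat this sharing, while bind variables preserve it:

-- each distinct literal is a distinct statement, parsed and planned separately
select * from Employees where EmpID = 42;
select * from Employees where EmpID = 43;

-- with a bind variable, every execution reuses one parse tree and plan
select * from Employees where EmpID = :EmpID;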

Data dictionary cache


The data dictionary comprises a set of tables and views that map the structure of the database.

Oracle stores information here about the logical and physical structure of the database. The data dictionary contains information such as the following:

   * User information, such as user privileges
   * Integrity constraints defined for tables in the database
   * Names and datatypes of all columns in database tables
   * Information on space allocated and used for schema objects

The Oracle instance frequently accesses the data dictionary in order to parse SQL statements. The operation of Oracle depends on ready access to the data dictionary: performance bottlenecks in the data dictionary affect all Oracle users. Because of this, database administrators should make sure that the data dictionary cache has sufficient capacity to cache this data. Without enough memory for the data-dictionary cache, users see a severe performance degradation. Allocating sufficient memory to the shared pool where the data dictionary cache resides precludes these particular performance problems.
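The dictionary itself is exposed through data dictionary views such as user_tab_columns and user_constraints. For example (the table name is assumed for illustration):

-- names and datatypes of the columns in a given table
select column_name, data_type
from user_tab_columns
where table_name = 'EMPLOYEES';

-- integrity constraints defined on your tables
select constraint_name, constraint_type, table_name
from user_constraints;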

Program Global Area

The Program Global Area or PGA memory-area contains data and control-information for Oracle's server-processes.

The size and content of the PGA depends on the Oracle-server options installed. This area consists of the following components:

   * stack-space: the memory that holds the session's variables, arrays, and so on.
   * session-information: unless using the multithreaded server, the instance stores its session-information in the PGA. (In a multithreaded server, the session-information goes in the SGA.)
   * private SQL-area: an area in the PGA which holds information such as bind-variables and runtime-buffers.
   * sorting area: an area in the PGA which holds information on sorts, hash-joins, etc.
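On releases that support automatic PGA memory management, aggregate PGA usage can be checked through the v$pgastat view:

select name, value, unit
from v$pgastat;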