Browsing all articles in SQL
Apr
14

SQL Server 2014 Key Features and Enhancements

Author admin    Category IT News, SQL     Tags


SQL Server 2014 ships with a set of enhancements over its predecessors that bring real business benefits to applications in both OLTP and OLAP environments. This article highlights a few of the key enhancements built into the product.

Performance Enhancements

These are performance enhancements that can help your workload run faster.

  • In-Memory OLTP: This new feature lets the database engine handle OLTP workloads in memory, resolving contention issues in high-concurrency situations. The component is code-named “Hekaton”. Microsoft had already released an in-memory component called “xVelocity”, catering to OLAP requirements, along with SQL Server 2012.
  • Online indexing at the partition level: Index rebuilds can now be performed online for a single partition. Index statistics can also be managed at the partition level, which is a huge performance improvement for large partitioned tables (see the T-SQL sketch after this list).
  • Updatable Column Store Indexes (CSI): The CSI feature was introduced in SQL Server 2012, but a table could not be modified once a CSI was created; to update the data, the index had to be dropped or disabled and then rebuilt. SQL Server 2014 adds the ability to load or delete data directly from a table with a column store index.
  • Buffer Pool Extension to Solid State Drives (SSDs): Each node can have its own SSD or SSD array for buffering (just as you might dedicate fast storage to TempDB), increasing performance through faster paging. Frequently used data can be cached on the SSDs, which is most beneficial for read-heavy OLTP workloads.
  • Resource Management: Resource Governor can now control I/O in addition to the CPU and memory controls provided by previous versions.
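
To make a few of these concrete, here is a hedged T-SQL sketch of the features above as they appear in SQL Server 2014; the object names (dbo.Sales, IX_Sales_Date, the SSD path and the resource pool settings) are invented for illustration only.

    -- Rebuild a single partition of an index online (new in SQL Server 2014)
    ALTER INDEX IX_Sales_Date ON dbo.Sales
        REBUILD PARTITION = 3
        WITH (ONLINE = ON);

    -- Updatable clustered columnstore index: the table stays writable after creation
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON dbo.Sales;

    -- Extend the buffer pool onto an SSD volume for faster paging
    ALTER SERVER CONFIGURATION
        SET BUFFER POOL EXTENSION ON
        (FILENAME = 'E:\SSDCACHE\BufferPool.BPE', SIZE = 64 GB);

    -- Resource Governor can now cap I/O per volume in addition to CPU and memory
    CREATE RESOURCE POOL ReportingPool
        WITH (MIN_IOPS_PER_VOLUME = 50, MAX_IOPS_PER_VOLUME = 500);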

 

Improved Scalability

SQL Server 2014 has increased the amount of hardware it can use.

  • Ability to scale up to 640 logical processors and 4TB of memory in a physical environment
  • Ability to scale to 64 virtual processors and 1TB of memory when running in a virtual machine (VM).

High-Availability Enhancements

  • AlwaysOn Availability Groups (AG) get more secondary nodes: AlwaysOn now supports up to 8 secondary replicas instead of the 4 allowed in SQL Server 2012. Enterprise Edition is, of course, required.
  • AlwaysOn Availability Group readable secondaries stay online (more reliable): In SQL Server 2012, if the primary dropped offline, the readable replica databases dropped offline with it. In SQL Server 2014, the secondaries remain online and readable when the primary isn’t available.
  • Azure-integrated AlwaysOn Availability Groups: Azure VMs can be used as AlwaysOn AG replicas. These replicas can be created asynchronously in the cloud (the Azure platform), which saves you from paying for expensive offsite datacenter space full of machines that sit idle all the time. A rough sketch of adding such a replica follows this list.
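
As a rough illustration of the AlwaysOn enhancements, adding an extra asynchronous, readable secondary (for example an Azure VM replica) might look like the following; the availability group, server and endpoint names are placeholders, not values from the article.

    ALTER AVAILABILITY GROUP SalesAG
        ADD REPLICA ON N'AZURE-SQL-VM1'
        WITH (
            ENDPOINT_URL      = N'TCP://azure-sql-vm1.contoso.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = MANUAL,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
        );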

Backup Enhancements

  • Smart Backup to Azure (Windows Azure integrated backup): Another new backup feature is Smart Backup. With Smart Backup, SQL Server determines whether a full or incremental backup is needed and backs up accordingly to Azure (see the backup-to-URL sketch after this list).
  • The Azure backup feature and the Azure AlwaysOn Availability options are completely integrated into SSMS.
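
For example, a native backup to Azure blob storage can be issued straight from T-SQL in SQL Server 2014; the storage account, container and credential names below are placeholders for your own values.

    -- A credential holding the storage account name and access key is required first
    CREATE CREDENTIAL AzureBackupCred
        WITH IDENTITY = 'mystorageaccount',
             SECRET   = '<storage account access key>';

    BACKUP DATABASE Sales
        TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/Sales.bak'
        WITH CREDENTIAL = 'AzureBackupCred', COMPRESSION, STATS = 10;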

Microsoft® in-memory database engine

Microsoft® implemented an in-memory transactional engine under the project code name “Hekaton”*. Hekaton is expected to dramatically improve the throughput and latency of SQL Server’s online transaction processing (OLTP) capabilities. It is designed to meet the requirements of the most demanding OLTP applications, such as those of financial services companies, online gaming firms and other businesses with extremely demanding transaction processing requirements, and it achieves this breakthrough improvement without requiring a separate data management product or a new programming model. It’s still SQL Server!

Note*: Hekaton comes from the Greek word ἑκατόν, meaning “hundred”. The design goal for the original Hekaton proof-of-concept prototype was to achieve roughly 100x speed-ups for certain transaction processing operations.

Key Features:

  • Implements a row-based technology squarely focused on transaction processing (TP) workloads. The xVelocity* and Hekaton in-memory approaches are NOT mutually exclusive, however: combining Hekaton with SQL Server’s existing xVelocity column store index and xVelocity analytics engine makes for a powerful pairing.
  • Hekaton (the in-memory TP engine) and the xVelocity column store index are built into SQL Server rather than shipped as a separate data engine, which is a conscious design choice.

Note*: xVelocity is the OLAP-focused in-memory technology released along with SQL Server 2012.

Technology Implementation:

  • Hekaton works by providing in-memory storage for the most often used tables in SQL Server. It identifies the most heavily accessed tables and keeps them in the system’s main memory for faster access (a T-SQL sketch follows this list).
  • Hekaton compiles T-SQL stored procedures directly into native code for faster execution.
  • Hekaton uses a new concurrency control mechanism, developed by a Microsoft® team together with researchers from the University of Wisconsin, that relies on lock-free data structures for better scalability across multiple cores, avoiding locks while preserving ACID transaction integrity.
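
A minimal sketch of what this looks like in SQL Server 2014 follows; the database, filegroup, table and procedure names are invented for illustration, and the GUID key echoes the data-model change discussed below.

    -- A memory-optimized filegroup is required before any Hekaton tables can be created
    ALTER DATABASE Sales ADD FILEGROUP Sales_mod CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE Sales ADD FILE (NAME = 'Sales_mod1', FILENAME = 'C:\Data\Sales_mod1')
        TO FILEGROUP Sales_mod;

    -- Memory-optimized table; note the GUID key and hash index rather than an IDENTITY column
    CREATE TABLE dbo.ShoppingCart (
        CartId  UNIQUEIDENTIFIER NOT NULL
                PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
        UserId  INT       NOT NULL,
        AddedOn DATETIME2 NOT NULL
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

    -- Natively compiled stored procedure: the T-SQL is compiled to machine code at create time
    CREATE PROCEDURE dbo.usp_AddCartItem @CartId UNIQUEIDENTIFIER, @UserId INT
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        INSERT INTO dbo.ShoppingCart (CartId, UserId, AddedOn)
        VALUES (@CartId, @UserId, SYSUTCDATETIME());
    END;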

Some of these capabilities have been available outside the SQL Server engine, for example in Microsoft’s own PowerPivot and Power View. The biggest difference is that Hekaton is built directly into SQL Server, so there are no extensions, downloads, or interfaces that can slow down the very program meant to help increase your speed.

A few challenges with implementing an in-memory OLTP database are as follows.

  • You need to change your data model, bringing significant changes to the traditional OLTP design. For example, identity fields aren’t supported, so you may have to use a GUID as the primary key.
  • You need to change application code to replace ad-hoc SQL queries with stored procedure calls. Hekaton works best with stored procedures, because stored procedures can be compiled into native code.
  • Since processing happens entirely in memory, any sudden growth in the Hekaton tables leaves less room to cache your other tables, and you may run out of memory.

[sqlservercentral]

Oct
11

SQL Injection Hacker Attacks Are On The Rise. Here’s How To Defend Your Servers

Author admin    Category IT News, SQL     Tags

Last week, a hacker group claimed that it breached computer systems at 100 major universities. Team GhostShell gained access to servers at Stanford, Harvard, and the University of Michigan, among others. The technique used, SQL injection, is not new or complex, but reportedly it’s becoming increasingly common. Here’s a quick guide to defending your servers.

Basic Basics

We asked researchers at security firm Sophos to explain what an SQL injection is and how it can be stopped. Before launching into that, though, for laymen, here are a couple of things you need to know about SQL injection.

  • SQL stands for Structured Query Language. It is an international standard for interacting with databases.
  • Statements in SQL can retrieve, insert, create and otherwise change data in a database.
  • Code injection is a technique used by hackers to exploit vulnerabilities in a website.

“SQL injection is an old, well established method of attacking systems,” said Sophos threat researcher Fraser Howard. “It consists of inserting malicious SQL statements into an application to cause it to perform some undesirable function.”

Mechanics Of An Attack

Undesirable action sounds nasty. What does it mean exactly? Here are a few examples:

  • Dump table (i.e., return a dump of the entire contents of a database table). This is a great way to steal data, and it can also be used to gain access to a system (dump the admin password, then access the system, etc.).
  • Drop table (delete the table and its contents). Destructive. Attackers do not necessarily gain access to the data, but they can break the system, and data may be irretrievably lost.
  • Modify table. Insert additional data into the database table.

Basically, once a SQL injection has its hooks in your database, it can do whatever the heck the malicious hacker behind it wants. Steal your data (most commonly), delete your data, change your data.

“Imagine a website where page contents are stored in a database,” Howard wrote. “When you browse the site, the database is queried, and the page shows you whatever information is relevant. For example, a shopping site. You search for carrots, it queries the database and gets the price. The page you view displays this price.” A malicious hacker using SQL injection could download the store’s entire stock list, wipe it out, and/or change all the prices (or any other category of information).

One further problem with SQL injection not related to theft: Hackers can change the query instructions for a Web application. So instead of the application querying its own server and obtaining information, the query can be sent to a server of the hacker’s choice. This can lead to malware infecting a user’s computer.

Scary stuff, huh?

How To Defend Your Servers

According to Howard, defense against this type of attack is all about the Web application that is the door to the server. Protect that application and you protect the server. In theory, at least. Most organizations likely will remain vulnerable to a dedicated, sophisticated hacker no matter what they do.

Not all hackers are so single-minded, so it makes sense to be prepared. Here are the steps Howard recommends to defend against SQL injection attacks:

  • Secure programming. Design applications securely from the start. SQL injection is not new, and there are many books and online resources to help developers build applications that are secure against this attack. The most common vulnerability is an application that doesn’t sanity-check user input, such as data entered into Web forms; if the input is not checked, an attacker can use such forms to inject malicious instructions (see the sketch after this list).
  • Firewalling. This does not replace secure programming. However, it can add a layer of defense in front of your Web server. Web application firewalls can help to block most attacks.
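
To make the input-checking point concrete, here is a hedged T-SQL illustration (the table, columns and parameter names are invented): the unsafe version splices user input into the statement text, while the safe version passes it as a typed parameter so it can never be executed as SQL.

    DECLARE @UserInput nvarchar(100) = N'carrots''; DROP TABLE dbo.Products; --';

    -- Vulnerable: the input becomes part of the SQL text, so the injected DROP TABLE would run
    DECLARE @sql nvarchar(max) =
        N'SELECT Name, Price FROM dbo.Products WHERE Name = ''' + @UserInput + N'''';
    -- EXEC (@sql);   -- do not run this

    -- Safer: sp_executesql treats @Name strictly as data, never as executable SQL
    EXEC sys.sp_executesql
        N'SELECT Name, Price FROM dbo.Products WHERE Name = @Name',
        N'@Name nvarchar(100)',
        @Name = @UserInput;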

Many organizations are vulnerable to SQL injections because they outsource their Web application development, rush production, test poorly and pay little regard to security. “Recipe for disaster,” Howard said. “Lots of easy targets out there.”

In security, the guidelines are usually pretty simple: Take your time, factor security into everything you do, and use common sense. Security might seem like the boring part of what you do, but if you do not pay attention to it, there is a hacker just waiting to break into your databases and steal, destroy, or alter your data.

[RWW]

Mar
12

Microsoft releases to manufacturing SQL Server 2012

Author admin    Category IT News, Microsoft, Software, SQL     Tags

Microsoft has released to manufacturing (RTM’d) its SQL Server 2012 product, as of March 6, the day before the company’s virtual “launch” of its new database.

So when can you get the bits? If you’re an MSDN/TechNet subscriber, you can download them as of March 7, the Softies said. If you’re a volume-license customer, you can grab them from the Volume Licensing Center on April 1. April 1 (no fooling) also is the date when SQL Server 2012 will be generally available to any/all interested parties. And if you want to take the evaluation version for a spin, you can do so immediately, starting March 6.

(Update: Though not quite yet, it seems. Microsoft is working on fixing whatever is holding up  availability of the trial version, officials said in the early afternoon ET on March 6. Update 2: And here is the Evaluation Edition download.)

With SQL Server 2012, codenamed Denali, Microsoft has added a new business-intelligence (BI) specific SKU to its line-up. The Redmondians also have added a core-based pricing option with the new release.

Microsoft rolled out the first Community Technology Preview (CTP) test build of SQL Server 2012 back in November 2010. Approximately 150,000 users have tested the product via various CTP and Release Candidate test builds, officials said.

Like the SQL Server 2008 and SQL Server 2008 R2 releases, SQL Server 2012 is focused on providing users with more high-availability, self-service and BI functionality, officials have said. New technologies and features that are part of this release include:

  • SQL Server AlwaysOn, a new high-availability component
  • Project codename “Apollo”, new column-store database technology aiming to provide greater query performance
  • Project codename “Juneau”, a single development environment for developing database, business intelligence (BI) and web solutions
  • Power View, a.k.a. project codename “Crescent”, a web-based data visualization and presentation solution and a complement to the PowerPivot technology that is part of SQL Server 2008 R2
  • SQL Server Data Quality Services (based on technology from Microsoft’s 2008 Zoomix acquisition)
  • Other data integration and management tools

Microsoft is playing up the ability of its database technologies — both SQL Server 2012 and the Hadoop on Azure and Hadoop on Windows Server offerings officials announced last year — as giving the company the ability to handle both structured and unstructured data.


[zdnet]

Feb
9

Calling SQL Server With A BlackBerry Phone

Author admin    Category IT News, Programming, SQL     Tags

Overview

This article provides a simple Web page to play multimedia files on a home computer using a BlackBerry phone. It does this by using SQL Server to store selected binary images (documents, music, videos, etc.) bulk copied to it. Such images may be opened on a smart phone, copied to other SQL Servers, or downloaded to remote machines.

Examples

For this article, I used a BlackBerry Bold 9700 phone connected to SQL Server Express 2005 on a Windows XP desktop hosting IIS. To connect, I enter

http://nnn.nnn.nnn.nnn/images/images.asp?helpLevel=0

into my BlackBerry browser (where nnn.nnn.nnn.nnn is the external IP address of my home computer):

At this point I can View selected books (pdf) that I read in airports, a spreadsheet of several thousand addresses (handy in a meeting), sample movies (wmv) and various holiday pics (.jpg) to show friends, and a few Word and text files (doc/txt). But what I enjoy most are the audio books (mp3) that I play when walking the dog.

Note that you’ll need to install a PDF reader on your BlackBerry, such as the excellent package RepliGo Reader.

I can use the page to download these files to my phone, but memory is scarce so I generally leave them on the desktop. Fortunately, when I’m near my wireless LAN it takes priority for communication, so no service charges are incurred. More importantly, I can modify (or even automate) the multimedia lineup by writing a simple batch job on the desktop that hosts it.

My partner and I use different networks at home, so this page allows her to get our vacation clips after they’ve been edited on my machine (eliminating the need for emails or memory sticks). Privacy is not an issue because these files never sit on other computers. As well, only selected IP addresses are allowed to sign in.

To interactively transfer files (eg. zip files) from my desktop to other machines, I increase the help level to expose a form for entering client information:

http://nnn.nnn.nnn.nnn/images/images.asp?helpLevel=1

After entering that information (which represents a local machine, although an external IP address could be used) and pressing the Refresh button, I’m ready to transfer files:

After pressing the Download link for any file, that file disappears from the list when the Refresh button is pressed. That’s because the application only lists host files that aren’t among the client files (determined by the Id value, which is a non-identity key).

Of course, this operation assumes that the client machine has the same table that the host uses for storing images (nothing more is required).

Later, to transfer files to other machines (on my partner’s network, for example), I modify the client parameters and hit the Refresh button again.

Both the host and client machines may be pre-set in the Web link, and by setting the help level to 0 only the row of files and download links are displayed. To keep these browser links really short for the smart phone crowd, all parameters may be hard-wired into the app itself so that only the address of the page is required. However, by increasing the help level instead, more diagnostic information is displayed (useful for debugging).

The All link downloads all files at once.

How It Works

The system uses a single table called Images in a SQL Server database on the host desktop:

Column Name     Data Type
ImageId         int
ImageName       varchar(256)
ImageBinary     varbinary(max)
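
As a hedged sketch, the same table expressed as DDL might look like this (any constraints are assumptions; the actual scripts are in the article’s download):

    CREATE TABLE dbo.Images (
        ImageId     int            NOT NULL,  -- non-identity key, compared between host and client
        ImageName   varchar(256)   NOT NULL,  -- file name, later used to infer the MIME type
        ImageBinary varbinary(max) NOT NULL   -- the document, picture, audio or video itself
    );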

Its sole purpose is to store a multimedia lineup in a way that my phone can get at it.

Actually that’s not quite true. If I wanted to preserve some confidential documents in a variety of formats (text, voice, video, etc.) but still be able to catalog and query their contents, this would be one way to do that. And if my lawyer wanted some of it while he’s in a meeting, he could use my page from his phone.

Anyway, the standard IIS Web site (accompanied by its ASP service) is set up on the XP desktop that hosts the database. A simple ASP page then displays a form to collect client server particulars (unless they’re hard-wired into the page, or contained in the caller’s hyperlink). If I specify View, the appropriate image is downloaded into a recordset and displayed by my browser (after the image name is used to determine its MIME type). If I specify Download, the host is sent an INSERT query that uses SQL Server’s OPENDATASOURCE function to make the client server part of that query.
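
The download statement might look roughly like the following sketch; the provider string, database name, credentials and Id filter are placeholders, and the real page builds the statement from the client parameters entered in the form.

    -- Hypothetical sketch of the cross-server INSERT behind the Download link
    -- (requires Ad Hoc Remote Queries to be enabled, as noted in the Summary)
    INSERT INTO OPENDATASOURCE('SQLNCLI',
            'Data Source=192.168.1.50\SQLEXPRESS;User ID=imguser;Password=<password>'
        ).ImagesDb.dbo.Images (ImageId, ImageName, ImageBinary)
    SELECT ImageId, ImageName, ImageBinary
    FROM   dbo.Images
    WHERE  ImageId = 42;   -- the Id of the file chosen on the page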

So the real work occurs on the asp page.

However, we first need a convenient way to populate and maintain the multimedia lineup. Three batch files LoadImages.bat, SaveImages.bat, DeleteImages.bat are used to load, save and delete multimedia images in SQL Server. For example, LoadImages.bat loads images into SQL Server using a parameter file LoadImagesParameters.bat to specify server parameters. In turn, it calls the stored procedure spLoadImages.sql to perform the loading. The other batch files work the same way.
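
Although the batch files and spLoadImages do the real work, loading a single binary file into such a table can be done with OPENROWSET ... SINGLE_BLOB; the Id and file path below are placeholders.

    -- Load one multimedia file into the Images table as a single binary value
    INSERT INTO dbo.Images (ImageId, ImageName, ImageBinary)
    SELECT 101, 'HolidayPics_001.jpg', blob.BulkColumn
    FROM   OPENROWSET(BULK 'C:\Media\HolidayPics_001.jpg', SINGLE_BLOB) AS blob;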

After using these batch files to set up the images table, the asp page Images.asp may be called by my phone browser to view or download them to another server. If I decide to view images, then Image.asp is called where the appropriate parameters are passed. Otherwise ImageDownload.asp is called to download them to another server.

The system is portable so you can use your own table for storing images by changing some parameters in the source code (some limitations apply).

The code is in the Resources section below in the ZIP file.

Summary

This web page for connecting to SQL Server from a BlackBerry has proved surprisingly useful, given its simplicity.

Results may vary depending on the speed of your wireless connection (buffering may happen during busy sessions). Also, what media you may open varies with the device and file size. Audio books (mp3) can be quite large (> 20 Meg.). For increased security, use https:// (but you’ll need a certificate which won’t work on BlackBerries).

PDF files should be < 3 Meg. otherwise the phone’s reader may balk or give false error messages even when loading successfully. ZIP files can only be opened by desktop browsers, but they can still be downloaded when using a phone.

You must enable Ad Hoc Remote Queries and OLE Automation for SQL Server (use Surface Area Configuration For Features to do this).

This page was originally written for a large merge replication project, where employee images needed to be replicated over a slow satellite connection in Canada’s Northwest Territories.

[sqlservercentral]

Oct
21

SQL Sentry Plan Explorer Beta

Author admin    Category IT News, Programming, SQL     Tags

Recently at SQLBits York I demonstrated the new SQL Sentry Plan Explorer (formerly the “Plan Viewer”) and officially announced its upcoming release.  We are now pleased to make the beta version of the tool available for download!

Background

We had originally envisioned query analysis features only as part of the fully licensed version of SQL Sentry Performance Advisor.  The goal was to build upon what we all have today with the graphical plan view in SSMS, and in a big way.  Like many of you, I’ve been using this feature for years, going back to Enterprise Manager, and so had lots of ideas going in about what we could do to make plan analysis more efficient.

After a few significant reworks over many months, things really started coming together, and it was doing most everything I’d wanted and much more.  Working with our lead developer, Brooke Philpott (blog | twitter), one idea had led to another idea, which led to another… it was a gradual, iterative process.  I’m thankful that we embrace a style of agile development here, because this is one project that I believe would have been impossible to design to completion in advance.  It was only through using it on a daily basis to solve real world query issues that the tool was able to evolve into what it is today.

Anyway, somewhere along the way a couple of things became clear to us:

  • These features are so cool and useful (IOHO) that we wanted to share the tool with anyone that wants it, so we are making it available to the SQL Server community for free!
  • Much of the code was client-side, so it would be fairly easy to break out into a standalone tool without any dependencies on our collection service or database (one of each is required for every full SQL Sentry installation).

Plan Explorer Features

I’ll go ahead and run through some of the key features of the Plan Explorer, and in the process I’ll compare and contrast with SSMS.  Please bear in mind I am not in any way trying to bash SSMS (promise! ;-), I’m simply trying to illustrate the reasoning behind the design of the tool for longtime users of SSMS/Enterprise Manager.

When you launch the tool for the first time, you’ll notice that the UI is broken into 3 major sections:

  • Statements Tree
  • General Query & Plan Info
  • Plan Details

They’re all empty until you load a plan.  There are several ways to load a plan:

  • In SSMS, right-click a graphical plan and select “Show Execution Plan XML”, then copy and paste the plan XML into Plan Explorer.  It doesn’t matter which pane is active; the clipboard handler is global and will auto-detect the plan XML. (A DMV query for pulling plan XML straight from the cache follows this list.)
  • Save an execution plan from SSMS to a .sqlplan file, then open the file using the Open toolbar button, File->Open menu, or Ctrl + O.  Raw plan XML files and .QueryAnalysis files (our own proprietary format) are also supported.
  • Right-click an existing .sqlplan file in Windows Explorer and select “Open with -> SQL Sentry Plan Viewer”.
  • Drag-and-drop a plan file onto the application.
  • Retrieve the estimated plan from TSQL. (See General Query & Plan Info section below).
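
If you want plan XML straight from the plan cache rather than from a saved .sqlplan file, a query along these lines (it assumes you have VIEW SERVER STATE permission) returns XML you can copy and paste into the tool:

    -- Grab the ten most CPU-expensive cached plans; click the query_plan XML in SSMS, copy, then paste
    SELECT TOP (10)
           qs.total_worker_time,
           qs.execution_count,
           qp.query_plan
    FROM   sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY qs.total_worker_time DESC;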

Statements Tree Section

Let’s start with how SSMS works.  A batch or stored procedure can of course consist of one or more statements.  With multi-statement execution plans, what SSMS does is combine all plans for all statements together into one “master plan”.  When it’s an estimated plan, all individual statement plans are joined together at the top level:

big_plan_estimated

With actual plans, each statement is broken out into a separate section, with embedded cost and other information in the section header:

big_plan_actual

When there is only one or a few statements, SSMS can do Ok with this approach.  However, if it’s a large plan with tens or hundreds of statements and conditional logic, looping, etc., it just doesn’t scale.  Some plans are so big and complex that it is literally impossible to find the heaviest statements and operations, and some can’t even be loaded by SSMS!

We wanted to change all that, and so the Statements Tree represents one of the biggest differences between the Plan Explorer and SSMS.  Its function is critical when navigating multi-statement plans.  What it shows is a logical breakdown of the entire plan tree, including all control structures, nested procedure calls, and estimated and/or actual metrics for each statement, including operation counts:

statements_tree

If you click on any statement in the grid, you’ll be shown the associated plan for that statement only:

statements_tree_stmt_sel

Likewise, if you click on any control structure (IF or WHILE) or EXEC <procname> higher up the tree, you’ll see a filtered view of the plan starting from that point:

statements_tree_while_sel

As you’ve probably guessed by now, you can of course sort the statements list, and voila!, the highest cost trees and statements will immediately bubble up to the top:

statements_tree_sorted

So whether there are 10 or 1,000 statements in the master plan doesn’t really matter, it’s trivial to find the highest cost individual plans so you can focus your attention on those.  Big plans that simply weren’t usable before can now be managed with ease.

General Query & Plan Info Section

The primary tab you will make use of in this section is “Text Data”, which holds the color-coded TSQL batch or procedure definition:

query_info

If you loaded an existing execution plan, the TSQL will be auto-generated.  You can also type TSQL, copy/paste it in, or open a .SQL file, then click the “Retrieve Estimated Plan” button on the toolbar to get the estimated plan.

NOTE: Executing TSQL to retrieve the actual plan is not yet supported, although opening an existing actual plan is fully supported.

You’ll notice that, like plans, the TSQL statements are synchronized with the Statements Tree, meaning that if you select a row on the tree view it will auto-select the associated TSQL statement and its execution plan, and vice versa.  See the shots above for examples.

Plan Details Section

This section contains 4 tabs, each of which serves a different purpose:

  • Plan Diagram
  • Plan Tree
  • Top Operations
  • Query Columns

If you select an operator node or row on one of the tabs, it will be auto-selected on all others, so you can switch back and forth between the different views of the plan without losing your place.

Plan Diagram

This is the view that will certainly be most familiar to everyone… although you should immediately notice some differences.  First, there is color!  That’s right, the cost information is color-scaled so you can quickly see which operations are heaviest, and Lookups and Scans are also highlighted (if Rows > 100):

plan_diagram_color_scaling

Pretty cool, eh?  But wait, there’s more! ;-)

If you right-click the diagram, context menus provide access to a variety of other functions. (bolded because this can be easy to miss)

Using the context menus, in addition to scaling by total cost (CPU + I/O), you can also use CPU or I/O separately:

plan_diagram_costs_by_cpu

This can be very helpful if your hardware is more I/O constrained than CPU constrained, or vice versa.

To see which subtrees are most expensive, select the “Cumulative Costs” context item:

plan_diagram_cumulative_costs

I won’t go into detail on all of the other plan diagram enhancements in this already lengthy post, but here is a list of some of them:

  • Optimized plan node labels prevent truncation of object names in most cases, with menu option to disable truncation completely so full object names are always visible.
  • Optimized tooltips prevent wrapping of object names, column names and predicates.
  • Costs are always displayed above nodes for maximum readability.
  • Costs are shown to the first decimal place.
  • Connector line width can be scaled by either Rows or Data Size.
  • Rows and Data Size labels are displayed above connector lines.
  • Connector lines for bookmark (key|rid) lookups, table spools, and function calls show the actual estimated rows or data size, and line width is scaled accordingly. (SSMS always shows 1 row for these operations in estimated plans)
  • The mouse scroll wheel can be used both for scrolling up/down and for zooming in/out (by holding Ctrl)!

Last but certainly not least, you may notice that some plans seem, well, “different”.  This is likely due to the optimized layout algorithm which is able to render many plans using much less real estate than SSMS, meaning you are seeing more of the plan without having to scroll and zoom.  It’s easiest to demonstrate this with a couple of pictures.  The shots below are of the same plan:

SSMS version:

plan_tall_ssms

Plan Explorer version:

plan_tall_pe

Plan Tree

This view is just what it sounds like, a tree representation of the plan which shows all operations and associated metrics.  On the surface it looks similar to a showplan_all, however it’s different in several respects:

  • You can expand and collapse sections, and sort within levels.
  • Additional metrics are shown, such as “Estimated Data Size”, as well as “Actual Rows” and “Actual Data Size” for actual plans, side-by-side with the estimates.  There are many other plan metrics available by right-clicking any column header and selecting “Column Chooser”.
  • Significant differences between estimates and actuals are highlighted.
  • Possibly problematic operations like scans and bookmark lookups are highlighted.

Top Operations

Another one that is as it sounds, a simple list of all plan operations, sorted DESC by total estimated cost by default.  You can of course sort by any of the other columns, as well as group by operation type, object, etc.  The same columns and highlighting as on the Plan Tree tab apply here.

Query Columns

This is one of the most useful views for many plans. What it shows is a list of all columns accessed by the query, for whatever reason (sarg, join, output list, sort), along with the table, operation, and index used. How is it useful?

Take the case of a bookmark lookup, where you have some columns that aren’t covered by the index chosen by the optimizer. The traditional ways of figuring out which columns aren’t covered are hovering over multiple nodes on the graphical plan to determine the index used and columns accessed, or looking through a showplan_all, both of which are tedious and error prone.

Instead, when you look at the list, you’ll see the columns for related operations grouped together (indicated by a thick separator bar), with the index used and the columns in the lookup highlighted in orange:

columns_lookup

From here it’s often a simple matter to make the appropriate index modifications to eliminate the lookup.  In the case above, I can see that index [IDX_ClientID] isn’t covering columns [LEASE_ID] and [DOCUMENT_SIZE], so I may want to add them as included columns to the existing index, or create a new covering index.
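
Assuming the table behind that lookup is something like dbo.Documents keyed on ClientID (the table and key column are guesses based on the index name; only [IDX_ClientID], [LEASE_ID] and [DOCUMENT_SIZE] come from the example), the index change described above might look like this:

    -- Widen the existing index so it covers the two columns driving the lookup
    CREATE NONCLUSTERED INDEX IDX_ClientID
        ON dbo.Documents (ClientID)
        INCLUDE (LEASE_ID, DOCUMENT_SIZE)
        WITH (DROP_EXISTING = ON, ONLINE = ON);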

NOTE: In the full version of SQL Sentry v6, you can directly open the index properties from this and all other plan tabs using context menus, or by double-clicking nodes on the graphical plan.  This is something you used to be able to do in Enterprise Manager but was dropped in SSMS, much to my dismay.  We wanted to make this available in the Plan Explorer as well since it can be a huge timesaver, but currently the tool has no integration with SSMS in order to keep the install simple.  Rest assured, if/when this changes we will add this capability to the tool.

Beta Downloads

You MUST HAVE .NET 4.0 PRE-INSTALLED to install and use the Plan Explorer. Download it here.

SQL Sentry Plan Explorer x64

SQL Sentry Plan Explorer x86

SQL Sentry Plan Explorer Documentation

General discussion and support for the free tool will be provided through the SQL Sentry Plan Explorer forum.

This beta build has proven very solid, and handles all plans we’ve seen thus far, regardless of size or complexity.  However, as it is a beta, you may find a quirk or two.  If you run into any problems, or have any ideas, questions, etc., email me at greg at sqlsentry.net.  For questions, please be sure to check the forum and the documentation first.

Special thanks to the first beta testers who really hammered the tool and helped us get it where it is today!  In particular, Jonathan Kehayias (blog | twitter), Mladen Prajdic (blog | twitter), Christian Bolton (blog | twitter), Brent Ozar (blog | twitter), Brian Kelley (blog | twitter), and Aaron Bertrand (blog | twitter) provided some great real-world plans and feedback.

There are many more features and use cases I’ll be covering in the coming days and weeks, so stay tuned…

I hope you enjoy using it as much as we’ve enjoyed building it!

[sqlsentry]

Jun
21

Virtual Lab: 2008 R2 Domain Controller – Basic Tasks

Author admin    Category IT News, Microsoft, SQL     Tags

Creating Accounts

After installing the Active Directory Domain Services role, there are several new tools available in the Administrative Tools section of the Start menu. Throughout the remainder of the article we will be working with the “Active Directory Users and Computers” tool.

Domain Administrator Account

The first account we are going to create on the new domain is a Domain Administrator account. Even though this is a virtual lab environment, we don’t want to get in the habit of using the built-in administrator account.

Open the “Active Directory Users and Computers” screen and explore the default groups that have been created. As this is our first visit, there are no custom folders in our tree.

AD – Users and Computers

To add a new user to the default “Users” folder we will right-click on the folder, select “New”, and select “User”. This provides us with a dialog to enter the basic user information.

AD – Adding a User

After entering the general user information and pressing next, we are presented with a request for the user’s password and password options. As we have not yet configured the password policy, the default policy is still in place.

Completing the wizard gives us a brand new member with basic Domain User permissions.

AD – Adding a User

To elevate the permissions of the user across the domain, we will assign them membership in the Domain Admins group. Open the user’s properties panel (right-click the user and select “Properties”) and select the “Member Of” tab.

AD – Configuring a User
There are several default groups created when you install AD DS, each with a default level of permissions across the domain. There is a pretty good reference on the default groups available here.

Pressing the “Add” button brings up the search dialog. Enter “Domain Admins” and press “Check Names” to quickly find and select the Domain Admins group.

AD – Configuring a User

Press Ok twice and the user is now part of the Domain Admins group.

Service Accounts

Service accounts are an important part of a network. Many companies will rely on common accounts, like the built-in Administrator account, when they are setting up services on the Domain that require domain permissions. There are numerous reasons this is a bad idea:

  • Greater Impact – a change to a single user account suddenly has a much wider impact than is intended, potentially causing service outages in disparate places across the network
  • Greater Impact, Frequently – what about password changes? Using one account for a wide range of services means a wider set of work whenever a single password has to change
  • Documentation – using a single account for multiple services requires either good documentation, good guessing, or a lot of overtime
  • Security – giving an account too many permissions is a potential security hole; a virus that uses a single application as an entry point suddenly has rights to everything on the network

In order to make management of service accounts (and services) clearer, I suggest each service account only have the level of permission on the domain it absolutely needs and that it only be used for a single service or application. This requires more work up front, as you have to not only create individual accounts for each application and service, but more importantly determine what “minimum requirements” means for each one. On the other hand, minimizing planned and unplanned system outages in the future is worth a little extra time.

Sometimes multiple services need access to a common resource, or a common service needs a few different permission profiles based on the use of the system (for instance, qa versus production SQL Server instances). Rather than add these permissions to each individual service account, consider creating a group that outlines the permissions and then give the service accounts membership in the group. Our goal is to minimize the number of configuration changes we will have to make when that configuration needs to change in the future.

Creating the Service User Account

Our guinea pig in this section will be the basic SQL install from a previous article. We are going to create a service account for the database server that has limited domain permissions but local administrative permissions. We will be using a middle-of-the-road configuration, creating a single user account for all SQL Server services on our domain to run under, which I would consider the minimum level of implementation.

Larger networks may want to create groups to handle permissions and then create accounts for individual servers and assign them the correct group membership. This second route reduces risk but heightens the level of work necessary to manage the accounts. In this situation, I would suggest using Managed Service accounts rather than standard user accounts, as they can be assigned the same types of group permissions but reduce a portion of the overhead in administering the accounts.

On the AD DS server open the “Active Directory Users and Computers” screen again. While there is a “Users” container already available in the interface, we’re going to create a new Organizational Unit (OU) named “Service Accounts” to store these accounts.

To create the new OU, we right-click the Domain (in my case avl.local) and select New -> Organizational Unit. In the dialog we enter the name, in my case “Service Accounts”, and press Ok.

AD – Creating a new OU

Right clicking the new “Service Accounts” container and selecting New > User opens the new user dialog.

AD – Creating a new Service User

As this is the domain account for our SQL Server services, I have chosen the name “SQL Server Service Account” and a username of “sqladmin”.

On the next step we will also ensure that our password won’t expire in the middle of the night by verifying the checked defaults and modifying a few.

AD – Creating a new Service User

After viewing the summary in the final step, we press Ok and create the new user. As the user is not going to have any special, domain-wide permissions we do not need to modify their group memberships or other settings. In fact we are done with the account creation until we bring our database server onto the domain and set up the local permissions for the user on that server.

Joining the Domain

Joining a server to the domain is a fairly straightforward process. In this case we will join our database server from prior articles and then we will configure our domain account from the previous section to serve as the administrative account for the database services.

The first step will be to change the network properties of the virtual server. When we first created it we used DHCP, which means it received a dynamic address and DNS entries pointing somewhere other than our new controller.

In my personal lab I am still serving DHCP requests from my core switch and have not configured it to direct DNS requests at my virtual DC because there is no guarantee that the DC will be on and later on it will be moved behind a virtual firewall and will be inaccessible to the other physical systems on my network.

On the SQL Server machine we open the network settings for our adapter, select the “Internet Protocol Version 4 (TCP/IPv4)” value, and press “Properties”.

SQL Server VM – Network Settings

Assign the server a static IP address on the network and point the DNS settings to the DC. At this point it would also be a good idea to check the “Validate” button to let Windows test the settings when you press Ok.

SQL Server VM – Network Settings

On my server the validation resulted in a message indicating that Windows couldn’t identify the problem, which translates to “Yes, It Works”.

Next we will open the System Properties.

One of the 90 ways you can get to the System Properties panel is to press the “Server Manager” icon that is pinned to the toolbar, then in the console highlight “Server Manager” on the left side to display the high-level dashboard on the right, followed by selecting “Change System Properties” on the right side of that dash.

Now we want to press the “Change” button on the “Computer Name” tab. Select the “Domain” radio button and type in the name of the domain from your domain controller.

SQL Server VM – Join Domain

On pressing “Ok” we are asked to enter domain credentials with permission to join the network.

SQL Server VM – Join Domain

Once we enter our credentials and press “Ok” the system is accepted to the Domain and provides us with the customary reboot request.

Back on the Domain Controller, in the “Users and Computers” console, the server has been automatically added under the “Computers” container. This being a simple setup, we have not created our own containers to store our servers in, so we will leave it in the default one (for now).

SQL Server VM – Join Domain

When the SQL Server has finished rebooting we will log in for the first time with our Domain credentials. In the “Server Manager” we now see that the full computer name and Domain entries reflect the name of the domain.

Applying the SQL Service Account

The last step after adding a database server to the domain is to reconfigure its services to use the domain service account we created above. This will allow us to later create domain resources that all of our SQL Server systems need to access without having to manage each one individually.

Currently our service account has no permissions on the server, other than as a basic Domain User (and for some of you, it may not even have that level). So the first step is to provide that domain account with local administrator permissions on our database server.

Before continuing further, make sure you log into the server once with the service account above. Several folders and files are created when the account logs in, and not doing so (or not manually doing some trickery with the Users sub-directory and an NTLog file) appears to upset WMI when you’re assigning users to the services. Don’t stay logged in as this user, though, as you will then end up battling UAC the rest of the day.

From the open “Server Manager” console, expand the “Configuration” and then “Local Users and Groups” containers, selecting the “Groups” folder. Right-click the “Administrators” group on the right side and select “Properties”.

SQL Server VM – Service Account

We enter the name of the domain account (in this case AVL\sqladmin) and press the “Check Names” button to ensure the account could be found.

SQL Server VM – Service Account

Press “Ok” and “Ok” and we are back in the Server Manager again.

The last step is to configure our services to use the Domain account, however this is done a little differently for SQL Server than you might expect.

While it is technically possible to change the service accounts directly from the Windows “Services” console, there are a number of extra steps that have to be completed in order for it to work properly, as there are also various registry and application settings that have to be set outside of the Services console. Microsoft has published a knowledge base article on this for 2005; I have not been able to locate a similar document for 2008.

Open the “SQL Server Configuration Manager” from the Start Menu (this would also be a good time to pin it to the taskbar).

SQL Server VM – Service Account
Ah, the wonders of UAC. Don’t get me wrong, I actually like the concept, unfortunately the implementation leaves a little to be desired in Enterprise or Domain setups. Accounts in the Domain Admins (and a few others?) groups don’t receive the level of rights you would expect and, in fact, it doesn’t seem possible to grant them the abilities of the built-in administrator, which has strange and mystical powers that are turned on by default and can only be turned off in the policy settings, not applied to other accounts. You will receive a number of errors when you try to escalate with a variety of different settings, and all of the recommendations [I have found] end with either turning off UAC, using the Right-Click “Run As Administrator”, or appear to have been written by people logged in as a Local Administrator.
To make matters worse, file access escalation (if it works) creates a mess of local folder permissions, so it’s highly suggested that you read this workaround.

In the Basic SQL Server Virtual Lab post we only installed a few services and configured SQL Server and SQL Agent to use a local account.

SQL Server VM – Server Configuration Manager

Right-click on each service and select Properties to open the properties panel. Enter the domain account we created above in the Service Accounts section and its password, then press Apply. The system will then warn us that a restart of the services is required to continue.

SQL Server VM – Changing the Account

Follow the same instructions for each of the services and we soon have all of our services running under the Domain account.

SQL Server VM – Finished

Wrapping Up

Now that we have the Domain Controller running, we’ve created some accounts, and we have joined our first database server to the domain, it’s time to switch focus back to our database server for a while. Upcoming weeks will focus on some administrative tasks, such as configuring database mail and alerts and building out some scripts to monitor current state and state-over-time values for a few critical performance indicators.

Virtual Lab entry on the LTD Wiki

May
25

Working Together: SQL Server 2008 R2 Reporting Services Integration in SharePoint 2010

Author admin    Category IT News, SQL     Tags
Alan Le Marquand
SQL Server and SharePoint have always worked together well. When SharePoint Server 2010 and SQL Server 2008 R2 were released, there were some significant improvements to the integration between SharePoint and SQL Server 2008 R2 Reporting Services (SSRS). Here’s a look at how to configure and use the latest enhancements.
Server Integration Architecture
The Reporting Services Add-in for SharePoint is what truly drives the integration between the two servers. You install the add-in, which is available as a free download from the Microsoft Download Center, on all SharePoint 2010 Web Front End (WFE) servers that require integration with a Report Server. Figure 1 shows the architecture of the integration components.
On the SharePoint 2010 WFE, the add-in installs three components: the SSRS Proxy, a Report Viewer Web Part, and the application pages that allow you to view, store, and manage report server content on a SharePoint site or farm. The SSRS Proxy facilitates communication between the WFE and the Report Server. On the Central Administration Reporting Services pages within SharePoint, you configure the proxy with the Report Server you want to access, as well as the authentication method and credentials to access the server. For the integration to work, you must configure the Report Server to run in SharePoint Integrated mode.
Figure 1 Server Integration Architecture
One item to note in Figure 1 is the SharePoint Object Model component on the Report Server. For the Report Server to understand the reporting information stored in SharePoint and to be able to secure it, the Report Server has to interact with the configuration and content databases on your SharePoint site or farm. You can achieve this by installing a minimal copy of SharePoint on the Report Server and joining it to the farm.
The version of SharePoint you install on the Report Server must be the same as the version used throughout the farm.  You need to do this only if you’re running your Report Server on a separate machine. If you are running both SharePoint and Reporting Services on the same machine, you only need to install the add-in.


Configuring Integration

Overall, configuring integration has been simplified with SharePoint 2010 and SQL Server 2008 R2. The order in which you perform the configuration depends on what you’ve already installed. Whether you’re starting from scratch or from an existing installation, the key is to have all the main components installed before you configure the SSRS Proxy in SharePoint. For best results when integrating SQL Server Reporting Services 2008 R2 with SharePoint 2010, the recommended order when starting from scratch is:

  1. Run the SharePoint 2010 prerequisite installer—this will install the SSRS 2008 R2 Add-in for SharePoint.
  2. Install and configure SharePoint 2010 in a farm configuration.
  3. Repeat steps 1 and 2 on the Report Server machine if it is separate from the SharePoint WFE machine, and configure it to join the SharePoint farm created in step 2.
  4. Install SQL Server Reporting Services in SharePoint Integrated mode.
  5. Configure the SSRS Proxy via the Reporting Services Integration page and activate the Reporting Services feature.

If you don’t see the Reporting Services content types in your site under the Documents | New menu, you’ll need to add them manually. I describe how to add the Report Server content types later in this article, under Integration with Report Builder 3.0.

In this scenario, I’d use SQL Server for the SharePoint database, rather than the embedded edition that SharePoint defaults to. If you plan to install all the components on one machine, step 3 is redundant, and steps 1 and 2 can be combined within the SQL Server installation process.

If you have an existing SharePoint installation, you can download and install the add-in at any time. The add-in installation process adds the necessary pages to the SharePoint Central Administration as well as the new Report Server content types for existing SharePoint Libraries in sites using the Business Intelligence (BI) Center site template.

On the SharePoint side, you can configure integration on either SharePoint Server 2010 or SharePoint Foundation 2010. Both support the installation of the Reporting Services Add-in. If you install SharePoint and Reporting Services on different machines, you must install the same version of SharePoint on the Report Server. For example, you would not be able to install SharePoint Foundation 2010 on the Report Server if you were using SharePoint Server 2010 as your Web Front End.

The add-in installation is very simple; besides entering your name and company, no other configuration is required. If you’re installing SharePoint for the first time, you install the add-in before you install SharePoint; this is done automatically when you run the SharePoint 2010 prerequisite installer.

Configuring the Report Server is straightforward. The key considerations are:

  • The edition of SQL Server has to be Standard, Enterprise or higher.
  • The Report Server database must be created for SharePoint Integrated mode.
  • If you are using separate machines for SharePoint and Report Server, you’ll need a minimal installation of SharePoint and this must be joined to the farm on the Report Server.

A Report Server is implemented as a single Windows service that runs under a built-in account or a local or domain Windows user account. In SharePoint Integrated mode, the Report Server service account is provisioned appropriately to access the SharePoint configuration and content databases as well as SharePoint object model resources. This happens when you configure the Reporting Services integration with SharePoint via the Reporting Services Integration page.

When the authentication mode is “Windows Integrated,” the Windows user logged into SharePoint will be impersonated when connecting from the WFE to the Report Server. When the authentication mode is a trusted account, the SharePoint user context of the user logged into SharePoint is passed on to Report Server in the form of the SharePoint user token. The SharePoint WFE’s application pool account is used to make the connection from the WFE to Report Server. You’ll find a summary of the Service Account Configuration in the TechNet article “Configuring Reporting Services for SharePoint 2010 Integration.”

If you’ve already installed Reporting Services using the default settings, the Reporting Services database will be in Native mode. To operate in SharePoint Integrated mode, you’ll need to return to the Reporting Services Configuration tool and, from the Database Settings page, change the mode from Native to SharePoint Integrated.

You can change the Report Server mode from Native to SharePoint Integrated at any time; however, this does not convert the existing database. Each time you switch you must either create a new database or connect to an existing one.

Before configuring the Reporting Services Proxy options within SharePoint, there’s one other configuration you need to make: ensure anonymous access has not been enabled on the Web application. While this will not stop you from configuring the Reporting Services Proxy settings, your users will get an error when they run reports. You can choose to use Windows or any claims-based authentication from the other authentication providers, and if you are configuring integration between a report server and a SharePoint farm, each SharePoint Web application in the farm can be configured to use a different authentication provider.

The Add-in creates a new Reporting Services section within the SharePoint Central Administration General Application Settings page. On the Reporting Services Integration page you enter the Report Server URL and the authentication details, and activate the Reporting Services feature on all or selected site collections in the farm.

Figure 2 Configuring the Reporting Services Proxy

Once you complete the page shown in Figure 2, the integration configuration process is complete.

Integration with Report Builder 3.0

The main benefit of the integration between SharePoint and Reporting Services is that it allows users to create, modify and publish reports from within SharePoint. Reporting Services provides some predefined content types that are used to manage various files, including the shared report data source (.rsds) files, the Report Builder model (.smdl) files, and the Report Builder report definition (.rdl) files. After you have configured integration, to allow users to create and manage these new content types from the ribbon and the context menus, you need to enable the new content types on those libraries.

If you are using the BI Center site template, you do not have to do anything; the content types are automatically enabled with the template and for all sites created using this template. For all other sites and document libraries, you’ll need to carry out a two-step configuration process. First, you need to enable Content Type Management within the libraries; by default it is off. Then you need to enable the content types for the library. To enable content type management for a document library, follow the procedure in the TechNet article “How to: Add Report Server Content Types to a Library (Reporting Services in SharePoint Integrated Mode).”

Once these new content types have been added to a library, three new options will appear from the New Document drop-down on the Documents tab. If you now select the Report Builder Report option, Report Builder 3.0 will be downloaded to the client and run. You can alter this behavior from the SharePoint Central Administration. The Reporting Services Server Defaults allows you to turn off this option as well as configure an alternate URL for Report Builder.

Using the Report Viewer Web Part on a SharePoint Site

The Report Viewer Web Part is a custom Web Part that is installed by the Reporting Services Add-in. You can use the Web Part to view, navigate, print and export reports on a report server. To add this Web Part to a page, you can use the steps in the TechNet article “How to: Add the Report Viewer Web Part to a Web Page (Reporting Services in SharePoint Integrated Mode).”

Each Report Viewer Web Part renders one report at a time based on the absolute URL to the report file (.rdl) specified in the Report property. The URL must be the fully qualified path to a report on the current SharePoint site or on a site within the same Web application or farm. The URL must resolve to a document library or to a folder within a document library that contains the report. The report URL must include the .rdl file extension. If the report depends on a model or shared data source files, you don’t need to specify those files in the URL. The report contains references to the files it needs.
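As a purely hypothetical illustration (the server, site, library and report names below are made up), a valid Report property value follows this shape: a fully qualified path into a document library, ending in the .rdl extension.

        http://contoso-sp/sites/BICenter/Reports/SalesSummary.rdl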

Claims Authentication and Reporting Services

One of the new features introduced with SharePoint Server 2010 is support for claims-based authentication. In claims-aware applications, clients present “claims” to the application. These claims are pieces of information about the user, such as user name, e-mail address, or manager name. This provides the application with more information than it would receive using Kerberos. Take, for example, a purchasing application: Two of the claims passed to the application could be the user’s manager’s e-mail address and the user’s purchasing limit. In a non-claims-aware application, this information would have to be managed by the application.

In the SharePoint world, claims authentication solves the problem of sharing SharePoint sites across organizations. Using a product like Active Directory Federation Services (AD FS), two organizations with different authentication methods can set up claims that allow SharePoint to identify a user and assign the correct permissions.

Because this functionality is built into SharePoint 2010 products, Reporting Services can work with this authentication model. Reporting Services is not claims-aware; instead it communicates with SharePoint through a trusted account. The proxy service within the SQL Server 2008 R2 add-in uses the SharePoint object model to convert the claims token into a corresponding SharePoint user context in the form of a SharePoint user token that the Report Server can understand and use to validate against the SharePoint database. In a nutshell, the process works like this:

  1. SharePoint performs the appropriate claims authentication and, using the SharePoint Secure Token Service, communicates the claims token to the Reporting Services proxy.
  2. The Reporting Services proxy then uses the claims token to communicate with the SharePoint object model and generate a corresponding SharePoint user token that it forwards to the Report Server.
  3. The Report Server uses the SharePoint user token against the local SharePoint object model to generate the correct SharePoint user context.
  4. If the user has the required permission, Report Server sends the requested information back to SharePoint using the appropriate SharePoint user context as it would normally.

Native List Reporting

SQL Server 2008 R2 Reporting Services now supports SharePoint lists as a data source. This support allows you to retrieve list data from SharePoint Foundation 2010, SharePoint Server 2010, Windows SharePoint Services 3.0, and Office SharePoint Server 2007. The ability to access list data is not reliant on the add-in or running Report Server in Native or SharePoint Integrated mode. The functionality is built into Report Server. What changes in the different configurations is the method of access.

There are two methods by which SharePoint list data is accessed. One is via the lists.asmx web service and the other is via the SharePoint object model APIs. On any SharePoint installation, if you enter the URL http://<sharepoint_server_name>/lists.asmx, you'll get an XML list of all the lists on the SharePoint site that you're able to access. Report Builder 3.0 uses this method to retrieve the lists, as does a Report Server configured in Native mode.

The SharePoint object model API method can be used in two scenarios. The first is where a Report Server is configured in SharePoint Integrated mode and the list exists in the same SharePoint farm that Reporting Services is integrated with; remember that in this scenario there is a copy of SharePoint running on the Report Server machine, which gives it access to the API set. The other scenario is where you have SharePoint 2010 installed along with the add-in, but no Report Server. This is called local mode and is covered later in the section "Reporting Without Reporting Services."

To use data obtained from a SharePoint list within a report, you first create a data source and then a dataset that uses that data source. In Report Builder 3.0, there is a new connection type on the Data Source properties page called Microsoft SharePoint List, as shown in Figure 3. Along with this option, you enter the URL of your SharePoint site (there is no need to append lists.asmx to the URL). The data source can also be configured with different credentials to use when accessing the SharePoint server.

Figure 3: SharePoint List Connection Type

When you create a new dataset based on this data source, you will be given a list of all the SharePoint lists on the site you have access to. You can then drill down into a list and access the individual list items, create filters, create parameters and create reports just as if this were a SQL database table.

Alternate Access Mapping Support

Another integration enhancement is support for Alternate Access Mapping (AAM). AAM has been in SharePoint since the 2007 version, but Reporting Services did not support it. Now if you configure an alternate access mapping within SharePoint Central Administration, the Reporting Services Add-in will maintain the URL structure, as shown in the very simple report in Figure 4. Both http://sql-01 and http://www.contoso.com render the same report.

Figure 4: Alternate Access Mapping

Reporting Without Reporting Services

So far, all the information in this article has pertained to what is called connected mode. In the previous versions of Reporting Services, this was the only mode available and meant that SharePoint had to be connected to a Reporting Services report server configured in SharePoint Integrated mode in order to render reports using the Report Viewer.

With the release of SQL Server 2008 R2, you can render reports without integrating your SharePoint site or farm with a Reporting Services report server. Instead, you can use the Report Viewer to directly render reports from SharePoint when the data extension supports local mode reporting. Out of the box, only the SharePoint List and the Microsoft Access 2010 reporting extension support this.

When you’re in local mode, you can also render a report that has an embedded data source or a shared data source from an .rsds file. However, you can’t manage the report or its associated data source as this is not supported in local mode.

Supported Combinations of the SharePoint Add-in and Report Server

With the release of SQL Server 2008 R2 and SharePoint Server 2010, there are now three versions of SQL Server, three versions of the add-in, and two versions of SharePoint. The integration components can work on any of these releases, but you have to mix and match the right versions. The table in Figure 5 provides the supported combinations of products.

Figure 5: Supported Combinations of the SharePoint Add-in and Report Server

<technet.microsoft.>

May
21

SQL Server 2008 : Intersect, Except, Union, All and Any

Author admin    Category IT News, Programming, SQL     Tags

I was reviewing the Microsoft training courses for SQL2008 for a set of application developers who wanted to learn more about SQL Server. In doing so I came across some commands that I either had not heard of or had not used before.

  • INTERSECT
  • EXCEPT
  • ALL
  • ANY

ALL and ANY have been around for donkey's years, but INTERSECT and EXCEPT were new to me.

Whenever I find a new set of commands in T-SQL I fire up my developer edition and go straight to Adventureworks to start playing around.

INTERSECT, EXCEPT and UNION

To experiment with these commands I decided to look at two sets of CustomerID values

  • Customers in sales territory 10 (United Kingdom)
  • Sales orders in July 2004, the last month of orders in Adventureworks

Perhaps the best way to show what these commands do when comparing two sets of data is to describe what each predicate returns (the two underlying sets are sketched as queries below):

  • EXCEPT: customers from the UK who have not bought in the last month (July 2004)
  • INTERSECT: customers from the UK AND who have made a purchase in the last month (July 2004)
  • UNION: customers from the UK OR who have made a purchase in the last month (July 2004)
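For reference, the two sets being compared correspond to the following queries, which reappear throughout the examples that follow:

        -- Set 1: customers in sales territory 10 (United Kingdom)
        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID = 10

        -- Set 2: customers with a sales order in July 2004
        SELECT CustomerID
        FROM Sales.SalesOrderHeader
        WHERE OrderDate >= '2004-07-01'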

Different ways of writing an EXCEPT query

Although I had not used EXCEPT before, I had achieved the same ends by more traditional methods. As any DBA will know, there are many ways of achieving exactly the same thing, so I will illustrate three solutions.

LEFT JOIN

The following query yielded the results I needed

        SELECT C.CustomerID
        FROM Sales.Customer AS C
        	LEFT JOIN Sales.SalesOrderHeader AS OH
        	ON C.CustomerID = OH.CustomerID
        	AND OrderDate>='2004-07-01'
        WHERE OH.CustomerID IS NULL
        AND C.TerritoryID=10

Except - LEFT JOIN

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 196 ms.

The overall query cost was 0.0517279

WHERE CustomerID NOT IN(…)

Being a diligent DBA I wanted to try the query a different way

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        AND CustomerID NOT IN(
        	SELECT customerid
        	FROM  Sales.SalesOrderHeader
        	WHERE OrderDate>='2004-07-01'
        )

EXCEPT NOT IN

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 160 ms.

This time the overall query cost was 0.0500904 offering a slight improvement.

EXCEPT

Finally I tried the EXCEPT predicate

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        	EXCEPT
        SELECT customerid
        FROM Sales.SalesOrderHeader
        WHERE OrderDate>='2004-07-01'


Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 202 ms.

The overall query cost for the EXCEPT predicate was 0.0500904, exactly the same as for the WHERE NOT IN(…) query. You can also see that the execution plans are the same.

Different ways of writing an INTERSECT query

I decided to take the same approach with the INTERSECT query and write the equivalent query in three separate ways.

INNER JOIN

As any customer can have more than one order I am going to have to make a distinct list of CustomerID values. I decided to try a couple of approaches.

        SELECT DISTINCT C.CustomerID
        FROM Sales.Customer AS C
        	INNER JOIN Sales.SalesOrderHeader AS OH
        	ON C.CustomerID = OH.CustomerID
        WHERE
        	C.TerritoryID=10
        	AND OH.OrderDate>='2004-07-01' 

        SELECT C.CustomerID
        FROM Sales.Customer AS C
        	INNER JOIN (SELECT DISTINCT CustomerID FROM Sales.SalesOrderHeader WHERE OrderDate>='2004-07-01'
        )AS OH
        	ON C.CustomerID = OH.CustomerID
        WHERE C.TerritoryID=10

This produced two differing execution plans with an overall cost of 0.0500859 and 0.0513978 respectively.


WHERE CustomerID IN(…)

This time the query below yielded an execution cost of 0.0499945

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        AND CustomerID IN(
        	SELECT customerid
        	FROM  Sales.SalesOrderHeader
        	WHERE OrderDate>='2004-07-01'
        )


INTERSECT

Finally, as before, the INTERSECT query produced an execution plan identical to that of the WHERE CustomerID IN(…) query.

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        	INTERSECT
        SELECT customerid
        FROM Sales.SalesOrderHeader
        WHERE OrderDate>='2004-07-01'

Conclusion for INTERSECT and EXCEPT

Granted, Adventureworks is a pretty contrived example, but under these conditions INTERSECT and EXCEPT are as good as the best of the rest.

INTERSECT (query cost by predicate)
  • INNER JOIN: 0.0500859
  • INNER JOIN (with DISTINCT derived query): 0.0513978
  • IN: 0.0499945
  • INTERSECT: 0.0499945

EXCEPT (query cost by predicate)
  • LEFT JOIN: 0.0517279
  • NOT IN: 0.0500904
  • EXCEPT: 0.0500904

At this stage I cannot think of a compelling reason to use these new predicates over and above the more usual WHERE field NOT IN (…), particularly as the queries on both sides of an EXCEPT or INTERSECT have to return the same result set structure.
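As a quick sketch of that structural requirement, the following will not run because the two sides do not select the same number of columns; SQL Server rejects it with an error stating that queries combined with UNION, INTERSECT or EXCEPT must have an equal number of expressions in their target lists.

        -- Fails: both sides must return the same number (and compatible types) of columns
        SELECT CustomerID, TerritoryID
        FROM Sales.Customer
        	EXCEPT
        SELECT CustomerID
        FROM Sales.SalesOrderHeader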

I did consider using it as a means of deduplicating two address record sets but that would be an even more contrived example.

The test does emphasize that it is worthwhile trying to write the query more than one way in order to determine the best option for the particular scenario.

The ANY and ALL Predicate

ANY and ALL are predicates I have never needed to use.

ANY

The two queries below offer up the same results and execution plan

        SELECT *
        FROM Sales.SalesPerson
        WHERE TerritoryID = ANY(
        	SELECT TerritoryID FROM Sales.SalesTerritory WHERE CountryRegionCode='US'
        )

        SELECT *
        FROM Sales.SalesPerson
        WHERE TerritoryID IN(
        	SELECT TerritoryID FROM Sales.SalesTerritory WHERE CountryRegionCode='US'
        )

To look at how ANY may be used in a more practical sense I decided to create a Sales.TopSales table in Adventureworks using the query below

        SELECT TOP 5 SalesOrderID,TotalDue
        INTO Sales.TopSales
        FROM Sales.SalesOrderHeader
        ORDER BY TotalDue DESC

        ALTER TABLE Sales.TopSales
        	ADD CONSTRAINT PK_TopSales PRIMARY KEY CLUSTERED( SalesOrderID)

        CREATE INDEX idx_TopSales_TotalDue ON Sales.TopSales(TotalDue DESC)

For my example I decided to retrieve sales that exceeded the minimum in my Sales.TopSales table.

        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > ANY(SELECT TotalDue FROM Sales.TopSales)
        ORDER BY TotalDue DESC

An alternative way of writing this query could be as follows

        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > (SELECT MIN(TotalDue) FROM Sales.TopSales)
        ORDER BY TotalDue DESC

The execution plans show considerable differences

  • ANY query cost = 3.47092
  • Alternative query cost = 1.24152


ALL

The ALL predicate allows a comparison to be made against all the values returned by a SELECT statement. The following two queries return the same results.

        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > ALL(SELECT TotalDue FROM Sales.TopSales)
        ORDER BY TotalDue DESC

        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > (SELECT MAX(TotalDue) FROM Sales.TopSales)
        ORDER BY TotalDue DESC

Once again the execution plans tell the full story

  • ALL query cost 3.47068
  • Alternative query cost 1.24152


Conclusion for ALL and ANY

ALL and ANY certainly date back to SQL Server 7.0 and possibly before that. I have never found a reason to use them and given my findings I see no benefit to them.

Both predicates insist that their contents are produced from SELECT statements, so where an IN predicate can contain either a static list or a SELECT statement, ANY can only contain a SELECT statement.
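A quick sketch of the difference, reusing the SalesPerson query from above:

        -- IN accepts a hard-coded list of values...
        SELECT *
        FROM Sales.SalesPerson
        WHERE TerritoryID IN (1, 2, 3)

        -- ...whereas ANY must be fed by a subquery
        SELECT *
        FROM Sales.SalesPerson
        WHERE TerritoryID = ANY(
        	SELECT TerritoryID FROM Sales.SalesTerritory WHERE CountryRegionCode='US'
        )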

The best results I could achieve for the ANY predicate simply equaled those for the more flexible IN predicate.

Rounding it all up

ALL and ANY do not seem to offer any benefits to offset their disadvantages.

INTERSECT and EXCEPT do at least seem to offer performance on par with the alternatives, but so far I cannot see a compelling benefit to using them. They are, however, part of the ANSI SQL standard.

May
10

The end of SQL and relational databases ?

Author admin    Category IT News, SQL     Tags

The road to SQL started with Dr. E.F. Codd’s paper, “A Relational Model of Data for Large Shared Data Banks“, published in Communications of the ACM in June 1970.   His colleagues at IBM, Donald Chamberlin and Raymond Boyce were working on a query language (originally named SQUARE, Specifying Queries As Relational Expressions) that culminated in the 1974 paper, “SEQUEL: A Structured English Query Language“.  Since that time, SQL has become the dominant language for relational database systems.  In recent years, frameworks and architectures have arrived on the programming scene that attempt to hide (or completely remove) the use of SQL and relational databases allowing developers to focus even more on user interfaces, business logic and platform support in our application development.  We have also seen the rise of alternatives to relational databases, the so-called “NoSQL” data stores. Are we witnessing the end of SQL and relational databases?

In a December DDJ podcast interview by Mike Riley, I was asked the question: “With the growing trend of ORMs (Object Relational Mapping), some software developers are saying that SQL is becoming less relevant. What are your opinions about that statement?”  Thinking over the holidays about that question and its implications, including and beyond ORMs, I spent some time looking more closely at frameworks like Ruby on Rails Active Record and Hibernate.  These frameworks still rely on team members who work on the design, development and maintenance of relational databases.  The work that Microsoft has done with LINQ also reduces the impedance mismatch between programming languages and database languages.

The “NoSQL movement” and Cloud-based data stores are striving to completely remove developers from a reliance on the SQL language and relational databases.  Some developers think that the NoSQL movement is something completely new.  Object databases appeared in the 1980s and Ray Ozzie started the commercial document-centric datastore business with Lotus Notes in the 1990s.  Charlie Caro, Senior Software Engineer who works on Embarcadero’s InterBase SQL database engine, told me “At the time, it seemed almost heretical that a data model that ignored concurrency control could flourish. But Ozzie recognized the benefits of distribution, replication and ease-of-setup would outweigh the very infrequent occurrence of update conflicts in managing documents and messages. Even so, there were opt-in capabilities if documents needed to be locked for safe modification without data loss. But the default was to just let it fly with no update coordination.”

NoSQL, according to WikiPedia, is “an umbrella term for a loosely defined class of non-relational data stores”.  The term was first used by Rackspace employee Eric Evans.  In his blog post last October, Eric talks about the name NoSQL (now generally known to mean Not Only SQL).  The real nugget in the post is “the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for.”  Adam Keys in his The Real Adam blog post offered up an alternative term, “Post-Relational”.  One goal of some NoSQL databases is to remove the overhead and memory footprint of relational databases.  Other goals of NoSQL databases include: closer affinity with programming languages, use of web technologies and RPC calls for access and optional forms of data query.

In a recent blog post, “The “NoSQL” Discussion has Nothing to Do With SQL” Professor Michael Stonebraker discusses comparisons of SQL and NoSQL databases.  SQL and NoSQL databases can be compared using any or all of the following features and characteristics.  (Note: there are many more that can be added to this list for either or both types of databases. Post a comment with additional items that might characterize the differences between the two types):

  • Vertically and Horizontally Scalable – relational databases (traditionally) reside on one server which can be scaled by adding more processors, memory and storage to provide scalability.  Relational databases residing on multiple servers usually use replication to keep the databases synchronized.  NoSQL databases can reside on a single server but more often are designed to work across a cloud of servers.  (NoSQL: Distributed and Scalable Non-Relational Database Systems)
  • Columns, Key/Value Store, Tuple Store – relational databases are usually comprised of a collection of columns in a table and/or view (fixed schema, join operations).  NoSQL databases often store a combination of key and value pairs or Tuples (schema free, an ordered list of elements).
  • In-Memory Dataset, On-Disk Storage – relational databases almost always reside on a disk drive or a storage area network.  Sets of database rows are brought into memory as part of SQL select or stored procedure operations.  Some (but not all) of the NoSQL databases are designed to exist in memory for speed and can be persisted to disk.
  • Document-Oriented, Collection-Oriented, Column-Oriented, Object-Oriented, Set-Oriented, Row-Oriented – Document-oriented databases contain records of documents, fields and XML.  Collection-oriented datasets provide closer affinity to object-oriented programming languages.  Relational databases are characterized by data organized in tables, rows and columns (Column-Oriented).  SQL select operations normally return cursors to a single row or a set of rows containing the columns specified.  Object-Oriented databases have been around since the dawn of object-oriented programming, but relational databases are by far the dominant database architecture in use today (and for years to come).  Are object databases NoSQL databases?  The rise of object-relational mapping (ORM) frameworks marries object-oriented programming to the larger number of relational databases. In NoSQL databases, data is often organized into objects, key/value pairs or tuples.  NoSQL database selection operations are often done in code or through an interface.

In an email conversation, Charlie Caro told me the following: “If Facebook has to manage 100,000,000′s of user profiles, a distributed, context-free key-value store is probably the way to go. Simple lookup by large numbers of users but only a single updater of the profile by the owning user means a transactional database is probably overkill. Multiple readers, with one updater, needs no concurrency control.  In many cases, the simpler setup and ease-of-use of the NoSQL solutions is what attracts its user community. SQL databases require more setup (schema et al) but those schemas are exactly what give relational databases higher performance opportunity in parallel DBMS implementations. This ease of use premium finds its way in programming languages as well. Many developers today prefer scripting languages over their compiled counterparts, which have static type checking for safer programs. Scripting languages are just so forgiving and easy to get started with and projects to compile scripts into .NET/Java intermediate byte codes are increasing their performance.”  We both agreed that it’s really all about having the right tool for the job, and it always has been!  No sense hammering a screw into the wall when you can use a screwdriver.

For years to come, most of us will still rely on and use relational databases and SQL.  I certainly will.  I will also continue to look for better ways to isolate and encapsulate data access in my applications.  As always, any project decisions have to match with the user and business requirements.  For new projects, I believe, we have genuine non-relational alternatives on the table (pun intended).

I covered some background on the SQL language and relational databases, the current and future for relational databases, the rise of frameworks that hide some of the complexities of database programming and the rise of the NoSQL databases.  In this second installment I will take a walk through (some of) the currently available open and closed source NoSQL databases.  Then, in part 3, I will point you to NoSQL Internet resources, past/upcoming events and offer some guidance for developers.

It’s amazing to see how many open and closed source alternative data stores have appeared.   At the same time more are appearing every day. If I have left off one or more of your favorite NoSQL databases post a comment.  Below you will find many different types of NoSQL databases:  Document-Oriented, Collection-Oriented, Column-Oriented, Object-Oriented, Graph-Oriented, Set-Oriented, Row-Oriented and more.

AllegroGraph

Company/Org: Franz Inc.
Type: Graph
Description: Modern, high performance, persistent graph database.
Storage: Disk based, meta-data and data triples.
API(s): SPARQL, Prolog

BerkeleyDB

Company/Org: Oracle
Type: Key/Value
Description: C language embeddable library for enterprise-grade, concurrent, transactional storage services. Thread safe to avoid data corruption or loss
Storage: B-tree, hash table, persistent queue
API(s): C, C++ and Java
Notes: Use the BerkeleyDB XML layer on top of BerkeleyDB for XML-based applications. Comparison of BerkeleyDB and relational databases

BigTable

Company/Org: Google
Type: Sparse, distributed, persistent multidimensional sorted map.
Description: Distributed storage system for structured data. Data model provides dynamic control over data layout and format. Data can live in memory or on disk.
Storage: Data is stored as an uninterpreted array of bytes. Client applications can create structured and semi-structured data inside the byte arrays.
API(s): Python, GQL, Sawzall API, REST, various.
Notes: Overview: Bigtable: A Distributed Storage System for Structured Data (PDF format)

Cassandra

Company/Org: Apache
Type: Dimensional hash table
Description: Highly scalable distributed database. Combines Dynamo’s distributed design and Bigtable’s column family data model.
Storage: Clusters of multiple keyspaces. The keyspace is a name space for column families. Columns are comprised of a name, value and timestamp.
API(s): Java, Ruby, perl, Python, C#, Thrift framework.
Notes: Open sourced by Facebook in 2008. Wiki, FAQ, Examples

CouchDB

Company/Org: Apache
Type: Document
Description: Distributed database with incremental replication, bi-directional conflict detection and management.
Storage: Ad-hoc and schema-free with a flat address space.
API(s): RESTful JSON API. JavaScript query language.
Notes: CouchDB Introduction, Technical Overview

db4o

Company/Org: Versant
Type: Object
Description: Java and .NET dual license (commercial and open source) object database.
Storage: Data objects are stored in the way they are defined in the application.
API(s): Java, .NET languages.
Notes: db4o db4o database runtime engine, about db4o

Dovetaildb

Company/Org: Millstone Creative Works
Type: JSON-based
Description: Schemaless database similar to Amazon’s SimpleDB. Open source, standalone Java application server.
Storage: JSON data format, “bags” (similar to tables).
API(s): HTTP and Javascript APIs
Notes: Dovetaildb JavaScript API reference manual

Dynomite

Company/Org: Cliff Moon
Type: Key/Value
Description: Open source Amazon Dynamo clone written in Erlang.
Storage: Distributed key/value store, pluggable storage engines.
API(s): Thrift API
Notes: Dynomite Wiki

eXtreme Scale

Company/Org: IBM
Type: In-memory grid/cache
Description: Distributed cache processes, partitions, replicates and manages data across servers.
Storage: Data and database cache, “near cache” for local subset of data. Java persistent cache. Map reduce support.
API(s): Java APIs, REST data service
Notes: eXtreme Scale Document library web site

GT.M

Company/Org: FIS
Type: Hierarchical, multi-dimensional sparse arrays, content associative memory
Description: Small footprint, multi-dimensional array with full support for ACID transactions, optimistic concurrency and software transactional memory.
Storage: Unstructured array of bytes. Can be Key/Value, document oriented, schema-less, dictionary or any other data model.
API(s): Mumps, C/C++, SQL
Notes: GT.M FAQ

hamsterDB

Company/Org: Christoph Rupp
Type: Embedded storage library
Description: Lightweight embedded database engine. Supports on disk and in memory databases.
Storage: B+tree with variable length keys.
API(s): C++, Python, .NET and Java
Notes: hamsterdb FAQ, examples, tutorial

HBase

Company/Org: Apache
Type: Sparse, distributed, persistent multidimensional sorted map.
Description: Open source, distributed, column-oriented, “Bigtable like” store
Storage: Data row has a sortable row key and an arbitrary number of columns, each containing arrays of bytes.
API(s): Java API, Thrift API, RESTful API
Notes: Part of Apache Hadoop project. HBase Wiki, FAQ

Hypertable

Company/Org: Zvents Inc.
Type: Sparse, distributed, persistent multidimensional sorted map.
Description: High performance distributed data storage system designed to run on distributed filesystems (but can run on local filesystems). Modeled after Google Bigtable.
Storage: Row key (primary key), column family, column qualifier, time stamp.
API(s): C++, Thrift API, HQL
Notes: Hypertable Architectural overview, FAQ

Infinispan

Company/Org: JBoss Community
Type: Grid/Cache
Description: Scalable, highly available, peer to peer, data grid platform.
Storage: Key/Value pair with optional expiration lifespan.
API(s): Java, PHP, Python, Ruby, C
Notes: Infinispan FAQ, Wiki

InfoGrid

Company/Org:
Type: Graph
Description: Internet graph database made up of nodes and edges. Supports in-memory and persistent storage alternatives including RDBMS, file system, file grid, and custom storage.
Storage: Nodes (meshobjects) and edges (relationships). Meshobjects can have entity types and properties and can participate in relationships. MeshObjects raise events.
API(s): RESTful web services.
Notes: InfoGrid Overview, FAQ

Keyspace

Company/Org: Scalien
Type: Key/Value
Description: Distributed (master/slave) key-value data store delivering strong consistency, fault-tolerance and high availability.
Storage: Uses the BerkeleyDB library for local storage. Key/value pairs and their state are replicated to multiple servers.
API(s): C/C++, Python, PHP, HTTP
Notes: Keyspace Overview, FAQ

MemcachedDB

Company/Org:
Type: Key/Value
Description: High performance, high reliability persistent storage engine for key/value object storage.
Storage: Uses BerkeleyDB as storage library/backend.
API(s): Memcache protocol, C, Python, Java, perl
Notes: MemcacheDB complete guide (PDF format)

Mnesia

Company/Org: Ericsson
Type: Key/Value
Description: Multiuser distributed database including support for replication and dynamic reconfiguration.
Storage: Organized as a set of tables made up of Erlang records. Tables also have properties including type, location, persistence, etc.
API(s): Erlang
Notes: Mnesia Reference manual

MongoDB

Company/Org: 10gen
Type: Document
Description: Scalable, high-performance, open source, schema-free, document-oriented database
Storage: JSON-like data schemas, dynamic queries, indexing, replication, MapReduce
API(s): C,C++, Java, JavaScript, perl, PHP, Python, Ruby, C#, Erlang, Go, Groovy, Haskell, Scala, F#
Notes: MongoDB Documentation Index

Neo4J

Company/Org: Neo Technology
Type: Graph
Description: Embedded, small footprint, disk based, transactional graph database written in Java. Dual license – free and commercial.
Storage: Graph-oriented data model with nodes, relationships and properties.
API(s): Java, Python, Ruby, Scala, Groovy, PHP, RESTful API.
Notes: Neo4J Wiki, API, FAQ

Redis

Company/Org:
Type: Key/Value
Description: Key/Value store with the dataset kept in memory and saved to disk asynchronously. “not just another key-value DB”
Storage: Values can be strings, lists, sets and sorted sets.
API(s): Python, Ruby, PHP, Erlang, Lua, C, C#, Java, Scala, perl
Notes: Redis Wiki

SimpleDB

Company/Org: Amazon
Type: Item/Attribute/Value
Description: Scalable Web Service providing data storage, query and indexing in Amazon’s cloud.
Storage: Items (like rows of data), Attributes (like column headers), and Values (can be multiple values)
API(s): SOAP, REST
Notes: SimpleDB FAQ, Getting Started Guide, Developer Guide, API

Tokyo Cabinet

Company/Org: Mikio Hirabayashi
Type: Key/Value
Description: Library (written in C) of functions for managing files of key/value pairs. Multi-thread support.
Storage: Keys and Values can have variable byte length. Binary data and strings can be used as a key and a value.
API(s): C, perl, Ruby, Java, Lua.
Notes: Tokyo Cabinet Specifications, presentation (PDF format). Also available: Tokyo Tyrant (remote service), Tokyo Distopia (full text search), Tokyo Promenade (content management).

Voldemort

Company/Org: LinkedIn
Type: Hash Table
Description: “It is basically just a big, distributed, persistent, fault-tolerant hash table.” High performance and availability.
Storage: Each key is unique to a store. Each key can have at most one value. Supported types: JSON, string, identity, protobuf, java-serialization.
API(s): Java, C++, custom clients
Notes: Project Voldemort Wiki, Client how-to

It’s one thing to have lots of choices for non-relational databases.  Building up a NoSQL knowledge and experience base will definitely help managers, architects and developers compare and contrast what they already know about relational databases. Relational databases and the SQL language are still the architecture and lingua franca for the design, development and management of database applications.  While we are still at the beginning of the use of databases in cloud infrastructures, we can move forward faster because of all of the work and collaboration that is taking place.  Depending on the user and business requirements, we can choose between existing relational database technologies or the NoSQL alternatives.

Stay tuned to the conclusion (part three) of this blog series where I’ll give you links to additional NoSQL Internet resources, past/upcoming events, offer some guidance for developers and highlight some of the comments and answer some of the questions from the first two parts.

If you are looking for more information about NoSQL and non-relational databases, check out the following sites, blog posts, and articles:

There are several upcoming and recent NoSQL conferences and meet ups with great information and solutions for architects and developers.  Here are just a few:

It has been great to see the comments and suggestions that have been posted on my Computerworld blog and also on Digg.  Thank you to everyone for joining the discussion about relational and non-relational database development.  Here are excerpts from the many comments.

  • Emil Eifrem (Neo4j) commented: “You talk about scaling to size and handling Facebook’s 100M user profiles. That’s an important use case and one that for example a key-value store handles brilliantly. But it turns out most companies aren’t Facebook. You can categorize the four emerging categories of NOSQL databases (key-value stores, column family stores, document dbs and graph databases) along the axes of scaling to size and scaling to complexity. For more information about that, see this blog post. Graph databases (like e.g. Neo4j, which I’m involved with, or Sones) excels at representing complex and rapidly evolving domain models and then traversing them with high performance.”
  • Mongo-DB Developer commented: “We have seen the most common use case to date being use of nosql solutions as operational data store of web infrastructure projects. By operational, I mean, problems with real time writes and reads (contrast with data warehousing with bulk occasional loading). For these sort of problems these solutions work well and also fit well with agile development methods where the somewhat ‘schemaless’ (or more accurately, columnless) nature of some of the solutions, and the dynamically typed nature of the storage, really helps.”
  • Peter R commented: “I have already seen, in the domain I work in, the movement away from straight up SQL databases. XML databases are one technology that will be stealing a lot of SQL’s thunder (if they haven’t already). Do I think SQL will ever die? No. But the key is that there will be/are more options that need to be thought about when designing a system now.”
  • Anonymous commented: “I agree object databases have a purpose. They are great for large datasets that need to be replicated and called by a key. However SQL provides a very important capability and that it is to be able to query data across a number of datasets very efficiently, this will be very hard to duplicate in a simple key value database.”
  • Johannes Ernst commented: “One of the difficulties for “normal” developers with many of the NoSQL technologies that you’ve described so far has been the learning curve and the additional work required: e.g. it’s easy and everybody knows how to put “every customer can place one or more orders” into a relational database, but what if the only thing you have is keys and opaque values? Compared to many other NoSQL alternatives, graph databases provide a high level of abstraction, freeing developers to concentrate on their application, while still bringing many of the same NoSQL benefits. For example, in InfoGrid (http://infogrid.org/), a project I’m involved in, you can define “Customer” and “Order” and their relationship, and the InfoGrid graph database takes care of storing and retrieving data and enforcing the relationship. In our experience, that makes graph databases much more approachable to developers than many other NoSQL technologies.”
  • Database-ed commented: “The problem is that when folks think about storing information that they need to retrieve, they are so ingrained to SQL that they fail to think of other means. The Facebook example is a case in point. Who is ever going to ask for an accurate report of every user in Facebook? If you miss something the first time you go looking, you can always present it later. The end user doesn’t know you lost it, they assume it didn’t exist at the time and now it does. Yet you still need to store the data for easy retrieval. One problem with SQL is that it ties you into the relationships. Facebook is about letting people build the relationships based on the fields they want to build them on, not the ones you might think of. I know, it can be done within the confines of SQL, but it is a lot harder to do when the size gets large.”
  • Raptor007 commented: “Some tasks that are poorly serviced by SQL may get switched over to a new method, but other implementations that are perfectly suited to SQL will continue using it. As they quoted Eric Evans in the article, “the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for.”
  • Miracle Blue commented: “While I highly doubt there’s going to be any significant migration away from SQL and the like any time soon, I think more web developers will start experimenting with data stores and other data solutions as we move further into the cloud.”
  • TheUnGod commented: “And as companies turn to ask their SQL DBAs what they think of this, they’ll say “lets stick with SQL.” Honestly, there are so many people that support SQL right now that will not switch any time soon this article is just bogus. You can’t make a switch like that until people can support it properly.”
  • SteelChicken commented: “Document centric is pretty dumb if you plan on doing any sort of analytics and data mining. Great for workflow and such.”
  • Angusm commented: “The significance of the NoSQL movement is that it adds new tools that offer better solutions to specific problems. The future probably belongs to NoSQL in the sense of ‘not-only SQL’, rather than ‘no SQL’. Don’t imagine that NoSQL solutions offer a free lunch though. I had an educational experience when I changed a view definition in a CouchDB data store and my first trivial query took an hour to come back. CouchDB can be pleasingly fast when all its indexes are built, but if you have to rebuild those indexes from scratch … well, let’s just say that’s not something you want to do on a live client-facing site.”
  • Afex2win commented: “digg is one of the bigger proponents of Cassandra, a distributed data store in the vein of which the article is talking about. http://about.digg.com/blog/looking-future-cassandra
  • Drmangrum commented: “SQL will be around for awhile. It’s good at doing what it was designed to do. However, there are many times when people use SQL simply because there is nothing better out there. As data complexity rises, a new method for accessing and persisting that data will have to be investigated. Part of the problem with many of the alternate solutions is that few people know how to use them.”

For years to come, most of us will still use relational databases and SQL.  At the same time, I will continue to look for better ways to design and implement databases in my applications.  As always, any project decision has to match with the user and business requirements.  For new projects, I believe, we now have genuine non-relational alternatives on the table (pun intended).

Programming is Life!

<blogs.computerworld.com>

May
7

SQL Server 2005 Paging – The Holy Grail

Author admin    Category SQL, Tips & Trik     Tags

Introduction

The paging and ranking functions introduced in 2005 are old news by now, but the typical ROW_NUMBER OVER() implementation only solves part of the problem.

Nearly every application that uses paging gives some indication of how many pages (or total records) are in the total result set. The challenge is to query the total number of rows and return only the desired records with a minimum of overhead. The holy grail solution would allow you to return one page of the results and the total number of rows with no additional I/O overhead.

In this article, we’re going to explore four approaches to this problem and discuss their relative strengths and weaknesses. For the purposes of comparison, we’ll be using I/O as a relative benchmark.
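The I/O and timing figures quoted below come from SQL Server's standard statistics output; if you want to reproduce them, turning on the statistics switches before each test batch (a sketch, not part of the original scripts) will produce the same kind of messages:

SET STATISTICS IO ON;   -- per-table scan counts and logical/physical reads
SET STATISTICS TIME ON; -- CPU time and elapsed time per statement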

The ‘two-bites’ approach

The most basic approach is the ‘two-bites’ approach. In this approach you, effectively, run your query twice; querying the total rows in one pass, and querying your result set in the second. The code is pretty straightforward:

DECLARE @startRow INT ; SET @startrow = 50
SELECT COUNT(*) AS TotRows
FROM [INFORMATION_SCHEMA].columns

;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

It gives the desired results, but this approach doubles the cost of the query because you query your underlying tables twice:

(1 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘syscolpars’. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘sysschobjs’. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(50 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘syscolpars’. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘sysschobjs’. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The temp table approach

The ‘two-bites’ approach is especially undesirable if your paged query is very expensive and complex. A common workaround is to write the superset into a temporary table, then query out the subset. This is also the most common way to implement paging pre-2005 (in this case, ROW_NUMBER is superfluous).

DECLARE @startRow INT ; SET @startrow = 50
CREATE TABLE #pgeResults(
id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
table_name VARCHAR(255),
column_name VARCHAR(255)
)
INSERT INTO #pgeResults(Table_name, column_name)
SELECT table_name, column_name
FROM [INFORMATION_SCHEMA].columns
ORDER BY [table_name], [column_name]

SELECT @@ROWCOUNT AS TotRows
SELECT Table_Name, Column_Name
FROM #pgeResults
WHERE id between @startrow and @startrow + 49
ORDER BY id

DROP TABLE #pgeResults

Looking at the query plan, you can see that your underlying tables are queried only once but the I/O stats show us that you take an even bigger hit populating the temporary table.

Table ‘#pgeResults_________________________________________________________________________________________________________000000001A9F’. Scan count 0, logical reads 5599, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘syscolpars’. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘sysschobjs’. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 39, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(2762 row(s) affected)

(1 row(s) affected)

(50 row(s) affected)
Table ‘#pgeResults_________________________________________________________________________________________________________000000001A9F’. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In this case, it would be better to query the tables twice. Maybe some new 2005 functionality can yield a better solution.

The COUNT(*) OVER() Approach

OVER() can also be used with Aggregate Window Functions. For our purposes this means we can do a COUNT(*) without the need for a GROUP BY clause, returning the total count in our result set. The code definitely looks much cleaner and, if your application permits it, you can simply return one dataset (eliminating the overhead of writing to a temp table).

DECLARE @startRow INT ; SET @startrow = 50

;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq,
COUNT(*) OVER() AS totrows
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, totrows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

Unfortunately this approach has its own hidden overhead:

Table ‘Worktable’. Scan count 3, logical reads 5724, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘syscolpars’. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘sysschobjs’. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Where did that come from? In this case, SQL Server implements the COUNT(*) OVER() by dumping all the data into a hidden spool table, which it then aggregates and joins back to your main output. It does this to avoid re-scanning the underlying tables. Although this approach looks the cleanest, it introduces the most overhead.

I’ve spent most of today cleaning up an old data-paging proc that is both very inefficient and frequently called enough for me to notice it. I explored probably a dozen other approaches to solving this problem before I came up with the solution below. For the sake of brevity, and because the rest are pretty obscure and equally inefficient, we’ll now skip to the best solution.

The Holy Grail

In theory, ROW_NUMBER() gives you all the information you need because it assigns a sequential number to every single row in your result set. It all falls down, of course, when you only return a subset of your results that doesn’t include the highest sequential number. The solution is to return a second column of sequential numbers, in the reverse order. The total number of records will always be the sum of the two fields on any given row minus 1 (unless one of your sequences is zero-bound).
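To make the arithmetic concrete, using the 2,762-row result set from the temp-table example above: the row with seq = 50 gets totrows = 2762 - 50 + 1 = 2713 from the descending numbering, and 50 + 2713 - 1 = 2762, the true total.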

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq,
ROW_NUMBER() OVER(ORDER BY table_name DESC, column_name desc) AS totrows
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

This approach gives us our page of data and the total number of rows with zero additional overhead! (well, maybe one or two ms of CPU time, but that’s it) The I/O statistics are identical to querying just the subset of records.

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘syscolpars’. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘sysschobjs’. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Compare the stats above with the stats and query below (just returning one page of data).

;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘syscolpars’. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘sysschobjs’. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Conclusion

I have found this approach to be best suited for smaller result sets from complex queries where I/O is the primary bottleneck. Jeff Moden, Peso and others here have pointed out that with larger result sets, the I/O cost you save is more than outweighed by the CPU cost. You definitely want to compare different approaches to find the best solution for your particular problem.

My real goal here was to try and figure out a way to avoid unnecessary I/O overhead. I am sure that this solution is not the last word on the subject and I greatly look forward to hearing your thoughts, experiences and ideas on this topic. Thank you all for reading and for your feedback.
