Monday, October 12, 2009

Push Stack

Push stack inserts an element into the stack. The first thing we need to do when we push data into a stack is find memory for the node; we must therefore allocate it from dynamic memory. Once the memory is allocated, we simply assign the data to the stack node and then set the next pointer to point to the node currently indicated as the stack top. We also need to update the stack top pointer and add 1 to the stack count field. Figure 9 traces a push stack operation in which a new pointer (newptr) is used to identify the data to be inserted into the stack.

To develop the insertion algorithm, we need to analyze three different stack conditions: (1) insertion into an empty stack, (2) insertion into a stack with data, and (3) insertion into a stack when the available memory is exhausted. The first two of these operations are shown in Figure 8. When we insert into a stack that contains data, the new node’s next pointer is set to point to the node currently at the top, and the stack’s top pointer is set to point to the new node. When we insert into an empty stack, the new node’s next pointer is set to null and the stack’s top pointer is set to point to the new node. However, because the stack’s top pointer is null, we can use it to set the new node’s next pointer to null. Thus the logic for inserting into a stack with data and the logic for inserting into an empty stack are identical. The logic for the third case, stack overflow, depends on the application.
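Below is a minimal C sketch of this push logic; the function name pushStack is an assumption, not from the text. It uses the node and stack structures declared in the implementation section that follows, repeated here so the sketch is self-contained.

#include <stdlib.h>

struct node { int data; struct node *next; };
struct stack { int count; struct node *top; };

/* Push data onto the stack; returns 1 on success, 0 if dynamic memory is exhausted. */
int pushStack(struct stack *s, int data)
{
    struct node *newptr = malloc(sizeof(struct node));
    if (newptr == NULL)
        return 0;              /* overflow: what to do next depends on the application */
    newptr->data = data;
    newptr->next = s->top;     /* also correct for an empty stack, where top is NULL */
    s->top = newptr;
    s->count++;
    return 1;
}

Note that the same two pointer assignments handle both an empty stack and a stack that already holds data, as argued above.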

Implementation of Stacks in Memory

We present a linked list method using dynamic memory allocation and pointers. The two structures needed to implement stacks are the head structure and the data node structure. We name them STACK and NODE, respectively. In C, they are defined as follows.

struct node
{
    int data;
    struct node *next;
};

struct stack
{
    int count;
    struct node *top;
} stack1;

These declarations reserve the required memory for the two structures, but no values are assigned to the members of either structure. The following algorithm initializes the stack, that is, assigns values that put it into an empty state, from which it can grow and shrink as the need arises. The situation before and after execution of the algorithm is shown in the first part of Figure 8.
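A minimal C sketch of this initialization, using the stack structure declared above; the function name createStack is an assumption, not from the text.

#include <stddef.h>

/* Set an existing stack head structure to the empty state. */
void createStack(struct stack *s)
{
    s->count = 0;
    s->top = NULL;    /* an empty stack has no top node */
}

After the call, stack1 (or any other stack variable passed by its address) represents an empty stack, ready for push operations.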

Introduction to Stacks

A stack is an ordered, dynamic linear list in which all additions of new elements and deletions of existing elements are restricted to one end, called the top. If a series of data is inserted into a stack and then removed, the order of the data is reversed. This reversal, caused by adding and removing elements only at the top of the stack, gives the stack its special behaviour as a “last in, first out” (LIFO) structure.

A stack is dynamic because it is constantly changing: it expands and shrinks with the passage of time.

The basic stack operations are create stack, push, pop, stack top, empty stack, full stack, stack count, and destroy stack.

Separate Chaining Resolution Using Linked List

A major disadvantage of open addressing is that each collision resolution increases the probability of future collisions. This disadvantage is eliminated in the second approach to collision resolution, known as separate chaining, which uses short linked lists. In separate chaining, each element contains the location of the next synonymous element, forming an ordered collection of data. For example, in Figure 12, array element 001, Sarah Trapp, contains a pointer to the next element, Harry Eagle, which in turn contains a pointer to the third element, Chris Walljasper.
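A rough C sketch of separate chaining follows, under the assumption that each table slot holds the head pointer of a singly linked list of synonyms; the names chainNode, table, and chainInsert are illustrative, not from the text.

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 307                /* prime list size used elsewhere in the examples */

struct chainNode
{
    long key;
    char name[32];
    struct chainNode *next;           /* next synonym that hashed to the same address */
};

struct chainNode *table[TABLE_SIZE];  /* each slot is a list head, initially NULL */

/* Insert a (key, name) pair at the front of its synonym chain. */
void chainInsert(long key, const char *name)
{
    struct chainNode *node = malloc(sizeof(struct chainNode));
    if (node == NULL)
        return;                       /* out of memory */
    node->key = key;
    strncpy(node->name, name, sizeof(node->name) - 1);
    node->name[sizeof(node->name) - 1] = '\0';
    node->next = table[key % TABLE_SIZE];
    table[key % TABLE_SIZE] = node;
}

A search follows the same chain, comparing keys until it finds a match or reaches the end of the list.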

Quadratic Probe

Primary clustering can be eliminated by adding a value other than 1 to the current address. One easily implemented method is the quadratic probe. In the quadratic probe, the increment is the collision probe number squared: for the first probe we add 1², for the second collision probe we add 2², for the third collision probe we add 3², and so forth until we either find an empty element or exhaust the possible elements. To ensure that we don’t run off the end of the address list, we use the modulo of the quadratic sum for the new address. This sequence is shown in Table 1, which for simplicity assumes a collision at location 1 and a list size of 100. From the table, we can see that quadratic probing causes secondary clustering along a collision resolution path.

Address = H(K) + C x i²

where we take C = 1 and i is the collision probe number.
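As a hedged illustration, the formula can be written as a small C function; the name quadraticProbe and its parameters are assumptions.

/* Address to examine after the i-th collision, following Address = H(K) + C x i² with C = 1. */
int quadraticProbe(int homeAddress, int i, int listSize)
{
    return (homeAddress + i * i) % listSize;   /* modulo keeps us inside the address list */
}

For example, with a home address of 1 and a list size of 100, the i-th probe computed this way is (1 + i²) mod 100.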

Pseudorandom Method

In the pseudorandom method, the key is used as the seed in a pseudorandom number generator and the resulting random number is then scaled into the possible address range using modulo division. A common random number generator is shown below.

y = ax + c

This method is also known as MAD which stands for multiply, add and divide. To use the pseudorandom number generator as a hashing method, we set x to the key, multiply it by the coefficient a, and then add the constant c. The result is then divided by the list size with the remainder (see “Modulo-Division Method,”) being the hashed address. Let’s demonstrate the concept with an example from Figure 6. To keep the calculation reasonable, we use 17 and 7 for factors a and c, respectively. Also, the list size in the example is the prime number 307.

y = ((17 * 121267) + 7) modulo 307

y = (2061539 + 7) modulo 307

y = 2061546 modulo 307

y = 41

We will see this pseudorandom number generator again when we discuss collision resolution.
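As a hedged sketch, the calculation above can be expressed as a small C function; the name pseudorandomHash is an assumption, while the factors a = 17, c = 7, and list size 307 follow the example.

/* Multiply, add, and divide (MAD): scale the key into the address range 0..listSize-1. */
long pseudorandomHash(long key, long a, long c, long listSize)
{
    return (a * key + c) % listSize;
}

Calling pseudorandomHash(121267, 17, 7, 307) reproduces the result worked out above.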

Midsquare Method

In midsquare hashing, the key is squared and the address is selected from the middle of the squared number. The most obvious limitation of this method is the size of the key: given a key of 6 digits, the product will be 12 digits, which is beyond the maximum integer size of many computers. Because most personal computers can handle a 9-digit integer, let’s demonstrate the concept with keys of 4 digits. Given a key of 9452, the midsquare address calculation is shown below using a 4-digit address (0000 to 9999).

9452 * 9452 = 89340304 : address is 3403

As a variation on the midsquare method, we can select a portion of the key, such as the middle three digits, and then use them rather than the whole key. Doing so allows the method to be used when the key is too large to square. For example, for the keys in Figure 6, we can select the first three digits and then use the midsquare method as shown below. (We select the third, fourth, and fifth digits as the address.)

379452 : 379 * 379 = 143641 → 364

121267 : 121 * 121 = 014641 → 464

378845 : 378 * 378 = 142884 → 288

160252 : 160 * 160 = 025600 → 560

045128 : 045 * 045 = 002025 → 202

Note that in the midsquare method, the same digits must be selected from the product. For that reason, we consider the product to have sufficient leading zeros to make it the full six digits.
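A hedged C sketch of this variation is shown below: it squares the first three digits of a six-digit key and extracts the third, fourth, and fifth digits of the (conceptually six-digit) product; the name midsquareHash is an assumption.

/* Midsquare variation for six-digit keys: square the first three digits, keep digits 3-5 of the product. */
long midsquareHash(long key)
{
    long prefix = key / 1000;        /* first three digits of the six-digit key */
    long square = prefix * prefix;   /* at most six digits; leading zeros are implied */
    return (square / 10) % 1000;     /* third, fourth, and fifth digits of the product */
}

For example, midsquareHash(379452) squares 379 to get 143641 and returns 364, matching the list above.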

Digit-Extraction Method

Using digit extraction, selected digits are extracted from the key and used as the address. For example, using our six-digit employee number to hash to a three-digit address (000 to 999) we could select the first, third, and fourth digits (from the left) and use them as the address. Using the keys from Figure 6, we would hash them to the addresses shown below.


379452 → 394

121267 → 112

378845 → 388

160252 → 102

045128 → 051
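As a hedged sketch, this particular selection (first, third, and fourth digits of a six-digit key) can be computed with integer division and modulo in C; digitExtractHash is an assumed name.

/* Extract the first, third, and fourth digits (from the left) of a six-digit key. */
long digitExtractHash(long key)
{
    long d1 = key / 100000;          /* first digit  */
    long d3 = (key / 1000) % 10;     /* third digit  */
    long d4 = (key / 100) % 10;      /* fourth digit */
    return d1 * 100 + d3 * 10 + d4;
}

For example, digitExtractHash(379452) returns 394, as in the list above; keys with a leading zero, such as 045128, still work because the first digit simply evaluates to 0.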

Modulo-Division Method

Also known as division remainder, the modulo-division method divides the key by the array size (table size) and uses the remainder as the address. This method gives us the simple hashing algorithm shown below, where tableSize is the number of elements in the array.

address = key modulo tableSize

This algorithm works with any table size, but a table size that is a prime number produces fewer collisions than other table sizes. We should therefore try, whenever possible, to make the array size a prime number.

As the little company begins to grow, we realize that soon we will have more than 100 employees. Planning for the future, we create a new employee numbering system that will handle 1,000,000 employees. We also decide that we want to provide data space for up to 300 employees. The first prime number greater than 300 is 307. We therefore choose 307 as our list (array) size, which gives us a table with addresses that range from 0 through 306. Our new employee table and some of its hashed addresses are shown in Figure 6.

To demonstrate, let’s hash Bryan Devaux’s employee number, 121267.


121267/307 = 395 with remainder of 2

Therefore: hash(121267) = 2
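A minimal C version of this hash is sketched below; the function name hashModulo is an assumption, and the table size 307 follows the example.

#define TABLE_SIZE 307              /* prime list size chosen above */

int hashModulo(long key)
{
    return (int)(key % TABLE_SIZE); /* remainder of the division is the address */
}

With this sketch, hashModulo(121267) evaluates to 2, the address computed for Bryan Devaux above.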

Direct Method

In direct hashing, the key is the address without any hashing manipulation. The structure must therefore contain an element for every possible key. The situations in which you can use direct hashing are limited, but it can be very powerful because it guarantees that there are no synonyms. Let’s look at two applications.

Now let’s take an example. Imagine that a small organization has fewer than 100 employees. Each employee is assigned an employee number between 1 and 100. In this case, if we create an array of 101 employee records, the employee number can be directly used as the address of any individual record. This concept is shown in Figure 5.

Note that not every element in the array contains an employee’s record. In fact, all hashing techniques other than direct hashing require that some of the elements be empty to reduce the number of collisions.

Although this is the ideal method, its application is very limited. For example, we cannot use the National Identity Card Number as the key with this method because the National Identity Card Number is 13 digits. In other words, if we used the National Identity Card Number as the key, we would need an array of 10,000,000,000,000 records but would use fewer than 100 of them. Let’s turn our attention, then, to hashing techniques that map a large population of possible keys into a small address space.

Probing and Probing Sequence

It should be obvious that when we need to locate an element in a hashed list, we must use the same algorithm that we used to insert it into the list. Consequently, we first hash the key and check the home address to determine whether it contains the desired element. If it does, the search is complete. If not, we must use the collision resolution algorithm to determine the next location and continue until we find the element or determine that it is not in the list. Each calculation of an address and test for success is known as a probe. In case of collisions, the open addressing method produces an alternative sequence of addresses. This process of probing produces a probing sequence: the sequence of locations we examine when we attempt to insert a new key into the hashed table, T. The first location in the probe sequence is the hash address, H(K). The second and successive locations in the probe sequence are determined by the collision resolution policy. To guarantee that an empty location is always available in every probe sequence, we define a “full” table, T, to be a table having exactly one empty table entry.

Collisions in Hashing

Generally, the population of keys for a hashed list is greater than the storage area for the data. For example, if we have an array of 50 students for a class in which the students are identified by the last four digits of their National Identity Card Numbers, then there are 200 possible keys for each element in the array (10,000/50).

Because there are many keys for each index location in the array, more than one student may hash to the same location in the array. We call the set of keys that hash to the same location in our list synonyms.

Hash functions H(K) are used to map keys into table addresses so that data can be stored as table entries (K, I) in the hashed table. Almost always we use hashing techniques in which there are many more distinct keys K than there are table addresses, so we encounter situations in which two distinct keys, K1 ≠ K2, map to the same table address, i.e.,

H (K1) = H (K2)

Hash Function

The idea of using a key drawn from a large set K of keys to determine the address of a record within a smaller set L of table addresses leads to the form of a hash function H. A function that performs this job, such as

H (K) = L

is called a hash function or hashing function. Another way to describe hashing is as a key-to-address transformation in which the keys map to addresses in a list. This mapping transformation is shown in Figure 2. At the top of Figure 2 is a general representation of the hashing concept. The rest of the figure shows how three keys might hash to three different addresses in the list. There are various methods or functions used to map keys to addresses, namely the modulo-division method, the midsquare method, the folding method, and so on.

There are several criteria and purposes for hashing keys. First, the hashing function should be simple and quick to compute. Second, it should map a large set of possible keys to a smaller set of table addresses; usually the population of possible keys is much larger than the allocated table size. Third, the hashed addresses should be distributed evenly over the smaller set so that the number of collisions is minimized; collision elimination is not guaranteed, and uneven distribution causes clustering, which is undesirable. Fourth, hashing allows direct access to the specific location in the table to which the key of the record has been hashed; its running-time efficiency is O(1).

A hash table is a list of records in which keys are mapped by a hash function. It is similar to the table, T, described in section 1.0, except that here the table entries (K, I) are generated by an appropriate hash function. A hash table generated by a perfect hash function has no collisions; in this case, all possible keys usually must be known beforehand. This is also known as optimal hashing or perfect hashing.

Hashing Table

A table, T, is an abstract data storage structure that contains table entries that are either empty or are pairs of the form (K, I), where K is a key and I is some data or information associated with the key K.

For example, an identity number can serve as the key, K, while the name, phone number, and other details are the information, I, associated with that key.

A common table operation is table searching, which is an activity in which, given a search key, K, we attempt to find the table entry (K, I) in T containing the key K. Then we may wish to retrieve or update its information, I, or we may wish to delete the entire table entry (K, I). We may also wish to insert a new table entry (K, I). We can enumerate entries in table T, e.g. to print contents of the table.

A table can be represented by various data structures, such as C structs, arrays, and linked lists.

LIMITATION OF RECURSION

Recursion works best when the algorithm uses a data structure that naturally supports recursion. For example, trees are a naturally recursive structure, and recursion works well with them. In other cases, the algorithm itself is naturally suited to recursion; for example, the binary search algorithm lends itself to a natural recursive formulation. On the other hand, not all looping algorithms can or should be implemented with recursion.

Recursive solutions may involve extensive overhead because they use function calls. When a call is made, it takes time to build a stack frame and push it onto the stack. Conversely, when a return is executed, the stack frame must be popped from the stack and the local variables reset to their previous values. Again, this takes time. A recursive algorithm therefore generally runs slower than its nonrecursive implementation.

Each time we make a call we use up some of our memory allocation. If the recursion is deep – that is, if there are many recursive calls – then we may run out of memory. Therefore, the algorithms we have discussed so far (factorial and powers) are better developed iteratively if large numbers are involved. As a general rule, recursive algorithms should be used only when their efficiency is logarithmic.

HOW RECURSION WORKS – STACK FRAMES

When a program calls a subroutine, the current module suspends processing and the called subroutine takes over control of the program. When the subroutine completes its processing and returns to the module that called it, the module wakes up and continues its processing. One important point in this interaction is that, unless changed through call by reference, all local data in the calling module are unchanged. Every local variable must be in the same state when processing resumes as it was in when processing was suspended. Similarly, the parameter list must not be changed: the values of the parameters must be the same before and after a call, unless they are reference parameters.

Let’s look at an example of a simple call and see how it works. Figure 3 contains an algorithm called testPower that prints the value of a number x raised to a power y. To illustrate the function call, we use a separate algorithm, power, to determine the actual value of x^y.

Our little program contains three local variables: the base number, base; the exponent, exp; and the answer, result. The base and exponent values must not be changed by the call to power. When power terminates, it returns the value of x^y, which is stored in the local variable result.
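A hedged C sketch of the two routines is shown below; the Figure 3 algorithm is not reproduced here, so the function bodies and the simple recursive form of power are assumptions.

#include <stdio.h>

/* Compute base raised to exp (exp >= 0). */
long power(long base, int exp)
{
    if (exp == 0)
        return 1;                        /* base case: x^0 = 1 */
    return base * power(base, exp - 1);  /* general case: x^y = x * x^(y-1) */
}

/* Print x raised to the power y; base and exp are unchanged by the call to power. */
void testPower(long base, int exp)
{
    long result = power(base, exp);
    printf("%ld^%d = %ld\n", base, exp, result);
}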

Factorial Function

A repetitive algorithm uses recursion whenever the algorithm calls itself within its own definition. The simplest example of a recursive definition is the factorial function, defined as

n! = 1, if n = 0 (base case)

n! = n x (n – 1)!, if n > 0 (general case)

The decomposition of Factorial(3) is shown in Figure 1. Note that the recursive solution to a problem involves a two-way journey: first we decompose the problem from the top to the bottom, and then we solve it from the bottom to the top. The recursive solution offers conceptual simplicity to both the creator and the reader.
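The definition above translates directly into a recursive C function; the name factorial is an assumption.

/* n! defined recursively: 1 when n == 0, otherwise n * (n - 1)!. */
long factorial(int n)
{
    if (n == 0)
        return 1;                 /* base case */
    return n * factorial(n - 1);  /* general case */
}

A call such as factorial(3) decomposes to 3 * factorial(2), then 2 * factorial(1), then 1 * factorial(0), and the results are multiplied back up on the return path.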

Introduction and definition of Recursion

Recursion is a repetitive process in which an algorithm or function calls itself. C & C++ both support recursive programming.

For the definition of a function or algorithm not to be circular, it must have the following two properties.

1. There must be certain values, called base values, at which the function or algorithm does not call itself; these are used to terminate the recursion.

2. Each time the function or algorithm calls itself, it must be closer to a base value.

Some mathematical functions and problems (a few to mention) that can be solved recursively are

· Factorial

· Fibonacci Number

· Euclid GCD

· Towers of Hanoi

· Binary Search

In running programs, the call stack and its stack frames do this housekeeping job. Besides functions, data structures may also be defined recursively; one of the most important such classes is trees.

Sunday, October 11, 2009

Differences between Records and Linear Arrays

a) A record may be a collection of non-homogeneous data, i.e., the data items in a record may have different data types. In the case of arrays, the data elements are homogeneous.

b) The data items in a record are indexed by attribute names, so there may not be a natural ordering of its elements. In arrays, data items are in natural order as they are sequenced by indices.

Uses of Record Structure:

1. Records are useful for modeling objects that have a number of characteristics. The record data type allows us to collect various types of data about an object and to refer to the whole object by a single name. The different attributes or fields are also referred to by name.

2. Records are also useful for defining other data structures, allowing programmers to combine information describing characteristics of the structure with the storage of the elements.

Note: Data files comprising records and fields as hierarchical components are usually organized as randomized or hashed tables. These tables are implemented with arrays. In this sense, every data element of the array is a record defined as a C/C++ structure.

Records: Record Structures, Representation in Memory, and Records with Variable Lengths

Record: A record is typically used when a data file is organized into a table structure. In such an organization of data, a record belongs to a hierarchy of fields, records, and files. We can define a record as a structured data type made up of a finite collection of not necessarily homogeneous (of the same type) but related data elements. The related data elements bound together in a record structure are called fields or attributes. A file is a collection of similar records. Each data item may itself be a group item composed of sub-items; items that are indecomposable are called elementary items or atoms. Fields in a record are named and used as identifiers.

Record and array structures have some similar and some different properties as described below.

Similarities are:

a) Like arrays, records are accessed directly through the use of field selectors, also called identifiers. In case of arrays, an element is accessed by an index number.

b) A record, like an array, occupies a consecutive block of cells in memory. The record’s accessing function calculates the location or address from a named field selector, called identifier.

ADDRESS CALCULATION OF ARRAY ELEMENT A [J][K]

The computer keeps track of Base(A), the address of the first element A[0][0] of A[M][N], and computes the address Loc(A[J][K]) using one of the following formulas.

Column-Major Order:

Loc(A[J][K]) = Base(A) + W [M x K + J]

OR

Row-Major Order:

Loc(A[J][K]) = Base(A) + W [N x J + K]

W denotes the size, i.e., the number of bytes per data element of the array A; M is the total number of rows; and N is the total number of columns in the array.
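A hedged C sketch of the two formulas is shown below; the function names rowMajorAddress and columnMajorAddress are assumptions, and base is treated as a byte address.

/* Byte address of A[j][k] in an m x n array whose elements occupy w bytes each. */
long rowMajorAddress(long base, int w, int n, int j, int k)
{
    return base + (long)w * ((long)n * j + k);   /* Base(A) + W [N x J + K] */
}

long columnMajorAddress(long base, int w, int m, int j, int k)
{
    return base + (long)w * ((long)m * k + j);   /* Base(A) + W [M x K + J] */
}

C stores two-dimensional arrays in row-major order, so the first formula is the one a C compiler effectively uses.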

Two – Dimensional Arrays

A two-dimensional M x N array A[M][N] is a collection or list of M x N data elements such that each element is specified by a pair of integers (such as J, K), called subscripts, with the property that

LB of Row Index ≤ J ≤ UB of Row Index

and LB of Column Index ≤ K ≤ UB of Column Index

In C/C++, lower bounds, LB, are zero. The element of A with first subscript J and second subscript K will be denoted by A [J][K].

Two-dimensional arrays are analogous to matrices in mathematics and tables in business. In them, data can be logically viewed as a table with rows and columns.

A 2-dimensional array can be declared as A[4][5] in C/C++.

Like a 1-dimensional array, a 2-dimensional array is a linear and structured data type made up of a finite, fixed-size collection of ordered homogeneous elements. Its accessing mechanism is direct access.

All characteristics of a 1-dimensional array, such as upper bound (UB), lower bound (LB), base address, and size (W), apply as well to a 2-dimensional (or multidimensional) array.

Number of elements of a 2-dimensional array =

(UB of row index – LB of row index + 1) x (UB of column index – LB of column index + 1)

In the case of an array A[4][5]:

no. of elements = 4 (rows) x 5 (columns) = 20

no. of bytes = no. of elements x bytes per element

= 20 x 2 bytes per data element = 40 bytes, if each element uses two memory bytes.

Representation of Linear Arrays in Memory (Dope Vector Method)

To implement array data structure, memory bytes must be reserved and the accessing functions must be coded. In case of linear arrays, the declaration statements tell how many cells are needed to store the array. The following characteristics of the array are used to calculate the number of cells needed and to find the location or address of any element of the array.

1. The upper bound (UB) of the index range.

2. The lower bound (LB) of the index range. In C/C++, LB is zero.

3. The location in memory of the first byte in the array, called base address of the array (Base)

4. The number of memory bytes needed for each cell containing one data element in the array (size, denoted by W)

By cell we mean a unit of memory bytes that will be assigned to hold a value of respective data element.

During compilation of the program, the information about the characteristics of the array is stored in a table called a DOPE VECTOR. When the compiler comes across a reference to an array element, it uses this information to calculate the element’s location in memory at runtime.

Linear or One-dimensional Arrays

An array (of any dimension) is a linear and structured data type with the following properties and characteristics:

1. It is made up of a finite number n of homogeneous (same-type) data elements such that

a. Every data element is of fixed size. The size depends upon the specific nature and structure of the data elements.

b. The elements of the array are referenced respectively by an index set consisting of n consecutive numbers.

c. The elements of the array are stored respectively in successive contiguous memory locations. An array is a data structure that uses memory statically, unlike a linked list, which is built on dynamic memory allocation.

2. The accessing mechanism of an array is direct access, which means we can access any element directly, without first accessing the preceding elements. The desired element is specified using an index, which gives its relative position in the collection. e.g.

A[0], A[1], ..., A[n-1]

3. Arrays are built-in data types in most programming languages, including C and C++.

4. The length or size or number of data elements of the array, A[LB:UB], can be obtained from the index set by the formula.

Length of Array A = UB – LB + 1

where UB is the largest index, called the upper bound, and LB is the smallest index, called the lower bound, of the array. In C and C++ languages, LB is always zero.

Data Structure

A structure, usually in computer memory, used to organize data and information for better algorithm efficiency is called a data structure. Examples are arrays, linked lists, queues, stacks, and trees.

In the design of computer programs, the choice of data structure is a primary design consideration. The quality of final results depends heavily on choosing the best data structure. The choice of appropriate data structure is crucial and has given rise to many design methods and programming languages. Object oriented languages such as C++ and Java are one group of languages that exhibit this philosophy, and have several in-built structures.

A data structure can also be defined as an aggregation of atomic and structured data types into a set with defined relationships. Structure means a set of rules that hold the data together. In other words, if we take a combination of data types and fit them into a structure such that we can define its relating rules, we have made a data structure. Data structures can be nested. We can have a data structure that consists of other data structures.

Model For An Abstract Data Type

The ADT model is shown in Figure 2. The dark shaded area with an irregular outline represents the model. Inside the abstract area are two different aspects of the model: the data structure and the operational functions. Both are entirely contained in the model and are not within the user’s scope. However, the data structure is available to all of the ADT’s operations as needed, and an operation may call on other functions to accomplish its task. In other words, the data structure and the functions are within scope of each other.

Abstract Data Types (ADT)

A data type consists of two parts: a set of data with identical properties and the operations that can be performed on that data. Thus we see that the integer type consists of values (whole numbers) and operations (add, subtract, multiply, divide, etc.).

Abstract data types (ADTs) use the ideas of abstraction and information hiding with respect to implementation details. Applications that use the data type are oblivious to the implementation: they only make use of operations defined or specified abstractly. In this way, the application, which might be millions of lines of code, is completely isolated from the implementation. With an ADT, applications are not concerned with how a task is done but rather with what the type can do. With the use of ADTs, our programs divide into two pieces or layers.

Data Types

There are two types of data, namely, atomic and structured (also called composite) data.

Atomic data are data that we choose to consider as a single, non-decomposable entity. Boolean or logical data are examples. The integer 4562 may be considered a single integer value. We can decompose it into single digits (4, 5, 6, 2), but the decomposed digits do not have the same characteristics as the original integer.

An Atomic Data Type is a set of atomic data with identical properties. These properties distinguish one atomic data type from another. Atomic data types are defined by a domain or set of values and a set of operations that act on the values. These are data types that are defined without imposing any structure on their values.

Data and Information

Data are collected about an entity, which can be an object or event (real or abstract) of interest to the user. An entity may be a person, place, or thing, e.g., a salesperson, a city, or a product. An entity can also be an event or unit of time, such as a machine breakdown, a sale, or a month.

Data remain raw facts in isolation unless they are manipulated to be useful. These isolated facts may convey meaning, but generally they are not useful by themselves.

For example, consider a database that stores data about a BEIT class. Data records contain students’ names, dates of birth, addresses, courses, and grades in each course. If the principal calls to find out the total number of students enrolled in the BEIT class, the clerk can answer the question by looking at the database; to the clerk, the database is information. But the director of the institute wants to know the total number of students in the data structures class who completed it with an A grade. The director has to identify the students in the class database and then identify the students with A grades. The information given to the principal is data to the director; it produced useful information for the director only when it was processed further to output the list of those students who completed the data structures course with A grades. This example shows that one person’s information may be another person’s data. In this sense, information eliminates uncertainty about a state, while data are accumulated but unorganized facts.