Syllabus Content
Introduction to Data Structure and Algorithm Analysis:
• Data Structure Definition and Classification
• Algorithm Analysis
• Storage Representation of Strings
• Text Handling and KWIC Indexing
What is a Data Structure?
• Data Structure: A data structure is a way of storing and organizing data in a computer so that it can be used efficiently.
• Data structures are used in almost every program and software system.
• A data structure usually involves two things:
1. The type of structure we use to store the data.
2. How efficiently we perform operations on the data stored in that structure, so that we can reduce execution time and memory usage.
Data and Information
• Data: Data is a collection of raw facts that may or may not be meaningful.
• Information: Meaningful data is known as information.
• Example:
– Program: a set of instructions.
– 67.8: weight of a person.
– 13/3/1999: date of birth of a person.
Define the following terms:
(a) Cell:
• The smallest fundamental structural unit that represents a data entity.
• A cell is a memory location that stores an element of a data item.
(b) Field: A field is used to store a particular kind of data.
(c) Record: A record is a collection of related data items. For example: employee details.
Types of Data Structure
Primitive Data Structure
Data structures that are directly operated on by machine instructions are known as primitive data structures. The following are primitive data structures:
• Integer:
• An integer represents a whole-number value; any countable quantity can be represented by an integer.
• The set of integers is: {…, -(n+1), -n, …, -2, -1, 0, 1, 2, …, n, n+1, …}
• One way to represent integers is the sign and magnitude method, in which a sign symbol is placed in front of the magnitude of the number (in binary, the leftmost bit is the sign bit: 0 for positive, 1 for negative).
• Real:
• A number having a fractional part (i.e. a decimal point) is called a real number.
• The common way of representing a real number is normalized floating-point representation.
• In this method the real number is expressed as a combination of a mantissa and an exponent.
• Ex: 123.45, which in normalized decimal form is 0.12345 × 10^3 (mantissa 0.12345, exponent 3).
• Character:
• The character data structure is used to store non-numeric information.
• It can be letters [A-Z], [a-z], operators and special symbols.
• A character is represented in memory as a sequence of bits.
• The two most commonly known character sets supported by computers are ASCII and EBCDIC.
• Logical:
• A logical data item is a primitive data structure that can assume only one of two values: "true" or "false".
• The most commonly used logical operators are AND, OR and NOT.
• Pointer:
• A pointer is a variable that holds a memory address; this address is the location of another variable in memory.
• It provides a homogeneous method of referencing any data structure, regardless of the structure's type or complexity.
• Another characteristic is that pointers allow faster insertion and deletion of elements, since elements can be relinked instead of shifted.
Non Primitive Data Structure:
Data structures that are not directly operated on by machine instructions are known as non-primitive data structures. The following are non-primitive data structures:
• Array:
– An array is an ordered set that consists of a fixed number of objects.
– Insertion and deletion operations cannot be performed on an array.
– We can only change the value of an element in an array.
• List:
• A list is an ordered set that consists of a variable number of elements or objects.
• Insertion and deletion can be performed on a list.
• File:
• A file is a large list that is stored in the external memory of a computer.
• A file may be used as a repository for list items, commonly called records.
Non-primitive data structures are classified into two categories:
• Linear data structure
• Non-linear data structure
Linear Data Structure
In a linear data structure, data items are processed in a linear fashion.
• Data are processed one by one, sequentially.
• Examples of linear data structures are:
(A) Array (B) Stack (C) Queue (D) Linked List
(A) Array: An array is an ordered set that consists of a fixed number of objects.
(B) Stack: A stack is a linear list in which insertion and deletion operations are performed at only one end of the list, called the top, so the last element inserted is the first one deleted (LIFO).
(C) Queue: A queue is a linear list in which insertion is performed at one end, called the rear, and deletion is performed at the other end, called the front, so the first element inserted is the first one deleted (FIFO).
(D) Linked List: A linked list is a collection of nodes.
• Each node has two fields:
1. Information
2. Address, which contains the address of the next node
Non-Linear Data Structure
• In a non-linear data structure, data items are not processed in a linear fashion.
• Examples of non-linear data structures are: (A) Tree (B) Graph
(A) Tree: In a tree, data items have a hierarchical relationship.
(B) Graph: A graph represents relationships between pairs of elements.
Operations on Data Structure:
• Traversing: Accessing each element of the data structure and doing some processing on it.
• Create: Reserving memory for program elements.
• Selection: Accessing a particular data item within a data structure.
• Inserting: Inserting new elements into the data structure.
• Deleting: Deleting specific elements.
• Searching: Searching for a specific element.
• Sorting: Arranging the data in ascending or descending order.
• Merging: Combining two data structures into one.
• Splitting: Partitioning a single list into multiple lists.
Algorithms
• An algorithm is a stepwise solution to a problem.
• An algorithm is a finite set of instructions that is followed to accomplish a particular task.
• In mathematics and computing, an algorithm is a procedure for accomplishing some task that terminates in a defined end-state.
• The computational complexity and efficient implementation of an algorithm are important in computing, and both depend on choosing a suitable data structure.
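As a rough sketch, the stack, queue and linked list defined above can be written in Python (chosen here only for brevity). The names `Stack`, `Queue`, `Node` and `traverse` are illustrative assumptions, not taken from the syllabus: a Python list stands in for the stack's storage, `collections.deque` for the queue, and an object reference plays the role of a node's address field.

```python
from __future__ import annotations

from collections import deque
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """Linear list in which insertion (push) and deletion (pop) happen at one end, the top."""

    def __init__(self) -> None:
        self._items: list[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)  # insert at the top

    def pop(self) -> T:
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()  # delete from the top: last in, first out


class Queue(Generic[T]):
    """Linear list: insertion at the rear, deletion at the front."""

    def __init__(self) -> None:
        self._items: deque[T] = deque()

    def insert(self, item: T) -> None:
        self._items.append(item)  # insert at the rear

    def delete(self) -> T:
        if not self._items:
            raise IndexError("delete from empty queue")
        return self._items.popleft()  # delete from the front: first in, first out


class Node(Generic[T]):
    """Linked-list node with two fields: information and a link to the next node."""

    def __init__(self, info: T, link: Optional[Node[T]] = None) -> None:
        self.info = info  # information field
        self.link = link  # "address" field: reference to the next node, or None at the end


def traverse(head: Optional[Node[T]]) -> list[T]:
    """Visit every node from head to the end, collecting the information fields."""
    values: list[T] = []
    node = head
    while node is not None:
        values.append(node.info)
        node = node.link
    return values
```

For example, after pushing 1, 2 and 3, `pop()` returns 3; a queue that receives "a" and then "b" deletes "a" first; and `traverse(Node(10, Node(20)))` gives `[10, 20]`.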
Properties required for an algorithm
• Finiteness: "An algorithm must always terminate after a finite number of steps."
• Definiteness: "Each step of an algorithm must be precisely defined."
• Input: "Quantities which are given to it initially before the algorithm begins."
• Output: "Quantities which have a specified relation to the inputs."
• Effectiveness: "All of the operations to be performed in the algorithm must be sufficiently basic that they can in principle be done exactly and in a finite length of time."
Complexity of Algorithms
• Time complexity: The running time of the program, as a function of the input size.
• Space complexity: The amount of computer memory required during program execution.
The actual running time and memory use also depend on many factors, such as the hardware, operating system and processor.
Calculate Time Complexity of Algorithm
• Time complexity is most commonly estimated by counting the elementary operations performed by the algorithm.
• Since the algorithm's performance may vary with different types of input data, we usually use the worst-case time complexity of an algorithm, because that is the maximum time taken for any input of a given size.
Calculating Time Complexity
Usually, the time required by an algorithm falls into three cases:
• Worst case: the input for which the algorithm takes the longest time.
• Average case: the average time the algorithm takes over all inputs of a given size.
• Best case: the input for which the algorithm takes the least time.
Storage Representation of String
• String -> "njsmti"
• Representation -> 'n' 'j' 's' 'm' 't' 'i' '\0'
• In Memory -> each character is stored as its ASCII code, i.e. in binary.
Text Handling
• Align:
– Center
– Justify
– Left
– Right
• Tab Setting
• TAB PRINT (/)
• Text Form
KWIC Indexing
The KWIC (KeyWord In Context) index system accepts an ordered set of lines. Each line is an ordered set of words, and each word is an ordered set of characters. Any line may be "circularly shifted" by repeatedly removing the first word and appending it to the end of the line. The KWIC index system outputs a list of all circular shifts of all lines in alphabetical order.
• Input: strings, each of which consists of several words.
– Clouds are white.
– Ottawa is beautiful.
• Output: a sorted list of all circular shifts of each input string.
– are white Clouds
– beautiful Ottawa is
– Clouds are white
– is beautiful Ottawa
– Ottawa is beautiful
– white Clouds are
• Advantages:
– Efficient representation of data (since the data are shared)
• Disadvantages:
– Changes to the data format affect several modules.
– It is difficult to implement changes/enhancements in the overall processing algorithm.
– It doesn't support reuse.
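The worst, average and best cases described above can be illustrated by counting the elementary operations of a simple algorithm. The sketch below assumes linear search over a list of distinct values (an example chosen here, not taken from the slides) and treats one key comparison as the elementary operation; `linear_search` is a hypothetical helper name.

```python
from __future__ import annotations


def linear_search(items: list[int], key: int) -> tuple[int, int]:
    """Return (index, comparisons); index is -1 if key is not in items."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1  # elementary operation: compare one element with the key
        if value == key:
            return index, comparisons
    return -1, comparisons


data = [7, 3, 9, 1, 5]

best = linear_search(data, 7)   # key is the first element: 1 comparison
worst = linear_search(data, 4)  # key is absent: all n = 5 elements compared
# Average case, assuming the key is equally likely to be any element of data:
average = sum(linear_search(data, k)[1] for k in data) / len(data)

print("best:", best, "worst:", worst, "average comparisons:", average)
```

With n elements this gives 1 comparison in the best case, n in the worst case and (n + 1) / 2 on average (3.0 for the five elements here), which is why the worst case of n comparisons is the bound usually quoted.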
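The storage representation of strings described above (the characters 'n' 'j' 's' 'm' 't' 'i' followed by '\0', each kept as its ASCII code in binary) can be mimicked in Python with a byte string. This is a sketch that assumes the text is pure ASCII; `to_c_string` and `c_string_length` are hypothetical helper names that imitate how C treats a null-terminated character array.

```python
def to_c_string(text: str) -> bytes:
    """Store each character as its ASCII code and append the '\\0' terminator."""
    return text.encode("ascii") + b"\x00"


def c_string_length(buffer: bytes) -> int:
    """Count characters up to (not including) the first '\\0' byte, like C's strlen."""
    length = 0
    while buffer[length] != 0:
        length += 1
    return length


stored = to_c_string("njsmti")
for byte in stored:
    # character, its ASCII code, and the 8-bit binary pattern kept in memory
    print(repr(chr(byte)), byte, format(byte, "08b"))
```

The terminator is what lets code find the end of the string without storing its length separately; the cost is that finding the length means scanning every character.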
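The KWIC procedure described above (circularly shift every line, then sort all of the shifts alphabetically) can be sketched as follows. The helper names `circular_shifts` and `kwic_index` are hypothetical. Trailing periods are stripped, and sorting is assumed to be case-insensitive because that is what the Output list above shows ("are" before "Clouds"); plain `sorted` would place capitalized words first.

```python
from __future__ import annotations


def circular_shifts(line: str) -> list[str]:
    """All circular shifts of a line: repeatedly move the first word to the end."""
    words = line.rstrip(".").split()
    shifts = []
    for _ in range(len(words)):
        shifts.append(" ".join(words))
        words = words[1:] + words[:1]  # remove the first word and append it at the end
    return shifts


def kwic_index(lines: list[str]) -> list[str]:
    """Alphabetically sorted list of all circular shifts of all lines."""
    all_shifts = [shift for line in lines for shift in circular_shifts(line)]
    return sorted(all_shifts, key=str.casefold)  # case-insensitive alphabetical order


for entry in kwic_index(["Clouds are white.", "Ottawa is beautiful."]):
    print(entry)
```

Run on the two input lines, this prints the same six entries, in the same order, as the Output list above. It is a single-function sketch, not the modular shared-data design whose advantages and disadvantages are listed above.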