Exported Text
P1. How many records does the file contain? How many fields are there per record?
The file contains 7 records and each record consists of 5 fields.
Solution: The file contains seven records (21-5Z through 31-7P), and each record is composed of five
fields (PROJECT_CODE through PROJECT_BID_PRICE).
P2. What problem would you encounter if you wanted to produce a listing by city? How would you solve this
problem by altering the file structure?
The problem is that we cannot extract records by city, because the file has no separate field for the city.
To solve this, we need to add a separate field (column) for the city.
Solution: The city names are contained within the MANAGER_ADDRESS attribute and decomposing this
character (string) field at the application level is cumbersome at best. (Queries become much more difficult to
write and take longer to execute when internal string searches must be conducted.) In addition, searching for
"Franklin" may return a city (Franklin) or street (e.g., Franklin Rd) match, which cannot easily be differentiated.
If the ability to produce city listings is important, it is best to store the city name as a separate attribute.
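The ambiguity described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sample rows, their addresses, the project codes P-1 and P-2, and the MANAGER_STREET and MANAGER_CITY field names are all hypothetical and are not taken from the file.

```python
# Hypothetical rows: addresses and project codes are invented for illustration.
flat_rows = [
    {"PROJECT_CODE": "P-1", "MANAGER_ADDRESS": "12 Oak St., Franklin, TN"},
    {"PROJECT_CODE": "P-2", "MANAGER_ADDRESS": "45 Franklin Rd., Nashville, TN"},
]

# Same hypothetical rows with the city stored as its own attribute.
split_rows = [
    {"PROJECT_CODE": "P-1", "MANAGER_STREET": "12 Oak St.", "MANAGER_CITY": "Franklin"},
    {"PROJECT_CODE": "P-2", "MANAGER_STREET": "45 Franklin Rd.", "MANAGER_CITY": "Nashville"},
]


def search_address(rows, text):
    """Internal string search: matches the text anywhere in the address."""
    return [r["PROJECT_CODE"] for r in rows if text in r["MANAGER_ADDRESS"]]


def search_city(rows, city):
    """Exact comparison against a dedicated city attribute."""
    return [r["PROJECT_CODE"] for r in rows if r["MANAGER_CITY"] == city]


# The substring search cannot tell the city from the street name.
substring_hits = search_address(flat_rows, "Franklin")
# The dedicated attribute returns only the manager who lives in Franklin.
city_hits = search_city(split_rows, "Franklin")

print(substring_hits)
print(city_hits)
```

With the city held in its own attribute, the listing becomes a simple equality test, and no string parsing is needed at the application level.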
P4. What data redundancies do you detect? How could those redundancies lead to anomalies?
The data redundancies appear in the following three fields: PROJECT_MANAGER, MANAGER_PHONE, and
MANAGER_ADDRESS. Such redundancies can lead to data errors, data inconsistency, and data integrity
problems.
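To make the inconsistency concrete, here is a minimal Python sketch of an update that misses one occurrence. The project codes and manager name are those that appear in the file, but the phone numbers, the MANAGER_ID key, and the two-table layout are hypothetical and are used only for illustration.

```python
# Flat file: the manager's data is repeated on every project row.
# Phone numbers are hypothetical placeholders.
flat = [
    {"PROJECT_CODE": "21-5Z", "PROJECT_MANAGER": "Holly B. Parker", "MANAGER_PHONE": "555-0100"},
    {"PROJECT_CODE": "25-9T", "PROJECT_MANAGER": "Holly B. Parker", "MANAGER_PHONE": "555-0100"},
    {"PROJECT_CODE": "29-2D", "PROJECT_MANAGER": "Holly B. Parker", "MANAGER_PHONE": "555-0100"},
]


def update_phone_flat(rows, manager, new_phone, skip_code=None):
    """Update each row for a manager; skip_code simulates a missed occurrence."""
    for row in rows:
        if row["PROJECT_MANAGER"] == manager and row["PROJECT_CODE"] != skip_code:
            row["MANAGER_PHONE"] = new_phone


def phones_on_file(rows, manager):
    """Collect the distinct phone numbers stored for one manager."""
    return {r["MANAGER_PHONE"] for r in rows if r["PROJECT_MANAGER"] == manager}


# One occurrence is missed, so the same person now has two phone numbers.
update_phone_flat(flat, "Holly B. Parker", "555-0199", skip_code="29-2D")
flat_phones = phones_on_file(flat, "Holly B. Parker")

# Alternative layout (hypothetical): manager data stored once, projects refer to it.
managers = {"M1": {"name": "Holly B. Parker", "phone": "555-0100"}}
projects = [
    {"PROJECT_CODE": "21-5Z", "MANAGER_ID": "M1"},
    {"PROJECT_CODE": "25-9T", "MANAGER_ID": "M1"},
    {"PROJECT_CODE": "29-2D", "MANAGER_ID": "M1"},
]

# A single change is seen by every project that refers to the manager.
managers["M1"]["phone"] = "555-0199"
normalized_phones = {managers[p["MANAGER_ID"]]["phone"] for p in projects}

print(sorted(flat_phones))
print(sorted(normalized_phones))
```

In the flat layout the outcome depends on every occurrence being changed correctly; in the two-table layout there is only one value to change, so the copies cannot drift apart.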
Solution: Note that the manager named Holly B. Parker occurs three times, indicating that she manages three
projects coded 21-5Z, 25-9T, and 29-2D, respectively. (The occurrences indicate that there is a 1:M relationship
between PROJECT and MANAGER: each project is managed by only one manager but, apparently, a manager
may manage more than one project.) Ms. Parker's phone number and address also occur three times. If Ms.
Parker moves and/or changes her phone number, these changes must be made more than once and they must all
be made correctly... without missing a single occurrence. If any occurrence is missed during the change, the data
are "different" for the same person. After some time, it may become difficult to determine what the correct data
are. In addition, multiple occurrences invite misspellings and digit transpositions, thus producing the same
anomalies. The same problems exist for the multiple occurrences of George F. Dorts.
Q5. Evolution of Data Models
Solution:
First generation: File System
Second generation: Hierarchical and Network
Third generation: Relational
Fourth generation: Object Oriented
Fifth generation: XML Hybrid
Emerging: Key-value store (NoSQL)