Archive | June, 2003


Identifying data objects and relationships

Posted on 13 June 2003 by Demian Turner

Introduction

In order to begin constructing the basic model, the modeler must analyze the information gathered during the requirements analysis for the purpose of:

  • classifying data objects as either entities or attributes,
  • identifying and defining relationships between entities,
  • naming and defining identified entities, attributes, and relationships,
  • documenting this information in the data document.

To accomplish these goals the modeler must analyze narratives from users, notes from meetings, policy and procedure documents, and, if lucky, design documents from the current information system.

Although it is easy to define the basic constructs of the ER model, it is not an easy task to distinguish their roles in building the data model. What makes an object an entity or attribute? For example, given the statement “employees work on projects”, should employees be classified as an entity or an attribute? Very often, the correct answer depends upon the requirements of the database. In some cases, employee would be an entity; in others it would be an attribute.

While the definitions of the constructs in the ER Model are simple, the model does not address the fundamental issue of how to identify them. Some commonly given guidelines are:

  • entities contain descriptive information
  • attributes either identify or describe entities
  • relationships are associations between entities

Identifying entities
There are various definitions of an entity:

  • “Any distinguishable person, place, thing, event, or concept, about which information is kept” [BRUC92]
  • “A thing which can be distinctly identified” [CHEN76]
  • “Any distinguishable object that is to be represented in a database” [DATE86]
  • “…anything about which we store information (e.g. supplier, machine tool, employee, utility pole, airline seat, etc.). For each entity type, certain attributes are stored”. [MART89]

These definitions contain common themes about entities:

  • an entity is a “thing”, “concept”, or “object”. However, entities can sometimes represent the relationships between two or more objects. This type of entity is known as an associative entity.
  • entities are objects which contain descriptive information. If a data object you have identified is described by other objects, then it is an entity. If there is no descriptive information associated with the item, it is not an entity. Whether or not a data object is an entity may depend upon the organization or activity being modeled.
  • an entity represents many things which share properties. They are not single things. For example, King Lear and Hamlet are both plays which share common attributes such as name, author, and cast of characters. The entity describing these things would be PLAY, with King Lear and Hamlet being instances of the entity.
  • entities which share common properties are candidates for being converted to generalization hierarchies (see below)
  • entities should not be used to distinguish between time periods. For example, the entities 1st Quarter Profits, 2nd Quarter Profits, etc. should be collapsed into a single entity called Profits. An attribute specifying the time period would be used to categorize by time.
  • not everything the users want to collect information about will be an entity. A complex concept may require more than one entity to represent it. Other “things” users think important may not be entities.
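A minimal sketch of two of the guidelines above, in Python: PLAY is modeled as an entity whose instances are individual plays, and the quarterly profit entities are collapsed into a single Profits entity with a period attribute. The class names, field names and sample amounts are illustrative assumptions, not part of the original material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Play:
    """The PLAY entity: each object is one instance of the entity."""
    name: str
    author: str


@dataclass(frozen=True)
class Profit:
    """A single Profits entity; the time period is an attribute, not a separate entity."""
    year: int
    quarter: int   # 1..4, categorizes the instance by time period
    amount: float  # made-up sample figures below


plays = [
    Play("King Lear", "William Shakespeare"),
    Play("Hamlet", "William Shakespeare"),
]

profits = [
    Profit(2003, 1, 1200.0),
    Profit(2003, 2, 1500.0),
]


def profits_for_quarter(rows, year, quarter):
    """Select by the period attribute instead of keeping one entity per quarter."""
    return [p for p in rows if p.year == year and p.quarter == quarter]


second_quarter = profits_for_quarter(profits, 2003, 2)
```

Adding a third quarter then means adding rows, not a new entity.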

Identifying attributes
Attributes are data objects that either identify or describe entities. Attributes that identify entities are called key attributes. Attributes that describe an entity are called non-key attributes.

The process for identifying attributes is similar to that for identifying entities, except that now you look for and extract those names that appear to be descriptive noun phrases.

Validating attributes
Attribute values should be atomic, that is, present a single fact. Having disaggregated data allows simpler programming, greater reusability of data, and easier implementation of changes. Normalization also depends upon the “single fact” rule being followed. Common types of violations include:

  • simple aggregation – a common example is Person Name which concatenates first name, middle initial, and last name. Another is Address which concatenates street address, city, and zip code. When dealing with such attributes, you need to find out if there are good reasons for decomposing them. For example, do the end-users want to use the person’s first name in a form letter? Do they want to sort by zip code?
  • complex codes – these are attributes whose values are codes composed of concatenated pieces of information. An example is the code attached to automobiles and trucks. The code represents over 10 different pieces of information about the vehicle. Unless part of an industry standard, these codes have no meaning to the end user. They are very difficult to process and update.
  • text blocks – these are free-form text fields. While they have a legitimate use, an over-reliance on them may indicate that some data requirements are not met by the model.
  • mixed domains – this is where a value of an attribute can have different meaning under different conditions.
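As a rough illustration of the “single fact” rule, the Python sketch below decomposes Person Name and Address into atomic attributes, which makes the form-letter and sort-by-zip-code cases straightforward. The Person class, its field names and the sample people are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Each attribute holds a single fact instead of one concatenated value.
    first_name: str
    middle_initial: str
    last_name: str
    street_address: str
    city: str
    zip_code: str

    def salutation(self) -> str:
        """Use only the first name, e.g. in a form letter."""
        return f"Dear {self.first_name},"


# Made-up sample data.
people = [
    Person("Ann", "B", "Smith", "1 Main St", "Springfield", "62704"),
    Person("Carl", "D", "Jones", "9 Oak Ave", "Shelbyville", "10001"),
]

# Sorting by zip code needs no string parsing once the address is decomposed.
by_zip = sorted(people, key=lambda p: p.zip_code)
```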

Derived attributes and code values
Two areas where data modeling experts disagree are whether derived attributes and attributes whose values are codes should be permitted in the data model.

Derived attributes are those created by a formula or by a summary operation on other attributes. Arguments against including derived data are based on the premise that derived data should not be stored in a database and therefore should not be included in the data model. The arguments in favor are:

  • derived data is often important to both managers and users and therefore should be included in the data model.
  • it is just as important, perhaps more so, to document derived attributes as it is to document other attributes.
  • including derived attributes in the data model does not imply how they will be implemented.

A coded value uses one or more letters or numbers to represent a fact. For example, the attribute Gender might use the letters “M” and “F” as values rather than “Male” and “Female”. Those who are against this practice argue that codes have no intuitive meaning to the end-users and add complexity to processing data. Those in favor argue that many organizations have a long history of using coded attributes, that codes save space, and that they improve flexibility, since values can be easily added or modified by means of look-up tables.
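Both ideas can be sketched in a few lines of Python. Here a derived attribute is documented on the model but computed rather than stored, and a coded attribute is decoded through a look-up table. OrderLine, line_total and GENDER_CODES are hypothetical names chosen for the example, and this is only one of several ways such attributes could be implemented.

```python
from dataclasses import dataclass

# Look-up table for a coded attribute; new codes can be added here
# without touching the records that use them.
GENDER_CODES = {"M": "Male", "F": "Female"}


@dataclass
class OrderLine:
    quantity: int
    unit_price: float

    @property
    def line_total(self) -> float:
        """Derived attribute: part of the model, computed from other attributes."""
        return self.quantity * self.unit_price


def decode_gender(code: str) -> str:
    """Translate a stored code into the value shown to end users."""
    return GENDER_CODES[code]


line = OrderLine(quantity=3, unit_price=2.5)
```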

Identifying relationships
Relationships are associations between entities. Typically, a relationship is indicated by a verb connecting two or more entities. For example:
employees are assigned to projects

As relationships are identified they should be classified in terms of cardinality, optionality, direction, and dependence. As a result of defining the relationships, some relationships may be dropped and new relationships added.

Cardinality quantifies the relationships between entities by measuring how many instances of one entity are related to a single instance of another. To determine the cardinality, assume the existence of an instance of one of the entities. Then determine how many specific instances of the second entity could be related to the first. Repeat this analysis reversing the entities. For example,
employees may be assigned to no more than three projects at a time; every project has at least two employees assigned to it.
Here each employee is related to at most three projects, and each project to at least two employees. Since an instance on either side can be related to more than one instance on the other, this relationship can be classified as a many-to-many relationship.
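The “fix one instance, count the related instances, then reverse” procedure can be sketched in Python as follows. The assignment pairs are hypothetical, and sample data can only suggest a classification; the real cardinality comes from the business rules.

```python
from collections import Counter

# Hypothetical (employee, project) assignment pairs.
assignments = [
    ("alice", "apollo"),
    ("alice", "gemini"),
    ("bob", "apollo"),
    ("carol", "gemini"),
]


def max_related(pairs, side):
    """Largest number of related instances seen for any one instance on `side`."""
    counts = Counter(pair[side] for pair in set(pairs))
    return max(counts.values(), default=0)


def classify(pairs):
    """Classify a relationship as 1:1, 1:M or M:N from the observed pairs."""
    left_max = max_related(pairs, 0)   # projects per employee
    right_max = max_related(pairs, 1)  # employees per project
    if left_max <= 1 and right_max <= 1:
        return "one-to-one"
    if left_max <= 1 or right_max <= 1:
        return "one-to-many"
    return "many-to-many"


kind = classify(assignments)
```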

If a relationship can have a cardinality of zero, it is an optional relationship. If it must have a cardinality of at least one, the relationship is mandatory. Optional relationships are typically indicated by the conditional tense. For example,
an employee may be assigned to a project.

Mandatory relationships, on the other hand, are indicated by words such as must have. For example,
a student must register for at least three courses each semester.
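A small Python sketch of the distinction: a minimum cardinality of zero makes a relationship optional, while a mandatory rule such as the student example can be checked against the data. The function names and the sample registrations are assumptions made for illustration.

```python
def is_optional(min_cardinality: int) -> bool:
    """A relationship that allows zero related instances is optional."""
    return min_cardinality == 0


def violations(registrations, minimum=3):
    """Return students who break the mandatory 'at least `minimum` courses' rule."""
    return sorted(
        student
        for student, courses in registrations.items()
        if len(set(courses)) < minimum
    )


# Hypothetical registrations for one semester.
registrations = {
    "dana": ["math", "physics", "history"],
    "eli": ["math"],
}

short = violations(registrations)
```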

In the case of the specific relationship forms (1:1 and 1:M), there is always a parent entity and a child entity. In one-to-many relationships, the parent is always the entity with the cardinality of one. In one-to-one relationships, the choice of the parent entity must be made in the context of the business being modeled. If a decision cannot be made, the choice is arbitrary.

 

Comments (0)

Free and open source software – a feasibility study

Posted on 12 June 2003 by Demian Turner

I had this PDF sitting on my ‘desktop’ for months and finally got around to reading it. Very good essay 🙂  I think I got it from the MySQL site; you’ll notice the Swedish slant.

An interesting question is whether or not it is possible to make a profit on free and open source software. Since access to the workings of the software itself, the source code, is free of charge, any business model must be geared towards value-added services and products.

Successful business models based on free and open source software emanate from one or more of the following areas:

  • Software distributions: the sale of a packaged product based on free and open source software.
  • Development and sales of in-house developed product.
  • Added-value sale: free and open source software is used in order to support the sale of one’s own supplementary products, such as other applications and hardware.
  • Services: support, training, consulting, etc.
  • Accessories: literature, etc.

Both HP and IBM reported revenues of billions of dollars for Linux-related solutions in 2002.

Saving the best for last 😉  Thanks to Adobe’s great PDF-to-HTML converter, which comes in handy when quoting PDFs.

Comments (0)

Tips & Tricks: The RPM Package Manager

Posted on 12 June 2003 by Demian Turner

What is RPM? And how can it be used? The Red Hat Package Manager can be used to install, uninstall, query and maintain the packages on your Red Hat system. There are a few basic commands everyone should know:

Comments (2)

Zend Teams Up with Sun in Java Initiative

Posted on 11 June 2003 by Demian Turner

The initiative is an important step in further strengthening PHP adoption by enterprises. Furthermore, it paves the way for standards-based middleware products that will provide scalable integration between PHP front-end web applications and Java business logic.

The new Java Specification Request – Scripting Pages in Java Web Applications, JSR-223 – will describe how to write portable Java classes that can be invoked from a page written in any scripting language, with PHP serving as the reference scripting language implementation. It will include details on security, resources and class loader contexts.

The new specification will lead to products that enable building n-tier applications that have a web scripting front-end which utilizes Java objects. Examples of such applications include: a web-based scheduling system that accesses an enterprise-wide Java-based personal contacts system for names and e-mail addresses; or a web-based CRM system that connects to Java-based transactional systems for purchasing and transaction processing activities.

Comments (0)

CAPTCHAs: Distinguishing Humans from Computers

Posted on 11 June 2003 by Demian Turner

This interesting story came out this week in the form of a new class at PHPclasses.

A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a program that can generate and grade tests that:

  • Most humans can pass.
  • Current computer programs cannot pass.

For example, humans can read distorted text but current computer programs cannot.
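The “generate and grade” part can be sketched in Python as below. This is a toy under stated assumptions: it is not the PHPclasses class mentioned above, the function names are made up, and the step that actually stops programs, rendering the secret as a distorted image, is left out.

```python
import random
import string


def generate_challenge(length=6, rng=None):
    """Return the secret text a human would be asked to read back."""
    rng = rng or random.SystemRandom()
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def grade(secret, answer):
    """Grade a response, ignoring case and surrounding whitespace."""
    return answer.strip().upper() == secret


secret = generate_challenge()
# A real CAPTCHA would render `secret` as a distorted image here; that step is omitted.
```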

Comments (0)

Take Credit Card Payments with PHP and Paypal

Posted on 11 June 2003 by Demian Turner

If you haven’t seen this tutorial over at Zend yet, it’s a good read, so check it out.  What would be really nice is to take the concept, rework the procedural code and create a Payment_Paypal class at PEAR.

Now that would come in handy in PHPseagull 😉

Comments (0)

Lufthansa.com serves 6 Million pages daily with PHP

Posted on 11 June 2003 by Demian Turner

6 million pages is a lot 😉

Every passenger of Lufthansa, whether a private customer or an organization, benefits from the superior response time and friendly experience of the e-ticketing system on Lufthansa.com. The site’s e-ticketing functionality, including on-line booking, payment, frequent traveler services, scheduling, check-in etc., relies on a PHP-based e-booking engine developed and maintained by Lufthansa.

Customer service ranks very high on Lufthansa’s list of priorities, hence precise target numbers are defined, assuring that the user experience of 6 million daily visitors to the site will be most satisfactory. As a result, Lufthansa has to guarantee e-booking availability for 750 bookings per hour and prepare the infrastructure for a forecasted increase in demand while maintaining availability levels.

Comments (1)

Schema fix for PHPseagull 0.1.4

Posted on 10 June 2003 by Demian Turner

A small error in the DB schema crept in with the last update; as a result, you cannot preview articles in the admin view of Publisher.

To fix this you can download the updated schema here:

http://www.phpkitchen.com/phpseagull/phpseagull.sql.0.1.4-beta.fix.zip

To install just do the usual as root:

$ mysql phpseagull < phpseagull.sql

‘Drop tables’ has been specified so this will overwrite your existing data if you have any.

If you want to preserve your data, do a more granular update by replacing just the tables:

item_type
item_type_mapping

with the ones supplied above.  PHPseagull 0.1.5 will be along in the next week or so with more updates.

Comments (0)

New php|architect issue out

Posted on 09 June 2003 by Demian Turner

php|architect has announced the release of their June 2003 issue, which is available for download from their website as of today.

Here are some highlights from the current issue:

  • Agile Software Development with PHPUnit
    Michael Huttermann explains how to help your next development project
    succeed. Techniques from the popular Extreme Programming method are
    covered, with an emphasis on test-driven development.

  • Getting a grip on LDAP
    This month, the Editor-in-Chief gets
    technical about what LDAP is, what LDAP isn’t, and what makes it tick.
    This is a gentler introduction targeting those who know how to deal
    with data, but never had to get it from a directory service
    implementation. Differences between a traditional DBMS and LDAP are
    covered, combined with very bite-sized code snippets to get you started
    on the path to LDAP enlightenment.

  • Industrial Strength MVC
    This month, Jason Sweat returns to
    show you how to optimize your PHP code… by not writing so much of it!
    By utilizing open source tools like Smarty and Phrame, and offloading
    some work to PostgreSQL, you can develop enterprise applications in a
    hurry, and gain the benefits of an MVC framework in your app with the
    greatest of ease!

  • Lucene
    Dave’s back. Join our resident Java guy, Dave Palmer,
    as he takes you down the halls of Lucene, a powerful Java search engine
    API. See how you can painlessly add reliable search functionality to
    your next PHP project.

  • Tailoring WAP sites with WURFL
    Andrea Trasatti cuts the
    wires, and shows you how to use the WURFL project to create
    cross-platform WAP applications, tailored specifically to the device
    accessing it.

  • Object-oriented Form Management With PHP (and an Eye on PHP5)
    Marco Tabini explores the realm of HTML forms from the perspective of
    object-oriented programming, and shows us that there’s more to form
    management than meets the eye.

For more information, or to check out the new issue, visit php|architect at www.phparch.com!

Comments (0)


Creating Navigation Widgets with PHP Seagull

Posted on 05 June 2003 by Demian Turner

 

This tutorial presumes you’ve had a look at the PHP Seagull app framework.

At the moment PHP Seagull offers a fairly simple API for creating a range of navigation widgets.  The three main approaches are as follows:

  1. Using a wrapper for PEAR’s HTML_Treemenu: To create a menu like the one used in the admin section under Publisher, you can use the wrapper method getGuruTree() in the CategoryManager class.  You can customise the defaults set in the method. Basically, the data is grabbed from the category table, which follows an id/parent-id structure and is simple to build.  Of course, you can also use Publisher: hit the categories button and use the web API to build up the tree. So far it does everything you’ll need except node ordering, which is coming soon.

    The main reason to use getGuruTree() is when you need to constantly update the category information, for example in an admin role updating the nodes. Expect quite a big DB call for each request (depending on how many nodes are in your table) and, obviously, no caching.
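To show what an id/parent-id structure looks like once it is turned into a tree, here is a language-neutral sketch in Python. It is not the getGuruTree() implementation and does not use HTML_Treemenu; the rows, column order and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows from a category table: (id, parent_id, label).
rows = [
    (1, None, "Root"),
    (2, 1, "News"),
    (3, 1, "Articles"),
    (4, 3, "Tutorials"),
]


def build_tree(rows):
    """Group rows by parent id, then nest them recursively."""
    children = defaultdict(list)
    for node_id, parent_id, label in rows:
        children[parent_id].append({"id": node_id, "label": label})

    def attach(node):
        node["children"] = [attach(child) for child in children.get(node["id"], [])]
        return node

    return [attach(root) for root in children.get(None, [])]


def render(nodes, depth=0):
    """Return the tree as indented text lines, one per node."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node["label"])
        lines.extend(render(node["children"], depth + 1))
    return lines


tree = build_tree(rows)
```

Rebuilding the tree on every request like this reads the whole table each time, which is the cost described above.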

 

Comments (0)
