Using RDF to create a local business review and search network

by Christopher Schmidt

FOAF and, more generally, RDF are ways to describe the world we live in and the relationships that world creates. One of the most common ways people interact with that world is through their purchases: where they shop, and why they shop there. Describing this in a way that is useful to consumers would give a richer description of the world around us, as well as a useful resource for the consumers who participate.

Many aspects of a business can be described in the same way across the board: what items it carries, what types of currency it accepts, the hours it is open, its location, and more. Businesses may also offer services -- printing, web design, legal assistance -- which others may want to take advantage of. Alongside this "hard" metadata there are "softer" measures: opinions, reviews, and other things of a similar nature. Together, these aspects form a database of information that is useful to the customers of these businesses. FOAF[1] and RDF[2] provide a way to describe this information so that it can be searched, archived, and expanded upon in a decentralized fashion.

Several aspects of generating business information go into creating a decentralized information and review process. Some have been examined in the past; some have not. First, there are several very important pieces of information about a business that need to be conveyed to any potential customer. For most local businesses, a customer will want a name and an address. Mechanisms already exist for describing these properties in RDF, specifically the Dublin Core[3] and vCard[4] schemas, although the vCard schema is notably under-maintained. In many cases it may also be useful to have a GPS location for mapping and distance calculations; this can be expressed using the Geo[5] RDF schema. Finally, it may be useful to have a machine-readable description of the category or categories a business fits into, whether that is a short textual description or something longer or more complex, such as a URL-based schema.
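As a rough illustration of why coordinates are useful, the sketch below sorts businesses by great-circle distance from a customer. It assumes each business record already carries latitude and longitude in decimal degrees, as a Geo-style description would supply; the records, addresses, coordinates, and function names are hypothetical and not part of any schema mentioned here.

```python
import math

# Hypothetical sample records: a name, an address, and Geo-style coordinates.
BUSINESSES = [
    {"name": "Hardware Store on Elm", "address": "12 Elm St", "lat": 42.3736, "long": -71.1097},
    {"name": "The Gazebo Store", "address": "400 Main St", "lat": 42.3601, "long": -71.0589},
]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_first(businesses, lat, lon):
    """Return the businesses sorted by distance from the given point."""
    return sorted(businesses, key=lambda b: haversine_km(lat, lon, b["lat"], b["long"]))
```

A search tool could call nearest_first with the customer's own position to list the closest matching businesses first.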

Perhaps the items could even be described using WordNet[6]: a shop sells wn:screwdrivers, wn:hammers, and wn:liquor (a dangerous combination!). However, this does not really allow a shop's specialties to be defined, and no schema yet seems to support that kind of description. People most often want to find the companies that are best at, or centered around, what they are looking for -- Home Depot[7] might sell gazebos, but The Gazebo Store would certainly be a better place to look -- so we also need levels of description. To a certain extent, this can be expressed with foaf:interest. By using a standard set of website URLs for certain types of products -- Amazon for books, similar websites for other products -- people can search not only by product name but, in many cases, by a unique resource identifier. Simply click "Search my local area for this product" on Amazon, and you have a search for the product.
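The "levels of description" idea can be sketched as a search that lists specialists ahead of general retailers. This is only an illustration: the store records and the "sells" and "specialties" fields are hypothetical stand-ins for whatever properties (such as foaf:interest) a real description would use.

```python
# Hypothetical store records; "specialties" plays the role of an interest-style property.
STORES = [
    {"name": "Home Depot", "sells": {"gazebo", "hammer", "screwdriver"}, "specialties": set()},
    {"name": "The Gazebo Store", "sells": {"gazebo"}, "specialties": {"gazebo"}},
    {"name": "Hardware Store on Elm", "sells": {"hammer", "screwdriver"}, "specialties": {"hammer"}},
]


def find_stores(stores, item):
    """Stores that carry the item, with those specializing in it listed first."""
    carrying = [s for s in stores if item in s["sells"]]
    # False sorts before True, so specialists come first; the sort is stable,
    # so stores otherwise keep their original relative order.
    return sorted(carrying, key=lambda s: item not in s["specialties"])


print([s["name"] for s in find_stores(STORES, "gazebo")])
```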

Relationships described: people talking to people and its effects on consumerism

The most important part of any business system is often word-of-mouth referral. Susie told Bob about the Hardware Store down on Elm, then Bob told Anne, and Anne told Dave. Suddenly, Dave's wife is buying all her light bulbs and socket wrenches at the Hardware Store down on Elm, because she heard it was the best place to be. The value of a word-of-mouth referral depends on who is telling you. Your landlord Stan, who never fixes the pipes, is obviously not the best source of information on where to buy the best tools: if he could buy good tools, he would have fixed the drain already. Given a trust level for each source, however, you can determine how much weight its recommendations deserve. You can then create a distributed, trust-based system for reviews, which can be aggregated using a combination of RSS and the concept of Trackback[8]: providing a URL to which documents can send a "ping", a notification that they have written an entry or review of the business in question.
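One simple way to picture trust-based aggregation is a weighted average in which each review counts in proportion to the reader's trust in its author. The sketch below is one possible scheme, not a method prescribed here: the numeric ratings, the 0-to-1 trust scale, and the function name are all assumptions made for illustration.

```python
def trust_weighted_rating(reviews, trust):
    """Average ratings, weighting each by the reader's trust in its author.

    reviews: list of (reviewer, rating) pairs
    trust:   dict mapping reviewer -> weight in [0, 1]; unknown reviewers get 0
    Returns None when no trusted reviewer has rated the business.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for reviewer, rating in reviews:
        weight = trust.get(reviewer, 0.0)
        weighted_sum += weight * rating
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Susie is trusted far more than Stan the landlord, so her rating dominates.
reviews = [("Susie", 5), ("Stan", 1)]
my_trust = {"Susie": 0.9, "Stan": 0.1}
print(trust_weighted_rating(reviews, my_trust))
```

Because the trust table belongs to the reader, two people can aggregate the same reviews and arrive at different scores.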

The referral scenario above might be described in RDF as follows:

[ a foaf:Person; foaf:name "Susie"; business:visits
  [ a business:Store; business:sells [ a wn:screwdriver ] ];
  foaf:knows [ a foaf:Person; foaf:name "Bob" ] ] .

This example uses a hypothetical namespace, business, which does not exist but would be useful for describing such information. See Related Works for more information on projects that have already taken steps in this direction.

Using an RDF query language, tools can select data from these aggregated URLs: data which provides reviews and other information about a business. Using this distributed data, websites or tools can then present all the available information about a business. By its very nature, this type of system is open: the documents describing the data are delivered in RDF for queries, and no claim can be made that this data should be "hidden", or anything similar. As such, although aggregation of this information might be limited, it is possible to imagine that such tools could be easily distributed: each review site could provide its own interface to the information in question, giving a delivery mechanism that is both distributed and unbiased. When every tool can provide its own interface, there is no motivation to "hide" bad reviews: sites which do so will simply be passed over in favor of sites which are known not to. Again, these differences can be captured by the trust that different people place in the sites.
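To make the querying step concrete, the sketch below matches triple patterns against a small in-memory list of statements, which is the core operation an RDF query language performs. It stands in for a real query engine and is not one: the triples, and the business: and review: terms in them, are illustrative assumptions rather than a published schema.

```python
# Triples as they might look after aggregating several review documents.
# The "business:" and "review:" terms are illustrative, not a real vocabulary.
TRIPLES = [
    ("_:susie", "foaf:name", "Susie"),
    ("_:susie", "business:visits", "_:elm_hardware"),
    ("_:elm_hardware", "rdf:type", "business:Store"),
    ("_:elm_hardware", "business:sells", "wn:screwdriver"),
    ("_:review1", "review:about", "_:elm_hardware"),
    ("_:review1", "review:rating", 5),
    ("_:review2", "review:about", "_:elm_hardware"),
    ("_:review2", "review:rating", 2),
]


def match(triples, s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def ratings_for(triples, business):
    """Join two patterns: find reviews about the business, then their ratings."""
    ratings = []
    for review, _, _ in match(triples, p="review:about", o=business):
        for _, _, rating in match(triples, s=review, p="review:rating"):
            ratings.append(rating)
    return ratings


print(ratings_for(TRIPLES, "_:elm_hardware"))
```

Any site holding the same triples can run the same patterns, which is what lets each one build its own interface over shared data.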

All of this describes how such a system might work in a perfect world. In reality, there will often be people who remove bad reviews, or who do not submit reviews via the approved process, or something similar. However, a system that allows for decentralization (multiple ping locations, for example, and multiple sites for data storage) can blunt the effect of an individual or business trying to suppress bad reviews, because the reviews can be distributed to a wide variety of uninvolved parties who cannot be influenced by the wishes of one business.
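The resilience argument can be illustrated with a merge across independent sites: a review survives as long as at least one site still holds it. This is a minimal sketch under assumed conditions, namely that each review has a stable URL that identifies it; the record fields and example URLs are hypothetical.

```python
def merge_reviews(*sites):
    """Union the reviews held by several independent sites.

    Each site is a list of dicts with hypothetical keys "url", "business",
    and "rating". A review is identified by its URL, so a review mirrored
    on several sites is kept only once.
    """
    merged = {}
    for site in sites:
        for review in site:
            merged.setdefault(review["url"], review)
    return list(merged.values())


good = {"url": "http://example.org/susie/1", "business": "Elm Hardware", "rating": 5}
bad = {"url": "http://example.org/dave/7", "business": "Elm Hardware", "rating": 1}

site_a = [good]        # this site has dropped the bad review
site_b = [good, bad]   # an uninvolved site still carries it

print(len(merge_reviews(site_a, site_b)))
```

Removing the bad review from one site has no effect on the merged view as long as another site keeps a copy.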

Each aspect of semantic data, combined with a network of people, can build accurate, useful information about businesses far and wide. Techniques for describing who people are, what they know about, and what they know about different businesses can be combined with a distributed mechanism for delivering that information to tools which recreate, display, and offer the data to the public. The result is a mechanism through which business reviews can be created, distributed, and aggregated with practically no work beyond the effort of writing the review itself. This could be a great way for local businesses to participate in the semantic web revolution: given a worthwhile reason to provide information about themselves and their products, more businesses may be willing to put effort into participating in the semantic web. By helping the bottom line, you help the semantic web grow.

These reviews are useful for more than just the businesses. By using FOAF to describe the reviewers, you begin to build a social network of people who have reviewed a specific business. Friends refer friends to businesses in real life, and the same thing can happen with an adequately built web. This can be the beginning of an implicitly defined semantic web. Creating these webs, based on something other than relationships declared as the person creating them sees fit, is the beginning of a web that represents real relationships between people. This type of web is required for the continued and improved success of social networking sites. Taking cues from real life, rather than from online-only information, is the way to begin building a web of people you do not know yet but with whom you might have, or want to build, a relationship.

The social networking world must continue to change and grow, and become more involved with things that are not directly related to social networking. One thing that can be improved by the inclusion of social networking is the reviews offered by people, as has been discussed in this paper. By providing businesses with a useful tool, you encourage participation, and participation will lead to a richer and denser social web built on the semantic web.


Related Works

The Chef Moz project uses an RDF schema to describe a variety of data about restaurants. (Data is available at ChefMoz RDF Files.) However, it is limited in a number of ways. First, the data uses a "proprietary" schema rather than reusing existing schemas to represent data. In addition, it is limited to restaurants, and makes no effort to link the data to other tools.


References

  1. FOAF Vocabulary Specification, a schema for describing information about people and the relationships between them.
  2. Resource Description Framework
  3. Dublin Core Metadata Initiative, a common way to describe things like titles, descriptions, and other common terms in RDF.
  4. Representing vCard Objects in RDF/XML, a way to translate many forms of traditional contact data into RDF.
  5. RDFIG Geo vocab workspace, the workspace for the Geo Vocabulary, which allows description of points using geo coordinates.
  6. WordNet is an online lexical reference system. The RDF version is described at xmlns.com.
  7. Home Depot, a hardware store.
  8. Trackback, a way to communicate between sites.