Pay-as-you-go data integration for linked data: opportunities, challenges and architectures
Linked Data (LD) provides principles for publishing data that underpin the development of an emerging web of data. LD follows the web in providing low barriers to entry: publishers can make their data available using a small set of standard technologies, and consumers can search for and browse published data using generic tools. As on the web, consumers frequently use data in broadly the form in which it was published. This will be satisfactory in some cases, but the diversity of publishers means that the data required to support a task may be stored in many different sources and described in many different ways. Thus, although RDF provides a syntactically homogeneous language for describing data, sources typically manifest a wide range of heterogeneities in how data on a given concept is represented. This paper makes the case that many aspects of both the publication and the consumption of LD stand to benefit from a pay-as-you-go approach to data integration. Specifically, the paper: (i) identifies a collection of opportunities for applying pay-as-you-go techniques to LD; (ii) describes some preliminary experiences of applying a pay-as-you-go data integration system to LD; and (iii) presents some open issues that need to be addressed to enable the full benefits of pay-as-you-go integration to be realised.