SPARQL query on multiple graphs
652134, Jan 8 2009 (edited Oct 9 2009)
Hello,
I am using Oracle 11.1.0.6.0 with the Oracle Jena Release 2 drivers. I have imported Pubmed articles into one model and the Uniprot RDF data into another model. I am trying to run a SPARQL query to join Pubmed articles for a certain topic to their proteins. Here is the SPARQL query:
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX df: <http://www.ncbi.nlm.nih.gov/pubmed/>
SELECT ?uniprot
WHERE {
  GRAPH <http://sw.brainstage.com/pubmed> {
    ?article df:hasMajorMesh <http://www.ncbi.nlm.nih.gov/pubmed/D017354> .
  }
  GRAPH <http://purl.uniprot.org> {
    ?uniprot uniprot:citation ?citation .
    ?citation <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> uniprot:Journal_Citation .
    ?citation <http://www.w3.org/2002/07/owl#sameAs> ?article .
  }
}
When I execute the query, the Jena output suggests that it performs the following steps:
1. Execute the SELECT on the first graph (PUBMED).
2. For each record returned in step 1, call sdo_rdf_match('(?citation <http://www.w3.org/2002/07/owl#sameAs> <URI from step 1>)', ...)
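To make the pattern above concrete, here is a minimal Python sketch of the per-record lookup that the output appears to show. It is only an illustration, not the driver's actual implementation: the two in-memory lists stand in for the models, the short identifiers (pm:1, cit:A, up:P1, and the OTHER topic) are made up, and match() is a hypothetical helper that plays the role of one sdo_rdf_match call.

```python
# Toy illustration of the evaluation pattern: one lookup against the
# second graph for every row returned from the first graph.
# All data and short identifiers below are invented for the example.

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
HAS_MAJOR_MESH = "http://www.ncbi.nlm.nih.gov/pubmed/hasMajorMesh"
MESH_TOPIC = "http://www.ncbi.nlm.nih.gov/pubmed/D017354"

pubmed_graph = [
    ("pm:1", HAS_MAJOR_MESH, MESH_TOPIC),
    ("pm:2", HAS_MAJOR_MESH, "http://www.ncbi.nlm.nih.gov/pubmed/OTHER"),
    ("pm:3", HAS_MAJOR_MESH, MESH_TOPIC),
]

uniprot_graph = [
    ("up:P1", "uniprot:citation", "cit:A"),
    ("cit:A", "rdf:type", "uniprot:Journal_Citation"),
    ("cit:A", SAME_AS, "pm:1"),
    ("up:P2", "uniprot:citation", "cit:B"),
    ("cit:B", "rdf:type", "uniprot:Journal_Citation"),
    ("cit:B", SAME_AS, "pm:2"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def nested_loop_join(pubmed, uniprot):
    """Join the two graphs the way the steps above describe."""
    lookups = 0
    proteins = []
    # Step 1: select the articles tagged with the topic in the first graph.
    articles = [s for s, _, _ in match(pubmed, p=HAS_MAJOR_MESH, o=MESH_TOPIC)]
    # Step 2: one separate lookup in the second graph per article.
    for article in articles:
        lookups += 1
        for citation, _, _ in match(uniprot, p=SAME_AS, o=article):
            is_journal = match(uniprot, s=citation, p="rdf:type",
                               o="uniprot:Journal_Citation")
            if not is_journal:
                continue
            for protein, _, _ in match(uniprot, p="uniprot:citation", o=citation):
                proteins.append(protein)
    return proteins, lookups


proteins, lookups = nested_loop_join(pubmed_graph, uniprot_graph)
print(proteins, lookups)
```

The number of lookups grows with the number of rows returned from the first graph, which is why this pattern worries me when the topic matches a large number of articles.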
Both of these graphs are very large. Pubmed has over 17 million articles. Uniprot contains 500K proteins, but the size of the triples file is enormous: 153 GB. I have run sem_apis.analyze_model() on the models. I know the query is correct because it does return data if I add a LIMIT clause to the end of the SPARQL query.
Is there anything I can do to improve the performance of this SPARQL query? I am a little surprised that the Oracle Jena driver loops over the data when a query spans more than one graph. It seems inefficient.
Thanks,
Chuck