SPARQL query on multiple graphs

652134, Jan 8 2009 (edited Oct 9 2009)
Hello,
I am using Oracle 11.1.0.6.0 with the Oracle Jena Release 2 drivers. I have imported PubMed articles into one model and the UniProt RDF data into another model. I am trying to run a SPARQL query that joins the PubMed articles on a given topic (a major MeSH heading) to the UniProt proteins that cite them. Here is the SPARQL query:

PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX df: <http://www.ncbi.nlm.nih.gov/pubmed/>
SELECT ?uniprot
WHERE {
  GRAPH <http://sw.brainstage.com/pubmed> {
    ?article df:hasMajorMesh <http://www.ncbi.nlm.nih.gov/pubmed/D017354>
  }
  GRAPH <http://purl.uniprot.org> {
    ?uniprot uniprot:citation ?citation .
    ?citation <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> uniprot:Journal_Citation .
    ?citation <http://www.w3.org/2002/07/owl#sameAs> ?article .
  }
}

When I execute the query, the Jena output suggests that the driver performs the following steps:

1. Execute the select on the first graph (PUBMED).
2. For each record returned in step 1, call sdo_rdf_match('(?citation <http://www.w3.org/2002/07/owl#sameAs> <URI from step 1>)', ...) against the second graph.
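
To make the cost of that pattern concrete, here is a toy Python model of the two steps above next to a set-based alternative. It is only a sketch of the access pattern as it appears from the output, not the Oracle Jena driver's actual code; the sample identifiers and the function names (`match_same_as`, `per_row_join`, `set_based_join`) are invented for illustration.

```python
from collections import defaultdict

# Toy data standing in for the two graphs (all values are made up).
pubmed_articles = ["pm:1", "pm:2", "pm:3"]  # articles matching the MeSH term
uniprot_same_as = [                          # (citation, article) pairs
    ("cit:a", "pm:1"), ("cit:b", "pm:3"), ("cit:c", "pm:9"),
]
uniprot_citation = [                         # (protein, citation) pairs
    ("P1", "cit:a"), ("P2", "cit:b"), ("P3", "cit:c"),
]


def match_same_as(article):
    """Hypothetical stand-in for one per-article lookup in the second graph."""
    return [c for c, a in uniprot_same_as if a == article]


def per_row_join(articles):
    """Issue one lookup per row from the first graph (the observed pattern)."""
    lookups, proteins = 0, []
    for article in articles:
        lookups += 1
        for citation in match_same_as(article):
            proteins += [p for p, c in uniprot_citation if c == citation]
    return proteins, lookups


def set_based_join(articles):
    """Index the citations once, then make a single pass over the sameAs pairs."""
    wanted = set(articles)
    by_citation = defaultdict(list)
    for protein, citation in uniprot_citation:
        by_citation[citation].append(protein)
    proteins = []
    for citation, article in uniprot_same_as:
        if article in wanted:
            proteins.extend(by_citation[citation])
    return proteins


if __name__ == "__main__":
    print(per_row_join(pubmed_articles))
    print(set_based_join(pubmed_articles))
```

In the per-row version the number of lookups grows with the number of rows returned by the first graph, so a first step that returns many articles translates directly into many separate round trips.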

Both of these graphs are very large. PubMed has over 17 million articles. UniProt contains 500K proteins, but the triples file is enormous: 153 GB. I have run sem_apis.analyze_model() on both models. I know the query is correct because it does return data if I add a LIMIT clause to the end of the SPARQL query.
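
One plausible reason the LIMIT variant comes back quickly is that a per-row loop can stop as soon as enough results have been produced. The Python sketch below illustrates that idea only; it assumes results are streamed lazily, which is a guess about the behaviour and not something confirmed for the driver. The function `run_with_limit` and the rule that every second article has a match are hypothetical.

```python
from itertools import islice


def run_with_limit(n_articles, limit):
    """Simulate a per-article loop that is cut off after `limit` result rows."""
    calls = 0  # number of simulated per-article lookups actually issued

    def lookup(article_id):
        nonlocal calls
        calls += 1
        # Made-up rule: only even-numbered articles have a matching protein.
        return [f"protein-for-{article_id}"] if article_id % 2 == 0 else []

    def joined_rows():
        # Lazily yield result rows, one article at a time.
        for article_id in range(n_articles):
            yield from lookup(article_id)

    rows = list(islice(joined_rows(), limit))
    return rows, calls


if __name__ == "__main__":
    rows, calls = run_with_limit(1_000_000, 3)
    print(rows, calls)
```

With a limit of 3, this toy loop issues only a handful of lookups out of a possible million, whereas the unlimited query has to visit every article from the first step.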

Is there something I can do to improve the query performance using SPARQL? I am a little surprised that the Oracle Jena driver loops over the data when connecting to more than one graph. It seems inefficient.

Thanks,
Chuck
Locked on Nov 6 2009
Added on Jan 8 2009