
Object Storage and Search Engine Indexing

The BCCVL is looking for a solution to aid in exposing some of our application's metadata and outputs via public search engines.  

Specifically, we have a few thousand images and associated metadata stored in Swift that we'd like to see indexed and returned in Google search results.  We'd also like to have an end-point on our public WordPress site where users can search and filter through those images/metadata. The content isn't fixed: it is updated and added to occasionally, though not on a regular schedule.

Swift has a REST API that could be used to create the search/filter end-point, but that won't be appropriate for search engine indexing. It also won't help in capturing and directing users from a search engine result to the BCCVL.

Ideally, we'd like to find or build an automated service that publishes Swift media/content items as semantic HTML pages which can be hosted publicly (and in turn be indexed by search engine robots). Search/filter functionality that could be used within WordPress would be a great bonus there as well.

Some related resources: (requires translation) or - This is a WordPress plugin that moves media uploads into an attached Object Storage service while maintaining their WP-attachment HTML pages and metadata.  That's a somewhat reversed version of what the BCCVL is looking for, but may provide some useful direction.

This requirement seemed like it may be relevant or useful to many other applications or projects using an Object Storage platform, so we've posted the question in a few locations like these discussion boards.  

Has anyone run into this issue before, or a similar issue? If so, how did you meet those requirements?

Any ideas, direction or feedback are more than welcome.  Please post any questions you may have as well.

Thanks heaps,
Sam Wolski


One way to deal with this is to use the Swift REST API to get a list of Object URLs that you want Google to index, and then write that list out as a webpage consisting of "<a>" elements.  Attach that generated page to some page on your site that Google can find and index.
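A minimal sketch of that approach in Python, assuming the container is publicly readable and that its JSON listing (a GET on the container URL with "?format=json") returns one entry per object with a "name" field. The container URL and the function names are placeholders, not anything from the BCCVL setup, and a large container may need several paged requests rather than the single one shown here.

```python
import html
import json
import urllib.parse
import urllib.request


def list_objects(container_url):
    """Fetch a container listing as parsed JSON.

    Assumes the container allows anonymous reads; a private container
    would also need an auth token header.
    """
    with urllib.request.urlopen(container_url.rstrip("/") + "?format=json") as resp:
        return json.load(resp)


def build_link_page(container_url, objects, title="Object index"):
    """Return an HTML page with one <a> element per object in the listing."""
    base = container_url.rstrip("/")
    items = []
    for obj in objects:
        name = obj["name"]
        # Percent-encode the object name for the URL; "/" is kept as-is.
        href = base + "/" + urllib.parse.quote(name)
        items.append(
            '<li><a href="{}">{}</a></li>'.format(
                html.escape(href, quote=True), html.escape(name)
            )
        )
    return "\n".join(
        [
            "<!DOCTYPE html>",
            '<html lang="en">',
            "<head>",
            '<meta charset="utf-8">',
            "<title>{}</title>".format(html.escape(title)),
            "</head>",
            "<body>",
            "<ul>",
        ]
        + items
        + ["</ul>", "</body>", "</html>"]
    )
```

The page could then be written to wherever the site serves static files, e.g. by passing the result of list_objects(url) to build_link_page(url, ...) and saving the returned string.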

That will only work if the Objects are indexable; i.e. they are (human readable) text.  Also, if the Objects are too large, you may find that Google doesn't index the entire object.  In either case, you could consider generating a stub webpage for each Object that contains a summary and a link to the Object, and the keywords / phrases that you want to be indexed by Google.
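As a rough illustration of the stub-page idea, the Python sketch below builds one small HTML page per Object. The function name and its parameters (title, summary, keywords) are made up for the example; where the summary and keywords come from (for instance, custom metadata stored with each Object) is left open.

```python
import html


def build_stub_page(object_url, title, summary, keywords):
    """Return one small HTML page describing and linking to a single object."""
    safe_title = html.escape(title)
    safe_summary = html.escape(summary, quote=True)
    safe_keywords = html.escape(", ".join(keywords), quote=True)
    safe_url = html.escape(object_url, quote=True)
    return "\n".join(
        [
            "<!DOCTYPE html>",
            '<html lang="en">',
            "<head>",
            '<meta charset="utf-8">',
            "<title>{}</title>".format(safe_title),
            '<meta name="description" content="{}">'.format(safe_summary),
            "</head>",
            "<body>",
            "<article>",
            "<h1>{}</h1>".format(safe_title),
            # Visible text is what gets indexed, so the summary and keywords
            # appear in the body as well as in the head.
            "<p>{}</p>".format(safe_summary),
            "<p>Keywords: {}</p>".format(safe_keywords),
            '<p><a href="{}">Download the original object</a></p>'.format(safe_url),
            "</article>",
            "</body>",
            "</html>",
        ]
    )
```

Each returned string would be saved as its own file (one per Object) and linked from an index page so that crawlers can reach it.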

Hi Stephen,

Thanks for the reply!

The REST API can be used to generate an object list, but as you mention, to be indexed that list must be statically stored and published elsewhere.  That page would have to be destroyed and rebuilt whenever the Swift container's data was modified (and a developer may not be available to do that).  It also doesn't provide the best experience in capturing users from the search results that the indexed items appear in (a page filled with thousands of image embeds or links isn't realistically legible).
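One possible way to take the developer out of the rebuild step is a scheduled job that only regenerates the page when the container listing has actually changed. The Python sketch below is an illustration under assumptions, not something we have running: it expects the parsed JSON container listing (a list of entries with "name" and "hash" fields), and the function names are hypothetical.

```python
import hashlib


def listing_fingerprint(objects):
    """Return a digest that changes when objects are added, removed or modified."""
    digest = hashlib.sha256()
    # Sort by name so the fingerprint does not depend on listing order.
    for obj in sorted(objects, key=lambda o: o["name"]):
        digest.update(obj["name"].encode("utf-8"))
        digest.update(b"\0")
        digest.update(obj.get("hash", "").encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def needs_rebuild(objects, previous_fingerprint):
    """True when the listing differs from the one the page was last built from."""
    return listing_fingerprint(objects) != previous_fingerprint
```

The job would store the fingerprint after each successful build and skip the rebuild when needs_rebuild returns False.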

Generating a webpage for each object containing metadata/links/embeds for indexing is the desired outcome, but we weren't able to find any platforms or services that were capable of that. If you're aware of any, please post them to this discussion thread.

The solution we're currently working on is a WordPress plugin that uses API requests to manage (upload/delete/modify/sync) 'wp_attachment' type posts in a Swift container.  This allows us to rely on WordPress's native bulk actions, item publishing and indexing for local search/filter functions.  I have a working prototype on GitHub if anyone is interested.

There are some drawbacks to that model though.  The metadata is stored in MySQL, separately from the items, and publishing thousands of items through WordPress can be problematic.  So we're still interested in other options.
