 |
opencaching.com Geocaching by the community, for the community
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Tue Sep 23, 2003 4:08 am Post subject: Distributed Data Network |
|
|
Hi there,
to introduce myself: I am the Co-webmaster of www.geocaching.de and geocaching.com is annoying me more and more. I am very happy that finally someone started a truly OPEN geocaching project.
It may be a very early stage for such detailed technical discussions, but I would like to make a proposal on a distributed data network:
The network is based on NNTP (like Usenet). Every participating site has its own NNTP server, and thus we have for free a VERY VERY reliable mechanism for worldwide replication.
There could be a group for every country, a thread for every cache and an article for every log. Updating of data is distributed by cancellation of an article and re-posting. (Of course, the normal user has NO access to these NNTP servers, and NNTP servers have to be configured for an infinite expiry time)
It is every site's choice whether it builds its services directly on the NNTP server (which may be bad for searching, for example) or keeps its own backend database to which the data is mirrored. I think it wouldn't be too hard to build gateways for the most popular relational database systems like MySQL, PostgreSQL or maybe even Oracle.
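A rough sketch of how a single log could be packed into a news article, using only the Python standard library. The group naming scheme (`geocaching.caches.<country>`), the domain `node.example.org` and the function name are assumptions made up for illustration; nothing here is an agreed format.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_log_article(country, cache_id, cache_msgid, finder, text,
                      domain="node.example.org"):
    """Return one log entry as a news article (headers plus body)."""
    msg = EmailMessage()
    msg["From"] = f"{finder} <{finder}@{domain}>"
    # One group per country (hypothetical naming scheme).
    msg["Newsgroups"] = f"geocaching.caches.{country}"
    msg["Subject"] = f"Log for {cache_id}"
    # A globally unique id, so every node can detect duplicates.
    msg["Message-ID"] = make_msgid(domain=domain)
    msg["Date"] = formatdate(usegmt=True)
    # Referencing the cache article puts the log into the cache's thread.
    msg["References"] = cache_msgid
    msg.set_content(text)
    return msg


article = build_log_article(
    "de", "OC0001", "<cache-OC0001@node.example.org>", "vinnie", "Found it.")
print(article.as_string())
```

The resulting text is what a participating site would hand to its own NNTP server; replication to the other sites is then the server's job.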
Tell me what you think about it.
Greetings from Cologne,
Michael (Vinnie) |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Tue Sep 23, 2003 4:13 am Post subject: what I forgot.... |
|
|
Oh, what I forgot:
1. The bodies of the cache and log articles should be XML for best interoperability.
2. An alternative technique: it was once proposed in alt.rec.geocaching to use LDAP servers and their replication mechanisms (for example OpenLDAP and slurpd). Personally I think this approach would be less reliable and harder to administer and debug. |
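To illustrate point 1, here is a minimal sketch of an XML body for a log article and of reading it back. The element and attribute names (`log`, `finder`, `date`, `text`, `cache`, `type`) are invented for this example and are not a proposed schema.

```python
import xml.etree.ElementTree as ET


def log_to_xml(cache_id, finder, date, log_type, text):
    """Serialize one log entry as an XML string (hypothetical layout)."""
    root = ET.Element("log", {"cache": cache_id, "type": log_type})
    ET.SubElement(root, "finder").text = finder
    ET.SubElement(root, "date").text = date
    ET.SubElement(root, "text").text = text
    # ElementTree escapes characters such as & and < for us.
    return ET.tostring(root, encoding="unicode")


def xml_to_log(xml_body):
    """Parse the XML string produced by log_to_xml back into a dict."""
    root = ET.fromstring(xml_body)
    return {
        "cache": root.get("cache"),
        "type": root.get("type"),
        "finder": root.findtext("finder"),
        "date": root.findtext("date"),
        "text": root.findtext("text"),
    }


body = log_to_xml("OC0001", "vinnie", "2003-09-23", "found", "Nice hide & view")
print(body)
print(xml_to_log(body))
```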
|
|
 |
hmarq Site Admin

Joined: 15 Sep 2003 Posts: 351
|
Posted: Tue Sep 23, 2003 8:30 am Post subject: |
|
|
Welcome Vinnie --
I think we're all agreed that XML is the data encapsulation strategy ... as for nntp for the syndication/replication strategy; I'm not opposed, but not informed either.
My initial reaction is that yes nntp is great at the replication stuff, but it doesn't seem to be designed for dynamic content. A news article once posted isn't editable, whereas a cache listing would need to be. Some thought would need to be given to the thread structure ... maybe a cache would become a thread with edits being children of the original post? ... or would log entries be children of the original post?
Certainly nntp has been around a while and is vetted and solves at least one of the problems we're faced with; uniqueness in a multi-site, multi-server environment. |
|
|
 |
CoyoteRed

Joined: 18 Sep 2003 Posts: 220
|
Posted: Tue Sep 23, 2003 9:06 am Post subject: |
|
|
I'm not so sure NNTP would be proper for this. As has been mentioned, editing cache pages is problematic.
I'm still leaning toward the GEOTAG model. Data is up to the minute. Listing sites could cache (in the computer-world sense) basic information to serve in case the cache page is temporarily unavailable.
For that matter, the listing service doesn't need to actually cache the page as it already is on Google. Very basic information, the cache page URL and the cached cache page on Google. Of course, the cached page is only as current as the last crawl, but it's there and people will have to understand that the cached page is not the most current--you're taking a risk by trusting it is still, in fact, current.
I'm thinking a lot of the tools we need are in place. We just have to gather them seamlessly.
CR _________________ "...been know to miss the finer points." |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Tue Sep 23, 2003 9:15 am Post subject: |
|
|
Yes, the handling of dynamic content is a drawback of the NNTP concept. It can be done by sending "update messages" and subsequent cancellation of the original message. Uh, that would destroy the threading by references ... Wait .... this would work: We have some STATIC (=uncancellable) "top nodes" (which for example only hold a cache name and/or a guid) in our threads, then on the second level, we have the "cache description node" (which can be cancelled and replaced) and the "log nodes". The "log nodes" only reference the "top node" and not each other.
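The node layout can be sketched roughly as follows. This is only a model of the idea in plain Python, not NNTP code; the `Node` class, the `kind` values and the message ids are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    message_id: str
    kind: str         # "top", "description" or "log"
    references: str   # Message-ID of the top node; "" for the top node itself
    body: str = ""


def build_thread(nodes):
    """Group second-level nodes under their static top node."""
    threads = {n.message_id: {"top": n, "description": None, "logs": []}
               for n in nodes if n.kind == "top"}
    for n in nodes:
        if n.kind == "top":
            continue
        thread = threads[n.references]
        if n.kind == "description":
            # A replacement description simply takes the place of the old one.
            thread["description"] = n
        else:
            # Logs reference only the top node, never each other.
            thread["logs"].append(n)
    return threads


nodes = [
    Node("<top-1@a>", "top", "", "OC0001"),
    Node("<desc-1@a>", "description", "<top-1@a>", "old text"),
    Node("<log-1@a>", "log", "<top-1@a>", "Found it"),
    Node("<desc-2@a>", "description", "<top-1@a>", "new text"),
]
thread = build_thread(nodes)["<top-1@a>"]
print(thread["description"].body, len(thread["logs"]))
```

Because every second-level node points at the top node only, cancelling and replacing the description leaves the references of all logs intact.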
This definitely needs to be thought through thoroughly, but I am convinced that it can be done.
I have thought about a distributed geocaching network from time to time, and this NNTP concept is the best that came to my mind. "Closed" techniques like the replication mechanisms of commercial software are no options, I think.
But I am very open to other open concepts!  |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Wed Sep 24, 2003 4:26 am Post subject: |
|
|
Hi,
I just found out that the handling of dynamic content is easier than I thought. See http://www.dsv.su.se/~jpalme/e-mail-book/usenet-news.html or google for "NNTP supersede".
You can not only CANCEL articles, but you can also SUPERSEDE articles, i.e. replace them. That's exactly what we need: A user edits his cache description or a log entry, and this action (the new article) is propagated over NNTP with a SUPERSEDE.
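As a sketch of what an edit could look like on the wire: the replacement article gets a fresh Message-ID and names the article it replaces in a `Supersedes` header. The helper name and the domain are assumptions, and whether a given server honours the header depends on its configuration.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def supersede(old_article, new_body, domain="node.example.org"):
    """Build an article that replaces old_article with new_body."""
    new = EmailMessage()
    # Keep the addressing and threading headers of the original.
    for name in ("From", "Newsgroups", "Subject", "References"):
        if old_article[name] is not None:
            new[name] = str(old_article[name])
    # A replacement is a new article, so it needs its own id ...
    new["Message-ID"] = make_msgid(domain=domain)
    # ... and it points at the article it replaces.
    new["Supersedes"] = str(old_article["Message-ID"])
    new.set_content(new_body)
    return new


old = EmailMessage()
old["From"] = "vinnie <vinnie@node.example.org>"
old["Newsgroups"] = "geocaching.caches.de"
old["Subject"] = "Cache OC0001"
old["Message-ID"] = "<desc-1@node.example.org>"
old.set_content("old description")

print(supersede(old, "new description").as_string())
```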
But another question came to my mind: can an NNTP server like innd handle 10^5 caches and 10^6 logs? If not, the NNTP server has to be a "temporary pipelining cache" only, which could mean that log entries have to go into other groups than the caches and have an expiry time of, say, 1 year. Every site would need a persistent database backend then.
Coyote_Red: Can you tell us more about "GEOTAGS"? Links? How do they work and how can they help distributing, ensuring uniqueness and avoiding inconsistencies of cache description and log data?
Is this technique as rock-solid as NNTP? |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Thu Sep 25, 2003 3:27 pm Post subject: |
|
|
No more suggestions or comments?
Ok, then some remarks from my side concerning the NNTP stuff
1. For the cache description articles we could possibly use the "extended GPX" format (with some more extensions?) which is used by geocaching.com
2. There has to be ONE group that contains an article for every user. These articles hold all kinds of personal data: coordinates, user name, password, email address, country, .... The format is again XML, and the password is of course encrypted. |
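One possible reading of point 2, as a sketch only: the password is stored as a salted one-way hash (PBKDF2 here, which is my assumption, not something settled in this thread) and the user record is written as XML. Element names and the stored-hash format are made up for the example.

```python
import hashlib
import hmac
import os
import xml.etree.ElementTree as ET


def hash_password(password, salt=None, rounds=200_000):
    """Return a salted PBKDF2 hash in the form scheme$rounds$salt$digest."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    return f"pbkdf2_sha256${rounds}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Check a password against a string produced by hash_password."""
    _, rounds, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(rounds))
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(candidate.hex(), digest_hex)


def user_to_xml(name, email, country, lat, lon, password_hash):
    """Serialize a user record; only the hash is written, never the password."""
    root = ET.Element("user", {"name": name, "country": country})
    ET.SubElement(root, "email").text = email
    ET.SubElement(root, "home", {"lat": str(lat), "lon": str(lon)})
    ET.SubElement(root, "password").text = password_hash
    return ET.tostring(root, encoding="unicode")


stored = hash_password("secret")
print(user_to_xml("vinnie", "vinnie@example.org", "DE", 50.94, 6.96, stored))
print(verify_password("secret", stored))
```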
|
|
 |
CoyoteRed

Joined: 18 Sep 2003 Posts: 220
|
Posted: Thu Sep 25, 2003 4:30 pm Post subject: |
|
|
| Vinnie wrote: | Hi,
Coyote_Red: Can you tell us more about "GEOTAGS"? Links? How do they work and how can they help distributing, ensuring uniqueness and avoiding inconsistencies of cache description and log data?
Is this technique as rock-solid as NNTP? |
Doh! I thought I posted the information. Oh, well. GeoTags can be found HERE.
But for some reason, I haven't been able to get my site listed. I don't know if it has stopped taking more or what. My email has been unanswered. ~shrugs~
CR _________________ "...been know to miss the finer points." |
|
|
 |
nici-

Joined: 25 Sep 2003 Posts: 199 Location: Hürth near Cologne, Germany
|
Posted: Sat Sep 27, 2003 2:57 am Post subject: |
|
|
| CoyoteRed wrote: |
Doh! I thought I posted the information. Oh, well. GeoTags can be found HERE.
|
The Geotags project defines extensions for HTML and HTTP such that http servers (or single HTML pages) can indicate their geographical position. No more, no less (?).
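For illustration, this is roughly what reading such a position from an HTML page involves. I am assuming here that the position is announced in a meta tag named `geo.position` ("lat;lon") or `ICBM` ("lat, lon"); that is my reading of the convention and has not been checked against the Geotags documentation.

```python
from html.parser import HTMLParser


class GeoMetaParser(HTMLParser):
    """Collect the first geographic position announced in a meta tag."""

    def __init__(self):
        super().__init__()
        self.position = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.position is not None:
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name not in ("geo.position", "icbm"):
            return
        # Accept both "lat;lon" and "lat, lon".
        parts = (attr.get("content") or "").replace(";", ",").split(",")
        if len(parts) == 2:
            try:
                self.position = (float(parts[0]), float(parts[1]))
            except ValueError:
                pass  # Malformed coordinates are ignored.


page = '<html><head><meta name="geo.position" content="50.94;6.96"></head></html>'
parser = GeoMetaParser()
parser.feed(page)
print(parser.position)
```

That covers locating a page, but not logs or replication.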
So your proposal is that every cacher builds and hosts their own HTML page (including "Geotags") and announces it at geotags.com-like search engines? I don't think that's acceptable for people with less technical skills. And the problem of replication still remains. And where do the logs go (they have to be replicated, too)?
I must admit, I do not see how Geotags will help us very much in building a distributed, free, uncontrollable, uncensorable database. _________________
Geocaching.de |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Sat Sep 27, 2003 3:01 am Post subject: |
|
|
ooops, wrong login. The last posting on geotags was actually mine
Vinnie- |
|
|
 |
kdh
Joined: 27 Sep 2003 Posts: 14 Location: Hamburg, Germany
|
Posted: Sat Sep 27, 2003 3:57 am Post subject: Which database? |
|
|
I just had a longer talk with a friend about the problem of storing data in multiple databases with replication. He is pretty good at IT.
His solution is the following:
1) The database is MySQL (it is cheap, solid and we both think it will live very long). It is better to store the data in a relational database than in a native XML database like Xindice from the Apache Software Foundation.
2) Create a MySQL layout and after that create an XML pattern.
3a) If you like a master database (this is good for data integrity), replication is a built-in feature of MySQL. As I heard, it should be very simple to change the master database into a normal database (and the other way round).
3b) If you don't like a master database, you can set up your own newsgroup where all the changes are written down (NNTP), or send the changes to the other databases by PGP-signed mail.
What I learned is: Keep it small and simple. So work with a master database.
4) A parser does the conversion between MySQL and XML. My friend said: use expat. This is the part I don't understand, because I do not know anything about XML.
5) My friend's advice is the following: the first thing you must do is to set up a process that defines a) where the database/XML format changes are discussed, b) who discusses them, and c) who implements them.
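For step 4, a minimal sketch of what "use expat" could look like. Python ships an expat binding in `xml.parsers.expat`; `sqlite3` stands in for MySQL here only so that the example runs without a server. The XML layout and the table definition are assumptions for illustration.

```python
import sqlite3
import xml.parsers.expat


def parse_cache(xml_text):
    """Parse <cache id="..."><name/><lat/><lon/></cache> into a dict."""
    record = {}
    current = None   # name of the child element being read
    chunks = []      # character data may arrive in several pieces

    def start(name, attrs):
        nonlocal current
        if name == "cache":
            record["id"] = attrs.get("id")
        else:
            current = name
            chunks.clear()

    def chars(data):
        if current is not None:
            chunks.append(data)

    def end(name):
        nonlocal current
        if current == name:
            record[name] = "".join(chunks).strip()
            current = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return record


def store_cache(conn, record):
    """Insert or update one parsed cache in the (hypothetical) caches table."""
    conn.execute("CREATE TABLE IF NOT EXISTS caches "
                 "(id TEXT PRIMARY KEY, name TEXT, lat REAL, lon REAL)")
    conn.execute("INSERT OR REPLACE INTO caches VALUES (?, ?, ?, ?)",
                 (record["id"], record["name"],
                  float(record["lat"]), float(record["lon"])))


xml_text = """<cache id="OC0001">
  <name>Dom</name><lat>50.9413</lat><lon>6.9583</lon>
</cache>"""
conn = sqlite3.connect(":memory:")
store_cache(conn, parse_cache(xml_text))
print(conn.execute("SELECT * FROM caches").fetchall())
```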
OK. So, I have no problem paying Jeremy Irish the $30 protection money, because then I have more time to hide and search for caches.
On the other hand, I am very interested in databases, XML and in the project opencaching.com. I think I have a lot of good basic know-how, and I am able to learn fast.
with kind regards from Hamburg, Germany - Kai |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Sat Sep 27, 2003 4:30 am Post subject: Re: Which database? |
|
|
| kdh wrote: |
1) The database is MySQL (it is cheap, solid and we both think it will live very long). It is better to store the data in a relational database than in a native XML database like Xindice from the Apache Software Foundation.
|
In an OPEN architecture I would prefer to be independent from a certain database system.
| kdh wrote: |
2) Create a MySQL layout and after that create an XML pattern.
|
Ok, we could work out a database design. But if we agree on a replication and transport layer (like XML over NNTP), even this database design may only be a recommendation.
| kdh wrote: |
3a) If you like a master database (this is good for data integrity), replication is a built-in feature of MySQL. As I heard, it should be very simple to change the master database into a normal database (and the other way round).
|
In my opinion it is absolutely crucial that we have NO master database. We should really avoid a single point of failure and control! And: A master database is not necessary (see my "XML over NNTP proposal")
| kdh wrote: |
3b) If you don't like a master database, you can set up your own newsgroup where all the changes are written down (NNTP), or send the changes to the other databases by PGP-signed mail.
|
Oh ... YES ... 100% agree
| kdh wrote: |
4) A parser does the conversion between MySQL and XML. My friend said: use expat. This is the part I don't understand, because I do not know anything about XML.
|
The XML parsing stuff is actually easy, if done in a high level language like perl, python or java...
| kdh wrote: |
On the other hand, I am very interested in databases, XML and in the project opencaching.com. I think I have a lot of good basic know-how, and I am able to learn fast.
|
Let's set up a demonstrator
The "Two Node Geocaching Network Cologne - Hamburg" |
|
|
 |
hmarq Site Admin

Joined: 15 Sep 2003 Posts: 351
|
Posted: Sat Sep 27, 2003 5:17 am Post subject: |
|
|
I take no issue with any of this ... as I think about it, nntp just becomes another back end target for the syndication. At the same time, I find it to be one more step that isn't really necessary ... as anyone that is generating the nntp feed is already creating the xml intermediary which is easier to read (IMO) ... only thing it really does is give nntp access to the caches.
So, somewhere, kicking around here are discussions on attributes and data elements ... perhaps you guys want to kick in there. |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Sat Sep 27, 2003 5:32 am Post subject: |
|
|
| hmarq wrote: | as I think about it, nntp just becomes another back end target for the syndication. At the same time, I find it to be one more step that isn't really necessary ... as anyone that is generating the nntp feed is already creating the xml intermediary which is easier to read (IMO) ... only thing it really does is give nntp access to the caches.
|
Creating the XML is only a minor step. The distribution mechanism has to be specified, a transport protocol has to be chosen (SMTP, plain TCP, HTTP ... ?), we have to take care about temporary outages -> queueing and resending of messages, a procedure how to insert new nodes into the network has to be defined and many things more...
All this has to be considered and code has to be written, which will have bugs and needs to be debugged.
As this replication mechanism is IMHO the most important part in this project, I tend to just use an established and solid thing like NNTP.
If we frequently have to take care about inconsistent data, missing caches and logs, etc., people will lose trust in our network... |
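To give an idea of what "queueing and resending" alone means when written by hand, here is a rough sketch. The `Outbox` class and the `send` callback are hypothetical; a real implementation would also need persistence, back-off and duplicate detection, which NNTP servers already provide.

```python
from collections import deque


class Outbox:
    """Hold messages for peers and retry the ones that could not be sent."""

    def __init__(self, send):
        self.send = send          # callable(peer, message); raises OSError on failure
        self.pending = deque()

    def enqueue(self, peer, message):
        self.pending.append((peer, message))

    def flush(self):
        """Try every pending message once; keep the ones that fail."""
        delivered = 0
        for _ in range(len(self.pending)):
            peer, message = self.pending.popleft()
            try:
                self.send(peer, message)
                delivered += 1
            except OSError:
                # Peer is down: put the message back for the next round.
                self.pending.append((peer, message))
        return delivered


down = {"hamburg"}


def fake_send(peer, message):
    if peer in down:
        raise ConnectionError(f"{peer} unreachable")


box = Outbox(fake_send)
box.enqueue("cologne", "<cache .../>")
box.enqueue("hamburg", "<cache .../>")
print(box.flush(), len(box.pending))
```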
|
|
 |
hmarq Site Admin

Joined: 15 Sep 2003 Posts: 351
|
Posted: Sat Sep 27, 2003 6:08 am Post subject: |
|
|
| Vinnie wrote: | | hmarq wrote: | as I think about it, nntp just becomes another back end target for the syndication. At the same time, I find it to be one more step that isn't really necessary ... as anyone that is generating the nntp feed is already creating the xml intermediary which is easier to read (IMO) ... only thing it really does is give nntp access to the caches.
|
Creating the XML is only a minor step. The distribution mechanism has to be specified, a transport protocol has to be chosen (SMTP, plain TCP, HTTP ... ?), we have to take care about temporary outages -> queueing and resending of messages, a procedure how to insert new nodes into the network has to be defined and many things more...
All this has to be considered and code has to be written, which will have bugs and needs to be debugged.
As this replication mechanism is IMHO the most important part in this project, I tend to just use an established and solid thing like NNTP.
If we frequently have to take care about inconsistent data, missing caches and logs, etc., people will lose trust in our network... |
And that's fine, I'm ok with nntp as a target, but if I were to bet, a plain old http feed spitting xml will be the most popular |
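For comparison, a plain HTTP feed that spits out XML can be sketched in a few lines with the Python standard library. The feed layout, the sample data and the port are assumptions for illustration, not a proposed format.

```python
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sample data; a real site would read this from its database.
CACHES = [
    {"id": "OC0001", "name": "Dom", "lat": 50.9413, "lon": 6.9583},
    {"id": "OC0002", "name": "Hafen", "lat": 53.5450, "lon": 9.9660},
]


def render_feed(caches):
    """Return all caches as one UTF-8 encoded XML document."""
    root = ET.Element("caches")
    for c in caches:
        item = ET.SubElement(root, "cache", {
            "id": c["id"], "lat": str(c["lat"]), "lon": str(c["lon"])})
        item.text = c["name"]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


class FeedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_feed(CACHES)
        self.send_response(200)
        self.send_header("Content-Type", "application/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the feed on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), FeedHandler).serve_forever()
```

Calling `serve()` starts the feed; other sites would poll it and merge what they find, which is simple but leaves retries and conflict handling to each consumer.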