opencaching.com Forum Index opencaching.com
Geocaching by the community, for the community
 

Distributed Data Network
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Tue Sep 23, 2003 4:08 am    Post subject: Distributed Data Network

Hi there,
to introduce myself: I am the co-webmaster of www.geocaching.de, and geocaching.com is annoying me more and more. I am very happy that someone has finally started a truly OPEN geocaching project.

It may be a very early stage for such detailed technical discussions, but I would like to make a proposal on a distributed data network:

The network is based on NNTP (like Usenet). Every participating site runs its own NNTP server, so we get a VERY reliable, proven mechanism for worldwide replication for free.
There could be a group for every country, a thread for every cache and an article for every log. Updates are distributed by cancelling an article and re-posting it. (Of course, normal users have NO access to these NNTP servers, and the servers have to be configured with an infinite expiry time.)

It is every site's choice whether it builds its services directly on the NNTP server (which may be bad for searching, for example) or keeps its own backend database to which the data is mirrored. I think it wouldn't be too hard to build gateways for the most popular relational database systems like MySQL, PostgreSQL or maybe even Oracle.
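
A minimal sketch of the group/thread/article layout, using only Python's standard email tools to build the articles. The group naming scheme (geocaching.caches.<country>), the example.org domain and both function names are assumptions made up for illustration; nothing here posts to a real server.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def country_group(country_code):
    """Hypothetical naming scheme: one newsgroup per country."""
    return f"geocaching.caches.{country_code.lower()}"


def cache_article(country_code, cache_name, body, domain="example.org"):
    """First article of a thread: it describes one cache."""
    msg = EmailMessage()
    msg["Newsgroups"] = country_group(country_code)
    msg["Subject"] = cache_name
    msg["Message-ID"] = make_msgid(domain=domain)
    msg.set_content(body)
    return msg


def log_article(cache_msg, finder, body, domain="example.org"):
    """Follow-up article in the cache's thread: one log entry."""
    msg = EmailMessage()
    msg["Newsgroups"] = str(cache_msg["Newsgroups"])
    msg["Subject"] = f"Re: {cache_msg['Subject']}"
    msg["From"] = finder
    msg["Message-ID"] = make_msgid(domain=domain)
    # The References header is what ties the log to the cache's thread.
    msg["References"] = str(cache_msg["Message-ID"])
    msg.set_content(body)
    return msg
```

Because every article carries a globally unique Message-ID, two sites can create caches at the same time without coordinating with each other.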


Tell me what you think about it.

Greetings from Cologne,
Michael (Vinnie)
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Tue Sep 23, 2003 4:13 am    Post subject: what I forgot....

Oh, what I forgot:

1. The bodies of the cache and log articles should be XML for best interoperability.

2. An alternative technique: it was once proposed in alt.rec.geocaching to use LDAP servers and their replication mechanisms (for example OpenLDAP and slurpd). Personally I think this approach would be less reliable and harder to administer and debug.
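
To make point 1 concrete, here is a rough sketch of what an XML article body for a cache could look like and how a site could read it back. The element and attribute names (cache, guid, coordinates, ...) are invented for this example; no format had been agreed on at this point.

```python
import xml.etree.ElementTree as ET


def cache_to_xml(guid, name, lat, lon, description):
    """Serialise one cache into a small XML document (hypothetical format)."""
    cache = ET.Element("cache", {"guid": guid})
    ET.SubElement(cache, "name").text = name
    ET.SubElement(cache, "coordinates", {"lat": f"{lat:.5f}", "lon": f"{lon:.5f}"})
    ET.SubElement(cache, "description").text = description
    return ET.tostring(cache, encoding="unicode")


def xml_to_cache(text):
    """Parse the document back into a plain dictionary."""
    root = ET.fromstring(text)
    coords = root.find("coordinates")
    return {
        "guid": root.get("guid"),
        "name": root.findtext("name"),
        "lat": float(coords.get("lat")),
        "lon": float(coords.get("lon")),
        "description": root.findtext("description"),
    }
```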
hmarq
Site Admin


Joined: 15 Sep 2003
Posts: 351

Posted: Tue Sep 23, 2003 8:30 am

Welcome Vinnie --

I think we're all agreed that XML is the data encapsulation strategy ... as for nntp for the syndication/replication strategy; I'm not opposed, but not informed either.

My initial reaction is that yes nntp is great at the replication stuff, but it doesn't seem to be designed for dynamic content. A news article once posted isn't editable, whereas a cache listing would need to be. Some thought would need to be given to the thread structure ... maybe a cache would become a thread with edits being children of the original post? ... or would log entries be children of the original post?

Certainly nntp has been around a while and is vetted and solves at least one of the problems we're faced with; uniqueness in a multi-site, multi-server environment.
CoyoteRed



Joined: 18 Sep 2003
Posts: 220

Posted: Tue Sep 23, 2003 9:06 am

I'm not so sure NNTP would be proper for this. As has been mentioned, editing cache pages is problematic.

I'm still leaning toward the GEOTAG model. Data is up to the minute. Listing sites could cache (in the computer-world sense) basic information to serve in case the cache page is temporarily unavailable.
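
A small sketch of that fallback idea: the listing site always tries the live cache page first and only serves its stored copy when the page cannot be reached. The class name and the injected fetch function are assumptions for illustration; a real site would plug in its own HTTP client.

```python
import time


class ListingCache:
    """Keeps the last good copy of each cache page as a fallback."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch    # callable: url -> text, raises OSError on failure
        self._clock = clock    # injectable so the behaviour can be tested
        self._copies = {}      # url -> (text, fetched_at)

    def get(self, url):
        """Return (text, fetched_at, is_live)."""
        try:
            text = self._fetch(url)
        except OSError:
            if url not in self._copies:
                raise          # never seen this page, nothing to fall back on
            text, fetched_at = self._copies[url]
            return text, fetched_at, False
        now = self._clock()
        self._copies[url] = (text, now)
        return text, now, True
```

Returning the fetch time together with an is_live flag lets the listing page tell the reader how stale the copy is.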

For that matter, the listing service doesn't need to actually cache the page, as it already is on Google. It needs only very basic information: the cache page URL and the cached copy of the cache page on Google. Of course, the cached page is only as current as the last crawl, but it's there, and people will have to understand that the cached page is not the most current--you're taking a risk by trusting that it is, in fact, still current.

I'm thinking a lot of the tools we need are in place. We just have to bring them together seamlessly.

CR
_________________
"...been know to miss the finer points."
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Tue Sep 23, 2003 9:15 am

Yes, the handling of dynamic content is a drawback of the NNTP concept. It can be done by sending "update messages" and subsequently cancelling the original message. Uh, that would destroy the threading by references ... :( Wait .... this would work: We have some STATIC (=uncancellable) "top nodes" (which, for example, hold only a cache name and/or a GUID) in our threads; then on the second level we have the "cache description node" (which can be cancelled and replaced) and the "log nodes". The "log nodes" reference only the "top node" and not each other.
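
The two-level thread can be sketched in a few lines to show why it survives a cancel-and-replace. Node, replace_description and dangling are names invented for this example; the Message-IDs are plain strings here rather than real articles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    message_id: str
    kind: str                # "top", "description" or "log"
    parent: Optional[str]    # Message-ID this node references; None for the top node
    body: str


def replace_description(thread: List[Node], new_desc: Node) -> List[Node]:
    """Cancel the old description node and add the new one."""
    if new_desc.kind != "description":
        raise ValueError("replacement must be a description node")
    kept = [n for n in thread if n.kind != "description"]
    return kept + [new_desc]


def dangling(thread: List[Node]) -> List[Node]:
    """Nodes whose reference points at an article that no longer exists."""
    known = {n.message_id for n in thread}
    return [n for n in thread if n.parent is not None and n.parent not in known]
```

As long as logs point only at the static top node, replacing the description leaves no dangling references; a log that referenced the description itself would be orphaned by the same operation.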

This definitely needs to be thought through thoroughly, but I am convinced that it can be done.

I have thought about a distributed geocaching network from time to time, and this NNTP concept is the best that came to my mind. "Closed" techniques like the replication mechanisms of commercial software are no options, I think.

But I am very open to other open concepts! :)
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Wed Sep 24, 2003 4:26 am

Hi,
I just found out that the handling of dynamic content is easier than I thought. See http://www.dsv.su.se/~jpalme/e-mail-book/usenet-news.html or google for "NNTP supersede".
You can not only CANCEL articles, you can also SUPERSEDE them, i.e. replace them. That's exactly what we need: a user edits his cache description or a log entry, and this action (the new article) is propagated over NNTP with a SUPERSEDE.
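
A sketch of how an edit could travel as a superseding article. In Netnews the replacement is expressed with a Supersedes header naming the Message-ID of the article being replaced. The dictionary store and both function names are stand-ins invented for this example, not how a real news server keeps its spool.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def superseding_article(old, new_body, domain="example.org"):
    """Build a replacement for `old` that carries a Supersedes header."""
    msg = EmailMessage()
    for header in ("Newsgroups", "Subject", "From"):
        if old[header] is not None:
            msg[header] = str(old[header])
    msg["Message-ID"] = make_msgid(domain=domain)
    msg["Supersedes"] = str(old["Message-ID"])
    msg.set_content(new_body)
    return msg


def apply_article(store, msg):
    """Toy receiver: `store` maps Message-ID -> article."""
    old_id = msg["Supersedes"]
    if old_id is not None:
        store.pop(str(old_id), None)   # drop the replaced article if we have it
    store[str(msg["Message-ID"])] = msg
```

Popping with a default means the replacement is still accepted when it arrives before the original, which can happen when articles take different routes between sites.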

But another question came to my mind: can an NNTP server like innd handle 10^5 caches and 10^6 logs? If not, the NNTP server can only be a "temporary pipelining cache", which could mean that log entries have to go into different groups than the caches and get an expiry time of, say, 1 year. Every site would then need a persistent database backend.
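
If the news server is only a temporary pipeline, each site has to mirror what it receives before the article expires. A minimal sketch of such a backend follows, with SQLite standing in for whatever database a site actually runs; the table layout and function names are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    message_id TEXT PRIMARY KEY,   -- the article's Message-ID
    cache_id   TEXT NOT NULL,      -- Message-ID or GUID of the cache
    posted_at  INTEGER NOT NULL,   -- Unix timestamp
    body       TEXT NOT NULL
);
"""


def open_backend(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def mirror_log(db, message_id, cache_id, posted_at, body):
    """Store a log once; receiving the same article again is harmless."""
    db.execute(
        "INSERT OR IGNORE INTO logs VALUES (?, ?, ?, ?)",
        (message_id, cache_id, posted_at, body),
    )
    db.commit()


def expired_from_spool(posted_at, now, max_age_days=365):
    """True once the news server would have dropped the article."""
    return now - posted_at > max_age_days * 86400
```

Keying the table on the Message-ID makes the mirror idempotent, so a site can replay a whole group after an outage without creating duplicates.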


Coyote_Red: Can you tell us more about "GEOTAGS"? Links? How do they work and how can they help distributing, ensuring uniqueness and avoiding inconsistencies of cache description and log data?
Is this technique as rock-solid as NNTP?
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Thu Sep 25, 2003 3:27 pm

No more suggestions or comments?

Ok, then some remarks from my side concerning the NNTP stuff :)

1. For the cache description articles we could possibly use the "extended GPX" format (with some more extensions?) which is used by geocaching.com

2. There has to be ONE group that contains an article for every user. These articles hold all kinds of personal data: coordinates, user name, password, email address, country, .... The format is again XML; the password is of course encrypted.
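
A sketch of what such a user article body could look like. One assumption to flag loudly: instead of reversible encryption, this example stores a salted PBKDF2 hash, which is a swapped-in technique and not something the proposal specifies. The element names and the iteration count are also invented for illustration.

```python
import hashlib
import hmac
import os
import xml.etree.ElementTree as ET

ITERATIONS = 100_000  # assumed work factor, not taken from the proposal


def hash_password(password, salt=None):
    """Return (salt_hex, digest_hex) for a salted PBKDF2-SHA256 hash."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex(), digest.hex()


def user_to_xml(name, email, country, lat, lon, password):
    salt_hex, digest_hex = hash_password(password)
    user = ET.Element("user")
    ET.SubElement(user, "name").text = name
    ET.SubElement(user, "email").text = email
    ET.SubElement(user, "country").text = country
    ET.SubElement(user, "home", {"lat": str(lat), "lon": str(lon)})
    pw = ET.SubElement(user, "password", {"salt": salt_hex, "iterations": str(ITERATIONS)})
    pw.text = digest_hex
    return ET.tostring(user, encoding="unicode")


def check_password(xml_text, candidate):
    """Recompute the hash from the stored salt and compare in constant time."""
    pw = ET.fromstring(xml_text).find("password")
    salt = bytes.fromhex(pw.get("salt"))
    digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), salt, int(pw.get("iterations"))
    )
    return hmac.compare_digest(digest.hex(), pw.text)
```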
CoyoteRed



Joined: 18 Sep 2003
Posts: 220

Posted: Thu Sep 25, 2003 4:30 pm

Vinnie wrote:
Hi,
Coyote_Red: Can you tell us more about "GEOTAGS"? Links? How do they work and how can they help distributing, ensuring uniqueness and avoiding inconsistencies of cache description and log data?
Is this technique as rock-solid as NNTP?


Doh! I thought I posted the information. Oh, well. GeoTags can be found HERE.

But for some reason, I haven't been able to get my site listed. I don't know if it has stopped taking new entries or what. My email has gone unanswered. ~shrugs~

CR
_________________
"...been know to miss the finer points."
nici-



Joined: 25 Sep 2003
Posts: 199
Location: Hürth near Cologne, Germany

Posted: Sat Sep 27, 2003 2:57 am

CoyoteRed wrote:

Doh! I thought I posted the information. Oh, well. GeoTags can be found HERE.


The Geotags project defines extensions for HTML and HTTP such that http servers (or single HTML pages) can indicate their geographical position. No more, no less (?).
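
For readers who have not seen such a page, this sketch reads a position out of an HTML meta tag. It assumes the geo.position convention (latitude and longitude separated by a semicolon in the content attribute); the parser class and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class GeoMetaParser(HTMLParser):
    """Collects the position from <meta name="geo.position" content="lat;lon">."""

    def __init__(self):
        super().__init__()
        self.position = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "geo.position":
            return
        content = attributes.get("content")
        if content:
            lat, lon = content.split(";")
            self.position = (float(lat), float(lon))


def page_position(html_text):
    parser = GeoMetaParser()
    parser.feed(html_text)
    return parser.position
```

The tag says where a page is, and nothing about how descriptions or logs would be copied between sites.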

So your proposal is that every cacher builds and hosts their own HTML page (including "Geotags") and announces it to geotags.com-like search engines? I don't think that's acceptable for people with less technical skill. And the problem of replication still remains. And where do the logs go (they have to be replicated, too)?

I must admit, I do not see how Geotags would help us very much in building a distributed, free, uncontrollable, uncensorable database.
_________________

Geocaching.de
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Sat Sep 27, 2003 3:01 am

Oops, wrong login. The last posting on geotags was actually mine :)
Vinnie-
kdh



Joined: 27 Sep 2003
Posts: 14
Location: Hamburg, Germany

Posted: Sat Sep 27, 2003 3:57 am    Post subject: Which database?

I just had a longer talk with a friend about the problem of storing data in multiple databases with replication. He is pretty good at IT.

His solution is the following:

1) The database is MySQL (it is cheap, solid, and we both think it will be around for a long time). It is better to store the data in a relational database than in a native XML database like Xindice from the Apache Software Foundation.

2) Create a MySQL layout first, and after that an XML schema.

3a) If you want a master database (this is good for data integrity), replication is a built-in feature of MySQL. As I heard, it should be very simple to turn the master database into a normal database (and the other way round).

3b) If you don't want a master database, you can set up your own newsgroup where all the changes are written down (NNTP), or send the changes to the other databases by PGP-signed mail.

What I learned is: Keep it small and simple. So work with a master database.

4) A parser does the conversion between MySQL and XML. My friend said: use expat. This is the part I don't understand, because I know nothing about XML.

5) My friend's advice is the following: the first thing you must do is establish a process for a) where to discuss changes to the database/XML format, b) who discusses them, and c) who implements them.
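
For point 4, this is roughly what "use expat" means in practice: the parser calls small handler functions while it reads the XML, and the handlers collect the values for one database row. The sketch uses Python's binding to expat and SQLite as a stand-in for MySQL; the XML element names and the table layout are assumptions invented for this example.

```python
import sqlite3
import xml.parsers.expat


def parse_cache(xml_text):
    """Turn one <cache> document into a dict, using expat callbacks."""
    record = {}
    chunks = []

    def start(name, attrs):
        chunks.clear()
        if name == "cache":
            record["guid"] = attrs["guid"]

    def end(name):
        if name in ("name", "description"):
            # expat may deliver text in several pieces, so join them here
            record[name] = "".join(chunks).strip()
        chunks.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return record


def store_cache(db, record):
    """Write the parsed record into a relational table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS caches "
        "(guid TEXT PRIMARY KEY, name TEXT, description TEXT)"
    )
    db.execute(
        "INSERT OR REPLACE INTO caches VALUES (:guid, :name, :description)", record
    )
    db.commit()
```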

OK. So, I have no problem paying Jeremy Irish the $30 protection money, because then I have more time to hide and search for caches.

On the other side, I am very interested in databases, XML and in the project opencaching.com. I think I have a lot of good basic know-how, and I am able to learn fast.

with kind regards from Hamburg, Germany - Kai
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Sat Sep 27, 2003 4:30 am    Post subject: Re: Which database?

kdh wrote:

1) The database is MySQL (it is cheap, solid, and we both think it will be around for a long time). It is better to store the data in a relational database than in a native XML database like Xindice from the Apache Software Foundation.


In an OPEN architecture I would prefer to be independent from a certain database system.

kdh wrote:

2) Create a MySQL layout first, and after that an XML schema.


Ok, we could work out a database design. But if we agree on a replication and transport layer (like XML over NNTP), even this database design may only be a recommendation.

kdh wrote:

3a) If you want a master database (this is good for data integrity), replication is a built-in feature of MySQL. As I heard, it should be very simple to turn the master database into a normal database (and the other way round).


In my opinion it is absolutely crucial that we have NO master database. We should really avoid a single point of failure and control! And: A master database is not necessary (see my "XML over NNTP proposal")

kdh wrote:

3b) If you don't want a master database, you can set up your own newsgroup where all the changes are written down (NNTP), or send the changes to the other databases by PGP-signed mail.


Oh ... YES ... 100% agree :)

kdh wrote:

4) A parser does the conversion between MySQL and XML. My friend said: use expat. This is the part I don't understand, because I know nothing about XML.


The XML parsing stuff is actually easy if it is done in a high-level language like Perl, Python or Java...

kdh wrote:

On the other side, I am very interested in databases, XML and in the project opencaching.com. I think I have a lot of good basic know-how, and I am able to learn fast.


Let's set up a demonstrator :)
The "Two Node Geocaching Network Cologne - Hamburg"
hmarq
Site Admin


Joined: 15 Sep 2003
Posts: 351

Posted: Sat Sep 27, 2003 5:17 am

I take no issue with any of this ... as I think about it, nntp just becomes another back end target for the syndication. At the same time, I find it to be one more step that isn't really necessary ... as anyone that is generating the nntp feed is already creating the xml intermediary which is easier to read (IMO) ... only thing it really does is give nntp access to the caches.

So, somewhere, kicking around here are discussions on attributes and data elements ... perhaps you guys want to kick in there.
Vinnie



Joined: 23 Sep 2003
Posts: 71
Location: Cologne, Germany

Posted: Sat Sep 27, 2003 5:32 am

hmarq wrote:
as I think about it, nntp just becomes another back end target for the syndication. At the same time, I find it to be one more step that isn't really necessary ... as anyone that is generating the nntp feed is already creating the xml intermediary which is easier to read (IMO) ... only thing it really does is give nntp access to the caches.


Creating the XML is only a minor step. The distribution mechanism has to be specified, a transport protocol has to be chosen (SMTP, plain TCP, HTTP ... ?), we have to handle temporary outages (-> queueing and resending of messages), a procedure for adding new nodes to the network has to be defined, and many things more...
All this has to be considered and code has to be written, which will have bugs and needs to be debugged.
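
To give a feeling for just one item on that list, here is a bare-bones sketch of the queueing and resending part. The Outbox class and the injected send function are invented for this example; real code would also need persistence across restarts, back-off and duplicate detection on the receiving side, which is exactly the work NNTP servers already do.

```python
from collections import deque


class Outbox:
    """Holds messages for peers that are temporarily unreachable."""

    def __init__(self, send):
        self._send = send        # callable(peer, message); raises ConnectionError
        self._pending = deque()

    def post(self, peer, message):
        self._pending.append((peer, message))

    def flush(self):
        """Try every pending message once; keep the failures. Returns the number delivered."""
        delivered = 0
        for _ in range(len(self._pending)):
            peer, message = self._pending.popleft()
            try:
                self._send(peer, message)
            except ConnectionError:
                # Re-queue at the back; failed messages keep their relative order.
                self._pending.append((peer, message))
            else:
                delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)
```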

As this replication mechanism is IMHO the most important part in this project, I tend to just use an established and solid thing like NNTP.

If we frequently have to deal with inconsistent data, missing caches and logs, etc., people will lose trust in our network...
hmarq
Site Admin


Joined: 15 Sep 2003
Posts: 351

Posted: Sat Sep 27, 2003 6:08 am

Vinnie wrote:
hmarq wrote:
as I think about it, nntp just becomes another back end target for the syndication. At the same time, I find it to be one more step that isn't really necessary ... as anyone that is generating the nntp feed is already creating the xml intermediary which is easier to read (IMO) ... only thing it really does is give nntp access to the caches.


Creating the XML is only a minor step. The distribution mechanism has to be specified, a transport protocol has to be chosen (SMTP, plain TCP, HTTP ... ?), we have to handle temporary outages (-> queueing and resending of messages), a procedure for adding new nodes to the network has to be defined, and many things more...
All this has to be considered and code has to be written, which will have bugs and needs to be debugged.

As this replication mechanism is IMHO the most important part in this project, I tend to just use an established and solid thing like NNTP.

If we frequently have to deal with inconsistent data, missing caches and logs, etc., people will lose trust in our network...


And that's fine, I'm ok with nntp as a target, but if I were to bet, a plain old http feed spitting xml will be the most popular.
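
A plain HTTP feed of that kind can be sketched as a tiny WSGI application. The feed layout (a caches root element with cache children) and the function names are assumptions made up for this example, not an agreed format.

```python
import xml.etree.ElementTree as ET


def build_feed(caches):
    """Render a list of cache dicts as one XML document (bytes)."""
    root = ET.Element("caches")
    for cache in caches:
        element = ET.SubElement(root, "cache", {"guid": cache["guid"]})
        ET.SubElement(element, "name").text = cache["name"]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def make_app(caches):
    """Return a WSGI app that serves the feed on every request."""
    def app(environ, start_response):
        body = build_feed(caches)
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/xml; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]
    return app
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, make_app(caches)).serve_forever() is enough. Note that a feed like this only publishes data; pulling it on a schedule and handling outages is left to whoever consumes it.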
All times are GMT - 5 Hours
Page 1 of 6

 