 |
opencaching.com Geocaching by the community, for the community
nilsk
Joined: 03 Oct 2003 Posts: 9
|
Posted: Sun Oct 05, 2003 3:21 am Post subject: |
|
|
| raven wrote: | ok, i have thought of a replication system, that uses our standard-xml-over-whatever(http probably) way, and it isn't as difficult as i thought:
each record (cache, user...) needs at least the following attributes (for the replication to work, that is, of course it needs more attributes than that):
- last-modified (a timestamp, when this entry was last modified)
- type (cache, user, log...)
|
Additionally, it will need a unique ID, which is very important for replication. This means each node needs its own namespace. Otherwise you cannot be sure that the same ID will not be given out twice.
So maybe the ID should be structured like this:
<prefix>@<FQDN of node generating the entry>
Each node has to enforce uniqueness of its own prefixes; since the FQDN is unique to the node, the complete ID is then unique across the whole network.
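A minimal sketch of how a node could hand out such IDs, assuming the prefix is just a per-node counter. The class name, the helper and the example.org hostnames are made up for illustration; nothing here is an agreed format.

```python
import itertools


class NodeIdAllocator:
    """Hands out record IDs of the form <prefix>@<FQDN> for a single node."""

    def __init__(self, fqdn: str, start: int = 1) -> None:
        self.fqdn = fqdn
        # Assumption: the prefix is a simple increasing counter kept by the node.
        self._counter = itertools.count(start)

    def next_id(self) -> str:
        # The node only guarantees a unique prefix; the FQDN makes it global.
        return f"{next(self._counter)}@{self.fqdn}"


def split_id(record_id: str) -> tuple:
    """Split an ID back into (prefix, fqdn)."""
    prefix, _, fqdn = record_id.partition("@")
    return prefix, fqdn
```

Two nodes can both hand out prefix 1 without a clash, because the part after the @ differs.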
| Quote: | all nodes have a list named "peers", which contains all the other core-nodes in the network (without themselves, of course).
if a user modifies something on one core node, this node sends a "modification-notification" to all of his peers, containing only the following information:
- modification-time (when the modification took place)
- modification-type (what type(s) have been modified)
|
And the ID. Then the receiving node can look up whether it already got the entry from another node. Additionally, you must make sure that a notification is forwarded to all nodes except the one it was received from, and that it is only forwarded when the entry is not already in the local DB.
Otherwise notifications will never stop flying around the core nodes.
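The forwarding rule can be sketched roughly like this. It is only an in-memory toy: the Node class is hypothetical, the seen set stands in for the lookup in the local DB, and keying on ID plus modification time is my assumption so that a later edit of the same record still gets through.

```python
class Node:
    """Toy core node that floods modification notifications to its peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []    # all other core nodes, never the node itself
        self.seen = set()  # stand-in for "already in the local DB"

    def notify(self, record_id, modified, sender=None):
        """Accept a notification; return False if it was already known."""
        key = (record_id, modified)
        if key in self.seen:
            return False   # already known: do not forward again
        self.seen.add(key)
        for peer in self.peers:
            if peer is not sender:  # never send it back where it came from
                peer.notify(record_id, modified, sender=self)
        return True
```

Because a node drops anything it has already seen, the flood stops on its own even when every node has every other node in its peers list.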
| Quote: | | these peers then query the node for all datasets of modification-type with a last-modified timestamp equal or newer than modification-time. the node then, of course, sends these changed datasets to the peers that queried it for them. |
In this case they could query by ID alone, to get exactly the dataset they need.
That's basically what NNTP does with IHAVE and SENDME messages.
The ability to request messages by date is the add-on that allows real synchronization and disaster recovery.
| Quote: | | this way we could either do instant replication (a node sends out the modification-notification directly after an edit happens), or we could queue up the modifications and then send an modification-notification with a modification-type of "all" or something like that every N hours. |
You could send notifications out on demand (which is best) or as batched jobs containing more than one ID.
| Quote: | the first "replication" of a new node could, of course, be handled differently. propably just transfer a dump of all the data using scp, ftp, hard-copies or whatever...  |
Searching by date is fine for that, I think.
Nils |
|
|
 |
marc
Joined: 01 Oct 2003 Posts: 61
|
Posted: Tue Oct 07, 2003 1:19 am Post subject: |
|
|
| Vinnie wrote: | | amnesius wrote: |
Uh, and I forgot something really essential: pics! What about pics? Do you think about spreading them over the NNTP feed too? If I'm not wrong, binaries grow when you encode them bin2ascii (not 100% sure). If you do this you will have the picture twice on your node too.
|
No, pictures should not be distributed. Pictures should stay on their "home core node" and a link should be distributed. |
Hmm, and what if the core node's HDD crashes and all the pictures are lost? I don't think the users will like that idea.
Marc. |
|
|
 |
marc
Joined: 01 Oct 2003 Posts: 61
|
Posted: Tue Oct 07, 2003 1:59 am Post subject: |
|
|
Hi,
As I'm new to this board, please let me introduce myself. I'm a 29-year-old Unix sysadmin and software developer who worked for a long time at an ISP as a Unix/network admin. After that I had my own software development company for about 5 years, where I did a lot of complex frontend, backend and interface development. So I'm very experienced in writing complex, scalable TCP/IP daemon applications using XML, SQL and all kinds of internet protocols.
I'm not sure if the discussion about how the replication mechanism should work is over or not. If it's not, I'd like to ask you to please take a look at the "Distributed Data Network" thread and my thoughts about just using a simple rsync to do XML file distribution to all core nodes.
I've been working with rsync for a long time now and have built really big things with it, and I think it would fit our needs best.
Adding a new core node to the network is no problem: since it doesn't have any files yet, rsync will automatically send all of them. Rsync sends only files that are new or have changed, and it can also delete files on the remote side when a cache gets archived or deleted.
Let's say every core node has its own "name" or "ID" and we define a human-readable directory structure where all the XML files should live:
| Code: |
/opencaching/nodes.xml (nodes config file)
/opencaching/node-01/xml (where the XML files live)
/opencaching/node-01/images (where the images live)
/opencaching/node-02/xml
/opencaching/node-02/images
|
If a user creates a new cache at node-01, the XML data is stored in the /opencaching/node-01/xml/534756834756834765834756834.xml file and distributed to all the other nodes on a daily, hourly or immediate basis. The same happens if the user makes changes. It's just what innd does, distributing single files, but much less complex to set up (we can set up a test backbone in 1 hour) and to administer. And it's easy to add things like audio files or whatever: we just add a new directory called 'audio' to the structure and that's it.
Rsync also works well with a large number of big files. So it's no problem to keep pictures in sync between all core nodes without encoding them into 7-bit ASCII, as you have to if you use news messages to transfer these files.
We can also run rsync over SSH, using public/private key authentication to make core nodes trust each other and to distribute user/cache data over an encrypted data stream.
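Roughly, the push from one node to a peer could be built like the sketch below. It assumes each node pushes only its own subtree to every peer, which is my reading and not something fixed yet. The function name and the example.org hostname are made up; the rsync options themselves (-a, -z, -e ssh, --delete) are standard ones. The function only builds the command, it does not run it.

```python
import shlex


def rsync_push_command(node_name, peer_host, root="/opencaching", delete=True):
    """Build the rsync call that pushes this node's own subtree to one peer."""
    src = f"{root}/{node_name}/"                # trailing slash: copy the contents
    dest = f"{peer_host}:{root}/{node_name}/"   # same path on the remote side
    cmd = ["rsync", "-az", "-e", "ssh"]         # archive mode, compression, over SSH
    if delete:
        cmd.append("--delete")                  # drop remote files removed locally
    cmd += [src, dest]
    return cmd


def as_shell(cmd):
    """Render the argument list as one shell-safe string, e.g. for a crontab."""
    return shlex.join(cmd)
```

The resulting list can be handed to subprocess.run from a cron job. With key-based SSH login set up between the nodes, no password prompt gets in the way.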
It's pretty easy! Everyone can understand this scheme without learning how to set up innd and work with its weird file structure. It's much less complex to implement and therefore much less prone to bugs on the development side: you don't have to implement a news client in your backend applications. Keep it simple!
For all who are interested - please take a look at this technical document describing how a sync between core nodes using rsync could possibly work:
http://devel.opencaching.de/bin/cvsweb/index.cgi/opencaching/doc/
Marc :)
Last edited by marc on Fri Oct 10, 2003 5:53 am; edited 2 times in total |
|
|
 |
Vinnie
Joined: 23 Sep 2003 Posts: 71 Location: Cologne, Germany
|
Posted: Tue Oct 07, 2003 3:50 am Post subject: |
|
|
| marc wrote: |
I'm not sure if the discussion about how the replication mechanism should work is over or not. If it's not, I'd like to ask you to please take a look at the "Distributed Data Network" thread and my thoughts about just using a simple rsync to do XML file distribution to all core nodes.
|
Hey,
I guess I like that ...!
We are back to rsync. After each rsync run we find the new/changed XML files and put them into the database. Simple and good. |
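One way to find the new/changed files after a run, as a rough sketch: take a snapshot of (mtime, size) per XML file before and after, and compare the two. The function names are made up. Comparing snapshots instead of checking mtime against the time of the last import is a deliberate choice here, because rsync in archive mode keeps the sender's mtimes, so a file can arrive late with an old timestamp.

```python
from pathlib import Path


def snapshot(root):
    """Map every XML file below root to its (mtime_ns, size) fingerprint."""
    root = Path(root)
    result = {}
    for path in root.rglob("*.xml"):
        st = path.stat()
        result[path.relative_to(root).as_posix()] = (st.st_mtime_ns, st.st_size)
    return result


def diff_snapshots(old, new):
    """Return (changed_or_new, deleted) relative paths, each sorted."""
    changed = sorted(p for p, fp in new.items() if old.get(p) != fp)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted
```

The changed list is what would get parsed and written to the database; the deleted list covers caches that were removed on their home node.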
|
|
 |
raven

Joined: 29 Sep 2003 Posts: 84 Location: Bielefeld, Germany
|
Posted: Tue Oct 07, 2003 3:55 pm Post subject: |
|
|
marc: that sounds like a good idea... when i tested rsync, i always thought of one big XML file with all the caches, and that was of course a big problem if that file got changed on two different nodes... i think i was a bit stupid not to think of per-node files...
_________________ Quoth the raven, "Nevermore" |
|
|
 |