Changes between Version 3 and Version 4 of Profiles


Timestamp:
04/28/11 17:44:08 (3 years ago)
Author:
jcnelson
Comment:

--

  • Profiles

    v3 v4  
    55(TODO: diagram) 
    66 
    7 In this profile, there are many content servers, one metadata server, and many clients. Clients are distinct from content servers, and they all use the same metadata server. This configuration provides filesystem semantics similar to those of the client/server architecture of web browsers and web servers--web browsers download data from web servers and may let the user change the data locally, but they cannot republish that data without explicit support from the web server. Similarly, clients in this configuration can read globally and write locally, but can write globally only out-of-band. 
     7 * The data is read-only--it is supplied by content servers, and only the content servers are allowed to add/remove (but not modify) content 
     8 * One metadata server generates metadata for one content server 
     9 * Syndicate clients communicate with exactly one metadata server at a time, which may be (and usually is) remote. 
     10 * Metadata servers may index multiple content servers 
     11 * Every content server is indexed by at least one metadata server 
     12 
     13[[BR]]'''Rationale:''' Consider how a web browser works:  when a user navigates to a webpage, the web browser downloads an HTML page, which it then parses and uses to download additional content (e.g. images, sounds, Flash, etc.) to form an aggregate view composed of content from one or more sources (the final output is what the user sees on the screen).  This is a lot like what would happen in this Syndicate deployment--the user mounts a content server with Syndicate (analogous to navigating to a page), causing Syndicate to download the metadata and generate a filesystem for the user (analogous to the browser parsing the page and locating additional media to build the view), which the user can then walk, open, and read (analogous to the browser downloading additional content and putting it into the page on the screen for the user to browse).  Interactive web page components such as Flash and AJAX, which normally run in the web browser, can be given to the Syndicate client by the content server in the form of binaries and scripts that show up as files in the mounted hierarchy; the only difference is that a user must explicitly run the binary to gain interactivity.  An added benefit of Syndicate over web browsers is that the user can use his/her own tools to manipulate the data, whereas a web browser limits a user to the content server's scripts.  In both cases, the data is read-only (unless it is first copied locally), the data must be downloaded to be viewed, and the data may contain interactive (executable) components that carry out specialized tasks on behalf of the user and content server from within the client machine.[[BR]][[BR]]'''Role of !CoBlitz:'''  !CoBlitz behaves as it was originally designed to behave--it caches content between content servers and clients. 
    814 
    915== "Distributed Dropbox" Profile == 
    1016(TODO: diagram) 
    1117 
    12 (TODO) 
     18 * There is exactly one metadata server for the entire Syndicate system 
     19 * A host running a Syndicate client must also run a content server (on a separate directory).  There is a one-to-one correspondence between content servers and clients 
     20 * The metadata server is globally accessible by all clients 
     21 * The metadata server indexes all content servers, but arranges content in a flat namespace ([http://www.host1.com/foo/bar www.host1.com/foo/bar] and [http://www.host2.com/foo/baz www.host2.com/foo/baz] will appear as /foo/bar and /foo/baz in the metadata) 
     22 * The metadata server determines the last-modified time for each piece of content 
     23 * In the event that two content servers provide the same path (excluding the hostname), the host with the path corresponding to the most recently modified content is chosen for creating the metadata 
     24 * Published content on a content server comes from its local Syndicate client, subject to the following constraints: 
     25   * Content in the Syndicate client hierarchy that is NOT present in the published content hierarchy will be downloaded and added to the published content hierarchy 
     26   * Content in the published content directory that has an earlier modification time (as identified by the metadata server) than the corresponding file in the Syndicate hierarchy will be un-published 
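The collision rule in the list above (the most recently modified copy of a path wins) can be sketched as follows. This is a minimal illustration only, not Syndicate's implementation: the function name `build_flat_namespace`, the `(url, last_modified)` input shape, and the use of Unix timestamps are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def build_flat_namespace(entries):
    """Map each path to the URL whose content was modified most recently.

    entries: iterable of (url, last_modified) pairs, where last_modified
    is a number that grows with time (assumed here: a Unix timestamp).
    """
    winners = {}  # path -> (url, last_modified)
    for url, mtime in entries:
        path = urlsplit(url).path  # drop the scheme and hostname
        current = winners.get(path)
        if current is None or mtime > current[1]:
            winners[path] = (url, mtime)
    return {path: url for path, (url, _) in winners.items()}


DAY = 86400
now = 1_000_000_000  # arbitrary reference time for the example
metadata = build_flat_namespace([
    ("http://host_A.com/path/to/content", now - 4 * DAY),
    ("http://host_B.com/path/to/content", now - 3 * DAY),
    ("http://host_A.com/foo/bar", now - 1 * DAY),
])
```

Here `metadata["/path/to/content"]` refers to the copy on host_B.com, since it was modified more recently than the copy on host_A.com. Ties are resolved in favor of the first entry seen, which is an arbitrary choice in this sketch.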
     27 
     28'''Rationale:''' Consider how !DropBox currently works--a user installs the !DropBox client on each of his/her machines, and sets up an account with the !DropBox service.  The !DropBox service stores a copy of all of the files given to any !DropBox client, and the !DropBox clients periodically download files from the central service that were previously uploaded by other clients, thus making all of the user's files eventually consistent across all of his/her devices.  Syndicate can provide the same functionality with the above setup.  Here, the metadata server crawls the user's devices, and gets a listing of all of the content they each store.  When it generates the paths for the filesystem in the metadata, it examines the last-modified time of each piece of content, and if there is a path collision between two hosts, the metadata server indexes the piece of content with the latest last-modified time.  So, if host_A.com/path/to/content was modified 4 days ago and host_B.com/path/to/content was modified 3 days ago, then /path/to/content in the metadata will refer to host_B.com/path/to/content.  Each host then periodically walks its mounted Syndicate client hierarchy, copies every piece of data that either does not exist in its content directory or is served by a different host, and republishes its content (i.e. rebuilds its sitemap for the metadata server to crawl, etc).  Then, after a few rounds of metadata publishing and content publishing, every host will have the same content in its content folder.[[BR]][[BR]]'''Role of !CoBlitz''':  !CoBlitz caches data as it is pulled from host to host.  
Either content servers will need the ability to invalidate cached copies in !CoBlitz, since the same URL may be used to identify different bits of content during the system's execution, or the user will need to use our content publishing software to ensure that each piece of content gets a unique URL (the naming scheme will need to be globally agreed upon and understood by the metadata server for handling collisions). 
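One way to picture the second option (a unique URL per version of each piece of content) is to derive part of the URL from the content itself. The sketch below is purely hypothetical: the `?v=<digest>` naming scheme, the function names, and the assumption that the cache keys on the full URL including the query string are not taken from Syndicate or !CoBlitz.

```python
import hashlib


def versioned_url(host, path, data):
    """Return a URL that changes whenever the content bytes change.

    Hypothetical scheme: append a truncated SHA-256 digest as ?v=<digest>.
    """
    digest = hashlib.sha256(data).hexdigest()[:16]
    return f"http://{host}{path}?v={digest}"


def logical_path(url):
    """Recover the host-independent path used for collision handling."""
    without_scheme = url.split("://", 1)[1]
    host_and_path = without_scheme.split("?", 1)[0]  # drop the version tag
    return "/" + host_and_path.split("/", 1)[1]


old_url = versioned_url("www.host1.com", "/foo/bar", b"first version")
new_url = versioned_url("www.host1.com", "/foo/bar", b"second version")
```

The two URLs differ, so a cache keyed on the full URL would never serve the old bytes under the new name, while `logical_path` maps both back to `/foo/bar` so that the metadata server can still detect collisions on the path.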
    1329 
    1430== "Publish-Subscribe" Profile == 
     31(TODO: diagram) 
     32 
     33 * There is exactly one (logical) content server for the entire Syndicate system 
     34 * There is at least one metadata server and at least one client.  All metadata servers index the same content. 
     35 * Every file ever created on the content server must be accessible by a unique URL (distinct from the URLs of all prior files). 
     36 
     37[[BR]]'''Rationale:''' Consider how a pub/sub system might work.  The publisher creates a message to disseminate to its subscribers.  To do so, the publisher may mirror the data to multiple rendezvous points on the Internet to prevent it from being overwhelmed with periodic polls from its subscribers.  A Syndicate system can achieve this functionality with the semantics above.  The message gets published to the metadata servers as a different URL each time it changes, but the metadata servers always assign the URLs for a particular message channel to the same path on disk.  The client downloads the metadata and creates the filesystem, and users then periodically read the message data through !CoBlitz from the content server.  Then, the content server is only polled periodically by !CoBlitz and the metadata servers for the message, and clients see new messages in a timely manner (as soon as the URL for the particular message file in the metadata changes).  One example use-case with these semantics is scalable package deployment (e.g. APT and Stork could use this).[[BR]][[BR]]'''Role of !CoBlitz:'''  !CoBlitz caches the messages sent from the publisher and thus serves as a scalable rendezvous point for subscribers.
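The way a client notices a new message (the URL behind a fixed path changes between metadata downloads) can be sketched as a comparison of two metadata snapshots. The dictionary representation of the metadata, the function name `changed_channels`, and the example hostnames and paths are assumptions made for illustration.

```python
def changed_channels(old_metadata, new_metadata):
    """Return the paths whose URL differs between two metadata snapshots.

    Each snapshot is assumed to be a dict mapping a filesystem path
    (one per message channel) to the URL currently published for it.
    """
    return sorted(
        path
        for path, url in new_metadata.items()
        if old_metadata.get(path) != url
    )


old_snapshot = {
    "/channels/packages": "http://publisher.example/packages/0001",
}
new_snapshot = {
    "/channels/packages": "http://publisher.example/packages/0002",
    "/channels/news": "http://publisher.example/news/0001",
}
updated = changed_channels(old_snapshot, new_snapshot)
```

In this example `updated` contains both paths: `/channels/packages` because its URL changed, and `/channels/news` because the channel is new. A client would then re-read only those files through the cache.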