Changes between Version 5 and Version 6 of Profiles


Timestamp: 04/28/11 17:49:19
Author: jcnelson

  • Profiles

 * Every content server is indexed by at least one metadata server

[[BR]]'''Rationale:''' Consider how a web browser works:  when a user navigates to a webpage, the browser downloads an HTML page, which it then parses and uses to download additional content (e.g. images, sounds, Flash, etc.) to form an aggregate view composed of content from one or more sources (the final output is what the user sees on the screen).  This is a lot like what happens in this Syndicate deployment--the user mounts a content server with Syndicate (analogous to navigating to a page), causing Syndicate to download the metadata and generate a filesystem for the user (analogous to the browser parsing the page and locating additional media to build the view), which the user can then walk, open, and read (analogous to the browser downloading additional content and putting it into the page on the screen for the user to browse).  Interactivity such as Flash and AJAX, which normally runs in the web browser, can be given to the Syndicate client by the content server in the form of binaries and scripts that show up as files in the mounted hierarchy; the only difference is that the user must explicitly run the binary to gain interactivity.  An added benefit Syndicate provides that web browsers cannot is that the user can use his/her own tools to manipulate the data, whereas a web browser limits the user to the content server's scripts.  In both cases, the data is read-only (unless it is first copied locally), the data must be downloaded to be viewed, and the data may contain interactive (executable) components that carry out specialized tasks on behalf of the user and content server from within the client machine.[[BR]][[BR]]

'''Role of !CoBlitz:''' !CoBlitz behaves as it was originally designed to behave--it caches content between content servers and clients.
[[BR]][[BR]]
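
To make the analogy concrete, here is a minimal sketch of the client-side flow described above.  The JSON metadata listing, the URLs, and the !CoBlitz front-end prefix are all assumptions made for illustration, not Syndicate's actual wire protocol.

{{{
#!python
# Hypothetical sketch of the "mount, then walk/open/read" flow described above.
# The metadata format, URLs, and cache prefix are assumptions, not Syndicate's
# actual protocol.
import json
from urllib.request import urlopen

METADATA_URL   = "http://metadata.example.com/listing.json"  # hypothetical metadata server
COBLITZ_PREFIX = "http://coblitz.example.net/fetch?url="     # hypothetical cache front-end

def fetch_metadata():
    # Analogous to the browser downloading the HTML page: a listing of
    # {"path": ..., "url": ...} entries published by the metadata server.
    with urlopen(METADATA_URL) as resp:
        return json.load(resp)

def build_tree(listing):
    # Analogous to the browser parsing the page: map filesystem paths to the
    # content URLs the client will fetch on open()/read().
    return {entry["path"]: entry["url"] for entry in listing}

def read_file(tree, path):
    # Analogous to the browser fetching embedded media: pull the bytes through
    # the cache rather than straight from the content server.
    with urlopen(COBLITZ_PREFIX + tree[path]) as resp:
        return resp.read()

if __name__ == "__main__":
    tree = build_tree(fetch_metadata())
    for path in sorted(tree):            # "walk" the mounted hierarchy
        print(path)
    first = sorted(tree)[0]
    print(len(read_file(tree, first)), "bytes read from", first)
}}}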

== "Distributed Dropbox" Profile ==

   * Content in the published content directory that has an earlier modification time (as identified by the metadata server) than the corresponding file in the Syndicate hierarchy will be un-published

'''Rationale:''' Consider how !DropBox currently works--a user installs the !DropBox client on each of his/her machines, and sets up an account with the !DropBox service.  The !DropBox service stores a copy of all of the files given to any !DropBox client, and the !DropBox clients periodically download files from the central service that were previously uploaded by other clients, thus making all of the user's files eventually consistent across all of his/her devices.  Syndicate can provide the same functionality with the above setup.  Here, the metadata server crawls the user's devices and gets a listing of all of the content each one stores.  When it generates the paths for the filesystem in the metadata, it examines the last-modified time of each piece of content, and if there is a path collision between two hosts, the metadata server indexes the copy with the latest last-modified time.  So, if host_A.com/path/to/content was modified 4 days ago and host_B.com/path/to/content was modified 3 days ago, then /path/to/content in the metadata will refer to host_B.com/path/to/content.  Each host then periodically walks its mounted Syndicate hierarchy, copies every piece of data that does not already exist in its content directory or that is hosted on a different machine, and republishes its content (i.e. rebuilds its sitemap for the metadata server to crawl, etc.).  Then, after a few rounds of metadata publishing and content publishing, every host will have the same content in its content folder.

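As a small sketch of the collision rule just described, assume the crawler has already produced one record per copy of a piece of content, carrying its host, path, last-modified time, and URL (the field names are invented here); the copy with the latest modification time wins the path:

{{{
#!python
# Sketch of the path-collision rule: when two hosts publish the same path,
# the metadata server indexes the copy with the latest modification time.
# The record format (host, path, mtime, url) is invented for illustration.

def build_index(records):
    index = {}
    for rec in records:
        current = index.get(rec["path"])
        if current is None or rec["mtime"] > current["mtime"]:
            index[rec["path"]] = rec      # the newer copy wins the path
    return index

# host_B's copy is newer (modified 3 days ago vs. 4 days ago), so
# /path/to/content resolves to host_B.com, as in the example above.
records = [
    {"host": "host_A.com", "path": "/path/to/content",
     "mtime": 1303600000, "url": "http://host_A.com/path/to/content"},
    {"host": "host_B.com", "path": "/path/to/content",
     "mtime": 1303686400, "url": "http://host_B.com/path/to/content"},
]
print(build_index(records)["/path/to/content"]["host"])   # -> host_B.com
}}}
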
'''Role of !CoBlitz:''' !CoBlitz caches data as it is pulled from host to host.  Either content servers will need the ability to invalidate cached copies in !CoBlitz, since the same URL may be used to identify different bits of content during the system's execution, or the user will need to use our content publishing software to ensure that each piece of content gets a unique URL (the naming scheme will need to be globally agreed upon and understood by the metadata server for handling collisions).
[[BR]][[BR]]
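
Each host's periodic pass could then look roughly like the sketch below.  The mount point, content directory, and republish hook are placeholders, and the "hosted on a different machine" test from the rationale is approximated by a newer-modification-time check, in line with the un-publish rule above.

{{{
#!python
# Rough sketch of one host's periodic walk-and-copy pass. The paths and the
# republish hook are assumptions; reading from the mount is what actually
# pulls remote data through the CoBlitz cache.
import os
import shutil

MOUNT_POINT = "/mnt/syndicate"   # assumed Syndicate client mount
CONTENT_DIR = "/srv/content"     # assumed published content directory

def republish():
    # Placeholder: rebuild the sitemap so the metadata server can re-crawl us.
    pass

def sync_pass():
    for dirpath, _, filenames in os.walk(MOUNT_POINT):
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(CONTENT_DIR, os.path.relpath(src, MOUNT_POINT))
            # Copy anything we do not already publish, or anything whose mounted
            # copy is newer than our local copy (i.e. another host won the path).
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copyfile(src, dst)
    republish()

if __name__ == "__main__":
    sync_pass()
}}}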

== "Publish-Subscribe" Profile ==

 * Every file ever created on the content server must be accessible by a unique URL (unique among all prior file URLs).

[[BR]]'''Rationale:''' Consider how a pub/sub system might work.  The publisher creates a message to disseminate to its subscribers.  To do so, the publisher may mirror the data to multiple rendezvous points on the Internet to prevent itself from being overwhelmed by periodic polls from its subscribers.  A Syndicate system can achieve this functionality with the semantics above.  The message gets published to the metadata servers as a different URL each time it changes, but the metadata servers always assign the URLs for a particular message channel to the same path on disk.  The client downloads the metadata and creates the filesystem, and users then periodically read the message data through !CoBlitz from the content server.  As a result, the content server is only polled periodically by !CoBlitz and the metadata servers for the message, and clients see new messages in a timely manner (as soon as the URL for the particular message file in the metadata changes).  One example use-case with these semantics is scalable package deployment (e.g. APT and Stork could use this).[[BR]][[BR]]

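A minimal sketch of the publishing side follows, assuming the publisher derives each message version's URL from a content hash while subscribers keep reading the channel's fixed path; the URL scheme and output directory are invented for illustration.

{{{
#!python
# Sketch of publishing each new message version under a URL of its own while
# the channel keeps a stable path in the filesystem. The naming scheme and
# output directory are invented; Syndicate's publishing tools may differ.
import hashlib
import os

PUBLISH_DIR = "published_content"                        # assumed web-served directory
BASE_URL    = "http://publisher.example.com/channels"    # hypothetical content server

def publish(channel, payload):
    # Name the file by channel + content hash, so every version of the message
    # gets a URL that has never been used before (and CoBlitz never serves a
    # stale cached copy). The metadata server keeps mapping the channel's
    # stable path to whichever URL is newest.
    digest = hashlib.sha1(payload).hexdigest()
    filename = "%s-%s.msg" % (channel, digest)
    os.makedirs(PUBLISH_DIR, exist_ok=True)
    with open(os.path.join(PUBLISH_DIR, filename), "wb") as f:
        f.write(payload)
    stable_path = "/channels/%s" % channel               # what subscribers always read
    unique_url  = "%s/%s" % (BASE_URL, filename)         # what the metadata points at
    return stable_path, unique_url

print(publish("updates", b"release 1.0 is now available"))
}}}
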
'''Role of !CoBlitz:''' !CoBlitz caches the messages sent from the publisher and thus serves as a scalable rendezvous point for subscribers.