Developer documentation v1 for the scmproj plugin

Date: 2009-11-28
Version: 0.5

1   Motivation

Big and complex projects usually consist of several smaller components. Similar complex projects quite often share some components, e.g. libraries of common functions.

But the basic unit in most modern DVCS is a branch representing a single unified tree of files and directories. There is no easy way to check out only part of the tree, or to combine subdirectories from unrelated branches into a new project.

Bazaar has long-standing plans to support nested trees:

Unfortunately, this work is unfinished and its future is uncertain.

Git solves this problem with its submodules support:

Mercurial 1.3 and later also have built-in experimental support for this, called subrepos:

and, additionally, a more or less working extension (plugin) called forest:

The aim of the scmproj plugin is to provide a solution similar to git submodules and hg forests for the Bazaar VCS. But scmproj is designed to be more or less VCS-agnostic and to allow more flexible project configurations, including mixing branches from several different VCS in one complex project. This is not the primary goal, though, so some parts of this document are biased towards bzr specifics.

2   Terminology used in the document

SCM -- software configuration management.

Project -- in the software industry, usually a large tree of source code files, documentation, and related resources (images, sounds, etc.).

Component -- part of a complex project; usually a separate VCS branch.

Component branch -- a VCS branch for a given component; it represents a line of development.

Tip -- the most recent revision in a branch.

Checkout -- a working copy of the project files.

Subset -- a set of project components for which a checkout is created. Many project-wide commands use the subset configuration to run specific VCS commands on the component branches.

Alt -- a variation of the project that specifies which component branches are used to compose a particular version of the project. The alt concept is similar to a branch in the VCS sense, but it covers a set of several component branches.

Snapshot -- represents a state of the project at some point; usually it records the current revision for each component branch. A series of snapshots forms the history of the entire project.

Subproject -- a regular project that is used in another project. A subproject cannot itself contain another subproject.

Megaproject (or superproject) -- a project that consists of regular components and/or subprojects.

3   Model and concepts

3.1   Background

The design of this plugin is modelled after CVS modules, svn:externals, hg forest and git submodules. We try to adopt the good ideas of these solutions while keeping the model as clean as possible.

This plugin aims to help manage complex projects consisting of several different and often independent VCS branches.

The design of the project configuration leans towards the centralized development model. There are no explicit restrictions on using scmproj to handle projects in a purely distributed way; there is even support for keeping component repositories inside the scmproj control directory. But in general any complex project requires some sort of centralized development, either social or technical, so we focus more on the centralized model.

3.2   Goals

Scmproj's design and implementation may be complex because it has to handle complex projects, but it should not be overcomplicated.

Scmproj should:

  1. Provide a way to describe the complex project structure via some sort of config file;
  2. Provide a way to replicate the project structure on a local machine;
  3. Provide a way to create a checkout of the project on a local machine for all components or only for a part of them;
  4. Provide a way to work with a local project checkout as a unit. There should be an easy way to execute commands for each component in the local checkout.
  5. Provide a way to save the exact project state at some time and then restore it easily;
  6. Provide a way to work with short-lived feature branches for some components of the project;
  7. Provide a way to include one project into another as subproject;
  8. Not restrict users from creating any sensible project configuration.

Scmproj could:

  1. Provide support for heterogeneous projects where different components use different VCS (e.g. external libraries);
  2. Treat some components as "readonly" and prevent users from committing/pushing modifications to them.

3.3   Concepts

Project config -- a special configuration file that describes the project structure and the origins of the component branches.

Alt -- a variation of the project, similar to branching in usual VCS practice. When you want to create a new branch for some components of the project, you can either clone the project and modify the project config accordingly, or create a new named variation of the same project and switch between the existing variations easily. The alt concept works better when your team uses a centralized workflow; for purely distributed development it may sometimes be inconvenient.

Subset -- when you need a checkout of only part of the entire project, you can either specify the components manually or use a named subset. Subsets should be predefined in the project config. The idea of a subset is the same as that of a partial checkout of files in a single VCS branch.

Snapshot -- a special VCS branch used to commit changes to the project configuration and to save the state of the component branches (by recording their revision ids).

3.3.1   Example of subset usage

Imagine you have a complex project and your team consists of programmers, artists, musicians, and technical writers who work on documentation for the final product. Your project tree has three major components: docs, the images/sounds used by the program, and the program's source code. Let's say the project has the following layout on disk:

docs/
images/
sounds/
sources/

Your programmers will most likely want to work with the sources most of the time, but sometimes they need the images and sounds as well. Your artists will most likely use nothing but the images subdirectory. Your writers need only the docs and images to create beautiful documentation. So you can create several subsets of the project, letting different team members recreate just the part of the project they need for their specific work. You can create the following subsets:

  • "docs" for the writer, it will cover only "docs/" and "images/" subdirs
  • "images" for the artist, it will cover only "images/" subdir
  • "sounds" for the musician, it will cover only "sounds/" subdir
  • "src" for the programmers, it will cover "sources/", "images/" and "sounds/" subdirs

And of course, to build the final product you need to check out all components; this is your default subset.
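To make the example concrete, here is a minimal Python sketch that models these subsets as plain data and lists which directories a checkout of each one would contain. The names `LAYOUT`, `SUBSETS` and `dirs_to_check_out` are hypothetical illustrations, not part of scmproj.

```python
# Hypothetical sketch: the example subsets above as plain Python data.
LAYOUT = ["docs/", "images/", "sounds/", "sources/"]

SUBSETS = {
    "docs": ["docs/", "images/"],
    "images": ["images/"],
    "sounds": ["sounds/"],
    "src": ["sources/", "images/", "sounds/"],
}


def dirs_to_check_out(subset=None):
    """Return the directories a checkout of the given subset contains.

    None stands for the default subset, which covers every component.
    """
    if subset is None:
        return list(LAYOUT)
    selected = set(SUBSETS[subset])
    # Keep the on-disk layout order so the result is deterministic.
    return [d for d in LAYOUT if d in selected]


print(dirs_to_check_out("docs"))  # ['docs/', 'images/']
print(dirs_to_check_out("src"))   # ['images/', 'sounds/', 'sources/']
```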

3.4   Model

3.4.1   Hosting a project on a server

When a user publishes the config of a complex project on a server, they actually publish the project control branch containing the committed project configuration file. The control branch is also meant to hold the snapshots of the project. The bzr-scmproj plugin checks for the presence of the project config file in the revision tree to determine whether a remote branch is actually an scmproj control branch.

3.4.2   Local project checkout

To keep the scmproj control branch separate from the regular files in the component trees, a special ".scmproj" directory is used (similar to the .bzr control directory). This way scmproj avoids putting a regular VCS branch in the root of the project to hold its project config and settings. Thus it is possible to create a checkout of one component directly in the project root without clashing with the scmproj metadata.

Local project copies are intended to hold clones of the component branches for local work.

4   Typical usage scenarios

The project admin or team lead creates a new project and publishes it on the central server. Team members check out the project on their local machines. For convenience they can work on the entire project or only on part of it, using different subsets to select which components to check out. This plugin provides helper commands that allow you to run regular VCS commands such as status, diff and commit for every active component of the project.

When there is a need to create a parallel line of development for the project (i.e. new branches for all or only some components), the project admin creates additional alts, so different people can work on different variants of the same project. This can be useful for keeping slightly different versions of the project in one place (e.g. for building releases, or for long-term development of new/experimental features).

Developers regularly update the project snapshot info, commit it, and optionally create descriptive tags for snapshot revisions.

Once a new stable version of the project is available, the build team or build system checks out a fresh copy of the project (using snapshots/tags) and builds a complete version for users/customers.

5   Composition of the scmproj control directory

Project metadata is stored in the control branch. This branch is stored as is on the remote server, but is placed inside the .scmproj directory in the root of the project for local use. Using a separate directory makes it possible to put even a component in the root of the project.

Structure of the .scmproj directory (it contains a regular bzr branch):

.scmproj/
    .bzr/
    alts/
    project.cfg
    readme.txt

The alts/ subdirectory is meant to hold snapshot data.

The project.cfg file is the main project configuration.

The readme.txt file provides some useful info about the content of the control branch.
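A local command has to locate this control directory before it can do anything else. The Python sketch below assumes it walks up from the current directory until it finds `.scmproj/project.cfg`, much as bzr searches upwards for `.bzr`; `find_project_root` is a hypothetical name, and the plugin's actual lookup may differ.

```python
import os


def find_project_root(start):
    """Walk up from `start` to the nearest directory holding .scmproj/project.cfg.

    Returns that directory, or None if no enclosing scmproj project exists.
    Assumption: the lookup mirrors how bzr locates its .bzr directory.
    """
    current = os.path.abspath(start)
    while True:
        cfg = os.path.join(current, ".scmproj", "project.cfg")
        if os.path.isfile(cfg):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```

Because the marker is `.scmproj/project.cfg` rather than `.bzr`, a component checked out directly in the project root does not confuse the search.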

6   Project configuration file (project.cfg) format

Date:2009/11/28
Version:"Subprojects for real"

The project.cfg file uses an ini-style format with several sections/subsections containing key = value pairs. The ConfigObj library is used to parse it. The file content is UTF-8 encoded.

All reserved words, section names and key names are uppercase to avoid clashes with user variables, so users should use only lowercase names for their own variables/keys.

There are no mandatory sections; everything is optional.

Nevertheless, you can use several optional general sections: PROJECT, ALTS, SUBSETS and COMPONENTS.

Every component and subproject should be described in its own named section.

Skeleton of the file:

[PROJECT]
## Optional project description
NAME =
DESCRIPTION =
## VCS used for project components.
## Every component can override this parameter.
VCS = bzr
## Optional space-separated list of known components
# COMPONENTS =

[ALTS]
## describe project variations here in the form
## variation_name = component1:branchA component2:branchB

[SUBSETS]
## describe predefined partial subsets of project here
## as space separated lists of visible components:
## subset_name = component1 component2 ...

[COMPONENTS]
## describe component templates, which are used unless they are
## overridden in a component section
# RELPATH =  ## available template variable {COMPONENT}
# BRANCH_URL =  ## available template variables {COMPONENT}, {BRANCH}, {RELPATH}
## List of the default public branches for components, e.g. BRANCHES = trunk
## This list can be overridden in the component section.
# BRANCHES =
## Optional default format for components
# FORMAT =
## Optional default VCS for components
# VCS =

## describe each component in subsequent sections
# [component "name"]
# ## Optional component description
# DESCRIPTION =
# ## Mapping of component directory to project tree
# RELPATH =
# ## Optionally specify VCS used for this component (default is bzr)
# VCS =
# ## Optionally specify repo/branch format used for this component
# FORMAT =
# ## Optionally specify read-only mode for component (True/False)
# READONLY =
# ## Space separated list of known public branches for this component;
# ## by default branch names are used as relative paths for the component
# ## repo URL.
# BRANCHES =
# ## Optional URL to branch (template, see notes from COMPONENTS section)
# BRANCH_URL =
# ## To provide detailed info about each public branch use subsequent subsections
#     [[branch "name"]]
#     ## Optionally specify VCS used for this branch (default is bzr)
#     VCS =
#     ## Optionally specify branch format used for this branch
#     FORMAT =
#     ## Optional URL to this branch
#     BRANCH_URL =
#     ## Optional revision info to checkout this branch
#     REVISION =
#     ## Optionally specify read-only mode for branch (True/False)
#     READONLY =

## describe each subproject in subsequent sections
# [subproject "name"]
# ## Mapping of subproject root directory to project tree
# RELPATH =
# ## URL of subproject location
# SUBPROJECT_URL =

See detailed explanation below.

6.1   PROJECT Section

To provide an optional project description you can use these keys:

  • NAME -- short name of the project
  • DESCRIPTION -- more or less verbose description

You can specify the default VCS used for component branches with the following key:

  • VCS -- the default value is 'bzr'

Optionally, you can provide a space-separated list of known components with the COMPONENTS key (see Components Sections below).

6.2   ALTS Section

In this section you can declare a set of project variants. Every alt is a space-separated list of items:

<component name>:<component branch name>

E.g., foo:bar, spam:trunk.

There is one predefined alt named DEFAULT; by default it selects the first branch of every component (the first entry in the component's list of known branches). You can override this behavior either by specifying your own list or by making a link to another alt; a link is marked with an @ prefix, e.g.:

[ALTS]
DEFAULT = @eggs
eggs = foo:bar

It is possible to define a default branch name per alt and to override this default for specific components, e.g.:

[ALTS]
dev = {COMPONENT}:trunk
qa = {COMPONENT}:qa non-standard-component:bar

In this example the dev alt uses the trunk branch for all components. The qa alt uses the qa branch for all components except non-standard-component, which uses bar.
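The alt rules above (the built-in DEFAULT, @ links, the {COMPONENT} wildcard and per-component overrides) can be sketched in Python as follows. `resolve_alt` and its arguments are hypothetical; it works on the raw strings of the [ALTS] section, and components an alt does not mention are simply left out of the result, a case this document does not specify.

```python
def resolve_alt(name, alts, branches):
    """Map components to branch names for the alt called `name`.

    alts     -- raw [ALTS] section, e.g. {"qa": "{COMPONENT}:qa spam:bar"}
    branches -- known branches per component (first one is the default),
                e.g. {"foo": ["trunk", "qa"], "spam": ["trunk", "bar"]}
    """
    seen = set()
    while True:
        if name in seen:
            raise ValueError("alt link cycle at %r" % name)
        seen.add(name)
        spec = alts.get(name)
        if spec is None:
            if name == "DEFAULT":
                # Built-in DEFAULT: first known branch of every component.
                return {comp: names[0] for comp, names in branches.items()}
            raise KeyError("unknown alt %r" % name)
        spec = spec.strip()
        if not spec.startswith("@"):
            break
        name = spec[1:]  # follow a link such as DEFAULT = @eggs

    result, overrides = {}, {}
    for item in spec.split():
        component, _, branch = item.partition(":")
        if component == "{COMPONENT}":
            # Wildcard entry: the same branch name for every component...
            for comp in branches:
                result[comp] = branch
        else:
            overrides[component] = branch
    # ...except the components that are listed explicitly.
    result.update(overrides)
    return result
```

With the qa alt from the example, `resolve_alt("qa", alts, branches)` maps every component to qa and non-standard-component to bar.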

6.3   SUBSETS Section

This section is similar to the ALTS section, but for every subset you only specify a list of visible components. There are three predefined subsets: DEFAULT, FULL and NULL.

Subset FULL enables all components.

Subset NULL disables all components.

The DEFAULT subset enables all components by default. You can override this behavior either by specifying your own list or by making a link to another subset; a link is marked with an @ prefix, e.g.:

[SUBSETS]
DEFAULT = @fish
fish = foo spam

You cannot override FULL or NULL subsets.
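Subset lookup follows the same pattern as alts. In this Python sketch, `resolve_subset` (a hypothetical helper) enforces the fixed FULL and NULL subsets, falls back to the built-in DEFAULT, follows @ links, and returns the set of visible components.

```python
def resolve_subset(name, subsets, components):
    """Return the set of visible components for the subset `name`.

    subsets    -- raw [SUBSETS] section, e.g. {"DEFAULT": "@fish", "fish": "foo spam"}
    components -- all components known to the project
    """
    for reserved in ("FULL", "NULL"):
        if reserved in subsets:
            raise ValueError("subset %s cannot be overridden" % reserved)
    seen = set()
    while True:
        if name == "FULL":
            return set(components)
        if name == "NULL":
            return set()
        if name in seen:
            raise ValueError("subset link cycle at %r" % name)
        seen.add(name)
        spec = subsets.get(name)
        if spec is None:
            if name == "DEFAULT":
                # Built-in DEFAULT enables every component.
                return set(components)
            raise KeyError("unknown subset %r" % name)
        spec = spec.strip()
        if not spec.startswith("@"):
            return set(spec.split())
        name = spec[1:]  # follow a link such as DEFAULT = @fish
```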

6.4   COMPONENTS Section

This section specifies component templates, which are used unless overridden in a component section, e.g.:

[PROJECT]
COMPONENTS = foo

[COMPONENTS]
RELPATH = {COMPONENT}

[component "foo"]
# RELPATH = foo

In this section you can declare the following two templates:

  • RELPATH -- for component working copy relpath
  • BRANCH_URL -- for component branch origin

In addition, you can provide a default set of public branches for components using the BRANCHES option, e.g.:

BRANCHES = trunk dev

For the RELPATH template you may only use the {COMPONENT} template parameter, which is replaced by the component name. In the example above you therefore don't need to specify the RELPATH for the foo component, as it is already defined by the template.

For the BRANCH_URL template you may use template variables: {COMPONENT} (component name), {BRANCH} (branch name), and {RELPATH}.

In this section users can also provide default values for the FORMAT and VCS of component branches.
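Template expansion maps naturally onto Python's `str.format`, since the placeholders already use {NAME} syntax. The sketch below is a hypothetical `expand_component` helper, not the plugin's code; it lets values from the component's own section override the COMPONENTS templates and fails if either mandatory value is still missing.

```python
def expand_component(name, templates, section=None, branch=None):
    """Return (relpath, branch_url) for one component.

    templates -- the [COMPONENTS] section; may hold RELPATH/BRANCH_URL templates
    section   -- the component's own section; its values take precedence
    branch    -- branch name substituted for {BRANCH}, if the URL uses it
    """
    section = section or {}
    relpath_tpl = section.get("RELPATH", templates.get("RELPATH"))
    url_tpl = section.get("BRANCH_URL", templates.get("BRANCH_URL"))
    if relpath_tpl is None or url_tpl is None:
        raise ValueError("component %r lacks RELPATH or BRANCH_URL" % name)
    # RELPATH may only use {COMPONENT}; any other placeholder raises KeyError.
    relpath = relpath_tpl.format(COMPONENT=name)
    values = {"COMPONENT": name, "RELPATH": relpath}
    if branch is not None:
        values["BRANCH"] = branch
    return relpath, url_tpl.format(**values)
```

One caveat of `str.format`: any other literal braces in a URL would also be treated as placeholders, so a real implementation might prefer plain string replacement.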

6.5   Components Sections

6.5.1   Specify components list for the project

Users can specify the list of project components either via the COMPONENTS option in the PROJECT section, or simply by declaring component sections.

The COMPONENTS option in the PROJECT section should be a space-separated list of component names. This list can be used on its own, without dedicated component sections; in that case the mandatory component options (RELPATH and BRANCH_URL) must be provided via templates in the COMPONENTS section.

If the COMPONENTS list is used together with dedicated component sections, the list should be a superset of the set of component sections.
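The superset rule can be checked mechanically. In this Python sketch, `component_names` (a hypothetical helper) takes the [PROJECT] section as a dict plus the names taken from the dedicated component sections, and raises ValueError if a section is missing from an explicit COMPONENTS list.

```python
def component_names(project_section, component_sections):
    """Return the project's component names, enforcing the superset rule.

    project_section    -- the [PROJECT] section as a dict
    component_sections -- names taken from [component "..."] headers
    """
    listed = project_section.get("COMPONENTS", "").split()
    if not listed:
        # No explicit list: the dedicated sections define the components.
        return list(component_sections)
    missing = set(component_sections) - set(listed)
    if missing:
        raise ValueError("COMPONENTS list lacks: %s" % ", ".join(sorted(missing)))
    return listed
```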

6.5.2   Dedicated component section

To describe a component, use a dedicated section in the config. The section header should be as follows:

[component "name"]

Where:

  • component -- reserved keyword to specify component section
  • name -- the name of the component. Double quotes around the name are optional.
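Assuming the config parser hands back the text between the brackets as the section name (so [component "foo"] yields `component "foo"`), a Python sketch could split it into keyword and name with a regular expression. `parse_section_name` is hypothetical; the same pattern also accepts the subproject and branch headers described later.

```python
import re

# Section name as it appears between the brackets, e.g. component "foo".
# The double quotes around the name are optional.
_HEADER = re.compile(r'^\s*(component|subproject|branch)\s+(?:"([^"]+)"|([^"\s]+))\s*$')


def parse_section_name(section):
    """Split e.g. 'component "foo"' into ('component', 'foo'); else None."""
    match = _HEADER.match(section)
    if match is None:
        return None
    return match.group(1), match.group(2) or match.group(3)
```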

6.5.3   Main info for component

For every component you must specify the mandatory info and may specify optional info.

Mandatory component info:

  • RELPATH -- mapping of component directory to project tree
  • BRANCH_URL -- URL to the branch (can be a template)

Optional component info:

  • DESCRIPTION -- more or less verbose description of component
  • VCS -- main vcs used for component branches
  • FORMAT -- VCS repository format of the component branches (this option can be very important for bzr, with its zoo of formats)
  • READONLY -- you can use this option to prohibit local changes to component files (useful when using vendor branches)

6.5.4   Specifying component branches

When a project uses only one alt, there is only one branch for every component.

But when the project needs several alts, some components will have several branches. Users can specify the set of component branches either explicitly or implicitly.

To specify the list of component branches explicitly, use the BRANCHES option in the component section (or in the COMPONENTS template section). The value of BRANCHES should be a space-separated list of known branch names. Branch names can then be used in templates (as {BRANCH}).

The other, implicit, way to specify a component's set of branches is to use a special subsection for each branch.

6.5.5   Component branches info subsections

The branch subsection header should be as follows:

[[branch "name"]]

Where:

  • branch -- reserved keyword to specify component branch subsection
  • name -- the name of the branch. Double quotes around the name are optional.

Branch subsections are optional, and they can provide the following optional info:

  • BRANCH_URL -- full URL to the branch
  • REVISION -- explicit revision info used to checkout branch
  • VCS -- vcs used for this branch
  • FORMAT -- vcs repository format for this branch
  • READONLY -- you can use this option to prohibit local changes to branch files (useful when using vendor branch)
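Several keys (VCS, FORMAT, READONLY, BRANCH_URL) can appear at more than one level. This document does not fully spell out the lookup order, so the Python sketch below assumes the most specific scope wins: branch subsection, then component section, then COMPONENTS templates, then PROJECT, then a built-in default such as bzr for VCS. `effective_option` is a hypothetical helper, not the plugin's code.

```python
def effective_option(key, branch=None, component=None, templates=None,
                     project=None, default=None):
    """Look up `key` in the most specific scope that defines it.

    Each scope is a dict of raw config values (or None if absent).
    Assumed precedence: branch > component > templates > project > default.
    """
    for scope in (branch, component, templates, project):
        if scope and key in scope:
            return scope[key]
    return default
```

For example, `effective_option("VCS", component={}, project={"VCS": "hg"}, default="bzr")` returns "hg".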

6.5.6   Examples

  • Suppose the project contains two components named foo and bar. They can be specified in the config as two sections:

    [PROJECT]
    VCS = bzr
    
    [component "foo"]
    RELPATH = docs
    BRANCH_URL = http://example.com/foo
    
    [component "bar"]
    RELPATH = src
    BRANCH_URL = http://example.com/bar
    
  • The branch URL can be specified via a template in the COMPONENTS section (see above) or in a dedicated branch subsection:

    [COMPONENTS]
    BRANCH_URL = http://example.com/{COMPONENT}
    
    [component "foo"]
    RELPATH = some/path
    
    [component "bar"]
    RELPATH = some/another/path
    
        [[branch "baz"]]
        BRANCH_URL = http://example.com/boo/{BRANCH}
    

    Then the http://example.com/foo URL is used for the "foo" component, while the "baz" branch of the "bar" component uses http://example.com/boo/baz.

6.6   Subprojects sections

To describe a subproject, use a dedicated section in the config. The section header should be as follows:

[subproject "name"]

Where:

  • subproject -- reserved keyword to specify subproject section
  • name -- the name of the subproject. Double quotes around the name are optional.

For every subproject you must specify the mandatory info and may also specify optional info.

Mandatory subproject info:

  • RELPATH -- mapping of subproject directory to project tree
  • SUBPROJECT_URL -- URL to subproject origin

Example:

[subproject "cats"]
RELPATH = meow
SUBPROJECT_URL = http://www.cats.net/fluffy

7   Local project settings

Because scmproj keeps its metadata in a regular [bzr] branch, it uses the existing mechanism for reading/writing user settings in the branch config (branch.conf).

The table below lists the options used.

Option name         Comment
------------------  -----------------------------------------------------------
parent_location     standard bzr option: parent location of the branch after
                    cloning (used to keep the location of the project on the server)
scmproj:alt         current alt of the project checkout
scmproj:subset      current subset of the project checkout
scmproj:workspace   workspace configuration (plain or boxed)
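The Python sketch below shows, under simplifying assumptions, how the options in this table could be pulled out of a branch.conf file: it treats the file as plain "key = value" lines, whereas the plugin itself goes through bzr's configuration machinery. `read_scmproj_settings` is a hypothetical name, and rejecting workspace values other than plain/boxed is our own choice.

```python
def read_scmproj_settings(text):
    """Extract scmproj-related settings from the text of a branch.conf file.

    Simplified sketch: one "key = value" pair per line, '#' comments ignored.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith("scmproj:") or key == "parent_location":
            settings[key] = value
    workspace = settings.get("scmproj:workspace")
    if workspace is not None and workspace not in ("plain", "boxed"):
        raise ValueError("unexpected scmproj:workspace %r" % workspace)
    return settings
```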

8   Limitations

Component names, subproject names and alt names are used to create corresponding objects in the filesystem (either local copies of branches or snapshots). To avoid problems with the local filesystem (especially in cross-platform development), you should use only safe characters in your names, i.e. ASCII letters, digits, the dash and the underscore. Also, do not use names that differ only in letter case (e.g., foo vs Foo).
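A small Python check for these naming rules might look like this; `check_names` is a hypothetical helper that reports, rather than rejects, names with unsafe characters and names that differ only in letter case.

```python
import re

# Safe characters recommended above: ASCII letters, digits, '-' and '_'.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def check_names(names):
    """Return a list of problems found in component/subproject/alt names."""
    problems = []
    seen = {}  # lowercased name -> first spelling encountered
    for name in names:
        if not _SAFE_NAME.fullmatch(name):
            problems.append("unsafe characters in %r" % name)
        key = name.lower()
        if key in seen and seen[key] != name:
            # These would collide on case-insensitive filesystems.
            problems.append("%r and %r differ only in case" % (seen[key], name))
        seen.setdefault(key, name)
    return problems
```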

9   Snapshots

Snapshots should allow users to recreate some past state of the entire project. To achieve this, the system needs the past project.cfg, the past alt/subset configuration, and the revision ids of all component branches.

So the basic snapshot storage is a regular branch in which the current state of project.cfg and info about the current alt/subset and revision ids are recorded.

The project.cfg file is committed "as is", without any transformations.

Alt/subset/revid info is committed using special ini-style files (index cards). Each index card file is named after its alt. Custom alts are not committed, because they are meant to be local.

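Every index card should provide enough information to recreate the state of the local project working copy, so it records the following data: the alt name (ALT), the active subset (SUBSET), and the revision id of each component branch.

For example: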

ALT = trunk
SUBSET = foo bar
foo = a@b.c-12345
bar = a@b.c-67890
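A minimal Python sketch of writing and reading index cards in the format shown above. `write_index_card` and `read_index_card` are hypothetical names; the sketch assumes one key = value pair per line and relies on the convention that user names are lowercase, so a component can never be called ALT or SUBSET.

```python
def write_index_card(alt, subset, revids):
    """Render an index card; revids maps component name -> revision id."""
    lines = ["ALT = %s" % alt, "SUBSET = %s" % " ".join(subset)]
    # One "component = revision id" line per component, in the given order.
    lines += ["%s = %s" % (comp, revid) for comp, revid in revids.items()]
    return "\n".join(lines) + "\n"


def read_index_card(text):
    """Parse an index card back into (alt, subset list, revids dict)."""
    alt, subset, revids = None, [], {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "ALT":
            alt = value
        elif key == "SUBSET":
            subset = value.split()
        else:
            revids[key] = value
    return alt, subset, revids
```

Writing the example above with `write_index_card("trunk", ["foo", "bar"], {"foo": "a@b.c-12345", "bar": "a@b.c-67890"})` reproduces it line for line, and reading it back gives the same three values.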

For operating on snapshots directly there should be special [hidden] commands such as update-snapshot and commit-snapshot, but in general most project-wide commands should work with snapshots automatically where appropriate.

Snapshots should provide enough information to recreate any project from scratch (provided the component branches are still available at the recorded URLs). So for operations such as getting/cloning a project or creating a project checkout, one can use either the project origin URL or the snapshot branch URL. This makes it possible to host projects as snapshot branches on branch-based hosting services, e.g. Launchpad.net.

The snapshot branch is a regular VCS branch, so you can tag revisions in it and thereby get stable project-wide tags.

NOTE: The snapshot branch is an essential part of the project, but it is not required for working with the bleeding edge (tips) of the project branches, because project.cfg is the primary source of information.

10   Using subprojects

If you have a small or medium project and want to add its components to another, bigger project, you can either copy and paste the component descriptions into the new project config, or reuse the existing project configuration by referencing it as a subproject in the bigger project.

Subprojects are referenced as a single unit; you can't pick a specific component of a subproject from the megaproject config.

11   Operations and UI

11.1   Main commands

Already implemented:

  • project-init (pinit) -- initialize new project
  • project-info (pinfo) -- shows info about project
  • project-get (pget) -- get copy of the project for local work
  • project-update (pup) -- update components branches
  • project-publish (pback) -- push changes in components branches back to server
  • project-commit (pci) -- commit changes in all components and update snapshot
  • project-command (pcmd, prun) -- run arbitrary [bzr] command(s) for components

Not yet implemented:

  • project-status (pst) -- run vcs status command for every component
  • project-diff (pdi) -- run vcs diff command for every component
  • project-log (plog) -- show history of the project based on known snapshots
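Several of these commands (project-command, and the planned project-status and project-diff) boil down to running one VCS command in each active component. A minimal Python sketch of that loop follows; `run_for_components` is hypothetical, it assumes relpaths are relative to the current directory (the project root), and it keeps going when one component's command fails, which may not match the plugin's behavior.

```python
import subprocess


def run_for_components(components, visible, argv, dry_run=False):
    """Run `argv` once in each visible component's working directory.

    components -- {component name: relpath}
    visible    -- names enabled by the current subset
    argv       -- the VCS command, e.g. ["bzr", "status"]
    Returns the (relpath, argv) pairs that were run (or would be run).
    """
    plan = [(components[name], list(argv))
            for name in sorted(visible) if name in components]
    if not dry_run:
        for relpath, cmd in plan:
            # check=False: a failure in one component does not stop the rest.
            subprocess.run(cmd, cwd=relpath, check=False)
    return plan
```

Calling it with `dry_run=True` only reports the plan, which is handy for checking a subset before running anything.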

12   Bazaar bugs affecting scmproj