| Date: | 2009-11-28 |
|---|---|
| Version: | 0.5 |
Big and complex projects usually consist of several smaller components. Quite often similar complex projects can share some components, e.g. libraries of common functions.
But the basic unit in most modern DVCS is a branch representing a single tree of files and directories. There is no easy way to check out only a part of the tree, or to combine sub-directories from unrelated branches into a new project.
Bazaar has very old plans to provide support for nested trees:
Unfortunately this work is not finished and nobody knows its future.
Git provides a solution for this problem with its submodules support:
Mercurial 1.3 and later also has built-in experimental support for this problem called subrepos:
and additionally more or less working extension (plugin) called forest:
The aim of the scmproj plugin is to provide a solution similar to git submodules and hg forests for the Bazaar VCS. But scmproj is designed to be more or less VCS-agnostic and to provide more flexible project configurations, including support for mixing branches from several different VCS into one complex project. This is not the primary goal though, hence some parts of this document are biased towards bzr specifics.
SCM -- software configuration management.
Project -- in the software industry it is usually a big tree of source code files, documentation texts, related resources (images, sounds, etc).
Component -- part of a complex project; usually a separate VCS branch.
Component branch -- a VCS branch for a given component; it represents a line of development.
Tip -- the most recent revision in a branch.
Checkout -- a working copy of the project files.
Subset -- a set of project components for which a checkout is created. Many project-wide commands use a subset configuration to run specific VCS commands on the component branches.
Alt -- a variation of the project that specifies which component branches should be used to compose some version of the project. The alt concept is similar to a branch in the VCS sense, but it covers a set of several component branches.
Snapshot -- represents a state of the project at some point; usually it records the current revision for each component branch. A series of snapshots forms the history of the entire project.
Subproject -- a regular project that is used in another project. A subproject cannot use another subproject as a part.
Megaproject (or superproject) -- project that consists of usual components and/or sub-projects.
The design of this plugin is modeled on CVS modules, svn:externals, hg forest and git submodules support. We are trying to take many good ideas from these solutions while keeping the model as clean as possible.
This plugin tries to help manage complex projects consisting of several different and often independent VCS branches.
The design of the project configuration leans towards the centralized development model. There are no explicit restrictions on using scmproj to handle projects in a purely distributed way; there is even support for keeping component repositories inside the scmproj control directory. But in general any complex project requires some sort of centralized development, either social or technical. So we are focusing more on the centralized model.
The scmproj design and implementation can be complex because it needs to handle complex projects, but it should not be overcomplicated.
Scmproj should:
Scmproj could:
Project config -- a special configuration file that describes the project structure and the origins of the component branches.
Alt -- a variation of the project. It's similar to branching in usual VCS practice. When you want to create a new branch for some components of the project you can either create a clone of the project and modify the project config accordingly, or you can create a new named variation of the same project and switch between existing variations easily. The alt concept works better when your team is using a centralized workflow for projects. For purely distributed development this concept may sometimes be inconvenient.
Subset -- when you need to create a checkout for only some part of the entire project you can specify these components manually or use named subsets. Subsets should be predefined in the project config. The idea of a subset is the same as the idea of a partial checkout of files in one VCS branch.
Snapshot -- a special VCS branch to commit modifications of the project configuration and save the state of component branches (record their revision ids).
Let's imagine you have a complex project and your team consists of programmers, artists, musicians and technical writers who work on documentation for the final product. Your project tree has 3 major components: docs, images/sounds used by the program, and the source code of the program. Let's say on disk you have the following project layout:
docs/
images/
sounds/
sources/
Your programmers will most likely want to work with sources most of the time, but sometimes they need images and sounds as well. Your artists will most likely not use anything but the images subdirectory. Your writers need only the docs and images to create beautiful documentation. So you can create several subsets of the project so that different team members can recreate just the part of the project they need for their specific work. You can create the following subsets:
And of course for building the final product you need to check out all components; this is your default subset.
When you publish the config of a complex project on some server you actually publish the project control branch with the committed project configuration file. The control branch is also supposed to keep snapshots of the project. The bzr-scmproj plugin checks for the presence of the project config file in the revision tree to determine whether the remote branch is actually a scmproj control branch.
To keep the scmproj control branch apart from the regular files in the component trees, a special ".scmproj" directory is used (similar to the .bzr control directory etc). This way scmproj avoids using a regular VCS branch in the root of the project to hold its project config and settings. Thus it's possible to create a checkout of one component directly in the project root, and it will not clash with scmproj metadata at all.
Local project copies are intended for creating a clone of the component branches for local work.
The project admin or team lead creates a new project and publishes it on the central server. Team members check out the project on their local machines. For convenience they can work on the entire project or only on a part, using different subsets to select components to check out. This plugin provides helper commands that allow you to run regular VCS commands like status, diff and commit for every active component of the project.
When there is a need to create a parallel line of development for the project (to create new branches for all or only some components) the project admin creates additional alts, so different people can work on different variants of the same project. This can be useful for keeping slightly different versions of the project in one place (e.g. for building releases, or for long-term development of new or experimental features).
Developers regularly update the project snapshot info, commit it, and optionally create descriptive tags for snapshot revisions.
Once a new stable version of the project is available, the build team or build system checks out a fresh copy of the project (using snapshots/tags) and builds the complete version for users/customers.
Project metadata is stored in the control branch. This branch is stored as is on the remote server, but is placed inside the .scmproj directory in the root of the project for local usage. Using a separate directory makes it possible to put a component directly in the root of the project.
Structure of the .scmproj directory (it contains a regular bzr branch):
.scmproj/
    .bzr/
    alts/
    project.cfg
    readme.txt
The alts/ subdirectory is supposed to keep snapshot data.
The project.cfg file is the main project configuration.
The readme.txt file provides some useful info about the content of the control branch.
| Date: | 2009/11/28 |
|---|---|
| Version: | "Subprojects for real" |
The project.cfg file has an ini-style format with several sections/subsections with key = value content. The ConfigObj library is used to parse the project.cfg file. The file content is UTF-8 encoded.
All reserved words, section names and key names are uppercase to avoid name clashes with user variables. So you should use only lowercase names for your own variables/keys.
There are no mandatory sections; everything is optional.
Nevertheless, several optional sections can be used:
Every component and subproject should be described in additional named sections.
Skeleton of the file:
[PROJECT]
## Optional project description
NAME =
DESCRIPTION =
## VCS used for project components.
## Every component can override this parameter.
VCS = bzr
## Optional space-separated list of known components
# COMPONENTS =
[ALTS]
## describe project variations here in the form
## variation_name = component1:branchA component2:branchB
[SUBSETS]
## describe predefined partial subsets of project here
## as space separated lists of visible components:
## subset_name = component1 component2 ...
[COMPONENTS]
## describe some component templates, which will be used unless it is
## overridden in the component section
# RELPATH = ## available template variable {COMPONENT}
# BRANCH_URL = ## available template variables {COMPONENT}, {BRANCH}, {RELPATH}
## List of the default public branches for components, e.g. BRANCHES = trunk
## This list can be overridden in component section.
# BRANCHES =
## Optional default format for components
# FORMAT =
## Optional default VCS for components
# VCS =
## describe each component in subsequent sections
# [component "name"]
# ## Optional component description
# DESCRIPTION =
# ## Mapping of component directory to project tree
# RELPATH =
# ## Optionally specify VCS used for this component (default is bzr)
# VCS =
# ## Optionally specify repo/branch format used for this component
# FORMAT =
# ## Optionally specify read-only mode for component (True/False)
# READONLY =
# ## Space separated list of known public branches for this component;
# ## by default branches names used as relative paths for component
# ## repo url.
# BRANCHES =
# ## Optional URL to branch (template, see notes from COMPONENTS section)
# BRANCH_URL =
# ## To provide detailed info about each public branch use subsequent subsections
# [[branch "name"]]
# ## Optionally specify VCS used for this branch (default is bzr)
# VCS =
# ## Optionally specify branch format used for this branch
# FORMAT =
# ## Optional URL to this branch
# BRANCH_URL =
# ## Optional revision info to checkout this branch
# REVISION =
# ## Optionally specify read-only mode for branch (True/False)
# READONLY =
## describe each subproject in subsequent sections
# [subproject "name"]
# ## Mapping of subproject root directory to project tree
# RELPATH =
# ## URL of subproject location
# SUBPROJECT_URL =
See detailed explanation below.
To provide an optional project description you can use these keys:
You should specify a default vcs used for component branches with the following key:
Optionally you can provide the space-separated list of known components with COMPONENTS key (see below Components Sections).
In this section you can declare a set of your project variants. Every alt is the space-separated list of items:
<component name> <:> <component branch name>
E.g., foo:bar, spam:trunk.
There is one predefined alt named DEFAULT; by default it uses the first branch of every component (the first entry in the component's list of known branches). You can override this default behavior either by specifying your own list or by making a link to another alt; a link is marked with a @ prefix, e.g.:
[ALTS]
DEFAULT = @eggs
eggs = foo:bar
It is possible to define a default branch name per alt, and also to override this default for specific components, e.g.:
[ALTS]
dev = {COMPONENT}:trunk
qa = {COMPONENT}:qa non-standard-component:bar
In this example the dev alt uses the trunk branch for all components. The qa alt uses the qa branch for all components except non-standard-component, for which it uses bar.
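The alt rules described above (links marked with @, the {COMPONENT} wildcard, per-component overrides) can be sketched in Python as follows. This is only an illustration, not the plugin's actual code: the function name resolve_alt and its arguments are hypothetical, and the sketch assumes that components not mentioned by an alt are simply left out of the result.

```python
def resolve_alt(alts, name, components, first_branch):
    """Return {component: branch} for the alt called `name`.

    alts         -- mapping of alt name to its raw value from [ALTS]
    components   -- list of known component names
    first_branch -- mapping of component to its first known branch,
                    used when DEFAULT is not overridden (assumption)
    """
    seen = set()
    while True:
        if name == "DEFAULT" and name not in alts:
            return dict(first_branch)
        if name in seen:
            raise ValueError("alt link loop at %r" % name)
        seen.add(name)
        value = alts[name].strip()
        if not value.startswith("@"):
            break
        name = value[1:]  # follow the link to another alt

    wildcard, explicit = {}, {}
    for item in value.split():
        component, sep, branch = item.partition(":")
        if not sep or not branch:
            raise ValueError("bad alt item %r" % item)
        if component == "{COMPONENT}":
            # default branch name for every known component
            for known in components:
                wildcard[known] = branch
        else:
            explicit[component] = branch
    wildcard.update(explicit)  # explicit items win over the wildcard
    return wildcard


alts = {
    "dev": "{COMPONENT}:trunk",
    "qa": "{COMPONENT}:qa non-standard-component:bar",
}
components = ["foo", "non-standard-component"]
qa = resolve_alt(alts, "qa", components, {})
```

With the [ALTS] example above, qa maps foo to the qa branch and non-standard-component to bar.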
This section is similar to the ALTS section, but you only specify a list of visible components for every subset. There are three predefined subsets: DEFAULT, FULL and NULL.
Subset FULL enables all components.
Subset NULL disables all components.
DEFAULT subset by default enables all components. But you can override this default behavior either by specifying your own list, or by making a link to another subset; a link is marked with a @ prefix, e.g.:
[SUBSETS]
DEFAULT = @fish
fish = foo spam
You cannot override FULL or NULL subsets.
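A minimal Python sketch of the subset rules above, under the assumption that the [SUBSETS] section is available as a plain mapping of names to raw values. The function name resolve_subset is hypothetical and not part of the plugin.

```python
def resolve_subset(subsets, name, components):
    """Return the list of visible components for the subset `name`."""
    seen = set()
    while True:
        # FULL and NULL are predefined and cannot be overridden,
        # so they are checked before the config is consulted.
        if name == "FULL":
            return list(components)
        if name == "NULL":
            return []
        if name == "DEFAULT" and "DEFAULT" not in subsets:
            return list(components)  # DEFAULT enables everything
        if name in seen:
            raise ValueError("subset link loop at %r" % name)
        seen.add(name)
        value = subsets[name].strip()
        if not value.startswith("@"):
            return value.split()
        name = value[1:]  # follow the link to another subset


subsets = {"DEFAULT": "@fish", "fish": "foo spam"}
components = ["foo", "spam", "eggs"]
visible = resolve_subset(subsets, "DEFAULT", components)
```

Here DEFAULT follows the link to fish, so only foo and spam are visible.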
This section specifies component templates, which are used unless they are overridden in the component section, e.g.:
[PROJECT]
COMPONENTS = foo
[COMPONENTS]
RELPATH = {COMPONENT}
[foo]
# RELPATH = foo
In this section you can declare the following two templates:
In addition you can provide a default set of public branches for components using the BRANCHES option, e.g.:
BRANCHES = trunk dev
For the RELPATH template, you may only use the {COMPONENT} template parameter, for which the component name will be substituted. In the example above you don't need to specify the RELPATH for the foo component as it is already defined by the template.
For the BRANCH_URL template you may use template variables: {COMPONENT} (component name), {BRANCH} (branch name), and {RELPATH}.
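The substitution of template variables can be illustrated with a short Python sketch. The function name expand_templates is hypothetical, and plain string replacement is an assumption chosen here so that other braces in a URL are left untouched; the plugin itself may expand templates differently.

```python
def expand_templates(component, branch, relpath_template, url_template):
    """Expand RELPATH first, then BRANCH_URL (which may use {RELPATH})."""
    relpath = relpath_template.replace("{COMPONENT}", component)
    url = (url_template
           .replace("{COMPONENT}", component)
           .replace("{BRANCH}", branch)
           .replace("{RELPATH}", relpath))
    return relpath, url


relpath, url = expand_templates(
    "foo", "trunk",
    "{COMPONENT}",
    "http://example.com/{COMPONENT}/{BRANCH}",
)
```

For component foo and branch trunk this gives the relative path foo and the URL http://example.com/foo/trunk.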
In this section the user can also provide default values for the FORMAT and VCS used for component branches.
The user can specify the list of project components either via the COMPONENTS option in the PROJECT section, or simply by declaring component sections.
The COMPONENTS option in the PROJECT section should be a space-separated list of component names. This list can be used alone, without dedicated sections for the components. In this case the user should provide the mandatory component options (RELPATH and BRANCH_URL) via templates in the COMPONENTS section.
When the COMPONENTS list is used together with dedicated component sections, the list should be a superset of the set of component sections.
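The superset rule can be checked mechanically. The sketch below assumes the parsed config is a nested mapping whose top-level keys are the section names exactly as written in the file; the function name undeclared_components is hypothetical.

```python
def undeclared_components(config):
    """Return component sections that are missing from PROJECT/COMPONENTS.

    An empty result means the rule holds. If no COMPONENTS list is
    given, the sections alone define the components, so nothing is
    reported.
    """
    listed = set(config.get("PROJECT", {}).get("COMPONENTS", "").split())
    if not listed:
        return []
    prefix = 'component "'
    sections = set()
    for section in config:
        if section.startswith(prefix) and section.endswith('"'):
            sections.add(section[len(prefix):-1])
    return sorted(sections - listed)


config = {
    "PROJECT": {"COMPONENTS": "foo bar"},
    'component "foo"': {"RELPATH": "docs"},
    'component "baz"': {"RELPATH": "src"},
}
missing = undeclared_components(config)
```

In this example baz has a section but is not in the COMPONENTS list, so it is reported.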
To describe a component, use a dedicated section in the config. The section header should be as follows:
[component "name"]
Where:
For every component you should specify mandatory info and could specify some optional info.
Mandatory component info:
Optional component info:
When a project uses only one alt there is only one branch for every component.
But when the project needs several alts, some components will have several branches. The user can specify the set of component branches explicitly or implicitly.
To specify the list of component branches explicitly, use the BRANCHES option in the component section (or in the COMPONENTS template section). The value of BRANCHES should be a space-separated list of known branch names. Branch names can then be used in the templates (as {BRANCH}).
The other way, specifying the set of branches for a component implicitly, is to use a special subsection for every branch.
The branch subsection header should be as follows:
[[branch "name"]]
Where:
Branch subsections for a component are optional, and they can provide the following optional info:
Suppose the project contains 2 components named foo and bar. The user should specify them in the config as 2 sections:
[PROJECT]
VCS = bzr
[component "foo"]
RELPATH = docs
BRANCH_URL = http://example.com/foo
[component "bar"]
RELPATH = src
BRANCH_URL = http://example.com/bar
The user can specify the branch URL via templates in the COMPONENTS section (see above) or in a dedicated branch section:
[COMPONENTS]
BRANCH_URL = http://example.com/{COMPONENT}
[component "foo"]
RELPATH = some/path
[component "bar"]
RELPATH = some/another/path
[[branch "baz"]]
BRANCH_URL = http://example.com/boo/{BRANCH}
Then the URL http://example.com/foo will be used for the "foo" component.
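The lookup in the example above can be sketched in Python. The order used here (branch subsection first, then the component section, then the COMPONENTS template) is an assumption based on the override rules described in this document, and the function name branch_url and the nested-mapping layout of the config are hypothetical.

```python
def branch_url(config, component, branch):
    """Pick the most specific BRANCH_URL and expand its variables."""
    comp = config.get('component "%s"' % component, {})
    sub = comp.get('branch "%s"' % branch, {})
    template = (sub.get("BRANCH_URL")
                or comp.get("BRANCH_URL")
                or config.get("COMPONENTS", {}).get("BRANCH_URL"))
    if template is None:
        raise KeyError("no BRANCH_URL for %s:%s" % (component, branch))
    return (template
            .replace("{COMPONENT}", component)
            .replace("{BRANCH}", branch)
            .replace("{RELPATH}", comp.get("RELPATH", "")))


config = {
    "COMPONENTS": {"BRANCH_URL": "http://example.com/{COMPONENT}"},
    'component "foo"': {"RELPATH": "some/path"},
    'component "bar"': {
        "RELPATH": "some/another/path",
        'branch "baz"': {"BRANCH_URL": "http://example.com/boo/{BRANCH}"},
    },
}
foo_url = branch_url(config, "foo", "trunk")
baz_url = branch_url(config, "bar", "baz")
```

Under these assumptions foo falls back to the COMPONENTS template, while the baz branch of bar uses the URL from its own subsection.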
To describe a subproject, use a dedicated section in the config. The section header should be as follows:
[subproject "name"]
Where:
For every subproject you should specify mandatory info and could specify some optional info.
Mandatory subproject info:
Example:
[subproject "cats"]
RELPATH = meow
SUBPROJECT_URL = http://www.cats.net/fluffy
Because scmproj keeps its metadata in a regular [bzr] branch, it uses the existing mechanism to read and write user settings in the branch config (branch.conf).
See the table below for the list of options used.
| Option Name | Comment |
|---|---|
| parent_location | standard bzr var for parent location of branch after cloning (used to keep parent location of project at server) |
| scmproj:alt | current alt of the project checkout |
| scmproj:subset | current subset of the project checkout |
| scmproj:workspace | workspace configuration (plain or boxed) |
Component names, subproject names and alt names are used to create corresponding objects in the filesystem (either for local copies of branches or for snapshots). To avoid problems with the local filesystem (especially for cross-platform development) you should use only safe characters in your names, i.e. ASCII letters, digits, the dash sign and the underscore sign. Also do not use names that differ only in letter case (e.g., foo vs Foo).
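These naming recommendations are easy to check automatically. The following Python sketch is only an illustration; the function name check_names and the wording of the messages are hypothetical and not part of the plugin.

```python
import re

# ASCII letters, digits, dash and underscore only
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def check_names(names):
    """Return a list of human-readable problems found in `names`."""
    problems = []
    first_seen = {}
    for name in names:
        if not SAFE_NAME.fullmatch(name):
            problems.append("unsafe characters: %s" % name)
        key = name.lower()
        if key in first_seen and first_seen[key] != name:
            problems.append("case clash: %s vs %s" % (first_seen[key], name))
        first_seen.setdefault(key, name)
    return problems


problems = check_names(["foo", "Foo", "spam-2", "my docs"])
```

The call above reports the foo/Foo clash and the space in "my docs".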
Snapshots should allow users to recreate some past state of the entire project. To achieve this goal the system should have info about the past project.cfg, the past alt/subset configuration and the revision ids for all component branches.
So, the basic implementation of snapshot storage is a usual branch where the current state of project.cfg and some info about the current alt/subset and revids are recorded.
File project.cfg is committed "as is", without any transformations.
To commit alt/subset/revid info, special ini-style files (index cards) are used. Every index card file name is equal to the alt name. Custom alts are not committed, because they are supposed to be a local thing.
Every index card should provide enough information to recreate the state of the local project working copy and should record the following data:
For example:
ALT = trunk
SUBSET = foo bar
foo = a@b.c-12345
bar = a@b.c-67890
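Reading such an index card can be sketched in a few lines of Python. The sketch assumes one key = value pair per line, with the uppercase keys ALT and SUBSET reserved and every other key treated as a component name mapped to its revision id; the function name parse_index_card is hypothetical.

```python
def parse_index_card(text):
    """Return (alt, subset, revisions) parsed from an index card."""
    alt = None
    subset = []
    revisions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a key = value line: %r" % line)
        key, value = key.strip(), value.strip()
        if key == "ALT":
            alt = value
        elif key == "SUBSET":
            subset = value.split()
        else:
            revisions[key] = value  # component name -> revision id
    return alt, subset, revisions


card = """\
ALT = trunk
SUBSET = foo bar
foo = a@b.c-12345
bar = a@b.c-67890
"""
alt, subset, revisions = parse_index_card(card)
```

For the card shown above this yields the alt trunk, the subset foo and bar, and one revision id per component.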
To operate on snapshots directly there should be special [hidden] commands like update-snapshot and commit-snapshot, but in general most of the project-wide commands should work with snapshots automatically where appropriate.
Snapshots should and will provide enough information to recreate any project from scratch (if the component branches are still available at the recorded URLs). So for operations like get/clone project or create project checkout one can use either the project origin URL or the snapshot branch URL. This will allow hosting projects in the form of snapshot branches on branch-based hosting services, e.g. Launchpad.net.
A snapshot branch is a usual VCS branch, so you can tag revisions there and therefore have stable project-wide tags.
NOTE: The snapshot branch is an essential part of the project, but a snapshot is not mandatory for working with the bleeding edge of the project branches (tips), because project.cfg is the primary source of information.
If you have a small or medium project and you want to add its components to another bigger project, you need to either copy and paste the component descriptions into the new project config or reuse the existing project configuration by referencing it as a subproject in the bigger project.
Subprojects are referenced as a single unit; you can't pick a specific component of a subproject from the megaproject config.
Already implemented:
Not yet implemented:
https://bugs.launchpad.net/bugs/305802
Branch in the root of treeless shared repository ignores --no-trees option: if a component's branch list contains a branch named '.', you'll end up with a checkout of this branch in the root of the shared repository.
XXX Perhaps scmproj should prohibit using such branch names?
https://bugs.launchpad.net/bzr/+bug/274021
bzr init ignores format argument when creating branch inside of shared repository: this bug affects the situation where some component uses a mix of branch formats for different alts. It's unlikely to be a good idea anyway, but this bug makes such a mix completely impossible.
FIXED: https://bugs.launchpad.net/bzr/+bug/307554
bzr branch command can't create new branch in the root of existing empty directory. This prevents direct branching of a component branch into the root of the project checkout or other existing subdirectories. As a workaround one can use the bzr checkout && bzr unbind sequence, although it's not the same in terms of side effects.
https://bugs.launchpad.net/bzr/+bug/207762/
Many bzr commands can be invoked from outside target branch and point to the branch either with -d option, or by path. Unfortunately bzr missing does not support either. This makes it very difficult to use missing with project-command.
As a workaround one needs to use the "chdir" command provided by the plugin.
FIXED: https://bugs.launchpad.net/bzr/+bug/285055/
bzr switch --force does not work for lightweight checkouts if original master branch is gone. Because lightweight checkouts are used as working copies for project components, this bug can make handling component checkouts simply horrible.
https://bugs.launchpad.net/bzr/+bug/330086
Relative paths in branch references are not supported at all. This makes a boxed environment immovable in the filesystem. Too bad.
https://bugs.launchpad.net/bzr/+bug/333790
For a plain workspace it would be nice to have bzr init --standalone to create working trees.