Debian in Biology and Medicine: 2011

Friday 28 October 2011

New set of metapackages released (Posted by Andreas Tille)

The metapackages that simplify the installation of medical applications on Debian GNU/Linux systems were updated in unstable. The most important changes are:

New metapackage med-oncology recommending package dicompyler
New metapackage med-rehabilitation recommending package sitplus
Metapackage med-imaging now recommends package ginkgocadx
Metapackage med-bio recommends several new packages (as usual)

Feel free to install these metapackages and enjoy the new applications on your Debian system.
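For those who want to try it right away, here is a minimal sketch of the install command for the metapackages named above. It only prints the command, since the real thing needs root privileges and a system that tracks unstable; the variable names are mine.

```shell
#!/bin/sh
# Metapackage names as listed in the announcement above.
METAPACKAGES="med-oncology med-rehabilitation med-imaging med-bio"

# Build the install command. It is printed rather than executed because
# installing needs root privileges and a configured unstable repository.
install_cmd="apt-get install $METAPACKAGES"
echo "$install_cmd"
```

Run the printed line as root (or via sudo) to actually install the packages.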

Wednesday 3 August 2011

Debian Med BOF @ DebConf 11 in Banja Luka (Posted by Andreas Tille)

Andreas Tille held a BOF about Debian Med at DebConf 11 in Banja Luka on 30 July 2011. The slides as well as a video recording can be downloaded.
Andreas gave an overview of the current status of Debian Med. The project, which is one of several Debian Pure Blends, has made good progress over its nine years of existence. Several applications that might be used in medical care in the wider sense (which also stretches to microbiological research) were added to Debian, so that we now count close to 200 packages. It was also stated that the existence of Debian Med itself has attracted new developers to the Debian project. In other words, Free Software for medicine profits from the Debian distribution, and Debian was able to gather new manpower because this software was included.
The following TODO items were raised in the BOF:
Visibility:

Logo
more effective use of social media to attract more users
Work together with Debian Publicity
Release in stable PPA and announce this

Plans for the next Debian Release:

Hospital information system VistA or at least GT.M (precondition for VistA)
Additional practice management systems FreeMed & FreeMedForms
Further Java packages like Jalview, Beast, Artemis (biology), mayam (medical imaging), dcm4chee (PACS server)
radiation therapy planning Prism

Friday 22 July 2011

2011 Codefest, BOSC and ISMB

Vienna just hosted the Codefest of the Open Bioinformatics Foundation, followed by the Bioinformatics Open Source Conference (BOSC) and the main conference, Intelligent Systems for Molecular Biology (ISMB). Debian Med had a talk and a poster at BOSC, and somehow nobody took the poster down for the main conference, which (for some this may be proof of the existence of some higher power) left it hanging next to the only poster by Microsoft among some 1000 or so. Hilarious. It should be noted that the Microsoft representative, Dr. Simon Mercer, seems to be a nice guy, and he did not take it down either.

As a community it was nice to meet many contributors to Debian again, which brought up sweet memories of January's sprint and led Tim to volunteer to organise the next one. The Codefest brought preliminary packages for GMOD's chado and jbrowse in collaboration with Scott Cain, who later at BOSC finally met Olivier, our maintainer of gbrowse. One was also reminded of a series of nice efforts to bring microscopy and the analysis of cellular images closer to Java. Besides ImageJ this also means a need for Fiji. Because they bring our medical and biological users closer together, those packages have some extra importance to us. After some exchange with the respective upstreams, things are getting increasingly ready for our (re)distribution. Michael from the Ensembl helpdesk reviewed the Ensembl packaging and liked it a lot, which is good. Well, this is not too surprising, since it is maintained by Eagle Genomics - professionals in Ensembl and Cloud Computing ... and sponsors of BOSC. He also encouraged us to submit patches to remove or reduce dependencies on the long outdated version 1.2.3 of BioPerl, so that it could leave the experimental section. Jim Procter sees some light towards a Debian package of Jalview. BOSC then brought a series of additional promising contacts, and so did ISMB itself. There is strong interest from many sides in getting getData into pristine shape. With Peter Rice we have now agreed to get instructions for his EMBOSS tools, maintained by Charles, into it, starting with the classical protein sequence, motif and interaction databases. One is tempted to also bring in a text-based human genome from Ensembl, but we'll see how far we get.

The Cloud Bio-Linux initiative dominated a series of presentations throughout the Codefest and BOSC. Everybody seems to like it, also in industry. It will be described in more detail in a separate post. It deserves it. Where Debian approaches the best possible bioinformatics environment (what is that? and for whom?) bottom-up, Cloud Bio-Linux works top-down: bring something up that works and use Debian technologies whenever this is helpful. They will ask for money, from the government and from industry, earmarked for their bioinformatics ambitions, and from the positive vibes at the conference one tends to think they'll get some. Remembering e.g. the Dunc-Tank disaster, anything like that seems rather unlikely to happen from the Debian side. But that is fine. Cloud Bio-Linux is contributing to Debian, and its contributors feel they are a part of Debian Med, too.

The community is possibly the most important aspect of it all. The expertise on biomedical computing collected in a single room at the Codefest and BOSC is enormous. And while there are many many many many Macs on many laps, about everyone one talks to uses Ubuntu on their desktops/servers back home. The Debian Med repository gives them all an opportunity to bring their expertise into a collaborative environment and form something larger from there. Last year and this year the topic was the cloud and what you could do with it. Next year will bring complete tangible workflows, we are sure.

Sunday 19 June 2011

Ginkgo-CADx - or - Debian Med is not only about bioinformatics

There is a new package in town, namely Ginkgo-CADx. It is all about viewing and (for those who want to) also about programming with image data from medical devices. And there are only a handful of packages that render it so obvious why Open Source is about freedom. A couple of years ago I was not so very concerned when I could not immediately read the image data from a befriended clinical institute. My personal eye opener was when I learned that the local gynecologist could not read the 3D image data but relies on his ultrasound experience and the summary of the radiologist. I am not so worried about him deciding not to look at it; it is more that this doctor cannot freely make this decision. He does not have the software. Well, he does, now, or to be more exact ... I should go and bring it.

Also in my field, the investigation of complex diseases, image data and their automated analysis are gaining ground. There are movies on the activity of animals, videos counting migrating cells, substances traced in the body ... the more molecular the better. This way, one aims at characterising intermediate phenotypes that are more strongly coupled to genotypic variations, and thus one hopes to better explain the complete disease phenotypes with those additional insights. This is not an academic exercise, but the very start of an understanding that different patients need different treatments. And we will understand better in what ways the animal models of diseases differ from what one can observe in humans.

So, Open Source once again brings us better communication among doctors, between patients and doctors, and between either and statistical geneticists. Yippee.

Friday 3 June 2011

Community efforts towards research on EHEC

Northern Germany is these days experiencing many hundreds of infections with very severe clinical symptoms, caused by a new strain of E. coli bacteria of the EHEC subtype. The genome has recently been sequenced, only to find out that it is something novel, combining the EHEC surface markers with something from African strains. No, antibiotics don't work; yes, the number of cases is still increasing; no, the source has not been found.

What would now be great to have readily available for everyone are tools to present some comparative genomics of that strain against the other strains of E. coli sequenced so far. I was very happy to read at http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/ about collaboratively analysing the raw sequencing data. The next step would then be to look at strain differences and interpret them - on the molecular level and in the clinical context. So, whatever tools we could help promote to the more clinical researchers, they might be used for a while.

I am not completely sure what Debian Med could best help with. So, much in line with an earlier blog entry on "doing things with Debian Med", I ask just anyone in contact with researchers working with E. coli (or other pathogenic bacteria) to report on what tools they would like to have available through us or what tools they are already using that Debian Med does not yet provide (so others can have them more easily).

Update [10th of June]: Through the Bioinformatics Computation group at LinkedIn I was pointed to http://enews.patricbrc.org/1172/e-coli-outbreak-new-comprehensive-comparisons/. It looks very nice, though I do not yet have any idea how to work with it. And I learned about yet another tool, RAST, standing for 'Rapid Annotation using Subsystem Technology'. More comments, please.

Update [14th of June]: A group in Saarbrücken focuses on regulatory transcriptional motifs of E. coli.

Sunday 22 May 2011

Hobbyists

To package for Debian is not difficult. It is just ... different. And once learned, that skill is ubiquitously applicable. Quite a bit of the motivation to support Debian Med in the first place is to get this kind of software out to the younger ones, or just anyone seeking some opportunity to contribute to computational biology, medical informatics or clinical research in some way. This may have some very tangible outcomes, e.g. when you hear that the local doctor cannot read the DVD with images from a PET-CT, then this can now be fixed for the next visit. Open Source software definitely can change the world a bit. And the packaging in Debian Med already today helps bring the software into university clinics worldwide and to those local doctors that are directly supported by Sebastian and Karsten.

So, if you are out there with an interest in packaging one bit or another, just say hello on the Debian Med mailing list, please. If English does not come sufficiently easily to you, do not be afraid. At least the European languages are well covered, and we might find a Debian developer outside the Debian Med community to help you in your mother tongue.

To not only read the PET-CT data but also deeply impress your doctor, currently there is help needed for a proper packaging of

Ginkgo-CADx

and for the bioinformatics side of Debian Med, a tricky beast to package (one would start with something that works before rendering is perfect) but nonetheless important is

The task pages (offline for maintenance while I type) of Debian Med show some entries in yellow (need help) and red (missing). Enough to do for everyone.

Sunday 15 May 2011

Who's supporting what ... Debian, Ubuntu, and mutual contributions to Debian Med

A basic idea behind the concept of Blends within Debian is to bring together people who happen to be interested in the same software - users and developers alike ... and to help users develop into developers when interested. Packages are commonly community maintained. When some issue is spotted with a particular package, it is natural for everyone to fix it when the fix is free of side-effects and easier than writing an email to the regular maintainer. Or when one feels nice. Or when one is the maintainer.

To me, this very much summarises what Debian Med's support is about: once informed about an issue you fix it and/or inform the upstream developers about it. Different people are good at different kinds of packages and different kinds of problems, and then: not everyone is interested in every bug, and not everyone has sufficient time available. So, we all complement each other rather nicely. No guarantees for anything, but trust in the individuals and the community to care. In contrast to the understanding of the term support in a commercial environment, we do not need to improve the packages beyond what upstream has developed. But when we do, those changes are sent upstream for inclusion in the next version to profit everyone, i.e. also Windows or MacOS X users. For our scientific packages such a zero-delta-to-upstream principle is particularly important to remain compatible e.g. with the bioinformatics community at large, which may use the same version of a particular tool without our contributions.

Can we also support Ubuntu? Quite a few Debian Developers, Debian Maintainers, package curators and upstream developers are particularly happy to contribute to Debian Med because of Ubuntu's large user base. For them, getting the package into Debian is the way to get packages into Ubuntu. For software that is compatible with the Debian Free Software Guidelines, the transition from Debian to Ubuntu is just smooth. And concerning support, any problems noticed for any of the packages in Debian Med, which are all on the periphery of the distribution, shall be fixed equally (i.e. no more than once) between the distros. The users and developers of those downstream distributions of Debian then help in spotting things earlier. These thoughts and more led to the initiation of Utnubu (ubuntU spelled backwards) and more recently the Debian Exchange projects. In my personal universe, I always felt Debian Med to be a couple of years ahead of that development. After all, we have subversion and git repositories to maintain our packages. And everyone can contribute to those packaging efforts. And we have a series of developers on Ubuntu who are actively contributing to it.

For users of Debian Med who are not working with the very latest version of their distro, such as oldstable (lenny) or stable (squeeze), 10.04 (lucid) or 10.10 (maverick), our packaging has some difficulty reaching them. They just won't see the recent submissions to the archive. What is not much of an issue for the core functionality of a distribution may be a problem for the scientific edge. There are multiple answers to this:

as a user: force installation (with one of dpkg's --force-* options, e.g. --force-depends) and hope for compatibility with the libraries, or compile the packages yourself, which is easy:
- add a deb-src line for unstable to sources.list and run apt-get update
- say apt-get source --build packagename
- dpkg -i *.deb
as packagers: organise repositories also for older versions of the distributions
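The do-it-yourself route for users can be sketched as a small script. This is only an illustration under assumptions: the mirror URL and the package name are placeholders of mine, the build-dep step is an extra that the list above does not mention, and everything is printed instead of executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# PKG and MIRROR are placeholders (assumptions), not values from the post.
PKG="${PKG:-packagename}"
MIRROR="${MIRROR:-http://deb.debian.org/debian}"

# Step 1: the deb-src line to add to /etc/apt/sources.list by hand.
src_line="deb-src $MIRROR unstable main"

# Print each command; set DRY_RUN=0 to really execute it (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

echo "add to /etc/apt/sources.list: $src_line"
run apt-get update                  # refresh the package lists
run apt-get build-dep "$PKG"        # extra step: install build dependencies
run apt-get source --build "$PKG"   # step 2: fetch the source and compile it
run sh -c 'dpkg -i ./*.deb'         # step 3: install the resulting packages
```

Only the source entry points at unstable, so the installed system stays on its release while the package is rebuilt against the local libraries.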

For Debian and Ubuntu these are the backports. But only a few individuals have upload permissions to those separate repositories. For Ubuntu, and currently discussed also for Debian, there are also Personal Package Archives, in short PPAs. Everybody, and any group of everybodies, can have such a repository under their own control. The upload to an older release commonly just means specifying that release name in debian/changelog and then creating a source-only package by adding the flag "-S" to dpkg-buildpackage. This saves the maintainer from investing all the build time and brings package maintenance down to netbooks and mobile phones, so one can do it while waiting for/in the bus :)
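To make the changelog step concrete, here is a rough sketch that rewrites the release name in the first line of a changelog. The helper name set_distribution, the package mypkg and the target lucid are made up for the illustration; real uploads usually also adjust the version string, which this sketch leaves alone.

```shell
#!/bin/sh
# Rewrite the distribution field in the first line of a debian/changelog.
# First line format: "package (version) distribution; urgency=level"
# $1 = path to the changelog, $2 = target release name (e.g. lucid)
set_distribution() {
    sed "1s/^\([^ ]* ([^)]*) \)[^;]*;/\1$2;/" "$1"
}

# Demonstration on a throwaway changelog (hypothetical package "mypkg").
tmp=$(mktemp)
printf 'mypkg (1.0-1) unstable; urgency=low\n\n  * Example entry.\n' > "$tmp"
first_line=$(set_distribution "$tmp" lucid | head -n 1)
echo "$first_line"
rm -f "$tmp"

# The source-only build itself is not run here, since it needs an
# unpacked source tree:
#   dpkg-buildpackage -S
```

The function writes to standard output, so the original changelog stays untouched until the result is redirected into place.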

Still, rendering packages available to older distributions remains manual labour. There is no official support for those elderly distros, be they from Debian or from Ubuntu. To help the situation just a bit, to grant Ubuntu users access to those packages that were sent to the experimental section of Debian, and/or to help overcome the limitations during a release freeze, a first Debian Med PPA was created a few weeks ago. Let's see what this brings over time.

Some more technical description of the upload from Debian to the PPA: Descriptions on how to upload are linked to the launchpad site. Just, the friendly abbreviations don't work for Debian. So one goes the manual way. The launchpad ftp server (if you are using FTP) does not report the current working directory but "OK"s every cd you make. This is somewhat irritating when using a client that changes to the pub directory upon login. The upload will then fail.
One should rather adopt the typical tools to upload like dput or dupload. The destination (at least for Debian users) needs to be specified manually e.g. for dput as follows:



cat >> ~/.dput.cf <<EOPUT
[debianmedppa]
incoming = ~debian-med/ppa/ubuntu/
fqdn = ppa.launchpad.net
login = anonymous
method = ftp
allow_unsigned_uploads = 0
EOPUT

After locally building the source (or complete) package and signing it, this can then be uploaded with dput debianmedppa packagename*.changes. I cannot say that I have already completely understood every little aspect of launchpad, e.g. I get very much confused about how to distinguish Eucalyptus from their euca2ools. And I am very bad at Bazaar (their version control system). But I like what I have understood and hope they soon start to also support versions of Debian for their PPA and an auto-porting across releases. Debian is now planning for a Debian variant of PPAs. It is really high time for this and should possibly even substitute the experimental section IMHO. If they can afford it and are well advised, then they will also support some downstream distros with it and attempt auto-backports. We'll see.
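Since a missing stanza is an easy way to have an upload fail, a small check before calling dput may help. The sketch below is my own illustration: has_dput_target is a hypothetical helper, the .changes file name is made up, and the upload command is only printed.

```shell
#!/bin/sh
# Succeed if the dput configuration file $1 has a stanza named $2.
has_dput_target() {
    grep -Fxq "[$2]" "$1"
}

# Demonstration on a throwaway file holding part of the stanza shown above.
cf=$(mktemp)
printf '[debianmedppa]\nfqdn = ppa.launchpad.net\nmethod = ftp\n' > "$cf"

if has_dput_target "$cf" debianmedppa; then
    # mypkg_1.0-1_source.changes is a made-up file name.
    upload_cmd="dput debianmedppa mypkg_1.0-1_source.changes"
else
    upload_cmd=""
fi
echo "$upload_cmd"
rm -f "$cf"
```

For real use, point the check at ~/.dput.cf and replace the file name with the .changes file of the package that was just built and signed.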

Tuesday 3 May 2011

Gcc 4.6 transition (Posted by Andreas Tille)

Hello,
from gcc 4.4.x, with a very short stop at 4.5.x, Debian is now leaping ahead to 4.6.x with several new features. Link-time optimisation already came with 4.5, and there are now 128-bit floats. This shall mean something to the molecular dynamics community and maybe others on this list. We also see many more and better optimisations, so everyone will profit from this already very present switch to 4.6.x in sid. Phoronix kindly did a benchmark on HMMER with Pfam and MAFFT.
The downside is ... some package builds will break. Thanks to Lucas Nussbaum's tireless QA work the whole Debian archive is rebuilt regularly - so we know which packages are affected. However, those FTBFS (fails to build from source) bugs cause some work on our side because we need to find out why a package fails. If we do not fix it, the package in question will not reach the next stable release because those issues are regarded as "serious" in Debian.
Most of the time, the fixes are rather straightforward, e.g. adding

#include <cstddef>

or similar. Please drop comments to this post with whatever strange issue you may have run into. Particularly funny, e.g., is the building of the Embassy packages, for which configure reports a broken gcc. After all, the change to 4.6 is a nice occasion to

send another email to upstream with a patch for them to fix the failure
and while at it, maybe fix something that you wanted to have fixed/modded/... for long, just never got around to it
report to upstream (and this blog maybe) about performance improvements experienced with the new gcc
just enjoy it silently
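For the <cstddef> kind of failure mentioned above, a quick text search can point at likely candidates before reading the whole build log. The following is a rough heuristic of mine, not an official tool: needs_cstddef is a made-up helper that lists files mentioning size_t or ptrdiff_t without including <cstddef> themselves.

```shell
#!/bin/sh
# Print every given file that uses size_t or ptrdiff_t but does not
# include <cstddef> itself. This is a rough text heuristic only.
needs_cstddef() {
    for f in "$@"; do
        if grep -q -E '(size_t|ptrdiff_t)' "$f" &&
           ! grep -q -E '#[[:space:]]*include[[:space:]]*<cstddef>' "$f"; then
            echo "$f"
        fi
    done
}

# Small self-contained demonstration on two throwaway files.
dir=$(mktemp -d)
printf '#include <cstddef>\nstd::size_t a;\n' > "$dir/good.cpp"
printf '#include <vector>\nstd::ptrdiff_t b;\n' > "$dir/bad.cpp"
flagged=$(needs_cstddef "$dir/good.cpp" "$dir/bad.cpp")
echo "$flagged"
rm -rf "$dir"
```

A flagged file is only a hint; whether it really breaks depends on which other headers it pulls in, so the compiler error remains the authority.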

Gcc 4.6 is said not to ship with Ubuntu 11.04, which is unfortunate for Ubuntu but should not hamper the transition of packages. In general those gcc transitions are just enforcing stricter standard compliance of the code and the changes which need to be done in packaging will most probably not break a build with gcc 4.5.
When handling such build failures for your specific package in Debian Med please keep two things in mind:

Helping upstream enhance their code makes them happy to cooperate with Debian Med, and they will probably recommend Debian to their users as the default distribution if they regard us as competent and helpful partners.

The role of Debian Med inside Debian will be strengthened if we are quick in fixing our issues. Please keep in mind that the constant growth of Debian always triggers suggestions to drop packages which are not used by many users, to keep the maintenance effort lower. By default Debian Med has a small user base (compared to web browsers or office suites etc.) and thus we should really make sure that everybody in Debian knows for sure that the Debian Med packaging team is usually quick in fixing its issues and does not create extra work for other people.

Friday 22 April 2011

Open Community Research: cross-institutional integrative Bioinformatics - something for Debian Med to aim for in 2012+ ?

A few days ago this blog opened with a series of observations on the multi-directional education and collaboration that comes with an active or passive participation in Debian Med. My personal ambition is to find ways to further institutionalise this constructive exchange beyond packaging. What came to my mind is that this may mean to talk more about actually doing things with our packages.

This will lead us to discussing/optimising/specifying workflows, i.e. the graph connecting data sources with tools and their outputs with other tools plus the optimisation of command line arguments and the evaluation of the findings. This sounds all very natural to me since the desire to complete a particular workflow locally is the motivation to get most packages to the distribution today. Until recently, we just did not have a way to formally talk about those workflows, except for exchanging shell scripts. This has changed with Alan's and Hajo's continued collaboration to get command line tools integrated with the workflow suite Taverna. It allows describing our executables for inputs and outputs and presents them as regular workflow elements, right next to the (today :o) ) dominant remote web services. The myExperiment.org site is a repository of (frequently nested) workflows, with all the typical user comments and ranking. To have that extended for all those bits one can achieve with various tools in Debian will be highly interesting. Admittedly, knowing about the rather limited success in uploading bits as trivial as screenshots, we need much of a positive feedback loop and should not just expect this to be accepted by the community because it could.

So, this leads us to my initial impetus: the community needs something to work on to develop itself and the technologies (like this blog) it has adopted. And this is where public data sets come in. We had previously discussed the integration of data with the distribution in the context of BioMaj/getData for curated protein, structure or interaction data. But when we extend that also to some "weird stuff", maybe something novel from the more clinical branch of Debian Med or the joint (re-?)analysis of a genome (a virus, maybe?), then I have good confidence that the enormous heterogeneity of us as a community allows us to yield something that a regular institution's Bioinformatics service unit would find difficult to match.

So, we would apply Open Source principles to biomedical (re-)research. Beyond the further development of ourselves, this certainly has many direct benefits through our findings, and indirect ones because of the education it brings to all those who are following the development online. Such shared research efforts could start any time, in principle. The anticipated deeper integration of Taverna with our distribution will allow specifying many smallish workflows as legitimate subgoals. Let's hope for some soonish additional posting with a tutorial for Taverna's external tools. With the advent of Ensembl or gbrowse in our distribution we have the sensation of some sort of "completeness" for the end users: once my genome has arrived in either, the work is perceived as done. This may be wrong or right; just filling those web interfaces with data is a challenging workflow. There is still quite something to do for it all, and we should talk about it.

Tuesday 19 April 2011

Debian Med: individuals' expertise and their sharing of package build instructions

This is the very first post to a blog about Debian Med, a community of enthusiasts and professionals in computational biology and medical informatics. They all use the Linux distribution Debian or one of those befriended distros like Ubuntu with which Debian exchanges its packages.

The title for this post is that of an abstract just submitted to the Bioinformatics Open Source Conference (BOSC 2011). The readers of this blog will be among the first to learn about its acceptance :) The abstract stresses that Debian Med is more than the packages contributed to Debian. It is also those packages that were only created for local use, and the individuals behind them. Debian Med offers subversion and git repositories to then share that local effort, allowing technically advanced users to finish the effort or to just benefit directly from patches, compiler flags and the specification of build/run-time dependencies. This sharing is especially beneficial for a series of software packages that are available as source code but are not allowed to be redistributed as binaries - VMD comes to mind, its build instructions are here.

This blog may help with some biocomputational infotainment and insights beyond what Debian Med exchanges on its mailing list already: the more conventional (for us) sharing of code shall here be augmented with a sharing of thoughts.