Advantages of Open Source Cloud Computing Software
Open source cloud computing software can offer distinct advantages to organizations, often leveraging strong user and developer communities and aggressive release cycles. Here's a look at the current open source cloud computing landscape.
While there are a good number of commercial offerings on the market for building cloud infrastructure, before you start spending hard cash you may want to take a look at the open source options that are available. Open source cloud software is often described as an "alternative" to its commercial counterparts, but that label undersells it. In many cases, the open source projects were among the first cloud technologies of their kind on the scene.
There are many reasons to turn to open source software for your cloud computing needs. Depending on the size of your business, you could see considerable savings when turning to one or more of these open applications. Additionally, you could find yourself competing with large-scale businesses that are actually running the same open source software as you, in effect evening out the playing field a bit.
But there are other reasons that might compel you to try out these offerings. Many open source cloud computing projects have larger user bases than comparable commercial products. Because the software is easy to obtain and deploy, a larger number and wider variety of people use it, and there is often a vibrant community behind it that acts as a support system. Open source projects also tend to be innovative, with aggressive release cycles that push the technology forward. In fact, users often help decide what goes into the next release based on real-world business needs.
And open source means open access to application programming interfaces (APIs) and the open standards they are written against. Greater transparency in the code base often helps drive innovation and builds a more knowledgeable support community.
Across the cloud computing service models, such as user cloud (a.k.a. software as a service), development cloud (a.k.a. platform as a service) and systems cloud (a.k.a. infrastructure as a service), there is a large and diverse set of applications to choose from, both commercial and free open source. As you'll notice, many of the open projects excel because of the large, open communities of developers committed to creating innovative software and furthering cloud technology. Fortunately, open standards exist and many of the open source applications interface with one another, allowing you to pick and choose your apps and build a solid, integrated cloud computing solution for your enterprise.
Examples of applications and solutions across these models include Salesforce.com, Google Docs, Red Hat Network, VMware Cloud Foundry, Google App Engine, Windows Azure, Rackspace Sites, Red Hat OpenShift, ActiveState Stackato, AppFog, Amazon EC2, Rackspace Cloud Files, OpenStack, CloudStack, Eucalyptus, OpenNebula and many more.
Let's take a closer look at the open source cloud computing software available today.
Open Source Hypervisors
Xen Cloud Platform (XCP)
XCP contains a subset of the features of XenServer, Citrix's commercial distribution. It includes the Xen API (XAPI) toolstack, which offers host system pool management, multi-tenancy, storage repositories, SLA support, and pre-integrated network and disk functionality (e.g. Open vSwitch). XCP can be installed from an ISO in the same way as XenServer, with all the same drivers; while the two are not identical, they share much of the same code base. Alternatively, XCP can be obtained and configured via the XCP-XAPI packages, installed with the package manager on Debian GNU/Linux and Ubuntu Linux. The ISO is based on CentOS 5.x, managed locally using XAPI, and supports most XenServer features. However, it is a black-box appliance tied to its CentOS base, whereas the XCP-XAPI packages are easy to customize and easy to build from source, and because they come as packages you can assemble your own system around them. The downside is that XCP-XAPI has some feature restrictions compared with the XenServer-like ISO, and it supports only a limited number of storage repository types, whereas the ISO supports most of them.
KVM
The Kernel-based Virtual Machine (KVM) is an open source virtualization solution for GNU/Linux on x86 hardware with virtualization extensions (Intel VT or AMD-V). It consists of a loadable kernel module, kvm.ko, which integrates with the host GNU/Linux system, plus a processor-specific module (kvm-intel.ko or kvm-amd.ko). KVM is a complete virtualization system, and many external programs have been written to manage KVM images. KVM can run multiple virtual machines with GNU/Linux or Microsoft Windows images, each with its own private virtualized hardware: network cards, disks, graphics, USB and so on. KVM has been part of the mainline Linux kernel since version 2.6.20. Whereas hypervisors like Xen sit outside the host operating system, taking control of the machine and handling resource management themselves, KVM acts as part of the system, using the Linux scheduler and memory management. KVM relies on QEMU (an open source emulator) for its userspace component, which handles device emulation.
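A quick way to confirm that the kernel modules described above are loaded is to look in /proc/modules, where module names appear with underscores (kvm_intel, kvm_amd) rather than the hyphens of the .ko file names. The minimal sketch below parses that file's text; the function name is illustrative, not part of any KVM tooling.

```python
from pathlib import Path

# Names as they appear in /proc/modules (underscores, not hyphens).
KVM_MODULES = {"kvm", "kvm_intel", "kvm_amd"}

def loaded_kvm_modules(proc_modules_text: str) -> set[str]:
    """Return the KVM-related modules listed in /proc/modules-style text."""
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] in KVM_MODULES:
            loaded.add(fields[0])
    return loaded

if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        print(sorted(loaded_kvm_modules(proc.read_text())))
    else:
        print("/proc/modules not found (not a Linux host?)")
```

An empty result on a Linux host usually means the modules are not loaded or the CPU lacks (or has disabled) hardware virtualization support.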
VirtualBox
Free and easy to use, VirtualBox runs on multiple platforms with straightforward installation and setup. Offering x86 and AMD64/Intel64 virtualization, it can easily be leveraged in both home and enterprise settings. It runs on Windows, Linux, Macintosh, and Solaris hosts, and supports guest operating systems including Windows (NT 4.0, 2000, XP, Server 2003, Vista, Windows 7, Windows 8), DOS/Windows 3.x, Linux (2.4, 2.6 and 3.x), Solaris and OpenSolaris, OS/2, and OpenBSD. Among its compelling features, VirtualBox makes running multiple guest operating systems easy; your limits are primarily dictated by your system memory and CPU capability. Virtual networks can be set up in VirtualBox for lab settings like penetration testing, build farm prototyping and Beowulf cluster templates. A feature called "snapshots" allows the user to save virtual machine states and revert to them if needed. You can also build systems, configure them and deliver the VMs for use in customers' VirtualBox instances. Compared with some other well-known commercial virtualization solutions, VirtualBox has been noted as one of the least memory- and CPU-intensive.
OpenVZ
Based on "container" technology, OpenVZ provides operating-system-level virtualization rather than full virtualization. It uses a modified Linux kernel (meaning host systems can only run some flavor of GNU/Linux) that is tailored to support OpenVZ containers. The containers are separate entities that rely on resource management and checkpointing provided by the single shared kernel, and for the most part they behave as a normal server would, with their own filesystems, applications, users, groups, etc. Host memory is more flexible for OpenVZ containers, in that memory not being used by one container can be shifted for use by another. Resource management in OpenVZ consists of user beancounters, disk I/O schedulers, CPU schedulers, and two-level disk quotas. No reboot is needed to change these resources at runtime, which is attractive for developers and testers, as well as for anyone running live systems with active users. The containers are secure and isolated, and they reduce conflicts on the server between applications that might otherwise share libraries or directory space. OpenVZ is released under the GNU General Public License (GPL) and is free software.
LXC
Like OpenVZ, LXC is a container technology; it is a userspace interface to the containment features in the Linux kernel. These features include kernel namespaces (ipc, uts, mount, pid, network, user), AppArmor and SELinux profiles, seccomp policies, chroots (using pivot_root), and cgroups (control groups). Although LXC is not quite a virtual machine, it can still provide an environment similar to a Linux install on a virtual machine such as one in VirtualBox. However, there is no need for a separate kernel under LXC, since containers run on the host kernel. Like OpenVZ, LXC uses the resource management and checkpointing of the host kernel. LXC consists of a variety of container templates, standard tools for managing containers, bindings for multiple languages (Ruby, Python, Go, Lua, etc.), and the liblxc library (libvirt is considered an alternative library). LXC is free software; most of the code is released under the terms of the GNU LGPL (see the LXC website for details on components under other licenses).
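One way to see the cgroups side of this from inside a container (or on the host) is /proc/self/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path, as documented in the cgroups(7) man page. The sketch below parses that format; on the unified cgroup v2 hierarchy the controller list is empty. The `/lxc/web01` paths used to exercise it are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class CgroupEntry:
    hierarchy_id: int
    controllers: list[str]  # empty for the unified cgroup v2 hierarchy
    path: str

def parse_proc_cgroup(text: str) -> list[CgroupEntry]:
    """Parse /proc/<pid>/cgroup text into one entry per hierarchy."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice so a path containing ':' stays intact.
        hid, controllers, path = line.split(":", 2)
        entries.append(CgroupEntry(int(hid), [c for c in controllers.split(",") if c], path))
    return entries

if __name__ == "__main__":
    own = Path("/proc/self/cgroup")
    if own.exists():
        for entry in parse_proc_cgroup(own.read_text()):
            print(entry)
```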
Open Source Cloud Appliances
Bitnami
Obtaining pre-built appliances has never been easier. As with SUSE Studio, users can download virtual machines or installers containing a pre-built Bitnami stack. Applications available from Bitnami cover the spectrum from infrastructure and cloud tools to CRM, CMS and ECM. Familiar applications include WordPress, Joomla, Drupal, Moodle, JBoss, LimeSurvey, DokuWiki, SugarCRM and ownCloud, to name only a few. Additionally, Bitnami offers a variety of stacks, including Ruby, Django, LAMP, WAMP, WAPP, MAMP, LAPP and MAPP. Bitnami appliances boast features such as being self-contained, secure, up to date, and built to consistent standards. Each appliance is bundled with all the libraries, databases and runtimes it requires, and each is optimized for the most common use cases and for running on the Internet. Bitnami also offers cloud services via Amazon Web Services for those who want what the appliances have to offer but don't have the resources to download and install them.
BoxGrinder
A project still in progress, BoxGrinder is at the moment primarily a command-line tool for creating appliances for both virtualization and cloud deployment. It currently supports appliances based on Fedora, CentOS, Red Hat Enterprise Linux (RHEL), and Scientific Linux. Appliances can currently be deployed to VMware, VirtualBox, VirtualPC, and EC2, and delivery is handled by a collection of plugins for local delivery, SFTP, S3, EBS, ElasticHosts, and OpenStack. At the moment, however, appliances can only be built on Fedora. Written in Ruby, BoxGrinder requires RubyGems for package installation and management. With development underway and additions planned, BoxGrinder is one to watch down the road, but its build system and appliance OS limitations may keep you looking at other cloud appliance tools for now.
Oz
Created to automate operating system installation, Oz takes minimal input from the user and quickly pushes out a completed system. Oz can install an operating system, customize it, and generate a package manifest. There is some manual upkeep with Oz, however: support for new operating systems and releases has to be added by hand, so Oz must be updated regularly. To perform an installation, Oz uses the native installation tools provided by the operating system. Oz supports installing a wide variety of operating systems, including RHEL, CentOS, Scientific Linux, Fedora, OpenSUSE, Debian, Ubuntu, Mandrake, Mandriva, FreeBSD and Windows, but for each of these and their different versions, Oz may or may not support all of its operations (install, customize, manifest). However, unlike many other stack installers or OS installers, Oz leaves the OS installation identical to one performed on a bare-metal machine.
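Oz's minimal input is a TDL (Template Description Language) XML file naming the operating system, version, architecture and installation source. The sketch below generates such a template with Python's standard library. The element layout is modeled on the examples in Oz's documentation and should be checked against the version you install; the template name and the mirror URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

def build_tdl(name: str, os_name: str, version: str, arch: str,
              install_url: str, description: str) -> str:
    """Return a TDL template string describing a URL-based OS install."""
    template = ET.Element("template")
    ET.SubElement(template, "name").text = name
    os_el = ET.SubElement(template, "os")
    ET.SubElement(os_el, "name").text = os_name
    ET.SubElement(os_el, "version").text = version
    ET.SubElement(os_el, "arch").text = arch
    # Install from a network tree; the URL points at the distribution's install media.
    install = ET.SubElement(os_el, "install", type="url")
    ET.SubElement(install, "url").text = install_url
    ET.SubElement(template, "description").text = description
    return ET.tostring(template, encoding="unicode")

# Hypothetical values for illustration only.
tdl = build_tdl("f17-minimal", "Fedora", "17", "x86_64",
                "http://mirror.example.com/fedora/releases/17/Fedora/x86_64/os/",
                "Minimal Fedora 17 image")
print(tdl)
```

The resulting file would then be handed to Oz's `oz-install` command.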
SUSE Studio
Widely popular since its initial release in 2009, SUSE Studio quickly made the review rounds, from Lifehacker to TuxRadar. An online Linux image creation tool initially released by Novell SUSE (now just SUSE), the website makes it easy to configure your fantasy OS, from core applications to system-level customizations, and then build the system with automated tools. RPMs can be uploaded to the build environment, or repositories added. The resulting image can be downloaded as an ISO (Live CD/DVD, Preload), a virtual machine (VMware/VirtualBox/OVF/Xen), a SUSE Cloud image or a USB image. There is also a wide variety of pre-loaded images to choose from, developed both by SUSE and by SUSE Studio users. The default images are all based on SUSE, both openSUSE and SUSE Linux Enterprise Server. In addition to these functions, SUSE Studio can upload AMI images and instantiate EC2 images via existing AWS accounts, and can also upload VHD images and instantiate Azure appliances via existing Windows Azure accounts.
Open Source Compute Clouds (IaaS)
Apache CloudStack
Despite rumors to the contrary, Java continues to prove central to many major cloud applications. At the heart of Apache CloudStack is a host of functions written in Java including user management, multi-tenancy and account separation, network, compute and storage resource accounting, web-based management console, native API and Amazon S3/EC2 compatible API, and primary/secondary storage support. Apache CloudStack works with hosts on XenServer/XCP, KVM, Hyper-V and VMware. Used to deploy and manage large networks of virtual systems, Apache CloudStack has been chosen by many providers deploying private, public, and hybrid cloud solutions to customers. Additional features include high availability, a scalable infrastructure as a service cloud computing platform, and a significant community of users and developers who keep the technology and feature improvements moving forward.
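CloudStack's native API authenticates each request with a signature: the query parameter values are URL-encoded, the whole string is lowercased and sorted by field name, the result is hashed with HMAC-SHA1 using the user's secret key, and the Base64-encoded digest is sent, URL-encoded, as the `signature` parameter. The sketch below follows those documented steps. The key values are placeholders, and encoding edge cases (for example how `*` or `+` are treated) may differ between CloudStack versions, so verify against your version's API guide.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def string_to_sign(params: dict) -> str:
    """URL-encode each value (spaces become %20), lowercase, and sort by field name."""
    pairs = sorted((k.lower(), quote(str(v), safe="").lower()) for k, v in params.items())
    return "&".join(f"{k}={v}" for k, v in pairs)

def sign_request(params: dict, secret_key: str) -> str:
    """HMAC-SHA1 the canonical string with the secret key, then Base64-encode it."""
    digest = hmac.new(secret_key.encode(), string_to_sign(params).encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def build_query(params: dict, secret_key: str) -> str:
    """Original-case query string with the URL-encoded signature appended."""
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in params.items())
    return f"{query}&signature={quote(sign_request(params, secret_key), safe='')}"

# Hypothetical credentials; real ones are issued per user by CloudStack.
params = {"command": "listUsers", "apiKey": "EXAMPLE-API-KEY", "response": "json"}
print(build_query(params, "EXAMPLE-SECRET"))
```

The signature depends only on the sorted, lowercased string, so the order in which parameters appear in the request does not matter.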
Eucalyptus
Though currently available only on CentOS and Red Hat Enterprise Linux, Eucalyptus is already getting notice as a complete IaaS solution. Composed of a Cloud Controller (CLC), Walrus (persistent data storage), Cluster Controller (CC), Storage Controller (SC), Node Controller (NC), and an optional VMware Broker (VB), Eucalyptus is a full-featured product. Each component except the VB is a stand-alone web service, so that each service can expose its own language-agnostic API. This Linux-based system allows users to implement private and hybrid clouds within existing infrastructure using an industry-standard, modular framework. In particular, Eucalyptus provides a virtual network overlay that isolates network traffic, allowing multiple clusters to appear to be on the same Local Area Network (LAN) while maintaining data integrity. Additionally, Eucalyptus is API-compatible with Amazon's EC2, S3, IAM, ELB, Auto Scaling, and CloudWatch services, which makes it well suited to hybrid cloud deployments.
OpenNebula
Part working product and part research project, OpenNebula purports to be the next step in the evolution of data center virtualization. On the research side, the project seeks to develop advanced, adaptable virtualized data centers and enterprise clouds, and through collaboration with other open source projects and cloud computing researchers, OpenNebula also aims to improve the stability and quality of cloud computing software. The project's core values include openness of process and technology, excellence across the entire project lifecycle, and innovation in cloud development. As for the product itself, its key features are reported to include an intuitive self-service portal, an automated service management catalog, administration and super-user interfaces, an appliance marketplace, performance and capacity management, high availability, business continuity, virtual infrastructure management, enterprise-level security, third-party tool integration, and excellent product support plus SLA-based commercial support directly from the developers.
OpenStack
Of all the IaaS offerings, OpenStack is one of only a couple that appear in multiple product areas of cloud computing architecture. A global project, OpenStack was founded by Rackspace and NASA, who produced a massively scalable cloud operating system, freely available under the Apache 2.0 license. OpenStack has no proprietary hardware or software requirements and is designed to run on both virtualized and bare-metal systems. Multiple hypervisors are supported, including KVM and XenServer, as well as container technology such as LXC. OpenStack is used by everyone from service providers offering IaaS to their customers to enterprise IT departments providing private cloud services to project teams and departments. OpenStack works with Hadoop for big data needs, scales vertically and horizontally to meet diverse computing needs, and supports high-performance computing (HPC) for intensive workloads. Key features include VM image caching, role-based access control, VM image management, LAN management, a browser-based VNC proxy, floating IP addresses, and much more.
Open Source Cloud Storage Software
GlusterFS
Using FUSE (Filesystem in Userspace) to hook into the VFS (Virtual File System), GlusterFS creates a clustered network filesystem that runs in userspace, that is, outside the kernel and its privileged extensions. GlusterFS uses existing filesystems like ext3, ext4 and XFS to store data. Its popularity comes from offering an accessible framework that can scale to petabytes of data under a single mount point. GlusterFS distributes files across a collection of subvolumes, making one large storage unit out of a host of smaller ones, whether those subvolumes sit on a single server or on several. Capacity can be increased by adding new servers, essentially on the fly. With its replicate functionality, GlusterFS also provides storage redundancy and availability.
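GlusterFS's distribute translator decides which subvolume holds a file by hashing the file name into ranges of a 32-bit hash space assigned to each subvolume, so no central metadata server is needed to locate files. The toy model below shows the idea. It assumes MD5 truncated to 32 bits and equal-sized ranges, whereas GlusterFS uses its own hash function and per-directory range assignments; the brick names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 32

def hash_name(filename: str) -> int:
    # Assumption: truncated MD5 stands in for GlusterFS's own hash function.
    return int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")

def layout(subvolumes: list[str]) -> list[tuple[int, int, str]]:
    """Split the 32-bit hash space into contiguous, roughly equal ranges."""
    if not subvolumes:
        raise ValueError("need at least one subvolume")
    step = HASH_SPACE // len(subvolumes)
    ranges = []
    for i, sv in enumerate(subvolumes):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(subvolumes) - 1 else start + step - 1
        ranges.append((start, end, sv))
    return ranges

def place(filename: str, ranges: list[tuple[int, int, str]]) -> str:
    """Return the subvolume whose range contains the file name's hash."""
    h = hash_name(filename)
    for start, end, sv in ranges:
        if start <= h <= end:
            return sv
    raise ValueError("hash space not fully covered")

ranges = layout(["server1:/brick", "server2:/brick", "server3:/brick"])
print(place("reports/2014-q1.csv", ranges))
```

Because adding a brick changes the ranges, some existing files would map to a different subvolume, which is why GlusterFS provides a rebalance operation after servers are added.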
Ceph
Ceph's technical foundation is the Reliable Autonomic Distributed Object Store (RADOS), which provides applications with object, block, and file system storage in a single unified storage cluster. With libraries giving client applications direct access to the RADOS object store, users can leverage the RADOS Block Device (RBD), the RADOS Gateway, and the Ceph filesystem. The RADOS Gateway provides Amazon S3- and OpenStack Swift-compatible interfaces to the RADOS object store. POSIX compatibility is another key feature: the Ceph filesystem follows POSIX semantics, so applications written for POSIX-compliant filesystems can easily use Ceph's storage. Additional libraries allow apps written in C, C++, Java, Python and PHP to access the Ceph object store directly. Advanced features include partial or complete reads and writes, snapshots, object-level key-value mappings, and atomic transactions with operations like append, truncate and clone range. Ceph is also compatible with several virtualization clients.
OpenStack
Among OpenStack's many architectural components, storage is one of the foundational necessities of cloud architecture. Providing scalable, redundant object storage, OpenStack uses clusters of servers and can store petabytes of data. Through this distributed storage system, OpenStack adds scalability, redundancy and durability to its feature list. Data is written to multiple disks across the data center, and OpenStack manages its replication. For those who are mindful of budgets, the OpenStack storage solution can write across older, smaller drives as well as newer, faster ones. Not satisfied with OpenStack storage? OpenStack is compatible with other storage solutions like Ceph, NetApp, Nexenta, SolidFire and Zadara. Additional features include snapshots (which can be restored or used to create a new storage block), scaling (add new servers and replicate data across them), block storage support, self-healing, and a variety of powerful management tools for usage, performance and general reporting, including auditing.
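OpenStack's object store (Swift) handles mixed drive sizes by giving each device a weight in its "ring," so that bigger drives receive proportionally more of the ring's partitions. The sketch below shows only that proportional split, with a largest-remainder rounding step; the real ring builder also places replicas in separate zones and limits how many partitions move when the ring changes. Device names and weights are hypothetical.

```python
def partitions_per_device(weights: dict[str, float], part_power: int) -> dict[str, int]:
    """Assign 2**part_power partitions to devices in proportion to their weights."""
    total_parts = 2 ** part_power
    total_weight = sum(weights.values())
    exact = {dev: total_parts * w / total_weight for dev, w in weights.items()}
    counts = {dev: int(x) for dev, x in exact.items()}
    leftover = total_parts - sum(counts.values())
    # Hand the remaining partitions to the devices with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda dev: exact[dev] - counts[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        counts[dev] += 1
    return counts

# Two older 1 TB drives and one newer 4 TB drive, weighted by capacity.
counts = partitions_per_device({"old-1tb-a": 1.0, "old-1tb-b": 1.0, "new-4tb": 4.0}, part_power=10)
print(counts)
```

Weighting by capacity means every drive fills at roughly the same rate, so small, older drives stay useful without becoming hot spots.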
Sheepdog
Another distributed object storage solution, Sheepdog emphasizes a small codebase, simplicity and ease of use. Designed primarily for volume and container services, Sheepdog intelligently manages disks and nodes and can scale out to thousands of nodes. Sheepdog can attach to QEMU VMs and Linux SCSI targets, and supports snapshots, cloning and thin provisioning. It can also attach to other VMs and to operating systems running on bare-metal hardware (provided they support iSCSI). Sheepdog supports libvirt and OpenStack, can interface with HTTP Simple Storage, and offers backend storage features like discard support, journaling, support for multiple disks on a single node, and erasure coding. With OpenStack Swift and Amazon S3 compatibility via its web interface, Sheepdog can store and retrieve vast amounts of data.
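Erasure coding stores parity alongside data so that lost pieces can be rebuilt without keeping full copies of everything. The simplest possible code, a single XOR parity strip over k data strips, is sketched below; it tolerates the loss of any one strip. This illustrates the idea only and is not Sheepdog's implementation; production erasure codes such as Reed-Solomon can survive multiple simultaneous failures.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal data strips (zero-padded) plus one XOR parity strip."""
    strip_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(strip_len * k, b"\0")
    strips = [padded[i * strip_len:(i + 1) * strip_len] for i in range(k)]
    parity = reduce(xor_bytes, strips)
    return strips + [parity]

def recover(strips: list, lost: int) -> bytes:
    """Rebuild the single missing strip by XOR-ing all the surviving strips."""
    survivors = [s for i, s in enumerate(strips) if i != lost]
    return reduce(xor_bytes, survivors)

encoded = encode(b"sheepdog stores objects", 4)
damaged = encoded[:2] + [None] + encoded[3:]  # simulate losing strip 2
assert recover(damaged, 2) == encoded[2]
```

Storage overhead here is one extra strip per k data strips, compared with k extra strips for full replication of the same data.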
Open Source Platform as a Service (PaaS)
Cloud Foundry
Developed by VMware and now part of Pivotal (funded by both VMware and EMC), Cloud Foundry provides a deep and varied set of products and services as a platform as a service. A large open community of Ruby developers and users supports Cloud Foundry, focusing not only on the codebase but also on the hosted services the PaaS provides. Among the services Cloud Foundry offers for its hosted solution are MySQL, vFabric Postgres, MongoDB, Redis, and RabbitMQ. With a fairly straightforward model, Cloud Foundry provides mechanisms for designing apps for the cloud, deploying and pushing apps, using services, migrating databases, using environment variables and mapping custom domains. Additionally, Cloud Foundry offers client tools like the cf command-line tool, an Eclipse plugin, and a build integration tool. Cloud Foundry also has application logging, integration with third-party log management services including Splunk, and app manifests. The project currently considers its key competitors to be AppScale, Heroku, OpenShift, and Google App Engine.
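Cloud Foundry exposes bound services to an application through the `VCAP_SERVICES` environment variable: a JSON object mapping each service label to a list of bound instances, each with a `name` and a `credentials` object. The sketch below looks up credentials by instance name. The service label, instance name and credential fields in the sample are hypothetical, since the exact credential keys depend on the service.

```python
import json
import os
from typing import Optional

def find_credentials(service_name: str, vcap_json: Optional[str] = None) -> dict:
    """Return the credentials of the bound service instance called service_name."""
    raw = vcap_json if vcap_json is not None else os.environ.get("VCAP_SERVICES", "{}")
    services = json.loads(raw)
    # Each key is a service label; each value is a list of bound instances.
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance.get("credentials", {})
    raise KeyError(f"no bound service named {service_name!r}")

sample = json.dumps({
    "p-mysql": [{"name": "inventory-db", "label": "p-mysql",
                 "credentials": {"hostname": "10.0.0.7", "port": 3306}}]
})
print(find_credentials("inventory-db", sample))
```

Looking services up by instance name rather than by label keeps the app working if the operator swaps in a different provider for the same kind of service.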
Cloudify
Primarily focused on automation, Cloudify automates the installation, deployment, monitoring, remediation and usage-based auto-scaling of application stacks. Cloudify uses TOSCA (Topology and Orchestration Specification for Cloud Applications), a standard developed by the OASIS standards consortium, which states that TOSCA "works to enhance the portability of cloud applications and services. TOSCA will enable the interoperable description of application and infrastructure cloud services, the relationships between parts of the service, and the operational behavior of these services (e.g. deploy, patch, shutdown), independent of the supplier creating the service, and any particular cloud provider or hosting technology. TOSCA will also make it possible for higher-level operational behavior to be associated with cloud infrastructure management." Using TOSCA blueprints, you can specify a "recipe" that becomes your application stack template. Cloudify supports integration with OpenStack, AWS, CloudStack, Microsoft Azure and VMware.
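A TOSCA blueprint describes nodes and the relationships between them, and the orchestrator derives an installation order from those relationships. The sketch below models a hypothetical four-node blueprint as a dependency graph and uses Python's `graphlib` module (Python 3.9+) to compute a valid install order, with uninstallation running in reverse. It is not Cloudify's API or blueprint syntax.

```python
from graphlib import TopologicalSorter

# Hypothetical blueprint: node -> nodes it depends on
# (for example, "contained in" a VM or "connected to" a database).
blueprint = {
    "vm": [],
    "mysql": ["vm"],
    "app_server": ["vm"],
    "web_app": ["app_server", "mysql"],
}

def install_order(nodes: dict[str, list[str]]) -> list[str]:
    """Dependencies first; raises graphlib.CycleError on circular relationships."""
    return list(TopologicalSorter(nodes).static_order())

def uninstall_order(nodes: dict[str, list[str]]) -> list[str]:
    """Tear down dependents before the things they depend on."""
    return list(reversed(install_order(nodes)))

print(install_order(blueprint))
```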
OpenShift
OpenShift PaaS is a premier service from Red Hat; the company currently supports a private cloud version of the software, OpenShift Enterprise. OpenShift allows deployment of binaries that run on Red Hat Enterprise Linux. Supported languages include JavaScript, Ruby, Python, PHP, Perl, Java, Haskell, and .NET. Among the databases OpenShift supports are MySQL, PostgreSQL, MongoDB and Microsoft SQL Server. Popular web server interfaces and runtimes are available too, such as Rack for Ruby, WSGI for Python, PSGI for Perl and Node.js for JavaScript, along with frameworks including Laravel, CodeIgniter, CakePHP, Ruby on Rails, Django, Perl Dancer, Flask, Sinatra, Tornado, and Web2py. To stay competitive (namely against AppScale, Heroku, Cloud Foundry, Google App Engine, Jelastic and ElasticBox), OpenShift offers enterprise features like accelerated application service delivery, minimized vendor lock-in, self-service and on-demand application stacks, and standardized developer workflows. The PaaS is also polyglot, supporting a number of programming languages and frameworks, enterprise apps with Java EE 6, built-in database services, and multiple environments (development, testing and production). Other OpenShift features include dependency and build management, continuous integration and release management, source code version management, remote SSH login to the application container, IDE integration, remote debugging of applications, a rich command-line tool set, a responsive web console, and much more.
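Because OpenShift hosts Python apps through WSGI, the smallest deployable unit is a WSGI callable as defined by PEP 3333, sketched below. The name `application` is an assumption: it is the default that mod_wsgi looks for, but which file and callable name OpenShift expects depends on its Python cartridge version, so check its documentation.

```python
def application(environ, start_response):
    """Minimal WSGI app: echoes the requested path as plain text."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI apps return an iterable of byte strings.
    return [body]
```

To try it locally before pushing, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8080, application)` can serve it.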
Stackato
ActiveState's Stackato offers the usual fare, including a customizable app store, a web management console, an activity stream and self-service. Other features you'll find in Stackato are end-to-end development, auto-configuration, centralized cluster administration, dynamic load balancing and elastic scalability, placement and availability zones, application auto-scaling and persistent file-system sharing. With an essentially self-service model, ActiveState allows deployment in minutes rather than weeks. Developer-driven by design, Stackato bundles a solid set of development, build and release tools. With a focus on the agile enterprise audience, time-to-market is a key driver in ActiveState's design model for Stackato, and its reporting features cover time to market, downtime, usage, and performance.
WSO2 Stratos
Rated by Gartner as a "visionary," WSO2 Stratos touts itself as "the most complete, enterprise-grade cloud solution." It claims to support more core services than other PaaS options available today and is a good option for enterprises that seek to "extend the flexibility and innovation achieved from implementing heterogeneous environments on-premises, into the cloud." Its features include an extensible cartridge architecture (plug in third-party runtimes such as PHP, MySQL, and Tomcat; Puppet-based cartridge creation for WSO2 Carbon products; provisioning of multi-tenant or single-tenant cartridges); enhanced cloud deployment with support for multiple IaaS platforms (fully tested on Amazon Web Services EC2, experimental support for OpenStack and vCloud, the capability to support any IaaS via the jclouds API, and cloud bursting to scale across multiple IaaS platforms at the same time, for private, public and hybrid clouds); and easy SaaS app development (built-in support for multi-tenant and single-tenant models, user identity management, metering and billing). The PaaS also provides data storage with easy access, caching, and queuing, and lets a SaaS app run as a multi-tenant application in which each tenant can deploy its own customized logic alongside it. Also included are an Artifact Distribution Coordinator (ADC) with support for external Git and GitHub repositories, and the ability to publish application logs to a centralized location for easy monitoring.
Open Source Software Defined Networking Tools
Floodlight
An enterprise-class OpenFlow controller (OpenFlow is an open standard managed by the Open Networking Foundation), Floodlight is Apache-licensed and Java-based. Floodlight is an open SDN controller that works with both virtual and physical switches that speak the OpenFlow protocol, which defines how networking devices such as switches, routers, virtual switches and access points can be controlled remotely. Through OpenFlow, Floodlight can remotely control a switch's packet forwarding tables and flow table rules, forward or block traffic, and leverage custom interfaces and scripting languages. Highlighted features include a module loading system, minimal dependencies, support for both OpenFlow and non-OpenFlow networks, and high performance. Floodlight also has a large community behind it and supports OpenStack.
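To make "flow table rules" concrete, the sketch below models the basic OpenFlow lookup a switch performs on entries that a controller such as Floodlight installs: the highest-priority entry whose match fields all agree with the packet wins, and an entry with an empty match acts as the catch-all table-miss rule. It is a simplified, hypothetical model (exact matches only, no masks, counters or timeouts) and does not use Floodlight's API; the field names and action strings are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FlowEntry:
    priority: int
    match: dict    # header field -> required value; omitted fields are wildcards
    actions: list  # e.g. ["output:2"]; an empty list drops the packet

def lookup(table: list[FlowEntry], packet: dict) -> Optional[FlowEntry]:
    """Return the highest-priority entry whose match fields all equal the packet's."""
    candidates = [e for e in table if all(packet.get(k) == v for k, v in e.match.items())]
    return max(candidates, key=lambda e: e.priority, default=None)

# Hypothetical rules a controller might push to a switch.
table = [
    FlowEntry(0, {}, ["CONTROLLER"]),                                   # table miss
    FlowEntry(100, {"eth_type": 0x0800, "ipv4_dst": "10.0.0.5"}, ["output:2"]),
    FlowEntry(200, {"eth_type": 0x0800, "ipv4_dst": "10.0.0.5", "tcp_dst": 23}, []),
]
print(lookup(table, {"eth_type": 0x0800, "ipv4_dst": "10.0.0.5", "tcp_dst": 80}).actions)
```

Here the higher-priority rule blocks telnet to 10.0.0.5 while other traffic to that host is forwarded out port 2, and anything unmatched is sent to the controller.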
Indigo
The open source project Indigo enables OpenFlow support on both physical and hypervisor switches, and it is also the basis of Switch Light from Big Switch Networks. Indigo provides firmware for a number of popular switches, bringing OpenFlow to those switches. At its core is the Indigo agent, a set of libraries with a hardware abstraction layer (HAL) that simplifies integration with a switch's forwarding and port management capabilities. There is also an abstraction layer for "hybrid" mode OpenFlow on the switch. Indigo also comes with a compiler named LoxiGen, which generates OpenFlow marshalling/unmarshalling libraries. Indigo firmware is available either as pre-built binaries or as a source distribution via VM. Also available is the Indigo Virtual Switch, an open source virtual switch compatible with KVM that contains the Indigo framework with OpenFlow integration.
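LoxiGen generates marshalling code for every OpenFlow message type. Every OpenFlow message begins with the same 8-byte header (version, type, length, transaction ID, in network byte order), so hand-writing just that header gives a feel for what the generated libraries do. The constants follow the OpenFlow 1.3 specification (version 0x04, echo request type 2); the functions are hypothetical and are not LoxiGen output.

```python
import struct

# ofp_header: uint8 version, uint8 type, uint16 length, uint32 xid (big-endian)
OFP_HEADER = struct.Struct("!BBHI")
OFP_VERSION_1_3 = 0x04
OFPT_ECHO_REQUEST = 2

def marshal_message(msg_type: int, xid: int, payload: bytes = b"",
                    version: int = OFP_VERSION_1_3) -> bytes:
    """Prefix a message body with an OpenFlow header; length covers header plus body."""
    length = OFP_HEADER.size + len(payload)
    return OFP_HEADER.pack(version, msg_type, length, xid) + payload

def unmarshal_header(data: bytes) -> tuple[int, int, int, int]:
    """Decode the header of a single complete message and check its length field."""
    version, msg_type, length, xid = OFP_HEADER.unpack_from(data)
    if length != len(data):
        raise ValueError("length field does not match buffer size")
    return version, msg_type, length, xid

msg = marshal_message(OFPT_ECHO_REQUEST, xid=42, payload=b"ping")
print(msg.hex())
```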
OpenStack Networking "Neutron"
Part of the OpenStack project, Neutron provides "networking as a service" between interface devices (such as virtual NICs) managed by other OpenStack services like Nova. Though part of the core of OpenStack, Neutron deserves special notice for its size and functionality as a "NaaS" product. Users can create multi-tier web application topologies and use advanced network capabilities like end-to-end QoS or NetFlow monitoring. Advanced network services such as LB-aaS, VPN-aaS, firewall-aaS, IDS-aaS, and data-center-interconnect-aaS can be plugged into OpenStack tenant networks. Neutron provides Horizon GUI support for creating and deleting Neutron L2/L3 networks and subnets, and for booting VMs on Neutron networks. Its API is also extensible, so extensions can be written against it.
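A multi-tier topology in Neutron typically gives each tier (web, application, database) its own network and subnet. Before creating them through Horizon or the Neutron API, the address ranges can be planned with Python's standard `ipaddress` module, as in the sketch below. The 10.0.0.0/16 range, the /24 subnet size and the tier names are hypothetical.

```python
import ipaddress

def plan_tiers(cidr: str, tiers: list[str], prefix: int = 24) -> dict:
    """Carve one subnet per application tier out of a larger private range."""
    pool = ipaddress.ip_network(cidr).subnets(new_prefix=prefix)
    plan = {}
    for tier in tiers:
        try:
            plan[tier] = next(pool)
        except StopIteration:
            raise ValueError(f"{cidr} has no room for tier {tier!r}") from None
    return plan

plan = plan_tiers("10.0.0.0/16", ["web", "app", "db"])
for tier, subnet in plan.items():
    # First usable address, commonly used as the subnet's gateway.
    print(tier, subnet, next(subnet.hosts()))
```

Keeping each tier on its own subnet makes it straightforward to apply per-tier security group or firewall-as-a-service rules later.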
Open vSwitch
A multilayer software switch, Open vSwitch supports a wide range of features, including 802.1Q VLANs with trunk and access ports, NIC bonding (with or without LACP on the upstream switch), NetFlow/sFlow, QoS, GRE, GRE over IPsec, VXLAN, and LISP tunneling, 802.1ag connectivity fault management, OpenFlow, high-performance forwarding via the Linux kernel, and a transactional configuration database. Open vSwitch can run as a kernel-based switch or entirely in userspace without a kernel module, and it supports multiple virtualization technologies including Xen/XenServer, KVM and VirtualBox. Special support exists for Citrix XenServer and Red Hat Enterprise Linux hosts. Components of Open vSwitch include the switch daemon 'ovs-vswitchd', the database server 'ovsdb-server', 'ovs-dpctl' for configuring the switch's kernel module, 'ovs-vsctl' for querying and updating the configuration of ovs-vswitchd, and many other tools for management and monitoring.
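When Open vSwitch carries a frame from an access port onto a trunk port, it inserts a 4-byte 802.1Q tag: a Tag Protocol Identifier of 0x8100 followed by 16 bits holding a 3-bit priority (PCP), a 1-bit drop eligible indicator (DEI) and a 12-bit VLAN ID. The sketch below builds that tag by hand to show the layout; in practice OVS does this for you once a port is given a tag (for example `ovs-vsctl add-port br0 eth1 tag=10`). The function names are illustrative.

```python
import struct

TPID_8021Q = 0x8100

def vlan_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID followed by the Tag Control Information."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits and DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)

def tag_frame(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert the tag after the destination and source MAC addresses (first 12 bytes)."""
    return frame[:12] + vlan_tag(vid, pcp) + frame[12:]
```

Note that VLAN IDs 0 and 4095 are reserved by the standard, so usable IDs for ordinary traffic run from 1 to 4094.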
Open Source NoSQL Databases
Apache Cassandra
Apache Cassandra is a database providing scalability, high availability and fault tolerance on commodity hardware, virtual systems or cloud infrastructure. Thanks to features such as column indexing, log-structured updates, denormalized and materialized views and built-in caching, many large-scale organizations have chosen Cassandra (including Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and many others). Features include automatic replication to multiple nodes for fault tolerance, no single point of failure because every node in the cluster is identical, a choice of synchronous or asynchronous replication for updates, and read/write throughput that grows as nodes are added, without downtime or interruption. Third-party support contracts for Apache Cassandra are also available.
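Cassandra's replication is tunable per request through consistency levels. As a worked example, with a replication factor RF, the QUORUM level requires floor(RF/2) + 1 replicas to respond, and a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds RF, because the two sets of replicas must then overlap. The helper names below are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: a strict majority."""
    return replication_factor // 2 + 1

def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > rf

for rf in (3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, QUORUM/QUORUM consistent: {reads_see_latest_write(rf, q, q)}")
```

With RF = 3, QUORUM is 2, and 2 + 2 > 3, so QUORUM reads after QUORUM writes are consistent, while ONE/ONE (1 + 1) is not; in exchange, ONE/ONE tolerates more replica failures and responds faster.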
CouchDB
Specifically written for web application database needs, CouchDB has no predefined data structure, or schema. CouchDB stores data in JSON documents made up of named fields whose values can be strings, numbers, dates, ordered lists or associative maps. CouchDB supports web and mobile apps and can serve web apps directly out of CouchDB. Using JavaScript view functions, CouchDB can aggregate, join and report on database documents without affecting the underlying structure of the documents. CouchDB is fully distributed and peer-based, with servers and offline clients that can hold independent replica copies of the same database. Replication features include conflict management, incremental and bi-directional replication, filtered replication, and Master/Slave and Master/Master replication. CouchDB is written in Erlang, whose language and runtime have built-in support for concurrency, distribution and fault tolerance and can take advantage of newer hardware with multiple CPU cores.
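CouchDB views are written as JavaScript map (and optional reduce) functions stored in design documents. To illustrate the idea, the Python sketch below mimics a view that totals orders per customer, roughly what a JavaScript map function calling `emit(doc.customer, doc.total)` combined with the built-in `_sum` reducer would return when queried with grouping. The documents and function names are hypothetical, and note that the source documents are never modified.

```python
from collections import defaultdict

docs = [  # hypothetical documents; CouchDB would store these as JSON
    {"_id": "a1", "type": "order", "customer": "acme", "total": 120},
    {"_id": "a2", "type": "order", "customer": "globex", "total": 75},
    {"_id": "a3", "type": "order", "customer": "acme", "total": 30},
    {"_id": "c1", "type": "customer", "name": "acme"},
]

def map_fn(doc):
    # Analogous to: function (doc) { if (doc.type === "order") emit(doc.customer, doc.total); }
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]

def run_view(documents, map_fn, reduce_fn=sum):
    """Apply map_fn to every document, group emitted values by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # View results are ordered by key.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}

print(run_view(docs, map_fn))
```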
HBase
Notable for running on top of the Hadoop Distributed File System (HDFS), Apache HBase is distributed, scalable, secure and highly available. Modeled after Google's Bigtable, HBase can handle massive tables containing billions of rows and millions of columns, and it uses storage, memory and CPU resources across multiple servers in a cluster so that the database scales horizontally. Other features include Kerberos security across tables and columns, automatic sharding, strict consistency, and a scale-out architecture that lets you add servers for increased capacity. HBase also offers compression, in-memory operation and Bloom filters, configurable per column family. MapReduce jobs running in Hadoop can use HBase tables for input and output.
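HBase uses Bloom filters to skip store files that cannot contain a requested row (or row and column), saving disk reads. The sketch below is a generic Bloom filter, not HBase's implementation; the bit-array size, the number of hashes and the SHA-256-based double hashing are arbitrary choices for illustration.

```python
import hashlib

class BloomFilter:
    """Set membership test that may give false positives but never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> list[int]:
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

A "no" answer lets a read skip a file entirely; a "yes" still requires reading it, so the false-positive rate only affects performance, never correctness.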
Hypertable
Modeled after Bigtable (Google's massively scalable database), Hypertable has a flattened table structure and uses key-prefix and block data compression. It bears little resemblance to a relational database, except that it represents data as tables of rows and columns. Row keys in Hypertable are UTF-8 strings, and there is no support for data types, joins or transactions. Data is stored in massive tables sorted by row key, which serves as the sole primary key. Other Hypertable features include cell versioning (timestamps), column qualifiers, namespaces (like a directory hierarchy in a file system) and "realtime" scaling as additional servers running RangeServer processes are added.
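The toy table below sketches the Bigtable-style model described above: a single sorted row key, cells addressed as `family:qualifier`, and timestamped versions of each cell. It is an in-memory illustration only; the `SortedTable` class and its methods are hypothetical and do not reflect Hypertable's API or storage format.

```python
import bisect
import time


class SortedTable:
    """Toy Bigtable-style table: rows sorted by one string row key, cells
    addressed by 'family:qualifier', each cell holding timestamped versions."""

    def __init__(self):
        self.row_keys = []  # kept sorted so range scans come back in key order
        self.rows = {}      # row key -> {column -> [(timestamp, value), ...]}

    def put(self, row_key, column, value, timestamp=None):
        if row_key not in self.rows:
            bisect.insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        ts = time.time_ns() if timestamp is None else timestamp
        versions = self.rows[row_key].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda version: version[0], reverse=True)  # newest first

    def get(self, row_key, column, as_of=None):
        # Newest version overall, or the newest one at or before `as_of`.
        for ts, value in self.rows.get(row_key, {}).get(column, []):
            if as_of is None or ts <= as_of:
                return value
        return None

    def scan(self, start_key, end_key):
        # Rows with start_key <= key < end_key, in sorted row-key order.
        lo = bisect.bisect_left(self.row_keys, start_key)
        hi = bisect.bisect_left(self.row_keys, end_key)
        for key in self.row_keys[lo:hi]:
            yield key, self.rows[key]


table = SortedTable()
table.put("com.example/index", "meta:title", "Home", timestamp=1)
table.put("com.example/index", "meta:title", "Home v2", timestamp=2)
table.put("com.example/about", "meta:title", "About", timestamp=1)
table.put("org.sample/", "meta:title", "Sample", timestamp=1)

print(table.get("com.example/index", "meta:title"))           # Home v2
print(table.get("com.example/index", "meta:title", as_of=1))  # Home
print([k for k, _ in table.scan("com.example/", "com.example0")])
```

Because row keys stay sorted, a prefix scan like the last one is a contiguous range read, which is why row key design (for example, reversed domain names) matters so much in this model.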
MongoDB
MongoDB is an open source NoSQL document database written in C++. Features include document-oriented storage (JSON-style documents with dynamic schemas), full index support (on any attribute), replication and high availability (across LANs and WANs), auto-sharding (for horizontal scaling), querying, rapid in-place updates and map/reduce. MongoDB also offers flexible aggregation and data processing, GridFS (for storing files of any size), the MongoDB Management Service and professional support. One advantage of MongoDB is embedded documents and arrays, which reduce the need for expensive joins. Additionally, dynamic schemas support fluent polymorphism, and documents correspond to native data types in many programming languages.
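To illustrate why embedded documents reduce the need for joins, the sketch below stores each order with its customer and line items inside one document, then filters on nested fields using a dotted path, loosely mimicking MongoDB's dot notation (including matching inside arrays). Plain Python dictionaries stand in for a collection; `resolve` and `find` are hypothetical helpers, not the MongoDB driver API, and the second document carries an extra field to show a dynamic schema.

```python
# Each order embeds its customer and items, so one document holds everything
# that is usually read together (no separate customers table to join).
orders = [
    {"_id": 1, "customer": {"name": "Ann", "city": "Oslo"},
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
     "tags": ["priority"]},
    {"_id": 2, "customer": {"name": "Bob", "city": "Lima"},
     "items": [{"sku": "B7", "qty": 5}],
     "gift_note": "Happy birthday"},  # extra field: schemas can differ per document
]


def resolve(value, path):
    """Yield every value reachable by a dotted path, descending into embedded
    documents and fanning out over arrays (a loose imitation of dot notation)."""
    if not path:
        yield value
        if isinstance(value, list):
            yield from value  # an array field also matches its individual elements
        return
    head, _, rest = path.partition(".")
    if isinstance(value, list):
        for element in value:
            yield from resolve(element, path)
    elif isinstance(value, dict) and head in value:
        yield from resolve(value[head], rest)


def find(collection, field_path, expected):
    # Return documents where any value at field_path equals expected.
    return [doc for doc in collection if any(v == expected for v in resolve(doc, field_path))]


print([d["_id"] for d in find(orders, "items.sku", "B7")])         # [1, 2]
print([d["_id"] for d in find(orders, "customer.city", "Lima")])  # [2]
```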
Redis
Written in ANSI C, Redis is a networked, in-memory key-value data store. It is popular enough that bindings already exist for ActionScript, C, C++, C#, Clojure, Common Lisp, Dart, Erlang, Go, Haskell, Haxe, Io, Java, JavaScript (Node.js), Lua, Objective-C, Perl, PHP, Pure Data, Python, R, Ruby, Scala, Smalltalk and Tcl. Key features include a dictionary data model that maps keys to values, high performance from keeping the entire dataset in memory, optional persistence to disk (snapshots or an append-only file), and master-slave replication. Redis also offers alpha-stage clustering, ease of use on IaaS and PaaS platforms, and the option to use Redis as a managed service without having to launch a VM instance for the database yourself.
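Redis keeps its dataset in memory and can persist it to disk, for example with an append-only file that is replayed on restart. The toy store below mimics that idea in Python: reads and writes go to a dictionary, and each write is also appended to a log. The class, the JSON log format and the file name are assumptions for illustration; real Redis uses its own file formats and configurable fsync policies.

```python
import json
import os
import tempfile


class TinyKV:
    """Toy in-memory key-value store with append-only-log persistence.
    All reads are served from a dict; every write is also appended to a log
    that is replayed on startup to rebuild the in-memory dataset."""

    def __init__(self, aof_path):
        self.aof_path = aof_path
        self.data = {}
        if os.path.exists(aof_path):
            with open(aof_path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "SET":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "DEL":
            self.data.pop(entry["key"], None)

    def _log(self, entry):
        # Append to disk first, then update memory.
        with open(self.aof_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
        self._apply(entry)

    def set(self, key, value):
        self._log({"op": "SET", "key": key, "value": value})

    def delete(self, key):
        self._log({"op": "DEL", "key": key})

    def get(self, key):
        return self.data.get(key)


path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
store = TinyKV(path)
store.set("session:1", "ann")
store.set("session:2", "bob")
store.delete("session:2")

restarted = TinyKV(path)  # simulate a restart: dataset rebuilt from the log
print(restarted.get("session:1"), restarted.get("session:2"))  # ann None
```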
Riak
A combination of cloud storage and distributed database (Riak CS and Riak), this solution is geared toward providing cloud storage at any scale in both private and public clouds. Riak offers Amazon S3 API compatibility, per-tenant visibility (accessible over network I/O), metadata and large object support, multi-datacenter replication and more. Data in Riak is private by default, and access control lists (ACLs) are available to further refine data visibility. However, encryption of "data at rest" is not currently supported, and there is no compression at ingest (although external applications can handle compression).
Open Source Provisioning Tools
Axemblr Provisionr
This Apache Incubator project is primarily a manager for pools of virtual machines across multiple clouds. As a simple service, Axemblr Provisionr can manage pools of tens or hundreds of virtual machines. The project focuses on semi-automated workflows, cloud portability and configuration management. To achieve cloud portability, the underlying cloud APIs are hidden and certain assumptions are made, for example that the platform runs a specific OS with assumed pre-installed packages and libraries, DNS settings and network configuration. The project's external dependencies currently all carry Apache-compatible licenses, including Activiti, the AWS SDK, jclouds and Google Guava (all Apache 2.0). Axemblr currently uses Provisionr in-house to deploy Hadoop clusters on demand for testing and QA.
Cobbler
The mantra at Cobbler is reduce, reuse, recycle. As a Linux installation server that supports rapid setup of network installation environments, Cobbler relies on an extensive library of templates for configuring and managing services like DNS and DHCP. To maximize code reuse, all of the answer files, such as kickstart and preseed files, are templated as well. In addition to the template library, Cobbler has a large collection of snippets for embedding in templates. The goal is ease of use: admins spend less time writing new code and more time managing and responding to their environments. Written in Python, the application weighs in at roughly 15,000 lines of code, surprisingly small for an enterprise application. Even so, with a strong feature set, plenty of configuration options and the ability to integrate with configuration management apps like Puppet, Cobbler may be closer to an admin's tool than some of the larger provisioning tools.
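The template-plus-snippet approach can be sketched with Python's standard `string.Template`: one kickstart-style template is reused for every system, and shared snippets are expanded into it. Cobbler itself uses the Cheetah template engine and its own snippet syntax, so the snippet names, variables and rendering function here are hypothetical stand-ins.

```python
from string import Template

# Shared snippets reused across many templates (hypothetical names and contents).
SNIPPETS = {
    "network_config": "network --bootproto=dhcp --hostname=$hostname",
    "post_install": "%post\necho 'provisioned by cobbler' > /etc/motd\n%end",
}

# One kickstart-style template reused for every system; $snippet_* slots pull in snippets.
KICKSTART_TEMPLATE = Template(
    "lang en_US.UTF-8\n"
    "$snippet_network_config\n"
    "rootpw --iscrypted $root_hash\n"
    "$snippet_post_install\n"
)


def render(system_vars):
    """Render the template for one system: expand each snippet with the
    system's variables first, then substitute everything into the template."""
    expanded = {
        f"snippet_{name}": Template(body).substitute(system_vars)
        for name, body in SNIPPETS.items()
    }
    return KICKSTART_TEMPLATE.substitute({**system_vars, **expanded})


print(render({"hostname": "web1", "root_hash": "HASH_PLACEHOLDER"}))
```

Adding a new system then means supplying a new set of variables rather than writing a new answer file.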
JuJu
Ubuntu's JuJu has a cute theme based on "charms" available through a charm store. Using JuJu on your cloud is as simple as using either the GUI or the command-line interface to define, configure, deploy, manage, monitor and scale out your services on any public or private cloud. With charms for almost every need, adding new functionality is as easy as searching the Ubuntu charm store, dragging the charms you want onto a "canvas" and configuring them prior to deployment. Charms connect to one another as services and are pre-built to know which other charms they are compatible with. JuJu provides service monitoring, alerts and added intelligence through Landscape, Ubuntu's systems management tool for inspecting, restarting and updating running services.
Salt Cloud
Though touted as simple to use, SaltStack is similar to OpenStack in its size, diversity of features and configuration possibilities. This system and configuration management software has highly configurable provisioning features for most infrastructure, cloud and DevOps environments. Through its infrastructure automation and cloud orchestration, Salt Cloud's enterprise functionality includes push and pull remote execution, OverState for data center workflow and task orchestration, application provisioning and continuous deployment, hybrid cloud provisioning and management, and parallel management. SaltStack cites consistency and simplicity as selling points, claiming (per Salt) the lowest administrative and operational costs, a single self-contained platform and no programming required. It also promises efficient configuration management with no proprietary administration requirements, coding or languages, along with a single user interface and command line that provide one common user experience and easy implementation and administration.
Dell Crowbar
The sore thumb in the crowd is Dell's Crowbar. First open sourced at OSCON in 2011, Dell's cloud computing framework still appears to be relevant. Crowbar lets users streamline the configuration, deployment and use of enterprise hardware in the cloud. Crowbar users can bring hardware online, install and configure apps quickly and efficiently, and install an OS without waiting on staff to rack and configure servers. Crowbar also enables fast recovery after hardware failures and lets users install and configure apps and the OS only once. Hadoop support has since been added, along with multi-OS support and a modularization concept called "barclamps" for packaging individual layers of deployment infrastructure. Barclamps allow other apps to plug into the Crowbar framework; each acts as an independent module with its own lifecycle and can serve up services that other barclamps make use of, among much more.
Open Source Configuration Management Tools
Ansible
Ansible is a model-driven configuration management tool that uses SSH to improve security and simplify management. Beyond configuration management, it can automate app deployment (including multi-tier deployment), workflow orchestration and cloud provisioning, which is why the company prefers to describe it as an "orchestration engine." Ansible is built on five design principles: ease of use (no scripts or custom code to write), a low learning curve (for both sysadmins and developers), comprehensive automation (you can automate almost anything in your environment), efficiency (because it runs over OpenSSH with no resident agent, it doesn't tie up memory or CPU on managed machines) and security (it is inherently more secure because it doesn't require an agent, additional ports or root-level daemons). Like many other open source projects, Ansible has a paid product, a web UI called Ansible Tower.
CFEngine
One of the earliest full-featured configuration management systems, CFEngine has gone through several iterations and stayed relevant as operating systems have moved from the local data center to the cloud. In addition to being an infrastructure automation framework at heart, CFEngine is also a modeling and compliance monitoring engine that runs with a small footprint. CFEngine recommends the following steps for establishing an initial desired state: 1) model the desired state of your environment; 2) simulate configuration changes before committing them; 3) confirm the desired state and enable automatic self-healing; 4) collect reports on the differences between actual and desired states. CFEngine provides a library of reusable, data-driven models to help users model their desired states, and these infrastructure patterns are designed to be reused across the enterprise.
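The four steps map naturally onto a convergence loop. The sketch below, a minimal illustration assuming host state can be represented as a flat dictionary, models a desired state, simulates changes with a dry run, converges (so rerunning it repairs drift), and reports differences. It is not CFEngine's policy language or agent; the state keys and function names are hypothetical.

```python
# Step 1: model the desired state (hypothetical keys and values).
desired = {"ntp": "running", "telnet": "absent", "/etc/motd": "Authorized use only"}


def diff(actual, desired):
    """Step 4: report each setting whose actual value differs from the desired one."""
    return {key: (actual.get(key), want) for key, want in desired.items() if actual.get(key) != want}


def converge(actual, desired, dry_run=False):
    """Steps 2 and 3: with dry_run=True, only report what would change;
    otherwise repair every difference. A converged host yields no changes,
    so the same pass can be rerun to self-heal drift."""
    changes = diff(actual, desired)
    if not dry_run:
        for key, (_, want) in changes.items():
            actual[key] = want
    return changes


actual = {"ntp": "stopped", "telnet": "installed", "/etc/motd": "Authorized use only"}
print(converge(actual, desired, dry_run=True))  # simulate: reports drift, changes nothing
converge(actual, desired)                        # converge to the desired state
actual["ntp"] = "stopped"                        # something drifts later
print(converge(actual, desired))                 # {'ntp': ('stopped', 'running')}
```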
Chef
Offered as both an open source and an enterprise product, Chef is a powerful tool for full IT infrastructure configuration management. With open source Chef at the heart of both offerings, shared features include a flexible and scalable automation platform, access to 800+ reusable cookbooks and integration with leading cloud providers. Chef also offers enterprise platform support, including Windows and Solaris, and lets you create, bootstrap and manage OpenStack clouds. It installs easily with the "one-click" Omnibus Installer and offers automatic system discovery with Ohai, text-based search capabilities and multiple-environment support. Other notable features include the "Knife" command-line interface, a "Dry Run" mode for testing potential changes, and the ability to manage 10,000+ nodes on a single Chef server. Features available only in the enterprise version include availability as a hosted service, an enhanced management console, centralized activity and resource reporting, and "Push" command and control of client runs. Multi-tenancy, role-based access control (RBAC), high availability installation support and verification, and centralized authentication using LDAP or Active Directory are also included with Chef Enterprise.
Puppet
What started out as a popular DevOps tool has quickly become a movement. Like Chef, Puppet is written in Ruby and comes in both open source and enterprise versions. However, where Chef offers a healthy set of features across both versions, Puppet has placed the majority of its feature set in the enterprise version. The open source version includes provisioning (Amazon EC2, Google Compute Engine), configuration management (operating systems and applications) and 2,000+ pre-built configurations on Puppet Forge. The enterprise version offers considerably more: all of the open source features plus a graphical user interface, an event inspector (for visualizing infrastructure changes), supported modules and provisioning of VMware VMs. Configuration management (discovery, user accounts), orchestration, task automation and role-based access control (with external authentication support) are also included. Puppet Enterprise provides a unified, cross-platform installer for all components, plus support.
Salt
As part of a larger, enterprise-ready application, the configuration management piece of Salt is as robust and full-featured as you would expect. It is built on Salt's remote execution core: "minions" receive commands from the central Salt master, execute them and reply with the results. Salt supports simultaneous configuration of tens of thousands of hosts. Configuration is based on host "states," and the configuration files that define the state of each host are small, easy to understand and require no programming to write. Additionally, for those who do program, or admins who want more control over and familiarity with their configuration files, any language can be used to render the configurations.
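Salt's master/minion pattern can be sketched in a few lines: the master sends a command to every minion whose ID matches a target pattern, the minions execute it in parallel, and their replies are collected by ID, much as `salt 'web*' test.ping` does on the command line. This toy uses threads instead of Salt's real transport (ZeroMQ by default); the classes are hypothetical, and only `test.ping` and `grains.item` borrow names from real Salt execution functions.

```python
import fnmatch
from concurrent.futures import ThreadPoolExecutor


class Minion:
    """Hypothetical minion: holds host facts ("grains") and runs named functions."""

    def __init__(self, minion_id, grains):
        self.id = minion_id
        self.grains = grains

    def execute(self, function, *args):
        # Tiny stand-ins for two real Salt execution functions.
        if function == "test.ping":
            return True
        if function == "grains.item":
            return {name: self.grains.get(name) for name in args}
        raise ValueError(f"unknown function: {function}")


class Master:
    """Hypothetical master: targets minions by ID glob and gathers their replies."""

    def __init__(self, minions):
        self.minions = {m.id: m for m in minions}

    def cmd(self, target, function, *args):
        targets = [m for mid, m in self.minions.items() if fnmatch.fnmatch(mid, target)]
        # Send to all matching minions at once and collect {minion_id: result}.
        with ThreadPoolExecutor() as pool:
            return dict(pool.map(lambda m: (m.id, m.execute(function, *args)), targets))


master = Master([
    Minion("web1", {"os": "Ubuntu"}),
    Minion("web2", {"os": "Ubuntu"}),
    Minion("db1", {"os": "CentOS"}),
])
print(master.cmd("web*", "test.ping"))       # {'web1': True, 'web2': True}
print(master.cmd("*", "grains.item", "os"))
```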