by Antonis Tsavdaris
This article describes how to build a Cassandra single-rack database cluster on Oracle Solaris 11.3 and extend its overall availability with LUN mirroring and IP network multipathing.
Cassandra is a popular distributed database management system from the Apache Software Foundation. It is highly scalable and has a masterless architecture: there is no primary node to which other nodes are subservient. Every node in the cluster is equal, and any node can service any request.
Oracle Solaris 11 is an enterprise-class operating system known for its reliability, availability, and serviceability (RAS) features. Its wealth of integrated features helps administrators build redundancy into every part of the system they deem critical, including the network, storage, and so on.
This how-to article describes how to build a Cassandra single-rack database cluster on Oracle Solaris 11.3 and extend its overall availability with LUN mirroring and IP network multipathing (IPMP). LUN mirroring will provide extended availability at the storage level and IPMP will add redundancy to the network.
In this scenario, the one-rack cluster is composed of six Oracle Solaris server instances. Three of them—dbnode1, dbnode2, and dbnode3—will be the database nodes and the other three—stgnode1, stgnode2, and stgnode3—will provide highly available storage. The highly available storage will be constructed from nine LUNs, three in each storage node.
When the build is complete, the one-rack cluster will keep a fully operational database even if two of the storage nodes are not available. Furthermore, the networks (the public network and the iSCSI network) will be resilient to NIC failures through IPMP groups consisting of an active and a standby network card.
Cluster Topology
All servers have the Oracle Solaris 11.3 operating system installed. Table 1 depicts the cluster architecture.
In practice, the Cassandra binaries as well as the data will reside on the storage nodes, while the database nodes will host the running instances.
Table 1. Oracle Solaris servers and their role in the cluster.
| Node Name | Role in the Cluster | Contains |
|---|---|---|
| dbnode1 | Database node | Running instance |
| dbnode2 | Database node | Running instance |
| dbnode3 | Database node | Running instance |
| stgnode1 | Storage node | Binaries and data |
| stgnode2 | Storage node | Binaries and data |
| stgnode3 | Storage node | Binaries and data |
Network Interface Cards
As shown in Table 2, every server in the cluster has four network interface cards (NICs) installed, named net0 through net3. Redundancy is required at the network level, and it will be provided by IPMP groups. IP multipathing requires that the DefaultFixed network profile be activated and that a static IP address be assigned to every network interface.
Table 2. NICs and IPMP group configuration.
| Node Name | NIC | Primary/Standby NIC | IP/Subnet | IPMP Group Name | IPMP IP Address | Role |
|---|---|---|---|---|---|---|
| dbnode1 | net0 | primary | 192.168.2.10/24 | IPMP0 | 192.168.2.22/24 | Public network |
|         | net1 | standby | 192.168.2.11/24 |       |                 |                |
|         | net2 | primary | 10.0.1.1/27     | IPMP1 | 10.0.1.13/27    | iSCSI initiator |
|         | net3 | standby | 10.0.1.2/27     |       |                 |                |
| dbnode2 | net0 | primary | 192.168.2.12/24 | IPMP2 | 192.168.2.23/24 | Public network |
|         | net1 | standby | 192.168.2.13/24 |       |                 |                |
|         | net2 | primary | 10.0.1.3/27     | IPMP3 | 10.0.1.14/27    | iSCSI initiator |
|         | net3 | standby | 10.0.1.4/27     |       |                 |                |
| dbnode3 | net0 | primary | 192.168.2.14/24 | IPMP4 | 192.168.2.24/24 | Public network |
|         | net1 | standby | 192.168.2.15/24 |       |                 |                |
|         | net2 | primary | 10.0.1.5/27     | IPMP5 | 10.0.1.15/27    | iSCSI initiator |
|         | net3 | standby | 10.0.1.6/27     |       |                 |                |
| stgnode1 | net0 | primary | 192.168.2.16/24 | IPMP6 | 192.168.2.25/24 | Public network |
|          | net1 | standby | 192.168.2.17/24 |       |                 |                |
|          | net2 | primary | 10.0.1.7/27     | IPMP7 | 10.0.1.16/27    | iSCSI target |
|          | net3 | standby | 10.0.1.8/27     |       |                 |                |
| stgnode2 | net0 | primary | 192.168.2.18/24 | IPMP8 | 192.168.2.26/24 | Public network |
|          | net1 | standby | 192.168.2.19/24 |       |                 |                |
|          | net2 | primary | 10.0.1.9/27     | IPMP9 | 10.0.1.17/27    | iSCSI target |
|          | net3 | standby | 10.0.1.10/27    |       |                 |                |
| stgnode3 | net0 | primary | 192.168.2.20/24 | IPMP10 | 192.168.2.27/24 | Public network |
|          | net1 | standby | 192.168.2.21/24 |        |                 |                |
|          | net2 | primary | 10.0.1.11/27    | IPMP11 | 10.0.1.18/27    | iSCSI target |
|          | net3 | standby | 10.0.1.12/27    |        |                 |                |
First, ensure that the network service is up and running. Then check whether the network profile is set to DefaultFixed.
root@dbnode1:~# svcs network/physical
STATE STIME FMRI
online 1:25:45 svc:/network/physical:upgrade
online 1:25:51 svc:/network/physical:default
root@dbnode1:~# netadm list
TYPE PROFILE STATE
ncp Automatic disabled
ncp DefaultFixed online
loc DefaultFixed online
loc Automatic offline
loc NoNet offline
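If a server reported the Automatic NCP as active instead, the fixed profile could be enabled first. A minimal sketch (no change is needed on dbnode1, where DefaultFixed is already online):

root@dbnode1:~# netadm enable -p ncp DefaultFixed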
Because the network profile is set to DefaultFixed, review the network interfaces and the data link layer.
root@dbnode1:~# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             unknown    1000   full      e1000g0
net1              Ethernet             unknown    1000   full      e1000g1
net3              Ethernet             unknown    1000   full      e1000g3
net2              Ethernet             unknown    1000   full      e1000g2
Create the IP interface for net0 and then configure a static IPv4 address.
root@dbnode1:~# ipadm create-ip net0
root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.10/24 net0/v4
root@dbnode1:~# ipadm show-addr
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
net0/v4 static ok 192.168.2.10/24
lo0/v6 static ok ::1/128
Following this, create the IP interfaces and assign the relevant IP addresses and subnets for each of the NICs, net0–net3, for each of the servers according to Table 2.
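As a sketch, the remaining interfaces on dbnode1 would be configured as follows, with the addresses taken from Table 2; the other five servers follow the same pattern:

root@dbnode1:~# ipadm create-ip net1
root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.11/24 net1/v4
root@dbnode1:~# ipadm create-ip net2
root@dbnode1:~# ipadm create-addr -T static -a 10.0.1.1/27 net2/v4
root@dbnode1:~# ipadm create-ip net3
root@dbnode1:~# ipadm create-addr -T static -a 10.0.1.2/27 net3/v4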
Note: There is an exceptional article by Andrew Walton on how to configure an Oracle Solaris network along with making it internet-facing: "How to Get Started Configuring Your Network in Oracle Solaris 11."
IPMP Groups
After the NICs have been configured and the IP addresses have been assigned, the IPMP groups can be configured as well. IPMP groups combine separate physical network interfaces to provide physical interface failure detection, network access failover, and network load spreading. Here, each IPMP group consists of two NICs in an active/standby configuration. When an interface that is a member of an IPMP group is brought down for maintenance, or when a NIC fails due to a hardware fault, a failover takes place: the remaining NIC and its IP interface step in to ensure that the node is not cut off from the cluster.
According to the planned scenario, two IPMP groups are going to be created in each server, one for every two NICs configured earlier. Each IPMP group will have its own IP interface, and one of the underlying NICs will be active, while the other will remain a standby. Table 2 summarizes the IPMP group configurations that must be completed on each node.
First, create the IPMP group IPMP0. Then, bind interfaces net0 and net1 to this group and create an IP address for the group.
root@dbnode1:~# ipadm create-ipmp ipmp0
root@dbnode1:~# ipadm add-ipmp -i net0 -i net1 ipmp0
root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.22/24 ipmp0
ipmp0/v4
Now that IPMP0 has been created successfully, declare net1 as the standby interface.
root@dbnode1:~# ipadm set-ifprop -p standby=on -m ip net1
root@dbnode1:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       ipmp0       ok        10.00s    net0 (net1)
The ipmpstat command reports that the IPMP0 group has been built successfully and that it operates over two NICs, net0 and net1. The parentheses denote a standby interface.
Follow the above-mentioned approach to build the IPMP groups for the rest of the servers in the cluster, as shown in Table 2.
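To confirm that failover behaves as expected, an active interface can be disabled temporarily and the group observed. A hedged test sequence (the -t flag keeps the change non-persistent):

root@dbnode1:~# ipadm disable-if -t net0
root@dbnode1:~# ipmpstat -i
root@dbnode1:~# ipadm enable-if -t net0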
Local Storage
As shown in Table 3, each of the storage servers has nine additional 10 GB disks on which zpools are to be created in a mirrored (RAID 1) configuration with a hot spare. Following this, ZFS volumes and LUNs can be constructed.
Table 3. Additional disk storage configuration.
| Node Name | ZFS Pool Name | Disk Name | Size | Role in Mirror | ZFS Volume |
|---|---|---|---|---|---|
| stgnode1 | zpool1 | c1t2d0  | 10 GB | member | zfslun1 |
|          |        | c1t3d0  | 10 GB | member |         |
|          |        | c1t4d0  | 10 GB | spare  |         |
|          | zpool2 | c1t5d0  | 10 GB | member | zfslun2 |
|          |        | c1t6d0  | 10 GB | member |         |
|          |        | c1t7d0  | 10 GB | spare  |         |
|          | zpool3 | c1t8d0  | 10 GB | member | zfslun3 |
|          |        | c1t9d0  | 10 GB | member |         |
|          |        | c1t10d0 | 10 GB | spare  |         |
| stgnode2 | zpool4 | c1t2d0  | 10 GB | member | zfslun4 |
|          |        | c1t3d0  | 10 GB | member |         |
|          |        | c1t4d0  | 10 GB | spare  |         |
|          | zpool5 | c1t5d0  | 10 GB | member | zfslun5 |
|          |        | c1t6d0  | 10 GB | member |         |
|          |        | c1t7d0  | 10 GB | spare  |         |
|          | zpool6 | c1t8d0  | 10 GB | member | zfslun6 |
|          |        | c1t9d0  | 10 GB | member |         |
|          |        | c1t10d0 | 10 GB | spare  |         |
| stgnode3 | zpool7 | c1t2d0  | 10 GB | member | zfslun7 |
|          |        | c1t3d0  | 10 GB | member |         |
|          |        | c1t4d0  | 10 GB | spare  |         |
|          | zpool8 | c1t5d0  | 10 GB | member | zfslun8 |
|          |        | c1t6d0  | 10 GB | member |         |
|          |        | c1t7d0  | 10 GB | spare  |         |
|          | zpool9 | c1t8d0  | 10 GB | member | zfslun9 |
|          |        | c1t9d0  | 10 GB | member |         |
|          |        | c1t10d0 | 10 GB | spare  |         |
Starting with stgnode1, run the format command, which reports the additional, unconfigured disks.
root@stgnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c1t2d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c1t3d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c1t4d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c1t5d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@5,0
       5. c1t6d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@6,0
       6. c1t7d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@7,0
       7. c1t8d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@8,0
       8. c1t9d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@9,0
       9. c1t10d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@a,0
Specify disk (enter its number): ^C
root@stgnode1:~#
Create the zpools zpool1, zpool2, and zpool3 in a RAID 1 with hot-spare configuration.
root@stgnode1:~# zpool create zpool1 mirror c1t2d0 c1t3d0 spare c1t4d0
root@stgnode1:~# zpool status zpool1
pool: zpool1
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
zpool1 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0
c1t3d0 ONLINE 0 0 0
spares
c1t4d0 AVAIL
errors: No known data errors
root@stgnode1:~# zpool create zpool2 mirror c1t5d0 c1t6d0 spare c1t7d0
root@stgnode1:~# zpool status zpool2
pool: zpool2
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
zpool2 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t5d0 ONLINE 0 0 0
c1t6d0 ONLINE 0 0 0
spares
c1t7d0 AVAIL
errors: No known data errors
root@stgnode1:~# zpool create zpool3 mirror c1t8d0 c1t9d0 spare c1t10d0
root@stgnode1:~# zpool status zpool3
pool: zpool3
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
zpool3 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t8d0 ONLINE 0 0 0
c1t9d0 ONLINE 0 0 0
spares
c1t10d0 AVAIL
errors: No known data errors
Running the format command again shows that the disks have been formatted.
root@stgnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c1t2d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c1t3d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c1t4d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c1t5d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@5,0
       5. c1t6d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@6,0
       6. c1t7d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@7,0
       7. c1t8d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@8,0
       8. c1t9d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@9,0
       9. c1t10d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@a,0
Specify disk (enter its number): ^C
Use the zpool list command to get a report on the newly created ZFS pools.
root@stgnode1:~# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
rpool 19.6G 8.01G 11.6G 40% 1.00x ONLINE -
zpool1 9.94G 88K 9.94G 0% 1.00x ONLINE -
zpool2 9.94G 88K 9.94G 0% 1.00x ONLINE -
zpool3 9.94G 88K 9.94G 0% 1.00x ONLINE -
Create ZFS volumes on the ZFS pools; these will later be exported as LUNs.
root@stgnode1:~# zfs create -V 8g zpool1/zfslun1
root@stgnode1:~# zfs create -V 8g zpool2/zfslun2
root@stgnode1:~# zfs create -V 8g zpool3/zfslun3
Use the zfs list command to get a report on the newly created ZFS volumes.
root@stgnode1:~# zfs list -r /zpool*
NAME USED AVAIL REFER MOUNTPOINT
zpool1 8.25G 1.53G 31K /zpool1
zpool1/zfslun1 8.25G 9.78G 16K -
zpool2 8.25G 1.53G 31K /zpool2
zpool2/zfslun2 8.25G 9.78G 16K -
zpool3 8.25G 1.53G 31K /zpool3
zpool3/zfslun3 8.25G 9.78G 16K -
Perform the same work on the second and third storage nodes.
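As a sketch, the equivalent commands on stgnode2 would look like the following, with the pool, disk, and volume names taken from Table 3; stgnode3 follows the same pattern with zpool7 through zpool9 and zfslun7 through zfslun9.

root@stgnode2:~# zpool create zpool4 mirror c1t2d0 c1t3d0 spare c1t4d0
root@stgnode2:~# zpool create zpool5 mirror c1t5d0 c1t6d0 spare c1t7d0
root@stgnode2:~# zpool create zpool6 mirror c1t8d0 c1t9d0 spare c1t10d0
root@stgnode2:~# zfs create -V 8g zpool4/zfslun4
root@stgnode2:~# zfs create -V 8g zpool5/zfslun5
root@stgnode2:~# zfs create -V 8g zpool6/zfslun6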
iSCSI Targets
As shown in Table 4, three more ZFS pools are to be constructed, this time on the database nodes, and each is mirrored across the network with a hot-spare configuration. ZFS pool datapool1 will be constructed on host dbnode1 from LUNs c0t600144F03B268F00000055F33BB10001d0, c0t600144F06A174000000055F5D8F50001d0, and c0t600144F0BBB5C300000055F5DB370001d0, each coming from a different storage node.
Similarly, ZFS pool datapool2 will be constructed on host dbnode2 from LUNs c0t600144F03B268F00000055F33BCC0002d0, c0t600144F06A174000000055F5D90D0002d0, and c0t600144F0BBB5C300000055F5DB4D0002d0, each coming from a different storage node.
Finally, pool datapool3 will be constructed on host dbnode3 from LUNs c0t600144F03B268F00000055F33BFE0003d0, c0t600144F06A174000000055F5D9350003d0, and c0t600144F0BBB5C300000055F5DB690003d0.
Table 4. Structure and constituents of the three LUN mirrors.
| ZFS Pool (Database Node) | Storage Node | ZFS Volume | LUN | ZFS File System |
|---|---|---|---|---|
| datapool1 | stgnode1 | zfslun1 | c0t600144F03B268F00000055F33BB10001d0 | /datapool1/zfsnode1 |
|           | stgnode2 | zfslun4 | c0t600144F06A174000000055F5D8F50001d0 |                     |
|           | stgnode3 | zfslun7 | c0t600144F0BBB5C300000055F5DB370001d0 |                     |
| datapool2 | stgnode1 | zfslun2 | c0t600144F03B268F00000055F33BCC0002d0 | /datapool2/zfsnode2 |
|           | stgnode2 | zfslun5 | c0t600144F06A174000000055F5D90D0002d0 |                     |
|           | stgnode3 | zfslun8 | c0t600144F0BBB5C300000055F5DB4D0002d0 |                     |
| datapool3 | stgnode1 | zfslun3 | c0t600144F03B268F00000055F33BFE0003d0 | /datapool3/zfsnode3 |
|           | stgnode2 | zfslun6 | c0t600144F06A174000000055F5D9350003d0 |                     |
|           | stgnode3 | zfslun9 | c0t600144F0BBB5C300000055F5DB690003d0 |                     |
In order to be able to create iSCSI targets and LUNs, the storage server group of packages must be installed on each of the storage servers.
root@stgnode1:~# pkg install storage-server
Packages to install: 21
Services to change: 1
Create boot environment: No
Create backup boot environment: Yes
DOWNLOAD PKGS FILES XFER (MB) SPEED
Completed 21/21 3644/3644 111.6/111.6 586k/s
PHASE ITEMS
Installing new actions 4640/4640
Updating package state database Done
Updating package cache 0/0
Updating image state Done
Creating fast lookup database Done
Updating package cache 1/1
Verify that the group package has been installed by reviewing the output of the pkg info command, as follows:
root@stgnode1:~# pkg info storage-server
Name: group/feature/storage-server
Summary: Multi protocol storage server group package
Category: Drivers/Storage (org.opensolaris.category.2008)
Meta Packages/Group Packages (org.opensolaris.category.2008)
State: Installed
Publisher: solaris
Version: 0.5.11
Build Release: 5.11
Branch: 0.175.3.0.0.25.0
Packaging Date: June 21, 2015 10:57:56 PM
Size: 5.46 kB
FMRI: pkg://solaris/group/feature/storage-server@0.5.11,5.11-0.175.3.0.0.25.0:20150621T225756Z
Perform the same action on the second and third storage nodes.
Enable the Oracle Solaris Common Multiprotocol SCSI TARget (COMSTAR) SCSI Target Mode Framework (STMF) service and verify that it is online. Then, create logical units for all the ZFS LUNs from the storage nodes on which they were created. Start from stgnode1.
root@stgnode1:~# svcadm enable stmf
root@stgnode1:~# svcs stmf
STATE          STIME    FMRI
online         22:48:39 svc:/system/stmf:default
root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool1/zfslun1
Logical unit created: 600144F03B268F00000055F33BB10001
root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool2/zfslun2
Logical unit created: 600144F03B268F00000055F33BCC0002
root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool3/zfslun3
Logical unit created: 600144F03B268F00000055F33BFE0003
Confirm that the LUNs have been created successfully.
root@stgnode1:~# stmfadm list-lu
LU Name: 600144F03B268F00000055F33BB10001
LU Name: 600144F03B268F00000055F33BCC0002
LU Name: 600144F03B268F00000055F33BFE0003
Create the LUN view for each of the LUNs and verify the LUN configuration.
root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BB10001
root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BCC0002
root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BFE0003
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BB10001
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BCC0002
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BFE0003
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
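These views expose each LUN to all hosts and all targets. If tighter control is preferred, COMSTAR host groups can be used instead of open views. A hedged sketch, in which the group name dbnodes and the initiator IQN placeholder are illustrative only:

root@stgnode1:~# stmfadm create-hg dbnodes
root@stgnode1:~# stmfadm add-hg-member -g dbnodes <dbnode1-initiator-iqn>
root@stgnode1:~# stmfadm add-view -h dbnodes 600144F03B268F00000055F33BB10001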
Enable the iSCSI target service on the first storage node and verify it is online.
root@stgnode1:~# svcadm enable -r svc:/network/iscsi/target:default
root@stgnode1:~# svcs iscsi/target
STATE          STIME    FMRI
online         22:53:44 svc:/network/iscsi/target:default
Create the iSCSI target:
root@stgnode1:~# itadm create-target
Target iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4 successfully created
Verify that the target has been created.
root@stgnode1:~# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4  online   0
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               default
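Optionally, the target can be bound to a target portal group that listens only on the iSCSI IPMP address of stgnode1 (10.0.1.16 from Table 2) rather than on all interfaces. A hedged sketch, where the portal group name tpg1 is illustrative:

root@stgnode1:~# itadm create-tpg tpg1 10.0.1.16
root@stgnode1:~# itadm modify-target -t tpg1 iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4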
Follow the same steps to create logical units for the remaining ZFS volumes and to enable the iSCSI target on the second and third storage servers.
After the iSCSI targets have been successfully created, the iSCSI initiators must be created on the database nodes.
Enable the iSCSI initiator service.
root@dbnode1:~# svcadm enable network/iscsi/initiator
Configure the targets to be statically discovered. The initiator will discover targets from all three storage servers.
root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4,10.0.1.16
root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae65e6de-dfb1-4a77-9940-dabf68709f5d,10.0.1.17
root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:f4e68b9d-26ca-484a-8d85-d2c8275da0eb,10.0.1.18
Verify the configuration with the iscsiadm list command.
root@dbnode1:~# iscsiadm list static-config
Static Configuration Target: iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4,10.0.1.16:3260
Static Configuration Target: iqn.1986-03.com.sun:02:ae65e6de-dfb1-4a77-9940-dabf68709f5d,10.0.1.17:3260
Static Configuration Target: iqn.1986-03.com.sun:02:f4e68b9d-26ca-484a-8d85-d2c8275da0eb,10.0.1.18:3260
Enable the static target discovery method.
root@dbnode1:~# iscsiadm modify discovery --static enable
Perform the same actions to configure the iSCSI initiator on dbnode2 and dbnode3 and enable the static target discovery method.
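Before building any pools, it is worth confirming from each database node that the targets and their LUNs are visible. A minimal check:

root@dbnode1:~# iscsiadm list target -S
root@dbnode1:~# devfsadm -i iscsi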
LUN Mirroring and Storage
From the first database node (dbnode1) verify the available disks. Nine LUNs should be available.
root@dbnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t600144F0BBB5C300000055F5DB4D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db4d0002
       1. c0t600144F0BBB5C300000055F5DB370001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db370001
       2. c0t600144F0BBB5C300000055F5DB690003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db690003
       3. c0t600144F03B268F00000055F33BB10001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bb10001
       4. c0t600144F03B268F00000055F33BCC0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bcc0002
       5. c0t600144F03B268F00000055F33BFE0003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bfe0003
       6. c0t600144F06A174000000055F5D8F50001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d8f50001
       7. c0t600144F06A174000000055F5D90D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d90d0002
       8. c0t600144F06A174000000055F5D9350003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d9350003
       9. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /pci@0,0/pci8086,2829@d/disk@0,0
Specify disk (enter its number): ^C
Build the first ZFS pool from LUNs c0t600144F03B268F00000055F33BB10001d0, c0t600144F06A174000000055F5D8F50001d0, and c0t600144F0BBB5C300000055F5DB370001d0. These all come from different storage servers to ensure the storage has high availability.
root@dbnode1:~# zpool create datapool1 mirror c0t600144F03B268F00000055F33BB10001d0 \
c0t600144F06A174000000055F5D8F50001d0 spare c0t600144F0BBB5C300000055F5DB370001d0
Create the zfsnode1 ZFS file system on the zpool.
root@dbnode1:~# zfs create datapool1/zfsnode1
root@dbnode1:~# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
datapool1 7.94G 128K 7.94G 0% 1.00x ONLINE -
rpool 19.6G 7.53G 12.1G 38% 1.00x ONLINE -
Verify the ZFS file system creation with a recursive listing.
root@dbnode1:~# zfs list -r datapool1
NAME                 USED  AVAIL  REFER  MOUNTPOINT
datapool1            128K  7.81G    32K  /datapool1
datapool1/zfsnode1    31K  7.81G    31K  /datapool1/zfsnode1
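Before any data lands on the pool, it can also be worth confirming that the network mirror and its spare are healthy:

root@dbnode1:~# zpool status datapool1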
From the second database node (dbnode2), execute the format utility to verify the available disks. Check that three of the LUNs have been formatted.
root@dbnode2:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t600144F0BBB5C300000055F5DB4D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
/scsi_vhci/disk@g600144f0bbb5c300000055f5db4d0002
1. c0t600144F0BBB5C300000055F5DB370001d0 <SUN-COMSTAR-1.0-8.00GB>
/scsi_vhci/disk@g600144f0bbb5c300000055f5db370001
2. c0t600144F0BBB5C300000055F5DB690003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
/scsi_vhci/disk@g600144f0bbb5c300000055f5db690003
3. c0t600144F03B268F00000055F33BB10001d0 <SUN-COMSTAR-1.0-8.00GB>
/scsi_vhci/disk@g600144f03b268f00000055f33bb10001
4. c0t600144F03B268F00000055F33BCC0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
/scsi_vhci/disk@g600144f03b268f00000055f33bcc0002
5. c0t600144F03B268F00000055F33BFE0003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
/scsi_vhci/disk@g600144f03b268f00000055f33bfe0003
6. c0t600144F06A174000000055F5D8F50001d0 <SUN-COMSTAR-1.0-8.00GB>
/scsi_vhci/disk@g600144f06a174000000055f5d8f50001
7. c0t600144F06A174000000055F5D90D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
/scsi_vhci/disk@g600144f06a174000000055f5d90d0002
8. c0t600144F06A174000000055F5D9350003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
/scsi_vhci/disk@g600144f06a174000000055f5d9350003
9. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
/pci@0,0/pci8086,2829@d/disk@0,0
Specify disk (enter its number): ^C
Build the rest of the ZFS pools from the remaining available LUNs, as shown in Table 4.
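A sketch of those commands, following the same mirror-plus-spare layout as datapool1 (the stgnode3 LUN is again used as the spare; LUN and file system names are taken from Table 4):

root@dbnode2:~# zpool create datapool2 mirror c0t600144F03B268F00000055F33BCC0002d0 \
c0t600144F06A174000000055F5D90D0002d0 spare c0t600144F0BBB5C300000055F5DB4D0002d0
root@dbnode2:~# zfs create datapool2/zfsnode2
root@dbnode3:~# zpool create datapool3 mirror c0t600144F03B268F00000055F33BFE0003d0 \
c0t600144F06A174000000055F5D9350003d0 spare c0t600144F0BBB5C300000055F5DB690003d0
root@dbnode3:~# zfs create datapool3/zfsnode3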
Database Installation and Configuration
Before Cassandra can be built on the database nodes, Apache Ant must be installed. Apache Ant is a tool for building Java applications. Because Ant requires Java in order to run, Java Development Kit 8 (JDK 8) must also be installed.
Use the pkg utility to install Ant.
root@dbnode1:~# pkg install ant
Packages to install: 1
Create boot environment: No
Create backup boot environment: No
DOWNLOAD PKGS FILES XFER (MB) SPEED
Completed 1/1 1594/1594 7.6/7.6 216k/s
PHASE ITEMS
Installing new actions 1617/1617
Updating package state database Done
Updating package cache 0/0
Updating image state Done
Creating fast lookup database Done
Updating package cache 1/1
root@dbnode1:~# pkg info ant
Name: developer/build/ant
Summary: Apache Ant
Description: Apache Ant is a Java-based build tool
Category: Development/Distribution Tools
State: Installed
Publisher: solaris
Version: 1.9.3
Build Release: 5.11
Branch: 0.175.3.0.0.25.3
Packaging Date: June 21, 2015 11:51:03 PM
Size: 35.66 MB
FMRI: pkg://solaris/developer/build/ant@1.9.3,5.11-0.175.3.0.0.25.3:20150621T235103Z
Install the Java Development Kit.
root@dbnode1:~# pkg install jdk-8
Packages to install: 2
Create boot environment: No
Create backup boot environment: No
DOWNLOAD PKGS FILES XFER (MB) SPEED
Completed 2/2 625/625 46.3/46.3 274k/s
PHASE ITEMS
Installing new actions 735/735
Updating package state database Done
Updating package cache 0/0
Updating image state Done
Creating fast lookup database Done
Updating package cache 1/1
Verify that JDK 8 is on the database node.
root@dbnode1:~# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
On all the database nodes, download the source code for Cassandra version 2.1.9 (apache-cassandra-2.1.9-src.tar.gz) from http://cassandra.apache.org/, and install the software as follows.
Unzip the Apache source code and place it into the relevant /datapoolx file system. Create the db_files directory where the data and log files are to reside.
root@dbnode1:~# cd Downloads
root@dbnode1:~/Downloads# ls
apache-cassandra-2.1.9-src.tar.gz
root@dbnode1:~/Downloads# tar -zxvf apache-cassandra-2.1.9-src.tar.gz
root@dbnode1:~/Downloads# mv apache-cassandra-2.1.9-src cassandra
root@dbnode1:~/Downloads# ls
apache-cassandra-2.1.9-src.tar.gz  cassandra
root@dbnode1:~/Downloads# mv cassandra /datapool1/zfsnode1
root@dbnode1:~/Downloads# cd /datapool1/zfsnode1
root@dbnode1:/datapool1/zfsnode1# mkdir db_files
Make the cassandra directory the current working directory and build the Cassandra application with Ant.
root@dbnode1:/datapool1/zfsnode1# cd cassandra
root@dbnode1:/datapool1/zfsnode1/cassandra# ant
...
BUILD SUCCESSFUL
Total time: 8 minutes 37 seconds
The application has been built. Open .profile with a text editor and add the following entries. Then source the file.
export CASSANDRA_HOME=/datapool1/zfsnode1/cassandra
export PATH=$CASSANDRA_HOME/bin:$PATH

root@dbnode1:~# source .profile
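A quick check that the environment is in place:

root@dbnode1:~# echo $CASSANDRA_HOME
/datapool1/zfsnode1/cassandra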
One at a time, move to the /datapool1/zfsnode1/cassandra/bin and /datapool1/zfsnode1/cassandra/tools/bin directories, and use a text editor to open the shell scripts shown in Table 5. In the first line of each file, change #!/bin/sh to #!/bin/bash and then save the file. (A scripted way of doing this is sketched after Table 5.)
Table 5. Shell scripts to change.
| Cassandra Directory | Shell Scripts to Change |
|---|---|
| $CASSANDRA_HOME/bin | cassandra.sh, cassandra-cli.sh, cqlsh.sh, debug-cql, nodetool.sh, sstablekeys.sh, sstableloader.sh, sstablescrub.sh, sstableupgrade.sh |
| $CASSANDRA_HOME/tools/bin | cassandra-stress.sh, cassandra-stressd.sh, json2sstable.sh, sstable2json.sh, sstableexpiredblockers.sh, sstablelevelreset.sh, sstablemetadata.sh, sstableofflinerelevel.sh, sstablerepairedset.sh, sstablesplit.sh |
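Editing each script by hand works, but the change can also be applied with GNU sed from /usr/gnu/bin. A hedged sketch for the bin directory (the tools/bin scripts are handled the same way):

root@dbnode1:/datapool1/zfsnode1/cassandra# cd bin
root@dbnode1:/datapool1/zfsnode1/cassandra/bin# for f in cassandra.sh cassandra-cli.sh cqlsh.sh debug-cql nodetool.sh \
sstablekeys.sh sstableloader.sh sstablescrub.sh sstableupgrade.sh; do /usr/gnu/bin/sed -i '1s|#!/bin/sh|#!/bin/bash|' "$f"; done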
In the cassandra/conf directory, the shell script cassandra-env.sh uses grep with the -A option. This causes Oracle Solaris to print an illegal-option warning when Cassandra or any of the other utilities are started, because the default grep under /usr/bin does not support -A. The warning can be avoided by running the GNU grep utility from /usr/gnu/bin instead. To do this, declare its absolute path in cassandra-env.sh.
root@dbnode1:~/# which grep
/usr/bin/grep
Open $CASSANDRA_HOME/conf/cassandra-env.sh and change grep -A to /usr/gnu/bin/grep -A. Then save the file to commit the change.
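The same kind of in-place edit can script this change as well (again a sketch, using GNU sed):

root@dbnode1:~# /usr/gnu/bin/sed -i 's|grep -A|/usr/gnu/bin/grep -A|g' /datapool1/zfsnode1/cassandra/conf/cassandra-env.sh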
Move to /datapool1/zfsnode1/cassandra/conf/, open cassandra.yaml with a text editor, and make the following adjustments.
cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool1/zfsnode1/db_files/data
commitlog_directory: /datapool1/zfsnode1/db_files/commitlog
saved_caches_directory: /datapool1/zfsnode1/db_files/saved_caches
seed_provider:
    - seeds: "192.168.2.22"
listen_address: 192.168.2.22
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch
Perform the same steps to build Cassandra on dbnode2 and dbnode3: place the source code in the relevant ZFS file system and make the same modifications as before. Then configure the cassandra.yaml file for the second and third database nodes as shown below.
The cassandra.yaml configuration for dbnode2:
cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool2/zfsnode2/db_files/data
commitlog_directory: /datapool2/zfsnode2/db_files/commitlog
saved_caches_directory: /datapool2/zfsnode2/db_files/saved_caches
seed_provider:
    - seeds: "192.168.2.22"
listen_address: 192.168.2.23
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch
The cassandra.yaml configuration for dbnode3:
cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool3/zfsnode3/db_files/data
commitlog_directory: /datapool3/zfsnode3/db_files/commitlog
saved_caches_directory: /datapool3/zfsnode3/db_files/saved_caches
seed_provider:
    - seeds: "192.168.2.22"
listen_address: 192.168.2.24
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch
Some Notes About the cassandra.yaml File
In order for the database servers to belong to the same cluster, they must share the same cluster name; the cluster_name setting fulfills this purpose. Seed servers are one or more database servers that already belong to the cluster and are contacted by a new server when it first joins the cluster. The new server asks the seed servers for information about the rest of the servers in the cluster, that is, their names, their IP addresses, the racks and data centers they belong to, and so on.
When a cluster is initialized for the first time, a token ring is created; with the Murmur3 partitioner used here, its values range from -2^63 to 2^63-1. The num_tokens setting controls how many tokens are created per database server, and in that way token ranges are built for the distribution of data. As data is inserted, the partition key (the primary key, or the partition-key part of it) gets hashed; the hash value falls within a token range, which determines the server to which the data is sent. Every server can have a different num_tokens setting based on its hardware: more capable servers can be assigned a larger number of tokens than older or less powerful ones. The data_file_directories, commitlog_directory, and saved_caches_directory parameters set the paths where data and logs will reside.
Cassandra Operation and Data Distribution
Initiate the Cassandra databases on the database nodes.
root@dbnode1:~/# ./cassandra -f
root@dbnode2:~/# ./cassandra -f
root@dbnode3:~/# ./cassandra -f
The database cluster has been initiated.
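The -f flag keeps each instance attached to the terminal in the foreground, which is handy for watching the startup messages. Run without -f, the launcher script detaches and runs Cassandra in the background instead; a hedged alternative once everything checks out:

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./cassandra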
From any database node, execute the nodetool utility to verify the database cluster. The same members will be reported regardless of which database node the utility is run on.
root@dbnode1:~/# ./nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.2.24  72.62 KB   5       69.1%             6fdc0ead-a6c7-4e70-9a48-c9d0ef99fd84  RAC1
UN  192.168.2.22  184.55 KB  5       42.9%             26cc69f8-767e-4b1a-8da4-18d556a718a9  RAC1
UN  192.168.2.23  56.11 KB   5       88.0%             af955565-4535-4dfb-b5f5-e15190a1ee28  RAC1
root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool describecluster
Cluster Information:
        Name: MyCluster
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                6403a0ff-f93b-3b1f-8c35-0a8dc85a5b66: [192.168.2.24, 192.168.2.22, 192.168.2.23]
Start the cqlsh utility to create a keyspace and start adding and querying data. A keyspace is analogous to a schema in the relational database world. The replication factor (RF) is set to 2, so data will reside on two servers. There is no master/slave or primary/secondary notion; both replicas are masters.
root@dbnode1:~/# ./cqlsh
Connected to MyCluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.14-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>
cqlsh> create keyspace myfirstkeyspace with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 2};
cqlsh> use myfirstkeyspace;
cqlsh:myfirstkeyspace> create table greek_locations ( loc_id int PRIMARY KEY, loc_name text, description text);
cqlsh:myfirstkeyspace> describe tables;

greek_locations

cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (1,'Thessaloniki','North Greece');
cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (2,'Larissa','Central Greece');
cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (3,'Athens','Central Greece - Capital');
cqlsh:myfirstkeyspace> select * from greek_locations;

 loc_id | description              | loc_name
--------+--------------------------+--------------
      1 |             North Greece | Thessaloniki
      2 |           Central Greece |      Larissa
      3 | Central Greece - Capital |       Athens

(3 rows)
Connecting from any other database server should report the same results.
root@dbnode2:/datapool2/zfsnode2/cassandra/bin# ./cqlsh
Connected to MyCluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.9-SNAPSHOT | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> use myfirstkeyspace;
cqlsh:myfirstkeyspace> select * from greek_locations;

 loc_id | description              | loc_name
--------+--------------------------+--------------
      1 |             North Greece | Thessaloniki
      2 |           Central Greece |      Larissa
      3 | Central Greece - Capital |       Athens

(3 rows)
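With the replication factor set to 2, each row is stored on two of the three nodes. Which two hold a given partition can be checked with nodetool; a hedged example for the row with loc_id 1 inserted above:

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool getendpoints myfirstkeyspace greek_locations 1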
The ring parameter of the nodetool utility will report the token range limits for each of the servers. The num_tokens parameter was set to 5 in the cassandra.yaml file, so there are 15 token ranges in total for the three servers.
root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool ring
Datacenter: DC1
==========
Address       Rack  Status  State   Load      Owns  Token
                                                    5554128420332708557
192.168.2.22  RAC1  Up      Normal  122.3 KB  ?     -9135243804612957495
192.168.2.23  RAC1  Up      Normal  76.37 KB  ?     -8061157299090260986
192.168.2.22  RAC1  Up      Normal  122.3 KB  ?     -7087501046371881693
192.168.2.24  RAC1  Up      Normal  78.8 KB   ?     -6454951218299078731
192.168.2.22  RAC1  Up      Normal  122.3 KB  ?     -5793299020697319351
192.168.2.22  RAC1  Up      Normal  122.3 KB  ?     -5588273793487800091
192.168.2.23  RAC1  Up      Normal  76.37 KB  ?     -3763306950618271982
192.168.2.23  RAC1  Up      Normal  76.37 KB  ?     -3568767174854581436
192.168.2.23  RAC1  Up      Normal  76.37 KB  ?     -1113375360465059283
192.168.2.24  RAC1  Up      Normal  78.8 KB   ?     -682327379305650352
192.168.2.24  RAC1  Up      Normal  78.8 KB   ?     112278302282739678
192.168.2.23  RAC1  Up      Normal  76.37 KB  ?     4952728554160670447
192.168.2.24  RAC1  Up      Normal  78.8 KB   ?     5093621811617287602
192.168.2.22  RAC1  Up      Normal  122.3 KB  ?     5342254592921898323
192.168.2.24  RAC1  Up      Normal  78.8 KB   ?     5554128420332708557

Warning: "nodetool ring" is used to output all the tokens of a node.
To view status related info of a node use "nodetool status" instead.
The describering parameter of the nodetool utility reports the token ranges and the endpoints in detail.
root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool describering myfirstkeyspace
Schema Version:155131ce-b922-37aa-a635-68e6fa96597c
TokenRange:
        TokenRange(start_token:5342254592921898323, end_token:5554128420332708557, endpoints:[192.168.2.24, 192.168.2.22], rpc_endpoints:[127.0.0.1, 127.0.0.1], endpoint_details:[EndpointDetails(host:192.168.2.24, datacenter:DC1, rack:RAC1), EndpointDetails(host:192.168.2.22, datacenter:DC1, rack:RAC1)])
        TokenRange(start_token:112278302282739678, end_token:4952728554160670447, endpoints:[192.168.2.23, 192.168.2.24], rpc_endpoints:[127.0.0.1, 127.0.0.1], endpoint_details:[EndpointDetails(host:192.168.2.23, datacenter:DC1, rack:RAC1), EndpointDetails(host:192.168.2.24, datacenter:DC1, rack:RAC1)])
        TokenRange(start_token:5554128420332708557, end_token:-9135243804612957495, endpoints:[192.168.2.22, 192.168.2.23], rpc_