How to Build a Cassandra Multinode Database Cluster on Oracle Solaris 11.3 with LUN Mirroring and IP Network Multipathing

Melvis-Oracle | Nov 4, 2015 (edited May 24, 2018)

by Antonis Tsavdaris

This article describes how to build a Cassandra single-rack database cluster on Oracle Solaris 11.3 and extend its overall availability with LUN mirroring and IP network multipathing.

Cassandra is a popular distributed database management system from the Apache Software Foundation. It is highly scalable and masterless: there is no primary node to which other nodes are subservient. Every node in the cluster is equal, and any node can service any request.

Oracle Solaris 11 is an enterprise-class operating system known for its reliability, availability, and serviceability (RAS) features. Its wealth of integrated features helps administrators build redundancy into every part of the system they deem critical, including the network, storage, and so on.

LUN mirroring will provide extended availability at the storage level, and IPMP will add redundancy to the network.

In this scenario, the one-rack cluster is composed of six Oracle Solaris server instances. Three of them—dbnode1, dbnode2, and dbnode3—will be the database nodes and the other three—stgnode1, stgnode2, and stgnode3—will provide highly available storage. The highly available storage will be constructed from nine LUNs, three in each storage node.

At the end of the construction, the one-rack cluster will have a fully operational database even if two of the storage nodes are not available. Furthermore, the networks—the public network and the iSCSI network—will be immune to hardware failures through IPMP groups consisting of an active and a standby network card.

Cluster Topology

All servers have the Oracle Solaris 11.3 operating system installed. Table 1 depicts the cluster architecture.

In reality, the Cassandra binaries as well as the data will reside on the storage nodes; the database nodes will serve the running instances.

Table 1. Oracle Solaris servers and their role in the cluster.

| Node Name | Role in the Cluster | Contains |
| --- | --- | --- |
| dbnode1 | Database node | Running instance |
| dbnode2 | Database node | Running instance |
| dbnode3 | Database node | Running instance |
| stgnode1 | Storage node | Binaries and data |
| stgnode2 | Storage node | Binaries and data |
| stgnode3 | Storage node | Binaries and data |

Network Interface Cards

As shown in Table 2, every server in the cluster has four network interface cards (NICs) installed, named net0 through net3. Redundancy is required at the network level, and this will be provided by IPMP groups. IP multipathing requires that the DefaultFixed network profile be activated and that static IP addresses be assigned to every network interface.

Table 2. NICs and IPMP group configuration.

| Node Name | NIC | Primary/Standby NIC | IP/Subnet | IPMP Group Name | IPMP IP Address | Role |
| --- | --- | --- | --- | --- | --- | --- |
| dbnode1 | net0 | primary | 192.168.2.10/24 | IPMP0 | 192.168.2.22/24 | Public network |
| | net1 | standby | 192.168.2.11/24 | | | |
| | net2 | primary | 10.0.1.1/27 | IPMP1 | 10.0.1.13/27 | iSCSI initiator |
| | net3 | standby | 10.0.1.2/27 | | | |
| dbnode2 | net0 | primary | 192.168.2.12/24 | IPMP2 | 192.168.2.23/24 | Public network |
| | net1 | standby | 192.168.2.13/24 | | | |
| | net2 | primary | 10.0.1.3/27 | IPMP3 | 10.0.1.14/27 | iSCSI initiator |
| | net3 | standby | 10.0.1.4/27 | | | |
| dbnode3 | net0 | primary | 192.168.2.14/24 | IPMP4 | 192.168.2.24/24 | Public network |
| | net1 | standby | 192.168.2.15/24 | | | |
| | net2 | primary | 10.0.1.5/27 | IPMP5 | 10.0.1.15/27 | iSCSI initiator |
| | net3 | standby | 10.0.1.6/27 | | | |
| stgnode1 | net0 | primary | 192.168.2.16/24 | IPMP6 | 192.168.2.25/24 | Public network |
| | net1 | standby | 192.168.2.17/24 | | | |
| | net2 | primary | 10.0.1.7/27 | IPMP7 | 10.0.1.16/27 | iSCSI target |
| | net3 | standby | 10.0.1.8/27 | | | |
| stgnode2 | net0 | primary | 192.168.2.18/24 | IPMP8 | 192.168.2.26/24 | Public network |
| | net1 | standby | 192.168.2.19/24 | | | |
| | net2 | primary | 10.0.1.9/27 | IPMP9 | 10.0.1.17/27 | iSCSI target |
| | net3 | standby | 10.0.1.10/27 | | | |
| stgnode3 | net0 | primary | 192.168.2.20/24 | IPMP10 | 192.168.2.27/24 | Public network |
| | net1 | standby | 192.168.2.21/24 | | | |
| | net2 | primary | 10.0.1.11/27 | IPMP11 | 10.0.1.18/27 | iSCSI target |
| | net3 | standby | 10.0.1.12/27 | | | |

First, ensure that the network service is up and running. Then check whether the network profile is set to DefaultFixed.

root@dbnode1:~# svcs network/physical

STATE          STIME    FMRI

online         1:25:45  svc:/network/physical:upgrade

online         1:25:51  svc:/network/physical:default

root@dbnode1:~# netadm list

TYPE        PROFILE        STATE

ncp         Automatic      disabled

ncp         DefaultFixed   online

loc         DefaultFixed   online

loc         Automatic      offline

loc         NoNet          offline

Because the network profile is set to DefaultFixed, review the network interfaces and the data link layer.

root@dbnode1:~# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             unknown    1000   full      e1000g0
net1              Ethernet             unknown    1000   full      e1000g1
net3              Ethernet             unknown    1000   full      e1000g3
net2              Ethernet             unknown    1000   full      e1000g2

Create the IP interface for net0 and then configure a static IPv4 address.

root@dbnode1:~# ipadm create-ip net0

root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.10/24 net0/v4

root@dbnode1:~# ipadm show-addr

ADDROBJ        TYPE     STATE      ADDR

lo0/v4         static   ok         127.0.0.1/8

net0/v4        static   ok         192.168.2.10/24

lo0/v6         static   ok         ::1/128

Following this, create the IP interfaces and assign the relevant IP addresses and subnets for each of the NICs, net0–net3, for each of the servers according to Table 2.
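Rather than typing every command by hand, the per-node ipadm commands implied by Table 2 can be generated with a short script. This is a convenience sketch, not part of the original procedure; the IP addresses are computed from Table 2's sequential addressing plan, so adjust it if your plan differs.

```shell
#!/bin/bash
# Print, for each node, the ipadm commands that create the four IP
# interfaces and assign the static addresses from Table 2.
emit_ipadm_cmds() {
  local nodes=(dbnode1 dbnode2 dbnode3 stgnode1 stgnode2 stgnode3)
  local i node pub0 pub1 scsi0 scsi1
  for i in "${!nodes[@]}"; do
    node=${nodes[$i]}
    pub0=$((10 + 2*i)); pub1=$((pub0 + 1))    # public NICs net0/net1
    scsi0=$((1 + 2*i)); scsi1=$((scsi0 + 1))  # iSCSI NICs net2/net3
    echo "# --- run on ${node} ---"
    echo "ipadm create-ip net0; ipadm create-addr -T static -a 192.168.2.${pub0}/24 net0/v4"
    echo "ipadm create-ip net1; ipadm create-addr -T static -a 192.168.2.${pub1}/24 net1/v4"
    echo "ipadm create-ip net2; ipadm create-addr -T static -a 10.0.1.${scsi0}/27 net2/v4"
    echo "ipadm create-ip net3; ipadm create-addr -T static -a 10.0.1.${scsi1}/27 net3/v4"
  done
}
emit_ipadm_cmds
```

The output can be reviewed against Table 2 and then pasted into each node's root shell.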

Note: There is an exceptional article by Andrew Walton on how to configure an Oracle Solaris network along with making it internet-facing: "How to Get Started Configuring Your Network in Oracle Solaris 11."

IPMP Groups

After the NICs have been configured and the IP addresses have been assigned, the IPMP groups can be configured. IPMP groups aggregate separate physical network interfaces and thus provide physical interface failure detection, network access failover, and network load spreading. Here, each IPMP group will be made of two NICs in an active/standby configuration. So, when an interface that is a member of an IPMP group is brought down for maintenance, or when a NIC fails due to a hardware fault, a failover process takes place: the remaining NIC and its related IP interface step in to ensure that the node is not cut off from the cluster.

According to the planned scenario, two IPMP groups are going to be created in each server, one for every two NICs configured earlier. Each IPMP group will have its own IP interface, and one of the underlying NICs will be active, while the other will remain a standby. Table 2 summarizes the IPMP group configurations that must be completed on each node.

First, create the IPMP group IPMP0. Then, bind interfaces net0 and net1 to this group and create an IP address for the group.

root@dbnode1:~# ipadm create-ipmp ipmp0
root@dbnode1:~# ipadm add-ipmp -i net0 -i net1 ipmp0
root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.22/24 ipmp0
ipmp0/v4

Now that IPMP0 has been created successfully, declare net1 as the standby interface.

root@dbnode1:~# ipadm set-ifprop -p standby=on -m ip net1
root@dbnode1:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       ipmp0       ok        10.00s    net0 (net1)

The ipmpstat command reports that the IPMP0 group has been built successfully and that it operates over two NICs, net0 and net1. The parentheses denote a standby interface.

Follow the above-mentioned approach to build the IPMP groups for the rest of the servers in the cluster, as shown in Table 2.
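The remaining IPMP work can likewise be scripted. The sketch below (an illustration, not from the original article) prints, for every node, the four ipadm steps that build each of its two IPMP groups per Table 2: create the group, add the NIC pair, assign the group address, and mark the second NIC as standby.

```shell
#!/bin/bash
# Print the IPMP setup commands for all six nodes per Table 2.
emit_ipmp_cmds() {
  local nodes=(dbnode1 dbnode2 dbnode3 stgnode1 stgnode2 stgnode3)
  local i node
  for i in "${!nodes[@]}"; do
    node=${nodes[$i]}
    echo "# --- run on ${node} ---"
    # Public-network group (net0 active, net1 standby).
    echo "ipadm create-ipmp ipmp$((2*i))"
    echo "ipadm add-ipmp -i net0 -i net1 ipmp$((2*i))"
    echo "ipadm create-addr -T static -a 192.168.2.$((22+i))/24 ipmp$((2*i))/v4"
    echo "ipadm set-ifprop -p standby=on -m ip net1"
    # iSCSI group (net2 active, net3 standby).
    echo "ipadm create-ipmp ipmp$((2*i+1))"
    echo "ipadm add-ipmp -i net2 -i net3 ipmp$((2*i+1))"
    echo "ipadm create-addr -T static -a 10.0.1.$((13+i))/27 ipmp$((2*i+1))/v4"
    echo "ipadm set-ifprop -p standby=on -m ip net3"
  done
}
emit_ipmp_cmds
```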

Local Storage

As shown in Table 3, each storage server has nine additional 10 GB disks, from which three ZFS pools will be created in a RAID 1 (mirror) plus hot-spare configuration. ZFS file systems and LUNs can then be constructed on top of the pools.

Table 3. Additional disk storage configuration.

| Node Name | ZFS Pool Name | Disk Name | Size | Role in Mirror | ZFS File System |
| --- | --- | --- | --- | --- | --- |
| stgnode1 | zpool1 | c1t2d0 | 10 GB | member | zfslun1 |
| | | c1t3d0 | 10 GB | member | |
| | | c1t4d0 | 10 GB | spare | |
| | zpool2 | c1t5d0 | 10 GB | member | zfslun2 |
| | | c1t6d0 | 10 GB | member | |
| | | c1t7d0 | 10 GB | spare | |
| | zpool3 | c1t8d0 | 10 GB | member | zfslun3 |
| | | c1t9d0 | 10 GB | member | |
| | | c1t10d0 | 10 GB | spare | |
| stgnode2 | zpool4 | c1t2d0 | 10 GB | member | zfslun4 |
| | | c1t3d0 | 10 GB | member | |
| | | c1t4d0 | 10 GB | spare | |
| | zpool5 | c1t5d0 | 10 GB | member | zfslun5 |
| | | c1t6d0 | 10 GB | member | |
| | | c1t7d0 | 10 GB | spare | |
| | zpool6 | c1t8d0 | 10 GB | member | zfslun6 |
| | | c1t9d0 | 10 GB | member | |
| | | c1t10d0 | 10 GB | spare | |
| stgnode3 | zpool7 | c1t2d0 | 10 GB | member | zfslun7 |
| | | c1t3d0 | 10 GB | member | |
| | | c1t4d0 | 10 GB | spare | |
| | zpool8 | c1t5d0 | 10 GB | member | zfslun8 |
| | | c1t6d0 | 10 GB | member | |
| | | c1t7d0 | 10 GB | spare | |
| | zpool9 | c1t8d0 | 10 GB | member | zfslun9 |
| | | c1t9d0 | 10 GB | member | |
| | | c1t10d0 | 10 GB | spare | |

Starting with stgnode1, run the format command, which reports the additional, unconfigured disks.

root@stgnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c1t2d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c1t3d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c1t4d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c1t5d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@5,0
       5. c1t6d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@6,0
       6. c1t7d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@7,0
       7. c1t8d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@8,0
       8. c1t9d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@9,0
       9. c1t10d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2829@d/disk@a,0
Specify disk (enter its number): ^C
root@stgnode1:~#

Create the zpools zpool1, zpool2, and zpool3 in a RAID 1 with hot-spare configuration.

root@stgnode1:~# zpool create zpool1 mirror c1t2d0 c1t3d0 spare c1t4d0

root@stgnode1:~# zpool status zpool1

  pool: zpool1

state: ONLINE

  scan: none requested

config:

    NAME        STATE     READ WRITE CKSUM

    zpool1      ONLINE       0     0     0

      mirror-0  ONLINE       0     0     0

        c1t2d0  ONLINE       0     0     0

        c1t3d0  ONLINE       0     0     0

    spares

      c1t4d0    AVAIL  

errors: No known data errors

root@stgnode1:~# zpool create zpool2 mirror c1t5d0 c1t6d0 spare c1t7d0

root@stgnode1:~# zpool status zpool2

  pool: zpool2

state: ONLINE

  scan: none requested

config:

    NAME        STATE     READ WRITE CKSUM

    zpool2      ONLINE       0     0     0

      mirror-0  ONLINE       0     0     0

        c1t5d0  ONLINE       0     0     0

        c1t6d0  ONLINE       0     0     0

    spares

      c1t7d0    AVAIL  

errors: No known data errors

root@stgnode1:~# zpool create zpool3 mirror c1t8d0 c1t9d0 spare c1t10d0

root@stgnode1:~# zpool status zpool3

  pool: zpool3

state: ONLINE

  scan: none requested

config:

    NAME        STATE     READ WRITE CKSUM

    zpool3      ONLINE       0     0     0

      mirror-0  ONLINE       0     0     0

        c1t8d0  ONLINE       0     0     0

        c1t9d0  ONLINE       0     0     0

    spares

      c1t10d0   AVAIL  

errors: No known data errors

Running the format command again shows that the disks have been formatted.

root@stgnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c1t2d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c1t3d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c1t4d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c1t5d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@5,0
       5. c1t6d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@6,0
       6. c1t7d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@7,0
       7. c1t8d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@8,0
       8. c1t9d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@9,0
       9. c1t10d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /pci@0,0/pci8086,2829@d/disk@a,0
Specify disk (enter its number): ^C

Use the zpool list command to get a report on the newly created ZFS pools.

root@stgnode1:~# zpool list

NAME        SIZE     ALLOC    FREE     CAP    DEDUP    HEALTH  ALTROOT

rpool      19.6G     8.01G    11.6G    40%     1.00x   ONLINE  -

zpool1     9.94G       88K    9.94G     0%     1.00x   ONLINE  -

zpool2     9.94G       88K    9.94G     0%     1.00x   ONLINE  -

zpool3     9.94G       88K    9.94G     0%     1.00x   ONLINE  -

Create 8 GB ZFS volumes on the ZFS pools; these volumes will later be exposed as the iSCSI LUNs.

root@stgnode1:~# zfs create -V 8g zpool1/zfslun1
root@stgnode1:~# zfs create -V 8g zpool2/zfslun2
root@stgnode1:~# zfs create -V 8g zpool3/zfslun3

Use the zfs list command to get a report on the newly created ZFS file systems.

root@stgnode1:~# zfs list -r /zpool*

NAME             USED  AVAIL  REFER  MOUNTPOINT

zpool1          8.25G  1.53G    31K  /zpool1

zpool1/zfslun1  8.25G  9.78G    16K  -

zpool2          8.25G  1.53G    31K  /zpool2

zpool2/zfslun2  8.25G  9.78G    16K  -

zpool3          8.25G  1.53G    31K  /zpool3

zpool3/zfslun3  8.25G  9.78G    16K  -

Perform the same work on the second and third storage nodes.
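The work on stgnode2 and stgnode3 mirrors what was just done on stgnode1, with the pool and LUN numbering continuing per Table 3 (zpool4–zpool9, zfslun4–zfslun9). As a convenience, the commands can be generated with a small script; this is a sketch of the numbering scheme, not part of the original procedure.

```shell
#!/bin/bash
# Print the zpool/zfs commands for stgnode2 and stgnode3 per Table 3.
emit_storage_cmds() {
  local node n pool lun base
  local disks=("c1t2d0 c1t3d0 c1t4d0" "c1t5d0 c1t6d0 c1t7d0" "c1t8d0 c1t9d0 c1t10d0")
  for node in stgnode2 stgnode3; do
    if [ "$node" = stgnode2 ]; then base=4; else base=7; fi
    echo "# --- run on ${node} ---"
    for n in 0 1 2; do
      pool="zpool$((base + n))"; lun="zfslun$((base + n))"
      set -- ${disks[$n]}               # split the disk triple into $1 $2 $3
      echo "zpool create ${pool} mirror $1 $2 spare $3"
      echo "zfs create -V 8g ${pool}/${lun}"
    done
  done
}
emit_storage_cmds
```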

iSCSI Targets

As shown in Table 4, three more ZFS pools are to be constructed. They are to be mirrored across the network with a hot-spare configuration. ZFS pool datapool1 will be built on host dbnode1 from LUNs c0t600144F03B268F00000055F33BB10001d0, c0t600144F06A174000000055F5D8F50001d0, and c0t600144F0BBB5C300000055F5DB370001d0, each coming from a different storage node.

Similarly, ZFS pool datapool2 will be built on host dbnode2 from LUNs c0t600144F03B268F00000055F33BCC0002d0, c0t600144F06A174000000055F5D90D0002d0, and c0t600144F0BBB5C300000055F5DB4D0002d0, each coming from a different storage node.

Finally, pool datapool3 will be built on host dbnode3 from LUNs c0t600144F03B268F00000055F33BFE0003d0, c0t600144F06A174000000055F5D9350003d0, and c0t600144F0BBB5C300000055F5DB690003d0.

Table 4. Structure and constituents of the three LUN mirrors.

| Cross-Platform ZFS Pool | Node Name | ZFS File System | LUN | ZFS File System on Database Node |
| --- | --- | --- | --- | --- |
| datapool1 | stgnode1 | zfslun1 | c0t600144F03B268F00000055F33BB10001d0 | /datapool1/zfsnode1 |
| | stgnode2 | zfslun4 | c0t600144F06A174000000055F5D8F50001d0 | |
| | stgnode3 | zfslun7 | c0t600144F0BBB5C300000055F5DB370001d0 | |
| datapool2 | stgnode1 | zfslun2 | c0t600144F03B268F00000055F33BCC0002d0 | /datapool2/zfsnode2 |
| | stgnode2 | zfslun5 | c0t600144F06A174000000055F5D90D0002d0 | |
| | stgnode3 | zfslun8 | c0t600144F0BBB5C300000055F5DB4D0002d0 | |
| datapool3 | stgnode1 | zfslun3 | c0t600144F03B268F00000055F33BFE0003d0 | /datapool3/zfsnode3 |
| | stgnode2 | zfslun6 | c0t600144F06A174000000055F5D9350003d0 | |
| | stgnode3 | zfslun9 | c0t600144F0BBB5C300000055F5DB690003d0 | |

In order to be able to create iSCSI targets and LUNs, the storage server group of packages must be installed on each of the storage servers.

root@stgnode1:~# pkg install storage-server

           Packages to install:  21

            Services to change:   1

       Create boot environment:  No

Create backup boot environment: Yes

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED

Completed                              21/21     3644/3644  111.6/111.6  586k/s

PHASE                                          ITEMS

Installing new actions                     4640/4640

Updating package state database                 Done

Updating package cache                           0/0

Updating image state                            Done

Creating fast lookup database                   Done

Updating package cache                           1/1

Verify that the group of packages has been installed by reviewing the output of the pkg info command, as follows:

root@stgnode1:~# pkg info storage-server

                      Name: group/feature/storage-server

       Summary: Multi protocol storage server group package

      Category: Drivers/Storage (org.opensolaris.category.2008)

                Meta Packages/Group Packages (org.opensolaris.category.2008)

         State: Installed

     Publisher: solaris

       Version: 0.5.11

Build Release: 5.11

        Branch: 0.175.3.0.0.25.0

Packaging Date: June 21, 2015 10:57:56 PM

          Size: 5.46 kB

          FMRI: pkg://solaris/group/feature/storage-server@0.5.11,5.11-0.175.3.0.0.25.0:20150621T225756Z

Perform the same action on the second and third storage nodes.

Enable the Oracle Solaris Common Multiprotocol SCSI TARget (COMSTAR) SCSI Target Mode Framework (STMF) service and verify that it is online. Then, create logical units for all the ZFS LUNs from the storage nodes on which they were created. Start from stgnode1.

root@stgnode1:~# svcadm enable stmf
root@stgnode1:~# svcs stmf
STATE          STIME    FMRI
online         22:48:39 svc:/system/stmf:default

root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool1/zfslun1
Logical unit created: 600144F03B268F00000055F33BB10001

root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool2/zfslun2
Logical unit created: 600144F03B268F00000055F33BCC0002

root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool3/zfslun3
Logical unit created: 600144F03B268F00000055F33BFE0003

Confirm that the LUNs have been created successfully.

root@stgnode1:~#  stmfadm list-lu

LU Name: 600144F03B268F00000055F33BB10001

LU Name: 600144F03B268F00000055F33BCC0002

LU Name: 600144F03B268F00000055F33BFE0003

Create the LUN view for each of the LUNs and verify the LUN configuration.

root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BB10001
root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BCC0002
root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BFE0003
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BB10001
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BCC0002
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BFE0003
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto

Enable the iSCSI target service on the first storage node and verify it is online.

root@stgnode1:~# svcadm enable -r svc:/network/iscsi/target:default
root@stgnode1:~# svcs iscsi/target
STATE          STIME    FMRI
online         22:53:44 svc:/network/iscsi/target:default

Create the iSCSI target and list it:

root@stgnode1:~# itadm create-target
Target iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4 successfully created

Verify that the target has been created.

root@stgnode1:~# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4  online   0
        alias:              -
        auth:               none (defaults)
        targetchapuser:     -
        targetchapsecret:   unset
        tpg-tags:           default

Follow the same steps to create logical units for the rest of the ZFS LUNs and enable the iSCSI target on the second and third storage server.
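Because stmfadm create-lu prints the GUID that add-view then needs, the per-node steps lend themselves to a small pipeline. The sketch below (an illustration, not from the original article) prints the command sequences to run on stgnode2 and stgnode3, capturing each new GUID with awk so it can be passed straight to add-view.

```shell
#!/bin/bash
# Print the COMSTAR LU/target setup for the remaining storage nodes.
emit_lu_cmds() {
  local node n z base
  for node in stgnode2 stgnode3; do
    if [ "$node" = stgnode2 ]; then base=4; else base=7; fi
    echo "# --- run on ${node} ---"
    echo "svcadm enable stmf"
    for n in 0 1 2; do
      z="zpool$((base + n))/zfslun$((base + n))"
      # create-lu prints "Logical unit created: <GUID>"; keep the last field.
      echo "guid=\$(stmfadm create-lu /dev/zvol/rdsk/${z} | awk '{print \$NF}')"
      echo "stmfadm add-view \"\$guid\""
    done
    echo "svcadm enable -r svc:/network/iscsi/target:default"
    echo "itadm create-target"
  done
}
emit_lu_cmds
```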

After the iSCSI targets have been successfully created, the iSCSI initiators must be created on the database nodes.

Enable the iSCSI initiator service.

root@dbnode1:~# svcadm enable network/iscsi/initiator

Configure the targets to be statically discovered. The initiator will discover targets from all three storage servers.

root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4,10.0.1.16
root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae65e6de-dfb1-4a77-9940-dabf68709f5d,10.0.1.17
root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:f4e68b9d-26ca-484a-8d85-d2c8275da0eb,10.0.1.18

Verify the configuration with the iscsiadm list command.

root@dbnode1:~# iscsiadm list static-config
Static Configuration Target: iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4,10.0.1.16:3260
Static Configuration Target: iqn.1986-03.com.sun:02:ae65e6de-dfb1-4a77-9940-dabf68709f5d,10.0.1.17:3260
Static Configuration Target: iqn.1986-03.com.sun:02:f4e68b9d-26ca-484a-8d85-d2c8275da0eb,10.0.1.18:3260

Enable the static target discovery method.

root@dbnode1:~# iscsiadm modify discovery --static enable

Perform the same actions to configure the iSCSI initiator on dbnode2 and dbnode3 and enable the static target discovery method.

LUN Mirroring and Storage

From the first database node (dbnode1) verify the available disks. Nine LUNs should be available.

root@dbnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t600144F0BBB5C300000055F5DB4D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db4d0002
       1. c0t600144F0BBB5C300000055F5DB370001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db370001
       2. c0t600144F0BBB5C300000055F5DB690003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db690003
       3. c0t600144F03B268F00000055F33BB10001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bb10001
       4. c0t600144F03B268F00000055F33BCC0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bcc0002
       5. c0t600144F03B268F00000055F33BFE0003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bfe0003
       6. c0t600144F06A174000000055F5D8F50001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d8f50001
       7. c0t600144F06A174000000055F5D90D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d90d0002
       8. c0t600144F06A174000000055F5D9350003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d9350003
       9. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /pci@0,0/pci8086,2829@d/disk@0,0
Specify disk (enter its number): ^C

Build the first ZFS pool from LUNs c0t600144F03B268F00000055F33BB10001d0, c0t600144F06A174000000055F5D8F50001d0, and c0t600144F0BBB5C300000055F5DB370001d0. These all come from different storage servers to ensure the storage has high availability.

root@dbnode1:~# zpool create datapool1 mirror c0t600144F03B268F00000055F33BB10001d0 \
c0t600144F06A174000000055F5D8F50001d0 spare c0t600144F0BBB5C300000055F5DB370001d0

Create the zfsnode1 ZFS file system on the zpool.

root@dbnode1:~# zfs create datapool1/zfsnode1

root@dbnode1:~# zpool list

NAME        SIZE    ALLOC   FREE    CAP   DEDUP   HEALTH  ALTROOT

datapool1   7.94G    128K   7.94G    0%   1.00x   ONLINE  -

rpool       19.6G   7.53G   12.1G   38%   1.00x   ONLINE  -

Verify the ZFS creation recursively.

root@dbnode1:~# zfs list -r datapool1
NAME                 USED  AVAIL  REFER  MOUNTPOINT
datapool1            128K  7.81G    32K  /datapool1
datapool1/zfsnode1    31K  7.81G    31K  /datapool1/zfsnode1

From the second database node (dbnode2), execute the format utility to verify the available disks. Check that three of the LUNs have been formatted.

root@dbnode2:~# format

Searching for disks...done

AVAILABLE DISK SELECTIONS:

     0. c0t600144F0BBB5C300000055F5DB4D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>

        /scsi_vhci/disk@g600144f0bbb5c300000055f5db4d0002

     1. c0t600144F0BBB5C300000055F5DB370001d0 <SUN-COMSTAR-1.0-8.00GB>

        /scsi_vhci/disk@g600144f0bbb5c300000055f5db370001

     2. c0t600144F0BBB5C300000055F5DB690003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>

        /scsi_vhci/disk@g600144f0bbb5c300000055f5db690003

     3. c0t600144F03B268F00000055F33BB10001d0 <SUN-COMSTAR-1.0-8.00GB>

        /scsi_vhci/disk@g600144f03b268f00000055f33bb10001

     4. c0t600144F03B268F00000055F33BCC0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>

        /scsi_vhci/disk@g600144f03b268f00000055f33bcc0002

     5. c0t600144F03B268F00000055F33BFE0003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>

        /scsi_vhci/disk@g600144f03b268f00000055f33bfe0003

     6. c0t600144F06A174000000055F5D8F50001d0 <SUN-COMSTAR-1.0-8.00GB>

        /scsi_vhci/disk@g600144f06a174000000055f5d8f50001

     7. c0t600144F06A174000000055F5D90D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>

        /scsi_vhci/disk@g600144f06a174000000055f5d90d0002

     8. c0t600144F06A174000000055F5D9350003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>

        /scsi_vhci/disk@g600144f06a174000000055f5d9350003

     9. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>

        /pci@0,0/pci8086,2829@d/disk@0,0

Specify disk (enter its number): ^C

Build the rest of the ZFS pools from the remaining available LUNs, as shown in Table 4.
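For reference, the commands implied by Table 4 for the other two database nodes can be printed out in one place. This is a convenience sketch, not part of the original procedure; the LUN device names are the ones listed in Table 4.

```shell
#!/bin/bash
# Print the datapool2/datapool3 creation commands from Table 4.
emit_datapool_cmds() {
  cat <<'EOF'
# --- run on dbnode2 ---
zpool create datapool2 mirror c0t600144F03B268F00000055F33BCC0002d0 \
  c0t600144F06A174000000055F5D90D0002d0 spare c0t600144F0BBB5C300000055F5DB4D0002d0
zfs create datapool2/zfsnode2
# --- run on dbnode3 ---
zpool create datapool3 mirror c0t600144F03B268F00000055F33BFE0003d0 \
  c0t600144F06A174000000055F5D9350003d0 spare c0t600144F0BBB5C300000055F5DB690003d0
zfs create datapool3/zfsnode3
EOF
}
emit_datapool_cmds
```

As on dbnode1, each pool mirrors two LUNs from different storage nodes and keeps the third as a hot spare.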

Database Installation and Configuration

Before Cassandra can be built on the database nodes, Apache Ant must be installed. Apache Ant is a tool for building Java applications. Because Ant requires Java to run, Java Development Kit 8 (JDK 8) must also be installed.

Use the pkg utility to install Ant.

root@dbnode1:~# pkg install ant

                        Packages to install:  1

       Create boot environment: No

Create backup boot environment: No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED

Completed                                1/1     1594/1594      7.6/7.6  216k/s

PHASE                                          ITEMS

Installing new actions                     1617/1617

Updating package state database                 Done

Updating package cache                           0/0

Updating image state                            Done

Creating fast lookup database                   Done

Updating package cache                           1/1

root@dbnode1:~# pkg info ant

          Name: developer/build/ant

       Summary: Apache Ant

   Description: Apache Ant is a Java-based build tool

      Category: Development/Distribution Tools

         State: Installed

     Publisher: solaris

       Version: 1.9.3

Build Release: 5.11

        Branch: 0.175.3.0.0.25.3

Packaging Date: June 21, 2015 11:51:03 PM

          Size: 35.66 MB

          FMRI: pkg://solaris/developer/build/ant@1.9.3,5.11-0.175.3.0.0.25.3:20150621T235103Z

Install the Java Development Kit.

root@dbnode1:~# pkg install jdk-8

           Packages to install:  2

       Create boot environment: No

Create backup boot environment: No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED

Completed                                2/2       625/625    46.3/46.3  274k/s

PHASE                                          ITEMS

Installing new actions                       735/735

Updating package state database                 Done

Updating package cache                           0/0

Updating image state                            Done

Creating fast lookup database                   Done

Updating package cache                           1/1

Verify that JDK 8 is on the database node.

root@dbnode1:~# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

On all the database nodes, download the source code for Cassandra version 2.1.9 (apache-cassandra-2.1.9-src.tar.gz) from http://cassandra.apache.org/, and install the software as follows.

Extract the Cassandra source code and place it into the relevant /datapoolx file system. Then create the db_files directory, where the data and log files will reside.

root@dbnode1:~# cd Downloads
root@dbnode1:~/Downloads# ls
apache-cassandra-2.1.9-src.tar.gz
root@dbnode1:~/Downloads# tar -zxvf apache-cassandra-2.1.9-src.tar.gz
root@dbnode1:~/Downloads# mv apache-cassandra-2.1.9-src cassandra
root@dbnode1:~/Downloads# ls
apache-cassandra-2.1.9-src.tar.gz  cassandra
root@dbnode1:~/Downloads# mv cassandra /datapool1/zfsnode1
root@dbnode1:~/Downloads# cd /datapool1/zfsnode1
root@dbnode1:/datapool1/zfsnode1# mkdir db_files

Make the cassandra directory the current working directory and build the Cassandra application with Ant.

root@dbnode1:/datapool1/zfsnode1# cd cassandra

root@dbnode1:/datapool1/zfsnode1/cassandra# ant

...

BUILD SUCCESSFUL

Total time: 8 minutes 37 seconds

The application has been built. Open .profile with a text editor and add the following entries. Then source the file.

export CASSANDRA_HOME=/datapool1/zfsnode1/cassandra
export PATH=$CASSANDRA_HOME/bin:$PATH

root@dbnode1:~# source .profile

One at a time, move to the /datapool1/zfsnode1/cassandra/bin and /datapool1/zfsnode1/cassandra/tools/bin directories, and use a text editor to open the shell scripts shown in Table 5. In the first line of each file, change #!/bin/sh to #!/bin/bash and then save the file.

Table 5. Shell scripts to change.

| Cassandra Directory | Shell Scripts to Change |
| --- | --- |
| $CASSANDRA_HOME/bin | cassandra.sh, cassandra-cli.sh, cqlsh.sh, debug-cql, nodetool.sh, sstablekeys.sh, sstableloader.sh, sstablescrub.sh, sstableupgrade.sh |
| $CASSANDRA_HOME/tools/bin | cassandra-stress.sh, cassandra-stressd.sh, json2sstable.sh, sstable2json.sh, sstableexpiredblockers.sh, sstablelevelreset.sh, sstablemetadata.sh, sstableofflinerelevel.sh, sstablerepairedset.sh, sstablesplit.sh |
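Instead of editing each script by hand, the shebang change can be applied in bulk. The function below is a sketch (not from the original article): it rewrites the first line of every script under the two directories from #!/bin/sh to #!/bin/bash, and assumes a GNU sed with -i support (/usr/gnu/bin/sed on Oracle Solaris 11).

```shell
#!/bin/bash
# Rewrite #!/bin/sh to #!/bin/bash in every script under
# $CASSANDRA_HOME/bin and $CASSANDRA_HOME/tools/bin.
fix_shebangs() {
  local dir f
  for dir in "$CASSANDRA_HOME/bin" "$CASSANDRA_HOME/tools/bin"; do
    for f in "$dir"/*; do
      [ -f "$f" ] || continue
      # Only touch the first line, and only if it is exactly #!/bin/sh.
      sed -i '1s|^#!/bin/sh$|#!/bin/bash|' "$f"
    done
  done
}
# usage: CASSANDRA_HOME=/datapool1/zfsnode1/cassandra fix_shebangs
```

Files whose first line is not exactly #!/bin/sh are left untouched, so rerunning the function is harmless.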

In the cassandra/conf directory, the shell script cassandra-env.sh invokes grep with the -A option. The default /usr/bin/grep on Oracle Solaris does not support -A, so starting Cassandra or running the other utilities throws an illegal-option warning. The GNU grep under /usr/gnu/bin does support -A, so the warning can be avoided by declaring that absolute path in cassandra-env.sh.

root@dbnode1:~# which grep
/usr/bin/grep

Open $CASSANDRA_HOME/conf/cassandra-env.sh, change grep -A to /usr/gnu/bin/grep -A, and save the file to commit the change.

Move to /datapool1/zfsnode1/cassandra/conf/, open cassandra.yaml with a text editor, and make the following adjustments.

cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool1/zfsnode1/db_files/data
commitlog_directory: /datapool1/zfsnode1/db_files/commitlog
saved_caches_directory: /datapool1/zfsnode1/db_files/saved_caches
seed_provider:
    - seeds: "192.168.2.22"
listen_address: 192.168.2.22
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch

Perform the same steps to build Cassandra on dbnode2 and dbnode3: place the source code in the relevant ZFS file system and apply the same modifications described above. Configure the cassandra.yaml file for the second and third database nodes as shown below.

The cassandra.yaml configuration for dbnode2:

cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool2/zfsnode2/db_files/data
commitlog_directory: /datapool2/zfsnode2/db_files/commitlog
saved_caches_directory: /datapool2/zfsnode2/db_files/saved_caches
seed_provider:
          - seeds: "192.168.2.22"
listen_address: 192.168.2.23
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch

The cassandra.yaml configuration for dbnode3:

cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool3/zfsnode3/db_files/data
commitlog_directory: /datapool3/zfsnode3/db_files/commitlog
saved_caches_directory: /datapool3/zfsnode3/db_files/saved_caches
seed_provider:
          - seeds: "192.168.2.22"
listen_address: 192.168.2.24
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch

Some Notes About the cassandra.yaml File

In order for the database servers to belong to the same cluster, they must share the same cluster name. The cluster_name setting fulfills this purpose. Seed servers are one or more database servers that currently belong to the cluster and are to be contacted by a new server when it first joins the cluster. This new server will contact the seed servers for information about the rest of the servers in the cluster, that is, their names, their IP addresses, the racks and data centers they belong to, and so on.

When a cluster is initialized for the first time, a token ring is created. The token ring's values range from -2^63 to 2^63 - 1. The num_tokens setting controls how many tokens are created per database server, and in that way a set of token ranges is built for the distribution of data. As data is inserted, the primary key (or a part of the primary key) gets hashed. This hash value falls within a token range, which determines the server to which the data will be sent. Every server can have a different num_tokens setting based on its hardware: more powerful servers can be given a larger number of tokens than older or less powerful ones. The data_file_directories, commitlog_directory, and saved_caches_directory parameters set the paths where data and logs will reside.
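As a rough illustration of how a hashed key maps to a token (a toy five-token ring with made-up values, not Cassandra's Murmur3 hashing): the owner of a hash is the first ring token greater than or equal to it, wrapping around to the smallest token when the hash exceeds the largest one.

```shell
# Toy illustration of token-range ownership (not Murmur3): a key's hash
# is owned by the first ring token >= the hash, wrapping around to the
# smallest token when no larger token exists.
tokens="-60 -20 10 45 80"   # a sorted, made-up 5-token ring
hash=37

owner=$(echo "$tokens" | tr ' ' '\n' | awk -v h="$hash" '
    NR == 1           { first = $1 }            # remember smallest token
    $1 >= h && !found { print $1; found = 1 }   # first token >= hash
    END               { if (!found) print first }  # wrap around the ring
')
echo "hash $hash is owned by the node holding token $owner"
```

With hash 37, the lookup lands on token 45, since 45 is the first token in the ring that is greater than or equal to 37.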

Cassandra Operation and Data Distribution

Initiate the Cassandra databases on the database nodes.

root@dbnode1:~/# ./cassandra -f
root@dbnode2:~/# ./cassandra -f
root@dbnode3:~/# ./cassandra -f

The database cluster has been initiated.

From any database node, execute the nodetool utility to verify the database cluster. The same members will be reported regardless of which database node the utility is run on.

root@dbnode1:~/# ./nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                                Rack
UN  192.168.2.24   72.62 KB   5       69.1%             6fdc0ead-a6c7-4e70-9a48-c9d0ef99fd84   RAC1
UN  192.168.2.22   184.55 KB  5       42.9%             26cc69f8-767e-4b1a-8da4-18d556a718a9   RAC1
UN  192.168.2.23   56.11 KB   5       88.0%             af955565-4535-4dfb-b5f5-e15190a1ee28   RAC1

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool describecluster
Cluster Information:
    Name: MyCluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        6403a0ff-f93b-3b1f-8c35-0a8dc85a5b66: [192.168.2.24, 192.168.2.22, 192.168.2.23]

Start the cqlsh utility to create a keyspace and start adding and querying data. A keyspace is analogous to a schema in the relational database world. The replication factor (RF) is set to 2, so data will reside in two servers. There is no master/slave or primary/secondary notion. Both replicas are masters.

root@dbnode1:~/# ./cqlsh
Connected to MyCluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.14-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> create keyspace myfirstkeyspace with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 2};
cqlsh> use myfirstkeyspace;
cqlsh:myfirstkeyspace> create table greek_locations ( loc_id int PRIMARY KEY, loc_name text, description text);
cqlsh:myfirstkeyspace> describe tables;

greek_locations

cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (1,'Thessaloniki','North Greece');
cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (2,'Larissa','Central Greece');
cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (3,'Athens','Central Greece - Capital');
cqlsh:myfirstkeyspace> select * from greek_locations;

 loc_id | description              | loc_name
--------+--------------------------+--------------
      1 |             North Greece | Thessaloniki
      2 |           Central Greece |      Larissa
      3 | Central Greece - Capital |       Athens

(3 rows)

Connecting from any other database server should report the same results.

root@dbnode2:/datapool2/zfsnode2/cassandra/bin# ./cqlsh
Connected to MyCluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.9-SNAPSHOT | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> use myfirstkeyspace;
cqlsh:myfirstkeyspace> select * from greek_locations;

 loc_id | description              | loc_name
--------+--------------------------+--------------
      1 |             North Greece | Thessaloniki
      2 |           Central Greece |      Larissa
      3 | Central Greece - Capital |       Athens

(3 rows)

The ring parameter of the nodetool utility reports the token range limits for each server. Because the num_tokens parameter was set to 5 in the cassandra.yaml file, there are 15 token ranges in total for the three servers.

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool ring

Datacenter: DC1
==========
Address        Rack    Status  State   Load       Owns    Token
                                                          5554128420332708557
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -9135243804612957495
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -8061157299090260986
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -7087501046371881693
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?       -6454951218299078731
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -5793299020697319351
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -5588273793487800091
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -3763306950618271982
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -3568767174854581436
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -1113375360465059283
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?       -682327379305650352
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?       112278302282739678
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       4952728554160670447
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?       5093621811617287602
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       5342254592921898323
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?       5554128420332708557

  Warning: "nodetool ring" is used to output all the tokens of a node.
  To view status related info of a node use "nodetool status" instead.

The describering parameter of the nodetool utility reports the token ranges and the endpoints in detail.

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool describering myfirstkeyspace
Schema Version:155131ce-b922-37aa-a635-68e6fa96597c
TokenRange:
    TokenRange(start_token:5342254592921898323, end_token:5554128420332708557,
    endpoints:[192.168.2.24, 192.168.2.22], rpc_endpoints:[127.0.0.1, 127.0.0.1],
    endpoint_details:[EndpointDetails(host:192.168.2.24, datacenter:DC1, rack:RAC1),
    EndpointDetails(host:192.168.2.22, datacenter:DC1, rack:RAC1)])
    TokenRange(start_token:112278302282739678, end_token:4952728554160670447,
    endpoints:[192.168.2.23, 192.168.2.24], rpc_endpoints:[127.0.0.1, 127.0.0.1],
    endpoint_details:[EndpointDetails(host:192.168.2.23, datacenter:DC1, rack:RAC1),
    EndpointDetails(host:192.168.2.24, datacenter:DC1, rack:RAC1)])
    TokenRange(start_token:5554128420332708557, end_token:-9135243804612957495,
    endpoints:[192.168.2.22, 192.168.2.23], rpc_