Little Notes

Wednesday, May 29, 2013

Sorting Derived (Transient) Domain Properties in Grails

Every time someone on the Grails list or StackOverflow remarks that sorting only works in GORM with database persisted properties, I'm left scratching my head why post database faulting results can't be easily sorted. Groovy actually enables this with its own List.sort() mechanism, which I've not posted an example of in the Derived Sorts project on github.

The key to the sorting is to know what fields can be sorted without custom code, and then based on a stringent selection of sortable properties, one can post-sort the results, including secondary sorts. It gets rather complicated quickly when you consider multiple sort criteria. Here's a direct link to the controller code. Any hints to clean it up further for future readers all welcome. The code now shows working pagination. I used the solution for PaginatebleList that Colin Harrington posted.

Dtrace training -- its worth it!

While its fresh in my mind, I wanted to cover the wondering Dtrace training I had with Max Bruning and Brendan Gregg. The three day course is occasionally offered by Joyent, and here is an example synopsis. Not only do you get the meaty book on the subject, but the workbook and the labs you do are challenging. Sadly I came in a not too distant second on the 'ole leader board for solving the most labs.

What is more interesting was how quickly I found myself using the tool after the class. Although I heavily utilize KVM instances in Joyent's SDC, I got a request the following day as students were approaching "tape out" for their chips. Various students weren't cleaning up after themselves, with simulations running endlessly. The usual quip that "the internet is slow" or some such came quickly, in this case the statement that the network to and from the primary VM for some Ansys tools was very slow. Looking inside this 16 core, 64GB VM running Linux, it was not entirely apparent what was going wrong. The load was high, and in the end, I could have done some basic thread counting and discovered the answer if I knew where to look. However, I was able to use Dtrace in an opaque way and see from the hardware node that the qemu/kvm process was using its 16 cpu threads full tilt, and it was not I/O bound. Looking back into the Linux VM, it took little time to find that it was beyond the 16 CPU cores, trying to run at least 17 full time tasks. One task had over 20GB of memory mapped, and each time it yielded execution time and reacquired its CPU, it would undoubtedly spend extra effort in dealing with its large memory payload.

Killing off just one process here resolved the performance problem. VMs are sort of the worse case scenario for Dtrace, but it still guided me to the solution. The trick with the book is that one needs real world examples and enough practice to get the specific predicates and formation of queries down. Once you are over the hump, and I'm not sure I'm there yet, you'll find Dtrace to be indispensable.

Thursday, May 09, 2013

Decoding MPT_SAS drives in Nexenta/Illumos

Another group on campus had SAS resets on a drive, but the drive never failed. All we ever got was something like this in dmesg reports:

May 9 07:51:50 hostname scsi: [ID 365881 kern.info] /pci@0,0/pci8086,d138@3/pci1000,3010@0 (mpt_sas0):
May 9 07:51:50 hostname Log info 0x31080000 received for target 9.
May 9 07:51:50 hostname scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0

We have a target address, but a zpool status or equivalent never gives us that target number, just a long string like "c0t50014EE057D81DE7d0". How to find the drive to off-line and replace it?

For future reference, you'll need to follow this link:

https://www.meteo.unican.es/trac/meteo/blog/SolarisSATADeviceName

You'll also need the lsiutil, which I is part of the LSIUtil Kit 1.63. For reference: One valid download link --- Its next to impossible to find this on LSI's site and the Oracle references are all down now it seems.

Tuesday, May 07, 2013

NexentaStor auto-tier ACL workaround

Let us say that you created an auto-tier and said yes to preserve ACLs, but you used rsync as the protocol. Well, this is not realistic, and you'll see "Error from ACL callback (before): OperationFailed: the job was not completed successfully" in your logs after 0 seconds from running the auto-tier.

How to remove the ACL requirement? Delete and re-create the service? Not necessary.

If you run "show auto-tier :data-fsname-000" or the like, you'll see a value like "flags : copy ACLs" that is not otherwise addressable in properties. Just "setup auto-tier :data-fsname-000 property flags" and adjust the value "1024" to "0" and you'll remove the flag. Thats it.

Friday, May 03, 2013

How To: Pluribus NAT Routing

Its no secret that at Stanford we do a lot with OpenFlow. We get to play with some new and interesting stuff that we integrate into our OpenFlow network. One of these is the Pluribus Network switch, which combines system and network virtualization with a high bandwidth 48+ port 10GB switch fabric. We have been running this in our network, and for months it has been handling the heaviy lifting duties for our SmartOS-based private cloud.

Various features including OpenFlow functionality have been tested, but the products user interface is still being crafted and changes some what over time. Recently, we needed to enable NAT routing for the private administrative network for the SmartOS private cloud. This network is not attached to a router interface, and applying something outside the network fabric to enable NAT or routing will create an undesired point of failure. Pluribus has full routing functionality tied to their virtual network capability. Here is the current command sequence used to enable routing between the private 10.0.x.0/16 administrative address space (could be larger) to an external routable network. I've added the VLAN to attach externally as VLAN 4444, and the fabric name is sdc-global:

> nat-create name sdc-global-gateway vnet sdc-global

> nat-interface-add nat-name sdc-global-gateway ip 10.0.27.1/24 if data

> nat-interface-add nat-name sdc-global-gateway ip 172.20.1.1/24 if data vlan 4444

> nat-map-add nat-name sdc-global-gateway name sdc-global-nat ext-interface sdc.global.gateway.eth0 network 172.20.1.0/24

sdc.global.gateway.eth0 should be the external port, as seen from "nat-interface-show"

UPDATE: A bug when first did this prevents the zone managing the NAT from having a correct default gateway. You'll need shell access and "zlogin sdc-global-gateway" or the like to enter the zone, add add /etc/defaultrouter with the IP of that router there for future use. Then you can exit the zone and run "zoneadm -z sdc-global-gateway reboot" to get it working.

Thursday, May 02, 2013

Grails Fixtures in Bootstrap.. the missing pieces

One nifty way to load in a lot of data into either development or perhaps even production instances of Grails apps is the Fixtures Plugin. You can more easily define loads of data and multiple relationships with this plugin. This plugin is designed to be used for integration test data, but as noted here, nothing should prevent you from loading it in your Grails bootstrap step. However, I ran into curious errors such as "Fixture does not have bean". For my future self, here's the solutions to avoid incorrect assumptions:

In the BootStrap.groovy file, define the service fixtureLoader within the BootStrap class (def fixtureLoader).
All fixtures must be in a closure titled "fixture {}"
Each domain class needs importing at the top of each fixture file. This is the resolution to the bean error noted above.

Thursday, April 04, 2013

Joyent SDC 6.5.6 released -- Upgrade workaround

Just a heads up that Joyent has released Smart Data Center 6.5.6 as noted here:

http://wiki.joyent.com/wiki/display/sdc/Upgrading+SDC+6.5.3+or+6.5.4+to+SDC+6.5.6

First upgrade attempt fails at the very end when selecting the correct platform. Joyent Support noted that it has seen this before, and that a "sdc-restore" from the pre-upgrade backup and then a reattempt should work. In my case, it did just that. I did the quick restore (no -F here). Rebooting the head node as I write this.

Tuesday, April 02, 2013

Save time in backing up Joyent Smart Data Center

One does not frequently backup the head node USB key with Joyent's Smart Data Center. Generally you do it prior to upgrades. Therefore, its commonly a "do it twice" process as it has a quirky bug with regards to terminal emulation that I never seem to remember until its too late:

[root@headnode (CIS:0) ~]# sdc-backup -U c2t0d0p0

Disk c2t0d0p0 will relabled, reformatted and all data will be lost [y/n] y

labeling disk

creating PCFS file system

mounting target disk c2t0d0

mounting source disk c0t0d0

copying files

setting up grub

Sorry, I don't know anything about your "xterm-256color" terminal.

Error: installing grub boot blocks

Yep, my OSX default xterm-256color is not known, and the many-minutes long backup process dies at the end. To address this, override the terminal setting in root user's .bash_profile file with the line:

export TERM=vt100

Simple, but its not every day you can increase performance by 50%, so to speak.