Splunk 7.0: the good and the bad

So I will preface this post by saying I love Splunk; it is the best log aggregation application out there.

So on with the post, and it must be a good one, right? Anyway, Splunk released version 7.0 of their Splunk Enterprise product last week during their .Conf 2017 conference, which I was an FTR at. There were a few new features in it that were amazing, such as the new metrics index type, which is blazingly fast. So, like all “fanboys” of anything, I decided to update my home server on Thursday night after the conference was over. This is where the fun began.

First, when I started using Splunk years ago, it supported a myriad of operating systems for the servers. If you wanted Solaris, FreeBSD, AIX, HP-UX, Mac OS X, Windows or Linux, you were golden. However, over the years that list has been pared back to just Linux and Windows. (Mac OS X is supported, but only for the free and trial editions. It is basically for development and home use, not for enterprise use.)

So now that Solaris is no longer supported, I needed to switch my home system from OpenIndiana (aka OpenSolaris) over to Linux. With that, I spun up a new CentOS 7 VM on my home server and copied over all my Splunk data from the Solaris one to the Linux one. I then removed the bin and lib directories. (I use the tar installs, and those are the only places machine-specific binaries exist.) With that done, I untarred the Linux Splunk 7.0 over the top of my current directory and started it up. So far everything was good, until I tried to do a search. A search over the last 15 minutes or so worked, but anything over that was dead because one of the hot buckets was corrupted. I am not sure if it happened during the transit or what. So off to the fsck command to try to fix the buckets. An hour or so later it still couldn’t fix some of them, and since it was getting late I just went to bed.

The next day when I returned home, I tried to log in to my Splunk instance to see how it was doing. To my surprise, I couldn’t even log in to it. It appeared that the Linux host had crashed. I was dumbfounded, as I hadn’t seen an actual kernel panic like that in a while. So I restarted the machine, started Splunk back up, and everything was working again.

A few days passed and I went to check on it again, and once again it was dead. So now I was really curious. I ended up installing the crash utilities on the host and started going through the vmcore files. Yup, each time it crashed it was splunkd that caused it. Unfortunately, I don’t know much more than that about what is actually causing it to happen. It appears to happen at random times.

The output of crash shows this:

KERNEL: /usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2017-10-01-12:26:50/vmcore [PARTIAL DUMP]
CPUS: 8
DATE: Sun Oct 1 12:26:43 2017
UPTIME: 1 days, 17:12:55
LOAD AVERAGE: 0.00, 0.01, 0.05
TASKS: 330
NODENAME: splunk
RELEASE: 3.10.0-693.2.2.el7.x86_64
VERSION: #1 SMP Tue Sep 12 22:26:13 UTC 2017
MACHINE: x86_64 (3399 Mhz)
MEMORY: 3 GB
PANIC: "double fault: 0000 [#1] SMP "
PID: 1420
COMMAND: "splunkd"
TASK: ffff8800bae91fa0 [THREAD_INFO: ffff880000120000]
CPU: 3
STATE: TASK_RUNNING (PANIC)

First few lines of the “bt” output from crash:
PID: 1420 TASK: ffff8800bae91fa0 CPU: 3 COMMAND: "splunkd"
#0 [ffff8800bfac4d88] machine_kexec at ffffffff8105c4cb
#1 [ffff8800bfac4de8] __crash_kexec at ffffffff81104a32
#2 [ffff8800bfac4eb8] crash_kexec at ffffffff81104b20
#3 [ffff8800bfac4ed0] oops_end at ffffffff816ad2b8
#4 [ffff8800bfac4ef8] die at ffffffff8102e97b
#5 [ffff8800bfac4f28] do_double_fault at ffffffff8102b6e2
#6 [ffff8800bfac4f50] double_fault at ffffffff816b6908
[exception RIP: page_fault+13]
RIP: ffffffff816ac52d RSP: ffff880000122fc8 RFLAGS: 00010092
RAX: 0000000000000ff8 RBX: 0000000000000000 RCX: ffffffff816ac2ac
RDX: 00001fffffffffff RSI: ffffffff81a73118 RDI: 0000000000000000
RBP: ffff880000123098 R8: ffffffff81911167 R9: ffffea00002e7b80
R10: ffffea00002e7b80 R11: 0000000000000000 R12: ffffffff81a73118
R13: ffffffff81a73118 R14: ffff880000120000 R15: ffff88008665a580
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
--- ---
#7 [ffff880000122fc8] page_fault at ffffffff816ac52d
#8 [ffff880000123048] spurious_fault at ffffffff816afd8e
#9 [ffff8800001230a0] __do_page_fault at ffffffff816b01ae
#10 [ffff880000123100] do_page_fault at ffffffff816b0325
#11 [ffff880000123130] page_fault at ffffffff816ac548
[exception RIP: spurious_fault+48]
RIP: ffffffff816afd8e RSP: ffff8800001231e8 RFLAGS: 00010002
RAX: 0000000000000ff8 RBX: 0000000000000000 RCX: ffffffff816ac2ac
RDX: 00001fffffffffff RSI: ffffffff81a73118 RDI: 0000000000000000
RBP: ffff880000123208 R8: ffffffff81911167 R9: ffffea00002e7b80
R10: ffffea00002e7b80 R11: 0000000000000000 R12: ffffffff81a73118
R13: ffffffff81a73118 R14: ffff880000120000 R15: ffff88008665a580
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#12 [ffff880000123210] __do_page_fault at ffffffff816b01ae
#13 [ffff880000123270] do_page_fault at ffffffff816b0325
#14 [ffff8800001232a0] page_fault at ffffffff816ac548
[exception RIP: spurious_fault+48]
RIP: ffffffff816afd8e RSP: ffff880000123358 RFLAGS: 00010002
RAX: 0000000000000ff8 RBX: 0000000000000000 RCX: ffffffff816ac2ac
RDX: 00001fffffffffff RSI: ffffffff81a73118 RDI: 0000000000000000
RBP: ffff880000123378 R8: ffffffff81911167 R9: ffffea00002e7b80
R10: ffffea00002e7b80 R11: 0000000000000000 R12: ffffffff81a73118
R13: ffffffff81a73118 R14: ffff880000120000 R15: ffff88008665a580
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
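
One thing that stands out in the backtrace is how few distinct functions it contains. As a rough sketch (a throwaway helper of my own, not part of the crash utility), tallying the function names in the numbered frames makes the repetition obvious. The names FRAME_RE and count_frames are made up for illustration, and SAMPLE holds only a subset of the frames shown above.

```python
import re
from collections import Counter

# Matches numbered frames such as:
#   #7 [ffff880000122fc8] page_fault at ffffffff816ac52d
FRAME_RE = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")


def count_frames(bt_text):
    """Count how often each function name appears in crash 'bt' output."""
    counts = Counter()
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A subset of the frames from the backtrace above.
SAMPLE = """\
#5 [ffff8800bfac4f28] do_double_fault at ffffffff8102b6e2
#6 [ffff8800bfac4f50] double_fault at ffffffff816b6908
[exception RIP: page_fault+13]
#7 [ffff880000122fc8] page_fault at ffffffff816ac52d
#8 [ffff880000123048] spurious_fault at ffffffff816afd8e
#9 [ffff8800001230a0] __do_page_fault at ffffffff816b01ae
#10 [ffff880000123100] do_page_fault at ffffffff816b0325
#11 [ffff880000123130] page_fault at ffffffff816ac548
#12 [ffff880000123210] __do_page_fault at ffffffff816b01ae
#13 [ffff880000123270] do_page_fault at ffffffff816b0325
#14 [ffff8800001232a0] page_fault at ffffffff816ac548
"""

for name, hits in count_frames(SAMPLE).most_common():
    print(name, hits)
```

Run against the full trace, the same handful of page-fault functions repeat over and over below the double fault.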

Output from the “vm” command:
PID: 1420 TASK: ffff8800bae91fa0 CPU: 3 COMMAND: "splunkd"
MM PGD RSS TOTAL_VM
ffff88008665a580 ffff8800b87f4000 324076k 864244k
VMA START END FLAGS FILE
ffff8800a5efe5e8 563d22303000 563d248b7000 8000875 /splunk/splunk/bin/splunkd
ffff8800a5efe438 563d248b7000 563d24964000 8100873 /splunk/splunk/bin/splunkd
ffff8800a5efe288 563d24964000 563d249db000 8100073
ffff880097e90438 7f4f8d800000 7f4f8ea00000 8200073
ffff8800a1f6cd80 7f4f90400000 7f4f91c00000 8200073
ffff8800a1f6ca20 7f4f923f7000 7f4f923f8000 8100070
ffff8800a1f6c948 7f4f923f8000 7f4f925f8000 8100073
ffff8800a1f6c6c0 7f4f925f8000 7f4f925f9000 8100070
ffff8800a1f6c798 7f4f925f9000 7f4f927f9000 8100073
ffff8800a1f6cca8 7f4f927f9000 7f4f927fa000 8100070
ffff8800a1f6c870 7f4f927fa000 7f4f929fa000 8100073
ffff88009ce66948 7f4f935ff000 7f4f93600000 8100070
ffff88009ce66a20 7f4f93600000 7f4f93a00000 8100073
ffff88009466aca8 7f4f93a00000 7f4f94800000 8200073
ffff88009ca98e58 7f4f949cd000 7f4f949e3000 8000075 /usr/lib64/libresolv-2.17.so
ffff88009ca98d80 7f4f949e3000 7f4f94be3000 8000070 /usr/lib64/libresolv-2.17.so
ffff88009ca99008 7f4f94be3000 7f4f94be4000 8100071 /usr/lib64/libresolv-2.17.so
ffff88009ca990e0 7f4f94be4000 7f4f94be5000 8100073 /usr/lib64/libresolv-2.17.so
ffff88009ca98f30 7f4f94be5000 7f4f94be7000 8100073
ffff88009ca98bd0 7f4f94be7000 7f4f94bec000 8000075 /usr/lib64/libnss_dns-2.17.so
ffff88009ca98af8 7f4f94bec000 7f4f94deb000 8000070 /usr/lib64/libnss_dns-2.17.so
ffff88009ca98ca8 7f4f94deb000 7f4f94dec000 8100071 /usr/lib64/libnss_dns-2.17.so
ffff88009ca991b8 7f4f94dec000 7f4f94ded000 8100073 /usr/lib64/libnss_dns-2.17.so
ffff88009ca98798 7f4f94ded000 7f4f94df9000 8000075 /usr/lib64/libnss_files-2.17.so
ffff88009ca986c0 7f4f94df9000 7f4f94ff8000 8000070 /usr/lib64/libnss_files-2.17.so
ffff88009ca98948 7f4f94ff8000 7f4f94ff9000 8100071 /usr/lib64/libnss_files-2.17.so
ffff88009ca98a20 7f4f94ff9000 7f4f94ffa000 8100073 /usr/lib64/libnss_files-2.17.so
ffff88009ca98870 7f4f94ffa000 7f4f95000000 8100073
ffff8800a1f6ce58 7f4f95000000 7f4f95600000 8200073
ffff880036735cb0 7f4f957ef000 7f4f957f0000 8100070
ffff880036735878 7f4f957f0000 7f4f959f0000 8100073
ffff880036735a28 7f4f959f0000 7f4f959f1000 8100070
ffff880036734af8 7f4f959f1000 7f4f95bf1000 8100073
ffff880036735d88 7f4f95bf1000 7f4f95bf2000 8100070

So that means I will definitely hold off on upgrading my production servers. If this is happening on my personal one, I can only imagine what would happen to larger instances. It could also be a result of me being a fanboy and installing the .0 release of software, when any good admin will tell you “just say no to .0”.

Raspberry Pi’ing

Recently I decided I needed a better way to monitor the temperature and humidity in various parts of my house. The main reason was that the thermostat for the house is located in a hallway that is more closed in than anything. So while the thermostat may show that it is 75 degrees in the house, the rest of the house may only have been 70 degrees or less. After the winter we have had, I also needed a good way to monitor the humidity in the house. The only way I was able to do it was with a little Oregon Scientific thermometer I bought at Target. But the problem with this was that it was only for one room, didn’t seem to be very accurate, and gave me no way of logging the values over a time period.

In comes the Raspberry Pi. Along with a DHT22 temperature/humidity sensor and Splunk, I can now monitor, record and graph in real time the temperature and relative humidity in various parts of the house (and outside).

What I got was this:

  1. 3 x Raspberry Pi 2 Canakits from Amazon.com
  2. 5 x DHT22 Digital Sensors from Amazon.com
  3. 1 x DHT11 Digital Sensor from Amazon.com

Now, the DHT11 was what I purchased in the first round, along with just one of the Raspberry Pis. It is not as sensitive as the DHT22s, but since it was just for the original test it was OK for what I needed. In the second round I bought the other two Raspberry Pis and the 5 DHT22 sensors.

What I intend to do is use some of the pre-existing CAT5 runs through the house to wire the DHT22s in to, and then have the other end of the CAT5 runs connect to a Raspberry Pi in the garage. This way I can run multiple sensors on one device versus having a device in every room.

 

Some of the benefits of getting the Raspberry Pi Canakits I got are:

  1. A clear case is included with the correct cutouts for the Raspberry Pi.
  2. A USB WiFi dongle is included, and the drivers are pre-loaded in the OS.
  3. It comes with a pre-loaded 8GB microSD card.
  4. It comes with a miniature breadboard with a 40 pin cable and breakout board that plugs perfectly in to the breadboard.
  5. It comes with various resistors, LEDs and pushbuttons.
  6. An HDMI cable is included, which made it easy to hook in to my monitor.
  7. Various jumper cables for the breadboard are included.

 

Overall, I would say that the total time to get a base monitor up and running is a few minutes. But this is based on me already having Splunk, the network, DHCP, DNS, etc. already set up. So I am going to detail the basic steps I used to get it up and running:

  1. Unbox the Raspberry Pi, place the heatsinks on the two “large” chips on the top side, and then place it in the clear case.
  2. Hook up the HDMI, keyboard, mouse, and WiFi dongle.
  3. Insert the microSD card.
  4. Hook up the USB power cable and watch it boot NOOBS.
  5. Once it is booted, select Raspbian to install. This probably takes the longest of all the steps, as it is expanding the operating system on to the microSD card.
  6. Once this is done, it will reboot and bring up a text based config. I set the hostname, enable ssh, set the timezone and finally set the locale.
  7. At the login prompt, you can log in with the userid pi and the password of raspberry.
  8. Next, set up the network. If you are using Ethernet, it should already have an IP address if you have DHCP running on your network. If you are using the WiFi dongle, edit the /etc/wpa_supplicant/wpa_supplicant.conf file as root and put the following in it:
    network={
        ssid="YOURWIRELESSSSID"
        psk="YOURWIRELESSPASS"
    }

    Where YOURWIRELESSSSID is the SSID of the AP you want to connect to and YOURWIRELESSPASS is the password for that SSID/AP. (If you are doing MAC filtering, you can get the MAC address by running ifconfig -a as root and looking at the wlan0 entry.)

  9. Once you save the file in the item above, issue the following commands:
    wpa_action wlan0 stop
    ifup wlan0
    ifconfig -a
    
  10. By now, if everything is working correctly, you should have an IP address and network connectivity. You can use wpa_cli status to verify the network connectivity.
  11. Now that the network is up and running I needed to download some software:
    sudo su -
    apt-get update
    apt-get upgrade
    apt-get install python-dev
    git clone git://github.com/adafruit/Adafruit-Raspberry-Pi-Python-Code.git
    wget http://www.airspayce.com/mikem/bcm2835/bcm2835-1.42.tar.gz
    
  12. Now that we have the software downloaded, it is time to do a little compiling:
    tar -zxvf bcm2835-1.42.tar.gz
    cd bcm2835-1.42
    ./configure
    make
    make install
    

    That should install the C library for the bcm2835 chip.

  13. Next we need to do the Python code setup:
    cd Adafruit-Raspberry-Pi-Python-Code
    cd Adafruit_DHT_Driver_Python
    python ./setup.py install
    
  14. At this point the code should be done. You can now power down (shutdown -h now) the Raspberry Pi and hook in the DHT22 sensors. (Make sure to disconnect the power before connecting the 40 pin cable.)
  15. The way I hooked the sensor in for testing was to connect one end of the 40 pin cable to the Raspberry Pi and the other end to the breakout board, which was attached to the mini breadboard. Once that was done, I hooked a jumper from 3.3V to the first pin on the DHT22, then placed a 10K resistor between another 3.3V connection and the second pin. In addition, a jumper was run from GPIO4 to the second pin of the DHT22. The third pin is left unconnected and the fourth pin is connected to ground. I will post a picture later.
  16. Once everything is connected, power the Pi back up and log in and switch to the root account.
  17. Next, to see if everything is working, change in to the Adafruit-Raspberry-Pi-Python-Code/Adafruit_DHT_Driver_Python directory.
  18. Then run python ./Adafruit_DHT.py 22 4. The 22 is the type of the sensor, so if you are using a DHT11 use 11, and if you are using a DHT22 use 22. The number 4 is the GPIO port that the sensor’s data pin is connected to. Once you run it you should see something like this:
    root@rpi2:~/Adafruit-Raspberry-Pi-Python-Code/Adafruit_DHT_Driver_Python# python ./Adafruit_DHT.py 22 4
    using pin #4
    Temp = 20.2999992371 *C, Hum = 40.4000015259 %
    
  19. In the above, we can see that the temperature is about 20.3 C and the humidity is about 40.4%. If you want the temperature output in Fahrenheit, like I did, make a copy of the Adafruit_DHT.py file (so the original stays as a backup; my copy is named Adafruit_DHT-f.py) and then add a new line at line 37 of the copy with the following:
    tf = ((t * 9) / 5.0) + 32
    

    Then on line 39, you will want to change the *C to *F, and in the format(t,h) call change the t to tf, so the line now looks like this:

    print("Temp = {0} *F, Hum = {1} %".format(tf,h))
    
  20. Now if you re-run, it will look like this:
    root@rpi2:~/Adafruit-Raspberry-Pi-Python-Code/Adafruit_DHT_Driver_Python# python Adafruit_DHT-f.py 22 4
    using pin #4
    Temp = 68.1800006866 *F, Hum = 40.0 %
    
  21. Now that we have the data being output in the format we like, the only thing left was to log it. What I did was create a shell script that is run by cron every minute (* * * * *) and it outputs the values to a log file called /var/log/temp+humid.log. This log file is then pulled in by Splunk for graphing and other fun stuff that will be another post.
  22. The script I wrote looks like this:
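    #!/bin/bash
    # Read the sensor once and append a timestamped line to the log.
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    export PATH
    # Keep only the line with the reading, e.g. "Temp = 68.18 *F, Hum = 40.0 %".
    RESULTS="$(python /root/TempLogger/Adafruit_DHT-f.py 22 4 | grep Temp)"
    # Field 3 is the temperature, field 7 is the relative humidity.
    TEMP="$(echo "${RESULTS}" | awk '{print $3}')"
    HUMID="$(echo "${RESULTS}" | awk '{print $7}')"
    DATE="$(date "+%Y-%m-%d %H:%M:%S")"
    echo "${DATE} ROOM=FamilyRoom TEMP=${TEMP} RH=${HUMID}" >> /var/log/temp+humid.log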
    
  23. The output that gets logged looks like this:
    2015-03-17 21:44:02 ROOM=FamilyRoom TEMP=68.1800006866 RH=39.7000007629
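
Steps 18 through 20 boil down to two small operations: pull the two numbers out of the driver's output, then convert Celsius to Fahrenheit. Here is a minimal standalone sketch of that in Python. The names READING_RE, c_to_f and parse_reading are my own for illustration and are not part of the Adafruit code; the pattern assumes the output looks exactly like the unmodified example in step 18.

```python
import re

# Matches the unmodified driver output, e.g.
#   Temp = 20.2999992371 *C, Hum = 40.4000015259 %
READING_RE = re.compile(
    r"Temp = (-?\d+(?:\.\d+)?) \*C, Hum = (-?\d+(?:\.\d+)?) %"
)


def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (same formula as the tf line in step 19)."""
    return celsius * 9 / 5.0 + 32


def parse_reading(text):
    """Return (temp_f, humidity) from driver output, or None if no reading."""
    match = READING_RE.search(text)
    if match is None:
        return None
    temp_c = float(match.group(1))
    humidity = float(match.group(2))
    return c_to_f(temp_c), humidity


sample = "using pin #4\nTemp = 20.2999992371 *C, Hum = 40.4000015259 %\n"
print(parse_reading(sample))
```

As a sanity check on the formula, 20.3 C works out to roughly 68.5 F, which is in the same neighborhood as the Fahrenheit run shown in step 20.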
    

 

Sometimes, and I haven’t figured out why yet, it will log null values for the TEMP and RH. I need to add some more checking to the script to make it more robust, but for now it is working.
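
One way to add that checking, sketched here in Python rather than bash: only build a log line when both values parse as numbers and fall inside a plausible range, and otherwise return nothing so the caller can skip the write. The function names and the range limits are assumptions on my part. The limits are taken loosely from the DHT22's rated range (-40 to 80 C and 0 to 100 % RH), with the temperature converted to Fahrenheit.

```python
from datetime import datetime

# Plausible limits, assumed from the DHT22's rated range of -40 to 80 C
# (-40 to 176 F) and 0 to 100 % relative humidity.
TEMP_F_RANGE = (-40.0, 176.0)
RH_RANGE = (0.0, 100.0)


def to_float(value):
    """Return value as a float, or None when it is empty or not a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def format_log_line(room, temp, humid, now=None):
    """Build one log line, or return None when the reading looks bad."""
    temp_f = to_float(temp)
    rh = to_float(humid)
    if temp_f is None or rh is None:
        return None  # covers the empty TEMP/RH case
    if not TEMP_F_RANGE[0] <= temp_f <= TEMP_F_RANGE[1]:
        return None
    if not RH_RANGE[0] <= rh <= RH_RANGE[1]:
        return None
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    return "{0} ROOM={1} TEMP={2} RH={3}".format(stamp, room, temp_f, rh)


print(format_log_line("FamilyRoom", "68.18", "39.7"))
print(format_log_line("FamilyRoom", "", ""))  # prints None
```

A cron-driven script could call format_log_line and append to /var/log/temp+humid.log only when the result is not None, perhaps retrying the sensor read once or twice first.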

In the next post I will cover what I do with the data in Splunk, and how I get the outside temps from the local airport and add them to Splunk as well.
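
Until then, one nice property of the log format is worth pointing out: each line is a timestamp followed by KEY=value pairs, which just about anything can pick apart. The sketch below does it in plain Python. The function name is made up, and this only shows the shape of the data, not how Splunk handles it internally.

```python
from datetime import datetime


def parse_log_line(line):
    """Split one temp+humid.log line into a timestamp and a dict of fields."""
    parts = line.split()
    # The first two tokens are the date and the time.
    stamp = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S")
    fields = {}
    for token in parts[2:]:
        key, _, value = token.partition("=")
        fields[key] = value
    return stamp, fields


line = "2015-03-17 21:44:02 ROOM=FamilyRoom TEMP=68.1800006866 RH=39.7000007629"
stamp, fields = parse_log_line(line)
print(stamp.isoformat(), fields["ROOM"], float(fields["TEMP"]), float(fields["RH"]))
```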