Track your (Synology) NAS when it’s stolen

When a friend of mine recently had his MacBook stolen, I quickly checked whether I had Prey Project installed on every laptop and desktop PC I own. For those who do not know Prey:

Prey lets you keep track of your phone or laptop at all times, and will help you find it if it ever gets lost or stolen. It’s lightweight, open source software, and free for anyone to use. And it just works.

Yes, I had Prey running on each PC. And then I looked at my Synology NAS (DS410, 4 disks, 8TB raw storage). It could be stolen too. And it’s basically a Linux box. And Prey is available for Linux …

So I figured out how to install Prey on a Synology box:

  1. log in via ssh as root
  2. install the ipkg/’Bootstrap’ module on your NAS server – forum.synology.com explains how, and has a list of the right bootstrap for each Synology model.
  3. install the bash shell – “ipkg install bash” (from forum.synology.com)
  4. install textutils – “ipkg install textutils” (from forum.synology.com)
  5. go to /usr/share, download the latest Linux version of Prey (wget http://preyproject.com/releases/...linux.zip) and unzip it
  6. create an account on Prey and get your API key from your Account profile.
  7. create a new device (e.g. ‘NAS8TB (Syn410)’), indicate OS as Debian (it’s close enough) and get the device key.
  8. edit the /usr/share/prey/config file and fill in the API and device key
    # you can get both of these from Prey's web service
    api_key='yyyyyyyyyy'
    device_key='xxxxxx'
  9. now run “bash /usr/share/prey/prey.sh” a first time – you should get a “-- Got status code 200! -- Nothing to worry about. :) -- Cleaning up!” response.
  10. now edit /etc/crontab and add a line
    5-55/20 * * * * root /opt/bin/bash /usr/share/prey/prey.sh >  /usr/share/prey/lastrun.log
  11. Now restart crontab in the following (non-standard-Linux) way (from forum.synology.com):
    /usr/syno/etc.defaults/rc.d/S04crond.sh stop
    /usr/syno/etc.defaults/rc.d/S04crond.sh start
  12. And it’s running! When your Synology is stolen, you set its status in your Prey account to ‘Missing’ and you will start getting email reports every 20 minutes. Because it’s a NAS, there is no webcam and no screenshots can be taken, but the external IP address will let you see where the device turns up.
    Remote IP: 78.29.245.xxx
    Private IP: 192.168.0.108
    Gateway IP: 192.168.0.1
    MAC Address: xx:xx:xx:xx:xx:xx
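For reference, once the ipkg bootstrap is in place, steps 3 to 11 above (skipping the web-based steps 6 and 7) boil down to a short shell session. This is only a condensed sketch of the list: PREY_ZIP stands for whatever the current Linux release is called on the Prey download page, and the keys you type into the config are of course your own.

# run as root over ssh, with the ipkg bootstrap already installed
# PREY_ZIP is a placeholder for the current Linux release filename
PREY_ZIP=prey-x.y.z-linux.zip
ipkg install bash textutils
cd /usr/share
wget http://preyproject.com/releases/$PREY_ZIP
unzip $PREY_ZIP
vi /usr/share/prey/config        # fill in api_key and device_key
bash /usr/share/prey/prey.sh     # first run: expect "Got status code 200!"
echo '5-55/20 * * * * root /opt/bin/bash /usr/share/prey/prey.sh > /usr/share/prey/lastrun.log' >> /etc/crontab
/usr/syno/etc.defaults/rc.d/S04crond.sh stop
/usr/syno/etc.defaults/rc.d/S04crond.sh start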

Would this work on a QNAP server? I’m guessing, yes.

Redirecting with Apache’s .htaccess

When you migrate websites from one place to another and the URLs change, you don’t want to lose visitors who still use the old links. If your ‘old’ website ran on Apache, you can use its mod_alias/mod_rewrite functionality to automatically redirect to the new URLs. This involves adding redirect rules to the .htaccess file in the base folder of the old site. Some examples:

Generic structure of the .htaccess redirects

Redirect permanent /(old url) (new url)
Redirect ... (add all your one-2-one redirects here)
RedirectMatch permanent ^/old_stuff/.*html$ http://www.example.com/
RedirectMatch ... (add your catch-all redirects here)

RewriteEngine on
RewriteBase /blog/
RewriteRule ^([regex])$ http://blog.example.com/$1 [R,L]
RewriteRule ... (add all your variable redirects here)

EXAMPLE: old Blogger site (on your own server) to new WordPress site
I’ve done a migration from a blog published by Blogger (via FTP) onto my own webspace, to a blog run by WordPress. I’ve used the following Rewrite rules to handle the redirections.
* HOMEPAGE:
redirect /index.html and / to your new blog URL
Redirect permanent / http://blog.example.com/
Redirect permanent /index.html http://blog.example.com/

* FEED:
redirect e.g. /atom.xml to your Feedburner feed
Redirect permanent /atom.xml http://feeds.feedburner.com/(exampleblog)

* ARCHIVES:
redirect e.g. /archive/2005_03_posts.html to the new WordPress archives
RedirectMatch permanent /archive/([0-9][0-9][0-9][0-9])_([0-9][0-9])_.*$ http://blog.example.com/$1/$2/

* POST PAGES:
This is tricky, because Blogger and WordPress do not use exactly the same rules for constructing the text-like URL (the ‘post slug’). E.g. a post called how-to-podcast-with-blogger-and.html on my old Blogger site became how-to-podcast-with-blogger-and-smartcast/ on the new WordPress one. So what I did consisted of 2 types of rules:
a) redirecting individual pages
Redirect permanent /2004/10/how-to-podcast-with-blogger-and.html http://blog.example.com/2004/10/how-to-podcast-with-blogger-and-smartcast/
b) a generic rule for the others (this uses RewriteRule instead of RedirectMatch!): each page is redirected to a search on the WordPress blog, within the correct month, on the first two words of the title:
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([a-z0-9]*)-([a-z0-9]*).*$ http://blog.example.com/$1/$2/?s=$3+$4 [R,L]
This method is far from perfect, but it will bring visitors a lot closer to the right page. If you use pretty distinctive words in your titles (e.g. “Myspace: bulletin and other spam”), chances are the right page shows up first. If you start all your posts with “The ten best ways to …” you will need a more sophisticated rule, e.g. one using the 6th and 7th word:
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-([a-z0-9]*)-([a-z0-9]*).*$ http://blog.example.com/$1/$2/?s=$3+$4 [R,L]
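If you have many posts to redirect one by one, you don’t have to type all those Redirect lines yourself. A small sketch: keep the mappings in a plain text file (the file name and format here are my own convention, not something Blogger or WordPress produces) and let awk generate the .htaccess lines.

# redirects.txt contains one mapping per line: "old-path new-url", e.g.
# /2004/10/how-to-podcast-with-blogger-and.html http://blog.example.com/2004/10/how-to-podcast-with-blogger-and-smartcast/
awk '{print "Redirect permanent", $1, $2}' redirects.txt >> .htaccess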

Not losing the querystring
Redirect and RedirectMatch cannot redirect to a URL with a querystring (e.g. to newpage.php?param1=val1&param2=val2). For that you will need a RewriteRule. An example: redirect all links like test.asp?param=value on the old domain to the new domain, while keeping all querystring parameters:
RewriteRule ^tools/test\.asp$ http://web.example.com/tools/test.asp [L,QSA]
where QSA (query string append) keeps the existing querystring, and L (last rule) stops looking for further rule matches.
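Whatever rules you end up with, test them from the command line before you trust them: curl -I shows the status code and the Location header of the redirect. A quick sketch, using the example URLs from this post:

# expect "301 Moved Permanently" plus a Location: header pointing to the new URL
curl -I http://www.example.com/index.html
curl -I http://www.example.com/archive/2005_03_posts.html
curl -I "http://www.example.com/tools/test.asp?param=value"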

Convert Bind DNS zone into PTR records

I made the following script to convert the forward DNS records in a /var/named/db.[domain] file into the correct format for a reverse DNS db.[subnet prefix] file.

#!/bin/sh
# (...)
DNSROOT=/var/named
PREFIX=$1
DOMAIN=$2
shift 2
DNSPRE=$DNSROOT/db.$PREFIX
DNSDOM=$DNSROOT/db.$DOMAIN
echo "; save this in $DNSPRE"
(
if [ -f $DNSDOM ] ; then
  # forward zone: turn the A records of our subnet into PTR records
  cat $DNSDOM \
  | grep "$PREFIX" \
  | grep -w "A" \
  | sed "s/$PREFIX\.//g" \
  | gawk -v dom="$DOMAIN" -v src="`basename $DNSDOM`" \
    'BEGIN {OFS="\t"} {print $4,"IN","PTR",$1 "." dom ".",";; FROM " src}'
fi

if [ -f $DNSPRE ] ; then
  # existing reverse zone: keep the PTR records that are already there
  cat $DNSPRE \
  | grep -w "PTR" \
  | gawk -v src="`basename $DNSPRE`" \
    'BEGIN {OFS="\t"} {print $1,$2,$3,$4,";; FROM " src}'
fi
) \
| sort -n \
| uniq --check-chars=3

You would call it as follows:
revdns.sh 192.168.110 internal.example.com > new.db.192.168.110
and then replace the records of the original db.192.168.110 with the records of the new file. The script still requires manual intervention (you cannot pipe the result straight into a live Bind config file) but it saves a lot of typing!

Example of the output:

201 IN PTR james.internal.example.com. ;; FROM db.internal.example.com
202 IN PTR wilbur.internal.example.com. ;; FROM db.internal.example.com
216 IN PTR appprd1.internal.example.com. ;; FROM db.192.168.110
217 IN PTR appprd2.internal.example.com. ;; FROM db.192.168.110
218 IN PTR appprd3.internal.example.com. ;; FROM db.192.168.110
219 IN PTR appprd4.internal.example.com. ;; FROM db.192.168.110
220 IN PTR appprd5.internal.example.com. ;; FROM db.192.168.110
221 IN PTR appprd6.internal.example.com. ;; FROM db.192.168.110
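Once the merged records are back in the reverse zone file, it is worth letting BIND check the syntax before you reload anything; a sketch using named-checkzone (BIND 9) with the example subnet from above:

# verify the reverse zone parses cleanly
named-checkzone 110.168.192.in-addr.arpa /var/named/db.192.168.110
# if it reports OK, reload just that zone
rndc reload 110.168.192.in-addr.arpa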

Installing NTP (time synchronisation)

Set timezone (optional)
create a symbolic link from /usr/share/zoneinfo/... to /etc/localtime:
ln -sf /usr/share/zoneinfo/Europe/Brussels /etc/localtime
Set UTC mode (optional)
if your hardware clock runs in UTC (Coordinated Universal Time) mode, add
UTC=true
to the /etc/sysconfig/clock file
Make sure ntpd is not running
Use service ntpd stop to stop it.
Choose the NTP server you will get your time from
it can be an internal server that has the NTP service open for clients, or a public NTP server. To be safe, use 2 servers. To check if you can reach a server, run ntpdate timeserver.ntp.ch
Edit the /etc/ntp.conf file
Rename the current file to ntp.bak.conf and make a small new one:
restrict default ignore
server timeserver.ntp.ch # Swiss time
server ntp.ucsd.edu # Univ of California, San Diego
restrict timeserver.ntp.ch mask 255.255.255.255 nomodify notrap noquery
restrict ntp.ucsd.edu mask 255.255.255.255 nomodify notrap noquery
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10 #so it only takes over if the rest fails
restrict 127.0.0.1
driftfile /etc/ntp/drift
broadcastdelay 0.008
authenticate no
Set your system clock right
Run the following command a couple of times:
ntpdate -u timeserver.ntp.ch # or whatever server you want to use
You will see the initial difference in time go away after the 2nd or 3rd run.
Set hardware clock
/sbin/hwclock --systohc
Run the ntpd daemon
service ntpd start
Add ntpd to the services started at boot time
chkconfig ntpd on
Check the NTP results
ntpq -p
will show you what the difference is between your clock and that of the servers you added. You are looking for lines like

remote refid st t when poll reach delay offset jitter
==========================================================================
LOCAL LOCAL 10 l 30 64 377 0.000 0.000 0.004 *
192.168.246.107 192.168.246.88 3 u 41 128 177 0.313 5.598 0.345

and not lines like

remote refid st t when poll reach delay offset jitter
==========================================================================
192.168.246.126 LOCAL 11 u 37 128 375 0.204 6082.02 6069.84

Jitter is too high!
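If you don’t want to read the ntpq table by hand every time, you can let a small script watch the offset of the selected peer (the line ntpq marks with a *). This is only a sketch; the 100 ms threshold is an arbitrary example, not an NTP recommendation.

#!/bin/sh
# warn when the offset to the selected NTP peer exceeds 100 ms
OFFSET=`ntpq -pn | awk '/^\*/ {o=$9; if (o<0) o=-o; print o}'`
if [ -z "$OFFSET" ] ; then
  echo "WARNING: ntpd has not selected a peer yet"
elif [ `echo "$OFFSET > 100" | bc` -eq 1 ] ; then
  echo "WARNING: NTP offset is $OFFSET ms"
fi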

Perl HTML scraping part #1

Here we are, back at the scene of the crime. Yes, I know it’s been a while. And the task of the day is:

GOAL:
make an HTML scraper, i.e. a script that grabs another URL and outputs the results to the screen
TOOL:
let’s say … Perl (in my case: Perl 5.8 on RedHat)
INPUT:
a URL
OUTPUT:
the HTML code of that URL

The actual HTML retrieval is easy: you need get() from the LWP::Simple module:
use LWP::Simple;
my $page = get($url);

Some remarks:

  • Since you are generating a web page, you need the CGI module (to take care of the HTTP headers and stuff).
  • The URL input parameter will be given as an HTTP querystring: ?url=http://www.example.com/path/page.htm. When no url parameter is given, we generate a form where it can be filled in.
  • We calculate the time it takes to retrieve the original page
#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
use LWP::Simple qw(!head);    # !head: avoid a clash with CGI's head()

my $query = new CGI;
my $url   = $query->param('url');
my $debug = 0;

print header();
if (defined $url && length($url) > 0) {
    print getpage($url);
} else {
    showform();
}

sub getpage {
    my $url   = shift;
    my $time1 = time();
    debuginfo("Scraping <a target=_blank href='" . $url . "'>link</a> ...");
    my $page  = get($url);
    my $time2 = time();
    debuginfo("Time taken was <b>" . ($time2 - $time1) . "</b> seconds");
    debuginfo("Total bytes scraped: <b>" . length($page)/1000 . "KB</b>");
    return $page;
}

sub debuginfo {
    if ($debug > 0) {
        my $text = shift;
        print "<small>", $text, "</small><br />\n";
    }
}

sub showform {
    print("<html><head>");
    print("<title>SCRAPER</title>");
    print("<link rel=stylesheet type=text/css href=http://www.forret.com/blog/style.css>");
    print("</head><body><center>\n");
    print("<form method=GET action='scrape.pl'>");
    print("URL: <input name=url type=text size=60 value=http://www.forret.com>");
    print("<input type=submit></form>\n");
    print("</center></body></html>\n");
}

Next step: making sure that image src= and hyperlink href= references keep working (i.e. converting relative links into absolute links!).
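To test the scraper from the shell, call it with the url parameter just like the form does (the cgi-bin path below is only my assumption of where the script would be installed):

# fetch a page through the scraper and time the round trip
time curl -s "http://www.example.com/cgi-bin/scrape.pl?url=http://www.example.com/" | head -20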

Squid cachemgr.cgi UI hack

Squid has a little system statistics viewer built-in:

The cache manager (cachemgr.cgi) is a CGI utility for displaying statistics about the squid process as it runs. The cache manager is a convenient way to manage the cache and view statistics without logging into the server.
(from Squid FAQ)

The only thing is … it’s so ugly! It uses plain HTML and cannot be customized, the FAQ says. However, there is a way to do it:

  1. copy cachemgr.cgi to cachemgr2.cgi, so if you do something wrong the original is not lost.
  2. open the CGI file in a text editor. I used vi, but if you’re not used to working with it, use something else (emacs?).
  3. in the binary file, look for text portions that look like HTML code
  4. while keeping in mind that the # of characters must remain the same, change the <title> and <style> to something that suits you. You will have to do this at 2 locations in the file: one for the homepage template and one for the other pages’ template.
  5. suggestion: just let the CGI use a style.css file that you drop into the same folder:
    <link rel="stylesheet" type="text/css" href="style.css" /> and pad with spaces to keep the same # of characters
  6. verify that cachemgr and cachemgr2 have the same # of bytes
  7. now use cachemgr2 to display your statistics.
  8. I did something a bit different (I wanted to use the CSS of my own website), so I’ll show you the difference between the two versions.
    To get to the following comparison, I ran strings cachemgr.cgi > cachemgr.txt to extract only the text parts, and then diff cachemgr.txt cachemgr2.txt to compare both files (a straight diff of the 2 binary files is not useful).

173,174c173,174
< <HTML><HEAD><TITLE>Cache Manager Interface</TITLE>
< <STYLE type="text/css"><!-- BODY{background-color:#ffffff;font-family:verdana,sans-serif} --></STYLE></HEAD>
---
> <HTML><HEAD><TITLE>Cache Manager (pforret)</TITLE>
> <link rel="stylesheet" type="text/css" href="http://www.forret.com/forret/forret.css" /> </HEAD>
199c199
< <STYLE type="text/css"><!-- BODY{background-color:#ffffff;font-family:verdana,sans-serif} TABLE{background-color:#333333;border:0pt;padding:0pt}TH,TD{background-color:#ffffff}--></STYLE>
---
> <link rel="stylesheet" type=text/css href="http://www.forret.com/forret/forret.css"><!-- TABLE{background-color:#333333;border:0pt;padding:0pt} TH,TD{background-color:#ffffff}--></STYLE>
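Two quick checks after patching: the byte count must not have changed (step 6), and the only differences in the embedded strings should be the HTML bits you edited:

# the patched copy must keep exactly the same size
wc -c cachemgr.cgi cachemgr2.cgi
# and the only string-level differences should be your <title>/<style> edits
strings cachemgr.cgi  > cachemgr.txt
strings cachemgr2.cgi > cachemgr2.txt
diff cachemgr.txt cachemgr2.txt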

Probe disk performance (MRTG)

The hdparm tool can be used to measure the throughput of a hard disk:
# hdparm -tT /dev/hda
/dev/hda:
Timing buffer-cache reads: 888 MB in 2.00 seconds = 444.00 MB/sec
Timing buffered disk reads: 20 MB in 3.30 seconds = 6.06 MB/sec

This would be an interesting performance metric to see plotted against time. So let’s convert it to a format ready for MRTG.

  • The only numbers we need are the last ones: the resulting speeds. They can be parsed from the output as follows:
    # hdparm -tT /dev/hda | gawk -F= '/seconds/ {print $2}'
    444.00 MB/sec
    6.06 MB/sec
  • if we could assume that the results will always be in “MB/sec”, we could parse out the numbers with
    (...) | gawk '{print $1}'
    and then add a line to our MRTG config file to adjust the units:
    kMG[_]: M,G,T,P,X
    But let’s say that KB/sec or GB/sec speeds are possible too.
  • One extra gawk can do the conversion trick:
    # (...) | gawk '/GB/ {print $1*1000000000} /MB/ {print $1*1000000} /KB/ {print $1*1000}'
    444000000
    6060000
  • To have a complete MRTG-ready output, we also add the uptime on line 3 and the name of the MRTG target on line 4 (see the sketch after this list).
  • Q: Do we need 2 gawks one after the other? Can’t one do it?
    A: You could do it in one, I guess, but the parsing would be more complex. I use two because the FS (field separator) changes: the first gawk splits on the ‘=’ character, the second on normal whitespace.
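Put together, an MRTG-ready external script could look like this. Only a sketch: the device is hard-coded, hdparm -tT needs root, and the uptime/target lines are kept deliberately rough.

#!/bin/sh
# print the 2 throughput values (in bytes/sec), then an uptime string and a
# target name: the 4-line output MRTG expects from an external script
DEV=/dev/hda
hdparm -tT $DEV \
| gawk -F= '/seconds/ {print $2}' \
| gawk '/GB/ {print $1*1000000000} /MB/ {print $1*1000000} /KB/ {print $1*1000}'
uptime | gawk '{print $3,$4}'     # rough uptime string
echo "disk throughput $DEV"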

Date formatting in GAWK: boot time

I have one server with apparently exceptional stability:
# uptime

3:45pm  up 524 days,  1:22,  1 user,  load average: 0.44, 0.16, 0.13

Unfortunately I know this is not correct (I remember rebooting it some weeks ago). So what are other ways to get the date/time of the last boot?

Looking at the RedHat manuals, the following should work too:
# cat /proc/stat
cpu 33813143 210619911 30093342 59435750
cpu0 33813143 210619911 30093342 59435749
(...)
btime 1096157569
(...)

The btime gives us the last boot time in seconds since 1 Jan 1970. I can find and convert it with gawk:
# gawk -v now="`date +%s`" '/btime/ {print (now - $2) / (3600 * 24.0), "days -", strftime("%a %b %d %H:%M:%S %Z %Y", $2)}' /proc/stat
38.6473 days - Sun Sep 26 02:12:49 CEST 2004

Which gives an uptime of 38.6 days – that looks more like it!
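If your GNU date is recent enough to understand the “@seconds-since-epoch” syntax, you can also let date do the formatting instead of gawk’s strftime (a sketch; I have not tried it on a box this old):

# print the last boot time directly from /proc/stat
BTIME=`gawk '/btime/ {print $2}' /proc/stat`
date -d "@$BTIME"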

Another way of calculating the uptime:
# gawk '/cpu/ {print $1, ($2 + $3 + $4 + $5)/(3600 * 24 * 100)}' /proc/stat
cpu 38.6515
cpu0 38.6515

Confirmation of the previous measurement!

# cat /proc/uptime
45282758.17 663091.26

The first number is the # of seconds since the last boot. The other one (idle time) we don’t need. What is that in days?
# gawk '{print $1/(3600 * 24.0)}' /proc/uptime
524.106

So this is where the wrong data comes from! I’ll ignore this number.

Remark: this server is one of my oldest ones and still runs RedHat 7.2 (Enigma). It looks like this bug was fixed in later RedHat versions, since none of my other servers have it.

Probe average cpu utilisation (MRTG)

There are two main tools to keep track of your CPU usage: top and vmstat.

  • top is an interactive tool: it shows you the CPU usage of each process, as well as overall statistics, updated every 5 seconds. It’s good for hands-on checking.

    # top
    17:18:34 up 2 days, 8:14, 3 users, load average: 0.00, 0.00, 0.00
    47 processes: 46 sleeping, 1 running, 0 zombie, 0 stopped
    CPU states: 0.1% user 0.1% system 0.0% nice 0.0% iowait 99.6% idle
    Mem: 1030872k av, 1022256k used, 8616k free, 0k shrd, 104844k buff
    777088k actv, 12k in_d, 22296k in_c
    Swap: 2048276k av, 8120k used, 2040156k free, 640080k cached
    PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
    30776 root 19 0 1140 1140 852 R 0.9 0.1 0:00 0 top
    1 root 15 0 504 464 436 S 0.0 0.0 0:03 0 init (...)

    But say you just want one number (a percentage) back that you can use for logging.
  • vmstat will give you the following output:

    # vmstat
    procs memory swap io system cpu
    r b w swpd free buff cache si so bi bo in cs us sy id
    0 0 0 7964 8804 104712 640224 0 0 2 16 129 27 0 0 100

    You can run vmstat 1 5 to get 5 consecutive measurements (1 second apart). The number we want is the average CPU usage, or (100% - idle). The following command does the job:
    # vmstat 1 5 | gawk '/0/ {tot=tot+1; id=id+$16} END {print 100 - id/tot}'
    which gives
    0.4
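Wrapped into the same 4-line MRTG external-script format as the disk probe above (a sketch; MRTG wants two values, so I simply report the used percentage twice):

#!/bin/sh
# average CPU usage over 5 one-second vmstat samples, in MRTG's 4-line format
# NR>2 skips the two header lines (the text above used /0/ for the same purpose)
USED=`vmstat 1 5 | gawk 'NR>2 {tot=tot+1; id=id+$16} END {print 100 - id/tot}'`
echo $USED
echo $USED
uptime | gawk '{print $3,$4}'     # rough uptime string
echo "cpu usage"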

Estimate # of lines in a log file

Let’s say you need an (approximate) count of the number of lines in a huge file. The most obvious way of getting it would be wc, but this can actually be quite slow:
# time wc -l /var/log/squid/access.log
2812824 /var/log/squid/access.log
real 0m43.988s

(counting is done at about 64.000 lines/sec)

Running wc without the -l (count lines only) would be even slower, because it would also count the words instead of just the LF (linefeed) characters. But wc -c is very fast! This is because the filesystem keeps track of each file’s size (= number of characters/bytes), so the file does not even have to be read to give this number. Can we estimate the # of lines from the # of bytes?

For the type of file we are talking about here (a Squid log file) there actually is a way. The file is more or less ‘square’, meaning that every line is about the same length (it contains date, status, URL, …).
If we take the beginning of the file (the first 10000 lines):
# head -10000 /var/log/squid/access.log | wc
10000 100000 1775257

we see that every line is about 177 chars long.

The end of the file (the last 10000 lines):
# tail -10000 /var/log/squid/access.log | wc
10000 100000 2047887

gives us a number of 204 chars/line.

Let’s take some more data and combine both:
# ( head -50000 /var/log/squid/access.log ; tail -50000 /var/log/squid/access.log ) | wc
100000 1000000 19488905

which gives us an average of 195 chars/line.

A file size of 533.229.920 bytes (533MB) would lead us to estimate the # of lines to 2.734.512, where the actual # of lines is 2.818.184 (3% difference). That is: we lose 3% accuracy but the calculation takes almost no CPU time, instead of 45 seconds. This might be a trade-off you are willing to accept!
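The whole estimation fits in a few lines of shell (a sketch: the default file is just the example used above, and the 50000-line samples match the combined measurement):

#!/bin/sh
# estimate the number of lines in a big, 'square' log file from its byte size
FILE=${1:-/var/log/squid/access.log}
BYTES=`wc -c < $FILE`
# average line length, sampled from the first and last 50000 lines
AVG=`( head -50000 $FILE ; tail -50000 $FILE ) | wc -c | gawk '{print $1/100000}'`
echo "$BYTES $AVG" | gawk '{printf "%d lines (estimated)\n", $1/$2}'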