I was running out of space on a server the other day and was wondering what directory was full and what files I could clean up. There is a simple command that is helpful to identify usage by directory. This works from within any directory so you can walk the path from / down through your largest directories and look for files to clean up.
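A minimal sketch of this kind of check, assuming the standard du and sort utilities. The function name usage_by_dir is only for illustration, and this may not be the exact command I ran that day. It prints the largest entries directly under a directory, so you can run it at / and then again inside whichever directory comes out on top.

```shell
#!/bin/sh
# List the size of each entry directly under a directory, largest first.
# Sizes are in kilobytes (du -k) so a plain numeric sort works everywhere.
# Note: the * glob skips hidden (dot) entries.
usage_by_dir() {
    dir="${1:-.}"
    # -s: one summary line per argument; -k: report sizes in 1K blocks
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n 20
}

# Default to the current directory when no argument is given.
usage_by_dir "${1:-.}"
```

Run it as root when walking down from / so that du can read every directory.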
We use PostgreSQL for SpatialKey, and because of that I have learned a lot about system administration with PostgreSQL. While debugging an issue with tables that had ballooned in size, I ran into this article, which has some great tips for isolating database and table size with a few quick commands. http://www.thegeekstuff.com/2009/05/15-advanced-postgresql-commands-with-examples/
In the video we talk about Universal Mind, how SpatialKey came to be, a lot about what SpatialKey is and how it works. I dive into the architecture a bit and how we leveraged the cloud to build out our infrastructure.
Thanks to Jon and James for hosting us; we crashed Jon's house for the video. I think Doug tossed back the most beers. I can't keep up with him and still talk.
This issue drove me nuts for several hours today; to be honest, I was pretty close to breaking something! Earlier in the day I had the NFS mount working fine. Then I created an AMI and booted up another instance from the newly created AMI, but on the new instance the mount kept failing. The error looks something like this:
[root@server]# mount -t nfs 192.168.2.1:/dbshare /mnt/dbshare
mount: 192.168.2.1:/dbshare failed, reason given by server: Permission denied
According to the error you would think that I had a configuration issue, so I changed everything that I could think of within /etc/exports.
My /etc/exports originally looked like this (192.168.2.2 is the client where I am performing the mount):
/dbshare 192.168.2.2(rw,sync)
I changed it to something more open like this with no luck:
/dbshare 192.168.0.0/255.255.0.0(rw,sync)
I started looking around the logs on the server in /var/log/messages and found that it was authenticating fine:
Jun 11 19:04:00 servername mountd[5222]: authenticated mount request from 192.168.2.2:736 for /dbshare (/dbshare)
I was really frustrated at this point and I had already spent an hour on Google looking for the answer. I found another answer but the website was down; luckily the cached version on Google came to the rescue.
The ANSWER:
The problem was that the special nfsd file system that mounts to /proc/fs/nfsd wasn't mounted. I'm not sure how it normally gets mounted (maybe rc.sysinit does it?), but I took the advice from the forum entry and added an entry to /etc/fstab:
none /proc/fs/nfsd nfsd auto,defaults 0 0
then ran mount -a
After this the mount worked fine. I hope that someone finds this helpful.
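If you hit the same error, a quick way to confirm this is the cause is to check whether nfsd shows up in the server's mount table. This is only a sketch, assuming a Linux server that exposes /proc/mounts. The function name nfsd_mounted and its optional file argument are hypothetical helpers, not part of any NFS tool.

```shell
#!/bin/sh
# Report whether the nfsd pseudo-filesystem appears in a mounts table.
# The optional argument (default /proc/mounts) exists only so the check
# can be pointed at a sample file.
nfsd_mounted() {
    mounts_file="${1:-/proc/mounts}"
    # Field 2 is the mount point, field 3 is the filesystem type.
    awk '$2 == "/proc/fs/nfsd" && $3 == "nfsd" { found = 1 }
         END { exit !found }' "$mounts_file" 2>/dev/null
}

if nfsd_mounted; then
    echo "nfsd is mounted on /proc/fs/nfsd"
else
    echo "nfsd is NOT mounted; add the fstab entry and run mount -a"
fi
```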
I often need to copy code from an SVN repository to a location where I have files deployed, or I may pull code from an SVN repository on the web and want to check it in locally. Both of these actions require stripping out all of the .svn directories spread throughout the directory structure. On Linux or Mac this is pretty simple with a single command.
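A minimal sketch of one common way to do this with find. The function name strip_svn is just for illustration, and it deletes without asking, so point it at a copy first.

```shell
#!/bin/sh
# Remove every .svn directory beneath the given directory.
strip_svn() {
    target="${1:?usage: strip_svn <directory>}"
    # -prune stops find from descending into a .svn directory
    # that is about to be deleted.
    find "$target" -type d -name .svn -prune -exec rm -rf {} +
}
```

For example, strip_svn ./my-export cleans a copied working directory in place; the path here is a placeholder.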
One challenge I have encountered with Amazon Ec2 is sending email from our web applications. If you try to send directly from sendmail or postfix then you might as well forget about guaranteed delivery. A large amount of your email will end up in spam folders, if it is delivered at all.
There are a few problems with delivering email from Ec2:
Your MX record will not map to your IP, and you are using dynamic IPs. You can address this with Elastic IPs, and adding an SPF DNS record can help as well.
Reverse DNS will map back to Amazon and not back to your hostname.
Many of the Ec2 IPs have been blacklisted due to abusers of the service sending spam.
There are a few solutions to this problem and I will propose two of them.
Using a Google Apps account:
If you are using a solution like Google Apps and have them host your email accounts, then you can use Gmail as your SMTP server. You will need to create an account such as donotreply@mydomain.com and use authentication in your applications to send the email. With Google Apps you cannot override the "from" address when you send email; it will always become whatever account you are sending from. For example, if you create the account donotreply@mydomain.com and attempt to set the "from" in your code to brandon@mydomain.com, Google will override it and send from donotreply. The only option is to set "replyto" in your code, so that when a user replies the message goes to your replyto account. With Google Apps you are also limited to a maximum of 500 emails a day per account, and if you are sending a lot of email this can quickly become a problem. This is a great solution for small volumes of email, and your delivery rates are very good.
Relay from localhost through a third party:
This blog post outlines a set of steps to relay through a local Postfix instance to a third party SMTP service. The great thing about this solution is that you can send email from your application to localhost without storing the authentication parameters in your application's code, and have Postfix handle it all. If you have many applications sending email this can greatly simplify things. It also allows your application to hand off the emails quickly to another service that can handle queuing in case the third party email service is down at any time. You could combine the approach above with this one, but you would still have the 500 email limitation. I am searching for a good third party SMTP service that is reliable; the author of the blog post recommends AuthSMTP. I have not tried them, and their prices are not too high but not cheap either. I am going to do a little more digging, test some of the options, and report back in this post.
This is common knowledge if you have been using Linux for a while but I still find it a helpful resource to understand how you set what programs are running when Linux starts. This is mainly specific to Red Hat or CentOS which I use on a regular basis.
Running level
The running level is the current functional level of the operating system. Levels are numbered 0 to 6, and each one serves a different purpose.
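Here are the different running levels (as defined on Red Hat and CentOS):
0 - halt
1 - single user mode
2 - multi-user mode without NFS
3 - full multi-user mode (text console)
4 - unused
5 - full multi-user mode with X11 (graphical login)
6 - reboot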
These levels are configured in the file /etc/inittab, which is the main file that the init program reads, and the scripts for the first services to run are placed under the directory /etc/rc.d. On most Linux distributions the startup scripts are all located in /etc/rc.d/init.d and are symlinked (with the ln command) into the directories /etc/rc.d/rcn.d, where n is the running level 0-6.
Setting services/applications to run at startup using chkconfig
chkconfig command (under Red Hat and CentOS)
Linux provides the command chkconfig to update and query system services at the different running levels, allowing you to set when certain processes are started.
Syntax:
chkconfig --list [name]
chkconfig --add name
chkconfig --del name
chkconfig [--level levels] name <on|off|reset>
chkconfig [--level levels] name
chkconfig has five functions: add service, delete service, list service, change startup info and check the start state of specified service.
Option overview:
--level levels
Specifies the running levels, given as a string composed of the numbers 0 to 6. For example:
--level 35 means to specify running level 3 and 5.
To stop the service nfs during running levels 3, 4 and 5, use the following command: chkconfig --level 345 nfs off
--add name
This option adds a new service. chkconfig ensures that the service has either a start (S) or a kill (K) entry in every running level. If an entry is missing, chkconfig creates it automatically from the defaults in the init script.
--del name
Deletes the service and removes the related symbolic links from /etc/rc[0-6].d.
--list name
Lists the state of all services at the different running levels. If name is specified, only that service is displayed.
Usage examples:
As an example, if you wanted mysql to run when the OS starts you just need to do the following:
/etc/init.d/mysqld must exist and needs to be executable (chmod +x)
Add mysql - chkconfig --add mysqld
Set the start levels - chkconfig --level 345 mysqld on
This example applies to any service. To validate that it worked you can use chkconfig --list | grep mysql to see the changes.
I had been looking for a simple way to get a thread dump from JBoss to see what was happening on each of the SpatialKey servers without actually logging onto any of them. After all, reading through thread dumps is one of my favorite pastimes. I found a simple way to do it that writes an HTML file containing the thread dump, which I can access by hitting the webserver directory at a hidden URL.
The command is rather simple to generate a thread dump:
I found a helpful command today for searching through many different files in Linux. I have a bunch of log files and want to find a certain occurrence of an error.
find ./ -type f -exec grep -H "string to search" {} +
With SpatialKey we create AMIs that are exact replicas of each other and can be scaled easily. All of the persistent content is stored in an EBS volume, which lets us deploy a new instance from a snapshot and makes backups easy with snapshots.
We recently moved our website for SpatialKey onto Amazon Ec2, which has worked great, but there has been one item that has bugged me. If I needed to make a few small changes to the site I was using vi to make the inline edits. For larger changes I have scripts to scp the files up to the instance, but the workflow for changes has become a pain.
I stumbled upon a great tool called ExpanDrive that allows me to use sftp (basically ssh) to manage my files remotely but leverage my Mac tools to do the editing. ExpanDrive provides a 30 day free trial, and it is worth the price: the package runs $39.95. You can find answers to nearly all of your support questions on getsatisfaction.com. Right after installing I ran into an issue and found the answer right away.
If you are using Ec2 then you will probably experience this issue as well, since Ec2 doesn't use a username/password combo but instead uses a cert/keypair for authentication. You need to use ssh-add to add the keypair; then you can use it in ExpanDrive.
To do this open up a terminal session and run the following command:
ssh-add /Users/myusername/myec2keys/id_my_keypair
Next I set up ExpanDrive with an empty password and it logged right in and mounted the drive on my Mac. I could then open up any of my Mac editing tools to edit on the server.
If you had a chance to read my getting started with Ec2 article, you will remember that I highlighted some of the challenges with deploying applications on the cloud. One of these challenges can now be easily overcome with a new feature recently provided on Ec2.
Elastic IP Addresses:
Elastic IP Addresses are static IP addresses designed for dynamic cloud
computing, and now make it easy to host web sites, web services and
other online applications in Amazon EC2. Elastic IP addresses are
associated with your AWS account, not with your instances, and can be
programmatically mapped to any of your instances. This allows you to
easily recover from instance and other failures while presenting your
users with a static IP address.
Availability Zones:
Availability Zones give you the ability to easily and inexpensively
operate a highly available internet application. Each Amazon EC2
Availability Zone is a distinct location that is engineered to be
insulated from failures in other Availability Zones. Previously, only
very large companies had the scale to be able to distribute an
application across multiple locations, but now it is as easy as
changing a parameter in an API call. You can choose to run your
application across multiple Availability Zones to be prepared for
unexpected events such as power failures or network connectivity
issues, or you can place instances in the same Availability Zone to
take advantage of free data transfer and the lowest latency
communication.
Every new addition makes Ec2 more attractive. In the coming months I
will be experimenting more with deploying a large scale application to
the cloud and will post some of my findings.
I was recently introduced to Amazon's new Ec2 services. The idea of cloud computing really intrigued me after I heard about it, so I decided to take the dive. There is a bit of a learning curve, but once you get started you realize the unlimited potential that cloud computing offers. Ec2 offers the ability to deploy pre-configured (Linux based) images called AMIs. The AMIs can be created from scratch or based on prebuilt versions that Amazon or other users have exposed. You can quickly deploy to several different types of machines depending on your requirements. The base system has a 1.7GHz Xeon CPU, 1.75GB of RAM, 160GB of local disk, and 250Mb/s of network bandwidth. Currently this will cost you $0.10 per computing hour plus bandwidth costs. You are only charged for the time that the virtual machine is running, and you can start and stop multiple instances at will to scale as you need to. There are also beefier 64-bit machines available at a higher cost. One limitation (depending on how you look at it) is that persistent storage is not offered on the instances. If an instance crashes at any time after you start it, you lose everything on it. There are ways to overcome this, as I will explain later, but it makes things a bit more challenging. I found that the simplest way to get started is to find a public AMI that meets your needs, make your modifications to the running instance, then save it as your own image in Amazon S3. S3 is another service that Amazon offers for storage; S3 and Ec2 work hand-in-hand with one another.
To get started you will need an account with Amazon Web Services at http://aws.amazon.com.
You will need to sign up for both Ec2 and S3. It does not cost anything up front, but you will need a credit card for them to draw funds from once you start using the service. One thing that took me a little while to get used to was the extensive use of certificates for authentication. Beyond signing in to your AWS account, nearly everything else with the Ec2 service uses certificates or private keys. You use them to start your instances, as well as to gain remote root access to an instance that you have started. It really makes things more secure. So let's get started. By the way, I recently switched from PC to Mac, so all of the instructions will be for the Mac, but they translate easily to the PC if you are familiar with Java.
Log into your AWS account (I am assuming you have already signed up for Ec2 and S3).
After you are signed in, click on the "Your Web Services Account" button and you will find the "AWS Access Identifiers" link.
Select the X.509 certificates link.
When you click on the "create new" link you will be asked to confirm; click yes and two files will be generated: cert-xxxxxxx.pem and pk-xxxxxxx.pem. These are the certificates I mentioned above that are used to authenticate you when any commands are issued to Ec2. There will be an additional keypair that we create later to launch your instances.
Now it is time to setup your machine to use the Ec2 tools.
Open the terminal, go to your Mac home directory and create a new folder named ~/.ec2
Copy the cert-xxxxxxx.pem and pk-xxxxxxx.pem files from above into your ~/.ec2 directory.
Unzip the tools and move the bin and lib directories into the ~/.ec2 directory as well. The directory should then contain the following:
cert-xxxxxxx.pem file
pk-xxxxxxx.pem file
The bin directory
The lib directory
Next you will need to set a few environment variables. To make things easier you can place these changes in your ~/.bash_profile file. If this file does not exist in your home directory you can create it, then add the following:
# Amazon Ec2 tools
export EC2_HOME=~/.ec2
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=`ls $EC2_HOME/pk-*.pem`
export EC2_CERT=`ls $EC2_HOME/cert-*.pem`
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home/
After making the changes you will need to reload your ~/.bash_profile by running the command:
source ~/.bash_profile
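Before moving on, it can save some head scratching to confirm that the directory really contains the four items listed above. The following is only a sketch; the function name check_ec2_home is hypothetical and not part of the Ec2 tools.

```shell
#!/bin/sh
# Check that a directory has the layout described above:
# a cert-*.pem file, a pk-*.pem file, and bin and lib directories.
check_ec2_home() {
    home="${1:-$HOME/.ec2}"
    status=0
    for d in bin lib; do
        [ -d "$home/$d" ] || { echo "missing directory: $home/$d"; status=1; }
    done
    for pattern in 'cert-*.pem' 'pk-*.pem'; do
        # ls fails when the glob matches nothing
        ls "$home"/$pattern >/dev/null 2>&1 || { echo "missing file: $home/$pattern"; status=1; }
    done
    return $status
}

# Uses EC2_HOME if it is set, otherwise falls back to ~/.ec2
check_ec2_home "${EC2_HOME:-$HOME/.ec2}" && echo "Ec2 tools directory looks complete"
```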
Now you are ready to start issuing commands to Ec2, list images and start instances. The first step is finding the image that is appropriate for your needs. You can test with the Amazon images that are available and customize them to your needs. To list all of the Amazon images type the following command.
$ ec2-describe-images -o amazon
IMAGE ami-20b65349 ec2-public-images/fedora-core4-base.manifest.xml amazon available public
IMAGE ami-22b6534b ec2-public-images/fedora-core4-mysql.manifest.xml amazon available public
IMAGE ami-23b6534a ec2-public-images/fedora-core4-apache.manifest.xml amazon available public
IMAGE ami-25b6534c ec2-public-images/fedora-core4-apache-mysql.manifest.xml amazon available public
IMAGE ami-26b6534f ec2-public-images/developer-image.manifest.xml amazon available public
IMAGE ami-2bb65342 ec2-public-images/getting-started.manifest.xml amazon available public
IMAGE ami-36ff1a5f ec2-public-images/fedora-core6-base-x86_64.manifest.xml amazon available public
IMAGE ami-bd9d78d4 ec2-public-images/demo-paid-AMI.manifest.xml amazon available public A79EC0DB
Out of this bunch you should find at least one suitable to test with; we will use the Fedora Core 4 machine with Apache from above. Before doing this we need a keypair to start the instance. This keypair will be used to gain root access to the instance through SSH after it is up and running.
To generate the keypair use the following command:
$ ec2-add-keypair ec2-keypair
This will create an RSA private key and output it to the screen. Copy the entire key, from -----BEGIN RSA PRIVATE KEY----- to -----END RSA PRIVATE KEY-----, and paste it into a new file named ec2-keypair in your ~/.ec2 directory.
This step is something that I missed at first and it frustrated me until I figured out what I was doing wrong. Before you can use this key to SSH to a running instance, the Ec2 tools require that you set permissions on the file so that only your account has access to it. You can do that with the command:
$ chmod 600 ec2-keypair
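Since this is the step I tripped over, here is a small sketch for confirming that the permissions took effect. The function name key_is_private is hypothetical, and it only accepts exactly the 600 mode set above.

```shell
#!/bin/sh
# Succeed only if the file is a regular file that is readable and
# writable by its owner and by nobody else (mode 600).
key_is_private() {
    # The first ten characters of ls -l output are the file type
    # and the permission bits.
    [ "$(ls -l "$1" | cut -c1-10)" = "-rw-------" ]
}
```

For example, key_is_private ~/.ec2/ec2-keypair && echo "key permissions OK" prints the message only when the file is locked down.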
Now we can boot up an Ec2 instance. We have chosen the ami-23b6534a image from above. You will use the following command to start the instance.
$ ec2-run-instances ami-23b6534a -k ec2-keypair
It will take a little while for your instance to start, but while you are waiting you can check on the status of the instance with the following command:
$ ec2-describe-instances
Once it is up and running you will see "running" as the status. Take note of the server addresses that this command provides, since they are the DNS addresses you will need to access your instance with a web browser or via SSH. They will be in the format of:
ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com - (Externally accessible DNS address)
domU-xx-xxx-xxx-xxx.compute-1.internal - (Internally accessible DNS address used from instance to instance)
The server instances are locked down pretty tight, and you will not have external network access to any of the instances by default. You have control over opening the ports, though, similar to controlling your own firewall. Network access is not configured per instance; instead you control it by groups. You can launch several instances in the same group and provide network access to that group. When you start an instance like we did above, it is started as part of the "default" group. We now need to open up network access for web traffic on port 80 and SSH on port 22 with the following commands:
ec2-authorize default -p 22
ec2-authorize default -p 80
You can now access your instance by opening up your web browser and entering your address http://ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com
Now you are ready to access the command line of the instance. This is where the private key that you created earlier comes in. You do not have a root password; instead you use the private key to authenticate yourself. You can access the instance via SSH with the command:
ssh -i ec2-keypair root@ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com
Now you are up and running with your instance. You can change whatever you want and add software to the Linux image. Just remember that your changes do not persist if you shut down the instance; they do persist across a reboot. After you have made all of the changes you want, you can repackage the instance as your own image and store it in the Amazon S3 service (LINK TO THESE STEPS)
Challenges of working with Ec2
You get a dynamic IP address each time you boot an image. There are solutions with DynDNS that are worth exploring.
There is no persistent storage if an instance fails. There are ways to overcome this limitation. So far I have worked with PersistentFS, which allows you to mount a bucket from S3 as a directory in your image.
You are limited by space in the image to 10GB (I think; I need to confirm). If you are going to store large files, I suggest putting them somewhere in the /mnt directory since that has a lot more space. Also, if you save the image, anything in the /mnt folder is not saved as part of the image, so you can put log files and other content that you don't want saved in this location.
Databases are a challenge with
limited options for persistence. Third parties are popping up offering
db hosting on the cloud so you don't have to manage it yourself. I will
explore these more in the future.
The future of scalable computing....
I really feel like cloud based solutions are the future for hosted solutions. Once you work out some of the limitations you can build a very scalable solution where you have automated scripts that launch new instances as you need to scale. In turn you can shut them down as the load decreases. There are overall architecture needs that have to be addressed to utilize an infrastructure like this, but it is all doable with a bit of ingenuity. Add in the fact that a small business does not have to invest a significant amount in hardware and software to start running on this type of solution and it is a no-brainer. The question of SLAs comes up, and I expect that to be an issue in the short term but solvable in the future.
I also used RightScale when I first got started with Ec2. They are a third party that puts a front end on managing Ec2 instances. It makes it a lot easier to get started and get your head around Ec2. All you need is an AWS account with Ec2 and S3 and you can get started with RightScale. You do not have to deal with all of the command line stuff above or the Ec2 tools.