Wednesday, June 30, 2010

Internal Dummy Connection

Today I overheard a conversation about this topic just a few cubicles away, and it reminded me of something I ran into quite heavily some time back in my Apache access logs. In my last assignment I was optimizing my server’s performance and investigating every Apache request that came to my server. Surprisingly, what I saw was a constant stream of requests for the main page of the default virtual host:

- - [22/Sep/2009:12:05:15 -0400] "GET / HTTP/1.0" 200 2269 "-" "Apache/2.2.8 (Red Hat) (internal dummy connection)"

I searched for what it was and found the actual idea behind it in the Apache documentation wiki.

Upon further research I came to know that the main Apache process apparently no longer contacts its children by sending them the SIGUSR1 signal, as in previous Apache releases. Instead it sends them a dummy request, "GET /", so that after handling the request they can check whether the configuration has changed and, if so, terminate themselves.

Now that we know what it is, let's try to understand the concerns around it:
  1. It causes unnecessary confusion and noise in the log file, which creates problems when we run a log analyzer.
  2. "GET /" actually hits the main page of the default VirtualHost, so it might incur the cost of dynamic content generation for /.
For both of the above points, if it is really an annoyance to grep these out while analyzing access logs, what we can probably do is direct the request to a valid page that takes very few resources.

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^.*internal\ dummy\ connection.*$ [NC]
RewriteRule ^/$ /nothing.html [L]

So the idea here is: since you can't keep Apache from sending all of these requests to itself, one thing you can do is respond to them in a manner that requires the lowest possible resources.
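If the goal is only to keep these entries out of a log analysis, they can also be dropped while reading the log. The snippet below is a minimal sketch of that idea, not part of the original setup: the function names and the second sample line are made up for illustration, and the only thing taken from the log above is the "(internal dummy connection)" marker in the user agent.

```python
import re

# Apache marks its self-requests with this suffix in the User-Agent field.
DUMMY_RE = re.compile(r"\(internal dummy connection\)", re.IGNORECASE)


def is_dummy_request(log_line: str) -> bool:
    """Return True if the access-log line is an internal dummy connection."""
    return bool(DUMMY_RE.search(log_line))


def drop_dummy_requests(lines):
    """Keep only the log lines that are real client requests."""
    return [line for line in lines if not is_dummy_request(line)]


# First line is the entry quoted in this post; the second is a hypothetical
# ordinary request added only so there is something left after filtering.
sample = [
    '- - [22/Sep/2009:12:05:15 -0400] "GET / HTTP/1.0" 200 2269 "-" '
    '"Apache/2.2.8 (Red Hat) (internal dummy connection)"',
    'client.example - - [22/Sep/2009:12:05:16 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
]

kept = drop_dummy_requests(sample)
for line in kept:
    print(line)
```

This only cleans up the analysis side. It does nothing about the cost of serving "/", which is what the rewrite rule above is for.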

One piece of advice here: if you are seeing too many child processes spawn and die, you may want to take a closer look at your MinSpareServers and MaxSpareServers directives, and your MaxRequestsPerChild. If MaxRequestsPerChild is very small (people usually do this to contain memory leaks in doubtful code or applications), this is quite expected.

So, in short, "Internal Dummy Connection" is usually not something to worry about much, but an odd pattern must be investigated, as it might hint at a badly judged MPM parameter setting that has gone live. Handling it can also help keep Apache out of a 'busy for nothing' scenario if the default page is really heavy with dynamic content!


Saturday, June 26, 2010

When Human becomes a SPOF...

A SPOF (Single Point Of Failure) is an element or indivisible entity in our system which, when it fails, leads to downtime or even an unwanted blackout. This essential element can be a firewall, a Power Supply Unit (PSU) in a data center, UPS systems, a backup system, Enterprise Single Sign-On (SSO) or sometimes a human being too! The rest are technical and seen in most IT/ITES or other organizations. The last one, a human being as a single point of failure, may not itself cause the failure, but it surely leads to a slower recovery after any outage. Nowadays many organizations conduct an exercise in the form of a Single Point of Failure (SPOF) analysis. A SPOF analysis is a systematic analysis of what can go wrong in our environment and what impact each failure can cause. It details the inter-dependencies and relationships among the major components in our environment. This analysis is really helpful for figuring out the crucial failure points, which can then be eradicated at the root with some serious collaborative effort by the various BUs, partners and so on.

Others do not practice this, perhaps due to time constraints. But that definitely limits the scope for a detailed analysis of where we are actually accumulating SPOFs that might result in an undesirable outcome. Today I am going to focus on one point: how and why an employee ends up taking the role of a savior in the organization in spite of having a very capable, ready-to-jump-in team. It happens because he is the person who normally has all the details, from minor to sensitive information. But is it really healthy? And can it be removed with much broader planning, right from the planning phase? For the rest of this post I will call this person, who is definitely a hero, Mr. Rocky! He is usually the company's "Go-To" guy. He knows each minor detail of how a device works, why this device was chosen and not another, and why it is configured this way and not the usual way. He is the guy your manager will always have to call when something goes wrong. Yes! The business needs experienced and serious people who can take ownership when something goes wrong. It is also completely acceptable that you become more important as you gain experience and skills. But when you start feeling too important and the business hinges on you, please be aware that you are setting yourself up to be the SPOF here.

Failure and Post Failure analysis:

Before any analysis it is important to understand: what constitutes a failure? This may seem a trivial question at first, but the term 'failure' is meaningless in the abstract. In whose opinion has a failure occurred? Your customers'? Or your company's? It is important to understand and define the perspective from which a lack of functionality will be considered a failure. Once we know we have a failure, the first thing is of course to get out of it. After that we can do an RCA, which can help figure out whether a human dependency or a technical malfunction of some equipment led to the outage. How much time did the recovery take? Was there a human dependency there too? Did Mr. Rocky get busy this time as well, having to deliver everything single-handedly in spite of having quite a big team who were mere onlookers while the crisis was on?

Let's look at some of the negative aspects of having a guy like this in our organization:

  1. He or she might take unhealthy control of the workflow, such as change management or system configuration that only he knows, while our organization is not so good at Wiki documentation.
  2. He suddenly decides to switch jobs at short notice or, going rogue, joins one of my competitors.
  3. He is on a Caribbean vacation and there is a crisis. By the time my sweet manager understands what he missed by NOT giving enough training to newcomers, we already have an outage.

Do we have a solution?

Yes! The human SPOF can be reduced, if not completely eradicated. The first step is to identify the important or crucial person in each team in the organization. Of course this may be a huge exercise depending on the situation, size and complexity of your organization. Then assign documentation to various members of the team, follow up regularly and get reports on a continuous basis. Also needed: documenting the job descriptions of all the people inside the team and their roles & responsibilities, implementing backup roles, cross-training your IT personnel and, most importantly, not having a team that is just the bare minimum.

This post would be incomplete if I didn't try to understand the viewpoint of Mr. Rocky! What led the people around him to a point where they discuss his role only when a crisis arrives?

'Rocky' has a team. But probably they are not contributors. They are NOT the self-starters they claimed to be in their resumes while applying for the job. They don't love or care about the product as much as he does. The 'Non-Rockys' work on the product until COB (sharp) only, and not beyond, displaying a sheer sense of professionalism. They have a personal life too, and they need a balance. 'Rocky' has one too, but you know, he is 'different'.

Mr. Rocky does it again!

Crisis, downtime, upgrade, maintenance, target, deadline and so much more, and on top of that he has a new team to train. He might be OK with it, but to start with he puts more emphasis on reading the company wiki pages and all the junk he has in the name of documentation, following the emails, watching out for upcoming events and attending team meetings, and probably only after that would he prefer to sit with them for some training. I can't guess here, but he must have some thought process behind that too! A crisis is the best time to teach new people in your team. But this time also 'Rocky' misses it, because he thinks he can get it done quicker if he does it himself. At that moment he is the only one driving this, and definitely living up to people's expectations! So his first priority is to get out of it, he decides to do it single-handedly, and he rocks again! But in the end, nobody else learns the job. The important thing to watch is: was the incident a good opportunity, a case study to document so many things that could in fact have helped the newcomers? Or was Rocky again too tired, or too lazy, to train, teach and document? Did the manager understand the importance of doing this exercise, or did he just relax after shooting off a "Kudos" mail? Or did Rocky take advantage of the point that "He can’t be fired and he knows it. And sadly, he probably needs to be fired, but can’t be."? I reiterate this point for business managers to consider seriously, taking a closer look with a top-down approach. Some people never build the redundancy a company requires. This is NOT an achievement to be paid off as a point Rocky makes during his annual appraisal, but a reminder to the management that the SPOF himself is declaring, bold and loud, who he is!

Rocky has a point too. He is hard-working, he loves what he does and does it with utmost care and deep involvement. His hard work also often pays off. He and the people around him are happy, along with the management. He is always on a helping spree. That's probably why he is always the My-Dear 'n' Go-To-Guy. On the other hand, we have a new, capable team too, thirsty for knowledge transfer and ready to jump in. Then why the hell does this SPOF still exist? Is he also suffering because of this? Probably yes.

What is actually Wrong?

What we see here is a capable team, which is so rare to get, always willing to jump in and ready to give their best when there is a call for it. But there is a serious disconnect in terms of teamwork and collaborative effort. To some extent this is situation-driven, but to some extent it is contributed by our relaxed business leaders, who have tremendous faith in their current rock stars, who sometimes forget even to lay the foundation of a much-needed bridge for knowledge transfer to other team members.

Moreover, the senior people, whom I have named Rock Stars in this post, should always be open to sharing with and mentoring the fellow team members who can fill the gap in their absence. And if there is any of that so-called 'command-and-conquer' mentality, better to get rid of it.

I think if we Don't 'Skip' these points, we can move 'Fast' towards removing this Human SPOF too! Watsay? :-)


Thursday, June 24, 2010

Windows 7 - Ultum Melior

First, to provide full disclosure: I earn my bread & butter by working with UNIX/Linux servers. But until I headed for the job market, I grew up on a steady diet of the Windows operating system. This helped me become well versed in just about every OS from Microsoft, and in any other you can imagine in the POSIX environment. I could not stop myself from writing about my first hands-on feel of this new release from Microsoft, which I just installed on my laptop.

On the desktop I am still a very avid and loyal user of Windows, and I guess I will continue to be for quite some time. Yesterday, though, I unexpectedly had to upgrade my system to Windows 7. I know it has been a while since Windows 7 was released to customers at the end of October last year. All this time I was still using Windows XP, but yeah! I too had started feeling that XP had fallen behind, unable to match the fast-evolving, rich hardware market. It has been more than 12 hours now, and I could test most of the new features.

Below is what I liked most in this new release:

Performance on this IBM laptop was surprisingly zippy and certainly superior to that of XP, which I had on it just some hours back. It gave me the login prompt almost instantly, after which Explorer showed up in almost no time. Aero worked like a charm, windows and dialog boxes appeared quickly, and I experienced no slowdowns. The Control Panel and its applets opened nearly immediately, without the delays which were common in XP.

The most noticeable change is the new taskbar, which replaces both the old Quick Launch bar (for launching applications) and the old taskbar (for switching among running windows). The new taskbar combines the two features, doing double duty as a task launcher and task switcher, similar to the Mac OS X Dock. In general, it succeeds admirably.

The searchable Start menu alone shows how much it deserves the accolade. This search gives much faster results, and across multiple kinds of items. For example, a search for the word "Value" will show results from email, Word documents, the media store, or wherever it finds a match. For the Windows faithful, it’s been a tough eight years, and I felt that Microsoft has finally released a successor worthy of Windows XP.

Another core enhancement to the OS comes in the form of Jump Lists. When you right-click an application's icon in the taskbar, a menu appears of actions associated with that application, and the list varies according to the application. The Desktop Search and Explorer enhancements are also much appreciated. Coming from Windows XP, Windows 7 will feel like a revelation from the glossy future.

Troubleshooting and monitoring of services is quite good and gives much better detail. Resource monitoring (resmon) is almost a complete revamp.

Large icons on the taskbar are used to launch applications, as well as to switch to different windows running in those applications.

Another feature I liked was Aero Peek, a kind of X-ray vision used to view the Desktop temporarily without minimizing all open windows or applications. This is particularly useful when we just want a quick look at the Desktop and are too lazy to minimize all the windows. Moreover, switching among windows using Alt-Tab has been improved by combining it with Aero Peek. When you use Alt-Tab to cycle through your open windows, you still display the window that you've tabbed to, but you also peek through to the underlying desktop, along with outlines of any other open windows, just as you can with Aero Peek.

I won't say it is a multimedia powerhouse yet, but there are good improvements on that front too. There is a good enhancement to the UI. A common annoyance with earlier WMP versions was not having the right codec, and this looks much better with WMP12. Also, with Windows Media Center, your PC is now a powerful TV.

While it addresses a number of nagging issues, this latest OS from Microsoft delivers a truly next-generation interface with a very handy, nice look and feel and faster computing that will definitely transform the way we use our computers. So to summarize: Windows 7 has focused on usability, user experience and overall feel.

But I know that instead of appreciating it, there will still be some people who complain about various other things. My say is: let's love it for what it is, not for what each of us wants it to do just for me. It's hard to sell a product that 100% of people are going to like and that functions exactly to everyone's liking. Who knows, it might even make its entry in the next version of the "Every OS Sucks" video!

Frankly speaking, Windows 7 indeed wowed me. I guess it's Microsoft’s best yet! - Ultum Melior.


Tuesday, June 22, 2010

Linux - SSH Login Slow

I am just going to shed some light on one aspect which I recently saw on one of the servers. But I strongly feel that one should have a very good understanding of how SSH works before even starting to troubleshoot anything on SSH.

Recently I got a complaint about one server 'being slow'. I just logged in to the server and went to grab a cup of coffee. I came back to my seat and did some general health checks, log checks etc., but could not find the box behaving oddly. I replied to the guy asking him to recheck whether things had settled down, and I logged out. Surprisingly enough, he complained again that the 'server is slow'. I logged in again. What I saw was that the server was taking some 15-30 seconds to give the shell after you provide the password. And that was what the user meant by 'server is slow'. When you actually understand the problem (WHAT), the solution (HOW) is easy!!

Luckily my first suspicion, the network settings, was correct. DNS was off my radar as I was connecting with the raw IP and not a hostname. Yes! I was correct! It was a wrong gateway issue.

On the problematic server I tried to figure out the default GW and, surprisingly, I found two entries there:

[root@server5 ~]# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
                            U     0   0      0    eth0
                            U     0   0      0    eth0
                            UG    0   0      0    eth0
                            UG    0   0      0    eth0

[root@server5 ~]# ping
PING ( 56(84) bytes of data.

--- ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms

So the first GW itself is wrong and does not exist.

[root@server5 ~]# ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=255 time=1.48 ms
64 bytes from icmp_seq=2 ttl=255 time=1.53 ms

I quickly validated the correct GW on some other server and found that neither of these two is actually a valid one, which our network team confirmed. So I decided to change the default GW on the problematic server.

[root@server5 ~]# route del default

[root@server5 ~]# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
                            U     0   0      0    eth0
                            U     0   0      0    eth0
                            UG    0   0      0    eth0

[root@server5 ~]# route del default

[root@server5 ~]# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
                            U     0   0      0    eth0
                            U     0   0      0    eth0

Added a new valid gateway:

[root@server5 ~]# route add default gw

[root@server5 ~]# route
Destination Gateway Genmask Flags Metric Ref Use Iface
            *               U     0      0   0   eth0
            *               U     0      0   0   eth0
default                     UG    0      0   0   eth0

SSH login looks OK now.

I did not write this post to end here. Rather, I wanted to analyze what blockade the wrong GW had posed to make the login process slow, revisit how SSH actually works, and see how important it is to have the right gateway even though you might have multiple (kind of) capable GWs in your network. I was accessing this server from outside. What I mean is that for the return packets to flow back, the default gateway was needed, as I was on another subnet/network. So this server was actually trying to get through the first gateway, which was giving us an RTO on ping. The second one was somehow capable, but it had a NAT to an ISP vendor whose contract had already expired and which could have gone away any day, and somehow the right gateway had never been added to this server. I could not test and validate it, BUT I am almost sure that this issue would not be seen when doing an ssh from the local network.

Earlier I found the slow SSH login issue on two more occasions: once it was an X-Forwarding issue and the other time a DNS look-up issue. Do leave a comment below if you have faced this issue because of some other factor.
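The first check in this story, spotting more than one default gateway in the routing table, is easy to script. Below is a rough sketch that parses `netstat -rn` style output. The function name is mine, and the addresses in the sample are made-up documentation-range values, not the real ones from this server.

```python
def default_gateways(netstat_output: str):
    """Return the gateway of every default route in `netstat -rn` output.

    A default route has destination 0.0.0.0 and carries the G (gateway) flag.
    """
    gateways = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Route rows have 8 columns:
        # Destination Gateway Genmask Flags MSS Window irtt Iface
        if len(fields) != 8 or fields[0] != "0.0.0.0":
            continue
        if "G" in fields[3]:
            gateways.append(fields[1])
    return gateways


# Hypothetical routing table with two default routes (addresses are assumptions).
SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         192.0.2.1       0.0.0.0         UG        0 0          0 eth0
0.0.0.0         192.0.2.254     0.0.0.0         UG        0 0          0 eth0
"""

found = default_gateways(SAMPLE)
if len(found) > 1:
    print("more than one default gateway:", ", ".join(found))
```

Each gateway it reports still has to be pinged and confirmed with the network team, as I did above. The script only tells you that there is something to look at.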


Sunday, June 20, 2010

19th June - Happy Father's Day

I forgot to wish my father this year too. And I am still trying to console my heart by saying, "What! Every day is a father's day; the everlasting respect and utmost care is what matters". Yes! I have that. It is not the flesh and blood but the heart that matters. Fathers are often the unsung heroes in our lives. So a special day honoring fathers and celebrating fatherhood and paternal bonds is valid, needed and very significant. Probably that's how the idea came up to create a special 'Father's Day'! They rarely get their fair share of adulation. This thankless position of the father in the family, the provider for all, definitely calls for a day to celebrate. Wheeee!

Until my college days I was kind of a rebel against my father. But later I realized that his intention was NOT to discourage me but rather to introduce me to, and make me aware of, the outside world. Now I realize how important his all-round view of life was in making me so 'self-sufficient'. It is the father who has to give a direction in life to his children, or else a mother's over-protectiveness can sometimes almost kill the child's budding creative skills. This is a very crucial juncture, where we as children want our own identity and start thinking independently, and the desire to be autonomous intensifies so much. During this period, when the father with his diverse exposure and experience warns us against the odds, we feel very suppressed.

I always loved my father, because he was always so caring and loving towards my mother. I think that is the greatest gift a father can give his children. Even the poorest man has the capability to leave his children the richest of inheritances. Haven't you noticed that as a child, when you received a gift from your father, you both laughed, BUT when you gifted him something, probably with the first paycheck of your career, both of you cried? We too cried. Instead of raising investment figures, my father always emphasized investing in us, so that we became capable human beings. This generous soul, who possesses the wisdom of ages, is what a Dad is made of. The comforting arms at night when we were kids, and the protective nature demonstrating the power of an eagle's flight, with tremendous wisdom, is what we call "Father".

Happy Father's Day Dad! You are too good! It's your day out - Enjoy! We Love You...

~Debajit Kataki

Friday, June 18, 2010

Clearing Unused Apache Semaphores

In UNIX systems, semaphores are a technique for coordinating or synchronizing activities in which multiple processes compete for the same operating system resources. In simple terms, a semaphore is a type of inter-process communication (IPC) resource used for synchronization and mutual exclusion between asynchronous processes.

On a high-load web server (mostly with the Worker MPM), we sometimes come across an error message like

[emerg] (28)No space left on device: Couldn’t create accept lock

or

[crit] (28)No space left on device: mod_rewrite: could not create rewrite_log_lock
Configuration Failed

or

[error] (28)No space left on device: Cannot create SSLMutex
Configuration Failed

leading to a situation where Apache refuses to start.

We will come to the solution later. First, let's try to understand why this happens.

What Apache is telling us here is: "I can't start, as I need to write some things down before I can start, and I have nowhere to write them!" Technically, what it means is that the system has run out of semaphore arrays. Sometimes it is full of legitimate semaphores; at other times it is because some application has leaked semaphores and hasn't cleaned them up during shutdown (which is usually the case when an application segfaults). As a rule, all semaphores that have been created should be cleared. System V semaphores that are not cleared stay allocated in the kernel even after the process that created them has ended, until they are removed explicitly or the system is rebooted. Only the owner or creator of a semaphore set, or root, can remove it. We will be clearing 'only' the semaphores which went stale when our Apache went haywire, after which we will also restart Apache.

Coming back to the error resolution: please remember that if this happens, we basically need to check three things:
  1. Check your disk space
  2. Review filesystem quotas
  3. Clear out your stale semaphores

Since the discussion is about how to clear unused semaphores, I will focus on point No. 3.

We can use the ipcs and ipcrm commands to tackle this issue.

$ ipcs -s

------ Semaphore Arrays --------
key semid owner perms nsems
0x000ca001 163840 root 666 1
0x000ca016 20086800 nobody 600 1
0x000ca017 20086803 nobody 600 1
0x000ca018 20086804 nobody 600 1
Only for the Apache semaphores, grep for the user Apache runs as. In my case the Apache user is 'nobody', so:
# ipcs -s | grep nobody

Next you have to figure out which ones are dead and remove them:
# ipcrm -s 20086804

Or, if you see a bunch of them, you can simply fire the below command:
ipcs -s | grep nobody | perl -e 'while (<STDIN>) { @a=split(/\s+/); print `ipcrm -s $a[1]`}'
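If you would rather look at what is going to be removed before firing ipcrm, the same filtering can be done as a dry run. This is only a sketch under my own assumptions: the function name is made up, and it works on captured `ipcs -s` text (the sample listing from above) instead of calling ipcs itself. It just prints the ipcrm commands and runs nothing.

```python
def semids_owned_by(ipcs_output: str, owner: str):
    """Return the semaphore IDs in `ipcs -s` output that belong to `owner`."""
    semids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows look like: key semid owner perms nsems (key starts with 0x).
        if len(fields) == 5 and fields[0].startswith("0x") and fields[2] == owner:
            semids.append(int(fields[1]))
    return semids


# The listing shown earlier in this post, used here as test input.
SAMPLE = """\
------ Semaphore Arrays --------
key semid owner perms nsems
0x000ca001 163840 root 666 1
0x000ca016 20086800 nobody 600 1
0x000ca017 20086803 nobody 600 1
0x000ca018 20086804 nobody 600 1
"""

# Dry run: print the commands for review instead of executing them.
for semid in semids_owned_by(SAMPLE, "nobody"):
    print(f"ipcrm -s {semid}")
```

Printing first is a deliberate choice. On a box where other applications run as the same user, you will want to review the list before removing anything.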

You may want to increase your available semaphores, and you'll need to tickle your kernel to do so.
Add this to /etc/sysctl.conf:
kernel.msgmni = 1024
kernel.sem = 250 256000 32 1024

And then run sysctl -p to pick up the new changes.
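The four numbers in kernel.sem are easy to mix up. As far as I know they are, in order, SEMMSL (max semaphores per array), SEMMNS (max semaphores system-wide), SEMOPM (max operations per semop call) and SEMMNI (max number of arrays). Here is a small sketch that labels them. The class and function names are mine, and the value parsed is the one from the sysctl.conf line above.

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # max semaphores per array
    semmns: int  # max semaphores system-wide
    semopm: int  # max operations per semop() call
    semmni: int  # max number of semaphore arrays


def parse_kernel_sem(value: str) -> SemLimits:
    """Parse the four whitespace-separated fields of kernel.sem."""
    parts = [int(p) for p in value.split()]
    if len(parts) != 4:
        raise ValueError("kernel.sem needs exactly four numbers")
    return SemLimits(*parts)


limits = parse_kernel_sem("250 256000 32 1024")
print("semaphore arrays allowed:", limits.semmni)
# SEMMNS is normally kept no larger than SEMMSL * SEMMNI.
print("semmns fits semmsl * semmni:", limits.semmns <= limits.semmsl * limits.semmni)
```

For the "No space left on device" error it is usually the last field, the number of arrays, that has run out.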


If you see 'Cannot create SSLMutex - Configuration Failed', then along with the above you can let the SSL module pick the "best" semaphore implementation available to it. If your configuration currently has:

SSLMutex file:logs/ssl_mutex

Change it to :

SSLMutex sem

More options are described in the mod_ssl documentation for the SSLMutex directive.

Hope this helps you get a better understanding.


Monday, June 14, 2010

Is Monday Productive?

It's Monday again. My creative juices just aren't flowing at all: crazy alert mails, personal email, fancy commandments, the plan for the week, time allocation for upcoming stuff etc. etc. I hear people calling it 'boring Monday'. But is it really so non-productive, or is it the weekend hangover, or just a mere mindset!!

I asked some expert friends of mine (experts in time management, that is; they do a lot of courses on the topic). Their advice was to make a "To Do List" on Friday before you leave the office for the weekend. Dude! How can I! I always have a fever on Fridays. For me it's a combination of not enough breaks during the working day, eating on the hoof, too much excitement and long working hours throughout the week that causes the burnt-out feeling.

For me Monday is the most unproductive day of the week. I feel Monday is a day when people are in a general slump, feeling droopy and off schedule from staying up late on the weekend, and not thrilled about going to work at all. People are unmotivated and tired. Additionally, although you may do housework, run errands or do other tasks that may be considered productive over the weekend, these activities are not a likely choice after a hard day's work or a long day of classes on a Monday. On Mondays everyone is still grumpy and sad about being back at work, school or wherever (at least I am!).

For me, Thursday is the most productive day. This is the day when people (like me!) realize that they need to tighten their belts, get their act together and speed up the things that are due this week, or finish stuff to avoid working over the weekend and get home on time on Friday to enjoy the weekend!

Yet another day, in the form of Monday. After writing this much I am feeling a bit better, so let me scan some more emails. The past must die for the future to be born... So 'shout it a bit loud': Get Set Go!

Wish you all a nice week ahead :-)



Saturday, June 12, 2010

Tuning FireFox

I don't exactly remember 'when' I started using FireFox, but I still remember 'why' I did. At the very first sight I fell in love with its rich set of add-ons. Also, it's free, portable and, most importantly, tunable!

In my previous role as a webmaster, I loved add-ons like Firebug, HttpFox, IE Tab, PageSpeed, webDeveloper, colorzilla, Load Time Analyzer, SEO, YSlow, DownloadHelper and session-manager, to name a few.

While playing around a lot, I once saw some kind of memory hogging happening, and sometimes even my Explorer hung, which I couldn't get rid of until I rebooted my laptop. I searched our 'Brain-cloud' (Google!!) about this behavior to see whether it was a local issue of mine or someone else was also facing it. Upgrading my browser helped things settle down, but I came across a few configurable parameters in FireFox which might help some non-novices.

Here are some tips which will make your browsing a little snappier. Launch Firefox and type "about:config" in the address bar, which brings you to your personalized configuration list.

1. Restore your session when you reopen your browser (not when crashed) = 3

2. Address bar auto completion
browser.urlbar.autoFill = true

3. Download only what you specifically click (by default Firefox prefetches links which it thinks you want)
network.prefetch-next = false

4. Stop that RAM hogging
RAM size 128MB - 512MB :
browser.cache.memory.capacity = 5000
RAM size 512MB - 1GB :
browser.cache.memory.capacity = 15000

5. Network pipelining, so that multiple requests can be sent before any response is received.
network.http.pipelining.maxrequests: we can change the value from 4 to a higher number, anywhere from 10 to 30. I set mine to 30.

6. Maximum number of connections

The number of connections Firefox can make to servers will impact the speed at which it can retrieve information. However, if this is too high, it will slow the application down as it tries to manage all the connections. The network.http.max-connections setting controls the total number of connections that can be maintained at any one time. The default setting is 24. The recommendation is to set this to 32 for dial-up, and for broadband to take 32 and add 2 for every 1 Mbps of connection speed you have (so for a 2 Mbps connection, set it to 36). You may experiment with this setting to find an optimum for your specific configuration.
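The rule of thumb for network.http.max-connections is simple arithmetic, so here it is as a tiny sketch. The function name is my own, and I assume only whole Mbps count towards the extra connections.

```python
def recommended_max_connections(connection_mbps: float = 0.0) -> int:
    """Rule of thumb from the text: 32 for dial-up, plus 2 per whole Mbps."""
    base = 32
    return base + 2 * int(connection_mbps)


# Dial-up, then a few broadband speeds.
for speed in (0, 2, 8):
    print(speed, "Mbps ->", recommended_max_connections(speed))
```

Treat the result as a starting value for your own experiments, not a hard rule.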

7. Open search-bar results in new tab:

When you use the search bar, the results display in the current tab. This can be a nuisance because it navigates you away from your current page. To make sure Firefox always opens search results in a new tab, search for openintab in about:config and change the matching preference to true.

8. Disable IPv6 DNS lookups

You should now have an idea of how these settings can impact the performance of Firefox. Dig more to figure out how you can make Foxy Rock!


Sunday, June 6, 2010

Planning a Maintenance Window

Do you feel the heat when you sense a maintenance window zooming your way? I don't think that is unnatural. Sometimes I do too. Maintenance is inevitable. I hope you will all agree that 'the root cause of unplanned outages is an environment that is not up-to-date'. After a recent experience of mine, I thought I must share how this can be done better with some simple precautions.

I always feel that a production maintenance window, whether for data center network maintenance, a DB migration or anything else that directly or indirectly affects your customer, should be meticulously planned well ahead. I personally feel it must be planned for off-hours; it requires challenging coordination between different activities and often personnel from different departments. An off-hours window always helps to buy some more time, which gives a 'free cum extra' feeling!! If you have multiple offices that are geographically apart, do try to draw their working hours on one diagram and take the intersection of the times when no one works.
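The "intersection of the times when no one works" can also be worked out with a few lines of code instead of a diagram. The sketch below is only an illustration: the three offices, their 9-to-18 working hours and their whole-hour UTC offsets are made-up assumptions, and the function name is mine.

```python
def off_hours_utc(start_local: int, end_local: int, utc_offset: int) -> set:
    """UTC hours (0-23) in which an office working start..end local time is closed."""
    working = {(hour - utc_offset) % 24 for hour in range(start_local, end_local)}
    return set(range(24)) - working


# Hypothetical offices: (opening hour, closing hour, whole-hour UTC offset).
offices = {
    "Office A": (9, 18, 5),
    "Office B": (9, 18, -5),
    "Office C": (9, 18, 0),
}

# Start with the whole day and keep only the hours every office is closed.
common = set(range(24))
for start, end, offset in offices.values():
    common &= off_hours_utc(start, end, offset)

print("UTC hours when no office is working:", sorted(common))
```

Real offices may have half-hour offsets, shifts or different weekends, so take the output as a first cut to discuss with the stakeholders, not as the final window.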


Maintenance is an activity which is aimed to mitigate future risks, we must also be aware of some delicate long lingering issue which may in turn present us, some other threat which hampers the successful completion of this current activity. The idea would be, draw a flow chart and expect everything to fail, and believe me, you will cover most of the points which might need a time consuming attn. and additional resource in the course of the event.

  1. Communication, clear communication, well in advance. Do loop in all the stakeholders, and send timely, gentle reminders until you actually start.
  2. Do test your backup/fail-over to confirm that it actually works.
  3. Do send a reminder email just before it starts (at least 15-30 min. ahead).
  4. Minimize the downtime as much as possible. Do NOT involve tired (human) resources to drive it.
  5. Make sure your maintenance plan includes all the pre/post activities and provides an estimated time for each step (allow more time than needed, and add risk time too). This will allow you to estimate the total time and set expectations.
  6. Test, test and test; release a completion email only when you are sure things are working.
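The estimate from point 5 is simple arithmetic, and writing it down makes the padding explicit. This is just a sketch: the step names, the minutes, the 25% padding and the 30-minute risk reserve are all assumed values for illustration, not recommendations.

```python
import math

# Hypothetical plan: (step name, estimated minutes). All numbers are made up.
PLAN = [
    ("Pre-checks and backup verification", 20),
    ("Stop services", 10),
    ("Apply change", 45),
    ("Start services", 10),
    ("Post-checks and smoke tests", 30),
]

PADDING = 0.25     # assumed: allow 25% more than the raw estimate for each step
RISK_MINUTES = 30  # assumed: flat reserve for rollback or surprises


def window_minutes(plan, padding=PADDING, risk=RISK_MINUTES):
    """Total window length: each step padded and rounded up, plus the risk reserve."""
    padded = sum(math.ceil(minutes * (1 + padding)) for _, minutes in plan)
    return padded + risk


raw = sum(minutes for _, minutes in PLAN)
print(f"Raw estimate: {raw} min, window to announce: {window_minutes(PLAN)} min")
```

Announcing the padded figure rather than the raw one is what sets realistic expectations with the stakeholders.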

Once we had a network maintenance. The mail was pretty poetic: very well planned, lots of points, a perfect backup strategy. But the point that was NOT highlighted was that normal phone lines would also be shut down in this window (and I missed that point, which made the experience worse), and there was no mention of who was driving it, who the SPOC to call was in case of any issue, or what the war room phone number was, if any!! I was working on a DB migration and suddenly got kicked out of the meeting place, the VPN, and everything else, one after another. I had seen their communication well in advance, but the backup plan to connect to some other VPN did not work for me, and there was no contact info saying whom to reach for this. Since the VPN was down, there was no way I could see the office directory, and I faced a total blackout for a good amount of time. The point here is about "contact information", being responsive, and quick follow-up.

The minimum information that needs to be provided in an email, to let the world know about the activity, can be:
  1. A brief on the ‘what’ and ‘why’ of this activity
  2. BU detail
  3. Start time (with time zone)
  4. End time (with time zone)
  5. Related ticket no. (if any)
  6. Maintenance contact person details
  7. Expected behavior/impact
  8. Fail-over detail (for details you may like to provide a separate URL).
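One way to make sure none of these fields is forgotten is to generate the announcement from a small template. The sketch below is only an illustration; the class name, field names and every sample value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceNotice:
    """The eight fields listed above; every field is required."""
    summary: str        # what and why
    business_unit: str
    start: str          # include the time zone
    end: str            # include the time zone
    ticket: str
    contact: str        # person driving the activity and how to reach them
    impact: str
    failover: str       # short text or a URL with the details

    def render(self):
        """Return the notice as plain text, one labelled line per field."""
        rows = [
            ("What/Why", self.summary),
            ("BU", self.business_unit),
            ("Start Time", self.start),
            ("End Time", self.end),
            ("Ticket", self.ticket),
            ("Contact", self.contact),
            ("Expected Impact", self.impact),
            ("Fail-over", self.failover),
        ]
        return "\n".join(f"{label}: {value}" for label, value in rows)


# All values below are placeholders.
notice = MaintenanceNotice(
    summary="Core switch firmware upgrade to fix a known bug",
    business_unit="Example BU",
    start="2010-06-05 22:00 IST",
    end="2010-06-06 02:00 IST",
    ticket="TICKET-0000",
    contact="Jane Doe, mobile +00 0000 000000 (works even if office lines are down)",
    impact="VPN, office phone lines and meeting rooms unavailable",
    failover="Secondary VPN endpoint, details on the internal wiki",
)
print(notice.render())
```

Because every field is required, leaving out the contact person raises an error before the mail is ever sent.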
But I strongly feel that maintenance windows should not be such a painful process, especially nowadays with advanced cluster/cloud capabilities, improved storage availability and resilient network architecture. Organizations can now plan a maintenance window to be a relatively safe process with minimal risk, given proactive and clear communication.



Thursday, June 3, 2010

Does Change of IP Cause Low SERP Rank

This is of course a very bothersome question rattling in our minds when we are planning a DNS change and have already spent so much time continuously improving our ranking. Yes! People want to know: "Does changing the IP address of my site affect my ranking?" Based on my study and observation, I would say it does NOT. But yes, there may be some other related factors in play while doing so. Plenty of material is readily available just a click away on Google, but here I will just discuss what I have understood from my own experience on the subject.

Before considering an IP address for your domain, you should have a checklist to decide whether that IP should be allowed to map to your website. If it's a web-hosting company, make sure the IP is NOT banned by the various search engines due to possible spamming or some other such notorious activity.

If it is your own company, make sure this IP was not previously used for some email server or crawler, leaving you still fighting to restore its reputation in the market.

If it is a web-hosting company, you also need to check the uptime of their servers/services, as it matters a lot when a spider or crawler finds the site down while trying to access it. Also, some web-hosting companies block spiders to save bandwidth. For this, you can ask the web-hosting company to provide some sample sites and check their ranks in Google or any other search engine.

After doing a good number of migrations, let me jot down a few very important points that might help:

1. Do not change any URL. First migrate or cut over as-is. This is true for any migration. "Architecture change and migration should NOT be combined" - that's what I believe.

2. Make sure the site does not go down often, at least during the change window; otherwise people around you will have mixed reactions while identifying the cause of a lower SERP rank. That's what I faced in my career. RCA is time consuming.

3. Keep the site available on the older server for some time as well. You can decide until when by watching the access logs.

4. Make sure all the links on your site are healthy and open correctly. There is a free online tool for this.

5. Test...and.. test! After site launch, run a “” search on Google and click on the links to be sure that everything is behaving just the way you and your end users expect.

6. The change of DNS to point to your new web host is the actual crux, I would say. With a short TTL, you can pull a data center’s IP address out of rotation in just a few minutes. This is crucial for sites that keep a very high TTL: the old TTL has to be accounted for well in advance and reduced prior to the DNS cut-over. Doing so helps everyone move to your new IP address in short order, instead of a mish-mash where some people keep using the old IP address for hours, and it puts you in much better shape.

7. Wait for the DNS changes to propagate through the web (it's like a wave!). You can use the “dig +trace domain” command in Linux/Unix. The “+trace” option tells dig to start at the DNS root servers and follow the delegation chain itself rather than rely on a cached answer. Once the changes are visible, you can just wait until the TTL expires. Remember that DNS is cached at each level, so even if you clear your own cache, your ISP has probably cached the previous IP address until the TTL expires.

8. Once you are sure users are fetching from the new web host/IP address, you’re done. You can probably shut down the web service hosting the previous version of the site. You might have to spend some time checking for inlinks from other sites and making sure that they are still functioning. If necessary, you may need to set up 301 redirects. Access logs on your site would be a good place to start.
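The TTL timing from points 6 and 7 comes down to two subtractions. The sketch below assumes well-behaved resolvers that honor TTLs; the function name, the dates and the TTL values are made up for illustration. A resolver that cached the record just before you lowered the TTL may keep it for the full old TTL, so the TTL has to be lowered at least that long before the cut-over.

```python
from datetime import datetime, timedelta


def ttl_plan(cutover, old_ttl, new_ttl):
    """Return (latest time to lower the TTL, earliest time caches should hold the new IP).

    old_ttl and new_ttl are in seconds. Assumes resolvers honor TTLs,
    which is not guaranteed in practice.
    """
    lower_ttl_by = cutover - timedelta(seconds=old_ttl)
    caches_expired_by = cutover + timedelta(seconds=new_ttl)
    return lower_ttl_by, caches_expired_by


# Hypothetical values: a one-day TTL reduced to five minutes before the cut-over.
cutover = datetime(2010, 6, 5, 2, 0)
lower_by, expired_by = ttl_plan(cutover, old_ttl=86400, new_ttl=300)
print("Lower the TTL no later than:", lower_by)
print("Old IP should stop receiving traffic after:", expired_by)
```

Since some resolvers and clients hold on to addresses longer than the TTL allows, the second timestamp is a lower bound; the access logs on the old server remain the real signal for when it can be retired.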

And after you are done, use some small Google shortcuts in your browser to check that things are as expected. Also, the Google Webmaster Blog is a place you should bookmark for new trends and ideas.

No one would do it, but still: if you are in the same DC and some unwanted change in physical layout calls for a DNS change, you should never do it at the DNS level; rather, NAT the public IP of the site to the new load balancer VIP, which avoids this entire headache.

That's it. Believe me, with these few precautions, a change of IP will not cause a low SERP rank.


