Archive for July, 2007

rmfo Forum Update

Tuesday, July 31st, 2007

1049 CDT: I think I’ve narrowed down the problem to some missing tables. I’ll have Geof do another restore of the database to see if we can’t get this resolved yet. I can almost guarantee that this will happen after 1700 CDT (that’s 5pm Central Time) at the earliest.

1151 CDT: I just validated the issue that Mark was seeing. Indeed, the backup made at 1000 CDT on Sun 29 Jul was truncated.

1334 CDT: I’ve got a valid SQL package from Sunday morning around 0300, so I’m gonna try that …

1346 CDT: Nothing I can do about it from here. I’ll have to try tonight when I’ve got no other constraints.

1414 CDT: The forum is back up & running.
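
For anyone curious how a truncated SQL backup like the one in the 1151 entry can be spotted quickly: here is a minimal sketch, not what was actually run on the server. It assumes the backup is an uncompressed, plain-text `mysqldump` file written with comments enabled, in which case the last line normally begins with `-- Dump completed`. The function name `dump_looks_complete` and the `tail_bytes` parameter are made up for illustration.

```python
import os


def dump_looks_complete(path, tail_bytes=4096):
    """Return True if the last non-empty line of the file starts with the
    '-- Dump completed' marker that mysqldump writes when it finishes.

    Assumption: a plain-text dump with comments enabled. A compressed dump,
    or one made with comments turned off, would need a different check.
    """
    with open(path, "rb") as f:
        # Only read the end of the file; dumps can be very large.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        tail = f.read().decode("utf-8", errors="replace")

    # Ignore trailing blank lines and look at the final real line.
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    return bool(lines) and lines[-1].startswith("-- Dump completed")
```

A dump that stops partway through a table will fail this check, which is a cheap thing to test before attempting a restore. Passing it does not prove the dump is otherwise valid.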

Done for Today

Monday, July 30th, 2007

Anything I try to do to the server at this point is going to have to be undone at a later date, so I’m going to quit while I’m ahead. :) I’m enabling backups [and moving the restore files to a new directory so that they can't be overwritten] and going to bed soon. Neither Mark nor I can figure out why the Rumor Forum won’t connect properly—but all the data is there, so it is not a crash. Ahem.

Now I will work on easing into blissful slumber…

Restoration Log

Sunday, July 29th, 2007

Okay, time to post this one-by-one as I bring stuff back online. Yes, folks, this is where you will learn what all is actually on the server. ;)

0007 CDT: allabouthem.com is back online.

0010 CDT: andrewosenga.net is back online.

0015 CDT: BrassLantern.org is back online. Stephen, let me know if your subdomains came back okay, would you? Looks like you may have moved everything into your root and out of public_html? Kinda confused.

0018 CDT: bryanallain.com is back online.

0026 CDT: [caedmonscall.net] is back online.

0028 CDT: canspice.org is back online. Brad may now resume posting about OSCON 2007 without any interference from my imposed downtime. ;)

0041 CDT: casademorrill.com is back online. I swear, I’m not going from west to east, I’m going alphabetically. It just sorta looks west-to-east, at least amongst my geek friends.

0043 CDT: ChristyNockels.com is back online, not that I’ve been paid for hosting in months. [Not that they have anything up but a splash page, either.] It’s only up as a favor to DaveJac at this point.

0045 CDT: collidingrhinos.com is up, not that we’re doing anything with it.

0047 CDT: Corner Table Online is back online.

0050 CDT: davejac.net is back online.

0052 CDT: donmillerfans.net is back online. I can’t wait to take crap from the Rumor Forum folks for getting that back up first!

0059 CDT: [derekwebb.net] is back online.

0100 CDT: dougmorris.net is back online, although I should take it back down for that SCREAMING LOUD AUDIO IN THE FLASH INTRO. Good God, man, just because you can do something doesn’t mean that you should. ;) [He's my brother, I can give him crap.]

0101 CDT: Doug’s Place is back online.

0104 CDT: dwebblive.com is back online, not that Bry and I are doing crap with it.

0108 CDT: elemoose is back online.

0110 CDT: elucid(blue) is back online. Just realized that dwebblive doesn’t even resolve. I bet Bryan dropped the domain. Way to tell me, Allain! ;)

0111 CDT: FamousTim will be far more famous when he starts blogging. ;)

0113 CDT: Sacred Journey is back online.

0119 CDT: The Geek Nature Preserve is back online. I demand the 28-week photo of Jess!!!

0121 CDT: GFMorris.com is back. I’m too tired to do them out of order.

0122 CDT: GFMorris.org is back, too.

0126 CDT: Live Granades is back online.

0129 CDT: GRACE PCA is back online. Yeah, I had a church Web site down on a Sunday. We Methodists observe the Sabbath and keep it holy.

0133 CDT: IF Comp is back online. Again, I think Stephen’s doing fun things with his sites. :)

0135 CDT: IJSM is back. Like I have time to post.

0141 CDT: Joe Bassett and The John Larroquette Project are both back online. I’m going to stop at 0200 because I’m frieeeeeeeed.

0147 CDT: PJBenfield.com is back about 60 seconds after PJ emailed me about it. ;)

0200 CDT: The Monkey Exhibit, not another blog, v2.0, Rocky Mountain NYI, teamresearch.org, and Experiments in Life are now back online. I’m hallucinating, so I’m going to bed for three hours or so. [It's like the Blogathon, but I have to go to work tomorrow! FUN!!!!!!!!!]

0559 CDT: RMFO Pics and Rocksmyfaceoff.com are now online. I’m gonna kick something else off, then go cycle laundry and do something about food. Getting a skosh over three hours’ sleep has me ravenously hungry!

0639 CDT: The Hubbs are back online.

0642 CDT: The Hollandseseses are back online.

0650 CDT: The Nacle and United Church Softball are back online, Bry. TPWD.com is on its own domain and is gonna have to wait, unfortunately…

0654 CDT: The Other Side of Reason is back online.

0656 CDT: Who Stands is back online.


There are, undoubtedly, a couple sites that I skipped in an effort to get paying customers online last night [and this morning], but those are, I think, mostly side projects that have never gotten traction for me. I’m gonna audit the full list now that I’m mostly awake and see what I was missing. Then I’m gonna go back and start fighting this %(*#!&(*&!%# IP issue. As soon as I lick that, everything else can go back online. That might have to wait until after work today, and I know that it takes down some of the biggies on the site [rocksmyfaceoff.net, rmfo-blogs.com, etc.]. Hey, I’m caught up in this, too: GFMorris.net was on 67.19.147.157, and my efforts to get it on the main shared IP just so I can have my tasks installation back up and running have been wholly fruitless to this point.

Thank you for your continued patience. I never dreamed that it’d be 2200 before I got solid control of the server yesterday. If I had, I would have started at 0830 and not 1000.


0740 CDT: Sorry, folks. My time to work on the restoration this morning is up. I’ve gotta go play project manager for a while, and I don’t have good access to the server to work on it while at the office [which I would if I did, but I don't]. If you don’t see the site listed above, it’s not up. That means that, unlike Spencer, you have to read the list. ;)

1055 CDT: IFComp and Brass Lantern are back. No word from the NOC. Starting to get a little peeved. :)

1124 CDT: I … think … that the IP situation is fixed. Attempting to bring rocksmyfaceoff.net back online. How can I do this at work, you may ask? iPhone, bitch!

1136 CDT: Okay, so the attempt timed out on the iPhone, and I think that the browser has to stay up. So, dashing home for lunch to bring some stuff back online.

1220 CDT: Not sure why The Rumor Forum won’t come back online. I’ve checked the permissions and everything. Doesn’t make sense. Perhaps Mark will have some time to look at it… [Mark, I nuked the db user and recreated it to no avail.] Am now rebuilding rmfo-blogs.com to at least get that online, and then perhaps one or two more before I do the gas/lunch/water/back-to-work run.

1232 CDT: Looks like rmfo-blogs is back online. That said, nothing automated on it [like the Planet install that updates the front page] is running, so if any of you post an update, the front page won’t reflect it until I get that back online. That’s both a simple thing to fix and a non-important thing to fix. :) I’m gonna pick one or two more domains to get back online and then go back to work.

Downtime Log

Sunday, July 29th, 2007

I’ll keep this up-to-date as things progress today:

1017 CDT: I’ve moved the early morning backups around and have the up-to-1000 backups now in process. As soon as those are complete, I’ll push them onto the Network Attached Storage.

1206 CDT: Probably two-thirds done with the backups.

1246 CDT: Backups are now complete. Beginning the transfer to the NAS.

1319 CDT: User accounts backed up. Now backing up critical system files and MySQL databases [the latter being a redundancy, but a critically important one].

1322 CDT: All files backed up onto the NAS. I was filling out the OS Reload request form earlier, and I saw that there’s an option to leave a disk drive the same. I’m going to have them leave the secondary HD, which is our backup, as it is if they can. That’ll save me time in bringing services back online, because I won’t have to pull data from the NAS box back onto the backup drive so I can then restore from the backups. I’m gonna put in the OS reload request now, and then it’s out of my hands. Please excuse any sounds of nausea coming from my direction. :chuckle:

1333 CDT: Crap. I can’t get RHEL v5 until Tuesday [probably a licensing thing]. I’m not waiting until Tuesday. We’ll go with RHEL 4 and, if we need to do this again some time in the future, we’ll do it again. I’m okay with that.

1350 CDT: Been trading ticket responses with Chris B. at The Planet.

Chris B. - Sunday July 29th, 2007; 1:49 PM CDT
You’re welcome, I will set the secondary drive to preserve, and your OS reload will begin momentarily.

I’m excited!

1459 CDT: The Planet kicked off a tracking ticket for the reload at 1426. I’d expect that the OS reload + cPanel/WHM load will take 90 minutes. That puts us at another hour or so before I can begin restoring accounts from the backups.

1545 CDT: No news from The Planet, but I have finally been able to get to the Twitter signup page. I’ll tweet stuff at rmfoinfo. Of course, I’ll post here, too … but this will allow me to do some fancier stuff. Now, if I can only get the server to do the tweeting for me when it’s back online … ;)

1815 CDT: I had some go-backs on the partition scheme about an hour ago [something I think we would have fixed before, oh, 1400], so we’re still waiting. If you think you’re tired of waiting, I’ll trade ya. ;)

2037 CDT: The OS reinstall is complete, and now cPanel is being reinstalled. This will apparently take about an hour, so hopefully that’ll be back up by 2200, and then I can power through bringing stuff back online.

2205 CDT: I now have access to the server again, and I will now begin testing restorations!

2217 CDT: I have zero access to the secondary drive right now, so I’ve got a second ticket in about that. I’m kinda hosed without that. :) Even if the drive ends up being unusable, I’ve got a backup on the NAS [that's why I burned a half-hour on it this morning], so I’m not worried. I’ve just gotta have access to that second HD to bring things back online.

2324 CDT: Okay, I have the backups back up and running [and have for a while], but now I’m trying to get the IP allocation set up properly so that, when I bring sites back online, they’ll be in the right spots. I’ve done this before, but it’s been a couple of years …

30 Jul 2007 0001 CDT: Well, I’m still struggling with the IP address issue. I’ve asked the NOC for help because I just can’t figure out WTF is going wrong. That said, I have restored SquarePegAlliance.net as a test to see if everything’s up and running, and that appears to be working. I will start going through and finding all the main IP-based accounts and go ahead and bring them online while working to resolve the other IP problems … which do involve most of the major accounts on the machine. :(

OS Reload: Planned Downtime

Sunday, July 29th, 2007

I will take down all Web, database, and email services at 1000 CDT (1500 GMT) today. After that point, I will run one more backup on the server that will encompass everything done to that point. That should take a couple of hours to run, I’m guessing; it took three last night, but that was with services up and running. After that’s complete, all backups will be copied over to the Network Attached Storage at The Planet, and then I’ll have them do the reload of the operating system. Once their OS reload is complete and cPanel/WHM is set back up, I’ll restore from the backups and we’ll go forward. All told, we should be back up and running by, say, dinnertime.

Operating System Reload

Saturday, July 28th, 2007

Stephen, Jonathan, and I have determined that the server needs an operating system reload. We’re only coming up on three years on this server. [There is now laughter around the table.] I’m contacting the folks at The Planet [the folks who house the server] to see what my options are on this. I’ll update you as soon as I know. [We won't be doing this while Stephanie is doing the Blogathon. That's just plain wrong.]

Web service temporarily offline

Thursday, July 19th, 2007

Howdy: We’re experiencing a small Denial of Service / comment spam zombie attack, so I’ve pulled Apache offline temporarily while I determine the IPs involved. As soon as the server has returned to normal levels, I’ll have Apache back up and running. Shouldn’t be more than five minutes’ downtime.

Update, 0925 CDT: Online.
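
The "determine the IPs involved" step above can be approximated by counting requests per client address in the Apache access log. This is a rough sketch, not the procedure actually used here. It assumes the common/combined log format, where the client address is the first whitespace-separated field on each line. The function name `top_client_ips` is hypothetical.

```python
from collections import Counter


def top_client_ips(log_lines, n=10):
    """Count requests per client address and return the n busiest as
    (address, request_count) pairs, busiest first.

    Assumption: each line is in Apache common/combined log format, so the
    first whitespace-separated field is the client address.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)
```

Since a file object iterates line by line, an open access log can be passed in directly. Addresses that dwarf everyone else's request count are the ones to look at first, though a spread-out zombie attack may not stand out this clearly.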

cPanel/WHM Offline

Saturday, July 7th, 2007

All hosted users will notice that their access to cPanel is presently restricted. This is due to a licensing issue, and the license is owned by our server provider and not by me. I’ve already filed a ticket and expect a quick response. I’ll update this entry when I know more.

1920 CDT: BACK UP! :D

Service Restored

Saturday, July 7th, 2007

Howdy, all: service came back online about 4:45 p.m. CDT this afternoon. Total downtime was around six or seven hours. I apologize for it, but it was a hardware failure and was beyond my control. I’ll begin investigating the early-morning DDoS spike after I’ve had a few minutes to relax after driving home from West Tennessee.

Server Funkiness

Saturday, July 7th, 2007

Hey all … not sure what’s going on with the box. The NOC just called to tell me that it’s offline. I saw a DDoS-style spike this morning and haven’t had time to investigate as I’m off visiting family … will try to investigate as best I can from here …

Update, 1622 GMT: The current outage is at least in part due to a power failure with the server, possibly a bad power supply.