Boutique Tech Conference · 4. – 6. June in Rostock (Germany)
Loose coupled high availability with FreeSWITCH and MySQL

in English by Tim Panton of Westhawk Ltd at AMOOCON 2009

Abstract

Westhawk Ltd have built a high availability SIP cluster for a client.
We will describe the thought processes that went into the design and the resulting architecture.

The key features are that the components communicate on-demand over HTTP and that the database clustering uses a share-nothing model.

This loose coupling of the components reduces interdependency
and complexity and should improve uptime in the long run.

This talk will cover FreeSWITCH on-demand dialplans and MySQL clustering with the NDB engine.

The slides

There are 18 slides.

Transcript

Tim Panton: Good. So I’m Tim Panton, and I want to talk about building a loosely coupled, high availability system with FreeSWITCH. But I must warn you that my primary advice, and this isn’t any kind of criticism of the technology, is probably: don’t. If you can avoid building something complicated, do. That’s the take-away message.

So, who am I? I’m Tim Panton, @steely_glint on Twitter. I do consulting with Westhawk. You may also have come across me because I was crazy enough to write open [inaudible] Java a couple of times and do some stuff with that. I’ve been fairly active in the Asterisk community as well as in the FreeSWITCH community.

A lot of this is based on a history that predates Asterisk: my sometimes bitter experiences with big clusters on open systems. What’s high availability for? Here are some standard answers I get from people. It gives you better performance (actually, this all reads a bit like a Viagra ad): better performance, higher uptime… anyway.

It protects your job. I actually had one customer who said, “We just always do. We always buy these things in pairs. It doesn’t matter; we buy two of them.” Those are all the wrong answers. None of them is a good enough reason on its own to do any kind of high availability SIP clustering. They aren’t reasons; they’re justifications. The reason should always be to meet a business need, for example to prevent loss of income: say my server turns over 10,000 pounds a minute, so it’s worth a certain amount not to be down for a given length of time. You should do that calculation, or your customers should.

It may be loss of reputation: being down for a day may lose you customers, and that matters. And actually, this one is the big one for me: loss of sleep. If I’m the one who built this, then I’m probably the one getting up in the middle of the night when it falls over. It’s nice if there are two of them and nobody notices that one has fallen over until the next morning. So I get a good night’s sleep out of that.

Those are the sorts of reasons you should be doing this, if you really have to. I’ve had some very bad experiences with HA. My all-time favorite: there was some software for this, but I wasn’t involved in planning the networks, so I deny all responsibility for these mistakes. At one particular site, in the first year, they had about four days of downtime on this high availability cluster, and two of those days were because the license on the journaling filesystem had expired. They had a test license in and forgot to renew it, so the whole thing was down until they got hold of the software vendor in the US and got a new license issued.

There were similar but different problems with, let me see, a couple of routers running as a pair, sharing routing information. One got a bad RIP entry in memory and was sharing it with the other one, so the routing table was completely wrong. They didn’t diagnose it: they took out the bad one and put in a new one, and that was fine, except that the one that had stayed up shared its route information with the new one. So the route poisoning survived and just moved: every time they replaced a router, it would learn these bad routes. It wasn’t until they shut the whole network down, flushed this stuff out and maybe put some new equipment in that they actually managed to get the network back up again.

This isn’t a story of mine, but the best story is about a customer of a friend of mine. He was told that he had to have a UPS capable of supporting the entire machine room, so that it could support 400 users continuously for a working day or something. My friend said, “OK, but it’s going to cost you a lot of money. Where are these users?” And they said, “Well, they’re in the building.” “Do you have a building-wide UPS that will support them for eight working hours?” “Oh no, we don’t do that.” So how are the users going to get to this big system that keeps running? Well, they’re not.

So they hadn’t thought the problem through. They still put the UPS in, because that was what they had decided to do. But if you don’t think about this stuff, you don’t come to the right conclusions. Here’s the real reason it’s a bad idea: complexity is a bad thing. It’s a bad thing for a whole load of reasons. The simplest is that it’s more expensive: if you always have at least two of everything, it will cost you more.

It’s much harder to maintain. It’s not just that physically there are twice as many things to screw into the rack and cool and clean and that kind of stuff. There are also more wires, and it’s much more difficult to understand and far harder to hand over. You may understand it. You may even document it. But the person who comes in six months after you is going to have a much harder time understanding what you’ve built if it’s complicated.

Just numerically, if there are more components, there are more things to fail. So you’re much more likely to have a failure, although you will survive it. You need to think about whether you can get away with running a really simple, well-constructed, one-box solution: very simple, or as simple as you can get away with. I have a quote for that later. Drive complexity down as far as you can.

In reality, of course, sometimes you still need to do it. There are genuine business reasons and genuine situations. What you’re really doing is looking for the perfect system. So once you’ve analyzed what the business case is, you can address that specific case: work out which kinds of failure you need to survive, what can be tolerated and what can’t.

That’s a question you have to ask quite carefully. In my experience, if you ask a customer, “What’s the acceptable downtime?”, they will always say zero. It doesn’t matter if it’s a corner shop or a telco; they will still say zero. What you have to ask is, “What will it cost you if it’s down for an hour? What will it cost you if it’s down for a minute?” Then you’ve got some numbers to work with.

You have to think about what kinds of failure are survivable or may not be noticed by the customers, and try to pin those things down. Who is going to maintain it? And what outside dependencies have you got? There’s no point in having a hugely reliable system if your carrier is unreliable. There’s no point in optimizing your reliability if you’re already a factor of ten more reliable than your power company, or your diesel delivery, or whatever it is you need to keep running. So you need to think about these things.

OK, so now we finally get to the technical bit, once we’ve done all this business stuff. We analyzed the specific case for the system that we built recently, and it’s very simple: we take incoming SIP calls and route them out to another set of SIP addresses based on a set of database rules and various other routing criteria. Some of the calls get recorded; not all of them, but you have to plan as if all of them could be at any particular moment. It’s a few hundred simultaneous calls, so it’s not a huge system. There’s a set of rules that has to be entered and managed through a web admin interface, and there’s some ODBC reporting.
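To make “route calls based on a set of database rules” a bit more concrete, here is a minimal Python sketch of one way such a rule lookup could work. Everything in it is an assumption for illustration: the talk doesn’t describe the real rules, so the Rule fields, the longest-prefix-wins policy and the example SIP URIs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical routing rule, as it might be stored in a database row."""
    number_prefix: str    # prefix of the dialled number this rule applies to
    target_uri: str       # SIP URI the call is sent on to
    record: bool = False  # whether calls matching this rule are recorded


def choose_route(rules: list[Rule], dialled: str) -> Optional[Rule]:
    """Return the rule with the longest prefix matching the dialled number, or None."""
    matches = [r for r in rules if dialled.startswith(r.number_prefix)]
    if not matches:
        return None
    return max(matches, key=lambda r: len(r.number_prefix))


# Hypothetical rules: a general UK route and a more specific, recorded London route.
rules = [
    Rule("44", "sip:uk@gw1.example.com"),
    Rule("4420", "sip:london@gw2.example.com", record=True),
]
print(choose_route(rules, "442071234567").target_uri)
```

Longest-prefix matching is just one common choice for number routing; real rules would likely also look at the caller and the other criteria mentioned above.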

Now, here’s the good stuff. Dropped calls on hardware failure are acceptable: not recommended, but they’ll live with them. If there’s a hardware failure and the calls in progress are dropped, they’ll live with that. But if the user dials back, they should get connected, so there must be no prolonged downtime. It’s also got to be scalable: the idea is to be able to add a few more boxes and get a few hundred more channels out.

So we came up with this design, and there are two crucial things about it. The first is that it’s a “share nothing” model. That means there is no central RAID array to fail, or whose license can expire. The data is kept on both sides: the diagram is effectively a mirror, and each side contains the same data. The trick for that, which I’ll talk a little more about, is MySQL Cluster. The whole right-hand side can fail and come back up and the two sides will still be in sync, because the synchronization is done at the transaction level.

The other thing I really liked about this particular design is that there are no long-running connections. All of the arrows on this diagram, apart from the circled ones, are HTTP connections. They are intrinsically transient, using a protocol designed for short-lived connections.

So if some part of this fails, there is no socket hanging around tying up an IP address and no lingering connection, because all of these are short-lived connections that get re-opened each time. That may not scale to high volumes, but for the kind of numbers we were looking at, it’s perfectly acceptable.

So if we look at the diagram again, we’ve got SIP A and SIP B. We chose FreeSWITCH for those. I’ve been using Asterisk for a long time, but for this particular project I really liked the idea of using FreeSWITCH, because one of the nice things that surprised me about FreeSWITCH is that it allows you to generate the dialplan dynamically.

When a call comes in, it makes an HTTP request. It doesn’t have to, but if you configure it that way, it makes an HTTP request to a web server and gets a chunk of XML back, which it turns into a dialplan and acts on for that call. So you can dynamically generate, out of the database, rules about what should happen to this specific call based on the parameters FreeSWITCH passed.
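Here is a minimal sketch of the web-server side of that exchange: a Python function that builds a one-extension dialplan document for a single call. The element layout (document, section, context, extension, condition, action) follows the general shape of FreeSWITCH dialplan XML, but the context name, extension name, bridge string, recording path and function name are hypothetical, and the HTTP handler that would read the call parameters from FreeSWITCH’s request is left out.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def dialplan_xml(context: str, destination: str, bridge_to: str,
                 record_path: Optional[str] = None) -> str:
    """Build a one-extension dialplan document for a single incoming call."""
    doc = ET.Element("document", type="freeswitch/xml")
    section = ET.SubElement(doc, "section", name="dialplan",
                            description="generated per call")
    ctx = ET.SubElement(section, "context", name=context)
    ext = ET.SubElement(ctx, "extension", name="dynamic_route")
    # Match exactly the number that was dialled for this call.
    cond = ET.SubElement(ext, "condition", field="destination_number",
                         expression="^" + re.escape(destination) + "$")
    if record_path is not None:
        # Only some calls are recorded, so the recording action is optional.
        ET.SubElement(cond, "action", application="record_session",
                      data=record_path)
    # Send the call on to the target chosen by the business rules.
    ET.SubElement(cond, "action", application="bridge", data=bridge_to)
    return ET.tostring(doc, encoding="unicode")


print(dialplan_xml("public", "442071234567",
                   "sofia/external/442071234567@gw1.example.com",
                   record_path="/var/recordings/call.wav"))
```

In the design described here, the web app would fill destination and bridge_to from the database rules, which keeps the telephony-specific knowledge down to the shape of this one document.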

The nicest thing about that is that you can set a fallback URL. If the first server doesn’t reply, or comes back with a 404 or whatever, FreeSWITCH automatically goes to the second one. So I had failover done in minutes, without having to think about it. All I have to do is make sure I have two identical web servers that can both serve the content, and I’m more or less done. It’s not really that simple, but it’s quite a sweet feature.
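FreeSWITCH handles this fallback itself, so the following is not FreeSWITCH code; it is a small Python model of the behaviour described, which also shows the short-lived, one-request-per-connection style mentioned earlier. The function names, the 2-second timeout and the example hostnames are assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable, Sequence

# A fetcher takes (url, body) and returns the response body, raising OSError on failure.
Fetcher = Callable[[str, bytes], bytes]


def http_post(url: str, body: bytes, timeout: float = 2.0) -> bytes:
    """One short-lived HTTP POST; the connection is closed when the with-block exits."""
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def post_with_fallback(urls: Sequence[str], body: bytes,
                       fetch: Fetcher = http_post) -> bytes:
    """Try each URL in order and return the first successful response."""
    errors = []
    for url in urls:
        try:
            return fetch(url, body)
        except OSError as exc:
            # URLError, HTTPError (e.g. a 404) and socket timeouts all derive from OSError.
            errors.append(f"{url}: {exc}")
    raise ConnectionError("no server answered: " + "; ".join(errors))
```

For example, post_with_fallback(["http://web-a.example/dialplan", "http://web-b.example/dialplan"], body) would try the hypothetical web-a first and only contact web-b if that request fails. Passing fetch in as a parameter keeps the failover logic testable without a network.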

The same thing happens with the CDR records. The CDRs are HTTP POSTed to a database, and again there’s a failover (this is something the FreeSWITCH developers added specifically for me, which is sweet of them). So you can say, “If you didn’t manage to post to that one, post it here instead.” That way your CDR records get stored even if one of the servers is down.

This isn’t really to do with clustering, but it is to do with high availability: FreeSWITCH can be programmed in XML and JavaScript, and the really nice thing about that is that you can get a web programmer, someone who already provides a reliable web service, to write the back end for this. You don’t have to find a telephony engineer to go and build your database, or a telephony engineer who understands databases. You find somebody who has done that already and knows how to do it, and layer on top of that.

So it allows a certain separation of skills, which I think is arguably impossible with Asterisk, and certainly with closed-source PBXs, where you’re living in the telco world. This layer of separation is nice from the user’s perspective, and it means there are many more people with the necessary skills to do not only the work but also the maintenance afterwards, which makes maintenance much easier.

The database is MySQL Cluster. Now, MySQL Cluster is not MySQL. It just isn’t. The names are the same, but almost nothing else is. It’s an in-memory database, so you have to buy a lot of memory. You need to calculate the size of the database you’re going to be working with and buy that much memory, and then enough on top of that to actually run the database. So it’s a big calculation. I think we put 12 GB into these machines just for the live databases.
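The calculation can be sketched as simple arithmetic. The function below is a rough back-of-the-envelope helper, not MySQL’s official sizing method: the overhead factor and the example row count and row size are made up for illustration. The only structural assumption is that each row is stored once per replica and the copies are spread evenly across the data nodes.

```python
def data_memory_per_node_gib(rows: int, bytes_per_row: int, replicas: int,
                             data_nodes: int, overhead: float = 1.5) -> float:
    """Rough RAM needed on each data node to hold the tables in memory.

    Every row is stored `replicas` times across the cluster, and the copies
    are spread evenly over the data nodes. `overhead` is a hypothetical
    fudge factor for indexes and storage slack, not a MySQL figure.
    """
    total_bytes = rows * bytes_per_row * replicas * overhead
    return total_bytes / data_nodes / 2**30


# Hypothetical example: 10 million rows of about 500 bytes, two replicas on two data nodes.
print(f"{data_memory_per_node_gib(10_000_000, 500, 2, 2):.1f} GiB per node")
```

With two replicas on two data nodes, each node ends up holding a full copy of the data, which matches the mirrored picture; the operating system and everything else on the box need memory on top of this.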

But the great thing about it is that a transaction doesn’t complete until both databases have completed it. So if you insert a row on this side, that transaction will not complete until it has also completed on the other side. If the other side can’t complete it, the whole thing gets rolled back. So the two databases are always, always in sync. There is an exception: when one side is dead, it obviously isn’t in sync. But when it comes back up, it re-syncs itself.
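The all-or-nothing behaviour described here is essentially two-phase commit. The sketch below is a deliberately simplified, toy version over two in-memory replicas, meant only to illustrate the guarantee; it is not how NDB is actually implemented, and the class and function names are hypothetical.

```python
class Replica:
    """A toy in-memory copy of the data on one data node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: dict[str, str] = {}
        self.pending: dict[str, str] = {}
        self.refuse_next = False  # simulate this node failing to prepare once

    def prepare(self, key: str, value: str) -> bool:
        """Phase 1: stage the write and vote yes, or vote no."""
        if self.refuse_next:
            self.refuse_next = False
            return False
        self.pending[key] = value
        return True

    def commit(self) -> None:
        """Phase 2: make the staged writes visible."""
        self.rows.update(self.pending)
        self.pending.clear()

    def abort(self) -> None:
        """Throw away anything staged by an unfinished transaction."""
        self.pending.clear()


def replicated_insert(replicas: list[Replica], key: str, value: str) -> bool:
    """Commit the write on every live replica, or roll it back on all of them."""
    for r in replicas:
        if not r.prepare(key, value):
            for other in replicas:
                other.abort()
            return False
    for r in replicas:
        r.commit()
    return True


def resync(returning: Replica, survivor: Replica) -> None:
    """A node coming back up copies the current data from a surviving node."""
    returning.rows = dict(survivor.rows)
    returning.pending.clear()
```

Calling replicated_insert with only the surviving replica models the exception mentioned above (one side dead, so writes go to the other side alone), and resync models the dead side catching up when it returns.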

The downside of being in memory (and it’s not quite this simple) is that you really, really need a UPS. Actually, it flushes the data to snapshot files on disk every five minutes; I think that’s configurable. But there is still a window in which you can lose data. So at minimum you need a UPS capable of keeping the server running for as long as your snapshot interval.

We’ll talk a little more about that in a minute, but it’s an interesting style of working. It came out of some work that Ericsson did (MySQL Plus). It’s a really nice database for telco stuff. There’s a whole lot of stuff you don’t want to put in there, because you have to buy so much memory and it has some limitations, but it’s really nice for this kind of work.

GlassFish is the web server; it’s a Java application server. Again, I get to reuse Java skills and stuff like that. The downside is that it’s huge overkill for this application, because the back end is actually really simple. We probably didn’t need GlassFish for it, particularly because GlassFish has a whole load of clustering and failover built in, which I just didn’t touch. In fact, I struggled to turn it off.

So failover works like this. Either SIP server can accept calls. Each one has a “preferred” web/database server to talk to, to get its routing and push its CDRs to, but it fails over if that isn’t available. The SIP servers use Heartbeat to decide which of them holds the shared IP address, which is where the SIP traffic is routed. Heartbeat detects which one is available and moves the shared IP address to whichever is the primary. So we’re not talking about getting double capacity from two machines; it’s effectively a hot-standby system. But that was the user requirement, and we’re happy with it.
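The decision about who holds the shared address can be modelled in a few lines. This is a toy Python sketch, not Heartbeat’s implementation or configuration: it assumes each SIP server’s last heartbeat time is known, treats a node as dead after a deadtime with no heartbeat (10 seconds here, an arbitrary value), and gives the shared IP to the first live node in a fixed preference order. The node names are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def shared_ip_holder(preference: Sequence[str],
                     last_heartbeat: Mapping[str, float],
                     now: float, deadtime: float = 10.0) -> Optional[str]:
    """Return the node that should hold the shared SIP address, or None if all are dead."""
    for node in preference:
        seen = last_heartbeat.get(node)
        # A node counts as alive if we heard from it within the last `deadtime` seconds.
        if seen is not None and now - seen <= deadtime:
            return node
    return None


# Hypothetical timeline: sip-a stopped sending heartbeats at t=80.
print(shared_ip_holder(["sip-a", "sip-b"], {"sip-a": 80.0, "sip-b": 104.0}, now=105.0))
```

As written, the address moves back to the preferred node as soon as it is healthy again; whether automatic failback is wanted is a separate design decision.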

So, the gotchas. MySQL Cluster has a different set of rules. I made something of a mistake in building this as a prototype on a single box, to make sure we’d developed all the software, and only then moving it to MySQL Cluster. Don’t do that; it was a mistake. There’s a whole set of rules that are different: the maximum size, the fields, the number of fields available in a row, the columns and various other weird things are subtly different because the back-end storage engine is different, and that percolates back up into the program.

FreeSWITCH CDR records are huge! They store absolutely everything. You have to go through them and filter, otherwise you end up filling the database with stuff you don’t care about. So you need to configure that down to the set you actually think you might need. That took me a little longer. It was a nasty shock; I should have known, but no.
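A minimal Python sketch of that filtering step, done on the receiving side before a row is written. It assumes the XML CDR has a variables element with one child element per channel variable and that values may be URL-encoded; check this against the CDRs your own FreeSWITCH produces. The list of fields kept is a hypothetical example, not the set used on the project.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Hypothetical choice of the few variables worth storing.
KEEP = ("uuid", "caller_id_number", "destination_number", "start_stamp", "billsec")


def filter_cdr(cdr_xml: str, keep: tuple[str, ...] = KEEP) -> dict[str, str]:
    """Reduce one XML CDR to a small dict of the variables we care about."""
    root = ET.fromstring(cdr_xml)
    variables = root.find("variables")
    if variables is None:
        return {}
    row = {}
    for name in keep:
        elem = variables.find(name)
        if elem is not None and elem.text is not None:
            # Values may be URL-encoded (e.g. %20 for a space), so decode them.
            row[name] = unquote(elem.text)
    return row
```

Everything not listed (call flow, application logs and the many other channel variables) is simply dropped, which is what keeps the database from filling up with data nobody will query.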

Not all of the metadata in MySQL Cluster transfers automatically from one half to the other. If you create a table in one half, the other half will have that table. But if you create a stored procedure in one half, the other half won’t have it unless you do something about it. Create a view in one half, and it won’t be in the other one, which is just: where has my view gone? I had to create it twice. Once you know, it’s not a problem. It’s irritating, not a problem, but it is a surprise.

The other thing, which didn’t surprise me at all but which one needs to be reminded of, is that big ITSPs aren’t that helpful. You need to be doing a huge number of minutes before they’ll change their operational procedures or do anything for you at all beyond the standard thing. We wanted to do something with that, and they said to come back when we were doing the minutes. So you have to make sure that what they’re willing to do matches what you need, or go with a smaller ITSP.

I had a long argument (well, a discussion) with a customer about whether you need a UPS in a system where the hosting facility guarantees a certain level of power uptime and has a generator. In this specific instance, yes, you need that five-minute UPS. But they wouldn’t accept that argument, so I said, “Well, it’s your decision. I think you need it.” They said, “We don’t think we need it.” I said, “OK, your decision.” I lost that discussion, but hey.

Testing is harder than it looks. Testing failover, reboots and different failure modes took an extremely long time, partly because I wasn’t on site. Trying to do it remotely, by asking someone else to do the work, was a mistake. The testing phase was difficult. We did it, but it took a long time.

Next time I think we’d use a much lighter web engine. I’d shop around for a more helpful ITSP, and I’d allocate more time for testing. I’d probably think about how to do testing more up front, particularly the warm/cold stuff. At a distance, that’s just really scary: I’m switching this machine off remotely; am I going to be able to persuade it to switch back on again? And the right one? We got there.

But here’s the take-away message (that’s Albert Einstein, by the way). This is one of my favorite quotes from him. He said that everything should be “as simple as possible, but no simpler.” How you do that is a matter of interpretation, but I think it’s a great piece of advice anyway. A good thing to aim for, let’s say.

So that’s all I’ve got to say, but I hope you have some questions?

Woman 1: I know it sounds kind of weird, but I’m failing to see the point of having the web server [cough]. You don’t have the web server in the middle for the CDRs, but you have it for [inaudible]. What’s the purpose of that?

Tim: Let’s try to get back to the diagram.

Woman 1: Sorry.

Tim: No, so the idea is... OK, you’re right. The diagram is probably misleading. All of the traffic to the database, except the reporting, which is done by a third-party ODBC engine, actually goes through the web app. The diagram is wrong; you’re right. The CDRs come through the web app, get filtered and are then written out to the database. So that line is effectively wrong. Thank you. Go on.

Woman 1: Next question, following on from that one. I’m wondering why you don’t filter in the [inaudible]? It would make more sense to do the connection directly through the top [inaudible].

Tim: Philosophical question. Two reasons. One of them is that you would be fine with this: you’re an intellectual person who can do databases. But you’re expensive, OK? You’re probably more expensive than I am. I would be fine with it too, but the nice thing about separating those two layers and decoupling FreeSWITCH from the database work is that I can have a specification that says, “I want you to generate a piece of XML that looks like this,” which is the dialplan.

Woman 1: But why do you generate XML? What’s the purpose of XML?

Tim: OK. It needs to be generated dynamically: precisely what happens in a given call depends on certain settings in the database that vary from minute to minute, and also on the incoming caller IP and various other factors. So you need to write business logic to decide what the routing is. The decision we made, and I’m not saying it’s right for everyone, but for us, in our particular circumstances and with the coders we had at hand, was that it was nice to be able to separate that out into a specification I could write and hand to a web programmer.

The ability of FreeSWITCH to accept an XML format means that I can decouple that and give the problem to a web programmer without having to teach them how FreeSWITCH works. I can define that interface and decouple it conceptually as well as practically. Given the resources available, that suited me. I think that’s fairly typical of companies: there are more good web developers who can generate good XML from data and business logic than there are people who can do that and then also program telephony in something like Asterisk or Yate, where you have to understand the problem space.

The nice thing about using XML is that you’re decoupling it. Now that doesn’t apply to everyone but –

Woman 1: I am just wondering, where is the [inaudible] report? When you write the CDR, do you just write the query? You just put the query there?

Tim: It doesn’t really apply so much to the CDR, although that does get filtered. But it does apply to the routing decision. It’s not even strictly routing; there’s a whole lot of IVR stuff that gets layered in under exceptional circumstances and so on. I think I’ll talk to you about the specific application offline, because I don’t want it recorded; it’s not for me to say publicly what their application is. But I’m happy to talk to you after this. Anyone else with a question I can answer in public?

Man 1: For the MySQL versions, what version were you using? Just 5.1, or?

Tim: Yeah, I started this about nine months ago so…

Man 1: You said that the checkpoint in the system is every five minutes – it’s actually every few seconds.

Tim: Really? OK. So the UPS can be really small.

Man 1: The thing is, if you take nodes, say two nodes that [inaudible] are set up with a UPS anyway, then only one can fail.

Tim: In the process of that discussion, I actually said we only needed one UPS, but that still didn’t work. So I just said, “Listen, you’re taking a risk here. It’s not a very big risk, and if it does fail you can go and sue your hosting provider, because they guarantee a certain level of uptime.” The worst that’s going to happen is they lose their billing records for a few seconds. That’s a few hundred dollars, so they probably made the right decision.

Man 1: In this configuration, if they both crash at the same time, you might have lost all your data even with a checkpoint going on. [inaudible] It’s really hard.

Tim: What they did, basically, is export the billing data. They exported the CDRs to another building every night and flushed the database, so their maximum exposure, even if the data center blew up, caught fire, got flooded or doused with water or whatever, was a day anyway. It’s important to think about the risk. There are two things I want you to think about: simplicity and risk. You want to understand what the risks are, and go for as much simplicity as you can.

Anyone else, questions? No? Good.

Man 1: I have one more. I’m interested in the case you just spelled out. Do you also have communication going on between the web servers themselves? Because otherwise some procedures won’t be [inaudible] on the other one.

Tim: No, what we did in the end was make that part of the change-management process. There’s a set of scripts that copies them over. We also have to copy over a whole lot of other FreeSWITCH configuration that can’t come from the web app. So there’s a synchronization process that sits at the change-management level rather than the day-to-day level. One hopes that doesn’t change very often, but if it does, the process is there anyway. Go on.

Woman 1: Well, does the web server you’re using keep one connection to the MySQL server, or is it done [inaudible]?

Tim: There are a couple of things missing from this diagram, because it gets too busy, but one which I should have mentioned, to come back to your question, is that MySQL Cluster wants a decision-making box. I can’t remember the name for it. [inaudible] It’s the deciding factor for things like split brain, it tells the nodes where to find each other, and it holds the config. Basically it could be a 386; it does very little [cough] apart from arbitration. There’s virtually no load on it, so it kind of sits in the middle there.

That’s also how the web service is configured to find and open connections to the databases. But actually, these aren’t separate boxes. In practice we folded that into the same box, so that each web app always starts from its own local database, as it happens. This web app will always talk to its local database.
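The split-brain rule that this decision-making box supports can be sketched roughly as follows. In MySQL Cluster the arbitrator role is normally played by the management server. This Python model is heavily simplified and only illustrates the idea: a group of data nodes that can still see each other carries on if it is a clear majority, stops if it is a minority, and at exactly half must win arbitration, where the arbitrator grants the first request it receives. The names are hypothetical.

```python
from typing import Optional


class Arbitrator:
    """Grants survival to the first partition that asks after a split."""

    def __init__(self) -> None:
        self.winner: Optional[str] = None

    def request(self, partition: str) -> bool:
        if self.winner is None:
            self.winner = partition
        return self.winner == partition


def partition_survives(visible_nodes: int, total_nodes: int, has_all_data: bool,
                       arbitrator: Arbitrator, partition: str) -> bool:
    """Decide whether a group of data nodes that can still see each other keeps running."""
    if not has_all_data:
        return False  # missing part of the data: cannot serve consistently
    if 2 * visible_nodes > total_nodes:
        return True   # clear majority, no need to ask
    if 2 * visible_nodes == total_nodes:
        return arbitrator.request(partition)  # a tie, e.g. 1 of 2 nodes
    return False      # minority shuts down
```

In the two-box setup described in the talk, each side that loses contact with the other is exactly half, so whichever side reaches the arbitrator first keeps running and the other stops, which is what prevents both halves from accepting conflicting writes.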

Woman 1: [inaudible]

Tim: So it’s a local database. It keeps talking to its local database because it’s on the same box. Whether that turns out to be the right decision or not, only time will tell. But that’s what it does.

Woman 1: [inaudible]

Tim: OK, good, I’m glad to hear it. So thanks for listening. Yeah, thanks for coming and thanks for staying this long. I’m impressed.

[applause]