Today's red post collection includes a TFT: Galaxies set intro video and Showcase intro video, details on MSI 2020 being pushed back to July, an incident report on recent server outages, an article on the state of the LoR Beta, and more!
Continue reading for more information!
- Red Post Collection: QGT: March 6, Ask Riot: Smurfs in Clash, Announcing the TFT: Galaxies Showcase, & More
- 3/6 PBE Update: Dark Star Malphite Prestige Edition, Continued TFT: Galaxies Testing, & More!
- Champion & Skin Sale: Week of March 9th
- 3/9 PBE Update: Dark Star Malphite Prestige Edition Splash Art, Chroma Assets, & more!
- Free Champion Rotation, Week of March 10th
- 3/10 PBE Update: Continued TFT: Galaxies Testing & Tentative Balance Changes
Table of Contents
- TFT: Galaxies Set Introduction & Showcase
- End of the NA Community Events Tool
- 2020 MSI Moved from May to July
- Incident Report: Recent Outages in Europe & Brazil
- Legends of Runeterra: State of the Beta
- Miscellaneous
TFT: Galaxies Set Introduction & Showcase
A new TFT: Galaxies set introduction video was released on the League of Legends YouTube channel: "Teamfight Tactics Sign Up & Download https://teamfighttactics.leagueoflege...
Welcome aboard, captain. Here’s your guide to the interstellar origins, classes, and champions of Teamfight Tactics: Galaxies.
/dev: Teamfight Tactics: Galaxies Systems Update https://na.leagueoflegends.com/en-us/..."
There will also be a TFT: Galaxies showcase, streaming on Twitch and YouTube on March 12-13 at 3 PM PST! More details can be found here, or check out the Showcase introduction video:
End of the NA Community Events Tool
Here's SandalwoodGrips with details on the deprecation of the NA community events tool: "With the sunsetting of the boards, we've decided to re-evaluate the Community Events tool for North America.
We originally launched the tool several years ago as a way to help small event organizers run IRL tournaments. The tool has served us well since then, but as it’s built on the boards framework, we believe that now is a good time to take a step back and review how it is being used by players and event organizers. As of today, the Community Events tool will be shut down.
If you had a tournament already registered with the tool, your RP will still be honored, and we will also be awarding RP for any events currently in our backlog."
2020 MSI Moved from May to July
Here's a post from John Needham, the Global Head of Esports at Riot, on MSI: "I'm writing to share some news about this year's League of Legends Mid-Season Invitational (MSI). After discussing with our leagues and other stakeholders, we've made the decision to move our annual mid-year global tournament event from May to July this year. Shifting the tournament to the summer offers the best chance to see travel restrictions lifted, allowing teams from leagues around the world to travel and compete safely. Competition will begin on Friday, July 3 and conclude on Sunday, July 19.
Location and other details will be shared as soon as possible, though we need to remain flexible given how much the COVID-19 situation has impacted international travel and live events for nearly every sports and entertainment property. That said, we’re committed to ensuring that MSI remains a truly globally resonant experience.
To accommodate this change and ensure minimal impact to the remainder of the calendar year, we have made changes to our Summer Split start dates, which will see the second split begin prior to the Mid-Season Invitational. We have aligned with each of our regions to shift the start dates of their second splits to account for this schedule change:
OPL - May 15th
VCS - May 15th
LCS - May 16th
TCL - May 16th
LCK - May 20th
LEC - May 22nd
CBLOL - May 23rd
LPL - May 23rd
PCS - May 30th
LLA - May 30th
LJL - May 31st
LCL - June 6th
Additionally, after reviewing the calendar and listening to feedback from our teams and players, we have decided to sunset Rift Rivals in the remaining regions where the tournament was scheduled to take place. Rift Rivals produced some epic battles, but the challenges it created to the annual schedule were felt at both a regional and global level. We remain committed to producing globally resonant events that showcase our sport, and will work with regions to determine fresh ways to do so in the years ahead.
As noted in my February 25 post, we want to thank our teams, partners, players, and fans who have demonstrated their support to those impacted by the virus, through charitable donations and other means. We’re eager to reunite the global League of Legends community at the Mid-Season Invitational this July. We’ll continue to assess the situation to ensure the best experience for fans and players, and update you along the way."
Incident Report: Recent Outages in Europe & Brazil
For those who saw outages on their server in February, Riot Penrif published an incident report detailing what happened and how the issues were fixed:
"Hello folks, Brian "Penrif" Bossé from the Technology department of League of Legends to share details about the technical issues that caused the recent outages in EUW, EUNE, and BR towards the end of February. I'm going to get a little nerdy on y'all here, but if you're curious about the details behind those outages and what we've done to fix them, come with me on a journey with computers and graphs!
Setting the Scene
The first symptom that came to our vigilant Network Operations Center's attention was that the number of League games starting went down drastically.
[Graph: games started dropping sharply. Caption: "Not typical behavior"]
There are a lot of systems that go into a successful game creation, from matchmaking to load distribution to the game server itself, and it's not immediately clear from a symptom like this where the breakdown is. When we brought the experts in each of those systems into triage to make that determination, they all indicated that their service's health looked good, but very little traffic was coming in to them. A matchmaker with few people coming in to request a match doesn't have much to do.
So, all of that indicated that we're dealing with a systemic issue in getting player traffic into our backend. Metrics showed us some major problems in that neighborhood:
[Graph: inbound connections per container host. Caption: "The horizontal red line all the way at the bottom is where sirens go off"]
What we're looking at here is the number of inbound connections to one of our generic container hosts. Those are highly capable single computers that run a bunch of smaller applications - in shop-talk, containers - that comprise the overall system that runs League. Two of those hosts are getting vastly more connections than is reasonable. To understand why, we need to talk about a particular kind of container, one performing an "edge" function.
Living on the Edge
Edge processes are tasked with taking in traffic from the internet, filtering it, and routing it to the appropriate backend service. They take the garbage pile that is the public internet and leave only a nice, clean stream of bytes for everyone else to process in comfort. As you can imagine, edge processes go through a lot of traffic, but they really shouldn't regularly see the kind of spike we saw during these events. There are three factors that all lined up to create this situation; I'll give each a section, then pull it all together.
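To make that role concrete, here's a minimal sketch of the filter-and-route pattern an edge process performs. This is illustrative only, not Riot's implementation; the routes, backend hostnames, and ports are all invented.

```python
import socket

# Hypothetical routing table; a real edge service would resolve
# backends dynamically through service discovery.
ROUTES = {
    "/matchmaking": ("matchmaker.internal", 9001),
    "/store": ("store.internal", 9002),
}

def handle_connection(conn: socket.socket) -> None:
    """Filter one inbound connection and route it to a backend."""
    data = conn.recv(4096)
    try:
        # Filtering: anything that isn't a well-formed request line
        # is treated as internet garbage and dropped.
        path = data.split(b" ")[1].decode("ascii")
    except (IndexError, UnicodeDecodeError):
        conn.close()
        return
    backend = ROUTES.get(path)
    if backend is None:
        conn.close()
        return
    # Routing: relay the clean bytes to the appropriate backend service.
    with socket.create_connection(backend) as upstream:
        upstream.sendall(data)
        conn.sendall(upstream.recv(65536))
    conn.close()
```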
So It Begins
First up, the spark that started the cascade - an unreasonable number of requests to one service. For a few months we'd noticed an increase in the volatility of call frequency targeted at that service, but it didn't appear to have any impact, and all of the upstream systems had been handling it fine. Diagnosing something that wasn't hurting anyone didn't rank as a high priority for active effort, but we kept it in the back of our minds. We've diagnosed it now, though; there was an error in how the request was being made which, in certain cases, would cause it to constantly fail - and constantly be retried.
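To illustrate why a constant-retry bug is dangerous, here's a hedged sketch of the safer pattern, capped exponential backoff with jitter, which matches the kind of retry change described in the resolution section below. The function and parameter names are hypothetical:

```python
import random
import time

def call_with_backoff(request_fn, max_attempts=5, base_delay=0.5, cap=30.0):
    """Retry a failing call with capped exponential backoff plus jitter.

    A naive loop that retries immediately on failure turns one broken
    request into a constant stream; backoff plus jitter spreads retries
    out so many clients don't hammer the service in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up rather than retrying forever
            delay = random.uniform(0, min(cap, base_delay * 2 ** attempt))
            time.sleep(delay)
```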
Containers Sprung a Leak
We had a known issue involving an interaction of our container system with the operating system version in use. The problem involved a leak of memory inside the bowels of the operating system which, given enough time to build up, could halt critical system functions. Prior to this event we had never seen it manifest, but we had already upgraded about 60% of Riot's container fleet anyway. Unfortunately, the upgrade to Europe and Latin America's clusters was still in progress.
Luck Matters
And lastly, we got unlucky. We utilize a piece of software to group up the containers into bunches that can fit on a single host computer. It has constraints programmed into it for keeping these network-heavy edge services separate from each other within the same shard's set of containers. However, we couldn't control whether edge services from different shards landed on the same computer, so edge services from EUW could end up sitting right next to edge services from EUNE. That multiplication of load onto a single machine was the magnifier that turned the prior two problems into a major incident.
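A toy sketch of that scheduling gap, using invented container records rather than anything from Riot's actual scheduler, might look like this:

```python
# Hypothetical sketch of the scheduling gap described above: the
# anti-affinity rule only compares containers within one shard, so
# edge containers from different shards can still stack on one host.

def can_place(host_containers, candidate):
    """Reject co-locating two edge containers from the *same* shard."""
    for existing in host_containers:
        if (existing["role"] == "edge"
                and candidate["role"] == "edge"
                and existing["shard"] == candidate["shard"]):  # the gap
            return False
    return True

host = [{"role": "edge", "shard": "EUW"}]
print(can_place(host, {"role": "edge", "shard": "EUW"}))   # False: blocked
print(can_place(host, {"role": "edge", "shard": "EUNE"}))  # True: allowed!
```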
In all of the outages, edge containers from at least three shards were placed on a single host. With a spark of traffic from the malformed requests, amplified by multiple shards' worth of that traffic hitting the same host, and then the operating system memory leak pushing that machine into a non-functional state, we experienced a breakdown of the integrity of the shard and a significant outage.
Impact and Resolution
It generally takes significant effort to untangle this kind of problem and isolate its causes. When we needed to make the call on whether to go forward with Clash on top of a potentially unstable cluster, we had suspicions that the container problems were a key part of the picture, but did not yet have a handle on exactly what the traffic was. Even though the issue was not at all caused by Clash, we made the decision to delay a week in order to be sure that the Clash experience would be protected. I sincerely apologize for the disruption these events caused, and want to assure you that we have fixes for every aspect that played into them.
The code causing broken requests to be sent has been corrected, and in case we have similar problems in the future, we've changed how the retry mechanism works to keep retries from causing large spikes. The container software upgrade has been completed on all of our shards. We have concrete plans for moving the edge services to a balancing system that can spread them out cross-shard. Until those plans are executed, we have alerting in place so someone's phone gets turned into a screaming angerbox until the load is manually redistributed.
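As a rough illustration of that stopgap alerting, and emphatically not Riot's actual monitoring code, a per-host threshold check could look like this; the threshold value and metric source are invented:

```python
# Hypothetical stopgap alert: page an operator whenever any host's
# inbound edge connections exceed a safe ceiling.

THRESHOLD = 50_000  # assumed safe ceiling of inbound connections per host

def check_edge_load(connections_by_host, page):
    """connections_by_host maps host name -> current inbound connections;
    page is a callback that notifies the on-call engineer."""
    for host, connections in connections_by_host.items():
        if connections > THRESHOLD:
            # Keeps firing on every check until load is redistributed.
            page(f"{host}: {connections} edge connections "
                 f"(threshold {THRESHOLD}); redistribute edge containers")
```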
We feel confident that with all of these different causes behind us, this particular issue will not happen again. That said, we're continually evolving the game and its supporting systems, and situations like this are always a possibility. We're committed to restoring service as quickly as we can when they do arise. Thanks for making it to the end here; if this sort of content sparks your curiosity, you can catch more in-depth League technology articles over at our TechBlog. Either way, I'll see you all on the Rift."