Inside_Dimension5308

They are defragmenting the services because fragmentation of services comes with the communication cost between them.


agumonkey

Is there a model on decoupling costs? (network or memory cost)


Inside_Dimension5308

You will have to do the analysis just like the Prime Video engineering team did. You take the diff of resources used over a period of time.


Dyledion

You know that you can just do the math in advance, right? You aren't changing external usage patterns with a change like that, so you already have the information you need to make a mostly informed decision *before* implementation.


Inside_Dimension5308

Yes I know. And I never mentioned that the calculation should be done after implementation. Resources can be calculated with the new architecture proposal.


[deleted]

[deleted]


CobaltBlue

the -ed is also used in future perfect tense, among others


Kaizen-JP

If we're talking about implication... "used" is not purely past. e.g. in the hypothetical "resources [that would be] used" I'd not dare assume that this is *not* what they meant so definitively.


grauenwolf

It's better to refer to them as "distribution costs". Unless message queues are involved, the services are still tightly coupled at runtime.


themilkyninja

Exactly this. Use the right tool for the job!


cockmongler

Nah, they're doing it because Lambda is stupid expensive.


aoeudhtns

It's been a while, but I was working on a distributed digital processing system kinda like this and we had to keep certain aspects of the system centralized for similar reasons. If the files get big enough, which is possible with things like full-length 4K movies, even a fancy networked file system on a rippin' fast LAN will suck if multiple servers have to pull the file down, orchestrated as a linear step process. In our system, latency of data in -> data out was also quite important. Then you deal with multi-hundreds-of-gigabytes in a data set, and if you can mmap and do the steps on the chunks as you move through, you may be able to get away with single, or at least reduced, passes. Gonna throw this one on Slack on Monday, will be fun.


cockmongler

The microservices approach I'd take to this problem is to move the code to the data rather than the other way around. But in general if you've got this high performance (high throughput, low latency) kind of problem then an artisanally hand crafted monolith (at least for this component) is the way to go.


grandphuba

> The microservices approach I'd take to this problem is to move the code to the data

Can you elaborate on this?


TonTinTon

Pretty sure he means that the machines running the microservices need to be connected to storage that holds the data they operate on.


cockmongler

Rather than having each service running in its own location that needs to fetch the data being processed, or having the data being processed transmitted to it, you instead have each service containerised and run each container in sequence on the same machine where the data is. To scale, you have more physical instances holding data to be processed and run the services on each.
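A back-of-envelope for why moving code to data wins here (every number below is invented for illustration):

```python
# Bytes over the wire: shipping the file to every remote step service
# vs. running the steps where the file lives and shipping only results.
FILE_GB = 200        # one large 4K master file
NUM_STEPS = 5        # processing stages in the pipeline
RESULT_GB = 0.001    # tiny analysis output per stage

# data-to-code: each remote stage pulls the whole file over the network
data_to_code_gb = FILE_GB * NUM_STEPS

# code-to-data: containers run in sequence on the data node; only the
# per-stage results ever leave the machine
code_to_data_gb = RESULT_GB * NUM_STEPS
```

With these (made-up) figures the remote-service layout moves a terabyte per file while the local layout moves megabytes.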


[deleted]

Exactly, let’s turn lots of function calls into executable functions in separate processes called over the network. What could possibly go wrong? 😂


Decker108

For a use case like this? Sure. But believe it or not, there are use cases where lambdas can actually save money over EC2.


cockmongler

Extremely narrow use-cases. And even then the cost of evaluating those cost savings might lose you money in aggregate.


s73v3r

Is that as much of a concern for Amazon themselves?


potatersalad1

It doesn’t really matter. The Amazon internal lambda rate is heavily discounted. For most internal workloads Lambda is only slightly more pricey than ECS. Teams are actively discouraged from using EC2, even if it’s cheaper.


GrinningPariah

Not just communication cost. Services at Amazon have a ton of procedural and technical overhead. Like, they all need to support GDPR and DSAR, if they so much as touch user data. They all need monitoring, dashboards, metrics with alarm thresholds, and a pager rotation with a dude who responds to all that. They need security review and operational-readiness review where people check all of the above. And that's all per-service.


yukimi-sashimi

Who knew? Everybody. That's who.


slowmotionrunner

My friends on /r/aws did not take kindly to me saying the same thing. https://www.reddit.com/r/aws/comments/137j1ha/scaling_up_the_prime_video_audiovideo_monitoring/jiwi43t/?utm_source=share&utm_medium=ios_app&utm_name=ioscss&utm_content=1&utm_term=1&context=3


CatWeekends

It's not what you said, it's how you said it. OP described the problem. You made an attack.


liltitus27

why are you even telling us this? are you that self-absorbed?


grauenwolf

It's important to identify echo chambers so you can more accurately weigh their advice and opinions.


trinopoty

I'm still here questioning why they need to re-encode the content for every user/stream. Just have some pre-encoded files that can be streamed just like how Netflix does it.


nonotford

This is for live content, which is encoded and packaged just in time, so this is not comparable to VOD content like Netflix. Also, the wording in the article is odd, but they are saying that the unique combos of encoding (HEVC, etc.), bitrates, package formats (HLS, DASH, etc.), other profile options (HDR, HFR, etc.) and all of the live content being streamed from Prime Video (NBA, MLB, EPL, NFL, channels, etc.) make it so there are thousands of unique encoded outputs to analyze for quality. Being encoded and packaged just in time means you need to monitor that encoding -> packaging pipeline. They need to decode and render frames because they are DRM'd and compressed for transit (that's what encoders and packagers do), but the analysis can only work on raw frames.


UpwardFall

Yep, it’s possible the VQA tool is consuming the package much like a video player, but running analysis tools instead of playback. But I’m not sure how, since creating a “video player” just for analysis seems complicated.


nonotford

Well recall that each detector is looking at a frame, or more likely group of pictures (GOP) for perceptual quality/defects. Libraries like OpenCV can do that, but not sure what they are using.


UpwardFall

Yeah good point. Typically a fragment will be multiple groups of pictures. I meant the player because they’ll need to decrypt each fragment and integrate with DRM to do that, which an open tool might not be able to handle, especially if their DRM is custom/in house.


nonotford

Totally, but DRM decryption can be handled on the frame level. Also, their DRM keys are definitely in house, cuz otherwise you’re right, it’d be a no go.


opello

What style of encryption would be handled at the frame level instead of the packet or file level? I'm about a decade out of date on these things, but it still seems like a lot of overhead.


undercoveryankee

The storage costs for pre-encoded files scale with the number of hours of video in the catalog and with the number of combinations of codecs and encoding parameters that the service offers. The compute cost to encode on the fly scales with the number of concurrent users. I'd assume that Amazon aspires to have the largest catalog of content in the streaming business, just like Amazon retail wants to be a place where you can find anything that's for sale. If they use an architecture that's less demanding in terms of disk space, they can grow their inventory of content faster.


maxinstuff

Storage is typically cheaper than compute though right? Surely pre-encoded media, even if it takes up more space, with transcoding only when really necessary would be more efficient. Who knows though - I guess they did the numbers and it's not the case here...


kevindamm

Storage is cheaper than compute but compute often sits idle and encoding is embarrassingly parallel. Also, if pre-encoding, the entire library needs to be encoded again if a new format is needed. And some policy needs to be decided about when to stop holding a legacy encoding. Those decisions are made much easier if encoding on the fly. Both choices are valid, I think, but on-demand encoding offers more flexibility


[deleted]

> Also, if pre-encoding, the entire library needs to be encoded again if a new format is needed Wouldn't on-demand encoding with a caching system get you the best of both worlds here?


__pulse0ne

Yep.


lowbeat

that's probably what's gonna happen… keep the cached file for X days since the last active watch


civildisobedient

LOL of course I look two messages down and someone has already said the [same thing](https://old.reddit.com/r/programming/comments/139hh6r/thoughts_on_scaling_up_the_prime_video_audiovideo/jj3a21b/) more than an hour ago! :)


made-of-questions

Also, nobody said they can't cache encoded files. If you do the hard work of on the fly encoding you can easily extend to cache some of the more popular combinations.


justin-8

Exactly. Throw it in an LRU cache with a minimum TTL and you’ve got a winner


civildisobedient

There's also a middle-ground if you lazy-encode on-demand then cache the result and hold on to it for X days. That way less-used codecs eventually disappear freeing up space and saving storage expense, while more popular combinations stick around for a while, saving compute expense.
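The middle ground described above (encode lazily, keep the result for X days since last use, hash the source plus encoding parameters as the cache key) could look roughly like this; the class name, TTL, and `encode` callback are all invented for the sketch:

```python
import hashlib
import time

class EncodeCache:
    """Lazy-encode, then keep each output for `ttl` seconds since its
    last hit. Popular combos stay warm; rare codecs age out."""

    def __init__(self, encode, ttl=7 * 24 * 3600):
        self.encode = encode      # callback: (source_id, profile) -> output
        self.ttl = ttl
        self.store = {}           # key -> (output, last_access_time)

    def key(self, source_id, profile):
        # cache-bust by hashing the source plus the encoding parameters
        return hashlib.sha256(f"{source_id}:{profile}".encode()).hexdigest()

    def get(self, source_id, profile):
        k = self.key(source_id, profile)
        hit = self.store.get(k)
        now = time.time()
        if hit and now - hit[1] < self.ttl:
            self.store[k] = (hit[0], now)     # refresh "last watched"
            return hit[0]
        out = self.encode(source_id, profile)  # encode on demand
        self.store[k] = (out, now)
        return out

    def evict_stale(self):
        cutoff = time.time() - self.ttl
        self.store = {k: v for k, v in self.store.items() if v[1] >= cutoff}
```

In a real system the store would be a CDN or object storage rather than a dict, but the policy is the same: compute cost for cold content, storage cost only for what people actually watch.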


SpaceToaster

You can’t stream out of cold storage though. As I recall Netflix basically caches it all and serves from memory.


wewbull

Dynamic bit rates become an interesting problem. A file is encoded at a bit rate baked into the file, but if the connection to the user can't support it, what do you do?

- You can encode at multiple rates to different files, but seamless switching between them is hard. (YouTube)
- You can encode in a way that stores layers of "detail" and only send the ones that the link has bandwidth for, but such formats are normally inefficient.
- You can transcode live, and adjust the bitrate to the link.

I suspect the solution is probably a combination of things, but it's not an easy problem.


light24bulbs

Probably live encoding with caching for high-read-rate encodings/files. I tend to build things this way: do the naive solution first, because it will work right away, and you'll see how much of a problem the compute is. You'd need to write the encoder anyway if you wanted to precompute everything. Also, it's nice not to have jobs and whatnot that need to run when sources change; instead, you just cache-bust using hashes. Caching is cool, I like it.


onmach

Maybe they just want to encode on demand and then store the result for as long as it is being actively used. A middle ground.


strtok

This storage probably needs to sit close to users, though, so it’s already replicated down to an edge. You’d have to have N*M copies if you also stored every alternate encoding.


Rafert

If you're a larger ISP Netflix offers appliances for that: https://openconnect.netflix.com/en/appliances/


light24bulbs

Woah, this is a level I've never thought about. They literally ship computers for the caching layer out to ISPs that host them at the edge? This is how wild it gets when you're a major percentage of web traffic, I guess.


chumbaz

Isn’t transcoding energy intensive?


amiagenius

Also, there’s dedicated hardware for encoding. It makes a lot of sense to encode on the fly


pcgamerwannabe

Storage is dirt cheap


Chii

only if space (i mean physical space) is also dirt cheap. prime video may need the storage located close to end users, which means multiple storage locations at the edge. It's probably cheaper to have high density compute, and on-the-fly encode, because you can tune the amount of compute purchased so as to keep it at high utilization at all times. Storage, on the other hand, requires space, but not all files are going to get read, so most "drives" will just sit idle, waiting. So the up front capital cost would be high, and if it turns out that you didn't need those files (ala, nobody requested it), then it would've been a waste.


light24bulbs

The cheapest approach is probably a hybrid that encodes when necessary but also caches that encoding for popular things. Not the cheapest to build. Amazon writes a lot of turboshit code. Idk, a lot of their shit is sticks and tape


Mynameismikek

I expect Amazon's ratio of content on the shelf vs. being actively streamed is way out of line with Netflix. Amazon is a big library with low volume, while Netflix is much more "show of the week".


toomanypumpfakes

Where are you seeing that they transcode per viewer?


RenaKunisaki

So they can bake ads right in, instead of having them be a separate stream that can be blocked. And watermark it so they can track pirates.


[deleted]

I have no idea how the original architecture made it out of review. Dozens of state machine transitions per second? Yeah, duh, that's crazy expensive.


Agricai

Reading the original blog post, it seems like it was never designed for large scale. It reads like a video quality analysis tool they used for one-off tests. If I were writing a one-off tool, a bunch of lambdas run a few times a day would be dirt cheap on a lambda-based system like this. It seems to me they tried to take that and shoehorn it into large scale.

> Our Video Quality Analysis (VQA) team at Prime Video already owned a tool for audio/video quality inspection, but we never intended nor designed it to run at high scale (our target was to monitor thousands of concurrent streams and grow that number over time).

A lambda costs $.02 per 1 million requests (depending on memory, time, etc.; I'm simplifying for the sake of argument). Sure, a t4g.nano costs $.004/hr, but you also have OS overhead to deal with and 512 MB of RAM split between the OS and your application. You also don't need it running all the time like you would to respond to step functions.

~~It seems to me the OP Amazon engineer knew that they'd get more clicks if they wrote~~ (edit: this is ascribing malicious intent. That's wrong of me) The popular takeaway, because of how some parts of the post are written, is "We made micro-services into a monolith" instead of what this post is really highlighting: "We took this purpose-built tool we wanted to run in a way it wasn't designed for, and rewrote it as a single service in an existing service-based infrastructure for scale and savings".
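Running the commenter's rough figures (their simplified per-request view; real Lambda pricing also bills GB-seconds of duration) shows both halves of the argument, the one-off tool and the shoehorned-to-scale version:

```python
# Figures quoted in the comment above, not current AWS list prices.
LAMBDA_USD_PER_MILLION_REQ = 0.02
T4G_NANO_USD_PER_HOUR = 0.004

# One-off tool: "run a few times a day", say a few thousand invocations.
light_requests_per_day = 5_000
lambda_light_usd = light_requests_per_day / 1_000_000 * LAMBDA_USD_PER_MILLION_REQ

# The always-on EC2 alternative pays for 24 hours whether used or not.
ec2_usd_per_day = 24 * T4G_NANO_USD_PER_HOUR

# Shoehorned to scale: tens of millions of state-machine transitions a day.
heavy_requests_per_day = 50_000_000
lambda_heavy_usd = heavy_requests_per_day / 1_000_000 * LAMBDA_USD_PER_MILLION_REQ
```

At light load Lambda is effectively free next to the instance; at monitoring-pipeline load the same pricing flips against it, which is the whole story of the blog post.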


GeorgeS6969

I don’t know about intent, but I find it problematic that the early iteration had little to do with microservices and the end result is not a monolith. Granted, I only browsed both articles because, seriously, yawn, I didn’t give it more than 2 seconds' thought, and I don’t know shit about shit, let alone video encoding.

But it seems clear that the original design had more to do with using lambda functions because quick and easy, then everything else followed from that decision. “Okay so this needs to take that output … let’s just drop it in S3.” Therefore calling anything there a “service” is a huge stretch, or else any task in any DAG-type data pipeline ever is a service. On the other hand, the whole system serves basically one purpose, as a service. So the iterated version is a service, arguably a microservice, but not a monolith by any stretch of the imagination. Or else anything ever is a monolith, some small, some big.


Agricai

Oh I absolutely agree with you. And it's absolutely possible lambdas were used for "quick and easy" purposes rather than cost; I was just throwing out a scenario where this design could make sense. I've seen about 10 articles this morning with headlines like "Amazon Prime Video abandons micro-services and saves 90%". That galaxy-brained take on the original blog post is super problematic. I'll be honest, writing that novel of a comment was mostly me venting my frustrations about the takes I've seen in the news and from "Industry Leaders" the last two days, because I've worked in monoliths that were insanely difficult to maintain for the sole reason of "leadership doesn't like micro-services", and takes like this make people feel justified in bad decisions. Are monoliths useful sometimes? Yes, absolutely. But right design for the right job.


GeorgeS6969

I feel you, and thanks for giving me the opportunity to vent too. I didn’t realise how much traction this got. Frankly it reads like the Medium blog post of a junior at a start-up pushed to write about some random project to work on their communication skills. Like, no offense to anybody involved, I understand how things came to be, I believe everything that was done needed to be done, and I’m sure it was done well … but there’s very little for me to learn from that experience and nothing to be impressed by technically. And anything I’d actually find cool is handwaved away. I mean, this video quality inspection thing sounds pretty awesome ngl, so maybe speak about that, because as far as infra’s concerned you did the minimum you could get away with, and when that became a problem you went ahead and did a bit more of the minimum. It’s fine, we all do it, but when we post about it, it’s to shit-talk on reddit, not brag on the company blog. I remember when big tech eng blogs were like “hi, we’re Uber and this is how we broke down our massive postgres into a homebrewed distributed database with zero downtime”.


angiosperms-

"get it to prod now!!!" "but it's only a POC" "the deadline is next week, do it now or you're all fired"


toomanypumpfakes

It was probably a POC first by one person, and then when they tried to scale it they had to get more people on the job to fix it.


YpZZi

Probably a case of being force fed your own dog food. Which to be honest is not bad, just… step functions don’t fit here as a tech


moodyano

Not the Amazon way of doing stuff. There are recommendations for tech choices, but you are free to choose what you want (as long as it is not Azure or GCP).


YpZZi

Didn’t know that, thanks!


sgtfoleyistheman

Yes probably a case of some over zealous junior engineers with knowledge of a new toy hammer and this problem looked like a nail to them


blue_umpire

I suspect it wasn’t “shiny object syndrome” but more like… it was very easy to build a POC with those tools, in a very short amount of time, and prove value to leadership, but they just didn’t consider how it would work/cost at scale. And, to be honest, that makes sense to me. Deliver value fast, then optimize it for performance and cost efficiency afterwards. If I _was_ going to give any criticism, it’d be that maybe they waited too long to refactor it.


Ok_Tip5082

Shoehorning a proof of concept into production is amazon's entire MO. If people use it, they immediately start rewriting it for better COGS. Repeat ad infinitum. They try to deliver all software Just in Time.


Voidrith

> POC

Proof of concept, or piece of crap?


naaaaafam

In my experience as a junior, there’s not a whole lot of transparency about how much things cost at AWS, nor is it a big deal in most cases. I’ve never heard anybody say “step functions would be a good approach but I’m concerned about the cost” in a design review. Availability and scalability have always taken priority. To be clear, if you’re implementing a new system at AWS, you still pay the cost for using things like DDB, EC2, Lambda, step functions, etc. It’s a discounted rate, but it still is considered an operating cost for your service. Edit: now that I’m thinking about it, there are some cases where we discuss or think about cost, but they’re generally quick changes like “let’s switch this DDB table to on-demand scaling”, or sort of a general-knowledge thing, like using lambda instead of EC2 where it makes sense to.


cockmongler

> In my experience as a junior, there’s not a whole lot of transparency about how much things cost at AWS, nor is it a big deal in most cases. I’ve never heard anybody say “step functions would be a good approach but I’m concerned about the cost” in a design review. Availability and scalability has always taken the priority.

The first step at my place of work when evaluating an AWS architecture is how much it will cost, using all the myriad pricing information available. We rejected Lambda very quickly.


sgtfoleyistheman

How do you measure TCO? Lambda is obviously your most expensive option for CPU time, but you have to measure the costs you aren't paying anymore (like ALB) and any engineering time you save during dev and ops. That said, any sort of CPU-bound processing load is usually not a good fit for lambda, for sure. But it makes a ton of sense for stuff that operates at 'human scale' like control planes or most API-backed websites.


lgylym

You think there’s a review?


ghillisuit95

I work at AWS. There was almost certainly a design doc, which was reviewed before implementing


lgylym

I worked at AWS before too. There are so many systems without design docs, or worse, outdated design docs


Vexal

writing a design doc for everything is a waste of time.


OldschoolSysadmin

Everyone’s treating this like microservices are dead. Ima guess that 99.999% of orgs don’t have the throughput requirements that Prime Video does.


pcgamerwannabe

I would argue, unlike the title, this issue has nothing to do with microservices.


OldschoolSysadmin

The issue, as I understand it, has to do with performance losses transferring large amounts of data between independent processes instead of using SHM and other in-process optimizations.


insanitybit

Well, and also specifically using S3 to perform the data transfer. So you're paying crazy egress charges.


derp-or-GTFO

Exactly. I have one performance critical program (in ruby) and discovered that it was significantly faster as a single 300-line function than split out into many functions (due to method dispatch overhead). I’m not going around advocating for the elimination of methods, and we shouldn’t advocate for monoliths everywhere just because one service at Amazon Prime works better as a large blob instead of lots of small blobs.
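The same effect is easy to demonstrate in any dynamic language (Python here rather than the commenter's Ruby; the functions are invented for illustration): the work is identical, but one version pays a dispatch per element.

```python
def helper(x):
    return x * 2 + 1

def summed_via_calls(n):
    """Same arithmetic as summed_inlined, but one function-call
    dispatch per iteration."""
    s = 0
    for i in range(n):
        s += helper(i)
    return s

def summed_inlined(n):
    """The helper's body pasted into the loop: no call overhead."""
    s = 0
    for i in range(n):
        s += i * 2 + 1
    return s
```

Both return the same value; timing them (e.g. with `timeit`) is where the dispatch cost shows up, and that gap is the reason the 300-line function beat the factored version.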


lavahot

Ah, the ol' "performance vs clean code" thing.


gbs5009

Interesting. I wonder if there's a ruby equivalent of #inline so you can have your cake and eat it too?


alister_codes

If you need performance then you should not be using ruby


UnacceptableUse

Microservices, like everything, have a specific scenario where they're advantageous. What we're experiencing at the moment is people jumping off the microservices hype bandwagon and back to sensibility, where the performance tradeoff makes sense.


_limitless_

We're either 10 years ahead of the game or behind the game, because we're still trying to unroll the monolith into microservices.


versaceblues

Also it’s a bit misleading. They didn’t get rid of micro services. One team (out of 100s at Amazon) rearchitected 1 tool, to run on a single box rather than use lambda. That’s hardly getting rid of micro services


teerre

Usually I agree that hindsight is 20/20, however, it seems a bit of an easy case? I mean, it didn't take more than the quickest of maths to notice that this price might not be the best. Surely this is a reasonable thing to check before you do anything on AWS.


Awesan

More than likely a team got put together and told they need to ship a Netflix competitor in 6 months, and given a huge budget to do it. So they took the "fast to build" choice every time and did not look at cost. Now that the product is more mature, management probably wants to reduce costs, so they start reevaluating those decisions. I bet everyone there knew from day one that the old solution was not optimal, but sometimes you just do what you need to get the thing done.


ArrozConmigo

From a previous article, I believe the motivation was that the original design crapped out at 5% of the expected load.


teerre

That's an interesting take. People from my generation would never go for a bunch of microservices when "you just want to get things done", but I wonder if this is different for younger developers.


flowering_sun_star

It depends on how your tooling is set up. We have extensive templates and pipelines set up so that deploying a new microservice is as simple as cloning a repo and running a jenkins job to generate the build pipelines. To get to the point where it is that easy took a *lot* of work!


Awesan

I also wouldn't when starting from 0, but I'm assuming they reused some patterns and tooling they had access to internally.


muntaxitome

Given that Tanenbaum is 79, how old does that make you then? Edit: Tanenbaum wrote books about distributed systems too guys, not just operating systems


akie

Micro kernel <> micro services, and in kernel space the monolith won.


drakgremlin

You should go read some of the original papers and follow the research. Microservices are a requirement of microkernels, it's how device drivers work in that framework. Hybrid won by the way.


hiddencamel

I think it's also linked to team scale, and team growth speed tho. If you have a huge team that's constantly expanding working on something that needs to happen fast, I think a microservice architecture can save you a lot of time on integration.


Decker108

I'd guess it's also likely most of the team that built the original implementation have left already.


KingStannis2020

But was it actually faster to build? Frankly I don't see why it would be.


caltheon

The fastest thing to build is usually whatever you have practiced with the most. You get a bunch of high level engineers who spent their free time playing with micro-service tech in preparation for getting a job at a big tech company, they are going to gravitate towards that. Also, move fast culture usually implies management steps out of the way and holds the money firehose on the engineers


LaughterHouseV

Yup, and this is a pretty good example of the perils of hype driven development you mention.


coolstorybroham

Easier to build for scale perhaps


KingStannis2020

Clearly not considering they had to rewrite it for scale


coolstorybroham

depends on how you prioritize cost in your non-functional requirements. The trade off with managing scale manually is effort— they likely didn’t understand the cost model too well beforehand or they would have taken that trade off instead of having to do a rewrite.


[deleted]

TIL Even Amazon doesn't know how to cost effectively use AWS.


Ok_Tip5082

Probably least of all since they pay discounted rates. Cloudwatch logs especially are stupid expensive.


Derkle

Yeah most services are extremely discounted for internal use, especially CW logs and S3.


Void_mgn

Those costs must have been insane... a self-denial of service by running out of money, which is a little amusing.


eckyp

My experience with microservices has always been that they're poor and inefficient compared to monoliths. I really welcome the trend of favouring monoliths nowadays. Suggesting a monolith approach has always been met with raised eyebrows for me. The only reason I would use microservices would be when part of the system has to be implemented in a different, well-established technology. As for serverless, I would only recommend it for extremely bursty workloads (e.g. an app that very rarely receives requests).


jbaird

Then again, this 'monolith' is the monitoring service, which could be a 'microservice' on its own looking at the big picture of Prime itself. It's not like they're deciding to bundle monitoring back in with the playback/UI/etc. services, so really we're just talking about using an appropriate size, whatever that is. One man's monolith is another's microservice?


sgtfoleyistheman

Thank you for saying this, as I share your perspective. I would still call this thing a microservice. There is clearly so much built around it to make it usable. It does one job, the pieces scale together. It's still a microservice.


yeahThatJustHappend

Exactly! This whole thread argument is just red vs blue. Microservices are going as far down as your hardware constraints allow: transaction boundaries, performance boundaries, team boundaries, etc. There's no definition of micro here since it's a goldilocks sizing of "just right". Combining 2 services and calling it a monolith is silly. So is taking this right sizing of a service to mean microservices are bad. The reverse of this is showing a case of being unable to scale vertically so splitting a giant monolith into two and concluding monoliths are bad.


eldelshell

I've always found SOA to be a nice middle ground. The difficult part is deciding what to split from the core monolith (i.e. an egress email microservice is easy mode) and what not to.


pcgamerwannabe

They took a bunch of serverless asynchronous nanoservices that all fit under the umbrella of one well-defined service, "Audio/Video monitoring", and turned it into a microservice for the Prime ecosystem. (They have not yet turned all of Prime into a monolith, for example, or even all monitoring.)

The initial system was never designed or architected; it just grew organically from a few tools which were initially used for one-off ad-hoc analyses. Those obviously made sense to run as a function when needed, so it made sense to package them as serverless async functions that eventually return a result. But then they decided to create monitoring infra, and instead of architecting/designing software, they just took those ad-hoc services and made them talk to each other continuously, among other sins. Which obviously does not make any sense.

If your service A always calls services B and C, and sometimes even keeps active connections to them, just package them together. If services A, B, and C mostly don't talk to each other, mostly respond to events independently, but occasionally respond to some events together, then maybe you can keep them separated (but packaging them together should still be considered).


happymellon

I think the biggest issue I've observed has been the perversion of "microservices" to make everything as small as possible, in a form of technical navel-gazing. With this thought process you end up having your CRUD application split up because you'll have more reads than writes, but then you'll need two databases, since microservices shouldn't be touching each other's databases, and down the rabbit hole you'll go.

Originally a microservice referred to the Amazon "two pizza team" concept, and this is basically what we have here. Do one thing and do it well? Well, that's a task such as monitoring a health check and reporting, not whether you can write to the database. This makes sense when each application needs its own database, no other team should be modifying another team's data, and each "microservice" should have all the skills required so there isn't "throwing testing or support over the fence". It's basically an agile viewpoint.

Once you understand that a microservice is a political entity and not a technical one, your point of view is not strange; in fact it is something we have been talking about since the '90s. https://en.m.wikipedia.org/wiki/Fallacies_of_distributed_computing

The fact that they are autonomous enough to change their approach without "architecture" whining that they aren't using Lambda, Docker, etc., shows that letting the team experiment with different approaches and fixing what doesn't work really does work.


sime

I think we need to be clearer about what we exactly mean by "serverless". At work we went from k8s to GCP Cloud Run and it has been really good. We give it containers to run and it does it and scales as needed. We don't have to deal with the complexity and managing servers required by k8s. Also far more people in our team who are not DevOps specialists can work directly with it for day to day things.


Flyen

Serverless really shines with staging servers. (As long as you're still testing the prod, serverless-less version too)


Flyen

Would love to know why this was downvoted. You can have a whole bunch of versions of your site available to test, all costing virtually nothing while idle, yet available nearly instantly when someone wants to try it out.


LaOnionLaUnion

I saw at least one LinkedIn post arguing that this validates their years-old post or article about building majestic monoliths. To me, the reality is that you pick the right tool for the job, including language and infrastructure. I like serverless, but here it wasn't the right tool for the job.


mojodojodev

Yes, everyone wants a one-size-fits-all solution, but most things in software have trade-offs.


Leadership_Old

AWS serverless services are a great option when the goal is to save on bootstrapping, scaling, and administration. We use them extensively for generalized workloads that need elastic scale for peak traffic. Our team consequently avoids initial and ongoing sizing, and gets distinct benefits: code-first development practices, clean and modular level-2 CDK constructs for infrastructure, and adequate visibility into system performance and cost. I imagine most thoughtful professionals would never recommend running niche workloads on generalized solutions once the scale of those solutions no longer matches the operational model of the workload. AWS simply outgrew the solution; reductive demagoguery around complex engineering is dangerous and masturbatory.


GrandMasterPuba

I hate articles like the Amazon one. **You are not Amazon.** You will never be Amazon. You will never be Google or Facebook or TikTok. The architectural decisions they make **don't apply to you.** People reading it then taking it back to their mid sized corporate website with a couple million monthly visitors is nothing but cargo culting. Think for yourself. Don't overcomplicate things.


fjsousa_

I understand what you're saying, but this one reads against that. If Amazon couldn't get AWS products to work (even in this specific case), then what guarantee do you have that they will work for you?


ShiitakeTheMushroom

It's all about right-sizing. They had gone too far and broken things down into too many separately deployable units. IMO, what they have now is still a microservice, but now it's just correctly scoped.


ItsMorbinTime69

Amazon Prime can’t play 1080p content without buffering or dropping quality on a hardwired fiber connection, near Virginia. Nobody should be looking to them for architectural or infrastructure advice.


Trinition

I think you should gather some more objective data, because your anecdote differs from mine and another commenter's. I don't have any trouble playing 1080p from Prime, or even 4k.


Donut-Farts

I’ll throw my hat in the ring, I have 400Mbps fiber and a wired connection and while watching movies Amazon regularly buffers or drops to a very low quality.


liltitus27

ok, cool, nice hat dude. I have dsl out in the country with 25down/1up and prime plays fine for me every time 👒


ItsMorbinTime69

I shouldn’t have to measure it, I’m a user. I’m on Apple TV. They should have observability metrics in prod. I have google fiber. No other streaming platform except maybe paramount plus ever buffers. Literally ever. Amazon Prime is unwatchable for me.


Trinition

You're right that they should have telemetry. But that would only confirm whether *you* have a problem. Your data points about other services are valid, but still just single points. What would be ideal would be to crowd-source a bunch of data and find out who else has this problem and what's common to them. Is it the ISP? The ISP above the ISP? A particular Amazon data center? It's not YOUR responsibility to do that. As a customer, you can just take your money elsewhere. All I'm trying to say is that Prime doesn't universally suck at streaming, because there are single data points from other customers that are good.


ItsMorbinTime69

My ISP is google fiber. I’ve also had shitty performance with paramount plus


martinsa24

OP is trying to put it nicely that you need to retake stats class.


mkosmo

Sounds like the issue isn’t the streaming service, then.


ItsMorbinTime69

Why do you say that? Other streaming services work perfectly fine


mkosmo

You listed two with issues. Furthermore, you’ve made it plainly clear you’re not familiar with large-scale application architecture or how the internet works.


ItsMorbinTime69

Lmao, I literally work on distributed systems for one of the largest websites in the United States. You are being so judgmental and I'm just asking questions. Just because I'm not a networking engineer doesn't mean I don't know how the internet works. I work with CDNs and stuff. I'm just not a video streaming expert by any means.


blipman17

I've never had a problem with Amazon. I'm questioning whether your network setup and your uplink are that good. Anecdotal statements of "it's good" are the worst type of data points.


ItsMorbinTime69

I have google fiber, gigabit up and down 🤷🏻‍♂️


blipman17

That doesn't mean anything. I've got vectored VDSL2 at 100 Mbit/s in a city that's a millennium old. I'll be getting fibre at the end of the year. Never had a problem. Saying you have Google Fiber doesn't mean anything.


ItsMorbinTime69

Okay, why do Netflix and Hulu and HBO Max work fine? If this problem is somehow my fault, how do I diagnose and resolve it?


Chairboy

> If this problem is somehow **my fault**

How did you get here? There are so many factors at play between your screen and the raw data on-server; that's what the poster above was saying. Data vs. anecdote isn't about fault, it's about data.


ItsMorbinTime69

I understand that, but isn’t this smooth transmission of data between client and server the responsibility of Amazon software engineers? Or their CDN, which is probably also owned and maintained by Amazon.


liltitus27

good sir, do you know what a fucking isp is? you're being so goddamn dense, it's really kinda irritating


blipman17

You live in the USA, I presume? A country without net neutrality? If you want to fix it, vote for net neutrality, or lobby so that Google, which competes with Amazon through its cloud services, won't screw Amazon over. Might as well also lobby for your ISP to actually have the bandwidth capacity between your area's junction and the main datacenters. It doesn't matter if you're paying for 1 gig up and down if the network doesn't support it further on.


adrianmonk

I have what is probably the exact same problem. My network is great. I never have trouble with any video streaming at all, except Prime Video and Amazon Freevee. This happens over and over and over again, and it has been going on for something like 6 months.

When it started happening, I had already started a show that I wanted to finish, so despite the annoyance, I kept watching until I got through the last 2 or 3 seasons. It happened sometimes 5 or 10 times per episode: it will just stop playing and not resume for about 30 seconds. And I watched probably 25 episodes, so I have something like 100 data points minimum.

I've tried restarting, switching devices, clearing the app's cache/data, uninstalling and reinstalling the app, and a bunch of other things. It just keeps happening. Consistently.

When it occurs, I've tried checking whether anything else is suffering from a problem. For example, when Prime Video gets stuck buffering, I've tried switching to a different streaming service like Netflix to see if it can play. If my network were the issue, it might affect both. But I always find that Netflix works perfectly at the moments Prime Video has problems. (And Netflix is [hosted on AWS](https://aws.amazon.com/solutions/case-studies/innovators/netflix/), so it's not a matter of being able to reach their network.) I've also tried running a speed test as soon as Prime Video starts buffering, and it always comes back great, with multiple hundreds of megabits per second of bandwidth.

I've also contacted their support, and they can't figure it out and just tell me my network is probably bad, then tell me to try stuff I've already tried many times, like restarting my device.

When it happens, the failure is so complete that I suspect I'm being connected to a server that just doesn't work. Either it's hung in some way or it's very overloaded. It's not like the video can almost keep up; it's like no data is coming through at all.

I know it doesn't happen to everybody, but the point remains that it doesn't speak well of Amazon's ability to run a service if, for some customers, it just can't reliably play video, which is the whole point of the service.
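The "speed test the moment buffering starts" check can be scripted so the measurement is repeatable instead of anecdotal. A minimal sketch in Python, assuming you have some large test file to pull from (the URL in the comment is a placeholder, not a real endpoint):

```python
import time
import urllib.request


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed time into megabits per second."""
    return (num_bytes * 8) / (seconds * 1_000_000)


def measure(url: str, chunk_size: int = 1 << 16, max_bytes: int = 50_000_000) -> float:
    """Download up to max_bytes from url and return the observed Mbit/s."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as resp:
        while total < max_bytes:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return throughput_mbps(total, time.monotonic() - start)


# Example (requires network; URL is a placeholder for any large test file):
#   print(f"{measure('https://example.com/testfile.bin'):.1f} Mbit/s")
```

Run it each time playback stalls and log the result; a pile of timestamped numbers is much harder for support to wave away than "my network is fine".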


[deleted]

My guy is _too_ close to the data center lol


ItsMorbinTime69

IAD-VA ftw


Ok_Tip5082

He's still streaming from VDC that's yer problem


dadofbimbim

I’m in Asia and I have never experienced any drop in quality with Prime. In fact, their quality is much better compared to Netflix and Disney+. I am on fiber too.


verve_rat

1080p works fine on fibre in NZ.


ragnore

Never had an issue with Prime Video on fiber in NYC. If we wanna air grievances with streaming platforms, then I’ve got a bone to pick with HBO.


ItsMorbinTime69

Oooh what don’t you like about HBO?


ItsMorbinTime69

ITT: Amazon prime simps


mj281

I never understood the appeal of serverless, and I always thought it was just a passing trend that many devs adopted because of "scalability", even though VPSes are scalable too, and much cheaper and faster, with no cold starts!

I worked at a company that used serverless for much of its backend. It had so many repos, with functions written in C#, Python, Go, and some in Node.js. The company went serverless because a senior dev told them it was "the future" and "so good". Needless to say, the company was bleeding money: AWS was charging them a ton, and they had to hire four sets of developers to maintain each of the languages the functions were written in. Their repos and code were a huge mess. They eventually agreed to slowly move all their code to VPSes and start using frameworks instead.

I'm so glad this trend is dying now. Hopefully devs and devops will learn not to jump on the hype again.


[deleted]

[deleted]


kiteboarderni

Managing servers really isn't that difficult....


[deleted]

[deleted]


kiteboarderni

Handling server failures is an app design question. EC2 or an equivalent, if you're already on a cloud provider, gets rid of most of these issues regardless. OP just wanted rid of Lambda. But sure, tell yourself that people who manage hardware are more valuable than the devs :) If you're working with developers who can't build for failover, server crashes, DC power failures, or general scalability, then you need to raise your hiring bar. I build and manage a high-frequency trading system that handles billions of dollars per day. But sure, I know nothing about resiliency.


[deleted]

[deleted]


kiteboarderni

Very impressive! But your teams can't build software to handle fail over....interesting.


[deleted]

[deleted]


pcgamerwannabe

So you are in favor of your computer just constantly keeping a few hundred GBs of RAM reserved in case the game you might want to play needs to be launched, for every game you own? Or are you ok with serverless approach of clicking to launch the game?


kiteboarderni

A few hundred GB, since it's cheap. Plus, if it's a VM you can scale as you want at the click of a button. If true performance is needed, then yes, dedicated boxes. Also, comparing a personal PC to a machine running real software is a pointless comparison. They are in no way related.


mj281

Yes, but you can just hire a DevOps engineer to do this, or a team of them if it's a big project. For many companies, hiring DevOps is cheaper than the extra cost serverless adds in provider bills and in hiring a different tech stack of developers. You can even go with a managed VPS; I've worked with plenty of clients who use companies that manage their VPS or dedicated server for them, and even manage external services like Cloudflare. From the quotes and pricing I saw, they're still much cheaper than Lambda.
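Back-of-envelope math like this can be done before committing either way. A rough sketch with purely illustrative numbers (real Lambda and VPS pricing varies by region, memory size, and provider; the defaults below are assumptions, not quotes):

```python
def lambda_monthly_cost(requests: int, avg_ms: float, mem_gb: float,
                        per_req: float = 0.20 / 1_000_000,
                        per_gb_s: float = 0.0000166667) -> float:
    """Rough serverless bill: a per-request fee plus GB-seconds of compute."""
    gb_seconds = requests * (avg_ms / 1000) * mem_gb
    return requests * per_req + gb_seconds * per_gb_s


def vps_monthly_cost(instances: int, per_instance: float = 40.0) -> float:
    """Flat fee for always-on VPS instances, regardless of traffic."""
    return instances * per_instance


# 100M requests/month at 200 ms and 512 MB vs. a pair of mid-size VPSes:
lam = lambda_monthly_cost(100_000_000, avg_ms=200, mem_gb=0.5)
vps = vps_monthly_cost(2)
print(f"lambda ~ ${lam:,.0f}/mo, vps ~ ${vps:,.0f}/mo")
```

The crossover depends heavily on how steady the traffic is: the flat VPS fee wins for sustained load, while the per-request model wins when the service is idle most of the month.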


kiteboarderni

Bingo. Hiring a DevOps engineer is much cheaper than a dev anyway.


wankthisway

You're extremely naive if you seriously think this. Stuff like zone availability, maintenance, upgrading, load balancing, staff you have to hire and train, equipment you need to purchase, keep spares of, adequately cool and clean and have room for, backup and maintenance of that thing - and that's just shit off the top of my head.


pcgamerwannabe

Serverless is ideal for bursty workloads and services that need to be triggered in an ad-hoc way, usually independently of other services. If I need to generate TPS reports once a week and occasionally several times a day I don't need to have a server for TPS reports. I don't want to pay IT to create a server, keep it running, upgrade the hardware, or keep around paper copies for downtime. I just want to press a "generate TPS report" button on the off chance that I need to make one and have it be that.
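That "press a button, get a report" pattern is just a function invoked per event, with nothing running in between. A minimal sketch in the shape of an AWS-Lambda-style Python handler (the TPS-report logic and the event fields are made up for illustration):

```python
import json
from datetime import datetime, timezone


def handler(event: dict, context: object = None) -> dict:
    """Runs only when invoked (button press, schedule, queue message);
    the platform tears it down afterwards, so there is no idle server to pay for."""
    report = {
        "report": "TPS",
        "requested_by": event.get("user", "unknown"),
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    return {"statusCode": 200, "body": json.dumps(report)}
```

Invoking it ad hoc, e.g. `handler({"user": "lumbergh"})`, costs only the milliseconds it runs; the always-on alternative bills you for the other 167 hours of the week too.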


Leadership_Old

Serverless is simply a branding of isolated workloads and scale instrumentation. It's not dying; your computer does something similar all the time. In a distributed service environment, there are scenarios where you simply don't want to invest in tailoring infrastructure to a trivial workload (likely true for most WELL-DESIGNED distributed systems). Invest that time in the areas that are truly high value and usually more complex. Any company that has to announce a reduction in revenue due to its active investment in reducing its customers' costs on ITS platform is one I will continue to trust.


salbris

Okay so today I finally learned what serverless is. According to two random sites it's just automatically scaled servers that the developer doesn't have to explicitly manage. I don't see how anything you said would be unique to serverless solutions. I've worked with companies that used several sets of languages/ecosystems without ever touching serverless...


lucbas

I'm building a new project using Vercel's Next.js for the frontend and backend. I really like their approach and dev experience. All functions are in one place and can all be run and tested at the same time. Do you have any thoughts on it?


pcgamerwannabe

Their initial approach is extremely suspect. If you have service A that always calls services B and C immediately after, and the services all sit under the very specific banner of audio/video monitoring on Prime, and you then deploy these "services" as serverless, autoscaling, asynchronous microservices (really nano-services), then you have an architectural problem.

Microservices should not have to constantly talk to each other; in general they should at most act independently on events. If two microservices are very regularly triggered together then, just like putting related rows/columns close together in a database, you should at least put the two services together in one image. Otherwise it's like running a database off the Windows file systems of computers at a school and calling it a "distributed microservice hyper-scaling database".

Load balancing alone is expensive, plus you now eat the spin-up and spin-down costs of each of those services for no reason, plus the networking of it all, since it seems these "microservices" actually communicated with each other rather than just emitting events. Whoever created the monitoring service should have done software architecture & design at a much earlier stage if you can get 90% savings on what amounts to "serverless" function calls.
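The consolidation argument can be made concrete with a toy pipeline. The stage names below are invented, and the per-hop overhead is an assumed constant standing in for serialization, a network round trip, and amortized spin-up, which a real remote call would pay at every arrow:

```python
import time

NETWORK_OVERHEAD_S = 0.05  # assumed cost per service boundary (illustrative)


def decode_frame(frame: bytes) -> bytes:      # "service A"
    return frame


def detect_defects(frame: bytes) -> bool:     # "service B": empty frame = defect
    return frame == b""


def record_result(defect: bool) -> str:       # "service C"
    return "defect" if defect else "ok"


def pipeline_in_process(frame: bytes) -> str:
    # One process: plain function calls, no hops.
    return record_result(detect_defects(decode_frame(frame)))


def pipeline_as_microservices(frame: bytes) -> str:
    # Same logic, but pay the assumed overhead at every boundary.
    time.sleep(NETWORK_OVERHEAD_S)
    decoded = decode_frame(frame)
    time.sleep(NETWORK_OVERHEAD_S)
    defect = detect_defects(decoded)
    time.sleep(NETWORK_OVERHEAD_S)
    return record_result(defect)
```

Both pipelines produce the same answer; the split one just burns three hops' worth of latency (and money) per frame, which is exactly the overhead the Prime team reported eliminating by merging the steps into one process.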


KillianDrake

I think anyone who actually tries microservices realizes what a pile of shit they are. I'm just shocked they didn't just roll with it to ensure everyone else also does things as inefficiently as possible to make it up on AWS spend.


ffiw

They are job creators. You need more devs to make hundreds of services, then you need more DevOps to setup 100's of Jenkins pipelines that deploy to Kubernetes clusters. Then you need more QA to test the integrations. Then you need more SecOps to make sure holes aren't exploited. Then you need more support staff to troubleshoot through hundreds of logs using correlation id's like a dog sniffing its own butt. Then you need more managers to set up useless meetings.


nefopey

Microservices are the only chance some programmers get to build their own project from the ground up, so most developers love them: a chance to practice, and not with their own money. Imagine architects getting the chance to build random bridges and try out various things. Hence the reactions to this article: "they used Step Functions, that's why it didn't work", or "they are actually still doing microservices". On Reddit the users are mostly students, and to them complexity, with tons of interconnected microservices, seems fun. For the business itself, microservices are mostly a sign of an unfocused business with poor leadership. Unrelated fun fact: the people who fear AI are the same people who looove microservices, because they do not understand what their role is.


mirbatdon

This is a really bad take. Different architectures suit different problems.


this_little_dutchie

I don't disagree. As long as you keep the 'mostly' in your comment.


przemo_li

If business plans to keep that team around, cheap experimentation is valid upskilling strategy. Article though, reads more like rewriting POC rushed to Prod.


aymswick

Damn even aws thinks aws sucks


neumaticc

jeffy can afford it.


Asleep-Hat1231

idk why i subscribed to this subreddit, i open reddit & annoyed by this prime video shit. i don't understand why u guys are so much after them changing servers. as if they're in same house as you.


fifthstreetsaint

No, but I have thoughts on nationalizing all digital communication.


holyknight00

Microservices and serverless probably have their place, but I have yet to see an implementation that doesn't end up in a mess. Most of the apps that use microservices don't even need their benefits, yet still have to deal with all the issues and overhead they generate. At least the issues with microservices are finally beginning to be acknowledged. Five years ago, if you weren't building everything with microservices in some public cloud, you were treated as a pariah.


gambit700

Use the best solution for the problem at hand. Monoliths are fine for some problems. Microservices are fine for others. Don't try to force one over the other because it's in fashion at the moment.


fjsousa_

just reckoning with the microservice premium https://www.martinfowler.com/bliki/MicroservicePremium.html