Fixing a Thread-Safety Bug with Nate Berkopec

Stefanni: Hey friends, Stefanni here.

Welcome back to another hexdevs episode.

Before we go to the episode, we
want to thank Nate Berkopec for

joining us on this episode about
building thread-safe Ruby code.

As some of you might know, we help out
at Faker Ruby, and we had an open issue

that was related to thread-safety.

And we were not quite
sure how to fix that bug.

So we sent a message to Nate and
he kindly accepted the invite.

And not only did he help us understand the
issue, how to fix it, but he also gave us

a lesson on building thread-safe Ruby code.

Something else that we want to share
is that this episode, and actually all

of our episodes, are sponsored by Get
to Senior, a program that we developed

to help experienced Ruby developers
take their careers to the next level.

And we also have our GitHub sponsor page.

We have two sponsors: Valentino Stoll
and Gregg P, who have been supporting us.

So thank you.

And if you, dear listener,
want to sponsor our work, go to

hexdevs.com/get-to-senior or to our
GitHub page to see our sponsor page.

Thiago: This is the hexdevs podcast.

I'm Thiago.

Stefanni: And I'm Stefanni.

Today our guest is Nate Berkopec.

Nate makes Rails apps go faster.

He's an expert in Ruby
on Rails performance.

And he runs a company called
Speed Shop, a software performance

company specializing in Ruby.

He's the maintainer of Puma, one of the
most popular Ruby Web servers out there.

And he is also the author of
the book, the Complete Guide

to Ruby on Rails Performance.

Thank you so much for joining us, Nate.

Nate Berkopec: Hey, thank you very much.

Stefanni: Yeah, we just gave
Nate some homework to do.

We sent him an email.

Nate Berkopec: Yeah.

What the heck, man?

I thought this was a
podcast, not like a class.

I had to go take,

Stefanni: Well, we had a, an issue that
was opened on faker not being thread safe.

So we didn't know exactly what to
do, how to solve the issue, and

we thought it would be a great
opportunity to bring Nate, who's an

expert in all of that, to talk more
about why faker is not thread-safe.

And then we could also use this
example to understand more how

to build thread safe Ruby apps.

So thank you so much Nate.

And sorry to give you homework.

Yeah.

So I would like to get started with
a heads up about what caught your

attention about the issue, why.

Is faker not thread-safe?

Nate Berkopec: Yeah.

So you, you brought me in, uh, and
sent me this link to this report.

So this was actually after
you thought you had fixed a

threadsafety issue on faker and so.

Basically in faker.

So if you're not familiar with
faker, faker is like this extremely

commonly used gem for, uh, generating
fake data, usually in tests.

So you can like, ask for last names.

First names, like all, it's like anything
you can think of, faker can pretty much

generate a random version of it, right?

Mm-hmm.

So, Faker also works with locales.

So you can have, you know, Japanese
data, English data, whatever.

Right?

So you need to tell it what locale it,
it, it's gonna generate that data for.

So there is a setting and that, uh, locale
setting, uh, I think has always been,

I don't think this was changed ever.

'Faker::Config.locale ='. Okay,
so that was how you set it.

You set that to a, a symbol, I believe
you had set it to, you know, en-GB or,

or JA or whatever your, your locale is.

So there was a, a PR included
in 2.23, which changed how this worked.

And originally, and
this was your PR, Thiago.

The, the PR that I'm about to
describe that change, or is it not?

Thiago: No, I just approved it.

Nate Berkopec: Oh, you just approved it.

Okay.

Thiago: I did the review and then I'm
pretty sure someone else worked on that.

Yeah.

Oh yeah, it was someone else,
but I gave my thumbs up.

Nate Berkopec: It's still your fault.

I see.

I can, yeah, for sure.

Oh, everybody's fault.

Blame the reviewer.

Um, so the way that this
originally worked was Faker::Config.

The module had a, uh, locale.

Class variable.

So you would set the class variable
with locale equals, and you

would read it with locale, right?

So that works.

That's fine.

You know, it means that all threads
and everything, like if you create

a new thread, it's still gonna
read that same value, right?

So that, that all works.

But, If more than one thread
tries to write that value,

we start to have problems.

Right?

So somebody basically made an issue
that was like, I use faker in QA.

And in QA we're sending different
requests that have different locales.

And so the faker.

Data, uh, faker is like setting the
locale in one thread and then

a different thread is reading that
locale, which has now been overwritten

and we don't want that to happen.
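
A minimal sketch of the overwrite described here. `FakeConfig` is a hypothetical stand-in for a library config module, not Faker's actual source, and the threads are joined one after another only to make the outcome repeatable; in a real server the interleaving is arbitrary.

```ruby
# Hypothetical stand-in for a library config module; not Faker's real code.
module FakeConfig
  @@locale = :en

  def self.locale
    @@locale
  end

  def self.locale=(value)
    @@locale = value
  end
end

# "Request A" sets its locale, then "request B" sets a different one.
request_a = Thread.new { FakeConfig.locale = :ja }
request_a.join
request_b = Thread.new { FakeConfig.locale = :"en-GB" }
request_b.join

# Any thread reading now sees B's value; the value A wrote is gone.
seen_by_a = Thread.new { FakeConfig.locale }.value
puts seen_by_a.inspect
```

Because the class variable is a single slot shared by every thread, the last writer wins for everyone.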

So basically the request was
that faker should be thread safe.

But to to, to rephrase this as more of
like a story, the desired behavior is,

and you can correct me, this is what
I'm a little unclear on, so you can, you

can correct me if I'm wrong about this.

I think the desired behavior.

Is that we should set a default
locale, uh, in a, in configuration,

like when we start the app.

And if we don't say anything else,
this will, this will be the default

locale, and then threads should be
able to, in a thread safe way, change

the locale on a per thread basis.

Okay, so you could, you know, if this,
if this was, if, if this feature was

available, you could, in Puma or any other
multi-threaded web server, you could set

the locale in a maybe rack middleware
or something, I don't know, like in a

before action or an around action in Rails,
you would change the locale in faker and

then do whatever work you wanted to do,
you know, run the rest of the action and.

In an ensure block, you could say, set
the locale back to what it was before.

And um, that would be how
you could change that locale.

And then you, then you could have like
per request locales in a thread-safe way.

That.

Is that, is that correct?

That sounds, that sounded like
the desired behavior to me.
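
One way the story above could be sketched: a shared default that is set once at boot, plus a per-thread override that is restored in an ensure block. `LocaleConfig`, `KEY`, and `with_locale` are hypothetical names for illustration, not Faker's API.

```ruby
# Hypothetical config module; names are illustrative, not Faker's API.
module LocaleConfig
  KEY = :locale_config_override

  @default_locale = :en

  class << self
    # Shared default: meant to be set once at boot, before worker threads exist.
    attr_writer :default_locale

    # The per-thread override wins; otherwise fall back to the shared default.
    # Note: Thread.current[] is fiber-local, which behaves like thread-local
    # as long as nothing creates extra fibers.
    def locale
      Thread.current[KEY] || @default_locale
    end

    # Override the locale for the current thread only, then restore it.
    def with_locale(value)
      previous = Thread.current[KEY]
      Thread.current[KEY] = value
      yield
    ensure
      Thread.current[KEY] = previous
    end
  end
end

LocaleConfig.default_locale = :"en-GB" # e.g. in an initializer

inside = Thread.new { LocaleConfig.with_locale(:ja) { LocaleConfig.locale } }.value
after  = Thread.new { LocaleConfig.locale }.value
puts inside.inspect, after.inspect
```

In a Rails app, the `with_locale` call would wrap the request, for example from an around action or a middleware, so the override never outlives the request.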

Thiago: Yeah, that's the desired behavior.

I think we had that before,
but it wasn't thread-safe.

And then someone wanted to have
different threads and not have one

overriding the locale on the other
thread, and then that was fixed.

But then the other person said, hey, but
then your fix broke my setup now, where I run

this, I guess in production, I think.

Mm-hmm.

When someone else runs it in production.

And now if I use faker in production
and I set the locale, It only exists

like in one of the requests and then the
others don't, don't see that anymore.

And so it's kind of like one fix
broke someone else's setup, and

then we have to fix both things now.

Nate Berkopec: Right.

So the, the, the change that was
made to try to implement this story

was that, uh, locale is no longer
just, uh, an at a class attribute.

Uh, class, uh, instance class variable.

It is a, uh, it sets thread dot
current, uh, open bracket, faker

config locale, close bracket.

And we're re so we're, we're writing,
writing and reading thread dot current.

Some thread, well, we thought it
was a thread variable, and then

you, you know, you learn later in
the issue is people read the docs.

It's actually fiber local.

So that was the change.

The instance, the the locale instance,
uh, I keep saying instance, it's

not an in class instance variable.

It's a class variable.

That class variable locale was
removed, completely removed.

And there's some other stuff in here,
which I actually haven't looked at there.

You have some other like
thread dot current use.

Like you, you've removed basically several
class variables replace them with thread

dot current is kind of the theme here.

Uh, locale, like this locale setting
is kinda the one I'm just going to

talk about it cause it's the one
I looked at and it's, it's a good

example of everything else here.

So then the bug report comes in
and per this person's like this

broke something else for me.

So, uh, you set up a Rails app in
Puma or other threaded web server, you

set Faker::Config.locale
in an initializer, make a request.

And then you get the fake data for
the default locale for faker, which

is en, so the, the Faker::Config
locale setting broke for all Puma

users using more than one thread.

Um, or actually I think it
should be any thread at all.

So basically it broke
it for Puma completely.

So let's talk about why that happened.

Cause I think if you understand
why this happened, then like the

fix here becomes more obvious.

How Puma works.

So Puma, when it starts up, it has
one process and one thread to when

it starts, uh, when it starts itself.

And depending on the, uh, mode
you have it set to, it may or

may not start other processes.

But the important thing here is
that when Puma processes requests,

it sends the request into what
we call the um, Our thread pool,

which actually has a different name?

Uh, in pool?

No, I guess we just
call it the thread pool.

Yeah.

So it's the puma thread pool.

And, um, that thread pool has anywhere
from one to X number of threads where,

you know, you set it to whatever you want.

So the application request is always
processed in a different thread than

the thread that Puma was started with.

So we.

Start the Puma server.

We initialize your application, run
all the initializers, and then we

create the threads for the thread pool.

Okay?

So that timing is important here
because the initializer is run

before we create the app, uh, app
application threads that, that actually

run the, the application requests.

So that faker config dot locale is called

before those, uh, thread pool threads are
created. The behavior of thread dot current

and, uh, sorry, not thread dot current,
but the, this is the bracket method.

On, on a thread instance, you,
uh, instance method, uh, uh,

on a thread you could say.

So this is accessing thread,

no, fiber-local variables as, as,
uh, the documentation points out.

So, um, in case you don't
know, all threads have fibers.

Fibers are lower level
concurrency unit than a thread.

So, um, all processes have at
least one thread, and all threads

have at least one fiber in Ruby.

Okay?

So, Puma doesn't do this, and, uh,
your application might create new threads,

but, uh, create new fibers, I should say.

But Puma doesn't actually
create new fibers.

So for our purposes in this conversation,
it's, it's, uh, there are no like a

thread, it is a thread, uh, variable
because there's, we don't have, we

don't have multiple fibers here.
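
A small sketch of the distinction being drawn here, using only core Ruby: the bracket method on a thread stores fiber-local values, while `thread_variable_set` and `thread_variable_get` store values shared by every fiber in that thread. The `:locale` key is just an illustrative name.

```ruby
Thread.current[:locale] = :ja                    # fiber-local
Thread.current.thread_variable_set(:locale, :ja) # thread-local

# A new fiber runs on the same thread but has its own fiber-locals.
in_new_fiber = Fiber.new do
  [Thread.current[:locale], Thread.current.thread_variable_get(:locale)]
end.resume

puts in_new_fiber.inspect # prints [nil, :ja]
```

With only one fiber per thread, as in the Puma case discussed here, the two behave the same.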

So, so the thing is, is like when
you create these, these fiber local

variables with, uh, the bracket
method, they're not inherited.

So there's a really great
reproduction in this issue.

Um, when you scroll down a bit where
someone made like the minimally, the

minimally, uh, required reproduction.

And all they did was set the fiber,
uh, in the current thread, set the, uh,

faker locale, then create a new thread
inside of that thread, set it again,

and then try to read what that was.

And, uh, it was, it reproduced
the bug in 10 lines of code.

Really, really, first of all,
as an open source maintainer.

That's exactly what we wanna read, right?

We wanna read the 10 line reproduction.

It's.

So good job to that
person that wrote that.

So these, these, these
variables are not inherited.

So like Puma creates the, the, uh,
thread pool to run your, like the

threads that actually run the app.

And now this locale fiber
variable is no longer set.

So it goes back to the
default value of, of English.

So that was the, that was the source.

Of the bug here.

So does that all, does
that all make sense?

I've been talking a lot now.

I'm, I'm done talking.
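
A sketch in the spirit of that reproduction (not the actual code from the issue): a value written with the bracket method on the boot thread is not visible in a thread created afterwards. The key name and the `:en` fallback are assumptions for illustration.

```ruby
# Boot thread: roughly what an initializer did under the Thread.current approach.
Thread.current[:faker_locale_sketch] = :ja

# A worker created afterwards, like a thread-pool thread.
seen_in_worker = Thread.new { Thread.current[:faker_locale_sketch] }.value

# The worker starts with no fiber-locals, so code that falls back to a
# default ends up using that default.
locale_in_worker = seen_in_worker || :en

puts seen_in_worker.inspect   # prints nil
puts locale_in_worker.inspect # prints :en
```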

Thiago: Yeah, for sure.

That makes a lot of sense.

Stefanni: Mm-hmm.

Yeah.

That, that was Matheus, uh, he
has been helping a lot with, with

this issue, like I am learning.

Thiago: Yeah.

I'm curious to know about, because
you mentioned that the fiber

local variable is not inherited.

Right.

The other threads.

Mm-hmm.

But then if we talk about
the class variable, right?

If I said that class variable on
the initializer right, it would be

available to all the other threads.

Is that Yes.

How it works?

Nate Berkopec: Yes.

Yep.

And this is like, uh, this is kind
of the, uh, to me, like the, uh,

fine grained part of this issue.

When it comes to thread safety, it's just
like people are like, okay, we gotta get

rid of all shared mutable state. Shared
mutable state is bad, get rid of it.

And that's like what the PR did, right?

Like it nuked the shared mutable state.

No more class variables don't do it.

But like you actually do
want shared mutable state.

In this story, like you want to be able
to change the locale for all threads

at a particular point in time, right?

You want to do that during
initialization when there aren't,

when, when you know there aren't
multiple threads, probably trying to

read this value and do stuff with it.

Like there's not, like you, you, you
wanna do that during a time when it's

probably safe to do it and then later
you want the, uh, private state you

want thread private state, right?

So it's kind of like a, uh, a
little bit complicated there, where

like, you want both things, right?

Like you want to be able to
override this value for all threads.

You just wanna be able to
do it at a particular time.

Um, so the original PR was, uh, thread
safe, but just also didn't work.

So you, you, you gained thread
safety while breaking the feature.

Yeah.

Thiago: Yeah.

It's interesting that.

Maybe threadsafety is
not always the, the goal.

Depending on your feature, maybe
you want to be able to share state

between threads and maybe if you want
to mutate the global state, then you

have to worry about thread safety
and how you, you would approach that.

But in this case, you broke
the feature by doing that.

Right.

Which is kind of interesting.

Nate Berkopec: Yes, and, and
sometimes it's also a little

bit complicated here because.

You, you also don't, in this story,
right, you don't actually really

care about the thread safety of
setting this locale for all threads.

Like you could write this in such a way
that, like, the class variable is, is that

access and, and reading is thread safe.

So like we could use, um, something
from concurrent Ruby, for example.

It's a library, a ruby
gem that's used in Rails.

And, um, we could set this up so
that it's thread safe to change

the locale for all threads, but you
don't really actually care about that

because the only time you're gonna
do that is during app initialization.

And so, or we could write our own thing
with a mutex or whatever to do this, but.

You don't really actually care about
the thread safety there, because that

happens during initialization where, where
there's only one thread running anyway.

So like you could add that
to like, you know, check all

your thread safety boxes here.

But what you really wanted
was like private thread state.

Like you want to be able to change this
value on a per thread basis and not

have, not have it affect other threads.

Okay.

So that's, To me that's like a little
bit different than thread safety.

There's a, there's a thread safe
way to, to implement this that would

still not satisfy the story if we
just took the original behavior

of the locale class variable and
made that access, um, thread safe.

What would happen is, is every,
every request would change the locale

for all other running requests.

So like in the middle of a, a halfway
through a rendering a response,

the faker locale could be changing
because other threads are changing.

That locale value that you
don't want that, right?

Like you don't want global
state, you want per thread state.

So like threadsafety is more
complicated than just it is

thread-safe or not, depending
on what, uh, behavior you desire.

Stefanni: Yeah, and I also
think it's really easy to.

Create those bugs because I don't think
most of us are aware of those details.

Nate Berkopec: Um, I think in general,
this doesn't happen in application

development because most of this.

It's just like very uncommon
to need to write or need to use

class variables, for example.

So the common sources of of thread
safety issues, generally you don't

reach for these like tools that
cause these problems in application

development, but it does happen all
the time in library development.

So the most common causes of thread
safety issues are class variables.

Global variables, which used to be
a thing that people did, but I don't

know, mostly people don't even reach
for them anymore and just use constants.

Um, constants and uh, rack middleware.

So those three things.

Of those three things, really, constants
is the only thing I see people making

mistakes with in application development.

But when you're doing libraries, class
variables and in rack middleware, it's

very common to have those, those things.

And so you can really.

You need to know more about threadsafety
when you're, when you're writing

libraries, I think, than, than writing
rails applications, for example.

Mm-hmm.

Thiago: For sure.

I'm curious about the cases
you mentioned about like

constants, that being a problem.

Is that because people try to modify,
uh, the value of a constant Yes.

At runtime?

Nate Berkopec: Yeah, exactly.

Like, um, setting a constant to like
a collection, like a, an array or.

Those accesses are not, uh, thread safe.

Even, uh, Samuel Williams, the
maintainer of, of uh, uh, Falcon,

the web server has, has also had this
demo where he's shown that hash access

is not, uh, thread safe.

Just hash access and writing
is not thread safe, and you can

even like corrupt the hash, uh, end
up with all these crazy behaviors.

So yeah, setting constants to collections.

And then modifying those collections.

So there, there was like a trend, uh,
I don't know, like two years ago, no

more than that, I don't know, four or
five years ago to like freeze constants.

And we used to do this mostly
as a memory saving measure.

So like when you freeze an object,
Ruby internally allows everybody,

everyone that uses that constant
can like point to the same object.

So we used to like just put a lot of
things in constants and freeze them.

Uh, now there's like this Rubocop
rule that tells you to freeze

everything you put into a constant.

And freezing is nice from a memory usage
perspective, but it's even better from

a, from a, uh, thread safety perspective
because now you get an error, right?

If someone tries to modify the
object, the, you can still get these

problems though with freezing because
you can satisfy the rubocop rule.

By calling freeze on an
array inside of a constant.

But if you have a array inside of that
array, rubocop won't say anything.

And you still have a threadsafety
issue because you're modifying this

unfrozen array inside of the constant.

So anything that's accessible
from that constant really can,

can lead to a threadsafety issue.
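
A short sketch of the shallow-freeze gap described here. The constant names and the `deep_freeze` helper are hypothetical; the helper is one possible way to close the gap, not something the speakers prescribe.

```ruby
ALLOWED = [["admin", "editor"], ["viewer"]].freeze

# Mutating the outer array raises, which is the nice part of freezing.
outer_error =
  begin
    ALLOWED << ["guest"]
    nil
  rescue FrozenError => e
    e.class
  end

# The inner array was never frozen, so this silently succeeds.
ALLOWED[0] << "guest"

# Hypothetical helper: freeze an array or hash and everything inside it.
def deep_freeze(obj)
  case obj
  when Array then obj.each { |item| deep_freeze(item) }
  when Hash  then obj.each { |key, value| deep_freeze(key); deep_freeze(value) }
  end
  obj.freeze
end

SAFE = deep_freeze([["admin", "editor"], ["viewer"]])
puts outer_error.inspect, ALLOWED[0].inspect, SAFE[0].frozen?
```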

So I, I think I see that issue sometimes
in app development where people create,

um, caches that they want to use.

That's kind of the most common
thing is they put a cache inside

of a constant and then they end up
with a threadsafety issue there.

Or the other one I see is,
uh, database connections.

So if you put a database connection inside
of a, a constant, for example, if you

just like capital Redis equals redis.new,
the issue you're gonna get there.

Everybody accessing that con,
accessing that constant is getting

the same exact database connection.

So if you have two threads accessing
the same database connection, you can

end up with issues where one thread
gets the response for another thread.

Um, so you don't wanna do that
and uh, there's like a lot of

gems that help you with this.

But basically you need to set
it up so that each thread is

getting its own connection out
of the, uh, connection pool.

Um, so those are the most common
ones I see in day-to-day app.

Thiago: That's interesting.

A good rule of thumb, maybe if you're
working with constants and you're trying

to do something weird or adding some
hashes or arrays to the constant, you

gotta be careful what you're doing.

Yeah.

Or a database connection.

Yeah.

Cool.

Mm-hmm.

Stefanni: And, and like since you
are talking about those common.

Things that you see happening in, in
development, um, is there something that

we as developers could change how we see?

Things when we are implementing them.

Like how can we start paying
more attention to, to those

potential thread, safe, uh, issues?

Because like you mentioned,
it's not something that we

do it every day, for example.

Nowadays, I think it's more common for
us to know, oh, this is gonna have an

N+1 query or something like that.

So how can we start changing
our, our ways of working to.

start identifying, those issues.

Yeah.

Nate Berkopec: My biggest, um,
recommendation is always to

make your tests multi-threaded.

So, uh, if you are using Minitest,
um, you can run each

test inside of its own thread.

It's called Minitest.

Oh man.

Now I'm gonna forget, uh, let
me get this, Minitest, cause.

Yeah, parallelize is what it's called.

So, um, if you require minitest slash
parallel on a test, uh, or no, include it.

Uh, now I'm, I'm not gonna
remember it, but look it up.

Um, but yeah, you could, you
can set up Minitest so it runs

each test in a different thread.

So that covers your unit tests and makes
all your unit tests multi-threaded.

Um, so if you're just running Minitest,
uh, I suggest turning that on.

If you're using RSpec, you're outta luck.

Sorry.

Uh, RSpec isn't
multi-threaded, never will be.

So you're stuck.

Uh, your only option for multi-threaded
tests is to convert them to Minitest.

Um, so, you know, good luck.

Stefanni: Oh, I was gonna say, okay.

What about RSpec?

Yeah.

Nate Berkopec: Uh, for integration and
system tests, so, System tests start

a puma server or integration test.

Can, you know, you can set up
to do whatever you want, right?

But you should set up your
integration tests to

start a Puma server and run that
Puma server with multiple threads.

So that will also potentially
flush out, uh, threading bugs.

Now this is also gonna make your, your
test suite less stable like you can.

Make a threading bug, usually you can't
cause it like a hundred percent of the

time, so you're gonna start getting flakes
probably where they're caused by threads.

Like, you can't just write
a test that always triggers

the thread bug, most of the time.

So, um, it will probably make
your test less stable, but like,

you know, that's, that's kind of
the price you're gonna pay here.

That to me is like, the best possible
thing is make your test multi-threaded.

So you are actually
testing, uh, thread safety.

Second thing is like all you can really do
is look for those three different sources

of, of threading bugs that I talked about.

Anytime you're writing
your own rack middleware.

So the, the, the threadsafety
issue here is that, um, there is

only one rack middleware stack
for, for an application, right?

So, the objects that are created
to actually run your

rack middleware, there's only one
of those for every application, um,

or every, you know, uh, process.

So your application runs in different
threads, but they're all using

the same rack middleware objects.

So if you have an instance variable
inside of a rack middleware, you can

end up with a thread safety issue.

The fix is actually really easy.

There is a, uh, middleware

freezer that, uh, uh, Samuel Williams
wrote called rack-freeze, and

It basically ensures that your rack
middleware are, are thread safe by

freezing all of your instance variables.

Um, and so you can't possibly cause
a problem and it'll blow up if you

try to do thread unsafe things.

So take a look at that.
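
A sketch of the middleware problem and the freezing idea. Both middleware classes are hypothetical, the app is a bare lambda so no Rack gem is needed, and the freezing shown is plain `Object#freeze`, used here only to illustrate the principle rather than how rack-freeze itself is implemented.

```ruby
# Thread-unsafe: one instance serves every request, so @locale is shared.
class UnsafeLocaleMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    @locale = env["HTTP_ACCEPT_LANGUAGE"] # another thread may overwrite this
    status, headers, body = @app.call(env)
    [status, headers.merge("content-language" => @locale.to_s), body]
  end
end

# Safer: per-request data lives in local variables only.
class SafeLocaleMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    locale = env["HTTP_ACCEPT_LANGUAGE"]
    status, headers, body = @app.call(env)
    [status, headers.merge("content-language" => locale.to_s), body]
  end
end

app = ->(_env) { [200, {}, ["ok"]] }

# Once the instance is frozen, an instance-variable write inside #call
# raises FrozenError instead of silently sharing state across threads.
safe   = SafeLocaleMiddleware.new(app).freeze
unsafe = UnsafeLocaleMiddleware.new(app).freeze

response = safe.call({ "HTTP_ACCEPT_LANGUAGE" => "ja" })
error =
  begin
    unsafe.call({ "HTTP_ACCEPT_LANGUAGE" => "ja" })
    nil
  rescue FrozenError => e
    e
  end

puts response.inspect, error.class
```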

For rack middleware, for constants, I
think probably everybody should audit

constants created in initializers,
um, for this issue that I talked about.

Basically look, make sure you know
what you're putting into constants,

uh, is not a collection that's going to
be modified and is not, uh, you know,

just a straight up database connection.

There's, for database connections,
there's a gem called connection_pool.

This is, uh, I think still
maintained by Mike Perham.

Yeah, so Mike wrote it.

I think he wrote it originally
anyway, but it's mperham,

from Sidekiq, mperham
slash connection_pool.

This is like a generic connection
pooler that works with any, uh,

underlying database connection gem.

So you can get thread safe
connection pools that will

work with, um, with threads.

So you would assign that to a constant
instead of, uh, just like redis.new.
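
To illustrate the idea of a pool behind a constant, here is a tiny stdlib-only sketch. `TinyPool` and `FakeConnection` are hypothetical and are not the connection_pool gem's implementation or API; a real app would use a maintained library instead.

```ruby
# Hypothetical stand-in for a real client such as a Redis connection.
class FakeConnection
  def get(key)
    "value-for-#{key}"
  end
end

# Tiny illustrative pool built on Queue, which is safe to share across threads.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { @available << yield }
  end

  # Check a connection out for the duration of the block, then return it.
  def with
    conn = @available.pop # blocks until some thread returns a connection
    yield conn
  ensure
    @available << conn if conn
  end
end

# The constant holds the pool, not a single shared connection.
CONNECTIONS = TinyPool.new(2) { FakeConnection.new }

results = 4.times.map do |i|
  Thread.new { CONNECTIONS.with { |conn| conn.get("key#{i}") } }
end.map(&:value)

puts results.inspect
```

No two threads hold the same connection at the same time, so one thread cannot read another thread's response.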

And then for class variables, those are
a little bit easier because hopefully

you can just find the at, at like the,
at looking for the @@ is like, you

know, control F your code base for that.

But, um, in the original, uh, PR
actually you had class << self

and then it was @locale equals.

So like, since you can always do that,
it's kind of hard to like just grep

through a code base for class variable.

Um, but uh, if you see a file that
has a class variable in it, you

know, that is shared global mutable
state, so you know, it's only thread

unsafe when it, someone tries to
write to it from multiple threads.

So just because these exist doesn't
mean I think that you should

be replacing them all the time.

Often, like one thing
that, uh, happens is.

Someone needs to write a value to a,
uh, class variable, and then multiple

threads want to read that value.

If the writer method, if every thread
will just write the same thing.

To the class variable, like initialize,
like a default value, and then everybody

reads the same value after that.

That's not really a thread safety
issue because every thread is trying

to write the same value, right?

So it doesn't, it doesn't matter that
they could possibly access the, the, the

thread ver uh, variable at the sa, the
class variable at the same time, because

they're gonna try to write the same thing.

So who cares.

So it's like there can be shared global
state without there being a thread

safety issue, but I think you have to
be aware every time you see a class

variable or class instance variable
that, um, What, think about what, what

is trying to write to this and when is
it trying to write to this and could

there possibly be an issue there?

Stefanni: Yeah.

I like the questions to ask before you go
out there and try to replace everything.

Nate Berkopec: Because, because
this is complicated, right?

Like Yeah.

Especially with class variables.

Um, you know, writing the mutex dot
synchronize stuff, like you've probably

never written anything with mutex before.

You know, pulling something out of
concurrent ruby that you've never

used before and using like a, a data
structure out of concurrent Ruby, like,

it's not the easiest thing in the world.

So, um, you know, definitely
try to avoid it if you can.

Um, so yeah, I mean, and people
smarter than all of us have done that

and then made a mistake anyway, so
yeah, it's, uh, it's not easy stuff.
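
For readers who have not written it before, this is roughly what the mutex approach looks like for class-level state that really is written from many threads. `HitCounter` is a hypothetical example, not something from Faker or Puma.

```ruby
class HitCounter
  @count = 0
  @lock  = Mutex.new

  class << self
    attr_reader :count

    # The read-modify-write happens while holding the lock, so concurrent
    # increments cannot interleave and lose updates.
    def increment
      @lock.synchronize { @count += 1 }
    end
  end
end

threads = 10.times.map do
  Thread.new { 1_000.times { HitCounter.increment } }
end
threads.each(&:join)

puts HitCounter.count # prints 10000
```

This makes the shared write safe, but the state is still shared, so it would not give each thread its own value.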

Thiago: Maybe one thing that exacerbates
the problem is that we are very used to.

Thinking in Ruby, it's just like, it's
just one thread and you don't have to use

other threads or anything like that cuz
compared to maybe other languages, when

we say, oh, in Java, be careful with
static variables and things like that.

But in Ruby, We don't talk about
that a lot, uh, about concurrency.

At least, at least in rails.

Like it's just, you don't have to
worry too much about that request

is one thread and you don't have
to worry about those things.

Yeah.

But then when you run into those weird.

Bugs, you're not sure what to do.

You just, you just think, oh,
I don't know what that is.

I don't, I have no idea
why this is happening.

But if you try again, you, you're
not gonna have the problem.

And so I'm curious about what kind,
what kind of things people can do

so that when they run into a weird
problem, they can point

and say, oh, maybe this is a thread
safety bug instead of something else.

So maybe like some strategies or some.

Some characteristics.

Oh, it's pretty soft books.

Nate Berkopec: It's pretty easy.

Yeah.

Like if, if every time I hear, uh,
oh one request, no, the request

A got the response for request B.

So every time I hear like, oh, someone
is getting someone else's responses.

And that's obviously a
security issue, right?

That's always, that's, that's kind of
how this usually comes up is like, oh no,

somebody got authorized to someone else's

account because they got someone else's
cookie header, something like that.

Right?

So anytime I hear, uh, one user A
got the response for user B, it's

like thread safety issue immediately.

So that, that's probably the most common
one at the application level that I, that

I hear about the faker issue specifically.

I think maybe you, we kind
of all knew because it was

like, oh, it was this change.

Or like someone realized
it was only in Puma.

I guess that's the other, yeah,
if it, if it, if switching to

unicorn fixes the issue, then you
know it's a threadsafety issue.

Right.

So, um, cuz in unicorn you don't even
have, there's no, um, like for example, so

I talked about how Puma starts essentially
even in the, in the, in the simplest

case, it has to start two threads.

It has the thread that starts Puma and
boots your app, and then the thread

that actually runs the application.

Right.

Technically we're running your
application single-threaded there.

Like the actual, uh, every request
that comes in to that puma process will

always be processed by the same thread.

So yes, like it technically is
single threaded, but we kind of

have this like thread issue, right?

With faker even in that scenario
because that fiber local

variable was not inherited.

If moving to Puma causes
the issue and, and getting

off of it fixes it, then you know
you have a thread safety issue.

Um, but yeah, I think generally like
any issue where state is kind of

correct for one person but not correct
for someone else, and it's flaky and

random, uh, then the, your, your thread
safety issue spider sense should be tingling.

Thiago: Yeah.

It's not Puma's fault either.

It's just the way Yeah.

Nate Berkopec: I mean,
it's your fault threat.

Puma's thread safe.

You're not, so it wasn't me, man.

Yeah,

Stefanni: yeah.

That was, that was a hard one.

And we were like, I think we
should ask for someone who

knows how to fix this issue.

Yeah.

Cause we, we were not sure and.

I think it's also, it's something that
I want to see more is people say, uh,

well, developers saying that they don't
know things right, and they ask for help.

So I thought this would be
a, a good way to, to do that.

Nate Berkopec: Yeah, and like when I
started maintaining Puma, I knew nothing.

So like mm-hmm.

In 2016 when Evan Phoenix, the original
author, like, asked me to start

maintaining Puma, like I didn't know
anything about threads, thread safety,

or all the other like kind of specific.

Things that Puma needs to run.

Like, um, knowing about sockets,
TCP, UDP, like the deep

specifics of HTTP, um, C extensions.

Like, I didn't know any of that
when I started maintaining Puma.

And, uh, now I, I know a lot more, but
um, when I started I didn't know anything.

So like, we all start
not knowing anything.

So, um, you know, everyone
that you ask a question

about thread-safety or whatever.

At one point, they didn't know
the answer to that either.

So yeah, I don't think you should
feel, uh, intimidated about asking,

uh, asking questions like that.

Thiago: And it's also a cool
opportunity for contributors.

So for example, Matheus, who's

taking a look at that issue, he said,
oh, I don't know anything about threads

but I'll try to learn something.

And then he learned a couple
of things and shared.

And so it's just a nice way
to, to learn more because

you don't really have to know
before you get started on an issue.

And then eventually if you continue
working on that, you, you're gonna

figure it out and then we can have
nice conversations about that.

It's kinda cool.

Yeah.

Nate Berkopec: Yep.

And I think, um, one thing that, I'll
bring it up again that Matheus did that

was just like really important for that
was to get the minimal reproducing case.

So when he had that 10 line
example that reproduced the issue.

That's so important for learning because
then you have this little experiment

that you can, that you can try things on.

So you can say, oh, if I change
this over here, does that, how

does that change the behavior?

In my, in my example, um, if you
don't have the minimal reproducible

example, it's much more difficult
to to learn because you don't have

a little tool that you can change
things on and see what happens.

So getting to that minimal example
was so important, I think, for

where he went and
the rest of the issue.

So, um, if I have any advice with
that is to like do to, to emulate

that behavior to, you know, find,
try to try to get to the 10 line

example that reproduces the problem.
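[Editor's sketch] As an illustration of what such a small reproduction might look like, here is a sketch in plain Ruby. It is not the script from the Faker issue: `FiberLocalConfig`, `GlobalConfig`, and `reproduce` are hypothetical names, and the two classes are stand-ins for two ways a library could store a setting. The reproduction itself is the three-line `reproduce` method, and swapping the object passed to it is the "change this over here and see what happens" step.

```ruby
# Hypothetical stand-in: the setting lives in fiber-local storage.
class FiberLocalConfig
  def locale=(value)
    Thread.current[:repro_locale] = value
  end

  def locale
    Thread.current[:repro_locale] || :en
  end
end

# Hypothetical stand-in: the setting lives on one object shared by all threads.
class GlobalConfig
  attr_writer :locale

  def locale
    @locale || :en
  end
end

# The reproduction: configure on one thread, read on another.
def reproduce(config)
  config.locale = :pt
  Thread.new { config.locale }.value
end

p reproduce(FiberLocalConfig.new) # the new thread falls back to the default
p reproduce(GlobalConfig.new)     # the new thread sees the configured value
```

With something this small, each experiment takes seconds: change the storage, rerun, and compare what the second thread sees.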

Stefanni: Yeah, I love that.

It's, it's a very underrated
way to get started.

With not only contributing to open
source, but I think almost anything

related to development, let's say.

Because you, you get to just try.

You're not trying to fix anything.

You're just trying to
find what is going on.

And you learn a lot about things

Nate Berkopec: In Puma, on GitHub, uh, we
have a needs repro label, uh, as in needs

reproduction, and I put that on any issue
where the original poster has not provided

a similarly simple example that can just
be run and, and uh, and reproduce it.

Um, in Puma, if you'd like to contribute.

Um, that's one way to do it is you
go to the needs repro label and just

try to reproduce people's issues.

And I can tell you as a maintainer,
it's also helpful if you can't

reproduce it and you leave a comment
and tell us, you're like, Hey, I

looked at this for three hours.

I couldn't reproduce it.

That is super helpful for me
because now I know, okay, someone

tried to do this for three hours
and they still couldn't get it.

Maybe this is not reproducible, maybe
this isn't actually a problem with Puma,

so it's, it's really helpful for an open
source project to find issues which are

not currently don't have a reproducible
case and to try to try to find one.

So I highly, highly encourage that.

Stefanni: Yeah, and, and just to
emphasize, I don't think, well,

I believe I can say that for
you, but correct me if I'm wrong.

Mm-hmm.

We're not saying that if you don't
know how to reproduce, you can't

report the bug, but Oh, yeah.

If, right.

But if you,

Nate Berkopec: I mean,
you have a bug, right?

So Yeah.

You should report it.

Stefanni: Yeah.

So everyone can contribute in their own ways.

Um, yeah.

But yeah, reproducing is really great.

I think that's also how we got started
with Ruby on Rails, and we actually

copied the reproduction script for Faker.

Oh.

Which is really, really helpful.

Nate Berkopec: Yeah.

If, uh, if anyone listening is not
aware of that, there's a, like the, the

Rails bug reproduction script is really,
uh, very good and I think those are

available if you go to like the Rails
contributing guide on Rails guides and

then like you kind of go down to the bug
report section, you can find the links

to all of them and it's really cool
and it'll show you kind of in 30 lines.

How they set up a Rails app to reproduce
an issue in the most minimal way

possible, depending on what part of
rails you're reporting the bug to.

And it's a really good example of
how to make a, a minimal reproducible

case, not only for a Rails app, but
for really any, any Ruby project.

Um, so yeah, I've done the same thing.

I've copied that script multiple times.

Thiago: I guess even at your own job,
if you're not contributing or anything,

maybe there's a way to use those
kinds of scripts to reproduce a bug.

So you don't have, um, what is it
called again, when you, you don't

want the bug to appear again?

Stefanni: Ah, regress regression.

Yeah.

Thiago: Regressions.

Mm-hmm.

You don't want regressions, so this
is really important, so mm-hmm.

Add that little test there so
you don't have regressions.

Mm-hmm.

A cool habit sometimes to have.

Yeah, absolutely.

Stefanni: Well, I think we
got to, to the end of it, so

we only have 10 minutes left.

Is there anything else, Nate, that
you would like to share about the

issue, either about the issue or
about the conversation we were having?

Nate Berkopec: Uh, nope.

Um, I would say that, uh, if someone
listening to this is interested

in learning more about working
in a multi-threaded environment.

Um, I do have a product called
Sidekiq in Practice that has a number

of live code examples that talk
about thread safety and have other,

it's intended to be a manual,
like how to actually scale Sidekiq.

And, uh, you know, cuz it's Sidekiq.

Uh, there's a lot of things in there about
threading and, and, um, how threads work

in Sidekiq, why threads are important.

It talks about the global VM lock, which
we didn't really discuss at all today.

But, uh, if, if you are looking to
learn more about threads and scaling

the threaded environment, um, I do, I
do sell something to help you with that.

So go check it out.

Stefanni: Absolutely.

I, it's in my reading list.

I really need to get it.

I, which one should I read first?

That one or the rails?

The Guide to Rails performance?

I don't know.

Nate Berkopec: Um, I mean, yeah, it just
depends on what, uh, your goal is first.

They're not, they're
not, um, intended like.

You don't have to read
one to read the other.

So if you're interested in a more
general perspective of how do

I make a Rails app feel faster?

How do I make it more scalable,
like that's the complete

guide to Rails performance.

If you were specifically having
issues with Sidekiq and uh, scaling

Sidekiq, I suggest reading that first.

Thiago: Yeah, that would be a nice episode
to talk just about Sidekiq, because

there are so many things to talk about.

Sidekiq performance.

Mm-hmm.

We could do that in the future.

Stefanni: Yeah.

Like how to log your workers, your
scheduled workers, like how to do

logging and retrying jobs.

Yeah.

Nate Berkopec: Yeah.

I've had a lot of fun.

My, my, my current client is, um, Gusto,
which is a huge, um, payroll company

in the United States, and they've got
600 plus engineers working on Sidekiq.

And it's been a really interesting
experience to me to see kind of how

Sidekiq scales like as an organization.

So like what, what, what happens
when 600 engineers all have queues

and workers and like, that's been a
whole new side of Sidekiq that I've

learned a lot about at, uh, at Gusto.

Thiago: Yeah, it sounds really exciting
work, you know, a lot of problems and, and

challenges to solve, which is kinda cool.

Yeah, it's been really cool

Stefanni: And yeah, and I think we're
supposed to call them jobs now and not workers.

I'm still getting used to the new
terminology I need to catch up.

I think I read something about that.

Nate Berkopec: Oh, that's changed.

I feel like I should
know that, and I don't.

Stefanni: I, I don't know.

I remember seeing a comment about that,
like instead of workers, you, you like

change the folder in Rails or something?

Yeah, yeah.

to jobs.

No, I'm not really sure.

I have to catch up.

I just remember, uh, reading about that
and they're like, oh, I'll probably

need to read this at some point.

Hmm.

Um, yeah, so I think we're

at the end, and I would like
to be respectful of your time,

Nate, but thank you so, so much.

I learned a lot and I know it was a bit of
homework for you, but I hope it was fun.

Nate Berkopec: It was fun.

Yeah, it was fun.

It was fun.

Nice to talk to you.

Thiago: Yeah.

Yeah.

It was, it was a very specific problem
with a very specific solution, so

it was nice to learn from that.

Mm-hmm.

I've learned a lot, so yeah.

Thanks so much for, for sharing
your expertise with us today.

Nate Berkopec: Great.

My pleasure

Stefanni: Everyone, make sure to check
out Puma and Nate's books, the one

about Sidekiq and Rails performance.

We'll leave the links in the
description notes if people want to

know what you're doing or wanna buy
your books or your workshops as well.

Where should they go?

Nate Berkopec: Uh, speedshop.co.

Uh, this is where I have
links to all that stuff.

Stefanni: Awesome.

Cool.

Cool.

Go.

Thank you, Nate.

Have a good weekend.

Nate Berkopec: Thank you.

Thiago: Thank you so much,
Nate, for joining us today.

And if you've learned something
from this episode, please share

with a friend and check out our
newsletter at hexdevs.com/newsletter.

I hope you've enjoyed this episode.

See you on the next one.

Creators and Guests

Stefanni Brasil
Host
I like to be around people who are kind, curious and compassionate. Co-founder of hexdevs. Developer at Thoughtbot, and a core maintainer of faker-ruby. Public speaker, writer, podcast host, Ruby developer.
Thiago Araujo
Host
Co-founder of hexdevs: helping you become an expert developer.
Nate Berkopec
Guest
🚀 I make Rails apps fast and scalable. 📖 Wrote a few books about that. 👷 Maintain Puma, the most popular Ruby webserver. (he/him) 日本語: @nateberkopec_ja