The Hole (ETCD & BoltDB & Golang & Others)

Preface

Let’s start like this…

In the past, an SOA framework had to cover a lot of ground: database, cache, message queue, service discovery and registry, service/API downgrading, circuit breaker, distributed tracing, load balancing, metrics collection, logging, caller/callee white/black lists, concurrency control, rate limiting, config management, deployment, inspect/profile, etc. Of course, most of these features are essential for any service. The main task is actually to wrap various libraries, integrate various configurations, and hook into the company’s infrastructure. Then a lot of time goes to customer service: answering many googleable questions, pasting the corresponding documents, and helping people check whether their code is written correctly and uses things the right way. A small part of the time is spent on interesting debugging. When necessary, you flip through the source code of the corresponding library, and in rare cases you analyze performance bottlenecks…

I have maintained a Python framework and also built one in Golang from scratch, most of it borrowed from the Python one… The biggest impression is that Golang needs a lot of code just to get started, which is troublesome, and flexibility has to be bought by generating code… If the design is poor and the coupling is too tight, there will be a lot of legacy issues to deal with. In Python, those are incomprehensible magic logic, various lazy imports, and (global) variables that get initialized somewhere… My first principle when doing the Golang framework was to avoid global variables, especially ones that cross packages, and to minimize the side effects of functions. But by the end, the parameters were flooding in, or I no longer knew how to pass them. Ending up with fuckV1, fuckV2… is also very annoying. In addition, the context in Golang cannot be interrupted, and getting an error stack means pulling in an external package and then re-wrapping all kinds of libraries. Writing test cases is very annoying too, especially mocks, and so is all the fault-tolerance work: writing retry logic alone is enough to make you vomit. In short, if it is not done well you have to rewrite every method/function, and if it gets really annoying you might just vendor the library and modify its source…

The biggest takeaway might be learning various code style guidelines and code review methods. However, once the features and contributors multiply, many uncontrollable factors appear. For example, a not-so-large PR can collect dozens of code review comments, enough to slow down page loads in the Git web UI. After the organizational changes and the framework handover, I’m still working on other projects, but it seems like these skills have gone to waste… The main issues are the schedule and the limited motivation of a foot soldier. It’s already great luck not to get beaten up for trying to establish all sorts of self-righteous rules ^ _ ^. It seems I have not taken this seriously enough and am gradually giving up… I suddenly hate myself a little bit O(∩_∩)O~

Well… I am currently working on a seemingly “high-end” distributed object storage system. While reading various papers, I found that the overall architecture closely resembles LinkedIn’s ambry. Of course, there are some differences in the architecture and details, so I won’t speak on behalf of the experts.

Most of the work in building storage systems is also hard labor, although there are relatively more interesting parts.

ETCD

In this storage system, etcd is mainly used to store the cluster topology (service registration and discovery), business-unrelated metadata, a small amount of configuration, and some coordination work. The company’s ecosystem is all ZooKeeper. Why choose etcd? Because of Golang, and because Raft is also relatively easy to understand…

This part of the content has been moved to The Hole in ETCD

BoltDB

In this storage system, BoltDB is mainly used to store indexes, the replication log, and a bit of replication-related metadata.

Why choose BoltDB? A B-Tree gives relatively stable query efficiency, there are very few pure-Golang embedded k-v stores to choose from, and BoltDB is relatively mature. Write efficiency can be handled by a layer of in-memory WAL, which is not a big problem because we can tolerate losing some data. If that doesn’t work, the only option left is cgo…

The biggest pit may be that we used mmap’s MAP_POPULATE, which made startup very slow and drove disk IO very high on every remap, seriously hurting performance. This flag seems to prefetch as much of the data into the page cache as it can, so be careful about turning it on when there is a lot of data. Without it there will certainly be somewhat more page faults, but performance so far is fine and there is no obvious impact.

Another pit is that memory referenced outside the transaction may no longer be valid, so it is best to decode or copy values inside the transaction. My guess is that this is related to the use of mmap, but it needs a closer look.
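The copy-inside-the-transaction habit can be sketched without BoltDB. In the example below, store and its view method are hypothetical stand-ins that only mimic the behaviour described above: slices handed out during the callback are wiped once it returns, the way mapped memory may change after a transaction ends.

```go
package main

import "fmt"

// store is a hypothetical stand-in for an mmap-backed database.
type store struct {
	data map[string][]byte
}

// view runs fn with a getter whose results are only valid until fn returns.
func (s *store) view(fn func(get func(key string) []byte)) {
	var handed [][]byte
	get := func(key string) []byte {
		v, ok := s.data[key]
		if !ok {
			return nil
		}
		buf := append([]byte(nil), v...) // stands in for transaction-owned memory
		handed = append(handed, buf)
		return buf
	}
	fn(get)
	// Simulate the memory being invalidated when the transaction ends.
	for _, b := range handed {
		for i := range b {
			b[i] = 0
		}
	}
}

// readBoth returns one slice that escaped the view as-is and one that was
// copied while the view was still open.
func readBoth(s *store, key string) (leaked, copied []byte) {
	s.view(func(get func(string) []byte) {
		leaked = get(key)                         // unsafe: aliases transaction memory
		copied = append([]byte(nil), get(key)...) // safe: private copy
	})
	return leaked, copied
}

func main() {
	s := &store{data: map[string][]byte{"k": []byte("value")}}
	leaked, copied := readBoth(s, "k")
	fmt.Printf("leaked=%q copied=%q\n", leaked, copied)
}
```

With a real mmap-backed store the stale slice would not be neatly zeroed; it could show unrelated data or fault, which is why copying or decoding before the transaction closes is the safer default.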

Then, it does not support nested transactions, which makes code abstraction harder, but that’s not a big deal.

Finally, BoltDB uses serializable MVCC. I haven’t studied it in detail, but according to the introduction, concurrent read-modify-write cannot lead to lost writes. We currently have no concurrent read-modify-write operations anyway. Of course, because of the WAL, there are additional locks to keep writes serializable…

TCP & Golang & Others

The TCP tuning settings recommended all over the Internet are basically already applied on our production servers. Put requests are very fast, but Get requests are slow. Tracing through the logs line by line (surely this is not because I write a lot of Python…) showed that Golang’s tcpConn.Write is very slow: a 5M file takes hundreds of milliseconds… After adjusting net.ipv4.tcp_wmem, Get latency within the same data center dropped to tens of milliseconds, but there was no significant improvement across data centers. According to the theoretical calculation, TCP’s read buffer is not the bottleneck. Adjusting the setting seems to shorten only the time spent in the tcpConn.Write system call and does nothing for the efficiency of the network transfer itself. So this turned into a huge pit.

Although I am not very familiar with the Actor model, Erlang’s OTP really makes me envious. I don’t know if it’s really that cool…

When a machine is initialized and its disks are mounted, duplicate mount points can appear, although df shows only one. The specific reason is unknown. The library we use builds its disk list by reading mtab, so the duplicates cause problems when pre-allocating disk files with fallocate and when calculating reserved space…
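One possible workaround is to deduplicate by mount point before doing any space accounting. The sketch below parses mtab-style text from a string; the type mount, the function uniqueMounts, the sample devices, and the choice to let the later entry win are all assumptions for illustration, not what the library actually does.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mount holds the first three fields of an mtab line.
type mount struct {
	device, point, fstype string
}

// uniqueMounts parses mtab-style text and keeps one entry per mount point.
// When a mount point repeats, the later line replaces the earlier one
// (an assumption: the most recent mount is the one that is visible).
func uniqueMounts(mtab string) []mount {
	index := map[string]int{} // mount point -> position in out
	var out []mount
	sc := bufio.NewScanner(strings.NewReader(mtab))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 || strings.HasPrefix(fields[0], "#") {
			continue // skip blank, short, or comment lines
		}
		m := mount{device: fields[0], point: fields[1], fstype: fields[2]}
		if i, ok := index[m.point]; ok {
			out[i] = m
			continue
		}
		index[m.point] = len(out)
		out = append(out, m)
	}
	return out
}

func main() {
	// Hypothetical mtab content with /data1 listed twice.
	sample := `/dev/sdb1 /data1 ext4 rw,noatime 0 0
/dev/sdc1 /data2 ext4 rw,noatime 0 0
/dev/sdb1 /data1 ext4 rw,noatime 0 0
`
	for _, m := range uniqueMounts(sample) {
		fmt.Println(m.device, m.point, m.fstype)
	}
}
```

In practice the input would come from reading the mtab file itself, and reserved space would then be computed once per unique mount point.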

XXX

I am one step closer to becoming a great critic…