Graceful Shutdown or Restart
在学习 gin 时,官方文档介绍如何优雅地启动和停止服务
这里以 endless 为例,开始学习
1. Introduction
Zero downtime restarts for golang HTTP and HTTPS servers.
1.1 Features
- Drop-in replacement for
http.ListenAndServe
andhttp.ListenAndServeTLS
- Signal hooks to execute your own code before or after the listened to signal (SIGHUP, SIGUSR1, SIGUER2, SIGINT, SIGTERM, SIGSTP)
- You can start multiple servers from one binary and endless will take care of the different sockets/ports assignments when restarting
2. Quick Start
go get -u github.com/fvbock/endless
package main
import (
"github.com/fvbock/endless"
"github.com/gin-gonic/gin"
"log"
"net/http"
)
func main(){
router := gin.Default()
router.GET("ping", func(c *gin.Context) {
c.String(http.StatusOK,"pong")
})
err := endless.ListenAndServe(":9090",router)
if err != nil {
log.Fatal("listen and serve error:",err.Error())
}
log.Println("Server on 9090 stopped")
}
编译并启动程序:
$ go build -o ./main ./main.go
$ ./main
# ... some gin logs
[GIN-debug] GET /ping --> main.main.func1 (3 handlers)
2021/11/23 20:04:53 31409 :9090
简单测试下:
$ curl "http://localhost:9090/ping"
pong
发送 SIGHUP
信号
$ kill -1 31409
---------
# ... some logs
2021/11/23 20:13:35 32215 :9090
2021/11/23 20:13:35 31409 Received SIGTERM.
2021/11/23 20:13:35 31409 Waiting for connections to finish...
2021/11/23 20:13:35 31409 Serve() returning...
2021/11/23 20:13:35 listen and serve error:accept tcp [::]:9090: use of closed network connection
再次发送请求,可以看到程序依然能够接收请求
$ curl "http://0.0.0.0:9090/ping"
pong
------
[GIN] 2021/11/24 - 16:05:39 | 200 | 12.848µs | 127.0.0.1 | GET "/ping"
3. Timeout and MaxHeaderBytes
There are three variables exported by the package that control the values set for DefaultReadTimeOut
,DefaultWriteTimeOut
, and MaxHeaderBytes
on the innter http.server
DefaultReadTimeOut time.Duration
DefaultWriteTimeOut time.Duration
DefaultMaxHeaderBytes int
The endless default behaviour is to use the same defaults defined in net/http
These hava impact on endless by potentially not letting the parent process die until all connections are handled/finished
查看 endlessServer
源码:
type endlessServer struct {
http.Server
EndlessListener net.Listener
SignalHooks map[int]map[os.Signal][]func()
tlsInnerListener *endlessListener
wg sync.WaitGroup
sigChan chan os.Signal
isChild bool
state uint8
lock *sync.RWMutex
BeforeBegin func(add string)
}
endless.ListenAndServe
func ListenAndServe(addr string, handler http.Handler) error {
server := NewServer(addr, handler)
return server.ListenAndServe()
}
endless.NewServer
:
func NewServer(addr string, handler http.Handler) (srv *endlessServer) {
// ...
srv.Server.Addr = addr
srv.Server.ReadTimeout = DefaultReadTimeOut
srv.Server.WriteTimeout = DefaultWriteTimeOut
srv.Server.MaxHeaderBytes = DefaultMaxHeaderBytes
srv.Server.Handler = handler
// ...
}
这里在创建 endlessServer
时,会使用默认的配置,当然也可以在创建完之后单独设置
简单示例:
package main
import (
"log"
"net/http"
"strconv"
"time"
"github.com/fvbock/endless"
"github.com/gin-gonic/gin"
)
func main() {
router := gin.Default()
router.GET("ping", func(c *gin.Context) {
time.Sleep(3 * time.Second)
seq, _ := strconv.Atoi(c.Query("seq"))
c.String(http.StatusOK, "pong-%d", seq)
})
server := endless.NewServer(":9090", router)
server.ReadTimeout = 1
server.WriteTimeout = 1
server.MaxHeaderBytes = 8 << 20
err := server.ListenAndServe()
if err != nil {
log.Fatal("listen and serve error:", err.Error())
}
}
4. Hammer Time
To deal with hanging request on the parent after restarting endless will hammer the parent 60 seconds after receiving the shutdown signal from the forked child process. When hammered still running requests get terminated. This behaviour can be controlled by another exported variable:
DefaultHammerTime time.Duration
The default is 60 seconds. When set to -1
hammerTime()
is not invoked automatically. You can then hammer the parent manually by sending SIGUSR2
. This will only hammer the parent if it is already in shutdown mode. So unless the process had received a SIGTERM
, SIGSTOP
, or SIGINT
(manually or by forking) brefore SIGUSR2
will be ignored
If you had hanging requests and the server got hammered you will see a log message like this :
2015/04/04 13:04:10 [STOP - Hammer Time] Forcefully shutting down parent
示例:
package main
import (
"log"
"net/http"
"strconv"
"time"
"github.com/fvbock/endless"
"github.com/gin-gonic/gin"
)
func main() {
router := gin.Default()
router.GET("ping", func(c *gin.Context) {
time.Sleep(3 * time.Second)
seq, _ := strconv.Atoi(c.Query("seq"))
c.String(http.StatusOK, "pong-%d", seq)
})
endless.DefaultHammerTime = 1 * time.Second
server := endless.NewServer(":9090", router)
err := server.ListenAndServe()
if err != nil {
log.Fatal("listen and serve error:", err.Error())
}
}
上例中将 hammer time 设置为 1s , 交易处理时间模拟为 3s
这里使用脚本来模拟多次请求:
#!/usr/bin/env sh
echo Start sending request
for((i=1;i<6;i++))
do
url="http://0.0.0.0:9090/ping?seq=${i}"
echo GET $url
curl $url
echo
done
启动服务并执行脚本,在第三次请求时发送挂起信号kill -1
脚本输出:
Start sending request
GET http://0.0.0.0:9090/ping?seq=1
pong-1
GET http://0.0.0.0:9090/ping?seq=2
pong-2
GET http://0.0.0.0:9090/ping?seq=3
pong-3
GET http://0.0.0.0:9090/ping?seq=4
GET http://0.0.0.0:9090/ping?seq=5
pong-5
可以看到第四次请求是没有响应的,同时看日志这边
[GIN] 2021/11/24 - 18:04:09 | 200 | 3.000673064s | 127.0.0.1 | GET "/ping?seq=1"
[GIN] 2021/11/24 - 18:04:12 | 200 | 3.000330231s | 127.0.0.1 | GET "/ping?seq=2"
[GIN] 2021/11/24 - 18:04:15 | 200 | 3.001050857s | 127.0.0.1 | GET "/ping?seq=3"
2021/11/24 18:04:17 14681 Received SIGHUP. forking.
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET /ping --> main.main.func1 (3 handlers)
2021/11/24 18:04:17 14711 :9090
2021/11/24 18:04:17 14681 Received SIGTERM.
2021/11/24 18:04:17 14681 Waiting for connections to finish...
2021/11/24 18:04:17 14681 [::]:9090 Listener closed.
2021/11/24 18:04:18 [STOP - Hammer Time] Forcefully shutting down parent
2021/11/24 18:04:18 14681 Serve() returning...
2021/11/24 18:04:18 listen and serve error:accept tcp [::]:9090: use of closed network connection
[GIN] 2021/11/24 - 18:04:21 | 200 | 3.00127917s | 127.0.0.1 | GET "/ping?seq=5"
可以看到,第4次请求没有被处理完毕,父进程就已经终止了(hammer time < deal time)
5. Signals
The endless server will listen for the following signals:
syscall.SIGHUP
: will trigger a fork/restartsyscall.SIGINT
andsyscall.SIGTERM
will trigger a shutdown of the server (it will finish running request)SIGUSR2
: will trigger hammer TimeSIGUSR1
andSIGTSTP
are listened for but do not trigger anything in the endless server itself.
You can hook your own functions to be called pre or post signal handling
5.1 Hook
package main
import (
"log"
"net/http"
"strconv"
"syscall"
"time"
"github.com/fvbock/endless"
"github.com/gin-gonic/gin"
)
func preSigUsr1() {
log.Println("pre SIGUSR1")
}
func postSigUsr1() {
log.Println("pre SIGUSR1")
}
func main() {
router := gin.Default()
router.GET("ping", func(c *gin.Context) {
time.Sleep(3 * time.Second)
seq, _ := strconv.Atoi(c.Query("seq"))
c.String(http.StatusOK, "pong-%d", seq)
})
server := endless.NewServer(":9090", router)
server.SignalHooks[endless.PRE_SIGNAL][syscall.SIGUSR1] = append(server.SignalHooks[endless.PRE_SIGNAL][syscall.SIGUSR1], preSigUsr1)
server.SignalHooks[endless.POST_SIGNAL][syscall.SIGUSR1] = append(server.SignalHooks[endless.POST_SIGNAL][syscall.SIGUSR1], postSigUsr1)
err := server.ListenAndServe()
if err != nil {
log.Fatal("listen and serve error:", err.Error())
}
}
Run and test:
2021/11/25 09:29:51 18649 :9090
------
kill -s SIGUSR1
------
2021/11/25 09:30:23 pre SIGUSR1
2021/11/25 09:30:23 18649 Received SIGUSR1.
2021/11/25 09:30:23 pre SIGUSR1
上例中使用了append
来注册钩子函数,这里的SignalHooks
实际上是:
type endlessServer struct {
http.Server
// ...
SignalHooks map[int]map[os.Signal][]func()
// ...
}
SignalHooks[int]
: key 有两种取值,endless.PRE_SIGNAL
和endless.POST_SIGNAL
分别表示前置钩子和后置钩子SignalHooks[int][os.Signal][]func()
: 可以为某个信号类型 (os.Signal) 注册过个钩子函数
再回到 server.ListenAndServe
函数,看下钩子函数是如何触发的:
func (srv *endlessServer) ListenAndServe() (err error) {
addr := srv.Addr
if addr == "" {
addr = ":http"
}
go srv.handleSignals()
l, err := srv.getListener(addr)
if err != nil {
log.Println(err)
return
}
srv.EndlessListener = newEndlessListener(l, srv)
if srv.isChild {
syscall.Kill(syscall.Getppid(), syscall.SIGTERM)
}
srv.BeforeBegin(srv.Addr)
return srv.Serve()
}
func (srv *endlessServer) handleSignals() {
var sig os.Signal
signal.Notify(
srv.sigChan,
hookableSignals...,
)
pid := syscall.Getpid()
for {
sig = <-srv.sigChan
srv.signalHooks(PRE_SIGNAL, sig)
switch sig {
case syscall.SIGHUP:
log.Println(pid, "Received SIGHUP. forking.")
err := srv.fork()
if err != nil {
log.Println("Fork err:", err)
}
case syscall.SIGUSR1:
log.Println(pid, "Received SIGUSR1.")
case syscall.SIGUSR2:
log.Println(pid, "Received SIGUSR2.")
srv.hammerTime(0 * time.Second)
case syscall.SIGINT:
log.Println(pid, "Received SIGINT.")
srv.shutdown()
case syscall.SIGTERM:
log.Println(pid, "Received SIGTERM.")
srv.shutdown()
case syscall.SIGTSTP:
log.Println(pid, "Received SIGTSTP.")
default:
log.Printf("Received %v: nothing i care about...\n", sig)
}
srv.signalHooks(POST_SIGNAL, sig)
}
}
// ...
func (srv *endlessServer) signalHooks(ppFlag int, sig os.Signal) {
if _, notSet := srv.SignalHooks[ppFlag][sig]; !notSet {
return
}
for _, f := range srv.SignalHooks[ppFlag][sig] {
f()
}
return
}
可以看到 endless 使用了一个 goroutine 取处理信号的钩子函数, 并会按顺序执行注册的钩子函数
5.2 Kill
Linux 的 kill 指令用于将信号发给进程,所有的信号可以由kill -l
列出:
$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGBUS 8) SIGFPE
9) SIGKILL 10) SIGUSR1 11) SIGSEGV 12) SIGUSR2
13) SIGPIPE 14) SIGALRM 15) SIGTERM 16) SIGSTKFLT
17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU
25) SIGXFSZ 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH
29) SIGIO 30) SIGPWR 31) SIGSYS 34) SIGRTMIN
35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3 38) SIGRTMIN+4
39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7 58) SIGRTMAX-6
59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX
其中常用的有:
Signal | Description |
---|---|
HUP | 终端挂断 |
INT | 中断(同 Ctrl + C) |
QUIT | 退出(同 Ctrl + \) |
KILL | 强制终止 |
TERM | 终止 |
CONT | 继续 (与STOP相反,fg/bg命令) |
STOP | 暂停(同 Ctrl + Z) |
信号是 Unix
、类 Unix
以及其他 POSIX
兼容的操作系统中进程间通讯的一种有限制的方式
它是一种异步的通知机制,用来提醒进程一个事件(硬件异常、程序执行异常、外部发出信号)已经发生。当一个信号发送给一个进程,操作系统中断了进程正常的控制流程。此时,任何非原子操作都将被中断。如果进程定义了信号的处理函数,那么它将被执行,否则就执行默认的处理函数
6. PID File
If you want to save actual pid file, you can change the BeforeBegin
hook like this:
server := endless.NewServer("localhost:4242", handler)
server.BeforeBegin = func(add string) {
log.Printf("Actual pid is %d", syscall.Getpid())
// save it somehow
}
err := server.ListenAndServe()
在 endlessServer.ListenAndServe
方法中可以看到,在启动服务之前会调用此函数
func (srv *endlessServer) ListenAndServe() (err error) {
// ...
srv.BeforeBegin(srv.Addr)
return srv.Serve()
}