2022-08-02 11:57
Learn why observability matters in Python and how to implement it in your software development life cycle.
The applications you write execute a lot of code, in a way that is essentially invisible. So how do you know what that code is actually doing?
Observability is the ability to look at data that tells you what your code is doing. In this article, the main concern is server code in a distributed system. This is not to say that observability is unimportant for client application code; it is just that clients are often not written in Python. Nor is it to say that observability is unimportant for data science; it is just that the observability tooling in data science (mostly Jupyter and quick feedback) is different.
So, why is observability important? Observability is a key part of the software development life cycle (SDLC).
Shipping an application is not the end; it is just the beginning of a new cycle. In that cycle, the first stage is to confirm that the new version is running well. Otherwise, a rollback is probably needed. Which features are working well? Which ones have subtle bugs? You need to know what is going on to know what to work on next. Things also fail in strange ways. Whether it is a natural disaster, a problem in the underlying infrastructure, or an application getting into a strange state, things can stop working at any time, for any reason.
Outside of the standard SDLC, you need to know that everything is still running. If it is not, having a way to know how it is failing is essential.
The first part of observability is getting feedback. When code gives information about what it is doing, feedback can help in many ways. In a staging or testing environment, feedback helps find problems and, more importantly, triage them faster. This improves the tooling and communication around the validation step.
When doing a canary deployment or changing a feature flag, feedback is also important: it tells you whether to continue, wait longer, or roll back.
Sometimes you suspect that something has gone wrong. Maybe a dependent service is having problems, or maybe social media is flooding your site with traffic. Maybe there is a complicated operation in a related system, and you want to make sure your system is handling it well. In these cases, you want to aggregate the data from your observability system into dashboards.
When writing an application, these dashboards need to be part of the design criteria. They only have data to display if your application shares it with them.
Watching dashboards for more than 15 minutes at a time is like watching paint dry. No human should be subjected to this. For that task, we have alerting systems. Alerting systems compare the observability data to the expected data and send a notification when the two do not match. Fully delving into incident management is beyond the scope of this article. However, observable applications are alert-friendly in two ways:
High-quality alerts have three properties: few false alarms, few missed alarms, and timeliness.
These three properties are in direct conflict with one another. You can reduce false alarms by raising the detection threshold, at the cost of more missed alarms. You can reduce missed alarms by lowering the detection threshold, at the cost of more false alarms. You can reduce both false alarms and missed alarms by collecting more data, at the cost of timeliness.
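The threshold trade-off can be made concrete with a small sketch. The samples below are invented for illustration (an observed error rate paired with whether a real problem existed), and `evaluate` is a hypothetical helper, not part of any alerting product.

```python
# Hypothetical samples: (observed error rate, was there a real problem?).
SAMPLES = [
    (0.01, False),
    (0.02, False),
    (0.04, False),
    (0.05, True),
    (0.08, True),
    (0.20, True),
]


def evaluate(threshold, samples):
    """Count false alarms and missed alarms for one alert threshold."""
    # An alert fires whenever the observed rate reaches the threshold.
    false_alarms = sum(1 for rate, bad in samples if rate >= threshold and not bad)
    missed_alarms = sum(1 for rate, bad in samples if rate < threshold and bad)
    return false_alarms, missed_alarms


if __name__ == "__main__":
    for threshold in (0.03, 0.06, 0.10):
        print(threshold, evaluate(threshold, SAMPLES))
```

With this made-up data, the lowest threshold produces a false alarm and no missed alarms, while the higher thresholds remove the false alarm but start missing real problems.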
Improving all three parameters at once is harder. This is where the quality of observability data comes in: higher-quality data can improve all three properties at the same time.
Some people like to make fun of print-based debugging. But in a world where most software runs on machines that are not your own, print debugging is all you can do. Logging is a formalization of print debugging. The Python logging library, for all its faults, provides standardized logging. Most importantly, it means you can log from libraries.
The application is responsible for configuring which logs go where. Ironically, after many years in which applications really were responsible for configuring logging, this is less and less true. Modern applications in a modern container orchestration environment log to standard error and standard output and trust the orchestration system to handle the logs properly.
However, you should not rely on that in libraries, or pretty much anywhere else. If you want to let the operator know what is going on, use logging, not print.
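A minimal sketch of that division of labor follows, using only the standard logging module. The logger name `example_library`, the `do_work` function, and the `configure_logging` helper are all hypothetical names chosen for illustration.

```python
import logging
import sys

# Library side: get a logger and emit records; never configure handlers here.
LOGGER = logging.getLogger("example_library")  # hypothetical library name


def do_work(item):
    """Hypothetical library function that logs instead of printing."""
    LOGGER.info("processing %s", item)
    return item.upper()


# Application side: route every record to standard error and leave
# collection to the orchestration system.
def configure_logging(stream=sys.stderr):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    return handler
```

An application would call `configure_logging()` once at startup; the library code stays the same no matter where the logs end up.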
One of the most important features of logging is logging levels. Logging levels let you filter and route logs appropriately. But this only works if the levels are used consistently. At the very least, you should make them consistent across your application.
Libraries that choose incompatible semantics can be fixed retroactively with appropriate configuration at the application level. This only requires that they use the most important universal convention in Python: getLogger(__name__).
Most reasonable libraries follow this convention. Filters can modify logging objects in place before they are emitted. You can attach a filter to a handler that modifies messages based on the logger name so that they have the appropriate level.
import logging

LOGGER = logging.getLogger(__name__)
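As a hedged sketch of such a filter: the class below demotes ERROR records coming from one library's loggers to WARNING. The class name `DemoteLibraryErrors` and the library name `chatty_library` are hypothetical, and the choice to demote errors is only an example of adjusting levels by name.

```python
import logging


class DemoteLibraryErrors(logging.Filter):
    """Downgrade ERROR records from one library's loggers to WARNING."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix

    def filter(self, record):
        from_library = (
            record.name == self.prefix
            or record.name.startswith(self.prefix + ".")
        )
        if from_library and record.levelno == logging.ERROR:
            # Modify the record in place before the handler emits it.
            record.levelno = logging.WARNING
            record.levelname = logging.getLevelName(logging.WARNING)
        return True  # never drop the record, only relabel it


handler = logging.StreamHandler()
handler.addFilter(DemoteLibraryErrors("chatty_library"))  # hypothetical library
```

This relies on the library using getLogger(__name__), since that is what makes the logger name match the package name.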
With this in mind, you now have to actually specify the semantics of the logging levels. There are many options, but these are my favorites:
Error: This sends an immediate alert. The application is in a state that requires operator attention. (This means that Critical and Error are folded together.)
Warning: I like to call these "business hours alerts." Someone should look at this within one business day.
Info: This is emitted during normal flow. It is there to help people understand what the application is doing if they already suspect a problem.
Debug: This is not emitted in the production environment by default. It may or may not be emitted in staging or development, and it can be turned on explicitly in production if more information is needed.

In no case should you include personally identifiable information (PII) or passwords in logs. This is true regardless of the logging level, because levels change, debug levels get activated, and so on. Log aggregation systems are rarely PII-safe, especially with evolving PII regulation (HIPAA, GDPR, and others).
Almost all modern systems are distributed. Redundancy, scaling, and sometimes jurisdictional needs call for horizontal distribution. Microservices mean vertical distribution. Logging into each machine to check the logs is no longer realistic. It is also a bad idea for proper control reasons: allowing developers to log into machines gives them too many privileges.
All logs should be sent to an aggregator. There are commercial offerings, you can configure an ELK stack, or you can use any other database (SQL or NoSQL). As a truly low-tech solution, you can write the logs to files and ship them to an object store. There are many solutions, but the most important thing is to choose one and aggregate everything.
After logging everything to one place, there are a lot of logs. The specific aggregator defines how to write queries, but whether you are grepping through storage or writing NoSQL queries, logging queries that match on source and details are useful.
Metric scraping is a server pull model. The metrics server connects to the application periodically and pulls the metrics.
At the very least, this means the server needs connectivity to, and discovery of, all relevant application servers.
If your metrics aggregator is Prometheus, then exposing the Prometheus format as an endpoint is useful. However, it is also useful even if the aggregator is not Prometheus. Almost all systems include a compatibility shim for Prometheus endpoints.
Adding a Prometheus shim to your application using the client Python library allows it to be scraped by most metrics aggregators. Once Prometheus discovers a server, it expects to find a metrics endpoint. This is often part of the application routing, usually at the /metrics path. Regardless of the web application platform, if you can serve a custom byte stream with a custom content type at a given endpoint, Prometheus can scrape it.
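To illustrate the "custom byte stream with a custom content type" point, here is a standard-library-only sketch that serves two invented metrics in the Prometheus text format. The metric names, the `METRICS` dict, and the `serve` helper are assumptions made for the example; a real application would normally use the client library mentioned above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Content type used for the Prometheus text exposition format.
CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"

# Hypothetical application metrics, kept in a plain dict for this sketch.
METRICS = {"myapp_requests_total": 0, "myapp_cache_size_bytes": 0}


def render_metrics(metrics):
    """Render name/value pairs as one 'name value' line per metric."""
    lines = [f"{name} {value}" for name, value in sorted(metrics.items())]
    return ("\n".join(lines) + "\n").encode("utf-8")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS)
        self.send_response(200)
        self.send_header("Content-Type", CONTENT_TYPE)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

The sketch omits HELP and TYPE lines, labels, and thread safety, all of which a client library would take care of.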
For most popular frameworks, there is a middleware plugin or something equivalent that automatically collects some metrics, such as latency and error rates. This is usually not enough. You also want to collect custom application data: for example, cache hit/miss rates per endpoint, database latency, and so on.
Prometheus supports several data types. One important and subtle type is the counter. Counters always advance, with one caveat.
When the application resets, the counter goes back to zero. These "epochs" in counters are managed by sending the counter's "creation time" as metadata. Prometheus knows not to compare counters from two different epochs.
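The epoch idea can be sketched in a few lines. This is a conceptual illustration only, not how Prometheus or its client library is implemented; the `Counter` class and the `increase` helper are hypothetical.

```python
import time


class Counter:
    """A value that only goes up, tagged with the time it was created."""

    def __init__(self, now=time.time):
        self.created = now()  # identifies the counter's epoch
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


def increase(earlier, later):
    """Growth between two (created, value) samples.

    Returns None when the samples come from different epochs, because a
    restart happened in between and the values are not comparable.
    """
    if earlier[0] != later[0]:
        return None
    return later[1] - earlier[1]
```

After a restart, a new `Counter` gets a new creation time, so a comparison with samples from before the restart is refused instead of yielding a negative number.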
Gauges are much simpler: they measure instantaneous values. Use them for measurements that go up and down: for example, total allocated memory, cache size, and so on.
Enums are useful for the state of the application as a whole, although they can also be collected at a finer granularity. For example, if you are using a feature-gating framework, a feature that can be in one of several states (for example, in use, disabled, or shadowing) may be more useful to expose as an enum.
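One common way to expose such a state is as one sample per possible state, with 1 for the current state and 0 for the others. The sketch below assumes that representation; the `FeatureState` values and the metric name are hypothetical.

```python
from enum import Enum


class FeatureState(Enum):  # hypothetical states for one gated feature
    IN_USE = "in_use"
    DISABLED = "disabled"
    SHADOWING = "shadowing"


def render_state(metric, current):
    """One sample per possible state: 1 for the current state, 0 otherwise."""
    return [
        f'{metric}{{state="{state.value}"}} {int(state is current)}'
        for state in FeatureState
    ]


if __name__ == "__main__":
    for line in render_state("myapp_feature_x_state", FeatureState.DISABLED):
        print(line)
```

Exposing every state, not just the current one, makes it easy to graph how long a feature spent in each state.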
Analytics are different from metrics in that they correspond to coherent events. For example, in network servers, an event is one outside request and the work that results from it. In particular, the analytics event cannot be sent until the event has finished.
An event contains specific measurements: latency, the number and possibly the details of the resulting requests to other services, and so on.
One currently possible option is structured logging. Sending an event is just sending a log with a properly formatted payload. This data can be queried from the log aggregator, parsed, and ingested into an appropriate system that gives visibility into it.
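A minimal sketch of an analytics event sent as a structured log follows. It assumes a JSON payload and a logger named `myapp.events`; both, like the `handle_request` wrapper and the field names, are illustrative choices and not a prescribed format.

```python
import json
import logging
import time

EVENTS = logging.getLogger("myapp.events")  # hypothetical logger name


def handle_request(path, work, log=EVENTS):
    """Run `work` and emit one analytics event once it has finished."""
    start = time.monotonic()
    status = "ok"
    try:
        return work()
    except Exception:
        status = "error"
        raise
    finally:
        # The event is only sent here, after the request is complete.
        event = {
            "event": "request",
            "path": path,
            "status": status,
            "latency_seconds": round(time.monotonic() - start, 6),
        }
        log.info(json.dumps(event, sort_keys=True))
```

Because the payload is a single JSON object per log line, the aggregator side can parse it without guessing at the message format.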
You can use logs to track errors, and you can use analytics to track errors. But a dedicated error system is worth it. A system optimized for errors can afford to send more data, because errors are rare. It can send the right data, and it can do smart things with that data. Error-tracking systems in Python usually hook into a generic exception handler, collect data, and send it to a dedicated error aggregator.
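The hook-collect-send pattern can be sketched with the standard library alone. Here `SENT` and `send_to_aggregator` stand in for a real error aggregator and are purely hypothetical; an actual error-tracking client would gather far more context.

```python
import sys
import traceback

SENT = []  # stand-in for a dedicated error aggregator (hypothetical)


def send_to_aggregator(report):
    SENT.append(report)


def build_report(exc_type, exc, tb):
    """Collect the data an error aggregator would want about one exception."""
    return {
        "type": exc_type.__name__,
        "message": str(exc),
        "traceback": traceback.format_exception(exc_type, exc, tb),
    }


def report_uncaught(exc_type, exc, tb):
    send_to_aggregator(build_report(exc_type, exc, tb))
    sys.__excepthook__(exc_type, exc, tb)  # keep the default stderr output


def install():
    """Route uncaught exceptions through report_uncaught."""
    sys.excepthook = report_uncaught
```

Calling `install()` once at startup is enough for uncaught exceptions in the main thread; frameworks usually offer their own hook for errors raised while handling a request.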
In many cases, running Sentry yourself is the right thing to do. When an error has occurred, something has gone wrong. Reliably removing sensitive data is not possible, because these are precisely the cases where sensitive data may have ended up somewhere it should not be.
It is usually not a big load: exceptions are supposed to be rare. Finally, this is not a system that needs high-quality, high-reliability backups. Yesterday's errors have already been fixed, hopefully, and if they have not, you'll know!
Observable systems are faster to develop, because they give you feedback. They are safer to run, because when something goes wrong, they let you know sooner. Finally, because there is a feedback loop, observability lends itself to building repeatable processes around it. Observability gives you knowledge about your application. And knowing is half the battle.
Building all the observability layers is hard work. It also often feels like wasted work, or at least like "nice to have, but not urgent."
So can you build observability later? Maybe, but you shouldn't. Building it right lets you speed up the rest of development at every stage: testing, monitoring, and even onboarding new people. In an industry with as much churn as tech, just reducing the overhead of onboarding a new person is worth it.
The fact is, observability is important, so write it in early in the process and maintain it throughout. In turn, it will help you maintain your software.