package ui

templ About() {
<div>
<h1>about</h1>
<p>
Converge is a utility for troubleshooting builds on continuous integration servers.
It solves a common problem where the cause of a job failure is difficult to determine.
This is complicated further by the fact that build jobs usually run on a build
farm where there is no access to the build agents or, in more modern environments,
in ephemeral containers.
</p>

<p>
With Converge it is possible to get remote shell access to such jobs. This works
by configuring the build job to connect to a Converge server using an agent program.
The agent program can be downloaded from within the CI job using curl or wget.
Next, an end-user can connect to the Converge server, which acts as a rendez-vous server:
it connects the client and the agent together based on a common identifier specified by
both of them.
</p>

<h2>how it works</h2>

<p>
The steps involved are as follows:
<ul>
<li>The agent connects to the converge server and specifies an id, the so-called rendez-vous id,
identifying the agent.
The agent outputs an example command that can be used to connect to this agent.
</li>
<li>The agent sets up multiplexing of connections together with the converge server,
which allows it to listen for incoming connections.
</li>
<li>This is used by the agent for running an embedded SSH server that listens for
incoming connection requests from clients.
</li>
<li>The client/user connects to the converge server using the command specified by the agent.
This uses the same id as that used by the agent. The converge server can now match these
ids and set up an end-to-end connection from client to agent. The role of the converge server
is simply to match these ids and connect the two websocket connections (from agent
and from client) together by copying data between them as it arrives.
</li>
<li>The embedded SSH server now performs authentication. After successful login,
a shell is spawned and the session is established. The shell can be any linux
shell, but command prompt and powershell are also possible.
The connection is practically identical to a regular terminal connection. To
achieve this, the shell is made to believe that it is connected to a
terminal.
</li>
</ul>
</p>

<p>With regard to the rendez-vous id, the following remarks apply:
<ul>
<li> If no id is specified, then an id is generated. </li>
<li> If the agent uses an id already in use by another agent, then the converge server will
generate a new id. </li>
</ul>
The agent will always print the id and the command required to connect to it to standard output.
</p>

<h2>security</h2>

<p>
The setup is such that the connection from client (end-user) to server (agent on CI job)
is end-to-end encrypted. The Converge server itself is no more than a bit pipe that pumps
data between client and agent.
</p>

<p>Using authorized keys is a secure way of connecting. When running the agent, the authorized keys
must be put in a file, allowing only the designated users to connect. The file containing authorized keys
can also be edited during a session with the agent, allowing more people to be added when required without
having to start over again.
Using authorized keys is made easy through the
<a href="usage.html">usage</a> page, which provides the exact commands to execute based
on the target environment. If users are hesitant to use their own public key, it is also possible
to generate a separate ssh key pair using <code>ssh-keygen</code> and use that instead.
</p>

<p>To be able to use Converge, you must already have access to the configuration of a build job.
Having that access means it is possible to execute any command on a build agent. The Converge
agent is started by the build job and does not have any additional rights compared to what you
could script in the continuous integration job definition.
</p>

<p>Converge does not provide any stealth features to hide it. The public sessions page shows all
agents and clients, including details about the clients and the agents. The idea is that it should
be light-weight and easy to use. There is no reason to hide the fact that someone is debugging
a continuous integration job. Also, all sessions are logged using standard kubernetes tooling
(such as fluentbit/filebeat and loki/elasticsearch, depending on the environment). This logging includes
only the details about the sessions, but not what the user is doing inside a session. Also, Converge
provides a prometheus metrics endpoint which allows user sessions to be tracked over time after
the fact. This data is also made accessible using a grafana dashboard.
</p>

<h2>SSH and SFTP</h2>

<p>
Both ssh and sftp are supported. Multiple concurrent sessions to the same agent are allowed,
as are multiple agents.
</p>

<h2>timeouts</h2>

<p>
There is a timeout mechanism in the agent such that jobs do not hang indefinitely
waiting for a connection. This mechanism makes sure that jobs do not keep
build agents occupied for a long time. By default, the agent exits with status 0 when
the last client exits after logging in. The timeout is an inactivity timeout. Activity is
detected as follows:
<ul>
<li><b>ssh</b>: any key press is considered activity</li>
<li><b>sftp</b>: any output from the server side is considered activity. This is done to
make sure that longer downloads cannot be killed by a timeout. A simple <code>ls</code> command
in an sftp session will also lead to activity since the server will output the result of the command. </li>
</ul>
</p>
<p>When the user touches a <code>.hold</code> file, the agent keeps waiting for connections even
after the last client logs out, taking into account the timeout. By default the agent
exits when the last user has logged out.
</p>

<h2>remote shell usage</h2>

<p>
The agent supports a <code>--shells</code> command-line option by which a comma-separated
list of shells can be prepended to the default search path for shells, e.g.
<code>--shells zsh,csh,sh</code> (linux) or <code>cmd,powershell</code> for
windows.
</p>

<p>
The agent sets an <code>agentdir</code> environment variable that points to
the directory where the agent is running.
</p>

<p>The user will get notifications from the agent any time something important happens, such
as the session being close to timeout.
</p>

<h2>other tools</h2>

<p>A similar solution can be obtained by combining existing tools such as
<a href="https://github.com/namespacelabs/breakpoint">breakpoint</a> with
a websocket tunneling tool such as
<a href="https://github.com/erebe/wstunnel">wstunnel</a>.
There are, however, some problems with this approach that converge is
trying to address:
</p>

<p>
<ul>
<li>Breakpoint uses an embedded SSH server, which is a really good idea, but
uses the QUIC protocol for connecting to a rendez-vous server. The rendez-vous server then
exposes a random port for every client. This makes deployment on kubernetes, where fixed ports
must be used, really hard. QUIC is also not a widely supported protocol.</li>
<li>The problem with the random ports can be solved by running wstunnel together
with the breakpoint server in a kubernetes pod, where wstunnel can forward traffic over an
external websocket connection to the local random port that the breakpoint server is listening on.</li>
<li>Breakpoint leaves open how users install the breakpoint executable (agent). </li>
<li>Because of the hacky nature of this setup, it is very difficult for users to use
and to troubleshoot when things go wrong. </li>
</ul>

</p>
Converge server addresses these issues in the following ways:
<ul>
<li>Use the websocket protocol both for agents and for clients, providing a fixed port and
a supported protocol for kubernetes deployment. Websockets are also supported by
kubernetes ingress controllers, so this makes it easy to deploy on kubernetes.
To make this work with SSH, which does not natively support websockets, a proxy command
<code>wsproxy</code> is provided that allows SSH to connect using websockets.
</li>
<li>Providing online documentation where the instructions take into account the
hostname and protocol where converge is running, allowing users to copy and paste
instructions that can be used without modification. In the usage page the users
can even generate the correct agent startup commands and client connection commands
based on the type of shell they are connecting to. </li>
<li>Converge server provides out of the box downloads of required software. This makes sure
client and server are always up to date and can be downloaded in any continuous integration
job without having to package the required executables in an ad-hoc way.
In addition, a protocol version check is done. </li>
<li>Thanks to <code>wsproxy</code>, user-friendly error messages can be given to users in most
cases when things do not work out. This is an SSH proxy command that communicates with converge
and provides additional information to the user. </li>
<li>A live screen showing the current sessions that are running. The sessions webpage provides
additional feedback about the running sessions. </li>
<li>Interactivity in the user's session with notifications about timeouts and a very
simple inactivity timeout mechanism. </li>
<li>Possibility for the user to define the remote shell to use. </li>
<li>Support for unix-like shells such as bash, as well as command prompt and powershell. </li>
<li>Observability w.r.t. non-functionals of converge and of agent and client sessions through
prometheus monitoring. A separate grafana dashboard is provided to monitor sessions. </li>
</ul>
<p>
</p>

</div>
}

templ AboutTab() {
@BasePage(1) {
@About()
}
}