Thursday, July 4, 2024

Host directory mounted as volume in docker compose - cannot write to host directory from within docker container

I needed to write to a host directory mounted as a volume in my docker container.  This worked great in my local environment but failed when deployed to the test environment.

Environment:

  • Ubuntu 20.04.6 LTS
  • Docker Engine - Community Version 26.0.1

The host machine was an Ubuntu 20.04 virtual machine with user "develop".  Note that the "develop" user was in the "docker" group, so I didn't need sudo for any docker commands.  I only saw this problem when I deployed to the test environment because the "develop" user there had a different user ID/group ID than in my development environment.  On my development environment the "develop" user had user ID 1000 and group ID 1000, but on the test environment the "develop" user had user ID 1011 and group ID 1011.

Problem

Here's my original code with the problem.

docker-compose.yml before (the $USER environment variable holds the value "develop"):

services:
  cherry_container:
    container_name: cherryshoe_container
    build:
      # assumption: the Dockerfile sits next to docker-compose.yml ("image" is not a valid key under "build")
      context: .
      args:
        SERVICE_USER: $USER
    volumes:
      - /cherryshoe_shared_volume:/home/$USER/cherryshoe_shared_volume
    extra_hosts:
      - "cs-test:172.16.0.0"
    environment:
      - "CS_EXTRA_ENV_VAR=example"


Dockerfile before:

# https://hub.docker.com/layers/library/ubuntu/20.04/images/sha256-b2339eee806d44d6a8adc0a790f824fb71f03366dd754d400316ae5a7e3ece3e
FROM ubuntu:20.04

# install OS tools
RUN apt-get update && apt-get install -y vim && \
    apt-get clean all

# arguments available during docker build
ARG SERVICE_USER

ENV USERNAME $SERVICE_USER
RUN useradd -U -ms /bin/bash $USERNAME \
  && usermod -aG sudo $USERNAME \
  && echo "$USERNAME:$USERNAME" | chpasswd

# start in $USERNAME home as $USERNAME
WORKDIR /home/$USERNAME
USER $USERNAME

# https://serverfault.com/a/1084975
# keep container running for use by shelling into it with 'docker exec -it <container_name> bash'
CMD sh -c 'trap "exit" TERM; while true; do sleep 1; done'


Ran "docker compose config" to check that the args in the docker-compose.yml file were replaced as I expected them to be.

Shell into the cherryshoe_container: docker exec -it cherryshoe_container bash

The below commands show that a file could not be created while inside the cherryshoe_container docker container.  Notice the 1011 for user ID and group ID for the cherryshoe_shared_volume folder.

develop@0d8c5da6978e:~/cherryshoe_shared_volume$ ls -la

total 8

drwxrwxr-x 2 1011 1011 4096 Jun 26 17:10 .

drwxr-xr-x 1 develop develop 4096 Jun 26 17:11 ..

develop@0d8c5da6978e:~/cherryshoe_shared_volume$ touch hi.txt

touch: cannot touch 'hi.txt': Permission denied
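
To make the failure concrete, here is a minimal Python sketch of the owner/group/other check that decides whether the write is allowed.  It is a simplification (it ignores ACLs and Linux capabilities), the can_write helper is hypothetical, and it assumes the "develop" user created by useradd inside the container received user ID 1000 and group ID 1000, which the listing above does not show.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Simplified POSIX permission check for the write bit.

    Exactly one class applies: owner, then group, then other.
    """
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# drwxrwxr-x owned by 1011:1011, as in the listing above
folder_mode = stat.S_IFDIR | 0o775

# Assumed container user before the fix: uid 1000, gid 1000 -> "other" bits (r-x)
print(can_write(folder_mode, 1011, 1011, uid=1000, gids={1000}))

# Container user with IDs matching the host: uid 1011, gid 1011 -> owner bits (rwx)
print(can_write(folder_mode, 1011, 1011, uid=1011, gids={1011}))
```

The kernel compares numeric IDs only, so having the same username "develop" on both sides does not help when the numbers differ.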

Solution

For the "develop" user to write to the cherryshoe_shared_volume folder from inside the docker container, the user ID and group ID had to be synchronized between the host and the docker container.

Update the "develop" user's .bashrc file on the HOST with the user ID and group ID, then source the file or restart the session:

export SERVICE_USER_ID=1011

export SERVICE_USER_GROUP_ID=1011


You can find out the user ID and group ID with these commands:

SERVICE_USER_ID=$(id -u)

SERVICE_USER_GROUP_ID=$(id -g)


docker-compose.yml after:

services:
  cherry_container:
    container_name: cherryshoe_container
    build:
      # assumption: the Dockerfile sits next to docker-compose.yml ("image" is not a valid key under "build")
      context: .
      args:
        SERVICE_USER: $USER
        SERVICE_USER_ID: $SERVICE_USER_ID
        SERVICE_USER_GROUP_ID: $SERVICE_USER_GROUP_ID
    volumes:
      - /cherryshoe_shared_volume:/home/$USER/cherryshoe_shared_volume
    extra_hosts:
      - "cs-test:172.16.0.0"
    environment:
      - "CS_EXTRA_ENV_VAR=example"


Dockerfile after:

# https://hub.docker.com/layers/library/ubuntu/20.04/images/sha256-b2339eee806d44d6a8adc0a790f824fb71f03366dd754d400316ae5a7e3ece3e
FROM ubuntu:20.04

# install OS tools
RUN apt-get update && apt-get install -y vim && \
    apt-get clean all

# arguments available during docker build
ARG SERVICE_USER
ARG SERVICE_USER_ID
ARG SERVICE_USER_GROUP_ID

# create USER environment variable to match Ubuntu host
ENV USER $SERVICE_USER

# synchronize the service user inside the container with the service user outside the container so that they have the same permission to the mounted volume.
RUN addgroup --gid $SERVICE_USER_GROUP_ID $SERVICE_USER \
   && useradd -ms /bin/bash --uid $SERVICE_USER_ID --gid $SERVICE_USER_GROUP_ID $SERVICE_USER \
   && usermod -aG sudo $SERVICE_USER \
   && echo "$SERVICE_USER:$SERVICE_USER" | chpasswd

# start in $SERVICE_USER home as $SERVICE_USER
WORKDIR /home/$SERVICE_USER
USER $SERVICE_USER

# https://serverfault.com/a/1084975
# keep container running for use by shelling into it with 'docker exec -it <container_name> bash'
CMD sh -c 'trap "exit" TERM; while true; do sleep 1; done'


Ran "docker compose config" to check that the args in the docker-compose.yml file were replaced as I expected them to be.

Shell into the cherryshoe_container: docker exec -it cherryshoe_container bash

The commands below show that a file can now be created from inside the cherryshoe_container docker container.  Notice that ls now shows the owner and group of the cherryshoe_shared_volume folder as the username "develop" instead of the numeric ID 1011, because the container user now has the matching user ID and group ID.

develop@0d8c5da6978e:~/cherryshoe_shared_volume$ ls -la

total 8

drwxrwxr-x 2 develop develop 4096 Jun 26 17:10 .

drwxr-xr-x 1 develop develop 4096 Jun 26 17:11 ..

develop@0d8c5da6978e:~/cherryshoe_shared_volume$ touch hi.txt

Friday, June 7, 2024

ROS republish node command and roslaunch equivalent

I've been working with ROS and recently had to subscribe to a topic with the theora_image_transport/Packet message type.  I'm using the rospy library to interface with ROS from Python.

Environment:

  • Ubuntu 20.04.6 LTS
  • ROS noetic
  • Python 3.8.10

theora_image_transport is a plugin package for image_transport and, because of that, is not yet supported in Python.  To save the images to the file system using Python, I ran a republish node that converts the theora stream to the compressed image format.

The original theora topic name is /cherryshoe/camera/image_raw/theora.  The republish node is named /cherryshoe/repub/camera/image_compressed.  This node publishes images in the sensor_msgs/CompressedImage type.

Tuesday, December 26, 2023

Parallelizing API calls with python asyncio

Some thoughts on asyncio and a code example where I refactored the code to use it.


The code example refactors the get_results function to use asyncio.  It uses both _get_info_lookup and _retrieve_aggregation_results functions; their implementation details are not shown.  _retrieve_aggregation_results ultimately makes the HTTP API call to get data from an external source.

Old Code Snippet:
def get_results(metadata, granularity):
    [some code]
    combined_result = {}
    info_lookup_list = _get_info_lookup(granularity)
    for retrieval_info in info_lookup_list:
        result_info = _retrieve_aggregation_results(retrieval_info, metadata)
        combined_result[retrieval_info['result_key']] = result_info['result']
    [some code]
Refactored Code Snippet:
import asyncio

async def _retrieve_aggregation_results_async_coroutine(retrieval_info, metadata):
    """
    Native coroutine, modern syntax. Retrieve aggregation results.
    Return a tuple=(
        <retrieval_info['result_key']>, (String) e.g. daily, hourly
        <dictionary of lists> (Dictionary) e.g. { result_list_1, result_list_2, result_list_3 }
    )
    """
    result_info = _retrieve_aggregation_results(retrieval_info, metadata)
    # return only the 'result' entry, matching what the old code stored in combined_result
    return (retrieval_info['result_key'], result_info['result'])

async def _create_aggregation_results_async_scheduler(info_lookup_list, metadata):
    """
    Create an async scheduler for retrieving aggregation results.
    Return a list of tuples for each task created:
    tuple=(
        <retrieval_info['result_key']>, (String) e.g. daily, hourly
        <dictionary of lists> (Dictionary) e.g. { result_list_1, result_list_2, result_list_3 }
    )
    """
    request_list = []

    for retrieval_info in info_lookup_list:
        # in Python 3.6 or lower, use asyncio.ensure_future() instead of create_task
        task = asyncio.ensure_future(_retrieve_aggregation_results_async_coroutine(retrieval_info, metadata))
        request_list.append(task)

    # gather all results by unpacking the request_list and passing them each as a parameter to asyncio.gather()
    result_list = await asyncio.gather(*request_list)
    return result_list

def get_results(metadata, granularity):
    [some code]
    combined_result = {}
    info_lookup_list = _get_info_lookup(granularity)

    # in Python 3.6 or lower, use asyncio.get_event_loop and run_until_complete instead of asyncio.run
    try:
        event_loop = asyncio.get_event_loop()
    except RuntimeError as e:
        # re-raise anything other than the "no current event loop" case
        if not str(e).startswith('There is no current event loop in thread'):
            raise
        event_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(event_loop)

    result_list = event_loop.run_until_complete(_create_aggregation_results_async_scheduler(info_lookup_list, metadata))

    for tuple_result in result_list:
        # tuple_result[0] is a String that holds the result_key (e.g. daily, hourly)
        # tuple_result[1] is a Dictionary of lists (e.g. { result_list_1, result_list_2, result_list_3 })
        result_key = tuple_result[0]
        performance_result = tuple_result[1]

        combined_result[result_key] = performance_result
        
    [some code]
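
One caveat worth noting: _retrieve_aggregation_results is called without await above, so it appears to be an ordinary blocking function.  A blocking call inside an async def holds the event loop until it returns, so the tasks would still run one after another.  Below is a minimal sketch of one way to let blocking calls overlap, by handing them to the default thread pool with run_in_executor.  The blocking_fetch function is a hypothetical stand-in for the real HTTP call, and the sketch uses asyncio.run and asyncio.get_running_loop, which require Python 3.7 or higher.

```python
import asyncio
import time


def blocking_fetch(result_key, delay=0.3):
    """Hypothetical stand-in for a blocking HTTP call."""
    time.sleep(delay)
    return result_key, [result_key + "_value"]


async def fetch_all(result_keys):
    loop = asyncio.get_running_loop()
    # Each blocking call runs in a worker thread of the default executor,
    # so the event loop stays free and the calls overlap.
    futures = [loop.run_in_executor(None, blocking_fetch, key) for key in result_keys]
    # gather returns results in the same order as the futures were passed in
    return await asyncio.gather(*futures)


def run_demo():
    start = time.perf_counter()
    result_list = asyncio.run(fetch_all(["daily", "hourly", "weekly"]))
    elapsed = time.perf_counter() - start
    return dict(result_list), elapsed


combined_result, elapsed = run_demo()
print(combined_result)
print(f"elapsed: {elapsed:.2f}s")
```

With three calls of 0.3 seconds each, the elapsed time should be close to one call rather than the sum of all three.  An alternative, if the HTTP client can be changed, is to use a client library with native async support so no threads are needed.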

Sunday, August 6, 2023

PostgreSQL - queries for jsonb data type to apply and unapply json attribute structure changes

My application has a table called cs_savedattribute which holds an args column that saves detailed JSON attributes for a record.  There was an existing "cs_id": "123456789" attribute that needed to be moved into this JSON structure:

{
  "source": {
    "id": "123456789",
    "csGroupingType": "Primary"
  }
}

Environment: 

PostgreSQL 12.9 

Below are the PostgreSQL statements that can apply it and then un-apply it if necessary.

NOTE: Because the args column is of type text, every reference to it in the queries below has to be cast to jsonb first.

FORWARD 

-- 1 WORKS: add the new args.source json attribute

UPDATE cs_savedattribute SET args = CAST(args AS jsonb) || '{"source": {"id": "SET_ME","csGroupingType": "Primary"}}'::jsonb
WHERE CAST(args AS jsonb) ? 'cs_id' = true;

-- 2 WORKS: pain because it's a nested item so the key is '{source,id}'

--          update the args.source.id value using args.cs_id's value

UPDATE cs_savedattribute SET args = jsonb_set(CAST(args AS jsonb), '{source,id}', to_jsonb(CAST(args AS jsonb) ->> 'cs_id'))
WHERE CAST(args AS jsonb) ? 'cs_id' = true;

-- 3 WORKS: remove args.cs_id

UPDATE cs_savedattribute SET args = CAST(args AS jsonb) - 'cs_id'
WHERE CAST(args AS jsonb) ? 'source' = true;

REVERSE

-- 1 Didn't use jsonb_set: because args.cs_id didn't exist it produced a null error, so use jsonb_build_object instead

--    add and set args.cs_id using args.source.id's value

UPDATE cs_savedattribute SET args = CAST(args AS jsonb) || jsonb_build_object('cs_id', CAST(args AS jsonb) -> 'source' ->> 'id')
WHERE CAST(args AS jsonb) ? 'source' = true;

-- 2 remove args.source

UPDATE cs_savedattribute SET args = CAST(args AS jsonb) - 'source'
WHERE CAST(args AS jsonb) ? 'source' = true;

VERIFY

select id, uuid, args from cs_savedattribute WHERE CAST(args AS jsonb) ? 'cs_id' = true;
select id, uuid, args from cs_savedattribute WHERE CAST(args AS jsonb) ? 'source' = true;
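
To check the intended before and after shapes on a sample document, the FORWARD and REVERSE steps can be mirrored on a plain Python dict.  This is only an illustration of the transformation, not something that touches the database; the forward and reverse function names and the sample "name" attribute are made up for the example.

```python
import copy


def forward(args):
    """Mirror FORWARD steps 1-3 on a dict."""
    out = copy.deepcopy(args)
    if "cs_id" in out:
        # step 1: top-level merge, like jsonb || jsonb
        out.update({"source": {"id": "SET_ME", "csGroupingType": "Primary"}})
        # step 2: set the nested value, like jsonb_set(..., '{source,id}', ...)
        out["source"]["id"] = out["cs_id"]
        # step 3: drop the old key, like jsonb - 'cs_id'
        del out["cs_id"]
    return out


def reverse(args):
    """Mirror REVERSE steps 1-2 on a dict."""
    out = copy.deepcopy(args)
    if "source" in out:
        # step 1: add cs_id from source.id, like || jsonb_build_object('cs_id', ...)
        out["cs_id"] = out["source"]["id"]
        # step 2: drop the nested structure, like jsonb - 'source'
        del out["source"]
    return out


original = {"cs_id": "123456789", "name": "example"}
migrated = forward(original)
restored = reverse(migrated)
print(migrated)
print(restored)
```

Note that the reverse direction restores cs_id but drops csGroupingType, since the original structure had no place for it.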

This article was helpful: https://stackoverflow.com/questions/45481692/postgres-jsonb-set-multiple-nested-fields


Wednesday, October 12, 2022

Upgrade Node.js 14 to Node.js 16 (with ansible example)

My project was using the NodeSource installer for version 14.x to install Node.js, npm, and npx.  A couple of weeks ago (~September 2022) that stopped working because the NodeSource nodejs-14.10.1 installer was no longer available.  When using the NodeSource nodejs-16.15.0 installer version, npm and npx weren't being installed (* Note below).  Because of this I had to find another method to install Node.js.

Another installation method Node.js recommends is its binary archive.  I chose the https://nodejs.org/download/release/v16.15.0/node-v16.15.0-linux-x64.tar.xz archive file and it worked (Node.js, npm, and npx were all installed)!

Steps to upgrade:

1. Verify prior versions

$ node -v

v14.10.1

$ npm -v

6.14.8

$ npx -v

6.14.8

2. Stop PM2 (PM2 is the node manager used)

pm2 kill

3. The NodeSource installer uses the yum repo file, verify it exists

ls -la /etc/yum.repos.d/nodesource*.repo

4. Uninstall NodeSource installed Node.js Enterprise Linux Packages

(steps taken from https://github.com/nodesource/distributions)

To completely remove Node.js installed from the rpm.nodesource.com package:

# use `sudo` or run this as root

yum remove nodejs (reply y)

rm -r /etc/yum.repos.d/nodesource*.repo

yum clean all

5. Install Node.js using Node.js archive tar file

(steps taken from https://github.com/nodejs/help/wiki/Installation#how-to-install-nodejs-via-binary-archive-on-linux)

   mkdir -p /usr/local/lib/nodejs

   wget https://nodejs.org/download/release/v16.15.0/node-v16.15.0-linux-x64.tar.xz

   tar -xJvf node-v16.15.0-linux-x64.tar.xz -C /usr/local/lib/nodejs 

   Chose to use symbolic link to /usr/bin since this path is already in my PATH environment variable:

   ln -s /usr/local/lib/nodejs/node-v16.15.0-linux-x64/bin/node /usr/bin/node

   ln -s /usr/local/lib/nodejs/node-v16.15.0-linux-x64/bin/npm /usr/bin/npm

   ln -s /usr/local/lib/nodejs/node-v16.15.0-linux-x64/bin/npx /usr/bin/npx

6. Verify install

$ node -v

v16.15.0

$ npm -v

8.5.5

$ npx -v

8.5.5

7. Start PM2

pm2 start

I then ansiblized step 5 from the solution above.  Tar and unzip had to be installed on the system for the ansible unarchive module to work.

# If Node.js is not installed use Node.js archive file to install
- block:
    - name: Define Node.js version variable
      shell: echo "v16.15.0"
      register: command_output_nodejs_version
    - name: Define Node.js install folder location
      shell: echo "/usr/local/lib/nodejs"
      register: command_output_nodejs_install_dir
    - set_fact:
        nodejs_version: "{{command_output_nodejs_version.stdout}}"
        nodejs_install_dir: "{{command_output_nodejs_install_dir.stdout}}"
    - name: Boolean if Node.js install folder exists
      stat:
        path: "{{nodejs_install_dir}}"
      register: command_output_path_exists
    - set_fact:
        nodejs_install_dir_exists: "{{command_output_path_exists.stat.exists}}"
    - debug:
        msg: nodejs_version {{nodejs_version}}, nodejs_install_dir {{nodejs_install_dir}}, nodejs_install_dir_exists {{nodejs_install_dir_exists}}

- block:
    - name: Create Node.js install folder
      file:
        path: "{{nodejs_install_dir}}"
        state: directory
    - name: Download and unarchive Node.js {{nodejs_version}} file
      unarchive:
        src: "https://nodejs.org/download/release/{{nodejs_version}}/node-{{nodejs_version}}-linux-x64.tar.xz"
        dest: "{{nodejs_install_dir}}"
        remote_src: yes
    - name: Create symbolic link for node in /usr/bin
      file:
        src: "/usr/local/lib/nodejs/node-{{nodejs_version}}-linux-x64/bin/node"
        dest: "/usr/bin/node"
        state: link
    - name: Create symbolic link for npm in /usr/bin
      file:
        src: "/usr/local/lib/nodejs/node-{{nodejs_version}}-linux-x64/bin/npm"
        dest: "/usr/bin/npm"
        state: link
    - name: Create symbolic link for npx in /usr/bin
      file:
        src: "/usr/local/lib/nodejs/node-{{nodejs_version}}-linux-x64/bin/npx"
        dest: "/usr/bin/npx"
        state: link
  when: not nodejs_install_dir_exists

- debug: msg="Node.js {{nodejs_version}} already installed"
  when: nodejs_install_dir_exists

* Note: I stumbled upon NodeSource nodejs-16.15.0 installer failing to install npm and npx because of the below error:

"TypeError [ERR_INVALID_ARG_TYPE]: The \"file\" argument must be of type string. Received null".

The true problem was exposed here - "Calling [NPM] to install pm2-logrotate@2.7.0".  Verifying the npm version failed since npm wasn't found.

Saturday, April 30, 2022

React and uncontrolled component experiment

My project uses React for the client side code.  Normally, changes in the display options dropdown are made as soon as the user makes them so there is no need for "Cancel" or "Apply" buttons. There is, however, a section of input fields that must be canceled or applied as a set, which is where a recent bug appeared.

Environment:
react@17.0.1

The "Cancel" button (and closing of the dropdown) for the set of input fields was not restoring the settings to the values they had before the unapplied changes.  This was confusing and erroneous because the user might think the settings were applied even though they may not have been.

Let's name the parent component <Result> and the child component <DisplayOptions>.  This occurred because the input elements were all React controlled components whose state was stored in the parent <Result> component, and an event handler was called for every state update.

The input elements needed to be saved to the parent as a set when the "Apply" button was clicked, and clicking the "Cancel" button needed to discard any changes that hadn't been applied.

There were two solutions that I looked into to solve this:   

  1. Change input elements to uncontrolled components.  After hitting apply, call a props function to save it back up to the parent.  
    • Outcome - I've noted down the reasons why I couldn't go with this solution:
      • If <DisplayOptions> had no child components that depended on the changing value of an uncontrolled component, I think it would work.  But if there is a child component that is dependent on an uncontrolled component's value changing, there's no way to force a prop or state change for the child component to re-render.
      • Extra care is needed to keep track of checkbox values with input[inputReferenceName].current.checked and text values with input[inputReferenceName].current.value.
      • For any values that are inherently a number, the uncontrolled component's current value is always a string.  So parsing to an int would need to happen each time its value is needed.
      • Only after the initial render will any form input elements have the handle to the React reference declared in the component's constructor.  This made for extraneous code used throughout to either use the dropdown's component React reference to the input element's current value when it was defined, or the props value of the input element sent from the parent to set the initial value of the input elements.
  2. Keep the input elements React controlled, but have a local copy saved in state for the <DisplayOptions> component.  Each time <DisplayOptions> is opened the constructor is called, so this was the location to set the props values of the input elements sent from the parent to local state.  After clicking apply, call a props function to save it back up to the parent.  This way the next time <DisplayOptions> is opened, its constructor would be called again to set the current value of the input elements.
    • Outcome - This ended up being the simplest and best solution for the use case.

Thursday, March 31, 2022

PostgreSQL - query for array elements inside json data type

My application has a table column "params" that is a JSON data type.  As a side note, it's a JSON data type (vs JSONB) because this column is hashed and used as the key for caching, so the ordering of keys must stay consistent.  The "params" column contains object values that are strings, numbers, and arrays.

Environment: 

PostgreSQL 12.9

Take a params value that looks like:

{
    "filterId": "a1cef72a-9d84-4cfc-9690-9f4d772f446c",
    "name": "cherryshoetech",
    "active": true,
    "priority": 2,
    "areas": [
        {
            "type": "custom",
            "startedInside": true,
            "endedInside": false,
            "order": 0
        }
    ]
}

To query for a specific value in the "areas" array, make use of the json_array_elements function.  It will expand a JSON array to a set of JSON values.  Therefore, it will return one record for each element in the array.

The following will return a record for each json array element in analysis.params.areas:

select id, created_tstamp, cherryshoeareas
from cherryshoe.analysis analysis, json_array_elements(analysis.params#>'{areas}') cherryshoeareas;

Once you filter it down with the where clause, it will only return the record that satisfies that criteria. Below are examples for filtering by a string, integer, and boolean:

select id, created_tstamp, cherryshoeareas
from cherryshoe.analysis analysis, json_array_elements(analysis.params#>'{areas}') cherryshoeareas
where cherryshoeareas ->> 'type' = 'custom';

select id, created_tstamp, cherryshoeareas
from cherryshoe.analysis analysis, json_array_elements(analysis.params#>'{areas}') cherryshoeareas
where (cherryshoeareas ->> 'order')::integer = 0;

select id, created_tstamp, cherryshoeareas
from cherryshoe.analysis analysis, json_array_elements(analysis.params#>'{areas}') cherryshoeareas
where (cherryshoeareas ->> 'startedInside')::boolean is true
and (cherryshoeareas ->> 'endedInside')::boolean is false;
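
For intuition, here is a small Python sketch of what json_array_elements does in the queries above: each array element becomes its own output row, paired with the columns of the row it came from, and the where clause then filters those expanded rows.  The sample rows and the expand_areas helper are made up for the illustration.

```python
import json

# Hypothetical table rows; "params" holds JSON text like the column above
rows = [
    {"id": 1, "params": json.dumps({
        "name": "cherryshoetech",
        "areas": [
            {"type": "custom", "startedInside": True, "endedInside": False, "order": 0},
            {"type": "preset", "startedInside": False, "endedInside": False, "order": 1},
        ],
    })},
    {"id": 2, "params": json.dumps({"name": "no areas here"})},
]


def expand_areas(table_rows):
    """Yield one (id, area) pair per element of params.areas."""
    for row in table_rows:
        params = json.loads(row["params"])
        # a row without an "areas" array contributes no output rows
        for area in params.get("areas", []):
            yield row["id"], area


expanded = list(expand_areas(rows))
by_type = [(i, a) for i, a in expanded if a["type"] == "custom"]
by_order = [(i, a) for i, a in expanded if a["order"] == 1]
by_flags = [(i, a) for i, a in expanded
            if a["startedInside"] is True and a["endedInside"] is False]
print(len(expanded), len(by_type), len(by_order), len(by_flags))
```

One difference from the SQL: ->> always returns text, which is why the queries cast to integer or boolean, while json.loads already yields Python numbers and booleans.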

Helpful Articles:

https://www.postgresql.org/docs/12/datatype-json.html

https://stackoverflow.com/questions/22736742/query-for-array-elements-inside-json-type

https://www.postgresql.org/docs/12/functions-json.html