Friday, March 22, 2019

Elastic Stack Multiple Index Search Examples

I am working with the Elastic Stack (Elasticsearch, Logstash, and Kibana) on a report where data needed to be joined across two indexes, matching on the level_X_id through level_Y_id attributes, which exist in both indexes.  NOTE: Each index can contain multiple documents with a given level_X_id / level_Y_id combination, not just a single exactly-matching document.

Environment:
Elasticsearch 5.0.2
Logstash 5.0.2
Kibana 5.0.2

Here are multiple ways to do that:
1.  Search multiple indexes in a single _search API call.  This returns all documents from both indexes whose field value(s) match the query.

GET /cherryshoe_primary_idx,cherryshoe_secondary_idx/_search
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "analyze_wildcard": false,
          "query": "level_1_id:7268 AND level_2_id:7292"
        }
      }
    }
  }
}
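
If you are not in the Kibana Dev Tools console, the same query can be sent with curl (assuming Elasticsearch is listening on localhost:9200):

curl -X GET "http://localhost:9200/cherryshoe_primary_idx,cherryshoe_secondary_idx/_search" -H "Content-Type: application/json" -d '
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "analyze_wildcard": false,
          "query": "level_1_id:7268 AND level_2_id:7292"
        }
      }
    }
  }
}'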

2. Use a terms query with a terms lookup, using the primary index as the lookup source and returning only the secondary index's documents.  This returns all secondary-index documents whose field value(s) also exist in the referenced primary-index document.  The Elasticsearch documentation for the terms lookup was not very clear, and it took me a while to get the query to work, so I will explain it in detail below:
  • Retrieves cherryshoe_secondary_idx documents where both the cherryshoe_primary_idx and cherryshoe_secondary_idx documents have the matching "level_1_id" value of "3629".
  • query.terms.level_1_id refers to the cherryshoe_secondary_idx attribute that the looked-up values are matched against.
  • query.terms.level_1_id.index and query.terms.level_1_id.id identify the lookup document: the document in cherryshoe_primary_idx whose _id is 3629 supplies the values to match on.
  • query.terms.level_1_id.path value of level_1_id refers to the field inside that lookup document's JSON structure, i.e. "_source.level_1_id".  You can see this in Kibana -> Discover -> cherryshoe_primary_idx: expand one of the results and switch from the "Table" view to the "JSON" view; the "_source" JSON object holds all the index attributes.
  • query.terms.level_1_id.type refers to the document JSON structure's "_type".  You can see this in the same Kibana JSON view; the "_type" JSON attribute has the value "logs".

Single "terms":
GET cherryshoe_secondary_idx/_search
{
    "query" : {
        "terms" : {
            "level_1_id" : {
                "index" : "cherryshoe_primary_idx",
                "type" : "logs",
                "id" : 3629,
                "path" : "level_1_id"
            }
        }
    }
}

I thought I could simply add multiple terms lookups to the query for additional attributes, but defining multiple terms lookups does not return the results you would expect.  For example, the query below runs with valid syntax but returns no data.  I haven't been able to find documentation saying that multiple terms lookups cannot be combined in one query, which is interesting, because a single terms lookup works fine.

Multiple "terms":
GET cherryshoe_secondary_idx/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "level_1_id": {
              "index": "cherryshoe_primary_idx",
              "type": "logs",
              "id": 3629,
              "path": "level_1_id"
            }
          }
        },
        {
          "terms": {
            "level_2_id": {
              "index": "cherryshoe_primary_idx",
              "type": "logs",
              "id": 3719,
              "path": "level_2_id"
            }
          }
        }
      ]
    }
  }
}

3. Use the multi search API (_msearch), which allows you to execute several search requests within the same API call.  It returns records from each index according to the query criteria you specify for that respective index.  NOTE: each "index" header and "query" body must be on a single line, since the request body is newline-delimited JSON.

POST /_msearch
{"index": "cherryshoe_primary_idx" }
{"query":{"bool":{"must":{"query_string":{"analyze_wildcard":false,"query":"level_1_id:3629 AND tier_2_fa_id:level_2_id"}}}}}
{"index": "cherryshoe_secondary_idx" }
{"query":{"prefix":{"level_id":"_3629_3719_"}}}

Sunday, February 3, 2019

Elasticsearch nuances - default field text length and lucene tokenizer

The web application I work on has a reporting module where data is ETLed with Logstash and stored in Elasticsearch.  In the reporting module you can specify multiple filters, i.e. a state filter, a program filter, etc.

Environment:
Elasticsearch 5.0.2
Windows 10

There were two issues that came up in recent months:

1. Looking at all field mappings for a particular index, you can see that each field of type "text" has a "keyword" sub-field with "ignore_above": 256, meaning values longer than 256 characters are not indexed into that keyword sub-field.  This is the default dynamic mapping for string fields.  Perform the following GET to retrieve the index mappings -
  • curl http://localhost:9200/{index_name}
  • i.e. curl http://localhost:9200/cherryshoe_idx
returns a JSON that looks something like -
{
  "cherryshoe_idx": {
    "aliases": {},
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "text_data_that_can_be_very_long": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "unique_id": {
            "type": "long"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1546610232085",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "cC1mdfLfSi68sZe6r-QNLA",
        "version": {
          "created": "5000299"
        },
        "provided_name": "cherryshoe_idx"
      }
    }
  }
}

PROBLEM and SOLUTION:
One of the filters was using the "text_data_that_can_be_very_long" field to filter on; values longer than 256 characters were not indexed into the keyword sub-field, so exact-match filtering on them failed.  Because of this, an additional field was added for the "id" value of the filter (text_data_that_can_be_very_long_id), the query was updated to filter on this "id" field instead, and the "ignore_above": 256 restriction was removed from "text_data_that_can_be_very_long", which is kept for data display purposes.

Updated field mapping json snippet:
  "text_data_that_can_be_very_long": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword"
      }
    }
  },
  "text_data_that_can_be_very_long_id": {
    "type": "long"
  }
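
Applying a mapping change like this to an index that already has data generally means creating a new index with the corrected mapping and reindexing into it, since existing field mappings mostly cannot be changed in place and already-indexed documents are not re-processed either way.  Below is a rough curl sketch of that route - cherryshoe_idx_v2 is a hypothetical new index name, Elasticsearch is assumed to be on localhost:9200, and these are not necessarily the exact steps used here:

# 1. Create the new index, declaring only the fields whose mapping changed
curl -X PUT "http://localhost:9200/cherryshoe_idx_v2" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "logs": {
      "properties": {
        "text_data_that_can_be_very_long": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword" } }
        },
        "text_data_that_can_be_very_long_id": { "type": "long" }
      }
    }
  }
}'

# 2. Copy the existing documents into the new index
#    (reindex copies _source as-is; values for the new *_id field would still come from the Logstash ETL)
curl -X POST "http://localhost:9200/_reindex" -H "Content-Type: application/json" -d '
{
  "source": { "index": "cherryshoe_idx" },
  "dest": { "index": "cherryshoe_idx_v2" }
}'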

2. As I mentioned above, the report can specify multiple filters, one of them being the state filter -

Monday, January 21, 2019

MySQL Stored Procedure with parameter dynamic filtering and sorting

I recently worked on a web application where we had to retrieve user-specific dashboard data in real time with dynamic paging, dynamic column sorting (ASC or DESC), and dynamic data filtering (i.e. by year). For the normal use case there would not be much user-specific data, but a small subset of users would have a large amount of it. OUT count parameters were also needed in addition to the selected column data.  For these reasons, MySQL stored procedures were chosen to meet this real-time requirement.


Environment: 
MySQL 5.7.10.0 on Windows 10
MySQL 5.7.10.0 on CentOS 6.7
MySQL 5.7.10.0 on RHEL 7

A MySQL temp table was chosen (over a view) because:
  1. We needed to pass parameters for dynamic filtering and sorting into the SELECT statement that retrieves the user data.  That is possible with a temp table, whereas a view's SELECT statement cannot contain a variable or a parameter (a known limitation).
    • NOTE: If a view's SELECT statement could contain variables and parameters, a view would have been chosen, because data in a view is always current (it is dynamically generated), whereas data in a temp table reflects the state of the database at the time it was populated and is only created once per session.  We want the data to always be current, even within the same session for a user.
  2. Temp tables are created per session, so you can have a temp table with the "same name" across different sessions; MySQL maintains a separate "copy" for each.
The "skeleton" of the stored procedure strategy is below:

Thursday, January 10, 2019

Spring Boot curl example with multipart/form-data and JSON metadata in POST request

Below are two curl examples of multipart/form-data POST requests with their matching Spring Boot controller method signatures: one with request parameter metadata, one with JSON request body metadata.

Environment:
Spring Boot / Spring Boot Starter Batch Version 1.5.6.RELEASE
Oracle Java 8

1. multipart/form-data POST request with request parameter metadata

curl:
curl -k -H 'Content-Type: multipart/form-data' -F 'docFiles=@test1.txt;type=text/plain' 'https://localhost:8443/cherryshoe-app/document/file/upload/REVIEW?id=3554&docTypeId=1'

-k turns off curl's verification of the self-signed certificate
-F specifies multipart form data; the @ prefix reads the part's contents from a file

Matching Java Controller Method Signature:

@RequestMapping(value = "/document/file/upload", consumes = "multipart/form-data", produces = "application/json;charset=utf-8", method = RequestMethod.POST)

@ResponseBody
public DocumentUploadMetadata fileUpload(
    @RequestParam(value = "id", required = true) final Long reviewId,
    @RequestParam(value = "docTypeId", required = true) final Long docTypeId,
    @RequestParam("docFiles") List<MultipartFile> docFiles, Principal principal)
    throws IllegalArgumentException, NullPointerException, Exception {
}


2. multipart/form-data POST request with JSON request body metadata

curl:
curl -k -H 'Content-Type: multipart/form-data' -F 'docFiles=@test1.txt;type=text/plain' -F metadata='{"reviewId":3554,"docTypeId":1,"customFileName":"customFileName.txt","commentText":"This is a really long text, it may even be in HTML"};type=application/json' 'https://localhost:8443/cherryshoe-app/document/file/upload'

-k turns off curl's verification of the self-signed certificate
-F specifies multipart form data; the @ prefix reads the part's contents from a file

Matching Java Controller Method Signature:
@RequestMapping(value = "/document/file/upload", consumes = "multipart/form-data", produces = "application/json;charset=utf-8", method = RequestMethod.POST)

@ResponseBody
public DocumentUploadMetadata fileUpload(@RequestPart("docFiles") List<MultipartFile> docFiles,
    @RequestPart("metadata") DocumentUploadMetadata metadata, Principal principal)
    throws IllegalArgumentException, NullPointerException, Exception {
}
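
The JSON metadata can also be read from a file with the same @ syntax, which is handy when the payload gets long.  Here metadata.json is a hypothetical file holding the same JSON as above; from curl's side this sends an equivalent application/json part, which the @RequestPart signature above should handle the same way:

curl -k -H 'Content-Type: multipart/form-data' -F 'docFiles=@test1.txt;type=text/plain' -F 'metadata=@metadata.json;type=application/json' 'https://localhost:8443/cherryshoe-app/document/file/upload'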

This signature also works; I had to do this to make it work with the ng-file-upload library:

/**
 * Takes a list of document MultipartFile's, and JSON metadata.
 * 
 * NOTE: To make this work with ng-file-upload, the JSON metadata has to be
 * passed as a String and then deserialized manually in the controller.
 * 
 * @param docFiles
 * @param metadata
 * @param principal
 * @return
 * @throws IllegalArgumentException
 * @throws NullPointerException
 * @throws Exception
 */
@RequestMapping(value = "/document/file/upload", consumes = "multipart/form-data", produces = "application/json;charset=utf-8", method = RequestMethod.POST)
@ResponseBody
public DocumentUploadMetadata fileUpload(@RequestParam("docFiles") List<MultipartFile> docFiles,
        @RequestParam("metadata") String metadataJson, Principal principal)
        throws IllegalArgumentException, NullPointerException, Exception {
  // deserialize metadataJson JSON string manually
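  // e.g. with Jackson's ObjectMapper (sketch):
  // DocumentUploadMetadata metadata = new ObjectMapper().readValue(metadataJson, DocumentUploadMetadata.class);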
}

POJO for JSON metadata serialization/deserialization:

public class DocumentUploadMetadata {
    private Long reviewId;
    private Long docTypeId;
    private String customFileName;
    private String commentText;

    public DocumentUploadMetadata() {
        super();
    }
    
    // getters and setters are needed for jackson serialization/deserialization, add them for this to work
}


Helpful curl documentation: https://curl.haxx.se/docs/manual.html


Monday, September 10, 2018

Kibana: Create index patterns and set default index pattern through curl

I needed to programmatically create Kibana index patterns and set the default index pattern to support an automated deployment.  The two curl commands below were used in a bash script to do that.  To get the $KIBANA_HIDDEN_INDEX value, open <kibana_home>/config/kibana.yml and note down the "kibana.index" value.

Environment, this worked on both:
centos-release-6-8.el6.centos.12.3.x86_64
kibana-5.0.2-linux-x86_64

Windows 10 with Git Bash (Cygwin)
kibana-5.0.2-windows-x86
  1. Create kibana index pattern (template command, then a concrete example):

     curl -X POST -H "kbn-xsrf:true" -H "Content-Type: application/json" -d "{\"timeFieldName\":\"@timestamp\",\"title\":\"$INDEX_PATTERN\"}" http://localhost:5601/elasticsearch/$KIBANA_HIDDEN_INDEX/index-pattern/$INDEX_PATTERN?op_type=create

     curl -X POST -H "kbn-xsrf:true" -H "Content-Type: application/json" -d '{"timeFieldName":"@timestamp","title":"cherryshoe_idx"}' http://localhost:5601/elasticsearch/.local/index-pattern/cherryshoe_idx?op_type=create

  2. Set default index pattern:

     curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d "{\"value\":\"$INDEX_PATTERN\"}" http://localhost:5601/api/kibana/settings/defaultIndex

     curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d '{"value":"cherryshoe_idx"}' http://localhost:5601/api/kibana/settings/defaultIndex

Updated instructions for Kibana 6.4.2:
Windows 10 with Git Bash (Cygwin)
kibana-6.4.2-windows-x86_64

  1. Create kibana index pattern (template command, then a concrete example):

     jsonValue="{\"attributes\":{\"title\":\"$INDEX_PATTERN\",\"timeFieldName\":\"@timestamp\"}}"
     curl -X POST -H "kbn-xsrf:true" -H "Content-Type: application/json" -d "$jsonValue" http://localhost:5601/api/saved_objects/index-pattern/$INDEX_PATTERN

     jsonValue="{\"attributes\":{\"title\":\"cherryshoe_idx\",\"timeFieldName\":\"@timestamp\"}}"
     curl -X POST -H "kbn-xsrf:true" -H "Content-Type: application/json" -d "$jsonValue" http://localhost:5601/api/saved_objects/index-pattern/cherryshoe_idx

  2. Set default index pattern:

     defaultIndexValue="{\"changes\":{\"defaultIndex\":\"$INDEX_PATTERN\"}}"
     curl -X POST -H "kbn-xsrf:true" -H "Content-Type: application/json" -d "$defaultIndexValue" http://localhost:5601/api/kibana/settings

     defaultIndexValue="{\"changes\":{\"defaultIndex\":\"cherryshoe_idx\"}}"
     curl -X POST -H "kbn-xsrf:true" -H "Content-Type: application/json" -d "$defaultIndexValue" http://localhost:5601/api/kibana/settings

This article helped a lot.

Friday, August 17, 2018

Bamboo SSH Task - calling a remote script that switches user (su) fails silently - posted to Atlassian Community

I recently submitted this question to the Atlassian Community about a problem I am facing with the Bamboo SSH Task.

Problem:
I have defined an SSH Task where the commands in the SSH Task input field are:
source /home/myuser/.bash_profile
cd /opt/app/myproject/build/TEST
./deploy_something.sh

The deploy_something.sh bash script needs to switch user to be able to start/stop a service (i.e. logstash) that is owned by a different user (i.e. elasticsearch):
su - elasticsearch -s /bin/bash -c "$LOGSTASH_DIR/bin/logstash -f $LOGSTASH_PIPELINES &" > /dev/null 2>&1

This works great when run directly on the remote server, but when called from the Bamboo SSH Task it dies silently.

I've disabled the "requiretty" and "!visiblepw" configuration via visudo as discussed in https://community.atlassian.com/t5/Answers-Developer-Questions/Execute-sudo-on-remote-agent/qaq-p/523834, thinking it may be similar to a sudo issue. Does not solve the problem.

The Environment details are below:
Bamboo Server: centos-release-6-9.el6.12.3.x86_64
Remote Server: centos-release-6-8.el6.centos.12.3.x86_64

Anyone know a workaround or solution for this?

Linux - How to remotely call a script with parameter over ssh

Below is an example of how to remotely call a bash script with a parameter over ssh.  I've done this multiple times in the past, but decided to document it this time so that next time I can just reference this post.  This assumes that passwordless ssh to the remote server already works for the intended user.

The example here performs an automated database script deployment for the sprint branch currently being worked on.  Atlassian Bamboo is used for CI, but the DB script part was still being done manually.  This was a home-grown solution; the SQL for Bamboo add-on was not available and was not used.

Environment: 
Source and Target DB server both centos-release-6-8.el6.centos.12.3.x86_64

The Bamboo server was already configured with SSH Tasks that called scripts to download the source code for the sprint branch onto the source server.  So by the time the cherryshoe_db_copy_deploy.sh script is called, the database script files already reside on the source server.  We need the sprint number passed as a parameter because we have to know which sprint folder to copy.  This script then preps folders on the DB server (cherryshoe_db_prep.sh), copies the DB scripts over, and runs the scripts on the DB server (cherryshoe_db_deploy.sh).
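
The copy/deploy script takes the sprint number as its only argument, so it is kicked off like this (18 is just an example sprint number):

./cherryshoe_db_copy_deploy.sh 18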

cherryshoe_db_copy_deploy.sh
#!/bin/sh
usage () {
  echo "Usage (sprint number)"
}

SPRINT_NUMBER=$1
if [ -z "$1" ]
  then
   usage
   exit 1
fi

BUILD_HOME=/opt/app/cherryshoe/cherryshoe
BUILD_DIR=/opt/app/cherryshoe/Deploy
REMOTE_USER=cherryshoeuser
REMOTE_MACHINE=10.21.14.72
REMOTE_PATH_TO_SCRIPT=/opt/app/cherryshoe/build/database/automated_deployment/TEST

echo SPRINT_NUMBER $SPRINT_NUMBER
echo GIT_USER $GIT_USER
echo BUILD_HOME $BUILD_HOME
echo REMOTE_USER $REMOTE_USER
echo REMOTE_MACHINE $REMOTE_MACHINE
echo REMOTE_PATH_TO_SCRIPT $REMOTE_PATH_TO_SCRIPT
###########################################################

cd $BUILD_HOME

echo Call remote script to create sprint DB automated deployment folder if not exists
ssh $REMOTE_USER@$REMOTE_MACHINE "cd $REMOTE_PATH_TO_SCRIPT;./cherryshoe_db_prep.sh $SPRINT_NUMBER"

echo Copy current branch DB scripts to DB server
cd $BUILD_HOME
cd database/scripts/release_scripts/$SPRINT_NUMBER
scp *.sql $REMOTE_USER@$REMOTE_MACHINE:$REMOTE_PATH_TO_SCRIPT/$SPRINT_NUMBER
scp *.sh $REMOTE_USER@$REMOTE_MACHINE:$REMOTE_PATH_TO_SCRIPT/$SPRINT_NUMBER

echo Call remote script to perform deployment of db scripts
ssh $REMOTE_USER@$REMOTE_MACHINE "cd $REMOTE_PATH_TO_SCRIPT;./cherryshoe_db_deploy.sh $SPRINT_NUMBER"

echo DONE

cherryshoe_db_prep.sh
#!/bin/sh

if [ -z "$1" ]
  then
    echo "No sprint folder specified"
    exit 1
fi

SPRINT_FOLDER=$1
echo SPRINT_FOLDER: $SPRINT_FOLDER

DB_SCRIPT_DEPLOY_HOME=/opt/app/cherryshoe/build/database/automated_deployment/TEST

echo  Create sprint DB automated deployment folder if not exists.
cd $DB_SCRIPT_DEPLOY_HOME
mkdir -p $SPRINT_FOLDER

echo DONE

cherryshoe_db_deploy.sh
#!/bin/sh

if [ -z "$1" ]
  then
    echo "No sprint folder specified"
    exit 1
fi


SPRINT_FOLDER=$1
echo SPRINT_FOLDER: $SPRINT_FOLDER

DB_SCRIPT_DEPLOY_HOME=/opt/app/cherryshoe/build/database/automated_deployment/TEST
DB_USERNAME=cherryshoe
DB_PASSWORD=cherryshoe
DB_SCHEMA=cherryshoetest

echo changing to sprint folder: $DB_SCRIPT_DEPLOY_HOME/$SPRINT_FOLDER
cd $DB_SCRIPT_DEPLOY_HOME/$SPRINT_FOLDER

# match only files ending in .sh (escape the dot and anchor it to the end of the line)
BASHFILE=$(ls | grep "\.sh$")
echo BASHFILE: $BASHFILE

echo check if bash file exists
if [ -e "$BASHFILE" ]
then
  echo make sprint script file readable and executable
  chmod +rx *.sh

  echo executing bash file
  ./$BASHFILE $DB_SCHEMA $DB_USERNAME $DB_PASSWORD
else
  echo bash file does not exist, SKIPPING
fi

echo DONE



This article helped a lot.