{ "cells": [ { "cell_type": "markdown", "source": [ "# DataGraph" ], "metadata": {} }, { "cell_type": "code", "source": [ "\"\"\"\n", " TITLE : Data Graph\n", " AUTHOR : Nathaniel Starkman\n", " PROJECT : Utilipy\n", "\"\"\";\n", "\n", "__author__ = 'Nathaniel Starkman'\n", "__version__ = 'May 18, 2020'" ], "outputs": [], "execution_count": 1, "metadata": { "collapsed": false, "inputHidden": false, "jupyter": { "outputs_hidden": false }, "outputHidden": false } }, { "cell_type": "markdown", "source": [ "\n", " About\n", "\n", "\n", "There are two options for inputs when writing a function: write the function to accept a wide variety of inputs or not. The former is very convenient for the user but a pain for the developer, especially on the testing end. The latter puts all the onus on the user, and data reformatting is an enormous pain. I've been thinking about this for a while and I think there is a third, and often better, option -- an intermediate function that handles the conversion and can be applied to any function as a decorator. The advantage of this approach is threefold: \n", "\n", "1. the user gets a function that accepts a wide array of inputs\n", "2. the developer can focus on writing a purpose-built function that is easily tested and doesn't have a million preamble lines handling different data types.\n", "3. the data conversion functions can independently be tested. This nicely separates testing the function from testing the input options.\n", "\n", "I realized I had already been doing this to some extent: I've been writing and using decorators which do some mild data conversion on speficied arguments. This very light option can suffice, and might be best in some cases, but I ran into the problem that it's difficult to test a decorator that's not applied to a function. Furthermore, by locking the conversions into the decorator, I could not use them elsewhere. There are a few potential solutions, which I'll list below.\n", "\n", "**Solution 1**: just have a module with a whole bunch of conversion functions and have the decorator call these functions. This is the PanDoc solution. This solution does work, it's just very manual. The decorator will need a huge if/else switchboard testing types. I'm not knocking PanDoc, which is fantastic, but I want something a little more automatic.\n", "\n", "**Solutin 2**: A callable registry. This takes the data's type as an argument and the desired output format and returns the correct transformation function. This solution can be built upon solution 1, and offers a great deal more flexibility.\n", "\n", "Astropy had (and solved) a similar problem. How to transform objects in one reference frame to another reference frame. Their solution, which is quite elegant, is to build a graph whose nodes are reference frames and edges are transformation functions. In this way a coordinate frame can traverse the graph and be transformed into a frame to which there was no direct transformation function. This is what I want, but for arbitrary data types. So I will borrow and repurpose Astropy's `TransformGraph` code.\n", "\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "

\n", "\n", "- - - \n" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Prepare" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Imports" ], "metadata": { "inputHidden": false, "outputHidden": false } }, { "cell_type": "code", "source": [ "from utilipy import ipython\n", "# ipython.set_autoreload(2)\n", "\n", "# BUILT-IN\n", "\n", "# THIRD PARTY\n", "\n", "from astropy.coordinates import SkyCoord, ICRS, Galactic\n", "import astropy.units as u\n", "\n", "# PROJECT-SPECIFIC\n", "\n", "from starkman_thesis.utils.data import datagraph\n" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "set autoreload to 1\n", "/Users/nathanielstarkman/miniconda3/lib/python3.7/site-packages/astropy/coordinates/builtin_frames/galactocentric.py:373: AstropyDeprecationWarning: In v4.1 and later versions, the Galactocentric frame will adopt default parameters that may update with time. An updated default parameter set is already available through the astropy.coordinates.galactocentric_frame_defaults ScienceState object, as described in but the default is currently still set to the pre-v4.0 parameter defaults. The safest way to guard against changing default parameters in the future is to either (1) specify all Galactocentric frame attributes explicitly when using the frame, or (2) set the galactocentric_frame_defaults parameter set name explicitly. See http://docs.astropy.org/en/latest/coordinates/galactocentric.html for more information.\n", " AstropyDeprecationWarning)\n", "\n" ] } ], "execution_count": 2, "metadata": { "collapsed": false, "inputHidden": false, "jupyter": { "outputs_hidden": false }, "outputHidden": false } }, { "cell_type": "markdown", "source": [ "

\n", "\n", "- - - \n" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Adding a Transfomration" ], "metadata": {} }, { "cell_type": "code", "source": [ "def ICRS_to_SkyCoord(data):\n", " return SkyCoord(data)\n", "\n", "dg = datagraph.TransformGraph()\n", "dg.add_transform(ICRS, SkyCoord, ICRS_to_SkyCoord)" ], "outputs": [], "execution_count": 3, "metadata": {} }, { "cell_type": "code", "source": [ "data = ICRS(ra=1*u.deg, dec=10*u.deg)\n", "\n", "dg._graph[SkyCoord][ICRS](data)" ], "outputs": [ { "output_type": "execute_result", "execution_count": 4, "data": { "text/plain": [ "" ] }, "metadata": {} } ], "execution_count": 4, "metadata": {} }, { "cell_type": "markdown", "source": [ "## By Decorator" ], "metadata": {} }, { "cell_type": "code", "source": [ "dg = datagraph.TransformGraph() # make new\n", "\n", "@dg.register(datagraph.DataTransform, ICRS, SkyCoord, func_kwargs={\"sayhi\": True})\n", "def ICRS_to_SkyCoord(data, sayhi=False):\n", " if sayhi:\n", " print(\"Hi\")\n", " return SkyCoord(data)" ], "outputs": [], "execution_count": 5, "metadata": {} }, { "cell_type": "code", "source": [ "dg._graph[SkyCoord][ICRS](data)\n", "dg.get_transform(ICRS, SkyCoord)(data, sayhi=False)" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Hi\n" ] }, { "output_type": "execute_result", "execution_count": 6, "data": { "text/plain": [ "" ] }, "metadata": {} }, { "output_type": "execute_result", "execution_count": 6, "data": { "text/plain": [ "" ] }, "metadata": {} } ], "execution_count": 6, "metadata": {} }, { "cell_type": "markdown", "source": [ "## Composite Transformation" ], "metadata": {} }, { "cell_type": "code", "source": [ "dg = datagraph.TransformGraph()\n", "\n", "@dg.register(datagraph.DataTransform, Galactic, ICRS)\n", "def Galactic_to_ICRS(data):\n", " return Galactic.transform_to(data, ICRS)\n", "\n", "@dg.register(datagraph.DataTransform, ICRS, SkyCoord)\n", "def ICRS_to_SkyCoord(data):\n", " return SkyCoord(data)\n" ], "outputs": [], "execution_count": 7, "metadata": {} }, { "cell_type": "code", "source": [ "data = Galactic(l=20*u.deg, b=10*u.deg, distance=10*u.kpc)\n", "\n", "comp = datagraph.CompositeTransform([dg._graph[ICRS][Galactic], dg._graph[SkyCoord][ICRS]], Galactic, SkyCoord)\n", "comp(data)\n" ], "outputs": [ { "output_type": "execute_result", "execution_count": 8, "data": { "text/plain": [ "" ] }, "metadata": {} } ], "execution_count": 8, "metadata": {} }, { "cell_type": "markdown", "source": [ "## TransformationGraph function decorator" ], "metadata": {} }, { "cell_type": "code", "source": [ "dg = datagraph.TransformGraph()\n", "\n", "@dg.register(datagraph.DataTransform, Galactic, ICRS)\n", "def Galactic_to_ICRS(data):\n", " return Galactic.transform_to(data, ICRS)\n", "\n", "@dg.register(datagraph.DataTransform, ICRS, SkyCoord, func_kwargs={\"sayhi\": True})\n", "def ICRS_to_SkyCoord(data, sayhi=False):\n", " if sayhi:\n", " print(\"Hi\")\n", " return SkyCoord(data)\n" ], "outputs": [], "execution_count": 9, "metadata": {} }, { "cell_type": "code", "source": [ "@dg.function_decorator(arg1=SkyCoord, arg2=ICRS)\n", "def example_function(arg1, arg2):\n", " \"\"\"Example Function\n", "\n", " Parameters\n", " ----------\n", " arg1 : SkyCoord\n", " arg2 : ICRS\n", "\n", " Other Parameters\n", " ----------------\n", " None. Unles...\n", "\n", " \"\"\"\n", " if not isinstance(arg1, SkyCoord):\n", " raise ValueError\n", " if not isinstance(arg2, ICRS):\n", " raise ValueError\n", " return arg1, arg2\n", "\n", "# /def\n", "\n", "data = Galactic(l=20*u.deg, b=10*u.deg, distance=10*u.kpc)\n", " \n", "example_function(data, data)" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Hi\n" ] }, { "output_type": "execute_result", "execution_count": 10, "data": { "text/plain": [ "(,\n", " )" ] }, "metadata": {} } ], "execution_count": 10, "metadata": {} }, { "cell_type": "code", "source": [ "example_function?" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "\u001b[0;31mSignature:\u001b[0m \u001b[0mexample_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0marg2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_skip_decorator\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDocstring:\u001b[0m\n", "Example Function\n", "\n", "Parameters\n", "----------\n", "arg1 : SkyCoord`\n", "arg2 : ICRS\n", "\n", "Other Parameters\n", "----------------\n", "None. Unles...\n", "_skip_decorator : bool, optional\n", " Whether to skip the decorator.\n", " default False\n", "\n", "Notes\n", "-----\n", "This function is wrapped with a data `~TransformGraph` decorator.\n", "See `~TransformGraph.function_decorator` for details.\n", "The transformation arguments are also attached to this function\n", "as the attribute ``._transforms``.\n", "The affected arguments are: arg1, arg2\n", "\u001b[0;31mFile:\u001b[0m ~/Documents/Thesis/notebooks/\n", "\u001b[0;31mType:\u001b[0m function\n" ] }, "metadata": {} } ], "execution_count": 11, "metadata": {} }, { "cell_type": "code", "source": [ "example_function._transforms" ], "outputs": [ { "output_type": "execute_result", "execution_count": 12, "data": { "text/plain": [ "{'arg1': astropy.coordinates.sky_coordinate.SkyCoord,\n", " 'arg2': astropy.coordinates.builtin_frames.icrs.ICRS}" ] }, "metadata": {} } ], "execution_count": 12, "metadata": {} }, { "cell_type": "markdown", "source": [ "## Testing DataTransform __call__" ], "metadata": {} }, { "cell_type": "code", "source": [ "def test_func(data, x, y, a=1, *args, k, l=2, **kwargs):\n", " print(x, y, f\"a={a}\", args, k, f\"l={l}\", kwargs)\n", " return data\n", "\n", "import inspect\n", "sig = inspect.signature(test_func)\n", "argspec = inspect.getfullargspec(test_func)\n", "\n", "ba = sig.bind_partial(None, -1, 0, 1, \"a1\", \"a2\", k=3, t=10)\n", "ba.apply_defaults()\n", "\n", "args = (-10, -11)\n", "kwargs = {\"k\": 4, \"t3\": 11}\n", "ba2 = sig.bind_partial(data, *args, **kwargs)\n", "\n", "# if argspec.varargs in ba2.arguments:\n", "# ba2.arguments[argspec.varargs].update(ba.arguments[argspec.varargs])\n", "if argspec.varkw in ba2.arguments:\n", " ba2.arguments[argspec.varkw].update(ba.arguments[argspec.varkw])\n", "\n", "ba.arguments.update(ba2.arguments)\n", "test_func(*ba.args, **ba.kwargs)" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "-10 -11 a=1 ('a1', 'a2') 4 l=2 {'t3': 11, 't': 10}\n" ] }, { "output_type": "execute_result", "execution_count": 13, "data": { "text/plain": [ "" ] }, "metadata": {} } ], "execution_count": 13, "metadata": {} }, { "cell_type": "markdown", "source": [ "## Testing x-match relevant transformation" ], "metadata": {} }, { "cell_type": "code", "source": [ "from astropy.time import Time\n", "from astropy.table import Table\n", "\n", "dg = datagraph.TransformGraph()\n", "\n", "@dg.register(datagraph.DataTransform, Table, SkyCoord)\n", "def Table_to_SkyCoord(data):\n", " \"\"\"`~Table` to `BaseCoordinateFrame`.\"\"\"\n", " # TODO first try to determine of a SkyCoord is embedded in the table\n", "\n", " frame = SkyCoord.guess_from_table(data)\n", "\n", " # TODO more robust method of determining epoch\n", " # like allowing a kwarg to say where it is, or specify it.\n", " if \"obstime\" in data.dtype.fields:\n", " frame.obstime = Time(data[\"obstime\"])\n", " elif \"epoch\" in data.dtype.fields:\n", " frame.obstime = Time(data[\"epoch\"])\n", " elif \"ref_epoch\" in data.dtype.fields:\n", " frame.obstime = Time(data[\"epoch\"])\n", "\n", " elif \"obstime\" in data.meta:\n", " frame.obstime = Time(data.meta[\"obstime\"])\n", " elif \"epoch\" in data.meta:\n", " frame.obstime = Time(data.meta[\"epoch\"])\n", " elif \"ref_epoch\" in data.meta:\n", " frame.obstime = Time(data.meta[\"epoch\"])\n", "\n", " return frame\n", "\n", "t = Table([[1, 2, 3]*u.deg, [4, 5, 6]*u.deg], names=[\"ra\", \"dec\"])\n", "t.meta[\"obstime\"] = Time.now()\n", "\n", "dg.get_transform(Table, SkyCoord)(t)\n", "\n", "dg.get_transform(Table, Table)(t)" ], "outputs": [ { "output_type": "execute_result", "execution_count": 14, "data": { "text/plain": [ "" ] }, "metadata": {} }, { "output_type": "execute_result", "execution_count": 14, "data": { "text/html": [ "Table length=3\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
radec
degdeg
float64float64
1.04.0
2.05.0
3.06.0
" ], "text/plain": [ "\n", " ra dec \n", " deg deg \n", "float64 float64\n", "------- -------\n", " 1.0 4.0\n", " 2.0 5.0\n", " 3.0 6.0" ] }, "metadata": {} } ], "execution_count": 14, "metadata": {} }, { "cell_type": "code", "source": [ "del dg\n", "dg = datagraph.TransformGraph()\n", "@dg.register(datagraph.DataTransform, str, None)\n", "def str_to_None(data):\n", " return None\n", "\n", "print(dg.get_transform(str, None)(\"test this\"))\n", "\n", "dg._graph" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "None\n" ] }, { "output_type": "execute_result", "execution_count": 16, "data": { "text/plain": [ "defaultdict(dict,\n", " {None: {str: ,\n", " list: ,\n", " tuple: ,\n", " dict: },\n", " tuple: {str: ,\n", " list: ,\n", " dict: },\n", " list: {str: ,\n", " tuple: ,\n", " dict: },\n", " str: {tuple: ,\n", " list: ,\n", " dict: },\n", " astropy.coordinates.sky_coordinate.SkyCoord: {astropy.table.table.Table: ,\n", " astropy.coordinates.baseframe.BaseCoordinateFrame: ,\n", " astropy.coordinates.builtin_frames.icrs.ICRS: },\n", " astropy.coordinates.builtin_frames.icrs.ICRS: {astropy.coordinates.builtin_frames.galactic.Galactic: },\n", " astropy.table.table.Table: {}})" ] }, "metadata": {} } ], "execution_count": 16, "metadata": {} }, { "cell_type": "code", "source": [ "dg._graph[None][str](\"into the void\")" ], "outputs": [], "execution_count": 18, "metadata": {} }, { "cell_type": "markdown", "source": [ "

\n", "\n", "- - - \n", "\n", "\n", " END\n", "" ], "metadata": {} } ], "metadata": { "kernel_info": { "name": "utilipy" }, "kernelspec": { "display_name": "Python 3.7.3 64-bit ('base': conda)", "language": "python", "name": "python37364bitbaseconda6578cf6fdcb7435fb34bfe59e8478bf6" }, "language_info": { "name": "python", "version": "3.7.4", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "nteract": { "version": "0.23.3" } }, "nbformat": 4, "nbformat_minor": 4 }