BPE algorithm that can fine-tune a tokenizer

bpe_algorithm_can_finetune_tokenizer is an implementation for https://github.com/huggingface/transformers/issues/15153. I just added a few dozen lines of code to the py_bpe algorithm. The function finetune_tokenizer is the main addition. Details can be seen in example.py; it is actually very simple. The official Python tokenizer library is written in Rust. I am learning Rust and hope to provide a Rust version of this code. PS: the_factor_of_new_added_token_divided_unk_number is the only parameter you need to set. I hope to find an automatic way to set it.
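For readers new to BPE, here is a minimal sketch of the standard merge-learning loop that a pure-Python BPE implementation is typically built around. It is not the repository's code: the names `count_pairs`, `merge_pair`, and `learn_merges` are hypothetical, and it does not show how `finetune_tokenizer` or `the_factor_of_new_added_token_divided_unk_number` work.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs in a {tuple_of_symbols: frequency} mapping."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_merges(words, num_merges):
    """Repeatedly merge the most frequent adjacent pair (hypothetical helper)."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


# Toy corpus: words pre-split into characters, with their frequencies.
corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 3}
merges, words = learn_merges(corpus, 2)
```

Fine-tuning a tokenizer presumably means learning additional merges like these on new text and adding the resulting tokens to an existing vocabulary, but that is an assumption; see example.py in the repository for the actual behavior.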


An attempt at an optimal SmartGrid solution, by Dirk Kuiper (12416657) & Lars Zwaan (12414069)

An attempt at an optimal SmartGrid solution, by Dirk Kuiper (12416657) & Lars Zwaan (12414069). Part of Programmeertheorie, Minor Programmeren, UvA. Case: The problem is structured as follows: there are 3 districts, each containing 150 houses and 5 batteries. Each of these houses must be connected to a battery. Each house has an output and each battery has a capacity; as a result, not every combination is possible. In this project we look for an optimal solution, in which […]


Fixed point 64.61 math library for Cairo / Starknet

A fixed point 64.61 math library for Cairo & Starknet. Signed 64.61 Fixed Point Numbers: A signed 64.61-bit fixed point number is a fraction in which the numerator is a signed 125-bit integer and the denominator is 2^61. Since the denominator stays the same there is no need to store it (as in a floating point value). 64.61 is utilized as the 125 bit representation allows for overflow up to 2^125 * 2^125 (250 bits) during calculation taking advantage of […]
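To make the representation concrete, here is a small Python sketch of arithmetic with a fixed denominator of 2^61. It is illustrative only and is not the Cairo library's API: the function names are hypothetical, Python's unbounded integers stand in for the wide intermediate product, and truncation toward zero is an assumed rounding rule.

```python
FRACT_BITS = 61          # denominator is 2^61, as described above
ONE = 1 << FRACT_BITS    # fixed point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to its fixed point numerator (hypothetical helper)."""
    return int(round(x * ONE))


def from_fixed(a: int) -> float:
    """Convert a fixed point numerator back to a float."""
    return a / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed point values.

    The raw product carries a denominator of 2^122, so it is shifted
    right by 61 bits to restore the 2^61 denominator. Truncation toward
    zero is an assumption, not necessarily what the library does.
    """
    product = a * b
    sign = -1 if product < 0 else 1
    return sign * (abs(product) >> FRACT_BITS)


def fixed_div(a: int, b: int) -> int:
    """Divide two fixed point values, pre-scaling the dividend by 2^61."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * ((abs(a) << FRACT_BITS) // abs(b))


three = fixed_mul(to_fixed(1.5), to_fixed(2.0))
one_and_half = fixed_div(to_fixed(3.0), to_fixed(2.0))
```

The multiplication shows why headroom for a 250-bit intermediate matters: two 125-bit numerators are multiplied before the result is scaled back down.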


Twitch ChatBot built in Python with the twitchio library

Twitch ChatBot built in Python with the twitchio library. It uses the Twitch, League of Legends, Spotify, and Twitter APIs to get data for its responses. The commands mostly print fun and interesting stats about Twitch streamers and viewers, but there are also a few moderation features. It is connected to many top Polish Twitch streamers, most notably ‘youngmulti’, ‘kasix’, ‘mokrysuchar’, ‘franio’, ‘szymoool’, ‘xmerghani’, ‘mork’, ‘arquel’, ‘stazjaa’. Full list of functionalities available on https://lewus.pl/ (in Polish). Video Explanation (in Polish).


A powerful Python REPL calculator

This is a calculator with a complex source that includes a small AST, a parser and a tokenizer. This is a personal project done in 2 days to understand how operator precedence works and to practice my rusty skills of making interpreters. This project has no external dependencies and should work with Python 3.8 and newer versions of the Python interpreter. Features: Complex but small and understandable source code. Implemented in less than 450 LoC. Supports all basic operations (addition, subtraction, multiplication, division) […]
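As an illustration of how a tokenizer, a parser, and a small AST fit together to handle operator precedence, here is a minimal sketch using precedence climbing. This is one common technique chosen for the example, not the project's actual source; all names are hypothetical, and only the four basic operators and parentheses are handled.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    """Split the input into number, operator, and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens):
    """Parse a number or a parenthesized sub-expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    try:
        return float(token)
    except ValueError:
        raise SyntaxError(f"unexpected token {token!r}") from None


def parse_expression(tokens, min_prec=1):
    """Precedence climbing: build a nested-tuple AST like ('+', 2.0, 3.0)."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # A higher minimum on the right side makes operators left-associative.
        right = parse_expression(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


def evaluate(node):
    """Walk the AST and compute its value."""
    if isinstance(node, float):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


def calculate(text):
    return evaluate(parse_expression(tokenize(text)))
```

For example, `calculate("2 + 3 * 4")` parses the multiplication as the right operand of the addition, because `*` has a higher precedence level than `+`.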


Data depth inference with Python

This readme will guide you through the use of the code in this repository. The code in this repository is for nonparametric prior-free and likelihood-free posterior inference. We named this method: Inference with consonant structures via data peeling. As the name suggests, this method constructs consonant confidence structures directly from data using a procedure named data peeling. When to use this code? The probability distribution of the data-generating mechanism, $P_{X}$, is multivariate (d>2). The distribution family (e.g. lognormal) of $P_{X}$ […]


Django-serverless-cron – A Django app with a simpler approach to running cron jobs

django-serverless-cron is a Django app with a simpler approach to running cron jobs. It exposes an HTTP endpoint that invokes the jobs, which allows you to run any task without having to manage always-on infrastructure. There is also an option to run jobs via management commands and the Django admin. Why? This is essentially a replacement/supplement for a traditional OS ‘cron’ or ‘job scheduler’ system: Serverless cron jobs are no longer a pain. Schedule jobs to run at a frequency […]


Data for Datamodels: Predicting Predictions with Training Data

Here we provide the data used in the paper “Datamodels: Predicting Predictions with Training Data” (arXiv, Blog). Note that all of the data below is stored on Amazon S3 using the “requester pays” option to avoid a blowup in our data transfer costs (we put estimated AWS costs below)—if you are on a budget and do not mind waiting a bit longer, please contact us at [email protected] and we can try to arrange a free (but slower) transfer. Citation To […]
